Forecasting UK Imbalance Price using a Multilayer Perceptron Neural Network

This post is the fourth in a series applying machine learning techniques to an energy problem. The goal of this series is to develop models to forecast the UK Imbalance Price.  

I see huge promise in what machine learning can do in the energy industry.  This series details my initial efforts in gaining an understanding of machine learning.  

Part One – What is the UK Imbalance Price?
Part Two – Elexon API Web Scraping using Python
Part Three – Imbalance Price Visualization

Finally, the fun stuff!  In this post we will forecast the UK Imbalance Price using a machine learning model.  We will use a simple neural network model from scikit-learn – the Multilayer Perceptron (MLP).


What is a Multilayer Perceptron?

An MLP is a feedforward artificial neural network.

The network is composed of multiple layers of nodes.  The output from each node is fed to every node of the next layer.  Each node takes the inputs from the previous layer and processes them through an activation function such as the hyperbolic tan function.  Both classification and regression are possible with an MLP by changing the activation function of the output layer.
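To make the forward flow through the layers concrete, here is a minimal NumPy sketch.  The layer sizes and random weights are made up for illustration, and I assume tanh hidden layers with a linear output node (a common choice for regression).  It is not the code scikit-learn runs internally.

```python
import numpy as np

def forward(x, weights, biases):
    """Propagate inputs through a fully connected network.

    Hidden layers use tanh.  The output layer is left linear,
    which is an assumption suited to regression.
    """
    activation = x
    for w, b in zip(weights[:-1], biases[:-1]):
        activation = np.tanh(activation @ w + b)
    return activation @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)

# Illustrative structure: 4 inputs, two hidden layers of 3 nodes, 1 output
layer_sizes = [4, 3, 3, 1]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

x = rng.normal(size=(5, 4))  # 5 samples with 4 features each
y_hat = forward(x, weights, biases)
print(y_hat.shape)  # (5, 1)
```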

Learning occurs through optimization of a cost function.  The signal from the input to output layer propagates forward through the network.  The signal from the error then backpropagates from the output layer towards the input layer.

The cost function is optimized by adjusting the weights using an algorithm such as Stochastic Gradient Descent (SGD).  In our model we use a variation of SGD known as ‘adam’.
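The forward pass, backpropagation of the error and the weight update can be sketched in a few lines of NumPy.  This is a toy example under my own assumptions (one hidden layer, a squared error cost, made-up data and learning rate) and it uses plain mini-batch SGD rather than ‘adam’.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data, made up for illustration: y = sin(x)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X)

# One hidden layer of 10 tanh nodes feeding a linear output node
W1 = rng.normal(scale=0.5, size=(1, 10))
b1 = np.zeros(10)
W2 = rng.normal(scale=0.5, size=(10, 1))
b2 = np.zeros(1)
learning_rate = 0.05  # illustrative value

def cost(X, y):
    """Half the mean squared error of the current network."""
    hidden = np.tanh(X @ W1 + b1)
    return 0.5 * np.mean((hidden @ W2 + b2 - y) ** 2)

initial_cost = cost(X, y)

for epoch in range(200):
    order = rng.permutation(len(X))
    for start in range(0, len(X), 20):
        batch = order[start:start + 20]
        xb, yb = X[batch], y[batch]

        # Forward pass: input -> hidden -> output
        hidden = np.tanh(xb @ W1 + b1)
        error = hidden @ W2 + b2 - yb

        # Backward pass: push the error from the output towards the input
        grad_W2 = hidden.T @ error / len(xb)
        grad_b2 = error.mean(axis=0)
        delta_hidden = (error @ W2.T) * (1 - hidden ** 2)  # tanh derivative
        grad_W1 = xb.T @ delta_hidden / len(xb)
        grad_b1 = delta_hidden.mean(axis=0)

        # SGD update: step each weight against its gradient
        W2 -= learning_rate * grad_W2
        b2 -= learning_rate * grad_b2
        W1 -= learning_rate * grad_W1
        b1 -= learning_rate * grad_b1

final_cost = cost(X, y)
print(initial_cost, final_cost)
```

The cost after training should be lower than the starting cost.  In scikit-learn the same idea is selected with the solver parameter of MLPRegressor, for example solver='adam'.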

MLP in scikit-learn

scikit-learn is a Python machine learning library.  We will use the MLPRegressor model to predict the Imbalance Price.  We also make use of a few other features of scikit-learn – standardization, grid searching and pipelines.

Standardization is a pre-processing technique required for many machine learning algorithms.  Here we use the StandardScaler to transform each feature to a mean of zero and a variance of one.
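A short sketch of what the StandardScaler does, using a handful of made-up prices rather than real Elexon data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up prices in GBP/MWh, one feature column
prices = np.array([[35.0], [42.5], [28.0], [120.0], [38.5]])

scaler = StandardScaler()
scaled = scaler.fit_transform(prices)

# After scaling, the column has zero mean and unit variance
print(scaled.mean(axis=0), scaled.std(axis=0))

# The transformation can be reversed to get back to price units
restored = scaler.inverse_transform(scaled)
```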

GridSearchCV is a scikit-learn class that iterates through a user-defined set of model parameters.  We use a grid search to identify the optimal set of:

  • Number of lags = the number of previous Imbalance Price values used as the model features.
  • Model structure = both the width (number of nodes per layer) and depth (number of layers) of the network.
  • Activation function = logistic sigmoid, hyperbolic tan or a rectified linear function.
  • Alpha = the parameter used in L2 regularization.

GridSearchCV also allows cross validation to be used to understand model overfitting.  The standard deviation of the test scores is used to understand the degree of overfitting.
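A grid search over structure, activation function and alpha might look like the sketch below.  The data is synthetic, the grid values are deliberately small, and I use a built-in scorer in place of the custom MASE function.  The number of lags is not a parameter of MLPRegressor, so I assume it would be varied in a loop around the grid search.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic data standing in for lagged prices
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=120)

# Illustrative grid, much smaller than a real search
param_grid = {
    "hidden_layer_sizes": [(10,), (10, 10)],
    "activation": ["logistic", "tanh", "relu"],
    "alpha": [0.0001, 0.001],
}

search = GridSearchCV(
    MLPRegressor(solver="adam", max_iter=200, random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",  # stand-in for a MASE scorer
    cv=3,
)
search.fit(X, y)

print(search.best_params_)

# Spread of the test score across the folds for each parameter combination
std_test_score = search.cv_results_["std_test_score"]
print(std_test_score)
```

With such a short training run scikit-learn may warn that some models did not converge, which is fine for a sketch.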

A pipeline allows multiple steps of the model to be joined together.  The pipeline used in the model is a combination of the StandardScaler and MLPRegressor.
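A minimal sketch of such a pipeline is shown here.  The step names ‘scaler’ and ‘mlp’ are my own labels, the data is synthetic, and the MLP settings are borrowed from the final model described later in this post.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scaling and the network are fitted together as one estimator
model = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(50, 50, 50, 50, 50),
                         activation="relu",
                         solver="adam",
                         alpha=0.001,
                         random_state=0)),
])

rng = np.random.default_rng(0)

# Synthetic features standing in for lagged prices
X = rng.normal(loc=40, scale=15, size=(200, 6))
y = X.mean(axis=1)

model.fit(X, y)
predictions = model.predict(X[:5])
print(predictions)
```

When a pipeline is passed to GridSearchCV, parameters are addressed as the step name and parameter name joined by a double underscore, for example mlp__alpha.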

MLP regressor to predict UK Imbalance Price

This model is trained with UK Imbalance Price data obtained from Elexon.  As we know the value of our output this is a supervised learning problem.

The features of our model are the previous values of the Imbalance Price.  The number of lags included is a parameter that we iterate across.
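Building lagged features from a single price series can be sketched as follows.  The helper name make_lagged_features is hypothetical and the series is a simple counter so the result is easy to check by eye.

```python
import numpy as np

def make_lagged_features(series, n_lags):
    """Turn a 1-D series into (features, target) pairs.

    Each row of X holds the n_lags values that precede the target in y.
    """
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

prices = np.arange(10.0)  # stand-in for the Imbalance Price series
X, y = make_lagged_features(prices, n_lags=3)

print(X[0], y[0])        # [0. 1. 2.] 3.0
print(X.shape, y.shape)  # (7, 3) (7,)
```

With 48 periods per day, the 1344 lags used in the final model correspond to 28 days of price history.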

The grid search uses a custom Mean Absolute Scaled Error (MASE) function to score the model runs.  MASE allows direct comparison with a naive forecast.  Here we set the naive forecast as the value at the same time yesterday (a lag of 48 half-hourly periods).

The lower the MASE the more accurate the forecast.  A MASE less than 1 means the forecast is superior to the naive forecast.  A MASE greater than 1 means the naive forecast is better than our modeled forecast.
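One way to write such a score is sketched below.  I assume MASE here is simply the mean absolute error of the model divided by the mean absolute error of the naive forecast over the same periods.  The function used for the results in this post may differ in detail, and the data is made up.

```python
import numpy as np

def mase(actual, forecast, naive_lag=48):
    """Model MAE divided by the MAE of a 'same time yesterday' forecast."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # The first naive_lag points have no naive forecast, so skip them
    model_mae = np.mean(np.abs(actual[naive_lag:] - forecast[naive_lag:]))
    naive_mae = np.mean(np.abs(actual[naive_lag:] - actual[:-naive_lag]))
    return model_mae / naive_mae

rng = np.random.default_rng(0)

# Ten made-up days of half-hourly prices with a daily cycle plus noise
periods = np.arange(480)
actual = 40 + 10 * np.sin(periods * 2 * np.pi / 48) + rng.normal(scale=5, size=480)

# A forecast close to the actuals, and the naive forecast itself
good_forecast = actual + rng.normal(scale=1, size=480)
naive_forecast = np.concatenate([actual[:48], actual[:-48]])

print(mase(actual, good_forecast))   # below 1: better than naive
print(mase(actual, naive_forecast))  # 1.0: identical to naive
```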

Figure 1 – A sample of the final forecast
Lags = 1344, hidden layers = (50,50,50,50,50), ‘relu’ activation function, alpha = 0.001

Experiment 1 – Varying number of lags 

Figure 2 – Experiment on the number of lags
Hidden layers = (50,50,50,50,50), ‘relu’ activation function, alpha = 0.001

As expected, increasing the number of features leads to an improved MASE.  Also of interest here is the difference between the MASE on the test & train data, which indicates overfitting.

Experiment 2 – Varying model structure

Figure 3 – Experiment on number of layers
Lags = 1344, nodes per layer = 50, ‘relu’ activation function, alpha = 0.001

Figure 4 – Experiment on nodes per hidden layer
Lags = 1344, number of layers = 5, ‘relu’ activation function, alpha = 0.001

Experiment 3 – Varying activation function

Table 1 - Results for the activation function experiment
Activation function             Rectified linear (relu)    Hyperbolic tan (tanh)
MASE train                      0.184                      0.305
MASE test                       0.653                      0.698
MASE all                        0.322                      0.358
Test score standard deviation   0.037                      0.008

Table 1 shows that the rectified linear function is slightly superior in terms of MASE.  It also suffers from a higher degree of overfitting (seen in its higher test score standard deviation).

When inspecting the performance of the hyperbolic tan function it seems that the model struggles to capture the peaks in the price (Figure 5 below).  It does however seem to

Figure 5 – Performance of the hyperbolic tan activation function
Lags = 1344, hidden layers = (50,50,50,50,50), ‘relu’ activation function, alpha = 0.001

Experiment 4 – Varying alpha

Figure 6 & 7 – Effect of alpha on MASE for the whole data set & Test score standard deviation
Hidden layer size = (50, 50, 50, 50, 50),  ‘relu’ activation function

Alpha is a parameter that balances minimizing the cost function against overfitting the model.  Increasing alpha increases the size of the regularization term, which penalizes large weights.
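The effect of alpha on the cost can be sketched as below.  This is the general shape of an L2-penalized squared error cost with made-up numbers.  The exact scaling used inside scikit-learn may differ (for example by dividing the penalty by the number of samples).

```python
import numpy as np

def regularized_cost(y_true, y_pred, weights, alpha):
    """Squared error cost plus an L2 penalty on the weights."""
    data_term = 0.5 * np.mean((y_true - y_pred) ** 2)
    penalty = 0.5 * alpha * sum(np.sum(w ** 2) for w in weights)
    return data_term + penalty

# Made-up predictions and a single made-up weight matrix
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.5])
weights = [np.array([[1.0, -2.0], [0.5, 0.0]])]

low_alpha_cost = regularized_cost(y_true, y_pred, weights, alpha=0.001)
high_alpha_cost = regularized_cost(y_true, y_pred, weights, alpha=0.1)

# Same fit to the data, but the larger alpha makes the same weights cost more
print(low_alpha_cost, high_alpha_cost)
```

Because large weights become more expensive as alpha grows, the optimizer is pushed towards smaller weights and a smoother model.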

Interestingly, we see that both high and low values of alpha lead to poorer MASE scores.  We also see that increasing alpha leads to a decrease in the test score standard deviation, as expected.

Future work

  • Using December 2016 data to test model performance on data that wasn’t used in either training or testing.
  • More robust use of training, testing and validation sets.
  • More work to understand the value of different activation functions – perhaps using hyperbolic tan for the lower price periods and rectified linear to capture the peaks.
  • Experimenting with different network structures such as recurrent neural networks.
  • Introducing more data such as Net Imbalance Volume.

The Code

This code uses a database called ‘ELEXON data.sqlite’.  You can generate one of these files using the code in our previous post on Scraping Data from the Elexon API.