*This post is the fourth in a series applying machine learning techniques to an energy problem. The goal of this series is to develop models to forecast the UK Imbalance Price.*

*I see huge promise in what machine learning can do in the energy industry. This series details my initial efforts in gaining an understanding of machine learning.*

Part One – What is the UK Imbalance Price?

Part Two – Elexon API Web Scraping using Python

Part Three – Imbalance Price Visualization

Finally, the fun stuff! In this post we will forecast the UK Imbalance Price using a machine learning model. We will use a simple neural network model available in scikit-learn – the Multilayer Perceptron (MLP).

**What is a Multilayer Perceptron?**

An MLP is a feedforward artificial neural network.

The network is composed of multiple layers of nodes. The output from each node is fed to each node of the next layer. Each node takes a weighted sum of the outputs of the previous layer and passes it through an activation function such as the hyperbolic tan function. Both classification and regression are possible in an MLP by changing the activation function of the output layer.
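To make the layer-by-layer picture concrete, here is a minimal sketch of a forward pass written in plain NumPy. The layer sizes, the random weights and the function name `mlp_forward` are all made up for illustration; this is not how scikit-learn implements it internally.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass through a fully connected network with tanh hidden layers."""
    activation = x
    # Hidden layers: weighted sum of the previous layer, then the tanh activation.
    for W, b in zip(weights[:-1], biases[:-1]):
        activation = np.tanh(activation @ W + b)
    # Output layer: no squashing (identity activation), which suits regression.
    return activation @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
layer_sizes = [4, 5, 5, 1]  # illustrative: 4 inputs, two hidden layers of 5 nodes, 1 output
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

batch = rng.normal(size=(3, 4))  # 3 samples with 4 features each
prediction = mlp_forward(batch, weights, biases)
print(prediction.shape)
```

Each row of `batch` produces one number, which is the kind of output we want when forecasting a single price.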

Learning occurs through optimization of a cost function. The signal from the input to output layer propagates forward through the network. The signal from the error then backpropagates from the output layer towards the input layer.

The cost function is optimized by adjusting the weights using an algorithm such as Stochastic Gradient Descent (SGD). In our model we use a variation of SGD known as ‘adam’.
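For the curious, the sketch below shows the shape of a single Adam update on a toy one-dimensional problem. It is a simplified illustration and not scikit-learn's implementation: the function name `adam_step`, the toy cost function and the learning rate of 0.01 are assumptions chosen for the example, and the gradient here is exact rather than estimated from mini-batches.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new weights and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # running average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # running average of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy cost function f(w) = (w - 3)^2, which has its minimum at w = 3.
w = np.array([0.0])
m = np.zeros(1)
v = np.zeros(1)
for t in range(1, 5001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)

print(w)
```

The key difference from plain SGD is that the step size adapts to the recent history of the gradients rather than being a fixed multiple of the current gradient.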

**MLP in scikit-learn**

scikit-learn is a Python machine learning library. We will use the *MLPRegressor* model to predict the Imbalance Price. We also make use of a few other features of scikit-learn – standardization, grid searching and pipelines.

Standardization is a pre-processing technique required for many machine learning algorithms. Here we use the *StandardScaler* to transform each feature to a mean of zero and a variance of one.
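As a quick illustration of what the scaler does, the sketch below standardizes a handful of made-up prices (these are not real Imbalance Price values) and then reverses the transformation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up prices purely for illustration, one feature per column.
prices = np.array([[30.0], [45.0], [60.0], [25.0], [90.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(prices)        # subtract the mean, divide by the standard deviation
restored = scaler.inverse_transform(scaled)  # map back to the original units

print(scaled.mean(), scaled.std())
```

The scaler remembers the mean and standard deviation it was fitted on, so the same transformation can later be applied to new data.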

*GridSearchCV* is a scikit-learn class that iterates through a user-defined set of model parameters. We use a grid search to identify the optimal set of:

- Number of lags = how many previous values of the price are used as the model features.
- Model structure = both the width (number of nodes per layer) and depth (number of layers) of the network.
- Activation function = logistic sigmoid, hyperbolic tan or a rectified linear function.
- Alpha = the parameter used in L2 regularization.

GridSearchCV also cross-validates each parameter combination, which helps us understand model overfitting. We use the standard deviation of the test scores across the cross-validation folds as a measure of the degree of overfitting.

A pipeline allows multiple steps of the model to be joined together. The pipeline used in the model is a combination of the *StandardScaler* and *MLPRegressor*.
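The sketch below shows roughly how these pieces fit together. It is a minimal example under several assumptions: the data is random rather than Imbalance Price data, the grid is far smaller than the one used for the experiments, the default scoring is used instead of MASE, and the number of lags is left out because it changes the feature matrix itself rather than a pipeline parameter.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the lagged price features and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=120)

# Scaling and the model are joined so that both are refitted on every fold.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPRegressor(solver="adam", max_iter=300, random_state=0)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "mlp__hidden_layer_sizes": [(10,), (10, 10)],
    "mlp__activation": ["relu", "tanh"],
    "mlp__alpha": [0.001, 0.1],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

best_params = search.best_params_
score_std = search.cv_results_["std_test_score"]  # spread of test scores across folds
print(best_params)
```

Putting the scaler inside the pipeline means it is fitted only on the training portion of each fold, which avoids leaking information from the test portion.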

**MLP regressor to predict UK Imbalance Price**

This model is trained with UK Imbalance Price data obtained from Elexon. As we know the value of our output this is a supervised learning problem.

The features of our model are the previous values of the Imbalance Price. The number of lags included is a parameter that we iterate across.
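One simple way to build lagged features is sketched below. The helper name `make_lagged_features` and the toy series are hypothetical; the actual code for this post may construct the features differently.

```python
import numpy as np

def make_lagged_features(series, n_lags):
    """Return (X, y) where each row of X holds the n_lags values preceding y."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i - n_lags:i] for i in range(n_lags, len(series))])
    y = series[n_lags:]
    return X, y

prices = np.arange(10.0)  # toy series 0, 1, ..., 9 standing in for half-hourly prices
X, y = make_lagged_features(prices, n_lags=3)
print(X.shape, y.shape)
```

Note that the first `n_lags` periods cannot be used as targets because they do not have a full set of previous values.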

Grid searching uses a custom MASE function to score the model runs. MASE allows direct comparison with a naive forecast. Here we set the naive forecast as the value at the same time yesterday (a lag of 48).

The lower the MASE the more accurate the forecast. A MASE less than 1 means the forecast is superior to the naive forecast. A MASE greater than 1 means the naive forecast is better than our modeled forecast.
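A minimal sketch of a MASE calculation is shown below. It assumes MASE is computed as the mean absolute error of the model divided by the mean absolute error of the naive forecast over the same periods; the function name `mase` and the random test data are illustrative, and the custom function used for the grid search may differ in detail.

```python
import numpy as np

def mase(y_true, y_pred, naive_lag=48):
    """Mean absolute error of the forecast relative to a lagged naive forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # The naive forecast for period t is the actual value at t - naive_lag,
    # so the first naive_lag periods have no naive forecast and are dropped.
    model_mae = np.mean(np.abs(y_true[naive_lag:] - y_pred[naive_lag:]))
    naive_mae = np.mean(np.abs(y_true[naive_lag:] - y_true[:-naive_lag]))
    return model_mae / naive_mae

rng = np.random.default_rng(0)
actual = rng.normal(loc=40.0, scale=10.0, size=200)  # made-up prices

# Forecasting "same time yesterday" should score exactly 1 by construction.
naive_forecast = np.concatenate([actual[:48], actual[:-48]])
print(mase(actual, naive_forecast), mase(actual, actual))
```

A function like this could be wrapped with scikit-learn's `make_scorer` (with `greater_is_better=False`, since a lower MASE is better) so that a grid search can use it for scoring.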

###### Figure 1 – A sample of the final forecast

Lags = 1344, hidden layers = (50,50,50,50,50), ‘relu’ activation function, alpha = 0.001

**Experiment 1 – Varying number of lags**

###### Figure 2 – Experiment on the number of lags

Hidden layers = (50,50,50,50,50), ‘relu’ activation function, alpha = 0.001

As expected, increasing the number of lags (and therefore features) leads to an improved MASE. Also of interest here is the difference between the MASE on the test and train data, which indicates overfitting.

**Experiment 2 – Varying model structure**

###### Figure 3 – Experiment on number of layers

Lags = 1344, nodes per layer = 50, ‘relu’ activation function, alpha = 0.001

###### Figure 4 – Experiment on nodes per hidden layer

Lags = 1344, number of layers = 5, ‘relu’ activation function, alpha = 0.001

**Experiment 3 – Varying activation function**

Table 1 - Results for the activation function experiment

| Activation function | Rectified linear (relu) | Hyperbolic tan (tanh) |
| --- | --- | --- |
| MASE train | 0.184 | 0.305 |
| MASE test | 0.653 | 0.698 |
| MASE all | 0.322 | 0.358 |
| Test score standard deviation | 0.037 | 0.008 |

Table 1 shows that our rectified linear function is slightly superior in terms of MASE. It also suffers from a higher degree of overfitting (seen by a higher test score standard deviation).

When inspecting the performance of the hyperbolic tan function, it seems that the model struggles to capture the peaks in the price (Figure 5 below). It does however seem to

###### Figure 5 – Performance of the hyperbolic tan activation function

Lags = 1344, hidden layers = (50,50,50,50,50), ‘tanh’ activation function, alpha = 0.001

**Experiment 4 – Varying alpha**

###### Figures 6 & 7 – Effect of alpha on MASE for the whole data set & Test score standard deviation

Hidden layer size = (50, 50, 50, 50, 50), ‘relu’ activation function

Alpha controls the strength of the L2 regularization term, trading off fitting the training data closely against overfitting the model. Increasing alpha increases the size of the regularization term, which penalizes large weights.
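The sketch below shows the general shape of an L2-regularized cost: a data term plus alpha times the sum of squared weights. The function name `regularized_cost`, the scaling constants and the toy numbers are assumptions for illustration; the exact scaling used inside scikit-learn may differ.

```python
import numpy as np

def regularized_cost(y_true, y_pred, weights, alpha):
    """Squared error cost plus an L2 penalty on the network weights."""
    data_term = 0.5 * np.mean((y_true - y_pred) ** 2)
    # The penalty grows with alpha and with the size of the weights.
    penalty = 0.5 * alpha * sum(np.sum(W ** 2) for W in weights)
    return data_term + penalty

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
weights = [np.array([[1.0, -2.0]])]  # a single tiny weight matrix

for alpha in (0.0, 0.001, 0.1):
    print(alpha, regularized_cost(y_true, y_pred, weights, alpha))
```

With the predictions held fixed, a larger alpha makes the same set of weights more costly, which pushes the optimizer towards smaller weights and a smoother model.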

Interestingly we see that both a high and low alpha lead to poorer MASE scores. We also see that increasing alpha leads to a decrease in the test score standard deviation as expected.

**Future work**

- Using December 2016 data to test model performance on data that wasn’t used in either training or testing.
- More robust use of training, testing and validation sets.
- More work to understand the value of different activation functions – perhaps using hyperbolic tan for the lower price periods and rectified linear to capture the peaks.
- Experimenting with different network structures such as recurrent neural networks.
- Introducing more data such as Net Imbalance Volume.

**The Code**

This code uses a database called ‘ELEXON data.sqlite’. You can generate one of these files using the code in our previous post on Scraping Data from the Elexon API.