# Tuning regularization strength

This post is the fifth in a series applying machine learning techniques to an energy problem. The goal of this series is for me to teach myself machine learning techniques by developing models to forecast the UK Imbalance Price.

I see huge promise in what machine learning can do in the energy industry.  This series details my initial efforts to understand machine learning.

In the previous post in this series we introduced a Multi-Layer Perceptron neural network to predict the UK Imbalance Price.  This post digs a bit deeper into controlling the degree of overfitting of our model, which we do by tuning the strength of regularization.

## What is regularization?

Regularization is a tool used to combat the problem of overfitting a model.  Overfitting occurs when a model starts to fit the training data too well – meaning that performance on unseen data is poor.

To prevent overfitting to the training data we can use regularization to keep the model parameters small.  Including a regularization term in the cost function the model minimizes encourages the model to use smaller parameters.

The equation below shows the loss function minimized during model training.  The first term is the square of the error.  The second term is the regularization term – with lambda shown as the parameter to control regularization.  In order to be consistent with scikit-learn, we will refer to this parameter as alpha.
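One common form of this regularized loss function (a sketch consistent with the description above, where theta are the model parameters and alpha controls regularization strength) is:

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
          + \frac{\alpha}{2m}\sum_{j=1}^{n}\theta_j^2
```

The first term is the squared error over the m training samples; the second is the L2 penalty on the n parameters, scaled by alpha.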

Regularization penalizes large values of the model parameters (theta) based on the size of the regularization parameter.  Regularization comes in two flavours – L1 and L2.  The MLP Regressor model in scikit-learn uses L2 regularization.

Setting alpha too large will result in underfitting (also known as a high bias problem).  Setting alpha too small may lead to overfitting (a high variance problem).
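The shrinking effect of the L2 penalty is easiest to see in a linear model, where the regularized solution has a closed form.  This is a stand-in for the neural network (the penalty mechanism is the same); the data here is synthetic and all names are illustrative:

```python
import numpy as np

def ridge_weights(X, y, alpha):
    """Closed-form L2-regularized (ridge) least squares:
    w = (X^T X + alpha * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# synthetic regression problem with known true weights
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.5, -1.0]) + rng.normal(scale=0.1, size=100)

# larger alpha -> smaller parameter norm
for alpha in (0.0001, 0.01, 1.0, 100.0):
    w = ridge_weights(X, y, alpha)
    print(f"alpha={alpha:>8}: ||w|| = {np.linalg.norm(w):.3f}")
```

As alpha grows the norm of the weights shrinks towards zero, which is exactly the underfitting end of the bias-variance trade-off described above.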

## Setting alpha in the UK Imbalance Price model

Here we will optimize alpha by iterating through a number of different values.

We can then evaluate the degree of overfitting by looking at how alpha affects the loss function and the Mean Absolute Scaled Error (MASE).  The loss function is the cost function the model minimizes during training.  The MASE is the metric we used to judge model performance.
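A minimal sketch of the MASE calculation (assuming the usual definition: the model's mean absolute error scaled by the mean absolute error of a naive persistence forecast over the training series):

```python
import numpy as np

def mase(y_true, y_pred, y_train):
    """Mean Absolute Scaled Error: the model's MAE divided by the MAE
    of a naive forecast (predict the previous value) on the training data."""
    mae_model = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    mae_naive = np.mean(np.abs(np.diff(y_train)))  # |y_t - y_{t-1}|
    return mae_model / mae_naive
```

A MASE below 1.0 means the model beats the naive persistence forecast; above 1.0 means it does worse.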

We use K-fold cross validation to get a sense of the degree of overfitting.  Comparing the cross validation to training performance gives us an idea of how much our model is overfitting.  Using K-fold cross validation allows us to leave the test data free for evaluating model performance only.
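The K-fold split itself might be generated as follows (a minimal sketch; scikit-learn's `KFold` provides this out of the box, and the `seed` and function name here are illustrative):

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs: each fold serves once as the
    validation set, with the remaining k-1 folds used for training."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx
```

For each candidate alpha we would train on the k-1 training folds and score on the held-out fold, averaging the scores; the gap between mean training and mean validation score indicates the degree of overfitting.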

Figures 1 & 2 show the results of optimizing alpha for an MLP Regressor with five hidden layers of 1344 nodes each.  The input feature set is the previous one week of Imbalance Price data.

Figure 1 shows the effect of alpha on the loss function for the training and cross validation sets.  We would expect to see the training loss increase as alpha increases, as small values of alpha should allow the model to overfit.  We would also expect the losses for the training and CV sets to converge as alpha gets large.

###### Figure 1 – The effect of alpha on the loss function

Figure 1 shows the expected trend, with the training loss increasing as alpha increases, except for alpha = 0.0001, which shows a high training loss.  I don't understand this result!  I was expecting the training loss to decrease with decreasing alpha.

Figure 2 shows the effect of alpha on the Mean Absolute Scaled Error for the training, cross validation and test sets.

###### Figure 2 – The effect of alpha on the MASE for the training, cross validation and test data

Figure 2 also shows a confusing result.  I was expecting the MASE to be at a minimum for the smallest alpha and to increase as alpha increased, because small values of alpha should allow the model to overfit (and improve training performance).  Instead we see that the best training MASE is at alpha = 0.01.

Figure 2 shows a minimum for the test MASE at alpha = 0.01 – this is also the minimum for the training data.

Going forward I will be using a value of 0.01 for alpha as this shows a good balance between minimizing the loss for the training and cross validation sets.

Table 1 shows the results for the model as it currently stands.

###### Table 1 – Model performance with alpha = 0.01

| Dataset | MASE |
| --- | --- |
| Training | 0.3345 |
| Cross validation | 0.589 |
| Test | 0.5212 |

Next step in this project is looking at previously unseen data for December 2016 – stay tuned.

# What is the UK Imbalance Price?

This post is the first in a series applying machine learning techniques to an energy problem.  The goal of this series is to develop models to forecast the UK Imbalance Price.

## What is the Imbalance Price?

The Imbalance Price is what generators or suppliers pay for any unexpected imbalance.

In the UK generators and suppliers (known as Parties) contract with each other for the supply of electricity.  Generators sell electricity to suppliers who then sell power to end use customers.

As System Operator, National Grid handles real-time balancing of the UK grid.  Parties submit details of their contracts to National Grid one hour before delivery.  This allows National Grid to understand the expected imbalance.

National Grid will then take actions to correct any predicted imbalance.  For example the Balancing Mechanism allows Parties to submit Bids or Offers to change their position by a certain volume at a certain price.

National Grid also has the ability to balance the system using actions outside the Balancing Mechanism.  Examples include:

• Short Term Operating Reserve power plants.
• Frequency Response plants used to balance the grid in real time.
• Reserve Services.

In more drastic scenarios National Grid may call upon closed power plants or disconnect customers.  National Grid will always aim to minimize the cost of balancing within technical constraints.

Parties submit their expected positions one hour before delivery –  but they do not always meet these contracted positions!

A supplier may underestimate their customers' demand.  A power plant might face an unexpected outage.  The difference between the contracted and actual position is charged using the Imbalance Price.

ELEXON uses the costs that National Grid incurs in correcting imbalance to calculate the Imbalance Price.  This is then used to charge Parties for being out of balance with their contracts. ELEXON details the process for the calculation of the Imbalance Price here.

## What data is available?

ELEXON make available a significant amount of data online.  This includes data for the Imbalance Price calculation as well as data related to the UK grid.  We will make use of the ELEXON API to access data.

The first iteration of this model will be auto-regressive.  We will use only the previous values of the Imbalance Price to predict future values.
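Building an autoregressive feature set amounts to turning the price series into lagged columns, where each row holds the recent history and the target is the next value.  A minimal sketch (the function name is illustrative; with half-hourly settlement periods, one week of history would be 48 × 7 = 336 lags):

```python
import numpy as np

def make_lagged_features(prices, n_lags):
    """Build an autoregressive design matrix: row t holds the n_lags
    prices preceding time t, and the target is the price at time t."""
    X = np.stack([prices[i:i + n_lags] for i in range(len(prices) - n_lags)])
    y = prices[n_lags:]
    return X, y

# toy example: a 10-step price series with 3 lags
X, y = make_lagged_features(np.arange(10.0), n_lags=3)
print(X.shape)  # (7, 3)
```

Each row of `X` can then be fed to any regression model to predict the corresponding entry of `y`.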

As we continue to develop the model we will add more data and explain its relevance to the Imbalance Price.  Adding data iteratively will allow us to understand what value each new dataset adds to the model.

## Next steps

The next post will be the Python code used to scrape data using the Elexon API.   We will then do some visualization to analyze the Imbalance Price data.

Posts after that will be developing models in Python to predict the Imbalance Price.

Part Two – Elexon API Web Scraping using Python