Category Archives: Imbalance Price Forecasting

Tuning Model Structure – Number of Layers & Number of Nodes

Imbalance Price Forecasting is a series applying machine learning to forecasting the UK Imbalance Price.  

In the last post I introduced a new version of the neural network I am building.  This new version is a feedforward, fully connected neural network written in Python and built using Keras.

I’m now working on tuning model hyperparameters and structure. Previously I setup two experiments looking at:

  1. Activation functions
    • concluded that rectified linear (relu) is superior to tanh, sigmoid & linear.
  2. Data sources
    • concluded that more data is better.

In this post I detail two more experiments:

  1. Number of layers
  2. Number of nodes per layer

Python improvements

I’ve made two improvements to my Keras implementation.  An updated script is available on my GitHub.

During training I often saw that the model from the final epoch was not necessarily the best model.  I now make use of a ModelCheckpoint callback that saves the weights of the best model seen during training.
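In Keras this is handled by passing a ModelCheckpoint(filepath, save_best_only=True) callback to fit.  The idea itself is simple enough to sketch in plain Python (function and variable names here are my own, for illustration only):

```python
import copy

def best_weights_over_training(losses, weights_per_epoch):
    """Keep the weights from the epoch with the lowest loss -
    the job ModelCheckpoint(save_best_only=True) does in Keras."""
    best_loss, best_weights = float('inf'), None
    for loss, weights in zip(losses, weights_per_epoch):
        if loss < best_loss:
            best_loss, best_weights = loss, copy.deepcopy(weights)
    return best_loss, best_weights

# the final epoch is not the best one here
losses = [1.0, 0.4, 0.6]
weights_per_epoch = [[0.1], [0.2], [0.3]]
best_loss, best_weights = best_weights_over_training(losses, weights_per_epoch)
```

Keras makes the same comparison after every epoch, writing the weights to disk whenever the monitored loss improves.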

The second change I have made is to include dropout layers after the input layer and each hidden layer.  This is a better implementation of dropout!

Experiment one – number of layers

Model parameters were:

  • 15,000 epochs. Trained in three batches. 10-fold cross-validation.
  • 2016 Imbalance Price & Imbalance Volume data scraped from Elexon.
  • Feature matrix of lag 48 of Price & Volume & Sparse matrix of Settlement Period, Day of the week and Month.
  • Feed-forward neural network:
    • Input layer with 1000 nodes, fully connected.
    • 0-5 hidden layers with 1000 nodes, fully connected.
    • 1-6 dropout layers. One under input & each hidden layer.  30% dropout.
    • Output layer with a single node and linear activation.
    • Loss function = mean squared error.
    • Optimizer = adam (default parameters).
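To make the structure being varied concrete, here is a small numpy sketch of the forward pass for this family of networks – an input layer, a configurable number of relu hidden layers, and a single linear output node (random weights and scaled-down sizes, purely illustrative):

```python
import numpy as np

rng = np.random.RandomState(42)

def forward(x, n_hidden, n_nodes=32):
    """Forward pass of a fully connected network: an input layer,
    n_hidden relu hidden layers and a single linear output node.
    Weights are random - this only illustrates the structure being tuned."""
    a = x
    for _ in range(1 + n_hidden):            # input layer plus each hidden layer
        W = rng.randn(a.shape[-1], n_nodes) * 0.1
        a = np.maximum(0.0, a @ W)           # relu activation
    w_out = rng.randn(n_nodes, 1) * 0.1
    return (a @ w_out).flatten()             # single linear output node

x = rng.randn(5, 10)                         # 5 samples, 10 features
preds = forward(x, n_hidden=2)
```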

Results of the experiments are shown below in Fig. 1 – 3.

Figure 1 – number of layers vs final training loss
Figure 2 – number of layers vs MASE

Figure 1 shows two layers with the smallest training loss.

Figure 2 shows that two layers also gives the lowest CV MASE (although with a high training MASE).

Figure 3 – number of layers vs overfit. Absolute overfit = Test-Training. Relative = Absolute / Test.

In terms of overfitting, two layers shows reasonable absolute & relative overfit.  The low relative overfit is due to the high training MASE (which reduces the overfit for a constant CV MASE).
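For reference, the two overfit measures plotted in Figure 3 are computed as follows (the values below are illustrative, not results from the experiments):

```python
def overfit_metrics(train_mase, test_mase):
    """Absolute overfit = Test - Training; Relative = Absolute / Test."""
    absolute = test_mase - train_mase
    relative = absolute / test_mase
    return absolute, relative

# illustrative MASE values only
absolute, relative = overfit_metrics(train_mase=0.25, test_mase=0.50)
```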

My conclusion from this set of experiments is to go forward with a model of two layers.  Increasing the number of layers beyond this doesn’t seem to improve performance.

It is possible that training for more epochs would improve the performance of the more complex networks, which are harder to train.  For the scope of this project I am happy to settle on two layers.

Experiment two – number of nodes

For the second set of experiments all model parameters were as above except for:

  • 2 hidden layers with 50-1000 nodes per layer.
  • 5-fold cross-validation.

Figure 4 – number of nodes vs final training loss
Figure 5 – number of nodes vs MASE
Figure 6 – number of nodes vs overfit.  Absolute overfit = Test-Training.  Relative = Absolute / Test

My conclusion from looking at the number of nodes is that 500 nodes per layer is the optimum result.


Both parameters could be optimized further using the same parametric approach.  For the scope of this work I am happy to work with the results of these experiments.

I trained a final model using the optimal parameters.  A two layer & 500 node network achieved a test MASE of 0.455 (versus the previous best of 0.477).

Table 1 – results of the final model fitted (two layers, 500 nodes per layer)

The next post in this series will look at controlling overfitting via dropout.

UK Imbalance Price Forecasting using Keras & TensorFlow

Previously I introduced a feedforward neural network model built using scikit-learn in Python. This post details an upgraded version which utilizes Keras, TensorFlow and Plotly.

Keras is a more flexible package for building neural networks.  It can train models on a GPU by using Google’s TensorFlow as the backend.  I still use scikit-learn’s helpful functions for splitting data into cross-validation and test/train sets.
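The splitting approach can be sketched as follows – scikit-learn’s train_test_split holds out a test set for measuring generalization error only, while KFold serves the tuning loop (toy data for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# hold out a test set, used only to estimate generalization error
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=5)

# K-fold cross-validation on the training data for tuning
folds = list(KFold(n_splits=5).split(X_train))
```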

As I’m becoming more comfortable with Python the quality of these scripts is improving. I’m using list comprehensions and lambda functions to make my code faster and cleaner. It’s a great feeling to rewrite parts of the script using new knowledge.

The full code is available on my GitHub repository for this project – I will also include a copy in this post.  The script philosophy is about running experiments on model structure and hyperparameters.

Features of this script

I needed to use a few workarounds to get TensorFlow to train models on the GPU.  First I had to install the tensorflow-gpu package (not tensorflow) through pip.  I then needed to access the TensorFlow session through the back end:

import keras.backend.tensorflow_backend as KTF

We also need to wrap our Keras model with a statement that sets the TensorFlow device as the GPU:

with tf.device('gpu:0'):
Finally I modify the TensorFlow session:

KTF.set_session(tf.Session(config=tf.ConfigProto(log_device_placement=False)))

These three things together got my Keras model training on my GPU.

Another workaround I had to include was a way to reset Keras models. I use K-fold cross validation to tune model hyperparameters. This leaves the test set free for use only in determining the generalization error of the model. It is bad machine learning practice to use the test set to tune hyperparameters.

K-fold cross-validation uses a loop to iterate through the training data K times. I found that the model weights were not being reset after each loop – even when I used a function to build & compile the model outside of the loop. The solution I used involves storing the initial weights and then randomly permuting them each time we call the function.
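The core of that reset can be sketched with numpy alone – each weight array keeps its shape and its values, but gets a fresh random arrangement:

```python
import numpy as np

def permute_weights(weights, rng=np.random.RandomState(0)):
    """Approximate re-initialization: randomly permute each stored weight
    array, keeping its shape and values but not their arrangement."""
    return [rng.permutation(w.flat).reshape(w.shape) for w in weights]

initial_weights = [np.arange(6.0).reshape(2, 3), np.arange(3.0)]
reset_weights = permute_weights(initial_weights)
```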

The previous scikit-learn script used L2-regularization to control overfitting. In this new version I make use of a Keras dropout layer. Dropout randomly ignores selected neurons during training.

This means that their contribution to the activation of downstream neurons is removed on the forward pass. Any weight updates will not be applied to the neuron on the backwards pass. This should result in the network being less sensitive to the specific weights of neurons.
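Mechanically, dropout multiplies the activations by a random binary mask on each training pass – a numpy sketch (ignoring the rescaling Keras applies so that expected activations match at test time):

```python
import numpy as np

rng = np.random.RandomState(1)

def dropout(activations, fraction=0.3):
    """Zero a random fraction of neurons on a training forward pass,
    so downstream layers cannot rely on any single neuron."""
    keep_mask = rng.uniform(size=activations.shape) >= fraction
    return activations * keep_mask

activations = np.ones(1000)
dropped = dropout(activations, fraction=0.3)
```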

I’ve also made this script create new folders for each experiment and each individual model run. For each run this script saves the features and a plot of the forecast versus the actual value.

For the experiment it saves the results (MASE, MAE, training loss etc) for all runs and a plot of the training history for each run.

The last major step I have taken is starting to use Plotly.  Plotly allows saving interactive graph objects as HTML documents and uploading graphs to their server.  This has been a major step up from matplotlib.

Model Results

The next few posts in this series will be using this script to optimize the structure and hyperparameters.

The first experiment I ran was on activation functions.  The following parameters were used:

  • 2016 Imbalance Price & Imbalance Volume data scraped from Elexon
  • Feature matrix of lag 48 of Price & Volume & Sparse matrix of Settlement Period, Day of the week and Month
  • 8 layer neural network:
    • Input layer with 1000 nodes, fully connected
    • Dropout layer after input layer with 30% dropout
    • Five hidden layers with 1000 nodes, fully connected
    • Output layer with a single node and linear activation
    • Loss function = mean squared error
    • Optimizer = adam

I ran four experiments with different activation functions (the same function in each layer).

Table 1 – Experiment on activation functions (MASE = Mean Absolute Scaled Error)
Activation Function | Training MASE | Test MASE | Training Loss

Table 1 clearly shows that the rectifier (relu) activation function is superior for this task.
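The four candidates are cheap to write down with numpy, which makes relu’s difference visible – it passes positive inputs through unchanged instead of squashing them:

```python
import numpy as np

x = np.array([-2.0, 0.0, 2.0])

relu = np.maximum(0.0, x)            # zeroes negative inputs, passes positives through
tanh = np.tanh(x)                    # squashes into (-1, 1)
sigmoid = 1.0 / (1.0 + np.exp(-x))   # squashes into (0, 1)
linear = x                           # identity
```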

I ran a second experiment changing the data included in the feature matrix.  The sparse features are boolean variables for the Settlement Period, the day (Monday – Sunday) and the month.  These are included so that the model can capture trend and seasonality.
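These boolean columns are built with scikit-learn’s LabelBinarizer in the script – a sketch for the day-of-week feature:

```python
from sklearn.preprocessing import LabelBinarizer

# weekday (0 = Monday) for a handful of half-hourly timestamps
days = [0, 1, 2, 3, 4, 5, 6, 0, 1]

encoder = LabelBinarizer(neg_label=0, pos_label=1)
encoded = encoder.fit_transform(days)   # one boolean column per weekday
```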

The models I ran in this experiment use the same parameters as Experiment One – except the data included in the feature matrix is changed:

  1. Lagged Imbalance Price only
  2. Lagged Imbalance Price & Imbalance Volume
  3. Lagged Imbalance Price & Sparse Features
  4. Lagged Imbalance Price, Imbalance Volume & Sparse features
Table 2 – Experiment on data sources (MASE = Mean Absolute Scaled Error)

Data | Training MASE | Test MASE | MASE | Improvement
Price only | 0.25 | 0.50 | 0.46 | 0.0%
Price & volume | 0.21 | 0.49 | 0.47 | 3.5%
Price & sparse | 0.17 | 0.50 | 0.49 | 0.7%
Price, volume & sparse | 0.15 | 0.47 | 0.45 | 6.1%

As expected, increasing the amount of data for the model to learn from improves performance.  It’s very interesting that the improvement for Run 4 is more than the sum of the improvements for Runs 2 & 3!

It also validates that the feature data has been processed correctly – if adding data had made the results worse, it would suggest a mistake was made during processing.

Table 2 shows a large difference between the Training & Test MASE.  This indicates overfitting – something we will try to fix in future experiments by working with model structure and dropout.

So the conclusions from our first two experiments: use the relu activation function and include as many useful features as possible.

A quick look at the forecast for Run 4 of Experiment Two for the last few days in December 2016:

Figure 1 – The forecast versus actual Imbalance Price for the end of 2016

Click here to interact with the full 2016 forecast.

Next Steps

I’ll be doing more experiments on model structure and hyperparameters.  Experiments I have lined up include:

  • Model structure – number of layers & nodes per layer
  • Dropout fraction
  • Size of lagged data set

I’ll also keep working on improving the quality of code!

The Script

"""
ADG Efficiency

This script runs experiments on model parameters for a neural network.
The neural network is built in Keras and uses TensorFlow as the backend.

To set up a model run you need to look at:
- mdls_D
- mdls_L
- machine learning parameters
- pred_net_D

This script is set up for an experiment on the sources of data
used as model features.
"""

import os
import pandas as pd
import numpy as np
import sqlite3
import pickle
import datetime

# importing some useful scikit-learn stuff
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold

# importing Keras & Tensorflow
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import adam, RMSprop, SGD
from keras.metrics import mean_absolute_error
import keras.backend.tensorflow_backend as KTF #!topic/keras-users/MFUEY9P1sc8

import tensorflow as tf

# importing plotly
import plotly as py
import plotly.graph_objs as go 
import as tls 
tls.set_credentials_file(username="YOUR USERNAME", api_key="YOUR API KEY")

# imported from another script 
from metrics import MASE 

# workaround found on stack overflow
tf.python.control_flow_ops = tf 

# grabbing the TF session from Keras backend
sess = KTF.get_session() 

# function that inputs the database path 
def get_data(db_path, table):
    conn = sqlite3.connect(db_path)
    data = pd.read_sql(sql='SELECT * from ' + str(table), con=conn)
    print('got sql data')
    return data  # returns a df with all table data

# setting base directory
base_path = os.path.dirname(os.path.abspath(__file__))

# location of SQL database 
sql_path = os.path.join(base_path,'data','ELEXON DATA.sqlite')
# saving run time for use in creating folder for this run of experiments
run_name = str(
run_name = run_name.replace(':','-')
run_path = str('Experiment Results//' + run_name)
results_csv_path = os.path.join(base_path, run_path, 'results.csv')
history_csv_path = os.path.join(base_path, run_path, 'histories.csv')

# dictionary containing all of the infomation for each report
data_price_D = {'Imbalance Price':{'Report name':'B1770','Data name':'imbalancePriceAmountGBP'}}

data_price_vol_D = {'Imbalance Price':{'Report name':'B1770','Data name':'imbalancePriceAmountGBP'},
 'Imbalance Volume':{'Report name':'B1780','Data name':'imbalanceQuantityMAW'}}

# list of model names - done manually to control order & which models run
mdls_L = ['price, vol & sparse'] # 'price only', 'price & volume', 'price & sparse', 

# dictionary of mdls with parameters
# FL = first lag. LL = last lag. SL = step size, 
# Sparse? = to include sparse data for trend/seasonality or not, 
# Data dict = which dictionary of data to use for the lagged time series
mdls_D = {'price only':{'date_start':'01/01/2016 00:00','date_end':'31/12/2016 00:00','FL':48,'LL':48*2,'SL':48,'Sparse?':False,'Data dict' : data_price_D},
'price & volume':{'date_start':'01/01/2016 00:00','date_end':'31/12/2016 00:00','FL':48,'LL':48*2,'SL':48,'Sparse?':False,'Data dict' : data_price_vol_D},
'price & sparse':{'date_start':'01/01/2016 00:00','date_end':'31/12/2016 00:00','FL':48,'LL':48*2,'SL':48,'Sparse?':True,'Data dict' : data_price_D},
'price, vol & sparse':{'date_start':'01/01/2016 00:00','date_end':'31/12/2016 00:00','FL':48,'LL':48*2,'SL':48,'Sparse?':True,'Data dict' : data_price_vol_D}}

# machine learning parameters
n_folds = 10 # number of folds used in K-fold cross validation
epochs = 200 # number of epochs used in training 
random_state_split = 5
random_state_CV = 3

# dataframes for use in tracking model results
results_cols = ['Model name','CV MASE','Training MASE','Test MASE','MASE',
 'Training MAE','Test MAE','MAE','CV Loss','Training Loss',
 'Training History','Error Time Series']
results_DF = pd.DataFrame(columns = results_cols)
results_all_DF = results_DF 
history_all_DF = pd.DataFrame()

# for loop to iterate through models
for mdl_index, mdl in enumerate(mdls_L):
 mdl_params = mdls_D[mdl]
 # resetting dataframes for storing results from this model run
 results_DF = pd.DataFrame(index = [mdl], columns = results_cols)
 history_DF = pd.DataFrame()

 # creating folder for this run
 exp_path = str(run_path + '/' + str(mdl))

 # setting model parameters
 date_start = mdl_params['date_start']
 date_end = mdl_params['date_end']
 first_lag = mdl_params['FL']
 last_lag = mdl_params['LL']
 step_lag = mdl_params['SL']
 include_sparse = mdl_params['Sparse?']
 data_D = mdl_params['Data dict']
 # unpacking dictionary of data to be used 
 data_sources = [key for key in data_D] # ie Imbalance Price, Imbalance Volume
 # getting the imbalance price first
 if data_sources[0] != 'Imbalance Price':
     price_loc = data_sources.index('Imbalance Price')
     first_source = data_sources[0]
     data_sources[price_loc] = first_source
     data_sources[0] = 'Imbalance Price'

 table_names = [data_D[item]['Report name'] for item in data_sources] 
 data_col_names = [data_D[item]['Data name'] for item in data_sources]
 # getting data from SQL
 data_L = [get_data(db_path = sql_path, table = data_D[data_source]['Report name']) for data_source in data_sources]

 # list of the index objects
 indexes_L = [pd.to_datetime(raw_data['index']) for raw_data in data_L] 
 # list of the settlement periods
 SP_L = [raw_data['settlementPeriod'] for raw_data in data_L]

 # list of the actual data objects
 data_objs_L = [raw_data[data_col_names[index]].astype(float) for index, raw_data in enumerate(data_L)]
 # indexing these data objects
 for i, series in enumerate(data_objs_L):
     df = pd.DataFrame(data=series.values, index=indexes_L[i], columns=[data_col_names[i]])
     data_objs_L[i] = df

 # creating feature dataframe - gets reset every model run
 data_DF = pd.DataFrame()
 for data_index, data_obj in enumerate(data_objs_L):
     # creating lagged data frame (make this a function)
     for i in range(first_lag, last_lag, step_lag):
         lag_start, lag_end, lag_step = 1, i, 1
         # creating the lagged dataframe
         data_name = data_obj.columns.values[0]
         lagged_DF = pd.DataFrame(data_obj)
         for lag in range(lag_start, lag_end + 1, lag_step):
             lagged_DF[str(data_name) + ' lag_' + str(lag)] = lagged_DF[data_name].shift(lag)
         lagged_DF = lagged_DF[lag_end:]  # slicing off the dataframe
         index = lagged_DF.index  # saving the index
         data_DF = pd.concat([data_DF, lagged_DF], axis=1)  # creating df with data
 SP = SP_L[0] 
 SP = SP[(len(SP)-len(data_DF)):].astype(float) # slicing our settlement periods 
 # creating our sparse matricies for seasonality & trend
 date = index
 # creating label binarizer objects 
 encoder_SP = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
 encoder_days = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
 encoder_months = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
 # creating sparse settlement period feature object
 encoded_SP = encoder_SP.transform(SP)
 SP_features = pd.DataFrame(encoded_SP, index, columns = list(range(1,51)))
 # creating sparse day of the week feature object
 days = list(map(lambda x: x.weekday(), date))
 encoded_days = encoder_days.transform(days)
 days_features = pd.DataFrame(encoded_days, index = index, 
 columns = ['Mo','Tu','We','Th','Fr','Sa','Su'])
 # creating sparse month feature object
 months = list(map(lambda x: x.month, date))
 encoded_months = encoder_months.transform(months)
 months_features = pd.DataFrame(encoded_months, index = index, 
 columns = ['Ja','Feb','Mar','Ap','Ma','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])
 sparse_features = pd.concat([SP_features,days_features, months_features],axis=1) 
 print('incl sparse is ' + str(include_sparse))
 if include_sparse == True:
     print('including sparse')
     data_DF = pd.concat([data_DF, sparse_features], axis=1)
 # saving our feature matrix to a csv for checking
 features_path = os.path.join(base_path, exp_path, 'features.csv')
 data_DF.to_csv(features_path)
 # creating our target matrix (the imbalance price)
 y = data_DF['imbalancePriceAmountGBP']
 # dropping out the actual values from our data
 for data_col_name in data_col_names:
     data_DF = data_DF.drop(data_col_name, 1)
 # setting our feature matrix 
 X = data_DF
 # splitting into test & train
 # keeping the split the same for different model runs
 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state = random_state_split)
 split_D = {'X_train' : len(X_train), 'X_test' : len(X_test)}
 # standardizing our data 
 X_scaler = StandardScaler()
 X_train = X_scaler.fit_transform(X_train)
 X_test = X_scaler.transform(X_test)
 X_all = X_scaler.transform(X)
 # reshaping
 X_train = np.asarray(X_train)
 y_train = np.asarray(y_train).flatten()
 X_test = np.asarray(X_test)
 y_test = np.asarray(y_test).flatten()
 X = np.asarray(X) # not sure if I use this anywhere - also this data is not standardized 
 y = np.asarray(y).flatten()
 # saving our scaler objects for use later
 pickle.dump(X_scaler, open(os.path.join(base_path,'pickle', 'X_scaler - ' + mdl + '.pkl'), 'wb'), protocol=4) 
 # length of the inputs
 training_samples = X_train.shape[0]
 input_length = X_train.shape[1] # should probably rename this to width....
 batch_size = int(training_samples/4)

 pred_net_D = {'price only':{'Layer 1' : Dense(1000, input_dim = input_length, activation = 'relu'), 'Layer 2' : Dense(1000, activation='relu'), 'Layer 3' : Dense(1000, activation='relu'), 'Layer 4' : Dense(output_dim = 1, activation='linear'),'Dropout fraction':0.3},
 'price & volume':{'Layer 1' : Dense(1000, input_dim = input_length, activation = 'relu'), 'Layer 2' : Dense(1000, activation='relu'), 'Layer 3' : Dense(1000, activation='relu'), 'Layer 4' : Dense(output_dim = 1, activation='linear'),'Dropout fraction':0.3},
 'price & sparse':{'Layer 1' : Dense(1000, input_dim = input_length, activation = 'relu'), 'Layer 2' : Dense(1000, activation='relu'), 'Layer 3' : Dense(1000, activation='relu'), 'Layer 4' : Dense(output_dim = 1, activation='linear'),'Dropout fraction':0.3}, 
 'price, vol & sparse':{'Layer 1' : Dense(1000, input_dim = input_length, activation = 'relu'), 'Layer 2' : Dense(1000, activation='relu'), 'Layer 3' : Dense(1000, activation='relu'), 'Layer 4' : Dense(output_dim = 1, activation='linear'),'Dropout fraction':0.3}}
 # defining layers for prediction network
 pred_net_params = pred_net_D[mdl]
 layer1 = pred_net_params['Layer 1']
 layer2 = pred_net_params['Layer 2']
 layer3 = pred_net_params['Layer 2']
 layer4 = pred_net_params['Layer 2']
 layer5 = pred_net_params['Layer 2']
 layer6 = pred_net_params['Layer 2']
 layer7 = pred_net_params['Layer 2']
 layer8 = pred_net_params['Layer 2']
 layer9 = pred_net_params['Layer 2']
 layer10 = pred_net_params['Layer 4']
 dropout_fraction = pred_net_params['Dropout fraction'] 

 with tf.device('gpu:0'):  # force tensorflow to train on GPU
     KTF.set_session(tf.Session(config=tf.ConfigProto(log_device_placement=False)))

     def get_model():  # use this function so I can recreate new network within CV loop
         network = Sequential()
         network.add(layer1)
         network.add(Dropout(dropout_fraction))
         for layer in [layer2, layer3, layer4, layer5, layer6, layer7, layer8, layer9]:
             network.add(layer)
         network.add(layer10)
         return network

     def shuffle_weights(model, weights=None):
         """Randomly permute the weights in `model`, or the given `weights`.
         This is a fast approximation of re-initializing the weights of a model.
         Assumes weights are distributed independently of the dimensions of the weight tensors
         (i.e., the weights have the same distribution along each dimension).
         :param Model model: Modify the weights of the given model.
         :param list(ndarray) weights: The model's weights will be replaced by a random permutation
             of these weights.  If `None`, permute the model's current weights.
         """
         if weights is None:
             weights = model.get_weights()
         weights = [np.random.permutation(w.flat).reshape(w.shape) for w in weights]
         # Faster, but less random: only permutes along the first dimension
         # weights = [np.random.permutation(w) for w in weights]
         model.set_weights(weights)

     optimizer = adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

 # cross validation
 print('Starting Cross Validation')
 CV_network = get_model()
 CV_network.compile(loss='mean_squared_error', optimizer=optimizer)
 initial_weights_CV = CV_network.get_weights()
 CV = KFold(n_splits=n_folds, random_state = random_state_CV)
 MASE_CV_L = []
 loss_CV_L = []

 for train, test in CV.split(X_train, y_train):
     shuffle_weights(CV_network, initial_weights_CV)
     mdl_CV =[train], y_train[train], nb_epoch = epochs, batch_size = batch_size)

     y_CV_pred = CV_network.predict(X_train[test], batch_size = batch_size, verbose=0).flatten()
     MASE_CV_L.append(MASE(y_train[test], y_CV_pred, 48))
     loss_CV_L.append(mdl_CV.history['loss'][-1])
 MASE_CV = np.average(MASE_CV_L)
 loss_CV = np.average(loss_CV_L)
 # training network on all training data 
 print('Training prediction network')
 network = get_model()
 network.compile(loss='mean_squared_error', optimizer=optimizer) 
 initial_weights_net = network.get_weights()
 shuffle_weights(network, initial_weights_net)
 mdl_net =, y_train, nb_epoch = epochs, batch_size = batch_size)
 y_pred_train = network.predict(X_train,batch_size = batch_size, verbose=0).flatten()
 y_pred_test = network.predict(X_test,batch_size = batch_size, verbose=0).flatten()
 y_pred = network.predict(X_all,batch_size = batch_size, verbose=0).flatten()
 error_train = y_pred_train - y_train
 error_test = y_pred_test - y_test 
 error = y - y_pred
 abs_error = abs(error) 
 MASE_train = MASE(y_train, y_pred_train, 48)
 MASE_test = MASE(y_test, y_pred_test, 48)
 MASE_all = MASE(y, y_pred, 48)
 MAE_train = mean_absolute_error(y_train, y_pred_train).eval(session=sess)
 MAE_test = mean_absolute_error(y_test, y_pred_test).eval(session=sess)
 MAE_all = mean_absolute_error(y, y_pred).eval(session=sess)
 results_DF.loc[mdl,'Model name'] = mdl
 results_DF.loc[mdl,'CV MASE'] = MASE_CV
 results_DF.loc[mdl,'Training MASE'] = MASE_train
 results_DF.loc[mdl,'Test MASE'] = MASE_test
 results_DF.loc[mdl,'MASE'] = MASE_all
 results_DF.loc[mdl,'Training MAE'] = MAE_train
 results_DF.loc[mdl,'Test MAE'] = MAE_test
 results_DF.loc[mdl,'MAE'] = MAE_all
 results_DF.loc[mdl,'CV Loss'] = loss_CV
 results_DF.loc[mdl,'Training Loss'] = mdl_net.history['loss'][-1]

 results_DF.loc[mdl,'Error Time Series'] = error
 results_DF.loc[mdl,'Training History'] = mdl_net.history['loss']

 history_DF = pd.DataFrame(data=list(mdl_net.history.values())[0], index = list(range(1,epochs+1)), columns=[mdl])
 # figure 1 - plotting the actual versus prediction 
 actual_imba_price_G = go.Scatter(x=index,y=y,name='Actual',line=dict(width=2)) 
 predicted_imba_price_G = go.Scatter(x=index,y=y_pred,name='Predicted',line=dict(width=2, dash = 'dash'))
 fig1_data = [actual_imba_price_G, predicted_imba_price_G]
 fig1_layout = go.Layout(title='Forecast',yaxis=dict(title='Imbalance Price [£/MWh]'))
 fig1 = go.Figure(data=fig1_data,layout=fig1_layout) 
 fig1_name = os.path.join(exp_path,'Figure 1.html')
 py.offline.plot(fig1,filename = fig1_name, auto_open = False) # creating offline graph
 # py.plotly.plot(fig1, filename='Forecast', sharing='public') # creating online graph
 # saving results
 network_params_DF = pd.DataFrame(pred_net_params, index=[mdl])
 mdl_params_DF = pd.DataFrame(mdl_params, index=[mdl])
 results_DF = pd.concat([results_DF, network_params_DF, mdl_params_DF], axis = 1, join = 'inner')
 results_all_DF = pd.concat([results_all_DF, results_DF], axis = 0)
 history_all_DF = history_all_DF.join(history_DF, how='outer')



# figure 2 - comparing training history of models
fig2_histories = [history_all_DF[col] for col in history_all_DF]
fig2_data = [go.Scatter(x = data.index, y = data, name = for data in fig2_histories]
fig2_layout = go.Layout(title='Training History',yaxis=dict(title='Loss'),xaxis=dict(title='Epochs'))
fig2 = go.Figure(data=fig2_data,layout=fig2_layout)
fig2_name = os.path.join(run_path,'Figure 2.html')
py.offline.plot(fig2,filename = fig2_name, auto_open = False) # creating offline graph



Elexon API Web Scraping using Python – Updated

This post is part of a series applying machine learning techniques to an energy problem.  The goal of this series is to develop models to forecast the UK Imbalance Price. 

This post is an update to an earlier post in this series showing how to use Python to access data from Elexon using their API.  As with the previous script you can grab data for any reports Elexon offer through their API by iterating through a dictionary of reports with their keyword arguments.
This script solves two problems that occur with the Elexon data – duplicate and missing Settlement Periods.  You can view the script on my GitHub here.


Duplicate Settlement Periods are removed using the drop_duplicates Data Frame method in pandas.
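For example, assuming duplicates are identified by their Settlement Period (toy data):

```python
import pandas as pd

raw = pd.DataFrame({'settlementPeriod': [1, 2, 2, 3],
                    'imbalancePriceAmountGBP': [30.0, 35.0, 35.0, 40.0]})

# keep the first row for each repeated Settlement Period
deduped = raw.drop_duplicates(subset='settlementPeriod')
```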

Missing Settlement Periods are dealt with by first creating an index object of the correct length (SP_DF).  Note that this takes into account daylight savings days (where the correct length of the index is either 46 or 50) using the transition times feature of the pytz module.  Transition times allows identification of which date daylight savings time changes occur – very helpful!

The SP_DF is then joined (using an ‘outer join’) with the data returned from Elexon – meaning that any missing Settlement Periods are identified.  Any missing values are filled in with the average value for that day.
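A sketch of the join-and-fill step with toy data (one missing Settlement Period, constant prices for simplicity):

```python
import pandas as pd

# index of all 48 settlement periods for a normal day
SP_DF = pd.DataFrame({'settlementPeriod': range(1, 49)})

# scraped data with Settlement Period 3 missing
scraped = pd.DataFrame({'settlementPeriod': [p for p in range(1, 49) if p != 3],
                        'imbalancePriceAmountGBP': 40.0})

# outer join exposes the missing period as a NaN row
joined = SP_DF.merge(scraped, on='settlementPeriod', how='outer')

# fill any gaps with the average value for the day
day_mean = joined['imbalancePriceAmountGBP'].mean()
joined['imbalancePriceAmountGBP'] = joined['imbalancePriceAmountGBP'].fillna(day_mean)
```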


I’m going to start using GitHub as a way to manage this project – you can find the script to scrape Elexon data as well as a SQL database with 2016 data for the Imbalance Price and Imbalance Volume on my UK Imbalance Price Forecasting repository.

I’ve also put up another small script I use to check the quality of the SQL database, called database.  This script should probably be built into the scraping script!  However I’ve decided to spend more time analyzing and building models 🙂

Next posts in this series will detail some dramatically improved models for predicting the Imbalance Price – making use of Keras, Tensorflow and Plotly.

Tuning regularization strength

This post is the fifth in a series applying machine learning techniques to an energy problem. The goal of this series is for me to teach myself machine learning techniques by developing models to forecast the UK Imbalance Price.

I see huge promise in what machine learning can do in the energy industry.  This series details my initial efforts in gaining an understanding of machine learning.  

Part One – What is the UK Imbalance Price?
Part Two – Elexon API Web Scraping using Python
Part Three – Imbalance Price Visualization
Part Four – Multilayer Perceptron Neural Network

In the previous post in this series we introduced a Multilayer Perceptron neural network to predict the UK Imbalance Price.  This post will dig a bit deeper into optimizing the degree of overfitting of our model by tuning the strength of regularization.

What is regularization

Regularization is a tool used to combat the problem of overfitting a model.  Overfitting occurs when a model starts to fit the training data too well – meaning that performance on unseen data is poor.

To prevent overfitting to the training data we can try to keep the model parameters small using regularization.  If we include a regularization term in the cost function the model minimizes we can encourage the model to use smaller parameters.

The equation below shows the loss function minimized during model training.  The first term is the square of the error.  The second term is the regularization term – with lambda shown as the parameter to control regularization.  In order to be consistent with scikit-learn, we will refer to this parameter as alpha.
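Written out, an L2-regularized squared-error loss of this form is (reconstructed from the description above – scikit-learn’s own scaling conventions differ slightly):

```latex
J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \frac{\alpha}{2n}\sum_{j}\theta_j^{2}
```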

Regularization penalizes large values of the model parameters (theta) based on the size of the regularization parameter.  Regularization comes in two flavours – L1 and L2.  The MLP Regressor model in scikit-learn uses L2 regularization.

Setting alpha too large will result in underfitting (also known as a high bias problem).  Setting alpha too small may lead to overfitting (a high variance problem).

Setting alpha in the UK Imbalance Price model

Here we will optimize alpha by iterating through a number of different values.

We can then evaluate the degree of overfitting by looking at how alpha affects the loss function and the Mean Absolute Scaled Error (MASE).  The loss function is the cost function the model minimizes during training.  The MASE is the metric we used to judge model performance.
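The exact MASE implementation isn’t shown in the post; a standard seasonal formulation, scaling by a naive forecast that repeats the value from one day (48 settlement periods) earlier, looks like this:

```python
import numpy as np

def mase(y_true, y_pred, seasonality=48):
    """MAE of the forecast scaled by the MAE of a naive forecast
    that repeats the value from one day (48 periods) earlier."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    forecast_mae = np.mean(np.abs(y_true - y_pred))
    naive_mae = np.mean(np.abs(y_true[seasonality:] - y_true[:-seasonality]))
    return forecast_mae / naive_mae

y_true = np.arange(96.0)          # two days of half-hourly values
y_pred = y_true + 1.0             # a forecast with constant error of 1
score = mase(y_true, y_pred)
```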

We use K-fold cross validation to get a sense of the degree of overfitting.  Comparing the cross validation to training performance gives us an idea of how much our model is overfitting.  Using K-fold cross validation allows us to leave the test data free for evaluating model performance only.

Figure 1 & 2 show the results of optimizing alpha for a MLP Regressor with five hidden layers of 1344 nodes each.  The input feature set is the previous one week of Imbalance Price data.

Figure 1 shows the effect of alpha on the loss function for the training and cross validation sets.  We would expect to see the training loss increase as alpha increases.  Small values of alpha should allow the model to overfit.  We would also expect the loss for the training and CV sets to converge as alpha gets large.

Figure 1 – The effect of alpha on the loss function

Figure 1 shows the expected trend with the training loss increasing as alpha increases – except for alpha = 0.0001 which shows a high training loss.  This I don’t understand!  I was expecting that training loss would decrease with decreasing alpha.

Figure 2 shows the effect of alpha on the Mean Absolute Scaled Error for the training, cross validation and test sets.

Figure 2 – The effect of alpha on the MASE for the training, cross validation and test data

Figure 2 also shows a confusing result.  I was expecting to see the MASE be a minimum at the smallest alpha and increase as alpha increased.  This is because small values of alpha should allow the model to overfit (and improve performance). Instead we see that the best training MASE is at alpha = 0.01.

Figure 2 shows a minimum for the test MASE at alpha = 0.01 – this is also the minimum for the training data.

Going forward I will be using a value of 0.01 for alpha as this shows a good balance between minimizing the loss for the training and cross validation sets.

Table 1 shows the results for the model as it currently stands.

Table 1 – Model performance with alpha = 0.01

Training MASE          0.3345
Cross validation MASE  0.5890
Test MASE              0.5212

Next step in this project is looking at previously unseen data for December 2016 – stay tuned.

Forecasting UK Imbalance Price using a Multilayer Perceptron Neural Network

This post is the fourth in a series applying machine learning techniques to an energy problem. The goal of this series is to develop models to forecast the UK Imbalance Price.  

I see huge promise in what machine learning can do in the energy industry.  This series details my initial efforts in gaining an understanding of machine learning.  

Part One – What is the UK Imbalance Price?
Part Two – Elexon API Web Scraping using Python
Part Three – Imbalance Price Visualization

Finally, the fun stuff!  In this post we will forecast the UK Imbalance Price using a machine learning model.  We will use a simple neural network model from scikit-learn – the Multilayer Perceptron (MLP).


What is a Multilayer Perceptron?

An MLP is a feedforward artificial neural network.

The network is composed of multiple layers of nodes.  The output from each node is fed to each node of the next layer.  Each node takes the inputs from the previous layer and processes them through an activation function such as the hyperbolic tan function.  Both classification and regression are possible in an MLP by changing the output activation function.

Learning occurs through optimization of a cost function.  The signal from the input to output layer propagates forward through the network.  The signal from the error then backpropagates from the output layer towards the input layer.

The cost function is optimized by adjusting the weights using an algorithm such as Stochastic Gradient Descent (SGD).  In our model we use a variation of SGD known as ‘adam’.
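
A minimal sketch of a single SGD step for one linear node with squared-error loss (the weights, input and learning rate are all hypothetical):

```python
import numpy as np

# One SGD step for a single linear node with squared-error loss
rng = np.random.default_rng(0)
w = rng.normal(size=3)             # weights
x = np.array([0.5, -1.0, 2.0])     # one training input
y_true = 1.0                       # its target
lr = 0.01                          # learning rate (hypothetical value)

y_pred = w @ x                     # forward pass
loss_before = (y_pred - y_true) ** 2
grad = 2 * (y_pred - y_true) * x   # gradient of the loss w.r.t. w
w -= lr * grad                     # weight update

loss_after = (w @ x - y_true) ** 2
print(loss_after < loss_before)    # the step reduces the loss
```

adam follows the same pattern but adapts the step size per weight using running estimates of the gradient's first and second moments.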

MLP in scikit-learn

scikit-learn is a Python machine learning library.  We will use the MLPRegressor model to predict the Imbalance Price.  We also make use of a few other features of scikit-learn – standardization, grid searching and pipelines.

Standardization is a pre-processing technique required for many machine learning algorithms.  Here we use the StandardScaler to transform our data to zero mean and unit variance.
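
On a toy price column this looks like:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy price column; StandardScaler removes the mean and scales to unit variance
prices = np.array([[40.0], [55.0], [70.0], [25.0], [60.0]])
scaled = StandardScaler().fit_transform(prices)

print(abs(round(scaled.mean(), 6)), round(scaled.std(), 6))  # 0.0 1.0
```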

GridSearchCV is a module of scikit-learn that iterates through a user-defined set of model parameters.  We use a grid search to identify the optimal set of:

  • Number of lags = the size of the lags used as the model features.
  • Model structure = both the width (number of nodes per layer) and depth (number of layers) of the network.
  • Activation function = logistic sigmoid, hyperbolic tan or a rectified linear function.
  • Alpha = the parameter used in L2 regularization.

GridSearchCV also allows cross validation to be used to understand model overfitting.  The standard deviation of the test scores is used to understand the degree of overfitting.
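
A sketch of such a grid (the parameter values below are hypothetical; the number of lags is set when building the feature matrix, outside the grid):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Hypothetical grid over the parameters described above
param_grid = {
    "hidden_layer_sizes": [(50,) * n for n in (1, 3, 5)],
    "activation": ["logistic", "tanh", "relu"],
    "alpha": [0.0001, 0.001, 0.01, 0.1],
}

search = GridSearchCV(MLPRegressor(), param_grid, cv=3)
# search.fit(X, y) would fit 3 * 3 * 4 = 36 candidates, each with 3-fold CV

n_candidates = 1
for values in param_grid.values():
    n_candidates *= len(values)
print(n_candidates)  # 36
```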

A pipeline allows multiple steps of the model to be joined together.  The pipeline used in the model is a combination of the StandardScaler and MLPRegressor.
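
A minimal sketch of that pipeline (the data here is a stand-in):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# The scaler is fit on the training data and applied automatically at
# predict time, so train and predict always see consistently scaled inputs
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=500,
                         random_state=0)),
])

rng = np.random.default_rng(2)
X = rng.random((120, 6))
y = X.sum(axis=1)

pipe.fit(X, y)
preds = pipe.predict(X[:3])
print(preds.shape)  # (3,)
```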

MLP regressor to predict UK Imbalance Price

This model is trained with UK Imbalance Price data obtained from Elexon.  As we know the value of our output, this is a supervised learning problem.

The features of our model are the previous values of the Imbalance Price.  The number of lags included is a parameter that we iterate across.
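
Building the lag feature matrix can be sketched with pandas (the series and lag count here are toy values; the post uses up to 1344 half-hourly lags):

```python
import pandas as pd

# Sketch of building the lag feature matrix; n_lags is the parameter we
# iterate across
prices = pd.Series(range(10), name="imbalance_price", dtype=float)
n_lags = 3

features = pd.concat(
    {f"lag_{k}": prices.shift(k) for k in range(1, n_lags + 1)}, axis=1
).dropna()
target = prices.loc[features.index]

print(features.shape)  # (7, 3): the first n_lags rows are dropped
```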

Grid searching uses a custom MASE function to score the model runs.  MASE allows direct comparison with a naive forecast.  Here we set the naive forecast as the value at the same time yesterday (a lag of 48).

The lower the MASE, the more accurate the forecast.  A MASE less than 1 means the forecast is superior to the naive forecast; a MASE greater than 1 means the naive forecast is better than our modeled forecast.
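
A sketch of such a MASE function (one plausible implementation, not necessarily the exact scorer used in the model):

```python
import numpy as np

def mase(y_true, y_pred, naive_lag=48):
    """Mean Absolute Scaled Error against a seasonal-naive forecast.

    The naive forecast for each period is the value naive_lag periods
    earlier (48 half-hours = the same time yesterday).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mae_model = np.mean(np.abs(y_true[naive_lag:] - y_pred[naive_lag:]))
    mae_naive = np.mean(np.abs(y_true[naive_lag:] - y_true[:-naive_lag]))
    return mae_model / mae_naive

y = np.sin(np.arange(200) / 10.0)
print(mase(y, y))  # 0.0 - a perfect forecast; the naive forecast itself scores 1.0
```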

Figure 1 – A sample of the final forecast
Lags = 1344, hidden layers = (50,50,50,50,50), ‘relu’ activation function, alpha = 0.001

Experiment 1 – Varying number of lags 

Figure 2 – Experiment on the number of lags
Hidden layers = (50,50,50,50,50), ‘relu’ activation function, alpha = 0.001

As expected increasing the number of features leads to an improved MASE.  Also of interest here is the difference between the MASE on the test & train data – indicating overfitting.

Experiment 2 – Varying model structure

Figure 3 – Experiment on number of layers
Lags = 1344, nodes per layer = 50, ‘relu’ activation function, alpha = 0.001

Figure 4 – Experiment on nodes per hidden layer
Lags = 1344, number of layers = 5, ‘relu’ activation function, alpha = 0.001

Experiment 3 – Varying activation function

Table 1 – Results for the activation function experiment

Activation function            Rectified linear (relu)    Hyperbolic tan (tanh)
MASE train                     0.184                      0.305
MASE test                      0.653                      0.698
MASE all                       0.322                      0.358
Test score standard deviation  0.037                      0.008

Table 1 shows that our rectified linear function is slightly superior in terms of MASE.  It also suffers from a higher degree of overfitting (seen by a higher test score standard deviation).

When inspecting the performance of the hyperbolic tan function it seems that the model struggles to capture the peaks in the price (Figure 5 below).

Figure 5 – Performance of the hyperbolic tan activation function
Lags = 1344, hidden layers = (50,50,50,50,50), ‘tanh’ activation function, alpha = 0.001

Experiment 4 – Varying alpha

Figure 6 & 7 – Effect of alpha on MASE for the whole data set & Test score standard deviation
Hidden layer size = (50, 50, 50, 50, 50),  ‘relu’ activation function

Alpha is a parameter that balances minimizing the cost function against overfitting the model.  Increasing alpha increases the size of the L2 regularization term, penalizing large weights.
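
Illustratively, the regularized cost has this shape (scikit-learn's exact scaling of the penalty differs slightly; the numbers below are made up):

```python
import numpy as np

# Illustrative: the MLP minimizes a data loss plus an L2 penalty scaled by alpha
def regularized_loss(y_true, y_pred, weights, alpha):
    mse = np.mean((y_true - y_pred) ** 2)
    l2 = sum(np.sum(w ** 2) for w in weights)
    return mse + alpha * l2  # larger alpha penalizes large weights more

weights = [np.array([1.0, -2.0]), np.array([0.5])]
y_t = np.array([1.0, 2.0])
y_p = np.array([1.5, 1.5])

print(regularized_loss(y_t, y_p, weights, alpha=0.01))  # 0.25 + 0.01 * 5.25
```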

Interestingly we see that both high and low values of alpha lead to poorer MASE scores.  We also see that increasing alpha leads to a decrease in the test score standard deviation, as expected.

Future work

  • Using December 2016 data to test model performance on data that wasn’t used in either training or testing.
  • More robust use of training, testing and validation sets.
  • More work to understand the value of different activation functions – perhaps using hyperbolic tan for the lower price periods and rectified linear to capture the peaks.
  • Experimenting with different network structures such as recurrent neural networks.
  • Introducing more data such as Net Imbalance Volume.

The Code

This code uses a database called ‘ELEXON data.sqlite’.  You can generate this file using the code in our previous post on Scraping Data from the Elexon API.


Imbalance Price Visualization

This post is the third in a series applying machine learning techniques to an energy problem. The goal of this series is to develop models to forecast the UK Imbalance Price.

Part One – What is the UK Imbalance Price?
Part Two – Elexon API Web Scraping using Python

In the first post we gave an introduction of what the UK Imbalance Price is.  We then showed how to gather UK grid data off the Elexon API using Python.

This third post will visualize the data we have gathered.  Spending the time to explore the data is an important step in model building.  All of the charts in this post were created in Python using pandas and matplotlib.

Figure 1 – The UK Imbalance Price January to November 2016

Figure 1 shows the volatile nature of the Imbalance Price.  Major positive spikes are observed throughout the year, with even more significant spikes occurring in November.  UK electricity demand is highest in winter, so higher demand is likely forcing National Grid to use more expensive plant to balance the system.

Figure 2 – Monthly summary statistics 

Figure 2 shows how extreme the month of November was – large peaks and also a very high standard deviation.  It will be interesting to see how December compares in terms of volatility.

Figure 3 – Correlogram

The correlogram shows the autocorrelation function computed at different lags of the time series.  This can be used to identify any seasonality in the time series.  Clearly we have some seasonality present at around 48 lags – equivalent to one day in this half hourly time series.

It makes sense that the Imbalance Price would likely be similar to the day before – many of the reasons for imbalance (such as forecasting errors) are likely to occur multiple times.  Interestingly the ACF does not peak at a lag of 336 (corresponding to one week).
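
A correlogram like this can be drawn with pandas (the series below is a synthetic stand-in with a daily cycle, not the real price data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot

# Synthetic half-hourly series with a daily (48-period) cycle standing in
# for the Imbalance Price
rng = np.random.default_rng(0)
t = np.arange(48 * 14)
series = pd.Series(np.sin(2 * np.pi * t / 48) + rng.normal(0, 0.2, t.size))

ax = autocorrelation_plot(series)  # draws the correlogram
daily = series.autocorr(lag=48)    # the ACF at the daily lag
print(daily > 0.5)                 # strong daily seasonality
```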

Figure 4 – Box plot (note the y-axis maximum was limited at 200 £/MWh)

The box plot clearly shows how much of the data is classed as outliers (i.e. being outside the inner & outer fences).  Also of interest is how close the median and first quartile are in most months!

In the next post we will begin to forecast the Imbalance Price using a multilayer perceptron in scikit-learn.

Elexon API Web Scraping using Python

NOTE – the code in this post is now superseded – please see my update post – or just go straight to my GitHub repository for this project.

This post is the second in a series applying machine learning techniques to an energy problem.  The goal of this series is to develop models to forecast the UK Imbalance Price. 

Part One – What is the UK Imbalance Price?

The first post in this series gave an introduction to what the UK Imbalance Price is.

This post will show how to scrape UK grid data from Elexon using their API.  Elexon makes UK grid and electricity market data available to utilities and traders.

Data available includes technical information such as weather or generation.  Market data like prices and volumes is also available.  Full details of the available data are given in the Elexon API guide.

Accessing data requires an API key, available by setting up a free Elexon account.  The API is accessed by passing a URL with the API key and report parameters.  The API will return either an XML or a CSV document.
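
A hedged sketch of composing such a request URL – the report name, version and parameter names below are assumptions, so check them against the Elexon API guide:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed report/parameter names for the Elexon BMRS API
API_KEY = "your-api-key-here"
base = "https://api.bmreports.com/BMRS/B1770/v1"
params = {
    "APIKey": API_KEY,
    "SettlementDate": "2016-11-01",
    "Period": "*",          # all settlement periods in the day
    "ServiceType": "xml",   # or "csv"
}
url = base + "?" + urlencode(params)

# xml_bytes = urlopen(url).read()  # needs a valid key and network access
print("B1770" in url)
```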

Features of the Python code

The Python code below is a modified version of code supplied by the excellent Energy Analyst website.  Some features of the code are:

  • The script iterates through a dictionary of reports.  I have set it up to iterate through two different reports – B1770 for Imbalance Price data and B1780 for Imbalance Volume data.
  • The script then iterates through a pandas date_range object of days.  This object is created by setting the start date and the number of days.
  • Two functions written by Energy Analyst are used:
    • BMRS_GetXML – returns an XML object for a given set of keyword arguments & API key.
    • BMRS_Dataframe – creates a pandas DataFrame from the XML object.
  • Data for each iteration is indexed using UTC time.  Two columns are added to the data_DF with the UTC and UK time stamps.
  • The results of each iteration are saved in an SQL database.  Each report is saved in its own table named with the report name.
  • The results of the entire script run are saved in a CSV (named output.csv).
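
Saving each report's DataFrame to its own table can be sketched as follows (the column names are illustrative, and an in-memory database stands in for 'ELEXON data.sqlite'):

```python
import sqlite3

import pandas as pd

# Illustrative DataFrame standing in for one report's scraped data
df = pd.DataFrame({
    "utc_time": ["2016-01-01 00:00", "2016-01-01 00:30", "2016-01-01 01:00"],
    "imbalance_price": [35.2, 40.1, 38.7],
})

conn = sqlite3.connect(":memory:")
df.to_sql("B1770", conn, if_exists="append", index=False)  # one table per report

n_rows = pd.read_sql("SELECT COUNT(*) AS n FROM B1770", conn)["n"][0]
print(n_rows)  # 3
```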

The Python code

Next steps
The next post in this series will be visualizing and analyzing the Imbalance Price data for 2016.

What is the UK Imbalance Price?

This post is the first in a series applying machine learning techniques to an energy problem.  The goal of this series is to develop models to forecast the UK Imbalance Price. 

What is the Imbalance Price?

The Imbalance Price is what generators or suppliers pay for any unexpected imbalance.

In the UK generators and suppliers (known as Parties) contract with each other for the supply of electricity.  Generators sell electricity to suppliers who then sell power to end use customers.

As System Operator National Grid handles real time balancing of the UK grid.  Parties submit details of their contracts to National Grid one hour before delivery.  This allows National Grid to understand the expected imbalance.

National Grid will then take actions to correct any predicted imbalance.  For example the Balancing Mechanism allows Parties to submit Bids or Offers to change their position by a certain volume at a certain price.

National Grid also has the ability to balance the system using actions outside the Balancing Mechanism.  Examples include:

  • Short Term Operating Reserve power plants.
  • Frequency Response plants used to balance the system in real time.
  • Reserve Services.

In more drastic scenarios National Grid may call upon closed power plants or disconnect customers.  National Grid will always seek to minimize the cost of balancing within technical constraints.

Parties submit their expected positions one hour before delivery –  but they do not always meet these contracted positions!

A supplier may underestimate their customers’ demand.  A power plant might face an unexpected outage.  The difference between the contracted and actual position is charged at the Imbalance Price.

ELEXON uses the costs that National Grid incurs in correcting imbalance to calculate the Imbalance Price.  This is then used to charge Parties for being out of balance with their contracts. ELEXON details the process for the calculation of the Imbalance Price here.

What data is available?

ELEXON make available a significant amount of data online.  This includes data for the Imbalance Price calculation as well as data related to the UK grid.  We will make use of the ELEXON API to access data.

The first iteration of this model will be auto-regressive.  We will use only the previous values of the Imbalance Price to predict future values.

As we continue to develop the model we will add more data and explain its relevance to the Imbalance Price.  Adding data iteratively will allow us to understand the value each new data source brings to the model.

Next steps

The next post will be the Python code used to scrape data using the Elexon API.   We will then do some visualization to analyze the Imbalance Price data.

Posts after that will be developing models in Python to predict the Imbalance Price.

Part Two – Elexon API Web Scraping using Python