This post is part of a series applying machine learning techniques to an energy problem. The goal of this series is to develop models to forecast the UK Imbalance Price.
This post updates an earlier post in this series that showed how to use Python to access data from Elexon through their API. As with the previous script, you can grab data for any report Elexon offers through the API by iterating through a dictionary of reports and their keyword arguments.
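To make the dictionary idea concrete, here is a minimal sketch of the pattern rather than the actual script. The report codes, the parameter names and the build_request helper are all assumptions for illustration, and no network call is made.

```python
# Hypothetical mapping of report code -> keyword arguments for that report.
# The codes and parameter names below are assumptions, not taken from the script.
REPORTS = {
    "B1770": {"Period": "*", "ServiceType": "xml"},
    "B1780": {"Period": "*", "ServiceType": "xml"},
}


def build_request(report, settlement_date, **kwargs):
    """Collect everything needed for one API call into a plain dict (hypothetical helper)."""
    return {"report": report, "SettlementDate": settlement_date, **kwargs}


# One request description per report for a single settlement date
requests_to_make = [
    build_request(name, "2016-01-01", **kwargs) for name, kwargs in REPORTS.items()
]

for request in requests_to_make:
    print(request)
```

Adding another report then only means adding one more entry to the dictionary.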
This script solves two problems that occur with the Elexon data – duplicate and missing Settlement Periods. You can view the script on my GitHub here.
Duplicate Settlement Periods are removed using the drop_duplicates DataFrame method in pandas.
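A small sketch of that step on made-up data is shown below. The column names and values are assumptions for illustration and are not the ones used in the script.

```python
import pandas as pd

# Hypothetical sample in which Settlement Period 2 was returned twice
raw = pd.DataFrame({
    "settlement_date": ["2016-01-01"] * 4,
    "settlement_period": [1, 2, 2, 3],
    "imbalance_price": [35.0, 40.0, 40.0, 38.5],
})

# Keep the first row for each (date, period) pair and drop the repeats
deduped = raw.drop_duplicates(
    subset=["settlement_date", "settlement_period"], keep="first"
).reset_index(drop=True)

print(deduped)
```

Restricting the check to the date and period columns means a repeated period is dropped even if its values differ slightly between the two rows.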
Missing Settlement Periods are dealt with by first creating an index object of the correct length (SP_DF). This takes daylight saving days into account (where the correct length of the index is 46 or 50 rather than the usual 48) using the transition times feature of the pytz module. Transition times identify the dates on which daylight saving time changes occur, which is very helpful!
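The sketch below shows one way to get the clock change dates and the matching number of Settlement Periods with pytz. It is an illustration under assumptions, not the script itself: the function names are made up, and _utc_transition_times is a private pytz attribute that may not behave the same in every version.

```python
from datetime import date, datetime, timedelta

import pytz

LONDON = pytz.timezone("Europe/London")


def clock_change_dates(year):
    """Dates in `year` on which UK clocks change, read from pytz's transition table."""
    # _utc_transition_times is a private attribute holding naive UTC datetimes.
    # UK changes happen at 01:00 UTC, so the UTC date matches the local date.
    return [t.date() for t in LONDON._utc_transition_times if t.year == year]


def settlement_periods_in_day(day):
    """Number of half-hour Settlement Periods in a local UK day: 46, 48 or 50."""
    midnight = datetime(day.year, day.month, day.day)
    start = LONDON.localize(midnight)
    end = LONDON.localize(midnight + timedelta(days=1))
    # Subtracting the two aware datetimes gives the true elapsed time
    hours = (end - start).total_seconds() / 3600
    return int(round(hours * 2))


for day in clock_change_dates(2016):
    print(day, settlement_periods_in_day(day))
```

The day length is computed from the two local midnights rather than hard-coded by month, so the same function returns 48 for every ordinary day.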
The SP_DF is then joined (using an ‘outer join’) with the data returned from Elexon – meaning that any missing Settlement Periods are identified. Any missing values are filled in with the average value for that day.
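The join and fill can be sketched as follows. The frame sp_df stands in for SP_DF, and the column names and prices are invented for illustration.

```python
import pandas as pd

day = "2016-01-01"
keys = ["settlement_date", "settlement_period"]

# Hypothetical stand-in for SP_DF: one row per expected Settlement Period
sp_df = pd.DataFrame({
    "settlement_date": [day] * 48,
    "settlement_period": list(range(1, 49)),
})

# Hypothetical Elexon data with Settlement Periods 3 and 4 missing
periods = [p for p in range(1, 49) if p not in (3, 4)]
elexon = pd.DataFrame({
    "settlement_date": [day] * len(periods),
    "settlement_period": periods,
    "imbalance_price": [30.0 + p for p in periods],
})

# The outer join keeps every expected period; absent ones show up as NaN
merged = sp_df.merge(elexon, on=keys, how="outer")
merged = merged.sort_values(keys).reset_index(drop=True)
missing = merged["imbalance_price"].isna()

# Fill each gap with the mean of the values available for the same day
daily_mean = merged.groupby("settlement_date")["imbalance_price"].transform("mean")
merged["imbalance_price"] = merged["imbalance_price"].fillna(daily_mean)

print(merged[missing])
```

Grouping by date before taking the mean keeps the fill local to each day when the frame covers more than one date.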
I’m going to start using GitHub as a way to manage this project – you can find the script to scrape Elexon data as well as a SQL database with 2016 data for the Imbalance Price and Imbalance Volume on my UK Imbalance Price Forecasting repository.
I’ve also included another small script, database checking.py, that I use to check the quality of the SQL database. This script should probably be built into the scraping script! However, I’ve decided to spend more time analyzing and building models 🙂
The next posts in this series will detail some dramatically improved models for predicting the Imbalance Price, making use of Keras, TensorFlow and Plotly.