Energy Basics – Capacity Factor

All men & women are created equal. Unfortunately the same is not true for electricity generating capacity.
 
Capacity on its own is worthless – what counts is the electricity generated (kWh) from that capacity (kW). If the distinction between kW and kWh is not clear, this previous post will be useful.
 

Capacity factor is one way to quantify the value of capacity. It’s the actual electricity (kWh) generated as a percentage of the theoretical maximum (operation at maximum kW).

For example, to calculate the capacity factor on an annual basis:

Capacity factor = actual annual generation [kWh] / (capacity [kW] × 8,760 hours)

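The same calculation in a few lines of Python (the 2 MW turbine and its annual generation below are made up for illustration):

# capacity factor of a hypothetical 2 MW wind turbine over one year
capacity_kw = 2000
annual_generation_kwh = 5256000  # made-up figure for illustration
max_possible_kwh = capacity_kw * 8760  # 8,760 hours in a non-leap year

capacity_factor = annual_generation_kwh / max_possible_kwh
print('{:.1%}'.format(capacity_factor))  # 30.0%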
There are many reasons why capacity will not generate as much as it could.

Three major reasons are maintenance, unavailability of fuel and economics.

 
Maintenance
 
Burning fossil fuels creates a challenging engineering environment. The core of a gas turbine is high pressure & temperature gas rapidly rotating blazing hot metal. Coal power stations generate electricity by forcing high pressure steam through a steam turbine, spinning it incredibly fast.
 
These harsh conditions mean that fossil fuel plants need a lot of maintenance. Time spent maintaining the plant is time the capacity isn’t generating electricity.
 
Renewable plants need a lot less maintenance than fossil fuel generators. No combustion means there is a lot less stress on equipment.
 
Availability of fuel
 
Yet while renewables come out ahead on maintenance, they fall behind due to a constraint that fossil fuel generation usually doesn’t suffer from – unavailability of fuel.
 
This is why renewables like wind & solar are classed as intermittent. Fuel is often not available meaning generation is often not possible.
 
Solar panels can’t generate at night. Wind turbines need wind speeds to be within a certain range – not too low, not too high – just right.
 
This means that wind & solar plants are often not able to generate at full capacity – or even to generate at all. This problem isn’t common for fossil fuel generation. Fossil fuels are almost always available through natural gas grids or on site coal storage.
 
Economics
 
The final reason for capacity to not generate is economics.
 
The relative price of energy and regulations change how fossil fuel capacity is dispatched. Today’s low natural gas price environment is the reason why coal capacity factors have been dropping.
 
Here renewables come out way ahead of fossil fuels. As the fuel is free renewables can generate electricity at much lower marginal cost than fossil fuels. Wind & solar almost always take priority over fossil fuel generation.
 
Typical capacity factors
 
The capacity factor wraps up and quantifies all of the factors discussed above.
 
Table 1 – Annual capacity factors (2014-2016 US average)

 | Coal | CCGT | Wind | Solar PV | Nuclear
Annual Capacity Factor | 56.13% | 53.40% | 33.63% | 26.30% | 92.17%

 

Table 1 gives us quite a bit of insight into the relative value of different electricity generating technologies. The capacity factor for natural gas is roughly twice as high as solar PV.

 

We could conclude that 1 MW of natural gas capacity is worth around twice as much as 1 MW of solar PV.

How useful is the capacity factor?

Yet the capacity factor is not a perfect measure of how valuable capacity is. Taking the average of anything loses information – capacity factor is no different.
 
Two plants operating in quite different ways can have the same capacity factor. A plant that operated at 50% of capacity for the entire year and a plant that generated at full capacity for half of the year will have identical capacity factors.
 
The capacity factor loses information about the time of energy generation. The timing of generation & demand is a crucial element of almost every energy system.
 
Generation during a peak can be a lot more valuable than generation at other times. By its nature, dispatchable generation is more likely to be running during a peak.
 
This leads us to conclude that low capacity factor generation could be more valuable than higher capacity factor generation.  This is especially true for solar in many countries as a) the peak often occurs when the sun is down and b) all solar generation is coincident.
 
The solution to the intermittency problem of renewables is storage. Storage will allow intermittent generation to be used when it’s most valuable – not just whenever it happens to be windy or sunny.
 
Thanks for reading!

Energy Insights – How to save the energy system – André Bardow

Energy Insights is a series where we highlight interesting energy content from around the web.

The previous post in this series was about the 2017 BP Annual Energy Outlook.


These three Energy Insights come from a TED talk titled ‘How to save the energy system’ given by André Bardow.   André is a fantastic speaker and his talk is well worth a watch.

Below are three of the many interesting things André discusses. I really enjoyed writing this – I find all of this fascinating.

Why peak oil won’t save the planet

As humanity burns more oil the amount of oil left to recover should decrease. This is logical – right?

Figure 1 below shows the opposite is true. Estimated global oil reserves have actually increased over time.  The key is to understand the definition of oil reserves.

Figure 1 – Estimated global oil reserves (1980 – 2014)

Oil reserves are defined as the amount of oil that can be technically recovered at a cost that is financially feasible at the present price of oil.

Oil reserves are therefore a function of a number of variables that change over time:

  • Total physical oil in place – physical production of oil reduces the physical amount of oil in the ground.
  • Total estimated oil in place – initial estimates were low and have increased over time.
  • Oil prices – historically oil prices have trended upwards (Figure 2). Oil reserves are defined as a direct function of the present oil price.
  • Technology – the oil & gas industry has benefited more than any other energy sector from improved technology. Improved technology reduces the cost of producing oil, making more oil production viable at a given price.
Figure 2 – Crude oil price (1950 – 2010)

Only the physical production of oil has had a negative effect on oil reserves.

The other three (total oil estimate, technology & oil price) have all caused oil reserve estimates to increase.

We are not going to run out of oil any time soon. The limit on oil production is not set by physical reserves but by climate change. I find this worrying – it would be much better if humanity was forced to switch to renewables!

Wind & solar lack an inherent economy of scale

A lot of the advantages in systems come from economies of scale – energy systems are no different. Larger plants are more energy efficient and have lower specific capital & maintenance costs.

Energy efficiency improves with size as the ratio of fixed energy losses to energy produced improves.   Figure 3 shows an example of this for spark ignition gas engines.
Figure 3 – Effect of gas engine size [kWe] on gross electric efficiency [% HHV]

This is also why part load efficiency is worse than full load efficiency.  Energy production reduces but fixed  energy losses remain constant.
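A toy model makes the intuition concrete – all numbers below are illustrative, assuming a fixed 5 MW of losses:

# efficiency = useful output / (useful output + fixed losses)
fixed_losses_mw = 5.0
for output_mw in [100.0, 50.0, 25.0]:
    efficiency = output_mw / (output_mw + fixed_losses_mw)
    print(output_mw, round(efficiency, 3))  # 0.952, 0.909, 0.833

As output falls the fixed losses become a larger share of the energy in, so efficiency drops.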

Specific capital & operating costs also improve with size.  For example, a 10 MW and 100 MW plant may need the same land area at a cost of £10,000.  The specific capital cost of land for both projects is £1,000/MW versus £100/MW respectively.

Fossil fuel plants use their economy of scale to generate large amounts of electricity from a small number of prime movers.

Wind & solar plants are not able to do this. The problem is the prime movers in both wind & solar plants.

The maximum size of a wind or solar prime mover (a wind turbine or solar panel) is small compared with fossil fuel prime movers. For example GE offer a 519 MWe gas turbine – the world’s largest wind turbine is the 8 MWe Vestas V164.

Figure 4 – The Vestas V164

A single gas turbine in CCGT mode is more than enough to generate 500 MWe. A wind farm needs around 63 V164 turbines (63 × 8 MWe ≈ 500 MWe) to generate the same amount.

The reason for the difference is fundamental to the technologies – the energy density of fuel.  Fossil fuels offer fantastic energy densities – meaning we can do a lot with less fuel (and less equipment).  Transportation favours liquid fossil fuels for this reason.

Wind & solar radiation have low energy densities. To capture more energy we need lots more blade or panel surface area.  This physical constraint means that scaling the prime movers in wind & solar plants is difficult. The physical size increases very fast as we increase electricity generation.

This means that wind turbines & solar panels need to be very cheap at small scales. As wind & solar technologies improve there will be improvements in both the economy of scale & the maximum size of a single prime mover.

But to offer a similar economy of scale as fossil fuels is difficult due to low energy density fuel.  It’s not that wind & solar don’t benefit from any economy of scale – for example grid connection costs can be shared. It’s the fact that fossil fuels:

  • share most of these economies of scale.
  • use high energy density fuels, which gives a fundamental advantage in the form of large maximum prime mover sizes.

We need to decarbonise the supply of heat & power as rapidly as possible.  Renewables are going to be a big part of that.  The great thing is that even with this disadvantage wind & solar plants are being built around the world!

Average German capacity factors
André gives reference capacity factors for the German grid of:

  • Solar = 10 %.
  • Wind = 20 %.
  • Coal = 80 %.

This data is on an average basis.  The average capacity factor across the fleet is usually more relevant than the capacity factor of a state of the art plant.

It is always good to have some rough estimates in the back of your mind. A large part of engineering is using your own experience-based estimates to sense-check the inputs and outputs of models.

Thanks for reading!

CHP Scoping Model v0.2

See the introductory post for this model here.  

This is v0.2 of the CHP scoping model I am developing.  The model is setup with some dummy data.

If you want to get it working for your project all you need to do is change:

  • heat & power demands (Model : Column F-H)
  • prices (Model : Column BF-BH)
  • CHP engine (Input : Engine Library).

You can also optimize the operation of the CHP using a parametric optimization VBA script (Model : Column BW).

You can download the latest version of the CHP scoping model here.

Thanks for reading!

Tuning Model Structure – Number of Layers & Number of Nodes

Imbalance Price Forecasting is a series applying machine learning to forecasting the UK Imbalance Price.  

Last post I introduced a new version of the neural network I am building.  This new version is a feedforward fully connected neural network written in Python built using Keras.

I’m now working on tuning model hyperparameters and structure. Previously I setup two experiments looking at:

  1. Activation functions
    • concluded rectified linear (relu) is superior to tanh, sigmoid & linear.
  2. Data sources
    • concluded the more data the better.

In this post I detail two more experiments:

  1. Number of layers
  2. Number of nodes per layer

Python improvements

I’ve made two improvements to my implementation of Keras.  An updated script is available on my GitHub.

I often saw during training that the model from the last epoch was not necessarily the best model. I now make use of a ModelCheckpoint callback that saves the weights of the best model trained.
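As a minimal sketch of how this can be wired up (the filename and monitored metric here are my assumptions):

from keras.callbacks import ModelCheckpoint

# keep only the weights of the best model seen during training
checkpoint = ModelCheckpoint('best_weights.hdf5', monitor='loss',
                             save_best_only=True, save_weights_only=True)
network.fit(X_train, y_train, nb_epoch=epochs, callbacks=[checkpoint])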

The second change I have made is to include dropout layers after the input layer and each hidden layer.  This is a better implementation of dropout!

Experiment one – number of layers

Model parameters were:

  • 15,000 epochs. Trained in three batches. 10 folds cross-validation.
  • 2016 Imbalance Price & Imbalance Volume data scraped from Elexon.
  • Feature matrix of lag 48 of Price & Volume & Sparse matrix of Settlement Period, Day of the week and Month.
  • Feed-forward neural network:
    • Input layer with 1000 nodes, fully connected.
    • 0-5 hidden layers with 1000 nodes, fully connected.
    • 1-6 dropout layers. One under input & each hidden layer.  30% dropout.
    • Output layer, fully connected with a single output node.
    • Loss function = mean squared error.
    • Optimizer = adam (default parameters).

Results of the experiments are shown below in Fig. 1 – 3.

Figure 1 – number of layers vs final training loss
Figure 2 – number of layers vs MASE

Figure 1 shows two layers with the smallest training loss.

Figure 2 shows that two layers also gives the lowest CV MASE (although with a high training MASE).

Figure 3 – number of layers vs overfit. Absolute overfit = Test-Training. Relative = Absolute / Test.

In terms of overfitting two layers shows reasonable absolute & relative overfit.  The low relative overfit is due to a high training MASE (which minimizes the overfit for a constant CV MASE).

My conclusion from this set of experiments is to go forward with a model of two layers.  Increasing the number of layers beyond this doesn’t seem to improve performance.

It is possible that training for more epochs may improve the performance of the more complex networks which will be harder to train.  For the scope of this project I am happy to settle on two layers.

Experiment two – number of nodes

For the second set of experiments all model parameters were all as above except for:

  • 2 hidden layers with 50-1000 nodes.
  • 5 fold cross validation.
Figure 4 – number of nodes vs final training loss
Figure 5 – number of nodes vs MASE
Figure 6 – number of nodes vs overfit.  Absolute overfit = Test-Training.  Relative = Absolute / Test

My conclusion from looking at the number of nodes is that 500 nodes per layer is the optimum result.

Conclusions

Both parameters can be further optimized using the same parametric optimization.  For the scope of this work I am happy to work with the results of these experiments.

I trained a final model using the optimal parameters.  A two layer & 500 node network achieved a test MASE of 0.455 (versus the previous best of 0.477).

Table 1 – results of the final model fitted (two layers, 500 nodes per layer)

The next post in this series will look at controlling overfitting via dropout.

CHP Scoping Model v0.1

The most recent version of this model can be found here.

My motivation for producing this model is to give engineers something to dig their teeth into.

My first few months as an energy engineering graduate were spent pulling apart the standard CHP model used by my company.  A few years later I was training other technical teams how to use my own model.

I learnt a huge amount through deconstructing other people’s models and iterating through versions of my own.

So this model is mostly about education. I would love it to be used in a professional setting – but we may need a couple of releases to iron out the bugs!

In this post I present the beta version of a CHP feasibility model.  The model takes as inputs (all on a half hourly basis):

  • High grade heat demand (Model : Column F).
  • Low grade heat demand (Model : Column G).
  • Electricity demand (Model : Column H).
  • Gas, import & export electricity price (Model : Column BF-BH).

Features of the model:

  • CHP is modeled as a linear function of load. Load can be varied from 50-100 %.
  • Can model either gas turbines or gas engines. No ability to model supplementary firing.  Engine library needs work.
  • The CHP operating profile can be optimized using a parametric optimization written in VBA.
    • Iteratively increases engine load from 50% – 100% (single HH period),
    • Keeps value that increased annual saving the most (the optimum),
    • Moves to next HH period,
    • Optimization can be started using the button in (Model : Column BV). I recommend watching it run to try to understand it. The VBA routine is called parametric – a Python sketch of the idea follows below.
  • Availability is modeled using a randomly generated column of binary variables (Model : Column C).
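To make the parametric routine concrete, here is a rough Python sketch of the same greedy search. The annual_saving function is a stand-in for the spreadsheet’s saving calculation:

def parametric(loads, annual_saving, step=0.1):
    # greedily tune the CHP load one half-hourly period at a time
    for period in range(len(loads)):
        best_saving = annual_saving(loads)
        # try each load from 50% to 100% in this period
        for load in [0.5 + step * i for i in range(int(0.5 / step) + 1)]:
            trial = list(loads)
            trial[period] = load
            saving = annual_saving(trial)
            if saving > best_saving:  # keep the load that saves the most
                best_saving = saving
                loads = trial
    return loads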
You can download the latest version of the CHP scoping model here.

Thanks for reading!

Monte Carlo Q-Learning to Operate a Battery

I have a vision for using machine learning for optimal control of energy systems.  If a neural network can play a video game, hopefully it can understand how to operate a power plant.

In my previous role at ENGIE I built Mixed Integer Linear Programming models to optimize CHP plants.  Linear Programming is effective in optimizing CHP plants but it has limitations.

I’ll detail these limitations in a future post – this post is about Reinforcement Learning (RL). RL is a tool that can solve some of the limitations inherent in Linear Programming.

In this post I introduce the first stage of my own RL learning process. I’ve built a simple model to charge/discharge a battery using Monte Carlo Q-Learning. The script is available on GitHub.

I made use of two excellent blog posts to develop this.  Both of these posts give a good introduction to RL:

Features of the script
 

As I don’t have access to a battery system I’ve built a simple model within Python.  The battery model takes as inputs the state at time t, the action selected by the agent and returns a reward and the new state.  The reward is the cost/value of electricity charged/discharged.

def battery(state, action):  # the technical model
    # the battery can choose to:
    #    discharge 10 MWh (action = 0)
    #    charge 10 MWh (action = 1)
    #    do nothing (action = 2)

    charge = state[0]  # our charge level
    SP = state[1]  # the current settlement period
    prices = getprices()
    price = prices[SP - 1]  # the price in this settlement period

    if action == 0:  # discharging
        new_charge = max(0, charge - 10)
        charge_delta = charge - new_charge
        reward = charge_delta * price
    elif action == 1:  # charging
        new_charge = min(100, charge + 10)
        charge_delta = charge - new_charge
        reward = charge_delta * price
    else:  # doing nothing
        charge_delta = 0
        reward = 0

    new_charge = charge - charge_delta
    new_SP = SP + 1
    state = (new_charge, new_SP)
    return state, reward, charge_delta

The price of electricity varies throughout the day.
The model is not fed this data explicitly – instead it learns through interaction with the environment.
 
One ‘episode’ is equal to one day (48 settlement periods).  The model runs through thousands of iterations of episodes and learns the value of taking a certain action in each state.  
 
Learning occurs by apportioning the reward for the entire episode to every state/action that occurred during that episode. While this method works, more advanced methods do this in better ways.
def updateQtable(av_table, av_count, returns):
    # update our Q (aka action-value) table with an incremental average
    # of the returns seen for each state/action pair
    for key in returns:
        av_table[key] = av_table[key] + (1 / av_count[key]) * (returns[key] - av_table[key])
    return av_table
The model uses an epsilon-greedy method for action selection.  Epsilon is decayed as the number of episodes increases.
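A rough sketch of what epsilon-greedy selection with decay can look like (the decay schedule and table keys are assumptions, not the script’s exact implementation):

import random

def select_action(av_table, state, episode, decay=0.9995):
    epsilon = decay ** episode  # epsilon shrinks as we complete more episodes
    if random.random() < epsilon:
        return random.choice([0, 1, 2])  # explore - a random action
    # exploit - the action with the highest estimated value in this state
    return max([0, 1, 2], key=lambda action: av_table[(state, action)])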
Results
 
Figure 1 below shows the optimal dispatch for the battery model after training for 5,000 episodes.
Figure 1 – Electricity prices [£/MWh] and the optimal battery dispatch profile [%]
I’m happy the model is learning well. Charging occurs during periods of low electricity prices. It is also fully draining the battery at the end of the day – which is logical behavior to maximise the reward per episode.  
 

Figure 2 below shows the learning progress of the model.

Figure 2 – Model learning progress
Next steps
 
Monte Carlo Q-learning is a good first step into RL. It’s helped me start to understand some of the key concepts.
 
Next steps will be developing more advanced Q-learning methods using neural networks.

Energy Insights – 2017 BP Energy Outlook

Energy Insights is a series where we pull out key points from energy articles around the web. This is not a full summary but a taste – if you like the ideas then please watch the presentation & read the report.  

Previous posts in this series include the IEA 2016 World Outlook and Automated Cars.

Often people jump to the conclusion that anything released by an oil major is self-serving.  Don’t be like this!  If you ignore a report like the Outlook it is only you that is missing out.

The search for truth requires humility. You need to be honest about your own ignorance. You need to be open to learning from any source of information. You need to be confident that you can judge the quality of that information.

Below I highlight BP’s view on passenger cars and the effect of climate policies on natural gas over the course of the Outlook (2015-2035).

Oil consumption for passenger cars

Figure 1 – Net change in car oil consumption

BP project a doubling of the global passenger car fleet due to the continued emergence of the middle class.

Luckily the increased oil consumption associated with double the number of cars is almost entirely offset by a 2.5 % annual improvement in fuel efficiency.

This fuel efficiency assumption seems quite small – but actually it is a strong break with the past.  The average for the last twenty years is only 1 %.

Even small improvements in fuel efficiency have a large effect on oil consumption due to the size of the combustion engine fleet.

The opposite is true with electric cars.  BP are projecting the number of electric cars increasing from 1.2 million to 100 million.  This is a compounded annual growth rate of around 25 %!
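That growth rate is just arithmetic on BP’s two endpoints:

# compound annual growth rate implied by 1.2 million -> 100 million cars over 20 years
cagr = (100 / 1.2) ** (1 / 20) - 1
print('{:.1%}'.format(cagr))  # ~24.7%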

Unlike with fuel efficiency this relative increase has very little effect.  Electric car deployment increasing by 100 times leads to only a 6 % reduction versus 2015 oil consumption.

Electric cars are a sexy topic that gets a lot of media attention – yet vehicle fuel efficiency may be more important if we care about climate change.

What we need to remember is that large relative increases can be dwarfed by small relative increases.  It’s important to take everything back to the absolute value (in this case oil consumption) that we care about.

Risks to gas demand

Oil majors and clean energy professionals are both interested in the future of natural gas.  In the Outlook BP take a look at how climate policy could affect the growth of natural gas.

Strong climate policies pose a risk to all fossil fuels – natural gas included.  Strong climate policies lead to the reduction of all fossil fuels in favour of low carbon energy generation.

However the Outlook shows that actually both strong and weak climate policies pose risks to natural gas consumption.

Figure 2 – The effect of climate policy strength on natural gas consumption growth

Weak climate policies will favour fossil fuels but also benefit coal over natural gas.  BP expect the net effect of this would be a reduction in gas growth versus their base case.

This is quite a nice example of a Laffer curve.  The Laffer curve is traditionally used for demonstrating the relationship between tax revenue and the tax rate.  The curve shows there is an optimum somewhere in the middle.

Figure 3 – The Laffer Curve

BP are showing that natural gas consumption likely follows a Laffer curve with respect to climate policy.

I hope you found these two insights as interesting as I did.  I encourage you to check out either the presentation or the report for further interesting insights.

Energy Basics – Average vs Marginal Carbon Emissions

Carbon savings might seem like a simple calculation – yet many professionals are getting it wrong. I know because I was making this mistake in my previous job!

Accurate calculation of carbon savings is crucial in the fight against climate change.  Knowing how much we save from a project can be compared to other projects such as renewable generation.

So where are people going wrong?  The key is to understand the difference between average and marginal carbon emissions.

Average carbon emissions are calculated using the total carbon emissions and total amount of electricity generated:

Table 1 – Calculation of average carbon intensity (for Base Case – see below)
Carbon emissions | tC | 83,330
Electricity generated | MWh | 182,827
Carbon intensity | tC/MWh | 0.456
This average intensity can be used to calculate carbon savings.  For example if we had a project that saved 2 MWh we would calculate 2 * 0.456 = 0.912 tC as the saving.  This is wrong!

To understand why, we need the concept of the marginal generator.  In reality, as electricity is saved the reduction in generation is not spread across every generator.  The reduction occurs in one plant – the marginal generator.  Let’s run through an example.

Suppose we have a grid where electricity is supplied by either wind or coal (the Base Case).  If we save 1 GW of electricity, the generation of the coal plant will reduce by 1 GW (Case 1).

The wholesale mechanism operating in most electricity markets will reduce output on the most expensive plant, not reduce the output of all plants equally.

Figure 1 & 2 – The effect of saving 1 GW of electricity.  Note that the generation from wind is unchanged.
Table 2 – The daily results for the Base Case & Case 1 (you can download the model below)
 | | Base Case | Case 1 | Saving
Wind | MWh | 91,256 | 91,256 | 0
Coal | MWh | 91,571 | 67,571 | 24,000
Total | MWh | 182,827 | 158,827 | 24,000
Carbon emissions | tC | 83,329 | 61,489 | 21,840
Carbon intensity | tC/MWh | 0.456 | 0.387 | 0.910
Our carbon saving is equal to the 24,000 MWh saved multiplied by the carbon intensity of the marginal plant (0.91 tC/MWh).

If we were to use the average grid carbon intensity (0.456 tC/MWh) we would calculate a daily carbon saving of only 10,944 tC – around half of the true saving of 21,840 tC.
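The difference is easy to see in a few lines (numbers from Table 2):

# marginal vs average carbon accounting for a 24,000 MWh saving
saving_mwh = 24000
average_intensity = 0.456   # tC/MWh - the grid average
marginal_intensity = 0.910  # tC/MWh - the coal plant on the margin

print(saving_mwh * average_intensity)   # 10,944 tC - understates the saving
print(saving_mwh * marginal_intensity)  # 21,840 tC - the actual saving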

You might be asking – how do we know what the marginal generator will be?  It’s likely to be the most expensive generator running at that time (though it may not be, if the plant needs to stay on for technical reasons).  As renewables are characterized by low marginal costs they are unlikely to be pushed off the grid.

Luckily high marginal cost generators like open cycle gas turbines are usually also carbon intense – so your saved electricity is likely doing valuable work – and potentially more than you previously thought!

You can download a copy of the model to see my assumptions here:
average-vs-marginal-emissions-2017-02-02.xlsx

UK Imbalance Price Forecasting using Keras & TensorFlow

Previously I introduced a feedforward neural network model built using scikit-learn in Python. This post details an upgraded version which utilizes Keras, TensorFlow and Plotly.

Keras is a more flexible package for building neural networks. It can train models on a GPU by using Google’s TensorFlow as the backend. I still use scikit-learn’s helpful functions for splitting data into cross-validation and test/train sets.

As I’m becoming more comfortable with Python the quality of these scripts is improving. I’m using list comprehensions and lambda functions to make my code faster and cleaner. It’s a great feeling to rewrite parts of the script using new knowledge.

The full code is available on my GitHub repository for this project – I will also include a copy in this post.  The script philosophy is about running experiments on model structure and hyperparameters.

Features of this script

I needed to use a few workarounds to get TensorFlow to train models on the GPU.  First I had to install the tensorflow-gpu package (not tensorflow) through pip.  I then needed to access the TensorFlow session through the back end:

import keras.backend.tensorflow_backend as KTF

We also need to wrap our Keras model with a statement that sets the TensorFlow device as the GPU:

with KTF.tf.device('gpu:0'):

Finally I modify the Tensorflow session:

KTF.set_session(KTF.tf.Session(config=KTF.tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)))

These three things together got my Keras model training on my GPU.

Another workaround I had to include was a way to reset Keras models. I use K-fold cross validation to tune model hyperparameters. This leaves the test set free for use only in determining the generalization error of the model. It is bad machine learning practice to use the test set to tune hyperparameters.

K-fold cross-validation uses a loop to iterate through the training data K times. I found that the model weights were not being reset after each loop – even when I used a function to build & compile the model outside of the loop. The solution I used involves storing the initial weights, then randomly permuting them each time we call the function.

The previous scikit-learn script used L2-regularization to control overfitting. In this new version I make use of a Keras dropout layer. Dropout randomly ignores selected neurons during training.

This means that their contribution to the activation of downstream neurons is removed on the forward pass. Any weight updates will not be applied to the neuron on the backwards pass. This should result in the network being less sensitive to the specific weights of neurons.
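A minimal sketch of a dropout layer in Keras (the layer sizes here are illustrative, not the project’s exact structure):

from keras.models import Sequential
from keras.layers import Dense, Dropout

network = Sequential()
network.add(Dense(1000, input_dim=100, activation='relu'))
network.add(Dropout(0.3))  # ignore a random 30% of these neurons each update
network.add(Dense(1, activation='linear'))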

I’ve also made this script create new folders for each experiment and each individual model run. For each run this script saves the features and a plot of the forecast versus the actual value.

For the experiment it saves the results (MASE, MAE, training loss etc) for all runs and a plot of the training history for each run.

The last major step I have taken is starting to use Plotly. Plotly allows saving of interactive graph objects as HTML documents and uploading of graphs to their server. This has been a major step up from matplotlib as the graphs created are interactive.

Model Results

The next few posts in this series will be using this script to optimize the structure and hyperparameters.

The first experiment I ran was on activation functions.  The following parameters were used:

  • 2016 Imbalance Price & Imbalance Volume data scraped from Elexon
  • Feature matrix of lag 48 of Price & Volume & Sparse matrix of Settlement Period, Day of the week and Month
  • 8 layer neural network:
    • Input layer with 1000 nodes, fully connected
    • Dropout layer after input layer with 30 % dropout
    • Five hidden layers with 1000 nodes, fully connected
    • Output layer, fully connected with a single output node
    • Loss function = mean squared error
    • Optimizer = adam

I ran four experiments with different activation functions (the same function in each layer).

Table 1 – Experiment on activation functions (MASE = Mean Absolute Scaled Error)
Activation Function | Training MASE | Test MASE | Training Loss
relu | 0.17 | 0.47 | 86
tanh | 0.36 | 0.49 | 1,030
sigmoid | 0.71 | 0.72 | 2,183
linear | 0.60 | 0.62 | 2,269

Table 1 clearly shows that the rectifier (relu) activation function is superior for this task.

I ran a second experiment changing the data included in the feature matrix.  The sparse features are boolean variables for the Settlement Period, the day (Monday – Sunday) and the month.  These are included so that the model can capture trend and seasonality.

The models I ran in this experiment use the same parameters as Experiment One – except the data included in the feature matrix is changed:

  1. Lagged Imbalance Price only
  2. Lagged Imbalance Price & Imbalance Volume
  3. Lagged Imbalance Price & Sparse Features
  4. Lagged Imbalance Price, Imbalance Volume & Sparse features
Table 2 – Experiment on data sources (MASE = Mean Absolute Scaled Error)
 | Training MASE | CV MASE | Test MASE | CV MASE improvement
Price only | 0.25 | 0.50 | 0.46 | 0.0%
Price & volume | 0.21 | 0.49 | 0.47 | 3.5%
Price & sparse | 0.17 | 0.50 | 0.49 | 0.7%
Price, volume & sparse | 0.15 | 0.47 | 0.45 | 6.1%

As expected increasing the amount of data for the model to learn from improves performance.  It’s very interesting how the improvement for Run 4 is more than the sum of the improvements for Runs 2 & 3!

It’s also a validation that the feature data has been processed correctly.  If the results had got worse, it would suggest either a mistake during processing or that the extra features contain no useful signal.

Table 2 shows a large difference between the Training & Test MASE.  This indicates overfitting – something we will try to fix in future experiments by working with model structure and dropout.

So conclusions from our first two experiments – use the relu activation function and put in as many useful features as possible.

A quick look at the forecast for Run 4 of Experiment Two for the last few days in December 2016:

Figure 1 – The forecast versus actual Imbalance Price for the end of 2016

Click here to interact with the full 2016 forecast.

Next Steps

I’ll be doing more experiments on model structure and hyperparameters.  Experiments I have lined up include:

  • Model structure – number of layers & nodes per layer
  • Dropout fraction
  • Size of lagged data set

I’ll also keep working on improving the quality of code!

The Script

"""
ADG Efficiency
2017-01-31

This model runs experiments on model parameters on a neural network.
The neural network is built in Keras and uses TensorFlow as the backend.

To setup a model run you need to look at
- mdls_D
- mdls_L
- machine learning parameters
- pred_net_D

This script is setup for an experiment on the types
of data used as model features.

This model is setup to run an experiment on the sources of data.
"""

import os
import pandas as pd
import numpy as np
import sqlite3
import pickle
import datetime

# importing some useful scikit-learn stuff
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold

# importing Keras & Tensorflow
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import adam, RMSprop, SGD
from keras.metrics import mean_absolute_error
import keras.backend.tensorflow_backend as KTF # https://groups.google.com/forum/#!topic/keras-users/MFUEY9P1sc8

import tensorflow as tf

# importing plotly
import plotly as py
import plotly.graph_objs as go 
import plotly.tools as tls 
tls.set_credentials_file(username="YOUR USERNAME", api_key="YOUR API KEY")

# imported from another script 
from metrics import MASE 
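# NOTE: metrics is the author's own module (not shown in this post).
# A minimal MASE against a 48-period seasonal naive forecast might look like:
# def MASE(y_true, y_pred, seasonality):
#     naive_error = np.mean(np.abs(y_true[seasonality:] - y_true[:-seasonality]))
#     return np.mean(np.abs(y_true - y_pred)) / naive_error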

# workaround found on stack overflow
# http://stackoverflow.com/questions/40046619/keras-tensorflow-gives-the-error-no-attribute-control-flow-ops
tf.python.control_flow_ops = tf 

# grabbing the TF session from Keras backend
sess = KTF.get_session() 

# function that inputs the database path 
def get_data(db_path,table):
 conn = sqlite3.connect(db_path)
 data = pd.read_sql(sql='SELECT * from ' + str(table),con = conn)
 conn.close()
 print('got sql data')
 return data # returns a df with all table data

# setting base directory
base_path = os.path.dirname(os.path.abspath(__file__))

# location of SQL database 
sql_path = os.path.join(base_path,'data','ELEXON DATA.sqlite')
 
# saving run time for use in creating folder for this run of experiments
run_name = str(datetime.datetime.now())
run_name = run_name.replace(':','-')
run_path = str('Experiment Results//' + run_name)
results_csv_path = os.path.join(base_path, run_path, 'results.csv')
history_csv_path = os.path.join(base_path, run_path, 'histories.csv')

# dictionary containing all of the infomation for each report
data_price_D = {'Imbalance Price':{'Report name':'B1770','Data name':'imbalancePriceAmountGBP'}}

data_price_vol_D = {'Imbalance Price':{'Report name':'B1770','Data name':'imbalancePriceAmountGBP'},
 'Imbalance Volume':{'Report name':'B1780','Data name':'imbalanceQuantityMAW'}}

# list of model names - done manually to control order & which models run
mdls_L = ['price, vol & sparse'] # 'price only', 'price & volume', 'price & sparse', 

# dictionary of mdls with parameters
# FL = first lag. LL = last lag. SL = step size, 
# Sparse? = to include sparse data for trend/seasonality or not, 
# Data dict = which dictionary of data to use for the lagged time series
mdls_D = {'price only':{'date_start':'01/01/2016 00:00','date_end':'31/12/2016 00:00','FL':48,'LL':48*2,'SL':48,'Sparse?':False,'Data dict' : data_price_D},
'price & volume':{'date_start':'01/01/2016 00:00','date_end':'31/12/2016 00:00','FL':48,'LL':48*2,'SL':48,'Sparse?':False,'Data dict' : data_price_vol_D},
'price & sparse':{'date_start':'01/01/2016 00:00','date_end':'31/12/2016 00:00','FL':48,'LL':48*2,'SL':48,'Sparse?':True,'Data dict' : data_price_D},
'price, vol & sparse':{'date_start':'01/01/2016 00:00','date_end':'31/12/2016 00:00','FL':48,'LL':48*2,'SL':48,'Sparse?':True,'Data dict' : data_price_vol_D}}

# machine learning parameters
n_folds = 10 # number of folds used in K-fold cross validation
epochs = 200 # number of epochs used in training 
random_state_split = 5
random_state_CV = 3

# dataframes for use in tracking model results
results_cols = ['Model name','CV MASE','Training MASE','Test MASE','MASE',
 'Training MAE','Test MAE','MAE','CV Loss','Training Loss',
 'Training History','Error Time Series']
results_DF = pd.DataFrame(columns = results_cols)
results_all_DF = results_DF 
history_all_DF = pd.DataFrame()

# for loop to iterate through models
for mdl_index, mdl in enumerate(mdls_L):
 mdl_params = mdls_D[mdl]
 
 # resetting dataframes for storing results from this model run
 results_DF = pd.DataFrame(index = [mdl], columns = results_cols)
 history_DF = pd.DataFrame()

 # creating folder for this run
 exp_path = str(run_path + '/' + str(mdl))
 os.makedirs(exp_path)

 # setting model parameters
 date_start = mdl_params['date_start']
 date_end = mdl_params['date_end']
 first_lag = mdl_params['FL']
 last_lag = mdl_params['LL']
 step_lag = mdl_params['SL']
 include_sparse = mdl_params['Sparse?']
 data_D = mdl_params['Data dict']
 
 # unpacking dictionary of data to be used 
 data_sources = [key for key in data_D] # ie Imbalance Price, Imbalance Volume
 
 # getting the imbalance price first
 if data_sources[0] != 'Imbalance Price':
  price_loc = data_sources.index('Imbalance Price')
  first_source = data_sources[0]
  data_sources[price_loc] = first_source
  data_sources[0] = 'Imbalance Price'
  print(data_sources)

 table_names = [data_D[item]['Report name'] for item in data_sources] 
 data_col_names = [data_D[item]['Data name'] for item in data_sources]
 
 # getting data from SQL
 data_L = [get_data(db_path = sql_path, table = data_D[data_source]['Report name']) for data_source in data_sources]

 # list of the index objects
 indexes_L = [pd.to_datetime(raw_data['index']) for raw_data in data_L] 
 
 # list of the settlement periods
 SP_L = [raw_data['settlementPeriod'] for raw_data in data_L]

 # list of the actual data objects
 data_objs_L = [raw_data[data_col_names[index]].astype(float) for index, raw_data in enumerate(data_L)]
 
 # indexing these data objects
 for i, series in enumerate(data_objs_L):
  df = pd.DataFrame(data=series.values, index=indexes_L[i], columns=[series.name])
  data_objs_L[i] = df

 # creating feature dataframe - gets reset every model run
 data_DF = pd.DataFrame()
 for data_index, data_obj in enumerate(data_objs_L):
  # creating lagged data frame (make this a function)
  for i in range(first_lag, last_lag, step_lag):
   lag_start, lag_end, lag_step = 1, i, 1
   # creating the lagged dataframe
   data_name = data_obj.columns.values[0]
   lagged_DF = pd.DataFrame(data_obj)

   for lag in range(lag_start, lag_end + 1, lag_step):
    lagged_DF[str(data_name) + ' lag_' + str(lag)] = lagged_DF[data_name].shift(lag)
   print(lagged_DF.columns)
   lagged_DF = lagged_DF[lag_end:]  # slicing off the dataframe
   index = lagged_DF.index  # saving the index

   data_DF = pd.concat([data_DF, lagged_DF], axis=1)  # creating df with data
 
 SP = SP_L[0] 
 SP = SP[(len(SP)-len(data_DF)):].astype(float) # slicing our settlement periods 
 
 # creating our sparse matricies for seasonality & trend
 date = index
 # creating label binarizer objects 
 encoder_SP = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
 encoder_days = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
 encoder_months = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
 # creating sparse settelment period feature object
 encoder_SP.fit(SP)
 encoded_SP = encoder_SP.transform(SP)
 SP_features = pd.DataFrame(encoded_SP, index, columns = list(range(1,51)))
 # creating sparse day of the week feature object
 days = list(map(lambda x: x.weekday(), date))
 encoder_days.fit(days)
 encoded_days = encoder_days.transform(days)
 days_features = pd.DataFrame(encoded_days, index = index, 
 columns = ['Mo','Tu','We','Th','Fr','Sa','Su'])
 # creating sparse month feature object
 months = list(map(lambda x: x.month, date))
 encoder_months.fit(months)
 encoded_months = encoder_months.transform(months)
 months_features = pd.DataFrame(encoded_months, index = index, 
 columns = ['Ja','Feb','Mar','Ap','Ma','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])
 
 sparse_features = pd.concat([SP_features,days_features, months_features],axis=1) 
 
 print('incl sparse is ' + str(include_sparse))
 if include_sparse == True:
  print('including sparse')
  data_DF = pd.concat([data_DF, sparse_features], axis=1)
 
 # saving our feature matrix to a csv for checking
 features_path = os.path.join(base_path, exp_path, 'features.csv')
 data_DF.to_csv(features_path) 
 
 # creating our target matrix (the imbalance price)
 y = data_DF['imbalancePriceAmountGBP']
 y.reshape(1,-1)
 
 # dropping out the actual values from our data
 for data_col_name in data_col_names:
  data_DF = data_DF.drop(data_col_name, 1)
 
 # setting our feature matrix 
 X = data_DF
 
 # splitting into test & train
 # keeping the split the same for different model runs
 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state = random_state_split)
 split_D = {'X_train' : len(X_train), 'X_test' : len(X_test)}
 
 # standardizing our data 
 X_scaler = StandardScaler()
 X_train = X_scaler.fit_transform(X_train)
 X_test = X_scaler.transform(X_test)
 X_all = X_scaler.transform(X)
 
 # reshaping
 X_train = np.asarray(X_train)
 y_train = np.asarray(y_train).flatten()
 
 X_test = np.asarray(X_test)
 y_test = np.asarray(y_test).flatten()
 
 X = np.asarray(X) # not sure if I use this anywhere - also this data is not standardized 
 y = np.asarray(y).flatten()
 
 # saving our scaler objects for use later
 pickle.dump(X_scaler, open(os.path.join(base_path,'pickle', 'X_scaler - ' + mdl + '.pkl'), 'wb'), protocol=4) 
 
 # length of the inputs
 training_samples = X_train.shape[0]
 input_length = X_train.shape[1] # should probably rename this to width....
 batch_size = int(training_samples/4)

 pred_net_D = {'price only':{'Layer 1' : Dense(1000, input_dim = input_length, activation = 'relu'), 'Layer 2' : Dense(1000, activation='relu'), 'Layer 3' : Dense(1000, activation='relu'), 'Layer 4' : Dense(output_dim = 1, activation='linear'),'Dropout fraction':0.3},
 'price & volume':{'Layer 1' : Dense(1000, input_dim = input_length, activation = 'relu'), 'Layer 2' : Dense(1000, activation='relu'), 'Layer 3' : Dense(1000, activation='relu'), 'Layer 4' : Dense(output_dim = 1, activation='linear'),'Dropout fraction':0.3},
 'price & sparse':{'Layer 1' : Dense(1000, input_dim = input_length, activation = 'relu'), 'Layer 2' : Dense(1000, activation='relu'), 'Layer 3' : Dense(1000, activation='relu'), 'Layer 4' : Dense(output_dim = 1, activation='linear'),'Dropout fraction':0.3}, 
 'price, vol & sparse':{'Layer 1' : Dense(1000, input_dim = input_length, activation = 'relu'), 'Layer 2' : Dense(1000, activation='relu'), 'Layer 3' : Dense(1000, activation='relu'), 'Layer 4' : Dense(output_dim = 1, activation='linear'),'Dropout fraction':0.3} 
 }
 
 # defining layers for prediction network
 pred_net_params = pred_net_D[mdl]
 layer1 = pred_net_params['Layer 1']
 layer2 = pred_net_params['Layer 2']
 layer3 = pred_net_params['Layer 2']
 layer4 = pred_net_params['Layer 2']
 layer5 = pred_net_params['Layer 2']
 layer6 = pred_net_params['Layer 2']
 layer7 = pred_net_params['Layer 2']
 layer8 = pred_net_params['Layer 2']
 layer9 = pred_net_params['Layer 2']
 layer10 = pred_net_params['Layer 4']
 dropout_fraction = pred_net_params['Dropout fraction'] 

 with KTF.tf.device('gpu:0'):  # force tensorflow to train on GPU
  KTF.set_session(KTF.tf.Session(config=KTF.tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)))

 def get_model():  # use this function so I can recreate new network within CV loop
  network = Sequential()
  network.add(layer1)
  network.add(Dropout(dropout_fraction))
  network.add(layer2)
  network.add(layer3)
  network.add(layer4)
  network.add(layer5)
  network.add(layer6)
  network.add(layer10)
  return network
 
 # https://gist.github.com/jkleint/eb6dc49c861a1c21b612b568dd188668 
 def shuffle_weights(model, weights=None):
  """Randomly permute the weights in `model`, or the given `weights`.
  This is a fast approximation of re-initializing the weights of a model.
  Assumes weights are distributed independently of the dimensions of the weight tensors
  (i.e., the weights have the same distribution along each dimension).
  :param Model model: Modify the weights of the given model.
  :param list(ndarray) weights: The model's weights will be replaced by a random permutation of these weights.
   If `None`, permute the model's current weights.
  """
  if weights is None:
   weights = model.get_weights()
  weights = [np.random.permutation(w.flat).reshape(w.shape) for w in weights]
  # Faster, but less random: only permutes along the first dimension
  # weights = [np.random.permutation(w) for w in weights]
  model.set_weights(weights)
 
 optimizer = adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

 # cross validation
 print('Starting Cross Validation')
 CV_network = get_model()
 CV_network.compile(loss='mean_squared_error', optimizer=optimizer)
 initial_weights_CV = CV_network.get_weights()
 CV = KFold(n_splits=n_folds, random_state = random_state_CV)
 MASE_CV_L = []
 loss_CV_L = []

 for train, test in CV.split(X_train, y_train):
  shuffle_weights(CV_network, initial_weights_CV)
  mdl_CV = CV_network.fit(X_train[train], y_train[train], nb_epoch = epochs, batch_size = batch_size)

  y_CV_pred = CV_network.predict(X_train[test], batch_size = batch_size, verbose=0).flatten()
  MASE_CV = MASE(y_train[test], y_CV_pred, 48)
  MASE_CV_L.append(MASE_CV)
  loss_CV = mdl_CV.history['loss'][-1]
  loss_CV_L.append(loss_CV)
 
 MASE_CV = np.average(MASE_CV_L)
 loss_CV = np.average(loss_CV_L)
 
 # training network on all training data 
 print('Training prediction network')
 network = get_model()
 network.compile(loss='mean_squared_error', optimizer=optimizer) 
 initial_weights_net = network.get_weights()
 
 shuffle_weights(network, initial_weights_net)
 mdl_net = network.fit(X_train, y_train, nb_epoch = epochs, batch_size = batch_size)
 
 y_pred_train = network.predict(X_train,batch_size = batch_size, verbose=0).flatten()
 y_pred_test = network.predict(X_test,batch_size = batch_size, verbose=0).flatten()
 y_pred = network.predict(X_all,batch_size = batch_size, verbose=0).flatten()
 
 error_train = y_pred_train - y_train
 error_test = y_pred_test - y_test 
 error = y - y_pred
 abs_error = abs(error) 
 
 MASE_train = MASE(y_train, y_pred_train, 48)
 MASE_test = MASE(y_test, y_pred_test, 48)
 MASE_all = MASE(y, y_pred, 48)
 
 MAE_train = mean_absolute_error(y_train, y_pred_train).eval(session=sess)
 MAE_test = mean_absolute_error(y_test, y_pred_test).eval(session=sess)
 MAE_all = mean_absolute_error(y, y_pred).eval(session=sess)
 
 results_DF.loc[mdl,'Model name'] = mdl
 results_DF.loc[mdl,'CV MASE'] = MASE_CV
 results_DF.loc[mdl,'Training MASE'] = MASE_train
 results_DF.loc[mdl,'Test MASE'] = MASE_test
 results_DF.loc[mdl,'MASE'] = MASE_all
 
 results_DF.loc[mdl,'Training MAE'] = MAE_train
 results_DF.loc[mdl,'Test MAE'] = MAE_test
 results_DF.loc[mdl,'MAE'] = MAE_all
 
 results_DF.loc[mdl,'CV Loss'] = loss_CV
 results_DF.loc[mdl,'Training Loss'] = mdl_net.history['loss'][-1]

 results_DF.loc[mdl,'Error Time Series'] = error
 results_DF.loc[mdl,'Training History'] = mdl_net.history['loss']

 history_DF = pd.DataFrame(data=list(mdl_net.history.values())[0], index = list(range(1,epochs+1)), columns=[mdl])
 
 # figure 1 - plotting the actual versus prediction 
 actual_imba_price_G = go.Scatter(x=index,y=y,name='Actual',line=dict(width=2)) 
 predicted_imba_price_G = go.Scatter(x=index,y=y_pred,name='Predicted',line=dict(width=2, dash = 'dash'))
 fig1_data = [actual_imba_price_G, predicted_imba_price_G]
 fig1_layout = go.Layout(title='Forecast',yaxis=dict(title='Imbalance Price [£/MWh]'))
 fig1 = go.Figure(data=fig1_data,layout=fig1_layout) 
 fig1_name = os.path.join(exp_path,'Figure 1.html')
 py.offline.plot(fig1,filename = fig1_name, auto_open = False) # creating offline graph
 # py.plotly.plot(fig1, filename='Forecast', sharing='public') # creating online graph
 
 # saving results
 network_params_DF = pd.DataFrame(pred_net_params, index=[mdl])
 mdl_params_DF = pd.DataFrame(mdl_params, index=[mdl])
 results_DF = pd.concat([results_DF, network_params_DF, mdl_params_DF], axis = 1, join = 'inner')
 results_all_DF = pd.concat([results_all_DF, results_DF], axis = 0)
 history_all_DF = history_all_DF.join(history_DF, how='outer')

print(results_all_DF) 

results_all_DF.to_csv(results_csv_path) 
history_all_DF.to_csv(history_csv_path) 

# figure 2 - comparing training history of models
fig2_histories = [history_all_DF[col] for col in history_all_DF]
fig2_data = [go.Scatter(x = data.index, y = data, name=data.name) for data in fig2_histories]
fig2_layout = go.Layout(title='Training History',yaxis=dict(title='Loss'),xaxis=dict(title='Epochs'))
fig2 = go.Figure(data=fig2_data,layout=fig2_layout)
fig2_name = os.path.join(run_path,'Figure 2.html')
py.offline.plot(fig2,filename = fig2_name, auto_open = False) # creating offline graph

 

 

A Simple Look at the Efficiency Penalty Cost of Renewables

This article is a response to a post called Wind Integration vs. Air Emission Reductions: A Primer for Policymakers, which claims that the efficiency penalty of turning down fossil fuel power stations offsets the benefit of renewable power generation.
 
I found this an interesting idea – so I developed a simple model to understand what could be happening.
 
Renewable power generation brings a carbon benefit by generating less power in fossil fuel power stations.  Yet there is a factor working in opposition to this. Generating less in fossil fuel power stations can incur an efficiency penalty.
 
I don’t have data for the relationship between fossil fuel power station generation and efficiency. Instead I look at what the breakeven efficiency penalty would be to offset the benefit of renewable generation.
 
I created a simple model supplying 1 GW of electricity.  Renewable output is varied from 0 GW to 0.5 GW with coal supplying the balance.
 

I assumed the fossil fuel power station operates at a 50 % HHV efficiency at full load.  I then looked at various efficiency penalty factors in the form of reduced efficiency per reduction in load (% HHV / % load).   The efficiency penalty was modeled as linear.  You can download a copy of the model here.
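As a rough sketch of the fuel calculation in the model (the function and parameter names are mine, and the penalty factor is the assumption being varied):

# fuel burnt by a 1 GW coal station with a linear part-load efficiency penalty
def fuel_gw(coal_output_gw, full_load_eff=0.5, penalty=0.001, capacity_gw=1.0):
    # penalty = efficiency lost (as a fraction) per percentage point of load reduction
    load_pct = 100 * coal_output_gw / capacity_gw
    efficiency = full_load_eff - penalty * (100 - load_pct)
    return coal_output_gw / efficiency  # fuel input [GW HHV]

# more renewables means less coal output - but a less efficient coal station
for renewables_gw in [0.0, 0.25, 0.5]:
    print(renewables_gw, round(fuel_gw(1.0 - renewables_gw), 3))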

 

Figure 1 – The effect of various assumed efficiency penalties on fossil fuel consumption
For this simple model the breakeven is 5 % HHV / % load.  If efficiency really reduces at this rate then generating electricity from renewables gives us no carbon benefit.
 
The real question is the actual relationship between fossil fuel power station output and efficiency.  It’s likely to be non-linear.  I also expect it would not be as harsh as 5 % HHV / % load – so renewables are likely providing a carbon benefit.
 
It is useful to know that this is a carbon penalty we could be paying somewhere on the system as renewables penetration increases.  This penalty will net off some of the maximum benefit that renewable generation could supply.
 
However if we can actually start permanently shutting down fossil fuel power stations then this effect doesn’t occur.  This suggests that the number of fossil fuel power stations operating could be a good metric to keep track of.