energy_py is reinforcement learning for energy systems: reinforcement learning agents and environments built in Python.
I have a vision of using reinforcement learners to optimally operate energy systems. energy_py is a step towards this vision. I’ve built this because I’m so excited about the potential of reinforcement learning in the energy industry.
Reinforcement learning in energy systems requires first proving the concepts in a virtual environment. This project demonstrates the ability of reinforcement learning to control a virtual energy environment.
What is reinforcement learning?
Reinforcement learning is the branch of machine learning where learning occurs through action. Reinforcement learning will give us the tools to operate our energy systems at superhuman levels of performance.
It’s quite different from supervised learning. In supervised learning we start out with a big data set of features and our target. We train a model to replicate this target from patterns in the data.
In reinforcement learning we start out with no data. The agent generates data by interacting with the environment. The agent then learns patterns in this data. These patterns help the agent to choose actions that maximize total reward.
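That interaction loop can be sketched in a few lines of Python. This is a minimal illustration, not energy_py's actual code: `ToyEnv` and `RandomAgent` are hypothetical stand-ins for a real environment and a real learner.

```python
import random

class ToyEnv:
    """A toy stand-in environment: 5 steps per episode, reward 1 per step."""
    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # initial observation

    def step(self, action):
        self.steps += 1
        done = self.steps >= 5
        return 0, 1.0, done  # observation, reward, episode-finished flag

class RandomAgent:
    """Picks actions at random - a placeholder for a learning agent."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, observation):
        return random.choice(self.actions)

def run_episode(env, agent):
    """The agent generates its own data by acting in the environment."""
    observation, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        total_reward += reward  # the quantity the agent learns to maximize
    return total_reward
```

A real agent would also store each `(observation, action, reward)` transition and learn from it; the loop structure stays the same.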
Why do we need reinforcement learning in energy systems?
Optimal operation of energy assets is already very challenging. Our current energy transition is making this difficult problem even harder. The rise of intermittent and distributed generation is introducing volatility and increasing the number of actions available to operators.
For a wide range of problems machine learning results are both state of the art and better than human experts. We can get this level of performance using reinforcement learning in our energy systems.
Today many operators use rules or abstract models to dispatch assets. A set of rules is not able to guarantee optimal operation in many energy systems.
Optimal operating strategies can be developed from abstract models. Yet abstract models (such as linear programs) are constrained to approximations of the actual plant. Reinforcement learners are able to learn directly from their experience of operating the actual plant.
Reinforcement learning can also deal with non-linearity. Most energy systems exhibit non-linear behavior (in fact an energy balance is bi-linear!). Reinforcement learning can model non-linearity using neural networks. It is also able to deal with the non-stationary and partially observed environments found in many energy systems.
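To see why an energy balance is bi-linear, consider the standard heat duty equation Q = m · cp · ΔT. When both the mass flow and the outlet temperature are decision variables, the duty contains their product, which a linear program cannot represent directly. A hypothetical illustration:

```python
def heat_duty(mass_flow, cp, t_in, t_out):
    """Sensible heat duty Q = m * cp * (t_out - t_in) in kW.

    Bi-linear: if both mass_flow (kg/s) and t_out (deg C) are decision
    variables, Q contains their product - outside the reach of a
    linear program without approximation.
    """
    return mass_flow * cp * (t_out - t_in)
```

A reinforcement learner approximating the value function with a neural network has no such restriction.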
There are challenges to be overcome. The first and most important is safety. Safety is the number one concern in any engineering discipline. What is important to understand is that we limit the actions available to the agent. All lower-level control systems would remain exactly the same.
There is also the possibility to design the reward function to incentivize safety. A well-designed reinforcement learner could actually reduce hazards to operators.
A final challenge worth addressing is the impact such a learner could have on employment. Machine learning is not a replacement for human operators. A reinforcement learner would not need a reduction in employees to be a good investment.
The value of using a reinforcement learner is to let operations teams do their jobs better. It will allow them to spend more time on, and improve performance in, their remaining responsibilities such as maintaining the plant. The value created here is a better-maintained plant and a happier workforce – in a plant that is operating at superhuman levels of economic and environmental performance.
Any machine requires downtime – a reinforcement learner is no different. There will still be time periods where the plant will operate in manual or semi-automatic modes with human guidance.
energy_py is one step on a long journey of getting reinforcement learners to help us in the energy industry. The fight against climate change is the greatest that humanity faces. Reinforcement learning will be a key ally in fighting it.
energy_py is built using Python. You can check out the repository on GitHub here.
I’m really looking forward to getting to know energy_py. There are a number of parameters to tune. For example the structure of the environment or the design of the reward function can be modified to make the reinforcement learning problem more challenging.
One key design choice is the number of assets the agent has to control. The more choices available to the agent the more complex the shape of the value function becomes. To approximate a more complex value function we may need a more complex neural network.
There is also a computational cost incurred with increasing the number of actions. More actions mean more [state, action] pairs to consider during action selection and training (both of which require value function predictions).
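The growth is combinatorial: with discrete setpoints, the joint action space is the Cartesian product of each asset's choices. A small sketch (assuming, hypothetically, each asset is controlled by a discretized setpoint):

```python
import itertools

def enumerate_actions(num_assets, levels_per_asset):
    """All joint actions for num_assets assets, each with a discrete
    setpoint taking levels_per_asset possible values."""
    return list(itertools.product(range(levels_per_asset), repeat=num_assets))

# the action space grows exponentially with the number of assets,
# and every action needs a value function prediction at each step
two_assets = enumerate_actions(2, 5)    # 5**2 = 25 joint actions
four_assets = enumerate_actions(4, 5)   # 5**4 = 625 joint actions
```

Every one of those joint actions is a candidate the agent must evaluate when selecting greedily, which is where the computational cost comes from.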
So far I’ve been experimenting with an environment based on two assets – a 7.5 MWe gas turbine and a 7.5 MWe gas engine. The episode length is set to 336 steps (one week). I run a single naive episode, 30 ε-greedy episodes and a single greedy episode.
Figure 1 below shows the total reward per episode increasing as the agent improves its estimate of the value function and spends less time exploring.
Figure 1 – Epsilon decay and total reward per episode
Figure 2 shows the Q-test and the network training history. Q-test is the average of three random [state, action] pairs evaluated by the value function. It shows how the value function approximation changes over time.
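The Q-test metric itself is simple to compute. A sketch (the `value_function` callable here is a hypothetical stand-in for the trained network's prediction):

```python
def q_test(value_function, state_action_samples):
    """Average predicted value over a fixed set of [state, action]
    samples - a rough probe of how the value function approximation
    drifts as training progresses."""
    predictions = [value_function(s, a) for s, a in state_action_samples]
    return sum(predictions) / len(predictions)
```

Because the sampled pairs stay fixed, any change in the metric reflects a change in the approximation, not in the inputs.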
Figure 2 – Q-Test and the network training history
Figure 3 shows some energy engineering outputs. I’m pretty happy with this operating regime – the model is roughly following both the electricity price and the heat demand, which is expected behaviour.
Figure 3 – Energy engineering outputs for the final greedy run
One interesting decision to make is how often to improve the approximation of the value function. David Silver makes the point that you should make use of the freshest data when training – hence training after each step through the environment. He also makes the point that you don’t need to fully train the network – just train a ‘little bit’.
This makes sense as the distribution of the data (the replay memory) will change as the learner trains its value function. I train on a 64-sample batch of the replay memory. I perform 100 passes over the entire data set (i.e. 100 epochs). Both values can be optimized. It could make sense to train for more epochs in later episodes, as we want to fit the data more closely than in earlier episodes.
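The training step described above can be sketched as follows. This is an illustration, not energy_py's code: `network_fit` is a hypothetical stand-in for something like a Keras `model.fit` call, and the batch size and epoch count are the values mentioned above.

```python
import random

def train_step(network_fit, replay_memory, batch_size=64, epochs=100):
    """Sample a uniform random batch from the replay memory and fit
    the value network on it for a fixed number of epochs."""
    if len(replay_memory) < batch_size:
        return None  # not enough experience collected yet
    batch = random.sample(replay_memory, batch_size)
    return network_fit(batch, epochs)
```

Training a 'little bit' after every environment step, as suggested above, means calling this once per step rather than fully converging the network on stale data.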
Another challenge in energy_py is balancing exploration versus exploitation. The Q_learner algorithm handles this dilemma using ε-greedy action selection. I decay epsilon at a fixed rate – the optimal selection of this parameter is something I’ll take a look at in the future.
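ε-greedy selection and a fixed-rate decay can both be written in a few lines. A minimal sketch (the decay rate and floor here are illustrative values, not energy_py's actual parameters):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest Q value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(episode, start=1.0, floor=0.1, rate=0.9):
    """Exponential decay of epsilon at a fixed rate, with a floor
    so the agent never stops exploring entirely."""
    return max(floor, start * rate ** episode)
```

The decay rate controls how quickly the agent shifts from exploring to exploiting; choosing it well is exactly the open question noted above.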
There are many exciting innovations developed recently in reinforcement learning that I’m keen to add to energy_py. One example is the idea of Prioritized Experience Replay – where the batch is not taken randomly from the replay memory but instead prioritizes some samples over others.
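The core idea of prioritized sampling can be sketched briefly. This follows the proportional scheme from Schaul et al.'s Prioritized Experience Replay paper in spirit only; a real implementation would also use importance-sampling weights and a sum-tree for efficiency.

```python
import random

def prioritized_sample(memory, priorities, batch_size, alpha=0.6):
    """Sample batch indices with probability proportional to
    priority ** alpha, instead of uniformly at random.
    alpha = 0 recovers uniform sampling."""
    weights = [p ** alpha for p in priorities]
    return random.choices(range(len(memory)), weights=weights, k=batch_size)
```

Transitions with larger temporal-difference error get larger priorities, so the agent revisits the experiences it currently explains worst.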
It’s unlikely that I’ll ever catch up to the state of the art in reinforcement learning – what I hope to find is that we don’t need state of the art techniques to get superhuman performance from energy systems!
My reinforcement learning journey
I’m a chemical engineer by training (B.Eng, MSc) and an energy engineer by profession. I’m really excited about the potential of machine learning in the energy industry – in fact that’s what this blog is about!
My understanding of reinforcement learning has come from a variety of resources. I’d like to give credit to all of the wonderful resources I’ve used to understand reinforcement learning.
Mnih et al. (2013) Playing Atari with Deep Reinforcement Learning – to give you an idea of the importance of this paper – Google purchased DeepMind after this paper was published. DeepMind was a company with no revenue, no customers and no product – valued by Google at $500M! This is a landmark paper in reinforcement learning.
Mnih et al. (2015) Human-level control through deep reinforcement learning – an update to the 2013 paper, published in Nature.
I would also like to thank Data Science Retreat. I’m just finishing up the three month immersive program – energy_py is my project for the course. Data Science Retreat has been a fantastic experience and I would highly recommend it. The course is a great way to invest in yourself, develop professionally and meet amazing people.
That’s it from me – thanks for reading!