I have a vision for using machine learning for optimal control of energy systems. If a neural network can learn to play a video game, hopefully it can learn to operate a power plant.
In my previous role at ENGIE I built Mixed Integer Linear Programming (MILP) models to optimize Combined Heat and Power (CHP) plants. Linear Programming is effective for this kind of optimization, but it has limitations.
I'll detail these limitations in a future post – this post is about Reinforcement Learning (RL), a tool that can address some of the limitations inherent in Linear Programming.
In this post I introduce the first stage of my own RL learning process. I’ve built a simple model to charge/discharge a battery using Monte Carlo Q-Learning. The script is available on GitHub.
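A key ingredient of any Q-Learning agent is how it picks actions from its action-value table. A common choice is an epsilon-greedy policy: explore randomly with a small probability, otherwise exploit the current estimates. The helper below is a hypothetical sketch of that idea (the actual script on GitHub may implement action selection differently):

```python
import random

def select_action(av_table, state, actions=(0, 1, 2), epsilon=0.1):
    # Epsilon-greedy action selection (illustrative, not the actual
    # implementation): with probability epsilon pick a random action,
    # otherwise pick the action with the highest estimated value.
    if random.random() < epsilon:
        return random.choice(actions)
    # Unseen (state, action) pairs default to a value of 0.0.
    return max(actions, key=lambda a: av_table.get((state, a), 0.0))
```

With `epsilon=0.1` the agent will still try non-greedy actions 10% of the time, which is what lets Monte Carlo methods keep estimating the value of all actions.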
I made use of two excellent blog posts to develop this; both give a good introduction to RL.
As I don’t have access to a battery system, I’ve built a simple model in Python. The battery model takes the state at time t and the action selected by the agent, and returns a reward and the new state. The reward is the value of electricity discharged (or the cost of electricity charged).
```python
def battery(state, action):
    # the technical model
    # battery can choose to:
    #   discharge 10 MWh (action = 0)
    #   charge 10 MWh    (action = 1)
    #   do nothing       (action = 2)
    charge, SP = state  # charge level (MWh) and current settlement period

    prices = getprices()
    price = prices[SP - 1]  # the price in this settlement period

    if action == 0:  # discharging
        new_charge = max(0, charge - 10)
        charge_delta = charge - new_charge
        reward = charge_delta * price
    elif action == 1:  # charging
        new_charge = min(100, charge + 10)
        charge_delta = charge - new_charge  # negative - charging costs money
        reward = charge_delta * price
    else:  # doing nothing
        charge_delta = 0
        reward = 0

    new_charge = charge - charge_delta
    new_SP = SP + 1
    state = (new_charge, new_SP)
    return state, reward, charge_delta
```
The price of electricity varies throughout the day. The model is not fed this data explicitly – instead it learns through interaction with the environment.
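The `getprices()` helper called by the battery model isn't shown above. As a stand-in, here is a hypothetical version that returns a daily profile of 48 half-hourly settlement period prices, cheap overnight and expensive at the evening peak (the real script loads actual price data, so these numbers are illustrative only):

```python
def getprices():
    # Hypothetical half-hourly price profile for one day:
    # 48 settlement periods, indexed 0-47.
    prices = [30.0] * 48          # baseline price (GBP/MWh) - made up
    for sp in range(0, 12):       # cheap overnight (00:00-06:00)
        prices[sp] = 10.0
    for sp in range(34, 40):      # evening peak (17:00-20:00)
        prices[sp] = 60.0
    return prices
```

A price shape like this is what the agent must discover for itself: charge when the profile is low, discharge when it is high.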
```python
def updateQtable(av_table, av_count, returns):
    # update our Q (aka action-value) table with an incremental mean:
    # each estimate moves towards the observed episode return, weighted
    # by how many times that (state, action) pair has been visited
    for key in returns:
        av_table[key] = av_table[key] + (1 / av_count[key]) * (returns[key] - av_table[key])
    return av_table
```
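To make the incremental-mean update concrete, here is a worked example with made-up numbers: a (state, action) pair has been visited 3 times, its current estimate is 5.0, and the latest episode produced a return of 8.0 for that pair.

```python
av_table = {((50, 10), 0): 5.0}   # (state, action) -> value estimate
av_count = {((50, 10), 0): 3}     # visit counts
returns  = {((50, 10), 0): 8.0}   # returns from the latest episode

for key in returns:
    av_table[key] = av_table[key] + (1 / av_count[key]) * (returns[key] - av_table[key])

# The estimate moves a third of the way from 5.0 towards 8.0:
print(av_table[((50, 10), 0)])  # 6.0
```

As the visit count grows, each new return nudges the estimate less, so the table converges towards the average return for each (state, action) pair.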