Reinforcement Learning for Energy Assets with energy-py
A Python framework for training reinforcement learning agents on energy systems using Gymnasium and Stable Baselines 3.
Created: Apr 2017
Updated: Nov 2025
Tags: Blog, Energy, Machine Learning, Reinforcement Learning
Energy systems need intelligent control - but most rely on simple heuristics that can’t adapt to changing conditions like price volatility or renewable generation uncertainty.
Reinforcement learning offers a solution - agents can learn optimal control policies through trial and error, adapting to complex, dynamic energy systems.
This post introduces energy-py - a Python framework for training reinforcement learning agents on energy environments, starting with electric battery storage.
energy-py is a reinforcement learning framework for energy systems:
Gymnasium integration: Custom energy environments following the Gymnasium API
Stable Baselines 3: Pre-built RL agents (PPO, DQN, A2C) ready to use
Battery environment: Electric battery storage for price arbitrage
Experiment framework: Train and evaluate on separate datasets
Historical data: Real electricity price data for realistic training
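Because the environments follow the standard Gymnasium API, any agent interacts with them through the usual reset/step loop. Below is a minimal sketch of that loop - `Pendulum-v1` is only a stand-in environment here; the energy-py battery environment would be constructed as described in the repository:

```python
import gymnasium as gym

# Any Gymnasium environment exposes the same reset/step loop.
# Pendulum-v1 is a stand-in - swap in the energy-py battery environment,
# constructed as shown in the repository.
env = gym.make("Pendulum-v1")

obs, info = env.reset(seed=42)
episode_return = 0.0
terminated = truncated = False

while not (terminated or truncated):
    # sample a random action, e.g. a charge/discharge rate for a battery
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += float(reward)

print(f"episode return: {episode_return:.2f}")
```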
Why Reinforcement Learning for Energy?
Reinforcement learning enables agents to learn control policies without explicit programming:
Adaptation: Learns from experience and improves over time, unlike static heuristics
Non-linear patterns: Deep neural networks can capture complex relationships
Strong reward signals: Energy offers clear objectives - cost and carbon intensity
Simulation: Energy systems can be simulated, making the large number of samples RL needs cheap to generate
Traditional heuristics fail when conditions change - a rule to prefer biomass over gas breaks down when gas prices drop. Reinforcement learning adapts.
energy-py vs energypylinear
These are two different approaches to the same problem:
energy-py uses reinforcement learning:
Learning-based: Agent learns optimal policy through trial and error
Non-linear: Can model complex, non-linear system dynamics
Sample inefficient: Requires many training episodes
Stochastic: Different training runs produce different policies
energypylinear uses mixed-integer linear programming:
Deterministic: Same inputs always produce same outputs
Fast: Solves efficiently, no training required
Use energypylinear when your system is linear and you need deterministic, fast solutions. Use energy-py when you need to model non-linear dynamics or want agents that adapt over time.
Installation
Requires Python 3.11 or later:
$ git clone https://github.com/ADGEfficiency/energy-py
$ cd energy-py
$ make setup
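After setup, a quick import is a simple way to confirm the package is available. The module name `energypy` is an assumption here - check the repository README for the actual package name:

```python
# sanity check after `make setup`
# NOTE: the module name `energypy` is an assumption - confirm it in the repository
import energypy

print(energypy.__file__)
```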
Example: Battery Arbitrage with Random Prices
Train a PPO agent on a battery with random electricity prices:
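The snippet below is a self-contained sketch of that idea, not energy-py's actual environment or API: a toy battery environment with uniformly random prices, written against the Gymnasium interface and trained with Stable Baselines 3's PPO. The observation, action, and reward design here are assumptions chosen for illustration - the real energy-py battery environment defines its own.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class ToyBatteryEnv(gym.Env):
    """Toy battery arbitrage with random prices.

    Illustrative only - not the energy-py battery environment.
    """

    def __init__(self, episode_length: int = 48, capacity_mwh: float = 4.0, power_mw: float = 2.0):
        self.episode_length = episode_length
        self.capacity_mwh = capacity_mwh
        self.power_mw = power_mw
        # observation = [current price ($/MWh), state of charge (MWh)]
        self.observation_space = spaces.Box(
            low=np.array([0.0, 0.0], dtype=np.float32),
            high=np.array([100.0, capacity_mwh], dtype=np.float32),
        )
        # action = charge (+) / discharge (-) as a fraction of max power
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def _obs(self):
        return np.array([self.price, self.soc], dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.steps = 0
        self.soc = 0.0
        self.price = float(self.np_random.uniform(0.0, 100.0))
        return self._obs(), {}

    def step(self, action):
        # energy moved over a half-hour interval, clipped by the stored charge
        delta = float(np.clip(action[0], -1.0, 1.0)) * self.power_mw * 0.5
        delta = float(np.clip(delta, -self.soc, self.capacity_mwh - self.soc))
        self.soc += delta
        # reward = export revenue minus import cost at the current price
        reward = -delta * self.price
        self.steps += 1
        self.price = float(self.np_random.uniform(0.0, 100.0))
        truncated = self.steps >= self.episode_length
        return self._obs(), reward, False, truncated, {}


if __name__ == "__main__":
    model = PPO("MlpPolicy", ToyBatteryEnv(), verbose=1)
    model.learn(total_timesteps=10_000)
```

Even with i.i.d. random prices there is structure to learn - charge when the price is below its mean, discharge when it is above - so this works as a smoke test before moving to the historical price data that ships with energy-py.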