A simple Python workflow for time series simulations

A common workflow I encounter in my data science work is simulating a process through time. I often want to:

  • simulate a process
  • collect the results at each step
  • output a simple plot of the variables over time

In this post I introduce a simple Python implementation of this workflow, built on defaultdict, pandas and matplotlib.

For those in a rush I’ll first introduce the key components separately. I’ll then show how this simple framework is used to tackle a problem from the machine learning classic by Sutton & Barto, Reinforcement Learning: An Introduction.

I’m in love with defaultdict, and I feel fine

The first component is a defaultdict from the collections module in the Python standard library. The advantage of a defaultdict is flexibility - instead of needing to initialize each key/value pair yourself, you can add keys on the fly. The first time a key is accessed, the defaultdict creates an empty list for it, so you can append straight away.

from collections import defaultdict

#  placeholder values, standing in for whatever the simulation produces
var, other_var = 1.0, 2.0

#  if we use a normal python dictionary, adding a new key requires the following
stats = {}
stats['variable'] = []
stats['variable'].append(var)

#  adding another variable requires two more lines
stats['other_variable'] = []
stats['other_variable'].append(other_var)

#  if we instead use a defaultdict, we can do the five lines above in three lines
stats = defaultdict(list)
stats['variable'].append(var)
stats['other_variable'].append(other_var)
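
To see what the defaultdict is doing for us, here is a minimal sketch contrasting the two dictionary types. The key names and values are placeholders chosen for illustration.

```python
from collections import defaultdict

plain = {}
try:
    plain['variable'].append(1.0)   # the key was never initialized
except KeyError as error:
    missing_key = error.args[0]     # a plain dict refuses the lookup

stats = defaultdict(list)
stats['variable'].append(1.0)       # list() is called for the missing key
stats['variable'].append(2.0)       # later appends reuse the same list

# caveat: merely reading a missing key also creates it
_ = stats['typo']
created_by_reading = 'typo' in stats
```

The caveat at the end is worth keeping in mind: a mistyped key silently becomes an empty list rather than raising an error.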

Having a dictionary full of lists is not particularly useful. But once our defaultdict is full of data, we can easily turn it into a pandas DataFrame using the from_dict method.

We need to make sure that all of the values in our stats dictionary are lists of the same length. This will be the case if we added one value for each variable at each step.
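
If you want to check this before converting, one option is a small helper like the sketch below. The name check_equal_lengths is hypothetical, not part of pandas or the standard library.

```python
from collections import defaultdict


def check_equal_lengths(stats):
    """Return the common length of all lists in stats, or raise ValueError."""
    lengths = {key: len(values) for key, values in stats.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError('unequal lengths: {}'.format(lengths))
    return next(iter(lengths.values()), 0)


stats = defaultdict(list)
for step in range(3):
    # one value per variable per step keeps the lists aligned
    stats['variable'].append(step)
    stats['other_variable'].append(step * 2)

n_steps = check_equal_lengths(stats)

stats['variable'].append(99)  # one extra value breaks the invariant
try:
    check_equal_lengths(stats)
    lengths_ok = True
except ValueError:
    lengths_ok = False
```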

#  from_dict is a classmethod, so there is no need to create an empty DataFrame first
stats = pd.DataFrame.from_dict(stats)

Finally, we can use this dataframe with matplotlib to plot our data.

fig, axes = plt.subplots()
stats.plot(y='variable', ax=axes)
stats.plot(y='other_variable', ax=axes)
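
The collection step can also be wrapped up so it is reusable across simulations. This is only a sketch under my own assumptions: run_simulation and the toy double process are hypothetical names, and the convention that a step function returns a (new state, record) pair is one design choice among many.

```python
from collections import defaultdict


def run_simulation(step_fn, state, n_steps):
    """Call step_fn n_steps times, recording every variable it reports."""
    stats = defaultdict(list)
    for _ in range(n_steps):
        state, record = step_fn(state)
        for name, value in record.items():
            stats[name].append(value)
    return stats


def double(state):
    # toy process: the state doubles at each step
    new_state = state * 2
    return new_state, {'before': state, 'after': new_state}


stats = run_simulation(double, state=1, n_steps=4)
```

Because every step reports the same variables, the lists in the returned dictionary all have the same length and can be handed to pandas as described above.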

Example - updating the value function for a bandit

Now let’s look at this framework in the context of a real problem. The example is a solution to a question posed in Section 2.6 of Sutton & Barto’s Reinforcement Learning: An Introduction. To fully understand the problem I suggest reading the chapter - you can find the 2nd Edition online for free here.

The problem involves incrementally updating the value function for a bandit problem.

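Sutton suggests that an improvement on using a constant step size (say $\alpha$) is to use a step size $\beta_n = \alpha / \bar{o}_n$.

Where we update $\bar{o}_n$ (called omega in the code) by

$\bar{o}_n = \bar{o}_{n-1} + \alpha (1 - \bar{o}_{n-1})$, with $\bar{o}_0 = 0$.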
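
The step-size rule can be checked on its own, without the rest of the simulation. Below is a minimal sketch that uses the same recursion as the program in this post (omega starts at 0, is moved towards 1 by alpha at each step, and the step size is alpha divided by omega). The function name step_sizes is hypothetical.

```python
def step_sizes(alpha, n_steps):
    """Return the step sizes (beta) used for the first n_steps updates."""
    omega = 0.0
    betas = []
    for _ in range(n_steps):
        omega = omega + alpha * (1 - omega)
        betas.append(alpha / omega)
    return betas


# with a small alpha the early step sizes are close to 1, 1/2, 1/3, ...
betas = step_sizes(alpha=0.0001, n_steps=5)

# after many steps omega approaches 1, so the step size approaches alpha
late_beta = step_sizes(alpha=0.5, n_steps=50)[-1]
```

The first step size is always 1, whatever the value of alpha, so the first update replaces the initial estimate with the first reward. In other words the initial value of q does not bias the later estimates.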

The program written for this problem is given below. To get the figure to show, you need to first save the code snippet to bandit.py, then run the program in interactive mode ($ python -i bandit.py).

from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


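alpha = 0.0001  # constant step-size parameter
q = 10          # initial value estimate
omega = 0       # running trace used to scale alpha into beta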

stats = defaultdict(list)

for step in range(50):
    stats['q'].append(q)
    stats['omega'].append(omega)

    omega = omega + alpha * (1 - omega)
    beta = alpha / omega
    stats['beta'].append(beta)

    reward = np.random.normal(loc=5, scale=1)
    stats['reward'].append(reward)

    q += beta * (reward - q)

result = pd.DataFrame.from_dict(stats)

f, a = plt.subplots(nrows=4)

result.plot(y='reward', ax=a[0])
result.plot(y='q', ax=a[1])
result.plot(y='omega', ax=a[2])
result.plot(y='beta', ax=a[3])

#  print q itself: stats['q'][-1] is the estimate before the last update
print('final estimate {}'.format(q))

f.show()

The results of the run are stored in the result DataFrame:

>>> result.head()
           q   omega      beta    reward
0  10.000000  0.0000  1.000000  4.762884
1   4.762884  0.0001  0.500025  4.623668
2   4.693273  0.0002  0.333367  4.734825
3   4.707125  0.0003  0.250038  4.573823
4   4.673794  0.0004  0.200040  3.663734

What pops out at the end is a simple time series plot of how our variables changed over time:

Figure 1 - Results using the hyperparameters in the code snippet above

Thanks for reading!