
11 tips from 11 months of learning Python

I’ve been learning Python for around 11 months. It’s been a wonderful journey! This post is a list of 11 things that I’ve learned along the way.

1 – Setup

The hardest thing about learning Python can be getting it set up in the first place! I recommend using the Anaconda distribution of Python.

Regarding Python 2 vs Python 3 – if you are starting out now it makes sense to learn Python 3. It’s worth knowing what the differences are between the two – once you’ve made some progress with Python 3.

The installation process is pretty straightforward – you can check that Anaconda installed correctly by typing ‘python’ into Terminal or Command Prompt. You should get something like the following:

[Screenshot: python_terminal]

2 – pip

pip is a way to manage packages in Python. pip is run from a Terminal. Below are the pip commands I use the most.

To install a package (note that the -U argument tells pip to upgrade the package to the newest available version if it is already installed)

pip install pandas -U

To remove a package

pip uninstall pandas

To print all installed packages

pip freeze

3 – Virtual environments

Virtual environments are best practice for managing Python on your machine. Ideally you should have one virtual environment for each project you work on.

This gives you the ability to work with different versions of packages in different projects and to understand the package dependencies of your project.

There are two main packages for managing virtual environments. Personally I use conda (as I always use the Anaconda distribution of Python).

One cool trick is that once you activate your environment, you can start programs such as Atom or Jupyter and they will use your environment.

For example if you use a terminal plugin within Atom, starting Atom this way will mean the terminal uses your environment Python – not your system Python.
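A quick way to confirm which Python you are actually running is to ask the interpreter itself. This is only a minimal sketch using the standard library; the helper name describe_interpreter is my own and not part of conda or any other tool.

```python
import sys


def describe_interpreter():
    """Return the path of the running interpreter and its environment prefix."""
    return {"executable": sys.executable, "prefix": sys.prefix}


info = describe_interpreter()
# Inside an activated environment both paths should point into that
# environment's folder rather than the system-wide Python install.
print(info["executable"])
print(info["prefix"])
```

Running this from a terminal inside your editor should print paths inside your environment if the editor was started after activating it.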

4 – Running Python scripts interactively

Running a script interactively can be very useful when you are learning Python – both for debugging and for getting an understanding of what is going on!

cd folder_where_script_lives
python -i script_name.py

After the script has run you will be left with an interactive console. If Python encounters an error in the script then you will still end up in interactive mode, with access to the variables that were defined before the script broke.

5 – enumerate

Often you want to loop over a list and keep information about the index of the current item in the list.

This can naively be done by

idx = 0
for item in a_list:
    other_list[idx] = item
    idx += 1

Python offers a cleaner way to implement this

for idx, item in enumerate(a_list):
    other_list[idx] = item

We can also start the index at a value other than zero

for idx, item in enumerate(a_list, 2):
    other_list[idx] = item
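To see what enumerate actually produces, it can help to turn it into a list. The small example below uses a made-up list of letters; note that the start value only changes the counter, not which items are visited.

```python
a_list = ["a", "b", "c"]

# enumerate yields (index, item) pairs, counting from zero by default
pairs_default = list(enumerate(a_list))

# start=2 shifts the counter but still visits every item in a_list
pairs_from_two = list(enumerate(a_list, start=2))

print(pairs_default)   # [(0, 'a'), (1, 'b'), (2, 'c')]
print(pairs_from_two)  # [(2, 'a'), (3, 'b'), (4, 'c')]
```

This matters if you index into another list with the shifted counter, because that list needs to be long enough to hold the larger indexes.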

6 – zip

Often we want to iterate over two lists together. A naive approach would be to

for idx, item_1 in enumerate(first_list):
    item_2 = second_list[idx]
    result = item_1 * item_2

A better approach is to make use of zip – a Python built-in function

for item_1, item_2 in zip(first_list, second_list):
    result = item_1 * item_2

We can even combine zip with enumerate

for idx, (item_1, item_2) in enumerate(zip(first_list, second_list)):
    other_list[idx] = item_1 * item_2
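One thing worth knowing is that zip stops at the end of the shorter list. Here is a short runnable sketch with made-up numbers that shows this, along with wrapping zip in enumerate to get an index for each pair.

```python
first_list = [1, 2, 3]
second_list = [10, 20, 30, 40]  # one item longer than first_list

# zip stops as soon as the shorter list runs out, so 40 is never used
products = []
for item_1, item_2 in zip(first_list, second_list):
    products.append(item_1 * item_2)

# enumerate(zip(...)) gives an index alongside each pair
other_list = [None] * len(first_list)
for idx, (item_1, item_2) in enumerate(zip(first_list, second_list)):
    other_list[idx] = item_1 * item_2

print(products)    # [10, 40, 90]
print(other_list)  # [10, 40, 90]
```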

7 – List comprehensions

List comprehensions are baffling at first. They offer a much cleaner way to implement list creation.

A naive approach to making a list would be

new_list = []
for item in old_list:
    new_list.append(2 * item)

List comprehensions offer a way to do this in a single line

new_list = [item * 2 for item in old_list]

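You can also create sets or dictionaries using similar notation. Wrapping the same expression in parentheses gives a generator rather than a tuple, which can be passed to tuple() if a tuple is needed.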
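The same pattern carries over to other containers. The example below uses a made-up old_list and shows a list, set and dictionary comprehension, a generator passed to tuple(), and a comprehension with a filter.

```python
old_list = [1, 2, 2, 3]

doubled = [item * 2 for item in old_list]         # list comprehension
unique_doubled = {item * 2 for item in old_list}  # set comprehension drops duplicates
lookup = {item: item * 2 for item in old_list}    # dict comprehension
as_tuple = tuple(item * 2 for item in old_list)   # generator expression fed to tuple()

# an if clause at the end filters the items that are kept
evens = [item for item in old_list if item % 2 == 0]

print(doubled)         # [2, 4, 4, 6]
print(unique_doubled)  # {2, 4, 6}
print(lookup)          # {1: 2, 2: 4, 3: 6}
print(as_tuple)        # (2, 4, 4, 6)
print(evens)           # [2, 2]
```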

8 – Default values for functions

Often we create a function with inputs that only need to be changed rarely. We can set a default value for a function by

def my_function(input_1, input_2=10):
    return input_1 * input_2

We can run this function using

result = my_function(input_1=5)

Which will return result = 50.

If we wanted to change the value of the second input we could

result_2 = my_function(input_1=5, input_2=5)

Which will return result_2 = 25.

9 – git

Git is a fantastic tool that I highly recommend using. As with Python I’m no expert! A full write up of how to use git is outside the scope of this article – these commands are useful to get started. Note that all of these commands should be entered in a Terminal that is inside the git repo.

To check the status of the repo

git status

To add files to a commit and push to your master branch

git add file_name
git commit -m 'commit message'
git push origin master

Note that you can do multiple commits in a single push.

We can also add multiple files at once. To add all files that are already tracked (i.e. part of the repo)

git add -u

To add all files (tracked & untracked)

git add -A

Another useful command is

git reset HEAD~

This command undoes your most recent local commit while leaving the changes in your working directory. Sometimes you will add files to a commit that you didn’t mean to – running it repeatedly undoes commits one by one.

10 – Text editors

There is a range of text editors you can use to write Python:
– Atom
– vi
– vim
– Spyder (comes with Anaconda)
– Sublime Text
– PyCharm
– Notepad++

All have their positives and negatives. When you are starting out I recommend using whatever feels the most comfortable.

Personally I started out using Notepad++, then went to Spyder, then to Atom and vim.

It’s important to not focus too much on what editor you are using – more important to just write code.

11 – Books & resources

I can recommend the following resources for Python:
Python Reddit
The Hitchhiker’s Guide to Python

Python 3 Object Oriented Programming
Effective Python: 59 Specific Ways to Write Better Python
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Python Machine Learning
Automate the Boring Stuff with Python

I can also recommend the following for getting an understanding of git:
Github For The Rest Of Us
Understanding Git Conceptually

Thanks for reading!

Imbalance Price Visualization

This post is the third in a series applying machine learning techniques to an energy problem. The goal of this series is to develop models to forecast the UK Imbalance Price.

Part One – What is the UK Imbalance Price?
Part Two – Elexon API Web Scraping using Python

In the first post we introduced what the UK Imbalance Price is. In the second we showed how to gather UK grid data from the Elexon API using Python.

This third post will visualize the data we have gathered.  Spending the time to explore the data is an important step in model building.  All of the charts in this post were created in Python using pandas and matplotlib.

Figure 1 – The UK Imbalance Price January to November 2016

Figure 1 shows the volatile nature of the Imbalance Price. Major positive spikes are observed throughout the year, with even more significant spikes occurring in November. UK electricity demand is highest in winter, so higher demand is likely leading to National Grid having to use more expensive plant to balance the system.

Figure 2 – Monthly summary statistics 

Figure 2 shows how extreme the month of November was – large peaks and also a very high standard deviation.  It will be interesting to see how December compares in terms of volatility.
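For anyone wanting to reproduce this kind of monthly summary, here is a minimal sketch. It is not the code behind Figure 2 (that used pandas); it uses only the standard library so it runs anywhere, and the records are made-up numbers rather than real Imbalance Prices.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

# Hypothetical (timestamp, price in £/MWh) records - NOT real Elexon data
records = [
    (datetime(2016, 10, 1, 0, 0), 30.0),
    (datetime(2016, 10, 1, 0, 30), 34.0),
    (datetime(2016, 11, 1, 0, 0), 40.0),
    (datetime(2016, 11, 1, 0, 30), 160.0),
]


def monthly_summary(records):
    """Group prices by (year, month) and compute mean, std and max."""
    by_month = defaultdict(list)
    for timestamp, price in records:
        by_month[(timestamp.year, timestamp.month)].append(price)
    # stdev is the sample standard deviation and needs at least two values
    return {
        month: {"mean": mean(prices), "std": stdev(prices), "max": max(prices)}
        for month, prices in by_month.items()
    }


summary = monthly_summary(records)
for month, stats in sorted(summary.items()):
    print(month, stats)
```

In this toy data the single spike in November pushes both the maximum and the standard deviation well above October, which is the same pattern Figure 2 shows on the real data.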

Figure 3 – Correlogram

The correlogram shows the autocorrelation function computed at different lags of the time series.  This can be used to identify any seasonality in the time series.  Clearly we have some seasonality present at around 48 lags – equivalent to one day in this half hourly time series.

It makes sense that the Imbalance Price would likely be similar to the day before – many of the reasons for Imbalance (such as forecasting errors) are likely to occur multiple times. Interestingly the ACF does not peak at a lag of 336 (corresponding to one week).
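To make the idea of a lag concrete, here is a small sketch of the sample autocorrelation written in plain Python. It is an illustration rather than the code used for Figure 3, and the series is a synthetic sine wave with a 48-period cycle standing in for a half hourly series with a daily pattern.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of series at the given lag."""
    n = len(series)
    series_mean = sum(series) / n
    # total variation around the mean (the lag-0 term)
    denominator = sum((x - series_mean) ** 2 for x in series)
    # co-variation between each point and the point `lag` steps later
    numerator = sum(
        (series[t] - series_mean) * (series[t + lag] - series_mean)
        for t in range(n - lag)
    )
    return numerator / denominator


# Synthetic half hourly series with a daily (48-period) cycle - NOT real price data
series = [math.sin(2 * math.pi * t / 48) for t in range(48 * 14)]

acf_day = autocorrelation(series, 48)       # one day later: strongly positive
acf_half_day = autocorrelation(series, 24)  # half a day later: strongly negative

print(round(acf_day, 2), round(acf_half_day, 2))
```

A peak at a lag of 48 in a correlogram is this same calculation picking up the daily cycle.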

Figure 4 – Box plot (note the y-axis maximum was limited at 200 £/MWh)

The box plot clearly shows how much of the data is classed as outliers (i.e. being outside the inner & outer fences).  Also of interest is how close the median and first quartile are in most months!
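As a rough illustration of how points end up classed as outliers, the sketch below computes the usual box plot fences: inner fences 1.5 times the interquartile range beyond the quartiles, outer fences 3 times. The prices are made up, and the quartile method here is an assumption, so the exact fence values may differ slightly from those in a plotting library.

```python
from statistics import quantiles


def tukey_fences(values):
    """Return the inner and outer fences based on the interquartile range."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


# Hypothetical prices in £/MWh with one spike - NOT real Imbalance Price data
prices = [28, 30, 31, 32, 33, 35, 36, 38, 40, 150]

fences = tukey_fences(prices)
lower, upper = fences["inner"]
outliers = [price for price in prices if price < lower or price > upper]

print(fences)
print(outliers)  # [150]
```

With a spiky series like the Imbalance Price, the fences sit close to the bulk of the data, so a large share of the spikes falls outside them.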

In the next post we will begin to forecast the Imbalance Price using a multi-layer perceptron in scikit-learn.

Elexon API Web Scraping using Python

NOTE – the code in this post is now superseded – please see my update post – or just go straight to my GitHub repository for this project.

This post is the second in a series applying machine learning techniques to an energy problem.  The goal of this series is to develop models to forecast the UK Imbalance Price. 

Part One – What is the UK Imbalance Price?

The first post in this series gave an introduction to what the UK Imbalance Price is.

This post will show how to scrape UK grid data from Elexon using their API. Elexon make UK grid and electricity market data available to utilities and traders.

Data available includes technical information such as weather or generation. Market data like prices and volumes is also available. Full details of the available data are given in the Elexon API guide.

Accessing data requires an API key, available by setting up a free Elexon account.  The API is accessed by passing a URL with the API key and report parameters.  The API will return either an XML or a CSV document.

Features of the Python code

The Python code below is a modified version of code supplied by the excellent Energy Analyst website.  Some features of the code are:

  • The script iterates through a dictionary of reports. I have set up the script to iterate through two different reports – B1770 for Imbalance Price data and B1780 for Imbalance Volume data.
  • The script then iterates through a pandas date_range object of days.  This object is created by setting the startdate and the number of days.
  • Two functions written by Energy Analyst are used:
    • BMRS_GetXML – returns an XML object for a given set of keyword arguments & API key.
    • BMRS_Dataframe – creates a pandas DataFrame from the XML object.
  • Data for each iteration is indexed using UTC time.  Two columns are added to the data_DF with the UTC and UK time stamps.
  • The results of each iteration are saved in an SQL database.  Each report is saved in its own table named with the report name.
  • The results of the entire script run are saved in a CSV (named output.csv).
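The looping structure described above can be sketched roughly as follows. This is not the original script: it makes no network calls, BASE_URL is a placeholder rather than the real Elexon endpoint, the query parameter names (APIKey, SettlementDate, ServiceType) are assumptions to be checked against the Elexon API guide, and the helpers date_range and build_requests are hypothetical names of my own.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://example.invalid/BMRS"  # placeholder, NOT the real endpoint

# Dictionary of reports to iterate through (report code -> description)
REPORTS = {"B1770": "Imbalance Price", "B1780": "Imbalance Volume"}


def date_range(start, days):
    """Return a list of consecutive dates beginning at start."""
    return [start + timedelta(days=offset) for offset in range(days)]


def build_requests(api_key, start, days):
    """Build one (report, day, url) tuple per report and settlement date."""
    planned = []
    for report in REPORTS:
        for day in date_range(start, days):
            # Parameter names below are assumed, not taken from the API guide
            params = {
                "APIKey": api_key,
                "SettlementDate": day.isoformat(),
                "ServiceType": "xml",
            }
            planned.append((report, day, f"{BASE_URL}/{report}?{urlencode(params)}"))
    return planned


planned = build_requests("YOUR_API_KEY", date(2016, 1, 1), 3)
for report, day, url in planned:
    print(report, day, url)
```

Each planned URL would then be fetched, parsed into a DataFrame and written to the table for its report.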

The Python code

Next steps
The next post in this series will visualize and analyze the Imbalance Price data for 2016.