Exercise 2a Time series#
Aim: The purpose of this lab is to get you started with elementary time series analysis: fitting a trend and a polynomial to a time series.
Learning outcomes: At the end of this lab, you will be able to:
Calculate a linear trend given a time series, and determine its slope and y-offset.
Calculate a polynomial fit to the time series, and determine the acceleration.
Gain basic familiarity with Python
xarray
data arrays, and familiarity with indices and NaNs.
Data: You will work with a time series of monthly sea surface temperature.
Directions: Answer the questions, create an *.ipynb
and two figures which you’ll add to the shared_figures/
folder.
See also
Xarray docs: https://docs.xarray.dev/en/latest/user-guide/time-series.html
Create a notebook & load the data#
Create an
*.ipynb
containing the commands for this assignment, or copy this file.File naming convention
Name your python notebook something useful
ex<X>-<Lastname>-<slug>-seaocn.ipynb
where you replace<X>
with the exercise number and<slug>
with the short slug to name the topic, and<Lastname>
with your last name.Figures should be named something like
ex<X>fig<Y>-<Lastname>-<slug>-seaocn.png
where you replace<X>
with the exercise number,<Y>
with the figure number, and<Lastname>
with your last name.Import necessary packages.
For example,
matplotlib
andpandas
andnumpy
andxarray
. You may also needIf you are missing any of these packages, please refer to Resources: Python.
# Importing required packages here
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import pandas as pd
Download some data#
Go to the KNMI climate explorer and download a time series of monthly sea surface temperature since 1850.
Click “Monthly Observations” on the right
Choose temperature from “Berkeley 1 deg”. This is a gridded and filled sea surface temperature product.
Select a latitude and longitude in the ocean. Enter a range into the latitude which is 1 degree wide. Note that for the southern hemisphere, you should use negative values. For degrees west you should use “negative degrees east”.
After entering values in the 2 latititude and 2 longitude boxes, click the choice “subset of the field” after make.
Click the button “Make time series”.
Scroll to the bottom of the page, and find the line “If you really want to get it here, is available as a netcdf file. Click the link with “netcdf file” to download your data.
Put the data into the
data/
folder on your computer. This should be accessible from the location of your python exercises.
# Your code here
file_path = '../data/'
filename = 'TAVG_landocean_LatLong1_-51--50E_30-31N.nc'
# To concatenate two strings in python, you use the '+' symbol
fname = file_path + filename
# Try a `print` on your new variable to see what it is
print(fname)
../data/TAVG_landocean_LatLong1_-51--50E_30-31N.nc
Load the data#
The data are in a netCDF file. As before, we will use xarray
to handle the dataset. Make sure you have included xarray
in your cell above importing packages.
import xarray as xr
# Loading the data
# --> Uncomment this line. Note that it will throw an error. After troubleshooting the error, you can comment it back out again.
#sst1 = xr.open_dataset(fname)
Errors with time format#
One of the things that can cause a headache in python is time formatting, and we see this here.
Depending on your installation, xarray
cannot decode the time format which, from the error message, we see is in units of “months since 1850-01-01”. The error message itself recommends a (temporary) fix by using decode_times=False
.
Try loading the data using this hint.
Note the format of the time - it’s now integers counting from 0 to something like 2077, which doesn’t look like “time”
Create a time vector representing the time (you may need to refer back to your error message above to see what the time should be).
Some examples to help you get started.
Replace your coordinate in the
xarray
structure with your new time object.
# Your code here
Fig 1. Plot the time series#
Plot your sea surface temperature against time.
A simple tutorial from matplotlib
for making a plot: https://matplotlib.org/stable/tutorials/pyplot.html#sphx-glr-tutorials-pyplot-py
Replot the time series. Try adding a grid using plt.grid()
. Now try adding only horizontal grid lines. Try reformating the x-axis labels.
# Making a time series plot
# --> The following line will throw an error. Try to troubleshoot the error.
# plt.plot(sst1.time, sst1.temperature, linewidth=2.0)
The above code should generate an error for you, i.e. python says it can’t compute what you’ve asked it to for some reason. The last line of the error may look something like:
ValueError: x and y can be no greater than 2D, but have shapes (2078,) and (2078, 1, 1)
Errors#
Try reading the error message you got from the previous line. It is trying to tell you that there is a problem with plotting your data variable temperature
because it is 3-dimensional (in this case, its “shape” is (2064,1,1) meaning that the number of rows is 2064, the number of columns is 1 and the number of elements in the 3rd dimension is also 1.
As humans, we might see this as 1-dimensional, but python has a problem with it.
Fixing dimensionality#
There are two ways to deal with this problem.
Verify the size of your data that is causing the problem.
Since the time series is really something like 2078 long, and the 2nd and 3rd dimensions are not needed, we can do this by using the indexing
tanom = sst1.temperature[:,0,0]
This says we want all the data from the variable “temperature” in the first dimension (“all” is given by the colon “:” by itself, and the first element in the second dimension (given by 0) and the first element in the third dimension (given by the second zero). Note: in python, indexing (referring to a location in a vector or matrix) starts counting at zero, so the first element in a sequence has an index of 0, the second element has an index of 1, and so on.
An alternative option using
xarray
functionality: In the case of our sea surface tempearture dataset, since we’ve chosen a 1x1 degree region, the latitude and longitude dimensions are singular (length 1), and the only dimension we want to retain (the “time” dimension) is length 2064, we can use a funny little function forxarray
called:
xr.DataArray.squeeze()
This will take our dataset and squeeze out (remove) the extra dimensions.
Try both of these options, and then use the print
commands to see what resulted.
# Your code here
# Try option 1 below
print('---- Option 1: ----')
# Try option 2 below
print('---- Option 2: ----')
---- Option 1: ----
---- Option 2: ----
Re-try your plotting#
Note that if this still doesn’t work, it is possible that your data are not actually 1-dimensional. You can see this if you check the “shape” of your data. When choosing a dataset to download from KNMI climate explorer, verify that the longitude and latitude ranges were each only 1 degree wide. Check the size of your data variable again:
print(sst1.temperature.shape)
The first number in brackets can be \(>1\) but the other two should be \(=1\). If the second or third dimensions are \(>1\), then this is probably where you asked KNMI climate explorer for SSTs in a region that was larger than 1 degree latitude or 1 degree longitude. Go back and redownload your data.
# Your code here
Thought exercise: Try changing the axes limits to zoom in on different parts of the data. Can you describe the data overall? Various periodicities, tendencies or other wobbles?
Explore the data#
What is the mean of the data?
See also
What is the first temperature? the minumum temperature?
What is the last temperature value? the maximum temperature?
How many years long is the record?
Some sample code to get you started (see also: https://numpy.org/doc/stable/reference/arrays.datetime.html)
This is an example for the time in days, but you need to do this in years, so take a look at the documentation to figure out how to do this in years instead.
How quickly is SST rising? Calculate this using your first, last and time span of the data.
What is the average SST in the last year of the dataset? In the first year? Re-estimate how quickly SST is rising using these annually averaged values.
Experiment with printing your answer to the screen.
or
Recall what the ‘+’ does between two strings. (We saw this above when we were working with the file path and name.)
Now note that your numbers previously calculated had a lot of significant digits. Suppose the temperature dataset is only accurate to 0.01 deg C. Use the
.round()
function to round everything to 2 digits past the decimal place.
Note
When should you bundle the whole string of commands together, and when should you break it into steps? There are a few considerations here.
Will I use a computed value again? If so, then you probably want to save it as something (as in the example of computing tmean and then printing its values.
Which option is better for readability? If the computation is long and complex, it may be hard to keep track of what is happening where and when, which can often lead to errors. In this case, separating it out is also useful.
If you only need it once, and the calculation is fairly straightforward (like here, the calculation of the mean or median, and only for display purposes) then your code will overall be more compact if you put it in the same line as the print command.
Plot a histogram of the data. Use
matplotlib.pyplot.hist
with 100 bins. Do the data look normally distributed? I.e., is the histogram roughly a gaussian?
# Your code: Mean and median
# Your code: Calculating the time difference from end to beginning
# Your code: Averaing first year / last year
# Make a histogram of the first year's sst
Fig 2. Plot an annual cycle#
Here we will use some useful functions in xarray for handling time series data. See xarray: Time series data.
Calculate monthly averages. As an example, something like
Plot the seasonal cycle, calculated as monthly averages, in a new figure. Add appropriate annotations.
Next step: Estimate a standard error (SE) for the mean of each monthly value, where the standard error is
\[SE = \frac{\sigma}{\sqrt{N}}\]with \(\sigma\) the standard deviation and \(N\) the number of values. A quick-and-dirty estimate would have, for January, that \(\sigma\) is the standard deviation of January values, and \(N\) the number of years of measurements. This is fine for a first estimate. However, notice from your earlier time series plot that there are some gaps in the data record , especially prior to 1920. So this means that the number of years is not equal to 2063 (length of the time series) divided by 12 months. Recall that the standard error calculation assumes a few things: that your measurements are independent (this is not quite true here) and that your data are normally distributed. Check your histogram above to see whether the data are normal.
Then try plotting the annual cycle with the annual cycle + standard error and the annual cycle - standard error for each month. Now you will have 3 lines on your plot.
Instead of plotting these as lines, try shading.
See the
matplotlib.pyplot.fill_between
command. Experiment with the optional argumentalpha
. Try givingalpha=.8
oralpha=.3
.
Export this figure as figure 2, using the file naming convention. Make sure your figure has appropriate annotations.
# Calculate an annual cycle
Fitting a trend - updating Fig. 1#
Find a line of best fit to the original data.
Calculate an annually-averaged time series. This is similar to what you did above for averaging the months, but instead you’ll average across years. Use the
.median()
instead of.mean()
.Plot your new time series
Fit a first degree polynomial (linear regression). Recall that a 1st order polynomial has the form
\[Ax + B\]where \(A\) is the slope. Read how to use the function xarray.Dataset.polyfit. You should use the time dimension as your coordinate along which to fit the 1st degree polynomial (a line).
What is the slope of the line? What are the units on the slope?
Convert this to units of degrees/year.
Question for thought: How does this compare to the slope you calculated above when looking at the maximum and minimum of the time series, and time series length?
Using your linear regression, what temperature would you predict in 1960? Use the function xarray.polyval to compute this.
What was the observed value in 1960? Use
xarray
functions to extract the year of 1960 and calculate an average over that year.Add the line of best fit to your original plot of the full dataset (or replot here).
Now fit a polynomial of degree 2 (a quadratic). A second degree polynomial has the form
\[Ax^2 + Bx + C\]where \(A\) is the acceleration. What are the units on \(A\)?
Add this polynomial as a third curve on your plot.
Clean up the plot. This means, add a legend, axes labels, fix the time axis annotations if needed, and fix the fontsize to be legible even if the figure is small.
Note
In future exercises, you will be expected to clean up all plots without reminders. All plots should have axes labelled (with units!), legible font size (bigger than you think) and appropriate axis ranges. Legends should be used whenever more than one quantity is plotted.
Save the figure as a new version of your figure 1, overwriting the previous version.
# Your code here
Look at the residual -#
Come back to this python notebook after the lab exercise-filter
.
Subtract the values for the trend or polynomial that you fit (whichever looked better by eye) from your original data. This is the residual.
Plot the residual.
After the second lab exercise-filter.ipynb
you can come back here and apply a filter (whatever seems best) to the residual.
Describe the variability that you see.
Make a figure.
# Your code here