Exercise 2b Filtering

Exercise 2b Filtering#

Aim: The purpose of this lab is to get you started thinking about frequency content of time series

Learning outcomes: At the end of this lab, you will be able to:

Filter data using windows
Check that the window used has “good” spectral properties (a good frequency response)

Data: You will work with a tide gauge time series.

Directions: Answer the questions, create an *.ipynb and three figures. Pick one of these figures to add to the shared_figures/ folder.

Create a notebook & load the data#

Create an *.ipynb containing the commands for this assignment, or copy this file.

Name your python notebook something useful ex<X>-<slug>-<Lastname>-seaocn.ipynb where you replace <X> with the exercise number and <slug> with the short slug to name the topic, and <Lastname> with your last name.

Figures should be named something like `ex<X>fig<Y>-<Lastname>-<slug>-seaocn.png` where you replace `<X>` with the exercise number, `<Y>` with the figure number, and `<Lastname>` with your last name.
```

Import necessary packages.

For example, matplotlib and pandas and numpy and xarray. You may also need

If you are missing any of these packages, please refer to Resources: Python.

# Importing required packages here
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import pandas as pd
import datetime

# We are going to need some extra packages here
import scipy.signal as sg  #Package for signal analysis
from scipy.fft import fft, fftshift 
plt.rcParams["figure.figsize"] = (10,6.5) #set default figure size
plt.rcParams['font.size'] = 14
plt.rcParams['font.family'] = 'serif'

Download some data + quick check what’s there#

In the time series exercise, you did some quick and easy filtering by making annual averages. Now we’ll try something a little more complicated using tide gauge data.

Pick a tide gauge from the east coast of the USA: https://www.bodc.ac.uk/data/hosted_data_systems/sea_level/international/north_atlantic/#Duck and download the data in *.nc format.
Put the file in your data/ directory.
Load the data using xarray into something called td.
Squeeze the data in xarray to remove extra dimensions of your data variable.
Print the data to see what is in there.
Make a simple plot of your data variable sea_level_1 against time.

# Update the code below for your data
file_path = '../data/'
filename = 'w9006.nc'
fname = file_path + filename

# Load the data
td = xr.open_dataset(fname)
td = td.squeeze()
#print(td)

# Quick plot data
#plt.plot(td.time,td.sea_level_1)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/file_manager.py:209, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    208 try:
--> 209     file = self._cache[self._key]
    210 except KeyError:

File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/lru_cache.py:55, in LRUCache.__getitem__(self, key)
     54 with self._lock:
---> 55     value = self._cache[key]
     56     self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/Users/eddifying/Library/Mobile Documents/com~apple~CloudDocs/Work/teaching/seaocn-teaching/2024.seaocn-uhh/coursebook-seaocn-2024/coursebook_seaocn/seaocn-cb/data/w9006.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), '09542e29-f5d1-45e2-810a-d487ede7cfd9']

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
Cell In[2], line 7
      4 fname = file_path + filename
      6 # Load the data
----> 7 td = xr.open_dataset(fname)
      8 td = td.squeeze()
      9 #print(td)
     10 
     11 # Quick plot data
     12 #plt.plot(td.time,td.sea_level_1)

File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/api.py:541, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    529 decoders = _resolve_decoders_kwargs(
    530     decode_cf,
    531     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    537     decode_coords=decode_coords,
    538 )
    540 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 541 backend_ds = backend.open_dataset(
    542     filename_or_obj,
    543     drop_variables=drop_variables,
    544     **decoders,
    545     **kwargs,
    546 )
    547 ds = _dataset_from_backend_dataset(
    548     backend_ds,
    549     filename_or_obj,
   (...)
    557     **kwargs,
    558 )
    559 return ds

File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:578, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    557 def open_dataset(
    558     self,
    559     filename_or_obj,
   (...)
    574     autoclose=False,
    575 ):
    577     filename_or_obj = _normalize_path(filename_or_obj)
--> 578     store = NetCDF4DataStore.open(
    579         filename_or_obj,
    580         mode=mode,
    581         format=format,
    582         group=group,
    583         clobber=clobber,
    584         diskless=diskless,
    585         persist=persist,
    586         lock=lock,
    587         autoclose=autoclose,
    588     )
    590     store_entrypoint = StoreBackendEntrypoint()
    591     with close_on_error(store):

File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:382, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    376 kwargs = dict(
    377     clobber=clobber, diskless=diskless, persist=persist, format=format
    378 )
    379 manager = CachingFileManager(
    380     netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    381 )
--> 382 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:329, in NetCDF4DataStore.__init__(self, manager, group, mode, lock, autoclose)
    327 self._group = group
    328 self._mode = mode
--> 329 self.format = self.ds.data_model
    330 self._filename = self.ds.filepath()
    331 self.is_remote = is_remote_uri(self._filename)

File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:391, in NetCDF4DataStore.ds(self)
    389 @property
    390 def ds(self):
--> 391     return self._acquire()

File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:385, in NetCDF4DataStore._acquire(self, needs_lock)
    384 def _acquire(self, needs_lock=True):
--> 385     with self._manager.acquire_context(needs_lock) as root:
    386         ds = _nc4_require_group(root, self._group, self._mode)
    387     return ds

File ~/micromamba/envs/seaocn_env/lib/python3.8/contextlib.py:113, in _GeneratorContextManager.__enter__(self)
    111 del self.args, self.kwds, self.func
    112 try:
--> 113     return next(self.gen)
    114 except StopIteration:
    115     raise RuntimeError("generator didn't yield") from None

File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/file_manager.py:197, in CachingFileManager.acquire_context(self, needs_lock)
    194 @contextlib.contextmanager
    195 def acquire_context(self, needs_lock=True):
    196     """Context manager for acquiring a file."""
--> 197     file, cached = self._acquire_with_cache_info(needs_lock)
    198     try:
    199         yield file

File ~/micromamba/envs/seaocn_env/lib/python3.8/site-packages/xarray/backends/file_manager.py:215, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    213     kwargs = kwargs.copy()
    214     kwargs["mode"] = self._mode
--> 215 file = self._opener(*self._args, **kwargs)
    216 if self._mode == "w":
    217     # ensure file doesn't get overridden when opened again
    218     self._mode = "a"

File src/netCDF4/_netCDF4.pyx:2464, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:2027, in netCDF4._netCDF4._ensure_nc_success()

FileNotFoundError: [Errno 2] No such file or directory: '/Users/eddifying/Library/Mobile Documents/com~apple~CloudDocs/Work/teaching/seaocn-teaching/2024.seaocn-uhh/coursebook-seaocn-2024/coursebook_seaocn/seaocn-cb/data/w9006.nc'

Filter the data#

Select a one year time period to work with#

Pick a year that looks filled in. I.e., straight lines between big blocks of color are probably because there’s not data.
Take a look at the attributes to try to verify that your year has no gaps

See also

https://docs.xarray.dev/en/stable/user-guide/indexing.html
Subselect the data using the .loc command. For example, if you would like to extract the month of January in 1995, you would do:

# Your code here

Fig. 1 Calculate a moving average (Quick-and-dirty filtering)#

You will use xarray.DataArray.rolling()

From your one year period, plot a two week period. As an example, the code below plots data from a 1-year extract, and then adjusts the x-axis limits to be January 1-14, 1995.
Calculate a rolling average (moving average) over 5 hours from your dataset, and add this to your plot in red. Note that your data are assumed to be hourly.
Recalculate this using the option center=True, and add this to your plot in green.
Plot your figure and name it fig1 using the file naming convention.

What is the difference between using center=False or True?

# Your code here

Experiment with windows#

A better filter choice is a ‘windowed’ moving average. It smoothes the data around the edges of the window by weighting the edges less and the middle of the window higher.

Run the code below and take a look at the different windows (weighting) that can be used. Here we are using a window length of 100.

Create a window. When we are using a default rolling average, it is a rectangular or “boxcar” window (where “boxcar” refers to a train car that looks like a container on a container ship). Note that when importing packages above, we imported scipy.signal as sg.

Personal favourite: Tukey. When you use it on a dataset, it looks a lot like a boxcar, but less jaggedy, and the spectral properties (frequency response) are good. You can take a quick look at the “frequency response” for different windows by clicking the function in the scipy documentation. If the side lobes drop away quickly and smoothly (to low decibels) from the center peak, this is better.

Fig. 2 Calculate moving averages with different filters#

This is a tide gauge record, so let’s think about removing the tides. Most tides that we normally think about are diurnal or semi-diurnal, so we’ll start by smoothing the data with a 1-day filter.

Recalculate your moving averages using two types of filters. One of these should be a boxcar, and the other is your choice.
Plot the two filtered time series and the original data on a new set of axes, and zoom in to a 2-week period for better visibility.
Export this figure as fig2.

However! This won’t work on xarray since their rolling doesn’t appear to support windows. So we will need to convert the data to a pandas dataframe first.

```{python}
sla_df = td_1995.sea_level_1.to_dataframe()
type(sla_df)
print(sla_df)
```

Then see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rolling.html for how to use the optional argument win_type.

Pandas rolling input doesn’t take time=24 but instead just the number 24 as the specified window length. Since our dataset is hourly, then a window of length 24 is a 24-hour window. If our data were monthly, how long would the filter be?

After plotting your time series, note which is bumpier (has higher standard deviation), the one with the boxcar filter or your other window? Think about why this might be.

# Your code here

Fitting a harmonic (optional)#

More typically, to remove tides, the tidal constituents can be removed using a harmonic fit, followed by e.g. a 40-hour filter. For a boxcar, this would be a 40-hour boxcar. For a tukey, it would be something longer.

Explore: See whether you can fit a harmonic to your tidal time series. What are the major tidal constituents?

Some packages exist to help with this: e.g., moflaher/ttide_py

Explore: Go a little deeper into time series analysis by considering spectra. Spectra can immediately help you identify dominant frequencies in your time series, whether they are seasonal (a 365-day cycle) or tidal (12- or 12.4-hr). An example notebook can be found here: https://www.jmlilly.net/course/labs/html/SpectralAnalysis-Python.html from Jonathan Lilly’s time series analysis course.

# Your code here (optional)

Fig. 3 Regridding data#

Now suppose you need to compare your tidal time series with a daily temperature record (for example). You want to calculate a linear regression, but the lengths of the time grids are different. You have some options.

Average your data into 1 day averages. Try this using the methods from the previous exercise.
Subsample your data into the value nearest to noon each day.
Linearly interpolate your data onto the time dimension of the temperature record.

Try these methods and consider what the differences might mean.

For the purpose of this exercise, simply create these new time series and plot a week or a month’s worth of data with all three time series on the same plot.
Save this figure as fig3 using the file naming convention.

# Your code here