seagliderOG1 API

seagliderOG1.convertOG1.add_gps_info_to_dataset(ds, gps_ds)[source]

Add LATITUDE_GPS, LONGITUDE_GPS, and TIME_GPS to the dataset. These variables lie along the N_MEASUREMENTS dimension but contain non-NaN values only where GPS information is available. The dataset is sorted by TIME.

Parameters:
  • ds (xarray.Dataset): The dataset to which the GPS information is added.

  • gps_ds (xarray.Dataset): The dataset containing the GPS information.

Returns:

The new dataset with added GPS information. LATITUDE_GPS, LONGITUDE_GPS and TIME_GPS only contain values where GPS information is available.

Return type:

xarray.Dataset

Note

This also sorts by ctd_time (from the original basestation dataset) or by TIME from ds. If the data are not sorted by time, there may be unintended consequences.
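The merge described above can be pictured with plain Python lists. This is a minimal sketch, not the package's implementation: the function name merge_gps_into_measurements and the tuple row layout are hypothetical, and real data would be xarray objects rather than tuples.

```python
import math


def merge_gps_into_measurements(measurements, gps_fixes):
    """Hypothetical sketch: append GPS fixes to a measurement record, sorted by time.

    measurements: list of (time, value) tuples, e.g. CTD samples.
    gps_fixes: list of (time, lat, lon) tuples.
    Returns rows of (TIME, value, LATITUDE_GPS, LONGITUDE_GPS, TIME_GPS).
    """
    nan = math.nan
    # CTD rows have no GPS information, so the GPS columns are NaN
    rows = [(t, v, nan, nan, nan) for t, v in measurements]
    # GPS rows extend the record (more rows along N_MEASUREMENTS) and carry no value
    rows += [(t, nan, lat, lon, t) for t, lat, lon in gps_fixes]
    # Sort everything by TIME so the fixes sit between the surrounding samples
    rows.sort(key=lambda row: row[0])
    return rows


rows = merge_gps_into_measurements(
    [(10.0, 1.5), (20.0, 2.5)],
    [(5.0, 60.1, -20.2), (25.0, 60.2, -20.3)],
)
print([row[0] for row in rows])  # [5.0, 10.0, 20.0, 25.0]
```

Keeping the fixes as extra rows on the same time axis is what makes the time sort essential, which is why unsorted input can have unintended consequences.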

seagliderOG1.convertOG1.convert_to_OG1(list_of_datasets, contrib_to_append=None)[source]

Processes a list of xarray datasets or a single xarray dataset, converts them to OG1 format, concatenates the datasets, sorts by time, and applies attributes.

Parameters:
  • list_of_datasets (list or xarray.Dataset): A list of xarray datasets or a single xarray dataset.

  • contrib_to_append (dict, optional): Dictionary containing additional contributor information to append.

Returns:

The concatenated and processed dataset.

Return type:

xarray.Dataset

seagliderOG1.convertOG1.extract_attr_to_keep(ds1, attr_as_is={'acknowledgment', 'date_created', 'disclaimer', 'file_version', 'geospatial_lat_max', 'geospatial_lat_min', 'geospatial_lon_max', 'geospatial_lon_min', 'geospatial_vertical_max', 'geospatial_vertical_min', 'id', 'institution', 'keywords', 'keywords_vocabulary', 'license', 'naming_authority', 'project'})[source]
seagliderOG1.convertOG1.extract_attr_to_rename(ds1, attr_to_rename={'comment': 'history', 'platform_id': 'PLATFORM_SERIAL_NUMBER', 'site': 'summary', 'uri': 'uuid', 'uri_comment': 'UUID'})[source]
seagliderOG1.convertOG1.extract_variables(ds)[source]

Further splits the variables from the basestation file that have no dimensions, according to whether they originally came from sg_calib_constants, from log files, or are other mission/dive-specific values.

Parameters:

ds (xarray.Dataset): The input dataset.

Returns:

A tuple containing three xarray Datasets:

  • sg_cal (xarray.Dataset): Dataset with variables starting with ‘sg_cal_’ (originally from sg_calib_constants.m). The prefix is removed, so variables can be accessed as, e.g., sg_cal.hd_a.

  • dc_log (xarray.Dataset): Dataset with variables starting with ‘log_’, from the log files.

  • dc_other (xarray.Dataset): Other mission/dive-specific values, including depth-averaged currents but also values such as magnetic_variation.

Return type:

tuple
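The prefix rules above can be sketched on variable names alone. This illustrative sketch assumes, as the description suggests, that only the ‘sg_cal_’ prefix is stripped; the helper name and the example variable log_D_TGT are hypothetical, and the real function returns xarray Datasets rather than lists.

```python
def split_scalar_variables(names):
    """Hypothetical sketch: classify dimensionless variable names by prefix.

    Returns (sg_cal, dc_log, dc_other) as lists of names.
    """
    sg_cal, dc_log, dc_other = [], [], []
    for name in names:
        if name.startswith("sg_cal_"):
            # Strip the prefix so 'sg_cal_hd_a' becomes 'hd_a'
            sg_cal.append(name[len("sg_cal_"):])
        elif name.startswith("log_"):
            # Log-file values keep their 'log_' prefix
            dc_log.append(name)
        else:
            # Everything else: depth-averaged currents, magnetic_variation, ...
            dc_other.append(name)
    return sg_cal, dc_log, dc_other


print(split_scalar_variables(["sg_cal_hd_a", "log_D_TGT", "magnetic_variation"]))
# (['hd_a'], ['log_D_TGT'], ['magnetic_variation'])
```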

seagliderOG1.convertOG1.get_contributors(ds, values_to_append=None)[source]
seagliderOG1.convertOG1.get_time_attributes(ds)[source]

Extracts and cleans time-related attributes from the dataset.

Parameters:

ds (xarray.Dataset): The input dataset.

Returns:

A dictionary containing cleaned time-related attributes.

Return type:

dict

seagliderOG1.convertOG1.process_and_save_data(input_location, save=False, output_dir='.', run_quietly=True)[source]

Processes and saves data from the specified input location. This function loads and concatenates datasets from the server, converts them to OG1 format, and saves the resulting dataset to a NetCDF file. If the file already exists, the function will prompt the user to decide whether to overwrite it or not.

Parameters:
  • input_location (str): The location of the input data to be processed.

  • save (bool): Whether to save the processed dataset to a file. Default is False.

  • output_dir (str): The directory where the output file will be saved. Default is ‘.’.

Returns: xarray.Dataset: The processed dataset.

seagliderOG1.convertOG1.process_dataset(ds1_base, firstrun=False)[source]

Processes a dataset by performing a series of transformations and extractions.

Parameters:

ds1_base (xarray.Dataset): The input dataset from a basestation file, containing various attributes and variables.

Returns:
  • tuple – A tuple containing:

    • ds_new (xarray.Dataset): The processed dataset with renamed variables, assigned attributes, converted units, and additional information such as GPS info and dive number.

    • attr_warnings (list): A list of warnings related to attribute assignments.

    • sg_cal (xarray.Dataset): A dataset containing variables starting with ‘sg_cal’.

    • dc_other (xarray.Dataset): A dataset containing other variables not categorized under ‘sg_cal’ or ‘dc_log’.

    • dc_log (xarray.Dataset): A dataset containing variables starting with ‘log_’.

Steps:

  1. Handle and split the inputs
    • Extract the dive number from the attributes.

    • Split the dataset by unique dimensions.

    • Extract the gps_info from the split dataset.

    • Extract variables starting with ‘sg_cal’. These are originally from sg_calib_constants.m.

  2. Rename the dataset dimensions, coordinates and variables according to OG1
    • Extract and rename dimensions for ‘sg_data_point’. These will be the N_MEASUREMENTS.

    • Rename variables according to the OG1 vocabulary.

    • Assign variable attributes according to OG1. Pass back warnings where there were conflicts.

    • Convert units in the dataset (e.g., cm/s to m/s) where possible.

    • Convert QC flags to int8.

  3. Add new variables
    • Add GPS info as LATITUDE_GPS, LONGITUDE_GPS and TIME_GPS (this increases the length of N_MEASUREMENTS).

    • Add the divenum as a variable of length N_MEASUREMENTS.

    • Add the PROFILE_NUMBER (odd for dives, even for ascents).

    • Add the PHASE of the dive (1 for ascent, 2 for descent, 3 for between the first two surface points).

    • Add DEPTH_Z, positive up.

  4. Return the new dataset, the attribute warnings, the sg_cal dataset, the dc_other dataset and the dc_log dataset.
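Step 3 states only that PROFILE_NUMBER is odd for dives and even for ascents. One mapping with that property, assuming PHASE 2 marks descent and PHASE 1 ascent as listed in the steps, is sketched below. The helper name is hypothetical, the package may compute the number differently, and phase 3 (between surface points) is deliberately not handled.

```python
def profile_number(divenum, phase):
    """Hypothetical mapping: dive N gives profiles 2N-1 (descent) and 2N (ascent)."""
    if phase == 2:  # descent -> odd profile number
        return 2 * divenum - 1
    if phase == 1:  # ascent -> even profile number
        return 2 * divenum
    # Surface phases (e.g. PHASE 3) are outside the scope of this sketch
    raise ValueError(f"phase {phase} is not handled in this sketch")


print([profile_number(d, p) for d in (1, 2) for p in (2, 1)])  # [1, 2, 3, 4]
```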

Note

Possibility of undesired behaviour:
  • It sorts by TIME.

  • If there are not two surface GPS fixes before a dive, it may inadvertently treat the whole record as a dive.

Checking for valid coordinates: https://github.com/pydata/xarray/issues/3743

seagliderOG1.convertOG1.standardise_OG10(ds, firstrun=False, unit_format={'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degreesCelsius': 'Celsius', 'degrees_Celsius': 'Celsius', 'g/m^3': 'g m-3', 'kg/m^3': 'kg m-3', 'm/s': 'm s-1', 'mS/cm': 'mS cm-1', 'meters': 'm'})[source]

Standardizes the dataset to OG1 format by renaming dimensions, variables, and assigning attributes.

Parameters:

ds (xarray.Dataset): The input dataset.

Returns:

The standardized dataset.

Return type:

xarray.Dataset

seagliderOG1.convertOG1.update_dataset_attributes(ds, contrib_to_append)[source]

Updates the attributes of the dataset, appending the additional contributor information provided in contrib_to_append.

Parameters:
  • ds (xarray.Dataset): The input dataset.

  • contrib_to_append (dict): Additional contributor information to append.

Returns:

The dataset with updated attributes.

Return type:

xarray.Dataset

seagliderOG1.readers.filter_files_by_profile(file_list, start_profile=None, end_profile=None)[source]

Filter a list of files based on start_profile and end_profile. Expects filenames of the form pXXXYYYY.nc, where XXX is the Seaglider serial number and YYYY the dive-cycle number, e.g. p0420001.nc for glider 42 and dive number 0001. Note: file_list does not need to be sorted alphabetically.

Parameters:
  • file_list (list): List of filenames to filter.

  • start_profile (int, optional): The starting profile number to filter files. Defaults to None.

  • end_profile (int, optional): The ending profile number to filter files. Defaults to None.

Returns: list: A list of filtered filenames.
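A minimal sketch of the filename filtering described above, using only the standard library. The regular expression is an assumption: it accepts the pXXXYYYY.nc pattern plus an optional suffix such as the date in the sample file p0330015_20100906.nc, skips non-matching names, and treats both profile bounds as inclusive; the real function may differ on each point. Input order is preserved, so file_list need not be sorted.

```python
import re

# pXXXYYYY.nc: XXX = glider serial number, YYYY = dive-cycle number.
# The optional "_suffix" (e.g. a date) is an assumption for this sketch.
BASESTATION_NAME = re.compile(r"^p(\d{3})(\d{4})(?:_\w+)?\.nc$")


def filter_by_profile(file_list, start_profile=None, end_profile=None):
    """Hypothetical sketch: keep files whose dive number is within the bounds."""
    kept = []
    for name in file_list:
        match = BASESTATION_NAME.match(name)
        if match is None:
            continue  # not a basestation dive file
        dive = int(match.group(2))
        if start_profile is not None and dive < start_profile:
            continue
        if end_profile is not None and dive > end_profile:
            continue
        kept.append(name)
    return kept


files = ["p0420003.nc", "p0420001.nc", "p0420010.nc", "notes.txt"]
print(filter_by_profile(files, start_profile=1, end_profile=3))
# ['p0420003.nc', 'p0420001.nc']
```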

seagliderOG1.readers.list_files(source, registry_loc='seagliderOG1', registry_name='seaglider_registry.txt')[source]

List files from a given source, which can be either a URL or a directory path. For an online source, this uses BeautifulSoup and requests.

Parameters: source (str): The source from which to list files. It can be a URL (starting with “http://” or “https://”) or a local directory path.

Returns: list: A list of filenames available in the specified source, sorted alphabetically.

Raises: ValueError: If the source is neither a valid URL nor a directory path.

seagliderOG1.readers.load_basestation_files(source, start_profile=None, end_profile=None)[source]

Load datasets from either an online source or a local directory, optionally filtering by profile range.

Parameters:
  • source (str): The URL to the directory containing the NetCDF files or the path to the local directory.

  • start_profile (int, optional): The starting profile number to filter files. Defaults to None.

  • end_profile (int, optional): The ending profile number to filter files. Defaults to None.

Returns: A list of xarray.Dataset objects loaded from the filtered NetCDF files.

seagliderOG1.readers.load_first_basestation_file(source)[source]

Load the first dataset from either an online source or a local directory.

Parameters: source (str): The URL to the directory containing the NetCDF files or the path to the local directory.

Returns: An xarray.Dataset object loaded from the first NetCDF file.

seagliderOG1.readers.load_sample_dataset(dataset_name='p0330015_20100906.nc')[source]

Download a sample dataset for use with seagliderOG1.

Parameters:

dataset_name (str, optional) – Name of the sample dataset file to download. Defaults to “p0330015_20100906.nc”.

Raises:

ValueError – If the requested dataset is not known.

Returns:

The requested sample dataset.

Return type:

xarray.Dataset

seagliderOG1.plotters.plot_ctd_depth_vs_time(ds, start_traj=None, end_traj=None)[source]

Plots CTD depth vs time, optionally filtered by trajectory range, and highlights non-NaN GPS latitude values.

Parameters:
  • ds (xr.Dataset): The input dataset containing ‘ctd_time’, ‘ctd_depth’, and ‘gps_lat’.

  • start_traj (int, optional): The starting trajectory number to filter the data. Default is None.

  • end_traj (int, optional): The ending trajectory number to filter the data. Default is None.

seagliderOG1.plotters.plot_depth_colored(data, color_by=None, start_dive=None, end_dive=None)[source]

Plots depth as a function of time, optionally colored by another variable, and filtered by dive number.

Parameters:
  • data (pd.DataFrame or xr.Dataset): The input data containing ‘ctd_depth’ and ‘ctd_time’.

  • color_by (str, optional): The variable to color the plot by. Default is None.

  • start_dive (int, optional): The starting dive number to filter the data. Default is None.

  • end_dive (int, optional): The ending dive number to filter the data. Default is None.

seagliderOG1.plotters.plot_profile_depth(data)[source]

Plots the profile depth (ctd_depth) as a function of time (ctd_time), reducing the number of plotted points to fewer than 100,000.

Parameters: data (pd.DataFrame or xr.Dataset): The input data containing ‘ctd_depth’ and ‘ctd_time’.
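The description does not say how the points are reduced. One simple approach, uniform striding, is sketched below as an assumption; subsample_stride and subsample are hypothetical helpers, not part of seagliderOG1. The stride is the smallest step that leaves strictly fewer than 100,000 points.

```python
import math

MAX_POINTS = 100_000  # limit quoted in the description above


def subsample_stride(n_points, max_points=MAX_POINTS):
    """Smallest stride s such that ceil(n_points / s) < max_points."""
    if n_points < max_points:
        return 1
    # ceil(n / s) <= max_points - 1 holds exactly when s >= n / (max_points - 1)
    return math.ceil(n_points / (max_points - 1))


def subsample(values, max_points=MAX_POINTS):
    """Keep every s-th value, starting with the first."""
    return values[::subsample_stride(len(values), max_points)]


print(len(subsample(list(range(250_000)))))  # 83334
```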

seagliderOG1.plotters.show_attributes(data)[source]

Processes an xarray Dataset or a netCDF file, extracts attribute information, and returns a DataFrame with details about the attributes.

Parameters: data (str or xr.Dataset): The input data, either a file path to a netCDF file or an xarray Dataset.

Returns: pandas.DataFrame: A DataFrame containing the following columns:

  • Attribute: The name of the attribute.

  • Value: The value of the attribute.

seagliderOG1.plotters.show_contents(data, content_type='variables')[source]

Wrapper function to show contents of an xarray Dataset or a netCDF file.

Parameters:
  • data (str or xr.Dataset): The input data, either a file path to a netCDF file or an xarray Dataset.

  • content_type (str): The type of content to show, either ‘variables’ (or ‘vars’) or ‘attributes’ (or ‘attrs’). Default is ‘variables’.

Returns: pandas.io.formats.style.Styler or pandas.DataFrame: A styled DataFrame with details about the variables or attributes.

seagliderOG1.plotters.show_variables(data)[source]

Processes an xarray Dataset or a netCDF file, extracts variable information, and returns a styled DataFrame with details about the variables.

Parameters: data (str or xr.Dataset): The input data, either a file path to a netCDF file or an xarray Dataset.

Returns: pandas.io.formats.style.Styler: A styled DataFrame containing the following columns:

  • dims: The dimension of the variable (or “string” if it is a string type).

  • name: The name of the variable.

  • units: The units of the variable (if available).

  • comment: Any additional comments about the variable (if available).

seagliderOG1.plotters.show_variables_by_dimension(data, dimension_name='trajectory')[source]

Processes an xarray Dataset or a netCDF file, extracts variable information, and returns a styled DataFrame with details about the variables filtered by a specific dimension.

Parameters:
  • data (str or xr.Dataset): The input data, either a file path to a netCDF file or an xarray Dataset.

  • dimension_name (str): The name of the dimension to filter variables by. Default is ‘trajectory’.

Returns: pandas.io.formats.style.Styler: A styled DataFrame containing the following columns:

  • dims: The dimension of the variable (or “string” if it is a string type).

  • name: The name of the variable.

  • units: The units of the variable (if available).

  • comment: Any additional comments about the variable (if available).

seagliderOG1.writers.save_dataset(ds, output_file='../test.nc')[source]

Attempts to save the dataset to a NetCDF file. If a TypeError occurs due to invalid attribute values, it converts the invalid attributes to strings and retries the save operation.

Parameters:
  • ds (xarray.Dataset): The dataset to save.

  • output_file (str): The path of the output NetCDF file. Default is ‘../test.nc’.

Returns:

bool: True if the dataset was saved successfully, False otherwise.

Note

Based on https://github.com/pydata/xarray/issues/3743
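The try, convert, retry pattern described above can be sketched without xarray. In this simplified illustration, write stands in for the actual NetCDF writer, and the accepted attribute types (strings, numbers, lists and tuples, with booleans and None rejected) are an illustrative assumption rather than xarray's exact rule. All helper names are hypothetical.

```python
ALLOWED_ATTR_TYPES = (str, int, float, list, tuple)  # illustrative assumption


def is_valid_attr(value):
    # bool is a subclass of int, so reject it explicitly
    return isinstance(value, ALLOWED_ATTR_TYPES) and not isinstance(value, bool)


def stringify_invalid_attrs(attrs):
    """Return a copy of attrs with every invalid value converted to a string."""
    return {k: v if is_valid_attr(v) else str(v) for k, v in attrs.items()}


def save_with_retry(attrs, write):
    """Hypothetical sketch: try to write, stringify bad attributes on TypeError, retry once."""
    try:
        write(attrs)
        return True
    except TypeError:
        pass
    try:
        write(stringify_invalid_attrs(attrs))
        return True
    except TypeError:
        return False


def strict_writer(attrs):
    """Stand-in writer that rejects any attribute failing is_valid_attr."""
    for key, value in attrs.items():
        if not is_valid_attr(value):
            raise TypeError(f"invalid value for attribute {key!r}")


attrs = {"title": "glider mission", "processed": True, "comment": None}
print(save_with_retry(attrs, strict_writer))  # True
print(stringify_invalid_attrs(attrs))
# {'title': 'glider mission', 'processed': 'True', 'comment': 'None'}
```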

seagliderOG1.tools.add_dive_number(ds, dive_number=None)[source]

Add the dive number as a variable to the dataset. Assumes the dive number is present in the basestation attributes.

Parameters: ds (xarray.Dataset): The dataset to which the dive number will be added.

Returns: xarray.Dataset: The dataset with the dive number added.

seagliderOG1.tools.add_sensor_to_dataset(dsa, ds, sg_cal, firstrun=False)[source]
seagliderOG1.tools.assign_phase(ds)[source]

Adds new variables ‘PHASE’ and ‘PHASE_QC’ to the dataset ds, indicating the phase of each measurement. The phase is determined from the pressure readings (‘PRES’) for each unique dive number (‘dive_num’).

Note: This formulation only separates dives from climbs, based on when the glider is at its maximum depth. Future work needs to separate out the other phases (https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/phase.md) and generate a PHASE_QC.

Parameters:

ds (xarray.Dataset): The input dataset, containing ‘PRES’ and ‘dive_num’.

Returns:

xarray.Dataset – The dataset with additional ‘PHASE’ and ‘PHASE_QC’ variables, where:

  • ‘PHASE’ indicates the phase of each measurement:
    • Phase 2 is assigned to measurements up to and including the maximum pressure point.

    • Phase 1 is assigned to measurements after the maximum pressure point.

  • ‘PHASE_QC’ is an additional variable with no QC applied.
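The PHASE rule above can be expressed on plain lists. This sketch assumes the samples within each dive are already in time order (the package sorts by TIME) and that the first occurrence of the maximum pressure is used when several samples tie; the function name is hypothetical and PHASE_QC is omitted.

```python
def assign_phase_sketch(dive_num, pres):
    """Hypothetical sketch: PHASE 2 up to and including each dive's max pressure, else 1.

    dive_num, pres: equal-length sequences, one entry per measurement, time-ordered.
    """
    indices_by_dive = {}
    for i, dive in enumerate(dive_num):
        indices_by_dive.setdefault(dive, []).append(i)

    phase = [None] * len(pres)
    for indices in indices_by_dive.values():
        # max() returns the first index among equal maxima
        i_max = max(indices, key=lambda i: pres[i])
        for i in indices:
            phase[i] = 2 if i <= i_max else 1  # 2 = dive, 1 = climb
    return phase


print(assign_phase_sketch([1, 1, 1, 1, 2, 2, 2], [0, 50, 100, 40, 10, 80, 5]))
# [2, 2, 2, 1, 2, 2, 1]
```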

seagliderOG1.tools.assign_profile_number(ds, ds1)[source]
seagliderOG1.tools.calc_Z(ds)[source]

Calculate the depth (Z position) of the glider using the gsw library to convert pressure to depth.

Parameters:

ds (xarray.Dataset): The input dataset.

Returns:

The dataset with an additional ‘DEPTH’ variable.

Return type:

xarray.Dataset

seagliderOG1.tools.convert_qc_flags(dsa, qc_name)[source]
seagliderOG1.tools.convert_units(ds)[source]

Convert the units of variables in an xarray Dataset to preferred units. This is useful, for instance, to convert cm/s to m/s.

Parameters:

ds (xarray.Dataset): The input dataset.

Returns:

The dataset with converted units.

Return type:

xarray.Dataset

seagliderOG1.tools.convert_units_var(var_values, current_unit, new_unit, unit1_to_unit2={'Celsius_to_degreesCelsius': {'current_unit': 'Celsius', 'factor': 1, 'new_unit': 'degreesCelsius'}, 'Pa_to_dbar': {'current_unit': 'Pa', 'factor': 0.0001, 'new_unit': 'dbar'}, 'S m-1_to_mS cm-1': {'current_unit': 'S m-1', 'factor': 0.1, 'new_unit': 'mS cm-1'}, 'S/m_to_mS/cm': {'current_unit': 'S/m', 'factor': 0.1, 'new_unit': 'mS/cm'}, 'cm s-1_to_m s-1': {'current_unit': 'cm s-1', 'factor': 0.01, 'new_unit': 'm s-1'}, 'cm/s_to_m/s': {'current_unit': 'cm/s', 'factor': 0.01, 'new_unit': 'm/s'}, 'cm_to_m': {'current_unit': 'cm', 'factor': 0.01, 'new_unit': 'm'}, 'dbar_to_Pa': {'current_unit': 'dbar', 'factor': 10000, 'new_unit': 'Pa'}, 'dbar_to_kPa': {'current_unit': 'dbar', 'factor': 10, 'new_unit': 'kPa'}, 'degreesCelsius_to_Celsius': {'current_unit': 'degreesCelsius', 'factor': 1, 'new_unit': 'Celsius'}, 'g m-3_to_kg m-3': {'current_unit': 'g m-3', 'factor': 0.001, 'new_unit': 'kg m-3'}, 'g/m^3_to_kg/m^3': {'current_unit': 'g/m3', 'factor': 0.001, 'new_unit': 'kg/m3'}, 'kg m-3_to_g m-3': {'current_unit': 'kg m-3', 'factor': 1000, 'new_unit': 'g m-3'}, 'kg/m^3_to_g/m^3': {'current_unit': 'kg/m3', 'factor': 1000, 'new_unit': 'g/m3'}, 'km_to_m': {'current_unit': 'km', 'factor': 1000, 'new_unit': 'm'}, 'm s-1_to_cm s-1': {'current_unit': 'm s-1', 'factor': 100, 'new_unit': 'cm s-1'}, 'm/s_to_cm/s': {'current_unit': 'm/s', 'factor': 100, 'new_unit': 'cm/s'}, 'mS cm-1_to_S m-1': {'current_unit': 'mS cm-1', 'factor': 10, 'new_unit': 'S m-1'}, 'mS/cm_to_S/m': {'current_unit': 'mS/cm', 'factor': 10, 'new_unit': 'S/m'}, 'm_to_cm': {'current_unit': 'm', 'factor': 100, 'new_unit': 'cm'}, 'm_to_km': {'current_unit': 'm', 'factor': 0.001, 'new_unit': 'km'}}, firstrun=False)[source]

Convert the units of variables in an xarray Dataset to preferred units. This is useful, for instance, to convert cm/s to m/s.

Parameters:
  • var_values: The values to convert.

  • current_unit (str): The current unit of the values.

  • new_unit (str): The unit to convert to.

  • unit1_to_unit2 (dict): Conversion table. Each key is a string naming a conversion, and each value is a dictionary with:

    • ‘current_unit’: The unit being converted from.

    • ‘new_unit’: The unit name after conversion.

    • ‘factor’: The factor to multiply the values by to convert them.

  • firstrun (bool, optional): A flag indicating if this is the first run. Defaults to False.

Returns:

The dataset with converted units.

Return type:

xarray.Dataset
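The conversion table in the signature pairs a current_unit and a new_unit with a multiplicative factor. A minimal lookup over a few of those default entries is sketched below; the helper name, the list-based values and the choice to return values unchanged when no entry matches are assumptions, not the package's behaviour.

```python
# A few entries copied from the default unit1_to_unit2 table in the signature.
UNIT1_TO_UNIT2 = {
    "cm/s_to_m/s": {"current_unit": "cm/s", "factor": 0.01, "new_unit": "m/s"},
    "dbar_to_Pa": {"current_unit": "dbar", "factor": 10000, "new_unit": "Pa"},
    "km_to_m": {"current_unit": "km", "factor": 1000, "new_unit": "m"},
}


def convert_values(values, current_unit, new_unit, table=UNIT1_TO_UNIT2):
    """Hypothetical sketch: scale values by the factor for (current_unit -> new_unit).

    Returns (converted_values, unit). If no entry matches, the values and the
    original unit are returned unchanged (an assumption for this sketch).
    """
    for entry in table.values():
        if entry["current_unit"] == current_unit and entry["new_unit"] == new_unit:
            return [v * entry["factor"] for v in values], new_unit
    return list(values), current_unit


print(convert_values([2.0, 0.5], "km", "m"))  # ([2000.0, 500.0], 'm')
```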

seagliderOG1.tools.encode_times(ds)[source]
seagliderOG1.tools.encode_times_og1(ds)[source]
seagliderOG1.tools.find_best_dtype(var_name, da)[source]
seagliderOG1.tools.gather_sensor_info(ds_other, ds_sgcal, firstrun=False)[source]

Gathers sensor information from the provided datasets and organizes it into a new dataset.

Parameters:
  • ds_other (xarray.Dataset): The dataset containing sensor data.

  • ds_sgcal (xarray.Dataset): The dataset containing calibration data.

  • firstrun (bool, optional): A flag indicating if this is the first run. Defaults to False.

Returns: xarray.Dataset: A dataset containing the gathered sensor information.

Notes:
  • The function looks for specific sensor names in the ds_other dataset and adds them to a new dataset ds_sensor.

  • If ‘aanderaa4330_instrument_dissolved_oxygen’ is present in ds_other, it is renamed to ‘aa4330’.

  • If ‘Pcor’ is present in ds_sgcal, an additional sensor ‘sbe43’ is created based on ‘sbe41’ with specific attributes.

  • If ‘optode_FoilCoefA1’ is present in ds_sgcal, an additional sensor ‘aa4831’ is created based on ‘sbe41’ with specific attributes.

  • The function sets appropriate attributes for the sensors ‘aa4330’, ‘aa4831’, and ‘sbe43’ if they are present.

seagliderOG1.tools.get_sg_attrs(ds)[source]
seagliderOG1.tools.reformat_units_str(old_unit, unit_format={'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degreesCelsius': 'Celsius', 'degrees_Celsius': 'Celsius', 'g/m^3': 'g m-3', 'kg/m^3': 'kg m-3', 'm/s': 'm s-1', 'mS/cm': 'mS cm-1', 'meters': 'm'})[source]
seagliderOG1.tools.reformat_units_var(ds, var_name, unit_format={'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degreesCelsius': 'Celsius', 'degrees_Celsius': 'Celsius', 'g/m^3': 'g m-3', 'kg/m^3': 'kg m-3', 'm/s': 'm s-1', 'mS/cm': 'mS cm-1', 'meters': 'm'})[source]

Renames units in the dataset based on the provided dictionary for OG1.

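Parameters:
  • ds (xarray.Dataset): The input dataset.

  • var_name (str): The name of the variable whose units are reformatted.

  • unit_format (dict): Mapping from existing unit strings to their OG1 format.

Returns:

The dataset with renamed units.

Return type:

xarray.Dataset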
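The unit_format default maps legacy unit strings to the OG1 spelling. A minimal sketch of applying it to a variable's ‘units’ attribute is shown below; the helper names are hypothetical, and passing unknown units through unchanged is an assumption.

```python
# Default mapping copied from the reformat_units_var signature.
UNIT_FORMAT = {
    "S/m": "S m-1", "cm/s": "cm s-1", "degreesCelsius": "Celsius",
    "degrees_Celsius": "Celsius", "g/m^3": "g m-3", "kg/m^3": "kg m-3",
    "m/s": "m s-1", "mS/cm": "mS cm-1", "meters": "m",
}


def reformat_unit(old_unit, unit_format=UNIT_FORMAT):
    """Return the OG1 spelling of old_unit, or old_unit itself if it is not mapped."""
    return unit_format.get(old_unit, old_unit)


def reformat_units_attr(var_attrs, unit_format=UNIT_FORMAT):
    """Return a copy of a variable's attributes with 'units' reformatted."""
    attrs = dict(var_attrs)
    if "units" in attrs:
        attrs["units"] = reformat_unit(attrs["units"], unit_format)
    return attrs


print(reformat_units_attr({"units": "cm/s", "long_name": "eastward velocity"}))
# {'units': 'cm s-1', 'long_name': 'eastward velocity'}
```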

seagliderOG1.tools.set_best_dtype(ds)[source]
seagliderOG1.tools.set_best_dtype_value(value, var_name)[source]

Determines the best data type for a single value based on its variable name and converts it.

Parameters:
  • value (any) – The input value to convert.

  • var_name (str) – The variable name used to determine the best data type.

Returns:

converted_value – The value converted to the best data type.

Return type:

any

seagliderOG1.tools.set_fill_value(new_dtype)[source]
seagliderOG1.tools.split_by_unique_dims(ds)[source]

Splits an xarray dataset into multiple datasets based on the unique set of dimensions of the variables.

Parameters: ds (xarray.Dataset): The input xarray dataset containing various variables.

Returns: tuple: A tuple containing xarray datasets, each with variables sharing the same set of dimensions.
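The grouping can be sketched with a dictionary keyed on each variable's dimension tuple; in xarray that tuple is available as ds[name].dims. The helper name and the example variables and dimensions below are illustrative, and the sketch returns lists of names rather than datasets.

```python
def group_by_dims(var_dims):
    """Hypothetical sketch: group variable names by their exact dimension tuple.

    var_dims: mapping of variable name -> tuple of dimension names.
    Returns a tuple of name lists, one per unique dimension set, in first-seen order.
    """
    groups = {}
    for name, dims in var_dims.items():
        groups.setdefault(tuple(dims), []).append(name)
    return tuple(groups.values())


print(group_by_dims({
    "ctd_depth": ("sg_data_point",),
    "log_D_TGT": (),
    "ctd_time": ("sg_data_point",),
    "gps_lat": ("gps_info",),
}))
# (['ctd_depth', 'ctd_time'], ['log_D_TGT'], ['gps_lat'])
```

Variables with an empty dimension tuple form their own group, which matches the dimensionless variables that extract_variables then splits further.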