seagliderOG1 API

convertOG1

seagliderOG1.convertOG1.add_gps_info_to_dataset(ds: Dataset, gps_ds: Dataset) Dataset[source]

Adds GPS information (LATITUDE_GPS, LONGITUDE_GPS, TIME_GPS) to the dataset.

The GPS values will be included within the N_MEASUREMENTS dimension, with non-NaN values only when GPS information is available. The dataset will be sorted by TIME.

Parameters:
  • ds (xarray.Dataset) – The dataset with renamed dimensions and variables, representing the main data.

  • gps_ds (xarray.Dataset) – The dataset containing GPS information, typically extracted from the original basestation dataset.

Returns:

The updated dataset with added GPS information. This includes values for LATITUDE_GPS, LONGITUDE_GPS, and TIME_GPS only when GPS information is available.

Return type:

xarray.Dataset

Notes

  • The dataset is sorted by TIME (or ctd_time from the original basestation dataset).

  • If the data are not sorted by time, there may be unintended consequences.

  • The function assumes that the GPS dataset contains variables log_gps_lon, log_gps_lat, and log_gps_time for longitude, latitude, and time respectively.

  • The function uses the sg_data_point dimension as defined in the OG1 vocabulary.
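The behaviour described above can be sketched without xarray: GPS fixes are interleaved with the measurement rows on one time axis, and the GPS fields stay NaN on ordinary measurement rows. This is only an illustration under stated assumptions. `merge_gps_rows` is a hypothetical helper, not part of seagliderOG1, and the sample values are made up.

```python
import math

NAN = float("nan")

def merge_gps_rows(measurements, gps_fixes):
    """Interleave GPS fixes with measurements on a single time-sorted axis.

    Hypothetical helper for illustration; the documented function operates
    on xarray Datasets instead of lists of dictionaries.
    """
    rows = []
    for m in measurements:
        # Ordinary measurement rows carry no GPS information.
        rows.append({"TIME": m["TIME"], "PRES": m["PRES"],
                     "LATITUDE_GPS": NAN, "LONGITUDE_GPS": NAN, "TIME_GPS": NAN})
    for g in gps_fixes:
        # Assumption: GPS rows carry a fix but no sensor measurement.
        rows.append({"TIME": g["log_gps_time"], "PRES": NAN,
                     "LATITUDE_GPS": g["log_gps_lat"],
                     "LONGITUDE_GPS": g["log_gps_lon"],
                     "TIME_GPS": g["log_gps_time"]})
    # Sort by TIME so the combined axis is monotonic.
    return sorted(rows, key=lambda r: r["TIME"])

measurements = [{"TIME": 10.0, "PRES": 5.0}, {"TIME": 20.0, "PRES": 50.0}]
gps_fixes = [{"log_gps_time": 0.0, "log_gps_lat": 26.5, "log_gps_lon": -76.7},
             {"log_gps_time": 30.0, "log_gps_lat": 26.6, "log_gps_lon": -76.8}]
merged = merge_gps_rows(measurements, gps_fixes)
```

The merged axis is longer than the original measurement axis, which matches the note elsewhere on this page that adding GPS info increases the length of N_MEASUREMENTS.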

seagliderOG1.convertOG1.convert_to_OG1(list_of_datasets: list[Dataset] | Dataset, contrib_to_append: dict[str, str] | None = None) tuple[Dataset, list[str]][source]

Processes a list of xarray datasets or a single xarray dataset, converts them to OG1 format, concatenates the datasets, sorts by time, and applies attributes.

Parameters:
  • list_of_datasets (list[xr.Dataset] | xr.Dataset) – A list of xarray datasets or a single xarray dataset in basestation format.

  • contrib_to_append (dict[str, str] | None, optional) – Dictionary containing additional contributor information to append. Default is None.

Returns:

A tuple containing:

  • ds_og1 (xarray.Dataset): The concatenated and processed dataset in OG1 format.

  • varlist (list[str]): A list of variable names from the input datasets.

Return type:

tuple[xr.Dataset, list[str]]

seagliderOG1.convertOG1.extract_attr_to_keep(ds1, attr_as_is={'acknowledgment', 'date_created', 'disclaimer', 'file_version', 'geospatial_lat_max', 'geospatial_lat_min', 'geospatial_lon_max', 'geospatial_lon_min', 'geospatial_vertical_max', 'geospatial_vertical_min', 'id', 'institution', 'keywords', 'keywords_vocabulary', 'license', 'naming_authority', 'project'})[source]
seagliderOG1.convertOG1.extract_attr_to_rename(ds1, attr_to_rename={'comment': 'history', 'platform_id': 'PLATFORM_SERIAL_NUMBER', 'site': 'summary', 'uri': 'uuid', 'uri_comment': 'UUID'})[source]
seagliderOG1.convertOG1.extract_variables(ds: Dataset) tuple[Dataset, Dataset, Dataset][source]

Splits variables from the basestation file that have no dimensions into categorized datasets.

This function further processes the variables from the basestation file that had no dimensions. It categorizes them based on their prefixes or characteristics into three groups: variables from sg_calib_constants, log files, and other mission/dive-specific values.

Parameters:

ds (xarray.Dataset) – The input dataset. This function is designed to work on variables from the basestation file that had no dimensions, typically after being processed by split_by_unique_dims.

Returns:

A tuple containing three xarray Datasets:

  • sg_cal (xarray.Dataset): Dataset containing variables starting with ‘sg_cal_’ (originally from sg_calib_constants.m). The variables are renamed to remove the ‘sg_cal_’ prefix, so they can be accessed directly (e.g., sg_cal.hd_a).

  • dc_log (xarray.Dataset): Dataset containing variables starting with ‘log_’. These variables are typically from log files.

  • dc_other (xarray.Dataset): Dataset containing other mission/dive-specific values. This includes depth-averaged currents and other variables like magnetic_variation.

Return type:

tuple[xr.Dataset, xr.Dataset, xr.Dataset]
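The prefix-based categorisation described for this function can be sketched with a plain dictionary standing in for the dimensionless variables. `categorise_variables` is a hypothetical name and the values are invented; the real function returns xarray Datasets.

```python
def categorise_variables(variables):
    """Split a {name: value} mapping into sg_cal, dc_log and dc_other groups."""
    sg_cal, dc_log, dc_other = {}, {}, {}
    for name, value in variables.items():
        if name.startswith("sg_cal_"):
            # Drop the prefix so the entry can be looked up as e.g. "hd_a".
            sg_cal[name[len("sg_cal_"):]] = value
        elif name.startswith("log_"):
            dc_log[name] = value
        else:
            dc_other[name] = value
    return sg_cal, dc_log, dc_other

# Illustrative values only.
scalars = {"sg_cal_hd_a": 0.003, "log_gps_lat": 26.5, "magnetic_variation": -7.2}
sg_cal, dc_log, dc_other = categorise_variables(scalars)
```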

seagliderOG1.convertOG1.get_contributors(ds, values_to_append=None)[source]
seagliderOG1.convertOG1.get_time_attributes(ds)[source]

Extract and clean time-related attributes from the dataset.

Parameters:

ds (xarray.Dataset) – The input dataset containing various attributes.

Returns:

A dictionary containing cleaned time-related attributes.

Return type:

dict

seagliderOG1.convertOG1.process_and_save_data(input_location, save=False, output_dir='.', run_quietly=True)[source]

Processes and saves data from the specified input location.

This function loads and concatenates datasets from the server, converts them to OG1 format, and saves the resulting dataset to a NetCDF file. If the file already exists, the function will prompt the user to decide whether to overwrite it or not.

Parameters:
  • input_location (str) – The location of the input data to be processed.

  • save (bool, optional) – Whether to save the processed dataset to a file. Default is False.

  • output_dir (str, optional) – The directory where the output file will be saved. Default is ‘.’.

  • run_quietly (bool, optional) – If True, suppresses user prompts and assumes ‘no’ for overwriting files. Default is True.

Returns:

The processed dataset.

Return type:

xarray.Dataset

seagliderOG1.convertOG1.process_dataset(ds1_base: Dataset, firstrun: bool = False) tuple[Dataset, list[str], Dataset, Dataset, Dataset][source]

Processes a dataset by performing a series of transformations and extractions.

Parameters:
  • ds1_base (xarray.Dataset) – The input dataset from a basestation file, containing various attributes and variables.

  • firstrun (bool, optional) – Indicates whether this is the first run of the processing pipeline. Default is False.

Returns:

A tuple containing:

  • ds_new (xarray.Dataset): The processed dataset with renamed variables, assigned attributes, converted units, and additional information such as GPS info and dive number.

  • attr_warnings (list[str]): A list of warnings related to attribute assignments.

  • sg_cal (xarray.Dataset): A dataset containing variables starting with ‘sg_cal’.

  • dc_other (xarray.Dataset): A dataset containing other variables not categorized under ‘sg_cal’ or ‘dc_log’.

  • dc_log (xarray.Dataset): A dataset containing variables starting with ‘log_’.

Return type:

tuple

Notes

  • The function performs the following steps:
    1. Handles and splits the inputs:
      • Extracts the dive number from the attributes.

      • Splits the dataset by unique dimensions.

      • Extracts the gps_info from the split dataset.

      • Extracts variables starting with ‘sg_cal’ (originally from sg_calib_constants.m).

    2. Renames the dataset dimensions, coordinates, and variables according to OG1:
      • Extracts and renames dimensions for ‘sg_data_point’ (N_MEASUREMENTS).

      • Renames variables according to the OG1 vocabulary.

      • Assigns variable attributes according to OG1 and logs warnings for conflicts.

      • Converts units in the dataset (e.g., cm/s to m/s) where possible.

      • Converts QC flags to int8.

    3. Adds new variables:
      • Adds GPS info as LATITUDE_GPS, LONGITUDE_GPS, and TIME_GPS (increasing the length of N_MEASUREMENTS).

      • Adds the divenum as a variable of length N_MEASUREMENTS.

      • Adds the PROFILE_NUMBER (odd for dives, even for ascents).

      • Adds the PHASE of the dive (1 for ascent, 2 for descent, 3 for between the first two surface points).

      • Adds the DEPTH_Z with positive up.

    4. Returns the processed dataset, attribute warnings, and categorized datasets.

  • The function sorts the dataset by TIME and may exhibit undesired behavior if there are not two surface GPS fixes before a dive.
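The PROFILE_NUMBER convention in the notes above (odd for dives, even for ascents) can be illustrated with a small sketch. The formula used here, 2n - 1 for the descent of dive n and 2n for its ascent, is an assumption chosen to satisfy that convention; it is not confirmed to be the library's implementation, and `profile_number` is a hypothetical helper.

```python
def profile_number(dive_number, phase):
    """Return an odd number for a descent and an even number for an ascent.

    Uses the PHASE codes from the notes above: 2 for descent, 1 for ascent.
    """
    if phase == 2:
        return 2 * dive_number - 1  # descent of dive n -> odd
    if phase == 1:
        return 2 * dive_number      # ascent of dive n -> even
    raise ValueError(f"unsupported phase: {phase}")

first_descent = profile_number(1, 2)
first_ascent = profile_number(1, 1)
```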

seagliderOG1.convertOG1.standardise_OG10(ds: Dataset, firstrun: bool = False, unit_format: dict[str, str] = {'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degreesCelsius': 'Celsius', 'degrees_Celsius': 'Celsius', 'g/m^3': 'g m-3', 'kg/m^3': 'kg m-3', 'm/s': 'm s-1', 'mS/cm': 'mS cm-1', 'meters': 'm'}) Dataset[source]

Standardize the dataset to OG1 format by renaming dimensions, variables, and assigning attributes.

Parameters:
  • ds (xarray.Dataset) – The input dataset to be standardized.

  • firstrun (bool, optional) – Indicates whether this is the first run of the standardization process. Default is False.

  • unit_format (dict[str, str], optional) – A dictionary mapping unit strings to their standardized format. Default is vocabularies.unit_str_format.

Returns:

The standardized dataset in OG1 format.

Return type:

xarray.Dataset

seagliderOG1.convertOG1.update_dataset_attributes(ds, contrib_to_append)[source]

Updates the attributes of the dataset based on the provided attribute input.

Parameters:
  • ds (xarray.Dataset) – The input dataset whose attributes need to be updated.

  • contrib_to_append (dict[str, str] or None) – A dictionary containing additional contributor information to append. Default is None.

Returns:

A dictionary of ordered attributes with updated values.

Return type:

dict

vocabularies

readers

seagliderOG1.readers.filter_files_by_profile(file_list, start_profile=None, end_profile=None)[source]

Filter a list of files based on start_profile and end_profile. Expects filenames of the form pXXXYYYY.nc, where XXX is the Seaglider serial number and YYYY is the divecycle number, e.g. p0420001.nc for glider 42 and divenum 0001. Note: file_list does not need to be sorted alphabetically.

Parameters:
  • file_list (list) – List of filenames to filter.

  • start_profile (int, optional) – The starting profile number to filter files. Defaults to None.

  • end_profile (int, optional) – The ending profile number to filter files. Defaults to None.

Returns: list: A list of filtered filenames.
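A minimal sketch of this kind of filename filtering, using only the standard library. It assumes that both bounds are inclusive and that files not matching the pXXXYYYY pattern are skipped; neither detail is confirmed by the description above. `filter_by_profile` is a hypothetical stand-in, not the library function.

```python
import re

# Assumed pattern: "p", a 3-digit serial number, then a 4-digit dive number.
_PATTERN = re.compile(r"^p(\d{3})(\d{4})")

def filter_by_profile(file_list, start_profile=None, end_profile=None):
    """Keep files whose dive number lies within [start_profile, end_profile]."""
    selected = []
    for name in file_list:
        match = _PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        dive = int(match.group(2))
        if start_profile is not None and dive < start_profile:
            continue
        if end_profile is not None and dive > end_profile:
            continue
        selected.append(name)
    return selected

files = ["p0420003.nc", "p0420001.nc", "notes.txt", "p0420002.nc"]
subset = filter_by_profile(files, start_profile=2, end_profile=3)
```

The input order is preserved, so an unsorted list stays unsorted in the result.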

seagliderOG1.readers.list_files(source, registry_loc='seagliderOG1', registry_name='seaglider_registry.txt')[source]

List files from a given source, which can be either a URL or a directory path. For an online source, uses BeautifulSoup and requests.

Parameters: source (str): The source from which to list files. It can be a URL (starting with “http://” or “https://”) or a local directory path.

Returns: list: A list of filenames available in the specified source, sorted alphabetically.

Raises: ValueError: If the source is neither a valid URL nor a directory path.
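The URL-versus-directory dispatch described above might look roughly like the following. The remote branch is deliberately left out (the documented function uses BeautifulSoup and requests for it), and `list_local_or_remote` is a hypothetical name used only for this sketch.

```python
import os
import tempfile

def list_local_or_remote(source):
    """Return a sorted listing for a local directory; reject anything else."""
    if source.startswith(("http://", "https://")):
        # Remote listing is omitted from this sketch.
        raise NotImplementedError("remote listing omitted from this sketch")
    if os.path.isdir(source):
        return sorted(os.listdir(source))
    raise ValueError("Source must be a valid URL or a directory path.")

# Demonstrate on a temporary directory holding two empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("p0420002.nc", "p0420001.nc"):
        open(os.path.join(tmp, name), "w").close()
    listing = list_local_or_remote(tmp)
```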

seagliderOG1.readers.load_basestation_files(source, start_profile=None, end_profile=None)[source]

Load datasets from either an online source or a local directory, optionally filtering by profile range.

Parameters:
  • source (str) – The URL to the directory containing the NetCDF files or the path to the local directory.

  • start_profile (int, optional) – The starting profile number to filter files. Defaults to None.

  • end_profile (int, optional) – The ending profile number to filter files. Defaults to None.

Returns: A list of xarray.Dataset objects loaded from the filtered NetCDF files.

seagliderOG1.readers.load_first_basestation_file(source)[source]

Load the first dataset from either an online source or a local directory.

Parameters: source (str): The URL to the directory containing the NetCDF files or the path to the local directory.

Returns: An xarray.Dataset object loaded from the first NetCDF file.

seagliderOG1.readers.load_sample_dataset(dataset_name='p0330015_20100906.nc')[source]

Download a sample dataset for use with seagliderOG1.

Parameters:

dataset_name (str, optional) – Name of the sample dataset to download. Defaults to “p0330015_20100906.nc”.

Raises:

ValueError – If the requested dataset is not known.

Returns:

Requested sample dataset

Return type:

xarray.Dataset

plotters

seagliderOG1.plotters.plot_ctd_depth_vs_time(ds, start_traj=None, end_traj=None)[source]

Plots CTD depth vs time, optionally filtered by trajectory range, and highlights non-NaN GPS latitude values.

Parameters:
  • ds (xr.Dataset) – The input dataset containing ‘ctd_time’, ‘ctd_depth’, and ‘gps_lat’.

  • start_traj (int, optional) – The starting trajectory number to filter the data. Default is None.

  • end_traj (int, optional) – The ending trajectory number to filter the data. Default is None.

seagliderOG1.plotters.plot_depth_colored(data, color_by=None, start_dive=None, end_dive=None)[source]

Plots depth as a function of time, optionally colored by another variable, and filtered by dive number.

Parameters:
  • data (pd.DataFrame or xr.Dataset) – The input data containing ‘ctd_depth’ and ‘ctd_time’.

  • color_by (str, optional) – The variable to color the plot by. Default is None.

  • start_dive (int, optional) – The starting dive number to filter the data. Default is None.

  • end_dive (int, optional) – The ending dive number to filter the data. Default is None.

seagliderOG1.plotters.plot_profile_depth(data)[source]

Plots the profile depth (ctd_depth) as a function of time (ctd_time). Reduces the total number of points to be less than 100,000.

Parameters: data (pd.DataFrame or xr.Dataset): The input data containing ‘ctd_depth’ and ‘ctd_time’.
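One simple way to keep a plot below a point budget is to take every k-th sample. The sketch below assumes stride-based thinning, which may differ from how plot_profile_depth actually reduces the data; `thin` is a hypothetical helper.

```python
def thin(values, max_points=100_000):
    """Keep every k-th value so that fewer than max_points remain."""
    n = len(values)
    # Ceiling division by (max_points - 1) guarantees a result strictly
    # shorter than max_points; the stride is never smaller than 1.
    step = max(1, -(-n // (max_points - 1)))
    return values[::step]

depth = list(range(250_000))
thinned = thin(depth)
```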

seagliderOG1.plotters.show_attributes(data)[source]

Processes an xarray Dataset or a netCDF file, extracts attribute information, and returns a DataFrame with details about the attributes.

Parameters: data (str or xr.Dataset): The input data, either a file path to a netCDF file or an xarray Dataset.

Returns: pandas.DataFrame: A DataFrame containing the following columns:

  • Attribute: The name of the attribute.

  • Value: The value of the attribute.

seagliderOG1.plotters.show_contents(data, content_type='variables')[source]

Wrapper function to show contents of an xarray Dataset or a netCDF file.

Parameters:
  • data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.

  • content_type (str) – The type of content to show, either ‘variables’ (or ‘vars’) or ‘attributes’ (or ‘attrs’). Default is ‘variables’.

Returns: pandas.io.formats.style.Styler or pandas.DataFrame: A styled DataFrame with details about the variables or attributes.

seagliderOG1.plotters.show_variables(data)[source]

Processes an xarray Dataset or a netCDF file, extracts variable information, and returns a styled DataFrame with details about the variables.

Parameters: data (str or xr.Dataset): The input data, either a file path to a netCDF file or an xarray Dataset.

Returns: pandas.io.formats.style.Styler: A styled DataFrame containing the following columns:

  • dims: The dimension of the variable (or “string” if it is a string type).

  • name: The name of the variable.

  • units: The units of the variable (if available).

  • comment: Any additional comments about the variable (if available).

seagliderOG1.plotters.show_variables_by_dimension(data, dimension_name='trajectory')[source]

Processes an xarray Dataset or a netCDF file, extracts variable information, and returns a styled DataFrame with details about the variables filtered by a specific dimension.

Parameters:
  • data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.

  • dimension_name (str) – The name of the dimension to filter variables by.

Returns: pandas.io.formats.style.Styler: A styled DataFrame containing the following columns:

  • dims: The dimension of the variable (or “string” if it is a string type).

  • name: The name of the variable.

  • units: The units of the variable (if available).

  • comment: Any additional comments about the variable (if available).

writers

seagliderOG1.writers.save_dataset(ds, output_file='../test.nc')[source]

Attempts to save the dataset to a NetCDF file. If a TypeError occurs due to invalid attribute values, it converts the invalid attributes to strings and retries the save operation.

Parameters:
  • ds (xarray.Dataset)

  • output_file (str)

Returns:

True if the dataset was saved successfully, False otherwise.

Return type:

bool

Notes

Based on https://github.com/pydata/xarray/issues/3743
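The save-then-retry pattern described above can be sketched independently of NetCDF. Here `strict_write` is a made-up stand-in for a writer that rejects some attribute values, and `save_with_retry` is a hypothetical helper; the set of value types treated as valid is an assumption for the example.

```python
written = {}

def strict_write(attrs):
    """Stand-in writer that rejects None and bool attribute values."""
    for key, value in attrs.items():
        if value is None or isinstance(value, bool):
            raise TypeError(f"invalid value for attribute {key!r}")
    written.update(attrs)

def save_with_retry(attrs, write):
    """Call write(attrs); on TypeError, stringify invalid attributes and retry once."""
    try:
        write(attrs)
        return True
    except TypeError:
        allowed = (str, int, float, list, tuple)
        cleaned = {
            key: value
            if isinstance(value, allowed) and not isinstance(value, bool)
            else str(value)
            for key, value in attrs.items()
        }
        try:
            write(cleaned)
            return True
        except TypeError:
            return False

ok = save_with_retry({"title": "demo", "comment": None}, strict_write)
```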

tools

seagliderOG1.tools.add_dive_number(ds, dive_number=None)[source]

Add the dive number as a variable to the dataset. Assumes the dive number is present in the basestation attributes.

Parameters: ds (xarray.Dataset): The dataset to which the dive number will be added.

Returns: xarray.Dataset: The dataset with the dive number added.

seagliderOG1.tools.add_sensor_to_dataset(dsa, ds, sg_cal, firstrun=False)[source]
seagliderOG1.tools.assign_phase(ds)[source]

Adds new variables ‘PHASE’ and ‘PHASE_QC’ to the dataset ds, which indicate the phase of each measurement. The phase is determined from the pressure readings (‘PRES’) for each unique dive number (‘dive_num’).

Note: In this formulation, we are only separating into dives and climbs based on when the glider is at the maximum depth. Future work needs to separate out the other phases (https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/phase.md) and generate a PHASE_QC.

Parameters:

ds (xarray.Dataset)

Returns:

The dataset with additional ‘PHASE’ and ‘PHASE_QC’ variables, where:

  • ‘PHASE’ indicates the phase of each measurement:
    • Phase 2 is assigned to measurements up to and including the maximum pressure point.

    • Phase 1 is assigned to measurements after the maximum pressure point.

  • ‘PHASE_QC’ is an additional variable with no QC applied.

Return type:

xarray.Dataset
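For a single dive, the phase rule can be sketched on a plain list of pressures. `phase_for_dive` is a hypothetical helper; it assumes the first occurrence of the maximum pressure marks the turning point and ignores NaN handling, which the real function may treat differently.

```python
def phase_for_dive(pressures):
    """Return one phase code per pressure sample for a single dive.

    2 = up to and including the maximum pressure point (descent),
    1 = after the maximum pressure point (ascent).
    """
    if not pressures:
        return []
    i_max = pressures.index(max(pressures))  # first occurrence of the maximum
    return [2 if i <= i_max else 1 for i in range(len(pressures))]

pres = [1.0, 50.0, 120.0, 80.0, 5.0]
phase = phase_for_dive(pres)
```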

seagliderOG1.tools.assign_profile_number(ds, ds1)[source]
seagliderOG1.tools.calc_Z(ds)[source]

Calculate the depth (Z position) of the glider using the gsw library to convert pressure to depth.

Parameters:

ds (xarray.Dataset)

Returns:

The dataset with an additional ‘DEPTH’ variable.

Return type:

xarray.Dataset

seagliderOG1.tools.convert_qc_flags(dsa, qc_name)[source]
seagliderOG1.tools.convert_units(ds)[source]

Convert the units of variables in an xarray Dataset to preferred units. This is useful, for instance, to convert cm/s to m/s.

Parameters:

ds (xarray.Dataset)

Returns:

The dataset with converted units.

Return type:

xarray.Dataset

seagliderOG1.tools.convert_units_var(var_values, current_unit, new_unit, unit1_to_unit2={'Celsius_to_degreesCelsius': {'current_unit': 'Celsius', 'factor': 1, 'new_unit': 'degreesCelsius'}, 'Pa_to_dbar': {'current_unit': 'Pa', 'factor': 0.0001, 'new_unit': 'dbar'}, 'S m-1_to_mS cm-1': {'current_unit': 'S m-1', 'factor': 0.1, 'new_unit': 'mS cm-1'}, 'S/m_to_mS/cm': {'current_unit': 'S/m', 'factor': 0.1, 'new_unit': 'mS/cm'}, 'cm s-1_to_m s-1': {'current_unit': 'cm s-1', 'factor': 0.01, 'new_unit': 'm s-1'}, 'cm/s_to_m/s': {'current_unit': 'cm/s', 'factor': 0.01, 'new_unit': 'm/s'}, 'cm_to_m': {'current_unit': 'cm', 'factor': 0.01, 'new_unit': 'm'}, 'dbar_to_Pa': {'current_unit': 'dbar', 'factor': 10000, 'new_unit': 'Pa'}, 'dbar_to_kPa': {'current_unit': 'dbar', 'factor': 10, 'new_unit': 'kPa'}, 'degreesCelsius_to_Celsius': {'current_unit': 'degreesCelsius', 'factor': 1, 'new_unit': 'Celsius'}, 'g m-3_to_kg m-3': {'current_unit': 'g m-3', 'factor': 0.001, 'new_unit': 'kg m-3'}, 'g/m^3_to_kg/m^3': {'current_unit': 'g/m3', 'factor': 0.001, 'new_unit': 'kg/m3'}, 'kg m-3_to_g m-3': {'current_unit': 'kg m-3', 'factor': 1000, 'new_unit': 'g m-3'}, 'kg/m^3_to_g/m^3': {'current_unit': 'kg/m3', 'factor': 1000, 'new_unit': 'g/m3'}, 'km_to_m': {'current_unit': 'km', 'factor': 1000, 'new_unit': 'm'}, 'm s-1_to_cm s-1': {'current_unit': 'm s-1', 'factor': 100, 'new_unit': 'cm s-1'}, 'm/s_to_cm/s': {'current_unit': 'm/s', 'factor': 100, 'new_unit': 'cm/s'}, 'mS cm-1_to_S m-1': {'current_unit': 'mS cm-1', 'factor': 10, 'new_unit': 'S m-1'}, 'mS/cm_to_S/m': {'current_unit': 'mS/cm', 'factor': 10, 'new_unit': 'S/m'}, 'm_to_cm': {'current_unit': 'm', 'factor': 100, 'new_unit': 'cm'}, 'm_to_km': {'current_unit': 'm', 'factor': 0.001, 'new_unit': 'km'}}, firstrun=False)[source]

Convert the units of variables in an xarray Dataset to preferred units. This is useful, for instance, to convert cm/s to m/s.

Parameters:
  • ds (xarray.Dataset)

  • preferred_units (list)

  • unit1_to_unit2 (dict) – Each key is a unit string, and each value is a dictionary with:

    • ‘factor’: The factor to multiply the variable by to convert it.

    • ‘new_unit’: The new unit name after conversion.

Returns:

The dataset with converted units.

Return type:

xarray.Dataset
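A table-driven conversion of this kind can be sketched as follows. The table is a small subset of the default unit1_to_unit2 dictionary shown in the signature above. Building the lookup key as "<current>_to_<new>" and returning the values unchanged when no entry exists are assumptions made for this sketch, and `convert_values` is a hypothetical name.

```python
# Subset of the default conversion table from the signature above.
UNIT1_TO_UNIT2 = {
    "cm/s_to_m/s": {"current_unit": "cm/s", "factor": 0.01, "new_unit": "m/s"},
    "dbar_to_kPa": {"current_unit": "dbar", "factor": 10, "new_unit": "kPa"},
    "m_to_km": {"current_unit": "m", "factor": 0.001, "new_unit": "km"},
}

def convert_values(values, current_unit, new_unit, table=UNIT1_TO_UNIT2):
    """Multiply values by the tabulated factor; return (values, unit)."""
    key = f"{current_unit}_to_{new_unit}"
    if key not in table:
        # Assumption: without a known conversion, leave the data untouched.
        return list(values), current_unit
    entry = table[key]
    return [v * entry["factor"] for v in values], entry["new_unit"]

pressures, pressure_unit = convert_values([5, 100], "dbar", "kPa")
speeds, speed_unit = convert_values([10, 25], "cm/s", "m/s")
```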

seagliderOG1.tools.encode_times(ds)[source]
seagliderOG1.tools.encode_times_og1(ds)[source]
seagliderOG1.tools.find_best_dtype(var_name, da)[source]
seagliderOG1.tools.gather_sensor_info(ds_other, ds_sgcal, firstrun=False)[source]

Gathers sensor information from the provided datasets and organizes it into a new dataset.

Parameters:
  • ds_other (xarray.Dataset) – The dataset containing sensor data.

  • ds_sgcal (xarray.Dataset) – The dataset containing calibration data.

  • firstrun (bool, optional) – A flag indicating if this is the first run. Defaults to False.

Returns:

A dataset containing the gathered sensor information.

Return type:

xarray.Dataset

Notes

  • The function looks for specific sensor names in the ds_other dataset and adds them to a new dataset ds_sensor.

  • If ‘aanderaa4330_instrument_dissolved_oxygen’ is present in ds_other, it is renamed to ‘aa4330’.

  • If ‘Pcor’ is present in ds_sgcal, an additional sensor ‘sbe43’ is created based on ‘sbe41’ with specific attributes.

  • If ‘optode_FoilCoefA1’ is present in ds_sgcal, an additional sensor ‘aa4831’ is created based on ‘sbe41’ with specific attributes.

  • The function sets appropriate attributes for the sensors ‘aa4330’, ‘aa4831’, and ‘sbe43’ if they are present.

seagliderOG1.tools.get_sg_attrs(ds)[source]
seagliderOG1.tools.reformat_units_str(old_unit, unit_format={'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degreesCelsius': 'Celsius', 'degrees_Celsius': 'Celsius', 'g/m^3': 'g m-3', 'kg/m^3': 'kg m-3', 'm/s': 'm s-1', 'mS/cm': 'mS cm-1', 'meters': 'm'})[source]
seagliderOG1.tools.reformat_units_var(ds, var_name, unit_format={'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degreesCelsius': 'Celsius', 'degrees_Celsius': 'Celsius', 'g/m^3': 'g m-3', 'kg/m^3': 'kg m-3', 'm/s': 'm s-1', 'mS/cm': 'mS cm-1', 'meters': 'm'})[source]

Renames units in the dataset based on the provided dictionary for OG1.

Parameters:
  • ds (xarray.Dataset)

  • unit_format (dict)

Returns:

The dataset with renamed units.

Return type:

xarray.Dataset
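Renaming a unit string through a lookup dictionary can be sketched in a few lines. The mapping below is a subset of the default unit_format shown in the signature above. Leaving unknown unit strings unchanged is an assumption, and `reformat_unit` is a hypothetical name.

```python
# Subset of the default unit_format mapping from the signature above.
UNIT_FORMAT = {
    "S/m": "S m-1",
    "cm/s": "cm s-1",
    "degrees_Celsius": "Celsius",
    "kg/m^3": "kg m-3",
    "meters": "m",
}

def reformat_unit(old_unit, unit_format=UNIT_FORMAT):
    """Return the standardised unit string, or the input if it is not listed."""
    return unit_format.get(old_unit, old_unit)

new_unit = reformat_unit("cm/s")
```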

seagliderOG1.tools.set_best_dtype(ds)[source]
seagliderOG1.tools.set_best_dtype_value(value, var_name)[source]

Determines the best data type for a single value based on its variable name and converts it.

Parameters:

value (any) – The input value to convert.

Returns:

converted_value – The value converted to the best data type.

Return type:

any

seagliderOG1.tools.set_fill_value(new_dtype)[source]
seagliderOG1.tools.split_by_unique_dims(ds)[source]

Splits an xarray dataset into multiple datasets based on the unique set of dimensions of the variables.

Parameters: ds (xarray.Dataset): The input xarray dataset containing various variables.

Returns: tuple: A tuple containing xarray datasets, each with variables sharing the same set of dimensions.
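The grouping idea can be sketched with a mapping from variable name to its dimension tuple. The variable-to-dimension assignments below are illustrative only, and `group_by_dims` is a hypothetical helper; the documented function returns xarray datasets rather than lists of names.

```python
from collections import defaultdict

def group_by_dims(var_dims):
    """Group variable names by their exact tuple of dimension names."""
    groups = defaultdict(list)
    for name, dims in var_dims.items():
        groups[tuple(dims)].append(name)
    return dict(groups)

# Illustrative mapping; real basestation files contain many more variables.
var_dims = {
    "ctd_depth": ("sg_data_point",),
    "ctd_time": ("sg_data_point",),
    "log_gps_lat": ("gps_info",),
    "magnetic_variation": (),
}
groups = group_by_dims(var_dims)
```

Variables with no dimensions end up under the empty tuple, which corresponds to the dimensionless group that extract_variables then categorises further.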

utilities