datamodel Reference

Generate

class datamodel.generate.datamodel.DataModel(tree_ver: str = None, file_spec: str = None, path: str = None, keywords: list = [], env_label: str = None, location: str = None, verbose: bool = None, release: str = None, filename: str = None, access_path_name: str = None, design: bool = False, science_product: bool = None)[source]

Bases: object

Class to enable datamodel file generation for a given product

This class is used to generate valid SDSS datamodel files for a given data product.

Parameters:
  • tree_ver (str, optional) – an SDSS Tree configuration name, by default None

  • file_spec (str, optional) – The name of the file species (or sdss_access path name), by default None

  • path (str, optional) – A file path template definition, by default None

  • keywords (list, optional) – A list of path template keyword-value pairs, by default None

  • env_label (str, optional) – The environment variable name of the file’s location, by default None

  • location (str, optional) – A path location relative to the environment variable, by default None

  • verbose (bool, optional) – If True, turn on verbosity logging, by default None

  • release (str, optional) – The name of the SDSS release the file is a part of, by default None

  • filename (str, optional) – A full filepath to a real file on disk to create the datamodel for

  • access_path_name (str, optional) – A name of the path name in sdss_access, if different than the file species name, by default None

  • design (bool, optional) – If True, indicates the datamodel is in a design phase, by default False

  • science_product (bool, optional) – If True, indicates the datamodel is a recommended science product, by default None

Raises:
  • ValueError – when neither a path nor a (env_label + location) are specified

  • ValueError – when no path template keywords are specified
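
A minimal usage sketch (illustrative only): the file species name and keyword values are hypothetical, the location template is borrowed from the parse examples later in this reference, and the exact form of the keywords list (here "key=value" strings) is an assumption.

>>> from datamodel.generate.datamodel import DataModel
>>> # describe a file species via an environment label and a relative location template
>>> dm = DataModel(
...     tree_ver="sdsswork",
...     file_spec="sdR",
...     env_label="BOSS_SPECTRO_DATA",
...     location="{mjd}/sdR-{br}{id}-{frame}.fits.gz",
...     keywords=["mjd=55049", "br=b", "id=1", "frame=00100006"],
... )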

commit_stubs(format: str = None) None[source]

Commit the stub files into git

Commit stub files into git. Performs a git pull, commits all stubs into the repo, and attempts a git push. Optionally specify a format to only commit a specific stub.

Parameters:

format (str, optional) – A stub format to commit, by default None

design_hdf(name: str = '/', description: str = None, hdftype: str = 'group', attrs=None, ds_shape: tuple = None, ds_dtype: str = None)[source]

Wrapper to _design_content, to design a new HDF5 section

Design a new HDF entry for the given datamodel. Specify h5py group or dataset definitions, with an optional list of attributes. Each new entry is added to the members entry in the YAML structure. Use name and description to specify the name and description of each new group or dataset. Use hdftype to specify a “group” or “dataset” entry. For datasets, use ds_shape and ds_dtype to specify the shape and dtype of the array dataset.

New HDF5 members are added to the datamodel in a flattened structure. To add a new group or dataset as a child of an existing group, specify the full path in name, e.g. /mygroup/mydataset.

attrs can be a list of tuples of header keywords, conforming to (key, value, comment, dtype), or list of dictionaries conforming to {“key”: key, “value”: value, “comment”: comment, “dtype”: dtype}.

Allowed attribute or dataset dtypes are any valid string representation of numpy dtypes. For example, “<i8”, “int32”, “S10”, etc.

Parameters:
  • name (str, optional) – the name of the HDF group or dataset, by default ‘/’

  • description (str, optional) – a description of the HDF group or dataset, by default None

  • hdftype (str, optional) – the type of HDF5 object, by default ‘group’

  • attrs (list, optional) – a list of HDF5 Attributes, by default None

  • ds_shape (tuple, optional) – the shape of an HDF5 array dataset, by default None

  • ds_dtype (str, optional) – the dtype of an HDF5 array dataset, by default None

Raises:

ValueError – when an invalid hdftype is specified
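
A sketch of designing HDF5 content, assuming dm is a DataModel created with design=True for an .h5 file species; the group/dataset names and attributes are illustrative.

>>> # describe the root group with a single attribute
>>> dm.design_hdf(name="/", description="the file root group", hdftype="group",
...               attrs=[("sdss_id", 12345, "an example identifier", "int32")])
>>> # add an array dataset as a child of the root group
>>> dm.design_hdf(name="/flux", description="an array of fluxes",
...               hdftype="dataset", ds_shape=(100, 4), ds_dtype="<f8")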

design_hdu(ext: str = 'primary', extno: int = None, name: str = 'EXAMPLE', description: str = None, header: list | dict | Header = None, columns: List[list | dict | Column] = None, **kwargs)[source]

Wrapper to _design_content, to design a new HDU

Design a new astropy HDU for the given datamodel. Specify the extension type ext to indicate a PRIMARY, IMAGE, or BINTABLE HDU extension. Each new HDU is added to the YAML structure using the next available HDU extension number, or the one provided with extno. Use name to specify the name of the HDU extension. Each call to this method writes out the new HDU to the YAML design file.

header can be a Header instance, a list of tuples of header keywords, conforming to (keyword, value, comment), or list of dictionaries conforming to {“keyword”: keyword, “value”: value, “comment”: comment}.

columns can be a list of Column objects, a list of tuples minimally conforming to (name, format, unit), or a list of dictionaries minimally conforming to {“name”: name, “format”: format, “unit”: unit}. See Astropy’s Binary Table Column Format for the allowed format values. When supplying a list of tuples or dictionaries, you can include any number of valid Column arguments.

Parameters:
  • ext (str, optional) – the type of HDU to create, by default ‘primary’

  • extno (int, optional) – the extension number, by default None

  • name (str, optional) – the name of the HDU extension, by default ‘EXAMPLE’

  • description (str, optional) – a description for the HDU, by default None

  • header (Union[list, dict, fits.Header], optional) – valid input to create a Header, by default None

  • columns (List[Union[list, dict, fits.Column]], optional) – a list of binary table columns, by default None

  • force (bool) – If True, forces a new design even if the HDU already exists, by default None

  • **kwargs – additional keyword arguments to pass to the HDU constructor

Raises:
  • ValueError – when the ext type is not supported

  • ValueError – when the table columns input is not a list
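
A sketch of designing HDUs, assuming dm is a DataModel created with design=True for a FITS file species; the header keywords and column definitions are illustrative.

>>> # a primary HDU with a single header keyword
>>> dm.design_hdu(ext="primary", description="the primary header",
...               header=[("TELESCOP", "SDSS 2.5-M", "the telescope name")])
>>> # a binary table HDU with three columns
>>> dm.design_hdu(ext="bintable", extno=1, name="CATALOG",
...               description="a table of objects",
...               columns=[("objid", "J", ""), ("ra", "D", "deg"), ("dec", "D", "deg")])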

design_par(comments: str = None, header: list | dict = None, name: str = None, description: str = None, columns: list = None)[source]

Wrapper to _design_content, to design a new Yanny par section

Design a new Yanny par for the given datamodel. Specify Yanny comments, a header section, or a table definition. Each new table is added to the YAML structure. Use name and description to specify the name and description of the new table. comments can be a single string of comments, with newlines indicated by “\n”.

header can be a dictionary of key-value pairs, a list of tuples of header keywords, conforming to (keyword, value, comment), or list of dictionaries conforming to {“key”: key, “value”: value, “comment”: comment}.

The columns parameter defines the relevant table columns to add to the file. It can be a list of column names, a list of tuple values conforming to column (name, type, [description]), or a list of dictionaries with keys defined from the complete column yaml definition.

Allowed column types are any valid Yanny par types, input as strings, e.g. “int”, “float”, “char”. Array columns can be specified by including the array size in “[]”, e.g. “float[6]”.

Parameters:
  • comments (str, optional) – Any comments to add to the file, by default None

  • header (Union[list, dict], optional) – Keywords to add to the header of the Yanny file, by default None

  • name (str, optional) – The name of the parameter table

  • description (str, optional) – A description of the parameter table

  • columns (list, optional) – A set of Yanny table column definitions
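
A sketch of designing a Yanny par section, assuming dm is a DataModel created with design=True for a .par file species; the comments, header, and columns are illustrative.

>>> dm.design_par(comments="a demonstration parameter file\ncreated during design",
...               header=[("mjd", 59500, "the MJD of the observations")],
...               name="TARGET", description="a table of targets",
...               columns=[("ra", "float", "right ascension"),
...                        ("dec", "float", "declination"),
...                        ("mags", "float[5]", "ugriz magnitudes")])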

determine_survey(name_only: bool = False)[source]

Attempt to determine the SDSS survey for this datamodel

classmethod from_file(filename: str, path_name: str = None, tree_ver: str = None, verbose: bool = None) D[source]

class method to create a datamodel from an absolute filepath

Creates a DataModel for a given full path to a file. Prompts the user to verify any existing entry in sdss_access for the input file, or to define a new file_species / path_name, symbolic path location, and example variable_name=value key mappings.

Parameters:
  • filename (str) – The full path to the file

  • path_name (str, optional) – The existing sdss_access path name if any, by default None

  • tree_ver (str, optional) – The SDSS tree version or release associated with the file, by default None

  • verbose (bool, optional) – If True, creates the DataModel with verbosity, by default None

Returns:

DataModel – a SDSS DataModel instance
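
A sketch of creating a datamodel directly from a file on disk; the filepath is illustrative, and the call may prompt interactively as described above.

>>> from datamodel.generate.datamodel import DataModel
>>> dm = DataModel.from_file("/sas/sdsswork/data/boss/55049/sdR-b1-00100006.fits.gz")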

classmethod from_yaml(species: str, release: str = None, verbose: bool = None, tree_ver: str = None) D[source]

class method to create a datamodel from a YAML file species name

Creates a DataModel for a given file species name, from an existing YAML datamodel file. Extracts the abstract path and keyword arguments needed to instantiate a DataModel. Keywords are extracted using the datamodel “location” and “example” fields. The abstract path is extracted from the pre-existing “access_string” field. Fields are pulled from the specified release. If no release is specified, it uses the first release it can find in the datamodel. A tree config version can optionally be specified instead, for cases where the WORK release comes from the sdss5 config rather than sdsswork. If tree_ver is set, it supersedes the release keyword.

Parameters:
  • species (str) – the file species datamodel name

  • release (str, optional) – the SDSS release, by default None

  • verbose (bool, optional) – if True, turn on verbosity, by default None

  • tree_ver (str, optional) – the SDSS tree config version, by default None

Returns:

DataModel – a SDSS DataModel instance

Raises:
  • ValueError – when no yaml file can be found for the file species

  • ValueError – when no release can be found in the datamodel

  • ValueError – when no location or example can be found in the datamodel

  • ValueError – when no path keyword arguments can be extracted
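
A sketch of re-creating a datamodel from its existing YAML file; the file species name and release are illustrative.

>>> from datamodel.generate.datamodel import DataModel
>>> dm = DataModel.from_yaml("sdR", release="DR17")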

generate_designed_file(redesign: bool = None, **kwargs)[source]

Generate a file from a designed datamodel

Generates a real file on disk from a designed datamodel. If there are any path template keywords, they must be specified here as input keyword arguments to convert the symbolic path / abstract location to a real example location on disk. After generating the file, the datamodel sets design to False and exits design mode.

Parameters:
  • redesign (bool) – If True, re-enters design mode to create a new file

  • kwargs – Any path keyword arguments to be filled in

Raises:
  • KeyError – when there are missing path keywords

  • AttributeError – when the release is not WORK when in the datamodel design phase
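
A sketch of generating the real file from a designed datamodel; the path keyword values are illustrative and must match the keywords in the path template.

>>> dm.generate_designed_file(mjd=55049, br="b", id=1, frame="00100006")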

get_stub(format: str = 'yaml') BaseStub[source]

Get a datamodel Stub

Return a datamodel Stub for a given format.

Parameters:

format (str, optional) – the stub format to return, by default ‘yaml’

Returns:

BaseStub – an instance of a stub class

remove_stubs(format: str = None, git: bool = None) None[source]

Remove the stub files

Remove all stubs or a stub of a given format.

Parameters:
  • format (str, optional) – A stub format to remove, by default None

  • git (bool, optional) – If True, removes from the git repo

write_stubs(format: str = None, force: bool = None, use_cache_release: str = None, full_cache: bool = None, group: str = 'WORK', force_release: str = None) None[source]

Write out the stub files

Write out all stubs or a stub of a given format.

Parameters:
  • format (str, optional) – A stub format to write out, by default None

  • force (bool, optional) – If True, forces a rewrite of the entire cached stub content

  • force_release (str, optional) – A specific release to force a rewrite in the cache

  • use_cache_release (str, optional) – Specify a cached release to use to copy over custom user content

  • full_cache (bool, optional) – If True, use the entire cached YAML release, rather than only the HDUs, by default None

  • group (str, optional) – The release group to use when writing the markdown file, by default “WORK”. Can be “DR”, or “IPL”.
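
A sketch of the typical stub workflow, assuming dm is a populated DataModel.

>>> dm.write_stubs()                # write all stub formats (yaml, md, json, access)
>>> dm.write_stubs(format="yaml")   # or write only the YAML stub
>>> stub = dm.get_stub("md")        # retrieve a single stub instance
>>> dm.commit_stubs()               # pull, commit, and push the stubs to git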

property file_exists

Checks for file existence on disk

property recommended_science_product: bool

Checks if the datamodel product is a recommended science product

supported_filetypes = ['.fits', '.fit', '.par', '.h5', '.hdf5']
property survey: str

Get the SDSS survey for this datamodel

property vac: bool

Checks if the datamodel product is a VAC, based on its envvar label

datamodel.generate.datamodel.prompt_for_access(filename: str, path_name: str = None, config: str = None) tuple[source]

Prompt the user to verify or define information

Takes the user through a variety of input prompts in order to verify any existing entry in sdss_access, or to define a new file species, symbolic path location, and example variable=value key mappings for the input file.

Parameters:
  • filename (str) – The full path to the file

  • path_name (str, optional) – Any existing sdss_access path_name / file_species, by default None

  • config (str, optional) – What tree config version, or release the file corresponds to, by default None

Returns:

tuple – a tuple of path_name, path_template, path_keys

datamodel.generate.parse.cleanup_dups(kwargs: dict) dict[source]

Cleanup duplicate keys in the extracted keywords

Removes duplicated keywords from the extracted kwargs. If both key values are the same, that value is used. If both are digits, attempts to remove any leading zero-padding, e.g. “45” and “000045” -> “45”.

Parameters:

kwargs (dict) – the input extracted keywords

Returns:

dict – reduced keyword dictionary

datamodel.generate.parse.deduplicate(value: str, names: list) str[source]

De-duplicate regex pattern field names

Some paths have duplicate field names, e.g. “run”. The default regex named group replace fails with duplicate field names. To handle this we append each duplicate field name with “_” so the re.groupdict method can work properly.

Parameters:
  • value (str) – the input regex search pattern

  • names (list) – a list of path field names

Returns:

str – the new regex search pattern

datamodel.generate.parse.find_kwargs(location: str, example: str) dict[source]

Find and extract keyword arguments

Attempts to extract keyword arguments from an input abstract datamodel path location and its example path. The location and example parts must match exactly. For example, given “{mjd}/sdR-{br}{id}-{frame}.fits.gz” and “55049/sdR-b1-00100006.fits.gz”, it returns {‘mjd’: ‘55049’, ‘br’: ‘b’, ‘id’: ‘1’, ‘frame’: ‘00100006’}.

Parameters:
  • location (str) – a datamodel abstract location

  • example (str) – a datamodel example location

Returns:

dict – any extracted keyword arguments
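
The example from the description above, as a doctest-style sketch.

>>> from datamodel.generate.parse import find_kwargs
>>> find_kwargs("{mjd}/sdR-{br}{id}-{frame}.fits.gz", "55049/sdR-b1-00100006.fits.gz")
{'mjd': '55049', 'br': 'b', 'id': '1', 'frame': '00100006'}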

datamodel.generate.parse.get_abstract_key(key: str = None, add_brackets: bool = None) str[source]

Sanitize the path keyword name

Sanitizes the path keyword name. Upper-cases the keyword name and appends any formatting numbers as an integer to the end of the name. E.g. “plate:0>5” is converted to “PLATE5”.

Parameters:

key (str, optional) – The keyword name, by default None

Returns:

str – the sanitized keyword name

datamodel.generate.parse.get_abstract_path(path: str = None, add_brackets: bool = None) str[source]

Converts a path template into an abstract path

Converts a path template into an abstract path. Extracts bracketed keywords from a path template and converts them to uppercase names. For example, MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits.gz is converted to MANGA_SPECTRO_REDUX/DRPVER/PLATE/stack/manga-PLATE-IFU-WAVECUBE.fits.gz.

Parameters:

path (str, optional) – the path template, by default None

Returns:

str – the abstracted path
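
The conversion from the description above, as a doctest-style sketch.

>>> from datamodel.generate.parse import get_abstract_path
>>> get_abstract_path("MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits.gz")
'MANGA_SPECTRO_REDUX/DRPVER/PLATE/stack/manga-PLATE-IFU-WAVECUBE.fits.gz'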

datamodel.generate.parse.get_file_spec(file_spec: str = None) str[source]

Checks validity of file species string

Checks if the file species name is a valid Python identifier.

Parameters:

file_spec (str, optional) – the name of the file species, by default None

Returns:

str – the name of the file species

datamodel.generate.parse.remap_patterns(value: str) str[source]

Remaps regex search patterns for certain fields

Some paths have abutted keywords, e.g. “{br}{id}” or “{dr}{version}”. The default regex search pattern of “.+?” cannot always handle these. We replace certain fields with specific patterns to help the extraction process.

Parameters:

value (str) – the input regex search pattern

Returns:

str – the new regex search pattern

class datamodel.generate.stub.AccessStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: BaseStub

cacheable: bool = False
format: str = 'access'
has_template: bool = False
class datamodel.generate.stub.BaseStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: ABC

add_datamodel(datamodel)[source]
commit_to_git() None[source]

Commit the stub to Github

classmethod from_datamodel(datamodel)[source]
push_to_git() None[source]

Push changes to Github

remove_from_git() None[source]

Remove file from the git repo

remove_output() None[source]

Delete the yaml file on disk

remove_release(release: str)[source]

Remove a release from the datamodel stub

render_content(force: bool = None, force_release: str = None) None[source]

Populate the yaml template with generated content

update_cache(force: bool = None) None[source]

Update the in-memory stub cache from the on-disk file

validate_cache()[source]

Validate the yaml cache

write(force: bool = None, use_cache_release: str = None, full_cache: bool = None, **kwargs) None[source]
cacheable = False
format = None
has_template = True
class datamodel.generate.stub.JsonStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: BaseStub

format: str = 'json'
has_template: bool = False
class datamodel.generate.stub.MdStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: BaseStub

get_selected_release(release: str = None, group: str = 'WORK') str[source]

get the hdu content for a given release

render_content(force: bool = None, release: str = None, group: str = 'WORK') None[source]

Populate the yaml template with generated content

write(force: bool = None, release: str = None, group: str = 'WORK', html: bool = None, use_cache_release: str = None, full_cache: bool = None, **kwargs) None[source]
format: str = 'md'
class datamodel.generate.stub.YamlStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: BaseStub

cacheable: bool = True
format: str = 'yaml'
datamodel.generate.stub.stub_iterator(format: str = None) Iterator[BaseStub][source]

Iterator for all stub formats

Filetypes

class datamodel.generate.filetypes.base.BaseFile(cache: dict, datamodel=None, stub=None, filename: str = None, release: str = None, file_species: str = None, design: bool = None, use_cache_release: str = None, full_cache: bool = None)[source]

Bases: ABC

Base class for supported datamodel file types

This is the abstract base class used for defining new file types to be supported by the sdss datamodel product.

Parameters:
  • cache (dict) – The initial yaml cache to be populated.

  • datamodel (DataModel, optional) – an SDSS datamodel for the file, by default None

  • stub (Stub, optional) – an datamodel Stub for the file, by default None

  • filename (str, optional) – the name of file, by default None

  • release (str, optional) – the data release, by default None

  • file_species (str, optional) – the file species name, by default None

  • design (bool, optional) – whether the datamodel is in design mode, by default None

  • use_cache_release (str, optional) – the release to pull existing cache from, by default None

  • full_cache (bool, optional) – whether to use the entire previous cache, by default None

Raises:

ValueError – when neither a datamodel nor (filename, release, file_species) are provided.

abstract _generate_new_cache() dict[source]

Abstract method to be implemented by subclass, for generating new cache content

This method is used to generate the file content for new datamodel YAML files. It should return a dictionary to be stored as the value of the cache key.

abstract static _get_designed_object(data: dict)[source]

Abstract static method to be implemented by subclass, for creating a valid object from cache

This method is used to create a data object from a designed YAML cache content. It should return a new designed object. Ideally the object should be created through the Pydantic model’s model_validate to ensure proper validation and field type coercion. This method is called by create_from_cache which sets the object as the self._designed_object attribute.

Parameters:

data (dict) – The YAML cache value for the cache_key field

abstract _update_partial_cache(cached_data: dict, old_cache: dict) dict[source]

Abstract method to be implemented by subclass, for partially updating cache content

This method updates the descriptions or comments of the new cached_data with the human-edited fields from the old_cache data. Used when adding a new release to a datamodel and retaining the old descriptions from the previous release. This method should return the cached_data object.

Parameters:
  • cached_data (dict) – The YAML cache for the current release

  • old_cache (dict) – The YAML cache for a previous release

create_from_cache(release: str = 'WORK')[source]

Create a file object from the yaml cache

Converts the cache_key dictionary entry in the YAML cache into a file object.

Parameters:

release (str, optional) – the name of the data release, by default ‘WORK’

Returns:

object – a valid file object

Raises:
  • ValueError – when the release is not in the cache

  • ValueError – when the release is not WORK when in the datamodel design phase

abstract design_content()[source]

Abstract method to be implemented by subclass, for designing file content

This method is used to design new content for a YAML datamodel cache for new files from within Python. It should ultimately update the cache line self._cache[‘releases’][‘WORK’][self.cache_key] = [updated_cache_content] with the new content. This method is called by the DataModel’s global design_content method.

abstract write_design(file: str, overwrite: bool = None)[source]

Abstract method to be implemented by subclass, for writing a design to a file

This method is used to write out the designed data object. It should call the self.designed_object’s particular method for writing itself to a file, specific to that filetype.

Parameters:
  • file (str) – The datamodel filename to write to

  • overwrite (bool) – Flag to overwrite the file if it exists, by default None

aliases = []
cache_key = None
compressions = ['.gz', '.bz2', '.zip']
suffix = None
datamodel.generate.filetypes.base.file_selector(suffix: str = None) BaseFile[source]

Selects the correct File class given a file suffix

datamodel.generate.filetypes.base.format_bytes(value: int = None) str[source]

Convert an integer to human-readable format.

Parameters:

value (int) – An integer representing number of bytes.

Returns:

str – Size of the file in human-readable format.

datamodel.generate.filetypes.base.get_filesize(file) str[source]

Get the size of the input file.

Returns:

str – Size of the file in human-readable format.

datamodel.generate.filetypes.base.get_filetype(file) str[source]

Get the extension of the input file.

Returns:

str – File type in upper case.

datamodel.generate.filetypes.base.get_supported_filetypes() list[source]

Get a list of supported filetypes

Constructs a list of supported filetypes for datamodels, based on the BaseFile subclasses. Collects each subclass file suffix attribute as well as any designated aliases.

Returns:

list – A list of supported file types

class datamodel.generate.filetypes.fits.FitsFile(*args, **kwargs)[source]

Bases: BaseFile

Class for supporting FITS files

design_content(ext: str = 'primary', extno: int = None, name: str = 'EXAMPLE', description: str = None, header: list | dict | Header = None, columns: List[list | dict | Column] = None, **kwargs) None[source]

Design a new HDU

Design a new astropy HDU for the given datamodel. Specify the extension type ext to indicate a PRIMARY, IMAGE, or BINTABLE HDU extension. Each new HDU is added to the YAML structure using the next available HDU extension number, or the one provided with extno. Use name to specify the name of the HDU extension.

header can be a Header instance, a list of tuples of header keywords, conforming to (keyword, value, comment), or list of dictionaries conforming to {“keyword”: keyword, “value”: value, “comment”: comment}.

columns can be a list of Column objects, a list of tuples minimally conforming to (name, format, unit), or a list of dictionaries minimally conforming to {“name”: name, “format”: format, “unit”: unit}. See Astropy’s Binary Table Column Format for the allowed format values. When supplying a list of tuples or dictionaries, you can include any number of valid Column arguments.

Parameters:
  • ext (str, optional) – the type of HDU to create, by default ‘primary’

  • extno (int, optional) – the extension number, by default None

  • name (str, optional) – the name of the HDU extension, by default ‘EXAMPLE’

  • description (str, optional) – a description for the HDU, by default None

  • header (Union[list, dict, fits.Header], optional) – valid input to create a Header, by default None

  • columns (List[Union[list, dict, fits.Column]], optional) – a list of binary table columns, by default None

  • force (bool) – If True, forces a new design even if the HDU already exists, by default None

  • **kwargs – additional keyword arguments to pass to the HDU constructor

Raises:
  • ValueError – when the ext type is not supported

  • ValueError – when the table columns input is not a list

write_design(file: str, overwrite: bool = True) None[source]

Write out the designed file

Write out a designed fits.HDUList object to a file on disk. The create_from_cache method must have been run first.

Parameters:
  • file (str) – The designed filename

  • overwrite (bool, optional) – If True, overwrites any existing file, by default True

Raises:

AttributeError – when the designed object does not exist

aliases = ['FIT']
cache_key = 'hdus'
suffix = 'FITS'
class datamodel.generate.filetypes.par.ParFile(*args, **kwargs)[source]

Bases: BaseFile

Class for supporting Yanny par files

design_content(comments: str = None, header: list | dict = None, name: str = None, description: str = None, columns: list = None) None[source]

Design a new Yanny par section

Design a new Yanny par for the given datamodel. Specify Yanny comments, a header section, or a table definition. Each new table is added to the YAML structure. Use name and description to specify the name and description of the new table. comments can be a single string of comments, with newlines indicated by “\n”.

header can be a dictionary of key-value pairs, a list of tuples of header keywords, conforming to (keyword, value, comment), or list of dictionaries conforming to {“key”: key, “value”: value, “comment”: comment}.

The columns parameter defines the relevant table columns to add to the file. It can be a list of column names, a list of tuple values conforming to column (name, type, [description]), or a list of dictionaries with keys defined from the complete column yaml definition.

Allowed column types are any valid Yanny par types, input as strings, e.g. “int”, “float”, “char”. Array columns can be specified by including the array size in “[]”, e.g. “float[6]”. Enum types are defined by setting is_enum to True, and by providing a list of possible values via enum_values.

Parameters:
  • comments (str, optional) – Any comments to add to the file, by default None

  • header (Union[list, dict], optional) – Keywords to add to the header of the Yanny file, by default None

  • name (str, optional) – The name of the parameter table

  • description (str, optional) – A description of the parameter table

  • columns (list, optional) – A set of Yanny table column definitions

write_design(file: str, overwrite: bool = True) None[source]

Write out the designed file

Write out a designed Yanny par object to a file on disk. The create_from_cache method must have been run first.

Parameters:
  • file (str) – The designed filename

  • overwrite (bool, optional) – If True, overwrites any existing file, by default True

Raises:

AttributeError – when the designed object does not exist

cache_key = 'par'
suffix = 'PAR'
class datamodel.generate.filetypes.par.literal[source]

Bases: str

datamodel.generate.filetypes.par.literal_presenter(dumper, data)[source]
class datamodel.generate.filetypes.hdf5.HdfFile(cache: dict, datamodel=None, stub=None, filename: str = None, release: str = None, file_species: str = None, design: bool = None, use_cache_release: str = None, full_cache: bool = None)[source]

Bases: BaseFile

Class for supporting HDF5 files

design_content(name: str = '/', description: str = None, hdftype: str = 'group', attrs: list = None, ds_shape: tuple = None, ds_dtype: str = None)[source]

Design a new HDF5 section for the datamodel

Design a new HDF entry for the given datamodel. Specify h5py group or dataset definitions, with an optional list of attributes. Each new entry is added to the members entry in the YAML structure. Use name and description to specify the name and description of each new group or dataset. Use hdftype to specify a “group” or “dataset” entry. For datasets, use ds_shape and ds_dtype to specify the shape and dtype of the array dataset.

New HDF5 members are added to the datamodel in a flattened structure. To add a new group or dataset as a child of an existing group, specify the full path in name, e.g. /mygroup/mydataset.

attrs can be a list of tuples of header keywords, conforming to (key, value, comment, dtype), or list of dictionaries conforming to {“key”: key, “value”: value, “comment”: comment, “dtype”: dtype}.

Allowed attribute or dataset dtypes are any valid string representation of numpy dtypes. For example, “<i8”, “int32”, “S10”, etc.

Parameters:
  • name (str, optional) – the name of the HDF group or dataset, by default ‘/’

  • description (str, optional) – a description of the HDF group or dataset, by default None

  • hdftype (str, optional) – the type of HDF5 object, by default ‘group’

  • attrs (list, optional) – a list of HDF5 Attributes, by default None

  • ds_shape (tuple, optional) – the shape of an HDF5 array dataset, by default None

  • ds_dtype (str, optional) – the dtype of an HDF5 array dataset, by default None

Raises:

ValueError – when an invalid hdftype is specified

write_design(file: str, overwrite: bool = None) None[source]

Write out the designed file

Write out a designed HDF5 object to a file on disk. The create_from_cache method must have been run first.

Parameters:
  • file (str) – The designed filename

  • overwrite (bool, optional) – If True, overwrites any existing file, by default True

Raises:

AttributeError – when the designed object does not exist

aliases = ['HDF5']
cache_key = 'hdfs'
suffix = 'H5'

Changelog

class datamodel.generate.changelog.core.ChangeLog(the_list: list, **kwargs)[source]

Bases: list

Class that holds the change logs for all input files

Contains a list of all FileDiff objects. Mainly used as a container to iterate over many changelogs and generate a string report or dictionary object for each item in the list.

Parameters:

the_list (list) – A list of FileDiff objects

generate_report(split: bool = None, insert: bool = True) str | list[source]

Generate a string report of all changelogs

Iterates over all changelogs and builds a complete report containing all differences.

Parameters:
  • split (bool, optional) – If True, splits the string, by default None

  • insert (bool, optional) – If True, inserts a divider between changelogs, by default True

Returns:

str | list – A generated report of all changelogs

get_changes() dict[source]
class datamodel.generate.changelog.core.FileDiff(file1: str, file2: str, versions: list = None, diff_type: str = None)[source]

Bases: ABC, object

Class that holds the difference between two files

Creates an object that compares the difference between two files. Base class that is subclassed by FitsDiff and CatalogDiff.

Parameters:
  • file1 (str) – the filepath to compute the changes against

  • file2 (str) – the filepath to compute the changes from

  • versions (list, optional) – the named releases/versions corresponding to the two input files, by default None

  • diff_type (str, optional) – the object data type of which to compute the difference, by default None

abstract report()[source]

Print a report

datamodel.generate.changelog.core.compute_changelog(items: list, change: str = 'fits') ChangeLog[source]

Compute the changelogs between a list of datamodels

Given an input list of DataModel objects, computes the differences between them using the on-disk real file location. By default computes the FITS differences.

Parameters:
  • items (list) – A list of datamodels

  • change (str, optional) – The type of object, by default fits

Returns:

ChangeLog – A list of changelogs

datamodel.generate.changelog.core.compute_diff(oldfile: str, otherfile: str, change: str = 'fits', versions: list = None) FileDiff[source]

Produce a single changelog between two files

Produce a difference object for two files

Parameters:
  • oldfile (str) – The filepath to check changes against

  • otherfile (str) – The filepath to check changes from

  • change (str, optional) – The type of data input, by default ‘fits’

  • versions (list, optional) – The named releases/versions of the corresponding file inputs, by default None

Returns:

FileDiff – An instance containing the differences between the two files

Raises:

ValueError – when no valid input filepath is given
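
A sketch of computing the difference between two files; the filepaths and release names are illustrative.

>>> from datamodel.generate.changelog.core import compute_diff
>>> diff = compute_diff("spec-55049.fits", "spec-59500.fits", change="fits",
...                     versions=["DR16", "DR17"])
>>> print(diff.report())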

datamodel.generate.changelog.core.diff_selector(suffix: str = None) FileDiff[source]

Select the correct class given a file suffix

class datamodel.generate.changelog.yaml.YamlDiff(content: dict = None, file: str = None)[source]

Bases: ABC

Computes the difference between two releases in YAML cache

Computes the differences in HDU content between releases in a given YAML datamodel file, or cached dictionary.

Parameters:
  • content (dict, optional) – The yaml cache content for a given datamodel, by default None

  • file (str, optional) – A path to a yaml datamodel file, by default None

Raises:
  • ValueError – when no yaml filepath or cache content is provided

  • ValueError – when no releases can be identified from the yaml content

abstract _get_changes(version1: str, version2: str, simple: bool = None) dict[source]

Abstract method to be implemented by subclass, for generating changelog content

This method is used to construct a dictionary of changes between two releases for the given file YAML content. It should return a dictionary object, minimally of the form {version1: {“from”: version2, “key1”: value1, “key2”: value2, …}} where key1: value1, etc are the custom changes between the two releases. The input version1 is the new release, and version2 is the older release of which to compute the difference.

clean_empty(d: dict) dict[source]

clean up an empty dictionary

compute_changelog(version1: str = 'A', version2: str = 'B', simple: bool = False) dict[source]

Compute the changelog between two releases

Computes the changes between two releases in a given YAML cache. Compares the “hdus” entries in each release, and looks for differences in HDU extension number, added or removed HDU extensions, differences in primary header keyword number, and any added or removed primary header keywords.

Parameters:
  • version1 (str, optional) – The release to check differences against, by default ‘A’

  • version2 (str, optional) – The release to check differences from, by default ‘B’

  • simple (bool, optional) – If True, simplifies the changelog entries to only non-null values, by default False

Returns:

dict – a dictionary of found changes

Raises:

ValueError – when no HDULists are found in the YAML cache

generate_changelog(order: list = None, simple: bool = False) dict[source]

Generate a full changelog dictionary across all releases

Iterate over all releases and generate a complete changelog from one release to another. The release order to compute the changelog can be specified by passing in a desired list of releases to the order keyword. Set simple to True to produce a cleaner, simpler changelog, containing only non-null entries.

Parameters:
  • order (list, optional) – The order of releases to generate changelog from, by default None

  • simple (bool, optional) – If True, simplifies the changelog entries to only non-null values, by default False

Returns:

dict – A complete changelog dictionary over all releases
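
A sketch of generating a YAML-based changelog, using the YamlFits subclass documented below; the filepath and release names are illustrative.

>>> from datamodel.generate.changelog.filetypes.fits import YamlFits
>>> yd = YamlFits(file="path/to/datamodel/yaml/sdR.yaml")
>>> yd.has_changes(version1="DR17", version2="DR16")
>>> changes = yd.generate_changelog(order=["WORK", "DR17", "DR16"], simple=True)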

has_changes(version1: str = 'A', version2: str = 'B') bool[source]

Check if there are any changes between two releases

Computes the changelog between two releases and returns a flag if changes are detected. Compares the differences of release “version1” from release “version2”.

Parameters:
  • version1 (str, optional) – The release to check differences against, by default ‘A’

  • version2 (str, optional) – The release to check differences from, by default ‘B’

Returns:

bool – True if any changes detected

cache_key = None
suffix = None
datamodel.generate.changelog.yaml.yamldiff_selector(suffix: str = None) YamlDiff[source]

Select the correct class given a file suffix

class datamodel.generate.changelog.filetypes.catalog.CatalogDiff(file1: str | Table, file2: str | Table, full: bool = None, versions: list = None)[source]

Bases: FileDiff

Compute the difference between two catalog files

Computes the differences in catalog content between two input ascii catalog files, e.g. CSV. Looks for changes in row number, column number, and any added or removed column names.

Parameters:
  • file1 (Union[str, Table]) – the filepath or Table to compute the changes against

  • file2 (Union[str, Table]) – the filepath or Table to compute the changes from

  • full (bool, optional) – If True, compute the full Astropy Ascii Table differences, by default None

  • versions (list, optional) – the named releases/versions corresponding to the two input files, by default None

get_astropy_diff() str[source]

Returns the full Astropy diff using report_diff_values

Returns:

str – the complete difference between the two catalog files

report(split: bool = None, full: bool = None) str[source]

Print the catalog difference report

Returns the catalog differences as a string blob. Can optionally return the report as a list of string lines.

Parameters:
  • split (bool, optional) – if True, splits the report into a list of string lines, by default None

  • full (bool, optional) – if True, appends the full Astropy catalog diff report

Returns:

str – The difference report as a string blob

suffix = 'CATALOG'
class datamodel.generate.changelog.filetypes.fits.FitsDiff(file1: str | HDUList, file2: str | HDUList, full: bool = None, versions: list = None)[source]

Bases: FileDiff

Compute the difference between two FITS files

Computes the differences in HDUList content between two input FITS files. Looks for changes in HDU extension number, any added or removed HDU extensions, as well as any changes in the primary header keywords.

Parameters:
  • file1 (Union[str, fits.HDUList]) – the filepath or HDUList to compute the changes against

  • file2 (Union[str, fits.HDUList]) – the filepath or HDUList to compute the changes from

  • full (bool, optional) – If True, compute the full Astropy FITS HDUList differences, by default None

  • versions (list, optional) – the named releases/versions corresponding to the two input files, by default None

get_astropy_diff() FITSDiff[source]

Returns the full Astropy FITSDiff

Returns:

fits.FITSDiff – the complete difference between the two FITS files

report(split: bool = None) str[source]

Print the FITS difference report

Returns the FITS differences as a string blob. Can optionally return the report as a list of string lines.

Parameters:

split (bool, optional) – if True, splits the report into a list of string lines, by default None

Returns:

str – The difference report as a string blob

suffix = 'FITS'
class datamodel.generate.changelog.filetypes.fits.YamlFits(content: dict = None, file: str = None)[source]

Bases: YamlDiff

Class for supporting YAML changelog generation for FITS files

cache_key = 'hdus'
suffix = 'FITS'
class datamodel.generate.changelog.filetypes.par.ParDiff(file1: str | None, file2: str | None, versions: list = None)[source]

Bases: FileDiff

Class for computing differences between Yanny par files

Computes the differences in table content between two Yanny par files. Looks for changes in header keys, table number, and any added or removed keys, tables, or table columns.

Parameters:
  • file1 (Union[str, Table]) – the filepath or Table to compute the changes against

  • file2 (Union[str, Table]) – the filepath or Table to compute the changes from

  • versions (list, optional) – the named releases/versions corresponding to the two input files, by default None

report(split: bool = None, full: bool = None) str[source]

Print the yanny par difference report

suffix = 'PAR'
class datamodel.generate.changelog.filetypes.par.YamlPar(content: dict = None, file: str = None)[source]

Bases: YamlDiff

Class for supporting YAML changelog generation for Yanny par files

cache_key = 'par'
suffix = 'PAR'
class datamodel.generate.changelog.filetypes.hdf5.YamlHDF5(content: dict = None, file: str = None)[source]

Bases: YamlDiff

Class for supporting YAML changelog generation for HDF5 files

cache_key = 'hdfs'
suffix = 'H5'

Git

class datamodel.gitio.gitio.Git(verbose=None)[source]

Bases: object

Class to run the git commands

Wrapper class to the GitPython package.

add(path: str = None)[source]

Add a file to the git repo

Performs a “git add” on the datamodel repo

Parameters:

path (str, optional) – the full path of the file to add, by default None

Raises:

RuntimeError – when the git command fails

check_if_untracked(path: str = None) bool[source]

Checks if a file is untracked in the git repo

Parameters:

path (str, optional) – the full path of the file to add, by default None

Returns:

bool – if the file is untracked

checkout(branch: str = None)[source]

Checks out a branch from the git repo

Performs a “git checkout” on the datamodel repo. If the branch does not exist, it will be created.

Parameters:

branch (str, optional) – the name of the branch to checkout, by default None

Raises:

RuntimeError – when the git command fails

clone(product: str = None, branch: str = None)[source]

Clones the git repo

Performs a “git clone” of the datamodel repo.

Parameters:
  • product (str, optional) – the Github repo URL, by default None

  • branch (str, optional) – the name or directory path of the clone, by default None

Raises:

RuntimeError – when the git command fails

commit(message: str = None)[source]

Commit a file to the git repo

Performs a “git commit” on the datamodel repo

Parameters:

message (str, optional) – a git commit message, by default None

Raises:

RuntimeError – when the git command fails

create_new_branch(branch: str = None)[source]

Create a new branch

Create a new branch. If no branch name is provided, it will create a branch name based on the email head found in the git user config. If none found, creates a random branch name using a UUID.

Parameters:

branch (str, optional) – the name of the branch to create, by default None

fetch()[source]

Fetch from Github remote origin

Performs a “git fetch” on the datamodel repo

Raises:

RuntimeError – when the git command fails

get_path_location(path: str = None) str[source]

Gets the path location

Gets the location of the filepath relative to the git repo directory.

Parameters:

path (str, optional) – the full path of the file to add, by default None

Returns:

str – the relative location of the path

list_branches(pprint: bool = None) list[source]

List all local branches for the repo

list_remotes(pprint: bool = None) list[source]

List all remotes for the repo

pull()[source]

Pull from Github remote origin

Performs a “git pull” on the datamodel repo

Raises:

RuntimeError – when the git command fails

push()[source]

Push to Github remote origin

Performs a “git push” on the datamodel repo

Raises:

RuntimeError – when the git command fails

rm(path: str = None)[source]

Remove a file from the git repo

Performs a “git rm” on the datamodel repo

Parameters:

path (str, optional) – the full path of the file to remove, by default None

Raises:

RuntimeError – when the git command fails

set_repo()[source]

Set the repo if needed

status() str[source]

Return the git status of the repo

property branch_exists_on_remote: bool

if the current active branch exists at the remote

property current_branch: str

the current active branch

property directory: str

the directory of the git repo

property is_dirty: bool

if the repo is dirty

property is_main_branch: bool

if the current active branch is the main branch

property origin

the git remote origin

Io

datamodel.io.loaders.dm(loader, node)[source]
datamodel.io.loaders.get_yaml_files(get: str = None) str | list[source]

Get a list of yaml files

Return a list of YAML files in the datamodel directory.

Parameters:

get (str, optional) – type of yaml file to get, can be “releases” or “products”, by default None

Returns:

Union[str, list] – The yaml file path or list of yaml file paths
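
A short sketch of the documented "releases" and "products" options.

>>> from datamodel.io.loaders import get_yaml_files
>>> products = get_yaml_files("products")   # yaml files for data products
>>> releases = get_yaml_files("releases")   # yaml file(s) describing releases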

datamodel.io.loaders.include(loader, node)[source]
datamodel.io.loaders.read_yaml(ymlfile: str | Path) dict[source]

Opens and reads a YAML file

Parameters:

ymlfile (Union[str, pathlib.Path]) – a file or pathlib.Path object

Returns:

dict – the YAML content

datamodel.io.move.construct_new_path(file: str | Path = None, old_path: str | Path = None, new_path: str | Path = None, release: str = None, kwargs: dict = None) Path[source]

Construct a new filepath

Constructs a new filepath, either from an abstract path location and a set of keyword arguments, or from an existing (old) filepath and abstract location.

Parameters:
  • file (Union[str, pathlib.Path], optional) – the existing full filepath, by default None

  • old_path (Union[str, pathlib.Path], optional) – the existing species abstract path, by default None

  • new_path (Union[str, pathlib.Path], optional) – the new species abstract path, by default None

  • release (str, optional) – the SDSS release, by default None

  • kwargs (dict, optional) – a set of path keyword arguments, by default None

Returns:

pathlib.Path – a full filepath

datamodel.io.move.dm_move(old: str, new: str, parent: bool = None, symlink: bool = True)[source]

Move a file or directory to a new location

Moves a file or directory from an old path to a new path, optionally creating a symlink between the old and new locations.

Parameters:
  • old (str) – the existing (old) filepath

  • new (str) – the new filepath

  • parent (bool, optional) – flag to move the entire parent directory, by default None

  • symlink (bool, optional) – flag to create a symlink from the new location to the old one, by default True

datamodel.io.move.dm_move_species(abstract_path: str, new_path: str, release: str, parent: bool = None, symlink: bool = True, test: bool = None)[source]

Moves all files from a species to a new location

Moves all files from a given file species. Finds all real files that match an existing file species abstract path, and moves them to a new location. The location is determined by the original filename, a new abstract path location, and a given release.

Parameters:
  • abstract_path (str) – the existing species abstract path

  • new_path (str) – the new species abstract path

  • release (str) – the SDSS release

  • parent (bool, optional) – flag to move the entire parent directory, by default None

  • symlink (bool, optional) – flag to create a symlink from new location to old one, by default True

  • test (bool, optional) – flag to test the move, by default None
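
A sketch of moving a file species, reusing the MaNGA path template from the parse section; the new abstract path and release are illustrative, and test exercises the documented test flag.

>>> from datamodel.io.move import dm_move_species
>>> dm_move_species(
...     "MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits.gz",
...     "MANGA_SPECTRO_REDUX/{drpver}/cubes/{plate}/manga-{plate}-{ifu}-{wave}CUBE.fits.gz",
...     release="DR17", symlink=True, test=True)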

datamodel.io.move.find_files_from_species(path: str) Iterator[source]

Find all files species from an abstract path

Finds all files matching the species pattern in a given abstract path.

Parameters:

path (str) – an abstract file species path

Returns:

Iterator – Iterator over all matching files found

Models

class datamodel.models.base.BaseList[source]

Bases: BaseModel

Base pydantic class for lists of models

list_names()[source]

Create a simplified list of name attributes

sort(field: str, key: Callable = None, **kwargs) None[source]

Sort the list of models by a pydantic field name

Performs an in-place sort of the Pydantic Models using Python’s built-in sorted() method. Sets the newly sorted list to the root attribute, to preserve the original BaseList object instance. By default, the input sort key to the sorted function is the field attribute on the model.

Parameters:
  • field (str) – The Pydantic field name

  • key (Callable, optional) – a function to be passed into the sorted() function, by default None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.base.CoreModel[source]

Bases: BaseModel

Custom BaseModel

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

datamodel.models.base.add_repr(schema: Dict[str, Any], model: Type[BaseModel]) None[source]

Adds custom information into the schema

class datamodel.models.releases.Release(*, name: str, description: str, public: bool = False, release_date: str | date = 'unreleased')[source]

Bases: CoreModel

Pydantic model presenting an SDSS release

Parameters:
  • name (str) – The name of the release

  • description (str) – A description of the release

  • public (bool) – Whether the release is public or not

  • release_date (datetime.date) – The date of the release

Raises:

ValueError – when the release name does not start with a valid SDSS release code
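
A sketch of constructing a Release model; the field values are illustrative.

>>> from datamodel.models.releases import Release
>>> rel = Release(name="DR99", description="an illustrative data release",
...               public=False, release_date="unreleased")
>>> rel.name
'DR99'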

classmethod name_check(value)[source]
description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
public: bool
release_date: str | date
class datamodel.models.releases.Releases(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[Release]]

Pydantic model representing a list of Releases

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.yaml.Access(*, in_sdss_access: bool, path_name: str | None = None, path_template: str | None = None, path_kwargs: List[str] | None = None, access_string: str | None = None)[source]

Bases: CoreModel

Pydantic model representing the YAML releases access section

Parameters:
  • in_sdss_access (bool) – Whether or not the data product has an sdss_access entry

  • path_name (str) – The path name in sdss_access for the data product

  • path_template (str) – The path template in sdss_access for the data product

  • path_kwargs (List[str]) – A list of path keywords in the path_template for the data product

  • access_string (str) – The full sdss_access entry, “path_name=path_template”

classmethod check_path_kwargs(value: str, info: ValidationInfo)[source]
classmethod check_path_nulls(value: str, info: ValidationInfo)[source]
access_string: str | None
in_sdss_access: bool
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

path_kwargs: List[str] | None
path_name: str | None
path_template: str | None
class datamodel.models.yaml.ChangeBase(*, from_: str, note: str | None = None)[source]

Bases: CoreModel

Base Pydantic model representing a YAML changelog release section

from_: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

note: str | None
class datamodel.models.yaml.ChangeLog(*, description: str, releases: Dict[str, ChangeRelease] = None)[source]

Bases: CoreModel

Pydantic model representing the YAML changelog section

Parameters:
  • description (str) – A description of the changelog

  • releases (Dict[str, ChangeRelease]) – A dictionary of the file changes between the given release and previous one

dict(**kwargs)[source]

override dict method to exclude none fields by default

Need to override this method as well when serializing YamlModel to json, because nested models are already converted to dict when json.dumps is called. See https://github.com/samuelcolvin/pydantic/issues/1778

model_dump_json(**kwargs)[source]

override json method to exclude none fields by default

description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

releases: Dict[str, ChangeRelease]
class datamodel.models.yaml.ChangeRelease(*, from_: str, note: str | None = None, delta_nhdus: int | None = None, added_hdus: List[str] | None = None, removed_hdus: List[str] | None = None, primary_delta_nkeys: int | None = None, added_primary_header_kwargs: List[str] | None = None, removed_primary_header_kwargs: List[str] | None = None, delta_nkeys: int | None = None, addead_header_keys: List[str] | None = None, removed_header_keys: List[str] | None = None, delta_ntables: int | None = None, addead_tables: List[str] | None = None, removed_tables: List[str] | None = None, tables: Dict[str, ChangeTable] | None = None, new_libver: tuple | None = None, delta_nattrs: int | None = None, addead_attrs: List[str] | None = None, removed_attrs: List[str] | None = None, delta_nmembers: int | None = None, addead_members: List[str] | None = None, removed_members: List[str] | None = None, members: Dict[str, ChangeMember] | None = None)[source]

Bases: ChangeHdf, ChangePar, ChangeFits, ChangeBase

Pydantic model representing a YAML changelog release section

Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.

Parameters:
  • from (str) – The release the changelog is computed from

  • delta_nhdus (int) – The difference in number of HDUs

  • added_hdus (List[str]) – A list of any added HDUs

  • removed_hdus (List[str]) – A list of any removed HDUs

  • primary_delta_nkeys (int) – The difference in primary header keywords

  • added_primary_header_kwargs (List[str]) – A list of any added primary header keywords

  • removed_primary_header_kwargs (List[str]) – A list of any removed primary header keywords

  • delta_nkeys (int) – The difference in number of Yanny header keys

  • added_header_keys (List[str]) – A list of any added Yanny header keywords

  • removed_header_keys (List[str]) – A list of any removed Yanny header keywords

  • delta_tables (int) – The difference in number of Yanny tables

  • added_tables (List[str]) – A list of any added Yanny tables

  • removed_tables (List[str]) – A list of any removed Yanny tables

  • tables (Dict[str, ChangeTable]) – A dictionary of table column and row changes

  • new_libver (tuple) – The difference in HDF5 library version

  • delta_nattrs (int) – The difference in the number of HDF5 Attributes

  • added_attrs (List[str]) – A list of any added HDF5 Attributes

  • removed_attrs (List[str]) – A list of any removed HDF5 Attributes

  • delta_nmembers (int) – The difference in number members in HDF5 file

  • added_members (List[str]) – A list of any added HDF5 groups or datasets

  • removed_members (List[str]) – A list of any removed HDF5 groups or datasets

  • members (Dict[str, ChangeMember]) – A dictionary of HDF5 group/dataset member changes

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.yaml.GeneralSection(*, name: str, short: str, description: str, environments: List[str] = None, surveys: List[str | Survey] = None, datatype: str | None, filesize: str | None, releases: List[str | Release] = None, naming_convention: str, generated_by: str, design: bool = None, vac: bool = None, recommended_science_product: bool = None, data_level: DataLevel = None)[source]

Bases: CoreModel

Pydantic model representing the YAML general section

Parameters:
  • name (str) – The file species name of the data product (or sdss_access path_name)

  • short (str) – A one sentence summary of the data product

  • description (str) – A longer description of the data product

  • environments (List[str]) – A list of environment variables associated with the data product

  • datatype (str) – The type of data product, e.g. FITS

  • filesize (str) – An estimated size of the data product

  • releases (List[str]) – A list of SDSS releases the data product is in

  • naming_convention (str) – A description of the naming convention

  • generated_by (str) – An identifiable piece of the code that generates the data product

  • design (bool) – If True, the datamodel is in the design phase, before any file exists yet

  • vac (bool) – True if the datamodel is a VAC

  • recommended_science_product (bool) – True if the product is recommended for science use

  • data_level (str) – The product level or ranking, as a dotted numeral x.y.z

Raises:

ValueError – when any of the releases are not a valid SDSS Release

classmethod no_design(value: bool)[source]

Validator to check if the design flag is set to True

classmethod valid_data_level(value: DataLevel)[source]

data_level: DataLevel
datatype: str | None
description: str
design: bool
environments: List[str]
filesize: str | None
generated_by: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
naming_convention: str
recommended_science_product: bool
releases: List[AnnoRelease]
short: str
surveys: List[AnnoSurvey]
vac: bool
class datamodel.models.yaml.ProductModel(*, general: GeneralSection, changelog: ChangeLog, releases: Dict[str, ReleaseModel], notes: str = None, regrets: str = 'I have no regrets!')[source]

Bases: YamlModel

Pydantic model representing a data product JSON file

Parameters:
  • general (GeneralSection) – The general metadata section of the datamodel

  • changelog (ChangeLog) – An automated log of data product changes across releases

  • releases (Dict[str, ReleaseModel]) – A dictionary of information specific to that release

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
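
As a minimal sketch (the filename here is hypothetical), a validated JSON datamodel can be parsed directly into a ProductModel with pydantic v2:

>>> from datamodel.models.yaml import ProductModel
>>> with open('sdR.json') as f:            # hypothetical path to a validated JSON datamodel
...     model = ProductModel.model_validate_json(f.read())
>>> model.general.name
'sdR'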

class datamodel.models.yaml.ReleaseModel(*, template: str, example: str | None, location: str, environment: str, survey: str = None, access: Access, hdus: Dict[str, HDU] | None = None, par: ParModel | None = None, hdfs: HdfModel | None = None)[source]

Bases: CoreModel

Pydantic model representing an item in the YAML releases section

Contains any information on the data product that is specific to a given release, or that changes across releases.

Parameters:
  • template (str) – The full template representation of the path to the data product

  • example (str) – A real example path of the data product

  • location (str) – The symbolic location of the data product

  • environment (str) – The SAS environment variable the product lives under

  • access (Access) – Information on any relevant sdss_access entry

  • hdus (Dict[str, HDU]) – A dictionary of HDU content for the product for the given release

convert_to_hdulist() HDUList[source]

Convert the hdus to a fits.HDUList

access: Access
environment: str
example: str | None
hdfs: HdfModel | None
hdus: Dict[str, HDU] | None
location: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

par: ParModel | None
survey: str
template: str
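
A minimal sketch of converting a release's HDU definitions into an astropy HDUList, assuming 'sdR' is a hypothetical FITS file species and that Product.get_release (see the Products section below) returns this ReleaseModel content:

>>> from datamodel.products.product import Product
>>> prod = Product('sdR', load=True)       # hypothetical file species
>>> rel = prod.get_release('DR17')
>>> hdulist = rel.convert_to_hdulist()     # fits.HDUList built from the hdus section
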
class datamodel.models.yaml.YamlModel(*, general: GeneralSection, changelog: ChangeLog, releases: Dict[str, ReleaseModel], notes: str = None, regrets: str = 'I have no regrets!')[source]

Bases: CoreModel

Pydantic model representing a YAML file

Parameters:
  • general (GeneralSection) – The general metadata section of the datamodel

  • changelog (ChangeLog) – An automated log of data product changes across releases

  • releases (Dict[str, ReleaseModel]) – A dictionary of information specific to that release

  • notes (str) – A string or multi-line text blob of additional information

  • regrets (str) – A string or multi-line text blob of any regrets over the datamodel

changelog: ChangeLog
general: GeneralSection
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

notes: str
regrets: str
releases: Dict[str, ReleaseModel]
datamodel.models.yaml.check_gen_release(value: str) str[source]

Validator to check release against list of releases

datamodel.models.yaml.check_survey(value: str) str[source]

Validator to check survey against list of surveys

datamodel.models.yaml.orjson_dumps(v, *, default)[source]
class datamodel.models.surveys.Phase(*, name: str, id: int, start: int | None = None, end: int | None = None, active: bool = False)[source]

Bases: CoreModel

Pydantic model representing an SDSS phase

Parameters:
  • name (str) – The name of the phase

  • id (int) – The id of the phase

  • start (int) – The year the phase started

  • end (int) – The year the phase ended

  • active (bool) – Whether the phase is currently active

active: bool
end: int | None
id: int
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
start: int | None
class datamodel.models.surveys.Phases(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[Phase]]

Pydantic model representing a list of Phases

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.surveys.Survey(*, name: str, long: str = None, description: str, phase: int | Phase = None, id: str = None, aliases: list = [])[source]

Bases: CoreModel

Pydantic model representing an SDSS survey

Parameters:
  • name (str) – The short name of the survey

  • long (str) – The full name of the survey

  • description (str) – A description of the survey

  • phase (Phase) – The main phase the survey was in

  • id (str) – An internal reference id for the survey

Raises:

ValueError – when the survey phase is not a valid SDSS Phase

classmethod get_phase(v)[source]

check the phase is a valid SDSS phase

aliases: list
description: str
id: str
long: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
phase: int | Phase
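
A minimal sketch of constructing a Survey directly; whether the get_phase validator coerces the integer phase into a Phase model is an assumption here:

>>> from datamodel.models.surveys import Survey
>>> s = Survey(name='manga', long='MaNGA', description='Mapping Nearby Galaxies at APO', phase=4)
>>> s.phase        # validated against the list of SDSS phases
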
class datamodel.models.surveys.Surveys(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[Survey]]

Pydantic model representing a list of Surveys

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.versions.Tag(*, version: Version, tag: list[str] | str = None, release: str | Release | List[Release], survey: str | Survey)[source]

Bases: CoreModel

Pydantic model representing an SDSS software tag

Parameters:
  • version (Version) – The version key

  • tag (str) – The version tag number or name

  • release (Release) – The SDSS release the tag is associated with

  • survey (Survey) – The SDSS survey the tag is associated with

Raises:
  • ValueError – when the tag release is not a valid SDSS Release

  • ValueError – when the tag survey is not a valid SDSS Survey

classmethod get_release(v)[source]

check the release is a valid SDSS release

classmethod get_survey(v)[source]

check the survey is a valid SDSS survey

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property name

A name for the tag

release: str | Release | List[Release]
survey: str | Survey
tag: list[str] | str
version: Version
class datamodel.models.versions.Tags(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[Tag]]

Pydantic model representing a list of Tags

group_by(order_by: str = 'release') dict[source]

Group tags by SDSS release or survey

Convert the list of tags to a series of dictionaries, ordered by the SDSS release or survey, with key:value pairs of version_name:tag. Default is to group by release, then survey. With order_by set to survey, tags are grouped by survey, then release. For example, “{‘DR17’: {‘manga’: {‘drpver’: ‘v3_1_1’, ‘dapver’: ‘3.1.0’}}}”.

Parameters:

order_by (str, optional) – the field to group tags by, either ‘release’ or ‘survey’, by default ‘release’

Returns:

dict – nested dictionary of tags

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
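
A minimal sketch of building and grouping tags, assuming 'DR17' and 'manga' pass the release and survey validators; the nesting follows the docstring example above:

>>> from datamodel.models.versions import Version, Tag, Tags
>>> ver = Version(name='drpver', description='MaNGA DRP version')
>>> tags = Tags([Tag(version=ver, tag='v3_1_1', release='DR17', survey='manga')])
>>> tags.group_by(order_by='release')      # roughly {'DR17': {'manga': {'drpver': 'v3_1_1'}}}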

class datamodel.models.versions.Version(*, name: str, description: str)[source]

Bases: CoreModel

Pydantic model representing an SDSS version

Parameters:
  • name (str) – The name of the software version key

  • description (str) – A description of the software key

description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
class datamodel.models.vacs.VAC(*, name: str)[source]

Bases: BaseModel

Pydantic model representing an SDSS VAC

Parameters:

name (str) – The environment variable label name of the VAC

Raises:

ValueError – when the release name does not start with a valid SDSS release code

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
class datamodel.models.vacs.VACS(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[VAC]]

Pydantic model representing a list of VACs

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.filetypes.fits.ChangeFits(*, delta_nhdus: int | None = None, added_hdus: List[str] | None = None, removed_hdus: List[str] | None = None, primary_delta_nkeys: int | None = None, added_primary_header_kwargs: List[str] | None = None, removed_primary_header_kwargs: List[str] | None = None)[source]

Bases: CoreModel

Pydantic model representing the FITS hdu fields of the YAML changelog release section

Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.

Parameters:
  • delta_nhdus (int) – The difference in number of HDUs

  • added_hdus (List[str]) – A list of any added HDUs

  • removed_hdus (List[str]) – A list of any removed HDUs

  • primary_delta_nkeys (int) – The difference in primary header keywords

  • added_primary_header_kwargs (List[str]) – A list of any added primary header keywords

  • removed_primary_header_kwargs (List[str]) – A list of any removed primary header keywords

added_hdus: List[str] | None
added_primary_header_kwargs: List[str] | None
delta_nhdus: int | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

primary_delta_nkeys: int | None
removed_hdus: List[str] | None
removed_primary_header_kwargs: List[str] | None
class datamodel.models.filetypes.fits.Column(*, name: str, description: str, type: str, unit: str = '')[source]

Bases: CoreModel

Pydantic model representing a YAML column section

Represents a FITS binary table column

Parameters:
  • name (str) – The name of the table column

  • description (str) – A description of the table column

  • type (str) – The data type of the table column

  • unit (str) – The unit of the table column

to_fitscolumn() Column[source]

Convert the column to a fits.Column

Converts the column entry in the yaml file to an Astropy fits.Column object. Performs a mapping between type and format, using the reverse of datamodel.generate.stub.Stub._format_type.

Returns:

fits.Column – a valid astropy fits.Column object

Raises:

TypeError – when the column type cannot be coerced into a valid fits.Column format

description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
type: str
unit: str
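
A minimal sketch of converting a column definition; the 'float32' type string is an assumption about what the type-to-format mapping accepts:

>>> from datamodel.models.filetypes.fits import Column
>>> col = Column(name='FLUX', description='calibrated flux', type='float32', unit='nanomaggies')
>>> fcol = col.to_fitscolumn()             # astropy fits.Column with the equivalent format code
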
class datamodel.models.filetypes.fits.HDU(*, name: str, is_image: bool, description: str, size: str = None, header: List[Header] = None, columns: Dict[str, Column] | None = None)[source]

Bases: CoreModel

Pydantic model representing a YAML hdu section

Represents a FITS HDU extension

Parameters:
  • name (str) – The name of the HDU extension

  • is_image (bool) – Whether the HDU is an image extension

  • description (str) – A description of the HDU extension

  • size (str) – An estimated size of the HDU extension

  • header (List[Header]) – A list of header values for the extension

  • columns (Dict[str, Column]) – A dictionary of any binary table columns for the extension

convert_columns() List[Column][source]

Convert the columns dict into a list of fits.Columns

convert_hdu() PrimaryHDU | ImageHDU | BinTableHDU[source]

Convert the HDU entry into a valid fits.HDU

convert_header() Header[source]

Convert the list of header keys into a fits.Header

columns: Dict[str, Column] | None
description: str
header: List[Header]
is_image: bool
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
size: str
class datamodel.models.filetypes.fits.Header(*, key: str, value: str | None = '', comment: str = '')[source]

Bases: CoreModel

Pydantic model representing a YAML header section

Represents an individual FITS Header Key

Parameters:
  • key (str) – The name of the header keyword

  • value (str) – The value of the header keyword

  • comment (str) – A comment for the header keyword, if any

to_tuple()[source]

Convert the header key to a tuple

comment: str
key: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

value: str | None
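
A minimal sketch combining the Header and HDU models above; the header values are hypothetical, and the tuple ordering and the exact fits HDU class returned are assumptions:

>>> from datamodel.models.filetypes.fits import HDU, Header
>>> hdr = Header(key='TELESCOP', value='SDSS 2.5-M', comment='Telescope')
>>> hdr.to_tuple()                         # e.g. ('TELESCOP', 'SDSS 2.5-M', 'Telescope')
>>> hdu = HDU(name='PRIMARY', is_image=True, description='Primary header', header=[hdr])
>>> fits_hdu = hdu.convert_hdu()           # astropy PrimaryHDU/ImageHDU with the header attached
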
class datamodel.models.filetypes.par.ChangePar(*, delta_nkeys: int | None = None, addead_header_keys: List[str] | None = None, removed_header_keys: List[str] | None = None, delta_ntables: int | None = None, addead_tables: List[str] | None = None, removed_tables: List[str] | None = None, tables: Dict[str, ChangeTable] | None = None)[source]

Bases: CoreModel

Pydantic model representing the Yanny par fields of the YAML changelog release section

Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.

Parameters:
  • delta_nkeys (int) – The difference in number of Yanny header keys

  • added_header_keys (List[str]) – A list of any added Yanny header keywords

  • removed_header_keys (List[str]) – A list of any removed Yanny header keywords

  • delta_ntables (int) – The difference in number of Yanny tables

  • added_tables (List[str]) – A list of any added Yanny tables

  • removed_tables (List[str]) – A list of any removed Yanny tables

  • tables (Dict[str, ChangeTable]) – A dictionary of table column and row changes

addead_header_keys: List[str] | None
addead_tables: List[str] | None
delta_nkeys: int | None
delta_ntables: int | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

removed_header_keys: List[str] | None
removed_tables: List[str] | None
tables: Dict[str, ChangeTable] | None
class datamodel.models.filetypes.par.ChangeTable(*, delta_nrows: int | None = None, added_cols: List[str] | None = None, removed_cols: List[str] | None = None)[source]

Bases: CoreModel

Pydantic model representing a YAML changelog Yanny table section

Represents a computed section of the changelog, for a specific Yanny table. For each similar Yanny table between releases, the changes in row number and structure columns are computed.

Parameters:
  • delta_nrows (int) – The difference in rows between Yanny tables

  • added_cols (List[str]) – A list of any added Yanny table columns

  • removed_cols (List[str]) – A list of any removed Yanny table columns

added_cols: List[str] | None
delta_nrows: int | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

removed_cols: List[str] | None
class datamodel.models.filetypes.par.ParColumn(*, name: str, type: str, description: str, unit: str, is_array: bool, is_enum: bool, enum_values: list = None, example: str | int | float | list)[source]

Bases: CoreModel

Pydantic model representing a YAML par column section

Represents a typedef column definition in a Yanny parameter file

Parameters:
  • name (str) – The name of the column

  • description (str) – A description of the column

  • type (str) – The data type of the column

  • unit (str) – The unit of the column, if any

  • is_array (bool) – If the column is an array type

  • is_enum (bool) – If the column is an enum type

  • enum_values (list) – A list of the allowed values, if the column is an enum type

  • example (str) – An example value for the column

parse_type()[source]

Parse the yanny YAML column type

description: str
enum_values: list
example: str | int | float | list
is_array: bool
is_enum: bool
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
type: str
unit: str
class datamodel.models.filetypes.par.ParModel(*, comments: str = None, header: List[Header] = None, tables: Dict[str, ParTable])[source]

Bases: CoreModel

Pydantic model representing a YAML par section

Represents a Yanny parameter file

Parameters:
  • comments (str) – Any header comments in the parameter file

  • header (list) – A list of header key-value pairs in the parameter file

  • tables (dict) – A dictionary of tables in the parameter file

convert_header() dict[source]

Convert the header into a dictionary

convert_par()[source]

Convert the YAML par section into a Yanny par object

comments: str
header: List[Header]
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

tables: Dict[str, ParTable]
class datamodel.models.filetypes.par.ParTable(*, name: str, description: str, n_rows: int, structure: List[ParColumn])[source]

Bases: CoreModel

Pydantic model representing a YAML par table section

Represents the structure of a single Yanny parameter table

Parameters:
  • name (str) – The name of the table

  • description (str) – A description of the table

  • n_rows (int) – The number of rows in the table

  • structure (list) – A list of column definitions for the table

convert_table()[source]

Create a dictionary to prepare a Yanny table

create_enum()[source]

Create a Yanny typedef enum string

create_typedef()[source]

Create a Yanny typedef struct string

description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_rows: int
name: str
structure: List[ParColumn]
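
A minimal sketch of describing a Yanny table and rendering its typedef; the column and table values are hypothetical and the exact typedef string is not shown:

>>> from datamodel.models.filetypes.par import ParColumn, ParTable
>>> col = ParColumn(name='mjd', type='int', description='MJD of observation', unit='day',
...                 is_array=False, is_enum=False, example=59000)
>>> table = ParTable(name='EXPOSURE', description='exposure metadata', n_rows=1, structure=[col])
>>> print(table.create_typedef())          # Yanny "typedef struct { ... } EXPOSURE;" block
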
class datamodel.models.filetypes.hdf5.ChangeHdf(*, new_libver: tuple | None = None, delta_nattrs: int | None = None, addead_attrs: List[str] | None = None, removed_attrs: List[str] | None = None, delta_nmembers: int | None = None, addead_members: List[str] | None = None, removed_members: List[str] | None = None, members: Dict[str, ChangeMember] | None = None)[source]

Bases: CoreModel

Pydantic model representing the HDF5 fields of the YAML changelog release section

Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.

Parameters:
  • new_libver (tuple) – The new HDF5 library version, if it changed between releases

  • delta_nattrs (int) – The difference in the number of HDF5 Attributes

  • added_attrs (List[str]) – A list of any added HDF5 Attributes

  • removed_attrs (List[str]) – A list of any removed HDF5 Attributes

  • delta_nmembers (int) – The difference in the number of members in the HDF5 file

  • added_members (List[str]) – A list of any added HDF5 groups or datasets

  • removed_members (List[str]) – A list of any removed HDF5 groups or datasets

  • members (Dict[str, ChangeMember]) – A dictionary of HDF5 group/dataset member changes

addead_attrs: List[str] | None
addead_members: List[str] | None
delta_nattrs: int | None
delta_nmembers: int | None
members: Dict[str, ChangeMember] | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

new_libver: tuple | None
removed_attrs: List[str] | None
removed_members: List[str] | None
class datamodel.models.filetypes.hdf5.ChangeMember(*, delta_nmembers: int | None = None, delta_nattrs: int | None = None, added_attrs: List[str] | None = None, removed_attrs: List[str] | None = None, delta_ndim: int | None = None, new_shape: tuple | None = None, delta_size: int | None = None)[source]

Bases: CoreModel

Pydantic model representing a YAML changelog HDF5 member section

Represents a computed section of the changelog, for a specific HDF member. For each similar HDF5 member between releases, the changes in member number, attributes, and dataset dimensions, size and shape are computed.

Parameters:
  • delta_nmembers (int) – The difference in member number between HDF5 groups

  • delta_nattrs (int) – The difference in attribute number between HDF5 members

  • added_attrs (List[str]) – A list of any added HDF5 Attributes

  • removed_attrs (List[str]) – A list of any removed HDF5 Attributes

  • delta_ndim (int) – The difference in dataset dimension number between HDF5 members

  • new_shape (tuple) – The new dataset shape, where it differs between HDF5 members

  • delta_size (int) – The difference in dataset size between HDF5 members

added_attrs: List[str] | None
delta_nattrs: int | None
delta_ndim: int | None
delta_nmembers: int | None
delta_size: int | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

new_shape: tuple | None
removed_attrs: List[str] | None
class datamodel.models.filetypes.hdf5.HdfAttr(*, key: str, value: str | int | float | bool = None, comment: str, dtype: str, is_empty: bool = None, shape: tuple | None = <factory>)[source]

Bases: CoreModel

Pydantic model representing a YAML hdfs attrs section

Represents the Attributes of an HDF5 file. Each group or dataset has a set of attributes (attrs), which contains metadata about the group or dataset.

Parameters:
  • key (str) – The name of the attribute

  • value (str) – The value of the attribute

  • comment (str) – A description of the attribute

  • dtype (str) – The numpy dtype of the attribute

  • is_empty (bool) – If the attribute is an HDF5 Empty attribute

  • shape (tuple) – The shape of the attribute, if any

check_value()[source]

comment: str
dtype: str
is_empty: bool
key: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

shape: tuple | None
value: str | int | float | bool
class datamodel.models.filetypes.hdf5.HdfBase(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>)[source]

Bases: CoreModel

Base Pydantic model representing a YAML hdfs section

Represents a member (a group or dataset) of an HDF5 file. Each group or dataset has a set of attributes (attrs), which contains metadata about the group or dataset.

Parameters:
  • name (str) – The name of the HDF5 group or dataset

  • parent (str) – The parent group of the group or dataset

  • object (HdfEnum) – Whether the entry is a group or a dataset

  • description (str) – A description of the group or dataset

  • pytables (bool) – Flag if the object is a PyTables object

  • attrs (list) – A list of HdfAttr attributes attached to the group or dataset

attrs: List[HdfAttr]
description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
object: HdfEnum
parent: str
pytables: bool
class datamodel.models.filetypes.hdf5.HdfDataset(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>, shape: tuple, size: int, ndim: int, dtype: str, nbytes: int = None, is_virtual: bool = None, is_empty: bool = None)[source]

Bases: HdfBase

Pydantic model representing a YAML HDF Dataset section

Represents a Dataset of an HDF5 file.

Parameters:
  • shape (tuple) – The dimensional shape of the dataset

  • size (int) – The size, or number of elements, in the dataset

  • ndim (int) – The number of dimensions in the dataset

  • dtype (str) – The numpy dtype of the dataset

  • nbytes (int) – The number of bytes in the dataset

  • is_virtual (bool) – Whether the dataset is virtual

  • is_empty (bool) – Whether the dataset is an HDF5 Empty object

dtype: str
is_empty: bool
is_virtual: bool
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

nbytes: int
ndim: int
shape: tuple
size: int
class datamodel.models.filetypes.hdf5.HdfEnum(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Pydantic Enum for HDF5 Group or Dataset

dataset = 'dataset'
group = 'group'
class datamodel.models.filetypes.hdf5.HdfGroup(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>, n_members: int)[source]

Bases: HdfBase

Pydantic model representing a YAML HDF Group section

Represents a Group of an HDF5 file.

Parameters:

n_members (int) – The number of members in the group

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_members: int
class datamodel.models.filetypes.hdf5.HdfModel(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>, n_members: int, libver: tuple = [], members: ~typing.Dict[str, ~datamodel.models.filetypes.hdf5.HdfGroup | ~datamodel.models.filetypes.hdf5.HdfDataset] = <factory>)[source]

Bases: HdfGroup

Pydantic model representing a YAML hdfs section

Represents a base HDF5 file, which is also an HDF5 Group. See the HdfGroup, HdfDataset, and HdfBase models for more information on the fields.

Parameters:
  • libver (tuple) – The HDF5 library version used to create the file

  • members (dict) – All groups and datasets in the HDF5 file

convert_hdf()[source]

Convert the hdfs to a h5py.File

libver: tuple
members: Dict[str, HdfGroup | HdfDataset]
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

datamodel.models.validators.check_release(value: dict) str[source]

Validator for datamodel release keys

Validator for yaml “releases” fields. Checks the “releases” keys against valid SDSS releases, from the Releases Model.

Parameters:

value (dict) – the value of the field

Returns:

str – the value of the field

Raises:

ValueError – when the release key is not a valid release

datamodel.models.validators.replace_me(value: str) str[source]

Validator for datamodel text fields

Validator for yaml fields where the string values have the text “replace me” within it. This text indicates a template text that must be replaced.

Parameters:

value (str) – the value of the field

Returns:

str – the value of the field

Raises:

ValueError – when “replace me” is in the value text

Products

class datamodel.products.product.DataProducts[source]

Bases: FuzzyList

Class of a fuzzy list of SDSS data products

Creates a list of all available SDSS data products that have valid JSON datamodel files, i.e. those in the datamodel/products/json/ directory. All products are lazy-loaded at first for efficiency. Products are automatically loaded with content when the items in the list are accessed.

get_level(level: str) dict[source]

Get products by data level

Get all products for a given data level. The input data level can be any ranking, e.g. “1”, “1.2”, “1.2.3”, etc, and it will return all products that match that level.

Parameters:

level (str) – the data level to retrieve

Returns:

dict – the products for the requested data level
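
For example (the products returned depend on the local set of JSON datamodels):

>>> from datamodel.products import DataProducts
>>> dp = DataProducts()
>>> dp.get_level('1.2')        # all products whose data_level starts with 1.2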

group_by(field: str) dict[source]

Group the products by an attribute

Group all products by either a product attribute, e.g. “releases”, or a field in the underlying JSON model, e.g. “_model.general.environments”. A dotted attribute string is resolved as a set of nested attributes. Returns a dictionary of products grouped by the field, or fields, if the requested field is a list.

Parameters:

field (str) – The name of the attribute or field

Returns:

dict – A dictionary of products grouped by desired field

Example

>>> from datamodel.products import DataProducts
>>> dp = DataProducts()
>>> gg = dp.group_by('releases')
>>> gg
    {"DR15": ...,
     "DR16": ....}
list_products() list[source]

List all data products

load_all()[source]

Load all data products

static mapper(item) str[source]

Override the fuzzy mapper to match on product’s name

class datamodel.products.product.Product(name: str, load: bool = False)[source]

Bases: object

Class for an SDSS data product

Entry point for individual SDSS data products. This class reads in the content from the validated JSON datamodel file, handling deserialization via the pydantic ProductModel. By default, products are lazy-loaded, i.e. they will not load the underlying JSON content. Pass load=True or use load() to manually load the product’s datamodel.

Parameters:
  • name (str) – The file species name of the datamodel

  • load (bool, optional) – If True, loads the model’s JSON content, by default False
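
A minimal sketch of lazy-loading, assuming 'sdR' is a hypothetical file species with a validated JSON datamodel:

>>> from datamodel.products.product import Product
>>> p = Product('sdR')                     # lazy; JSON content not read yet
>>> p.load()                               # explicitly load the datamodel content
>>> p.get_content()['general']['name']
'sdR'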

classmethod from_file(value: str | Path, load: bool = None) PType[source]

Class method to load a data Product from a JSON datamodel filepath

Parameters:
  • value (Union[str, pathlib.Path]) – The full path to a JSON datamodel file

  • load (bool, optional) – If True, loads the model content on instantiation, by default None

Returns:

PType – A new instance of a Product

get_access(release: str = None) dict[source]

Get the sdss-access information for a given release

Get the “access” entry from the datamodel for a given release. If no release is given, returns the access information for all releases for the product. The access information returned is also the same content as in the products/access/[fileSpecies].access file.

Parameters:

release (str, optional) – The data release to use, by default None

Returns:

dict – the access information from the datamodel

Raises:
  • AttributeError – when “releases” is not set and product is not loaded

  • ValueError – when the specified release is not a valid one for the product

get_content(*args, **kwargs) dict[source]

Returns the entire cached JSON datamodel content

Returns:

dict – The JSON datamodel content

get_example(release: str = 'WORK', expand: bool = True) str[source]

Get the example file from the datamodel

Returns the resolved example filepath for a specified release. By default the SAS environment variable is expanded, but the path can optionally be returned unresolved.

Parameters:
  • release (str, optional) – The data release to use, by default “WORK”

  • expand (bool, optional) – If True, expands the SAS environment variable, by default True

Returns:

str – The generated filepath

Raises:
  • AttributeError – when “releases” is not set and product is not loaded

  • ValueError – when the specified release is not a valid one for the product

get_location(release: str = 'WORK', symbolic: bool = False, expand: bool = True, **kwargs) str[source]

Get a file location from the datamodel

Returns a resolved filepath for a specified release. The symbolic location can be given keyword arguments to resolve it to a real filepath. By default the SAS environment variable is expanded, but the path can optionally be returned unresolved.

Parameters:
  • name (str) – The type of path to extract. Either “example” or “location”.

  • release (str, optional) – The data release to use, by default “WORK”

  • expand (bool, optional) – If True, expands the SAS environment variable, by default True

  • symbolic (bool, optional) – If True, returns only the symbolic path, by default False

  • kwargs (str) – Any set of keyword arguments needed to resolve the symbolic path

Returns:

str – The generated filepath

Raises:
  • AttributeError – when “releases” is not set and product is not loaded

  • ValueError – when the specified release is not a valid one for the product
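
A minimal sketch, assuming 'sdR' is a hypothetical file species whose path template takes an mjd keyword:

>>> from datamodel.products.product import Product
>>> p = Product('sdR', load=True)
>>> p.get_location(release='DR17', symbolic=True)    # the unresolved symbolic path
>>> p.get_location(release='DR17', mjd=59000)        # resolved path; keywords are product-specific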

get_release(value: str) Release[source]

Get the JSON content for the given product for a given SDSS release

Returns the Pydantic yaml.Release model for a given SDSS release. All JSON keys are accessible as instance attributes. The model can be dumped into a dictionary with the model_dump() method.

Parameters:

value (str) – a valid SDSS release

Returns:

Release – The JSON ReleaseModel content for the given SDSS release

Raises:

ValueError – when the input release is an invalid SDSS release

get_schema(*args, **kwargs) dict[source]

Returns the Pydantic schema datamodel definition

Returns:

dict – The datamodel schema

load() None[source]

Loads the DataModel content into the Product

loader() PType[source]

Contextmanager to temporarily load the product
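
A minimal sketch of temporarily loading a product, again assuming a hypothetical 'sdR' file species:

>>> from datamodel.products.product import Product
>>> p = Product('sdR')
>>> with p.loader():
...     path = p.get_example('DR17')      # the product is only loaded inside the with block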

unload() None[source]

Unloads the DataModel content from the Product

class datamodel.products.product.ReleaseList(the_items: list | dict, use_fuzzy: Callable = None, dottable: bool = True)[source]

Bases: FuzzyList

Class for a fuzzy list of Releases

static mapper(item)[source]

Mapper between list/dict item and rapidfuzz choices

Static method used to map a list’s items or dict’s keys to a string representation used by rapidfuzz for fuzzy matching. By default returns an explicit string cast of the item. To see the output, view the choices property. Can be overridden to customize what is input into rapidfuzz.

Parameters:

item (Union[str, object]) – Any iterable item, i.e. a list item or dictionary key

Returns:

str – The string representation to use in the choices supplied to rapidfuzz

class datamodel.products.product.SDSSDataModel[source]

Bases: object

Class for the SDSS DataModel

High-level entry point into the SDSS DataModel. Contains accounting of all relevant SDSS phases, surveys, data releases, and available data products.

datamodel.products.product.grouper(field: str, products: list) dict[source]

Group the products by an attribute

Group all products by either a product attribute, e.g. “releases”, or a field in the underlying JSON model, e.g. “_model.general.environments”. A dotted attribute string is resolved as a set of nested attributes. Returns a dictionary of products grouped by the field, or fields, if the requested field is a list.

Parameters:
  • field (str) – The name of the attribute or field

  • products (list) – A list of Products to group by

Returns:

dict – A dictionary of products grouped by desired field

datamodel.products.product.rgetattr(obj: object, attr: str, *args)[source]

recursive getattr for nested attributes

Recursively get attributes from nested classes. See https://stackoverflow.com/questions/31174295/getattr-and-setattr-on-nested-subobjects-chained-properties
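
For example, with a simple nested object (the attribute path is illustrative):

>>> from types import SimpleNamespace
>>> from datamodel.products.product import rgetattr
>>> obj = SimpleNamespace(general=SimpleNamespace(name='sdR'))
>>> rgetattr(obj, 'general.name')
'sdR'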

datamodel.products.product.sort_function(x)[source]

Sort function for grouping products by field value.

If the item is a pydantic model, sort by the model’s name if it has one; otherwise sort by the tuple item

Parameters:

x (tuple) – A tuple containing a product and its corresponding field value.

Returns:

str – The name of the field value

datamodel.products.product.zipper(x: Product, field: str) list[source]

Creates a list of tuples of the Product and its corresponding field value(s).

This function retrieves the value of the specified field from the given product. It creates a list of tuples where each tuple contains the product and the field value. If the field value is a list, it returns a list of (product, item_element) tuples.

This creates an easily sortable list of tuples for grouping.

Parameters:
  • x (Product) – The product from which to retrieve the field value.

  • field (str) – The name of the attribute or field to retrieve from the product.

Returns:

list – A list of tuples to sort

Validate

datamodel.validate.add.add_and_commit(repo: Repo, file: str, message: str = None)[source]

Add and commit a file to the tree

Add and commit a file to the tree repo.

Parameters:
  • repo (Repo) – the git repo to commit to

  • file (str) – the path of the file to add and commit

  • message (str, optional) – the commit message, by default None

datamodel.validate.add.clone_tree(branch: str = 'dm_update_tree', local: bool = None, path: str = None) Repo[source]

Clone the tree repo

Clone the tree repo from either an existing local source or cloning the remote repo into a temporary directory.

Parameters:
  • branch (str, optional) – the name of the branch, by default ‘dm_update_tree’

  • local (bool, optional) – if True, use a local tree repo, by default None

  • path (str, optional) – a path to check out the tree repo to

Returns:

Repo – the git repo

datamodel.validate.add.get_new_products(release: str = None) tuple[source]

Get new datamodel products for the tree

Retrieves any valid JSON datamodels that do not yet have a corresponding tree entry, i.e. the in_sdss_access field is False.

Parameters:

release (str, optional) – the SDSS release, by default None

Yields:

tuple – The release and access string

datamodel.validate.add.make_branch(repo: Repo, branch: str = 'dm_update_tree') Repo[source]

Make a new branch in the tree repo

Checkout or create a branch in the tree repo.

Parameters:
  • repo (Repo) – the git repo

  • branch (str, optional) – the name of the branch, by default “dm_update_tree”

Returns:

Repo – the git repo

datamodel.validate.add.pull_and_push(repo: Repo)[source]

Pull and push the tree repo

Pull and push the current tree repo head to the remote.

Parameters:

repo (Repo) – the git repo

datamodel.validate.add.update_datamodel_access(branch: str = 'dm_update_models', test: bool = None, commit_to_git: bool = False)[source]

Updates the datamodel access info sections

Checks all “new” JSON datamodels for updated access info. Creates a new datamodel instance using the product file species, and updates YAML file and all stubs with the updated access info for the indicated release.

Parameters:
  • branch (str, optional) – the datamodel repo branch name, by default ‘dm_update_models’

  • test (bool, optional) – if set, skips all write operations, by default None

  • commit_to_git (bool, optional) – if set, commits to git, by default False

datamodel.validate.add.update_tree(release: str = None, work_ver: str = None, branch: str = 'dm_update_tree', local: bool = None, test: bool = None, skip_push: bool = False)[source]

Update the tree repo with new paths

Updates the tree repo with new paths for datamodel products. Gets all new JSON datamodels that do not yet have tree paths, and adds them to the PATH ini section of the respective release config file. Clones the tree repo and makes all commits in a new branch, by default ‘dm_update_tree’. Commits and pushes the branch to the remote. Makes a backup of the tree config file before writing any new changes. On failure, the backup is restored.

Use the test flag to skip all write operations and just print the new paths. Use the skip_push flag to bypass the push to the remote.

Parameters:
  • release (str, optional) – the SDSS release, by default None

  • work_ver (str, optional) – the tree config work version, by default None

  • branch (str, optional) – the tree repo branch name, by default ‘dm_update_tree’

  • local (bool, optional) – if set, uses an existing local repo, by default None

  • test (bool, optional) – if set, turns on testing, by default None

  • skip_push (bool, optional) – if set, skips the git push, by default False

datamodel.validate.add.write_comments(cfgfile: str, paths: list)[source]

Update a tree config file

Write a tree config file with new paths added into it. This preserves all comments from the tree ini config file. The list of paths to add is a list of tuples of path_name, path_template. Does not add them if they already exist in the config file.

Parameters:
  • cfgfile (str) – the tree config file path

  • paths (list) – a list of paths to add

datamodel.validate.add.write_no_comments(cfgfile: str, paths: list)[source]

Update a tree config file

Write a tree config file with new paths added into it. This removes all comments from the tree ini config file. The list of paths to add is a list of tuples of path_name, path_template. Does not add them if they already exist in the config file.

Parameters:
  • cfgfile (str) – the tree config file path

  • paths (list) – a list of paths to add

datamodel.validate.check.check_invalid(product: str, data: dict, release: str, verbose: bool = None) tuple | None[source]

Check for an invalid product access path

For a given release, checks the datamodel product access info against the relevant Tree configuration path info for consistency. If the release is “WORK”, checks both the “sdss5” and “sdsswork” configs. If both configs return an invalidation, then the product path is invalid.

Parameters:
  • product (str) – The name of the datamodel product, i.e. file species name

  • data (dict) – The datamodel access info dictionary

  • release (str, optional) – The SDSS data release, by default None

  • verbose (bool, optional) – If True, turn on verbosity, by default None

Returns:

Union[tuple, None] – Either None for a valid path or a tuple of the invalid path info

datamodel.validate.check.check_path(product: str, data: dict, tree: Tree, verbose: bool = None) tuple | None[source]

Checks a product access path

Checks the product access path name is in the list of tree paths. Checks the product access path template is the same as the tree path template. Checks the product’s access_string is consistent with the tree path template. For tree paths that start with a special function rather than an environment variable, e.g. @spectrodir|, only checks the common part of the path.

Parameters:
  • product (str) – The name of the datamodel product, i.e. file species name

  • data (dict) – The datamodel access info dictionary

  • tree (Tree, optional) – The SDSS tree config

  • verbose (bool, optional) – If True, turn on verbosity, by default None

Returns:

Union[tuple, None] – None for a valid path or a tuple of the invalid path info

datamodel.validate.check.check_products(release: str = None, verbose: bool = None) None[source]

Validate the data product path information

Checks the datamodel product access path information against the tree path information for consistency. Checks path name, template, and access_string.

Parameters:
  • release (str, optional) – The SDSS data release, by default None

  • verbose (bool, optional) – If True, turn on verbosity, by default None

Raises:

ValueError – when any of the product paths are invalid against tree
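
For example, to validate all product paths for a single release (raises ValueError on any mismatch):

>>> from datamodel.validate.check import check_products
>>> check_products(release='DR17', verbose=True)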

datamodel.validate.check.compare_path(a: str, b: str) bool[source]

Compares two paths

Compares two paths for equality. Tries to account for comparison between a path with and without compression suffixes, i.e. “fits” and “fits.gz”. Strips the last suffix from paths with more than one.

Parameters:
  • a (str) – a str filepath or path location

  • b (str) – a str filepath or path location

Returns:

bool – If the two paths are the same
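
For example, comparing paths that differ only by a compression suffix:

>>> from datamodel.validate.check import compare_path
>>> compare_path('spectro/spec-001.fits', 'spectro/spec-001.fits.gz')
True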

datamodel.validate.check.get_products(release: str = None) dict[source]

Get the access info for all SDSS data products

Get the access datamodel info for all SDSS data products of a given release. If no release specified, returns the info for all products for all releases.

Parameters:

release (str, optional) – The SDSS data release, by default None

Returns:

dict – The datamodel path access information

datamodel.validate.check.yield_products(release: str = None) tuple[source]

Generator to yield the access info for all SDSS data products

Yield the access datamodel info for all SDSS data products of a given release. If no release specified, returns the info for all products for all releases.

Parameters:

release (str, optional) – The SDSS data release, by default None

Returns:

tuple – The product name and its datamodel access dictionary

Yields:

Iterator[tuple] – The product name and its datamodel access dictionary

datamodel.validate.models.revalidate(species: str, release: str = None, verbose: bool = None)[source]

Rewrite JSON datamodels

Rewrites all the datamodel stubs for a given existing file species and release.

Parameters:
  • species (str) – the file species name of a YAML datamodel

  • release (str, optional) – the SDSS release, by default None

  • verbose (bool, optional) – if True, turn on verbosity, by default None
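
For example, to regenerate the stubs for a single file species (the species name here is hypothetical):

>>> from datamodel.validate.models import revalidate
>>> revalidate('sdR', release='DR17', verbose=True)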

datamodel.validate.models.validate_models()[source]

Check YAML datamodel validation

Checks all YAML datamodels for corresponding validated JSON models.

Raises:

ValueError – when invalidated YAML models are found