datamodel Reference

Generate

class datamodel.generate.datamodel.DataModel(tree_ver: str = None, file_spec: str = None, path: str = None, keywords: list = [], env_label: str = None, location: str = None, verbose: bool = None, release: str = None, filename: str = None, access_path_name: str = None, design: bool = False, science_product: bool = None)[source]

Bases: object

Class to enable datamodel file generation for a given product

This class is used to generate valid SDSS datamodel files for a given data product.

Parameters:
  • tree_ver (str, optional) – an SDSS Tree configuration name, by default None

  • file_spec (str, optional) – The name of the file species (or sdss_access path name), by default None

  • path (str, optional) – A file path template definition, by default None

  • keywords (list, optional) – A list of path template keyword-value pairs, by default None

  • env_label (str, optional) – The environment variable name of the file’s location, by default None

  • location (str, optional) – A path location relative to the environment variable, by default None

  • verbose (bool, optional) – If True, turn on verbosity logging, by default None

  • release (str, optional) – The name of the SDSS release the file is a part of, by default None

  • filename (str, optional) – A full filepath to a real file on disk to create the datamodel for

  • access_path_name (str, optional) – A name of the path name in sdss_access, if different than the file species name, by default None

  • design (bool, optional) – If True, indicates the datamodel is in a design phase, by default False

  • science_product (bool, optional) – If True, indicates the datamodel is a recommended science product, by default None

Raises:
  • ValueError – when neither a path nor a (env_label + location) are specified

  • ValueError – when no path template keywords are specified
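
A minimal usage sketch (illustrative only): the file species name and keyword values are hypothetical, the location template is borrowed from the parse examples later in this reference, and the exact form of the keywords list (here "key=value" strings) is an assumption.

>>> from datamodel.generate.datamodel import DataModel
>>> # describe a file species via an environment label and a relative location template
>>> dm = DataModel(
...     tree_ver="sdsswork",
...     file_spec="sdR",
...     env_label="BOSS_SPECTRO_DATA",
...     location="{mjd}/sdR-{br}{id}-{frame}.fits.gz",
...     keywords=["mjd=55049", "br=b", "id=1", "frame=00100006"],
... )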

commit_stubs(format: str = None) None[source]

Commit the stub files into git

Commit stub files into git. Performs a git pull, commits all stubs into the repo, and attempts a git push. Optionally specify a format to only commit a specific stub.

Parameters:

format (str, optional) – A stub format to commit, by default None

design_hdf(name: str = '/', description: str = None, hdftype: str = 'group', attrs=None, ds_shape: tuple = None, ds_dtype: str = None)[source]

Wrapper to _design_content, to design a new HDF5 section

Design a new HDF entry for the given datamodel. Specify h5py group or dataset definitions, with an optional list of attributes. Each new entry is added to the members entry in the YAML structure. Use name and description to specify the name and description of each new group or dataset. Use hdftype to specify a “group” or “dataset” entry. For datasets, use ds_shape and ds_dtype to specify the shape and dtype of the array dataset.

New HDF5 members are added to the datamodel in a flattened structure. To add a new group or dataset as a child of an existing group, specify the full path in name, e.g. /mygroup/mydataset.

attrs can be a list of tuples of header keywords, conforming to (key, value, comment, dtype), or list of dictionaries conforming to {“key”: key, “value”: value, “comment”: comment, “dtype”: dtype}.

Allowed attribute or dataset dtypes are any valid string representation of numpy dtypes. For example, “<i8”, “int32”, “S10”, etc.

Parameters:
  • name (str, optional) – the name of the HDF group or dataset, by default ‘/’

  • description (str, optional) – a description of the HDF group or dataset, by default None

  • hdftype (str, optional) – the type of HDF5 object, by default ‘group’

  • attrs (list, optional) – a list of HDF5 Attributes, by default None

  • ds_shape (tuple, optional) – the shape of an HDF5 array dataset, by default None

  • ds_dtype (str, optional) – the dtype of an HDF5 array dataset, by default None

Raises:

ValueError – when an invalid hdftype is specified
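
A sketch of designing HDF5 content, assuming dm is a DataModel created with design=True for an .h5 file species; the group/dataset names and attributes are illustrative.

>>> # describe the root group with a single attribute
>>> dm.design_hdf(name="/", description="the file root group", hdftype="group",
...               attrs=[("sdss_id", 12345, "an example identifier", "int32")])
>>> # add an array dataset as a child of the root group
>>> dm.design_hdf(name="/flux", description="an array of fluxes",
...               hdftype="dataset", ds_shape=(100, 4), ds_dtype="<f8")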

design_hdu(ext: str = 'primary', extno: int = None, name: str = 'EXAMPLE', description: str = None, header: list | dict | Header = None, columns: List[list | dict | Column] = None, **kwargs)[source]

Wrapper to _design_content, to design a new HDU

Design a new astropy HDU for the given datamodel. Specify the extension type ext to indicate a PRIMARY, IMAGE, or BINTABLE HDU extension. Each new HDU is added to the YAML structure using the next available HDU extension number, or the one provided with extno. Use name to specify the name of the HDU extension. Each call to this method writes out the new HDU to the YAML design file.

header can be a Header instance, a list of tuples of header keywords, conforming to (keyword, value, comment), or list of dictionaries conforming to {“keyword”: keyword, “value”: value, “comment”: comment}.

columns can be a list of Column objects, a list of tuples minimally conforming to (name, format, unit), or a list of dictionaries minimally conforming to {“name”: name, “format”: format, “unit”: unit}. See Astropy’s Binary Table Column Format for the allowed format values. When supplying a list of tuples or dictionaries, you can include any number of valid Column arguments.

Parameters:
  • ext (str, optional) – the type of HDU to create, by default ‘primary’

  • extno (int, optional) – the extension number, by default None

  • name (str, optional) – the name of the HDU extension, by default ‘EXAMPLE’

  • description (str, optional) – a description for the HDU, by default None

  • header (Union[list, dict, fits.Header], optional) – valid input to create a Header, by default None

  • columns (List[Union[list, dict, fits.Column]], optional) – a list of binary table columns, by default None

  • force (bool) – If True, forces a new design even if the HDU already exists, by default None

  • **kwargs – additional keyword arguments to pass to the HDU constructor

Raises:
  • ValueError – when the ext type is not supported

  • ValueError – when the table columns input is not a list
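
A sketch of designing HDUs, assuming dm is a DataModel created with design=True for a FITS file species; the header keywords and column definitions are illustrative.

>>> # a primary HDU with a single header keyword
>>> dm.design_hdu(ext="primary", description="the primary header",
...               header=[("TELESCOP", "SDSS 2.5-M", "the telescope name")])
>>> # a binary table HDU with three columns
>>> dm.design_hdu(ext="bintable", extno=1, name="CATALOG",
...               description="a table of objects",
...               columns=[("objid", "J", ""), ("ra", "D", "deg"), ("dec", "D", "deg")])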

design_par(comments: str = None, header: list | dict = None, name: str = None, description: str = None, columns: list = None)[source]

Wrapper to _design_content, to design a new Yanny par section

Design a new Yanny par for the given datamodel. Specify Yanny comments, a header section, or a table definition. Each new table is added to the YAML structure. Use name and description to specify the name and description of the new table. comments can be a single string of comments, with newlines indicated by “\n”.

header can be a dictionary of key-value pairs, a list of tuples of header keywords, conforming to (keyword, value, comment), or list of dictionaries conforming to {“key”: key, “value”: value, “comment”: comment}.

The columns parameter defines the relevant table columns to add to the file. It can be a list of column names, a list of tuple values conforming to column (name, type, [description]), or a list of dictionaries with keys defined from the complete column yaml definition.

Allowed column types are any valid Yanny par types, input as strings, e.g. “int”, “float”, “char”. Array columns can be specified by including the array size in “[]”, e.g. “float[6]”.

Parameters:
  • comments (str, optional) – Any comments to add to the file, by default None

  • header (Union[list, dict], optional) – Keywords to add to the header of the Yanny file, by default None

  • name (str, optional) – The name of the parameter table

  • description (str, optional) – A description of the parameter table

  • columns (list, optional) – A set of Yanny table column definitions
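
A sketch of designing a Yanny par section, assuming dm is a DataModel created with design=True for a .par file species; the comments, header, and columns are illustrative.

>>> dm.design_par(comments="a demonstration parameter file\ncreated during design",
...               header=[("mjd", 59500, "the MJD of the observations")],
...               name="TARGET", description="a table of targets",
...               columns=[("ra", "float", "right ascension"),
...                        ("dec", "float", "declination"),
...                        ("mags", "float[5]", "ugriz magnitudes")])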

determine_survey(name_only: bool = False)[source]

Attempt to determine the SDSS survey for this datamodel

classmethod from_file(filename: str, path_name: str = None, tree_ver: str = None, verbose: bool = None) D[source]

class method to create a datamodel from an absolute filepath

Creates a DataModel for a given full path to a file. Prompts the user to verify any existing entry in sdss_access for the input file, or to define a new file_species / path_name, symbolic path location, and example variable_name=value key mappings.

Parameters:
  • filename (str) – The full path to the file

  • path_name (str, optional) – The existing sdss_access path name if any, by default None

  • tree_ver (str, optional) – The SDSS tree version or release associated with the file, by default None

  • verbose (bool, optional) – If True, creates the DataModel with verbosity, by default None

Returns:

DataModel – a SDSS DataModel instance
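
A sketch of creating a datamodel directly from a file on disk; the filepath is illustrative, and the call may prompt interactively as described above.

>>> from datamodel.generate.datamodel import DataModel
>>> dm = DataModel.from_file("/sas/sdsswork/data/boss/55049/sdR-b1-00100006.fits.gz")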

classmethod from_yaml(species: str, release: str = None, verbose: bool = None, tree_ver: str = None) D[source]

class method to create a datamodel from a YAML file species name

Creates a DataModel for a given file species name, from an existing YAML datamodel file. Extracts the abstract path and keyword arguments needed to instantiate a DataModel. Keywords are extracted using the datamodel “location” and “example” fields. The abstract path is extracted from the pre-existing “access_string” field. Fields are pulled from the specified release. If no release is specified, it uses the first release it can find in the datamodel. A tree config version can optionally be specified instead, for cases where the WORK release comes from the sdss5 config rather than sdsswork. If tree_ver is set, it supersedes the release keyword.

Parameters:
  • species (str) – the file species datamodel name

  • release (str, optional) – the SDSS release, by default None

  • verbose (bool, optional) – if True, turn on verbosity, by default None

  • tree_ver (str, optional) – the SDSS tree config version, by default None

Returns:

DataModel – a SDSS DataModel instance

Raises:
  • ValueError – when no yaml file can be found for the file species

  • ValueError – when no release can be found in the datamodel

  • ValueError – when no location or example can be found in the datamodel

  • ValueError – when no path keyword arguments can be extracted
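
A sketch of re-creating a datamodel from its existing YAML file; the file species name and release are illustrative.

>>> from datamodel.generate.datamodel import DataModel
>>> dm = DataModel.from_yaml("sdR", release="DR17")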

generate_designed_file(redesign: bool = None, **kwargs)[source]

Generate a file from a designed datamodel

Generates a real file on disk from a designed datamodel. If there are any path template keywords, they must be specified here as input keyword arguments to convert the symbolic path / abstract location to a real example location on disk. After generating the file, the datamodel sets design to False and exits design mode.

Parameters:
  • redesign (bool) – If True, re-enters design mode to create a new file

  • kwargs – Any path keyword arguments to be filled in

Raises:
  • KeyError – when there are missing path keywords

  • AttributeError – when the release is not WORK when in the datamodel design phase
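
A sketch of generating the real file from a designed datamodel; the path keyword values are illustrative and must match the keywords in the path template.

>>> dm.generate_designed_file(mjd=55049, br="b", id=1, frame="00100006")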

get_stub(format: str = 'yaml') BaseStub[source]

Get a datamodel Stub

Return a datamodel Stub for a given format.

Parameters:

format (str, optional) – the stub format to return, by default ‘yaml’

Returns:

BaseStub – an instance of a stub class

remove_stubs(format: str = None, git: bool = None) None[source]

Remove the stub files

Remove all stubs or a stub of a given format.

Parameters:
  • format (str, optional) – A stub format to remove, by default None

  • git (bool, optional) – If True, removes from the git repo

write_stubs(format: str = None, force: bool = None, use_cache_release: str = None, full_cache: bool = None, group: str = 'WORK', force_release: str = None) None[source]

Write out the stub files

Write out all stubs or a stub of a given format.

Parameters:
  • format (str, optional) – A stub format to write out, by default None

  • force (bool, optional) – If True, forces a rewrite of the entire cached stub content

  • force_release (str, optional) – A specific release to force a rewrite in the cache

  • use_cache_release (str, optional) – Specify a cached release to use to copy over custom user content

  • full_cache (bool, optional) – If True, use the entire cached YAML release, rather than only the HDUs, by default None

  • group (str, optional) – The release group to use when writing the markdown file, by default “WORK”. Can be “DR”, or “IPL”.
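
A sketch of the typical stub workflow, assuming dm is a populated DataModel.

>>> dm.write_stubs()                # write all stub formats (yaml, md, json, access)
>>> dm.write_stubs(format="yaml")   # or write only the YAML stub
>>> stub = dm.get_stub("md")        # retrieve a single stub instance
>>> dm.commit_stubs()               # pull, commit, and push the stubs to git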

property file_exists

Checks for file existence on disk

property recommended_science_product: bool

Checks if the datamodel product is a recommended science product

supported_filetypes = ['.fits', '.fit', '.par', '.h5', '.hdf5']
property survey: str

Get the SDSS survey for this datamodel

property vac: bool

Checks if the datamodel product is a VAC, based on its envvar label

datamodel.generate.datamodel.prompt_for_access(filename: str, path_name: str = None, config: str = None) tuple[source]

Prompt the user to verify or define information

Takes the user through a variety of input prompts in order to verify any existing entry in sdss_access, or to define a new file species, symbolic path location, and example variable=value key mappings for the input file.

Parameters:
  • filename (str) – The full path to the file

  • path_name (str, optional) – Any existing sdss_access path_name / file_species, by default None

  • config (str, optional) – What tree config version, or release the file corresponds to, by default None

Returns:

tuple – a tuple of path_name, path_template, path_keys

datamodel.generate.parse.cleanup_dups(kwargs: dict) dict[source]

Cleanup duplicate keys in the extracted keywords

Removes duplicated keywords from the extracted kwargs. If both key values are the same, that value is used. If both are digits, attempts to remove any leading zero-padding, e.g. “45” and “000045” -> “45”.

Parameters:

kwargs (dict) – the input extracted keywords

Returns:

dict – reduced keyword dictionary

datamodel.generate.parse.deduplicate(value: str, names: list) str[source]

De-duplicate regex pattern field names

Some paths have duplicate field names, e.g. “run”. The default regex named group replace fails with duplicate field names. To handle this we append each duplicate field name with “_” so the re.groupdict method can work properly.

Parameters:
  • value (str) – the input regex search pattern

  • names (list) – a list of path field names

Returns:

str – the new regex search pattern

datamodel.generate.parse.find_kwargs(location: str, example: str) dict[source]

Find and extract keyword arguments

Attempts to extract keyword arguments from an input abstract datamodel path location and its example path. The location and example parts must match exactly. For example, given “{mjd}/sdR-{br}{id}-{frame}.fits.gz” and “55049/sdR-b1-00100006.fits.gz”, it returns {‘mjd’: ‘55049’, ‘br’: ‘b’, ‘id’: ‘1’, ‘frame’: ‘00100006’}.

Parameters:
  • location (str) – a datamodel abstract location

  • example (str) – a datamodel example location

Returns:

dict – any extracted keyword arguments
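
The example from the description above, as a doctest-style sketch.

>>> from datamodel.generate.parse import find_kwargs
>>> find_kwargs("{mjd}/sdR-{br}{id}-{frame}.fits.gz", "55049/sdR-b1-00100006.fits.gz")
{'mjd': '55049', 'br': 'b', 'id': '1', 'frame': '00100006'}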

datamodel.generate.parse.get_abstract_key(key: str = None, add_brackets: bool = None) str[source]

Sanitize the path keyword name

Sanitizes the path keyword name. Upper-cases the keyword name and appends any formatting numbers as an integer to the end of the name. E.g. “plate:0>5” is converted to “PLATE5”.

Parameters:

key (str, optional) – The keyword name, by default None

Returns:

str – the sanitized keyword name

datamodel.generate.parse.get_abstract_path(path: str = None, add_brackets: bool = None) str[source]

Converts a path template into an abstract path

Converts a path template into an abstract path. Extracts bracketed keywords from a path template and converts them to uppercase names. For example, MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits.gz is converted to MANGA_SPECTRO_REDUX/DRPVER/PLATE/stack/manga-PLATE-IFU-WAVECUBE.fits.gz.

Parameters:

path (str, optional) – the path template, by default None

Returns:

str – the abstracted path
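
The conversion from the description above, as a doctest-style sketch.

>>> from datamodel.generate.parse import get_abstract_path
>>> get_abstract_path("MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits.gz")
'MANGA_SPECTRO_REDUX/DRPVER/PLATE/stack/manga-PLATE-IFU-WAVECUBE.fits.gz'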

datamodel.generate.parse.get_file_spec(file_spec: str = None) str[source]

Checks validity of file species string

Checks if the file species name is a valid Python identifier.

Parameters:

file_spec (str, optional) – the name of the file species, by default None

Returns:

str – the name of the file species

datamodel.generate.parse.remap_patterns(value: str) str[source]

Remaps regex search patterns for certain fields

Some paths have abutted keywords, e.g. “{br}{id}” or “{dr}{version}”. The default regex search pattern of “.+?” cannot always handle these. We replace certain fields with specific patterns to help the extraction process.

Parameters:

value (str) – the input regex search pattern

Returns:

str – the new regex search pattern

class datamodel.generate.stub.AccessStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: BaseStub

cacheable: bool = False
format: str = 'access'
has_template: bool = False
class datamodel.generate.stub.BaseStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: ABC

add_datamodel(datamodel)[source]
commit_to_git() None[source]

Commit the stub to Github

classmethod from_datamodel(datamodel)[source]
push_to_git() None[source]

Push changes to Github

remove_from_git() None[source]

Remove file from the git repo

remove_output() None[source]

Delete the yaml file on disk

remove_release(release: str)[source]

Remove a release from the datamodel stub

render_content(force: bool = None, force_release: str = None) None[source]

Populate the yaml template with generated content

update_cache(force: bool = None) None[source]

Update the in-memory stub cache from the on-disk file

validate_cache()[source]

Validate the yaml cache

write(force: bool = None, use_cache_release: str = None, full_cache: bool = None, **kwargs) None[source]
cacheable = False
format = None
has_template = True
class datamodel.generate.stub.JsonStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: BaseStub

format: str = 'json'
has_template: bool = False
class datamodel.generate.stub.MdStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: BaseStub

get_selected_release(release: str = None, group: str = 'WORK') str[source]

get the hdu content for a given release

render_content(force: bool = None, release: str = None, group: str = 'WORK') None[source]

Populate the yaml template with generated content

write(force: bool = None, release: str = None, group: str = 'WORK', html: bool = None, use_cache_release: str = None, full_cache: bool = None, **kwargs) None[source]
format: str = 'md'
class datamodel.generate.stub.YamlStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]

Bases: BaseStub

cacheable: bool = True
format: str = 'yaml'
datamodel.generate.stub.stub_iterator(format: str = None) Iterator[BaseStub][source]

Iterator for all stub formats

Filetypes

class datamodel.generate.filetypes.base.BaseFile(cache: dict, datamodel=None, stub=None, filename: str = None, release: str = None, file_species: str = None, design: bool = None, use_cache_release: str = None, full_cache: bool = None)[source]

Bases: ABC

Base class for supported datamodel file types

This is the abstract base class used for defining new file types to be supported by the sdss datamodel product.

Parameters:
  • cache (dict) – The initial yaml cache to be populated.

  • datamodel (DataModel, optional) – an SDSS datamodel for the file, by default None

  • stub (Stub, optional) – an datamodel Stub for the file, by default None

  • filename (str, optional) – the name of file, by default None

  • release (str, optional) – the data release, by default None

  • file_species (str, optional) – the file species name, by default None

  • design (bool, optional) – whether the datamodel is in design mode, by default None

  • use_cache_release (str, optional) – the release to pull existing cache from, by default None

  • full_cache (bool, optional) – whether to use the entire previous cache, by default None

Raises:

ValueError – when neither a datamodel nor (filename, release, file_species) are provided.

abstract _generate_new_cache() dict[source]

Abstract method to be implemented by subclass, for generating new cache content

This method is used to generate the file content for new datamodel YAML files. It should return a dictionary to be stored as the value of the cache key.

abstract static _get_designed_object(data: dict)[source]

Abstract static method to be implemented by subclass, for creating a valid object from cache

This method is used to create a data object from a designed YAML cache content. It should return a new designed object. Ideally the object should be created through the Pydantic model’s model_validate to ensure proper validation and field type coercion. This method is called by create_from_cache which sets the object as the self._designed_object attribute.

Parameters:

data (dict) – The YAML cache value for the cache_key field

abstract _update_partial_cache(cached_data: dict, old_cache: dict) dict[source]

Abstract method to be implemented by subclass, for partially updating cache content

This method updates the descriptions or comments of the new cached_data with the human-edited fields from the old_cache data. Used when adding a new release to a datamodel and retaining the old descriptions from the previous release. This method should return the cached_data object.

Parameters:
  • cached_data (dict) – The YAML cache for the current release

  • old_cache (dict) – The YAML cache for a previous release

create_from_cache(release: str = 'WORK')[source]

Create a file object from the yaml cache

Converts the cache_key dictionary entry in the YAML cache into a file object.

Parameters:

release (str, optional) – the name of the data release, by default ‘WORK’

Returns:

object – a valid file object

Raises:
  • ValueError – when the release is not in the cache

  • ValueError – when the release is not WORK when in the datamodel design phase

abstract design_content()[source]

Abstract method to be implemented by subclass, for designing file content

This method is used to design new content for a YAML datamodel cache for new files from within Python. It should ultimately update the cache line self._cache[‘releases’][‘WORK’][self.cache_key] = [updated_cache_content] with the new content. This method is called by the DataModel’s global design_content method.

abstract write_design(file: str, overwrite: bool = None)[source]

Abstract method to be implemented by subclass, for writing a design to a file

This method is used to write out the designed data object. It should call the self.designed_object’s particular method for writing itself to a file, specific to that filetype.

Parameters:
  • file (str) – The datamodel filename to write to

  • overwrite (bool) – Flag to overwrite the file if it exists, by default None

aliases = []
cache_key = None
compressions = ['.gz', '.bz2', '.zip']
suffix = None
datamodel.generate.filetypes.base.file_selector(suffix: str = None) BaseFile[source]

Selects the correct File class given a file suffix

datamodel.generate.filetypes.base.format_bytes(value: int = None) str[source]

Convert an integer to human-readable format.

Parameters:

value (int) – An integer representing number of bytes.

Returns:

str – Size of the file in human-readable format.

datamodel.generate.filetypes.base.get_filesize(file) str[source]

Get the size of the input file.

Returns:

str – Size of the file in human-readable format.

datamodel.generate.filetypes.base.get_filetype(file) str[source]

Get the extension of the input file.

Returns:

str – File type in upper case.

datamodel.generate.filetypes.base.get_supported_filetypes() list[source]

Get a list of supported filetypes

Constructs a list of supported filetypes for datamodels, based on the BaseFile subclasses. Collects each subclass file suffix attribute as well as any designated aliases.

Returns:

list – A list of supported file types

class datamodel.generate.filetypes.fits.FitsFile(*args, **kwargs)[source]

Bases: BaseFile

Class for supporting FITS files

design_content(ext: str = 'primary', extno: int = None, name: str = 'EXAMPLE', description: str = None, header: list | dict | Header = None, columns: List[list | dict | Column] = None, **kwargs) None[source]

Design a new HDU

Design a new astropy HDU for the given datamodel. Specify the extension type ext to indicate a PRIMARY, IMAGE, or BINTABLE HDU extension. Each new HDU is added to the YAML structure using the next available HDU extension number, or the one provided with extno. Use name to specify the name of the HDU extension.

header can be a Header instance, a list of tuples of header keywords, conforming to (keyword, value, comment), or list of dictionaries conforming to {“keyword”: keyword, “value”: value, “comment”: comment}.

columns can be a list of Column objects, a list of tuples minimally conforming to (name, format, unit), or a list of dictionaries minimally conforming to {“name”: name, “format”: format, “unit”: unit}. See Astropy’s Binary Table Column Format for the allowed format values. When supplying a list of tuples or dictionaries, you can include any number of valid Column arguments.

Parameters:
  • ext (str, optional) – the type of HDU to create, by default ‘primary’

  • extno (int, optional) – the extension number, by default None

  • name (str, optional) – the name of the HDU extension, by default ‘EXAMPLE’

  • description (str, optional) – a description for the HDU, by default None

  • header (Union[list, dict, fits.Header], optional) – valid input to create a Header, by default None

  • columns (List[Union[list, dict, fits.Column]], optional) – a list of binary table columns, by default None

  • force (bool) – If True, forces a new design even if the HDU already exists, by default None

  • **kwargs – additional keyword arguments to pass to the HDU constructor

Raises:
  • ValueError – when the ext type is not supported

  • ValueError – when the table columns input is not a list

write_design(file: str, overwrite: bool = True) None[source]

Write out the designed file

Write out a designed fits.HDUList object to a file on disk. The create_from_cache method must have been run first.

Parameters:
  • file (str) – The designed filename

  • overwrite (bool, optional) – If True, overwrites any existing file, by default True

Raises:

AttributeError – when the designed object does not exist

aliases = ['FIT']
cache_key = 'hdus'
suffix = 'FITS'
class datamodel.generate.filetypes.par.ParFile(*args, **kwargs)[source]

Bases: BaseFile

Class for supporting Yanny par files

design_content(comments: str = None, header: list | dict = None, name: str = None, description: str = None, columns: list = None) None[source]

Design a new Yanny par section

Design a new Yanny par for the given datamodel. Specify Yanny comments, a header section, or a table definition. Each new table is added to the YAML structure. Use name and description to specify the name and description of the new table. comments can be a single string of comments, with newlines indicated by “\n”.

header can be a dictionary of key-value pairs, a list of tuples of header keywords, conforming to (keyword, value, comment), or list of dictionaries conforming to {“key”: key, “value”: value, “comment”: comment}.

The columns parameter defines the relevant table columns to add to the file. It can be a list of column names, a list of tuple values conforming to column (name, type, [description]), or a list of dictionaries with keys defined from the complete column yaml definition.

Allowed column types are any valid Yanny par types, input as strings, e.g. “int”, “float”, “char”. Array columns can be specified by including the array size in “[]”, e.g. “float[6]”. Enum types are defined by setting is_enum to True, and by providing a list of possible values via enum_values.

Parameters:
  • comments (str, optional) – Any comments to add to the file, by default None

  • header (Union[list, dict], optional) – Keywords to add to the header of the Yanny file, by default None

  • name (str, optional) – The name of the parameter table

  • description (str, optional) – A description of the parameter table

  • columns (list, optional) – A set of Yanny table column definitions

write_design(file: str, overwrite: bool = True) None[source]

Write out the designed file

Write out a designed Yanny par object to a file on disk. The create_from_cache method must have been run first.

Parameters:
  • file (str) – The designed filename

  • overwrite (bool, optional) – If True, overwrites any existing file, by default True

Raises:

AttributeError – when the designed object does not exist

cache_key = 'par'
suffix = 'PAR'
class datamodel.generate.filetypes.par.literal[source]

Bases: str

datamodel.generate.filetypes.par.literal_presenter(dumper, data)[source]
class datamodel.generate.filetypes.hdf5.HdfFile(cache: dict, datamodel=None, stub=None, filename: str = None, release: str = None, file_species: str = None, design: bool = None, use_cache_release: str = None, full_cache: bool = None)[source]

Bases: BaseFile

Class for supporting HDF5 files

design_content(name: str = '/', description: str = None, hdftype: str = 'group', attrs: list = None, ds_shape: tuple = None, ds_dtype: str = None)[source]

Design a new HDF5 section for the datamodel

Design a new HDF entry for the given datamodel. Specify h5py group or dataset definitions, with an optional list of attributes. Each new entry is added to the members entry in the YAML structure. Use name and description to specify the name and description of each new group or dataset. Use hdftype to specify a “group” or “dataset” entry. For datasets, use ds_shape and ds_dtype to specify the shape and dtype of the array dataset.

New HDF5 members are added to the datamodel in a flattened structure. To add a new group or dataset as a child of an existing group, specify the full path in name, e.g. /mygroup/mydataset.

attrs can be a list of tuples of header keywords, conforming to (key, value, comment, dtype), or list of dictionaries conforming to {“key”: key, “value”: value, “comment”: comment, “dtype”: dtype}.

Allowed attribute or dataset dtypes are any valid string representation of numpy dtypes. For example, “<i8”, “int32”, “S10”, etc.

Parameters:
  • name (str, optional) – the name of the HDF group or dataset, by default ‘/’

  • description (str, optional) – a description of the HDF group or dataset, by default None

  • hdftype (str, optional) – the type of HDF5 object, by default ‘group’

  • attrs (list, optional) – a list of HDF5 Attributes, by default None

  • ds_shape (tuple, optional) – the shape of an HDF5 array dataset, by default None

  • ds_dtype (str, optional) – the dtype of an HDF5 array dataset, by default None

Raises:

ValueError – when an invalid hdftype is specified

write_design(file: str, overwrite: bool = None) None[source]

Write out the designed file

Write out a designed HDF5 object to a file on disk. The create_from_cache method must have been run first.

Parameters:
  • file (str) – The designed filename

  • overwrite (bool, optional) – If True, overwrites any existing file, by default True

Raises:

AttributeError – when the designed object does not exist

aliases = ['HDF5']
cache_key = 'hdfs'
suffix = 'H5'

Changelog

class datamodel.generate.changelog.core.ChangeLog(the_list: list, **kwargs)[source]

Bases: list

Class that holds the change logs for all input files

Contains a list of all FileDiff objects. Mainly used as a container to iterate over many changelogs and generate a string report or dictionary object for each item in the list.

Parameters:

the_list (list) – A list of FileDiff objects

generate_report(split: bool = None, insert: bool = True) str | list[source]

Generate a string report of all changelogs

Iterates over all changelogs and builds a complete report containing all differences.

Parameters:
  • split (bool, optional) – If True, splits the string, by default None

  • insert (bool, optional) – If True, inserts a divider between changelogs, by default True

Returns:

str | list – A generated report of all changelogs

get_changes() dict[source]
class datamodel.generate.changelog.core.FileDiff(file1: str, file2: str, versions: list = None, diff_type: str = None)[source]

Bases: ABC, object

Class that holds the difference between two files

Creates an object that compares the difference between two files. Base class that is subclassed by FitsDiff and CatalogDiff.

Parameters:
  • file1 (str) – the filepath to compute the changes against

  • file2 (str) – the filepath to compute the changes from

  • versions (list, optional) – the named releases/versions corresponding to the two input files, by default None

  • diff_type (str, optional) – the object data type of which to compute the difference, by default None

abstract report()[source]

Print a report

datamodel.generate.changelog.core.compute_changelog(items: list, change: str = 'fits') ChangeLog[source]

Compute the changelogs between a list of datamodels

Given an input list of DataModel objects, computes the differences between them using the on-disk real file location. By default computes the FITS differences.

Parameters:
  • items (list) – A list of datamodels

  • change (str, optional) – The type of object, by default fits

Returns:

ChangeLog – A list of changelogs

datamodel.generate.changelog.core.compute_diff(oldfile: str, otherfile: str, change: str = 'fits', versions: list = None) FileDiff[source]

Produce a single changelog between two files

Produce a difference object for two files

Parameters:
  • oldfile (str) – The filepath to check changes against

  • otherfile (str) – The filepath to check changes from

  • change (str, optional) – The type of data input, by default ‘fits’

  • versions (list, optional) – The named releases/versions of the corresponding file inputs, by default None

Returns:

FileDiff – An instance containing the differences between the two files

Raises:

ValueError – when no valid input filepath is given
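
A sketch of computing the difference between two files; the filepaths and release names are illustrative.

>>> from datamodel.generate.changelog.core import compute_diff
>>> diff = compute_diff("spec-55049.fits", "spec-59500.fits", change="fits",
...                     versions=["DR16", "DR17"])
>>> print(diff.report())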

datamodel.generate.changelog.core.diff_selector(suffix: str = None) FileDiff[source]

Select the correct class given a file suffix

class datamodel.generate.changelog.yaml.YamlDiff(content: dict = None, file: str = None)[source]

Bases: ABC

Computes the difference between two releases in YAML cache

Computes the differences in HDU content between releases in a given YAML datamodel file, or cached dictionary.

Parameters:
  • content (dict, optional) – The yaml cache content for a given datamodel, by default None

  • file (str, optional) – A path to a yaml datamodel file, by default None

Raises:
  • ValueError – when no yaml filepath or cache content is provided

  • ValueError – when no releases can be identified from the yaml content

abstract _get_changes(version1: str, version2: str, simple: bool = None) dict[source]

Abstract method to be implemented by subclass, for generating changelog content

This method is used to construct a dictionary of changes between two releases for the given file YAML content. It should return a dictionary object, minimally of the form {version1: {“from”: version2, “key1”: value1, “key2”: value2, …}} where key1: value1, etc are the custom changes between the two releases. The input version1 is the new release, and version2 is the older release of which to compute the difference.

clean_empty(d: dict) dict[source]

clean up an empty dictionary

compute_changelog(version1: str = 'A', version2: str = 'B', simple: bool = False) dict[source]

Compute the changelog between two releases

Computes the changes between two releases in a given YAML cache. Compares the “hdus” entries in each release, and looks for differences in HDU extension number, added or removed HDU extensions, differences in primary header keyword number, and any added or removed primary header keywords.

Parameters:
  • version1 (str, optional) – The release to check differences against, by default ‘A’

  • version2 (str, optional) – The release to check differences from, by default ‘B’

  • simple (bool, optional) – If True, simplifies the changelog entries to only non-null values, by default False

Returns:

dict – a dictionary of found changes

Raises:

ValueError – when no HDULists are found in the YAML cache

generate_changelog(order: list = None, simple: bool = False) dict[source]

Generate a full changelog dictionary across all releases

Iterate over all releases and generate a complete changelog from one release to another. The release order to compute the changelog can be specified by passing in a desired list of releases to the order keyword. Set simple to True to produce a cleaner, simpler changelog, containing only non-null entries.

Parameters:
  • order (list, optional) – The order of releases to generate changelog from, by default None

  • simple (bool, optional) – If True, simplifies the changelog entries to only non-null values, by default False

Returns:

dict – A complete changelog dictionary over all releases
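
A sketch of generating a YAML-based changelog, using the YamlFits subclass documented below; the filepath and release names are illustrative.

>>> from datamodel.generate.changelog.filetypes.fits import YamlFits
>>> yd = YamlFits(file="path/to/datamodel/yaml/sdR.yaml")
>>> yd.has_changes(version1="DR17", version2="DR16")
>>> changes = yd.generate_changelog(order=["WORK", "DR17", "DR16"], simple=True)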

has_changes(version1: str = 'A', version2: str = 'B') bool[source]

Check if there are any changes between two releases

Computes the changelog between two releases and returns a flag if changes are detected. Compares the differences of release “version1” from release “version2”.

Parameters:
  • version1 (str, optional) – The release to check differences against, by default ‘A’

  • version2 (str, optional) – The release to check differences from, by default ‘B’

Returns:

bool – True if any changes detected

cache_key = None
suffix = None
datamodel.generate.changelog.yaml.yamldiff_selector(suffix: str = None) YamlDiff[source]

Select the correct class given a file suffix

class datamodel.generate.changelog.filetypes.catalog.CatalogDiff(file1: str | Table, file2: str | Table, full: bool = None, versions: list = None)[source]

Bases: FileDiff

Compute the difference between two catalog files

Computes the differences in catalog content between two input ascii catalog files, e.g. CSV. Looks for changes in row number, column number, and any added or removed column names.

Parameters:
  • file1 (Union[str, Table]) – the filepath or Table to compute the changes against

  • file2 (Union[str, Table]) – the filepath or Table to compute the changes from

  • full (bool, optional) – If True, compute the full Astropy Ascii Table differences, by default None

  • versions (list, optional) – the named releases/versions corresponding to the two input files, by default None

get_astropy_diff() str[source]

Returns the full Astropy diff using report_diff_values

Returns:

str – the complete difference between the two catalog files

report(split: bool = None, full: bool = None) str[source]

Print the catalog difference report

Returns the catalog differences as a string blob. Can optionally return the report as a list of string lines.

Parameters:
  • split (bool, optional) – if True, splits the report into a list of string lines, by default None

  • full (bool, optional) – if True, appends the full Astropy catalog diff report

Returns:

str – The difference report as a string blob

suffix = 'CATALOG'
class datamodel.generate.changelog.filetypes.fits.FitsDiff(file1: str | HDUList, file2: str | HDUList, full: bool = None, versions: list = None)[source]

Bases: FileDiff

Compute the difference between two FITS files

Computes the differences in HDUList content between two input FITS files. Looks for changes in HDU extension number, any added or removed HDU extensions, as well as any changes in the primary header keywords.

Parameters:
  • file1 (Union[str, fits.HDUList]) – the filepath or HDUList to compute the changes against

  • file2 (Union[str, fits.HDUList]) – the filepath or HDUList to compute the changes from

  • full (bool, optional) – If True, compute the full Astropy FITS HDUList differences, by default None

  • versions (list, optional) – the named releases/versions corresponding to the two input files, by default None

get_astropy_diff() FITSDiff[source]

Returns the full Astropy FITSDiff

Returns:

fits.FITSDiff – the complete difference between the two FITS files

report(split: bool = None) str[source]

Print the FITS difference report

Returns the FITS differences as a string blob. Can optionally return the report as a list of string lines.

Parameters:

split (bool, optional) – if True, splits the report into a list of string lines, by default None

Returns:

str – The difference report as a string blob

suffix = 'FITS'
class datamodel.generate.changelog.filetypes.fits.YamlFits(content: dict = None, file: str = None)[source]

Bases: YamlDiff

Class for supporting YAML changelog generation for FITS files

cache_key = 'hdus'
suffix = 'FITS'
class datamodel.generate.changelog.filetypes.par.ParDiff(file1: str | None, file2: str | None, versions: list = None)[source]

Bases: FileDiff

Class for computing differences between Yanny par files

Computes the differences in table content between two Yanny par files. Looks for changes in header keys, table number, and any added or removed keys, tables, or table columns.

Parameters:
  • file1 (Union[str, Table]) – the filepath or Table to compute the changes against

  • file2 (Union[str, Table]) – the filepath or Table to compute the changes from

  • versions (list, optional) – the named releases/versions corresponding to the two input files, by default None

report(split: bool = None, full: bool = None) str[source]

Print the yanny par difference report

suffix = 'PAR'
class datamodel.generate.changelog.filetypes.par.YamlPar(content: dict = None, file: str = None)[source]

Bases: YamlDiff

Class for supporting YAML changelog generation for Yanny par files

cache_key = 'par'
suffix = 'PAR'
class datamodel.generate.changelog.filetypes.hdf5.YamlHDF5(content: dict = None, file: str = None)[source]

Bases: YamlDiff

Class for supporting YAML changelog generation for HDF5 files

cache_key = 'hdfs'
suffix = 'H5'

Git

class datamodel.gitio.gitio.Git(verbose=None)[source]

Bases: object

Class to run the git commands

Wrapper class to the GitPython package.

add(path: str = None)[source]

Add a file to the git repo

Performs a “git add” on the datamodel repo

Parameters:

path (str, optional) – the full path of the file to add, by default None

Raises:

RuntimeError – when the git command fails

check_if_untracked(path: str = None) bool[source]

Checks if a file is untracked in the git repo

Parameters:

path (str, optional) – the full path of the file to add, by default None

Returns:

bool – if the file is untracked

checkout(branch: str = None)[source]

Checks out a branch from the git repo

Performs a “git checkout” on the datamodel repo. If the branch does not exist, it will be created.

Parameters:

branch (str, optional) – the name of the branch to checkout, by default None

Raises:

RuntimeError – when the git command fails

clone(product: str = None, branch: str = None)[source]

Clones the git repo

Performs a “git clone” of the datamodel repo.

Parameters:
  • product (str, optional) – the Github repo URL, by default None

  • branch (str, optional) – the name or directory path of the clone, by default None

Raises:

RuntimeError – when the git command fails

commit(message: str = None)[source]

Commit a file to the git repo

Performs a “git commit” on the datamodel repo

Parameters:

message (str, optional) – a git commit message, by default None

Raises:

RuntimeError – when the git command fails

create_new_branch(branch: str = None)[source]

Create a new branch

Create a new branch. If no branch name is provided, it will create a branch name based on the email head found in the git user config. If none found, creates a random branch name using a UUID.

Parameters:

branch (str, optional) – the name of the branch to create, by default None

fetch()[source]

Fetch from Github remote origin

Performs a “git fetch” on the datamodel repo

Raises:

RuntimeError – when the git command fails

get_path_location(path: str = None) str[source]

Gets the path location

Gets the location of the filepath relative to the git repo directory.

Parameters:

path (str, optional) – the full path of the file to add, by default None

Returns:

str – the relative location of the path

list_branches(pprint: bool = None) list[source]

List all local branches for the repo

list_remotes(pprint: bool = None) list[source]

List all remotes for the repo

pull()[source]

Pull from Github remote origin

Performs a “git pull” on the datamodel repo

Raises:

RuntimeError – when the git command fails

push()[source]

Push to Github remote origin

Performs a “git push” on the datamodel repo

Raises:

RuntimeError – when the git command fails

rm(path: str = None)[source]

Remove a file from the git repo

Performs a “git rm” on the datamodel repo

Parameters:

path (str, optional) – the full path of the file to remove, by default None

Raises:

RuntimeError – when the git command fails

set_repo()[source]

Set the repo if needed

status() str[source]

Return the git status of the repo

property branch_exists_on_remote: bool

if the current active branch exists at the remote

property current_branch: str

the current active branch

property directory: str

the directory of the git repo

property is_dirty: bool

if the repo is dirty

property is_main_branch: bool

if the current active branch is the main branch

property origin

the git remote origin

Io

datamodel.io.loaders.dm(loader, node)[source]
datamodel.io.loaders.get_yaml_files(get: str = None) str | list[source]

Get a list of yaml files

Return a list of YAML files in the datamodel directory.

Parameters:

get (str, optional) – type of yaml file to get, can be “releases” or “products”, by default None

Returns:

Union[str, list] – The yaml file path or list of yaml file paths
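
A short sketch of the documented "releases" and "products" options.

>>> from datamodel.io.loaders import get_yaml_files
>>> products = get_yaml_files("products")   # yaml files for data products
>>> releases = get_yaml_files("releases")   # yaml file(s) describing releases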

datamodel.io.loaders.include(loader, node)[source]
datamodel.io.loaders.read_yaml(ymlfile: str | Path) dict[source]

Opens and reads a YAML file

Parameters:

ymlfile (Union[str, pathlib.Path]) – a file or pathlib.Path object

Returns:

dict – the YAML content

datamodel.io.move.construct_new_path(file: str | Path = None, old_path: str | Path = None, new_path: str | Path = None, release: str = None, kwargs: dict = None) Path[source]

Construct a new filepath

Constructs a new filepath, either from an abstract path location and a set of keyword arguments, or from an existing (old) filepath and abstract location.

Parameters:
  • file (Union[str, pathlib.Path], optional) – the existing full filepath, by default None

  • old_path (Union[str, pathlib.Path], optional) – the existing species abstract path, by default None

  • new_path (Union[str, pathlib.Path], optional) – the new species abstract path, by default None

  • release (str, optional) – the SDSS release, by default None

  • kwargs (dict, optional) – a set of path keyword arguments, by default None

Returns:

pathlib.Path – a full filepath

datamodel.io.move.dm_move(old: str, new: str, parent: bool = None, symlink: bool = True)[source]

Move a file or directory to a new location

Moves a file or directory from an old path to a new path, optionally creating a symlink between the old and new locations.

Parameters:
  • old (str) – the existing (old) filepath

  • new (str) – the new filepath

  • parent (bool, optional) – flag to move the entire parent directory, by default None

  • symlink (bool, optional) – flag to create a symlink from the new location to the old one, by default True

datamodel.io.move.dm_move_species(abstract_path: str, new_path: str, release: str, parent: bool = None, symlink: bool = True, test: bool = None)[source]

Moves all files from a species to a new location

Moves all files from a given file species. Finds all real files that match an existing file species abstract path, and moves them to a new location. The location is determined by the original filename, a new abstract path location, and a given release.

Parameters:
  • abstract_path (str) – the existing species abstract path

  • new_path (str) – the new species abstract path

  • release (str) – the SDSS release

  • parent (bool, optional) – flag to move the entire parent directory, by default None

  • symlink (bool, optional) – flag to create a symlink from new location to old one, by default True

  • test (bool, optional) – flag to test the move, by default None
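
A sketch of moving a file species, reusing the MaNGA path template from the parse section; the new abstract path and release are illustrative, and test exercises the documented test flag.

>>> from datamodel.io.move import dm_move_species
>>> dm_move_species(
...     "MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits.gz",
...     "MANGA_SPECTRO_REDUX/{drpver}/cubes/{plate}/manga-{plate}-{ifu}-{wave}CUBE.fits.gz",
...     release="DR17", symlink=True, test=True)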

datamodel.io.move.find_files_from_species(path: str) Iterator[source]

Find all files species from an abstract path

Finds all files matching the species pattern in a given abstract path.

Parameters:

path (str) – an abstract file species path

Returns:

Iterator – Iterator over all matching files found

Models

class datamodel.models.base.BaseList[source]

Bases: BaseModel

Base pydantic class for lists of models

list_names()[source]

Create a simplified list of name attributes

sort(field: str, key: Callable = None, **kwargs) None[source]

Sort the list of models by a pydantic field name

Performs an in-place sort of the Pydantic Models using Python’s built-in sorted() method. Sets the newly sorted list to the root attribute, to preserve the original BaseList object instance. By default, the input sort key to the sorted function is the field attribute on the model.

Parameters:
  • field (str) – The Pydantic field name

  • key (Callable, optional) – a function to be passed into the sorted() function, by default None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.base.CoreModel[source]

Bases: BaseModel

Custom BaseModel

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

datamodel.models.base.add_repr(schema: Dict[str, Any], model: Type[BaseModel]) None[source]

Adds custom information into the schema

class datamodel.models.releases.Release(*, name: str, description: str, public: bool = False, release_date: str | date = 'unreleased')[source]

Bases: CoreModel

Pydantic model presenting an SDSS release

Parameters:
  • name (str) – The name of the release

  • description (str) – A description of the release

  • public (bool) – Whether the release is public or not

  • release_date (datetime.date) – The date of the release

Raises:

ValueError – when the release name does not start with a valid SDSS release code
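
A sketch of constructing a Release model; the field values are illustrative.

>>> from datamodel.models.releases import Release
>>> rel = Release(name="DR99", description="an illustrative data release",
...               public=False, release_date="unreleased")
>>> rel.name
'DR99'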

classmethod name_check(value)[source]
description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
public: bool
release_date: str | date
class datamodel.models.releases.Releases(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[Release]]

Pydantic model representing a list of Releases

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.yaml.Access(*, in_sdss_access: bool, path_name: str | None = None, path_template: str | None = None, path_kwargs: List[str] | None = None, access_string: str | None = None)[source]

Bases: CoreModel

Pydantic model representing the YAML releases access section

Parameters:
  • in_sdss_access (bool) – Whether or not the data product has an sdss_access entry

  • path_name (str) – The path name in sdss_access for the data product

  • path_template (str) – The path template in sdss_access for the data product

  • path_kwargs (List[str]) – A list of path keywords in the path_template for the data product

  • access_string (str) – The full sdss_access entry, “path_name=path_template”

classmethod check_path_kwargs(value: str, info: ValidationInfo)[source]
classmethod check_path_nulls(value: str, info: ValidationInfo)[source]
access_string: str | None
in_sdss_access: bool
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

path_kwargs: List[str] | None
path_name: str | None
path_template: str | None
class datamodel.models.yaml.ChangeBase(*, from_: str, note: str | None = None)[source]

Bases: CoreModel

Base Pydantic model representing a YAML changelog release section

from_: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

note: str | None
class datamodel.models.yaml.ChangeLog(*, description: str, releases: Dict[str, ChangeRelease] = None)[source]

Bases: CoreModel

Pydantic model representing the YAML changelog section

Parameters:
  • description (str) – A description of the changelog

  • releases (Dict[str, ChangeRelease]) – A dictionary of the file changes between the given release and previous one

dict(**kwargs)[source]

override dict method to exclude none fields by default

Need to override this method as well when serializing YamlModel to json, because nested models are already converted to dict when json.dumps is called. See https://github.com/samuelcolvin/pydantic/issues/1778

model_dump_json(**kwargs)[source]

override json method to exclude none fields by default

description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

releases: Dict[str, ChangeRelease]
class datamodel.models.yaml.ChangeRelease(*, from_: str, note: str | None = None, delta_nhdus: int | None = None, added_hdus: List[str] | None = None, removed_hdus: List[str] | None = None, primary_delta_nkeys: int | None = None, added_primary_header_kwargs: List[str] | None = None, removed_primary_header_kwargs: List[str] | None = None, delta_nkeys: int | None = None, addead_header_keys: List[str] | None = None, removed_header_keys: List[str] | None = None, delta_ntables: int | None = None, addead_tables: List[str] | None = None, removed_tables: List[str] | None = None, tables: Dict[str, ChangeTable] | None = None, new_libver: tuple | None = None, delta_nattrs: int | None = None, addead_attrs: List[str] | None = None, removed_attrs: List[str] | None = None, delta_nmembers: int | None = None, addead_members: List[str] | None = None, removed_members: List[str] | None = None, members: Dict[str, ChangeMember] | None = None)[source]

Bases: ChangeHdf, ChangePar, ChangeFits, ChangeBase

Pydantic model representing a YAML changelog release section

Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.

Parameters:
  • from (str) – The release the changelog is computed from

  • delta_nhdus (int) – The difference in number of HDUs

  • added_hdus (List[str]) – A list of any added HDUs

  • removed_hdus (List[str]) – A list of any removed HDUs

  • primary_delta_nkeys (int) – The difference in primary header keywords

  • added_primary_header_kwargs (List[str]) – A list of any added primary header keywords

  • removed_primary_header_kwargs (List[str]) – A list of any removed primary header keywords

  • delta_nkeys (int) – The difference in number of Yanny header keys

  • added_header_keys (List[str]) – A list of any added Yanny header keywords

  • removed_header_keys (List[str]) – A list of any removed Yanny header keywords

  • delta_tables (int) – The difference in number of Yanny tables

  • added_tables (List[str]) – A list of any added Yanny tables

  • removed_tables (List[str]) – A list of any removed Yanny tables

  • tables (Dict[str, ChangeTable]) – A dictionary of table column and row changes

  • new_libver (tuple) – The difference in HDF5 library version

  • delta_nattrs (int) – The difference in the number of HDF5 Attributes

  • added_attrs (List[str]) – A list of any added HDF5 Attributes

  • removed_attrs (List[str]) – A list of any removed HDF5 Attributes

  • delta_nmembers (int) – The difference in number members in HDF5 file

  • added_members (List[str]) – A list of any added HDF5 groups or datasets

  • removed_members (List[str]) – A list of any removed HDF5 groups or datasets

  • members (Dict[str, ChangeMember]) – A dictionary of HDF5 group/dataset member changes

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.yaml.GeneralSection(*, name: str, short: str, description: str, environments: List[str] = None, surveys: List[str | Survey] = None, datatype: str | None, filesize: str | None, releases: List[str | Release] = None, naming_convention: str, generated_by: str, design: bool = None, vac: bool = None, recommended_science_product: bool = None, data_level: DataLevel = None)[source]

Bases: CoreModel

Pydantic model representing the YAML general section

Parameters:
  • name (str) – The file species name of the data product (or sdss_access path_name)

  • short (str) – A one sentence summary of the data product

  • description (str) – A longer description of the data product

  • environments (List[str]) – A list of environment variables associated with the data product

  • datatype (str) – The type of data product, e.g. FITS

  • filesize (str) – An estimated size of the data product

  • releases (List[str]) – A list of SDSS releases the data product is in

  • naming_convention (str) – A description of the naming convention

  • generated_by (str) – An identifiable piece of the code that generates the data product

  • design (bool) – If True, the datamodel is in the design phase, before any file exists yet

  • vac (bool) – True if the datamodel is a VAC

  • recommended_science_product (bool) – True if the product is recommended for science use

  • data_level (str) – The product level or ranking, as a dotted numeral x.y.z

Raises:

ValueError – when any of the releases are not a valid SDSS Release

classmethod no_design(value: bool)[source]

Validator to check if the design flag is set to True

classmethod valid_data_level(value: DataLevel)[source]

data_level: DataLevel
datatype: str | None
description: str
design: bool
environments: List[str]
filesize: str | None
generated_by: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
naming_convention: str
recommended_science_product: bool
releases: List[AnnoRelease]
short: str
surveys: List[AnnoSurvey]
vac: bool
class datamodel.models.yaml.ProductModel(*, general: GeneralSection, changelog: ChangeLog, releases: Dict[str, ReleaseModel], notes: str = None, regrets: str = 'I have no regrets!')[source]

Bases: YamlModel

Pydantic model representing a data product JSON file

Parameters:
  • general (GeneralSection) – The general metadata section of the datamodel

  • changelog (ChangeLog) – An automated log of data product changes across releases

  • releases (Dict[str, ReleaseModel]) – A dictionary of information specific to that release

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
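
As a minimal sketch (the filename here is hypothetical), a validated JSON datamodel can be parsed directly into a ProductModel with pydantic v2:

>>> from datamodel.models.yaml import ProductModel
>>> with open('sdR.json') as f:            # hypothetical path to a validated JSON datamodel
...     model = ProductModel.model_validate_json(f.read())
>>> model.general.name
'sdR'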

class datamodel.models.yaml.ReleaseModel(*, template: str, example: str | None, location: str, environment: str, survey: str = None, access: Access, hdus: Dict[str, HDU] | None = None, par: ParModel | None = None, hdfs: HdfModel | None = None)[source]

Bases: CoreModel

Pydantic model representing an item in the YAML releases section

Contains any information on the data product that is specific to a given release, or that changes across releases.

Parameters:
  • template (str) – The full template representation of the path to the data product

  • example (str) – A real example path of the data product

  • location (str) – The symbolic location of the data product

  • environment (str) – The SAS environment variable the product lives under

  • access (Access) – Information on any relevant sdss_access entry

  • hdus (Dict[str, HDU]) – A dictionary of HDU content for the product for the given release

convert_to_hdulist() HDUList[source]

Convert the hdus to a fits.HDUList

access: Access
environment: str
example: str | None
hdfs: HdfModel | None
hdus: Dict[str, HDU] | None
location: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

par: ParModel | None
survey: str
template: str
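
A minimal sketch of converting a release's HDU definitions into an astropy HDUList, assuming 'sdR' is a hypothetical FITS file species and that Product.get_release (see the Products section below) returns this ReleaseModel content:

>>> from datamodel.products.product import Product
>>> prod = Product('sdR', load=True)       # hypothetical file species
>>> rel = prod.get_release('DR17')
>>> hdulist = rel.convert_to_hdulist()     # fits.HDUList built from the hdus section
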
class datamodel.models.yaml.YamlModel(*, general: GeneralSection, changelog: ChangeLog, releases: Dict[str, ReleaseModel], notes: str = None, regrets: str = 'I have no regrets!')[source]

Bases: CoreModel

Pydantic model representing a YAML file

Parameters:
  • general (GeneralSection) – The general metadata section of the datamodel

  • changelog (ChangeLog) – An automated log of data product changes across releases

  • releases (Dict[str, ReleaseModel]) – A dictionary of information specific to that release

  • notes (str) – A string or multi-line text blob of additional information

  • regrets (str) – A string or multi-line text blob of any regrets over the datamodel

changelog: ChangeLog
general: GeneralSection
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

notes: str
regrets: str
releases: Dict[str, ReleaseModel]
datamodel.models.yaml.check_gen_release(value: str) str[source]

Validator to check release against list of releases

datamodel.models.yaml.check_survey(value: str) str[source]

Validator to check survey against list of surveys

datamodel.models.yaml.orjson_dumps(v, *, default)[source]
class datamodel.models.surveys.Phase(*, name: str, id: int, start: int | None = None, end: int | None = None, active: bool = False)[source]

Bases: CoreModel

Pydantic model representing an SDSS phase

Parameters:
  • name (str) – The name of the phase

  • id (int) – The id of the phase

  • start (int) – The year the phase started

  • end (int) – The year the phase ended

  • active (bool) – Whether the phase is currently active

active: bool
end: int | None
id: int
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
start: int | None
class datamodel.models.surveys.Phases(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[Phase]]

Pydantic model representing a list of Phases

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.surveys.Survey(*, name: str, long: str = None, description: str, phase: int | Phase = None, id: str = None, aliases: list = [])[source]

Bases: CoreModel

Pydantic model representing an SDSS survey

Parameters:
  • name (str) – The short name of the survey

  • long (str) – The full name of the survey

  • description (str) – A description of the survey

  • phase (Phase) – The main phase the survey was in

  • id (str) – An internal reference id for the survey

Raises:

ValueError – when the survey phase is not a valid SDSS Phase

classmethod get_phase(v)[source]

check the phase is a valid SDSS phase

aliases: list
description: str
id: str
long: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
phase: int | Phase
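
A minimal sketch of constructing a Survey directly; whether the get_phase validator coerces the integer phase into a Phase model is an assumption here:

>>> from datamodel.models.surveys import Survey
>>> s = Survey(name='manga', long='MaNGA', description='Mapping Nearby Galaxies at APO', phase=4)
>>> s.phase        # validated against the list of SDSS phases
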
class datamodel.models.surveys.Surveys(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[Survey]]

Pydantic model representing a list of Surveys

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.versions.Tag(*, version: Version, tag: list[str] | str = None, release: str | Release | List[Release], survey: str | Survey)[source]

Bases: CoreModel

Pydantic model representing an SDSS software tag

Parameters:
  • version (Version) – The version key

  • tag (str) – The version tag number or name

  • release (Release) – The SDSS release the tag is associated with

  • survey (Survey) – The SDSS survey the tag is associated with

Raises:
  • ValueError – when the tag release is not a valid SDSS Release

  • ValueError – when the tag survey is not a valid SDSS Survey

classmethod get_release(v)[source]

check the release is a valid SDSS release

classmethod get_survey(v)[source]

check the survey is a valid SDSS survey

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property name

A name for the tag

release: str | Release | List[Release]
survey: str | Survey
tag: list[str] | str
version: Version
class datamodel.models.versions.Tags(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[Tag]]

Pydantic model representing a list of Tags

group_by(order_by: str = 'release') dict[source]

Group tags by SDSS release or survey

Convert the list of tags to a series of dictionaries, ordered by the SDSS release or survey, with key:value pairs of version_name:tag. Default is to group by release, then survey. With order_by set to survey, tags are grouped by survey, then release. For example, “{‘DR17’: {‘manga’: {‘drpver’: ‘v3_1_1’, ‘dapver’: ‘3.1.0’}}}”.

Parameters:

order_by (str, optional) – the field to group tags by, either ‘release’ or ‘survey’, by default ‘release’

Returns:

dict – nested dictionary of tags

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
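
A minimal sketch of building and grouping tags, assuming 'DR17' and 'manga' pass the release and survey validators; the nesting follows the docstring example above:

>>> from datamodel.models.versions import Version, Tag, Tags
>>> ver = Version(name='drpver', description='MaNGA DRP version')
>>> tags = Tags([Tag(version=ver, tag='v3_1_1', release='DR17', survey='manga')])
>>> tags.group_by(order_by='release')      # roughly {'DR17': {'manga': {'drpver': 'v3_1_1'}}}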

class datamodel.models.versions.Version(*, name: str, description: str)[source]

Bases: CoreModel

Pydantic model representing an SDSS version

Parameters:
  • name (str) – The name of the software version key

  • description (str) – A description of the software key

description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
class datamodel.models.vacs.VAC(*, name: str)[source]

Bases: BaseModel

Pydantic model representing an SDSS VAC

Parameters:

name (str) – The environment variable label name of the VAC

Raises:

ValueError – when the release name does not start with a valid SDSS release code

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
class datamodel.models.vacs.VACS(root: RootModelRootType = PydanticUndefined)[source]

Bases: BaseList, RootModel[List[VAC]]

Pydantic model representing a list of VACs

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class datamodel.models.filetypes.fits.ChangeFits(*, delta_nhdus: int | None = None, added_hdus: List[str] | None = None, removed_hdus: List[str] | None = None, primary_delta_nkeys: int | None = None, added_primary_header_kwargs: List[str] | None = None, removed_primary_header_kwargs: List[str] | None = None)[source]

Bases: CoreModel

Pydantic model representing the FITS hdu fields of the YAML changelog release section

Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.

Parameters:
  • delta_nhdus (int) – The difference in number of HDUs

  • added_hdus (List[str]) – A list of any added HDUs

  • removed_hdus (List[str]) – A list of any removed HDUs

  • primary_delta_nkeys (int) – The difference in primary header keywords

  • added_primary_header_kwargs (List[str]) – A list of any added primary header keywords

  • removed_primary_header_kwargs (List[str]) – A list of any removed primary header keywords

added_hdus: List[str] | None
added_primary_header_kwargs: List[str] | None
delta_nhdus: int | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

primary_delta_nkeys: int | None
removed_hdus: List[str] | None
removed_primary_header_kwargs: List[str] | None
class datamodel.models.filetypes.fits.Column(*, name: str, description: str, type: str, unit: str = '')[source]

Bases: CoreModel

Pydantic model representing a YAML column section

Represents a FITS binary table column

Parameters:
  • name (str) – The name of the table column

  • description (str) – A description of the table column

  • type (str) – The data type of the table column

  • unit (str) – The unit of the table column

to_fitscolumn() Column[source]

Convert the column to a fits.Column

Converts the column entry in the yaml file to an Astropy fits.Column object. Performs a mapping between type and format, using the reverse of datamodel.generate.stub.Stub._format_type.

Returns:

fits.Column – a valid astropy fits.Column object

Raises:

TypeError – when the column type cannot be coerced into a valid fits.Column format

description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
type: str
unit: str
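
A minimal sketch of converting a column definition; the 'float32' type string is an assumption about what the type-to-format mapping accepts:

>>> from datamodel.models.filetypes.fits import Column
>>> col = Column(name='FLUX', description='calibrated flux', type='float32', unit='nanomaggies')
>>> fcol = col.to_fitscolumn()             # astropy fits.Column with the equivalent format code
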
class datamodel.models.filetypes.fits.HDU(*, name: str, is_image: bool, description: str, size: str = None, header: List[Header] = None, columns: Dict[str, Column] | None = None)[source]

Bases: CoreModel

Pydantic model representing a YAML hdu section

Represents a FITS HDU extension

Parameters:
  • name (str) – The name of the HDU extension

  • is_image (bool) – Whether the HDU is an image extension

  • description (str) – A description of the HDU extension

  • size (str) – An estimated size of the HDU extension

  • header (List[Header]) – A list of header values for the extension

  • columns (Dict[str, Column]) – A dictionary of any binary table columns for the extension

convert_columns() List[Column][source]

Convert the columns dict into a list of fits.Columns

convert_hdu() PrimaryHDU | ImageHDU | BinTableHDU[source]

Convert the HDU entry into a valid fits.HDU

convert_header() Header[source]

Convert the list of header keys into a fits.Header

columns: Dict[str, Column] | None
description: str
header: List[Header]
is_image: bool
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
size: str
class datamodel.models.filetypes.fits.Header(*, key: str, value: str | None = '', comment: str = '')[source]

Bases: CoreModel

Pydantic model representing a YAML header section

Represents an individual FITS Header Key

Parameters:
  • key (str) – The name of the header keyword

  • value (str) – The value of the header keyword

  • comment (str) – A comment for the header keyword, if any

to_tuple()[source]

Convert the header key to a tuple

comment: str
key: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

value: str | None
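
A minimal sketch combining the Header and HDU models above; the header values are hypothetical, and the tuple ordering and the exact fits HDU class returned are assumptions:

>>> from datamodel.models.filetypes.fits import HDU, Header
>>> hdr = Header(key='TELESCOP', value='SDSS 2.5-M', comment='Telescope')
>>> hdr.to_tuple()                         # e.g. ('TELESCOP', 'SDSS 2.5-M', 'Telescope')
>>> hdu = HDU(name='PRIMARY', is_image=True, description='Primary header', header=[hdr])
>>> fits_hdu = hdu.convert_hdu()           # astropy PrimaryHDU/ImageHDU with the header attached
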
class datamodel.models.filetypes.par.ChangePar(*, delta_nkeys: int | None = None, addead_header_keys: List[str] | None = None, removed_header_keys: List[str] | None = None, delta_ntables: int | None = None, addead_tables: List[str] | None = None, removed_tables: List[str] | None = None, tables: Dict[str, ChangeTable] | None = None)[source]

Bases: CoreModel

Pydantic model representing the Yanny par fields of the YAML changelog release section

Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.

Parameters:
  • delta_nkeys (int) – The difference in number of Yanny header keys

  • added_header_keys (List[str]) – A list of any added Yanny header keywords

  • removed_header_keys (List[str]) – A list of any removed Yanny header keywords

  • delta_ntables (int) – The difference in number of Yanny tables

  • added_tables (List[str]) – A list of any added Yanny tables

  • removed_tables (List[str]) – A list of any removed Yanny tables

  • tables (Dict[str, ChangeTable]) – A dictionary of table column and row changes

addead_header_keys: List[str] | None
addead_tables: List[str] | None
delta_nkeys: int | None
delta_ntables: int | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

removed_header_keys: List[str] | None
removed_tables: List[str] | None
tables: Dict[str, ChangeTable] | None
class datamodel.models.filetypes.par.ChangeTable(*, delta_nrows: int | None = None, added_cols: List[str] | None = None, removed_cols: List[str] | None = None)[source]

Bases: CoreModel

Pydantic model representing a YAML changelog Yanny table section

Represents a computed section of the changelog, for a specific Yanny table. For each similar Yanny table between releases, the changes in row number and structure columns are computed.

Parameters:
  • delta_nrows (int) – The difference in rows between Yanny tables

  • added_cols (List[str]) – A list of any added Yanny table columns

  • removed_cols (List[str]) – A list of any removed Yanny table columns

added_cols: List[str] | None
delta_nrows: int | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

removed_cols: List[str] | None
class datamodel.models.filetypes.par.ParColumn(*, name: str, type: str, description: str, unit: str, is_array: bool, is_enum: bool, enum_values: list = None, example: str | int | float | list)[source]

Bases: CoreModel

Pydantic model representing a YAML par column section

Represents a typedef column definition in a Yanny parameter file

Parameters:
  • name (str) – The name of the column

  • description (str) – A description of the column

  • type (str) – The data type of the column

  • unit (str) – The unit of the column, if any

  • is_array (bool) – If the column is an array type

  • is_enum (bool) – If the column is an enum type

  • enum_values (list) – A list of the allowed values, if the column is an enum type

  • example (str) – An example value for the column

parse_type()[source]

Parse the yanny YAML column type

description: str
enum_values: list
example: str | int | float | list
is_array: bool
is_enum: bool
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
type: str
unit: str
class datamodel.models.filetypes.par.ParModel(*, comments: str = None, header: List[Header] = None, tables: Dict[str, ParTable])[source]

Bases: CoreModel

Pydantic model representing a YAML par section

Represents a Yanny parameter file

Parameters:
  • comments (str) – Any header comments in the parameter file

  • header (list) – A list of header key-value pairs in the parameter file

  • tables (dict) – A dictionary of tables in the parameter file

convert_header() dict[source]

Convert the header into a dictionary

convert_par()[source]

Convert the YAML par section into a Yanny par object

comments: str
header: List[Header]
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

tables: Dict[str, ParTable]
class datamodel.models.filetypes.par.ParTable(*, name: str, description: str, n_rows: int, structure: List[ParColumn])[source]

Bases: CoreModel

Pydantic model representing a YAML par table section

Represents the structure of a single Yanny parameter table

Parameters:
  • name (str) – The name of the table

  • description (str) – A description of the table

  • n_rows (int) – The number of rows in the table

  • structure (list) – A list of column definitions for the table

convert_table()[source]

Create a dictionary to prepare a Yanny table

create_enum()[source]

Create a Yanny typedef enum string

create_typedef()[source]

Create a Yanny typedef struct string

description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_rows: int
name: str
structure: List[ParColumn]
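
A minimal sketch of describing a Yanny table and rendering its typedef; the column and table values are hypothetical and the exact typedef string is not shown:

>>> from datamodel.models.filetypes.par import ParColumn, ParTable
>>> col = ParColumn(name='mjd', type='int', description='MJD of observation', unit='day',
...                 is_array=False, is_enum=False, example=59000)
>>> table = ParTable(name='EXPOSURE', description='exposure metadata', n_rows=1, structure=[col])
>>> print(table.create_typedef())          # Yanny "typedef struct { ... } EXPOSURE;" block
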
class datamodel.models.filetypes.hdf5.ChangeHdf(*, new_libver: tuple | None = None, delta_nattrs: int | None = None, addead_attrs: List[str] | None = None, removed_attrs: List[str] | None = None, delta_nmembers: int | None = None, addead_members: List[str] | None = None, removed_members: List[str] | None = None, members: Dict[str, ChangeMember] | None = None)[source]

Bases: CoreModel

Pydantic model representing the HDF5 fields of the YAML changelog release section

Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.

Parameters:
  • new_libver (tuple) – The new HDF5 library version, if it changed between releases

  • delta_nattrs (int) – The difference in the number of HDF5 Attributes

  • added_attrs (List[str]) – A list of any added HDF5 Attributes

  • removed_attrs (List[str]) – A list of any removed HDF5 Attributes

  • delta_nmembers (int) – The difference in the number of members in the HDF5 file

  • added_members (List[str]) – A list of any added HDF5 groups or datasets

  • removed_members (List[str]) – A list of any removed HDF5 groups or datasets

  • members (Dict[str, ChangeMember]) – A dictionary of HDF5 group/dataset member changes

addead_attrs: List[str] | None
addead_members: List[str] | None
delta_nattrs: int | None
delta_nmembers: int | None
members: Dict[str, ChangeMember] | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

new_libver: tuple | None
removed_attrs: List[str] | None
removed_members: List[str] | None
class datamodel.models.filetypes.hdf5.ChangeMember(*, delta_nmembers: int | None = None, delta_nattrs: int | None = None, added_attrs: List[str] | None = None, removed_attrs: List[str] | None = None, delta_ndim: int | None = None, new_shape: tuple | None = None, delta_size: int | None = None)[source]

Bases: CoreModel

Pydantic model representing a YAML changelog HDF5 member section

Represents a computed section of the changelog, for a specific HDF member. For each similar HDF5 member between releases, the changes in member number, attributes, and dataset dimensions, size and shape are computed.

Parameters:
  • delta_nmembers (int) – The difference in member number between HDF5 groups

  • delta_nattrs (int) – The difference in attribute number between HDF5 members

  • added_attrs (List[str]) – A list of any added HDF5 Attributes

  • removed_attrs (List[str]) – A list of any removed HDF5 Attributes

  • delta_ndim (int) – The difference in dataset dimension number between HDF5 members

  • new_shape (tuple) – The new dataset shape, where it differs between HDF5 members

  • delta_size (int) – The difference in dataset size between HDF5 members

added_attrs: List[str] | None
delta_nattrs: int | None
delta_ndim: int | None
delta_nmembers: int | None
delta_size: int | None
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

new_shape: tuple | None
removed_attrs: List[str] | None
class datamodel.models.filetypes.hdf5.HdfAttr(*, key: str, value: str | int | float | bool = None, comment: str, dtype: str, is_empty: bool = None, shape: tuple | None = <factory>)[source]

Bases: CoreModel

Pydantic model representing a YAML hdfs attrs section

Represents the Attributes of an HDF5 file. Each group or dataset has a set of attributes (attrs), which contains metadata about the group or dataset.

Parameters:
  • key (str) – The name of the attribute

  • value (str) – The value of the attribute

  • comment (str) – A description of the attribute

  • dtype (str) – The numpy dtype of the attribute

  • is_empty (bool) – If the attribute is an HDF5 Empty attribute

  • shape (tuple) – The shape of the attribute, if any

check_value()[source]

comment: str
dtype: str
is_empty: bool
key: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

shape: tuple | None
value: str | int | float | bool
class datamodel.models.filetypes.hdf5.HdfBase(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>)[source]

Bases: CoreModel

Base Pydantic model representing a YAML hdfs section

Represents a member (a group or dataset) of an HDF5 file. Each group or dataset has a set of attributes (attrs), which contains metadata about the group or dataset.

Parameters:
  • name (str) – The name of the HDF5 group or dataset

  • parent (str) – The parent group of the group or dataset

  • object (HdfEnum) – Whether the entry is a group or a dataset

  • description (str) – A description of the group or dataset

  • pytables (bool) – Flag if the object is a PyTables object

  • attrs (list) – A list of HdfAttr attributes attached to the group or dataset

attrs: List[HdfAttr]
description: str
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
object: HdfEnum
parent: str
pytables: bool
class datamodel.models.filetypes.hdf5.HdfDataset(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>, shape: tuple, size: int, ndim: int, dtype: str, nbytes: int = None, is_virtual: bool = None, is_empty: bool = None)[source]

Bases: HdfBase

Pydantic model representing a YAML HDF Dataset section

Represents a Dataset of an HDF5 file.

Parameters:
  • shape (tuple) – The dimensional shape of the dataset

  • size (int) – The size, or number of elements, in the dataset

  • ndim (int) – The number of dimensions in the dataset

  • dtype (str) – The numpy dtype of the dataset

  • nbytes (int) – The number of bytes in the dataset

  • is_virtual (bool) – Whether the dataset is virtual

  • is_empty (bool) – Whether the dataset is an HDF5 Empty object

dtype: str
is_empty: bool
is_virtual: bool
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

nbytes: int
ndim: int
shape: tuple
size: int
class datamodel.models.filetypes.hdf5.HdfEnum(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Pydantic Enum for HDF5 Group or Dataset

dataset = 'dataset'
group = 'group'
class datamodel.models.filetypes.hdf5.HdfGroup(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>, n_members: int)[source]

Bases: HdfBase

Pydantic model representing a YAML HDF Group section

Represents a Group of an HDF5 file.

Parameters:

n_members (int) – The number of members in the group

model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_members: int
class datamodel.models.filetypes.hdf5.HdfModel(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>, n_members: int, libver: tuple = [], members: ~typing.Dict[str, ~datamodel.models.filetypes.hdf5.HdfGroup | ~datamodel.models.filetypes.hdf5.HdfDataset] = <factory>)[source]

Bases: HdfGroup

Pydantic model representing a YAML hdfs section

Represents a base HDF5 file, which is also an HDF5 Group. See the HdfGroup, HdfDataset, and HdfBase models for more information on the fields.

Parameters:
  • libver (tuple) – The HDF5 library version used to create the file

  • members (dict) – All groups and datasets in the HDF5 file

convert_hdf()[source]

Convert the hdfs to a h5py.File

libver: tuple
members: Dict[str, HdfGroup | HdfDataset]
model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

datamodel.models.validators.check_release(value: dict) str[source]

Validator for datamodel release keys

Validator for yaml “releases” fields. Checks the “releases” keys against valid SDSS releases, from the Releases Model.

Parameters:

value (dict) – the value of the field

Returns:

str – the value of the field

Raises:

ValueError – when the release key is not a valid release

datamodel.models.validators.replace_me(value: str) str[source]

Validator for datamodel text fields

Validator for yaml fields where the string values have the text “replace me” within it. This text indicates a template text that must be replaced.

Parameters:

value (str) – the value of the field

Returns:

str – the value of the field

Raises:

ValueError – when “replace me” is in the value text

Products

class datamodel.products.product.DataProducts[source]

Bases: FuzzyList

Class of a fuzzy list of SDSS data products

Creates a list of all available SDSS data products that have valid JSON datamodel files, i.e. those in the datamodel/products/json/ directory. All products are lazy-loaded at first for efficiency. Products are automatically loaded with content when the items in the list are accessed.

get_level(level: str) dict[source]

Get products by data level

Get all products for a given data level. The input data level can be any ranking, e.g. “1”, “1.2”, “1.2.3”, etc, and it will return all products that match that level.

Parameters:

level (str) – the data level to retrieve

Returns:

dict – the products for the requested data level
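
For example (the products returned depend on the local set of JSON datamodels):

>>> from datamodel.products import DataProducts
>>> dp = DataProducts()
>>> dp.get_level('1.2')        # all products whose data_level starts with 1.2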

group_by(field: str) dict[source]

Group the products by an attribute

Group all products by either a product attribute, e.g. “releases”, or a field in the underlying JSON model, e.g. “_model.general.environments”. A dotted attribute string is resolved as a set of nested attributes. Returns a dictionary of products grouped by the field, or fields, if the requested field is a list.

Parameters:

field (str) – The name of the attribute or field

Returns:

dict – A dictionary of products grouped by desired field

Example

>>> from datamodel.products import DataProducts
>>> dp = DataProducts()
>>> gg = dp.group_by('releases')
>>> gg
    {"DR15": ...,
     "DR16": ....}
list_products() list[source]

List all data products

load_all()[source]

Load all data products

static mapper(item) str[source]

Override the fuzzy mapper to match on product’s name

class datamodel.products.product.Product(name: str, load: bool = False)[source]

Bases: object

Class for an SDSS data product

Entry point for individual SDSS data products. This class reads in the content from the validated JSON datamodel file, handling deserialization via the pydantic ProductModel. By default, products are lazy-loaded, i.e. they will not load the underlying JSON content. Pass load=True or use load() to manually load the product’s datamodel.

Parameters:
  • name (str) – The file species name of the datamodel

  • load (bool, optional) – If True, loads the model’s JSON content, by default False
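
A minimal sketch of lazy-loading, assuming 'sdR' is a hypothetical file species with a validated JSON datamodel:

>>> from datamodel.products.product import Product
>>> p = Product('sdR')                     # lazy; JSON content not read yet
>>> p.load()                               # explicitly load the datamodel content
>>> p.get_content()['general']['name']
'sdR'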

classmethod from_file(value: str | Path, load: bool = None) PType[source]

Class method to load a data Product from a JSON datamodel filepath

Parameters:
  • value (Union[str, pathlib.Path]) – The full path to a JSON datamodel file

  • load (bool, optional) – If True, loads the model content on instantiation, by default None

Returns:

PType – A new instance of a Product

get_access(release: str = None) dict[source]

Get the sdss-access information for a given release

Get the “access” entry from the datamodel for a given release. If no release is given, returns the access information for all releases for the product. The access information returned is also the same content as in the products/access/[fileSpecies].access file.

Parameters:

release (str, optional) – The data release to use, by default None

Returns:

dict – the access information from the datamodel

Raises:
  • AttributeError – when “releases” is not set and product is not loaded

  • ValueError – when the specified release is not a valid one for the product

get_content(*args, **kwargs) dict[source]

Returns the entire cached JSON datamodel content

Returns:

dict – The JSON datamodel content

get_example(release: str = 'WORK', expand: bool = True) str[source]

Get the example file from the datamodel

Returns the resolved example filepath for a specified release. By default the SAS environment variable is expanded, but the path can optionally be returned unresolved.

Parameters:
  • release (str, optional) – The data release to use, by default “WORK”

  • expand (bool, optional) – If True, expands the SAS environment variable, by default True

Returns:

str – The generated filepath

Raises:
  • AttributeError – when “releases” is not set and product is not loaded

  • ValueError – when the specified release is not a valid one for the product

get_location(release: str = 'WORK', symbolic: bool = False, expand: bool = True, **kwargs) str[source]

Get a file location from the datamodel

Returns a resolved filepath for a specified release. The symbolic location can be given keyword arguments to resolve it to a real filepath. By default the SAS environment variable is expanded, but the path can optionally be returned unresolved.

Parameters:
  • name (str) – The type of path to extract. Either “example” or “location”.

  • release (str, optional) – The data release to use, by default “WORK”

  • expand (bool, optional) – If True, expands the SAS environment variable, by default True

  • symbolic (bool, optional) – If True, returns only the symbolic path, by default False

  • kwargs (str) – Any set of keyword arguments needed to resolve the symbolic path

Returns:

str – The generated filepath

Raises:
  • AttributeError – when “releases” is not set and product is not loaded

  • ValueError – when the specified release is not a valid one for the product
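
A minimal sketch, assuming 'sdR' is a hypothetical file species whose path template takes an mjd keyword:

>>> from datamodel.products.product import Product
>>> p = Product('sdR', load=True)
>>> p.get_location(release='DR17', symbolic=True)    # the unresolved symbolic path
>>> p.get_location(release='DR17', mjd=59000)        # resolved path; keywords are product-specific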

get_release(value: str) Release[source]

Get the JSON content for the given product for a given SDSS release

Returns the Pydantic yaml.Release model for a given SDSS release. All JSON keys are accessible as instance attributes. The model can be dumped into a dictionary with the model_dump() method.

Parameters:

value (str) – a valid SDSS release

Returns:

Release – The JSON ReleaseModel content for the given SDSS release

Raises:

ValueError – when the input release is an invalid SDSS release

get_schema(*args, **kwargs) dict[source]

Returns the Pydantic schema datamodel definition

Returns:

dict – The datamodel schema

load() None[source]

Loads the DataModel content into the Product

loader() PType[source]

Contextmanager to temporarily load the product
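
A minimal sketch of temporarily loading a product, again assuming a hypothetical 'sdR' file species:

>>> from datamodel.products.product import Product
>>> p = Product('sdR')
>>> with p.loader():
...     path = p.get_example('DR17')      # the product is only loaded inside the with block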

unload() None[source]

Unloads the DataModel content from the Product

class datamodel.products.product.ReleaseList(the_items: list | dict, use_fuzzy: Callable = None, dottable: bool = True)[source]

Bases: FuzzyList

Class for a fuzzy list of Releases

static mapper(item)[source]

Mapper between list/dict item and rapidfuzz choices

Static method used to map a list’s items or dict’s keys to a string representation used by rapidfuzz for fuzzy matching. By default returns an explicit string cast of the item. To see the output, view the choices property. Can be overridden to customize what is input into rapidfuzz.

Parameters:

item (Union[str, object]) – Any iterable item, i.e. a list item or dictionary key

Returns:

str – The string representation to use in the choices supplied to rapidfuzz

class datamodel.products.product.SDSSDataModel[source]

Bases: object

Class for the SDSS DataModel

High-level entry point into the SDSS DataModel. Contains accounting of all relevant SDSS phases, surveys, data releases, and available data products.

datamodel.products.product.grouper(field: str, products: list) dict[source]

Group the products by an attribute

Group all products by either a product attribute, e.g. “releases”, or a field in the underlying JSON model, e.g. “_model.general.environments”. A dotted attribute string is resolved as a set of nested attributes. Returns a dictionary of products grouped by the field, or fields, if the requested field is a list.

Parameters:
  • field (str) – The name of the attribute or field

  • products (list) – A list of Products to group by

Returns:

dict – A dictionary of products grouped by desired field

datamodel.products.product.rgetattr(obj: object, attr: str, *args)[source]

recursive getattr for nested attributes

Recursively get attributes from nested classes. See https://stackoverflow.com/questions/31174295/getattr-and-setattr-on-nested-subobjects-chained-properties
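
For example, with a simple nested object (the attribute path is illustrative):

>>> from types import SimpleNamespace
>>> from datamodel.products.product import rgetattr
>>> obj = SimpleNamespace(general=SimpleNamespace(name='sdR'))
>>> rgetattr(obj, 'general.name')
'sdR'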

datamodel.products.product.sort_function(x)[source]

Sort function for grouping products by field value.

If the item is a pydantic model, sort by the model’s name if it has one; otherwise sort by the tuple item

Parameters:

x (tuple) – A tuple containing a product and its corresponding field value.

Returns:

str – The name of the field value

datamodel.products.product.zipper(x: Product, field: str) list[source]

Creates a list of tuples of the Product and its corresponding field value(s).

This function retrieves the value of the specified field from the given product. It creates a list of tuples where each tuple contains the product and the field value. If the field value is a list, it returns a list of (product, item_element) tuples.

This creates an easily sortable list of tuples for grouping.

Parameters:
  • x (Product) – The product from which to retrieve the field value.

  • field (str) – The name of the attribute or field to retrieve from the product.

Returns:

list – A list of tuples to sort

Validate

datamodel.validate.add.add_and_commit(repo: Repo, file: str, message: str = None)[source]

Add and commit a file to the tree

Add and commit a file to the tree repo.

Parameters:
  • repo (Repo) – the git repo to commit to

  • file (str) – the path of the file to add and commit

  • message (str, optional) – the commit message, by default None

datamodel.validate.add.clone_tree(branch: str = 'dm_update_tree', local: bool = None, path: str = None) Repo[source]

Clone the tree repo

Clone the tree repo from either an existing local source or cloning the remote repo into a temporary directory.

Parameters:
  • branch (str, optional) – the name of the branch, by default ‘dm_update_tree’

  • local (bool, optional) – if True, use a local tree repo, by default None

  • path (str, optional) – a path to check out the tree repo to

Returns:

Repo – the git repo

datamodel.validate.add.get_new_products(release: str = None) tuple[source]

Get new datamodel products for the tree

Retrieves any valid JSON datamodels that do not yet have a corresponding tree entry, i.e. the in_sdss_access field is False.

Parameters:

release (str, optional) – the SDSS release, by default None

Yields:

tuple – The release and access string

datamodel.validate.add.make_branch(repo: Repo, branch: str = 'dm_update_tree') Repo[source]

Make a new branch in the tree repo

Checkout or create a branch in the tree repo.

Parameters:
  • repo (Repo) – the git repo

  • branch (str, optional) – the name of the branch, by default “dm_update_tree”

Returns:

Repo – the git repo

datamodel.validate.add.pull_and_push(repo: Repo)[source]

Pull and push the tree repo

Pull and push the current tree repo head to the remote.

Parameters:

repo (Repo) – the git repo

datamodel.validate.add.update_datamodel_access(branch: str = 'dm_update_models', test: bool = None, commit_to_git: bool = False)[source]

Updates the datamodel access info sections

Checks all “new” JSON datamodels for updated access info. Creates a new datamodel instance using the product file species, and updates YAML file and all stubs with the updated access info for the indicated release.

Parameters:
  • branch (str, optional) – the datamodel repo branch name, by default ‘dm_update_models’

  • test (bool, optional) – if set, skips all write operations, by default None

  • commit_to_git (bool, optional) – if set, commits to git, by default False

datamodel.validate.add.update_tree(release: str = None, work_ver: str = None, branch: str = 'dm_update_tree', local: bool = None, test: bool = None, skip_push: bool = False)[source]

Update the tree repo with new paths

Updates the tree repo with new paths for datamodel products. Gets all new JSON datamodels that do not yet have tree paths, and adds them to the PATH ini section of the respective release config file. Clones the tree repo and makes all commits in a new branch, by default ‘dm_update_tree’. Commits and pushes the branch to the remote. Makes a backup of the tree config file before writing any new changes. On failure, the backup is restored.

Use the test flag to skip all write operations and just print the new paths. Use the skip_push flag to bypass the push to the remote.

Parameters:
  • release (str, optional) – the SDSS release, by default None

  • work_ver (str, optional) – the tree config work version, by default None

  • branch (str, optional) – the tree repo branch name, by default ‘dm_update_tree’

  • local (bool, optional) – if set, uses an existing local repo, by default None

  • test (bool, optional) – if set, turns on testing, by default None

  • skip_push (bool, optional) – if set, skips the git push, by default False

datamodel.validate.add.write_comments(cfgfile: str, paths: list)[source]

Update a tree config file

Write a tree config file with new paths added into it. This preserves all comments from the tree ini config file. The list of paths to add is a list of tuples of path_name, path_template. Does not add them if they already exist in the config file.

Parameters:
  • cfgfile (str) – the tree config file path

  • paths (list) – a list of paths to add

datamodel.validate.add.write_no_comments(cfgfile: str, paths: list)[source]

Update a tree config file

Write a tree config file with new paths added into it. This removes all comments from the tree ini config file. The list of paths to add is a list of tuples of path_name, path_template. Does not add them if they already exist in the config file.

Parameters:
  • cfgfile (str) – the tree config file path

  • paths (list) – a list of paths to add

datamodel.validate.check.check_invalid(product: str, data: dict, release: str, verbose: bool = None) tuple | None[source]

Check for an invalid product access path

For a given release, checks the datamodel product access info against the relevant Tree configuration path info for consistency. If the release is “WORK”, checks both the “sdss5” and “sdsswork” configs. If both configs return an invalidation, then the product path is invalid.

Parameters:
  • product (str) – The name of the datamodel product, i.e. file species name

  • data (dict) – The datamodel access info dictionary

  • release (str, optional) – The SDSS data release, by default None

  • verbose (bool, optional) – If True, turn on verbosity, by default None

Returns:

Union[tuple, None] – Either None for a valid path or a tuple of the invalid path info

datamodel.validate.check.check_path(product: str, data: dict, tree: Tree, verbose: bool = None) tuple | None[source]

Checks a product access path

Checks the product access path name is in the list of tree paths. Checks the product access path template is the same as the tree path template. Checks the product’s access_string is consistent with the tree path template. For tree paths that start with a special function rather than an environment variable, e.g. @spectrodir|, only checks the common part of the path.

Parameters:
  • product (str) – The name of the datamodel product, i.e. file species name

  • data (dict) – The datamodel access info dictionary

  • tree (Tree, optional) – The SDSS tree config

  • verbose (bool, optional) – If True, turn on verbosity, by default None

Returns:

Union[tuple, None] – None for a valid path or a tuple of the invalid path info

datamodel.validate.check.check_products(release: str = None, verbose: bool = None) None[source]

Validate the data product path information

Checks the datamodel product access path information against the tree path information for consistency. Checks path name, template, and access_string.

Parameters:
  • release (str, optional) – The SDSS data release, by default None

  • verbose (bool, optional) – If True, turn on verbosity, by default None

Raises:

ValueError – when any of the product paths are invalid against tree
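
For example, to validate all product paths for a single release (raises ValueError on any mismatch):

>>> from datamodel.validate.check import check_products
>>> check_products(release='DR17', verbose=True)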

datamodel.validate.check.compare_path(a: str, b: str) bool[source]

Compares two paths

Compares two paths for equality. Tries to account for comparison between a path with and without compression suffixes, i.e. “fits” and “fits.gz”. Strips the last suffix from paths with more than one.

Parameters:
  • a (str) – a str filepath or path location

  • b (str) – a str filepath or path location

Returns:

bool – If the two paths are the same
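
For example, comparing paths that differ only by a compression suffix:

>>> from datamodel.validate.check import compare_path
>>> compare_path('spectro/spec-001.fits', 'spectro/spec-001.fits.gz')
True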

datamodel.validate.check.get_products(release: str = None) dict[source]

Get the access info for all SDSS data products

Get the access datamodel info for all SDSS data products of a given release. If no release specified, returns the info for all products for all releases.

Parameters:

release (str, optional) – The SDSS data release, by default None

Returns:

dict – The datamodel path access information

datamodel.validate.check.yield_products(release: str = None) tuple[source]

Generator to yield the access info for all SDSS data products

Yield the access datamodel info for all SDSS data products of a given release. If no release specified, returns the info for all products for all releases.

Parameters:

release (str, optional) – The SDSS data release, by default None

Returns:

tuple – The product name and its datamodel access dictionary

Yields:

Iterator[tuple] – The product name and its datamodel access dictionary

datamodel.validate.models.revalidate(species: str, release: str = None, verbose: bool = None)[source]

Rewrite JSON datamodels

Rewrites all the datamodel stubs for a given existing file species and release.

Parameters:
  • species (str) – the file species name of a YAML datamodel

  • release (str, optional) – the SDSS release, by default None

  • verbose (bool, optional) – if True, turn on verbosity, by default None
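
For example, to regenerate the stubs for a single file species (the species name here is hypothetical):

>>> from datamodel.validate.models import revalidate
>>> revalidate('sdR', release='DR17', verbose=True)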

datamodel.validate.models.validate_models()[source]

Check YAML datamodel validation

Checks all YAML datamodels for corresponding validated JSON models.

Raises:

ValueError – when invalidated YAML models are found