datamodel Reference¶
Generate¶
- class datamodel.generate.datamodel.DataModel(tree_ver: str = None, file_spec: str = None, path: str = None, keywords: list = [], env_label: str = None, location: str = None, verbose: bool = None, release: str = None, filename: str = None, access_path_name: str = None, design: bool = False, science_product: bool = None)[source]¶
Bases:
object
Class to enable datamodel file generation for a given product
This class is used to generate valid SDSS datamodel files for a given data product.
- Parameters:
tree_ver (str, optional) – an SDSS Tree configuration name, by default None
file_spec (str, optional) – The name of the file species (or sdss_access path name), by default None
path (str, optional) – A file path template definition, by default None
keywords (list, optional) – A list of path template keyword-value pairs, by default None
env_label (str, optional) – The environment variable name of the file’s location, by default None
location (str, optional) – A path location relative to the environment variable, by default None
verbose (bool, optional) – If True, turn on verbosity logging, by default None
release (str, optional) – The name of the SDSS release the file is a part of, by default None
filename (str, optional) – A full filepath to a real file on disk to create the datamodel for
access_path_name (str, optional) – A name of the path name in sdss_access, if different than the file species name, by default None
design (bool, optional) – If True, indicates the datamodel is in a design phase, by default False
science_product (bool, optional) – If True, indicates the datamodel is a recommended science product, by default None
- Raises:
ValueError – when neither a path nor a (env_label + location) is specified
ValueError – when no path template keywords are specified
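For orientation, a minimal construction sketch, assuming the datamodel package, tree, and sdss_access are installed and configured; the file species, path template, and keyword values below are hypothetical, and the "key=value" string format for keywords is an assumption.

```python
from datamodel.generate.datamodel import DataModel

# Hypothetical file species defined from a path template plus example keyword
# values; the "key=value" string form for the keywords list is an assumption.
dm = DataModel(
    file_spec='mangaCube',
    path='MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-LOGCUBE.fits.gz',
    keywords=['drpver=v3_1_1', 'plate=8485', 'ifu=1901'],
    tree_ver='dr17',
    verbose=True,
)
```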
- commit_stubs(format: str = None) None [source]¶
Commit the stub files into git
Commit stub files into git. Performs a git pull, commits all stubs into the repo, and attempts a git push. Optionally specify a format to only commit a specific stub.
- Parameters:
format (str, optional) – A stub format to commit, by default None
- design_hdf(name: str = '/', description: str = None, hdftype: str = 'group', attrs=None, ds_shape: tuple = None, ds_dtype: str = None)[source]¶
Wrapper to _design_content, to design a new HDF5 section
Design a new HDF entry for the given datamodel. Specify h5py group or dataset definitions, with an optional list of attributes. Each new entry is added to the members entry in the YAML structure. Use `name` and `description` to specify the name and description of each new group or dataset. Use `hdftype` to specify a "group" or "dataset" entry. For datasets, use `ds_shape`, `ds_size`, and `ds_dtype` to specify the shape, size, and dtype of the array dataset.
New HDF5 members are added to the datamodel in a flattened structure. To add a new group or dataset as a child of an existing group, specify the full path in `name`, e.g. `/mygroup/mydataset`.
`attrs` can be a list of tuples of header keywords, conforming to (key, value, comment, dtype), or a list of dictionaries conforming to {"key": key, "value": value, "comment": comment, "dtype": dtype}.
Allowed attribute or dataset dtypes are any valid string representation of numpy dtypes, e.g. "<i8", "int32", "S10".
- Parameters:
name (str, optional) – the name of the HDF group or dataset, by default ‘/’
description (str, optional) – a description of the HDF group or dataset, by default None
hdftype (str, optional) – the type of HDF5 object, by default ‘group’
attrs (list, optional) – a list of HDF5 Attributes, by default None
ds_shape (tuple, optional) – the shape of an HDF5 array dataset, by default None
ds_dtype (str, optional) – the dtype of an HDF5 array dataset, by default None
- Raises:
ValueError – when an invalid hdftype is specified
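A sketch of designing HDF5 content, assuming `dm` is a DataModel for an HDF5 file species created in design mode; the group, dataset, and attribute names below are hypothetical.

```python
# add a top-level group, then an array dataset as its child (names hypothetical)
dm.design_hdf(name='/params', description='Derived stellar parameters', hdftype='group')
dm.design_hdf(name='/params/teff', description='Effective temperature per star',
              hdftype='dataset', ds_shape=(1000,), ds_dtype='<f8',
              attrs=[('UNIT', 'K', 'physical unit of the dataset', 'S10')])
```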
- design_hdu(ext: str = 'primary', extno: int = None, name: str = 'EXAMPLE', description: str = None, header: list | dict | Header = None, columns: List[list | dict | Column] = None, **kwargs)[source]¶
Wrapper to _design_content, to design a new HDU
Design a new astropy HDU for the given datamodel. Specify the extension type `ext` to indicate a PRIMARY, IMAGE, or BINTABLE HDU extension. Each new HDU is added to the YAML structure using the next HDU extension id found, or the one provided with `extno`. Use `name` to specify the name of the HDU extension. Each call to this method writes out the new HDU to the YAML design file.
`header` can be a `Header` instance, a list of tuples of header keywords, conforming to (keyword, value, comment), or a list of dictionaries conforming to {"keyword": keyword, "value": value, "comment": comment}. `columns` can be a list of `Column` objects, a list of tuples minimally conforming to (name, format, unit), or a list of dictionaries minimally conforming to {"name": name, "format": format, "unit": unit}. See Astropy's Binary Table Column Format for the allowed format values. When supplying a list of tuples or dictionaries, any number of valid `Column` arguments can be included.
- Parameters:
ext (str, optional) – the type of HDU to create, by default ‘primary’
extno (int, optional) – the extension number, by default None
name (str, optional) – the name of the HDU extension, by default ‘EXAMPLE’
description (str, optional) – a description for the HDU, by default None
header (Union[list, dict, fits.Header], optional) – valid input to create a Header, by default None
columns (List[Union[list, dict, fits.Column]], optional) – a list of binary table columns, by default None
force (bool) – If True, forces a new design even if the HDU already exists, by default None
**kwargs – additional keyword arguments to pass to the HDU constructor
- Raises:
ValueError – when the ext type is not supported
ValueError – when the table columns input is not a list
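A sketch of designing HDUs, assuming `dm` is a DataModel created in design mode; the extension names, header keywords, and columns below are hypothetical.

```python
# a primary HDU with a small header, then a binary table extension
dm.design_hdu(ext='primary', name='PRIMARY', description='primary metadata header',
              header=[('TELESCOP', 'SDSS 2.5-M', 'telescope name')])
dm.design_hdu(ext='bintable', extno=1, name='CATALOG',
              description='example catalog of targets',
              columns=[('objid', 'K', ''), ('ra', 'D', 'deg'), ('dec', 'D', 'deg')])
```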
- design_par(comments: str = None, header: list | dict = None, name: str = None, description: str = None, columns: list = None)[source]¶
Wrapper to _design_content, to design a new Yanny par section
Design a new Yanny par for the given datamodel. Specify Yanny comments, a header section, or a table definition. Each new table is added to the YAML structure. Use `name` and `description` to specify the name and description of the new table. `comments` can be a single string of comments, with newlines indicated by "\n". `header` can be a dictionary of key-value pairs, a list of tuples of header keywords, conforming to (keyword, value, comment), or a list of dictionaries conforming to {"key": key, "value": value, "comment": comment}.
The `columns` parameter defines the relevant table columns to add to the file. It can be a list of column names, a list of tuple values conforming to column (name, type, [description]), or a list of dictionaries with keys defined from the complete column yaml definition. Allowed column types are any valid Yanny par types, input as strings, e.g. "int", "float", "char". Array columns can be specified by including the array size in "[]", e.g. "float[6]".
- Parameters:
comments (str, optional) – Any comments to add to the file, by default None
header (Union[list, dict], optional) – Keywords to add to the header of the Yanny file, by default None
name (str, optional) – The name of the parameter table
description (str, optional) – A description of the parameter table
columns (list, optional) – A set of Yanny table column definitions
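A sketch of designing a Yanny par table, assuming `dm` is a DataModel for a .par file species in design mode; the header keys and column definitions below are hypothetical.

```python
dm.design_par(
    comments='Example target list\nwritten during datamodel design',
    header=[('telescope', 'APO 2.5m', 'observing telescope')],
    name='TARGET',
    description='an example table of targets',
    columns=[('ra', 'double', 'Right Ascension'),
             ('dec', 'double', 'Declination'),
             ('mags', 'float[5]', 'ugriz magnitudes')],
)
```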
- determine_survey(name_only: bool = False)[source]¶
Attempt to determine the SDSS survey for this datamodel
- classmethod from_file(filename: str, path_name: str = None, tree_ver: str = None, verbose: bool = None) D [source]¶
class method to create a datamodel from an absolute filepath
Creates a DataModel for a given full path to a file. Prompts the user to verify any existing entry in sdss_access for the input file, or to define a new file_species / path_name, symbolic path location, and example variable_name=value key mappings.
- Parameters:
filename (str) – The full path to the file
path_name (str, optional) – The existing sdss_access path name if any, by default None
tree_ver (str, optional) – The SDSS tree version or release associated with the file, by default None
verbose (bool, optional) – If True, creates the DataModel with verbosity, by default None
- Returns:
DataModel – a SDSS DataModel instance
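A usage sketch; the file path is hypothetical, and the call prompts interactively when the file has no existing sdss_access entry.

```python
from datamodel.generate.datamodel import DataModel

# create a datamodel for a real file on disk (path hypothetical)
dm = DataModel.from_file('/sas/dr17/manga/spectro/redux/v3_1_1/drpall-v3_1_1.fits')
```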
- classmethod from_yaml(species: str, release: str = None, verbose: bool = None, tree_ver: str = None) D [source]¶
class method to create a datamodel from a YAML file species name
Creates a DataModel for a given file species name, from an existing YAML datamodel file. Extracts the abstract path and keyword arguments needed to instantiate a DataModel. Keywords are extracted using the datamodel "location" and "example" fields. The abstract path is extracted from the pre-existing "access_string" field. Fields are pulled from the specified release. If no release is specified, it uses the first release it can find from the datamodel. Can optionally specify a tree config version instead for the cases where the WORK release is from the sdss5 config instead of sdsswork. If the tree_ver is set, it supersedes the release keyword.
- Parameters:
- Returns:
DataModel – a SDSS DataModel instance
- Raises:
ValueError – when no yaml file can be found for the file species
ValueError – when no release can be found in the datamodel
ValueError – when no location or example can be found in the datamodel
ValueError – when no path keyword arguments can be extracted
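A usage sketch, assuming a YAML datamodel already exists for the file species; the species name and release are hypothetical.

```python
from datamodel.generate.datamodel import DataModel

dm = DataModel.from_yaml('mangaCube', release='DR17', verbose=True)
```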
- generate_designed_file(redesign: bool = None, **kwargs)[source]¶
Generate a file from a designed datamodel
Generates a real file on disk from a designed datamodel. If there are any path template keywords, they must be specified here as input keyword arguments to convert the symbolic path / abstract location to a real example location on disk. After generating the file, the datamodel sets `design` to False and exits design mode.
- Parameters:
redesign (bool) – If True, re-enters design mode to create a new file
kwargs – Any path keyword arguments to be filled in
- Raises:
KeyError – when there are missing path keywords
AttributeError – when the release is not WORK when in the datamodel design phase
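A sketch, assuming `dm` is a designed DataModel whose path template uses the hypothetical keywords below.

```python
# fill in the path template keywords to write a real example file on disk
dm.generate_designed_file(drpver='v3_1_1', plate='8485', ifu='1901')
```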
- get_stub(format: str = 'yaml') BaseStub [source]¶
Get a datamodel Stub
Return a datamodel Stub for a given format.
- Parameters:
format (str, optional) – the stub format to return, by default ‘yaml’
- Returns:
BaseStub – an instance of a stub class
- remove_stubs(format: str = None, git: bool = None) None [source]¶
Remove the stub files
Remove all stubs or a stub of a given format.
- write_stubs(format: str = None, force: bool = None, use_cache_release: str = None, full_cache: bool = None, group: str = 'WORK', force_release: str = None) None [source]¶
Write out the stub files
Write out all stubs or a stub of a given format.
- Parameters:
format (str, optional) – A stub format to write out, by default None
force (bool, optional) – If True, forces a rewrite of the entire cached stub content
force_release (str, optional) – A specific release to force a rewrite in the cache
use_cache_release (str, optional) – Specify a cached release to use to copy over custom user content
full_cache (bool, optional) – If True, use the entire cached YAML release, rather than only the HDUs, by default None
group (str, optional) – The release group to use when writing the markdown file, by default “WORK”. Can be “DR”, or “IPL”.
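A usage sketch, assuming `dm` is an instantiated DataModel; the release and group values are hypothetical.

```python
# write all stub formats, then rewrite only the markdown stub for the DR group,
# copying custom user content from a previously cached release
dm.write_stubs()
dm.write_stubs(format='md', use_cache_release='DR17', group='DR')
```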
- property file_exists¶
Checks for file existence on disk
- property recommended_science_product: bool¶
Checks if the datamodel product is a recommended science product
- supported_filetypes = ['.fits', '.fit', '.par', '.h5', '.hdf5']¶
- datamodel.generate.datamodel.prompt_for_access(filename: str, path_name: str = None, config: str = None) tuple [source]¶
Prompt the user to verify or define information
Takes the user through a variety of input prompts in order to verify any existing entry in sdss_access, or to define a new file species, symbolic path location, and example variable=value key mappings for the input file.
- Parameters:
- Returns:
tuple – a tuple of path_name, path_template, path_keys
- datamodel.generate.parse.cleanup_dups(kwargs: dict) dict [source]¶
Cleanup duplicate keys in the extracted keywords
Removes the duplicated keywords from the extracted kwargs. If both key values are the same, uses it. If both are digits, attempts to remove any leading zero-padding, e.g. "45" and "000045" -> "45".
- Parameters:
kwargs (dict) – the input extracted keywords
- Returns:
dict – reduced keyword dictionary
- datamodel.generate.parse.deduplicate(value: str, names: list) str [source]¶
De-duplicate regex pattern field names
Some paths have duplicate field names, e.g. “run”. The default regex named group replace fails with duplicate field names. To handle this we append each duplicate field name with “_” so the re.groupdict method can work properly.
- datamodel.generate.parse.find_kwargs(location: str, example: str) dict [source]¶
Find and extract keyword arguments
Attempts to extract keyword arguments from an input abstract datamodel path `location` and its `example` path. The location and example parts must match exactly. For example, given "{mjd}/sdR-{br}{id}-{frame}.fits.gz" and "55049/sdR-b1-00100006.fits.gz", it returns {'mjd': '55049', 'br': 'b', 'id': '1', 'frame': '00100006'}.
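The docstring example as a runnable call:

```python
from datamodel.generate.parse import find_kwargs

kwargs = find_kwargs('{mjd}/sdR-{br}{id}-{frame}.fits.gz',
                     '55049/sdR-b1-00100006.fits.gz')
# {'mjd': '55049', 'br': 'b', 'id': '1', 'frame': '00100006'}
```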
- datamodel.generate.parse.get_abstract_key(key: str = None, add_brackets: bool = None) str [source]¶
Sanitize the path keyword name
Sanitizes the path keyword name. Upper cases the keyword name and appends any formatting numbers as an integer to the end of the name, e.g. "plate:0>5" is converted to "PLATE5".
- Parameters:
key (str, optional) – The keyword name, by default None
- Returns:
str – the sanitized keyword name
- datamodel.generate.parse.get_abstract_path(path: str = None, add_brackets: bool = None) str [source]¶
Converts a path template into an abstract path
Converts a path template into an abstract path. Extracts bracketed keywords from a path template and converts them to named uppercase. For example, `MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits.gz` is converted to `MANGA_SPECTRO_REDUX/DRPVER/PLATE/stack/manga-PLATE-IFU-WAVECUBE.fits.gz`.
- Parameters:
path (str, optional) – the path template, by default None
- Returns:
str – the abstracted path
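The docstring example as a runnable call:

```python
from datamodel.generate.parse import get_abstract_path

abstract = get_abstract_path(
    'MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits.gz')
# 'MANGA_SPECTRO_REDUX/DRPVER/PLATE/stack/manga-PLATE-IFU-WAVECUBE.fits.gz'
```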
- datamodel.generate.parse.get_file_spec(file_spec: str = None) str [source]¶
Checks validity of file species string
Checks if the file species name is a valid Python identifier.
- Parameters:
file_spec (str, optional) – the name of the file species, by default None
- Returns:
str – the name of the file species
- datamodel.generate.parse.remap_patterns(value: str) str [source]¶
Remaps regex search patterns for certain fields
Some paths have abutted keywords, i.e. “{br}{id}” or “{dr}{version}”. The default regex search pattern of “.+?” can sometimes handle these but sometimes not. We replace certain fields with specific patterns to help the extraction process.
- Parameters:
value (str) – the input regex search pattern
- Returns:
str – the new regex search pattern
- class datamodel.generate.stub.AccessStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]¶
Bases:
BaseStub
- class datamodel.generate.stub.BaseStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]¶
Bases:
ABC
- render_content(force: bool = None, force_release: str = None) None [source]¶
Populate the yaml template with generated content
- update_cache(force: bool = None) None [source]¶
Update the in-memory stub cache from the on-disk file
- write(force: bool = None, use_cache_release: str = None, full_cache: bool = None, **kwargs) None [source]¶
- cacheable = False¶
- format = None¶
- has_template = True¶
- class datamodel.generate.stub.JsonStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]¶
Bases:
BaseStub
- class datamodel.generate.stub.MdStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]¶
Bases:
BaseStub
- get_selected_release(release: str = None, group: str = 'WORK') str [source]¶
get the hdu content for a given release
- render_content(force: bool = None, release: str = None, group: str = 'WORK') None [source]¶
Populate the yaml template with generated content
- class datamodel.generate.stub.YamlStub(datamodel=None, use_cache_release: str = None, full_cache: bool = None, verbose: bool = None, force: bool = None)[source]¶
Bases:
BaseStub
- datamodel.generate.stub.stub_iterator(format: str = None) Iterator[BaseStub] [source]¶
Iterator for all stub formats
Filetypes¶
- class datamodel.generate.filetypes.base.BaseFile(cache: dict, datamodel=None, stub=None, filename: str = None, release: str = None, file_species: str = None, design: bool = None, use_cache_release: str = None, full_cache: bool = None)[source]¶
Bases:
ABC
Base class for supported datamodel file types
This is the abstract base class used for defining new file types to be supported by the sdss datamodel product.
- Parameters:
cache (dict) – The initial yaml cache to be populated.
datamodel (DataModel, optional) – an SDSS datamodel for the file, by default None
stub (Stub, optional) – an datamodel Stub for the file, by default None
filename (str, optional) – the name of file, by default None
release (str, optional) – the data release, by default None
file_species (str, optional) – the file species name, by default None
design (bool, optional) – whether the datamodel is in design mode, by default None
use_cache_release (str, optional) – the release to pull existing cache from, by default None
full_cache (bool, optional) – whether to use the entire previous cache, by default None
- Raises:
ValueError – when a datamodel is not provided and (filename, release, file_species) are not provided
- abstract _generate_new_cache() dict [source]¶
Abstract method to be implemented by subclass, for generating new cache content
This method is used to generate the file content for new datamodel YAML files. It should return a dictionary to be stored as the value of the cache key.
- abstract static _get_designed_object(data: dict)[source]¶
Abstract static method to be implemented by subclass, for creating a valid object from cache
This method is used to create a data object from a designed YAML cache content. It should return a new designed object. Ideally the object should be created through the Pydantic model’s model_validate to ensure proper validation and field type coercion. This method is called by create_from_cache which sets the object as the self._designed_object attribute.
- Parameters:
data (dict) – The YAML cache value for the `cache_key` field
- abstract _update_partial_cache(cached_data: dict, old_cache: dict) dict [source]¶
Abstract method to be implemented by subclass, for partially updating cache content
This method updates the descriptions or comments of the new cached_data with the human-edited fields from the old_cache data. Used when adding a new release to a datamodel and retaining the old descriptions from the previous release. This method should return the cached_data object.
- create_from_cache(release: str = 'WORK')[source]¶
Create a file object from the yaml cache
Converts the cache_key dictionary entry in the YAML cache into a file object.
- Parameters:
release (str, optional) – the name of the data release, by default ‘WORK’
- Returns:
object – a valid file object
- Raises:
ValueError – when the release is not in the cache
ValueError – when the release is not WORK when in the datamodel design phase
- abstract design_content()[source]¶
Abstract method to be implemented by subclass, for designing file content
This method is used to design new content for a YAML datamodel cache for new files from within Python. It should ultimately update the cache line self._cache[‘releases’][‘WORK’][self.cache_key] = [updated_cache_content] with the new content. This method is called by the DataModel’s global design_content method.
- abstract write_design(file: str, overwrite: bool = None)[source]¶
Abstract method to be implemented by subclass, for writing a design to a file
This method is used to write out the designed data object. It should call the self.designed_object’s particular method for writing itself to a file, specific to that filetype.
- aliases = []¶
- cache_key = None¶
- compressions = ['.gz', '.bz2', '.zip']¶
- suffix = None¶
- datamodel.generate.filetypes.base.file_selector(suffix: str = None) BaseFile [source]¶
Selects the correct File class given a file suffix
- datamodel.generate.filetypes.base.format_bytes(value: int = None) str [source]¶
Convert an integer to human-readable format.
- Parameters:
value (int) – An integer representing number of bytes.
- Returns:
str – Size of the file in human-readable format.
- datamodel.generate.filetypes.base.get_filesize(file) str [source]¶
Get the size of the input file.
- Returns:
str – Size of the file in human-readable format.
- datamodel.generate.filetypes.base.get_filetype(file) str [source]¶
Get the extension of the input file.
- Returns:
str – File type in upper case.
- datamodel.generate.filetypes.base.get_supported_filetypes() list [source]¶
Get a list of supported filetypes
Constructs a list of supported filetypes for datamodels, based on the BaseFile subclasses. Collects each subclass file suffix attribute as well as any designated aliases.
- Returns:
list – A list of supported file types
- class datamodel.generate.filetypes.fits.FitsFile(*args, **kwargs)[source]¶
Bases:
BaseFile
Class for supporting FITS files
- design_content(ext: str = 'primary', extno: int = None, name: str = 'EXAMPLE', description: str = None, header: list | dict | Header = None, columns: List[list | dict | Column] = None, **kwargs) None [source]¶
Design a new HDU
Design a new astropy HDU for the given datamodel. Specify the extension type `ext` to indicate a PRIMARY, IMAGE, or BINTABLE HDU extension. Each new HDU is added to the YAML structure using the next HDU extension id found, or the one provided with `extno`. Use `name` to specify the name of the HDU extension.
`header` can be a `Header` instance, a list of tuples of header keywords, conforming to (keyword, value, comment), or a list of dictionaries conforming to {"keyword": keyword, "value": value, "comment": comment}. `columns` can be a list of `Column` objects, a list of tuples minimally conforming to (name, format, unit), or a list of dictionaries minimally conforming to {"name": name, "format": format, "unit": unit}. See Astropy's Binary Table Column Format for the allowed format values. When supplying a list of tuples or dictionaries, any number of valid `Column` arguments can be included.
- Parameters:
ext (str, optional) – the type of HDU to create, by default ‘primary’
extno (int, optional) – the extension number, by default None
name (str, optional) – the name of the HDU extension, by default ‘EXAMPLE’
description (str, optional) – a description for the HDU, by default None
header (Union[list, dict, fits.Header], optional) – valid input to create a Header, by default None
columns (List[Union[list, dict, fits.Column]], optional) – a list of binary table columns, by default None
force (bool) – If True, forces a new design even if the HDU already exists, by default None
**kwargs – additional keyword arguments to pass to the HDU constructor
- Raises:
ValueError – when the ext type is not supported
ValueError – when the table columns input is not a list
- write_design(file: str, overwrite: bool = True) None [source]¶
Write out the designed file
Write out a designed fits.HDUList object to a file on disk. Must have run create_from_cache method first.
- Parameters:
- Raises:
AttributeError – when the designed object does not exist
- aliases = ['FIT']¶
- cache_key = 'hdus'¶
- suffix = 'FITS'¶
- class datamodel.generate.filetypes.par.ParFile(*args, **kwargs)[source]¶
Bases:
BaseFile
Class for supporting Yanny par files
- design_content(comments: str = None, header: list | dict = None, name: str = None, description: str = None, columns: list = None) None [source]¶
Design a new Yanny par section
Design a new Yanny par for the given datamodel. Specify Yanny comments, a header section, or a table definition. Each new table is added to the YAML structure. Use `name` and `description` to specify the name and description of the new table. `comments` can be a single string of comments, with newlines indicated by "\n". `header` can be a dictionary of key-value pairs, a list of tuples of header keywords, conforming to (keyword, value, comment), or a list of dictionaries conforming to {"key": key, "value": value, "comment": comment}.
The `columns` parameter defines the relevant table columns to add to the file. It can be a list of column names, a list of tuple values conforming to column (name, type, [description]), or a list of dictionaries with keys defined from the complete column yaml definition. Allowed column types are any valid Yanny par types, input as strings, e.g. "int", "float", "char". Array columns can be specified by including the array size in "[]", e.g. "float[6]". Enum types are defined by setting `is_enum` to True and providing a list of possible values via `enum_values`.
- Parameters:
comments (str, optional) – Any comments to add to the file, by default None
header (Union[list, dict], optional) – Keywords to add to the header of the Yanny file, by default None
name (str, optional) – The name of the parameter table
description (str, optional) – A description of the parameter table
columns (list, optional) – A set of Yanny table column definitions
- write_design(file: str, overwrite: bool = True) None [source]¶
Write out the designed file
Write out a designed Yanny par object to a file on disk. Must have run create_from_cache method first.
- Parameters:
- Raises:
AttributeError – when the designed object does not exist
- cache_key = 'par'¶
- suffix = 'PAR'¶
- class datamodel.generate.filetypes.hdf5.HdfFile(cache: dict, datamodel=None, stub=None, filename: str = None, release: str = None, file_species: str = None, design: bool = None, use_cache_release: str = None, full_cache: bool = None)[source]¶
Bases:
BaseFile
Class for supporting HDF5 files
- design_content(name: str = '/', description: str = None, hdftype: str = 'group', attrs: list = None, ds_shape: tuple = None, ds_dtype: str = None)[source]¶
Design a new HDF5 section for the datamodel
Design a new HDF entry for the given datamodel. Specify h5py group or dataset definitions, with an optional list of attributes. Each new entry is added to the members entry in the YAML structure. Use `name` and `description` to specify the name and description of each new group or dataset. Use `hdftype` to specify a "group" or "dataset" entry. For datasets, use `ds_shape`, `ds_size`, and `ds_dtype` to specify the shape, size, and dtype of the array dataset.
New HDF5 members are added to the datamodel in a flattened structure. To add a new group or dataset as a child of an existing group, specify the full path in `name`, e.g. `/mygroup/mydataset`.
`attrs` can be a list of tuples of header keywords, conforming to (key, value, comment, dtype), or a list of dictionaries conforming to {"key": key, "value": value, "comment": comment, "dtype": dtype}.
Allowed attribute or dataset dtypes are any valid string representation of numpy dtypes, e.g. "<i8", "int32", "S10".
- Parameters:
name (str, optional) – the name of the HDF group or dataset, by default ‘/’
description (str, optional) – a description of the HDF group or dataset, by default None
hdftype (str, optional) – the type of HDF5 object, by default ‘group’
attrs (list, optional) – a list of HDF5 Attributes, by default None
ds_shape (tuple, optional) – the shape of an HDF5 array dataset, by default None
ds_dtype (str, optional) – the dtype of an HDF5 array dataset, by default None
- Raises:
ValueError – when an invalid hdftype is specified
- write_design(file: str, overwrite: bool = None) None [source]¶
Write out the designed file
Write out a designed HDF5 object to a file on disk. Must have run create_from_cache method first.
- Parameters:
- Raises:
AttributeError – when the designed object does not exist
- aliases = ['HDF5']¶
- cache_key = 'hdfs'¶
- suffix = 'H5'¶
Changelog¶
- class datamodel.generate.changelog.core.ChangeLog(the_list: list, **kwargs)[source]¶
Bases:
list
Class that holds the change logs for all input files
Contains a list of all FileDiff objects. Mainly used as a container to iterate over many changelogs and generate a string report or dictionary object for each item in the list.
- Parameters:
the_list (list) – A list of FileDiff objects
- class datamodel.generate.changelog.core.FileDiff(file1: str, file2: str, versions: list = None, diff_type: str = None)[source]¶
Class that holds the difference between two files
Creates an object that compares the difference between two files. Base class that is subclassed by FitsDiff and CatalogDiff.
- Parameters:
file1 (str) – the filepath to compute the changes against
file2 (str) – the filepath to compute the changes from
versions (list, optional) – the named releases/versions corresponding to the two input files, by default None
diff_type (str, optional) – the object data type of which to compute the difference, by default None
- datamodel.generate.changelog.core.compute_changelog(items: list, change: str = 'fits') ChangeLog [source]¶
Compute the changelogs between a list of datamodels
Given an input list of DataModel objects, computes the differences between them using the on-disk real file location. By default computes the FITS differences.
- datamodel.generate.changelog.core.compute_diff(oldfile: str, otherfile: str, change: str = 'fits', versions: list = None) FileDiff [source]¶
Produce a single changelog between two files
Produce a difference object for two files
- Parameters:
- Returns:
FileDiff – An instance containing the differences between the two files
- Raises:
ValueError – when no valid input filepath is given
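A usage sketch; the file paths and version labels are hypothetical.

```python
from datamodel.generate.changelog.core import compute_diff

diff = compute_diff('/sas/dr17/example/file.fits', '/sas/dr16/example/file.fits',
                    change='fits', versions=['DR17', 'DR16'])
print(diff.report())
```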
- datamodel.generate.changelog.core.diff_selector(suffix: str = None) FileDiff [source]¶
Select the correct class given a file suffix
- class datamodel.generate.changelog.yaml.YamlDiff(content: dict = None, file: str = None)[source]¶
Bases:
ABC
Computes the difference between two releases in YAML cache
Computes the differences in HDU content between releases in a given YAML datamodel file, or cached dictionary.
- Parameters:
- Raises:
ValueError – when no yaml filepath or cache content is provided
ValueError – when no releases can be identified from the yaml content
- abstract _get_changes(version1: str, version2: str, simple: bool = None) dict [source]¶
Abstract method to be implemented by subclass, for generating changelog content
This method is used to construct a dictionary of changes between two releases for the given file YAML content. It should return a dictionary object, minimally of the form {version1: {“from”: version2, “key1”: value1, “key2”: value2, …}} where key1: value1, etc are the custom changes between the two releases. The input version1 is the new release, and version2 is the older release of which to compute the difference.
- compute_changelog(version1: str = 'A', version2: str = 'B', simple: bool = False) dict [source]¶
Compute the changelog between two releases
Computes the changes between two releases in a given YAML cache. Compares the “hdus” entries in each release, and looks for differences in HDU extension number, added or removed HDU extensions, differences in primary header keyword number, and any added or removed primary header keywords.
- Parameters:
- Returns:
dict – a dictionary of found changes
- Raises:
ValueError – when no HDULists are found in the YAML cache
- generate_changelog(order: list = None, simple: bool = False) dict [source]¶
Generate a full changelog dictionary across all releases
Iterate over all releases and generate a complete changelog from one release to another. The release order used to compute the changelog can be specified by passing in a desired list of releases to the `order` keyword. Set `simple` to True to produce a cleaner, simpler changelog, containing only non-null entries.
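A usage sketch with the FITS-specific subclass YamlFits (documented below); the YAML file path and release order are hypothetical.

```python
from datamodel.generate.changelog.filetypes.fits import YamlFits

ydiff = YamlFits(file='datamodel/products/yaml/mangaCube.yaml')
changes = ydiff.generate_changelog(order=['DR17', 'DR16', 'DR15'], simple=True)
```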
- has_changes(version1: str = 'A', version2: str = 'B') bool [source]¶
Check if there are any changes between two releases
Computes the changelog between two releases and returns a flag if changes are detected. Compares the differences of release “version1” from release “version2”.
- cache_key = None¶
- suffix = None¶
- datamodel.generate.changelog.yaml.yamldiff_selector(suffix: str = None) YamlDiff [source]¶
Select the correct class given a file suffix
- class datamodel.generate.changelog.filetypes.catalog.CatalogDiff(file1: str | Table, file2: str | Table, full: bool = None, versions: list = None)[source]¶
Bases:
FileDiff
Compute the difference between two catalog files
Computes the differences in catalog content between two input ascii catalog files, e.g. CSV. Looks for changes in row number, column number, and any added or removed column names.
- Parameters:
file1 (Union[str, Table]) – the filepath or Table to compute the changes against
file2 (Union[str, Table]) – the filepath or Table to compute the changes from
full (bool, optional) – If True, compute the full Astropy Ascii Table differences, by default None
versions (list, optional) – the named releases/versions corresponding to the two input files, by default None
- get_astropy_diff() str [source]¶
Returns the full Astropy diff using report_diff_values
- Returns:
str – the complete difference between the two catalog files
- report(split: bool = None, full: bool = None) str [source]¶
Print the catalog difference report
Returns the catalog differences as a string blob. Can optionally return the report as a list of string lines.
- suffix = 'CATALOG'¶
- class datamodel.generate.changelog.filetypes.fits.FitsDiff(file1: str | HDUList, file2: str | HDUList, full: bool = None, versions: list = None)[source]¶
Bases:
FileDiff
Compute the difference between two FITS files
Computes the differences in HDUList content between two input FITS files. Looks for changes in HDU extension number, any added or removed HDU extensions, as well as any changes in the primary header keywords.
- Parameters:
file1 (Union[str, fits.HDUList]) – the filepath or HDUList to compute the changes against
file2 (Union[str, fits.HDUList]) – the filepath or HDUList to compute the changes from
full (bool, optional) – If True, compute the full Astropy FITS HDUList differences, by default None
versions (list, optional) – the named releases/versions corresponding to the two input files, by default None
- get_astropy_diff() FITSDiff [source]¶
Returns the full Astropy FITSDiff
- Returns:
fits.FITSDiff – the complete difference between the two FITS files
- report(split: bool = None) str [source]¶
Print the FITS difference report
Returns the FITS differences as a string blob. Can optionally return the report as a list of string lines.
- Parameters:
split (bool, optional) – if True, splits the report into a list of string lines, by default None
- Returns:
str – The difference report as a string blob
- suffix = 'FITS'¶
- class datamodel.generate.changelog.filetypes.fits.YamlFits(content: dict = None, file: str = None)[source]¶
Bases:
YamlDiff
Class for supporting YAML changelog generation for FITS files
- cache_key = 'hdus'¶
- suffix = 'FITS'¶
- class datamodel.generate.changelog.filetypes.par.ParDiff(file1: str | None, file2: str | None, versions: list = None)[source]¶
Bases:
FileDiff
Class for computing differences between Yanny par files
Computes the differences in table content between two Yanny par files. Looks for changes in header keys, table number, and any added or removed keys, tables, or table columns.
- Parameters:
- suffix = 'PAR'¶
Git¶
- class datamodel.gitio.gitio.Git(verbose=None)[source]¶
Bases:
object
Class to run the git commands
Wrapper class to the GitPython package.
- add(path: str = None)[source]¶
Add a file to the git repo
Performs a “git add” on the datamodel repo
- Parameters:
path (str, optional) – the full path of the file to add, by default None
- Raises:
AttributeError – when on the main branch
RuntimeError – when the git command fails
- check_if_untracked(path: str = None) bool [source]¶
Checks if a file is untracked in the git repo
- Parameters:
path (str, optional) – the full path of the file to add, by default None
- Returns:
bool – if the file is untracked
- checkout(branch: str = None)[source]¶
Checks out a branch from the git repo
Performs a “git checkout” on the datamodel repo. If the branch does not exist, it will be created.
- Parameters:
branch (str, optional) – the name of the branch to checkout, by default None
- Raises:
RuntimeError – when the git command fails
- clone(product: str = None, branch: str = None)[source]¶
Clones the git repo
Performs a “git clone” of the datamodel repo.
- Parameters:
- Raises:
RuntimeError – when the git command fails
- commit(message: str = None)[source]¶
Commit a file to the git repo
Performs a “git commit” on the datamodel repo
- Parameters:
message (str, optional) – a git commit message, by default None
- Raises:
AttributeError – when on the main branch
RuntimeError – when the git command fails
- create_new_branch(branch: str = None)[source]¶
Create a new branch
Create a new branch. If no branch name is provided, it will create a branch name based on the email head found in the git user config. If none is found, creates a random branch name using a UUID.
- Parameters:
branch (str, optional) – the name of the branch to create, by default None
- fetch()[source]¶
Fetch from Github remote origin
Performs a “git fetch” on the datamodel repo
- Raises:
RuntimeError – when the git command fails
- get_path_location(path: str = None) str [source]¶
Gets the path location
Gets the location relative to the git repo directory of the filepath.
- Parameters:
path (str, optional) – the full path of the file to add, by default None
- Returns:
str – the relative location of the path
- pull()[source]¶
Pull from Github remote origin
Performs a “git pull” on the datamodel repo
- Raises:
RuntimeError – when the current branch does not exist on remote
RuntimeError – when the current repo is dirty
RuntimeError – when the git command fails
- push()[source]¶
Push to Github remote origin
Performs a “git push” on the datamodel repo
- Raises:
RuntimeError – when the current branch does not exist on remote
RuntimeError – when the current repo is dirty
RuntimeError – when the git command fails
- rm(path: str = None)[source]¶
Remove a file from the git repo
Performs a “git rm” on the datamodel repo
- Parameters:
path (str, optional) – the full path of the file to remove, by default None
- Raises:
AttributeError – when on the main branch
RuntimeError – when the git command fails
- property origin¶
the git remote origin
Io¶
- datamodel.io.loaders.get_yaml_files(get: str = None) str | list [source]¶
Get a list of yaml files
Return a list of YAML files in the datamodel directory.
- Parameters:
get (str, optional) – type of yaml file to get, can be “releases” or “products”, by default None
- Returns:
Union[str, list] – The yaml file path or list of yaml file paths
- datamodel.io.loaders.read_yaml(ymlfile: str | Path) dict [source]¶
Opens and reads a YAML file
- Parameters:
ymlfile (Union[str, pathlib.Path]) – a file or pathlib.Path object
- Returns:
dict – the YAML content
- datamodel.io.move.construct_new_path(file: str | Path = None, old_path: str | Path = None, new_path: str | Path = None, release: str = None, kwargs: dict = None) Path [source]¶
Construct a new filepath
Constructs a new filepath, either from an abstract path location and a set of keyword arguments, or from an existing (old) filepath and abstract location.
- Parameters:
file (Union[str, pathlib.Path], optional) – the existing full filepath, by default None
old_path (Union[str, pathlib.Path], optional) – the existing species abstract path, by default None
new_path (Union[str, pathlib.Path], optional) – the new species abstract path, by default None
release (str, optional) – the SDSS release, by default None
kwargs (dict, optional) – a set of path keyword arguments, by default None
- Returns:
pathlib.Path – a full filepath
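A usage sketch; the abstract path, release, and keyword values are hypothetical.

```python
from datamodel.io.move import construct_new_path

# build a new filepath from an abstract path plus keyword values
new_file = construct_new_path(
    new_path='MANGA_SPECTRO_REDUX/{drpver}/{plate}/manga-{plate}-LOGCUBE.fits.gz',
    release='DR17',
    kwargs={'drpver': 'v3_1_1', 'plate': '8485'},
)
```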
- datamodel.io.move.dm_move(old: str, new: str, parent: bool = None, symlink: bool = True)[source]¶
Move a file to a new location
Moves a file from an old location to a new one, optionally moving the entire parent directory and creating a symlink between the old and new locations.
- datamodel.io.move.dm_move_species(abstract_path: str, new_path: str, release: str, parent: bool = None, symlink: bool = True, test: bool = None)[source]¶
Moves all files from a species to a new location
Moves all files from a given file species. Finds all real files that match an existing file species abstract path, and moves them to a new location. The location is determined by the original filename, a new abstract path location, and a given release.
- Parameters:
abstract_path (str) – the existing species abstract path
new_path (str) – the new species abstract path
release (str) – the SDSS release
parent (bool, optional) – flag to move the entire parent directory, by default None
symlink (bool, optional) – flag to create a symlink from new location to old one, by default True
test (bool, optional) – flag to test the move, by default None
- datamodel.io.move.find_files_from_species(path: str) Iterator [source]¶
Find all files species from an abstract path
Finds all files matching the species pattern in a given abstract path.
- Parameters:
path (str) – an abstract file species path
- Returns:
Iterator – Iterator over all matching files found
Models¶
- class datamodel.models.base.BaseList[source]¶
Bases:
BaseModel
Base pydantic class for lists of models
- sort(field: str, key: Callable = None, **kwargs) None [source]¶
Sort the list of models by a pydantic field name
Performs an in-place sort of the Pydantic models using Python's built-in `sorted()` function. Sets the newly sorted list to the `root` attribute, to preserve the original BaseList object instance. By default, the input sort `key` to the `sorted()` function is the field attribute on the model.
- Parameters:
field (str) – The Pydantic field name
key (Callable, optional) – a function to be passed into the sorted() function, by default None
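A usage sketch with the Releases list model (documented below); the entries are hypothetical and must pass the model's field validation.

```python
from datamodel.models.releases import Releases

releases = Releases([
    {'name': 'DR17', 'description': 'Data Release 17', 'public': True},
    {'name': 'DR8', 'description': 'Data Release 8', 'public': True},
])
releases.sort('name')  # in-place sort on the "name" field
```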
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.base.CoreModel[source]¶
Bases:
BaseModel
Custom BaseModel
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- datamodel.models.base.add_repr(schema: Dict[str, Any], model: Type[BaseModel]) None [source]¶
Adds custom information into the schema
- class datamodel.models.releases.Release(*, name: str, description: str, public: bool = False, release_date: str | date = 'unreleased')[source]¶
Bases:
CoreModel
Pydantic model representing an SDSS release
- Parameters:
name (str) – The name of the release
description (str) – A description of the release
public (bool) – Whether the release is public or not
release_date (datetime.date) – The date of the release
- Raises:
ValueError – when the release name does not start with a valid SDSS release code
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.releases.Releases(root: RootModelRootType = PydanticUndefined)[source]¶
Bases:
BaseList, RootModel[List[Release]]
Pydantic model representing a list of Releases
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.yaml.Access(*, in_sdss_access: bool, path_name: str | None = None, path_template: str | None = None, path_kwargs: List[str] | None = None, access_string: str | None = None)[source]¶
Bases:
CoreModel
Pydantic model representing the YAML releases access section
- Parameters:
in_sdss_access (bool) – Whether or not the data product has an sdss_access entry
path_name (str) – The path name in sdss_access for the data product
path_template (str) – The path template in sdss_access for the data product
path_kwargs (List[str]) – A list of path keywords in the path_template for the data product
access_string (str) – The full sdss_access entry, “path_name=path_template”
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.yaml.ChangeBase(*, from_: str, note: str | None = None)[source]¶
Bases:
CoreModel
Base Pydantic model representing a YAML changelog release section
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.yaml.ChangeLog(*, description: str, releases: Dict[str, ChangeRelease] = None)[source]¶
Bases:
CoreModel
Pydantic model representing the YAML changelog section
- Parameters:
description (str) – A description of the changelog
releases (Dict[str, ChangeRelease]) – A dictionary of the file changes between the given release and the previous one
- dict(**kwargs)[source]¶
Override the dict method to exclude None fields by default
Need to override this method as well when serializing YamlModel to json, because nested models are already converted to dict when json.dumps is called. See https://github.com/samuelcolvin/pydantic/issues/1778
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- releases: Dict[str, ChangeRelease]¶
- class datamodel.models.yaml.ChangeRelease(*, from_: str, note: str | None = None, delta_nhdus: int | None = None, added_hdus: List[str] | None = None, removed_hdus: List[str] | None = None, primary_delta_nkeys: int | None = None, added_primary_header_kwargs: List[str] | None = None, removed_primary_header_kwargs: List[str] | None = None, delta_nkeys: int | None = None, addead_header_keys: List[str] | None = None, removed_header_keys: List[str] | None = None, delta_ntables: int | None = None, addead_tables: List[str] | None = None, removed_tables: List[str] | None = None, tables: Dict[str, ChangeTable] | None = None, new_libver: tuple | None = None, delta_nattrs: int | None = None, addead_attrs: List[str] | None = None, removed_attrs: List[str] | None = None, delta_nmembers: int | None = None, addead_members: List[str] | None = None, removed_members: List[str] | None = None, members: Dict[str, ChangeMember] | None = None)[source]¶
Bases:
ChangeHdf, ChangePar, ChangeFits, ChangeBase
Pydantic model representing a YAML changelog release section
Represents a computed section of the changelog, for the specified release. The changelog is computed between the data products of the release (key) and the release indicated in `from`.
- Parameters:
from (str) – The release the changelog is computed from
delta_nhdus (int) – The difference in number of HDUs
added_hdus (List[str]) – A list of any added HDUs
removed_hdus (List[str]) – A list of any removed HDUs
primary_delta_nkeys (int) – The difference in primary header keywords
added_primary_header_kwargs (List[str]) – A list of any added primary header keywords
removed_primary_header_kwargs (List[str]) – A list of any removed primary header keywords
delta_nkeys (int) – The difference in number of Yanny header keys
added_header_keys (List[str]) – A list of any added Yanny header keywords
removed_header_keys (List[str]) – A list of any removed Yanny header keywords
delta_ntables (int) – The difference in number of Yanny tables
added_tables (List[str]) – A list of any added Yanny tables
removed_tables (List[str]) – A list of any removed Yanny tables
tables (Dict[str, ChangeTable]) – A dictionary of table column and row changes
new_libver (tuple) – The difference in HDF5 library version
delta_nattrs (int) – The difference in the number of HDF5 Attributes
added_attrs (List[str]) – A list of any added HDF5 Attributes
removed_attrs (List[str]) – A list of any removed HDF5 Attributes
delta_nmembers (int) – The difference in number members in HDF5 file
added_members (List[str]) – A list of any added HDF5 groups or datasets
removed_members (List[str]) – A list of any removed HDF5 groups or datasets
members (Dict[str, ChangeMember]) – A dictionary of HDF5 group/dataset member changes
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.yaml.GeneralSection(*, name: str, short: str, description: str, environments: List[str] = None, surveys: List[str | Survey] = None, datatype: str | None, filesize: str | None, releases: List[str | Release] = None, naming_convention: str, generated_by: str, design: bool = None, vac: bool = None, recommended_science_product: bool = None, data_level: DataLevel = None)[source]¶
Bases:
CoreModel
Pydantic model representing the YAML general section
- Parameters:
name (str) – The file species name of the data product (or sdss_access path_name)
short (str) – A one sentence summary of the data product
description (str) – A longer description of the data product
environments (List[str]) – A list of environment variables associated with the data product
datatype (str) – The type of data product, e.g. FITS
filesize (str) – An estimated size of the data product
releases (List[str]) – A list of SDSS releases the data product is in
naming_convention (str) – A description of the naming convention
generated_by (str) – An identifiable piece of the code that generates the data product
design (bool) – If True, the datamodel is in the design phase, before any file exists yet
vac (bool) – True if the datamodel is a VAC
recommended_science_product (bool) – True if the product is recommended for science use
data_level (str) – The product level or ranking, as numeral x.y.z
- Raises:
ValueError – when any of the releases are not a valid SDSS Release
- data_level: DataLevel¶
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- releases: List[AnnoRelease]¶
- surveys: List[AnnoSurvey]¶
- class datamodel.models.yaml.ProductModel(*, general: GeneralSection, changelog: ChangeLog, releases: Dict[str, ReleaseModel], notes: str = None, regrets: str = 'I have no regrets!')[source]¶
Bases:
YamlModel
Pydantic model representing a data product JSON file
- Parameters:
general (GeneralSection) – The general metadata section of the datamodel
changelog (ChangeLog) – An automated log of data product changes across releases
releases (Dict[str, ReleaseModel]) – A dictionary of information specific to that release
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.yaml.ReleaseModel(*, template: str, example: str | None, location: str, environment: str, survey: str = None, access: Access, hdus: Dict[str, HDU] | None = None, par: ParModel | None = None, hdfs: HdfModel | None = None)[source]¶
Bases:
CoreModel
Pydantic model representing an item in the YAML releases section
Contains any information on the data product that is specific to a given release, or that changes across releases.
- Parameters:
template (str) – The full template representation of the path to the data product
example (str) – A real example path of the data product
location (str) – The symbolic location of the data product
environment (str) – The SAS environment variable the product lives under
access (Access) – Information on any relevant sdss_access entry
hdus (Dict[str, HDU]) – A dictionary of HDU content for the product for the given release
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.yaml.YamlModel(*, general: GeneralSection, changelog: ChangeLog, releases: Dict[str, ReleaseModel], notes: str = None, regrets: str = 'I have no regrets!')[source]¶
Bases:
CoreModel
Pydantic model representing a YAML file
- Parameters:
general (GeneralSection) – The general metadata section of the datamodel
changelog (ChangeLog) – An automated log of data product changes across releases
releases (Dict[str, ReleaseModel]) – A dictionary of information specific to that release
notes (str) – A string or multi-line text blob of additional information
regrets (str) – A string or multi-line text blob of any regrets over the datamodel
- general: GeneralSection¶
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- releases: Dict[str, ReleaseModel]¶
- datamodel.models.yaml.check_gen_release(value: str) str [source]¶
Validator to check release against list of releases
- datamodel.models.yaml.check_survey(value: str) str [source]¶
Validator to check survey against list of surveys
- class datamodel.models.surveys.Phase(*, name: str, id: int, start: int | None = None, end: int | None = None, active: bool = False)[source]¶
Bases:
CoreModel
Pydantic model representing an SDSS phase
- Parameters:
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.surveys.Phases(root: RootModelRootType = PydanticUndefined)[source]¶
Bases:
BaseList, RootModel[List[Phase]]
Pydantic model representing a list of Phases
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.surveys.Survey(*, name: str, long: str = None, description: str, phase: int | Phase = None, id: str = None, aliases: list = [])[source]¶
Bases:
CoreModel
Pydantic model representing an SDSS survey
- Parameters:
- Raises:
ValueError – when the survey phase is not a valid SDSS Phase
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.surveys.Surveys(root: RootModelRootType = PydanticUndefined)[source]¶
Bases:
BaseList, RootModel[List[Survey]]
Pydantic model representing a list of Surveys
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.versions.Tag(*, version: Version, tag: list[str] | str = None, release: str | Release | List[Release], survey: str | Survey)[source]¶
Bases:
CoreModel
Pydantic model representing an SDSS software tag
- Parameters:
- Raises:
ValueError – when the tag release is not a valid SDSS Release
ValueError – when the tag survey is not a valid SDSS Survey
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- property name¶
A name for the tag
- class datamodel.models.versions.Tags(root: RootModelRootType = PydanticUndefined)[source]¶
Bases:
BaseList, RootModel[List[Tag]]
Pydantic model representing a list of Tags
- group_by(order_by: str = 'release') dict [source]¶
Group tags by SDSS release or survey
Convert the list of tags to a series of dictionaries, ordered by the SDSS release or survey, with key:value pairs of version_name:tag. The default is to group by release, then survey. With `order_by` set to "survey", the tags are grouped by survey, then release. For example, {'DR17': {'manga': {'drpver': 'v3_1_1', 'dapver': '3.1.0'}}}.
- Parameters:
order_by (str, optional) – whether to group by 'release' or 'survey', by default 'release'
- Returns:
dict – nested dictionary of tags
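A usage sketch; the tag entry is hypothetical and must reference a valid SDSS release and survey to pass validation.

```python
from datamodel.models.versions import Tags

tags = Tags([{'version': {'name': 'drpver', 'description': 'MaNGA DRP version'},
              'tag': 'v3_1_1', 'release': 'DR17', 'survey': 'manga'}])
by_release = tags.group_by()          # {'DR17': {'manga': {'drpver': 'v3_1_1'}}}
by_survey = tags.group_by('survey')   # {'manga': {'DR17': {'drpver': 'v3_1_1'}}}
```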
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to `pydantic.config.ConfigDict`.
- class datamodel.models.versions.Version(*, name: str, description: str)[source]¶
Bases:
CoreModel
Pydantic model representing an SDSS version
- Parameters:
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.vacs.VAC(*, name: str)[source]¶
Bases:
BaseModel
Pydantic model representing an SDSS VAC
- Parameters:
name (str) – The environment variable label name of the VAC
- Raises:
ValueError – when the release name does not start with a valid SDSS release code
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.vacs.VACS(root: RootModelRootType = PydanticUndefined)[source]¶
Bases:
BaseList, RootModel[List[VAC]]
Pydantic model representing a list of VACs
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.fits.ChangeFits(*, delta_nhdus: int | None = None, added_hdus: List[str] | None = None, removed_hdus: List[str] | None = None, primary_delta_nkeys: int | None = None, added_primary_header_kwargs: List[str] | None = None, removed_primary_header_kwargs: List[str] | None = None)[source]¶
Bases:
CoreModel
Pydantic model representing the FITS hdu fields of the YAML changelog release section
Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.
- Parameters:
delta_nhdus (int) – The difference in number of HDUs
added_hdus (List[str]) – A list of any added HDUs
removed_hdus (List[str]) – A list of any removed HDUs
primary_delta_nkeys (int) – The difference in primary header keywords
added_primary_header_kwargs (List[str]) – A list of any added primary header keywords
removed_primary_header_kwargs (List[str]) – A list of any removed primary header keywords
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.fits.Column(*, name: str, description: str, type: str, unit: str = '')[source]¶
Bases:
CoreModel
Pydantic model representing a YAML column section
Represents a FITS binary table column
- Parameters:
- to_fitscolumn() Column [source]¶
Convert the column to a fits.Column
Converts the column entry in the yaml file to an Astropy fits.Column object. Performs a mapping between type and format, using the reverse of datamodel.generate.stub.Stub._format_type.
- Returns:
fits.Column – a valid astropy fits.Column object
- Raises:
TypeError – when the column type cannot be coerced into a valid fits.Column format
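A usage sketch; the column values are hypothetical, and the type string must be one the package can map to a FITS format:
>>> from datamodel.models.filetypes.fits import Column
>>> col = Column(name='FLUX', description='calibrated flux', type='float32', unit='nanomaggy')
>>> col.to_fitscolumn()    # astropy fits.Column; raises TypeError if the type cannot be mapped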
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.fits.HDU(*, name: str, is_image: bool, description: str, size: str = None, header: List[Header] = None, columns: Dict[str, Column] | None = None)[source]¶
Bases:
CoreModel
Pydantic model representing a YAML hdu section
Represents a FITS HDU extension
- Parameters:
name (str) – The name of the HDU extension
is_image (bool) – Whether the HDU is an image extension
description (str) – A description of the HDU extension
size (str) – An estimated size of the HDU extension
header (List[Header]) – A list of header values for the extension
columns (Dict[str, Column]) – A list of any binary table columns for the extension
- convert_hdu() PrimaryHDU | ImageHDU | BinTableHDU [source]¶
Convert the HDU entry into a valid fits.HDU
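A usage sketch, assuming hdu is an HDU model instance taken from a loaded datamodel (how it is retrieved is not shown here):
>>> fits_hdu = hdu.convert_hdu()     # a fits.PrimaryHDU, ImageHDU, or BinTableHDU
>>> fits_hdu.header                  # standard astropy header access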
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.fits.Header(*, key: str, value: str | None = '', comment: str = '')[source]¶
Bases:
CoreModel
Pydantic model representing a YAML header section
Represents an individual FITS Header Key
- Parameters:
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.par.ChangePar(*, delta_nkeys: int | None = None, addead_header_keys: List[str] | None = None, removed_header_keys: List[str] | None = None, delta_ntables: int | None = None, addead_tables: List[str] | None = None, removed_tables: List[str] | None = None, tables: Dict[str, ChangeTable] | None = None)[source]¶
Bases:
CoreModel
Pydantic model representing the Yanny par fields of the YAML changelog release section
Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.
- Parameters:
delta_nkeys (int) – The difference in number of Yanny header keys
added_header_keys (List[str]) – A list of any added Yanny header keywords
removed_header_keys (List[str]) – A list of any removed Yanny header keywords
delta_ntables (int) – The difference in the number of Yanny tables
added_tables (List[str]) – A list of any added Yanny tables
removed_tables (List[str]) – A list of any removed Yanny tables
tables (Dict[str, ChangeTable]) – A dictionary of table column and row changes
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- tables: Dict[str, ChangeTable] | None¶
- class datamodel.models.filetypes.par.ChangeTable(*, delta_nrows: int | None = None, added_cols: List[str] | None = None, removed_cols: List[str] | None = None)[source]¶
Bases:
CoreModel
Pydantic model representing a YAML changelog Yanny table section
Represents a computed section of the changelog, for a specific Yanny table. For each similar Yanny table between releases, the changes in row number and structure columns are computed.
- Parameters:
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.par.ParColumn(*, name: str, type: str, description: str, unit: str, is_array: bool, is_enum: bool, enum_values: list = None, example: str | int | float | list)[source]¶
Bases:
CoreModel
Pydantic model representing a YAML par column section
Represents a typedef column definition in a Yanny parameter file
- Parameters:
name (str) – The name of the column
description (str) – A description of the column
type (str) – The data type of the column
unit (str) – The unit of the column, if any
is_array (bool) – If the column is an array type
is_enum (bool) – If the column is an enum type
example (str) – An example value for the column
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.par.ParModel(*, comments: str = None, header: List[Header] = None, tables: Dict[str, ParTable])[source]¶
Bases:
CoreModel
Pydantic model representing a YAML par section
Represents a Yanny parameter file
- Parameters:
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.par.ParTable(*, name: str, description: str, n_rows: int, structure: List[ParColumn])[source]¶
Bases:
CoreModel
Pydantic model representing a YAML par table section
Represents the structure of a single Yanny parameter table
- Parameters:
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.hdf5.ChangeHdf(*, new_libver: tuple | None = None, delta_nattrs: int | None = None, addead_attrs: List[str] | None = None, removed_attrs: List[str] | None = None, delta_nmembers: int | None = None, addead_members: List[str] | None = None, removed_members: List[str] | None = None, members: Dict[str, ChangeMember] | None = None)[source]¶
Bases:
CoreModel
Pydantic model representing the HDF5 fields of the YAML changelog release section
Represents a computed section of the changelog, for the specified release. Changelog is computed between the data products of release (key) and the release indicated in from.
- Parameters:
new_libver (tuple) – The difference in HDF5 library version
delta_nattrs (int) – The difference in the number of HDF5 Attributes
added_attrs (List[str]) – A list of any added HDF5 Attributes
removed_attrs (List[str]) – A list of any removed HDF5 Attributes
delta_nmembers (int) – The difference in the number of members in the HDF5 file
added_members (List[str]) – A list of any added HDF5 groups or datasets
removed_members (List[str]) – A list of any removed HDF5 groups or datasets
members (Dict[str, ChangeMember]) – A dictionary of HDF5 group/dataset member changes
- members: Dict[str, ChangeMember] | None¶
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.hdf5.ChangeMember(*, delta_nmembers: int | None = None, delta_nattrs: int | None = None, added_attrs: List[str] | None = None, removed_attrs: List[str] | None = None, delta_ndim: int | None = None, new_shape: tuple | None = None, delta_size: int | None = None)[source]¶
Bases:
CoreModel
Pydantic model representing a YAML changelog HDF5 member section
Represents a computed section of the changelog, for a specific HDF member. For each similar HDF5 member between releases, the changes in member number, attributes, and dataset dimensions, size and shape are computed.
- Parameters:
delta_nmembers (int) – The difference in member number between HDF5 groups
delta_nattrs (int) – The difference in attribute number between HDF5 members
added_attrs (List[str]) – A list of any added HDF5 Attributes
removed_attrs (List[str]) – A list of any removed HDF5 Attributes
delta_ndim (int) – The difference in dataset dimension number between HDF5 members
new_shape (tuple) – The difference in dataset shape between HDF5 members
delta_size (int) – The difference in dataset size between HDF5 members
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.hdf5.HdfAttr(*, key: str, value: str | int | float | bool = None, comment: str, dtype: str, is_empty: bool = None, shape: tuple | None = <factory>)[source]¶
Bases:
CoreModel
Pydantic model representing a YAML hdfs attrs section
Represents the Attributes of an HDF5 file. Each group or dataset has a set of attributes (attrs), which contains metadata about the group or dataset.
- Parameters:
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.hdf5.HdfBase(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>)[source]¶
Bases:
CoreModel
Base Pydantic model representing a YAML hdfs section
Represents a member of an HDF5 file. Each group or dataset has a set of attributes (attrs), which contains metadata about the group or dataset.
- Parameters:
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.hdf5.HdfDataset(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>, shape: tuple, size: int, ndim: int, dtype: str, nbytes: int = None, is_virtual: bool = None, is_empty: bool = None)[source]¶
Bases:
HdfBase
Pydantic model representing a YAML HDF Dataset section
Represents a Dataset of an HDF5 file.
- Parameters:
shape (tuple) – The dimensional shape of the dataset
size (int) – The size, or number of elements, in the dataset
ndim (int) – The number of dimensions in the dataset
dtype (str) – The numpy dtype of the dataset
nbytes (int) – The number of bytes in the dataset
is_virtual (bool) – Whether the dataset is virtual
is_empty (bool) – Whether the dataset is an HDF5 Empty object
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.hdf5.HdfEnum(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Pydantic Enum for HDF5 Group or Dataset
- dataset = 'dataset'¶
- group = 'group'¶
- class datamodel.models.filetypes.hdf5.HdfGroup(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>, n_members: int)[source]¶
Bases:
HdfBase
Pydantic model representing a YAML HDF Group section
Represents a Group of an HDF5 file.
- Parameters:
n_members (int) – The number of members in the group
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class datamodel.models.filetypes.hdf5.HdfModel(*, name: str, parent: str, object: ~datamodel.models.filetypes.hdf5.HdfEnum, description: str, pytables: bool = None, attrs: ~typing.List[~datamodel.models.filetypes.hdf5.HdfAttr] = <factory>, n_members: int, libver: tuple = [], members: ~typing.Dict[str, ~datamodel.models.filetypes.hdf5.HdfGroup | ~datamodel.models.filetypes.hdf5.HdfDataset] = <factory>)[source]¶
Bases:
HdfGroup
Pydantic model representing a YAML hdfs section
Represents a base HDF5 file, which is also an HDF5 Group. See the HdfGroup, HdfDataset, and HdfBase models for more information on the fields.
- Parameters:
- members: Dict[str, HdfGroup | HdfDataset]¶
- model_config: ClassVar[ConfigDict] = {'json_schema_extra': <function add_repr>}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- datamodel.models.validators.check_release(value: dict) str [source]¶
Validator for datamodel release keys
Validator for yaml “releases” fields. Checks the “releases” keys against valid SDSS releases, from the Releases Model.
- Parameters:
value (dict) – the value of the field
- Returns:
str – the value of the field
- Raises:
ValueError – when the release key is not a valid release
- datamodel.models.validators.replace_me(value: str) str [source]¶
Validator for datamodel text fields
Validator for yaml fields where the string value has the text “replace me” within it. This text indicates template text that must be replaced.
- Parameters:
value (str) – the value of the field
- Returns:
str – the value of the field
- Raises:
ValueError – when “replace me” is in the value text
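For illustration only, the check amounts to something like the following; this is a sketch of the described behavior, not the package's actual implementation:
def replace_me(value: str) -> str:
    # reject template text that was never filled in
    if "replace me" in value:
        raise ValueError(f'field still contains template text: "{value}"')
    return value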
Products¶
- class datamodel.products.product.DataProducts[source]¶
Bases:
FuzzyList
Class of a fuzzy list of SDSS data products
Creates a list of all available SDSS data products that have valid JSON datamodel files, i.e. those in the datamodel/products/json/ directory. All products are lazy-loaded at first for efficiency. Products are automatically loaded with content when the items in the list are accessed.
- get_level(level: str) dict [source]¶
Get products by data level
Get all products for a given data level. The input data level can be any ranking, e.g. “1”, “1.2”, “1.2.3”, etc, and it will return all products that match that level.
- Parameters:
level (str) – the data level to retrieve
- Returns:
dict – the products for the requested data level
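A usage sketch; the level strings are illustrative:
>>> from datamodel.products import DataProducts
>>> dp = DataProducts()
>>> dp.get_level('1')      # all level-1 products
>>> dp.get_level('1.2')    # narrow to sub-level 1.2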
- group_by(field: str) dict [source]¶
Group the products by an attribute
Group all products by either a product attribute, e.g. “releases”, or a field in the underlying JSON model, e.g. “_model.general.environments”. A dotted attribute string is resolved as a set of nested attributes. Returns a dictionary of products grouped by the field, or fields, if the requested field is a list.
- Parameters:
field (str) – The name of the attribute or field
- Returns:
dict – A dictionary of products grouped by desired field
Example
>>> from datamodel.products import DataProducts
>>> dp = DataProducts()
>>> gg = dp.group_by('releases')
>>> gg
{"DR15": ..., "DR16": ...}
- class datamodel.products.product.Product(name: str, load: bool = False)[source]¶
Bases:
object
Class for an SDSS data product
Entry point for individual SDSS data products. This class reads in the content from the validated JSON datamodel file, handling deserialization via the pydantic ProductModel. By default, products are lazy-loaded, i.e. they will not load the underlying JSON content. Pass load=True or use load() to manually load the product’s datamodel.
- Parameters:
- classmethod from_file(value: str | Path, load: bool = None) PType [source]¶
Class method to load a data Product from a JSON datamodel filepath
- Parameters:
value (Union[str, pathlib.Path]) – The full path to a JSON datamodel file
load (bool, optional) – If True, loads the model content on instantiation, by default None
- Returns:
PType – A new instance of a Product
- get_access(release: str = None) dict [source]¶
Get the sdss-access information for a given release
Get the “access” entry from the datamodel for a given release. If no release is given, returns the access information for all releases for the product. The access information returned is also the same content as in the products/access/[fileSpecies].access file.
- Parameters:
release (str, optional) – The data release to use, by default None
- Returns:
dict – the access information from the datamodel
- Raises:
AttributeError – when “releases” is not set and product is not loaded
ValueError – when the specified release is not a valid one for the product
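A usage sketch; the file species name 'sdR' is hypothetical:
>>> from datamodel.products.product import Product
>>> p = Product('sdR', load=True)
>>> p.get_access('DR17')    # access info for a single release
>>> p.get_access()          # access info for all releases of the product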
- get_content(*args, **kwargs) dict [source]¶
Returns the entire cached JSON datamodel content
- Returns:
dict – The JSON datamodel content
- get_example(release: str = 'WORK', expand: bool = True) str [source]¶
Get the example file from the datamodel
Returns the resolved example filepath for a specified release. By default the SAS environment variable will be expanded, but can optionally return the path unresolved.
- Parameters:
- Returns:
str – The generated filepath
- Raises:
AttributeError – when “releases” is not set and product is not loaded
ValueError – when the specified release is not a valid one for the product
- get_location(release: str = 'WORK', symbolic: bool = False, expand: bool = True, **kwargs) str [source]¶
Get a file location from the datamodel
Returns a resolved filepath for a specified release. The symbolic location can be given keyword arguments to resolve it to a real filepath. By default the SAS environment variable will be expanded, but can optionally return the path unresolved.
- Parameters:
name (str) – The type of path to extract. Either “example” or “location”.
release (str, optional) – The data release to use, by default “WORK”
expand (bool, optional) – If True, expands the SAS environment variable, by default True
symbolic (bool, optional) – If True, returns only the symbolic path, by default False
kwargs (str) – Any set of keyword arguments needed to resolve the symbolic path
- Returns:
str – The generated filepath
- Raises:
AttributeError – when “releases” is not set and product is not loaded
ValueError – when the specified release is not a valid one for the product
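A usage sketch; the file species name and the path keyword (mjd) are hypothetical and depend on the product's path template:
>>> from datamodel.products.product import Product
>>> p = Product('sdR', load=True)
>>> p.get_location('DR17', mjd=59300)                # keywords resolve the symbolic path
>>> p.get_location('DR17', symbolic=True)            # return the unresolved symbolic path
>>> p.get_location('DR17', expand=False, mjd=59300)  # keep the SAS environment variable unexpanded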
- get_release(value: str) Release [source]¶
Get the JSON content for the given product for a given SDSS release
Returns the Pydantic yaml.Release model for a given SDSS release. All JSON keys are accessible as instance attributes. The model can be dumped into a dictionary with the model_dump() method.
- Parameters:
value (str) – a valid SDSS release
- Returns:
Release – The JSON ReleaseModel content for the given SDSS release
- Raises:
ValueError – when the input release is an invalid SDSS release
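A usage sketch, again with a hypothetical file species:
>>> from datamodel.products.product import Product
>>> p = Product('sdR', load=True)
>>> rel = p.get_release('DR17')   # the pydantic Release model for DR17
>>> rel.model_dump()              # dump the release content to a plain dictionary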
- class datamodel.products.product.ReleaseList(the_items: list | dict, use_fuzzy: Callable = None, dottable: bool = True)[source]¶
Bases:
FuzzyList
Class for a fuzzy list of Releases
- static mapper(item)[source]¶
Mapper between list/dict item and rapidfuzz choices
Static method used to map a list’s items or dict’s keys to a string representation used by rapidfuzz for fuzzy matching. By default returns an explicit string cast of the item. To see the output, view the choices property. Can be overridden to customize what is input into rapidfuzz.
- class datamodel.products.product.SDSSDataModel[source]¶
Bases:
object
Class for the SDSS DataModel
High-level entry point into the SDSS DataModel. Contains accounting of all relevant SDSS phases, surveys, data releases, and available data products.
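A usage sketch; the attribute names below are assumptions about the accounting described above and are not documented in this section:
>>> from datamodel.products.product import SDSSDataModel
>>> dm = SDSSDataModel()
>>> dm.products    # assumed attribute: the available data products
>>> dm.releases    # assumed attribute: the SDSS data releases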
- datamodel.products.product.grouper(field: str, products: list) dict [source]¶
Group the products by an attribute
Group all products by either a product attribute, e.g. “releases”, or a field in the underlying JSON model, e.g. “_model.general.environments”. A dotted attribute string is resolved as a set of nested attributes. Returns a dictionary of products grouped by the field, or fields, if the requested field is a list.
- datamodel.products.product.rgetattr(obj: object, attr: str, *args)[source]¶
recursive getattr for nested attributes
Recursively get attributes from nested classes. See https://stackoverflow.com/questions/31174295/getattr-and-setattr-on-nested-subobjects-chained-properties
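The linked recipe is roughly the following; a sketch, not necessarily the exact implementation used here:
import functools

def rgetattr(obj: object, attr: str, *args):
    # walk a dotted attribute string, e.g. rgetattr(product, '_model.general.environments')
    def _getattr(obj, name):
        return getattr(obj, name, *args)
    return functools.reduce(_getattr, attr.split('.'), obj)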
- datamodel.products.product.sort_function(x)[source]¶
Sort function for grouping products by field value.
If the item is a pydantic model, sort by the model’s name if it has one; otherwise sort by the tuple item.
- Parameters:
x (tuple) – A tuple containing a product and its corresponding field value.
- Returns:
str – The name of the field value
- datamodel.products.product.zipper(x: Product, field: str) list [source]¶
Creates a list of tuples of the Product and its corresponding field value(s).
This function retrieves the value of the specified field from the given product. It creates a list of tuples where each tuple contains the product and the field value. If the field value is a list, it returns a list of (product, item_element) tuples.
This creates an easily sortable list of tuples for grouping.
Validate¶
- datamodel.validate.add.add_and_commit(repo: Repo, file: str, message: str = None)[source]¶
Add and commit a file to the tree
Add and commit a file to the tree repo.
- datamodel.validate.add.clone_tree(branch: str = 'dm_update_tree', local: bool = None, path: str = None) Repo [source]¶
Clone the tree repo
Clone the tree repo, either from an existing local source or by cloning the remote repo into a temporary directory.
- datamodel.validate.add.get_new_products(release: str = None) tuple [source]¶
Get new datamodel products for the tree
Retrieves any valid JSON datamodels that do not yet have a corresponding tree entry, i.e. the in_sdss_access field is False.
- Parameters:
release (str, optional) – the SDSS release, by default None
- Yields:
tuple – The release and access string
- datamodel.validate.add.make_branch(repo: Repo, branch: str = 'dm_update_tree') Repo [source]¶
Make a new branch in the tree repo
Checkout or create a branch in the tree repo.
- Parameters:
repo (Repo) – the git repo
branch (str, optional) – the name of the branch, by default “dm_update_tree”
- Returns:
Repo – the git repo
- datamodel.validate.add.pull_and_push(repo: Repo)[source]¶
Pull and push the tree repo
Pull and push the current tree repo head to the remote.
- Parameters:
repo (Repo) – the git repo
- datamodel.validate.add.update_datamodel_access(branch: str = 'dm_update_models', test: bool = None, commit_to_git: bool = False)[source]¶
Updates the datamodel access info sections
Checks all “new” JSON datamodels for updated access info. Creates a new datamodel instance using the product file species, and updates YAML file and all stubs with the updated access info for the indicated release.
- datamodel.validate.add.update_tree(release: str = None, work_ver: str = None, branch: str = 'dm_update_tree', local: bool = None, test: bool = None, skip_push: bool = False)[source]¶
Update the tree repo with new paths
Updates the tree repo with new paths for datamodel products. Gets all new JSON datamodels that do not yet have tree paths, and adds them to the PATH ini section of the respective release config file. Clones the tree repo and makes all commits in a new branch, by default ‘dm_update_tree’. Commits and pushes the branch to the remote. Makes a backup of the tree config file before writing any new changes. On failure, the backup is restored.
Use the test flag to skip all write operations and just print the new paths. Use the skip_push flag to bypass the push to the remote.
- Parameters:
release (str, optional) – the SDSS release, by default None
work_ver (str, optional) – the tree config work version, by default None
branch (str, optional) – the tree repo branch name, by default ‘dm_update_tree’
local (bool, optional) – if set, uses an existing local repo, by default None
test (bool, optional) – if set, turns on testing, by default None
skip_push (bool, optional) – if set, skips the git push, by default False
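A usage sketch of a dry run followed by a local-only update; the values are illustrative:
>>> from datamodel.validate.add import update_tree
>>> update_tree(release='WORK', test=True)                    # dry run: only print the new paths
>>> update_tree(release='WORK', local=True, skip_push=True)   # use a local tree repo and skip the git push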
- datamodel.validate.add.write_comments(cfgfile: str, paths: list)[source]¶
Update a tree config file
Write a tree config file with new paths added into it. This preserves all comments from the tree ini config file. The list of paths to add is a list of tuples of path_name, path_template. Does not add them if they already exist in the config file.
- datamodel.validate.add.write_no_comments(cfgfile: str, paths: list)[source]¶
Update a tree config file
Write a tree config file with new paths added into it. This removes all comments from the tree ini config file. The list of paths to add is a list of tuples of path_name, path_template. Does not add them if they already exist in the config file.
- datamodel.validate.check.check_invalid(product: str, data: dict, release: str, verbose: bool = None) tuple | None [source]¶
Check for an invalid product access path
For a given release, checks the datamodel product access info against the relevant Tree configuration path info for consistency. If the release is “WORK”, checks both the “sdss5” and “sdsswork” configs. If both configs return an invalidation, then the product path is invalid.
- Parameters:
- Returns:
Union[tuple, None] – Either None for a valid path or a tuple of the invalid path info
- datamodel.validate.check.check_path(product: str, data: dict, tree: Tree, verbose: bool = None) tuple | None [source]¶
Checks a product access path
Checks the product access path name is in the list of tree paths. Checks the product access path template is the same as the tree path template. Checks the product access access_string is consistent with the tree path template. For tree paths that start with a special function rather than an environment variable, e.g. @spectrodir|, only checks the common part of the path.
- Parameters:
- Returns:
Union[tuple, None] – None for a valid path or a tuple of the invalid path info
- datamodel.validate.check.check_products(release: str = None, verbose: bool = None) None [source]¶
Validate the data product path information
Checks the datamodel product access path information against the tree path information for consistency. Checks path name, template, and access_string.
- Parameters:
- Raises:
ValueError – when any of the product paths are invalid against tree
- datamodel.validate.check.compare_path(a: str, b: str) bool [source]¶
Compares two paths
Compares two paths for equality. Tries to account for comparison between a path with and without compression suffixes, i.e. “fits” and “fits.gz”. Strips the last suffix from paths with more than one.
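For illustration, a comparison tolerant of compression suffixes could look like the following; a sketch of the described behavior, not the package's actual code:
from pathlib import PurePath

def compare_path(a: str, b: str) -> bool:
    # strip the last suffix from paths with more than one, e.g. '.fits.gz' -> '.fits'
    def norm(path: str) -> str:
        p = PurePath(path)
        return str(p.with_suffix('')) if len(p.suffixes) > 1 else str(p)
    # e.g. compare_path('spec.fits.gz', 'spec.fits') -> True
    return norm(a) == norm(b)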
- datamodel.validate.check.get_products(release: str = None) dict [source]¶
Get the access info for all SDSS data products
Get the access datamodel info for all SDSS data products of a given release. If no release specified, returns the info for all products for all releases.
- Parameters:
release (str, optional) – The SDSS data release, by default None
- Returns:
dict – The datamodel path access information
- datamodel.validate.check.yield_products(release: str = None) tuple [source]¶
Generator to yield the access info for all SDSS data products
Yield the access datamodel info for all SDSS data products of a given release. If no release specified, returns the info for all products for all releases.
- Parameters:
release (str, optional) – The SDSS data release, by default None
- Returns:
tuple – The product name and its datamodel access dictionary
- Yields:
Iterator[tuple] – The product name and its datamodel access dictionary
- datamodel.validate.models.revalidate(species: str, release: str = None, verbose: bool = None)[source]¶
Rewrite JSON datamodels
Rewrites all the datamodel stubs for a given existing file species and release.
- datamodel.validate.models.validate_models()[source]¶
Check YAML datamodel validation
Checks all YAML datamodels for corresponding validated JSON models.
- Raises:
ValueError – when YAML models without corresponding validated JSON models are found