Generating Datamodels¶

Here we describe the process of generating new datamodels for existing SDSS files.

Supported Filetypes¶

Currently the datamodel product supports generating datamodels for the following filetypes:

FITS: a common astronomy data format
“Yanny” parameter files: human- and machine-readable ASCII parameter (.par) files
HDF5: Hierarchical Data Format (.h5) files

The basic procedure for generating datamodels is the same, regardless of filetype. All of the following code examples below are for generating datamodels for FITS files. The same procedure can be used for generating datamodels for other supported file types. See the Yanny Parameter files section for explicit differences on Yanny files. See the HDF5 Files section for explicit differences on HDF5 files. See Examples for code and output YAML datamodels for supported filetypes.

Generating a datamodel¶

Datamodels are documented metadata representations of data products, e.g. FITS files. Since pipelines can produce many different files of the same data product, e.g. with different input parameters, for different target names or different pipeline analysis settings, we generate a single datamodel file for a set, or “species”, of data products. This “file_species” is a representative name for all files of a given data product, e.g. all MaNGA IFU data cubes get a single mangaCube file species name. Associated with a file species is a symbolic path that is a representative file path for all files of that data product. For example, the symbolic path to MaNGA IFU data cubes is $MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}CUBE.fits.gz, where the items in brackets, e.g. {plate} are variables to be substituted and represent parameters that can change for individual files or files across different releases. An example of a resolved path would be $MANGA_SPECTRO_REDUX/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz. The file species and symbolic path are similar to the syntax of entries in the sdss_access and tree products. See the Tree Path Template Syntax for more information.

Required Inputs:

file_species: A short name of the “species” of the file, similar to an sdss_access entry name
path: An abstract file path, starting with a root environment variable, and using Jinja2 template variable syntax, similar to an sdss_access entry path
keywords: A list of keyword-value pairs, of an example file, matching the path template variable names

Release Inputs:

tree_ver: the SDSS tree configuration name the data product is associated with. Useful when working with modules or “work” releases.
release: the data release of the data product

All the following examples walk through the creation of a datamodel for the MaNGA Row-Stacked Spectra (RSS) data product. Let’s create the datamodel for the MaNGA RSS file for Data Release 15 (DR15).

CLI

Generate a datamodel using the command-line:

$ datamodel generate -f mangaRss \
-p MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}RSS.fits.gz \
-k plate=8485 -k ifu=1901 -k drpver=v2_4_3 -k wave=LOG -r DR15

Python

or from within Python

from datamodel.generate import DataModel

# define the inputs
file_species = "mangaRss"
path = "MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}RSS.fits.gz"
keys = ['plate=8485', 'ifu=1901', 'drpver=v2_4_3', 'wave=LOG']

# generate a datamodel for Data Release 15 (DR15)
dm = DataModel(file_spec=file_species, path=path, keywords=keys, release='DR15')

# write out the stub files
dm.write_stubs()

As inputs, we pass in the name of the file species, the symbolic path to the file, the list of example keyword-value pairs, and the release we’re interested in. See the datamodel generate cli for a full list of command-line arguments.

Note

In 0.2.1, the cli code and syntax changed from argparse to click. See Cli Differences for more.

After we run the command, a stub YAML datamodel file will be created. The code will also attempt to write a valid markdown file, a JSON file, and access file. These files are automatically produced and do not require any user modification. During the initial YAML file creation, it will produce an unvalidated YAML file. The additional files only get written out if, and when, the YAML file is validated. See The YAML structure and Validating datamodels below for the next steps.

A Note on Releases¶

Note

As of datamodel >= 1.0, the datamodel uses the new tree, which removes all MaNGA MPL configs, and replaces the “sdss5” config as the new “sdsswork”. “sdsswork” config now refers to all work paths for SDSS-V. For older versions of the datamodel code, SDSS-V paths must still use the “sdss5” config.

The release keyword argument is used to specify which internal (IPL) or public data release (DR) to use for the product datamodel generation. The allowed values are any release specified in the Releases Metadata Models. For a complete list of current releases, see the Release Models.

For products in development, there is an available WORK release. A WORK release should be used for any products that have not yet been released in a DR or IPL, i.e. any products defined in the sdsswork tree configurations. A datamodel for a WORK release product represents the latest version of that data product and should be considered an in-flux datamodel. As the product itself changes, a new WORK datamodel should be regenerated to reflect the updated changes.

By default, a WORK release will use the SDSS-V tree configuration, sdsswork.cfg. Using the legacy sdss5.cfg syntax will default back to sdsswork.cfg. Legacy work paths to SDSS-IV products no longer exist, but need datamodel < 1.0 and tree < 4.0 to build them.

The YAML structure¶

The YAML file is the main entry point for adding custom content, and is the only file you will need to modify. The structure of the YAML is broken up into the following sections:

general - section containing general information and metadata on the data product
changelog - automatically populated section containing any FITS file changes between data releases
releases - section of information specific for a release
- access - a section containing information on any existing sdss_access entry
- hdus - a section for each HDU in the FITS file (only for FITS files)
- par - a section containing the header and table content in the par file (only for Yanny files)
- hdfs - a section containing the HDF5 file content and member info (only for HDF5 files)
notes - a section containing any additional information or caveats on the data product, as multi-line text
regrets - a section containing any regrets on the data product, as multi-line text

Most of the YAML content is automatically generated. Values containing the text replace me are areas to be replaced with user custom content, e.g. descriptions of the data product, individual descriptions of HDU content, column units, etc. A truncated example of the newly created unvalidated datamodel/products/yaml/mangaRSS.yaml file is below:

FITS Yaml

Example yaml datamodel for the MaNGA RSS FITS file, shortened for brevity

general:
  name: mangaRss
  short: replace me - with a short one sentence summary of file
  description: replace me - with a longer description of the data product
  datatype: FITS
  filesize: 14 MB
  releases:
    - DR15
  environments:
    - MANGA_SPECTRO_REDUX
  naming_convention: replace me - with $MANGA_SPECTRO_REDUX/[DRPVER]/[PLATE]/stack/manga-[PLATE]-[IFU]-[WAVE]RSS.fits.gz
    or manga-8485-1901-LOGRSS.fits.gz but with regex pattern matches
  generated_by: replace me - with the name(s) of any git or svn product(s) that produces
    this product.
changelog:
  description: Describes changes to the datamodel product and/or file structure from
    one release to another
  releases: {}
releases:
  DR15:
    template: $MANGA_SPECTRO_REDUX/[DRPVER]/[PLATE]/stack/manga-[PLATE]-[IFU]-[WAVE]RSS.fits.gz
    example: v2_4_3/8485/stack/manga-8485-1901-LOGRSS.fits.gz
    location: '{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}RSS.fits.gz'
    environment: MANGA_SPECTRO_REDUX
    access:
      in_sdss_access: true
      path_name: mangarss
      path_template: $MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}RSS.fits.gz
      path_kwargs:
        - plate
        - drpver
        - wave
        - ifu
      access_string: mangaRss = $MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}RSS.fits.gz
    hdus:
      hdu0:
        name: PRIMARY
        description: replace me description
        is_image: true
        size: 0 bytes
        header:
          - key: SIMPLE
            value: true
            comment: ''
      hdu1:
        ...

Validating datamodels¶

When we first create a datamodel, we will get an unvalidated YAML file. In the above example, we get a new YAML file at datamodel/products/yaml/mangaRss.yaml. During the creation, you may see some log output in the terminal of something like the following:

[INFO]: Preparing datamodel: <DataModel(file_species='mangaRss', release='WORK')>.
[INFO]: Creating stub: <Stub(format="yaml", file_species="mangaRss", release="WORK")>
[INFO]: Creating stub: <Stub(format="access", file_species="mangaRss", release="WORK")>
[ERROR]: 148 validation errors for YamlModel
general -> short
  Generic text needs to be replaced with specific content! (type=value_error)
general -> description
  Generic text needs to be replaced with specific content! (type=value_error)
general -> naming_convention
  Generic text needs to be replaced with specific content! (type=value_error)
...
[INFO]: yaml cache is not validated!
[INFO]: No cache content to write out!

This indicates there are validation errors in the YAML file, and the remaining stubs cannot be produced. At this stage, we need to resolve all validation errors, e.g. supplying required information, or replacing all generic text with custom user content. Once a YAML file is validated, we re-run the same datamodel_generate command from above to produce the remaining files in datamodel/products/:

md/mangaRss.md: the markdown file for human-readable representation on the DSI
json/mangaRss.json: a machine-readable JSON file for the datamodel python package
access/mangaRss.access: a subset YAML file containing access information

When writing out the stubs, a successfully valid YAML will produce the following verbose output:

[INFO]: Preparing datamodel: <DataModel(file_species='mangaRss', release='DR15')>.
[INFO]: Creating stub: <Stub(format="yaml", file_species="mangaRss", release="DR15")>
[INFO]: Creating stub: <Stub(format="access", file_species="mangaRss", release="DR15")>
[INFO]: Creating stub: <Stub(format="md", file_species="mangaRss", release="DR15")>
[INFO]: Creating stub: <Stub(format="json", file_species="mangaRss", release="DR15")>

Recommended Science Product¶

Datamodels include a boolean flag recommended_science_product that indicates if the data product is recommended for science. This flag is used to differentiate “final” science-ready pipeline products, such as catalog summary files or final calibrated spectral products, from products such as raw input products or intermediate products produced by a pipeline. By default, all VACs are considered recommended for science and have this flag set to True. Otherwise by default this flag is set to False.

You can manually set this flag either in the YAML file after it’s created, or when generating the initial YAML file, e.g.

CLI

From the command-line, specify the -s or --science_product flag

$ datamodel generate -f mangaRss \
-p MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}RSS.fits.gz \
-k plate=8485 -k ifu=1901 -k drpver=v3_1_1 -k wave=LOG -s

Python

From within Python, set the science_product keyword to True

dm = DataModel(file_spec=file_species, path=path, keywords=keys, science_product=True)
dm.write_stubs()

Generating a datamodel by file¶

You can also generate a datamodel using only a filename. In this mode, you will be given a series of prompts asking you to either define the file_species, path, and keywords, or to look up an existing sdss_access entry.

To generate a datamodel by file, for DR15

CLI

$ datamodel generate -r DR15 \
-n /Users/Brian/Work/sdss/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGRSS.fits.gz

Python

from datamodel.generate import DataModel

ff='/Users/Brian/Work/sdss/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGRSS.fits.gz'
dm = DataModel.from_file(ff, tree_ver='dr15')

The datamodel code will first prompt you if an existing sdss_access definition exists:

Does this file have an existing sdss_access definition? (y/n):

Answering y will prompt you to look up the sdss_access name, and will attempt to extract the relevant keyword-value pairs. If it cannnot do so, it will prompt you to define them.

Does this file have an existing sdss_access definition? (y/n): y
What is the sdss_access path_name?: mangarss
Could not extract a value mapping for keys: ['drpver', 'wave', 'ifu', 'plate']
Please define a list of name=value key mappings for variable substitution.
e.g. drpver=v2_4_3, plate=8485, ifu=1901, wave=LOG
:drpver=v2_4_3, plate=8485, ifu=1901, wave=LOG

If the file does not have an existing sdss_access entry, i.e. answering n, it will prompt you to define new inputs for the file species, symbolic path, and example keywords:

Does this file have an existing sdss_access definition? (y/n): n
Define a new path_name / file_species, e.g. mangaRss: mangaRss
Define a new path template, starting with an environment variable label.
Use jinja {} templating to define variable name used for substitution.
e.g. "MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}RSS.fits.gz"
: MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}RSS.fits.gz
Define a list of name=value key mappings for variable substitution.
e.g. drpver=v2_4_3, plate=8485, ifu=1901, wave=LOG
: drpver=v2_4_3, plate=8485, ifu=1901, wave=LOG

Either way, at the end it will ask you to confirm your definitions:

Confirm the following: (y/n):
 file = /Users/Brian/Work/sdss/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGRSS.fits.gz
 path_name = mangarss
 path_template = MANGA_SPECTRO_REDUX/{drpver}/{plate}/stack/manga-{plate}-{ifu}-{wave}RSS.fits.gz
 path_keys = ['drpver=v2_4_3', 'plate=8485', 'ifu=1901', 'wave=LOG']

Adding the datamodel to the DSI¶

Once a valid datamodel markdown is created, it will be automatically added to the SDSS Data Specification Index (DSI) for display. The DSI is a web application accessible at https://data.sdss5.org/dsi using the standard SDSS passwords. You do not need to do anything extra to have your datamodel appear on the DSI, only ensure that a valid JSON representation has been created.

Changing the Markdown Release Example¶

The generated markdown file only displays example HDU content for a single release, by default the “WORK” release. To change which release is used in the example, you can specify a new “release group”, e.g. "DR" for public data releases, or "IPL" for internal product launches. The code will use the most recent release it can find within that “release group”. For example, to use the latest IPL release in the markdown file, when generating a new datamodel for IPL-2, specify the -m IPL keyword argument:

CLI

From the command-line, specify the -m or --md-group flag to “IPL”.

$ datamodel generate -f astraAllStarAPOGEENet \
-p MWM_ASTRA/{astra_version}/{run2d}-{apred}/summary/allStar-APOGEENet-{astra_version}-{run2d}-{apred}.fits \
-k astra_version=0.3.0 -k run2d=v6_0_9 -k apred=1.0 -r IPL2 -m IPL

Yanny Parameter files¶

While most of the datamodel workflow is the same for par files as for FITS, there are a few differences, which we describe here.

The PRODUCT_ROOT environment variable¶

Many Yanny parameter files are defined inside SVN or GIT repository software products, which can be checked out by the user or installed via the sdss-install product. For example the SDSS platePlans.par lives inside the platelist repo, whose path is defined as $PLATELIST_DIR/platePlans.par, using the PLATELIST_DIR environment variable.

Since the PLATELIST_DIR environment variable can point to any custom user or SAS location, or to a location installed by sdss-install, and can also vary during data releases when software is tagged, a flexible definition is needed. This flexibility is controlled by a PRODUCT_ROOT environment variable. You can find more info on PRODUCT_ROOT in the SVN/Git Data Files section of the tree documentation.

By default the datamodel product will use any existing custom environment variable definition found in your local os.environ. However, if one cannot be found, it falls back on any definition found in the tree product. This may invoke the PRODUCT_ROOT envvar. For example, in the tree product, the PLATELIST_DIR env path for sdsswork is defined as $PRODUCT_ROOT/data/sdss/platelist/trunk, as a general location where to find platelist files.

The datamodel product will attempt to find a valid PRODUCT_ROOT environment variable definition in your system, in the following order of precedence of variable names:

PRODUCT_ROOT
SDSS_GIT_ROOT or SDSS_SVN_ROOT
SDSS_INSTALL_PRODUCT_ROOT
SDSS4_PRODUCT_ROOT
the parent diretory of SAS_BASE_DIR

Note

When running the datamodel product at Utah, most software products are already installed. Their environment variables along with the underlying PRODUCT_ROOT environment variable, will already be defined. The user does not have to do anything extra to enable this functionality.

Example Par YAML¶

The YAML datamodel for a par file is mostly the same as for FITS files, but with a par section instead of an hdus section. Let’s generate an example datamodel stub for the SDSS platePlans yanny file, located in the top-level directory of the platelist product. The code to generate the datamodel stub is:

dm = DataModel(file_spec='platePlans', path='PLATELIST_DIR/platePlans.par', keywords=[], release="WORK")
dm.write_stubs()

The output datamodel file, products/yaml/platePlans.yaml has the following contents:

PAR Yaml

Example yaml datamodel for the SDSS plate plans par file, shortened for brevity

general:
  name: platePlans
  short: replace me - with a short one sentence summary of file
  description: replace me - with a longer description of the data product
  datatype: PAR
  filesize: 1 MB
  releases:
  - WORK
  environments:
  - PLATELIST_DIR
  naming_convention: replace me - with $PLATELIST_DIR/platePlans.par or platePlans.par
    but with regex pattern matches
  generated_by: replace me - with the name(s) of any git or svn product(s) that produces
    this product.
  design: false
changelog:
  description: Describes changes to the datamodel product and/or file structure from
    one release to another
  releases: {}
releases:
  WORK:
    template: $PLATELIST_DIR/platePlans.par
    example: platePlans.par
    location: platePlans.par
    environment: PLATELIST_DIR
    access:
      in_sdss_access: true
      path_name: platePlans
      path_template: $PLATELIST_DIR/platePlans.par
      path_kwargs: []
      access_string: platePlans = $PLATELIST_DIR/platePlans.par
    par:
      comments: |-
        # platePlans.par
        #
        # Global plate planning file for SDSS-III
        #
        # Every plate number (plateid) has one and only one entry here.
        #
        # Numbering of plates starts after last plates of SDSS-II, which
        # were the MARVELS June 2008 pre-selection plates (3000-3014).
        # Note that SDSS-II also used plate numbers 8000-8033, which should
        # therefore be avoided
        #
        # Meaning of columns:
        #  plateid - unique ID of plate
        #  designid - ID of "design"; two plates can have the same design
        #             but be drilled for different HA, TEMP, EPOCH
        # ...
        # ...
        #
      header: []
      tables:
        PLATEPLANS:
          name: PLATEPLANS
          description: replace me - with a description of this table
          n_rows: 7551
          structure:
          - name: plateid
            type: int
            description: replace me - with a description of this column
            unit: replace me - with a unit of this column
            is_array: false
            is_enum: false
            example: 186
          - name: designid
            type: int
            description: replace me - with a description of this column
            unit: replace me - with a unit of this column
            is_array: false
            is_enum: false
            example: -1
          ...

Yaml “Par” Section¶

The par section of the YAML file has the following content:

comments: a string block of any comments found at the top of the Yanny par file, up to the “typedef” struct definition.
header: a list of any header keywords found in the Yanny par file
tables: a dictionary of tables defined in the Yanny par file

Each table entry has a table name (name), a description of the table (description), the number of rows in the table (n_rows), and a list of column definitions (structure). The column definitions are constructed from the Yanny typedef structure definition found in the file for the given table.

The type column parameter is pulled directly from the typedef column definition, eg. int plateid. For column defintions with a size element, they are stored on the type itself. For example char survey[20] is stored as type char[20]. The array Yanny column definition float ha[6]; would be converted to the yaml entry:

- name: ha
  type: float[6]
  description: replace me - with a description of this column
  unit: replace me - with a unit of this column
  is_array: true
  is_enum: false
  example:
  - -45.0
  - 0.0
  - 0.0
  - 0.0
  - 0.0
  - 0.0

For Yanny columns with an “enumerated” definition, the type will be set to the name of the enum typedef structure, and have is_enum set to True. The enumerated values will be listed in the enum_values yaml parameter. For example, the SDSS-V sdsscore configuration summary file, confSummary-XXXX.par has a fiberType column with an ENUM definition of

typedef enum {
    BOSS,
    APOGEE,
    METROLOGY,
    NONE
} FIBERTYPE;

The corresponding YAML entry would be:

- name: fiberType
  type: FIBERTYPE
  description: replace me - with a description of this column
  unit: replace me - with a unit of this column
  is_array: false
  is_enum: true
  enum_values:
  - BOSS
  - APOGEE
  - METROLOGY
  - NONE
  example: APOGEE

HDF5 Files¶

While most of the datamodel workflow is the same for HDF5 files as for FITS, there are a few differences, which we describe here. HDF5 files are in a hierarchical data format, with many nested levels of groups of information and/or data. Each group or dataset can also have a list of metadata attributes associated with each level.

For ease of representation in the YAML datamodel, we flatten the entire hierachy of the HDF5 file into a single members list. Parent-child relationships, and the numbers of members in each group are maintained.

Example HDF YAML¶

The YAML datamodel for a HDF5 file is mostly the same as for FITS files, but with a hdfs section instead of an hdus section. Let’s generate an example datamodel stub for a file that lives in the APOGEE_SANDBOX, and contains deblending information for a crowded stellar field. The code to generate the datamodel stub is:

dm = DataModel(file_spec='apogeeDeblend', path='APOGEE_SANDBOX/deblend/{ver}/deblend_{chunk}.h5', keywords=["ver=v0", "chunk=2422101"], release="WORK")
dm.write_stubs()

The output datamodel file, deblend/v0/deblend_2422101.h5 has the following contents:

HDF Yaml

Example yaml datamodel for an HDF5 file, shortened for brevity

general:
  name: apogeeDeblend
  short: replace me - with a short one sentence summary of file
  description: replace me - with a longer description of the data product
  datatype: H5
  filesize: 1 MB
  releases:
  - WORK
  ...
releases:
  WORK:
    ...
    hdfs:
      name: /
      parent: /
      object: group
      description: replace me - with a description of this group
      libver: !!python/tuple
      - earliest
      - v112
      n_members: 7
      pytables: false
      attrs: []
      members:
        chi2:
          name: chi2
          parent: /
          object: dataset
          description: replace me - with a description of this dataset
          attrs: []
          shape: !!python/tuple
          - 100
          - 4
          - 81
          size: 32400
          ndim: 3
          dtype: float64
          nbytes: 259200
          is_virtual: false
          is_empty: false
        chi2f:
          name: chi2f
          parent: /
          object: dataset
          description: replace me - with a description of this dataset
          attrs: []
          shape: !!python/tuple
          - 100
          - 3
          - 10
          size: 3000
          ndim: 3
          dtype: float64
          nbytes: 24000
          is_virtual: false
          is_empty: false
        outlst:
          name: outlst
          parent: /
          object: dataset
          description: replace me - with a description of this dataset
          attrs: []
          shape: !!python/tuple
          - 100
          - 39
          size: 3900
          ndim: 2
          dtype: float64
          nbytes: 31200
          is_virtual: false
          is_empty: false

Yaml “Hdf” Section¶

The hdfs section of the YAML file has the following content:

name: the root group name of the HDF5 file
parent: the name of the parent of the current level
object: the type of HDF5 object, either a “group” or “dataset”
description: a description of the group
libver: the HDF5 library version, as a python tuple
n_members: the number of members in the group
pytables: a boolean flag whether this file was written using PyTables
attrs: a list of metadata attributes at the current level
members: a dictionary of members contained in the HDF5 file, of name: {...} pairs

Each member of the members dictionary can either be a group or a dataset, each a dictionary of its own key-value pairs. The group dictionary has many of the same keys as the top-level section. An example of a group member is:

data:
  name: data
  parent: /
  object: group
  description: replace me - with a description of this group
  n_members: 12
  attrs: []

A dataset member is the equivalent of a numpy array dataset. In addition to the similar keys as the top-level section, a dataset has the following additional keys:

shape: the shape of the array dataset
size: the size of the array dataset, i.e. number of elements
ndim: the number of dimensions of the array dataset
dtype: the string repr numpy dtype of the array dataset, e.g. int32
nbytes: the memory size in bytes of the array dataset
is_virtual: flag whether the dataset is a virtual one
is_empty: flag whether the dataset is an empty one

An example of a dataset member is:

outlst:
  name: outlst
  parent: /
  object: dataset
  description: replace me - with a description of this dataset
  attrs: []
  shape: !!python/tuple
  - 100
  - 39
  size: 3900
  ndim: 2
  dtype: float64
  nbytes: 31200
  is_virtual: false
  is_empty: false

Each group or dataset can also have a list of metadata attributes, attrs, associated with it. These are stored similarly to FITS header keyword values.

An example attribute:

attrs:
- key: name
  value: b'N.'
  comment: replace me - with a description of this attribute
  dtype: '|S2'
  is_empty: false
  shape: !!python/tuple []

Nested Membership¶

As the hiearchical nature of HDF5 files is flattened in the datamodel, each member contains a fully resolved name, its immediate parent, and the number of members in its subgroup, where relevant. Here is an example of a group at the top level, which contains a sub-group and a dataset. The sub-group also contains a dataset.

foo:
  name: foo
  parent: /
  object: group
  description: the new foo group
  n_members: 2
  attrs:
  - key: AFOO
    value: b'ANEW'
    comment: a foo attr
    dtype: '|S4'
    is_empty: false
    shape: !!python/tuple []
foo/foodat:
  name: foo/foodat
  parent: /foo
  object: dataset
  description: foo dat dataset
  attrs: []
  shape: !!python/tuple
  - 100
  size: 100
  ndim: 1
  dtype: float32
  nbytes: 400
  is_virtual: false
  is_empty: false
foo/stuff:
  name: foo/stuff
  parent: /foo
  object: group
  description: foo has stuff too
  n_members: 1
  attrs: []
foo/stuff/newints:
  name: foo/stuff/newints
  parent: /foo/stuff
  object: dataset
  description: new ints for the new stuff
  attrs: []
  shape: !!python/tuple
  - 100
  size: 100
  ndim: 1
  dtype: int64
  nbytes: 800
  is_virtual: false
  is_empty: false

Generating Datamodels¶

Supported Filetypes¶

Generating a datamodel¶

A Note on Releases¶

The YAML structure¶

Validating datamodels¶

Adding new releases¶

Adding a public release with complete cache¶

Adding a new internal release with partial cache¶

Adding a WORK release¶

Forcing a cache refresh¶

Recommended Science Product¶

Generating a datamodel by file¶

Adding the datamodel to the DSI¶

Changing the Markdown Release Example¶

Yanny Parameter files¶

The PRODUCT_ROOT environment variable¶

Example Par YAML¶

Yaml “Par” Section¶

HDF5 Files¶

Example HDF YAML¶

Yaml “Hdf” Section¶

Nested Membership¶