Adding New Files

To add support for new filetypes to the SDSS DataModel product, here are the steps to follow.

  1. Update the DataModel

  2. Create a New File Generator

  3. Describe the YAML representation

  4. Create a Markdown Stub

  5. Create a Changelog Generator

  6. Create Changelog Models

  7. Add DataModel Design Code

  8. Add New Tests

  9. Update the Documentation

  10. Submit a Pull Request

Update the DataModel

Add the file suffix or extension to the list of supported filetypes on the DataModel class in datamodel/generate/datamodel.py.

class DataModel:
    supported_filetypes = ['.fits', '.par', '.h5']

Create a New File Generator

Create a new python file under datamodel/generate/filetypes/, with the name of your new file suffix. In this file, create a new class subclassed from BaseFile. For example,

from .base import BaseFile

class FitsFile(BaseFile):
    """ Class for supporting FITS files """
    suffix = 'FITS'
    cache_key = 'hdus'

This is the class that will be used to generate the new file. It requires several class attributes and methods to be defined. The following class attributes are required:

  • suffix: An uppercase name of the file extension or suffix

  • cache_key: The key name used to store the data values in the YAML cache

Each new subclass must minimally override and define the following class methods:

  • _generate_new_cache: creates and returns the initial dictionary data that populates the cache_key YAML key

  • _update_partial_cache: updates the cache of new releases with any cache content from old releases, e.g. field descriptions

  • design_content: when designing datamodels, updates a YAML cache with content via Python

  • _get_designed_object: when designing datamodels, creates a new object representation of the file from the YAML cache

  • write_design: when designing datamodels, generates a new designed template file on disk

These methods are customized for each file type. The first two methods pertain to generating the initial cache content and to partially updating a new cache with older cache data. The last three are specifically for designing new datamodels when the files do not yet exist on disk. If this functionality is not desired, simply create empty placeholder methods.

See the existing FitsFile, ParFile, and HdfFile example classes for implementation details.
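Putting these pieces together, a minimal skeleton for a hypothetical new CSV filetype might look like the following sketch. The BaseFile stub here stands in for the real base class in datamodel/generate/filetypes/ so the sketch is self-contained; the actual method signatures may differ, so check the existing subclasses for the exact interface.

```python
class BaseFile:
    """ Stand-in for datamodel's real BaseFile, so this sketch runs standalone """
    suffix = None
    cache_key = None


class CsvFile(BaseFile):
    """ Hypothetical class for supporting CSV files """
    suffix = 'CSV'
    cache_key = 'csv'

    def _generate_new_cache(self) -> dict:
        # build the initial dictionary data that populates the cache_key YAML key
        return {'description': 'replace me', 'columns': {}}

    def _update_partial_cache(self, old_cache: dict) -> None:
        # carry content, e.g. field descriptions, from an older release's cache
        pass

    # the three design methods are only needed for datamodel design support;
    # if that functionality is not desired, leave them as empty placeholders
    def design_content(self, **kwargs):
        pass

    @staticmethod
    def _get_designed_object(data: dict):
        pass

    def write_design(self, file: str, overwrite: bool = True) -> None:
        pass
```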

Describe the YAML representation

To convert YAML files to validated JSON files, we need to describe the YAML structure in Python, along with the rules necessary to validate the YAML content and ensure it is of the proper format. We use Pydantic to construct model class representations of the YAML content and to handle type validation.

Create a new python file under datamodel/models/filetypes/, with the name of your new file suffix. In this file, create a new class subclassed from Pydantic’s BaseModel. For example,

from typing import List, Dict
from pydantic import BaseModel

# Header and Column are other Pydantic models defined for this filetype
class HDU(BaseModel):
    """ Pydantic model representing a YAML hdu section """

    name: str
    is_image: bool
    description: str
    size: str
    header: List[Header] = None
    columns: Dict[str, Column] = None

Every field or object in YAML must be represented in the model, either as an attribute on a given Pydantic model, or as a Model itself. Models can be nested and chained together to create more complex structures. See, for example, the Header, Column, and HDU models that describe the YAML content of an individual FITS HDU. Also see the existing HDU, ParModel, and HdfModel example classes for complete implementation details.
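To illustrate the nesting, hypothetical Header and Column models chained into the HDU model might look like the sketch below. The field names on Header and Column here are assumptions for illustration; consult the real models for the exact schema.

```python
from typing import List, Dict, Optional
from pydantic import BaseModel


class Header(BaseModel):
    """ Sketch of a model for a single header keyword (fields assumed) """
    key: str
    value: Optional[str] = None
    comment: Optional[str] = None


class Column(BaseModel):
    """ Sketch of a model for a single table column (fields assumed) """
    name: str
    type: str
    description: Optional[str] = None
    unit: Optional[str] = None


class HDU(BaseModel):
    """ Nested models build up the full description of an HDU """
    name: str
    is_image: bool
    description: str
    size: str
    header: Optional[List[Header]] = None
    columns: Optional[Dict[str, Column]] = None
```

Pydantic validates nested content automatically, so a plain dictionary parsed from YAML is coerced into the chained model instances.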

Once the relevant models are created, we must add them to our global YamlModel, in datamodel/models/yaml.py. In the Release model, add a new attribute with the name of the cache_key, e.g. hdus. Set the default to None so the field is optional. For example,

class Release(BaseModel):
    """ Pydantic model representing an item in the YAML releases section """
    hdus: Dict[str, HDU] = None
    par: ParModel = None
    hdfs: HdfModel = None

Create a Markdown Stub

Create a new markdown file under datamodel/templates/md/, with the name of your new file suffix. This file uses Jinja template syntax. The new file must extend the md/base.md file, i.e. {% extends "md/base.md" %}. It must also contain the following two Jinja blocks.

  • content: A list of structures in the file, displayed in the example, e.g. FITS HDUs or Yanny tables

  • example: Any content from the YAML cache_key to display

For example,

{% block content %}
{% for table in data['tables'] %}
- [{{table}}](#{{table}})
{% endfor %}
{% endblock content %}

{# Example par rendering for the example block #}
{% block example %}
....
{% endblock example %}

The YAML cache content specified in the cache_key field is available to your new markdown file as a dictionary named data. By default, the data values used to populate the example come from the “WORK” release or, if that is not available, the latest DR release in the model.

See the examples in templates/md/fits.md, templates/md/par.md, and templates/md/h5.md for implementation details.

Create a Changelog Generator

Create a new python file under datamodel/generate/changelog/filetypes/, with the name of your new file suffix. In this file, create a new class subclassed from YamlDiff. For example,

from datamodel.generate.changelog.yaml import YamlDiff

class YamlFits(YamlDiff):
    """ Class for supporting YAML changelog generation for FITS files """
    suffix = 'FITS'
    cache_key = 'hdus'

This is the class that will be used to generate the changelog between releases for the new file. It requires several class attributes and methods to be defined. The following class attributes are required:

  • suffix: An uppercase name of the file extension or suffix

  • cache_key: The key name used to store the data values in the YAML cache

Each new subclass must minimally override and define the following class method:

  • _get_changes: creates and returns the dictionary data that populates the changelog.releases YAML key.

This method is customized for each file type. Its inputs are the cache_key YAML values from two releases: a given release and the release prior. It should return a dictionary of desired computed changes between the two releases. If this functionality is not desired, simply create an empty placeholder method.

See the existing YamlFits, YamlPar, and YamlHDF5 example classes for implementation details.
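As a hedged sketch, a changelog generator for a hypothetical CSV filetype might compute column differences like this. The YamlDiff stub stands in for the real base class so the sketch runs standalone, and the _get_changes signature and returned field names are assumptions; mirror the real YamlFits implementation.

```python
class YamlDiff:
    """ Stand-in for datamodel's real YamlDiff, so this sketch runs standalone """
    suffix = None
    cache_key = None


class YamlCsv(YamlDiff):
    """ Hypothetical YAML changelog generation for CSV files """
    suffix = 'CSV'
    cache_key = 'csv'

    def _get_changes(self, new: dict, old: dict) -> dict:
        # compare the cache_key content from one release to the release prior
        new_cols = set(new.get('columns', {}))
        old_cols = set(old.get('columns', {}))
        return {'delta_ncolumns': len(new_cols) - len(old_cols),
                'added_columns': sorted(new_cols - old_cols),
                'removed_columns': sorted(old_cols - new_cols)}
```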

Create Changelog Models

In the python file containing our new Pydantic models, we must also create new models for the changelog. These are created in a similar fashion to the models described above. For example,

class ChangeFits(BaseModel):
    """ Pydantic model representing the FITS hdu fields of the YAML changelog release section """
    delta_nhdus: int = None
    added_hdus: List[str] = None
    removed_hdus: List[str] = None
    primary_delta_nkeys: int = None
    added_primary_header_kwargs: List[str] = None
    removed_primary_header_kwargs: List[str] = None

See ChangeFits, ChangePar, and ChangeHdf example classes for complete implementation details.

Once our new models are created, the core model must be added to the list of base classes of the ChangeRelease model in datamodel/models/yaml.py. For example,

class ChangeRelease(ChangeHdf, ChangePar, ChangeFits, ChangeBase):
    pass

To maintain proper field ordering, it must be added to the front of the list, e.g. ChangeRelease([NewModel], ChangeHdf, ChangePar, ChangeFits, ChangeBase).
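With stand-in classes (the real ones are Pydantic models, and ChangeCsv here is a hypothetical new model), the updated class would look like:

```python
# Stand-ins for the existing changelog models in datamodel/models/yaml.py
class ChangeBase: pass
class ChangeFits: pass
class ChangePar: pass
class ChangeHdf: pass
class ChangeCsv: pass  # the new changelog model

# the new model goes at the front of the list, so its fields are
# ordered before those of the existing models
class ChangeRelease(ChangeCsv, ChangeHdf, ChangePar, ChangeFits, ChangeBase):
    pass
```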

Add DataModel Design Code

To give users the option of designing datamodels for the new file before it exists, you need to override the following three methods in your file generator class.

  • design_content: when designing datamodels, updates a YAML cache with content via Python

  • _get_designed_object: when designing datamodels, creates a new object representation of the file from the YAML cache

  • write_design: when designing datamodels, generates a new designed template file on disk

Designing New Content

On your new file class, override the design_content method, and customize it for the specifics of the new file. It should parse the Python inputs and convert them into the proper YAML datamodel structure, populating the content in the cache_key field. For example, see FITS design_content method for implementation details of parsing input Python into the YAML hdus dictionary content. Also see the Yanny par design_content and HDF5 design_content methods for details on other filetypes.
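A hedged sketch of what a design_content override might look like for a hypothetical CSV filetype is below. The _cache attribute, the method signature, and the YAML structure here are all assumptions for illustration; mirror the real FITS implementation for the actual cache handling.

```python
class CsvFile:
    """ Hypothetical file class; only the design piece is sketched here """
    cache_key = 'csv'

    def __init__(self):
        # stand-in for wherever the real class stores its YAML cache content
        self._cache = {self.cache_key: {}}

    def design_content(self, name: str = 'EXAMPLE', description: str = None,
                       columns: list = None, **kwargs):
        """ Convert the Python inputs into the YAML datamodel structure """
        self._cache[self.cache_key] = {
            'name': name,
            'description': description or 'replace me - with a description',
            'columns': {c: {'name': c, 'description': 'replace me'}
                        for c in (columns or [])}}
```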

Once you’ve created the above method, you need to add a new design_xxx convenience method to the DataModel class, which passes your desired inputs into the private _design_content method. This makes it easy for users to design new YAML content directly from a datamodel instance. For example, the design_hdu convenience method for FITS files looks like:

def design_hdu(self, ext: str = 'primary', extno: int = None, name: str = 'EXAMPLE',
               description: str = None, header: Union[list, dict, fits.Header] = None,
               columns: List[Union[list, dict, fits.Column]] = None, **kwargs):
    """ Wrapper to _design_content, to design a new HDU """
    self._design_content(ext=ext, extno=extno, name=name, description=description,
                        header=header, columns=columns, **kwargs)

Creating the File

To create a new file object on disk, we need to convert the YAML content to a valid file representation and we need to write out that file to disk.

First we need to override the _get_designed_object method. This is a static method whose input is the cache_key YAML dictionary content for the new file. It should return a new file object representation. This method gets called by create_from_cache and makes the new object available as the _designed_object attribute. This method should create a new file by parsing and validating the YAML content through its Pydantic model and calling a convert_xxx method to convert the Pydantic model to the new file. For example, the method for FITS conversion looks like:

@staticmethod
def _get_designed_object(data: dict):
    """ Return a valid fits HDUList """
    return fits.HDUList([HDU.model_validate(v).convert_hdu() for v in data.values()])

with the convert_hdu method:

class HDU(BaseModel):
    ...

    def convert_header(self) -> fits.Header:
        """ Convert the list of header keys into a fits.Header """
        if not self.header:
            return None
        return fits.Header(i.to_tuple() for i in self.header)

    def convert_columns(self) -> List[fits.Column]:
        """ Convert the columns dict into a a list of fits.Columns """
        if not self.columns:
            return None
        return [i.to_fitscolumn() for i in self.columns.values()]

    def convert_hdu(self) -> Union[fits.PrimaryHDU, fits.ImageHDU, fits.BinTableHDU]:
        """ Convert the HDU entry into a valid fits.HDU """
        if self.name.lower() == 'primary':
            return fits.PrimaryHDU(header=self.convert_header())
        elif self.columns:
            return fits.BinTableHDU.from_columns(self.convert_columns(), name=self.name,
                                                header=self.convert_header())
        else:
            return fits.ImageHDU(name=self.name, header=self.convert_header())

See also the convert_par and convert_hdf methods for details on other filetypes.

Since different file packages have different mechanisms for writing to disk, we need to override the write_design method and customize it for the specifics of the file object. The method should use the self._designed_object attribute, which contains the file object itself. For example, the method for writing out a new FITS file looks like:

def write_design(self, file: str, overwrite: bool = True) -> None:
    """ Write out the designed file """
    if not self._designed_object:
        raise AttributeError('Cannot write.  Designed object does not exist.')

    self._designed_object.writeto(file, overwrite=overwrite, checksum=True)

Add New Tests

All the tests are designed around creating a datamodel for a file species “test”, located at $TEST_REDUX/{ver}/testfile_{id}.{suffix} where supported filetypes fill in the suffix field.

In tests/conftest.py, create a new create_xxx function for your new filetype. This function creates a new test file. For example,

def create_fits(name, version, extra_cols):
    """ create a test fits hdulist """
    # create the FITS HDUList
    header = fits.Header([('filename', name,'name of the file'),
                        ('testver', version, 'version of the file')])
    primary = fits.PrimaryHDU(header=header)
    imdata = fits.ImageHDU(name='FLUX', data=np.ones([5,5]))
    cols = [fits.Column(name='object', format='20A', array=['a', 'b', 'c']),
            fits.Column(name='param', format='E', array=np.random.rand(3), unit='m'),
            fits.Column(name='flag', format='I', array=np.arange(3))]
    if extra_cols:
        cols.extend([fits.Column(name='field', format='J', array=np.arange(3)),
                    fits.Column(name='mjd', format='I', array=np.arange(3))])
    bindata = fits.BinTableHDU.from_columns(cols, name='PARAMS')

    return fits.HDUList([primary, imdata, bindata])

Update the create_test_file function where appropriate to call the new create_xxx method.
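The dispatch inside create_test_file might look something like the following sketch. The real signature in tests/conftest.py may differ, and the creator functions here are trivial stubs (standing in for functions like create_fits above, plus a hypothetical create_csv) so the sketch runs standalone.

```python
def create_fits(name, version, extra_cols=False):
    # stub; the real function builds and returns a fits.HDUList
    return f'fits:{name}:{version}'

def create_csv(name, version, extra_cols=False):
    # stub for a hypothetical new filetype
    return f'csv:{name}:{version}'

def create_test_file(suffix, name, version, extra_cols=False):
    """ Dispatch to the correct create_xxx function for the filetype """
    creators = {'fits': create_fits, 'csv': create_csv}
    return creators[suffix](name, version, extra_cols=extra_cols)
```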

Add a new validated test YAML file in tests/data/. Copy an existing test YAML file and modify it for your new test filetype. The validated YAML content should match the content created in the create_xxx method.

In tests/conftest.py, add your new file extension to the list of suffixes, e.g. suffixes = ['fits', 'par', 'h5'].

The above setup will automatically add the new filetype to some of the test suite. Additional tests can be added as needed. Minimally, a new design test specific to the new filetype should be added to the tests/generate/test_design.py file.

Update the Documentation

Update the Sphinx documentation with all relevant documentation pertaining to the new supported filetype.

  • Add your autodoc module API references to the api.rst file.

  • Update the “Supported Filetypes” table in the index.rst file.

  • Add the new filetype to the list of supported files in the generate.rst file.

  • Update the docs in generate.rst with a new section of caveats for generating your file.

  • Update the docs in design.rst with a new section on designing your file.

  • Add a new example to the examples_generate.rst file.

  • If needed add examples to the examples_design.rst file.

You can build the docs locally with the sdsstools command sdss docs.build, and use sdss docs.show to open the local docs in your browser.

Submit a Pull Request

Submit a GitHub PR for review. Follow the instructions to Create a PR, and make sure the PR passes all Checks.