Adding New Files¶
To add support for new filetypes to the SDSS DataModel product, follow the steps below.
Update the DataModel¶
Add the file suffix or extension to the list of supported filetypes on the DataModel class in datamodel/generate/datamodel.py.
class DataModel:
    supported_filetypes = ['.fits', '.par', '.h5']
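For example, if you were adding support for a hypothetical new CSV filetype (used as a running illustration below), the updated list might look like:

class DataModel:
    # '.csv' is a hypothetical new entry; the first three suffixes already exist
    supported_filetypes = ['.fits', '.par', '.h5', '.csv']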
Create a New File Generator¶
Create a new python file under datamodel/generate/filetypes/, with the name of your new file suffix. In this file, create a new class subclassed from BaseFile. For example,
from .base import BaseFile

class FitsFile(BaseFile):
    """ Class for supporting FITS files """
    suffix = 'FITS'
    cache_key = 'hdus'
This is the class that will be used to generate the new file. It requires several class attributes and methods to be defined, starting with the following class attributes:
- suffix: An uppercase name of the file extension or suffix
- cache_key: The key name used to store the data values in the YAML cache
Each new subclass must minimally override and define the following class methods:
- _generate_new_cache: creates and returns the initial dictionary data that populates the cache_key YAML key
- _update_partial_cache: updates the cache of new releases with any cache content from old releases, e.g. field descriptions
- design_content: when designing datamodels, updates a YAML cache with content via Python
- _get_designed_object: when designing datamodels, creates a new object representation of the file from the YAML cache
- write_design: when designing datamodels, generates a new designed template file on disk
These methods are customized for each file type. The first two methods in the list above pertain to generating the initial cache content and partially updating a new cache with older cache data. The last three methods are specifically for designing new datamodels when the files do not yet exist on disk. If this functionality is not desired, simply create empty placeholder methods.
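As an overall sketch, here is what a skeleton generator class for the hypothetical CSV filetype might look like. The class, the 'tables' cache key, and the method signatures are illustrative assumptions; match the signatures of the existing FitsFile methods when writing a real one.

from .base import BaseFile

class CsvFile(BaseFile):
    """ Hypothetical class for supporting CSV files """
    suffix = 'CSV'
    cache_key = 'tables'

    def _generate_new_cache(self):
        # build and return the initial dictionary stored under the 'tables' YAML key
        return {}

    def _update_partial_cache(self):
        # carry over reusable content, e.g. field descriptions, from an older release's cache
        pass

    def design_content(self, **kwargs):
        # when designing datamodels, populate the YAML cache from Python inputs
        pass

    @staticmethod
    def _get_designed_object(data):
        # when designing datamodels, build an in-memory file object from the YAML cache
        pass

    def write_design(self, file, overwrite=True):
        # when designing datamodels, write the designed file object to disk
        pass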
See the existing FitsFile, ParFile, and HdfFile example classes for implementation details.
Describe the YAML representation¶
To convert YAML files to validated JSON files, we need to describe the YAML structure in Python, along with the rules necessary to validate the YAML content and ensure it is of the proper format. We use Pydantic to construct model class representations of the YAML content and to handle type validation.
Create a new python file under datamodel/models/filetypes/, with the name of your new file suffix. In this file, create a new class subclassed from Pydantic's BaseModel. For example,
from typing import List, Dict
from pydantic import BaseModel

class HDU(BaseModel):
    """ Pydantic model representing a YAML hdu section """
    name: str
    is_image: bool
    description: str
    size: str
    header: List[Header] = None        # Header is a sibling Pydantic model (see below)
    columns: Dict[str, Column] = None  # Column is a sibling Pydantic model (see below)
Every field or object in YAML must be represented in the model, either as an attribute on a given Pydantic model, or as a model itself. Models can be nested and chained together to create more complex structures. See, for example, the Header, Column, and HDU models that describe the YAML content of an individual FITS HDU.
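As an illustration of this nesting, simplified sketches of what the Header and Column models might contain are shown below; the actual field definitions live in the datamodel source and differ in detail.

class Header(BaseModel):
    """ Simplified sketch of a model for a single FITS header keyword """
    key: str
    value: str = None
    comment: str = None

class Column(BaseModel):
    """ Simplified sketch of a model for a single binary table column """
    name: str
    type: str
    unit: str = None
    description: str = None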
Also see the existing HDU, ParModel, and HdfModel example classes for complete implementation details.
Once the relevant models are created, we must add them to our global YamlModel, in datamodel/models/yaml.py. In the Release model, add a new attribute with the name of the cache_key, e.g. hdus. Set the default to None so the field is optional. For example,
class Release(BaseModel):
    """ Pydantic model representing an item in the YAML releases section """
    hdus: Dict[str, HDU] = None
    par: ParModel = None
    hdfs: HdfModel = None
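For the hypothetical CSV filetype with cache_key = 'tables', the new optional field might look like the following, where CsvTable stands in for your new Pydantic model:

class Release(BaseModel):
    """ Pydantic model representing an item in the YAML releases section """
    hdus: Dict[str, HDU] = None
    par: ParModel = None
    hdfs: HdfModel = None
    tables: Dict[str, CsvTable] = None  # hypothetical new field for the 'tables' cache key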
Create a Markdown Stub¶
Create a new markdown file under datamodel/templates/md/, with the name of your new file suffix. This file uses Jinja template syntax. The new file must extend the md/base.md file, i.e. {% extends "md/base.md" %}. It must also contain the following two Jinja blocks:
- content: A list of structures in the file, displayed in the example, e.g. FITS HDUs or Yanny tables
- example: Any content from the YAML cache_key to display
For example,
{% block content %}
{% for table in data['tables'] %}
- [{{table}}](#{{table}})
{% endfor %}
{% endblock content %}
{# Example par rendering for the example block #}
{% block example %}
....
{% endblock example %}
The YAML cache content specified in the cache_key field is available to your new markdown file as the data attribute, as a dictionary. By default, the data values used for populating the example will come from the "WORK" release or, if not available, the latest DR release in the model.
See the examples in templates/md/fits.md, templates/md/par.md, and templates/md/h5.md for implementation details.
Create a Changelog Generator¶
Create a new python file under datamodel/generate/changelog/filetypes/, with the name of your new file suffix. In this file, create a new class subclassed from YamlDiff. For example,
from datamodel.generate.changelog.yaml import YamlDiff

class YamlFits(YamlDiff):
    """ Class for supporting YAML changelog generation for FITS files """
    suffix = 'FITS'
    cache_key = 'hdus'
This is the class that will be used to generate the changelog between releases for the new file. It requires several class attributes and methods to be defined, starting with the following class attributes:
- suffix: An uppercase name of the file extension or suffix
- cache_key: The key name used to store the data values in the YAML cache
Each new subclass must minimally override and define the following class method:
- _get_changes: creates and returns the dictionary data that populates the changelog.releases YAML key
This method is customized for each file type. Its inputs are the two YAML cache_key values, one from a given release and one from the release prior. It should return a dictionary of the desired computed changes between the two releases. If this functionality is not desired, simply create an empty placeholder method.
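A sketch of a changelog class for the hypothetical CSV filetype is shown below, computing a simple added/removed diff between the two caches. The argument names and the returned dictionary keys are illustrative; mirror the existing YamlFits implementation for the exact signature.

from datamodel.generate.changelog.yaml import YamlDiff

class YamlCsv(YamlDiff):
    """ Hypothetical class for YAML changelog generation for CSV files """
    suffix = 'CSV'
    cache_key = 'tables'

    def _get_changes(self, new: dict, old: dict) -> dict:
        # compare the 'tables' cache content between a release and the prior one
        new_tables = set(new or {})
        old_tables = set(old or {})
        return {'delta_ntables': len(new_tables) - len(old_tables),
                'added_tables': sorted(new_tables - old_tables),
                'removed_tables': sorted(old_tables - new_tables)}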
See the existing YamlFits, YamlPar, and YamlHDF5 example classes for implementation details.
Create Changelog Models¶
In the python file containing our new Pydantic models, we must also create new models for the changelog. These are created in a similar fashion to the models described above. For example,
class ChangeFits(BaseModel):
    """ Pydantic model representing the FITS hdu fields of the YAML changelog release section """
    delta_nhdus: int = None
    added_hdus: List[str] = None
    removed_hdus: List[str] = None
    primary_delta_nkeys: int = None
    added_primary_header_kwargs: List[str] = None
    removed_primary_header_kwargs: List[str] = None
See the ChangeFits, ChangePar, and ChangeHdf example classes for complete implementation details.
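Continuing the hypothetical CSV example, a matching changelog model whose fields mirror the dictionary keys returned by the _get_changes sketch above could be:

class ChangeCsv(BaseModel):
    """ Hypothetical model for the CSV fields of the YAML changelog release section """
    delta_ntables: int = None
    added_tables: List[str] = None
    removed_tables: List[str] = None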
Once our new models are created, the new model must be added to the list of parent classes of the ChangeRelease model in datamodel/models/yaml.py. See:
class ChangeRelease(ChangeHdf, ChangePar, ChangeFits, ChangeBase):
    pass
To maintain proper field ordering, the new model must be added to the front of the list, e.g. ChangeRelease([NewModel], ChangeHdf, ChangePar, ChangeFits, ChangeBase).
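With the hypothetical ChangeCsv model from above, the updated class would read:

class ChangeRelease(ChangeCsv, ChangeHdf, ChangePar, ChangeFits, ChangeBase):
    pass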
Add DataModel Design Code¶
To give users the option of designing datamodels for the new file before it exists, you need to override the following three methods in your file generation python file:
- design_content: when designing datamodels, updates a YAML cache with content via Python
- _get_designed_object: when designing datamodels, creates a new object representation of the file from the YAML cache
- write_design: when designing datamodels, generates a new designed template file on disk
Designing New Content¶
On your new file class, override the design_content method and customize it for the specifics of the new file. It should parse the Python inputs and convert them into the proper YAML datamodel structure, populating the content in the cache_key field. For example, see the FITS design_content method for implementation details of parsing input Python into the YAML hdus dictionary content. Also see the Yanny par design_content and HDF5 design_content methods for details on other filetypes.
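As a rough sketch for the hypothetical CSV filetype: the input parameters, the internal cache access (self._cache here), and the YAML layout are all invented for illustration, so follow the FITS implementation for the real mechanics.

def design_content(self, name: str = 'EXAMPLE', description: str = None,
                   columns: list = None, **kwargs):
    """ Hypothetical design of a new CSV table entry in the YAML cache """
    # convert the Python inputs into the YAML structure stored under the 'tables' key
    self._cache['tables'][name] = {
        'description': description or 'replace me - with a table description',
        'columns': columns or [],
    }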
Once you've created the above method, you need to add a new design_xxx convenience method to the DataModel class, which passes your desired inputs into the private _design_content method. This is a convenience for users to easily design new YAML content directly from a datamodel instance. For example, the design_hdu convenience method for FITS files looks like:
def design_hdu(self, ext: str = 'primary', extno: int = None, name: str = 'EXAMPLE',
               description: str = None, header: Union[list, dict, fits.Header] = None,
               columns: List[Union[list, dict, fits.Column]] = None, **kwargs):
    """ Wrapper to _design_content, to design a new HDU """
    self._design_content(ext=ext, extno=extno, name=name, description=description,
                         header=header, columns=columns, **kwargs)
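Calling it from a datamodel instance might then look like the following; the keyword values shown are illustrative, and only a subset of the available parameters is used.

# dm is an existing DataModel instance
dm.design_hdu(name='FLUX', description='the flux image extension',
              header=[('BUNIT', 'nanomaggies', 'the flux unit')])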
Creating the File¶
To create a new file on disk, we need to convert the YAML content to a valid file object representation and write that file out to disk.
First, we need to override the _get_designed_object method. This is a static method whose input is the cache_key YAML dictionary content for the new file. It should return a new file object representation. This method gets called by create_from_cache and makes the new object available as the _designed_object attribute. It should create the new file object by parsing and validating the YAML content through its Pydantic model, then calling a convert_xxx method to convert the Pydantic model to the new file. For example, the method for FITS conversion looks like:
@staticmethod
def _get_designed_object(data: dict):
    """ Return a valid fits HDUList """
    return fits.HDUList([HDU.model_validate(v).convert_hdu() for v in data.values()])
with the convert_hdu method:
class HDU(BaseModel):
    ...

    def convert_header(self) -> fits.Header:
        """ Convert the list of header keys into a fits.Header """
        if not self.header:
            return None
        return fits.Header(i.to_tuple() for i in self.header)

    def convert_columns(self) -> List[fits.Column]:
        """ Convert the columns dict into a list of fits.Columns """
        if not self.columns:
            return None
        return [i.to_fitscolumn() for i in self.columns.values()]

    def convert_hdu(self) -> Union[fits.PrimaryHDU, fits.ImageHDU, fits.BinTableHDU]:
        """ Convert the HDU entry into a valid fits.HDU """
        if self.name.lower() == 'primary':
            return fits.PrimaryHDU(header=self.convert_header())
        elif self.columns:
            return fits.BinTableHDU.from_columns(self.convert_columns(), name=self.name,
                                                 header=self.convert_header())
        else:
            return fits.ImageHDU(name=self.name, header=self.convert_header())
See also the convert_par and convert_hdf methods for details on other filetypes.
Since different file packages have different mechanisms for writing to disk, we need to override the write_design method and customize it for the specifics of the file object. The method should use the self._designed_object attribute, which contains the file object itself. For example, the method for writing out a new FITS file looks like:
def write_design(self, file: str, overwrite: bool = True) -> None:
    """ Write out the designed file """
    if not self._designed_object:
        raise AttributeError('Cannot write. Designed object does not exist.')
    self._designed_object.writeto(file, overwrite=overwrite, checksum=True)
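For other filetypes only the write call changes. For instance, assuming the designed object for the hypothetical CSV filetype is an astropy Table, the method might instead be:

def write_design(self, file: str, overwrite: bool = True) -> None:
    """ Hypothetical write method for a designed CSV file """
    if not self._designed_object:
        raise AttributeError('Cannot write. Designed object does not exist.')
    # astropy Tables write CSV files through the unified Table I/O interface
    self._designed_object.write(file, format='csv', overwrite=overwrite)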
Add New Tests¶
All the tests are designed around creating a datamodel for a file species "test", located at $TEST_REDUX/{ver}/testfile_{id}.{suffix}, where supported filetypes fill in the suffix field.
In tests/conftest.py, create a new create_xxx function for your new filetype. This function creates a new test file. For example,
def create_fits(name, version, extra_cols):
    """ create a test fits hdulist """
    # create the FITS HDUList
    header = fits.Header([('filename', name, 'name of the file'),
                          ('testver', version, 'version of the file')])
    primary = fits.PrimaryHDU(header=header)
    imdata = fits.ImageHDU(name='FLUX', data=np.ones([5, 5]))
    cols = [fits.Column(name='object', format='20A', array=['a', 'b', 'c']),
            fits.Column(name='param', format='E', array=np.random.rand(3), unit='m'),
            fits.Column(name='flag', format='I', array=np.arange(3))]
    if extra_cols:
        cols.extend([fits.Column(name='field', format='J', array=np.arange(3)),
                     fits.Column(name='mjd', format='I', array=np.arange(3))])
    bindata = fits.BinTableHDU.from_columns(cols, name='PARAMS')
    return fits.HDUList([primary, imdata, bindata])
Update the create_test_file function where appropriate to call the new create_xxx function.
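The dispatch inside create_test_file is along these lines; this is a sketch with assumed variable names, so see the real function in tests/conftest.py.

# inside create_test_file: pick the creator function based on the file suffix
if suffix == 'fits':
    obj = create_fits(name, version, extra_cols)
elif suffix == 'csv':
    obj = create_csv(name, version, extra_cols)  # your hypothetical new function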
Add a new validated test YAML file in tests/data/. Copy an existing test YAML file and modify it for your new test filetype. The validated YAML content should match the content created in the create_xxx function.
In tests/conftest.py, add your new file extension to the list of suffixes, e.g. suffixes = ['fits', 'par', 'h5'].
The above setup will automatically add the new filetype to some of the test suite. Additional tests can be added; at minimum, a new design test specific to the new filetype should be added to the tests/generate/test_design.py file.
Update the Documentation¶
Update the Sphinx documentation with all relevant content pertaining to the newly supported filetype:
- Add your autodoc module API references to the api.rst file.
- Update the "Supported Filetypes" table in the index.rst file.
- Add the new filetype to the list of supported files in the generate.rst file.
- Update the docs in generate.rst with a new section of caveats for generating your file.
- Update the docs in design.rst with a new section on designing your file.
- Add a new example to the examples_generate.rst file.
- If needed, add examples to the examples_design.rst file.
You can build the docs locally using the sdsstools command sdss docs.build, and open the built docs in your browser with sdss docs.show.
Submit a Pull Request¶
Submit a GitHub PR for review. Follow the instructions to Create a PR, and make sure the PR passes all Checks.