Schema validation for Xarray
Install xarray-schema from PyPI:
pip install xarray-schema
Or from conda-forge using Conda:
conda install -c conda-forge xarray-schema
Or install it from source:
pip install git+https://github.com/xarray-contrib/xarray-schema
Xarray-schema's API is modeled after Pandera. The DataArraySchema and DatasetSchema objects both have .validate() methods.
The basic usage is as follows:
import numpy as np
import xarray as xr
from xarray_schema import DataArraySchema, DatasetSchema, CoordsSchema
da = xr.DataArray(np.ones(4, dtype='i4'), dims=['x'], name='foo')
schema = DataArraySchema(dtype=np.integer, name='foo', shape=(4, ), dims=['x'])
schema.validate(da)
You can also use it to validate a Dataset like so:
schema_ds = DatasetSchema({'foo': schema})
schema_ds.validate(da.to_dataset())
Each component of the Xarray data model is implemented as a standalone class:
from xarray_schema.components import (
    DTypeSchema,
    DimsSchema,
    ShapeSchema,
    NameSchema,
    ChunksSchema,
    ArrayTypeSchema,
    AttrSchema,
    AttrsSchema,
)
# example constructions
dtype_schema = DTypeSchema('i4')
dims_schema = DimsSchema(('x', 'y', None)) # None is used as a wildcard
shape_schema = ShapeSchema((5, 10, None)) # None is used as a wildcard
name_schema = NameSchema('foo')
chunk_schema = ChunksSchema({'x': None, 'y': -1}) # None is used as a wildcard, -1 is used as
array_type_schema = ArrayTypeSchema(np.ndarray)  # use a distinct name so the class is not shadowed
# Example usage
dtype_schema.validate(da.dtype)
# Each object schema can be exported to JSON format
dtype_json = dtype_schema.to_json()
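To illustrate what the None wildcard in the dims and shape schemas means, here is a minimal pure-Python sketch of position-wise matching. It is not xarray-schema's implementation; the helper name matches_with_wildcards is hypothetical and exists only for this example.

```python
from typing import Hashable, Optional, Sequence


def matches_with_wildcards(
    expected: Sequence[Optional[Hashable]], actual: Sequence[Hashable]
) -> bool:
    """Return True if `actual` matches `expected`, treating None as "anything"."""
    # The number of entries must agree; a wildcard does not stand for a missing entry.
    if len(expected) != len(actual):
        return False
    # Compare position by position, skipping positions marked with None.
    return all(e is None or e == a for e, a in zip(expected, actual))


# Hypothetical checks mirroring the schema constructions above
dims_ok = matches_with_wildcards(('x', 'y', None), ('x', 'y', 'time'))
shape_ok = matches_with_wildcards((5, 10, None), (5, 10, 3))
shape_bad = matches_with_wildcards((5, 10, None), (5, 11, 3))
print(dims_ok, shape_ok, shape_bad)
```

In this sketch a wildcard only relaxes the value at one position. A tuple of a different length still fails to match.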
This is a very early prototype of a library. Some key things are missing:
- Exceptions: Pandera accumulates schema exceptions and reports them all at once. Currently, we eagerly raise SchemaErrors as soon as they are found.
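The difference between eager raising and accumulated reporting can be sketched in plain Python. This is an illustration of the general pattern only, not code from xarray-schema or Pandera: the SchemaError class below is a local stand-in, and AccumulatedSchemaErrors, validate_all, and the check functions are hypothetical names.

```python
from typing import Callable, List


class SchemaError(Exception):
    """Local stand-in for a single validation failure (defined only for this sketch)."""


class AccumulatedSchemaErrors(Exception):
    """Hypothetical container carrying every failure found in one validation pass."""

    def __init__(self, errors: List[SchemaError]) -> None:
        self.errors = errors
        super().__init__("; ".join(str(e) for e in errors))


def validate_all(checks: List[Callable[[], None]]) -> None:
    """Run every check, collect SchemaError failures, and raise them together."""
    errors: List[SchemaError] = []
    for check in checks:
        try:
            check()
        except SchemaError as err:
            errors.append(err)  # keep going instead of stopping at the first failure
    if errors:
        raise AccumulatedSchemaErrors(errors)


def check_name() -> None:
    raise SchemaError("name mismatch: got 'bar', expected 'foo'")


def check_dims() -> None:
    pass  # this check passes


def check_shape() -> None:
    raise SchemaError("shape mismatch: got (3,), expected (4,)")


try:
    validate_all([check_name, check_dims, check_shape])
except AccumulatedSchemaErrors as exc:
    for err in exc.errors:
        print(err)  # both failures are reported, not just the first
```

With eager raising, only the name mismatch would surface and the shape mismatch would stay hidden until the first problem was fixed.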
All the code in this repository is MIT licensed.
This project was originally developed at CarbonPlan. It was transferred to the xarray-contrib organization in August 2022.