cogrib - Cloud Optimized GRIB

This library provides utilities that make it easier to access GRIB2 files stored in cloud Blob Storage.

The problem

Traditionally, to load data from a GRIB2 file you would first download the entire file to disk and then read the portion you need. This is inefficient because:

Disks are relatively slow.
GRIB files contain many variables, and you might be interested in only a small portion of them.

We'd like to access subsets of the GRIB2 file directly from Blob Storage, without having to download all or part of the file locally first.

We could just convert the GRIB2 file to Zarr. But we'll assume that the data provider has to host the unmodified GRIB2 files and doesn't want to host two copies of the data.

How it works

There are two distinct stages. First, a data host scans the GRIB2 file for datasets and figures out which portions of the GRIB2 file each dataset refers to. These references, which are byte offsets and lengths for each variable, are saved to a "kerchunk index file." The index file contains:

All the metadata (the dimensions, coordinate values, each variable's attributes, etc.)
References to the GRIB2 file as (url, offset, length) tuples.

In code, that looks like:

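>>> import cfgrib
>>> import cogrib
>>> datasets = cfgrib.open_datasets("/path/to/file.grib2")  # must be local; returns a list of xarray Datasets
>>> references = [cogrib.make_references(ds) for ds in datasets]  # one reference set per dataset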
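The README doesn't show how cogrib works out those byte offsets (the example goes through cfgrib). As a rough, hypothetical illustration of where the offsets come from, the sketch below walks a GRIB2 file message by message using only each message's 16-octet Section 0 header, whose last eight octets store the total message length. A GRIB2 message usually holds a single 2-D field, so one variable typically maps to many (offset, length) pairs. `scan_grib2_messages` and `fake_grib2_message` are illustrative names, not part of cogrib.

```python
import struct
from typing import Iterator, Tuple


def scan_grib2_messages(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) for every GRIB2 message in ``data``.

    Uses only Section 0, the 16-octet indicator section of each message:
    octets 1-4 are b"GRIB", octet 8 is the edition number (2), and
    octets 9-16 hold the total message length as a big-endian uint64.
    """
    pos = 0
    while True:
        start = data.find(b"GRIB", pos)
        if start == -1 or start + 16 > len(data):
            return
        if data[start + 7] != 2:
            # Not a GRIB2 indicator section; resume searching after this match.
            pos = start + 4
            continue
        (length,) = struct.unpack(">Q", data[start + 8:start + 16])
        end = start + length
        # Every GRIB2 message ends with the end section b"7777".
        if length < 20 or end > len(data) or data[end - 4:end] != b"7777":
            raise ValueError(f"message at offset {start} is truncated or corrupt")
        yield start, length
        # Jump past the whole message so bytes inside it are never re-scanned.
        pos = end


def fake_grib2_message(body: bytes) -> bytes:
    """Hypothetical test helper: wrap ``body`` in a minimal Section 0 and end marker."""
    total = 16 + len(body) + 4
    # "GRIB", 2 reserved octets, discipline 0, edition 2, 8-octet total length.
    return b"GRIB" + b"\x00\x00" + bytes([0, 2]) + struct.pack(">Q", total) + body + b"7777"


data = fake_grib2_message(b"abc") + fake_grib2_message(b"hello")
messages = list(scan_grib2_messages(data))  # [(0, 23), (23, 25)]
```

A real scanner would also decode Sections 1 to 7 (with ecCodes, for instance) to learn which variable, level, and time each message holds; this sketch only finds message boundaries.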

These references would be provided as, e.g., assets on a STAC item or collection.
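For readers who haven't seen one, a kerchunk index is a JSON mapping from Zarr keys to either inline metadata or pointers into the original file. The sketch below builds a version-1 reference set by hand for one hypothetical variable, `t2m`, assuming each GRIB2 message holds one (ny, nx) field stacked along a leading `time` dimension. The URL is a placeholder, and `compressor` is deliberately left as `None`: a usable GRIB2 reference set needs a codec that decodes GRIB messages, and which codec cogrib emits isn't covered here.

```python
import json
from typing import Dict, List, Tuple

URL = "https://example.com/data/file.grib2"  # placeholder, not a real dataset


def make_refs(url: str, name: str, messages: List[Tuple[int, int]],
              ny: int, nx: int) -> Dict:
    """Kerchunk-style (version 1) references for one variable.

    Hypothetical layout: message i becomes Zarr chunk "i.0.0" of an
    array with shape (len(messages), ny, nx).
    """
    zarray = {
        "zarr_format": 2,
        "shape": [len(messages), ny, nx],
        "chunks": [1, ny, nx],
        "dtype": "<f4",
        "fill_value": None,
        "order": "C",
        # Placeholder: real GRIB2 chunks need a GRIB-decoding codec here.
        "compressor": None,
        "filters": None,
    }
    refs = {
        # Metadata is stored inline as JSON strings.
        ".zgroup": json.dumps({"zarr_format": 2}),
        f"{name}/.zarray": json.dumps(zarray),
        f"{name}/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["time", "y", "x"]}),
    }
    for i, (offset, length) in enumerate(messages):
        # Each chunk is a pointer into the unmodified GRIB2 file.
        refs[f"{name}/{i}.0.0"] = [url, offset, length]
    return {"version": 1, "refs": refs}


index = make_refs(URL, "t2m", [(0, 23), (23, 25)], ny=721, nx=1440)
index_json = json.dumps(index)  # what would be written to references.json
```

Only the chunk keys point at the GRIB2 file; everything xarray needs to build the dataset's structure is inline, which is why opening the index is cheap.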

Second, a user loads this kerchunk index file into xarray with the Zarr engine, following the "normal" Kerchunk / fsspec reference access pattern. For those unfamiliar with this process: Zarr doesn't natively understand these reference files, so we need a slightly intelligent Zarr store that performs HTTP range requests against the GRIB2 file for the actual data. The fsspec "reference" filesystem does just that.

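>>> import fsspec
>>> import requests
>>> import xarray as xr
>>> references = requests.get("http://path/to/references.json").json()  # parse the JSON body into a dict
>>> store = fsspec.filesystem("reference", fo=references).get_mapper("")
>>> ds = xr.open_dataset(store, engine="zarr", chunks={})  # chunks={} gives lazy, Dask-backed arrays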
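To make the "slightly intelligent Zarr store" less abstract, here is a rough, standard-library-only sketch of what resolving a single reference involves. It is not fsspec's implementation: it reads local files with `seek`/`read`, whereas against Blob Storage the same `(url, offset, length)` triple would become an HTTP request carrying a `Range` header. `read_reference` and `range_header` are hypothetical helper names.

```python
import base64
import os
import tempfile
from typing import Dict, List, Union

Ref = Union[str, List]


def range_header(offset: int, length: int) -> Dict[str, str]:
    """HTTP Range header for ``length`` bytes starting at ``offset``."""
    # HTTP byte ranges are inclusive at both ends (RFC 9110), hence the - 1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_reference(refs: Dict[str, Ref], key: str) -> bytes:
    """Resolve one key of a kerchunk-style reference mapping to raw bytes.

    Handles inline values and [path] / [path, offset, length] entries that
    point at *local* files; a cloud-backed store would instead send a range
    request built with ``range_header(offset, length)``.
    """
    value = refs[key]
    if isinstance(value, str):
        # Inline value: binary data is stored with a "base64:" prefix.
        if value.startswith("base64:"):
            return base64.b64decode(value[len("base64:"):])
        return value.encode()
    if len(value) == 1:
        # [path]: the whole file is the chunk.
        with open(value[0], "rb") as f:
            return f.read()
    path, offset, length = value
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Demo: a stand-in "GRIB2 file" whose chunks live at known byte ranges.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADERmessage-onemessage-two")
refs = {
    ".zgroup": '{"zarr_format": 2}',
    "v/0": [tmp.name, 6, 11],
    "v/1": [tmp.name, 17, 11],
}
chunk0 = read_reference(refs, "v/0")  # b"message-one"
chunk1 = read_reference(refs, "v/1")  # b"message-two"
header = range_header(17, 11)  # {"Range": "bytes=17-27"}
os.remove(tmp.name)
```

Only the bytes for the requested chunk are transferred, which is the whole point: reading one field of a large GRIB2 file costs one small range request instead of a full download.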

Why isn't this in kerchunk / fsspec's reference filesystem?

It probably should be. Just experimenting for now.

TomAugspurger/cogrib