Define STAC metadata structure #1
Comments
Thanks @hrodmn. A few quick suggestions that shouldn't be taken too seriously
This was probably the thing I struggled with the most when working with ecmwf-forecast. If we take building data cubes as our end goal (which I think is reasonable), it's not always obvious what the relationship is between a list of grib files and one or more data cubes (xarray Datasets). cfgrib has some conventions for building a list of xarray Datasets out of a STAC item, which might be worth following. But in general, a single GRIB file might map to multiple Datasets, which makes things messy (not sure if it applies to the HRRR files).
Can you give a bit more detail on this? (But in general, my recommendation will probably be "do whatever cfgrib does")
I'm generally OK with trusting the provider's documentation around how things are organized, but it would be good to have an easy way to validate that the STAC items are correct.
I'm not sure yet. I think there's a larger discussion to be had about where Kerchunk metadata lives. Briefly, it might go in STAC metadata (and, via the stac-geoparquet exports, be available in parquet for bulk queries), or it might go in sidecar JSON / parquet files. If possible, I would defer this initially, while we work out the best way to do that.
Thanks for your suggestions, @TomAugspurger!
Thanks for the
The difference between
Good call. For now I could just add the @abarciauskas-bgse @sharkinsspatial After thinking about it some more, I think it might make sense to declare the separate |
After hacking on the STAC metadata in #3 and after much discussion with @sharkinsspatial and @abarciauskas-bgse I have a few more thoughts and a few alternative approaches, each with a set of pros and cons. Here are some more details about the HRRR data that make it challenging to produce useful STAC metadata:
Option 1: many collections
@sharkinsspatial proposed this as an option that would be most efficient on the STAC side:
Pros:
Cons:
Option 2: one collection, all coherent datacubes listed as assets within items
When you read a .grib file with cfgrib:
!wget https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240530/conus/hrrr.t06z.wrfsfcf00.grib2 -O /tmp/hrrr.20240530__conus__hrrr.t06z.wrfsfcf00.grib2
import cfgrib
local_file = "/tmp/hrrr.20240530__conus__hrrr.t06z.wrfsfcf00.grib2"
cfgrib_datasets = cfgrib.open_datasets(
    local_file,
)
This returns a list of 47 xarray datasets, each with one or more data variables. Many have two dimensions (x and y), but some have a third dimension like
It would not be typical to store references to internal layers of a .grib file as assets, but it is a natural way to access the data.
Pros:
Cons:
If anyone has any strong reactions to these options I would love to hear them!
@hrodmn in relation to
I think a possible approach here might be adding another custom property to the datacube variable's with the following structure
where each I'll try to crack open some notebooks today to investigate but it also appears that when applying specific level filters with Herbie that it seems to coerce the |
I am back with an example of the "many collections" metadata! For the entire HRRR dataset, there would be a single collection per
Here is what the collection metadata look like for one combination (
{
"type": "Collection",
"id": "noaa-hrrr-conus-sfc-fh02-48",
"stac_version": "1.0.0",
"description": "The NOAA HRRR is a real-time 3km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3km grids with 3km radar assimilation. Radar data is assimilated in the HRRR every 15 min over a 1-hour period adding further detail to that provided by the hourly data assimilation from the 13km radar-enhanced Rapid Refresh (RAP) system.",
"links": [
{
"rel": "license",
"href": "https://creativecommons.org/licenses/by/4.0/",
"type": "text/html",
"title": "CC-BY-4.0 license"
},
{
"rel": "documentation",
"href": "https://rapidrefresh.noaa.gov/hrrr/",
"type": "text/html",
"title": "NOAA HRRR documentation"
}
],
"stac_extensions": [
"https://stac-extensions.github.io/item-assets/v1.0.0/schema.json",
"https://stac-extensions.github.io/datacube/v2.2.0/schema.json"
],
"item_assets": {
"grib": {
"type": "application/wmo-GRIB2",
"roles": [
"data"
],
"title": "2D Surface Levels",
"description": "2D Surface Level forecast data as a grib2 file. Subsets of the data can be loaded using the provided byte range."
},
"index": {
"type": "application/x-ndjson",
"roles": [
"index"
],
"title": "Index file",
"description": "The index file contains information on each message within the GRIB2 file."
}
},
"cube:dimensions": {
"x": {
"type": "spatial",
"reference_system": "PROJCS[\"unknown\",GEOGCS[\"unknown\",DATUM[\"unknown\",SPHEROID[\"unknown\",6371229,0]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]]],PROJECTION[\"Lambert_Conformal_Conic_2SP\"],PARAMETER[\"latitude_of_origin\",38.5],PARAMETER[\"central_meridian\",-97.5],PARAMETER[\"standard_parallel_1\",38.5],PARAMETER[\"standard_parallel_2\",38.5],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH]]",
"extent": [
-2697520.142521929,
2696479.857478071
],
"axis": "x"
},
"y": {
"type": "spatial",
"reference_system": "PROJCS[\"unknown\",GEOGCS[\"unknown\",DATUM[\"unknown\",SPHEROID[\"unknown\",6371229,0]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]]],PROJECTION[\"Lambert_Conformal_Conic_2SP\"],PARAMETER[\"latitude_of_origin\",38.5],PARAMETER[\"central_meridian\",-97.5],PARAMETER[\"standard_parallel_1\",38.5],PARAMETER[\"standard_parallel_2\",38.5],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH]]",
"extent": [
-1587306.152556665,
1586693.847443335
],
"axis": "y"
},
"level": {
"type": "atmospheric level",
"description": "The atmospheric level for which the forecast is applicable, e.g. surface, top of atmosphere, 100 m above ground, etc.",
"values": [
"0 m underground",
"0-1000 m above ground",
"0-3000 m above ground",
"0-500 m above ground",
"0-6000 m above ground",
"0.1 sigma level",
"0.5-0.8 sigma layer",
"0C isotherm",
"1 m above ground",
"10 m above ground",
"100-1000 mb above ground",
"1000 m above ground",
"1000 mb",
"1000-0 m above ground",
"180-0 mb above ground",
"2 m above ground",
"2000-0 m above ground",
"250 mb",
"253 K level",
"255-0 mb above ground",
"261 K level - 256 K level",
"263 K level",
"300 mb",
"3000-0 m above ground",
"4000 m above ground",
"500 mb",
"500-1000 mb",
"5000-2000 m above ground",
"700 mb",
"8 m above ground",
"80 m above ground",
"850 mb",
"90-0 mb above ground",
"925 mb",
"boundary layer cloud layer",
"cloud base",
"cloud ceiling",
"cloud top",
"entire atmosphere",
"entire atmosphere (considered as a single layer)",
"equilibrium level",
"high cloud layer",
"highest tropospheric freezing level",
"level of adiabatic condensation from sfc",
"level of free convection",
"low cloud layer",
"mean sea level",
"middle cloud layer",
"surface",
"top of atmosphere"
]
},
"forecast_time": {
"type": "temporal",
"description": "The time horizon for which the forecast is applicable.",
"values": [
"0-1 day acc fcst",
"0-10 hour acc fcst",
"0-11 hour acc fcst",
"0-12 hour acc fcst",
"0-13 hour acc fcst",
"0-14 hour acc fcst",
"0-15 hour acc fcst",
"0-16 hour acc fcst",
"0-17 hour acc fcst",
"0-18 hour acc fcst",
"0-19 hour acc fcst",
"0-2 day acc fcst",
"0-2 hour acc fcst",
"0-20 hour acc fcst",
"0-21 hour acc fcst",
"0-22 hour acc fcst",
"0-23 hour acc fcst",
"0-25 hour acc fcst",
"0-26 hour acc fcst",
"0-27 hour acc fcst",
"0-28 hour acc fcst",
"0-29 hour acc fcst",
"0-3 hour acc fcst",
"0-30 hour acc fcst",
"0-31 hour acc fcst",
"0-32 hour acc fcst",
"0-33 hour acc fcst",
"0-34 hour acc fcst",
"0-35 hour acc fcst",
"0-36 hour acc fcst",
"0-37 hour acc fcst",
"0-38 hour acc fcst",
"0-39 hour acc fcst",
"0-4 hour acc fcst",
"0-40 hour acc fcst",
"0-41 hour acc fcst",
"0-42 hour acc fcst",
"0-43 hour acc fcst",
"0-44 hour acc fcst",
"0-45 hour acc fcst",
"0-46 hour acc fcst",
"0-47 hour acc fcst",
"0-5 hour acc fcst",
"0-6 hour acc fcst",
"0-7 hour acc fcst",
"0-8 hour acc fcst",
"0-9 hour acc fcst",
"1-2 hour acc fcst",
"1-2 hour ave fcst",
"1-2 hour max fcst",
"1-2 hour min fcst",
"10 hour fcst",
"10-11 hour acc fcst",
"10-11 hour ave fcst",
"10-11 hour max fcst",
"10-11 hour min fcst",
"11 hour fcst",
"11-12 hour acc fcst",
"11-12 hour ave fcst",
"11-12 hour max fcst",
"11-12 hour min fcst",
"12 hour fcst",
"12-13 hour acc fcst",
"12-13 hour ave fcst",
"12-13 hour max fcst",
"12-13 hour min fcst",
"13 hour fcst",
"13-14 hour acc fcst",
"13-14 hour ave fcst",
"13-14 hour max fcst",
"13-14 hour min fcst",
"14 hour fcst",
"14-15 hour acc fcst",
"14-15 hour ave fcst",
"14-15 hour max fcst",
"14-15 hour min fcst",
"15 hour fcst",
"15-16 hour acc fcst",
"15-16 hour ave fcst",
"15-16 hour max fcst",
"15-16 hour min fcst",
"16 hour fcst",
"16-17 hour acc fcst",
"16-17 hour ave fcst",
"16-17 hour max fcst",
"16-17 hour min fcst",
"17 hour fcst",
"17-18 hour acc fcst",
"17-18 hour ave fcst",
"17-18 hour max fcst",
"17-18 hour min fcst",
"18 hour fcst",
"18-19 hour acc fcst",
"18-19 hour ave fcst",
"18-19 hour max fcst",
"18-19 hour min fcst",
"19 hour fcst",
"19-20 hour acc fcst",
"19-20 hour ave fcst",
"19-20 hour max fcst",
"19-20 hour min fcst",
"2 hour fcst",
"2-3 hour acc fcst",
"2-3 hour ave fcst",
"2-3 hour max fcst",
"2-3 hour min fcst",
"20 hour fcst",
"20-21 hour acc fcst",
"20-21 hour ave fcst",
"20-21 hour max fcst",
"20-21 hour min fcst",
"21 hour fcst",
"21-22 hour acc fcst",
"21-22 hour ave fcst",
"21-22 hour max fcst",
"21-22 hour min fcst",
"22 hour fcst",
"22-23 hour acc fcst",
"22-23 hour ave fcst",
"22-23 hour max fcst",
"22-23 hour min fcst",
"23 hour fcst",
"23-24 hour acc fcst",
"23-24 hour ave fcst",
"23-24 hour max fcst",
"23-24 hour min fcst",
"24 hour fcst",
"24-25 hour acc fcst",
"24-25 hour ave fcst",
"24-25 hour max fcst",
"24-25 hour min fcst",
"25 hour fcst",
"25-26 hour acc fcst",
"25-26 hour ave fcst",
"25-26 hour max fcst",
"25-26 hour min fcst",
"26 hour fcst",
"26-27 hour acc fcst",
"26-27 hour ave fcst",
"26-27 hour max fcst",
"26-27 hour min fcst",
"27 hour fcst",
"27-28 hour acc fcst",
"27-28 hour ave fcst",
"27-28 hour max fcst",
"27-28 hour min fcst",
"28 hour fcst",
"28-29 hour acc fcst",
"28-29 hour ave fcst",
"28-29 hour max fcst",
"28-29 hour min fcst",
"29 hour fcst",
"29-30 hour acc fcst",
"29-30 hour ave fcst",
"29-30 hour max fcst",
"29-30 hour min fcst",
"3 hour fcst",
"3-4 hour acc fcst",
"3-4 hour ave fcst",
"3-4 hour max fcst",
"3-4 hour min fcst",
"30 hour fcst",
"30-31 hour acc fcst",
"30-31 hour ave fcst",
"30-31 hour max fcst",
"30-31 hour min fcst",
"31 hour fcst",
"31-32 hour acc fcst",
"31-32 hour ave fcst",
"31-32 hour max fcst",
"31-32 hour min fcst",
"32 hour fcst",
"32-33 hour acc fcst",
"32-33 hour ave fcst",
"32-33 hour max fcst",
"32-33 hour min fcst",
"33 hour fcst",
"33-34 hour acc fcst",
"33-34 hour ave fcst",
"33-34 hour max fcst",
"33-34 hour min fcst",
"34 hour fcst",
"34-35 hour acc fcst",
"34-35 hour ave fcst",
"34-35 hour max fcst",
"34-35 hour min fcst",
"35 hour fcst",
"35-36 hour acc fcst",
"35-36 hour ave fcst",
"35-36 hour max fcst",
"35-36 hour min fcst",
"36 hour fcst",
"36-37 hour acc fcst",
"36-37 hour ave fcst",
"36-37 hour max fcst",
"36-37 hour min fcst",
"37 hour fcst",
"37-38 hour acc fcst",
"37-38 hour ave fcst",
"37-38 hour max fcst",
"37-38 hour min fcst",
"38 hour fcst",
"38-39 hour acc fcst",
"38-39 hour ave fcst",
"38-39 hour max fcst",
"38-39 hour min fcst",
"39 hour fcst",
"39-40 hour acc fcst",
"39-40 hour ave fcst",
"39-40 hour max fcst",
"39-40 hour min fcst",
"4 hour fcst",
"4-5 hour acc fcst",
"4-5 hour ave fcst",
"4-5 hour max fcst",
"4-5 hour min fcst",
"40 hour fcst",
"40-41 hour acc fcst",
"40-41 hour ave fcst",
"40-41 hour max fcst",
"40-41 hour min fcst",
"41 hour fcst",
"41-42 hour acc fcst",
"41-42 hour ave fcst",
"41-42 hour max fcst",
"41-42 hour min fcst",
"42 hour fcst",
"42-43 hour acc fcst",
"42-43 hour ave fcst",
"42-43 hour max fcst",
"42-43 hour min fcst",
"43 hour fcst",
"43-44 hour acc fcst",
"43-44 hour ave fcst",
"43-44 hour max fcst",
"43-44 hour min fcst",
"44 hour fcst",
"44-45 hour acc fcst",
"44-45 hour ave fcst",
"44-45 hour max fcst",
"44-45 hour min fcst",
"45 hour fcst",
"45-46 hour acc fcst",
"45-46 hour ave fcst",
"45-46 hour max fcst",
"45-46 hour min fcst",
"46 hour fcst",
"46-47 hour acc fcst",
"46-47 hour ave fcst",
"46-47 hour max fcst",
"46-47 hour min fcst",
"47 hour fcst",
"47-48 hour acc fcst",
"47-48 hour ave fcst",
"47-48 hour max fcst",
"47-48 hour min fcst",
"48 hour fcst",
"5 hour fcst",
"5-6 hour acc fcst",
"5-6 hour ave fcst",
"5-6 hour max fcst",
"5-6 hour min fcst",
"6 hour fcst",
"6-7 hour acc fcst",
"6-7 hour ave fcst",
"6-7 hour max fcst",
"6-7 hour min fcst",
"7 hour fcst",
"7-8 hour acc fcst",
"7-8 hour ave fcst",
"7-8 hour max fcst",
"7-8 hour min fcst",
"8 hour fcst",
"8-9 hour acc fcst",
"8-9 hour ave fcst",
"8-9 hour max fcst",
"8-9 hour min fcst",
"9 hour fcst",
"9-10 hour acc fcst",
"9-10 hour ave fcst",
"9-10 hour max fcst",
"9-10 hour min fcst"
]
}
},
"cube:variables": {
...
"ASNOW": {
"dimensions": [
"x",
"y",
"level",
"forecast_time"
],
"type": "data",
"description": "Total Snowfall",
"unit": "m",
"dimension_domains": {
"level": [
"surface"
],
"forecast_time": [
"0-2 hour acc fcst",
"0-3 hour acc fcst",
"0-4 hour acc fcst",
"0-5 hour acc fcst",
"0-6 hour acc fcst",
"0-7 hour acc fcst",
"0-8 hour acc fcst",
"0-9 hour acc fcst",
"0-10 hour acc fcst",
"0-11 hour acc fcst",
"0-12 hour acc fcst",
"0-13 hour acc fcst",
"0-14 hour acc fcst",
"0-15 hour acc fcst",
"0-16 hour acc fcst",
"0-17 hour acc fcst",
"0-18 hour acc fcst",
"0-19 hour acc fcst",
"0-20 hour acc fcst",
"0-21 hour acc fcst",
"0-22 hour acc fcst",
"0-23 hour acc fcst",
"0-1 day acc fcst",
"0-25 hour acc fcst",
"0-26 hour acc fcst",
"0-27 hour acc fcst",
"0-28 hour acc fcst",
"0-29 hour acc fcst",
"0-30 hour acc fcst",
"0-31 hour acc fcst",
"0-32 hour acc fcst",
"0-33 hour acc fcst",
"0-34 hour acc fcst",
"0-35 hour acc fcst",
"0-36 hour acc fcst",
"0-37 hour acc fcst",
"0-38 hour acc fcst",
"0-39 hour acc fcst",
"0-40 hour acc fcst",
"0-41 hour acc fcst",
"0-42 hour acc fcst",
"0-43 hour acc fcst",
"0-44 hour acc fcst",
"0-45 hour acc fcst",
"0-46 hour acc fcst",
"0-47 hour acc fcst",
"0-2 day acc fcst"
]
}
},
...
}
The
The main issue that I can see is that each specific item will only cover a specific subset of the
@sharkinsspatial @abarciauskas-bgse how does this look to you? I think we could come up with a pretty simple application that takes this metadata to populate
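As a rough illustration of the kind of simple application mentioned above, here is a minimal sketch that uses the per-variable `dimension_domains` from the collection metadata to check whether a variable is available for a given level and forecast time. The dictionary is a hand-trimmed excerpt of the example collection, and `variable_covers` is a hypothetical helper name, not part of any STAC library. As far as I can tell, `dimension_domains` is a custom field in this example rather than something defined by the datacube extension.

```python
# Hand-trimmed excerpt of the "cube:variables" block from the example collection.
cube_variables = {
    "ASNOW": {
        "dimensions": ["x", "y", "level", "forecast_time"],
        "dimension_domains": {
            "level": ["surface"],
            "forecast_time": [
                "0-2 hour acc fcst",
                "0-3 hour acc fcst",
                "0-1 day acc fcst",
            ],
        },
    }
}


def variable_covers(variables: dict, name: str, **coords: str) -> bool:
    """Return True if every requested coordinate is in the variable's domain.

    Hypothetical helper: it only reads the custom `dimension_domains` field
    used in the example metadata.
    """
    variable = variables.get(name)
    if variable is None:
        return False
    domains = variable.get("dimension_domains", {})
    return all(value in domains.get(dim, []) for dim, value in coords.items())


print(variable_covers(cube_variables, "ASNOW", level="surface", forecast_time="0-3 hour acc fcst"))
print(variable_covers(cube_variables, "ASNOW", level="2 m above ground"))
```

A client could run a check like this against the collection before searching for items, so that it only requests variable, level, and forecast time combinations that the collection declares.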
Another possible approach would be to treat the individual "bands" in the GRIB files as assets within the items. We could keep the base It would not require a very complicated application to concatenate the same assets from different items along a time dimension in xarray and it could be used without installing Here is a gist showing how you can load tiny subsets of data into xarray without any downloading logic using To make it easier to combine assets from a single item into a datacube, we could add an asset-level property that indicates which group or datacube each asset can be added to, but maybe this would just be a separate |
FWIW, over time I've become more of a "purist" that assets in STAC items should refer to actual assets, and that things like this example extracting a variable from a file are better off in an extension on the item or asset. For example, as an extension on the assets:
That leaves the assets as plain links to files, and a (not very complicated, as you say) client application could pick out the extra metadata if they want to construct the efficient range queries. |
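To make the client side of this concrete, here is a small sketch that turns byte-range metadata into an HTTP Range request using only the standard library. The `offset` and `length` key names and the `range_request` helper are assumptions for illustration; this thread does not settle on a schema for the extra metadata. The URL and byte range are borrowed from the example item quoted elsewhere in this thread.

```python
from urllib.request import Request


def range_request(href: str, offset: int, length: int) -> Request:
    """Build (but do not send) a GET request for bytes [offset, offset + length)."""
    last_byte = offset + length - 1  # HTTP byte ranges are inclusive on both ends
    return Request(href, headers={"Range": f"bytes={offset}-{last_byte}"})


# Hypothetical per-message metadata stored alongside a plain file asset.
message = {"offset": 238534, "length": 138642}
href = "https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf00.grib2"

request = range_request(href, message["offset"], message["length"])
print(request.get_header("Range"))
```

The asset `href` stays a plain link to the file, and only clients that understand the extra fields need to build the range query.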
@TomAugspurger thanks for the suggestion. I tried the slices as assets approach in #5 and it seems a bit unwieldy. Here is a snippet from the example output in #5:
"assets": {
"grib": {
"href": "https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf00.grib2",
"type": "application/wmo-GRIB2",
"title": "2D Surface Levels",
"description": "2D Surface Level forecast data as a grib2 file. Subsets of the data can be loaded using the provided byte range.",
"roles": [
"data"
]
},
"index": {
"href": "https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf00.grib2.idx",
"type": "application/x-ndjson",
"title": "Index file",
"description": "The index file contains information on each message within the GRIB2 file.",
"roles": [
"index"
]
},
"REFC__entire_atmosphere__analysis": {
"href": "/vsisubfile/0_238534,/vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf00.grib2",
"type": "application/wmo-GRIB2",
"title": "REFC - entire atmosphere - analysis",
"hrrr:forecast_layer_type": "analysis",
"roles": [
"data"
]
},
"RETOP__cloud_top__analysis": {
"href": "/vsisubfile/238534_138642,/vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf00.grib2",
"type": "application/wmo-GRIB2",
"title": "RETOP - cloud top - analysis",
"hrrr:forecast_layer_type": "analysis",
"roles": [
"data"
]
}
}
I like how a user could take the |
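For reference, `/vsisubfile/` hrefs like the ones in the snippet could be derived from the index asset. The sketch below assumes the index uses the wgrib2-style layout `message:start_byte:d=YYYYMMDDHH:VAR:level:forecast:`. The sample lines are constructed to match the two assets shown in the snippet rather than copied from a real index file, the third line is purely illustrative, and `vsisubfile_hrefs` is a hypothetical helper. The keys keep the index's own forecast label (`anl`) instead of the `analysis` label used in the snippet.

```python
from typing import Dict, Optional


def vsisubfile_hrefs(
    idx_text: str, grib_href: str, file_size: Optional[int] = None
) -> Dict[str, str]:
    """Map "VAR__level__forecast" keys to GDAL /vsisubfile/ paths."""
    rows = []
    for line in idx_text.strip().splitlines():
        _, start, _, variable, level, forecast = line.split(":")[:6]
        rows.append((int(start), variable, level, forecast))

    hrefs = {}
    for i, (start, variable, level, forecast) in enumerate(rows):
        # A message ends where the next one starts; the last one needs the file size.
        end = rows[i + 1][0] if i + 1 < len(rows) else file_size
        if end is None:
            continue
        key = "__".join(part.replace(" ", "_") for part in (variable, level, forecast))
        # GDAL syntax: /vsisubfile/<offset>_<size>,<filename>
        hrefs[key] = f"/vsisubfile/{start}_{end - start},/vsicurl/{grib_href}"
    return hrefs


# Illustrative index content (assumed format, not a verbatim copy of the real file).
sample_idx = """\
1:0:d=2024051012:REFC:entire atmosphere:anl:
2:238534:d=2024051012:RETOP:cloud top:anl:
3:377176:d=2024051012:VIL:entire atmosphere:anl:
"""
grib_href = "https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf00.grib2"

hrefs = vsisubfile_hrefs(sample_idx, grib_href)
for key, value in hrefs.items():
    print(key, value)
```

Because the index only records start offsets, the size of the final message cannot be computed without the total file size, so the sketch skips it unless `file_size` is provided.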
Here is an example of @TomAugspurger's asset-level properties idea: |
I just caught up on these comments and had some questions which I just discussed with @hrodmn
We could still benefit from having some answers to these questions from a HRRR data provider or expert:
|
One more high-level question (since I haven't had time to review the PR yet), related to data cubes, STAC items, and chunk manifests / virtual Zarr. IIUC, by storing the byte range offsets of each message in https://github.com/developmentseed/noaa-hrrr/blob/01a335d258b40a2aa9cb2dc073cdb5d7bc3c3e75/examples/hrrr-conus-sfc-2024-05-10T12-FH2/hrrr-conus-sfc-2024-05-10T12-FH2.json#L59-L95, it should be relatively easy (and fast) to form a datacube from a STAC search. Given
We should be able to make a datacube by
This seems pretty useful. It costs extra time when creating the STAC metadata (to extract the offsets) and extra space for storage and transmission of the STAC items. Do folks think the benefit is worth it? Assuming that's a good idea, I'm struggling a bit with how to expose these datacubes through STAC. Some options I see:
import pystac_client
import xarray as xr

client = pystac_client.Client.open(...)
collection = client.get_collection("noaa-hrrr")
items = client.search(collections=collection.id, datetime=[start, end]).items()
# `collection.datacubes` and `ChunkThing` are hypothetical; neither exists yet
search = collection.datacubes["ASNOW:surface:0-45 acc fcst:"]  # get the search string
datasets = [ChunkThing.from_asset(item.assets["data"], search=search) for item in items]
ds = xr.concat(datasets, dim="time")  # "time" is an assumed dimension name
We could work on higher-level wrappers, but the idea is that the STAC metadata has all the information needed to combine Items into a datacube.
Sorry for the ramble-y post. Still working through this.
@TomAugspurger thanks for your thoughts.
Your vision for an application that can use the collection-level datacube schema and the sub-asset metadata to quickly construct an xarray is exactly what I think we want to be able to build.
In this case, I am not very concerned about the overhead for STAC metadata generation and storage because we are not dealing with that many items. For any forecast hour there are just two items (conus + alaska). For a whole day's worth of forecasts there are 848 items.
After a lot of trial and error over the last few weeks, here is an outline of an approach that I think we should take for v0.1:
Collection(s)
Items
|
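One way to arrive at the figure of 848 items per day is sketched below. It assumes one item per region, model cycle, and forecast hour for a single product, that CONUS runs every hour and Alaska every three hours, and that in both regions the 00, 06, 12, and 18 cycles extend to 48 hours while the others extend to 18 hours. The Alaska horizons in particular are an assumption; the thread only states the CONUS schedule and the three-hourly Alaska cadence.

```python
# Cycles that get the longer 48-hour forecast (assumed to apply to both regions).
EXTENDED_CYCLES = {0, 6, 12, 18}


def items_per_day(cycle_hours) -> int:
    """Count items for one region, assuming one item per cycle and forecast hour."""
    total = 0
    for cycle in cycle_hours:
        horizon = 48 if cycle in EXTENDED_CYCLES else 18
        total += horizon + 1  # forecast hours 0..horizon inclusive
    return total


conus = items_per_day(range(0, 24))      # hourly cycles: 4 * 49 + 20 * 19
alaska = items_per_day(range(0, 24, 3))  # three-hourly cycles: 4 * 49 + 4 * 19
print(conus, alaska, conus + alaska)
```

Under these assumptions the CONUS and Alaska counts are 576 and 272, which sum to the 848 items quoted above.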
There is a simpler way to do targeted reads from GRIB files with GDAL's driver. You can use GDAL's vrt connection string syntax and the GRIB message indices like bands in a
from pystac import Asset

def format_vrt_path(grib_asset: Asset, grib_layer_key: str) -> str:
    # Look up the GRIB message number recorded for the requested layer
    layer = grib_asset.extra_fields["grib:layers"][grib_layer_key]
    # GDAL exposes each GRIB message as a band, so select it with ?bands=
    return f"vrt:///vsicurl/{grib_asset.href}?bands={layer['grib_message']}"
where
I could go either way, but lean towards the same collection (assuming the only difference between the two is the AOI covered. If there are differences in the "schema" then Alaska can go on its own)
I'm not sure, but I lean towards the same collection? I see your comment about the different use-cases, which is compelling. Can those two use cases be supported by including some condition on the forecast hour set in the query that gets the list of STAC items?
I don't have strong thoughts. Seems fine to defer this.
Happy to split out "problematic" items into their own collections if needed. But just to confirm: is this |
The only substantial difference between Alaska and CONUS is that Alaska only gets a fresh model run every three hours while CONUS gets a fresh model run every hour. In addition to the non-overlapping spatial extents for the two regions, I added an item-level property
The item-level property
The way I have it set up right now, every product is in a separate collection. The reason I would consider splitting the |
re:CONUS / Alaska discussion: If the datacube extension metadata is a priority, we will probably need to store CONUS and Alaska in separate collections. |
I'm not the STAC expert here, but it seems like having distinct spatial extents and different projections is a good enough reason to put Alaska and CONUS into separate collections. Cool to learn about the vrt:// syntax!
Thanks all for your feedback and guidance on the best structure - I think we landed on a workable format for the HRRR data. |
Background
The NOAA HRRR data (Planetary Computer catalog entry, NOAA product page) is a continuously updated forecast data product. The data are currently housed in Azure but are not yet cataloged. Development Seed is working on improving the accessibility of the data for analytical applications.
Data structure
(00-23) get an 18-hour forecast
00, 06, 12, 18 get a 48-hour forecast
(subh) gets 15 minute forecasts (four per hour per attribute)
Goal
Generate STAC metadata that supports these use-cases:
xarray
Relevant links
Here are a few example collections that confront the same problem:
There is a great discussion about how to handle forecast data in a STAC collection here: Best practices for operational forecast data radiantearth/stac-spec#1169
There is a proposed forecast extension here: https://github.com/stac-extensions/forecast
Here is a notebook with some details about the structure of the data and examples for using it: https://github.com/microsoft/AIforEarthDataSets/blob/main/data/noaa-hrrr.ipynb
Proposed solution
forecast extension instead of hrrr:reference_time
datetime property
Item metadata
Example STAC item metadata for a GRIB2 Item:
Questions
subh variable?
raster:bands e.g. https://github.com/stactools-packages/noaa-gefs/blob/main/src/stactools/noaa_gefs/stac.py#L335
kerchunk:indices property be useful for our datacube work?
cc @zacharyDez @abarciauskas-bgse @sharkinsspatial @vincentsarago @TomAugspurger