Ecosystem section #123

Merged
merged 14 commits into from
Jul 8, 2022
Changes from 6 commits
266 changes: 258 additions & 8 deletions intermediate/xarray_ecosystem.ipynb
@@ -10,7 +10,20 @@
"\n",
Line #2.    import cf_xarray.units  # must be imported before pint_xarray

the import order is not important anymore with recent versions of pint, so you should probably remove the comment



"Xarray is easily extensible.\n",
"This means it is easy to build custom packages on top of it that tackle particular computational problems.\n",
"Here we introduce two popular and widely used extensions that are installable as their own packages (via conda and pip).\n",
"\n",
"These packages can plug in to xarray in various different ways. They may build directly on top of xarray, or they may take advantage of some of xarray's dedicated interfacing features:\n",
"- Accessors\n",
"- Backend (filetype) entrypoint\n",
"- Duck-array wrapping interface\n",
"- Metadata attributes\n",
"- Flexible indexes (coming soon!)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we introduce two popular and widely used extensions that are installable as their own packages (via conda and pip). These packages integrate with xarray using one or more of the features mentioned above.\n",
"\n",
"- [rioxarray](https://corteva.github.io/rioxarray/stable/index.html), for working with geospatial raster data using rasterio\n",
"- [pint-xarray](https://pint-xarray.readthedocs.io/en/latest/), for unit-aware computations using pint."
@@ -20,16 +33,141 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The accessor interface\n",
"## (Preface) The accessor interface\n",
dcherian marked this conversation as resolved.
"\n",
"Before we look at the packages we need to briefly introduce a feature they commonly use: [\"xarray accessors\"](https://docs.xarray.dev/en/stable/internals/extending-xarray.html).\n",
"\n",
"An accessor is a way of attaching a custom function to xarray types so that it can be called as if it were a method, but while retaining a clear separation between \"core\" xarray API and custom API."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, imagine you're a statistician who regularly uses a special `skewness` function which acts on `DataArray` objects but is only of interest to people in your specific field.\n",
"\n",
"You can create a method which applies this skewness function to an xarray object, and then register the method under a custom `stats` accessor like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import skew\n",
"\n",
"import xarray as xr\n",
"\n",
"\n",
"@xr.register_dataarray_accessor(\"stats\")\n",
"class StatsAccessor:\n",
"    def __init__(self, da):\n",
"        self._da = da\n",
"\n",
"    def skewness(self, dim):\n",
"        return self._da.reduce(func=skew, dim=dim)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can conveniently access this functionality via the `stats` accessor"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import xarray as xr\n",
"\n",
"- Describe what an accessor is before introducing `.rio`, `.pint` etc."
"ds = xr.tutorial.load_dataset(\"air_temperature\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds['air'].stats.skewness(dim=\"time\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how the presence of `.stats` clearly differentiates our new \"accessor method\" from core xarray methods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This accessor-style syntax is used heavily by the other libraries we are about to cover."
]
},
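To make the registration step less magical, here is a minimal pure-Python sketch of how a decorator can attach an accessor namespace to a class. It is an illustration only, not xarray's actual implementation: `CachedAccessor`, `register_accessor`, and `ToyArray` are hypothetical names invented for this example, and the toy class merely stands in for a `DataArray`.

```python
class CachedAccessor:
    """Descriptor that builds the accessor on first use and caches it per object."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Accessed on the class itself: return the accessor class.
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Cache on the instance so later lookups bypass this descriptor.
        obj.__dict__[self._name] = accessor
        return accessor


def register_accessor(name, target_cls):
    """Hypothetical decorator factory: expose accessor_cls as target_cls.<name>."""

    def decorator(accessor_cls):
        setattr(target_cls, name, CachedAccessor(name, accessor_cls))
        return accessor_cls

    return decorator


class ToyArray:
    """Hypothetical stand-in for a DataArray: wraps a list of numbers."""

    def __init__(self, values):
        self.values = list(values)


@register_accessor("stats", ToyArray)
class StatsAccessor:
    def __init__(self, arr):
        self._arr = arr

    def mean(self):
        return sum(self._arr.values) / len(self._arr.values)


arr = ToyArray([1.0, 2.0, 3.0])
result = arr.stats.mean()
```

The custom method lives under `arr.stats` rather than directly on `ToyArray`, which is the same separation between core API and custom API that the `.stats` accessor above provides.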
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## hvplot via accessors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [HoloViews library](https://holoviews.org/) makes great use of accessors to allow seamless plotting of xarray data using a completely different plotting backend."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We first need to import the code that registers the hvplot accessor"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import hvplot.xarray"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now we can call the `.hvplot` method to plot using holoviews in the same way that we would have used `.plot` to plot using matplotlib."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds['air'].isel(time=1).hvplot(cmap=\"fire\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Rioxarray\n",
"For some more examples of how powerful holoviews is [see here](https://tutorial.xarray.dev/intermediate/hvplot.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Rioxarray via the backend entrypoint\n",
"more details about rioxarray here"
]
},
@@ -47,9 +185,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"source_hidden": true
}
"tags": []
},
"outputs": [],
"source": [
@@ -60,7 +196,100 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pint\n",
"## cf-xarray via metadata attributes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Xarray objects can store [arbitrary metadata](https://docs.xarray.dev/en/stable/getting-started-guide/faq.html#what-is-your-approach-to-metadata) in the form of a `dict` attached to each `DataArray` and `Dataset` object, accessible via the `.attrs` property."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xr.DataArray(name=\"Hitchhiker\", data=0, attrs={\"life\": 42, \"name\": \"Arthur Dent\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Normally xarray operations ignore this metadata, simply carting it around until you explicitly choose to use it. However sometimes we might want to write custom code which makes use of the metadata."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[cf_xarray](https://cf-xarray.readthedocs.io/) is a project that tries to\n",
"let you make use of other [Climate and Forecast metadata convention attributes](http://cfconventions.org/) (or \"CF attributes\") that xarray ignores. It attaches itself\n",
"to all xarray objects under the `.cf` namespace.\n",
"\n",
"Where xarray allows you to specify dimension names for analysis, `cf_xarray`\n",
"lets you specify logical names like `\"latitude\"` or `\"longitude\"` instead as\n",
"long as the appropriate CF attributes are set.\n",
"\n",
"For example, the `\"longitude\"` dimension in different files might be labelled as: (lon, LON, long, x…), but cf_xarray lets you always refer to the logical name `\"longitude\"` in your code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cf_xarray"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# describe cf attributes in dataset\n",
"ds.air.cf.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following `mean` operation will work with any dataset that has appropriate\n",
"attributes set that allow detection of the \"latitude\" variable (e.g.\n",
"`units: \"degrees_north\"` or `standard_name: \"latitude\"`)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# demonstrate equivalent of .mean(\"lat\")\n",
"ds.air.cf.mean(\"latitude\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# demonstrate indexing\n",
"ds.air.cf.sel(longitude=242.5, method=\"nearest\")"
]
},
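The attribute-based detection described above can be sketched in plain Python. This is a deliberately simplified illustration, not cf_xarray's actual logic: the `CRITERIA` table and `find_by_logical_name` are hypothetical, and variables are represented as a plain dict mapping each name to its attribute dict.

```python
# Hypothetical, much-simplified criteria for two logical names.
# A variable matches if ANY listed attribute has one of the allowed values.
CRITERIA = {
    "latitude": {
        "standard_name": {"latitude"},
        "units": {"degrees_north", "degree_north"},
    },
    "longitude": {
        "standard_name": {"longitude"},
        "units": {"degrees_east", "degree_east"},
    },
}


def find_by_logical_name(variables, logical_name):
    """Return the names of variables whose attrs match the logical name."""
    criteria = CRITERIA[logical_name]
    matches = []
    for name, attrs in variables.items():
        # attrs.get returns None for a missing key, which is never allowed.
        if any(attrs.get(key) in allowed for key, allowed in criteria.items()):
            matches.append(name)
    return matches


# Differently labelled variables, as they might appear in different files.
variables = {
    "lat": {"standard_name": "latitude", "units": "degrees_north"},
    "LON": {"units": "degrees_east"},
    "time": {"standard_name": "time"},
}

lat_names = find_by_logical_name(variables, "latitude")
lon_names = find_by_logical_name(variables, "longitude")
```

Here `"LON"` is found from its units alone, which is why code written against the logical name `"longitude"` can keep working when the variable is labelled differently.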
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pint via duck array wrapping\n",
"\n",
"more details about pint here\n"
]
@@ -77,6 +306,27 @@
"\n",
"xr.set_options(display_expand_data=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The wider world..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many other libraries in the wider xarray ecosystem. For a list of a few packages we particularly like for geoscience work [see here](https://tutorial.xarray.dev/overview/xarray-in-45-min.html#other-cool-packages), and for a [more exhaustive list see here](https://docs.xarray.dev/en/stable/ecosystem.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {