
Commit 79cfe4d

Merge remote-tracking branch 'origin/main' into backend-indexing

* origin/main:
  clean up the upstream-dev setup script (#8986)
  Skip flaky `test_open_mfdataset_manyfiles` test (#8989)
  Remove `.drop` warning allow (#8988)
  Add notes on when to add ignores to warnings (#8987)
  Docstring and documentation improvement for the Dataset class (#8973)

2 parents: e96e70e + bfcb0a7

File tree: 5 files changed (+92, -79 lines)

ci/install-upstream-wheels.sh

+4-22
@@ -1,7 +1,5 @@
 #!/usr/bin/env bash
 
-# install cython for building cftime without build isolation
-micromamba install "cython>=0.29.20" py-cpuinfo setuptools-scm
 # temporarily (?) remove numbagg and numba
 micromamba remove -y numba numbagg sparse
 # temporarily remove numexpr
@@ -18,10 +16,9 @@ micromamba remove -y --force \
     zarr \
     cftime \
     packaging \
-    pint \
     bottleneck \
-    flox \
-    numcodecs
+    flox
+    # pint
 # to limit the runtime of Upstream CI
 python -m pip install \
     -i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
@@ -42,32 +39,17 @@ python -m pip install \
     --pre \
     --upgrade \
     pyarrow
-# without build isolation for packages compiling against numpy
-# TODO: remove once there are `numpy>=2.0` builds for these
-python -m pip install \
-    --no-deps \
-    --upgrade \
-    --no-build-isolation \
-    git+https://github.com/Unidata/cftime
-python -m pip install \
-    --no-deps \
-    --upgrade \
-    --no-build-isolation \
-    git+https://github.com/zarr-developers/numcodecs
-python -m pip install \
-    --no-deps \
-    --upgrade \
-    --no-build-isolation \
-    git+https://github.com/pydata/bottleneck
 python -m pip install \
     --no-deps \
     --upgrade \
     git+https://github.com/dask/dask \
     git+https://github.com/dask/dask-expr \
     git+https://github.com/dask/distributed \
     git+https://github.com/zarr-developers/zarr \
+    git+https://github.com/Unidata/cftime \
     git+https://github.com/pypa/packaging \
     git+https://github.com/hgrecco/pint \
+    git+https://github.com/pydata/bottleneck \
     git+https://github.com/intake/filesystem_spec \
     git+https://github.com/SciTools/nc-time-axis \
     git+https://github.com/xarray-contrib/flox \

doc/user-guide/data-structures.rst

+33-20
@@ -282,27 +282,40 @@ variables (``data_vars``), coordinates (``coords``) and attributes (``attrs``).
 
 - ``attrs`` should be a dictionary.
 
-Let's create some fake data for the example we show above:
+Let's create some fake data for the example we show above. In this
+example dataset, we will represent measurements of the temperature and
+precipitation that were made under various conditions:
+
+* the measurements were made on four different days;
+* they were made at two separate locations, which we will represent using
+  their latitude and longitude; and
+* they were made using instruments by three different manufacturers, which we
+  will refer to as `'manufac1'`, `'manufac2'`, and `'manufac3'`.
 
 .. ipython:: python
 
-    temp = 15 + 8 * np.random.randn(2, 2, 3)
-    precip = 10 * np.random.rand(2, 2, 3)
-    lon = [[-99.83, -99.32], [-99.79, -99.23]]
-    lat = [[42.25, 42.21], [42.63, 42.59]]
+    np.random.seed(0)
+    temperature = 15 + 8 * np.random.randn(2, 3, 4)
+    precipitation = 10 * np.random.rand(2, 3, 4)
+    lon = [-99.83, -99.32]
+    lat = [42.25, 42.21]
+    instruments = ["manufac1", "manufac2", "manufac3"]
+    time = pd.date_range("2014-09-06", periods=4)
+    reference_time = pd.Timestamp("2014-09-05")
 
     # for real use cases, its good practice to supply array attributes such as
     # units, but we won't bother here for the sake of brevity
     ds = xr.Dataset(
         {
-            "temperature": (["x", "y", "time"], temp),
-            "precipitation": (["x", "y", "time"], precip),
+            "temperature": (["loc", "instrument", "time"], temperature),
+            "precipitation": (["loc", "instrument", "time"], precipitation),
         },
         coords={
-            "lon": (["x", "y"], lon),
-            "lat": (["x", "y"], lat),
-            "time": pd.date_range("2014-09-06", periods=3),
-            "reference_time": pd.Timestamp("2014-09-05"),
+            "lon": (["loc"], lon),
+            "lat": (["loc"], lat),
+            "instrument": instruments,
+            "time": time,
+            "reference_time": reference_time,
         },
     )
     ds
@@ -387,12 +400,12 @@ example, to create this example dataset from scratch, we could have written:
 .. ipython:: python
 
     ds = xr.Dataset()
-    ds["temperature"] = (("x", "y", "time"), temp)
-    ds["temperature_double"] = (("x", "y", "time"), temp * 2)
-    ds["precipitation"] = (("x", "y", "time"), precip)
-    ds.coords["lat"] = (("x", "y"), lat)
-    ds.coords["lon"] = (("x", "y"), lon)
-    ds.coords["time"] = pd.date_range("2014-09-06", periods=3)
+    ds["temperature"] = (("loc", "instrument", "time"), temperature)
+    ds["temperature_double"] = (("loc", "instrument", "time"), temperature * 2)
+    ds["precipitation"] = (("loc", "instrument", "time"), precipitation)
+    ds.coords["lat"] = (("loc",), lat)
+    ds.coords["lon"] = (("loc",), lon)
+    ds.coords["time"] = pd.date_range("2014-09-06", periods=4)
     ds.coords["reference_time"] = pd.Timestamp("2014-09-05")
 
 To change the variables in a ``Dataset``, you can use all the standard dictionary
@@ -452,8 +465,8 @@ follow nested function calls:
 
     # these lines are equivalent, but with pipe we can make the logic flow
     # entirely from left to right
-    plt.plot((2 * ds.temperature.sel(x=0)).mean("y"))
-    (ds.temperature.sel(x=0).pipe(lambda x: 2 * x).mean("y").pipe(plt.plot))
+    plt.plot((2 * ds.temperature.sel(loc=0)).mean("instrument"))
+    (ds.temperature.sel(loc=0).pipe(lambda x: 2 * x).mean("instrument").pipe(plt.plot))
 
 Both ``pipe`` and ``assign`` replicate the pandas methods of the same names
 (:py:meth:`DataFrame.pipe <pandas.DataFrame.pipe>` and
@@ -479,7 +492,7 @@ dimension and non-dimension variables:
 
 .. ipython:: python
 
-    ds.coords["day"] = ("time", [6, 7, 8])
+    ds.coords["day"] = ("time", [6, 7, 8, 9])
     ds.swap_dims({"time": "day"})
 
 .. _coordinates:
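The updated example data can be sanity-checked outside the docs build. Here is a minimal sketch using only numpy and pandas (no xarray), verifying that each data variable matches the sizes implied by the ``loc``, ``instrument``, and ``time`` dimensions described above:

```python
import numpy as np
import pandas as pd

# reproduce the example arrays from the updated docs: 2 locations,
# 3 instrument manufacturers, 4 days of measurements
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 3, 4)
precipitation = 10 * np.random.rand(2, 3, 4)
lon = [-99.83, -99.32]
lat = [42.25, 42.21]
instruments = ["manufac1", "manufac2", "manufac3"]
time = pd.date_range("2014-09-06", periods=4)

# each data variable must match the (loc, instrument, time) dimension sizes
assert temperature.shape == (len(lon), len(instruments), len(time))
assert precipitation.shape == temperature.shape
```

Any array whose shape disagrees with these coordinate lengths would be rejected by the ``xr.Dataset`` constructor, which is why the docs change the coordinates and the random arrays together.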

pyproject.toml

+14-11
@@ -288,18 +288,22 @@ addopts = ["--strict-config", "--strict-markers"]
 # - Converts any warning from xarray into an error
 # - Allows some warnings ("default") which the test suite currently raises,
 #   since it wasn't practical to fix them all before merging this config. The
-#   arnings are still listed in CI (since it uses `default`, not `ignore`).
+#   warnings are reported in CI (since it uses `default`, not `ignore`).
 #
-# We can remove these rules allowing warnings; a valued contribution is removing
-# a line, seeing what breaks, and then fixing the library code or tests so that
-# it doesn't raise warnings.
+# Over time, we can remove these rules allowing warnings. A valued contribution
+# is removing a line, seeing what breaks, and then fixing the library code or
+# tests so that it doesn't raise warnings.
 #
-# While we only raise an error on warnings from within xarray, if dependency
-# raises a warning with a stacklevel such that it's interpreted to be raised
-# from xarray, please feel free to add a rule switching it to `default` here.
-#
-# If these settings get in the way of making progress, it's also acceptable to
-# temporarily add additional ignores.
+# There are some instances where we'll want to add to these rules:
+# - While we only raise errors on warnings from within xarray, a dependency can
+#   raise a warning with a stacklevel such that it's interpreted to be raised
+#   from xarray and this will mistakenly convert it to an error. If that
+#   happens, please feel free to add a rule switching it to `default` here, and
+#   disabling the error.
+# - If these settings get in the way of making progress, it's also acceptable to
+#   temporarily add additional `default` rules.
+# - But we should only add `ignore` rules if we're confident that we'll never
+#   need to address a warning.
 
 filterwarnings = [
     "error:::xarray.*",
@@ -315,7 +319,6 @@ filterwarnings = [
     "default:deallocating CachingFileManager:RuntimeWarning:xarray.backends.netCDF4_",
     "default:deallocating CachingFileManager:RuntimeWarning:xarray.core.indexing",
     "default:Failed to decode variable.*NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays:DeprecationWarning",
-    "default:dropping variables using `drop` is deprecated; use drop_vars:DeprecationWarning:xarray.tests.test_groupby",
     "default:The `interpolation` argument to quantile was renamed to `method`:FutureWarning:xarray.*",
     "default:invalid value encountered in cast:RuntimeWarning:xarray.core.duck_array_ops",
     "default:invalid value encountered in cast:RuntimeWarning:xarray.conventions",
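The comment about dependency warnings hinges on Python's ``stacklevel`` mechanics: a library that warns with ``stacklevel=2`` attributes the warning to its caller, so a warning raised inside a dependency can be reported as coming from an xarray module and then tripped by the ``"error:::xarray.*"`` rule. A minimal stdlib sketch (the function names are made up for illustration):

```python
import warnings

def dependency_api():
    # stacklevel=2 points the warning at the *caller's* line, which is how a
    # dependency's deprecation warning can appear to originate from xarray code
    warnings.warn("old behaviour is deprecated", DeprecationWarning, stacklevel=2)

def xarray_like_caller():
    dependency_api()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    xarray_like_caller()

# the recorded warning is attributed to this module (the caller's frame),
# not to the file where dependency_api is defined
print(caught[0].category.__name__, caught[0].filename)
```

Because pytest's ``filterwarnings`` matches on the warning's attributed module, such a warning needs its own ``default`` rule rather than a fix in xarray itself.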

xarray/core/dataset.py

+38-23
@@ -590,60 +590,75 @@ class Dataset(
 
     Examples
     --------
-    Create data:
+    In this example dataset, we will represent measurements of the temperature
+    and precipitation that were made under various conditions:
+
+    * the measurements were made on four different days;
+    * they were made at two separate locations, which we will represent using
+      their latitude and longitude; and
+    * they were made using three instruments developed by three different
+      manufacturers, which we will refer to using the strings `'manufac1'`,
+      `'manufac2'`, and `'manufac3'`.
 
     >>> np.random.seed(0)
-    >>> temperature = 15 + 8 * np.random.randn(2, 2, 3)
-    >>> precipitation = 10 * np.random.rand(2, 2, 3)
-    >>> lon = [[-99.83, -99.32], [-99.79, -99.23]]
-    >>> lat = [[42.25, 42.21], [42.63, 42.59]]
-    >>> time = pd.date_range("2014-09-06", periods=3)
+    >>> temperature = 15 + 8 * np.random.randn(2, 3, 4)
+    >>> precipitation = 10 * np.random.rand(2, 3, 4)
+    >>> lon = [-99.83, -99.32]
+    >>> lat = [42.25, 42.21]
+    >>> instruments = ["manufac1", "manufac2", "manufac3"]
+    >>> time = pd.date_range("2014-09-06", periods=4)
     >>> reference_time = pd.Timestamp("2014-09-05")
 
-    Initialize a dataset with multiple dimensions:
+    Here, we initialize the dataset with multiple dimensions. We use the string
+    `"loc"` to represent the location dimension of the data, the string
+    `"instrument"` to represent the instrument manufacturer dimension, and the
+    string `"time"` for the time dimension.
 
     >>> ds = xr.Dataset(
     ...     data_vars=dict(
-    ...         temperature=(["x", "y", "time"], temperature),
-    ...         precipitation=(["x", "y", "time"], precipitation),
+    ...         temperature=(["loc", "instrument", "time"], temperature),
+    ...         precipitation=(["loc", "instrument", "time"], precipitation),
    ...     ),
     ...     coords=dict(
-    ...         lon=(["x", "y"], lon),
-    ...         lat=(["x", "y"], lat),
+    ...         lon=("loc", lon),
+    ...         lat=("loc", lat),
+    ...         instrument=instruments,
     ...         time=time,
     ...         reference_time=reference_time,
     ...     ),
     ...     attrs=dict(description="Weather related data."),
     ... )
     >>> ds
-    <xarray.Dataset> Size: 288B
-    Dimensions:         (x: 2, y: 2, time: 3)
+    <xarray.Dataset> Size: 552B
+    Dimensions:         (loc: 2, instrument: 3, time: 4)
     Coordinates:
-        lon             (x, y) float64 32B -99.83 -99.32 -99.79 -99.23
-        lat             (x, y) float64 32B 42.25 42.21 42.63 42.59
-      * time            (time) datetime64[ns] 24B 2014-09-06 2014-09-07 2014-09-08
+        lon             (loc) float64 16B -99.83 -99.32
+        lat             (loc) float64 16B 42.25 42.21
+      * instrument      (instrument) <U8 96B 'manufac1' 'manufac2' 'manufac3'
+      * time            (time) datetime64[ns] 32B 2014-09-06 ... 2014-09-09
        reference_time  datetime64[ns] 8B 2014-09-05
-    Dimensions without coordinates: x, y
+    Dimensions without coordinates: loc
    Data variables:
-        temperature     (x, y, time) float64 96B 29.11 18.2 22.83 ... 16.15 26.63
-        precipitation   (x, y, time) float64 96B 5.68 9.256 0.7104 ... 4.615 7.805
+        temperature     (loc, instrument, time) float64 192B 29.11 18.2 ... 9.063
+        precipitation   (loc, instrument, time) float64 192B 4.562 5.684 ... 1.613
     Attributes:
        description:  Weather related data.
 
     Find out where the coldest temperature was and what values the
     other variables had:
 
     >>> ds.isel(ds.temperature.argmin(...))
-    <xarray.Dataset> Size: 48B
+    <xarray.Dataset> Size: 80B
    Dimensions:         ()
     Coordinates:
        lon             float64 8B -99.32
        lat             float64 8B 42.21
-        time            datetime64[ns] 8B 2014-09-08
+        instrument      <U8 32B 'manufac3'
+        time            datetime64[ns] 8B 2014-09-06
        reference_time  datetime64[ns] 8B 2014-09-05
    Data variables:
-        temperature     float64 8B 7.182
-        precipitation   float64 8B 8.326
+        temperature     float64 8B -5.424
+        precipitation   float64 8B 9.884
    Attributes:
        description:  Weather related data.
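The ``ds.isel(ds.temperature.argmin(...))`` result in the updated docstring can be cross-checked with plain numpy: reduce over all dimensions, then recover the per-dimension indices of the coldest measurement from the same seeded data.

```python
import numpy as np

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 3, 4)  # (loc, instrument, time)

# numpy analogue of ds.temperature.argmin(...) followed by ds.isel:
# flat argmin over the whole array, then unravel into per-dimension indices
loc_i, inst_i, time_i = np.unravel_index(temperature.argmin(), temperature.shape)
coldest = temperature[loc_i, inst_i, time_i]

# matches the docstring output: second location, instrument index 2
# ('manufac3'), first day, temperature -5.424
print((loc_i, inst_i, time_i), round(coldest, 3))
```

This is why the docstring's coldest point lands on ``instrument 'manufac3'`` and ``time 2014-09-06``: with ``np.random.seed(0)`` the minimum of the seeded array sits at index ``(1, 2, 0)``.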
xarray/tests/test_backends.py

+3-3
@@ -3813,11 +3813,11 @@ def skip_if_not_engine(engine):
     pytest.importorskip(engine)
 
 
-# Flaky test. Very open to contributions on fixing this
 @requires_dask
 @pytest.mark.filterwarnings("ignore:use make_scale(name) instead")
-@pytest.mark.xfail(reason="Flaky test. Very open to contributions on fixing this")
-@pytest.mark.skipif(ON_WINDOWS, reason="Skipping on Windows")
+@pytest.mark.skip(
+    reason="Flaky test which can cause the worker to crash (so don't xfail). Very open to contributions fixing this"
+)
 def test_open_mfdataset_manyfiles(
     readengine, nfiles, parallel, chunks, file_cache_maxsize
 ):
