forked from pydata/xarray

Commit 1498c35

Merge remote-tracking branch 'upstream/main' into init-zarr
* upstream/main:
  Faster encoding functions. (pydata#8565)
  ENH: vendor SerializableLock from dask and use as default backend lock, adapt tests (pydata#8571)
  Silence a bunch of CachingFileManager warnings (pydata#8584)
  Bump actions/download-artifact from 3 to 4 (pydata#8556)
  Minimize duplication in `map_blocks` task graph (pydata#8412)
  [pre-commit.ci] pre-commit autoupdate (pydata#8578)
  ignore a `DeprecationWarning` emitted by `seaborn` (pydata#8576)
  Fix mypy type ignore (pydata#8564)
  Support for the new compression arguments. (pydata#7551)
  FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552)
2 parents ce3b17d + 5f1f78f commit 1498c35

19 files changed (+419, -152 lines)

.github/workflows/ci-additional.yaml (+4, -5)

@@ -76,11 +76,10 @@ jobs:
       # Raise an error if there are warnings in the doctests, with `-Werror`.
       # This is a trial; if it presents an problem, feel free to remove.
       # See https://github.com/pydata/xarray/issues/7164 for more info.
-
-      # ignores:
-      # 1. h5py: see https://github.com/pydata/xarray/issues/8537
-      python -m pytest --doctest-modules xarray --ignore xarray/tests -Werror \
-        -W "ignore:h5py is running against HDF5 1.14.3:UserWarning"
+      #
+      # If dependencies emit warnings we can't do anything about, add ignores to
+      # `xarray/tests/__init__.py`.
+      python -m pytest --doctest-modules xarray --ignore xarray/tests -Werror

   mypy:
     name: Mypy
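The `-W` flag removed in the hunk above uses Python's standard warning-filter syntax (`action:message:category`); the same mechanics are available through the `warnings` module. A minimal sketch, mimicking pytest's `-Werror` plus one targeted ignore (the warning text is the one from the removed flag; the surrounding code is illustrative, not part of the CI change):

```python
import warnings

suppressed = True
with warnings.catch_warnings():
    warnings.simplefilter("error")  # like pytest's -Werror: any warning raises
    # Targeted ignore, equivalent to
    # -W "ignore:h5py is running against HDF5 1.14.3:UserWarning"
    warnings.filterwarnings(
        "ignore",
        message="h5py is running against HDF5 1.14.3",
        category=UserWarning,
    )
    try:
        warnings.warn("h5py is running against HDF5 1.14.3 in use", UserWarning)
    except UserWarning:
        suppressed = False  # would only happen if the ignore filter failed

errored = False
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        warnings.warn("some other warning", UserWarning)  # no ignore matches
    except UserWarning:
        errored = True  # -Werror turns it into an exception
```

Filters added with `filterwarnings` are inserted in front of the `error` filter, which is why the targeted ignore wins for matching messages only.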

.github/workflows/pypi-release.yaml (+3, -3)

@@ -54,7 +54,7 @@ jobs:
         name: Install Python
         with:
           python-version: "3.11"
-      - uses: actions/download-artifact@v3
+      - uses: actions/download-artifact@v4
         with:
           name: releases
           path: dist
@@ -82,7 +82,7 @@ jobs:
       id-token: write

     steps:
-      - uses: actions/download-artifact@v3
+      - uses: actions/download-artifact@v4
        with:
          name: releases
          path: dist
@@ -106,7 +106,7 @@ jobs:
      id-token: write

    steps:
-      - uses: actions/download-artifact@v3
+      - uses: actions/download-artifact@v4
       with:
         name: releases
         path: dist

.pre-commit-config.yaml (+5, -5)

@@ -18,24 +18,24 @@ repos:
         files: ^xarray/
   - repo: https://github.com/astral-sh/ruff-pre-commit
     # Ruff version.
-    rev: 'v0.1.6'
+    rev: 'v0.1.9'
    hooks:
      - id: ruff
        args: ["--fix"]
   # https://github.com/python/black#version-control-integration
-  - repo: https://github.com/psf/black
-    rev: 23.11.0
+  - repo: https://github.com/psf/black-pre-commit-mirror
+    rev: 23.12.1
    hooks:
      - id: black-jupyter
   - repo: https://github.com/keewis/blackdoc
     rev: v0.3.9
     hooks:
       - id: blackdoc
         exclude: "generate_aggregations.py"
-        additional_dependencies: ["black==23.11.0"]
+        additional_dependencies: ["black==23.12.1"]
      - id: blackdoc-autoupdate-black
   - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.7.1
+    rev: v1.8.0
    hooks:
      - id: mypy
        # Copied from setup.cfg

doc/whats-new.rst (+13, -1)

@@ -26,6 +26,10 @@ New Features

 - :py:meth:`xr.cov` and :py:meth:`xr.corr` now support using weights (:issue:`8527`, :pull:`7392`).
   By `Llorenç Lledó <https://github.com/lluritu>`_.
+- Accept the compression arguments new in netCDF 1.6.0 in the netCDF4 backend.
+  See `netCDF4 documentation <https://unidata.github.io/netcdf4-python/#efficient-compression-of-netcdf-variables>`_ for details.
+  By `Markel García-Díez <https://github.com/markelg>`_. (:issue:`6929`, :pull:`7551`) Note that some
+  new compression filters needs plugins to be installed which may not be available in all netCDF distributions.

 Breaking changes
 ~~~~~~~~~~~~~~~~
@@ -38,14 +42,22 @@ Deprecations

 Bug fixes
 ~~~~~~~~~

+- Reverse index output of bottleneck's rolling move_argmax/move_argmin functions (:issue:`8541`, :pull:`8552`).
+  By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
+- Vendor `SerializableLock` from dask and use as default lock for netcdf4 backends (:issue:`8442`, :pull:`8571`).
+  By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
+

 Documentation
 ~~~~~~~~~~~~~


 Internal Changes
 ~~~~~~~~~~~~~~~~
-
+- The implementation of :py:func:`map_blocks` has changed to minimize graph size and duplication of data.
+  This should be a strict improvement even though the graphs are not always embarassingly parallel any more.
+  Please open an issue if you spot a regression. (:pull:`8412`, :issue:`8409`).
+  By `Deepak Cherian <https://github.com/dcherian>`_.
 - Remove null values before plotting. (:pull:`8535`).
   By `Jimmy Westling <https://github.com/illviljan>`_.
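The bottleneck fix listed above (pydata#8552) can be sketched without bottleneck installed. As documented by bottleneck, `move_argmax`/`move_argmin` count the offset of the extremum backwards from the rightmost (newest) element of each rolling window, while xarray's rolling argmax reports positions counted from the window start; the fix reverses the output. A pure-Python sketch (function names here are hypothetical, not xarray API):

```python
def move_argmax_bn(a, window):
    """Mimic bottleneck.move_argmax's convention for full windows:
    the index of the maximum is counted backwards from the rightmost
    element, so 0 means the newest value in the window."""
    out = []
    for i in range(window - 1, len(a)):
        win = a[i - window + 1 : i + 1]
        fwd = max(range(window), key=lambda j: win[j])  # index from window start
        out.append(window - 1 - fwd)  # convert to "from the end" convention
    return out


def reverse_index(offsets, window):
    # The gist of the pydata#8552 fix: map the backwards offsets returned by
    # bottleneck back to positions counted from the start of the window.
    return [window - 1 - o for o in offsets]


data = [3, 1, 4, 1, 5]
backward = move_argmax_bn(data, window=3)   # bottleneck-style offsets
forward = reverse_index(backward, window=3)  # what rolling argmax should report
```

For `data = [3, 1, 4, 1, 5]` the windows are `[3, 1, 4]`, `[1, 4, 1]`, `[4, 1, 5]`, so the forward positions are 2, 1, 2.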

pyproject.toml (+1)

@@ -91,6 +91,7 @@ module = [
     "cf_units.*",
     "cfgrib.*",
     "cftime.*",
+    "cloudpickle.*",
     "cubed.*",
     "cupy.*",
     "dask.types.*",

xarray/backends/locks.py (+76, -8)

@@ -2,15 +2,83 @@

 import multiprocessing
 import threading
+import uuid
 import weakref
-from collections.abc import MutableMapping
-from typing import Any
-
-try:
-    from dask.utils import SerializableLock
-except ImportError:
-    # no need to worry about serializing the lock
-    SerializableLock = threading.Lock  # type: ignore
+from collections.abc import Hashable, MutableMapping
+from typing import Any, ClassVar
+from weakref import WeakValueDictionary
+
+
+# SerializableLock is adapted from Dask:
+# https://github.com/dask/dask/blob/74e898f0ec712e8317ba86cc3b9d18b6b9922be0/dask/utils.py#L1160-L1224
+# Used under the terms of Dask's license, see licenses/DASK_LICENSE.
+class SerializableLock:
+    """A Serializable per-process Lock
+
+    This wraps a normal ``threading.Lock`` object and satisfies the same
+    interface. However, this lock can also be serialized and sent to different
+    processes. It will not block concurrent operations between processes (for
+    this you should look at ``dask.multiprocessing.Lock`` or ``locket.lock_file``)
+    but will consistently deserialize into the same lock.
+
+    So if we make a lock in one process::
+
+        lock = SerializableLock()
+
+    And then send it over to another process multiple times::
+
+        bytes = pickle.dumps(lock)
+        a = pickle.loads(bytes)
+        b = pickle.loads(bytes)
+
+    Then the deserialized objects will operate as though they were the same
+    lock, and collide as appropriate.
+
+    This is useful for consistently protecting resources on a per-process
+    level.
+
+    The creation of locks is itself not threadsafe.
+    """
+
+    _locks: ClassVar[
+        WeakValueDictionary[Hashable, threading.Lock]
+    ] = WeakValueDictionary()
+    token: Hashable
+    lock: threading.Lock
+
+    def __init__(self, token: Hashable | None = None):
+        self.token = token or str(uuid.uuid4())
+        if self.token in SerializableLock._locks:
+            self.lock = SerializableLock._locks[self.token]
+        else:
+            self.lock = threading.Lock()
+            SerializableLock._locks[self.token] = self.lock
+
+    def acquire(self, *args, **kwargs):
+        return self.lock.acquire(*args, **kwargs)
+
+    def release(self, *args, **kwargs):
+        return self.lock.release(*args, **kwargs)
+
+    def __enter__(self):
+        self.lock.__enter__()
+
+    def __exit__(self, *args):
+        self.lock.__exit__(*args)
+
+    def locked(self):
+        return self.lock.locked()
+
+    def __getstate__(self):
+        return self.token
+
+    def __setstate__(self, token):
+        self.__init__(token)
+
+    def __str__(self):
+        return f"<{self.__class__.__name__}: {self.token}>"
+
+    __repr__ = __str__


 # Locks used by multiple backends.
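The guarantee the vendored class provides — that multiple unpickled copies resolve to the same underlying lock — can be demonstrated standalone. The sketch below restates a minimal version of the class (abridged from the diff above, not the full vendored code) and shows the pickle round-trip from its docstring:

```python
import pickle
import threading
import uuid
from weakref import WeakValueDictionary


class SerializableLock:
    """Abridged restatement of the vendored class, for a standalone demo.

    Pickling stores only the token; unpickling looks the token up in a
    per-process registry and reattaches the same threading.Lock.
    """

    _locks = WeakValueDictionary()  # token -> lock; entries die with the locks

    def __init__(self, token=None):
        self.token = token or str(uuid.uuid4())
        if self.token in SerializableLock._locks:
            self.lock = SerializableLock._locks[self.token]
        else:
            self.lock = threading.Lock()
            SerializableLock._locks[self.token] = self.lock

    def acquire(self, *args, **kwargs):
        return self.lock.acquire(*args, **kwargs)

    def release(self):
        self.lock.release()

    def __getstate__(self):
        return self.token  # only the token travels through pickle

    def __setstate__(self, token):
        self.__init__(token)  # re-resolve the token to the shared lock


lock = SerializableLock()
payload = pickle.dumps(lock)
a = pickle.loads(payload)
b = pickle.loads(payload)

# All three objects wrap the very same threading.Lock, so they collide:
a.acquire()
contended = b.acquire(blocking=False)  # False while a holds the lock
a.release()
```

This is why it is safe as a default backend lock: serializing it into a task graph never silently forks it into independent locks within a process.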

xarray/backends/netCDF4_.py (+17, -8)

@@ -257,6 +257,12 @@ def _extract_nc4_variable_encoding(
         "_FillValue",
         "dtype",
         "compression",
+        "significant_digits",
+        "quantize_mode",
+        "blosc_shuffle",
+        "szip_coding",
+        "szip_pixels_per_block",
+        "endian",
     }
     if lsd_okay:
         valid_encodings.add("least_significant_digit")
@@ -497,20 +503,23 @@ def prepare_variable(
         if name in self.ds.variables:
             nc4_var = self.ds.variables[name]
         else:
-            nc4_var = self.ds.createVariable(
+            default_args = dict(
                 varname=name,
                 datatype=datatype,
                 dimensions=variable.dims,
-                zlib=encoding.get("zlib", False),
-                complevel=encoding.get("complevel", 4),
-                shuffle=encoding.get("shuffle", True),
-                fletcher32=encoding.get("fletcher32", False),
-                contiguous=encoding.get("contiguous", False),
-                chunksizes=encoding.get("chunksizes"),
+                zlib=False,
+                complevel=4,
+                shuffle=True,
+                fletcher32=False,
+                contiguous=False,
+                chunksizes=None,
                 endian="native",
-                least_significant_digit=encoding.get("least_significant_digit"),
+                least_significant_digit=None,
                 fill_value=fill_value,
             )
+            default_args.update(encoding)
+            default_args.pop("_FillValue", None)
+            nc4_var = self.ds.createVariable(**default_args)

         nc4_var.setncatts(attrs)
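The `prepare_variable` refactor above replaces per-argument `encoding.get(...)` calls with a defaults dict that the user's encoding overrides, so new keyword arguments grown by netCDF4 (such as the compression arguments added to `valid_encodings`) pass straight through without xarray enumerating each one. A minimal sketch of the pattern, with a hypothetical stub standing in for `createVariable`:

```python
def create_variable_stub(**kwargs):
    # Hypothetical stand-in for netCDF4's createVariable: just records
    # the keyword arguments it would have been called with.
    return kwargs


def prepare(encoding):
    # Defaults mirror a subset of the ones in the diff above.
    default_args = dict(
        zlib=False,
        complevel=4,
        shuffle=True,
        contiguous=False,
    )
    # User-supplied encoding overrides the defaults and may carry keys the
    # defaults never mention (e.g. the new compression arguments).
    default_args.update(encoding)
    default_args.pop("_FillValue", None)  # handled separately, never forwarded
    return create_variable_stub(**default_args)


call = prepare({"zlib": True, "blosc_shuffle": 1, "_FillValue": -9999})
```

Here `zlib` is overridden, `complevel` keeps its default, the unknown-to-the-defaults `blosc_shuffle` passes through, and `_FillValue` is stripped before the call.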

xarray/coding/strings.py (+12, -8)

@@ -47,12 +47,11 @@ class EncodedStringCoder(VariableCoder):
     def __init__(self, allows_unicode=True):
         self.allows_unicode = allows_unicode

-    def encode(self, variable, name=None):
+    def encode(self, variable: Variable, name=None) -> Variable:
         dims, data, attrs, encoding = unpack_for_encoding(variable)

         contains_unicode = is_unicode_dtype(data.dtype)
         encode_as_char = encoding.get("dtype") == "S1"
-
         if encode_as_char:
             del encoding["dtype"]  # no longer relevant

@@ -69,9 +68,12 @@ def encode(self, variable, name=None):
             # TODO: figure out how to handle this in a lazy way with dask
             data = encode_string_array(data, string_encoding)

-        return Variable(dims, data, attrs, encoding)
+            return Variable(dims, data, attrs, encoding)
+        else:
+            variable.encoding = encoding
+            return variable

-    def decode(self, variable, name=None):
+    def decode(self, variable: Variable, name=None) -> Variable:
         dims, data, attrs, encoding = unpack_for_decoding(variable)

         if "_Encoding" in attrs:
@@ -95,13 +97,15 @@ def encode_string_array(string_array, encoding="utf-8"):
     return np.array(encoded, dtype=bytes).reshape(string_array.shape)


-def ensure_fixed_length_bytes(var):
+def ensure_fixed_length_bytes(var: Variable) -> Variable:
     """Ensure that a variable with vlen bytes is converted to fixed width."""
-    dims, data, attrs, encoding = unpack_for_encoding(var)
-    if check_vlen_dtype(data.dtype) == bytes:
+    if check_vlen_dtype(var.dtype) == bytes:
+        dims, data, attrs, encoding = unpack_for_encoding(var)
         # TODO: figure out how to handle this with dask
         data = np.asarray(data, dtype=np.bytes_)
-        return Variable(dims, data, attrs, encoding)
+        return Variable(dims, data, attrs, encoding)
+    else:
+        return var


 class CharacterArrayCoder(VariableCoder):
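Both changes in this file follow the same "faster encoding" idea from pydata#8565: only build a new `Variable` when the data actually changes, and return the input object untouched on the fast path. A hypothetical pure-Python sketch of that shape (not xarray API — `encode_strings` and its arguments are invented for illustration):

```python
def encode_strings(values, needs_bytes):
    """Sketch of the fast-path pattern: re-encode only when required,
    otherwise hand back the input object itself (no copy, no rebuild)."""
    if needs_bytes:
        # Slow path: actually transform the data, producing a new container
        # (analogous to returning a freshly constructed Variable).
        return [s.encode("utf-8") for s in values]
    # Fast path: the very same object goes back to the caller.
    return values


data = ["a", "ñ"]
unchanged = encode_strings(data, needs_bytes=False)   # same list object
converted = encode_strings(data, needs_bytes=True)    # new list of bytes
```

The identity of the fast-path result (`unchanged is data`) is the point: for large arrays this avoids an unnecessary copy per coder in the encoding pipeline.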

xarray/conventions.py (+9, -7)

@@ -16,7 +16,7 @@
 )
 from xarray.core.pycompat import is_duck_dask_array
 from xarray.core.utils import emit_user_level_warning
-from xarray.core.variable import IndexVariable, Variable
+from xarray.core.variable import Variable

 CF_RELATED_DATA = (
     "bounds",
@@ -97,10 +97,10 @@ def _infer_dtype(array, name=None):


 def ensure_not_multiindex(var: Variable, name: T_Name = None) -> None:
-    if isinstance(var, IndexVariable) and isinstance(var.to_index(), pd.MultiIndex):
+    if isinstance(var._data, indexing.PandasMultiIndexingAdapter):
         raise NotImplementedError(
             f"variable {name!r} is a MultiIndex, which cannot yet be "
-            "serialized to netCDF files. Instead, either use reset_index() "
+            "serialized. Instead, either use reset_index() "
             "to convert MultiIndex levels into coordinate variables instead "
             "or use https://cf-xarray.readthedocs.io/en/latest/coding.html."
         )
@@ -647,7 +647,9 @@ def cf_decoder(
     return variables, attributes


-def _encode_coordinates(variables, attributes, non_dim_coord_names):
+def _encode_coordinates(
+    variables: T_Variables, attributes: T_Attrs, non_dim_coord_names
+):
     # calculate global and variable specific coordinates
     non_dim_coord_names = set(non_dim_coord_names)

@@ -675,7 +677,7 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
             variable_coordinates[k].add(coord_name)

         if any(
-            attr_name in v.encoding and coord_name in v.encoding.get(attr_name)
+            coord_name in v.encoding.get(attr_name, tuple())
             for attr_name in CF_RELATED_DATA
         ):
             not_technically_coordinates.add(coord_name)
@@ -742,7 +744,7 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
     return variables, attributes


-def encode_dataset_coordinates(dataset):
+def encode_dataset_coordinates(dataset: Dataset):
     """Encode coordinates on the given dataset object into variable specific
     and global attributes.

@@ -764,7 +766,7 @@ def encode_dataset_coordinates(dataset):
     )


-def cf_encoder(variables, attributes):
+def cf_encoder(variables: T_Variables, attributes: T_Attrs):
     """
     Encode a set of CF encoded variables and attributes.
     Takes a dicts of variables and attributes and encodes them
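The `_encode_coordinates` change above collapses `attr_name in v.encoding and coord_name in v.encoding.get(attr_name)` into a single lookup, `coord_name in v.encoding.get(attr_name, tuple())`: one dict access instead of two, and the `tuple()` default keeps the membership test safe when the attribute is absent. A standalone sketch of the pattern (abbreviated `CF_RELATED_DATA` and a hypothetical helper, for illustration only):

```python
CF_RELATED_DATA = ("bounds", "grid_mapping")  # abbreviated from the real tuple


def is_referenced(coord_name, encoding):
    # One dict lookup per attribute; the empty-tuple default means a missing
    # attribute simply contributes a failed membership test, never a TypeError.
    return any(
        coord_name in encoding.get(attr_name, tuple())
        for attr_name in CF_RELATED_DATA
    )


hit = is_referenced("lat_bnds", {"bounds": ("lat_bnds",)})  # attribute present
miss = is_referenced("lat_bnds", {})                        # attribute absent
```

With the old two-step form, a present-but-`None` value would have made `coord_name in None` raise; the `get` default sidesteps that class of bug as well.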

xarray/core/dataarray.py (+1, -1)

@@ -84,7 +84,7 @@
 try:
     from dask.delayed import Delayed
 except ImportError:
-    Delayed = None  # type: ignore
+    Delayed = None  # type: ignore[misc,assignment]
 try:
     from iris.cube import Cube as iris_Cube
 except ImportError:

xarray/core/dataset.py (+1, -1)

@@ -167,7 +167,7 @@
 try:
     from dask.delayed import Delayed
 except ImportError:
-    Delayed = None  # type: ignore
+    Delayed = None  # type: ignore[misc,assignment]
 try:
     from dask.dataframe import DataFrame as DaskDataFrame
 except ImportError:
