Improve performance for backend datetime handling #7374
for more information, see https://pre-commit.ci
…into mypy_conventions
xarray/conventions.py
Outdated
if (
    decode_coords
    and "coordinates" in attributes
    and isinstance(attributes["coordinates"], str)
):
    attributes = dict(attributes)
    coord_names.update(attributes.pop("coordinates").split())
    crds = attributes.pop("coordinates")
    coord_names.update(crds.split())
This previously would've crashed when trying to use attrs["coordinates"].split() on a non-string value.
Might be a functionality change? Should this raise an error instead if it's not a string?
Yes, it would be good to raise a nice error. "coordinates" is expected to be a string with variable names separated by spaces: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#attribute-appendix
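A minimal sketch of the "nice error" idea discussed here, not the code that was merged. The helper name `pop_coordinates_attr` and the error message are hypothetical; the only thing taken from the discussion is that "coordinates" should be a string of variable names separated by spaces.

```python
from __future__ import annotations

from typing import Any, Mapping


def pop_coordinates_attr(
    attributes: Mapping[str, Any],
) -> tuple[dict[str, Any], set[str]]:
    """Hypothetical helper: split off the "coordinates" attribute.

    Returns a copy of ``attributes`` without "coordinates" and the set of
    variable names it listed.
    """
    attrs = dict(attributes)  # copy so the caller's mapping is not mutated
    if "coordinates" not in attrs:
        return attrs, set()

    crds = attrs.pop("coordinates")
    if not isinstance(crds, str):
        # Fail with an explicit message instead of an AttributeError on .split()
        raise TypeError(
            "'coordinates' is expected to be a string of variable names "
            f"separated by spaces, got {type(crds).__name__}"
        )
    return attrs, set(crds.split())


attrs, names = pop_coordinates_attr({"coordinates": "lat lon", "title": "demo"})
print(sorted(names))  # ['lat', 'lon']
print(attrs)          # {'title': 'demo'}
```

Whether to raise or to skip a non-string value silently is exactly the behaviour question raised in this thread; the sketch only shows the raising variant.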
Function is about 18% faster without this check.
Reduces time from 450ms -> 290ms from my open_dataset testing.
+100 for all the typing :)
Do we need some benchmark for this?
The benchmarks have improved, if I understand the logs correctly. Unfortunately, the improvement is not large enough for ASV to report it. The ratio has to be >1.5 and the improvements on .time_open_dataset are around 1.3-1.4.
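To make the threshold concrete, here is a small sketch of the arithmetic. It assumes the ratio is simply the old timing divided by the new timing; the function name `is_reported` and the normalised sample timings are illustrative and are not taken from the ASV logs.

```python
def is_reported(before: float, after: float, factor: float = 1.5) -> bool:
    """Return True when the speed-up ratio before/after exceeds the factor."""
    return before / after > factor


# A ratio in the 1.3-1.4 range quoted above stays below the 1.5 threshold.
print(is_reported(1.35, 1.0))  # False

# The manual open_dataset timing quoted earlier (450 ms -> 290 ms) is about 1.55.
print(is_reported(450, 290))   # True
```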
Looks like an improvement on my machine
Thanks @Illviljan
Great, thanks!
* main: (41 commits)
  v2023.01.0 whats-new (pydata#7440)
  explain keep_attrs in docstring of apply_ufunc (pydata#7445)
  Add sentence to open_dataset docstring (pydata#7438)
  pin scipy version in doc environment (pydata#7436)
  Improve performance for backend datetime handling (pydata#7374)
  fix typo (pydata#7433)
  Add lazy backend ASV test (pydata#7426)
  Pull Request Labeler - Workaround sync-labels bug (pydata#7431)
  see also : groupby in resample doc and vice-versa (pydata#7425)
  Some alignment optimizations (pydata#7382)
  Make `broadcast` and `concat` work with the Array API (pydata#7387)
  remove `numbagg` and `numba` from the upstream-dev CI (pydata#7416)
  [pre-commit.ci] pre-commit autoupdate (pydata#7402)
  Preserve original dtype when accessing MultiIndex levels (pydata#7393)
  [pre-commit.ci] pre-commit autoupdate (pydata#7389)
  [pre-commit.ci] pre-commit autoupdate (pydata#7360)
  COMPAT: Adjust CFTimeIndex.get_loc for pandas 2.0 deprecation enforcement (pydata#7361)
  Avoid loading entire dataset by getting the nbytes in an array (pydata#7356)
  `keep_attrs` for pad (pydata#7267)
  Bump pypa/gh-action-pypi-publish from 1.5.1 to 1.6.4 (pydata#7375)
  ...
Was hunting for some low-hanging performance fruit when reading in files.
whats-new.rst
api.rst