Automatically chunk `other` in GroupBy binary ops. #7684

dcherian · 2023-03-27T15:15:22Z

Closes #7683 ("automatically chunk in groupby binary ops")
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst

xarray/core/groupby.py

TomNicholas · 2023-03-27T20:54:53Z

xarray/core/groupby.py

+        if obj.__dask_graph__() is not None and other.__dask_graph__() is None:
+            # a chunk size of 1 seems reasonable since we expect it to be repeated
+            # TODO: what about dims other than `name`?
+            # TODO: What about datasets with some dask vars, and others not?


Do we know which variables will be operated on at this point? (I'm not very familiar with this part of the code.) Surely we only need to chunk the variables that will be operated on?

Hmmm... I guess technically we could have different data vars in `obj` and `other`.
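The TODOs in the diff and Tom's question are about which dimensions of `other` should be chunked, and how. Below is a minimal sketch of one possible answer, assuming that dimensions `other` shares with `obj` reuse `obj`'s chunk sizes while the grouped dimension gets a chunk size of 1 (the diff excerpt only shows the size-1 idea, so the shared-dims rule is an assumption). `chunks_for_other` and `other_chunks_needed` are hypothetical helpers written for illustration, not xarray functions; the input mapping mimics the shape of xarray's `.chunksizes` (dimension name to a tuple of chunk sizes).

```python
# Hypothetical helpers for illustration only; these are not xarray functions.
from typing import Dict, Hashable, List, Sequence, Tuple, Union

Chunks = Dict[Hashable, Union[int, Tuple[int, ...]]]


def chunks_for_other(
    obj_chunksizes: Dict[Hashable, Tuple[int, ...]],
    other_dims: Sequence[Hashable],
    group_dim: Hashable,
) -> Chunks:
    """Chunk spec for `other`: obj's chunks on shared dims, 1 on the group dim."""
    # Assumption: dims shared with `obj` should line up with obj's chunks.
    spec: Chunks = {
        dim: sizes for dim, sizes in obj_chunksizes.items() if dim in other_dims
    }
    # Each element of `other` along the group dim is repeated for every member
    # of its group, so one element per chunk keeps each piece small.
    spec[group_dim] = 1
    return spec


def other_chunks_needed(codes: Sequence[int], start: int, stop: int) -> List[int]:
    """Size-1 chunks of `other` touched by obj[start:stop], given integer group codes."""
    return sorted(set(codes[start:stop]))


# Usage: daily data for one non-leap year, grouped by month (codes 0..11).
month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
codes = [month for month, n in enumerate(month_lengths) for _ in range(n)]

spec = chunks_for_other({"time": (181, 184), "x": (50, 50)}, ("month", "x"), "month")
print(spec)                                # {'x': (50, 50), 'month': 1}
print(other_chunks_needed(codes, 0, 31))   # [0]  (January only)
print(other_chunks_needed(codes, 0, 181))  # [0, 1, 2, 3, 4, 5]
```

With size-1 chunks, a month-long slice of `obj` touches a single piece of `other`, and a half-year chunk touches six. The real dependency structure is decided by dask's indexing, so treat this as intuition rather than a description of the task graph.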

…lazy-array

* upstream/main: (153 commits)
  Add HDF5 Section to read/write docs page (pydata#8012)
  [pre-commit.ci] pre-commit autoupdate (pydata#8014)
  Update interpolate_na in dataset.py (pydata#7974)
  improved docstring of to_netcdf (issue pydata#7127) (pydata#7947)
  Expose "Coordinates" as part of Xarray's public API (pydata#7368)
  Core team member guide (pydata#7999)
  join together duplicate entries in the text `repr` (pydata#7225)
  Update copyright year in README (pydata#8007)
  Allow opening datasets with nD dimenson coordinate variables. (pydata#7989)
  Move whats-new entry
  [pre-commit.ci] pre-commit autoupdate (pydata#7997)
  Add documentation on custom indexes (pydata#6975)
  Use variable name in all exceptions raised in `as_variable` (pydata#7995)
  Bump pypa/gh-action-pypi-publish from 1.8.7 to 1.8.8 (pydata#7994)
  New whatsnew section
  Remove future release notes before this release
  Update whats-new.rst for new release (pydata#7993)
  Remove hue_style from plot1d docstring (pydata#7925)
  Add new what's new section (pydata#7986)
  Release summary for v2023.07.0 (pydata#7979)
  ...

xarray/core/groupby.py

dcherian · 2023-07-28T02:17:40Z

       before           after         ratio  benchmark
+      4.44±0.2ms       66.4±0.4ms    14.93  groupby.GroupByDask.time_binary_op_1d
+      9.03±0.2ms       44.3±0.6ms     4.91  groupby.GroupByDask.time_binary_op_2d

Well, this was a big regression for the tiny benchmark problem (roughly 15× slower in the 1D case and 5× in the 2D case). I still think it's a good idea, given https://discourse.pangeo.io/t/xarray-unable-to-allocate-memory-how-to-size-up-problem/3233/1

TomNicholas · 2023-07-28T03:12:19Z

Hmm, yeah, I agree that scalability seems more important. Also, I missed that this PR doesn't have a whatsnew entry; we should probably add one to help flag the regression.
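The scalability argument can be made concrete with some back-of-the-envelope arithmetic. A minimal sketch, assuming the binary op materializes `other` repeated once per member of each group (the repetition mentioned in the diff comment) and that the data are float64. The grid size, record length, and monthly climatology are hypothetical, chosen only to show the order of magnitude; they are not taken from the linked Discourse thread.

```python
# Back-of-the-envelope memory arithmetic with hypothetical shapes.
# Assumes float64 data and that `other` is expanded to obj's length
# along the grouped dimension (one copy per group member).
from math import prod

FLOAT64_BYTES = 8


def nbytes(shape, itemsize=FLOAT64_BYTES):
    """Bytes needed to hold a dense array of `shape` in memory."""
    return prod(shape) * itemsize


ny, nx = 721, 1440       # a 0.25-degree global grid (hypothetical)
ndays = 40 * 365 + 10    # 40 years of daily data, assuming 10 leap days

other_bytes = nbytes((12, ny, nx))        # monthly climatology: one map per month
expanded_bytes = nbytes((ndays, ny, nx))  # each month's map repeated once per day

print(f"other:    {other_bytes / 1e6:.1f} MB")           # other:    99.7 MB
print(f"expanded: {expanded_bytes / 1e9:.1f} GB")        # expanded: 121.3 GB
print(f"ratio:    {expanded_bytes / other_bytes:.1f}x")  # ratio:    1217.5x
```

Chunking `other` lets dask build the expanded array lazily, piece by piece, rather than allocating it all at once. The extra tasks are one plausible contributor to the slowdown on the tiny benchmark, which is the trade-off being accepted here.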

Automatically chunk other in GroupBy binary ops.

836f497

Closes pydata#7683

github-actions bot added the topic-groupby label Mar 27, 2023

dcherian requested a review from TomNicholas March 27, 2023 15:15

dcherian commented Mar 27, 2023


xarray/core/groupby.py (outdated, resolved)

Update xarray/core/groupby.py

f4b771f

TomNicholas reviewed Mar 27, 2023


dcherian added 2 commits July 24, 2023 16:09

Add test

0509e72

dcherian requested a review from TomNicholas July 24, 2023 22:22

dcherian marked this pull request as ready for review July 24, 2023 22:22

dcherian commented Jul 25, 2023


xarray/core/groupby.py (outdated, resolved)

Update xarray/core/groupby.py

1774a05

TomNicholas approved these changes Jul 25, 2023


dcherian added the "plan to merge" label (Final call for comments) Jul 25, 2023

dcherian merged commit 52f5cf1 into pydata:main Jul 27, 2023

dcherian deleted the groupby-binary-ops-lazy-array branch July 27, 2023 16:41
