Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically chunk other in GroupBy binary ops. #7684

Merged
merged 5 commits into from
Jul 27, 2023

Conversation

dcherian
Copy link
Contributor

@dcherian dcherian commented Mar 27, 2023

xarray/core/groupby.py Outdated Show resolved Hide resolved
xarray/core/groupby.py Outdated Show resolved Hide resolved
if obj.__dask_graph__() is not None and other.__dask_graph__() is None:
# a chunk size of 1 seems reasonable since we expect it to be repeated
# TODO: what about dims other than `name``
# TODO: What about datasets with some dask vars, and others not?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know which variables will be operated on at this point? (I'm not very familiar with this part of the code). Surely we only need to chunk the variables that will be operated on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.... I guess technically we could have different data vars in obj and other

dcherian added 2 commits July 24, 2023 16:09
…lazy-array

* upstream/main: (153 commits)
  Add HDF5 Section to read/write docs page (pydata#8012)
  [pre-commit.ci] pre-commit autoupdate (pydata#8014)
  Update interpolate_na in dataset.py (pydata#7974)
  improved docstring of to_netcdf (issue pydata#7127) (pydata#7947)
  Expose "Coordinates" as part of Xarray's public API (pydata#7368)
  Core team member guide (pydata#7999)
  join together duplicate entries in the text `repr` (pydata#7225)
  Update copyright year in README (pydata#8007)
  Allow opening datasets with nD dimenson coordinate variables. (pydata#7989)
  Move whats-new entry
  [pre-commit.ci] pre-commit autoupdate (pydata#7997)
  Add documentation on custom indexes (pydata#6975)
  Use variable name in all exceptions raised in `as_variable` (pydata#7995)
  Bump pypa/gh-action-pypi-publish from 1.8.7 to 1.8.8 (pydata#7994)
  New whatsnew section
  Remove future release notes before this release
  Update whats-new.rst for new release (pydata#7993)
  Remove hue_style from plot1d docstring (pydata#7925)
  Add new what's new section (pydata#7986)
  Release summary for v2023.07.0 (pydata#7979)
  ...
@dcherian dcherian requested a review from TomNicholas July 24, 2023 22:22
@dcherian dcherian marked this pull request as ready for review July 24, 2023 22:22
xarray/core/groupby.py Outdated Show resolved Hide resolved
@dcherian dcherian added the plan to merge Final call for comments label Jul 25, 2023
@dcherian dcherian merged commit 52f5cf1 into pydata:main Jul 27, 2023
@dcherian dcherian deleted the groupby-binary-ops-lazy-array branch July 27, 2023 16:41
@dcherian
Copy link
Contributor Author

+      4.44±0.2ms       66.4±0.4ms    14.93  groupby.GroupByDask.time_binary_op_1d
+      9.03±0.2ms       44.3±0.6ms     4.91  groupby.GroupByDask.time_binary_op_2d

Well this was a big regression for the tiny benchmark problem. I still think it's a good idea given https://discourse.pangeo.io/t/xarray-unable-to-allocate-memory-how-to-size-up-problem/3233/1

@TomNicholas
Copy link
Member

Hmm yeah I agree that scalability seems more important. Also I missed that this PR doesn't have a whatsnew entry - we should probably add one to help flag the regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

automatically chunk in groupby binary ops
2 participants