Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change .groupby fastpath to work for monotonic increasing and decreasing #7427

Conversation

JoelJaeschke
Copy link
Contributor

@JoelJaeschke JoelJaeschke commented Jan 6, 2023

This fixes GH6220 which makes it possible to use the fastpath for .groupby for monotonically increasing and decreasing values.

This fixes GH6220 which makes it possible to use the fast path for .groupby for monotonically increasing and decreasing values.
@dcherian
Copy link
Contributor

dcherian commented Jan 12, 2023

Thanks @JoelJaeschke!

The intent was to change the code in _unique_and_monotonic here

A good test would be make sure you get the same answer when grouping a DataArray of ones by a monotonically increasing variable (e.g. [1, 1, 2, 2, 3, 3, 3]) and the decreasing variable( e.g. [3, 3, 3, 2, 2, 1, 1])

@JoelJaeschke
Copy link
Contributor Author

JoelJaeschke commented Jan 20, 2023

Hi @dcherian
that was so obvious that I missed it..My bad!

I am not sure if I understand correctly what you are saying, but if I implement the test as you describe, would this not already work anyways? It will take the slow-path, but the behavior should be identical, right? My intention when writing the test was to explicitly test this case by supplying a grouper.

@dcherian
Copy link
Contributor

TO check the fastpath we'd want to both get the right answer and make sure that _group_indices is filled with slices:

elif group.dims == (group.name,) and _unique_and_monotonic(group):
# no need to factorize
if not squeeze:
# use slices to do views instead of fancy indexing
# equivalent to: group_indices = group_indices.reshape(-1, 1)
group_indices = [slice(i, i + 1) for i in range(group.size)]
else:
group_indices = np.arange(group.size)
unique_coord = group

@JoelJaeschke JoelJaeschke force-pushed the fix-6220-use-fast-path-when-grouping-unique-monotonic-variable branch from c8efb23 to f4bb524 Compare January 29, 2023 20:17
@JoelJaeschke JoelJaeschke force-pushed the fix-6220-use-fast-path-when-grouping-unique-monotonic-variable branch from 0022c13 to 38ff104 Compare January 29, 2023 20:20
@JoelJaeschke JoelJaeschke marked this pull request as draft January 29, 2023 20:37
@JoelJaeschke JoelJaeschke marked this pull request as ready for review January 29, 2023 21:34
@JoelJaeschke
Copy link
Contributor Author

Hi @dcherian, I implemented the feedback as you described, hope this does the trick.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JoelJaeschke Can you add a line to doc/whats-new.rst please?

@@ -400,7 +402,9 @@ def __init__(
index = safe_cast_to_index(group)
if not index.is_monotonic_increasing:
# TODO: sort instead of raising an error
raise ValueError("index must be monotonic for resampling")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I undid this change. It's more involved to make it actually work.

fwd = array.groupby("idx", squeeze=False)
rev = array_rev.groupby("idx", squeeze=False)

for gb in [fwd, rev]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test!

I just added a value check on the result below.

array[[2, 0, 1]].resample(time="1D")

reverse = array.isel(time=slice(-1, None, -1))
with pytest.raises(ValueError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new test to make sure we error when trying to resample with a decreasing time index.

@JoelJaeschke
Copy link
Contributor Author

Hi @dcherian, I added a note to whats-new. Thanks for your patience, still got a lot to learn :D

@dcherian dcherian enabled auto-merge (squash) February 24, 2023 20:01
@Illviljan
Copy link
Contributor

This one is failing on CI / ubuntu-latest py3.9 bare-minimum, is it taking a path without flox installed perhaps?

___________ TestDataArrayGroupBy.test_groupby_fastpath_for_monotonic ___________
[gw2] linux -- Python 3.9.16 /home/runner/micromamba-root/envs/xarray-tests/bin/python

self = <xarray.tests.test_groupby.TestDataArrayGroupBy object at 0x7fbd9f461c10>

    def test_groupby_fastpath_for_monotonic(self):
        # Fixes https://github.com/pydata/xarray/issues/6220
        index = [1, 2, 3, 4, 7, 9, 10]
        array = DataArray(np.arange(len(index)), [("idx", index)])
        array_rev = array.copy().assign_coords({"idx": index[::-1]})
        fwd = array.groupby("idx", squeeze=False)
        rev = array_rev.groupby("idx", squeeze=False)
    
        for gb in [fwd, rev]:
            assert all([isinstance(elem, slice) for elem in gb._group_indices])
    
        assert_identical(fwd.sum(), array)
>       assert_identical(rev.sum(), array_rev.sortby("idx"))
E       AssertionError: Left and right DataArray objects are not identical
E       
E       Differing values:
E       L
E           array([0, 1, 2, 3, 4, 5, 6])
E       R
E           array([6, 5, 4, 3, 2, 1, 0])
E       Differing coordinates:
E       L * idx      (idx) int64 10 9 7 4 3 2 1
E       R * idx      (idx) int64 1 2 3 4 7 9 10

dcherian added 2 commits July 25, 2024 15:24
…monotonic-variable

* main: (995 commits)
  Adding `open_datatree` backend-specific keyword arguments (pydata#9199)
  [pre-commit.ci] pre-commit autoupdate (pydata#9202)
  Restore ability to specify _FillValue as Python native integers (pydata#9258)
  add backend intro and how-to diagram (pydata#9175)
  Fix copybutton for multi line examples in double digit ipython cells (pydata#9264)
  Update signature for _arrayfunction.__array__ (pydata#9237)
  Add encode_cf_datetime benchmark (pydata#9262)
  groupby, resample: Deprecate some positional args (pydata#9236)
  Delete ``base`` and ``loffset`` parameters to resample (pydata#9233)
  Update dropna docstring (pydata#9257)
  Grouper, Resampler as public api (pydata#8840)
  Fix mypy on main (pydata#9252)
  fix fallback isdtype method (pydata#9250)
  Enable pandas type checking (pydata#9213)
  Per-variable specification of boolean parameters in open_dataset (pydata#9218)
  test push
  Added a space to the documentation (pydata#9247)
  Fix typing for test_plot.py (pydata#9234)
  Allow mypy to run in vscode (pydata#9239)
  Revert "Test main push"
  ...
@dcherian dcherian added the plan to merge Final call for comments label Jul 26, 2024
@dcherian dcherian requested a review from max-sixty July 26, 2024 19:00
xarray/tests/test_groupby.py Outdated Show resolved Hide resolved
@dcherian dcherian merged commit 0023e5d into pydata:main Jul 26, 2024
28 checks passed
Copy link

welcome bot commented Jul 26, 2024

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again! celebration gif

@dcherian
Copy link
Contributor

Thanks @JoelJaeschke sorry for the really long delay here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE]: Use fast path when grouping by unique monotonic decreasing variable
5 participants