Skip to content

Conversation

@galipremsagar
Copy link
Contributor

Description

  • This PR adds support for group_keys in groupby. Starting pandas 1.5.0, issues around group_keys have been resolved:

pandas-dev/pandas#34998
pandas-dev/pandas#47185

  • This PR defaults group_keys to False which is the same as what pandas is going to be defaulting to in the future version.
  • Required to unblock pandas-1.5.0 upgrade in cudf: [REVIEW] Upgrade pandas to 1.5 #11617

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. 4 - Needs cuDF (Python) Reviewer improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 7, 2022
@galipremsagar galipremsagar requested a review from shwina September 7, 2022 00:02
@galipremsagar galipremsagar self-assigned this Sep 7, 2022
@galipremsagar galipremsagar requested a review from a team as a code owner September 7, 2022 00:02
@galipremsagar galipremsagar requested a review from vyasr September 7, 2022 00:02
@codecov
Copy link

codecov bot commented Sep 7, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@66b5a0c). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Current head a20593e differs from pull request most recent head 86b3c97. Consider uploading reports for the commit 86b3c97 to get more accurate results

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.10   #11659   +/-   ##
===============================================
  Coverage                ?   86.42%           
===============================================
  Files                   ?      145           
  Lines                   ?    23000           
  Branches                ?        0           
===============================================
  Hits                    ?    19877           
  Misses                  ?     3123           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks fine to me! Multi-indexing is always complicated but I'm happy with this approach. Thank you for the helpful notes in the docs, as well.

@galipremsagar galipremsagar requested a review from shwina September 7, 2022 16:08
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer labels Sep 8, 2022
@galipremsagar
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d3e8f6d into rapidsai:branch-22.10 Sep 8, 2022
@galipremsagar galipremsagar mentioned this pull request Sep 12, 2022
14 tasks
rapids-bot bot pushed a commit that referenced this pull request Sep 21, 2022
This PR introduces `pandas-1.5` support in `cudf`. The changes include:

- [x] Requires `group_keys` support in `groupby` for `dask_cudf` to work: #11659
- [x] Requires `zfill` updates to match `pandas-1.5` behavior: #11634
- [x] `where` API: Ability to inspect a scalar value if it can be fit into the existing dtype, similar to: pandas-dev/pandas#48373
- [x] Switches `ValueError` to `TypeError` when an unknown category is being set to a `CategoricalColumn`
- [x] Handles breaking change of an `ArrowIntervalType` related import that has resulted in `cudf` to error on import itself.
- [x] Fix an issue with `IntervalColumn.to_pandas`.
- [x] Raises error when an object of `boolean` dtype is being set to a `NumericalColumn`.
- [x] Raises error when `pat` is None in `Series.str.startswith` & `Series.str.endswith`.
- [x] Add `IntervalDtype.to_pandas` with appropriate versioning.
- [x] Handle `get_window_bounds` signature changes.
- [x] Fix and version a bunch of pytests.

```python
branch-22.10:

== 4275 failed, 79837 passed, 2049 skipped, 1193 xfailed, 1923 xpassed, 6597 warnings, 4 errors in 1103.52s (0:18:23) ==
== 803 failed, 106 passed, 14 skipped, 14 xfailed, 324 warnings, 17 errors in 148.46s (0:02:28) ==

This PR:

== 84041 passed, 2049 skipped, 1199 xfailed, 1710 xpassed, 6599 warnings in 359.27s (0:05:59) ==
== 954 passed, 14 skipped, 7 xfailed, 3 xpassed, 580 warnings in 54.75s ==
```

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Matthew Roeschke (https://github.com/mroeschke)
  - Mark Sadang (https://github.com/msadang)

URL: #11617
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants