@rjzamora
Member

@rjzamora rjzamora commented May 20, 2024

Description

Some dask-cudf tests currently segfault when sorting by categorical columns. These tests were already marked as "xfail". This PR goes one step further and raises a NotImplementedError in the top-level sort_values API. The error can be removed as soon as the problem is fixed upstream (I'm working on this now, but the fix probably won't be available for 24.06).
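
For illustration, the guard described above could look roughly like the sketch below. It is not the code from this PR: `check_sort_columns` and its `dtypes` argument are hypothetical names, and the check assumes a categorical dtype stringifies to `"category"`, as it does in pandas.

```python
from typing import Iterable, Mapping, Union


def check_sort_columns(
    dtypes: Mapping[str, object], by: Union[str, Iterable[str]]
) -> None:
    """Raise early if any sort key is categorical (hypothetical helper).

    `dtypes` maps column names to dtypes (or dtype names). The comparison
    against "category" is an assumption based on how pandas prints
    CategoricalDtype.
    """
    columns = [by] if isinstance(by, str) else list(by)
    categorical = [name for name in columns if str(dtypes[name]) == "category"]
    if categorical:
        raise NotImplementedError(
            f"Sorting by categorical column(s) {categorical} is not supported yet."
        )


# Sorting by a numeric column passes the guard silently.
check_sort_columns({"a": "int64", "b": "category"}, "a")
```

With a pandas-backed frame, something like `check_sort_columns(df.dtypes.to_dict(), by)` would be one way to call it before dispatching to the real sort.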

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora rjzamora added the bug, 2 - In Progress, and non-breaking labels May 20, 2024
@rjzamora rjzamora self-assigned this May 20, 2024
@github-actions github-actions bot added the Python label May 20, 2024
@rjzamora rjzamora marked this pull request as ready for review May 20, 2024 18:05
@rjzamora rjzamora requested a review from a team as a code owner May 20, 2024 18:05
@rjzamora rjzamora added the 3 - Ready for Review label and removed the 2 - In Progress label May 20, 2024
@jameslamb
Member

@rjzamora could you pull in the latest branch-24.06? Now that #15782 is merged, you shouldn't see the libarrow issue any more once you do that.

@galipremsagar galipremsagar added the 5 - Ready to Merge label and removed the 3 - Ready for Review label May 20, 2024
@galipremsagar
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 4da00ea into rapidsai:branch-24.06 May 20, 2024
@rjzamora rjzamora deleted the avoid-segfault branch May 20, 2024 23:59
rapids-bot bot pushed a commit that referenced this pull request May 22, 2024
Follow up to #15788

Adds a temporary workaround for sorting on categorical columns in 24.06: we convert only the partitioning column to pandas to calculate divisions.
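
As a rough illustration of why only the partitioning column is needed: sort divisions are just boundary values picked from that one column, so the rest of the frame does not have to be converted. The sketch below is a simplified stand-in, not the dask or dask-cudf implementation. `quantile_divisions` is a hypothetical helper, a plain Python list stands in for the column after it has been moved to host memory, and `key` stands in for the category ordering.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def quantile_divisions(
    values: Sequence[T],
    npartitions: int,
    key: Optional[Callable[[T], object]] = None,
) -> List[T]:
    """Pick npartitions + 1 evenly spaced boundaries from one sorted column."""
    if npartitions < 1:
        raise ValueError("npartitions must be at least 1")
    ordered = sorted(values, key=key)
    if not ordered:
        raise ValueError("cannot compute divisions for an empty column")
    last = len(ordered) - 1
    # Exact, evenly spaced positions; real implementations typically use
    # approximate quantiles instead.
    return [ordered[(i * last) // npartitions] for i in range(npartitions + 1)]


# Numeric column split into three partitions.
print(quantile_divisions([5, 1, 4, 2, 3, 9, 7, 8, 6], 3))  # [1, 3, 6, 9]

# Categorical-like column: order by category position, not alphabetically.
order = {"low": 0, "mid": 1, "high": 2}
sizes = ["high", "low", "mid", "low", "high", "mid", "low"]
print(quantile_divisions(sizes, 2, key=order.__getitem__))  # ['low', 'mid', 'high']
```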

This is related to #11795, but I don't want to "close" that issue until `RepartitionQuantiles` works with cudf-backed data.

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

URL: #15801
Labels

5 - Ready to Merge, bug, non-breaking, Python