Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace cuDF (de)serializer with cuDF spill-aware (de)serializer #1369

Merged

Conversation

pentschev
Copy link
Member

@pentschev pentschev commented Jul 26, 2024

Replace cuDF (de)serializer with cuDF spill-aware (de)serializer, using both together should be avoided as that will cause excessive spilling.

Additionally add:

  • Missing test of cuDF internal spill mechanism with LocalCUDACluster;
  • dask cuda worker warning to alert the user that cuDF spilling mechanism requires client/scheduler to enable it as well.

Closes #1363 .

@github-actions github-actions bot added the python python code needed label Jul 26, 2024
@pentschev pentschev added 2 - In Progress Currently a work in progress feature request New feature or request breaking Breaking change labels Jul 26, 2024
@pentschev pentschev added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jul 29, 2024
@pentschev pentschev marked this pull request as ready for review July 29, 2024 16:27
@pentschev pentschev requested a review from a team as a code owner July 29, 2024 16:27
@pentschev pentschev changed the base branch from branch-24.08 to branch-24.10 July 29, 2024 16:27
@pentschev
Copy link
Member Author

@ayushdg @trivialfis @dantegd @quasiben people who are using cuDF spill for whatever reason should test this, the change has potential to avoid double Dask+cuDF spilling of cuDF objects and therefore may improve performance. There's also a chance that this might introduce issues that are hard to predict, so please feel free to give it a try and report any issues and/or performance results, we plan to merge it in branch-24.10 only since we're under code freeze and this is potentially risky.

@pentschev
Copy link
Member Author

cc @rjzamora as well

Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @pentschev, I only have one suggestion

if not cudf.get_option("spill"):
from cudf.comm import serialize


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good with a comment here saying why this it needed and include a link to #1363

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done via 29751de

@pentschev
Copy link
Member Author

Thanks for review and approval @madsbk !

@pentschev
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit c0cd465 into rapidsai:branch-24.10 Jul 31, 2024
27 checks passed
@pentschev pentschev deleted the cudf-spill-aware-serializer branch July 31, 2024 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change feature request New feature or request python python code needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow deregistering cuDF from Dask's spilling mechanism
2 participants