Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce incompatible-types and enables spilling of CuPy arrays #856

Merged
merged 11 commits into from
Feb 15, 2022

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented Feb 11, 2022

This PR replaces ignore_types introduced in #568 with incompatible_types, which is a list of types that ProxifyHostFile will unproxify on retrieval. This makes it possible to spill types we previously ignored completely such as cupy.ndarray.

To mark a type incompatible, add it to the comma separated config value "jit-unspill-incompatible" or environment variable DASK_JIT_UNSPILL_INCOMPATIBLE.

The default value is: DASK_JIT_UNSPILL_INCOMPATIBLE="cupy.ndarray"

Closes #855

Notice, I have marked the PR breaking because the DASK_JIT_UNSPILL_IGNORE option has been removed.

cc. @VibhuJawa

Which replace ignore_types
@github-actions github-actions bot added the python python code needed label Feb 11, 2022
@madsbk madsbk added 2 - In Progress Currently a work in progress breaking Breaking change improvement Improvement / enhancement to an existing function labels Feb 11, 2022
@codecov-commenter
Copy link

codecov-commenter commented Feb 11, 2022

Codecov Report

Merging #856 (336dded) into branch-22.04 (48c4929) will increase coverage by 24.30%.
The diff coverage is 96.66%.

Impacted file tree graph

@@                Coverage Diff                @@
##           branch-22.04     #856       +/-   ##
=================================================
+ Coverage         64.17%   88.47%   +24.30%     
=================================================
  Files                22       16        -6     
  Lines              3076     2117      -959     
=================================================
- Hits               1974     1873      -101     
+ Misses             1102      244      -858     
Impacted Files Coverage Δ
dask_cuda/proxify_device_objects.py 97.32% <93.75%> (+0.07%) ⬆️
dask_cuda/explicit_comms/dataframe/shuffle.py 98.69% <100.00%> (ø)
dask_cuda/proxify_host_file.py 94.07% <100.00%> (+0.11%) ⬆️
dask_cuda/cuda_worker.py 77.01% <0.00%> (ø)
dask_cuda/benchmarks/utils.py
dask_cuda/benchmarks/local_cupy.py
dask_cuda/benchmarks/local_cudf_shuffle.py
dask_cuda/_version.py
dask_cuda/benchmarks/local_cudf_merge.py
dask_cuda/benchmarks/local_cupy_map_overlap.py
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 48c4929...336dded. Read the comment docs.

By maintaining key_containts_incompatible_type, we don't have to call
unproxify_device_objects() unnecessary.
@madsbk madsbk added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Feb 11, 2022
@madsbk madsbk marked this pull request as ready for review February 11, 2022 13:43
@madsbk madsbk requested a review from a team as a code owner February 11, 2022 13:43
Comment on lines +32 to +33
incompatibles = dask.config.get("jit-unspill-incompatible", "cupy.ndarray")
incompatibles = incompatibles.split(",")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @madsbk .

I am curious, What about stuff like cupyx.scipy.sparse and CumlArray .

Will this work with those too ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently only cupy.ndarray is on the incompatible list but if you encounter problems with those classes please let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. So i guess if we set DASK_JIT_UNSPILL_INCOMPATIBLE = cupy.ndarray, CumlArray, cupyx.scipy.sparse.csr_matrix and just try that to see if that fixes the gpu-bdb stuff, ( which potenitally maybe failing with just cupy.ndarrays and see ?.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and also try with just cupy.ndarray to see if CumlArray and cupyx.scipy.sparse.csr_matrix works with ProxyObject

Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @madsbk , I've added a few minor suggestions, but otherwise LGTM.

dask_cuda/proxify_host_file.py Outdated Show resolved Hide resolved
Comment on lines 633 to 634
del self.store[key]
del self.key_contains_incompatible_type[key]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler to keep a single dict storing a (obj, incompatible_type_found)? Maybe I'm missing some context, but a single dict would potentially make it less error prone on forgetting to update one of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment explaining exactly what self.store is. Please let me know if you think the comment is accurate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shwina , I think the changes to combine dicts look good, but I didn't see any comment, any chance you missed pushing the change?

Copy link
Contributor

@shwina shwina Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, I still don't see any comments. 😅

Could you paste a link to that diff where the comment is contained?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept pushing to my fork instead of Mads' :( Sorry for the confusion. Should be updated on this PR now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries Ashwin, I see your comment now and I think it's explaining correctly, thanks for adding that.

dask_cuda/tests/test_proxify_host_file.py Outdated Show resolved Hide resolved
@pentschev
Copy link
Member

rerun tests

Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @madsbk and @shwina for the work in this!

@pentschev
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 98686bd into rapidsai:branch-22.04 Feb 15, 2022
@madsbk
Copy link
Member Author

madsbk commented Feb 21, 2022

Thanks @shwina for finishing the PR and thanks @pentschev for the review :)

@madsbk madsbk deleted the incompatible_types branch February 21, 2022 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change improvement Improvement / enhancement to an existing function python python code needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Spilling on demand with CuPy arrays not working as expected
5 participants