
Make dask.dataframe optional#1439

Merged
rapids-bot[bot] merged 5 commits into rapidsai:branch-25.04 from rjzamora:relax-dataframe-dependency
Feb 4, 2025

Conversation

@rjzamora
Member

@rjzamora rjzamora commented Feb 4, 2025

Dask-CUDA currently requires that dask.dataframe be imported in a few places. We only do this to patch in explicit-comms shuffling and to register various dispatch functions. There is no fundamental reason that we need dask.dataframe to be installed if the user is not actually using dask.dataframe/dask_cudf in their workflow.

This PR essentially adds exception handling for "automatic" dask.dataframe imports (when dask_cuda is imported).
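As a minimal sketch of the guarded-import pattern described above (the function and module names here are illustrative, not dask-cuda's actual code): attempt the optional import, record availability, and skip the dataframe-specific registration when the dependency is missing.

```python
import importlib


def optional_import(module_name):
    """Return (module, available) instead of failing when the import is missing."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        return None, False


def maybe_register_dataframe_hooks(available):
    """Hypothetical helper: only register dispatch hooks when the dep exists."""
    if not available:
        return False
    # ... patch in explicit-comms shuffling / dispatch functions here ...
    return True


# Exercise both paths with modules that certainly do / do not exist.
_, have_missing = optional_import("definitely_not_a_real_module_xyz")
_, have_sys = optional_import("sys")
```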

@rjzamora rjzamora added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 4, 2025
@rjzamora rjzamora self-assigned this Feb 4, 2025
@copy-pr-bot

copy-pr-bot bot commented Feb 4, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the python python code needed label Feb 4, 2025
@TomAugspurger
Contributor

How should we test this? I think we can either define a new test environment with the minimal dependencies and update all the test imports to use importorskip, or add a test that configures the imports so that dask.dataframe can't be imported. Something like this accomplishes the latter:

diff --git a/dask_cuda/tests/test_local_cuda_cluster.py b/dask_cuda/tests/test_local_cuda_cluster.py
index b144d11..ac0d10e 100644
--- a/dask_cuda/tests/test_local_cuda_cluster.py
+++ b/dask_cuda/tests/test_local_cuda_cluster.py
@@ -580,3 +580,12 @@ def test_death_timeout_raises():
             dashboard_address=":0",
         ):
             pass
+
+
+def test_without_dask_dataframe(monkeypatch):
+    for k in list(sys.modules):
+        if k.startswith("dask.dataframe"):
+            monkeypatch.setitem(sys.modules, k, None)
+    monkeypatch.delitem(sys.modules, "dask_cuda")
+
+    import dask_cuda  # noqa: F401

Sticking None into sys.modules makes Python raise a ModuleNotFoundError when the module is imported. That test fails on main, but passes on this branch.

The downside of this test is that a change in the implementation could break it. I've tried to make it a bit more robust by overriding every package under dask.dataframe, but it's a risk.
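As a standalone illustration of the mechanism the proposed test relies on (using the stdlib json module as a stand-in for dask.dataframe): a None entry in sys.modules makes a subsequent import raise ModuleNotFoundError.

```python
import sys

# Block the module by mapping its name to None, as the test suggestion does.
saved = sys.modules.pop("json", None)
sys.modules["json"] = None
try:
    import json  # noqa: F401

    blocked = False
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    blocked = True
finally:
    # Restore sys.modules so later imports are unaffected.
    if saved is not None:
        sys.modules["json"] = saved
    else:
        del sys.modules["json"]
```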

@pentschev
Member

we add a test that configures the imports so that dask.dataframe can't be imported

This probably makes more sense, given the cost of launching a separate environment only to test this behavior. Historically, I think we avoided testing such complex cases because it's hard to ensure the global changes are also reverted appropriately. To overcome that, writing the test as you proposed makes sense, but I'd prefer that we launch the test in a new process, as we do with some other tests; that way it becomes less likely that we leak some of those global changes into tests that follow.
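A sketch of that process-isolation idea (a plain subprocess here, not dask-cuda's actual test helper; json and textwrap stand in for dask.dataframe and the package under test): the sys.modules tampering happens in a child interpreter, so nothing leaks into the parent test session.

```python
import subprocess
import sys

# Child script: block an optional module, then verify that importing it
# fails while the rest of the interpreter keeps working.
CHILD = """
import sys
sys.modules["json"] = None
try:
    import json
except ImportError:
    import textwrap  # the "package under test" still imports fine
    sys.exit(0)
sys.exit(1)
"""

# Exit code 0 means the blocked import raised as expected in isolation.
result = subprocess.run([sys.executable, "-c", CHILD])
exit_code = result.returncode
```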

@TomAugspurger
Contributor

I like the subprocess suggestion in this case.

Review comment on the new test code (context lines shown):

assert not p.exitcode


def _test_dask_cuda_import(monkeypatch):
Member Author


@TomAugspurger @pentschev - I used a combination of your suggestions (thanks!)

Let me know what you think.

Contributor

@TomAugspurger TomAugspurger Feb 4, 2025


I think this is good. The only behavior I'm a little worried about is what happens to a pickled monkeypatch, since we're sending it through the mp.Process. It's probably fine, but the docs suggest that you can use pytest.MonkeyPatch.context() if the fixture isn't available, so maybe I'd suggest that:

with pytest.MonkeyPatch.context() as monkeypatch:
    ...

The only question is whether we'd want to test any functionality (like creating a LocalCUDACluster) beyond the imports.
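For reference, a small self-contained sketch of pytest.MonkeyPatch.context() outside the fixture (again using json as the stand-in module): the sys.modules change is reverted automatically when the context exits.

```python
import json  # ensure a real entry exists in sys.modules first
import sys

import pytest

with pytest.MonkeyPatch.context() as monkeypatch:
    # Inside the context, the module is blocked.
    monkeypatch.setitem(sys.modules, "json", None)
    blocked_inside = sys.modules["json"] is None

# On exit, MonkeyPatch restores the original module object.
restored = sys.modules["json"] is json
```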

@rjzamora rjzamora marked this pull request as ready for review February 4, 2025 14:48
@rjzamora rjzamora requested a review from a team as a code owner February 4, 2025 14:48
@TomAugspurger
Contributor

Looks good to me. I've confirmed locally that the test fails if the imports aren't inside try/except blocks.

Member

@pentschev pentschev left a comment


LGTM, thanks @rjzamora !

@rjzamora rjzamora added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 2 - In Progress Currently a work in progress labels Feb 4, 2025
@rjzamora
Member Author

rjzamora commented Feb 4, 2025

/merge

@rapids-bot rapids-bot bot merged commit 4b86af2 into rapidsai:branch-25.04 Feb 4, 2025
32 checks passed
@rjzamora rjzamora deleted the relax-dataframe-dependency branch February 4, 2025 18:39