Make dask.dataframe optional (#1439)
Conversation
How should we test this? I think we can either define a new test environment with the minimal dependencies and update all the test imports to be conditional, or add a test along these lines:

```diff
diff --git a/dask_cuda/tests/test_local_cuda_cluster.py b/dask_cuda/tests/test_local_cuda_cluster.py
index b144d11..ac0d10e 100644
--- a/dask_cuda/tests/test_local_cuda_cluster.py
+++ b/dask_cuda/tests/test_local_cuda_cluster.py
@@ -580,3 +580,12 @@ def test_death_timeout_raises():
         dashboard_address=":0",
     ):
         pass
+
+
+def test_without_dask_dataframe(monkeypatch):
+    for k in list(sys.modules):
+        if k.startswith("dask.dataframe"):
+            monkeypatch.setitem(sys.modules, k, None)
+    monkeypatch.delitem(sys.modules, "dask_cuda")
+
+    import dask_cuda  # noqa: F401
```

The downside of this test is that a change in the implementation could break it. I've tried to make it a bit more robust by overriding every package under `dask.dataframe`.
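For context, the proposed test works because Python treats a `None` entry in `sys.modules` as a blocked import: any later `import` of that name raises `ImportError`. A minimal, self-contained sketch (the module name here is made up):

```python
import sys

# A None entry in sys.modules makes any subsequent import of that
# name raise ImportError, simulating an uninstalled dependency.
sys.modules["some_optional_dep"] = None

blocked = False
try:
    import some_optional_dep  # noqa: F401
except ImportError:
    blocked = True

print("import blocked:", blocked)  # -> import blocked: True
```

This is the same mechanism `monkeypatch.setitem(sys.modules, k, None)` uses in the diff above, with the advantage that pytest undoes the change after the test.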
Probably this makes more sense, given the cost of launching a separate environment only to test this behavior. Historically I think we avoided testing such rather complex cases because it's hard to ensure any changes are also reverted appropriately. To overcome that, writing the test as you proposed makes sense, but I'd prefer to launch the test in a new process as we do with some other tests; that way it becomes less likely that we leak any of those global changes into the tests that follow.
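One way to get that isolation, sketched here with a fresh interpreter via `subprocess` (the module name is hypothetical): the `sys.modules` mutations happen only in the child process and die with it.

```python
import subprocess
import sys

# The import check runs in a brand-new interpreter, so blocking a
# module there cannot leak into this process's sys.modules.
child_code = """
import sys
sys.modules["some_optional_dep"] = None  # hypothetical module name
try:
    import some_optional_dep
except ImportError:
    sys.exit(0)  # expected: the blocked import fails cleanly
sys.exit(1)
"""
result = subprocess.run([sys.executable, "-c", child_code])
assert not result.returncode  # same check as `assert not p.exitcode`
```

The actual test uses `multiprocessing.Process` rather than `subprocess`, but the isolation argument is the same: exit code zero means the child saw the expected behavior, and nothing it changed survives the join.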
I like the subprocess suggestion in this case.
dask_cuda/tests/test_initialize.py (Outdated)

    assert not p.exitcode

    def _test_dask_cuda_import(monkeypatch):
@TomAugspurger @pentschev - I used a combination of your suggestions (thanks!)
Let me know what you think.
I think this is good. The only thing I'm a little worried about is the behavior of a pickled monkeypatch, since we're sending it through the mp.Process. It's probably fine, but the docs suggest that you can use pytest.MonkeyPatch.context() if the fixture isn't available, so maybe I'd suggest that:

```python
with pytest.MonkeyPatch.context() as monkeypatch:
    ...
```
Only question is whether we'd want to test any functionality (like creating a LocalCUDACluster) beyond the imports.
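The appeal of `pytest.MonkeyPatch.context()` is that every change is reverted when the `with` block exits. For illustration, the stdlib's `unittest.mock.patch.dict` gives the same revert-on-exit guarantee for `sys.modules` edits (the module name below is made up):

```python
import sys
from unittest import mock

# patch.dict restores sys.modules when the block exits, analogous to
# pytest.MonkeyPatch.context() undoing its setitem/delitem calls.
with mock.patch.dict(sys.modules, {"hypothetical_mod": None}):
    assert sys.modules["hypothetical_mod"] is None

# Outside the block the temporary entry is gone again.
assert "hypothetical_mod" not in sys.modules
```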
Looks good to me. I've confirmed locally that the test fails if the imports aren't inside try/except blocks.
/merge |
Dask-CUDA currently requires that `dask.dataframe` be imported in a few places. We only do this to patch in explicit-comms shuffling and to register various dispatch functions. There is no fundamental reason that we need `dask.dataframe` to be installed if the user is not actually using `dask.dataframe`/`dask_cudf` in their workflow.

This PR essentially adds exception handling for "automatic" `dask.dataframe` imports (when `dask_cuda` is imported).