[BUG] Unable to change backend of a dask cudf dataframe to pandas when a client exists #18122

Open
praateekmahajan opened this issue Feb 27, 2025 · 2 comments
Labels: bug (Something isn't working), dask (Dask issue), Python (Affects Python cuDF API)

Comments

@praateekmahajan

Describe the bug
If a distributed Client has been created, calling to_backend("pandas") on a dask_cudf DataFrame and then computing it fails on the worker with the following error:

2025-02-27 11:05:50,648 - distributed.worker - ERROR - Compute Failed
Key:       ('operation-88baa4e26a1e04e68dadcfc83119619d', 0)
State:     executing
Task:  <Task ('operation-88baa4e26a1e04e68dadcfc83119619d', 0) operation(...)>
Exception: 'TypeError("No dispatch for <class \'cudf.core.dataframe.DataFrame\'>")'
Traceback: '  File "/opt/conda/lib/python3.12/site-packages/dask_expr/_backends.py", line 40, in operation\n    return to_pandas_dispatch(df, **options)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/opt/conda/lib/python3.12/site-packages/dask/utils.py", line 771, in __call__\n    meth = self.dispatch(type(arg))\n           ^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/opt/conda/lib/python3.12/site-packages/dask/utils.py", line 765, in dispatch\n    raise TypeError(f"No dispatch for {cls}")\n'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 15
      7 _ = dask_cudf.from_cudf(cudf.DataFrame([{"a" : 123}])).to_backend("pandas").compute()
      8 with Client(
      9     LocalCUDACluster(
     10         CUDA_VISIBLE_DEVICES="0",
   (...)
     13 ) as client:
     14     # This doesn't
---> 15     dask_cudf.from_cudf(cudf.DataFrame([{"a" : 123}])).to_backend("pandas").compute()

File /opt/conda/lib/python3.12/site-packages/dask_expr/_collection.py:480, in FrameBase.compute(self, fuse, concatenate, **kwargs)
    478     out = out.repartition(npartitions=1)
    479 out = out.optimize(fuse=fuse)
--> 480 return DaskMethodsMixin.compute(out, **kwargs)

File /opt/conda/lib/python3.12/site-packages/dask/base.py:372, in DaskMethodsMixin.compute(self, **kwargs)
    348 def compute(self, **kwargs):
    349     """Compute this dask collection
    350
    351     This turns a lazy Dask collection into its in-memory equivalent.
   (...)
    370     dask.compute
    371     """
--> 372     (result,) = compute(self, traverse=False, **kwargs)
    373     return result

File /opt/conda/lib/python3.12/site-packages/dask/base.py:660, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    657     postcomputes.append(x.__dask_postcompute__())
    659 with shorten_traceback():
--> 660     results = schedule(dsk, keys, **kwargs)
    662 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /opt/conda/lib/python3.12/site-packages/dask_expr/_backends.py:40, in operation()
     38 @staticmethod
     39 def operation(df, options):
---> 40     return to_pandas_dispatch(df, **options)

File /opt/conda/lib/python3.12/site-packages/dask/utils.py:765, in dispatch()
    763             lk[cls] = impl
    764         return impl
--> 765 raise TypeError(f"No dispatch for {cls}")

TypeError: No dispatch for <class 'cudf.core.dataframe.DataFrame'>

Repro

import dask
from dask_cuda import LocalCUDACluster
from distributed import Client
import dask_cudf
import cudf
# This works
_ = dask_cudf.from_cudf(cudf.DataFrame([{"a" : 123}])).to_backend("pandas").compute()
with Client(
    LocalCUDACluster(
        CUDA_VISIBLE_DEVICES="0",
        enable_cudf_spill=True,
    )
) as client:
    # This doesn't
    dask_cudf.from_cudf(cudf.DataFrame([{"a" : 123}])).to_backend("pandas").compute()

Environment overview

dask.__version__ = '2024.12.1'
(dask)cudf.__version__ = '2025.02.00'

@rjzamora
Member

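My best guess is that this problem was caused by rapidsai/dask-cuda#1424, because we are no longer automatically importing dask_cudf on the dask-cuda worker. Without that import, the worker would have no cudf implementation registered for to_pandas_dispatch, which matches the "No dispatch" error above.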

I think the only "proper" fix is to add a @to_pandas_dispatch.register_lazy("cudf") decorator in dask.dataframe.backends.
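
For context, `register_lazy` defers a registration until the dispatcher first sees a type from the named top-level module. The toy dispatcher below is a simplified stand-in written for illustration, not dask's actual `Dispatch` class; `LazyDispatch`, `to_float`, and the use of `fractions.Fraction` in place of cudf are all assumptions made for the example.

```python
from fractions import Fraction


class LazyDispatch:
    """Toy type-based dispatcher with lazy registration (illustration only)."""

    def __init__(self):
        self._lookup = {}  # type -> implementation
        self._lazy = {}    # top-level module name -> registration hook

    def register(self, cls):
        def decorator(func):
            self._lookup[cls] = func
            return func
        return decorator

    def register_lazy(self, toplevel):
        def decorator(hook):
            self._lazy[toplevel] = hook
            return hook
        return decorator

    def dispatch(self, cls):
        # First try the type and its base classes.
        for base in cls.__mro__:
            if base in self._lookup:
                return self._lookup[base]
        # Otherwise run the lazy hook for the type's top-level module, once.
        toplevel = cls.__module__.split(".")[0]
        hook = self._lazy.pop(toplevel, None)
        if hook is None:
            raise TypeError(f"No dispatch for {cls}")
        hook()
        return self.dispatch(cls)

    def __call__(self, arg, **kwargs):
        return self.dispatch(type(arg))(arg, **kwargs)


to_float = LazyDispatch()


@to_float.register_lazy("fractions")
def _register_fraction():
    # In a real backend the heavy import (e.g. cudf) would happen here.
    @to_float.register(Fraction)
    def _(x):
        return float(x)


# The hook only runs now, when a Fraction is first dispatched.
result = to_float(Fraction(1, 2))
```

The point of the pattern is that the process doing the dispatch does not need to have imported the backend package ahead of time, which is the situation on the dask-cuda worker described above.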

@mroeschke
Contributor

Now that dask/dask#11799 is merged, do we just need a test to ensure this doesn't break?
