Merged
2 changes: 1 addition & 1 deletion conda/environments/all_cuda-129_arch-aarch64.yaml
@@ -6,6 +6,7 @@ channels:
- conda-forge
dependencies:
- click >=8.1
+ - cuda-core==0.3.*
- cuda-nvcc-impl
- cuda-nvrtc
- cuda-version=12.9
@@ -15,7 +16,6 @@ dependencies:
- kvikio==25.10.*,>=0.0.0a0
- numactl-devel-cos7-aarch64
- numba-cuda>=0.19.0,<0.20.0a0
- - numba>=0.60.0,<0.62.0a0
- numpy>=1.23,<3.0a0
- numpydoc>=1.1.0
- pandas>=1.3
2 changes: 1 addition & 1 deletion conda/environments/all_cuda-129_arch-x86_64.yaml
@@ -6,6 +6,7 @@ channels:
- conda-forge
dependencies:
- click >=8.1
+ - cuda-core==0.3.*
- cuda-nvcc-impl
- cuda-nvrtc
- cuda-version=12.9
@@ -15,7 +16,6 @@ dependencies:
- kvikio==25.10.*,>=0.0.0a0
- numactl-devel-cos7-x86_64
- numba-cuda>=0.19.0,<0.20.0a0
- - numba>=0.60.0,<0.62.0a0
- numpy>=1.23,<3.0a0
- numpydoc>=1.1.0
- pandas>=1.3
11 changes: 4 additions & 7 deletions dask_cuda/initialize.py
@@ -5,7 +5,7 @@
import os

import click
-import numba.cuda
+import cuda.core.experimental

Member

I'm a little out of the loop here but depending on something with experimental in the name makes me nervous.

Member

I think it's more of an "experimental API" (you know, like all of Dask's API) than experimental functionality; those are essentially CUDA bindings.

Member

Also, at this point numba-cuda can be considered experimental too, in my book: rapidsai/ucxx#462, the subsequent need to downgrade it in rapidsai/ucxx#466 and immediately downgrade it again in rapidsai/ucxx#468, as well as NVIDIA/cuda-python#852, are evidence of that.

Contributor Author

From https://nvidia.github.io/cuda-python/cuda-core/latest/api.html#cuda-core-experimental-api-reference

All of the APIs listed (or cross-referenced from) below are considered experimental and subject to future changes without deprecation notice. Once stablized they will be moved out of the experimental namespace.

I would hope we'll get a bit of time to move to cuda.core.Device before .experimental is removed completely, but I can try to do the equivalent operation with the (experimental) cuda.bindings.

Member

@pentschev I take your point, but this is clearly labelled as experimental and subject to change.

Given that .experimental is in the namespace, that's certainly subject to change, as it won't be experimental forever.

Perhaps wrapping the calls via a utility would help centralise future changes? It could also be a good place to catch future import errors and raise something more helpful like a link to an issue that describes that this needs changing.
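
A minimal sketch of what such a utility might look like. The names `load_device_class` and `create_cuda_context` are hypothetical, and the assumption that `Device` will eventually be importable from `cuda.core` comes only from the discussion in this thread, not from a confirmed release:

```python
import importlib

# Candidate namespaces, newest first. "cuda.core" is where this thread
# expects Device to land eventually; that location is an assumption.
_DEVICE_MODULES = ("cuda.core", "cuda.core.experimental")


def load_device_class(module_names=_DEVICE_MODULES):
    """Return the first ``Device`` class found in ``module_names``."""
    tried = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            tried.append(f"{name} ({exc})")
            continue
        device_cls = getattr(module, "Device", None)
        if device_cls is not None:
            return device_cls
        tried.append(f"{name} (no Device attribute)")
    # Nothing matched: fail with a message that points at the likely cause.
    raise ImportError(
        "No CUDA Device class found; the cuda-core namespace may have moved. "
        "Tried: " + "; ".join(tried)
    )


def create_cuda_context():
    """Call ``Device().set_current()``, as the change in this PR does."""
    load_device_class()().set_current()
```

With something like this, `initialize.py` would call `create_cuda_context()` in both branches, and a future namespace move would only require touching `_DEVICE_MODULES`.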

Member

Right, but if the argument is that it's experimental, then I would consider all of Dask experimental, because everything is subject to change, and that has proven to be the case on countless occasions. Dask simply doesn't go to the trouble of labelling anything as such, which is arguably worse.

I wouldn't mind centralizing it to simplify future changes; that would be fine. I would still prefer that we use cuda.core.experimental rather than roll our own re-implementation on top of cuda.bindings: it's far more likely to be well tested there than anything we would write on our own, even though it's experimental. The move out of experimental will still break us at some point, but that will be much less of a maintenance burden than the alternatives.

Member

We are also likely to catch issues in cuda.core.experimental now, which is good for everyone: it helps us find problems early that would eventually come to bite us, and in the process we help make cuda.core better sooner.

Member

I got confirmation from the cuda-python team that by pinning to minor versions (e.g., cuda-core=0.3.*) we are safe against breaking changes, so I still think we should go ahead with the changes as they are, plus add a proper explicit cuda-core=0.3.* dependency with the pin.
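
To illustrate which releases a pin like `0.3.*` admits, here is a simplified sketch. `satisfies_minor_pin` is a hypothetical helper that compares dot-separated components only; it is not a full PEP 440 implementation, and a real check would more likely rely on the `packaging` library:

```python
def satisfies_minor_pin(version: str, pin: str = "0.3.*") -> bool:
    """Return True if ``version`` falls inside an ``X.Y.*`` style pin."""
    if not pin.endswith(".*"):
        raise ValueError("expected a pin of the form 'X.Y.*'")
    # Compare whole components so that "0.30.1" does not match "0.3.*".
    prefix = pin[:-2].split(".")
    parts = version.split(".")
    return parts[: len(prefix)] == prefix


print(satisfies_minor_pin("0.3.2"))   # patch release inside the pin
print(satisfies_minor_pin("0.4.0"))   # next minor release, outside the pin
print(satisfies_minor_pin("0.30.1"))  # shares a string prefix only
```

The pin accepts any `0.3.x` patch release and rejects `0.4.0`, which is the boundary the comment above relies on.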

Contributor Author

c184166 adjusted the pin, so I think we're good. We'll just need to be prepared to update this in the future once things are available in cuda.core rather than cuda.core.experimental.


import dask
from distributed.diagnostics.nvml import get_device_index_and_uuid, has_cuda_context
@@ -18,11 +18,11 @@
 def _create_cuda_context_handler():
     if int(os.environ.get("DASK_CUDA_TEST_SINGLE_GPU", "0")) != 0:
         try:
-            numba.cuda.current_context()
-        except numba.cuda.cudadrv.error.CudaSupportError:
+            cuda.core.experimental.Device().set_current()
+        except Exception:
             pass
     else:
-        numba.cuda.current_context()
+        cuda.core.experimental.Device().set_current()


 def _warn_generic():
@@ -100,9 +100,6 @@ def _initialize_ucxx():


 def _create_cuda_context(protocol="ucx"):
-    if protocol not in ["ucx", "ucxx", "ucx-old"]:
-        return
-
     try:
         ucx_implementation = _get_active_ucx_implementation_name(protocol)
     except ValueError:
18 changes: 5 additions & 13 deletions dependencies.yaml
@@ -142,24 +142,12 @@ dependencies:
- output_types: [conda, requirements, pyproject]
packages:
- click >=8.1
- - numba>=0.60.0,<0.62.0a0
+ - cuda-core==0.3.*
- numpy>=1.23,<3.0a0
- pandas>=1.3
- pynvml>=12.0.0,<13.0.0a0
- rapids-dask-dependency==25.10.*,>=0.0.0a0
- zict>=2.0.0
- - output_types: [conda]
- packages:
- - &numba_cuda numba-cuda>=0.19.0,<0.20.0a0
- specific:
- - output_types: [requirements, pyproject]
- matrices:
- - matrix: {cuda: "12.*"}
- packages:
- - &numba_cuda_cu12 numba-cuda[cu12]>=0.19.0,<0.20.0a0
- - matrix: # Fallback for no matrix
- packages:
- - *numba_cuda_cu12
test_python:
common:
- output_types: [conda, requirements, pyproject]
@@ -175,6 +163,8 @@ dependencies:
- &kvikio_unsuffixed kvikio==25.10.*,>=0.0.0a0
- &ucx_py_unsuffixed ucx-py==0.46.*,>=0.0.0a0
- ucxx==0.46.*,>=0.0.0a0
+ - &numba_cuda numba-cuda>=0.19.0,<0.20.0a0
+
specific:
- output_types: conda
matrices:
@@ -197,13 +187,15 @@ dependencies:
- distributed-ucxx-cu12==0.46.*,>=0.0.0a0
- kvikio-cu12==25.10.*,>=0.0.0a0
- ucx-py-cu12==0.46.*,>=0.0.0a0
+ - &numba_cuda_cu12 numba-cuda[cu12]>=0.19.0,<0.20.0a0
- matrix:
packages:
- *cudf_unsuffixed
- *dask_cudf_unsuffixed
- *distributed_ucxx_unsuffixed
- *kvikio_unsuffixed
- *ucx_py_unsuffixed
+ - *numba_cuda_cu12
depends_on_dask_cuda:
common:
- output_types: conda
3 changes: 1 addition & 2 deletions pyproject.toml
@@ -18,8 +18,7 @@ license = { text = "Apache-2.0" }
requires-python = ">=3.10"
dependencies = [
"click >=8.1",
- "numba-cuda[cu12]>=0.19.0,<0.20.0a0",
- "numba>=0.60.0,<0.62.0a0",
+ "cuda-core==0.3.*",
"numpy>=1.23,<3.0a0",
"pandas>=1.3",
"pynvml>=12.0.0,<13.0.0a0",