Mismatch between GPU driver and CUDA toolkit version in the current environment breaks Numba GPU functionality #1281

mnc909 · 2023-08-08T19:43:10Z

🐛 Bug

The current CUDA Toolkit version is 11.8, while the CUDA version reported by the NVIDIA driver is 11.4. This is usually not a problem because of Minor Version Compatibility (MVC). However, Numba in particular doesn't support MVC yet, so none of Numba's CUDA functionality works with the current Docker image (it used to work before). It fails with the error "[222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR", which is known to occur when there is a mismatch like this.

As Numba's CUDA wrapper is more or less the only sensible way to get custom GPU algorithms working together with PyTorch on Kaggle, this is quite unfortunate.
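The version relationship described above can be sketched as a small check. This is only an illustration: the function name and the three labels are hypothetical, and the version tuples are typed in by hand (they can usually be read from `nvidia-smi` for the driver and `numba -s` for the toolkit that Numba sees).

```python
from typing import Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (11, 8)


def classify_cuda_versions(driver: Version, toolkit: Version) -> str:
    """Classify a driver/toolkit pair (hypothetical helper, labels are made up).

    - "compatible":  the driver is at least as new as the toolkit.
    - "needs-mvc":   same major version, but the toolkit minor is newer than
                     the driver's, so Minor Version Compatibility is required.
    - "outside-mvc": the toolkit major is newer than the driver's major,
                     which MVC does not cover.
    """
    if toolkit <= driver:
        return "compatible"
    if toolkit[0] == driver[0]:
        return "needs-mvc"
    return "outside-mvc"


# The combination reported in this issue: driver 11.4, toolkit 11.8.
print(classify_cuda_versions(driver=(11, 4), toolkit=(11, 8)))  # prints "needs-mvc"
```

For the image described here, the check lands in the "needs-mvc" case, which is the situation where the cuLinkAddData error was observed.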

To Reproduce

1. Open any notebook implementing a Numba CUDA kernel, such as this one: https://www.kaggle.com/code/harshwalia/2-custom-cuda-kernels-in-python-with-numba/notebook
2. Change the environment in notebook settings from pinned to latest
3. Run the kernel (in the above example, the first 3 cells)
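If the linked notebook is not at hand, a much smaller kernel should exercise the same compile-and-link path. The sketch below is not taken from that notebook; the function and kernel names are made up for illustration, and it needs a GPU session with `numba` and `numpy` installed.

```python
def run_numba_cuda_smoke_test(n: int = 1024) -> bool:
    """Compile and launch a trivial Numba CUDA kernel (illustrative helper).

    Imports are done inside the function so that merely defining it does not
    require a GPU. Returns True if the kernel ran and produced the expected
    values; on an affected image the launch is expected to raise instead.
    """
    import numpy as np
    from numba import cuda

    @cuda.jit
    def add_one(arr):
        i = cuda.grid(1)      # absolute index of this thread in a 1D grid
        if i < arr.size:      # guard against threads past the end of the array
            arr[i] += 1.0

    d_data = cuda.to_device(np.zeros(n, dtype=np.float32))
    threads = 128
    blocks = (n + threads - 1) // threads
    add_one[blocks, threads](d_data)  # compilation and linking happen on first launch
    result = d_data.copy_to_host()
    return bool((result == 1.0).all())
```

Calling `run_numba_cuda_smoke_test()` in a notebook cell after switching the environment to latest should either return `True` or fail during the first kernel launch.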

Expected behavior

Numba kernels working

djherbis · 2023-08-08T20:52:14Z

Hey @mnc909 I dug into this a little bit to understand better.

Kaggle's CUDA toolkit version comes from our base Docker image, which we upgrade periodically:

docker-python/config.txt

Line 4 in 2a8c4dd

GPU_BASE_IMAGE_NAME=tf2-gpu.2-12.py310

Our NVIDIA driver version on the other hand comes from our VM image.

We'll have to look into how to upgrade the driver in the boot image, since we don't directly manage that today. It seems that while the boot image does get updates, those updates haven't included a newer NVIDIA driver.

gmarkall · 2023-08-09T14:20:32Z

> Numba in particular doesn't support MVC yet

The Numba MVC support is documented at https://numba.readthedocs.io/en/stable/cuda/minor_version_compatibility.html - can you make use of this in your environment?
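As a rough sketch of what making use of this could look like in a notebook: going by the linked documentation page, MVC is switched on with the `NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY` environment variable, set before Numba is imported, and for CUDA 11 it relies on the `ptxcompiler` and `cubinlinker` packages. The helper name below is hypothetical, and this is not necessarily how the Kaggle image itself was changed.

```python
import importlib.util
import os

# Setting name as given on the Numba MVC documentation page linked above.
MVC_ENV_VAR = "NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY"
# Helper packages the docs list for CUDA 11 (assumption: CUDA 11 environment).
MVC_PACKAGES = ("ptxcompiler", "cubinlinker")


def enable_numba_mvc() -> list:
    """Request MVC and return the helper packages that are not importable.

    Hypothetical helper: it must run before `numba` is imported, because
    Numba reads its configuration from the environment at import time.
    """
    os.environ[MVC_ENV_VAR] = "1"
    return [name for name in MVC_PACKAGES
            if importlib.util.find_spec(name) is None]


missing = enable_numba_mvc()
if missing:
    print("MVC requested, but these packages are not installed:",
          ", ".join(missing))
```

The environment variable alone is not enough if the helper packages are absent, which is why the sketch reports them instead of failing silently.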

djherbis · 2023-08-09T14:43:10Z

Thanks @gmarkall trying it 🤞

#1281 numba mvc support

jakirkham · 2023-11-23T03:59:29Z

In addition to PR #1282, which added the missing MVC bits for Numba, it looks like the base image was updated recently (#1305), and the update appears to contain a newer driver version as well.

mnc909 added the "bug", "bug & failures with existing packages", and "help wanted" labels Aug 8, 2023

djherbis added a commit that referenced this issue Aug 9, 2023

#1281 numba mvc support

ba33535

djherbis added a commit that referenced this issue Aug 14, 2023

Merge pull request #1282 from Kaggle/numba-mvc

8e15dd3

#1281 numba mvc support

djherbis mentioned this issue Aug 16, 2023

#1281 numba mvc support #1282

Merged

djherbis closed this as completed May 16, 2024
