Mismatch between GPU driver and CUDA toolkit version in the current environment breaks Numba GPU functionality #1281
Comments
Hey @mnc909, I dug into this a little to understand it better. Kaggle's CUDA toolkit version comes from our base Docker image, which we upgrade periodically (see Line 4 in 2a8c4dd).
Our NVIDIA driver version, on the other hand, comes from our VM image. We'll have to look into how we would go about upgrading the driver in the boot image, since we don't directly manage that today; it seems that while the boot image does get updates, the NVIDIA driver hasn't been upgraded.
Numba's MVC support is documented at https://numba.readthedocs.io/en/stable/cuda/minor_version_compatibility.html - can you make use of this in your environment?
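Following the linked docs, enabling MVC in a notebook might look like the sketch below. This assumes the `ptxcompiler` and `cubinlinker` packages (which Numba's MVC mode depends on) can be installed in the Kaggle image; the environment variable must be set before Numba initializes CUDA.

```shell
# Sketch of enabling Numba's Minor Version Compatibility mode, per the
# docs linked above. Assumes ptxcompiler and cubinlinker are installable
# in this environment (they may need NVIDIA's pip index).
pip install ptxcompiler cubinlinker

# Must be exported before the Python process imports and uses numba.cuda.
export NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY=1
```

The same switch can be flipped programmatically via `numba.config.CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY = 1`, as long as it happens before the CUDA subsystem is initialized.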
Thanks @gmarkall, trying it 🤞
🐛 Bug
The current CUDA Toolkit version is 11.8, while the CUDA version supported by the NVIDIA driver is 11.4. This is usually not a problem thanks to Minor Version Compatibility (MVC); however, Numba in particular doesn't support MVC out of the box, so the entire CUDA functionality of Numba is broken in the current Docker image (it used to work before). It fails with the error "[222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR", which is known to occur when a mismatch like this happens.
Since Numba's CUDA wrapper is essentially the only sensible way to run custom GPU algorithms alongside PyTorch on Kaggle, this is quite unfortunate.
To Reproduce
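The issue does not include a repro script, but under the described toolkit/driver mismatch any JIT-compiled CUDA kernel should fail at link time. A hypothetical minimal example (requires a CUDA-capable GPU, so it cannot run on CPU-only machines):

```python
# Hypothetical minimal repro: compiling and launching any trivial Numba
# CUDA kernel triggers linking, which is where the mismatch surfaces.
import numpy as np
from numba import cuda

@cuda.jit
def add_one(x):
    i = cuda.grid(1)  # global thread index
    if i < x.size:
        x[i] += 1.0

a = np.zeros(8, dtype=np.float32)
# On the broken image this launch raises a LinkerError wrapping
# "[222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR".
add_one[1, 8](a)
print(a)
```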
Expected behavior
Numba kernels working