GH-47677: [C++][GPU] Allow building with CUDA 13 (#48259)
Conversation
@github-actions crossbow submit cuda
Revision: 60f974c Submitted crossbow builds: ursacomputing/crossbow @ actions-c2a5e1506b
Force-pushed from 60f974c to 42610fe
@github-actions crossbow submit cuda
Revision: 42610fe Submitted crossbow builds: ursacomputing/crossbow @ actions-719d93fe63
So, the Numba interop tests fail with `TypeError: MemoryPointer.__init__() got multiple values for argument 'pointer'`. Did Numba change its CUDA interaction APIs again? @gmarkall
This is presumably happening in arrow/python/pyarrow/_cuda.pyx, lines 458 to 465 (at 39c865e).
@github-actions crossbow submit cudapython*
Revision: ed01642 Submitted crossbow builds: ursacomputing/crossbow @ actions-bc22ec352a
Ok, given that the tests now fail for other reasons, I think we might just disable them on CI and let interested parties contribute.
@github-actions crossbow submit cudapython*
Revision: 5080baf Submitted crossbow builds: ursacomputing/crossbow @ actions-d5735696f4
```cpp
#if CUDA_VERSION >= 13000
  RETURN_NOT_OK(StatusFromCuda(cuCtxCreate_v4(&ctx, /*ctxCreateParams=*/nullptr,
                                              /*flags=*/0, device_->handle())));
#else
  RETURN_NOT_OK(StatusFromCuda(cuCtxCreate(&ctx, /*flags=*/0, device_->handle())));
#endif
```
Force-pushed from 5080baf to 9423f81
@github-actions crossbow submit cudapython*
Revision: 9423f81 Submitted crossbow builds: ursacomputing/crossbow @ actions-c59a2b88bb
cpp/src/arrow/gpu/cuda_test.cc (Outdated)
```cpp
Result<CUcontext> NonPrimaryRawContext() {
  CUcontext ctx;
#if CUDA_VERSION >= 13000
  RETURN_NOT_OK(StatusFromCuda(cuCtxCreate_v4(&ctx, /*ctxCreateParams=*/nullptr,
```
I'm not sure `cuCtxCreate_v4` is designated as a public API. For CUDA 13, `cuCtxCreate` is `#define`d as `cuCtxCreate_v4`, so I think I'd be inclined to write this as:
```diff
-  RETURN_NOT_OK(StatusFromCuda(cuCtxCreate_v4(&ctx, /*ctxCreateParams=*/nullptr,
+  RETURN_NOT_OK(StatusFromCuda(cuCtxCreate(&ctx, /*ctxCreateParams=*/nullptr,
```
(unrelated, but using a #define isn't very nice for non-C languages that would want to access the driver API through FFI)
That is from the 12.6 API; it was removed in the 13.0 API: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX (but not from the ABI).
Well, weirdly it succeeds on the 13.0.2 CUDA build... But, yes, I can just switch to calling the macro.
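For reference, a sketch of what switching to the macro could look like, adapted from the version-gated hunk above (the exact committed code may differ):

```cpp
#if CUDA_VERSION >= 13000
  // In CUDA 13 headers, cuCtxCreate expands to cuCtxCreate_v4, which takes an extra
  // CUctxCreateParams* argument; passing nullptr keeps the default context behaviour.
  RETURN_NOT_OK(StatusFromCuda(cuCtxCreate(&ctx, /*ctxCreateParams=*/nullptr,
                                           /*flags=*/0, device_->handle())));
#else
  // CUDA 12 headers still expose the three-argument cuCtxCreate.
  RETURN_NOT_OK(StatusFromCuda(cuCtxCreate(&ctx, /*flags=*/0, device_->handle())));
#endif
```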
The way it's defined in the cuda.h header is that `cuCtxCreate` is a macro defined as `cuCtxCreate_v4`:

```c
#define cuCtxCreate cuCtxCreate_v4
...
CUresult CUDAAPI cuCtxCreate(CUcontext *pctx, CUctxCreateParams *ctxCreateParams, unsigned int flags, CUdevice dev);
```
This was an API breaking change in CUDA 13. If you want to generically support CUDA 12 and CUDA 13, I'd recommend using the `cuGetProcAddress` API (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DRIVER__ENTRY__POINT.html#group__CUDA__DRIVER__ENTRY__POINT_1gcae5adad00590572ab35b2508c2d6e0d), which is similar to using `dlsym` and allows you to work against the ABI, which is guaranteed to never be broken.
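As an illustration of that suggestion (not code from this PR), the following hypothetical sketch resolves the CUDA 12-era `cuCtxCreate` entry point through `cuGetProcAddress` and calls it through a function pointer, relying on the driver ABI rather than the header macros. The helper name is invented, and it assumes CUDA 12+ headers, where `cuGetProcAddress` takes the `symbolStatus` out-parameter:

```cpp
#include <cuda.h>

namespace {

// Three-argument cuCtxCreate ABI as it existed in CUDA 12.x.
using CtxCreateFn = CUresult (*)(CUcontext*, unsigned int, CUdevice);

// Hypothetical helper: create a context via a driver entry point looked up at runtime.
CUresult CreateContextViaProcAddress(CUcontext* ctx, CUdevice dev) {
  void* fn = nullptr;
  CUdriverProcAddressQueryResult status;
  // Request the entry point as it was defined for CUDA 12.0 (cudaVersion = 12000).
  CUresult err = cuGetProcAddress("cuCtxCreate", &fn, /*cudaVersion=*/12000,
                                  CU_GET_PROC_ADDRESS_DEFAULT, &status);
  if (err != CUDA_SUCCESS || status != CU_GET_PROC_ADDRESS_SUCCESS) {
    return err != CUDA_SUCCESS ? err : CUDA_ERROR_NOT_FOUND;
  }
  // The returned pointer keeps the CUDA 12 ABI even when running on a CUDA 13 driver.
  return reinterpret_cast<CtxCreateFn>(fn)(ctx, /*flags=*/0, dev);
}

}  // namespace
```

The lookup could be done once and cached; newer entry points (such as the `CUctxCreateParams` variant) can be requested by passing a higher `cudaVersion`.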
@github-actions crossbow submit cudacpp*
Revision: 9ea7b75 Submitted crossbow builds: ursacomputing/crossbow @ actions-03152c375f
The PR to restore the public-facing surface of these APIs to what it was before is here: NVIDIA/numba-cuda#610. My intention is that, if this is merged, I'll make another release shortly afterwards so that the latest version is consistent with earlier versions.
…NVIDIA#536)"

This reverts commit 9a56516. This changed the public API of `MemoryPointer` and related classes, and the context that they held was used by Arrow (see apache/arrow#48259 (comment)):

> Numba interop tests fail with:
>
> ```
> arrow-dev/lib/python3.12/site-packages/pyarrow/tests/test_cuda_numba_interop.py:233:
> ???
> E   TypeError: MemoryPointer.__init__() got multiple values for argument 'pointer'
> ```

This commit reverts the change, as it was intended to improve performance without changing functionality, but has had a functional change as a side effect. Following the merge of this PR, we should be able to remove some of the `@require_context` decorators with some more targeted changes.
@raulcd @jorisvandenbossche What do you think? Should we wait for Numba-CUDA to publish a new release, or should we just merge this PR with a skip as currently done?
Given that we are going to lose access to the CUDA runners for an unspecified amount of time starting this weekend, I'd rather merge with the skip and create a follow-up issue to remove the skip once we have runners again and can validate.
@raulcd Do you want to give this a review?
Force-pushed from 9ea7b75 to 00fccbb
@github-actions crossbow submit cuda
Revision: 00fccbb Submitted crossbow builds: ursacomputing/crossbow @ actions-23548261b9
FWIW, I'm hoping to resolve the Numba-CUDA issue and publish a new release today.
Actually testing Numba interop may still require GH-47371 to be fixed, though.
Thanks for the pointer. I'll move on to that issue shortly afterwards.
…ctions" (#611): this reverts commit 9a56516, with the same commit message as quoted above.
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit ab4a096. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
What changes are included in this PR?
Are these changes tested?
Yes, on CI.
Are there any user-facing changes?
No.