build and test against CUDA 13.1.0 by jameslamb · Pull Request #1677 · rapidsai/cuvs

jameslamb · 2026-01-07T15:42:16Z

Contributes to rapidsai/build-planning#236

Tests that CI here will work with the changes from rapidsai/shared-workflows#483,
switches CUDA 13 builds to CUDA 13.1.0 and adds some CUDA 13.1.0 test jobs.

Also restores the CUDA 13.0 devcontainers and switches CI to testing those (see rapidsai/devcontainers#644 (comment)).

copy-pr-bot · 2026-01-07T15:42:20Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


jameslamb · 2026-01-07T15:48:48Z

/ok to test

jameslamb · 2026-01-07T16:49:14Z

All of the CUDA 13.1.0 builds are failing like this:

[315/332] Building CUDA object CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o
FAILED: [code=1] CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o
sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/opt/rh/gcc-toolset-14/root/usr/bin/gcc -DCCCL_DISABLE_PDL -DCUB_DISABLE_NAMESPACE_MAGIC -DCUB_IGNORE_NAMESPACE_MAGIC_ERROR -DCUTLASS_NAMESPACE=raft_cutlass -DCUVS_BUILD_CAGRA_HNSWLIB -DCUVS_BUILD_MG_ALGOS -DCUVS_SYSTEM_LITTLE_ENDIAN=1 -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DRAFT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_INFO -DRAFT_SYSTEM_LITTLE_ENDIAN=1 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_DISABLE_ABI_NAMESPACE -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTHRUST_IGNORE_ABI_NAMESPACE_ERROR -I/__w/cuvs/cuvs/cpp/include -I/__w/cuvs/cuvs/cpp/build/include -I/__w/cuvs/cuvs/cpp/build/_deps/raft-src/cpp/include -I/__w/cuvs/cuvs/cpp/build/_deps/raft-build/include -I/__w/cuvs/cuvs/cpp/build/_deps/rapids_logger-src/include -I/__w/cuvs/cuvs/cpp/build/_deps/rmm-src/cpp/include -I/__w/cuvs/cuvs/cpp/build/_deps/rmm-build/include -I/__w/cuvs/cuvs/cpp/build/_deps/cccl-src/lib/cmake/thrust/../../../thrust -I/__w/cuvs/cuvs/cpp/build/_deps/cccl-src/lib/cmake/libcudacxx/../../../libcudacxx/include -I/__w/cuvs/cuvs/cpp/build/_deps/cccl-src/lib/cmake/cub/../../../cub -I/__w/cuvs/cuvs/cpp/build/_deps/nvtx3-src/c/include -I/__w/cuvs/cuvs/cpp/build/_deps/cuco-src/include -I/__w/cuvs/cuvs/cpp/build/_deps/nvidiacutlass-src/include -I/__w/cuvs/cuvs/cpp/build/_deps/nvidiacutlass-build/include -I/__w/cuvs/cuvs/cpp/build/_deps/hnswlib-src -isystem /usr/local/cuda/targets/x86_64-linux/include -isystem /usr/local/cuda/targets/x86_64-linux/include/cccl -isystem /usr/local/cuda/include -isystem /usr/local/cuda/include/cccl -O3 -DNDEBUG -std=c++20 "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_90a,code=[sm_90a]" "--generate-code=arch=compute_100f,code=[sm_100f]" "--generate-code=arch=compute_120a,code=[sm_120a]" "--generate-code=arch=compute_120,code=[compute_120,sm_120]" 
-Xcompiler=-fPIC -Xcompiler=-Wno-deprecated-declarations -DRAFT_HIDE_DEPRECATION_WARNINGS -Xcompiler=-Wall,-Werror,-Wno-error=deprecated-declarations,-Wno-reorder -Werror=all-warnings --expt-extended-lambda --expt-relaxed-constexpr -DCUDA_API_PER_THREAD_DEFAULT_STREAM -Xfatbin=-compress-all --compress-mode=size -Xcompiler=-fopenmp -Xcompiler -pthread -MD -MT CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o -MF CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o.d -x cu -c /__w/cuvs/cuvs/cpp/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu -o CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o
Stored value type does not match pointer operand type!
store void (i32, i32, i32, i32, i32, i32, i32, i32, i32, float*, float*, i8**, i32*, i32*, float*, i32*, float*, %struct._ZN4cuvs9neighbors9filtering14ivf_filter_devE, %struct._ZN4cuvs9neighbors6ivf_pq6detail7fp_8bitILj5ELb0EEE*, float*, i32*)** %call2, void (i32, i32, i32, i32, i32, i32, i32, i32, i32, float*, float*, i8**, i32*, i32*, float*, i32*, float*, %struct._ZN4cuvs9neighbors9filtering14ivf_filter_devE*, %struct._ZN4cuvs9neighbors6ivf_pq6detail7fp_8bitILj5ELb0EEE*, float*, i32*)*** %retval, align 8, !dbg !26518
void (i32, i32, i32, i32, i32, i32, i32, i32, i32, float*, float*, i8**, i32*, i32*, float*, i32*, float*, %struct._ZN4cuvs9neighbors9filtering14ivf_filter_devE*, %struct._ZN4cuvs9neighbors6ivf_pq6detail7fp_8bitILj5ELb0EEE*, float*, i32*)**: parse Explicit load/store type does not match pointee type of pointer operand (Producer: 'LLVM7.0.1' Reader: 'LLVM 7.0.1')


Quoting an offline conversation w/ @robertmaynard

The technical issue is that somehow we are getting a mismatch of pass by value and pass by pointer ( _ZN4cuvs9neighbors9filtering14ivf_filter_devE versus _ZN4cuvs9neighbors9filtering14ivf_filter_devE* ).
The first place I would look is line 611-613 of cuvs/cpp/src/neighbors/ivf_pq/ivf_pq_compute_similarity_impl.cuh which shoves a kernel launch into a lambda.

That's here:

cuvs/cpp/src/neighbors/ivf_pq/ivf_pq_compute_similarity_impl.cuh

Lines 611 to 613 in e1d127c

  auto launch_kernel = [&](filtering::ivf_filter_dev sample_filter) {
    auto kernel = reinterpret_cast<compute_similarity_kernel_t<OutT, LutT>>(get_kernel(s));
    kernel<<<s.grid_dim, s.block_dim, s.smem_size, stream>>>(dim,

bdice · 2026-01-07T23:15:47Z

Calling out the fix in b3a0cf1:

Switching from std::tuple to cuda::std::tuple fixed the compilation problems observed above.

bdice · 2026-01-07T23:36:44Z

/merge

prefer CUDA 13.1 devcontainers, react to some cutlass removals in RAFT (#1686)

## use CUDA 13.1 devcontainers

Follow-up to #1677. There, I forgot to switch devcontainer testing here back to CUDA 13.1 (I'd temporarily kept it at 13.0 because there weren't yet NCCL packages with 13.1 support). This fixes that.

## react to cutlass removals in RAFT

rapidsai/raft#2916 removed headers used by cuVS and stopped exporting cutlass from RAFT. This brings those headers and some related patches over here to cuVS.

Related: rapidsai/cuml#7658

Authors:
- James Lamb (https://github.com/jameslamb)
- Divye Gala (https://github.com/divyegala)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Robert Maynard (https://github.com/robertmaynard)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #1686

build and test against CUDA 13.1.0

1a776eb

github-project-automation bot added this to Vector Search, ML, & Data Mining Release Board Jan 7, 2026

github-project-automation bot moved this to Todo in Vector Search, ML, & Data Mining Release Board Jan 7, 2026

jameslamb added the non-breaking (Introduces a non-breaking change) and improvement (Improves an existing functionality) labels Jan 7, 2026

jameslamb mentioned this pull request Jan 7, 2026

Update RAPIDS from CUDA 13.0 to 13.1 rapidsai/build-planning#236

Closed

restore CUDA 13.0 devcontainers, update version for Go, Java, and Rust builds

8197a88

jameslamb mentioned this pull request Jan 7, 2026

build and test against CUDA 13.1.0 rapidsai/cuvs-lucene#89

Merged

jameslamb requested a review from bdice January 7, 2026 19:57

jameslamb changed the title from "WIP: build and test against CUDA 13.1.0" to "build and test against CUDA 13.1.0" Jan 7, 2026

jameslamb marked this pull request as ready for review January 7, 2026 19:57

jameslamb requested review from a team as code owners January 7, 2026 19:58

jameslamb and others added 2 commits January 7, 2026 13:58

Merge branch 'main' into cuda13.1.0-workflows

65c330b

Use cuda::std::tuple for host/device compatibility

b3a0cf1

bdice requested a review from a team as a code owner January 7, 2026 20:13

jameslamb requested a review from gforsyth January 7, 2026 21:37

bdice approved these changes Jan 7, 2026

View reviewed changes

divyegala approved these changes Jan 7, 2026

View reviewed changes

rapids-bot bot merged commit 4c004e7 into rapidsai:main Jan 7, 2026
99 checks passed

github-project-automation bot moved this from Todo to Done in Vector Search, ML, & Data Mining Release Board Jan 7, 2026

jameslamb mentioned this pull request Jan 8, 2026

prefer CUDA 13.1 devcontainers, react to some cutlass removals in RAFT #1686

Merged

rapids-bot[bot] merged 4 commits into rapidsai:main from jameslamb:cuda13.1.0-workflows


3 participants
