build and test against CUDA 13.1.0 #1677

Merged
rapids-bot[bot] merged 4 commits into rapidsai:main from jameslamb:cuda13.1.0-workflows
Jan 7, 2026

jameslamb (Member) commented Jan 7, 2026

Contributes to rapidsai/build-planning#236

Tests that CI here will work with the changes from rapidsai/shared-workflows#483,
switches CUDA 13 builds to CUDA 13.1.0 and adds some CUDA 13.1.0 test jobs.

Also restores the CUDA 13.0 devcontainers and switches CI to testing those (see rapidsai/devcontainers#644 (comment)).

copy-pr-bot bot commented Jan 7, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jameslamb (Member, Author) commented:

/ok to test

@jameslamb (Member, Author) commented:

All of the CUDA 13.1.0 builds are failing like this:

[315/332] Building CUDA object CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o
FAILED: [code=1] CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o
sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/opt/rh/gcc-toolset-14/root/usr/bin/gcc -DCCCL_DISABLE_PDL -DCUB_DISABLE_NAMESPACE_MAGIC -DCUB_IGNORE_NAMESPACE_MAGIC_ERROR -DCUTLASS_NAMESPACE=raft_cutlass -DCUVS_BUILD_CAGRA_HNSWLIB -DCUVS_BUILD_MG_ALGOS -DCUVS_SYSTEM_LITTLE_ENDIAN=1 -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DRAFT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_INFO -DRAFT_SYSTEM_LITTLE_ENDIAN=1 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_DISABLE_ABI_NAMESPACE -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTHRUST_IGNORE_ABI_NAMESPACE_ERROR -I/__w/cuvs/cuvs/cpp/include -I/__w/cuvs/cuvs/cpp/build/include -I/__w/cuvs/cuvs/cpp/build/_deps/raft-src/cpp/include -I/__w/cuvs/cuvs/cpp/build/_deps/raft-build/include -I/__w/cuvs/cuvs/cpp/build/_deps/rapids_logger-src/include -I/__w/cuvs/cuvs/cpp/build/_deps/rmm-src/cpp/include -I/__w/cuvs/cuvs/cpp/build/_deps/rmm-build/include -I/__w/cuvs/cuvs/cpp/build/_deps/cccl-src/lib/cmake/thrust/../../../thrust -I/__w/cuvs/cuvs/cpp/build/_deps/cccl-src/lib/cmake/libcudacxx/../../../libcudacxx/include -I/__w/cuvs/cuvs/cpp/build/_deps/cccl-src/lib/cmake/cub/../../../cub -I/__w/cuvs/cuvs/cpp/build/_deps/nvtx3-src/c/include -I/__w/cuvs/cuvs/cpp/build/_deps/cuco-src/include -I/__w/cuvs/cuvs/cpp/build/_deps/nvidiacutlass-src/include -I/__w/cuvs/cuvs/cpp/build/_deps/nvidiacutlass-build/include -I/__w/cuvs/cuvs/cpp/build/_deps/hnswlib-src -isystem /usr/local/cuda/targets/x86_64-linux/include -isystem /usr/local/cuda/targets/x86_64-linux/include/cccl -isystem /usr/local/cuda/include -isystem /usr/local/cuda/include/cccl -O3 -DNDEBUG -std=c++20 "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_90a,code=[sm_90a]" "--generate-code=arch=compute_100f,code=[sm_100f]" "--generate-code=arch=compute_120a,code=[sm_120a]" "--generate-code=arch=compute_120,code=[compute_120,sm_120]" 
-Xcompiler=-fPIC -Xcompiler=-Wno-deprecated-declarations -DRAFT_HIDE_DEPRECATION_WARNINGS -Xcompiler=-Wall,-Werror,-Wno-error=deprecated-declarations,-Wno-reorder -Werror=all-warnings --expt-extended-lambda --expt-relaxed-constexpr -DCUDA_API_PER_THREAD_DEFAULT_STREAM -Xfatbin=-compress-all --compress-mode=size -Xcompiler=-fopenmp -Xcompiler -pthread -MD -MT CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o -MF CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o.d -x cu -c /__w/cuvs/cuvs/cpp/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu -o CMakeFiles/cuvs_objs.dir/src/neighbors/ivf_pq/detail/ivf_pq_compute_similarity_float_fp8_false.cu.o
Stored value type does not match pointer operand type!
store void (i32, i32, i32, i32, i32, i32, i32, i32, i32, float*, float*, i8**, i32*, i32*, float*, i32*, float*, %struct._ZN4cuvs9neighbors9filtering14ivf_filter_devE, %struct._ZN4cuvs9neighbors6ivf_pq6detail7fp_8bitILj5ELb0EEE*, float*, i32*)** %call2, void (i32, i32, i32, i32, i32, i32, i32, i32, i32, float*, float*, i8**, i32*, i32*, float*, i32*, float*, %struct._ZN4cuvs9neighbors9filtering14ivf_filter_devE*, %struct._ZN4cuvs9neighbors6ivf_pq6detail7fp_8bitILj5ELb0EEE*, float*, i32*)*** %retval, align 8, !dbg !26518
void (i32, i32, i32, i32, i32, i32, i32, i32, i32, float*, float*, i8**, i32*, i32*, float*, i32*, float*, %struct._ZN4cuvs9neighbors9filtering14ivf_filter_devE*, %struct._ZN4cuvs9neighbors6ivf_pq6detail7fp_8bitILj5ELb0EEE*, float*, i32*)**: parse Explicit load/store type does not match pointee type of pointer operand (Producer: 'LLVM7.0.1' Reader: 'LLVM 7.0.1')

(build link)

Quoting an offline conversation with @robertmaynard:

> The technical issue is that somehow we are getting a mismatch of pass by value and pass by pointer (_ZN4cuvs9neighbors9filtering14ivf_filter_devE versus _ZN4cuvs9neighbors9filtering14ivf_filter_devE*).
> The first place I would look is lines 611-613 of cuvs/cpp/src/neighbors/ivf_pq/ivf_pq_compute_similarity_impl.cuh, which shoves a kernel launch into a lambda.

That's here:

```cpp
auto launch_kernel = [&](filtering::ivf_filter_dev sample_filter) {
  auto kernel = reinterpret_cast<compute_similarity_kernel_t<OutT, LutT>>(get_kernel(s));
  kernel<<<s.grid_dim, s.block_dim, s.smem_size, stream>>>(dim,
```
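For readers unfamiliar with the pattern being discussed, here is a minimal host-only C++ sketch of a lambda that receives a filter by value, casts a type-erased function pointer back to a typed one, and calls it. It is only an analogy: there is no CUDA kernel launch, and every name in it (`filter_by_value`, `similarity_stub`, `get_kernel_stub`, `launch_via_lambda`) is hypothetical and not taken from cuVS.

```cpp
#include <cassert>

// Hypothetical stand-in for a small filter object that is passed by value.
struct filter_by_value {
  int threshold;
};

// Typed signature the caller expects: the filter travels by value.
using kernel_t = int (*)(int, filter_by_value);
// Type-erased function pointer, as a lookup helper might return.
using erased_fn_t = void (*)();

// Hypothetical "kernel": returns dim when it passes the filter, else 0.
inline int similarity_stub(int dim, filter_by_value f)
{
  return dim > f.threshold ? dim : 0;
}

// Hypothetical lookup that hands back the function with its type erased.
inline erased_fn_t get_kernel_stub()
{
  return reinterpret_cast<erased_fn_t>(&similarity_stub);
}

// The pattern under discussion: the lambda takes the filter by value,
// restores the typed pointer with reinterpret_cast, and invokes it.
inline int launch_via_lambda(int dim, filter_by_value filter)
{
  auto launch = [&](filter_by_value f) {
    auto kernel = reinterpret_cast<kernel_t>(get_kernel_stub());
    return kernel(dim, f);
  };
  return launch(filter);
}
```

The call is well defined only if the pointer is cast back to exactly the type of the original function. A disagreement about whether a parameter is a struct or a pointer to it, like the one in the LLVM error above, is the kind of mismatch this pattern makes easy to overlook.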

jameslamb requested a review from bdice (January 7, 2026 19:57)
jameslamb changed the title from "WIP: build and test against CUDA 13.1.0" to "build and test against CUDA 13.1.0" (Jan 7, 2026)
jameslamb marked this pull request as ready for review (January 7, 2026 19:57)
jameslamb requested review from a team as code owners (January 7, 2026 19:58)
bdice requested a review from a team as a code owner (January 7, 2026 20:13)
jameslamb requested a review from gforsyth (January 7, 2026 21:37)

bdice (Contributor) commented Jan 7, 2026

Calling out the fix in b3a0cf1:

Switching from std::tuple to cuda::std::tuple fixed the compilation problems observed above.
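As a rough illustration of that kind of swap, the sketch below selects `cuda::std::tuple` when the CCCL header `<cuda/std/tuple>` is available and falls back to `std::tuple` otherwise, so it also builds with a plain host compiler. The thread does not show where cuVS used the tuple, so `tuple_ns`, `launch_config`, `make_launch_config`, and `total_threads` are hypothetical names chosen for this example, not the code changed in b3a0cf1.

```cpp
#include <cassert>
#include <cstddef>

#if __has_include(<cuda/std/tuple>)
#include <cuda/std/tuple>  // provided by CCCL (libcudacxx)
namespace tuple_ns = cuda::std;
#else
#include <tuple>  // fallback so the sketch builds without CUDA headers
namespace tuple_ns = std;
#endif

// Hypothetical bundle of launch parameters: grid size, block size,
// and dynamic shared-memory bytes.
using launch_config = tuple_ns::tuple<unsigned, unsigned, std::size_t>;

// Pack the three values into one tuple.
inline launch_config make_launch_config(unsigned grid, unsigned block, std::size_t smem)
{
  return tuple_ns::make_tuple(grid, block, smem);
}

// Total number of threads described by a configuration (grid * block).
inline unsigned long long total_threads(const launch_config& cfg)
{
  return static_cast<unsigned long long>(tuple_ns::get<0>(cfg)) * tuple_ns::get<1>(cfg);
}
```

Because the code refers to the tuple only through the `tuple_ns` alias, switching implementations is a one-line change. In this sketch, `total_threads(make_launch_config(4, 256, 0))` evaluates to 1024.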

bdice (Contributor) commented Jan 7, 2026

/merge

@rapids-bot rapids-bot bot merged commit 4c004e7 into rapidsai:main Jan 7, 2026
99 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jan 10, 2026
#1686)

## use CUDA 13.1 devcontainers

Follow-up to #1677

There, I forgot to switch devcontainer testing here back to CUDA 13.1 (I'd temporarily kept it at 13.0 because there weren't yet NCCL packages with 13.1 support). This fixes that.

## react to cutlass removals in RAFT

rapidsai/raft#2916 removed headers used by cuVS and stopped exporting cutlass from RAFT.

This brings those headers and some related patches over here to cuVS.

Related: rapidsai/cuml#7658

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Robert Maynard (https://github.com/robertmaynard)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #1686

Labels: improvement (improves an existing functionality), non-breaking (introduces a non-breaking change)
