Skip to content

Instantiate uint64_t IndexType for CAGRA search kernels#2190

Closed
divyegala wants to merge 4 commits intorapidsai:branch-24.04from
divyegala:cagra-index_t-uint64
Closed

Instantiate uint64_t IndexType for CAGRA search kernels#2190
divyegala wants to merge 4 commits intorapidsai:branch-24.04from
divyegala:cagra-index_t-uint64

Conversation

@divyegala
Copy link
Member

These instantiations are required in facebookresearch/faiss#3084 to link to precompiled CAGRA binaries from libraft.so

@divyegala divyegala requested a review from a team as a code owner February 16, 2024 20:35
@divyegala divyegala added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 16, 2024
@divyegala divyegala requested a review from a team as a code owner February 16, 2024 20:59
@tfeher
Copy link
Contributor

tfeher commented Feb 19, 2024

/rerun tests

@achirkin
Copy link
Contributor

Looks like the *.cu files are missing from the PR?

@tfeher
Copy link
Contributor

tfeher commented Feb 20, 2024

Thanks Divye for this PR! Currently INDEX_T is used for three different purpose:

  1. Output neighbor index type
  2. Graph index type
  3. Index type used in search kernels to store visited / candidate nodes.

The goal of this PR is to change 1. to int64_t to facilitate FAISS integration. But the current PR effectively changes all 3 types, and that is not ideal. We have the following problems:

  1. Graph size. A graph for 100M vectors and 32 degree is already 12 GiB. This would double, reducing the data size that we can handle in GPU memory.

  2. a) Binary size: currently CAGRA search kernels add around 50 MiB to libraft.so. I expect that this would double, and also the build time would correspondingly increase (although it can be build in parallel, so this might be less concern).

    b) Performance: the kernels are optimized for 32bit indexing. We should benchmark and possibly adjust the kernel configs for 64bit index type.

Considering these issues I would recommend to separate these concerns, and modify only the output index type (1.)

This can be easily done by introducing an extra output index template parameter, and an elementwise op to cast the index type if necessary. This is already used in our benchmark, so our benchmark times already include this overhead:

if constexpr (!std::is_same_v<IdxT, size_t>) {
raft::linalg::unaryOp(neighbors,
neighbors_IdxT,
batch_size * k,
raft::cast_op<size_t>(),
raft::resource::get_cuda_stream(handle_));
}

If we have any performance consideration with additional kernel call, then I would recommend to add a follow-up task to improve that. One could add the cast when

While changing single-cta kernel, one should watch out for binary size: we can do a runtime type dispatch without introducing an additional template parameter.

@divyegala
Copy link
Member Author

@tfeher thanks for the detailed writeup! I agree with you that we should limit the changes just to the OutputType. Would you agree that in this case where we have a lot of considerations to figure out, we should just add a copy in FAISS itself to cast the output from uint32_t to int64_t for the first iteration, then follow up with updates to see if we can natively support OutputType=int64_t with acceptable changes to binary size and overheads?

@tfeher
Copy link
Contributor

tfeher commented Feb 27, 2024

add a copy in FAISS itself to cast the output from uint32_t to int64_t for the first iteration, then follow up with updates to see if we can natively support OutputType=int64_t

Depends. If you anyways have to map the neighbor idx data in the faiss integration (e.g. marking invalid values according to faiss expectation), then it sounds fine to me to do the mapping in faiss.

Otherwise I would prefer to do the cast in cagra::search, because then we need to modify only one library when we decide to further optimize this.

@divyegala
Copy link
Member Author

Depends. If you anyways have to map the neighbor idx data in the faiss integration (e.g. marking invalid values according to faiss expectation), then it sounds fine to me to do the mapping in faiss.

@tfeher I'm not sure what this means. When would values be invalid in faiss?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

Development

Successfully merging this pull request may close these issues.

4 participants