Migrate to use cuVS for vector search #6085

Merged on Oct 4, 2024 (52 commits)
Changes from 17 commits
Commits
2f2c55c
Migrate from raft to cuvs for pairwise_distance and bfknn
benfred Jul 16, 2024
9116ae8
Merge branch 'branch-24.08' into cuvs
benfred Jul 22, 2024
84bc77a
.
benfred Jul 22, 2024
cbb79ec
use cuvs::distance::DistanceType where possible
benfred Jul 22, 2024
f653059
Revert "use cuvs::distance::DistanceType where possible"
benfred Sep 6, 2024
a6f2c2a
use stats from raft
benfred Sep 18, 2024
c2c9c04
Merge remote-tracking branch 'origin/branch-24.10' into cuvs
benfred Sep 18, 2024
95b9c14
use ivf-* from cuvs
benfred Sep 20, 2024
1934d40
Merge remote-tracking branch 'origin/branch-24.10' into cuvs
benfred Sep 20, 2024
0601412
Merge branch 'rapidsai:branch-24.10' into cuvs
benfred Sep 26, 2024
f30f933
add libcuvs to dependencies.yaml
benfred Sep 26, 2024
69d2398
Merge branch 'cuvs' of https://github.com/benfred/cuml into cuvs
benfred Sep 26, 2024
6c46624
attempt to fix build error in CI
benfred Sep 26, 2024
3fb5479
Merge branch 'branch-24.10' into cuvs
benfred Sep 26, 2024
5380c17
fix tsne
benfred Sep 27, 2024
dc1f7a6
Merge branch 'cuvs' of https://github.com/benfred/cuml into cuvs
benfred Sep 27, 2024
fdd18c5
fix ivf-fla
benfred Sep 27, 2024
590865d
fix test_nearest_neighbors_rbc test for haversine distance
benfred Sep 27, 2024
c7d1b0e
re-add MetricProcessor code
benfred Sep 27, 2024
e8c1b18
suggestions from code review
benfred Sep 27, 2024
bd58347
fix dask pytests
benfred Sep 29, 2024
adb450a
attempt to fix python build errors in CI
benfred Sep 29, 2024
df47d3c
use raft in header only mode
benfred Sep 29, 2024
c72173c
Use kmeans/mutual reachability code from pending cuvs PR's
benfred Sep 29, 2024
6139db0
remove comment
benfred Sep 30, 2024
3e1c465
cmake fixes
benfred Sep 30, 2024
732ea10
.
benfred Sep 30, 2024
4942bb0
pick up right cuvs version
benfred Sep 30, 2024
fef9920
.
benfred Sep 30, 2024
3c0c47e
empty commit for ci
benfred Sep 30, 2024
d8e6b6d
Exclude libcuvs.so in auditwheel.
bdice Sep 30, 2024
257898e
add cuvs to python dependencies
benfred Oct 1, 2024
9f805cd
Merge branch 'cuvs' of https://github.com/benfred/cuml into cuvs
benfred Oct 1, 2024
176c9a9
use l2expanded distance in kmeans transform
benfred Oct 2, 2024
2122120
Merge branch 'branch-24.10' into cuvs
benfred Oct 2, 2024
c12f1d5
Set rpath for cuvs
KyleFromNVIDIA Oct 2, 2024
f3edb8f
Don't link Python modules against cuvs directly
KyleFromNVIDIA Oct 2, 2024
f906e79
Remove superfluous cuvs::cuvs references
KyleFromNVIDIA Oct 2, 2024
585acd4
Add cuvs rpath
KyleFromNVIDIA Oct 2, 2024
1964a94
remove cuvs pin
benfred Oct 2, 2024
f1db388
Merge branch 'cuvs' of https://github.com/benfred/cuml into cuvs
benfred Oct 2, 2024
259256a
updates to handle bfknn api changes
benfred Oct 3, 2024
4efd97a
link cuvs statically in python wheels
benfred Oct 3, 2024
41daf01
empty commit for ci
benfred Oct 3, 2024
31331d7
empty commit for ci
benfred Oct 3, 2024
6bef472
empty commit for ci
benfred Oct 3, 2024
3e88b8e
remove pin
benfred Oct 3, 2024
064cced
Merge branch 'branch-24.10' into cuvs
cjnolet Oct 3, 2024
3959c58
re-add pin + suggestions from code review
benfred Oct 3, 2024
dc6de84
Merge branch 'cuvs' of https://github.com/benfred/cuml into cuvs
benfred Oct 3, 2024
461a271
fix
benfred Oct 3, 2024
f50b9d2
remove pin
benfred Oct 3, 2024
1 change: 1 addition & 0 deletions ci/release/update-version.sh
@@ -47,6 +47,7 @@ DEPENDENCIES=(
 libcumlprims
 libraft-headers
 libraft
+libcuvs
 librmm
 pylibraft
 raft-dask
1 change: 1 addition & 0 deletions conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -39,6 +39,7 @@ dependencies:
 - libcusolver=11.4.1.48
 - libcusparse-dev=11.7.5.86
 - libcusparse=11.7.5.86
+- libcuvs==24.10.*,>=0.0.0a0
 - libraft-headers==24.10.*,>=0.0.0a0
 - libraft==24.10.*,>=0.0.0a0
 - librmm==24.10.*,>=0.0.0a0
1 change: 1 addition & 0 deletions conda/environments/all_cuda-125_arch-x86_64.yaml
@@ -36,6 +36,7 @@ dependencies:
 - libcurand-dev
 - libcusolver-dev
 - libcusparse-dev
+- libcuvs==24.10.*,>=0.0.0a0
 - libraft-headers==24.10.*,>=0.0.0a0
 - libraft==24.10.*,>=0.0.0a0
 - librmm==24.10.*,>=0.0.0a0
1 change: 1 addition & 0 deletions conda/environments/clang_tidy_cuda-118_arch-x86_64.yaml
@@ -27,6 +27,7 @@ dependencies:
 - libcusolver=11.4.1.48
 - libcusparse-dev=11.7.5.86
 - libcusparse=11.7.5.86
+- libcuvs==24.10.*,>=0.0.0a0
 - libraft-headers==24.10.*,>=0.0.0a0
 - libraft==24.10.*,>=0.0.0a0
 - librmm==24.10.*,>=0.0.0a0
1 change: 1 addition & 0 deletions conda/environments/cpp_all_cuda-118_arch-x86_64.yaml
@@ -25,6 +25,7 @@ dependencies:
 - libcusolver=11.4.1.48
 - libcusparse-dev=11.7.5.86
 - libcusparse=11.7.5.86
+- libcuvs==24.10.*,>=0.0.0a0
 - libraft-headers==24.10.*,>=0.0.0a0
 - libraft==24.10.*,>=0.0.0a0
 - librmm==24.10.*,>=0.0.0a0
1 change: 1 addition & 0 deletions conda/environments/cpp_all_cuda-125_arch-x86_64.yaml
@@ -22,6 +22,7 @@ dependencies:
 - libcurand-dev
 - libcusolver-dev
 - libcusparse-dev
+- libcuvs==24.10.*,>=0.0.0a0
 - libraft-headers==24.10.*,>=0.0.0a0
 - libraft==24.10.*,>=0.0.0a0
 - librmm==24.10.*,>=0.0.0a0
15 changes: 3 additions & 12 deletions cpp/CMakeLists.txt
@@ -228,6 +228,7 @@ endif()
 include(cmake/thirdparty/get_cccl.cmake)
 include(cmake/thirdparty/get_rmm.cmake)
 include(cmake/thirdparty/get_raft.cmake)
+include(cmake/thirdparty/get_cuvs.cmake)

 if(LINK_TREELITE)
 include(cmake/thirdparty/get_treelite.cmake)
@@ -442,18 +443,6 @@ if(BUILD_CUML_CPP_LIBRARY)
 src/metrics/kl_divergence.cu
 src/metrics/mutual_info_score.cu
 src/metrics/pairwise_distance.cu
-src/metrics/pairwise_distance_canberra.cu
-src/metrics/pairwise_distance_chebyshev.cu
-src/metrics/pairwise_distance_correlation.cu
-src/metrics/pairwise_distance_cosine.cu
-src/metrics/pairwise_distance_euclidean.cu
-src/metrics/pairwise_distance_hamming.cu
-src/metrics/pairwise_distance_hellinger.cu
-src/metrics/pairwise_distance_jensen_shannon.cu
-src/metrics/pairwise_distance_kl_divergence.cu
-src/metrics/pairwise_distance_l1.cu
-src/metrics/pairwise_distance_minkowski.cu
-src/metrics/pairwise_distance_russell_rao.cu
 src/metrics/r2_score.cu
 src/metrics/rand_index.cu
 src/metrics/silhouette_score.cu
@@ -611,6 +600,7 @@ if(BUILD_CUML_CPP_LIBRARY)
 # These are always private:
 list(APPEND _cuml_cpp_private_libs
 raft::raft
+cuvs::cuvs
 $<TARGET_NAME_IF_EXISTS:GPUTreeShap::GPUTreeShap>
 $<$<BOOL:${LINK_CUFFT}>:CUDA::cufft${_ctk_fft_static_suffix}>
 ${TREELITE_LIBS}
@@ -677,6 +667,7 @@ if(BUILD_CUML_C_LIBRARY)
 target_link_libraries(${CUML_C_TARGET}
 PUBLIC
 ${CUML_CPP_TARGET}
+PRIVATE cuvs::cuvs
 )

 # ensure CUDA symbols aren't relocated to the middle of the debug build binaries
1 change: 1 addition & 0 deletions cpp/bench/CMakeLists.txt
@@ -49,6 +49,7 @@ if(BUILD_CUML_BENCH)
 cuml::${CUML_CPP_TARGET}
 benchmark::benchmark
 ${TREELITE_LIBS}
+cuvs::cuvs
 raft::raft
 raft::compiled
 )
66 changes: 66 additions & 0 deletions cpp/cmake/thirdparty/get_cuvs.cmake
@@ -0,0 +1,66 @@
#=============================================================================
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#=============================================================================

set(CUML_MIN_VERSION_cuvs "${CUML_VERSION_MAJOR}.${CUML_VERSION_MINOR}.00")
set(CUML_BRANCH_VERSION_cuvs "${CUML_VERSION_MAJOR}.${CUML_VERSION_MINOR}")

function(find_and_configure_cuvs)
set(oneValueArgs VERSION FORK PINNED_TAG EXCLUDE_FROM_ALL COMPILE_LIBRARY CLONE_ON_PIN NVTX)
cmake_parse_arguments(PKG "${options}" "${oneValueArgs}"
"${multiValueArgs}" ${ARGN} )

if(PKG_CLONE_ON_PIN AND NOT PKG_PINNED_TAG STREQUAL "branch-${CUML_BRANCH_VERSION_cuvs}")
message(STATUS "CUML: CUVS pinned tag found: ${PKG_PINNED_TAG}. Cloning cuvs locally.")
set(CPM_DOWNLOAD_cuvs ON)
endif()

rapids_cpm_find(cuvs ${PKG_VERSION}
GLOBAL_TARGETS cuvs::cuvs
BUILD_EXPORT_SET cuml-exports
INSTALL_EXPORT_SET cuml-exports
CPM_ARGS
GIT_REPOSITORY https://github.com/${PKG_FORK}/cuvs.git
GIT_TAG ${PKG_PINNED_TAG}
SOURCE_SUBDIR cpp
EXCLUDE_FROM_ALL ${PKG_EXCLUDE_FROM_ALL}
OPTIONS
"BUILD_TESTS OFF"
"BUILD_BENCH OFF"
)

if(cuvs_ADDED)
message(VERBOSE "CUML: Using CUVS located in ${cuvs_SOURCE_DIR}")
else()
message(VERBOSE "CUML: Using CUVS located in ${cuvs_DIR}")
endif()


endfunction()

# Change pinned tag here to test a commit in CI
# To use a different CUVS locally, set the CMake variable
# CPM_cuvs_SOURCE=/path/to/local/cuvs
find_and_configure_cuvs(VERSION ${CUML_MIN_VERSION_cuvs}
FORK rapidsai
PINNED_TAG branch-${CUML_BRANCH_VERSION_cuvs}
EXCLUDE_FROM_ALL ${CUML_EXCLUDE_CUVS_FROM_ALL}
# When PINNED_TAG above doesn't match cuml,
# force local cuvs clone in build directory
# even if it's already installed.
CLONE_ON_PIN ${CUML_CUVS_CLONE_ON_PIN}
COMPILE_LIBRARY ${CUML_CUVS_COMPILED}
NVTX ${NVTX}
)
46 changes: 40 additions & 6 deletions cpp/include/cuml/neighbors/knn.hpp
@@ -1,5 +1,5 @@
 /*
-* Copyright (c) 2019-2023, NVIDIA CORPORATION.
+* Copyright (c) 2019-2024, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -17,9 +17,11 @@
 #pragma once

 #include <raft/distance/distance_types.hpp>
-#include <raft/spatial/knn/ann_common.h>
 #include <raft/spatial/knn/ball_cover_types.hpp>

+#include <cuvs/neighbors/ivf_flat.hpp>
+#include <cuvs/neighbors/ivf_pq.hpp>
+
 namespace raft {
 class handle_t;
 }
@@ -46,6 +48,8 @@ namespace ML {
 * default
 * @param[in] metric_arg the value of `p` for Minkowski (l-p) distances. This
 * is ignored if the metric_type is not Minkowski.
+* @param[in] translations translation ids for indices when index rows represent
+* non-contiguous partitions
 */
 void brute_force_knn(const raft::handle_t& handle,
 std::vector<float*>& input,
@@ -59,7 +63,8 @@ void brute_force_knn(const raft::handle_t& handle,
 bool rowMajorIndex = false,
 bool rowMajorQuery = false,
 raft::distance::DistanceType metric = raft::distance::DistanceType::L2Expanded,
-float metric_arg = 2.0f);
+float metric_arg = 2.0f,
+std::vector<int64_t>* translations = nullptr);

 void rbc_build_index(const raft::handle_t& handle,
 raft::spatial::knn::BallCoverIndex<int64_t, float, uint32_t>& index);
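Note on the new `translations` argument: the header documents it only as "translation ids for indices when index rows represent non-contiguous partitions". The sketch below is one plausible reading of that sentence, not the cuML or cuVS implementation. It assumes each input partition has a single base id that is added to partition-local row indices, and that the fallback when no translations are given is a running row count. The helpers `default_translations` and `to_global_id` are hypothetical names introduced here for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a cuML/cuVS function): assume partitions are laid
// out back to back, so the base id of each partition is the number of rows
// that precede it.
std::vector<int64_t> default_translations(const std::vector<int64_t>& partition_sizes)
{
  std::vector<int64_t> out;
  out.reserve(partition_sizes.size());
  int64_t offset = 0;
  for (int64_t n : partition_sizes) {
    out.push_back(offset);
    offset += n;
  }
  return out;
}

// Hypothetical helper: map a partition-local row index to a global id by
// adding the base id of the partition it came from.
int64_t to_global_id(const std::vector<int64_t>& translations,
                     std::size_t partition,
                     int64_t local_row)
{
  return translations.at(partition) + local_row;
}
```

Under these assumptions, partitions of 100, 50, and 25 rows get base ids 0, 100, and 150, so local row 7 of the second partition maps to global id 107. A caller whose partitions are not contiguous would pass its own base ids instead of the running count.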
@@ -71,6 +76,35 @@ void rbc_knn_query(const raft::handle_t& handle,
 uint32_t n_search_items,
 int64_t* out_inds,
 float* out_dists);
+
+struct knnIndex {
+raft::distance::DistanceType metric;
+float metricArg;
+int nprobe;
+
+std::unique_ptr<cuvs::neighbors::ivf_flat::index<float, int64_t>> ivf_flat;
+std::unique_ptr<cuvs::neighbors::ivf_pq::index<int64_t>> ivf_pq;
+
+int device;
+};
+
+struct knnIndexParam {
+virtual ~knnIndexParam() {}
+};
+
+struct IVFParam : knnIndexParam {
+int nlist;
+int nprobe;
+};
+
+struct IVFFlatParam : IVFParam {};
+
+struct IVFPQParam : IVFParam {
+int M;
+int n_bits;
+bool usePrecomputedTables;
+};
+
 /**
 * @brief Flat C++ API function to build an approximate nearest neighbors index
 * from an index array and a set of parameters.
@@ -85,8 +119,8 @@ void rbc_knn_query(const raft::handle_t& handle,
 * @param[in] D the dimensionality of the index array
 */
 void approx_knn_build_index(raft::handle_t& handle,
-raft::spatial::knn::knnIndex* index,
-raft::spatial::knn::knnIndexParam* params,
+knnIndex* index,
+knnIndexParam* params,
 raft::distance::DistanceType metric,
 float metricArg,
 float* index_array,
@@ -109,7 +143,7 @@ void approx_knn_build_index(raft::handle_t& handle,
 void approx_knn_search(raft::handle_t& handle,
 float* distances,
 int64_t* indices,
-raft::spatial::knn::knnIndex* index,
+knnIndex* index,
 int k,
 float* query_array,
 int n);
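Note on the parameter structs: `approx_knn_build_index` now takes a `knnIndexParam*` declared in this header, and the base struct carries a virtual destructor, which makes the hierarchy polymorphic. The diff does not show how the implementation inspects that pointer, so the following is only a sketch of one way a callee could tell `IVFFlatParam` from `IVFPQParam`. The struct definitions are copied from the hunk above; `describe_params` is a hypothetical helper, not a cuML function.

```cpp
#include <string>

// Parameter structs as declared in the header diff above.
struct knnIndexParam {
  virtual ~knnIndexParam() {}
};

struct IVFParam : knnIndexParam {
  int nlist;
  int nprobe;
};

struct IVFFlatParam : IVFParam {};

struct IVFPQParam : IVFParam {
  int M;
  int n_bits;
  bool usePrecomputedTables;
};

// Hypothetical helper: report which concrete parameter type sits behind a
// base pointer. dynamic_cast is usable here only because the base struct has
// a virtual member (its destructor) and is therefore a polymorphic type.
std::string describe_params(const knnIndexParam* params)
{
  if (dynamic_cast<const IVFFlatParam*>(params) != nullptr) { return "ivf_flat"; }
  if (dynamic_cast<const IVFPQParam*>(params) != nullptr) { return "ivf_pq"; }
  return "unknown";
}
```

A null pointer or a bare `knnIndexParam` falls through to `"unknown"` in this sketch; a real build function would more likely report an error for an unrecognised parameter type.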
50 changes: 10 additions & 40 deletions cpp/src/hdbscan/detail/soft_clustering.cuh
@@ -24,7 +24,6 @@
 #include <cuml/common/logger.hpp>

 #include <raft/core/device_mdspan.hpp>
-#include <raft/distance/distance.cuh>
 #include <raft/distance/distance_types.hpp>
 #include <raft/label/classlabels.cuh>
 #include <raft/linalg/matrix_vector_op.cuh>
@@ -43,6 +42,8 @@
 #include <thrust/execution_policy.h>
 #include <thrust/transform.h>

+#include <cuvs/distance/distance.hpp>
+
 #include <algorithm>
 #include <cmath>
 #include <limits>
@@ -88,45 +89,14 @@ void dist_membership_vector(const raft::handle_t& handle,
 value_idx samples_per_batch = min((value_idx)batch_size, (value_idx)n_queries - batch_offset);
 rmm::device_uvector<value_t> dist(samples_per_batch * n_exemplars, stream);

-// compute the distances using raft API
-switch (metric) {
-case raft::distance::DistanceType::L2SqrtExpanded:
-raft::distance::
-distance<raft::distance::DistanceType::L2SqrtExpanded, value_t, value_t, value_t, int>(
-handle,
-query + batch_offset * n,
-exemplars_dense.data(),
-dist.data(),
-samples_per_batch,
-n_exemplars,
-n,
-true);
-break;
-case raft::distance::DistanceType::L1:
-raft::distance::distance<raft::distance::DistanceType::L1, value_t, value_t, value_t, int>(
-handle,
-query + batch_offset * n,
-exemplars_dense.data(),
-dist.data(),
-samples_per_batch,
-n_exemplars,
-n,
-true);
-break;
-case raft::distance::DistanceType::CosineExpanded:
-raft::distance::
-distance<raft::distance::DistanceType::CosineExpanded, value_t, value_t, value_t, int>(
-handle,
-query + batch_offset * n,
-exemplars_dense.data(),
-dist.data(),
-samples_per_batch,
-n_exemplars,
-n,
-true);
-break;
-default: RAFT_EXPECTS(false, "Incorrect metric passed!");
-}
+// compute the distances using the CUVS API
+cuvs::distance::pairwise_distance(
+handle,
+raft::make_device_matrix_view<const value_t, int64_t>(
+query + batch_offset * n, samples_per_batch, n),
+raft::make_device_matrix_view<const value_t, int64_t>(exemplars_dense.data(), n_exemplars, n),
+raft::make_device_matrix_view<value_t, int64_t>(dist.data(), samples_per_batch, n_exemplars),
+static_cast<cuvs::distance::DistanceType>(metric));

 // compute the minimum distances to exemplars of each cluster
 value_idx n_elements = samples_per_batch * n_selected_clusters;
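Note on this hunk: the removed code chose a compile-time template instantiation per metric and rejected everything except three metrics, while the replacement makes one call that takes the metric as a runtime value and converts the enum with `static_cast`. That cast is only meaningful if both enum types assign the same underlying value to each enumerator. The CPU-only sketch below illustrates the pattern under that assumption. The namespaces `old_api` and `new_api`, their enumerator values, and both functions are hypothetical stand-ins written for this note; they are not the RAFT or cuVS definitions.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for two libraries' metric enums. The numeric values
// are illustrative only.
namespace old_api {
enum class DistanceType : int { L2SqrtExpanded = 1, CosineExpanded = 2, L1 = 3 };
}
namespace new_api {
enum class DistanceType : int { L2SqrtExpanded = 1, CosineExpanded = 2, L1 = 3 };
}

// Guard the assumption the cast relies on: matching underlying values.
static_assert(static_cast<int>(old_api::DistanceType::L2SqrtExpanded) ==
                static_cast<int>(new_api::DistanceType::L2SqrtExpanded),
              "enumerator values must match");
static_assert(static_cast<int>(old_api::DistanceType::CosineExpanded) ==
                static_cast<int>(new_api::DistanceType::CosineExpanded),
              "enumerator values must match");
static_assert(static_cast<int>(old_api::DistanceType::L1) ==
                static_cast<int>(new_api::DistanceType::L1),
              "enumerator values must match");

// Single entry point with a runtime metric. x is m x k, y is n x k, both
// row-major; the result is an m x n row-major matrix of distances.
std::vector<float> pairwise_distance(const std::vector<float>& x,
                                     const std::vector<float>& y,
                                     std::size_t m,
                                     std::size_t n,
                                     std::size_t k,
                                     new_api::DistanceType metric)
{
  std::vector<float> out(m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float l1 = 0.f, sq = 0.f, dot = 0.f, nx = 0.f, ny = 0.f;
      for (std::size_t d = 0; d < k; ++d) {
        const float a = x[i * k + d];
        const float b = y[j * k + d];
        l1 += std::fabs(a - b);
        sq += (a - b) * (a - b);
        dot += a * b;
        nx += a * a;
        ny += b * b;
      }
      switch (metric) {
        case new_api::DistanceType::L1: out[i * n + j] = l1; break;
        case new_api::DistanceType::L2SqrtExpanded: out[i * n + j] = std::sqrt(sq); break;
        case new_api::DistanceType::CosineExpanded:
          out[i * n + j] = 1.f - dot / (std::sqrt(nx) * std::sqrt(ny));
          break;
        default: throw std::invalid_argument("unsupported metric");
      }
    }
  }
  return out;
}

// The caller still holds the old enum and converts it at the call boundary,
// mirroring the static_cast in the hunk above.
std::vector<float> pairwise_distance_compat(const std::vector<float>& x,
                                            const std::vector<float>& y,
                                            std::size_t m,
                                            std::size_t n,
                                            std::size_t k,
                                            old_api::DistanceType metric)
{
  return pairwise_distance(x, y, m, n, k, static_cast<new_api::DistanceType>(metric));
}
```

The `static_assert` lines are the design point of the sketch: if either enum is ever renumbered, the build fails instead of silently computing a different metric. Whether cuML guards the real cast this way is not shown in this diff.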