Reduce CAGRA test runtime by bdice · Pull Request #602 · rapidsai/cuvs

bdice · 2025-01-22T21:55:02Z

Currently, running NEIGHBORS_ANN_CAGRA_TEST takes:
0.96 hours on CUDA 11.8, V100 (x86)
1.59 hours on CUDA 12.5, V100 (x86)
0.28 hours on CUDA 12.0, A100 (ARM)

Individual tests should be able to complete in less than an hour. Ideally, less than 10 minutes.

This PR proposes some changes to CAGRA tests:

Each CAGRA type is now its own test executable (e.g. NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST)
Some parameter combinations were trimmed by ~50%

bdice · 2025-01-23T00:06:55Z

@tfeher @achirkin, I'd appreciate your thoughts on the parameter reductions here. We don't want to lose anything that is important for coverage, and @cjnolet suggested you would be good reviewers to ensure we retain good coverage.

If we can be even more aggressive in our parameter trimming, that is also useful input!

achirkin

Thanks, @bdice, for the PR! Long CAGRA test times have been a problem for a while, and I totally agree we have to do something about it.

I have nothing against splitting the CAGRA test into multiple executables.

However, I'm a bit hesitant to reduce the test cases via generate_inputs(). Most of these were set during development for various (legitimate, but sometimes forgotten) reasons.

I think that, instead of reducing the base set of tests, we can go a couple of other ways and have a greater impact on test times without significantly reducing coverage:

We can drastically reduce the number of cases in AnnCagraAddNodesTest / AnnCagraFilterTest while keeping the base AnnCagraTest as is. For example, we can remove all variants that change the build algorithm and refine ratio for these two suites, as these parameters have no effect on them. We can copy-paste generate_inputs() into generate_inputs_limited() and manually reduce the set by more than 50%. Or we can simply filter out every second variant from the vector (knowing we have already tested the same combination in the base test).
A more involved refactoring (perhaps for a follow-up): I believe we can share some state between the tests rather than generating everything from scratch. E.g., generate a sufficiently large blob of data only once and slice/reuse it for all tests in a suite, or keep an index between cases that vary only the search parameters.
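A minimal sketch of the shared-state idea, assuming a host-side, row-major float dataset: the blob is generated once per test binary (via a function-local static) and each case takes a view of a prefix of it. `SharedDataset` and `DatasetView` are hypothetical names, not existing cuVS or RAFT types, and a real CUDA test would keep the data in device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical non-owning view of the first n_rows x dim values of the shared blob.
struct DatasetView {
  const float* data;   // start of the slice (always the start of the shared buffer here)
  std::size_t n_rows;  // number of vectors in the slice
  std::size_t dim;     // dimensionality, also used as the row stride
};

// Hypothetical process-wide dataset: generated once, then sliced by every test case.
class SharedDataset {
 public:
  static inline int generation_count = 0;  // how many times the blob has been generated

  // Builds the blob on first use; later calls return the same object.
  static const SharedDataset& instance()
  {
    static const SharedDataset ds(/*max_rows=*/1000, /*max_dim=*/1024, /*seed=*/42);
    return ds;
  }

  // Reinterprets the first n_rows * dim values as an n_rows x dim row-major matrix.
  DatasetView slice(std::size_t n_rows, std::size_t dim) const
  {
    assert(dim > 0);
    if (n_rows * dim > values_.size()) { throw std::out_of_range("slice exceeds shared dataset"); }
    return {values_.data(), n_rows, dim};
  }

 private:
  SharedDataset(std::size_t max_rows, std::size_t max_dim, std::uint32_t seed)
    : values_(max_rows * max_dim)
  {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    for (auto& v : values_) { v = dist(gen); }
    ++generation_count;
  }

  std::vector<float> values_;
};
```

Because the values are random, reinterpreting the same prefix with a different `dim` should be harmless for tests. Caching a built index between cases that vary only search parameters could follow the same pattern, keyed on the build parameters.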

achirkin · 2025-01-23T07:14:31Z

cpp/test/neighbors/ann_cagra.cuh

+  inputs2 =
+    raft::util::itertools::product<AnnCagraInputs>({100},
+                                                   {1000},
+                                                   {1, 5, 8, 64, 137, 256, 619, 1024},  // dim


Here we test a few edge cases where the dimensionality is 2ⁿ or 2ⁿ ± 1. These are important for checking whether we have any problems with padding the data during build and search.
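To illustrate why such dimensions are interesting, here is a sketch that rounds each row up to a fixed byte alignment. The 16-byte alignment and the `padded_dim` / `needs_padding` helpers are assumptions for illustration, not the layout cuVS actually uses: under this rule, power-of-two dims such as 8, 64 or 1024 need no padding, while odd dims such as 5, 137 or 619 do.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical rule: each row of `dim` elements of type T is padded up to `align_bytes`.
// The 16-byte default is an assumption chosen for illustration only;
// align_bytes is assumed to be a multiple of sizeof(T).
template <typename T>
constexpr std::size_t padded_dim(std::size_t dim, std::size_t align_bytes = 16)
{
  const std::size_t row_bytes    = dim * sizeof(T);
  const std::size_t padded_bytes = (row_bytes + align_bytes - 1) / align_bytes * align_bytes;
  return padded_bytes / sizeof(T);
}

// True when a row of `dim` elements would carry padding under the rule above.
template <typename T>
constexpr bool needs_padding(std::size_t dim, std::size_t align_bytes = 16)
{
  return padded_dim<T>(dim, align_bytes) != dim;
}

// float rows: power-of-two dims are already aligned, odd dims get padded.
static_assert(!needs_padding<float>(8) && !needs_padding<float>(1024));
static_assert(padded_dim<float>(5) == 8 && padded_dim<float>(137) == 140);
// 2-byte elements (std::uint16_t standing in for half) pad to multiples of 8 elements.
static_assert(padded_dim<std::uint16_t>(137) == 144);
```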

I figured that was the case. There were a few places that varied dim over a wide range of values. I kept different dim values in each test so that the full range would be covered. Perhaps we can cover the full range in only one set of inputs instead, and use a small range for the other cases?
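The cost of varying dim everywhere comes from the Cartesian product: every value in one list is combined with every value in the others, so the case count is the product of the list sizes. The stdlib-only sketch below shows that behavior; `Inputs` is a hypothetical, trimmed-down stand-in for `AnnCagraInputs`, and `cartesian_product` only mimics what `raft::util::itertools::product` is used for here, it is not RAFT's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down stand-in for AnnCagraInputs (the real struct has more fields).
struct Inputs {
  int n_queries;
  int n_rows;
  int dim;
  int k;
};

// Every combination of the four lists, with the last list varying fastest (nested loops).
std::vector<Inputs> cartesian_product(const std::vector<int>& n_queries,
                                      const std::vector<int>& n_rows,
                                      const std::vector<int>& dims,
                                      const std::vector<int>& ks)
{
  const std::size_t expected = n_queries.size() * n_rows.size() * dims.size() * ks.size();
  std::vector<Inputs> out;
  out.reserve(expected);
  for (int q : n_queries)
    for (int n : n_rows)
      for (int d : dims)
        for (int k : ks)
          out.push_back({q, n, d, k});
  assert(out.size() == expected);  // one entry per combination
  return out;
}
```

For example, with one batch size and one dataset size, the full dim list {1, 5, 8, 64, 137, 256, 619, 1024} combined with k in {1, 16} gives 8 × 2 = 16 cases, while the short list {1, 8, 17} gives 3 × 2 = 6. Every further parameter list multiplies both numbers again, which is why keeping the wide dim range in only one input set pays off.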


achirkin · 2025-01-23T07:18:24Z

cpp/test/neighbors/ann_cagra.cuh

 inline std::vector<AnnCagraInputs> generate_inputs()
 {
  // TODO(tfeher): test MULTI_CTA kernel with search_width > 1 to allow multiple CTA per queries
+  // Varying dim, k, graph_build_algo, search_algo, max_queries


I love the extra annotations on what we are varying. Thanks!

bdice · 2025-01-23T13:12:41Z

@achirkin Your comments make sense to me. Would you be willing to push those changes to this PR?

tfeher · 2025-01-23T13:18:33Z

@bdice Thank you for your proposal. I am preparing the changes, and I will push them soon.


tfeher

Running the AddNode and Filtering tests on a subset of inputs makes a large difference.
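For reference, the simplest option achirkin mentioned, filtering out every second entry of the full input vector, can be sketched as a generic helper. `every_nth` is a hypothetical name, and this is not necessarily how the subset in this PR was built. One caveat: if the inputs are enumerated with the last parameter varying fastest (as in nested loops), a stride of 2 over a product whose last list has two values keeps only the first of those values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keep elements 0, stride, 2*stride, ... of `all`.
template <typename T>
std::vector<T> every_nth(const std::vector<T>& all, std::size_t stride = 2)
{
  assert(stride > 0);  // a zero stride would never advance
  std::vector<T> out;
  out.reserve((all.size() + stride - 1) / stride);
  for (std::size_t i = 0; i < all.size(); i += stride) {
    out.push_back(all[i]);
  }
  return out;
}
```

A hypothetical `generate_inputs_limited()` could then simply return `every_nth(generate_inputs())`, whereas a hand-written, reduced product avoids the caveat above by choosing which values to keep explicitly.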

I have also refactored the basic parameter combinations, which gave a further, smaller improvement. @achirkin, please have a look; if you prefer to keep some of the combinations, let me know.

Testing on H100, this runs 3.7x faster than before.

tfeher · 2025-01-23T17:43:34Z

cpp/test/neighbors/ann_cagra.cuh

    {100},
    {1000},
    {1, 8, 17},
-    {1, 16},  // k


Dim and build algo combinations are tested below; therefore, here we focus on dim, search algo, and the max_query parameter value.

tfeher · 2025-01-23T17:44:56Z

cpp/test/neighbors/ann_cagra.cuh


+  // Fixed dim, and changing neighbors and query size (output matrix size)
  auto inputs2 = raft::util::itertools::product<AnnCagraInputs>(
+    {1, 100},


Above we tested different max_query values; here we test batch sizes of 1 and 100.

tfeher · 2025-01-23T17:47:40Z

cpp/test/neighbors/ann_cagra.cuh

-    {1, 3, 5, 7, 8, 17, 64, 128, 137, 192, 256, 512, 619, 1024},  // dim
-    {10},
+    {10000},
+    {192, 1024},  // dim


When testing AddNode, I assumed it is fine to limit the test to a smaller set of dimensions.

bdice

@tfeher Thanks so much! Your changes look fine to me! ~~It looks like ann_cagra/test_half_uint32_t.cu might have been missed in the refactor.~~

bdice · 2025-01-23T18:08:22Z

cpp/test/neighbors/ann_cagra/test_float_uint32_t.cu

Please apply these changes to ann_cagra/test_half_uint32_t.cu as well.

Oh, never mind! I see there are no AddNode / Filtering tests in that file. My apologies.

achirkin

Thanks, @tfeher, for the updates! Nice to see such a big speedup. Out of curiosity, did you have a chance to check how the total number of test cases changed?

I'd personally prefer to keep the original generate_inputs() unchanged and see how other improvements reduce the test time - just to be on the safe side. But the changes look reasonable, so I don't see a point in holding up this PR.


bdice · 2025-01-24T23:52:25Z

@tfeher @achirkin I merged in the upstream to resolve merge conflicts from #595. I tried to reflect that PR's intent (adding a bunch of InnerProduct tests) but I would like your eyes one more time to make sure that I did what you expected w.r.t. this PR and #595. I'll also request a build codeowner review.

achirkin

LGTM


achirkin · 2025-01-27T11:58:40Z

/merge

bdice · 2025-01-27T18:51:50Z

Thanks so much @achirkin @tfeher!

Currently, running `NEIGHBORS_ANN_CAGRA_TEST` takes:
- [0.96 hours on CUDA 11.8, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418022?pr=596#step:8:1718)
- [1.59 hours on CUDA 12.5, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418329?pr=596#step:8:492)
- [0.28 hours on CUDA 12.0, A100 (ARM)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418741?pr=596#step:8:1729)

Individual tests should be able to complete in less than an hour. Ideally, less than 10 minutes.

This PR proposes some changes to CAGRA tests:
- Each CAGRA type is now its own test executable (e.g. `NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST`)
- Some parameter combinations were trimmed by ~50%

Authors:
- Bradley Dice (https://github.com/bdice)
- Tamas Bela Feher (https://github.com/tfeher)
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)
- Divye Gala (https://github.com/divyegala)

URL: rapidsai#602

Proposal to reduce CAGRA test runtime.

a6ee19e

bdice requested review from a team as code owners January 22, 2025 21:55

github-actions bot added cpp CMake labels Jan 22, 2025

bdice added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jan 22, 2025

bdice mentioned this pull request Jan 22, 2025

[FEA] Reduce C++ test runtime #603

Open

bdice requested review from achirkin and tfeher January 23, 2025 00:05

bdice mentioned this pull request Jan 23, 2025

introduce libcuvs wheels #594

Merged

achirkin requested changes Jan 23, 2025


Update cpp/test/neighbors/ann_cagra.cuh

916c469

Test CAGRA addnode and filtering on a limited set of inputs, reduce t…

cb79e83

…est combinations

tfeher reviewed Jan 23, 2025


bdice commented Jan 23, 2025


achirkin approved these changes Jan 23, 2025


bdice added 2 commits January 24, 2025 17:49

Merge remote-tracking branch 'upstream/branch-25.02' into cagra-test-…

344eb43

…reduction-proposal

Remove extra test.

e3423e6

bdice changed the title ~~Proposal to reduce CAGRA test runtime.~~ Reduce CAGRA test runtime. Jan 24, 2025

bdice changed the title ~~Reduce CAGRA test runtime.~~ Reduce CAGRA test runtime Jan 24, 2025

divyegala approved these changes Jan 25, 2025


cjnolet assigned bdice Jan 25, 2025

achirkin approved these changes Jan 27, 2025


cpp/test/neighbors/ann_cagra.cuh

achirkin added 2 commits January 27, 2025 08:29

Remove InnerProduct tests for compressed dataset cpp/test/neighbors/a…

4809576

…nn_cagra.cuh (not supported)

Merge branch 'branch-25.02' into cagra-test-reduction-proposal

b5220df

rapids-bot bot merged commit 5527cdf into rapidsai:branch-25.02 Jan 27, 2025
61 checks passed


4 participants
