Reduce CAGRA test runtime #602
@tfeher @achirkin, I'd appreciate your thoughts on the parameter reductions here. We don't want to lose anything that is important for coverage, and @cjnolet suggested you would be good reviewers to ensure we retain good coverage. If we can be even more aggressive in our parameter trimming, that is also useful input!
achirkin left a comment
Thanks, @bdice, for the PR! The long CAGRA test times have long been a problem and I totally agree we have to do something about it.
I have nothing against splitting the CAGRA test into multiple executables.
However, I'm a bit hesitant to reduce the test cases via generate_inputs(). Most of these were set during development for various (legitimate, but sometimes forgotten) reasons.
I think, instead of reducing the base set of tests, we can go a couple of other ways and have a greater impact on test times without significantly reducing the coverage:
- We can drastically reduce the number of cases in AnnCagraAddNodesTest/AnnCagraFilterTest, while keeping the base AnnCagraTest as is. For example, we can remove all variants changing the build algorithm and refine ratios for the two suites, as these parameters have no effect on them. We can copy-paste generate_inputs() to generate_inputs_limited() and manually reduce the set by more than 50% of cases. Or we can simply filter out every second variant from the vector (knowing we've already tested the same combination in the base test).
- A more involved refactoring (perhaps for a follow-up): I believe we can share some of the state between the tests rather than generating everything from scratch. E.g. generate a blob of data big enough only once and slice/reuse it for all tests in a suite. Or keep an index between the cases varying only search parameters.
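The "filter out every second variant" idea from the first bullet could be sketched roughly as follows. This is only an illustration: `take_every_nth` is a hypothetical helper, not something that exists in cuVS or RAFT, and it is written generically so it does not depend on the real `AnnCagraInputs` definition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuVS): keep every `step`-th element of
// `inputs`, starting at index 0. With the default step of 2 this drops every
// second variant, roughly halving the number of test cases.
template <typename T>
std::vector<T> take_every_nth(const std::vector<T>& inputs, std::size_t step = 2)
{
  assert(step > 0);
  std::vector<T> reduced;
  reduced.reserve((inputs.size() + step - 1) / step);
  for (std::size_t i = 0; i < inputs.size(); i += step) {
    reduced.push_back(inputs[i]);
  }
  return reduced;
}
```

Under these assumptions, the AddNodes and Filter suites could be instantiated with something like `take_every_nth(generate_inputs())`, while the base suite keeps the full vector.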
cpp/test/neighbors/ann_cagra.cuh
Outdated
    inputs2 =
      raft::util::itertools::product<AnnCagraInputs>({100},
                                                     {1000},
                                                     {1, 5, 8, 64, 137, 256, 619, 1024},  // dim
Here we test a few edge cases where the dimensionality is 2ⁿ or 2ⁿ ± 1. These are important to check whether we have any problems with padding the data during build and search.
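As a rough illustration of that kind of edge-case set, the sketch below generates dimensions at and next to powers of two. The helper name `dims_around_powers_of_two` is hypothetical and is not taken from the test file; the actual list in the tests is hand-picked and also contains other values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each exponent n in [min_exp, max_exp], emit
// 2^n - 1, 2^n and 2^n + 1 (skipping 0), then sort and remove duplicates.
std::vector<std::size_t> dims_around_powers_of_two(unsigned min_exp, unsigned max_exp)
{
  assert(min_exp <= max_exp && max_exp < 31);
  std::vector<std::size_t> dims;
  for (unsigned n = min_exp; n <= max_exp; ++n) {
    const std::size_t p = std::size_t{1} << n;
    if (p > 1) { dims.push_back(p - 1); }
    dims.push_back(p);
    dims.push_back(p + 1);
  }
  std::sort(dims.begin(), dims.end());
  dims.erase(std::unique(dims.begin(), dims.end()), dims.end());
  return dims;
}
```

Values such as 7, 8 and 9 sit just below, exactly on, and just above an alignment boundary, which is where padding bugs in build and search would most likely show up.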
I figured that was the case. There were a few places that varied dim over a wide range of values. I kept different dim values in each test so that the full range would be covered. Perhaps we can cover the full range in only one set of inputs instead, and use a small range for the other cases?
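To make the trade-off concrete: the number of cases produced by a Cartesian product is the product of the per-parameter list sizes, so keeping the wide dim list in one input set only and a short list elsewhere reduces the total multiplicatively. A minimal sketch, with a hypothetical `product_size` helper and purely illustrative list sizes (not the real counts from the test file):

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical helper: number of cases generated by a Cartesian product,
// given the number of values in each parameter list.
inline std::size_t product_size(std::initializer_list<std::size_t> list_sizes)
{
  std::size_t total = 1;
  for (std::size_t n : list_sizes) { total *= n; }
  return total;
}
```

For example, a set with 8 dim values, 2 values of k and 3 search algorithms yields `product_size({8, 2, 3})` = 48 cases, while the same set with only 2 dim values yields 12.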
cpp/test/neighbors/ann_cagra.cuh
Outdated
    inline std::vector<AnnCagraInputs> generate_inputs()
    {
      // TODO(tfeher): test MULTI_CTA kernel with search_width > 1 to allow multiple CTA per queries
      // Varying dim, k, graph_build_algo, search_algo, max_queries
I love the extra annotations on what we are varying. Thanks!
@achirkin Your comments make sense to me. Would you be willing to push those changes to this PR?
@bdice Thank you for your proposal. I am preparing the changes, and I will push them soon.
Running the AddNode and Filtering tests on a subset of the inputs makes a large difference.
I have also refactored the basic parameter combinations, which gave a smaller additional improvement. @achirkin please have a look; if you prefer to keep some of the combinations, let me know.
Testing on H100, this runs 3.7x faster than previously.
      {100},
      {1000},
      {1, 8, 17},
      {1, 16},  // k
Dim and build algo combinations are tested below; therefore, here we focus on the dim, search algo, and max_query parameter values.
    // Fixed dim, and changing neighbors and query size (output matrix size)
    auto inputs2 = raft::util::itertools::product<AnnCagraInputs>(
      {1, 100},
Above we tested with different max_query parameter values; here we test with batch sizes 1 and 100.
      {1, 3, 5, 7, 8, 17, 64, 128, 137, 192, 256, 512, 619, 1024},  // dim
      {10},
      {10000},
      {192, 1024},  // dim
When testing AddNode, I assumed it is fine to limit to a smaller set of dimensions.
Please apply these changes to ann_cagra/test_half_uint32_t.cu as well.
Oh, never mind! I see there are no AddNode / Filtering tests in that file. My apologies.
achirkin left a comment
Thanks, @tfeher, for the updates! Nice to see such a big speedup. Out of curiosity, did you have a chance to check the difference in the total number of test cases?
I'd personally prefer to keep the original generate_inputs() unchanged and see how other improvements reduce the test time - just to be on the safe side. But the changes look reasonable, so I don't see a point in holding up this PR.
@tfeher @achirkin I merged in the upstream to resolve merge conflicts from #595. I tried to reflect that PR's intent (adding a bunch of |
/merge
Currently, running `NEIGHBORS_ANN_CAGRA_TEST` takes:
- [0.96 hours on CUDA 11.8, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418022?pr=596#step:8:1718)
- [1.59 hours on CUDA 12.5, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418329?pr=596#step:8:492)
- [0.28 hours on CUDA 12.0, A100 (ARM)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418741?pr=596#step:8:1729)

Individual tests should be able to complete in less than an hour. Ideally, less than 10 minutes.

This PR proposes some changes to CAGRA tests:
- Each CAGRA type is now its own test executable (e.g. `NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST`)
- Some parameter combinations were trimmed by ~50%

Authors:
- Bradley Dice (https://github.com/bdice)
- Tamas Bela Feher (https://github.com/tfeher)
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)
- Divye Gala (https://github.com/divyegala)

URL: rapidsai#602