Add/update multi node/multi GPU test scripts#2410
Conversation
📝 Walkthrough

Centralizes test orchestration: removes a monolithic test runner, adds a reusable test utility library, and introduces task-specific runners for unit, multi-GPU, and multi-node kernel tests. The new scripts discover, sample, dry-run, and execute tests while installing and verifying precompiled kernels.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner as Task Script
    participant Utils as test_utils.sh
    participant FS as Artifact / Filesystem
    participant Pytest as pytest

    Runner->>Utils: source, parse_args(), print banner
    Runner->>Utils: install_and_verify() / install_precompiled_kernels()
    Utils->>FS: locate precompiled artifacts (cubin / jit-cache)
    FS-->>Utils: artifact present / missing
    Utils->>Runner: install results (ok / error)
    Runner->>Utils: collect_tests() / find_test_files()
    Utils->>Pytest: run tests (sampled or full), produce JUnit XML
    Pytest-->>Utils: test outcomes
    Utils->>Runner: print_execution_summary(), exit code
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Summary of Changes

Hello @dierksen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly improves the organization and capabilities of the project's testing infrastructure. It centralizes common test execution logic into a reusable script, renames the main unit test runner for clarity, and introduces dedicated scripts for multi-GPU and multi-node communication kernel tests. These changes aim to make test management more efficient and extensible, while also providing flexible execution options like dry runs and sanity checks.
Code Review
This pull request refactors the testing scripts by introducing a common functions file, renaming the main unit test script, and adding new scripts for multi-GPU/multi-node testing. The changes significantly improve the structure and maintainability of the test suite. My review includes suggestions to improve the robustness of shell scripting practices, such as handling filenames with spaces and simplifying conditional checks. I also found a file that seems to be a leftover from a rename operation and should be removed.
I was unable to create individual review comments, so my feedback is listed below.
scripts/task_test_blackwell_kernels.sh (1)
This file appears to be an artifact of a file rename operation, as mentioned in the PR description. It contains only the name of another script (`task_run_unit_tests.sh`). This is likely a mistake, and the file should be removed to avoid confusion and potential issues in the future.
scripts/common_test_functions.sh (187)
The condition that checks whether `SAMPLED_NODE_IDS` is empty is overly complex, and the `wc -l` part is buggy: when `$SAMPLED_NODE_IDS` is empty, `echo "$SAMPLED_NODE_IDS"` still prints a newline, so `wc -l` returns 1, not 0. The `[ -z "$SAMPLED_NODE_IDS" ]` part already handles the empty case correctly, making the second half of the `||` redundant and incorrect. A simpler check is sufficient and more robust:

```bash
if [ -z "$SAMPLED_NODE_IDS" ]; then
```
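The pitfall described above is easy to reproduce directly in a shell; the variable name below is taken from the script under review, and the rest is a throwaway demo:

```bash
# An empty variable piped through echo still emits one newline,
# so `wc -l` reports 1 line rather than 0.
SAMPLED_NODE_IDS=""
count=$(echo "$SAMPLED_NODE_IDS" | wc -l | tr -d ' ')
echo "wc -l sees: $count"          # prints 1, even though the variable is empty

# The plain -z test is both simpler and correct.
if [ -z "$SAMPLED_NODE_IDS" ]; then
    echo "empty detected"
fi
```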
scripts/common_test_functions.sh (241)
Using `[ ... ]` with `==` for string comparison is a bashism and can behave unexpectedly with word splitting. It is safer, and generally recommended, to use the `[[ ... ]]` keyword for tests in bash scripts, as it provides more predictable behavior and additional features:

```bash
if [[ "$SANITY_TEST" == "true" ]]; then
```
scripts/task_run_unit_tests.sh (31-35)
The pattern `ALL_TEST_FILES=$(find ...)` with `for test_file in $ALL_TEST_FILES` is not robust for filenames containing spaces. Reading `find`'s output into an array with `mapfile` and iterating over that array is a safer approach in bash:

```bash
mapfile -t ALL_TEST_FILES < <(find tests/ -name "test_*.py" -type f | sort)

# Filter out excluded files based on directory exclusions
TEST_FILES=""
for test_file in "${ALL_TEST_FILES[@]}"; do
```
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@scripts/common_test_functions.sh`:
- Around lines 88-96: The script uses CUDA_VERSION to derive CUDA_STREAM but does not validate it. Update the block that assigns CUDA_STREAM (referencing CUDA_VERSION and JIT_ARCH_EFFECTIVE) to first check whether CUDA_VERSION is set (e.g. `test -z "${CUDA_VERSION}"`) and either assign a clear default or exit with an error message. Then perform the existing pattern match for values starting with `cu*` and explicit matches such as `12.9.0` to set CUDA_STREAM, so the empty/unset case is handled explicitly instead of falling back implicitly to "cu130".
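One possible shape for that explicit handling, as a sketch only: the `12.9.0` → `cu129` and `13.0.0` → `cu130` mapping is an assumption based on the versions mentioned in this thread, and the demo default on the first line exists purely so the snippet runs standalone:

```bash
# Hypothetical sketch: validate CUDA_VERSION before deriving CUDA_STREAM.
CUDA_VERSION="${CUDA_VERSION:-12.9.0}"   # demo value for illustration only

if [ -z "${CUDA_VERSION}" ]; then
    echo "ERROR: CUDA_VERSION is not set; cannot derive CUDA_STREAM" >&2
    exit 1
fi

case "$CUDA_VERSION" in
    cu*)    CUDA_STREAM="$CUDA_VERSION" ;;   # already a stream name
    12.9.0) CUDA_STREAM="cu129" ;;
    13.0.0) CUDA_STREAM="cu130" ;;
    *)      echo "ERROR: unrecognized CUDA_VERSION '$CUDA_VERSION'" >&2
            exit 1 ;;
esac
echo "CUDA_STREAM=$CUDA_STREAM"
```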
- Around lines 130-142: The script prints error messages when wheels are missing but continues execution. Update the two failure branches that check DIST_CUBIN_DIR and DIST_JIT_CACHE_DIR (the pip install checks) to stop the script with a non-zero exit (e.g. `exit 1`) after printing the error to stderr, or to return a non-zero status if the checks are inside a function, so downstream tests do not run without the expected artifacts.
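A minimal sketch of that fail-fast pattern: `require_dir` is a hypothetical helper, and the directory names mirror the variables named in the prompt. It returns non-zero so callers in a sourced library can decide between `|| exit 1` and propagating the status:

```bash
# Hypothetical helper: abort immediately when an expected artifact
# directory is missing, instead of printing an error and continuing.
require_dir() {
    local dir="$1" label="$2"
    if [ ! -d "$dir" ]; then
        echo "ERROR: $label not found at '$dir'" >&2
        return 1
    fi
}

# Demo: an existing directory passes; a missing one fails.
DIST_CUBIN_DIR=$(mktemp -d)
require_dir "$DIST_CUBIN_DIR" "cubin wheel directory" || exit 1
require_dir "/nonexistent/jit-cache" "jit-cache wheel directory" 2>/dev/null \
    || echo "would exit 1 here before running any tests"
rm -rf "$DIST_CUBIN_DIR"
```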
In `@scripts/task_run_unit_tests.sh`:
- Around lines 14-27: In find_test_files(), the parsing of NORECURSEDIRS incorrectly converts commas to spaces and grabs only a single line with grep. Update the extraction to preserve pytest's space-separated values (drop the `tr ',' ' '` conversion) and to capture multiline continuations of the norecursedirs setting (read the line starting with `norecursedirs` plus any subsequent indented continuation lines, collapsing runs of whitespace into single spaces) before assigning EXCLUDED_DIRS and logging, so EXCLUDED_DIRS reflects the actual space-separated directories.
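One possible shape for the multiline extraction, sketched against a hypothetical `pytest.ini`; the awk program and the file contents here are illustrative, not the project's actual configuration:

```bash
# Create a sample ini file with a continuation line for the demo.
tmpdir=$(mktemp -d)
cat > "$tmpdir/pytest.ini" <<'EOF'
[pytest]
norecursedirs = build dist
    .git
addopts = -q
EOF

# Read the norecursedirs value plus indented continuation lines,
# collapsing runs of whitespace into single spaces.
EXCLUDED_DIRS=$(awk '
    /^norecursedirs[ \t]*=/ { sub(/^norecursedirs[ \t]*=[ \t]*/, ""); val = $0; found = 1; next }
    found && /^[ \t]/       { sub(/^[ \t]+/, " "); val = val $0; next }
    found                   { exit }
    END { if (found) { gsub(/[ \t]+/, " ", val); print val } }
' "$tmpdir/pytest.ini")

echo "Excluded dirs: $EXCLUDED_DIRS"   # Excluded dirs: build dist .git
rm -rf "$tmpdir"
```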
🧹 Nitpick comments (5)
scripts/common_test_functions.sh (3)
181-189: Awk variables should be passed explicitly to avoid shell interpolation issues.

The awk command uses shell variables directly in the pattern. While this works in most cases, it is more robust to pass them as awk variables, avoiding potential issues with special characters or unexpected expansion.

Suggested improvement:

```diff
- SAMPLED_NODE_IDS=$(echo "$all_node_ids" | awk "NR % $SAMPLE_RATE == $SAMPLE_OFFSET")
+ SAMPLED_NODE_IDS=$(echo "$all_node_ids" | awk -v rate="$SAMPLE_RATE" -v offset="$SAMPLE_OFFSET" 'NR % rate == offset')
```
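The suggested form is easy to exercise with a stand-in list of node IDs (the data below is made up for the demo):

```bash
# Pass rate/offset as awk variables rather than interpolating
# shell variables into the awk program text.
all_node_ids=$(printf 'test_a\ntest_b\ntest_c\ntest_d\ntest_e')
SAMPLE_RATE=2
SAMPLE_OFFSET=1
SAMPLED_NODE_IDS=$(echo "$all_node_ids" | \
    awk -v rate="$SAMPLE_RATE" -v offset="$SAMPLE_OFFSET" 'NR % rate == offset')
echo "$SAMPLED_NODE_IDS"   # test_a, test_c, test_e (lines 1, 3, 5)
```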
241: Use POSIX-compliant `=` instead of `==` for string comparison in `[ ]`.

The script uses `[ "$SANITY_TEST" == "true" ]`, which is a bashism. While this works in bash, the POSIX-compliant form uses a single `=`. This appears in multiple places (lines 241, 268, 370, 413, 434). Since the shebang is `#!/bin/bash` this will work, but for consistency with best practices:

Suggested fix (multiple locations):

```diff
- if [ "$SANITY_TEST" == "true" ]; then
+ if [ "$SANITY_TEST" = "true" ]; then
```
405-426: Word splitting on `$test_files` may break with paths containing spaces.

The `for test_file in $test_files` pattern relies on word splitting, which will break if any test file path contains spaces. While this is unlikely for test files, using an array would be more robust. For now this is acceptable given the controlled test file paths, but it is worth noting for future maintainability.
scripts/task_test_multi_gpu_comm_kernels.sh (1)

19-22: Consider adding a TODO comment or tracking issue for disabled tests.

The commented-out test files indicate tests that need fixing. Consider adding a more explicit TODO with a reference to an issue tracker, or documenting what specific fixes are needed.

Suggested improvement:

```diff
 # Define the specific test files for multi-GPU comm tests (single-node)
-# TEST_FILES="tests/comm/test_allreduce_unified_api.py tests/comm/test_allreduce_negative.py tests/comm/test_trtllm_allreduce_fusion.py"
-# Add others back once they are fixed
+# TODO(dierksen): Re-enable once fixed (see issue `#XXXX`):
+# - tests/comm/test_allreduce_negative.py
+# - tests/comm/test_trtllm_allreduce_fusion.py
 TEST_FILES="tests/comm/test_allreduce_unified_api.py"
```

scripts/task_test_multi_node_comm_kernels.sh (1)
10-15: Remove dead commented-out code.

These commented-out lines for cache cleaning are now redundant, since `clean_python_cache()` from the common library is called in `main()`. Leaving this code adds noise and could cause confusion about whether cache cleaning is happening.

Suggested fix:

```diff
-# Clean Python bytecode cache to avoid stale imports (e.g., after module refactoring)
-# echo "Cleaning Python bytecode cache..."
-# find . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true
-# find . -type f -name '*.pyc' -delete 2>/dev/null || true
-# echo "Cache cleaned."
-# echo ""
-
```
Will review it today

/bot run

[FAILED] Pipeline #42837414: 11/20 passed
- Add common test functions
- Add multi-gpu and multi-node comm test scripts
- Update blackwell kernel test script to use common functions
- Rename unit test script to accurately reflect its purpose
- Remove pytest -s flag to reduce CI log verbosity and costs
- Rename common_test_functions.sh to test_utils.sh for brevity
- Remove Python bytecode cache cleanup (no longer needed)
- Delete task_test_blackwell_kernels.sh symlink

The bytecode cleanup was added for a one-time module refactoring (gemm.py -> gemm/ package) in Nov 2025 and is no longer relevant. Modern Python handles cache invalidation automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Force-pushed fd03e26 to fedbea5
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@scripts/test_utils.sh`:
- Around lines 5-14: SAMPLE_RATE can be zero or non-numeric, which will break arithmetic (e.g. `RANDOM % SAMPLE_RATE`, `100 / SAMPLE_RATE`, the awk modulo). Add a validation right after the SAMPLE_RATE default assignment that ensures SAMPLE_RATE is a positive integer (rejecting empty, non-digit, or <= 0 values); if invalid, reset SAMPLE_RATE to a safe default (e.g. 5) and emit a warning. Then use the validated SAMPLE_RATE when computing SAMPLE_OFFSET and in the sampling arithmetic later in the script.
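A sketch of the suggested validation, assuming 5 as the safe default named in the prompt:

```bash
# Ensure SAMPLE_RATE is a positive integer before it is used in
# modulo/division arithmetic; fall back to 5 with a warning otherwise.
SAMPLE_RATE="${SAMPLE_RATE:-5}"
if ! [[ "$SAMPLE_RATE" =~ ^[1-9][0-9]*$ ]]; then
    echo "WARNING: invalid SAMPLE_RATE '$SAMPLE_RATE'; falling back to 5" >&2
    SAMPLE_RATE=5
fi
SAMPLE_OFFSET=$((RANDOM % SAMPLE_RATE))
echo "Sampling 1/$SAMPLE_RATE of tests, offset $SAMPLE_OFFSET"
```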
```bash
    SAMPLE_OFFSET=$((RANDOM % SAMPLE_RATE))
fi

# Pytest configuration flags
```
do we need the "Clean Python bytecode cache" part before this?
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@scripts/task_test_multi_gpu_comm_kernels.sh`:
- Around lines 22-50: TEST_FILES is defined as a quoted space-delimited string, which will be passed as a single argument to execute_tests/execute_dry_run. Change TEST_FILES to a Bash array (e.g. `TEST_FILES=(tests/comm/test_allreduce_unified_api.py ...)`), update the for loop to iterate over `"${TEST_FILES[@]}"`, and call `execute_dry_run "${TEST_FILES[@]}"` and `execute_tests "${TEST_FILES[@]}"`. Ensure parse_args, print_test_mode_banner, and the other surrounding logic remain unchanged, and update any remaining places that reference TEST_FILES as a string to use the array expansion.
```bash
# Define the specific test files for multi-GPU comm tests (single-node)
# TEST_FILES="tests/comm/test_allreduce_unified_api.py tests/comm/test_allreduce_negative.py tests/comm/test_trtllm_allreduce_fusion.py"
# Add others back once they are fixed
TEST_FILES="tests/comm/test_allreduce_unified_api.py"

# Main execution
main() {
    # Parse command line arguments
    parse_args "$@"

    # Print test mode banner
    print_test_mode_banner

    # Install and verify (unless dry run)
    install_and_verify

    # Print test files
    echo "Multi-GPU comm kernel test files (running with: ${PYTEST_COMMAND_PREFIX}):"
    for test_file in $TEST_FILES; do
        echo "  $test_file"
    done
    echo ""

    # Execute tests or dry run
    if [ "$DRY_RUN" == "true" ]; then
        execute_dry_run "$TEST_FILES"
    else
        execute_tests "$TEST_FILES"
    fi
```
Avoid quoting a space-delimited list; use an array for TEST_FILES.

`execute_dry_run "$TEST_FILES"` / `execute_tests "$TEST_FILES"` will pass a single argument once multiple files are re-enabled, which will break execution. Use a Bash array and pass `"${TEST_FILES[@]}"` instead.

Proposed fix:

```diff
-# TEST_FILES="tests/comm/test_allreduce_unified_api.py tests/comm/test_allreduce_negative.py tests/comm/test_trtllm_allreduce_fusion.py"
+# TEST_FILES=(
+#     "tests/comm/test_allreduce_unified_api.py"
+#     "tests/comm/test_allreduce_negative.py"
+#     "tests/comm/test_trtllm_allreduce_fusion.py"
+# )
 # Add others back once they are fixed
-TEST_FILES="tests/comm/test_allreduce_unified_api.py"
+TEST_FILES=("tests/comm/test_allreduce_unified_api.py")
@@
-    for test_file in $TEST_FILES; do
+    for test_file in "${TEST_FILES[@]}"; do
         echo "  $test_file"
     done
@@
     if [ "$DRY_RUN" == "true" ]; then
-        execute_dry_run "$TEST_FILES"
+        execute_dry_run "${TEST_FILES[@]}"
     else
-        execute_tests "$TEST_FILES"
+        execute_tests "${TEST_FILES[@]}"
     fi
```
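The single-argument behavior this comment warns about is easy to demonstrate with a throwaway helper (`count_args` and the file names are hypothetical):

```bash
# A quoted space-delimited string arrives as one argument,
# while an array expansion arrives as one argument per file.
count_args() { echo "$#"; }

TEST_FILES_STR="a.py b.py c.py"
TEST_FILES_ARR=("a.py" "b.py" "c.py")

count_args "$TEST_FILES_STR"          # prints 1 (single argument)
count_args "${TEST_FILES_ARR[@]}"     # prints 3 (one argument per file)
```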
```bash
    echo ""

    if [ -n "${JIT_ARCH}" ]; then
        # 12.0a for CUDA 12.9.0, 12.0f for CUDA 13.0.0
```
[non-blocking] should we have a consolidated source-of-truth file, like our cudnn cmake presets, to specify the (device) x (cuda toolkit) -> arch target list?
That's probably a good idea; keeping track of which SMs support the f modifier and which ones don't can be kind of a headache tbh.
aleozlx left a comment
posted a couple of non-blocking questions
lgtm
can @kahyunnam try approving it per the approver list in the "Merging is blocked" section? thx
## 📌 Description

- Renames the unit testing script to more accurately reflect current usage
- Adds multi-node and multi-GPU testing scripts
- Refactors common code for those 3 scripts into a separate file

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

## Summary by CodeRabbit

* **Chores**
  * Centralized and consolidated test orchestration and utilities for consistent unit, multi-GPU, and multi-node test runs.
  * Improved environment handling, install/verify flows, and deterministic test sampling across runners.
* **New Features**
  * Added dry-run mode, clearer test-mode banners, and detailed execution and dry-run summaries for easier test planning and reporting.

---

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>