unittest: Add SM arch checks to skip unsupported tests on Hopper #1998
bkryu merged 1 commit into flashinfer-ai:main
Conversation
/bot run
Walkthrough
This pull request restricts GPU compute capability filters across multiple test suites, narrowing test execution from a broader set of architectures to SM100/SM103 GPUs only. Changes include replacing skip conditions and adding new capability guards consistently across attention, GEMM, and MOE test modules.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/attention/test_trtllm_gen_attention.py (1)
635-637: Consider moving the SM check immediately after compute_capability retrieval. For consistency with test_trtllm_batch_prefill (Lines 351-353) and to fail fast, consider placing the SM architecture check directly after Line 635 without the intervening blank line. This ensures early exit before any subsequent validation logic.
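The "fail fast" ordering suggested above can be sketched as follows. Note that run_test, its arguments, and the head_dim validation are illustrative stand-ins for the real test body, not FlashInfer code:

```python
# Hypothetical sketch of the reviewer's suggested ordering: the SM guard comes
# immediately after the compute capability is known, before any other
# validation logic runs.

def run_test(compute_capability: tuple[int, int], head_dim: int) -> str:
    # Guard first: early exit on unsupported architectures (SM100/SM103
    # corresponds to compute capability major version 10).
    if compute_capability[0] != 10:
        return "skip: requires SM100/SM103"
    # Subsequent validation logic only runs on supported hardware.
    if head_dim not in (64, 128, 256):
        raise ValueError("unsupported head_dim")
    return "run"

print(run_test((9, 0), 128))   # Hopper: skipped before head_dim validation
print(run_test((10, 0), 128))  # SM100: proceeds to run
```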
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
tests/attention/test_trtllm_gen_attention.py (2 hunks)
tests/attention/test_trtllm_gen_mla.py (1 hunks)
tests/gemm/test_bmm_fp8.py (1 hunks)
tests/gemm/test_groupwise_scaled_gemm_fp8.py (4 hunks)
tests/gemm/test_groupwise_scaled_gemm_mxfp4.py (1 hunks)
tests/gemm/test_mm_fp4.py (1 hunks)
tests/moe/test_trtllm_gen_fused_moe.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/gemm/test_bmm_fp8.py (1)
  flashinfer/utils.py (1)
    get_compute_capability (251-254)
tests/gemm/test_mm_fp4.py (2)
  flashinfer/gemm.py (1)
    mm_fp4 (1858-2009)
  flashinfer/utils.py (1)
    is_backend_supported (953-964)
tests/gemm/test_groupwise_scaled_gemm_fp8.py (1)
  flashinfer/utils.py (1)
    get_compute_capability (251-254)
🔇 Additional comments (12)
tests/attention/test_trtllm_gen_attention.py (2)
352-353: Verify SM architecture restriction is intentional. The test now only runs on SM100/SM103 GPUs (compute_capability[0] == 10), excluding all other architectures. This is a significant restriction compared to the previous logic, which excluded only specific SM versions. Confirm this aligns with the supported hardware for trtllm_batch_context_with_kv_cache.
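The restriction discussed here reduces to a check on the compute capability's major version. A minimal sketch, assuming the pattern described in the review; SUPPORTED_MAJOR and should_skip are illustrative names, not FlashInfer APIs:

```python
# Hypothetical sketch of the "what is supported" guard: skip unless the GPU's
# compute capability major version is in the supported set.

SUPPORTED_MAJOR = {10}  # SM100/SM103 -> compute capability major version 10

def should_skip(compute_capability: tuple[int, int]) -> bool:
    """Return True when the current GPU is not in the supported list."""
    return compute_capability[0] not in SUPPORTED_MAJOR

# In the real test this would feed pytest.skip(...):
#   if should_skip(get_compute_capability(device)):
#       pytest.skip("This test requires SM100/SM103 GPUs.")
print(should_skip((9, 0)))   # Hopper (SM90) is skipped
print(should_skip((10, 0)))  # SM100 runs
```

Gating on "what is supported" means newly released architectures are skipped by default until explicitly added, which is the behavior this PR aims for.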
920-922: LGTM! The SM architecture check is correctly placed for early exit, consistent with the pattern in test_trtllm_batch_prefill.
tests/gemm/test_mm_fp4.py (1)
29-34: LGTM! Good addition of backend support validation using the centralized is_backend_supported() API. This provides early gating based on compute capability and backend combination, improving test clarity and preventing unsupported configurations from proceeding to more specific validation logic.
tests/attention/test_trtllm_gen_mla.py (1)
36-37: LGTM! The SM architecture restriction is correctly implemented and consistent with other trtllm attention tests in this PR. The check appropriately gates execution to SM100/SM103 only.
tests/gemm/test_groupwise_scaled_gemm_mxfp4.py (1)
259-262: LGTM! The SM architecture restriction correctly limits gemm_mxfp4_nt_groupwise to SM100/SM103 GPUs. This aligns with the broader PR pattern of tightening compute capability requirements for specialized kernels.
tests/moe/test_trtllm_gen_fused_moe.py (1)
2038-2039: LGTM! The SM architecture check appropriately restricts the MoE test to SM100/SM103 GPUs, consistent with the compute capability gating pattern used throughout this PR.
tests/gemm/test_bmm_fp8.py (2)
20-21: LGTM! Good refactoring to retrieve compute_capability once and reuse it, improving efficiency and readability.
29-32: LGTM! The cutlass backend SM architecture gating is correctly implemented, restricting execution to SM100/103, SM110, and SM120/121 GPUs. This complements the existing xfail for known SM120/121 issues and provides clear skip messaging for unsupported architectures.
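The multi-architecture gating described here again reduces to a membership test on the major version. A sketch under the assumption that SM100/103, SM110, and SM120/121 map to majors 10, 11, and 12; CUTLASS_FP8_MAJORS and cutlass_fp8_supported are hypothetical names, not FlashInfer APIs:

```python
# Illustrative sketch of the cutlass-backend gating for the bmm_fp8 test:
# SM100/103 (major 10), SM110 (major 11), and SM120/121 (major 12) run;
# everything else, including Hopper SM90, is skipped.

CUTLASS_FP8_MAJORS = {10, 11, 12}

def cutlass_fp8_supported(compute_capability: tuple[int, int]) -> bool:
    """Return True when the cutlass FP8 path supports this architecture."""
    return compute_capability[0] in CUTLASS_FP8_MAJORS

print(cutlass_fp8_supported((9, 0)))   # Hopper SM90 -> False (skipped)
print(cutlass_fp8_supported((12, 1)))  # SM121 -> True (runs, may still xfail)
```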
tests/gemm/test_groupwise_scaled_gemm_fp8.py (4)
46-50: LGTM! The SM architecture check appropriately gates gemm_fp8_nt_blockscaled to SM100/103, SM110, and SM120/121 GPUs with a clear skip message.
91-91: LGTM! Good practice to retrieve compute_capability early for reuse in subsequent conditional logic.
101-104: LGTM! The cutlass backend gating for gemm_fp8_nt_groupwise correctly restricts execution to supported SM architectures (SM100/103, SM110, and SM120/121).
157-167: Verify SM110 exclusion is intentional for group_gemm_fp8_nt_groupwise. Lines 164-167 restrict group_gemm_fp8_nt_groupwise to SM100/103 and SM120/121 only, excluding SM110 unlike the other FP8 tests in this file (Lines 47, 102). If this exclusion is intentional due to specific SM110 limitations for grouped GEMMs, consider adding a brief inline comment explaining the rationale.
📌 Description
A number of unit tests fail on Hopper because they either do not have a support check or fail based on "what is not supported" while missing SM90. This PR adds checks based on "what is supported" and skips tests whose SM is not in the supported list.
A special case is mm_fp4, where mm_fp4.is_backend_supported(backend, compute_capability_number) now exists and is used to skip tests if not supported.

Impacted tests:
* tests/attention/test_trtllm_gen_attention.py
* tests/attention/test_trtllm_gen_mla.py
* tests/gemm/test_bmm_fp8.py
* tests/gemm/test_mm_fp4.py
* tests/gemm/test_groupwise_scaled_gemm_fp8.py
* tests/gemm/test_groupwise_scaled_gemm_mxfp4.py
* tests/moe/test_trtllm_gen_fused_moe.py
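A minimal sketch of how a predicate like is_backend_supported(backend, compute_capability_number) can gate a test. The backend-to-SM support table below is entirely hypothetical, invented for illustration; the real predicate lives in flashinfer/utils.py, and only its signature is taken from this PR:

```python
# Illustrative stand-in for the is_backend_supported(backend, cc) predicate
# this PR introduces for mm_fp4. _SUPPORTED is a hypothetical table, NOT the
# actual FlashInfer support matrix.

_SUPPORTED = {
    "cudnn": {100, 103, 110, 120, 121},
    "trtllm": {100, 103},
}

def is_backend_supported(backend: str, compute_capability_number: int) -> bool:
    """Return True when `backend` supports the given SM number (e.g. 100)."""
    return compute_capability_number in _SUPPORTED.get(backend, set())

# Test-side usage, mirroring the skip pattern described above:
#   if not is_backend_supported(backend, sm):
#       pytest.skip(f"{backend} does not support SM{sm}")
print(is_backend_supported("trtllm", 100))  # supported in this sketch
print(is_backend_supported("trtllm", 90))   # Hopper -> skipped
```

Centralizing the support matrix in one predicate keeps each test's skip logic to a single line and avoids the "what is not supported" drift the PR description mentions.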
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- [x] I have installed pre-commit by running pip install pre-commit (or used your preferred method).
- [x] I have installed the hooks with pre-commit install.
- [x] I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (unittest, etc.).

Reviewer Notes