
unittest: Add SM arch checks to skip unsupported tests on Hopper #1998

Merged
bkryu merged 1 commit into flashinfer-ai:main from bkryu:test_script_sm_check
Oct 28, 2025

Conversation


@bkryu bkryu commented Oct 28, 2025

📌 Description

A number of unit tests fail on Hopper because they either lack a support check entirely or skip based on "what is not supported" while omitting SM90 from that list. This PR adds checks based on "what is supported" and skips any test whose GPU is not in the supported list of SMs.

A special case is mm_fp4, where mm_fp4.is_backend_supported(backend, compute_capability_number) now exists and is used to skip tests when the backend is unsupported for the given compute capability.

Impacted tests:

  • tests/attention/test_trtllm_gen_attention.py
  • tests/attention/test_trtllm_gen_mla.py
  • tests/gemm/test_bmm_fp8.py
  • tests/gemm/test_mm_fp4.py
  • tests/gemm/test_groupwise_scaled_gemm_fp8.py
  • tests/gemm/test_groupwise_scaled_gemm_mxfp4.py
  • tests/moe/test_trtllm_gen_fused_moe.py
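The "what is supported" pattern described above can be sketched as an allow-list check. This is a hypothetical minimal example: FlashInfer's `get_compute_capability` utility returns a `(major, minor)` tuple, but the helper names and SM lists below are illustrative, not the actual test code.

```python
import pytest


def should_skip(cc_major: int, supported_majors: list[int]) -> bool:
    """Allow-list check: skip unless the SM major version is supported."""
    return cc_major not in supported_majors


# Hopper is SM90 (major 9); the trtllm-gen tests allow only major 10
# (SM100/SM103), so they now skip cleanly on Hopper instead of failing.
def gate_test(cc_major: int) -> None:
    if should_skip(cc_major, supported_majors=[10]):
        pytest.skip(f"Not supported on SM{cc_major}x; requires SM100/SM103")
```

Compared to a deny-list ("skip if SM110/SM120/SM121"), the allow-list automatically covers architectures that were never enumerated, such as SM90.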

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes


bkryu commented Oct 28, 2025

/bot run

@bkryu bkryu marked this pull request as ready for review October 28, 2025 18:35

coderabbitai bot commented Oct 28, 2025

Walkthrough

This pull request restricts GPU compute capability filters across multiple test suites, narrowing test execution from a broader set of architectures to SM100/SM103 GPUs only. Changes include replacing skip conditions and adding new capability guards consistently across attention, GEMM, and MOE test modules.

Changes

Cohort / File(s) Summary
Attention Tests
tests/attention/test_trtllm_gen_attention.py, tests/attention/test_trtllm_gen_mla.py
Replaced GPU compute capability guards: instead of excluding SM110/SM120/SM121, these tests now skip unless the GPU is SM100/SM103.
GEMM Tests: Core Logic
tests/gemm/test_bmm_fp8.py, tests/gemm/test_mm_fp4.py
Introduced compute capability variables and added skip conditions for unsupported compute capabilities: for the cutlass backend in bmm_fp8, skip when the SM major version is not in [10, 11, 12]; for mm_fp4, skip when the backend is unsupported for the given compute capability.
GEMM Tests: Groupwise
tests/gemm/test_groupwise_scaled_gemm_fp8.py, tests/gemm/test_groupwise_scaled_gemm_mxfp4.py
Added runtime GPU capability checks early in test execution; skip tests for unsupported architectures. Blockscale test requires SM100/103, 110, or 120/121; groupwise tests restrict to SM100/103 for cutlass backend; mxfp4 test restricts to SM100/103 only.
MOE Tests
tests/moe/test_trtllm_gen_fused_moe.py
Updated skip condition in test_moe_quantization_classes from excluding SM110/SM120/SM121 to allowing only SM100/SM103 GPUs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas requiring attention:

  • Verify SM compute capability constants are correctly mapped (SM100/SM103 → [10], SM110 → [11], SM120/SM121 → [12])
  • Cross-check that xfail behaviors (particularly in test_bmm_fp8) are preserved as intended for specific SM versions
  • Ensure skip messages are descriptive and accurate across all test files
  • Validate control flow in multi-test files like test_groupwise_scaled_gemm_fp8, where capability checks are positioned correctly relative to backend-specific logic
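The mapping the first bullet asks reviewers to verify follows directly from CUDA's `(major, minor)` compute capability tuple. A quick sanity check, using a hypothetical helper name:

```python
def cc_to_sm_name(major: int, minor: int) -> str:
    """Format a CUDA compute capability tuple as an SM architecture name."""
    return f"SM{major}{minor}"


# The guards in this PR compare only the major version, so SM100 and SM103
# both land in the allowed bucket [10], while SM110 maps to [11] and
# SM120/SM121 map to [12]. Hopper (SM90) falls outside all three.
```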

Suggested reviewers

  • cyx-6
  • yzh119

Poem

🐰 GPUs align, a narrower range,
SM100, SM103—no need to change!
Tests now focus, skip the rest with grace,
Compute caps dance in a tighter space! 🎯

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description Check (⚠️ Warning): The PR description is almost entirely empty, consisting only of the template structure with placeholder comments and no substantive content. The critical sections (Description, Related Issues, and the verification checklist items) are unfilled or unchecked, providing no information about what the PR does, why the changes are needed, which issues are addressed, or whether pre-commit checks and tests have been verified. This fails to meet the basic requirements of a complete PR description.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 10.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title Check (✅ Passed): The PR title "unittest: Add SM arch checks to skip unsupported tests on Hopper" accurately and concisely describes the main objective of the changeset. The raw summary shows that GPU compute capability checks have been added across multiple test files to skip tests on unsupported GPU architectures, which aligns directly with the title's description of adding SM (streaming multiprocessor) architecture checks. The title is specific, clear, and avoids vague terminology, making it useful for scanning PR history.

@bkryu bkryu self-assigned this Oct 28, 2025
@flashinfer-bot
Copy link
Collaborator

GitLab MR !95 has been created, and the CI pipeline #37467216 is currently running. I'll report back once the pipeline job completes.

@bkryu bkryu assigned bkryu and unassigned bkryu Oct 28, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/attention/test_trtllm_gen_attention.py (1)

635-637: Consider moving the SM check immediately after compute_capability retrieval.

For consistency with test_trtllm_batch_prefill (Lines 351-353) and to fail fast, consider placing the SM architecture check directly after Line 635 without the intervening blank line. This ensures early exit before any subsequent validation logic.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ad64e2 and eafc9a9.

📒 Files selected for processing (7)
  • tests/attention/test_trtllm_gen_attention.py (2 hunks)
  • tests/attention/test_trtllm_gen_mla.py (1 hunks)
  • tests/gemm/test_bmm_fp8.py (1 hunks)
  • tests/gemm/test_groupwise_scaled_gemm_fp8.py (4 hunks)
  • tests/gemm/test_groupwise_scaled_gemm_mxfp4.py (1 hunks)
  • tests/gemm/test_mm_fp4.py (1 hunks)
  • tests/moe/test_trtllm_gen_fused_moe.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/gemm/test_bmm_fp8.py (1)
flashinfer/utils.py (1)
  • get_compute_capability (251-254)
tests/gemm/test_mm_fp4.py (2)
flashinfer/gemm.py (1)
  • mm_fp4 (1858-2009)
flashinfer/utils.py (1)
  • is_backend_supported (953-964)
tests/gemm/test_groupwise_scaled_gemm_fp8.py (1)
flashinfer/utils.py (1)
  • get_compute_capability (251-254)
🔇 Additional comments (12)
tests/attention/test_trtllm_gen_attention.py (2)

352-353: Verify SM architecture restriction is intentional.

The test now only runs on SM100/SM103 GPUs (compute_capability[0] == 10), excluding all other architectures. This is a significant restriction compared to the previous logic that excluded only specific SM versions. Confirm this aligns with the supported hardware for trtllm_batch_context_with_kv_cache.


920-922: LGTM!

The SM architecture check is correctly placed for early exit, consistent with the pattern in test_trtllm_batch_prefill.

tests/gemm/test_mm_fp4.py (1)

29-34: LGTM!

Good addition of backend support validation using the centralized is_backend_supported() API. This provides early gating based on compute capability and backend combination, improving test clarity and preventing unsupported configurations from proceeding to more specific validation logic.
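A centralized gate like `is_backend_supported()` can be sketched as a lookup table keyed by backend. The real mapping lives in FlashInfer; the table below is purely illustrative and does not reflect the actual supported backend/SM combinations.

```python
# Illustrative support table, mirroring the allow-list idea the PR applies
# to mm_fp4. These backend names and SM lists are assumptions for the
# sketch, not FlashInfer's real support matrix.
_SUPPORTED_MAJORS = {
    "cudnn": [10, 11, 12],
    "trtllm": [10],
    "cutlass": [10, 12],
}


def is_backend_supported(backend: str, cc_major: int) -> bool:
    """Return True if `backend` supports the given SM major version."""
    return cc_major in _SUPPORTED_MAJORS.get(backend, [])
```

A test would then call `pytest.skip(...)` whenever this returns False, keeping the support logic in one place instead of scattering per-test SM lists.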

tests/attention/test_trtllm_gen_mla.py (1)

36-37: LGTM!

The SM architecture restriction is correctly implemented and consistent with other trtllm attention tests in this PR. The check appropriately gates execution to SM100/SM103 only.

tests/gemm/test_groupwise_scaled_gemm_mxfp4.py (1)

259-262: LGTM!

The SM architecture restriction correctly limits gemm_mxfp4_nt_groupwise to SM100/SM103 GPUs. This aligns with the broader PR pattern of tightening compute capability requirements for specialized kernels.

tests/moe/test_trtllm_gen_fused_moe.py (1)

2038-2039: LGTM!

The SM architecture check appropriately restricts the MoE test to SM100/SM103 GPUs, consistent with the compute capability gating pattern used throughout this PR.

tests/gemm/test_bmm_fp8.py (2)

20-21: LGTM!

Good refactoring to retrieve compute_capability once and reuse it, improving efficiency and readability.


29-32: LGTM!

The cutlass backend SM architecture gating is correctly implemented, restricting execution to SM100/103, SM110, and SM120/121 GPUs. This complements the existing xfail for known SM120/121 issues and provides clear skip messaging for unsupported architectures.
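The skip/xfail interplay this comment describes can be modeled as a small decision function. This is a sketch: the SM lists mirror those quoted in the review, but the helper name and the default behavior for other backends are assumptions.

```python
def bmm_fp8_gate(backend: str, cc_major: int) -> str:
    """Decide a test's disposition: 'skip' for unsupported SMs, 'xfail'
    for known-broken combinations, 'run' otherwise."""
    if backend == "cutlass" and cc_major not in [10, 11, 12]:
        return "skip"   # e.g. Hopper (SM90) skips with a clear message
    if backend == "cutlass" and cc_major == 12:
        return "xfail"  # known SM120/SM121 issue remains an expected failure
    return "run"
```

Ordering matters: the skip must run first so that unsupported architectures never reach the xfail, preserving the existing xfail semantics on SM120/SM121.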

tests/gemm/test_groupwise_scaled_gemm_fp8.py (4)

46-50: LGTM!

The SM architecture check appropriately gates gemm_fp8_nt_blockscaled to SM100/103, SM110, and SM120/121 GPUs with a clear skip message.


91-91: LGTM!

Good practice to retrieve compute_capability early for reuse in subsequent conditional logic.


101-104: LGTM!

The cutlass backend gating for gemm_fp8_nt_groupwise correctly restricts execution to supported SM architectures (SM100/103, SM110, and SM120/121).


157-167: Verify SM110 exclusion is intentional for group_gemm_fp8_nt_groupwise.

Line 164-167 restricts group_gemm_fp8_nt_groupwise to SM100/103 and SM120/121 only, excluding SM110 unlike the other FP8 tests in this file (Lines 47, 102). If this exclusion is intentional due to specific SM110 limitations for grouped GEMMs, consider adding a brief inline comment explaining the rationale.

@bkryu bkryu requested a review from yzh119 October 28, 2025 18:45
@bkryu bkryu merged commit c857f09 into flashinfer-ai:main Oct 28, 2025
4 checks passed
@bkryu bkryu deleted the test_script_sm_check branch October 28, 2025 21:21
@coderabbitai coderabbitai bot mentioned this pull request Dec 4, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jan 30, 2026
@coderabbitai coderabbitai bot mentioned this pull request Feb 11, 2026
BingooYang pushed a commit to BingooYang/flashinfer that referenced this pull request Mar 13, 2026
…shinfer-ai#1998)