ci(tests): waive flaky/hardware-incompatible tests to clean up CI #2922
bobboli wants to merge 8 commits into flashinfer-ai:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Removed a module-level pytest skip in the affected test file.
Sequence Diagram(s): (omitted — changes are limited to test skip adjustments and do not introduce new multi-component control flow)
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
Code Review
This pull request re-enables the FMHA v2 prefill tests by removing the global skip and introduces targeted skips for FP8 configurations with a head dimension of 256 to avoid known hardware deadlocks. Feedback suggests centralizing this skip logic within the run_trtllm_fmha_v2_prefill_case helper function to reduce duplication and improve maintainability, as well as standardizing the TODO comment format.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/attention/test_fmha_v2_prefill.py`:
- Around line 844-848: The test currently unconditionally skips FP8 case when
dtype == torch.float8_e4m3fn and head_dim == 256, which is SM90-specific; change
the condition to only skip when running on SM90 by wrapping the check with
flashinfer.utils.is_sm90a_supported() (or
get_compute_capability()/is_sm90a_supported helper), e.g., only call
pytest.skip(...) if is_sm90a_supported() and dtype == torch.float8_e4m3fn and
head_dim == 256; apply the same guarded check for the duplicate at lines 909-913
and optionally factor into a small helper like should_skip_fp8_head_dim_256()
used by both locations to centralize logic.
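The centralized helper the prompt proposes could look like the sketch below. This is illustrative only: `should_skip_fp8_head_dim_256` is the name suggested in the review, and plain tuples/strings stand in for `flashinfer.utils.get_compute_capability()` and `torch.float8_e4m3fn`, which the real test would use.

```python
# Illustrative sketch of the suggested should_skip_fp8_head_dim_256() helper.
# A (major, minor) tuple and a dtype name string stand in for the real
# get_compute_capability() / torch dtype values.

def should_skip_fp8_head_dim_256(compute_capability, dtype_name, head_dim):
    """True only for the known-bad SM90 fp8 + head_dim=256 combination."""
    is_sm90 = compute_capability[0] == 9  # stand-in for is_sm90a_supported()
    return is_sm90 and dtype_name == "float8_e4m3fn" and head_dim == 256

# Both test sites would then share one guard:
#     if should_skip_fp8_head_dim_256(cap, dtype_name, head_dim):
#         pytest.skip("fp8 + head_dim=256 deadlocks the SM90 tma_ws kernel")
```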
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 68428e83-65d7-4e61-97aa-267b873769f4
📒 Files selected for processing (1)
tests/attention/test_fmha_v2_prefill.py
/bot run
[FAILED] Pipeline #47344575: 7/20 passed
/bot run
[CANCELING] Pipeline #47424795: canceled
Force-pushed 284f025 to 5d129c7 (compare)
🧹 Nitpick comments (1)
tests/attention/test_fmha_v2_prefill.py (1)

Lines 495-500: Consider making the SM90 condition explicit for clarity (optional). The current skip logic is functionally correct: by this point in the code flow, FP8 dtype can only be reached on SM90A (since SM120+ FP8 is already skipped at lines 493-494). However, adding an explicit `is_sm90a_supported()` check would make the SM90-specific nature self-documenting and more robust if the surrounding guard structure ever changes. This addresses the same concern from the prior review, but note that the current code is not actually over-suppressing tests; it is just implicit.

🔧 Optional refactor for explicit SM90 guard

```diff
-    if dtype == torch.float8_e4m3fn and head_dim == 256:
+    if (
+        is_sm90a_supported(torch.device("cuda"))
+        and dtype == torch.float8_e4m3fn
+        and head_dim == 256
+    ):
         pytest.skip(
             "todo(bobboli): fp8 with head_dim=256 hangs on SM90 tma_ws kernel due to "
             "barrier deadlock in transpose_v_tile "
             "(fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0)"
         )
```

As per coding guidelines: Use flashinfer.utils functions (`get_compute_capability()`, `is_sm90a_supported()`, `is_sm100a_supported()`) to skip tests on unsupported GPU architectures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/attention/test_fmha_v2_prefill.py` around lines 495-500: the skip for FP8 dtype with head_dim==256 should explicitly check for SM90 support to make the guard self-documenting and robust. Update the condition that currently tests "dtype == torch.float8_e4m3fn and head_dim == 256" to also call the flashinfer.utils helper (e.g., is_sm90a_supported() or check get_compute_capability()) so the pytest.skip is only executed when running on SM90A.
/bot run
[FAILED] Pipeline #47507304: 10/20 passed
/bot run
@bobboli is not authorized to trigger this CI job. cc: @yzh119, @sricketts, @yongwww
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/attention/test_cudnn_prefill.py`:
- Around line 4-6: Replace the unconditional module skip (pytestmark =
pytest.mark.skip) with a targeted skipif that uses flashinfer.utils helpers
(get_compute_capability(), is_sm90a_supported(), is_sm100a_supported()) to
detect unsupported GPU architectures and skip only those environments, and/or
wrap the cuDNN prefill setup in a try/except that catches the specific
cudnnGraphNotSupportedError and calls pytest.skip with a clear message; update
the pytestmark or test setup code that references the module-level skip to
instead perform the architecture predicate check or the exception-based skip so
tests run on healthy hardware/CUDA combos.
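The exception-based half of that suggestion could be sketched as below. The skip callback is injected so the shape is testable without a GPU; in the real test it would be `pytest.skip`, and the caught type would be cuDNN's `cudnnGraphNotSupportedError` rather than the generic `RuntimeError` used here as a stand-in.

```python
# Hedged sketch: convert the known unsupported-graph error into a skip
# instead of skipping the whole module. RuntimeError stands in for cuDNN's
# cudnnGraphNotSupportedError; `skip` stands in for pytest.skip.

def run_or_skip_on_unsupported_graph(build_graph, skip):
    try:
        return build_graph()
    except RuntimeError as e:
        if "No valid engine configs" in str(e):
            skip(f"cuDNN graph unsupported on this hardware/CUDA combo: {e}")
            return None
        raise  # unrelated failures should still fail the test
```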
/bot run
[FAILED] Pipeline #47576434: 4/20 passed
/bot run
Force-pushed 67ce95c to 7e927f3 (compare)
/bot run
♻️ Duplicate comments (1)
tests/attention/test_fmha_v2_prefill.py (1)

Lines 495-500 (⚠️ Potential issue | 🟡 Minor): Scope the fp8/head_dim=256 skip to SM90 explicitly. Line 495 currently skips by dtype/head_dim only, while the reason is SM90-specific. Please gate it with `is_sm90a_supported(...)` so coverage isn't unnecessarily masked if non-SM90 behavior changes.

🔧 Proposed fix

```diff
-    if dtype == torch.float8_e4m3fn and head_dim == 256:
+    if (
+        is_sm90a_supported(torch.device("cuda"))
+        and dtype == torch.float8_e4m3fn
+        and head_dim == 256
+    ):
         pytest.skip(
             "todo(bobboli): fp8 with head_dim=256 hangs on SM90 tma_ws kernel due to "
             "barrier deadlock in transpose_v_tile "
             "(fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0)"
         )
```

As per coding guidelines: Use flashinfer.utils functions (`get_compute_capability()`, `is_sm90a_supported()`, `is_sm100a_supported()`) to skip tests on unsupported GPU architectures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/attention/test_fmha_v2_prefill.py` around lines 495-500: the skip currently triggers for dtype == torch.float8_e4m3fn and head_dim == 256 regardless of GPU; change it to only skip when the SM90 architecture is present by wrapping the condition with an SM90 check (use is_sm90a_supported(...) from flashinfer.utils or combine it with get_compute_capability()), so the pytest.skip requires all three conditions and non-SM90 failures are not masked.
🧹 Nitpick comments (1)
tests/gemm/test_bmm_mxfp8.py (1)

Lines 5-7: Consider narrowing the skip to specific hardware combinations when the root cause is identified. The blanket module-level skip is reasonable as a temporary measure to unblock CI. However, as per coding guidelines, tests should use targeted skips with `get_compute_capability()`, `is_sm90a_supported()`, or `is_sm100a_supported()` to skip only on unsupported architectures. The test already demonstrates this pattern at lines 26-32. Once the "too low cosine similarity" issue is root-caused, consider replacing this blanket skip with a targeted `pytest.mark.skipif` that checks the specific hardware combinations affected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/gemm/test_bmm_mxfp8.py` around lines 5-7: once the failing hardware is identified, replace the module-level `pytestmark = pytest.mark.skip(...)` with a targeted `pytest.mark.skipif(...)` using the existing helpers (get_compute_capability(), is_sm90a_supported(), is_sm100a_supported(), as used at lines 26-32) so only the affected compute-capability/SM variants are skipped; update the skip reason to mention the exact hardware condition and retain the test for other architectures.
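Once the affected hardware is pinned down, the narrowed skip could be sketched like this. `AFFECTED_CAPS` is a placeholder set, not a confirmed list of failing GPUs:

```python
# Hypothetical narrowed skip predicate. AFFECTED_CAPS is an assumed
# placeholder until the low-cosine-similarity failure is root-caused;
# the real set would come from the CI matrix (GB300/B200-class nodes).

AFFECTED_CAPS = {(10, 0), (10, 3)}  # assumed, NOT confirmed

def should_skip_bmm_mxfp8(compute_capability):
    return compute_capability in AFFECTED_CAPS

# Replacing the blanket skip in the test module (sketch):
#     pytestmark = pytest.mark.skipif(
#         should_skip_bmm_mxfp8(torch.cuda.get_device_capability()),
#         reason="cuDNN MXFP8 BMM cosine similarity below threshold on this SM",
#     )
```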
✅ Files skipped from review due to trivial changes (1)
- tests/attention/test_cudnn_prefill.py
Replace the module-level pytestmark skip (which disabled all 2345 tests) with targeted per-case skips for the known hanging configurations:
- fp8→fp8: already skipped (carried over from jimmyzho's PR flashinfer-ai#2781)
- Sliding window (SWA): already skipped (carried over from PR flashinfer-ai#2781)
- fp8 + head_dim=256: newly identified via compute-sanitizer synccheck

Root cause of the fp8+head_dim=256 hang: barrier deadlock in the SM90 TMA warp-specialized kernel (transpose_v_tile in dma.h:672). The named barrier expects 128 threads, but the head_dim=256 fp8 kernel configuration (STEP_KV=128, BMM2_K_GROUPS=1) creates a thread layout mismatch, causing divergent threads at:
fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0

Validated with two sequential clean runs (712 passed, 1633 skipped each).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…efill_case Move the fp8+head_dim=256 skip from individual test functions into the shared run_trtllm_fmha_v2_prefill_case helper, consistent with where other config-based skips live. Also use todo(bobboli) format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ror on some hardware/CUDA version combinations Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
…arity failures on some hardware Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
…accuracy failures on some hardware Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
…ilure on some hardware Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Force-pushed 8bc7b37 to 1596f62 (compare)
/bot run
[FAILED] Pipeline #47694453: 8/20 passed
…tile config Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
/bot run
```python
if torch.cuda.get_device_capability()[0] == 12:
    pytest.skip(
        "todo: CUTLASS MoE MXFP8xMXFP4 has no valid tile config for SM120 "
```
#2020 introduced SM120. It's weird, because PR #2020's CI passed.
[SUCCESS] Pipeline #47718122: 14/20 passed
thx for contrib we may not need this, thus closing for now
Summary
Waives tests that consistently fail due to hardware/driver/library incompatibilities across CI nodes, unrelated to FlashInfer kernel correctness. Each waived test file is documented with the failure message and job link for tracking.
Waived tests
1.
`tests/attention/test_fmha_v2_prefill.py` — narrow module-level skip to per-case skips

Replaces the blanket module-level skip (disabling all 2345 tests) with targeted per-case skips. Root-cause fix is in #2957.
Confirmed with `compute-sanitizer --tool synccheck`:
```
Barrier error detected. Divergent thread(s) in block.
at fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0
(10848 errors total)
```
Validation: Full suite on H100 — 848 passed, 1497 skipped, 0 failed.
2.
`tests/attention/test_cudnn_prefill.py` — fully skipped

`test_cudnn_prefill_fp8` fails on cu129 (CUDA 12.9) across all Blackwell GPUs (B300, GB200, GB300, B200):
```
cudnn._compiled_module.cudnnGraphNotSupportedError: No valid engine configs for ___________________
```
Passes on cu130 (CUDA 13.0). cuDNN graph compatibility issue with CUDA 12.9 on Blackwell.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/290478328, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291103321, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291103315
3.
`tests/gemm/test_bmm_mxfp8.py` — fully skipped

`test_bmm_mxfp8` fails with the cuDNN backend on GB300 (cu129 + cu130) and B200 (cu130):
```
AssertionError: Cosine similarity 0.8984 is too low (expected > 0.9)
```
cuDNN MXFP8 BMM numerical accuracy falls below threshold on these hardware combinations.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291775835, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291775834, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291775833
4.
`tests/attention/test_non_contiguous_prefill.py` — fully skipped

`test_batch_ragged_prefill_packed_input` fails on B200 (cu129) with a corrupted JIT-compiled shared library:
```
RuntimeError: Failed to load dynamic shared library
.../batch_prefill_with_kv_cache_...head_dim_qk_256.../....so: file too short
```
The `.so` artifact is truncated during JIT compilation on this hardware, causing a load failure.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022396
5.
`tests/gemm/test_mm_mxfp8.py` — fully skipped

`test_mm_mxfp8` fails with the CUTLASS backend on RTX Pro 6000 Blackwell (cu129 + cu130) and RTX 5090 (cu129 + cu130):
```
FAILED tests/gemm/test_mm_mxfp8.py::test_mm_mxfp8[True-cutlass-out_dtype0-input_dtype0-False-...]
AssertionError: Cosine similarity too low
```
CUTLASS MXFP8 MM accuracy failures across all matrix shapes on these GPUs.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022418, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022417, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022411, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022410
6.
`tests/gemm/test_mm_fp4.py` — fully skipped

`test_mm_fp4` fails with the CuteDSL backend on GB200 (cu130):
```
AssertionError: assert tensor(0.9688, device='cuda:0', dtype=torch.bfloat16) > 0.97
```
CuteDSL FP4 GEMM cosine similarity marginally below threshold on GB200.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022401
7.
`tests/moe/test_trtllm_cutlass_fused_moe.py::test_moe_mxfp8_mxfp4` — skipped on SM120

`test_moe_mxfp8_mxfp4` fails on SM120 (RTX Pro 6000 Blackwell, RTX 5090, cu129 + cu130):
```
RuntimeError: Unsupported tile (128, 256, 64) and cluster (1, 1, 1) shape combination for arch 120.
```
TRT-LLM's CUTLASS MoE GEMM kernel (`fp8_e4m3 × fp4_e2m1`) has no valid tile configuration for SM120. Skip is targeted to SM120 only; SM100/SM110 are unaffected.
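The SM120-only guard described in item 7 mirrors the snippet quoted in the review thread (`torch.cuda.get_device_capability()[0] == 12`). A minimal sketch of that check, with an illustrative helper name:

```python
# Sketch of the SM120-targeted guard from item 7. is_sm120() is an
# illustrative name; the test checks get_device_capability() directly.

def is_sm120(compute_capability):
    """SM120 GPUs (RTX Pro 6000 Blackwell, RTX 5090) report major CC 12."""
    return compute_capability[0] == 12

# In the test (per the reviewed diff):
#     if is_sm120(torch.cuda.get_device_capability()):
#         pytest.skip("CUTLASS MoE MXFP8xMXFP4 has no valid tile config for SM120")
```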
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360128, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360127, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360121, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360120