
ci(tests): waive flaky/hardware-incompatible tests to cleanup CI #2922

Closed
bobboli wants to merge 8 commits into flashinfer-ai:main from bobboli:fix/narrow-fmha-v2-prefill-test-skip

Conversation

@bobboli
Contributor

@bobboli bobboli commented Mar 31, 2026

Summary

Waives tests that consistently fail due to hardware/driver/library incompatibilities across CI nodes, unrelated to FlashInfer kernel correctness. Each waived test file is documented with the failure message and job link for tracking.


Waived tests

1. tests/attention/test_fmha_v2_prefill.py — narrow module-level skip to per-case skips

Replaces the blanket module-level skip (disabling all 2345 tests) with targeted per-case skips. Root-cause fix is in #2957.

| Condition | Reason |
| --- | --- |
| Sliding window (SWA) | Already skipped (PR #2781) |
| fp8→fp8 | Already skipped (PR #2781) |
| fp8 + head_dim=256 | Barrier deadlock in `transpose_v_tile` (SM90); fixed in #2957 |

Confirmed with `compute-sanitizer --tool synccheck`:
```
Barrier error detected. Divergent thread(s) in block.
at fmha_v2_flash_attention_e4m3_S_qkv_256_tma_ws_sm90_kernel+0xedf0
(10848 errors total)
```
Validation: Full suite on H100 — 848 passed, 1497 skipped, 0 failed.
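The narrowing described above boils down to a small per-case predicate evaluated inside the shared test helper. The sketch below is illustrative only: the helper name, the string-typed `dtype` argument, and the `cc_major` parameter are hypothetical; the actual test checks `torch.float8_e4m3fn` inside `run_trtllm_fmha_v2_prefill_case` using the `flashinfer.utils` compute-capability helpers.

```python
def should_skip_fp8_head_dim_256(cc_major: int, dtype: str, head_dim: int) -> bool:
    """True only for the known-deadlocking fp8 + head_dim=256 case on SM90
    (compute capability major 9); every other parameterization still runs."""
    return cc_major == 9 and dtype == "float8_e4m3fn" and head_dim == 256

# Inside the test helper one would then call, e.g.:
#     if should_skip_fp8_head_dim_256(cc_major, dtype, head_dim):
#         pytest.skip("fp8 + head_dim=256 barrier deadlock on SM90; see #2957")
```

This keeps the other 2000+ parameterizations running instead of losing them to a module-level skip.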


2. tests/attention/test_cudnn_prefill.py — fully skipped

`test_cudnn_prefill_fp8` fails on cu129 (CUDA 12.9) across all Blackwell GPUs (B300, GB200, GB300, B200):
```
cudnn._compiled_module.cudnnGraphNotSupportedError: No valid engine configs for ___________________
```
Passes on cu130 (CUDA 13.0). cuDNN graph compatibility issue with CUDA 12.9 on Blackwell.

Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/290478328, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291103321, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291103315
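If this module-level skip is later narrowed to the affected environments only, the failing combination reduces to a simple predicate. This is a sketch under stated assumptions: the function name and string-based CUDA version handling are hypothetical, and Blackwell datacenter parts (B200/B300/GB200/GB300) are identified here by compute capability major 10.

```python
def is_affected_cudnn_combo(cuda_version: str, cc_major: int) -> bool:
    """True for CUDA 12.9 on Blackwell (SM10x), where cuDNN graph building
    fails with cudnnGraphNotSupportedError; CUDA 13.0 and pre-Blackwell
    parts are unaffected per the CI jobs above."""
    major, minor = (int(x) for x in cuda_version.split(".")[:2])
    return (major, minor) == (12, 9) and cc_major == 10
```

A `pytest.mark.skipif(is_affected_cudnn_combo(...), reason=...)` built on such a check would keep the test running on cu130 and on non-Blackwell nodes.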


3. tests/gemm/test_bmm_mxfp8.py — fully skipped

`test_bmm_mxfp8` fails with the cuDNN backend on GB300 (cu129 + cu130) and B200 (cu130):
```
AssertionError: Cosine similarity 0.8984 is too low (expected > 0.9)
```
cuDNN MXFP8 BMM numerical accuracy falls below threshold on these hardware combinations.

Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291775835, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291775834, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291775833
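The accuracy checks cited here (and in the mm_mxfp8 and mm_fp4 items below) compare outputs by cosine similarity against a threshold. A minimal reference implementation of the metric; the tests themselves compute it on torch tensors rather than Python lists:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two flat vectors: dot(a, b) / (|a| * |b|).
    Values near 1.0 mean the outputs point in the same direction; the tests
    fail when this drops below a threshold such as 0.9."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```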


4. tests/attention/test_non_contiguous_prefill.py — fully skipped

`test_batch_ragged_prefill_packed_input` fails on B200 (cu129) with a corrupted JIT-compiled shared library:
```
RuntimeError: Failed to load dynamic shared library
.../batch_prefill_with_kv_cache_...head_dim_qk_256.../....so: file too short
```
The `.so` artifact is truncated during JIT compilation on this hardware, causing a load failure.

Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022396


5. tests/gemm/test_mm_mxfp8.py — fully skipped

`test_mm_mxfp8` fails with the CUTLASS backend on RTX Pro 6000 Blackwell (cu129 + cu130) and RTX 5090 (cu129 + cu130):
```
FAILED tests/gemm/test_mm_mxfp8.py::test_mm_mxfp8[True-cutlass-out_dtype0-input_dtype0-False-...]
AssertionError: Cosine similarity too low
```
CUTLASS MXFP8 MM accuracy failures across all matrix shapes on these GPUs.

Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022418, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022417, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022411, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022410


6. tests/gemm/test_mm_fp4.py — fully skipped

`test_mm_fp4` fails with the CuteDSL backend on GB200 (cu130):
```
AssertionError: assert tensor(0.9688, device='cuda:0', dtype=torch.bfloat16) > 0.97
```
CuteDSL FP4 GEMM cosine similarity marginally below threshold on GB200.

Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022401


7. tests/moe/test_trtllm_cutlass_fused_moe.py::test_moe_mxfp8_mxfp4 — skipped on SM120

`test_moe_mxfp8_mxfp4` fails on SM120 (RTX Pro 6000 Blackwell, RTX 5090, cu129 + cu130):
```
RuntimeError: Unsupported tile (128, 256, 64) and cluster (1, 1, 1) shape combination for arch 120.
```
TRT-LLM's CUTLASS MoE GEMM kernel (`fp8_e4m3 × fp4_e2m1`) has no valid tile configuration for SM120. Skip is targeted to SM120 only; SM100/SM110 are unaffected.

Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360128, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360127, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360121, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360120
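The targeted gate matches on the compute capability major version only. A sketch of the decision (the function name is hypothetical; the real test reads the capability via `torch.cuda.get_device_capability()`):

```python
def moe_mxfp8_mxfp4_skip_reason(cc_major):
    """Return a skip reason on SM120, where TRT-LLM's CUTLASS MoE GEMM has no
    valid tile configuration; return None (run the test) on SM100/SM110."""
    if cc_major == 12:
        return "CUTLASS MoE MXFP8xMXFP4 has no valid tile config for SM120"
    return None
```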

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Removed a module-level pytest skip in tests/attention/test_fmha_v2_prefill.py and added a targeted runtime skip inside run_trtllm_fmha_v2_prefill_case for dtype == torch.float8_e4m3fn and head_dim == 256 to avoid an SM90 tma_ws kernel barrier deadlock. Added module-level pytestmark skips to tests/attention/test_cudnn_prefill.py and tests/gemm/test_bmm_mxfp8.py for platform-specific failures. Other per-case skips remain unchanged.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| FMHA v2 test refinement (tests/attention/test_fmha_v2_prefill.py) | Removed module-level pytestmark skip; added a runtime conditional skip in run_trtllm_fmha_v2_prefill_case for dtype == torch.float8_e4m3fn && head_dim == 256 with an SM90 tma_ws kernel barrier deadlock reason. Existing in-test skip conditions preserved. |
| Module-level skips added (tests/attention/test_cudnn_prefill.py, tests/gemm/test_bmm_mxfp8.py) | Added module-level pytestmark = pytest.mark.skip(...) to skip entire test modules: test_cudnn_prefill.py (cudnnGraphNotSupportedError on some hardware/CUDA combos) and test_bmm_mxfp8.py (cuDNN MXFP8 BMM cosine similarity too low on some hardware combos). No other test logic changes. |

Sequence Diagram(s)

(omitted — changes are limited to test skip adjustments and do not introduce new multi-component control flow)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested reviewers

  • aleozlx
  • yzh119
  • jimmyzho
  • jiahanc
  • bkryu
  • nv-yunzheq
  • saltyminty

Poem

🐰 I hopped through tests both short and deep,
Nibbled broad skips down to careful leaps.
When float8_e4m3fn meets head_dim two-five-six,
I tiptoe past SM90's tma_ws tricks,
And leave CI fields with fewer slips.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The pull request description provides a comprehensive summary of changes, includes related context with issue links, and documents reasons for each test waiver with specific error messages and job references. |
| Title check | ✅ Passed | The title 'ci(tests): waive flaky/hardware-incompatible tests to cleanup CI' accurately describes the main objectives: narrowing test skips and adding new skips for flaky/incompatible tests across three test files. |



@bobboli bobboli changed the title fix(tests): narrow skip scope in test_fmha_v2_prefill.py ci: narrow skip scope in test_fmha_v2_prefill.py Mar 31, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request re-enables the FMHA v2 prefill tests by removing the global skip and introduces targeted skips for FP8 configurations with a head dimension of 256 to avoid known hardware deadlocks. Feedback suggests centralizing this skip logic within the run_trtllm_fmha_v2_prefill_case helper function to reduce duplication and improve maintainability, as well as standardizing the TODO comment format.

Comment thread tests/attention/test_fmha_v2_prefill.py Outdated
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/attention/test_fmha_v2_prefill.py`:
- Around line 844-848: The test currently unconditionally skips FP8 case when
dtype == torch.float8_e4m3fn and head_dim == 256, which is SM90-specific; change
the condition to only skip when running on SM90 by wrapping the check with
flashinfer.utils.is_sm90a_supported() (or
get_compute_capability()/is_sm90a_supported helper), e.g., only call
pytest.skip(...) if is_sm90a_supported() and dtype == torch.float8_e4m3fn and
head_dim == 256; apply the same guarded check for the duplicate at lines 909-913
and optionally factor into a small helper like should_skip_fp8_head_dim_256()
used by both locations to centralize logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 68428e83-65d7-4e61-97aa-267b873769f4

📥 Commits

Reviewing files that changed from the base of the PR and between c31435b and 13884b2.

📒 Files selected for processing (1)
  • tests/attention/test_fmha_v2_prefill.py

Comment thread tests/attention/test_fmha_v2_prefill.py Outdated
@jiahanc
Collaborator

jiahanc commented Mar 31, 2026

/bot run

@jiahanc jiahanc added the run-ci label Mar 31, 2026
@flashinfer-bot
Collaborator

GitLab MR !478 has been created, and the CI pipeline #47344575 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47344575: 7/20 passed

@samuellees
Collaborator

/bot run

@flashinfer-bot
Collaborator

GitLab MR !478 has been created, and the CI pipeline #47424795 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[CANCELING] Pipeline #47424795: canceled

@bobboli bobboli force-pushed the fix/narrow-fmha-v2-prefill-test-skip branch from 284f025 to 5d129c7 Compare April 2, 2026 03:32
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/attention/test_fmha_v2_prefill.py (1)

495-500: Consider making the SM90 condition explicit for clarity (optional).

The current skip logic is functionally correct—by this point in the code flow, FP8 dtype can only be reached on SM90A (since SM120+ FP8 is already skipped at lines 493-494). However, adding an explicit is_sm90a_supported() check would make the SM90-specific nature self-documenting and more robust if the surrounding guard structure ever changes.

This addresses the same concern from the prior review, but note that the current code is not actually over-suppressing tests—it's just implicit.

🔧 Optional refactor for explicit SM90 guard
-    if dtype == torch.float8_e4m3fn and head_dim == 256:
+    if (
+        is_sm90a_supported(torch.device("cuda"))
+        and dtype == torch.float8_e4m3fn
+        and head_dim == 256
+    ):
         pytest.skip(
             "todo(bobboli): fp8 with head_dim=256 hangs on SM90 tma_ws kernel due to "
             "barrier deadlock in transpose_v_tile "
             "(fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0)"
         )

As per coding guidelines: Use flashinfer.utils functions (get_compute_capability(), is_sm90a_supported(), is_sm100a_supported()) to skip tests on unsupported GPU architectures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/attention/test_fmha_v2_prefill.py` around lines 495 - 500, The skip for
FP8 dtype with head_dim==256 should explicitly check for SM90 support to make
the guard self-documenting and robust: update the condition that currently tests
"dtype == torch.float8_e4m3fn and head_dim == 256" to also call the
flashinfer/utils helper (e.g., is_sm90a_supported() or check
get_compute_capability()) so the pytest.skip is only executed when running on
SM90A; reference the symbols dtype, torch.float8_e4m3fn, head_dim, pytest.skip,
and is_sm90a_supported() to locate and modify the condition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/attention/test_fmha_v2_prefill.py`:
- Around line 495-500: The skip for FP8 dtype with head_dim==256 should
explicitly check for SM90 support to make the guard self-documenting and robust:
update the condition that currently tests "dtype == torch.float8_e4m3fn and
head_dim == 256" to also call the flashinfer/utils helper (e.g.,
is_sm90a_supported() or check get_compute_capability()) so the pytest.skip is
only executed when running on SM90A; reference the symbols dtype,
torch.float8_e4m3fn, head_dim, pytest.skip, and is_sm90a_supported() to locate
and modify the condition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 33605633-6693-44c0-8419-be2726f14d25

📥 Commits

Reviewing files that changed from the base of the PR and between 284f025 and 5d129c7.

📒 Files selected for processing (1)
  • tests/attention/test_fmha_v2_prefill.py

@samuellees
Collaborator

/bot run

@flashinfer-bot
Collaborator

GitLab MR !478 has been updated with latest changes, and the CI pipeline #47507304 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47507304: 10/20 passed

@bobboli
Contributor Author

bobboli commented Apr 2, 2026

/bot run

@flashinfer-bot
Collaborator

@bobboli is not authorized to trigger this CI job. cc: @yzh119, @sricketts, @yongwww

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/attention/test_cudnn_prefill.py`:
- Around line 4-6: Replace the unconditional module skip (pytestmark =
pytest.mark.skip) with a targeted skipif that uses flashinfer.utils helpers
(get_compute_capability(), is_sm90a_supported(), is_sm100a_supported()) to
detect unsupported GPU architectures and skip only those environments, and/or
wrap the cuDNN prefill setup in a try/except that catches the specific
cudnnGraphNotSupportedError and calls pytest.skip with a clear message; update
the pytestmark or test setup code that references the module-level skip to
instead perform the architecture predicate check or the exception-based skip so
tests run on healthy hardware/CUDA combos.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c418df7f-b272-47b0-87a7-074450a46972

📥 Commits

Reviewing files that changed from the base of the PR and between 5d129c7 and 0fc48eb.

📒 Files selected for processing (1)
  • tests/attention/test_cudnn_prefill.py

Comment thread tests/attention/test_cudnn_prefill.py
@jimmyzho
Contributor

jimmyzho commented Apr 2, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !478 has been updated with latest changes, and the CI pipeline #47576434 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47576434: 4/20 passed

@bobboli
Contributor Author

bobboli commented Apr 3, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !478 has been created, and the CI pipeline #47600263 is currently running. I'll report back once the pipeline job completes.

@bobboli bobboli requested a review from aleozlx as a code owner April 3, 2026 16:21
@bobboli bobboli force-pushed the fix/narrow-fmha-v2-prefill-test-skip branch from 67ce95c to 7e927f3 Compare April 3, 2026 16:23
@bobboli
Contributor Author

bobboli commented Apr 3, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !478 has been updated with latest changes, and the CI pipeline #47640862 is currently running. I'll report back once the pipeline job completes.

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
tests/attention/test_fmha_v2_prefill.py (1)

495-500: ⚠️ Potential issue | 🟡 Minor

Scope the fp8/head_dim=256 skip to SM90 explicitly.

Line 495 currently skips by dtype/head_dim only, while the reason is SM90-specific. Please gate it with is_sm90a_supported(...) so coverage isn’t unnecessarily masked if non-SM90 behavior changes.

🔧 Proposed fix
-    if dtype == torch.float8_e4m3fn and head_dim == 256:
+    if (
+        is_sm90a_supported(torch.device("cuda"))
+        and dtype == torch.float8_e4m3fn
+        and head_dim == 256
+    ):
         pytest.skip(
             "todo(bobboli): fp8 with head_dim=256 hangs on SM90 tma_ws kernel due to "
             "barrier deadlock in transpose_v_tile "
             "(fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0)"
         )

As per coding guidelines: Use flashinfer.utils functions (get_compute_capability(), is_sm90a_supported(), is_sm100a_supported()) to skip tests on unsupported GPU architectures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/attention/test_fmha_v2_prefill.py` around lines 495 - 500, The skip
currently triggers for dtype == torch.float8_e4m3fn and head_dim == 256
regardless of GPU; change it to only skip when the SM90 architecture is present
by wrapping the condition with an SM90 check (use is_sm90a_supported(...) from
flashinfer.utils or combine get_compute_capability()/is_sm90a_supported()).
Update the if that calls pytest.skip so it requires dtype ==
torch.float8_e4m3fn, head_dim == 256, and is_sm90a_supported(...) to all be true
before skipping to avoid masking non-SM90 failures.
🧹 Nitpick comments (1)
tests/gemm/test_bmm_mxfp8.py (1)

5-7: Consider narrowing the skip to specific hardware combinations when root cause is identified.

The blanket module-level skip is reasonable as a temporary measure to unblock CI. However, as per coding guidelines, tests should use targeted skips with get_compute_capability(), is_sm90a_supported(), or is_sm100a_supported() to skip only on unsupported architectures.

The test already demonstrates this pattern at lines 26-32. Once the "too low cosine similarity" issue is root-caused, consider replacing this blanket skip with a targeted pytest.mark.skipif that checks the specific hardware combinations affected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/gemm/test_bmm_mxfp8.py` around lines 5 - 7, The module-level blanket
skip assigned to pytestmark should be replaced with a targeted skipif once the
failing hardware is identified: remove or replace pytestmark =
pytest.mark.skip(...) and instead apply pytest.mark.skipif(...) using the
existing helpers such as get_compute_capability(), is_sm90a_supported(), and
is_sm100a_supported() (as used in the test at lines 26-32) to only skip on the
specific compute-capability/SM variants that exhibit the low cosine similarity;
update the skip reason to mention the exact hardware condition and retain the
test for other architectures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/attention/test_fmha_v2_prefill.py`:
- Around line 495-500: The skip currently triggers for dtype ==
torch.float8_e4m3fn and head_dim == 256 regardless of GPU; change it to only
skip when the SM90 architecture is present by wrapping the condition with an
SM90 check (use is_sm90a_supported(...) from flashinfer.utils or combine
get_compute_capability()/is_sm90a_supported()). Update the if that calls
pytest.skip so it requires dtype == torch.float8_e4m3fn, head_dim == 256, and
is_sm90a_supported(...) to all be true before skipping to avoid masking non-SM90
failures.

---

Nitpick comments:
In `@tests/gemm/test_bmm_mxfp8.py`:
- Around line 5-7: The module-level blanket skip assigned to pytestmark should
be replaced with a targeted skipif once the failing hardware is identified:
remove or replace pytestmark = pytest.mark.skip(...) and instead apply
pytest.mark.skipif(...) using the existing helpers such as
get_compute_capability(), is_sm90a_supported(), and is_sm100a_supported() (as
used in the test at lines 26-32) to only skip on the specific
compute-capability/SM variants that exhibit the low cosine similarity; update
the skip reason to mention the exact hardware condition and retain the test for
other architectures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 18e657b0-1b99-44ef-9fc3-97fd7fef6a08

📥 Commits

Reviewing files that changed from the base of the PR and between 0fc48eb and 7e927f3.

📒 Files selected for processing (3)
  • tests/attention/test_cudnn_prefill.py
  • tests/attention/test_fmha_v2_prefill.py
  • tests/gemm/test_bmm_mxfp8.py
✅ Files skipped from review due to trivial changes (1)
  • tests/attention/test_cudnn_prefill.py

@bobboli bobboli changed the title ci: narrow skip scope in test_fmha_v2_prefill.py ci(tests): waive flaky/hardware-incompatible tests to unblock CI Apr 4, 2026
bobboli and others added 7 commits April 4, 2026 05:41
Replace the module-level pytestmark skip (which disabled all 2345 tests)
with targeted per-case skips for the known hanging configurations:

- fp8→fp8: already skipped (carried over from jimmyzho's PR flashinfer-ai#2781)
- Sliding window (SWA): already skipped (carried over from PR flashinfer-ai#2781)
- fp8 + head_dim=256: newly identified via compute-sanitizer synccheck

Root cause of the fp8+head_dim=256 hang: barrier deadlock in the SM90
TMA warp-specialized kernel (transpose_v_tile in dma.h:672). The named
barrier expects 128 threads but the head_dim=256 fp8 kernel configuration
(STEP_KV=128, BMM2_K_GROUPS=1) creates a thread layout mismatch, causing
divergent threads at:
  fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0

Validated with two sequential clean runs (712 passed, 1633 skipped each).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…efill_case

Move the fp8+head_dim=256 skip from individual test functions into the
shared run_trtllm_fmha_v2_prefill_case helper, consistent with where
other config-based skips live. Also use todo(bobboli) format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ror on some hardware/CUDA version combinations

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
…arity failures on some hardware

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
…accuracy failures on some hardware

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
…ilure on some hardware

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@bobboli bobboli force-pushed the fix/narrow-fmha-v2-prefill-test-skip branch from 8bc7b37 to 1596f62 Compare April 4, 2026 05:42
@bobboli
Contributor Author

bobboli commented Apr 4, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !478 has been updated with latest changes, and the CI pipeline #47694453 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47694453: 8/20 passed

…tile config

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@bobboli bobboli changed the title ci(tests): waive flaky/hardware-incompatible tests to unblock CI ci(tests): waive flaky/hardware-incompatible tests to cleanup CI Apr 4, 2026
@bobboli
Contributor Author

bobboli commented Apr 4, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !478 has been updated with latest changes, and the CI pipeline #47718122 is currently running. I'll report back once the pipeline job completes.


```python
if torch.cuda.get_device_capability()[0] == 12:
    pytest.skip(
        "todo: CUTLASS MoE MXFP8xMXFP4 has no valid tile config for SM120 "
```
Collaborator


#2020 introduced SM120. It's weird because PR #2020's CI passed.

@flashinfer-bot
Collaborator

[SUCCESS] Pipeline #47718122: 14/20 passed

@aleozlx
Collaborator

aleozlx commented Apr 5, 2026

thx for contrib
i think all of these are getting addressed soon if not already.
see some other recent bot run from today https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/pipelines/47787607

we may not need this, thus closing for now

@aleozlx aleozlx closed this Apr 5, 2026
