ci(tests): waive flaky/hardware-incompatible tests to clean up CI #2922
bobboli wants to merge 8 commits into flashinfer-ai:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Removed a module-level pytest skip in the affected test file.
Sequence Diagram(s): (omitted — changes are limited to test skip adjustments and do not introduce new multi-component control flow)
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
Code Review
This pull request re-enables the FMHA v2 prefill tests by removing the global skip and introduces targeted skips for FP8 configurations with a head dimension of 256 to avoid known hardware deadlocks. Feedback suggests centralizing this skip logic within the run_trtllm_fmha_v2_prefill_case helper function to reduce duplication and improve maintainability, as well as standardizing the TODO comment format.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/attention/test_fmha_v2_prefill.py`:
- Around line 844-848: The test currently unconditionally skips FP8 case when
dtype == torch.float8_e4m3fn and head_dim == 256, which is SM90-specific; change
the condition to only skip when running on SM90 by wrapping the check with
flashinfer.utils.is_sm90a_supported() (or
get_compute_capability()/is_sm90a_supported helper), e.g., only call
pytest.skip(...) if is_sm90a_supported() and dtype == torch.float8_e4m3fn and
head_dim == 256; apply the same guarded check for the duplicate at lines 909-913
and optionally factor into a small helper like should_skip_fp8_head_dim_256()
used by both locations to centralize logic.
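The centralized helper the prompt proposes could look like the sketch below. This is illustrative only: `should_skip_fp8_head_dim_256` is the name suggested in the review, and plain tuples/strings stand in for `flashinfer.utils.get_compute_capability()` and `torch.float8_e4m3fn`, which the real test would use.

```python
# Illustrative sketch of the suggested should_skip_fp8_head_dim_256() helper.
# A (major, minor) tuple and a dtype name string stand in for the real
# get_compute_capability() / torch dtype values.

def should_skip_fp8_head_dim_256(compute_capability, dtype_name, head_dim):
    """True only for the known-bad SM90 fp8 + head_dim=256 combination."""
    is_sm90 = compute_capability[0] == 9  # stand-in for is_sm90a_supported()
    return is_sm90 and dtype_name == "float8_e4m3fn" and head_dim == 256

# Both test sites would then share one guard:
#     if should_skip_fp8_head_dim_256(cap, dtype_name, head_dim):
#         pytest.skip("fp8 + head_dim=256 deadlocks the SM90 tma_ws kernel")
```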
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 68428e83-65d7-4e61-97aa-267b873769f4
📒 Files selected for processing (1)
tests/attention/test_fmha_v2_prefill.py
/bot run
[FAILED] Pipeline #47344575: 7/20 passed
/bot run
[CANCELING] Pipeline #47424795: canceled
Force-pushed 284f025 to 5d129c7 (compare)
🧹 Nitpick comments (1)
tests/attention/test_fmha_v2_prefill.py (1)

Lines 495-500: Consider making the SM90 condition explicit for clarity (optional). The current skip logic is functionally correct: by this point in the code flow, FP8 dtype can only be reached on SM90A (since SM120+ FP8 is already skipped at lines 493-494). However, adding an explicit `is_sm90a_supported()` check would make the SM90-specific nature self-documenting and more robust if the surrounding guard structure ever changes. This addresses the same concern from the prior review, but note that the current code is not actually over-suppressing tests; it is just implicit.

🔧 Optional refactor for explicit SM90 guard

```diff
-    if dtype == torch.float8_e4m3fn and head_dim == 256:
+    if (
+        is_sm90a_supported(torch.device("cuda"))
+        and dtype == torch.float8_e4m3fn
+        and head_dim == 256
+    ):
         pytest.skip(
             "todo(bobboli): fp8 with head_dim=256 hangs on SM90 tma_ws kernel due to "
             "barrier deadlock in transpose_v_tile "
             "(fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0)"
         )
```

As per coding guidelines: Use flashinfer.utils functions (`get_compute_capability()`, `is_sm90a_supported()`, `is_sm100a_supported()`) to skip tests on unsupported GPU architectures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/attention/test_fmha_v2_prefill.py` around lines 495-500: the skip for FP8 dtype with head_dim==256 should explicitly check for SM90 support to make the guard self-documenting and robust. Update the condition that currently tests "dtype == torch.float8_e4m3fn and head_dim == 256" to also call the flashinfer.utils helper (e.g., is_sm90a_supported() or check get_compute_capability()) so the pytest.skip is only executed when running on SM90A.
/bot run
[FAILED] Pipeline #47507304: 10/20 passed
/bot run
@bobboli is not authorized to trigger this CI job. cc: @yzh119, @sricketts, @yongwww
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/attention/test_cudnn_prefill.py`:
- Around line 4-6: Replace the unconditional module skip (pytestmark =
pytest.mark.skip) with a targeted skipif that uses flashinfer.utils helpers
(get_compute_capability(), is_sm90a_supported(), is_sm100a_supported()) to
detect unsupported GPU architectures and skip only those environments, and/or
wrap the cuDNN prefill setup in a try/except that catches the specific
cudnnGraphNotSupportedError and calls pytest.skip with a clear message; update
the pytestmark or test setup code that references the module-level skip to
instead perform the architecture predicate check or the exception-based skip so
tests run on healthy hardware/CUDA combos.
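The exception-based half of that suggestion could be sketched as below. The skip callback is injected so the shape is testable without a GPU; in the real test it would be `pytest.skip`, and the caught type would be cuDNN's `cudnnGraphNotSupportedError` rather than the generic `RuntimeError` used here as a stand-in.

```python
# Hedged sketch: convert the known unsupported-graph error into a skip
# instead of skipping the whole module. RuntimeError stands in for cuDNN's
# cudnnGraphNotSupportedError; `skip` stands in for pytest.skip.

def run_or_skip_on_unsupported_graph(build_graph, skip):
    try:
        return build_graph()
    except RuntimeError as e:
        if "No valid engine configs" in str(e):
            skip(f"cuDNN graph unsupported on this hardware/CUDA combo: {e}")
            return None
        raise  # unrelated failures should still fail the test
```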
/bot run
[FAILED] Pipeline #47576434: 4/20 passed
/bot run
Force-pushed 67ce95c to 7e927f3 (compare)
/bot run
♻️ Duplicate comments (1)
tests/attention/test_fmha_v2_prefill.py (1)

Lines 495-500 (⚠️ Potential issue | 🟡 Minor): Scope the fp8/head_dim=256 skip to SM90 explicitly. Line 495 currently skips by dtype/head_dim only, while the reason is SM90-specific. Please gate it with `is_sm90a_supported(...)` so coverage isn't unnecessarily masked if non-SM90 behavior changes.

🔧 Proposed fix

```diff
-    if dtype == torch.float8_e4m3fn and head_dim == 256:
+    if (
+        is_sm90a_supported(torch.device("cuda"))
+        and dtype == torch.float8_e4m3fn
+        and head_dim == 256
+    ):
         pytest.skip(
             "todo(bobboli): fp8 with head_dim=256 hangs on SM90 tma_ws kernel due to "
             "barrier deadlock in transpose_v_tile "
             "(fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0)"
         )
```

As per coding guidelines: Use flashinfer.utils functions (`get_compute_capability()`, `is_sm90a_supported()`, `is_sm100a_supported()`) to skip tests on unsupported GPU architectures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/attention/test_fmha_v2_prefill.py` around lines 495-500: the skip currently triggers for dtype == torch.float8_e4m3fn and head_dim == 256 regardless of GPU; change it to only skip when the SM90 architecture is present by wrapping the condition with an SM90 check (use is_sm90a_supported(...) from flashinfer.utils or combine it with get_compute_capability()), so the pytest.skip requires all three conditions and non-SM90 failures are not masked.
🧹 Nitpick comments (1)
tests/gemm/test_bmm_mxfp8.py (1)

Lines 5-7: Consider narrowing the skip to specific hardware combinations when the root cause is identified. The blanket module-level skip is reasonable as a temporary measure to unblock CI. However, as per coding guidelines, tests should use targeted skips with `get_compute_capability()`, `is_sm90a_supported()`, or `is_sm100a_supported()` to skip only on unsupported architectures. The test already demonstrates this pattern at lines 26-32. Once the "too low cosine similarity" issue is root-caused, consider replacing this blanket skip with a targeted `pytest.mark.skipif` that checks the specific hardware combinations affected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/gemm/test_bmm_mxfp8.py` around lines 5-7: once the failing hardware is identified, replace the module-level `pytestmark = pytest.mark.skip(...)` with a targeted `pytest.mark.skipif(...)` using the existing helpers (get_compute_capability(), is_sm90a_supported(), is_sm100a_supported(), as used at lines 26-32) so only the affected compute-capability/SM variants are skipped; update the skip reason to mention the exact hardware condition and retain the test for other architectures.
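Once the affected hardware is pinned down, the narrowed skip could be sketched like this. `AFFECTED_CAPS` is a placeholder set, not a confirmed list of failing GPUs:

```python
# Hypothetical narrowed skip predicate. AFFECTED_CAPS is an assumed
# placeholder until the low-cosine-similarity failure is root-caused;
# the real set would come from the CI matrix (GB300/B200-class nodes).

AFFECTED_CAPS = {(10, 0), (10, 3)}  # assumed, NOT confirmed

def should_skip_bmm_mxfp8(compute_capability):
    return compute_capability in AFFECTED_CAPS

# Replacing the blanket skip in the test module (sketch):
#     pytestmark = pytest.mark.skipif(
#         should_skip_bmm_mxfp8(torch.cuda.get_device_capability()),
#         reason="cuDNN MXFP8 BMM cosine similarity below threshold on this SM",
#     )
```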
✅ Files skipped from review due to trivial changes (1)
- tests/attention/test_cudnn_prefill.py
Replace the module-level pytestmark skip (which disabled all 2345 tests) with targeted per-case skips for the known hanging configurations:
- fp8→fp8: already skipped (carried over from jimmyzho's PR flashinfer-ai#2781)
- Sliding window (SWA): already skipped (carried over from PR flashinfer-ai#2781)
- fp8 + head_dim=256: newly identified via compute-sanitizer synccheck

Root cause of the fp8+head_dim=256 hang: barrier deadlock in the SM90 TMA warp-specialized kernel (transpose_v_tile in dma.h:672). The named barrier expects 128 threads, but the head_dim=256 fp8 kernel configuration (STEP_KV=128, BMM2_K_GROUPS=1) creates a thread layout mismatch, causing divergent threads at:
fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0

Validated with two sequential clean runs (712 passed, 1633 skipped each).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…efill_case Move the fp8+head_dim=256 skip from individual test functions into the shared run_trtllm_fmha_v2_prefill_case helper, consistent with where other config-based skips live. Also use todo(bobboli) format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ror on some hardware/CUDA version combinations Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
…arity failures on some hardware Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
…accuracy failures on some hardware Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
…ilure on some hardware Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Force-pushed 8bc7b37 to 1596f62 (compare)
/bot run
[FAILED] Pipeline #47694453: 8/20 passed
…tile config Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
/bot run
```python
if torch.cuda.get_device_capability()[0] == 12:
    pytest.skip(
        "todo: CUTLASS MoE MXFP8xMXFP4 has no valid tile config for SM120 "
```
#2020 introduced SM120. It's weird, because PR #2020's CI passed.
[SUCCESS] Pipeline #47718122: 14/20 passed
thx for contrib we may not need this, thus closing for now
Summary
Waives tests that consistently fail due to hardware/driver/library incompatibilities across CI nodes, unrelated to FlashInfer kernel correctness. Each waived test file is documented with the failure message and job link for tracking.
Waived tests
1.
`tests/attention/test_fmha_v2_prefill.py` — narrow module-level skip to per-case skips

Replaces the blanket module-level skip (disabling all 2345 tests) with targeted per-case skips. Root-cause fix is in #2957.
Confirmed with `compute-sanitizer --tool synccheck`:
```
Barrier error detected. Divergent thread(s) in block.
at fmha_v2_flash_attention_e4m3_*_S_qkv_256_*_tma_ws_sm90_kernel+0xedf0
(10848 errors total)
```
Validation: Full suite on H100 — 848 passed, 1497 skipped, 0 failed.
2.
`tests/attention/test_cudnn_prefill.py` — fully skipped

`test_cudnn_prefill_fp8` fails on cu129 (CUDA 12.9) across all Blackwell GPUs (B300, GB200, GB300, B200):
```
cudnn._compiled_module.cudnnGraphNotSupportedError: No valid engine configs for ___________________
```
Passes on cu130 (CUDA 13.0). cuDNN graph compatibility issue with CUDA 12.9 on Blackwell.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/290478328, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291103321, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291103315
3.
`tests/gemm/test_bmm_mxfp8.py` — fully skipped

`test_bmm_mxfp8` fails with the cuDNN backend on GB300 (cu129 + cu130) and B200 (cu130):
```
AssertionError: Cosine similarity 0.8984 is too low (expected > 0.9)
```
cuDNN MXFP8 BMM numerical accuracy falls below threshold on these hardware combinations.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291775835, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291775834, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/291775833
4.
`tests/attention/test_non_contiguous_prefill.py` — fully skipped

`test_batch_ragged_prefill_packed_input` fails on B200 (cu129) with a corrupted JIT-compiled shared library:
```
RuntimeError: Failed to load dynamic shared library
.../batch_prefill_with_kv_cache_...head_dim_qk_256.../....so: file too short
```
The `.so` artifact is truncated during JIT compilation on this hardware, causing a load failure.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022396
5.
`tests/gemm/test_mm_mxfp8.py` — fully skipped

`test_mm_mxfp8` fails with the CUTLASS backend on RTX Pro 6000 Blackwell (cu129 + cu130) and RTX 5090 (cu129 + cu130):
```
FAILED tests/gemm/test_mm_mxfp8.py::test_mm_mxfp8[True-cutlass-out_dtype0-input_dtype0-False-...]
AssertionError: Cosine similarity too low
```
CUTLASS MXFP8 MM accuracy failures across all matrix shapes on these GPUs.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022418, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022417, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022411, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022410
6.
`tests/gemm/test_mm_fp4.py` — fully skipped

`test_mm_fp4` fails with the CuteDSL backend on GB200 (cu130):
```
AssertionError: assert tensor(0.9688, device='cuda:0', dtype=torch.bfloat16) > 0.97
```
CuteDSL FP4 GEMM cosine similarity marginally below threshold on GB200.
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292022401
7.
`tests/moe/test_trtllm_cutlass_fused_moe.py::test_moe_mxfp8_mxfp4` — skipped on SM120

`test_moe_mxfp8_mxfp4` fails on SM120 (RTX Pro 6000 Blackwell, RTX 5090, cu129 + cu130):
```
RuntimeError: Unsupported tile (128, 256, 64) and cluster (1, 1, 1) shape combination for arch 120.
```
TRT-LLM's CUTLASS MoE GEMM kernel (`fp8_e4m3 × fp4_e2m1`) has no valid tile configuration for SM120. Skip is targeted to SM120 only; SM100/SM110 are unaffected.
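The SM120-only guard described in item 7 mirrors the snippet quoted in the review thread (`torch.cuda.get_device_capability()[0] == 12`). A minimal sketch of that check, with an illustrative helper name:

```python
# Sketch of the SM120-targeted guard from item 7. is_sm120() is an
# illustrative name; the test checks get_device_capability() directly.

def is_sm120(compute_capability):
    """SM120 GPUs (RTX Pro 6000 Blackwell, RTX 5090) report major CC 12."""
    return compute_capability[0] == 12

# In the test (per the reviewed diff):
#     if is_sm120(torch.cuda.get_device_capability()):
#         pytest.skip("CUTLASS MoE MXFP8xMXFP4 has no valid tile config for SM120")
```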
Spotted in: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360128, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360127, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360121, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/292360120