
[CI][Bugfix] Fix CI Failure Step "Basic Models Tests (Extra Initialization) 1 & 2"#42154

Closed
haosdent wants to merge 1 commit into vllm-project:main from haosdent:ci-0722062a

Conversation

@haosdent
Contributor

@haosdent haosdent commented May 9, 2026

Purpose

Fixes Buildkite #65314 — Basic Models Tests (Extra Initialization).

The @torch.compile(fullgraph=True) decorators added by #40711 on prepare_gdn_attention_core_inputs and rearrange_mixed_qkv crash CUDA-graph capture for Qwen3.5 MTP / MoeMTP: Inductor's first-call Triton autotune calls torch.cuda.synchronize(), which is illegal inside stream capture. The non-spec path is autotuned during eager warmup, but mixed_qkv_spec is still None at that point and only becomes a tensor during capture, so the spec-path autotune fires inside torch.cuda.graph(...): the capture fails with cudaErrorStreamCaptureInvalidated and the engine core dies.

Drop the two decorators. The cat-then-slice bodies were designed for compile fusion and pessimize eager mode, so simplify to plain split/contiguous/view.
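The simplified body can be sketched roughly as follows. This is an illustrative sketch, not the exact vLLM code: the helper signature, split sizes, and output layout are assumptions.

```python
import torch

def rearrange_mixed_qkv(mixed_qkv, key_dim, value_dim, num_k_heads, num_v_heads):
    """Hypothetical sketch of the plain split/contiguous/view body,
    with no @torch.compile decorator."""
    if mixed_qkv is None:
        return None, None, None
    # Split the fused projection output into q/k/v along the last dim.
    query, key, value = torch.split(
        mixed_qkv, [key_dim, key_dim, value_dim], dim=-1
    )
    # split() returns strided views, so force contiguity before viewing
    # into the per-head layout expected by the attention kernel.
    query = query.contiguous().view(1, -1, num_k_heads, key_dim // num_k_heads)
    key = key.contiguous().view(1, -1, num_k_heads, key_dim // num_k_heads)
    value = value.contiguous().view(1, -1, num_v_heads, value_dim // num_v_heads)
    return query, key, value
```

Since nothing here depends on compile-time fusion, the function runs the same way under eager warmup and under CUDA-graph capture, which is the point of the fix.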

Test Plan

pytest tests/models/test_initialization.py::test_can_initialize_large_subset \
    -k 'Qwen3_5MTP or Qwen3_5MoeMTP' -v

Test Result

| Result | Time |
| --- | --- |
| Before patch: FAILED ... cudaErrorStreamCaptureInvalidated | 114.8 s |
| After patch: 1 passed | 48.2 s |

(GB10 / SM12.1, dense Qwen3_5MTP only. Qwen3_5MoeMTP shares the exact same code path through qwen3_next.py:503: forward → gdn_attention_core → rearrange_mixed_qkv.)

The @torch.compile(fullgraph=True) decorators added in vllm-project#40711 on
prepare_gdn_attention_core_inputs and rearrange_mixed_qkv crash CUDA-graph
capture for Qwen3.5 MTP / MoeMTP: Inductor's first-call Triton autotune
runs torch.cuda.synchronize(), which is illegal during stream capture.
The non-spec path is autotuned during eager warmup; the spec path's
mixed_qkv_spec is None during warmup and only becomes a tensor during
capture, so autotune fires inside torch.cuda.graph(...) and the engine
core dies with cudaErrorStreamCaptureInvalidated.

Removing the decorators fixes the crash. The cat-then-slice bodies were
pessimizations without compile fusion, so simplify them to plain
split/contiguous/view (and drop the fused round-trip in
prepare_gdn_attention_core_inputs).

Signed-off-by: haosdent <haosdent@hotmail.com>

Signed-off-by: haosdent <haosdent@gmail.com>
@mergify bot added the bug (Something isn't working) label May 9, 2026
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request simplifies the prepare_gdn_attention_core_inputs and rearrange_mixed_qkv methods by removing @torch.compile decorators and replacing complex concatenation-based contiguity logic with more straightforward reshape and contiguous calls. Feedback suggests explicitly adding .contiguous() to the reshaped outputs in prepare_gdn_attention_core_inputs to ensure memory layout compatibility with downstream kernels that strictly require contiguous memory.

Comment thread vllm/model_executor/layers/mamba/gdn_linear_attn.py
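The reviewer's contiguity concern can be demonstrated in isolation. This is a generic PyTorch illustration of the hazard, not the PR's actual code:

```python
import torch

# torch.split along the last dim returns strided views of the parent
# tensor, so the raw outputs are not contiguous in memory.
q, k, v = torch.split(torch.randn(4, 10), [3, 3, 4], dim=-1)
assert not q.is_contiguous()

# Kernels that assume a row-major layout need an explicit copy.
q_fixed = q.contiguous()
assert q_fixed.is_contiguous()
```

This is why the bot suggests calling .contiguous() on the reshaped outputs rather than relying on the layout the views happen to have.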
@haosdent
Contributor Author

haosdent commented May 9, 2026

@tpopp @ChuanLi1101 could you help review this? It tries to fix the CI failure "Basic Models Tests (Extra Initialization) x" caused by #40711.

@haosdent haosdent changed the title [WIP][Bugfix] Drop @torch.compile from GDN qkv reshape helpers [CI][Bugfix] Fix CI Failure Step "Basic Models Tests (Extra Initialization) 1 & 2" May 9, 2026
@haosdent haosdent marked this pull request as ready for review May 9, 2026 09:50

@claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mgoin
Member

mgoin commented May 9, 2026

@vadiklyutiy @tjtanaa can you help with this fix for ci failure? Completely removing compile doesn't seem great

@ZJY0516
Member

ZJY0516 commented May 9, 2026

> @vadiklyutiy @tjtanaa can you help with this fix for ci failure? Completely removing compile doesn't seem great

fixed in #42070

> Completely removing compile doesn't seem great

agreed, but it's acceptable to do so to unblock CI; otherwise we would have to revert the PR that caused this

@tjtanaa
Collaborator

tjtanaa commented May 10, 2026

Let me take a look today.

@DarkLight1337 DarkLight1337 requested review from Isotr0py and ywang96 May 10, 2026 02:44
@haosdent
Contributor Author

Thanks all, didn't notice that PR before, let me close mine

@haosdent haosdent closed this May 10, 2026
@tjtanaa
Collaborator

tjtanaa commented May 10, 2026

> Thanks all, didn't notice that PR before, let me close mine

yup, this test group is running fine on main now after the other PR bugfix. https://buildkite.com/vllm/ci/builds/65423/canvas?sid=019e107a-23ca-47ea-bc60-22d1590d15f2
