[AMD] Fix Qwen3-Coder-Next: Add missing k_scale/v_scale args to extend_attention_fwd in aiter_backend #19736

Merged
HaiShaw merged 1 commit into main from amd/fix-aiter-mla-replay-cuda-graph-v2 on Mar 4, 2026

Conversation

@michaelzhang-ai (Collaborator) commented Mar 3, 2026

Summary

Motivation

#18882 added k_scale and v_scale as required positional parameters to extend_attention_fwd in triton_ops/extend_attention.py and updated triton_backend.py, but missed the call site in aiter_backend.py for non-MLA target_verify/draft_extend paths (used by hybrid models like Qwen3-Coder-Next with MTP).

Fixes

Nightly (AMD ROCm) failure: https://github.com/sgl-project/sglang/actions/runs/22648816293/job/65643197339#step:5:6363

TypeError: extend_attention_fwd() missing 1 required positional argument: 'v_scale'
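A minimal sketch of how a message of this shape can arise, assuming a simplified stand-in function. The name `extend_attention_fwd_stub`, its parameter list, and the placeholder values are hypothetical and are not the real sglang signature. The stale call passes one positional value too few, so Python fills `k_scale` and reports `v_scale` as missing.

```python
def extend_attention_fwd_stub(q, k, v, k_scale, v_scale, sm_scale=1.0):
    """Hypothetical stand-in: k_scale and v_scale are required positionals."""
    return {"k_scale": k_scale, "v_scale": v_scale, "sm_scale": sm_scale}


def stale_call_error():
    """Call the stub like a call site that was never updated; return the error text."""
    try:
        # Written before the scale parameters existed: the fourth positional
        # value was meant as sm_scale but now lands in the k_scale slot.
        extend_attention_fwd_stub("q", "k", "v", 0.125)
    except TypeError as exc:
        return str(exc)
    return None


def fixed_call():
    """Updated call site: pass both scales explicitly, by keyword."""
    return extend_attention_fwd_stub(
        "q", "k", "v", k_scale=1.0, v_scale=1.0, sm_scale=0.125
    )


if __name__ == "__main__":
    print(stale_call_error())
    print(fixed_call())
```

Passing the scales by keyword in the sketch is a design choice, not something this PR prescribes: it keeps a call site from silently shifting arguments if further positional parameters are added later.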

Test plan


@yichiche (Collaborator) left a comment

This PR fixes a crash in the CUDA graph replay path for non-MLA backends by properly initializing the forward metadata. Previously, custom_mask and mask_indptr were not consistently set outside the MLA path, which could lead to invalid memory access during replay. LGTM.

@michaelzhang-ai marked this pull request as ready for review on March 3, 2026, 03:00

@HaiShaw (Collaborator) commented Mar 3, 2026

/tag-and-rerun-ci

…d call

#18882 added k_scale and v_scale as required positional parameters
to extend_attention_fwd and updated triton_backend.py, but missed
updating the call site in aiter_backend.py for non-MLA
target_verify/draft_extend paths.

This caused Qwen3-Coder-Next MTP to crash with:
  TypeError: extend_attention_fwd() missing 1 required positional argument: 'v_scale'

Fixes: https://github.com/sgl-project/sglang/actions/runs/22636393480/job/65600256830

@michaelzhang-ai force-pushed the amd/fix-aiter-mla-replay-cuda-graph-v2 branch from 6aedf5d to 9cf021b on March 4, 2026, 00:22
@michaelzhang-ai changed the title from "[AMD] Fix Qwen3-Coder-Next: AiterAttnBackend crash for non-MLA backends" to "[AMD] Fix Qwen3-Coder-Next: Add missing k_scale/v_scale args to extend_attention_fwd in aiter_backend" on Mar 4, 2026

@kkHuang-amd (Collaborator) left a comment

LGTM

@HaiShaw merged commit c6850ac into main on Mar 4, 2026
175 of 198 checks passed
@HaiShaw deleted the amd/fix-aiter-mla-replay-cuda-graph-v2 branch on March 4, 2026, 06:01
Kangyan-Zhou pushed a commit to Kangyan-Zhou/sglang that referenced this pull request Mar 4, 2026
qeternity pushed a commit to qeternity/sglang that referenced this pull request Mar 6, 2026
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026