[BugFix] ChunkedLocalAttention is currently not CG compatible #26034

LucasWilkinson · 2025-10-01T16:17:03Z

Potential fix for #25960.

Signed-off-by: Lucas Wilkinson <[email protected]>

mgoin · 2025-10-01T16:24:13Z

Verified that this works; the mode is downgraded to PIECEWISE right before capture:

vllm serve RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 --load-format dummy --max-model-len 32K
...
(EngineCore_DP0 pid=2401224) WARNING 10-01 12:23:25 [gpu_model_runner.py:3772] CUDAGraphMode.FULL_AND_PIECEWISE is not supported with ChunkedLocalAttentionBuilder backend (support: AttentionCGSupport.NEVER); setting cudagraph_mode=PIECEWISE because attention is compiled piecewise
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████████████████████████████████████████████████████████████████████████| 67/67 [00:07<00:00,  8.85it/s]
(EngineCore_DP0 pid=2401224) INFO 10-01 12:23:33 [gpu_model_runner.py:3583] Graph capturing finished in 8 secs, took 0.52 GiB
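
For readers following along, the downgrade reported in the warning above can be sketched roughly as follows. This is only an illustration and not vLLM's actual code: the two enums are hypothetical stand-ins that reuse the names printed in the log (the `ALWAYS` member is an assumption added for contrast), and `resolve_cudagraph_mode` is a made-up helper name.

```python
from enum import Enum


class AttentionCGSupport(Enum):
    # Hypothetical stand-in; only NEVER appears in the log above.
    NEVER = 0
    ALWAYS = 1  # assumed member, added here only for contrast


class CUDAGraphMode(Enum):
    # Hypothetical stand-in for the two modes named in the log above.
    PIECEWISE = "PIECEWISE"
    FULL_AND_PIECEWISE = "FULL_AND_PIECEWISE"


def resolve_cudagraph_mode(requested, backend_support):
    """Pick the capture mode, downgrading when the backend cannot be fully captured."""
    wants_full = requested is CUDAGraphMode.FULL_AND_PIECEWISE
    if wants_full and backend_support is AttentionCGSupport.NEVER:
        # The attention backend reports no CUDA graph support, so fall back
        # to piecewise capture, where attention runs outside the graphs.
        return CUDAGraphMode.PIECEWISE
    return requested
```

Under these assumptions, a backend that reports `NEVER` turns a `FULL_AND_PIECEWISE` request into `PIECEWISE`, which matches the mode shown in the capture progress line.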

mgoin

Thanks for the better approach

cjackal · 2025-10-01T16:38:08Z

Is there a chance this PR gets cherry-picked for 0.11.0? If so, note that there is another Llama 4 performance bugfix at #25889 that could be cherry-picked along with it.

mgoin · 2025-10-01T23:27:51Z

The hybrid test failures also occur on main, so merging.

ChunkedLocalAttention is currently not CG compatible

88f2622

Signed-off-by: Lucas Wilkinson <[email protected]>

LucasWilkinson mentioned this pull request Oct 1, 2025

[Bugfix] Don't use FULL_AND_PIECEWISE cudagraph for Llama4 #26033

Closed

5 tasks

mgoin marked this pull request as ready for review October 1, 2025 16:25

mgoin approved these changes Oct 1, 2025

View reviewed changes

mgoin added the bug, ready, and llama labels Oct 1, 2025

mgoin added this to Llama Issues & Bugs Oct 1, 2025

github-project-automation bot moved this to To Triage in Llama Issues & Bugs Oct 1, 2025

mgoin removed this from Llama Issues & Bugs Oct 1, 2025

tlrmchlsmth approved these changes Oct 1, 2025

View reviewed changes

tlrmchlsmth added this to the v0.11.0 Cherry Picks milestone Oct 1, 2025

vllm-bot merged commit 4134312 into vllm-project:main Oct 1, 2025
48 of 51 checks passed

cjackal mentioned this pull request Oct 2, 2025

[Bug]: llama 4 family is incompatible with CUDA graph FULL_AND_PIECEWISE mode #25960

Closed

1 task

simon-mo pushed a commit that referenced this pull request Oct 2, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (#26034)

c536881

Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: simon-mo <[email protected]>

pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (vllm-p…

4095627

…roject#26034) Signed-off-by: Lucas Wilkinson <[email protected]>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (#26034)

ac1598d

Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: yewentao256 <[email protected]>

tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (vllm-p…

4fff719

…roject#26034) Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: Tomer Asida <[email protected]>

southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (vllm-p…

8b1d3ff

…roject#26034) Signed-off-by: Lucas Wilkinson <[email protected]>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (vllm-p…

0c1a26b

…roject#26034) Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (vllm-p…

1f4cf51

…roject#26034) Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: simon-mo <[email protected]>

shyeh25 pushed a commit to shyeh25/vllm that referenced this pull request Oct 14, 2025

ChunkedLocalAttention is currently not CG compatible (vllm-project#26034

c5687a8

) Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: simon-mo <[email protected]>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (vllm-p…

9a1603d

…roject#26034) Signed-off-by: Lucas Wilkinson <[email protected]>

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (vllm-p…

4554632

…roject#26034) Signed-off-by: Lucas Wilkinson <[email protected]>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (vllm-p…

79b2cd0

…roject#26034) Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

[BugFix] ChunkedLocalAttention is currently not CG compatible (vllm-p…

f7c48e4

…roject#26034) Signed-off-by: Lucas Wilkinson <[email protected]>

5 participants
