[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty #28181
Conversation
Signed-off-by: courage17340 <courage17340@163.com>
Code Review
This pull request introduces a bugfix in the merge_attn_states CUDA kernel to handle an edge case where both prefix and suffix log-sum-exp (LSE) values are negative infinity. This prevents the generation of NaN values. The approach of checking for isinf(max_lse) and handling it separately is correct. I have one suggestion to make the fix more robust by explicitly zeroing out the output tensor in this edge case, rather than relying on an assumption about the contents of prefix_output.
    // Pack 128b load
    pack_128b_t p_out_pack = reinterpret_cast<const pack_128b_t*>(
        prefix_head_ptr)[pack_offset / pack_size];

    // Pack 128b storage
    reinterpret_cast<pack_128b_t*>(output_head_ptr)[pack_offset / pack_size] =
        p_out_pack;
While the comment mentions that prefix_output is expected to be all zeros, relying on this assumption makes the code less robust. If p_lse and s_lse are both -inf, the combined attention output should mathematically be zero. It's safer to explicitly write zeros to the output here rather than copying prefix_output. This ensures correctness even if the assumption about prefix_output does not hold in some unforeseen cases.
// When max_lse is -inf, the output should be zero.
const pack_128b_t zero_pack = {0, 0, 0, 0};
// Explicitly zero out the output for robustness, instead of
// copying prefix_output.
reinterpret_cast<pack_128b_t*>(output_head_ptr)[pack_offset / pack_size] =
zero_pack;
LucasWilkinson left a comment
Overall looks good; but I think a simpler solution could be:
max_lse = std::isinf(max_lse) ? 0 : max_lse
That solution makes p_se = s_se = out_se = 0, and then p_scale = s_scale = 0/0 = NaN, so it still needs extra handling for this corner case.
this is clearly a bugfix. the failing lm eval on small models looks strange; we need to investigate further.
…e empty (vllm-project#28181) Signed-off-by: courage17340 <courage17340@163.com>
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.