
[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty #28181

Merged
youkaichao merged 1 commit into vllm-project:main from courage17340:courage/fix-merge-attn on Nov 6, 2025

[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty#28181
youkaichao merged 1 commit intovllm-project:mainfrom
courage17340:courage/fix-merge-attn

Conversation

@courage17340 (Contributor) commented Nov 6, 2025

Purpose

Test Plan

Test Result



Signed-off-by: courage17340 <courage17340@163.com>
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces a bugfix in the merge_attn_states CUDA kernel to handle an edge case where both prefix and suffix log-sum-exp (LSE) values are negative infinity. This prevents the generation of NaN values. The approach of checking for isinf(max_lse) and handling it separately is correct. I have one suggestion to make the fix more robust by explicitly zeroing out the output tensor in this edge case, rather than relying on an assumption about the contents of prefix_output.

Comment on lines +60 to +66
// Pack 128b load
pack_128b_t p_out_pack = reinterpret_cast<const pack_128b_t*>(
prefix_head_ptr)[pack_offset / pack_size];

// Pack 128b storage
reinterpret_cast<pack_128b_t*>(output_head_ptr)[pack_offset / pack_size] =
p_out_pack;
@gemini-code-assist bot (Contributor) commented (severity: high):

While the comment mentions that prefix_output is expected to be all zeros, relying on this assumption makes the code less robust. If p_lse and s_lse are both -inf, the combined attention output should mathematically be zero. It's safer to explicitly write zeros to the output here rather than copying prefix_output. This ensures correctness even if the assumption about prefix_output does not hold in some unforeseen cases.

      // When max_lse is -inf, the output should be zero.
      const pack_128b_t zero_pack = {0, 0, 0, 0};

      // Explicitly zero out the output for robustness, instead of
      // copying prefix_output.
      reinterpret_cast<pack_128b_t*>(output_head_ptr)[pack_offset / pack_size] =
          zero_pack;

@youkaichao youkaichao changed the title [Bugfix][Kernel] fix merge attn states [Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty Nov 6, 2025
@LucasWilkinson (Collaborator) left a comment


Overall looks good; but I think a simpler solution could be:

max_lse = std::isinf(max_lse) ? 0 : max_lse

@courage17340 (Contributor, Author) replied:

> Overall looks good; but I think a simpler solution could be:
>
> max_lse = std::isinf(max_lse) ? 0 : max_lse

That solution makes p_se = s_se = out_se = 0, and then p_scale = s_scale = 0/0 = NaN, so it would also need extra handling for this corner case.

@youkaichao youkaichao enabled auto-merge (squash) November 6, 2025 05:49
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 6, 2025
@youkaichao (Member) commented:

This is clearly a bugfix. The failing lm-eval small-models check looks strange; we need to investigate further.

@youkaichao youkaichao disabled auto-merge November 6, 2025 09:52
@youkaichao youkaichao merged commit 981cadb into vllm-project:main Nov 6, 2025
91 of 95 checks passed
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
…e empty (vllm-project#28181)

Signed-off-by: courage17340 <courage17340@163.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…e empty (vllm-project#28181)

Signed-off-by: courage17340 <courage17340@163.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed

3 participants