[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty #28181
Conversation
Signed-off-by: courage17340 <courage17340@163.com>
Code Review
This pull request introduces a bugfix in the merge_attn_states CUDA kernel to handle an edge case where both prefix and suffix log-sum-exp (LSE) values are negative infinity. This prevents the generation of NaN values. The approach of checking for isinf(max_lse) and handling it separately is correct. I have one suggestion to make the fix more robust by explicitly zeroing out the output tensor in this edge case, rather than relying on an assumption about the contents of prefix_output.
    // Pack 128b load
    pack_128b_t p_out_pack = reinterpret_cast<const pack_128b_t*>(
        prefix_head_ptr)[pack_offset / pack_size];

    // Pack 128b storage
    reinterpret_cast<pack_128b_t*>(output_head_ptr)[pack_offset / pack_size] =
        p_out_pack;
While the comment mentions that prefix_output is expected to be all zeros, relying on this assumption makes the code less robust. If p_lse and s_lse are both -inf, the combined attention output should mathematically be zero. It's safer to explicitly write zeros to the output here rather than copying prefix_output. This ensures correctness even if the assumption about prefix_output does not hold in some unforeseen cases.
// When max_lse is -inf, the output should be zero.
const pack_128b_t zero_pack = {0, 0, 0, 0};
// Explicitly zero out the output for robustness, instead of
// copying prefix_output.
reinterpret_cast<pack_128b_t*>(output_head_ptr)[pack_offset / pack_size] =
zero_pack;
LucasWilkinson left a comment
Overall looks good; but I think a simpler solution could be:
max_lse = std::isinf(max_lse) ? 0 : max_lse
That solution makes p_se = s_se = out_se = 0, and then p_scale = s_scale = 0/0 = NaN, so it still needs extra handling for this corner case.
this is clearly a bugfix. the failing lm eval on small models looks strange; we need to investigate further.
…e empty (vllm-project#28181) Signed-off-by: courage17340 <courage17340@163.com>
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.