Fix illegal memory access in FA2 varlen SplitKV early-exit LSE write #139

Open
wangyxbh wants to merge 1 commit into vllm-project:main from wangyxbh:fix-fa2-varlen-splitkv-early-exit-lse

wangyxbh commented May 18, 2026

Summary

Fix the LSE write offset used by the FA2 SplitKV early-exit path when writing directly to unpadded softmax_lse.

Upstream varlen forward stores LSE in the packed unpadded layout by setting params.unpadded_lse = true. The normal epilogue path already handles this layout, but the SplitKV early-exit path still uses the padded (batch, head, seqlen_q) offset.

Problem

When running a Qwen 235B model on a single node with 8x L40S GPUs after enabling DCP, execution can hang and eventually report:

torch.AcceleratorError: CUDA error: an illegal memory access was encountered

The issue is in the FA2 SplitKV early-exit path. It always computes row_offset_lseaccum with the padded SplitKV-style layout:

((n_split_idx * b + bidb) * h + bidh) * seqlen_q + m_block * kBlockM

That layout is correct for SplitKV/LSE accumulation buffers, but not for direct unpadded LSE output in varlen mode, where the padded offset can point past the end of the smaller packed buffer.
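To illustrate why the padded formula is unsafe for a packed buffer, here is a minimal host-side sketch of the arithmetic. It is not the kernel code: the struct `LseDims` and both function names are hypothetical, and only the offset expression itself is taken from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the kernel parameters that matter here.
struct LseDims {
    int64_t b;         // batch size (number of sequences)
    int64_t h;         // number of query heads
    int64_t seqlen_q;  // padded (maximum) query length
    int64_t total_q;   // sum of the actual query lengths in varlen mode
};

// Padded SplitKV-style offset, as quoted in the Problem section.
int64_t padded_lse_offset(const LseDims& d, int64_t n_split_idx, int64_t bidb,
                          int64_t bidh, int64_t m_block, int64_t kBlockM) {
    assert(bidb >= 0 && bidb < d.b);
    assert(bidh >= 0 && bidh < d.h);
    return ((n_split_idx * d.b + bidb) * d.h + bidh) * d.seqlen_q + m_block * kBlockM;
}

// Element count of a varlen softmax_lse buffer shaped {num_heads, total_q}.
int64_t unpadded_lse_numel(const LseDims& d) {
    return d.h * d.total_q;
}
```

For example, with b = 4, h = 8, a padded query length of 128 and actual query lengths 128, 1, 1, 1 (so total_q = 131), the packed buffer holds 8 * 131 = 1048 elements, while the padded offset for bidb = 3, bidh = 7 is 31 * 128 = 3968, well past the end of the buffer.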

For varlen forward, softmax_lse is allocated as {num_heads, total_q} and params.unpadded_lse is set to true. In this case, the early-exit path must use the same packed varlen LSE layout as the normal epilogue path.
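The packed layout can be sketched on the host as follows. This assumes the usual varlen convention in which sequence starts come from an exclusive prefix sum of the query lengths (cu_seqlens); the helper names `make_cu_seqlens` and `packed_lse_index` are hypothetical and do not appear in the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum of per-sequence query lengths,
// e.g. {3, 1, 2} -> {0, 3, 4, 6}. The last entry equals total_q.
std::vector<int64_t> make_cu_seqlens(const std::vector<int64_t>& seqlens) {
    std::vector<int64_t> cu(seqlens.size() + 1, 0);
    for (std::size_t i = 0; i < seqlens.size(); ++i) {
        cu[i + 1] = cu[i] + seqlens[i];
    }
    return cu;
}

// Index of (head, sequence, row) in a row-major {num_heads, total_q} buffer.
int64_t packed_lse_index(const std::vector<int64_t>& cu, int64_t head,
                         int64_t seq, int64_t row) {
    assert(seq >= 0 && seq + 1 < static_cast<int64_t>(cu.size()));
    assert(row >= 0 && row < cu[seq + 1] - cu[seq]);
    const int64_t total_q = cu.back();
    return head * total_q + cu[seq] + row;
}
```

With query lengths {3, 1, 2} and two heads, the last valid entry (head 1, sequence 2, row 1) maps to index 1 * 6 + 4 + 1 = 11, the final element of the 12-element buffer.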

Fix

Use the packed varlen LSE offset when writing directly to unpadded softmax_lse:

bidh * params.total_q + binfo.q_offset(params.seqlen_q, 1, bidb) + m_block * kBlockM

Keep the existing padded offset for SplitKV accumulation buffers and padded LSE output.
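The selection between the two offsets might look roughly like the host-side sketch below. It is not the diff from this PR: the struct `EarlyExitLseParams`, the function name, and the `split` flag used to model "writing to an accumulation buffer" are assumptions, and `q_start` stands in for the value of binfo.q_offset(params.seqlen_q, 1, bidb), assumed here to be the start of sequence bidb in the packed token dimension.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down view of the parameters the offset depends on.
struct EarlyExitLseParams {
    int64_t b;          // batch size
    int64_t h;          // number of query heads
    int64_t seqlen_q;   // padded (maximum) query length
    int64_t total_q;    // sum of the actual query lengths
    bool unpadded_lse;  // true when softmax_lse is shaped {num_heads, total_q}
};

// Choose the LSE row offset for the early-exit path.
// split == true models a write into the SplitKV accumulation buffer.
int64_t early_exit_lse_offset(const EarlyExitLseParams& p, bool split,
                              int64_t n_split_idx, int64_t bidb, int64_t bidh,
                              int64_t q_start, int64_t m_block, int64_t kBlockM) {
    assert(bidb >= 0 && bidb < p.b);
    assert(bidh >= 0 && bidh < p.h);
    if (!split && p.unpadded_lse) {
        // Direct write to unpadded softmax_lse: packed varlen layout.
        return bidh * p.total_q + q_start + m_block * kBlockM;
    }
    // Accumulation buffer or padded LSE output: existing padded layout.
    return ((n_split_idx * p.b + bidb) * p.h + bidh) * p.seqlen_q + m_block * kBlockM;
}
```

With b = 4, h = 8, seqlen_q = 128, total_q = 131 and q_start = 130 for the last sequence, the packed branch yields 7 * 131 + 130 = 1047 for bidh = 7, which is the last element of the 1048-element buffer, whereas the padded branch yields 3968 for the same block.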

Testing

Not run locally.

wangyxbh changed the title from "Fix FA2 varlen SplitKV early-exit LSE offset" to "Fix illegal memory access in FA2 varlen SplitKV early-exit LSE write" on May 18, 2026
Signed-off-by: wangyxbh <wangyxbh@digitalchina.com>
wangyxbh force-pushed the fix-fa2-varlen-splitkv-early-exit-lse branch from 74cd0aa to 17b8ccc on May 18, 2026 10:24