[0.13.0][cherry-pick][bugfix](CP,MLA) fix wrong slot_mapping of decode for mixed p/d batch#6346

Merged
wangxiyuan merged 1 commit into vllm-project:releases/v0.13.0 from pisceskkk:pcp/mla/bugfix-013 on Jan 29, 2026
Conversation

@pisceskkk
Contributor

What this PR does / why we need it?

PR #5672 removed the -1 padding for duplicate tokens in the decode slot_mapping when adapting PCP for MLAPO, replacing it with a simpler slicing approach. However, in the single-operator path with mixed prefill/decode (P/D) batches, the decode slot_mapping still contained the -1 entries yet used the same slicing method, producing an incorrect slot_mapping. This PR fixes that issue; the logic will be consolidated further in follow-up refactoring PRs.
ref: #6344
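To make the failure mode concrete, here is a minimal sketch of the two strategies the description contrasts. The function names, list shapes, and slot values are hypothetical illustrations, not the actual vllm-ascend implementation: the point is only that slicing a prefix of a slot_mapping that still contains -1 padding picks up the padding, while dropping the -1 entries first does not.

```python
def decode_slots_by_slicing(slot_mapping, num_decode_tokens):
    # Slicing approach: assumes the first num_decode_tokens entries are
    # all valid decode slots, with no -1 padding interleaved.
    return slot_mapping[:num_decode_tokens]


def decode_slots_by_filtering(slot_mapping, num_decode_tokens):
    # Mixed-batch reality in the single-op path: duplicate tokens are
    # padded with -1, so the padding must be dropped before taking the
    # decode prefix.
    valid = [s for s in slot_mapping if s >= 0]
    return valid[:num_decode_tokens]


# Hypothetical mixed P/D batch: 3 decode tokens whose slots are
# interleaved with -1 padding for duplicated tokens, followed by
# prefill slots.
slot_mapping = [10, -1, 11, -1, 12, 100, 101, 102]
num_decode_tokens = 3

print(decode_slots_by_slicing(slot_mapping, num_decode_tokens))    # [10, -1, 11]  (wrong: padding leaks in)
print(decode_slots_by_filtering(slot_mapping, num_decode_tokens))  # [10, 11, 12]  (correct decode slots)
```

Under these assumptions, the bug is exactly the first case: a batch whose slot_mapping was built with -1 padding being handed to the slicing path that assumes the padding was already eliminated.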

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>

@gemini-code-assist bot left a comment

Code Review

This pull request addresses a bug in how slot_mapping is handled for decode tokens in mixed prefill/decode batches under context parallelism. The change correctly removes a condition that limited a necessary slot_mapping adjustment to decode-only batches. By applying this logic to mixed batches as well, the slot_mapping for decode tokens is now correctly computed. The fix is well-targeted and appears correct. I have no further comments.

@weiguihua2 added labels: ready (read for review), ready-for-test (start test by label for PR) — Jan 29, 2026
@wangxiyuan merged commit 6ba7a5a into vllm-project:releases/v0.13.0 on Jan 29, 2026
19 checks passed
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…e for mixed p/d batch (vllm-project#6346)
@pisceskkk pisceskkk deleted the pcp/mla/bugfix-013 branch February 3, 2026 02:47
3 participants