[0.13.0][cherry-pick][bugfix](CP,MLA) fix wrong slot_mapping of decode for mixed p/d batch #6346
Merged
wangxiyuan merged 1 commit into vllm-project:releases/v0.13.0 on Jan 29, 2026
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Code Review
This pull request addresses a bug in how slot_mapping is handled for decode tokens in mixed prefill/decode batches under context parallelism. The change correctly removes a condition that limited a necessary slot_mapping adjustment to decode-only batches. By applying this logic to mixed batches as well, the slot_mapping for decode tokens is now correctly computed. The fix is well-targeted and appears correct. I have no further comments.
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request on Jan 31, 2026.
What this PR does / why we need it?
When adapting PCP for MLAPO, PR #5672 removed the -1 padding for duplicate tokens from the decode slot_mapping and replaced the old handling with a simpler slicing approach. However, on the single-ops path and in mixed prefill/decode (PD) batches, the -1 padding was never removed from the decode slot_mapping, yet the same slicing was still applied, so the resulting slot_mapping was incorrect. This PR fixes that case; the logic will be consolidated further in follow-up refactoring PRs.
ref: #6344
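The failure mode described above can be illustrated with a toy Python model. Everything in it is an assumption for illustration only: the layout (decode tokens first, each real slot followed by pcp_size - 1 entries of -1 for its duplicates, then prefill slots) and the helper names `build_batch_slot_mapping` and `decode_slot_mapping` are hypothetical and are not the vllm-ascend implementation. The sketch shows why slicing a decode slot_mapping that still carries -1 padding yields wrong slots in a mixed batch, and how applying the padding removal to mixed batches too (not only decode-only batches, as the review summary describes) avoids it.

```python
"""Toy model of the decode slot_mapping bug (hypothetical layout, not vllm-ascend code).

Assumed layout: decode tokens come first, and each real decode slot is followed
by (pcp_size - 1) entries of -1 standing in for duplicate tokens. Prefill slots
follow the decode section.
"""

PAD = -1  # padding value used for duplicate decode tokens in this toy model


def build_batch_slot_mapping(decode_slots, prefill_slots, pcp_size):
    """Build the hypothetical padded layout: padded decode section, then prefill."""
    out = []
    for slot in decode_slots:
        out.append(slot)
        out.extend([PAD] * (pcp_size - 1))
    return out + list(prefill_slots)


def decode_slot_mapping(batch_slots, num_decode_tokens, pcp_size, has_prefill,
                        only_when_decode_only):
    """Return one real slot per decode token.

    only_when_decode_only=True mimics the old behaviour, where padding removal
    ran only for decode-only batches; False mimics the fix, which also applies
    it to mixed prefill/decode batches.
    """
    # The padded decode section spans num_decode_tokens * pcp_size entries.
    decode_section = batch_slots[: num_decode_tokens * pcp_size]
    if pcp_size > 1 and not (only_when_decode_only and has_prefill):
        # Drop the -1 entries so that only real slots remain.
        decode_section = [s for s in decode_section if s != PAD]
    # Simpler slicing approach: correct only if the -1 padding is already gone.
    return decode_section[:num_decode_tokens]


if __name__ == "__main__":
    # Mixed batch: 3 decode tokens and 4 prefill tokens, pcp_size = 2.
    batch = build_batch_slot_mapping([10, 11, 12], [40, 41, 42, 43], pcp_size=2)
    print(batch)  # [10, -1, 11, -1, 12, -1, 40, 41, 42, 43]
    old = decode_slot_mapping(batch, 3, 2, has_prefill=True, only_when_decode_only=True)
    new = decode_slot_mapping(batch, 3, 2, has_prefill=True, only_when_decode_only=False)
    print(old)  # [10, -1, 11]  (padding leaks into the decode slots)
    print(new)  # [10, 11, 12]
```

In this model a decode-only batch was already handled correctly under the old gating; only the mixed case leaked -1 entries into the sliced result, which matches the symptom the PR describes.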