
[bugfix] adapt_remote_request_id#6051

Merged
jianzs merged 1 commit into vllm-project:main from ghphotoframe:adapt_remote_request_id
Jan 22, 2026

Conversation

@ghphotoframe
Contributor

@ghphotoframe ghphotoframe commented Jan 20, 2026

What does this PR do / why do we need it?

This PR addresses a request ID mismatch issue in the PD (Prefill-Decoding) separation deployment scenario for vllm-ascend.
Upstream vLLM recently mitigated request ID collisions by appending a random suffix to each request_id (e.g., req-123 → req-123-abc); see PR-27987 and PR-29665. While this works in single-node deployments, it breaks compatibility in PD-separated setups: the Producer (Prefill node) and the Consumer (Decoding node) end up with different request_id values, preventing the Consumer from correctly retrieving the KV cache generated by the Producer.
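The mismatch can be sketched as follows. This is a hypothetical illustration of the suffixing behavior described above; `dedup_request_id` is an illustrative name, not the actual vLLM function:

```python
import secrets

def dedup_request_id(request_id: str) -> str:
    # Mimic the upstream mitigation: append a short random suffix
    # so that duplicated request IDs no longer collide.
    return f"{request_id}-{secrets.token_hex(2)}"

original = "req-123"
prefill_id = dedup_request_id(original)  # e.g. "req-123-9f3a"
decode_id = dedup_request_id(original)   # e.g. "req-123-01bc"

# Each node suffixes independently, so the two IDs (almost certainly)
# differ, and a KV-cache lookup keyed by the decode node's local ID
# misses the entry the prefill node stored under its own ID.
print(prefill_id, decode_id)
```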
To resolve this, this PR introduces a new field remote_request_id in the metadata passed via mooncake_connector. The Producer preserves and forwards the original (unmodified) request_id as remote_request_id. The Consumer then uses this remote_request_id—instead of its locally generated suffixed ID—to fetch the correct KV cache from the Prefill node.
This ensures consistent request identification across PD nodes while maintaining compatibility with upstream vLLM’s request ID deduplication mechanism.
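The fix can be sketched roughly as below. All class and method names other than the `remote_request_id` field are illustrative, not the actual mooncake_connector API; it is a minimal sketch of the described data flow, assuming an in-memory KV store:

```python
from dataclasses import dataclass

@dataclass
class KVTransferMetadata:
    # The Producer's original (unmodified) request ID, forwarded so the
    # Consumer can key its KV-cache fetch by the Producer's ID.
    remote_request_id: str

class Producer:
    def __init__(self) -> None:
        self.kv_store: dict[str, bytes] = {}

    def prefill(self, request_id: str, kv_cache: bytes) -> KVTransferMetadata:
        # Store the KV cache under the local request ID and forward
        # that same ID to the Consumer in the connector metadata.
        self.kv_store[request_id] = kv_cache
        return KVTransferMetadata(remote_request_id=request_id)

class Consumer:
    def decode(self, local_request_id: str, meta: KVTransferMetadata,
               producer: Producer) -> bytes:
        # Fetch by the Producer's ID, not the locally suffixed one.
        return producer.kv_store[meta.remote_request_id]

producer = Producer()
meta = producer.prefill("req-123-abc1", b"kv-bytes")
# The Consumer's locally generated ID differs, but the lookup still hits.
kv = Consumer().decode("req-123-ffff", meta, producer)
```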
<img width="1279" height="781" alt="image" src="https://github.com/user-attachments/assets/274238c1-dab6-4d3a-9ee4-6e578679b762" />

Does this PR introduce any user-facing change?

No. This change is internal to the PD separation logic in vllm-ascend and does not affect APIs, CLI, or observable behavior for end users.

How was this patch tested?

Verified end-to-end inference in a PD-separated Ascend cluster: requests successfully complete with correct KV cache retrieval.
Confirmed that remote_request_id is correctly propagated through mooncake_connector metadata.

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.

Contributor

@gemini-code-assist Bot left a comment


Code Review

This pull request effectively addresses the request ID mismatch in PD-separated deployments by introducing remote_request_id. The changes correctly propagate the original request ID from the producer to the consumer, ensuring proper coordination and logging across nodes. The implementation is logical and well-contained within the mooncake_connector.

I've found one issue related to a potential memory leak due to using an incorrect request ID for cleanup, which I've detailed in a specific comment. Once that is addressed, this PR should be good to go.
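To make the flagged pitfall concrete, here is a hypothetical sketch; the dict and variable names are illustrative, not the connector's actual bookkeeping. Cleanup keyed by the remote ID never removes an entry stored under the local ID, so the entry leaks:

```python
# In-flight transfer state, keyed by the Consumer's local request ID.
inflight: dict[str, str] = {}

local_id, remote_id = "req-123-ffff", "req-123-abc1"
inflight[local_id] = "transfer-state"

# Buggy cleanup: wrong key, so nothing is deleted and the entry leaks.
inflight.pop(remote_id, None)
assert local_id in inflight  # leaked entry still present

# Correct cleanup uses the same key the entry was stored under.
inflight.pop(local_id, None)
assert local_id not in inflight
```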

@ghphotoframe ghphotoframe changed the title adapt_remote_request_id [bugfix] adapt_remote_request_id Jan 20, 2026
Signed-off-by: ghphotoframe <854746559@qq.com>
@ghphotoframe force-pushed the adapt_remote_request_id branch from 812df8f to 8e634fd on January 20, 2026 at 11:37
@yiz-liu added the ready (ready for review) and ready-for-test (start test by label for PR) labels Jan 21, 2026
@jianzs jianzs merged commit cef04b3 into vllm-project:main Jan 22, 2026
45 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 22, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (51 commits)
  [Bugfix] Remove `use_aclgraph` in mtp_proposer and use `use_cuda_graph` (vllm-project#6032)
  [BugFix] fix 3vl dense model load quant weight (vllm-project#6100)
  [CP&SP] Integrate FIA operator in mla_cp._forward_decode (vllm-project#5641)
  [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (vllm-project#6145)
  [CI]Install clang in dokerfile for triton ascend (vllm-project#4409)
  [Main] Upgrade PTA to 2.9.0 (vllm-project#6112)
  [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (vllm-project#5721)
  [P/D][PCP]bugfix pcp force free twice caused logger error (vllm-project#6124)
  [BugFix]converting pa get_workspace back to capturing (vllm-project#5833)
  [CI] optimize lint term (vllm-project#5986)
  [Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6042)
  [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (vllm-project#6015)
  [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#6097)
  [Misc] Bump mooncake version to v0.3.8.post1 (vllm-project#6110)
  [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (vllm-project#5758)
  [bugfix] adapt_remote_request_id (vllm-project#6051)
  [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (vllm-project#5143)
  [Feature] Support DSA-CP for Hybrid scenario (vllm-project#5702)
  [CI] Upgrade CANN to 8.5.0 (vllm-project#6070)
  Default enable MLAPO (vllm-project#5952)
  ...
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@d682094

Signed-off-by: ghphotoframe <854746559@qq.com>
Co-authored-by: jiangweixiang <jwx02384838@antgroup.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

Labels

ready (ready for review) · ready-for-test (start test by label for PR)


4 participants