[Bugfix] adapt_remote_request_id #6051
Code Review
This pull request effectively addresses the request ID mismatch in PD-separated deployments by introducing remote_request_id. The changes correctly propagate the original request ID from the producer to the consumer, ensuring proper coordination and logging across nodes. The implementation is logical and well-contained within the mooncake_connector.
I've found one issue related to a potential memory leak due to using an incorrect request ID for cleanup, which I've detailed in a specific comment. Once that is addressed, this PR should be good to go.
Signed-off-by: ghphotoframe <854746559@qq.com>
What this PR does / why we need it?
This PR addresses a request ID mismatch issue in the PD (Prefill-Decoding) separation deployment scenario for vllm-ascend.

Upstream vLLM recently mitigated request ID collisions by appending a random suffix to each request_id (e.g., req-123 → req-123-abc); see PR-27987 and PR-29665. While this works in single-node deployments, it breaks compatibility in PD-separated setups: the Producer (Prefill node) and the Consumer (Decoding node) end up with different request_id values, which prevents the Consumer from retrieving the KV cache generated by the Producer.
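A minimal sketch of why the suffixing breaks cross-node lookup (the store and variable names here are illustrative, not actual vLLM or mooncake_connector APIs):

```python
import uuid

# The Producer (Prefill node) keys its KV cache by the original request_id.
original_request_id = "req-123"
kv_cache_store = {original_request_id: "<kv blocks>"}

# Upstream vLLM's deduplication appends a random suffix on the Consumer side,
# so the Consumer's local ID no longer matches the Producer's key.
consumer_request_id = f"{original_request_id}-{uuid.uuid4().hex[:3]}"

print(consumer_request_id in kv_cache_store)  # False: the lookup misses
print(original_request_id in kv_cache_store)  # True: only the original ID matches
```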
To resolve this, this PR introduces a new field, remote_request_id, in the metadata passed via mooncake_connector. The Producer preserves and forwards the original (unmodified) request_id as remote_request_id, and the Consumer uses this remote_request_id, rather than its locally generated suffixed ID, to fetch the correct KV cache from the Prefill node.
This ensures consistent request identification across PD nodes while remaining compatible with upstream vLLM's request ID deduplication mechanism.
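The propagation idea can be sketched as follows; the class and field names are illustrative stand-ins, not the actual mooncake_connector types:

```python
from dataclasses import dataclass

# Hypothetical shape of the connector metadata: the Producer records the
# original, unsuffixed request_id as remote_request_id, and the Consumer
# keys its KV cache fetch by that field instead of its own suffixed ID.

@dataclass
class KVTransferMetadata:
    local_request_id: str   # Consumer-side ID (may carry a random suffix)
    remote_request_id: str  # Producer-side original ID, used for lookup

def fetch_kv_cache(meta: KVTransferMetadata, producer_store: dict) -> str:
    # Look up by the Producer's original ID so the key matches across nodes.
    return producer_store[meta.remote_request_id]

producer_store = {"req-123": "<kv blocks>"}
meta = KVTransferMetadata(local_request_id="req-123-abc",
                          remote_request_id="req-123")
print(fetch_kv_cache(meta, producer_store))  # -> <kv blocks>
```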
Does this PR introduce any user-facing change?
No. This change is internal to the PD separation logic in vllm-ascend and does not affect APIs, CLI, or observable behavior for end users.
How was this patch tested?
- Verified end-to-end inference in a PD-separated Ascend cluster: requests complete successfully with correct KV cache retrieval.
- Confirmed that remote_request_id is correctly propagated through mooncake_connector metadata.