
[P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0#6898

Merged
wangxiyuan merged 3 commits into vllm-project:main from wangxiaoteng888:recompute_scheduler_16
Mar 2, 2026

Conversation

Contributor

@wangxiaoteng888 wangxiaoteng888 commented Mar 2, 2026

What this PR does / why we need it?

Adapt the recompute feature to vLLM 0.16.0, where the D node forwards recompute requests to the P node.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By ci

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Contributor

github-actions bot commented Mar 2, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the RecomputeScheduler to improve its efficiency and extend its capabilities. Key changes include simplifying the request lifecycle by removing outdated KV cache transfer mechanisms, introducing explicit support for Mamba models with block-aligned cache splitting, and enhancing the tracking of prefix cache statistics. The update also modernizes type hints and strengthens the handling of request states, ensuring more reliable operation during model execution and asynchronous KV transfers.

Highlights

  • Simplified Request Management: Removed specialized __init__ and add_request logic related to PLACEHOLDER_TOKEN_ID and is_mtp_kv_consumer, streamlining how requests are initialized and added to the scheduler.
  • Mamba Block-Aligned Cache Support: Integrated new logic (_mamba_block_aligned_split) within the scheduling process to support block-aligned chunking for Mamba models, including a new reason for unschedulable requests.
  • Enhanced Prefix Cache Statistics: Introduced new variables (connector_prefix_cache_queries, connector_prefix_cache_hits) and recording logic to track prefix cache performance more comprehensively.
  • Improved Request State Handling: Refined the update_from_output method to robustly handle finished or aborted requests, especially in asynchronous KV transfer scenarios, and standardized the use of request_id for consistency.
  • Type Hint Modernization: Updated the return type of schedule and the scheduler_output parameter type in update_from_output to use the more general SchedulerOutput.
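The Mamba block-aligned cache splitting mentioned above can be illustrated with a small sketch. The helper name, signature, and block-size parameter here are illustrative stand-ins, not the actual `_mamba_block_aligned_split` implementation in vllm_ascend:

```python
def block_aligned_split(num_scheduled_tokens: int, block_size: int) -> int:
    """Round a chunk of tokens down to a multiple of block_size.

    Mamba-style state-space caches are persisted per block, so a partial
    prefill chunk must end on a block boundary. A chunk that aligns down
    to zero tokens cannot be scheduled this step, which corresponds to
    the new "unschedulable" reason added in this PR.
    """
    return (num_scheduled_tokens // block_size) * block_size


# A 300-token chunk with 128-token blocks is trimmed to 256 tokens;
# a 100-token chunk aligns down to 0, i.e. unschedulable this step.
```

This is only a sketch of the alignment arithmetic; the real scheduler also has to re-queue the trimmed remainder of the request.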


Changelog
  • vllm_ascend/core/recompute_scheduler.py
    • Removed PLACEHOLDER_TOKEN_ID import and related __init__ and add_request methods.
    • Updated schedule method return type to SchedulerOutput.
    • Integrated _mamba_block_aligned_split for Mamba block-aligned cache handling.
    • Added a new unschedulable reason for Mamba models.
    • Standardized request_id usage for dictionary operations.
    • Modified spec_token_ids trimming logic.
    • Added logic to skip requests waiting for streaming.
    • Initialized and recorded connector_prefix_cache_stats.
    • Simplified num_new_tokens calculation.
    • Removed redundant spec_token_ids processing for waiting requests.
    • Used any_request_id for common prefix block retrieval.
    • Updated update_from_output signature and added is_finished() check.
    • Ensured generated_token_ids exist before processing scheduled_spec_token_ids.
    • Refactored routed_experts and finish_reason handling.
    • Added num_external_computed_tokens to EngineCoreOutput.
Activity
  • No specific activity was provided in the pull request description.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the RecomputeScheduler to align with recent changes in the upstream vLLM repository. The changes include removing outdated MTP KV consumer logic, adding support for Mamba block-aligned splits, and various refactorings for streaming support and speculative decoding.

I've found a critical issue where type hints for schedule and update_from_output were changed to the base SchedulerOutput type. This will cause a runtime AttributeError in update_from_output as it accesses attributes specific to the RecomputeSchedulerOutput subclass. I've added comments to revert these type hints to fix the bug.

Per the repository style guide, here are suggestions for the pull request title and summary:

Suggested PR Title:

[Core][Update] Align RecomputeScheduler with upstream vLLM changes

Suggested PR Summary:

### What this PR does / why we need it?

This PR updates `RecomputeScheduler` to align with recent changes in vLLM (likely for v0.16.0 compatibility). The main changes are:

- Removed outdated MTP KV consumer logic and placeholder token handling for speculative decoding.
- Added support for Mamba block-aligned splits.
- Refactored request ID handling for improved readability.
- Updated logic to support streaming requests.
- Adjusted handling of stopped requests and speculative decoding statistics.

These changes are necessary to keep the forked scheduler compatible with the latest vLLM core logic.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI passed with new added/existing test.

 def update_from_output(
     self,
-    scheduler_output: RecomputeSchedulerOutput,
+    scheduler_output: SchedulerOutput,
Contributor
critical

The type hint for scheduler_output has been changed to SchedulerOutput. However, the method body at line 673 accesses scheduler_output.recomputed_reqs, which is an attribute specific to the RecomputeSchedulerOutput subclass. This will cause an AttributeError at runtime because SchedulerOutput does not have this attribute. To fix this bug, the type hint should be reverted to RecomputeSchedulerOutput.

Suggested change
- scheduler_output: SchedulerOutput,
+ scheduler_output: RecomputeSchedulerOutput,

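The failure mode the reviewer describes can be reproduced in a few lines. The classes below are illustrative stand-ins for vLLM's real `SchedulerOutput` and the `RecomputeSchedulerOutput` subclass in `vllm_ascend/core/recompute_scheduler.py`; the point is that accessing a subclass-only attribute through a base-typed parameter only works if every caller actually passes the subclass:

```python
from dataclasses import dataclass, field


@dataclass
class SchedulerOutput:
    # Stand-in for the upstream base class.
    scheduled_new_reqs: list = field(default_factory=list)


@dataclass
class RecomputeSchedulerOutput(SchedulerOutput):
    # Subclass-only field, analogous to recomputed_reqs in the PR.
    recomputed_reqs: list = field(default_factory=list)


def update_from_output(scheduler_output: SchedulerOutput) -> list:
    # The hint says base class, but the body needs the subclass field:
    # fine for RecomputeSchedulerOutput, AttributeError for the base.
    return scheduler_output.recomputed_reqs


update_from_output(RecomputeSchedulerOutput(recomputed_reqs=["req-1"]))  # ok
try:
    update_from_output(SchedulerOutput())
except AttributeError as e:
    print(e)  # prints the AttributeError message for the missing attribute
```

This is why the review asks to keep the hint as `RecomputeSchedulerOutput`: the narrower hint documents the real requirement and lets a type checker catch base-class callers statically instead of failing at runtime.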
Comment thread vllm_ascend/core/recompute_scheduler.py Outdated
 request.record_event(EngineCoreEventType.QUEUED)

-def schedule(self) -> RecomputeSchedulerOutput:
+def schedule(self) -> SchedulerOutput:
Contributor

high

The return type hint for schedule has been changed to SchedulerOutput. While this matches the base class, the method returns a RecomputeSchedulerOutput instance, and the specific fields of this subclass are used in update_from_output. Changing the type hint in update_from_output to SchedulerOutput introduces a bug. To maintain consistency and correctness, it's best to revert this change and use the more specific RecomputeSchedulerOutput type.

Suggested change
- def schedule(self) -> SchedulerOutput:
+ def schedule(self) -> RecomputeSchedulerOutput:
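Reverting the return hint is safe because return types are covariant: an override may declare a more specific return type than the base method, and type checkers accept it. A minimal sketch with illustrative class names (not the real vLLM definitions):

```python
class SchedulerOutput:
    """Stand-in for the upstream base output type."""


class RecomputeSchedulerOutput(SchedulerOutput):
    def __init__(self) -> None:
        self.recomputed_reqs: list[str] = []


class Scheduler:
    def schedule(self) -> SchedulerOutput:
        return SchedulerOutput()


class RecomputeScheduler(Scheduler):
    # Covariant return: narrowing to the subclass is accepted by mypy
    # and consistent with the Liskov substitution principle, and lets
    # callers reach recomputed_reqs without a cast or isinstance check.
    def schedule(self) -> RecomputeSchedulerOutput:
        return RecomputeSchedulerOutput()


out = RecomputeScheduler().schedule()
out.recomputed_reqs.append("req-1")
```

So keeping `-> RecomputeSchedulerOutput` loses nothing in compatibility with the base `Scheduler` interface while preserving static access to the subclass fields.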

@wangxiaoteng888 wangxiaoteng888 changed the title update_recompute_for_16 [recompute][0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 Mar 2, 2026
@wangxiaoteng888 wangxiaoteng888 changed the title [recompute][0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 [P/D][0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 Mar 2, 2026
@wangxiaoteng888 wangxiaoteng888 changed the title [P/D][0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 [P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 Mar 2, 2026
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Comment thread vllm_ascend/core/recompute_scheduler.py Outdated
 request.record_event(EngineCoreEventType.QUEUED)

-def schedule(self) -> RecomputeSchedulerOutput:
+def schedule(self) -> SchedulerOutput:
Collaborator
Should return RecomputeSchedulerOutput

Contributor Author
OK, I will fix.

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
@weiguihua2 weiguihua2 added the ready (read for review) and ready-for-test (start test by label for PR) labels Mar 2, 2026
@wangxiyuan wangxiyuan merged commit dfa9ff7 into vllm-project:main Mar 2, 2026
59 of 60 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Mar 5, 2026
…to qwen3next_graph

* 'main' of https://github.com/vllm-project/vllm-ascend: (40 commits)
  [Feature] Add docs of batch invariance and make some extra operators patch (vllm-project#6910)
  [bugfix]Qwen2.5VL accurate question (vllm-project#6975)
  [CI] Add DeepSeek-V3.2 large EP nightly ci (vllm-project#6378)
  [Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (vllm-project#6939)
  [bugfix]fix file not found error in nightly of single-node (vllm-project#6976)
  [Bugfix] Fix the acceptance rates dorp issue when applying eagle3 to QuaRot model (vllm-project#6914)
  [CI] Enable auto upgrade e2e estimated time for auto-partition suites (vllm-project#6840)
  [Doc][Misc] Fix msprobe_guide.md documentation issues (vllm-project#6965)
  [Nightly][Refactor]Migrate nightly single-node model tests from `.py` to `.yaml` (vllm-project#6503)
  [BugFix] Improve GDN layer detection for multimodal models (vllm-project#6941)
  [feat]ds3.2 pcp support mtp and chunkprefill (vllm-project#6917)
  [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (vllm-project#6945)
  [Triton] Centralize Ascend extension op dispatch in triton_utils (vllm-project#6937)
  [csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (vllm-project#6936)
  [300I][Bugfix] fix unquant model weight nd2nz error (vllm-project#6851)
  [doc] fix supported_models (vllm-project#6930)
  [CI] nightly test timeout (vllm-project#6912)
  [CI] Upgrade CANN to 8.5.1 (vllm-project#6897)
  [Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (vllm-project#6828)
  [P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (vllm-project#6898)
  ...
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…t#6898)

### What this PR does / why we need it?
Adapt the recompute feature to vLLM 0.16.0, where the D node forwards
recompute requests to the P node.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.16.0
- vLLM main:
vllm-project/vllm@15d76f7

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
