[P/D][v0.16.0] Adapt to RecomputeScheduler in vLLM 0.16.0 #6898
wangxiyuan merged 3 commits into vllm-project:main
Conversation
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the RecomputeScheduler.

Highlights
Activity
Code Review
This pull request updates the RecomputeScheduler to align with recent changes in the upstream vLLM repository. The changes include removing outdated MTP KV consumer logic, adding support for Mamba block-aligned splits, and various refactorings for streaming support and speculative decoding.
I've found a critical issue where type hints for schedule and update_from_output were changed to the base SchedulerOutput type. This will cause a runtime AttributeError in update_from_output as it accesses attributes specific to the RecomputeSchedulerOutput subclass. I've added comments to revert these type hints to fix the bug.
Per the repository style guide, here are suggestions for the pull request title and summary:
Suggested PR Title:
[Core][Update] Align RecomputeScheduler with upstream vLLM changes
Suggested PR Summary:
### What this PR does / why we need it?
This PR updates `RecomputeScheduler` to align with recent changes in vLLM (likely for v0.16.0 compatibility). The main changes are:
- Removed outdated MTP KV consumer logic and placeholder token handling for speculative decoding.
- Added support for Mamba block-aligned splits.
- Refactored request ID handling for improved readability.
- Updated logic to support streaming requests.
- Adjusted handling of stopped requests and speculative decoding statistics.
These changes are necessary to keep the forked scheduler compatible with the latest vLLM core logic.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with newly added and existing tests.

```diff
 def update_from_output(
     self,
-    scheduler_output: RecomputeSchedulerOutput,
+    scheduler_output: SchedulerOutput,
```
The type hint for scheduler_output has been changed to SchedulerOutput. However, the method body at line 673 accesses scheduler_output.recomputed_reqs, which is an attribute specific to the RecomputeSchedulerOutput subclass. This will cause an AttributeError at runtime because SchedulerOutput does not have this attribute. To fix this bug, the type hint should be reverted to RecomputeSchedulerOutput.
Suggested change:

```diff
-    scheduler_output: SchedulerOutput,
+    scheduler_output: RecomputeSchedulerOutput,
```
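The reviewer's concern can be illustrated with a minimal, self-contained sketch. The class and field names below are borrowed from the PR, but the bodies are hypothetical placeholders, not vLLM's actual definitions:

```python
from dataclasses import dataclass, field


@dataclass
class SchedulerOutput:
    # Placeholder for the upstream base class.
    scheduled_req_ids: list[str] = field(default_factory=list)


@dataclass
class RecomputeSchedulerOutput(SchedulerOutput):
    # This attribute exists only on the subclass.
    recomputed_reqs: list[str] = field(default_factory=list)


def update_from_output(scheduler_output: RecomputeSchedulerOutput) -> list[str]:
    # Safe only while the hint (and every caller) guarantees the subclass.
    return scheduler_output.recomputed_reqs


out = RecomputeSchedulerOutput(scheduled_req_ids=["r1"], recomputed_reqs=["r2"])
print(update_from_output(out))  # ['r2']

# With the hint widened to SchedulerOutput, nothing stops a caller from
# passing a plain base instance, and the attribute access raises at runtime:
try:
    update_from_output(SchedulerOutput())  # type: ignore[arg-type]
except AttributeError as exc:
    print(exc)
```

Type hints are not enforced at runtime, so the widened annotation itself does not fail; it merely stops type checkers from flagging callers that pass a plain `SchedulerOutput`, which is what then triggers the `AttributeError`.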
```diff
-def schedule(self) -> RecomputeSchedulerOutput:
+def schedule(self) -> SchedulerOutput:
```
The return type hint for schedule has been changed to SchedulerOutput. While this matches the base class, the method returns a RecomputeSchedulerOutput instance, and the specific fields of this subclass are used in update_from_output. Changing the type hint in update_from_output to SchedulerOutput introduces a bug. To maintain consistency and correctness, it's best to revert this change and use the more specific RecomputeSchedulerOutput type.
Suggested change:

```diff
-def schedule(self) -> SchedulerOutput:
+def schedule(self) -> RecomputeSchedulerOutput:
```
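Keeping the narrower return type on the override is safe because return types are covariant: a subclass method may declare a more specific return type than the base method. A hedged sketch with simplified placeholder classes (not the real vLLM definitions):

```python
class SchedulerOutput:
    """Placeholder for the upstream base output type."""


class RecomputeSchedulerOutput(SchedulerOutput):
    def __init__(self, recomputed_reqs: list[str]) -> None:
        self.recomputed_reqs = recomputed_reqs


class Scheduler:
    def schedule(self) -> SchedulerOutput:
        return SchedulerOutput()


class RecomputeScheduler(Scheduler):
    # Narrowing the return type in an override satisfies Liskov substitution
    # and is accepted by type checkers, so there is no need to widen it.
    def schedule(self) -> RecomputeSchedulerOutput:
        return RecomputeSchedulerOutput(recomputed_reqs=[])


out = RecomputeScheduler().schedule()
print(type(out).__name__)  # RecomputeSchedulerOutput
```

With the narrow annotation, callers such as `update_from_output` can use the subclass-only fields without casts or `isinstance` checks.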
```diff
-def schedule(self) -> RecomputeSchedulerOutput:
+def schedule(self) -> SchedulerOutput:
```
Should return `RecomputeSchedulerOutput`.
OK, I will fix.
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
### What this PR does / why we need it?
Adapt the recompute feature to vLLM 0.16.0, where the D node forwards recompute requests to the P node.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By CI

- vLLM version: v0.16.0
- vLLM main: vllm-project/vllm@15d76f7

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>