[Bugfix] fix bug of pcp+mtp+async scheduler by weiguihua2 · Pull Request #5994 · vllm-project/vllm-ascend

weiguihua2 · 2026-01-19T06:28:57Z

What this PR does / why we need it?

Fixed an issue where the service could not run with PCP and MTP when asynchronous scheduling was enabled.

With PCP, MTP, and asynchronous scheduling all enabled, the service hangs because of a shape mismatch once a request is sent (for example, via curl). This PR resolves the issue.

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@2c24bc6

github-actions · 2026-01-19T06:29:10Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request aims to fix a bug related to the combination of PCP, MTP, and asynchronous scheduling. The changes involve passing additional scheduling information to the PCPManager and correcting the logic for calculating sequence lengths and updating input IDs in asynchronous speculative decoding scenarios. The core logic change in pcp_utils.py appears to correctly handle the state transfer between iterations for async scheduling. However, I've found a critical issue in the test case added to verify this fix, where a duplicated parameter prevents the test from running with the intended configuration.

gemini-code-assist · 2026-01-19T06:30:47Z

            max_num_batched_tokens=1024,
            enable_expert_parallel=True,
            block_size=128,
+            async_scheduling=True,


The async_scheduling parameter is specified twice for the VllmRunner. The value True set on this line will be overridden by async_scheduling=False on line 79. As a result, this test case does not run with asynchronous scheduling enabled, and therefore does not validate the bugfix for the async scenario. To fix this, please remove the redundant async_scheduling=False on line 79.
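
Whether a repeated argument is silently overridden or rejected outright depends on how the arguments reach the constructor. The following is a minimal sketch in plain Python, not the actual test code: `make_runner` is a hypothetical stand-in for `VllmRunner`, and the argument values are illustrative. Either way, the test would not exercise the async path as intended.

```python
def make_runner(**kwargs):
    """Hypothetical stand-in for a runner constructor: returns the kwargs it receives."""
    return kwargs


# Case 1: the keyword is written twice in one direct call.
# Python rejects this when compiling the source, before anything runs.
call_source = "make_runner(async_scheduling=True, async_scheduling=False)"
try:
    compile(call_source, "<example>", "eval")
    repeated_keyword_error = None
except SyntaxError as exc:
    repeated_keyword_error = exc.msg

# Case 2: the arguments are collected in a dict literal first.
# Duplicate keys are allowed here and the last value silently wins.
runner_kwargs = {
    "block_size": 128,
    "async_scheduling": True,
    "async_scheduling": False,
}
runner_config = make_runner(**runner_kwargs)
```

In both cases the remedy is the one the review suggests: keep a single `async_scheduling` entry with the intended value.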

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>

weiguihua2 requested review from MengqingCao and wangxiyuan as code owners January 19, 2026 06:28

github-actions bot added the module:tests label Jan 19, 2026

gemini-code-assist bot reviewed Jan 19, 2026

wangxiyuan approved these changes Jan 19, 2026

wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Jan 19, 2026

[Bugfix] fix bug of pcp+mtp+async scheduler

3e7fb5f

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>

weiguihua2 force-pushed the main branch from 7f15230 to 3e7fb5f Compare January 19, 2026 12:48

weiguihua2 added 2 commits January 19, 2026 21:54

cleancode

e424356

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>

cleancode

4a90ea1

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>

weijinqian0 approved these changes Jan 20, 2026

weijinqian0 merged commit 5892455 into vllm-project:main Jan 20, 2026
28 of 31 checks passed

weiguihua2 mentioned this pull request Jan 20, 2026

[0.13.0][Bugfix] fix bug of pcp+mtp+async scheduler #5995

Merged

weijinqian0 merged 3 commits into vllm-project:main from weiguihua2:main
