[Misc] Enable async scheduling by default with spec decoding #31998
njhill merged 3 commits into vllm-project:main
Conversation
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Code Review
This pull request enables asynchronous scheduling by default when speculative decoding uses the EAGLE/MTP methods. The logic in `VllmConfig.__post_init__` is updated so that async scheduling is no longer disabled by default for this configuration; it is now disabled only for non-EAGLE/MTP speculative decoding methods or when `disable_padded_drafter_batch` is enabled. Additionally, an error message related to `disable_padded_drafter_batch` has been improved for clarity, removing a typo and repetition. The changes are logical and well-aligned with the goal of improving performance by enabling async scheduling in more scenarios. I have not found any issues of high or critical severity.
mgoin
left a comment
Matches my expectation but I'll let @benchislett @LucasWilkinson @MatthewBonanni sign off before merge
yewentao256
left a comment
Thanks for the work! Could you also add a lm_eval to show the acc is correct?
@yewentao256 actually this test already checks for a precise output match.
Signed-off-by: Nick Hill <nickhill123@gmail.com>
MatthewBonanni
left a comment
LGTM! We should wait for @benchislett to weigh in though
LGTM
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)
- Modify import paths due to the refactors vllm-project/vllm#31916 vllm-project/vllm#32054
- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional arguments but 3 were given` due to vllm-project/vllm#24498
- Skip the async-scheduling tests in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which were never verified: vllm-project/vllm#31998
- Skip some pooling tests broken by vllm-project/vllm#32148, where vllm also fails: https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4 We will reopen those tests when main2main reaches vllm-project/vllm#32243
- Skip some cases in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are broken by vllm-project/vllm#32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
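The `TypeError` fixed above follows a common Python pattern: a base class gains a positional parameter, so a subclass that keeps the old `__init__` signature breaks once callers pass the new argument. A generic sketch (hypothetical class names, not the actual vLLM `OffloadingSpec` API):

```python
class BaseSpec:
    # The base class signature grew by one positional parameter.
    def __init__(self, config, worker):
        self.config = config
        self.worker = worker

class BrokenSpec(BaseSpec):
    # Old one-argument subclass signature: callers that now pass the
    # extra argument hit "takes 2 positional arguments but 3 were given".
    def __init__(self, config):
        super().__init__(config, None)

class FixedSpec(BaseSpec):
    # Fix: accept the new parameter and forward it to the base class.
    def __init__(self, config, worker):
        super().__init__(config, worker)

try:
    BrokenSpec("cfg", "worker0")
except TypeError as e:
    print(e)  # ...takes 2 positional arguments but 3 were given

spec = FixedSpec("cfg", "worker0")
```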
Now that all of the gaps in async scheduling + spec decoding support have been addressed, we can enable async scheduling by default in this case too.
It will still be disabled implicitly for non-EAGLE/MTP spec decoding methods or when padded drafter batches are disabled.
This should only be merged once #30495 is merged.
Note

Cursor Bugbot summary for commit 8f9270e (updates automatically on new commits):

Enables async scheduling by default when using compatible speculative decoding, with clearer gating and messaging.
- Async scheduling stays enabled when `speculative_config.method` is in `EagleModelTypes`; otherwise it is disabled with `warning_once` and a scoped message.
- It is also disabled for `pipeline_parallel_size > 1`, `disable_padded_drafter_batch=True`, or unsupported executor backends, with improved error/warning text.
- Replaces `logger.warning` calls with `logger.warning_once(..., scope="local")` to reduce log spam.