[Core] Move pause and resume functions into engine #34125
vllm-bot merged 25 commits into vllm-project:main
Conversation
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Documentation preview: https://vllm--34125.org.readthedocs.build/en/34125/
/gemini review
Code Review
This pull request refactors the pause/resume functionality to support data-parallel deployments by centralizing the logic within EngineCore. Pause and resume commands are now broadcast to all engine instances, ensuring a consistent state. A new DeferredUtilityResult mechanism is introduced to handle asynchronous operations that require multiple steps to complete, making the pause operation more robust. The changes are well-tested with new unit tests and a new example for data-parallel pause/resume. My review identifies a couple of areas for improvement: one related to a potentially unreliable synchronization mechanism using queue.qsize(), and another concerning a possibly outdated NotImplementedError that may be overly restrictive with the new architecture.
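The `queue.qsize()` concern flagged above can be illustrated in isolation: the Python standard library documents `qsize()` as an approximate count that must not be relied on for synchronization, since its value can be stale by the time it is read. A minimal sketch of the reliable alternative, using an explicit sentinel instead of polling the size (this is illustrative code, not the vllm implementation):

```python
import queue

def drain_until_sentinel(q: queue.Queue) -> list:
    """Consume items until an explicit sentinel is seen.

    This is race-free, unlike looping `while q.qsize() > 0`, where a
    producer may still be mid-put (or a consumer mid-get) when the
    size is sampled.
    """
    items = []
    while True:
        item = q.get()
        if item is None:  # sentinel: producer signals completion
            break
        items.append(item)
    return items

q = queue.Queue()
for i in range(3):
    q.put(i)
q.put(None)  # producer is done
print(drain_until_sentinel(q))  # [0, 1, 2]
```

The same idea applies with a `threading.Event` or a completion callback when no natural sentinel value exists.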
Code Review
This pull request introduces a significant refactoring to enable pause and resume functionality in data parallel deployments. The core logic for pausing and resuming is now centralized in EngineCore, and these actions are broadcast to all engine instances, ensuring a consistent state. A new DeferredUtilityResult mechanism is introduced to handle asynchronous operations that need to wait for specific conditions, which is a clean and effective pattern. The changes are well-implemented and supported by new and updated tests. My main concern is a potential IndexError in the new example test script, which could cause it to crash under certain conditions.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Aaron Hao <ahao@anyscale.com>
kouroshHakha
left a comment
Did the first pass. Overall looks good. Needs some touch ups in the implementation of the deferred utility to bring down its complexity
njhill
left a comment
Thanks @hao-aaron, overall I like the approach.
Have a couple of initial simplification comments from a first pass.
Signed-off-by: hao-aaron <ahao@anyscale.com>
/gemini review
Code Review
This pull request is a significant and well-executed refactoring that moves the pause and resume logic into the EngineCore. This is a crucial step for enabling this functionality in data-parallel deployments. The introduction of the UtilityFuture mechanism for handling deferred operations is a solid design choice for managing asynchronous completion of pause modes. The code is cleaner, and the logic is now centralized, which greatly improves maintainability. I've found one critical issue related to aborting requests that are submitted while the engine is paused, which I've detailed in a specific comment. Otherwise, the changes look excellent.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Aaron Hao <ahao@anyscale.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Hi @hao-aaron, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
Force-pushed from 8bcc74a to 0f68852 (compare)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Looks like this got changed during vllm-project#34125 iterations, but the docs got out of sync. Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>
### Summary
- Fix `abort_generation()` and `sleep()` abort logic that broke silently after the vllm 0.16.0 bump (#1240)
- Add backward-compatible `_get_unfinished_request_ids()` helper to resolve internal vs external request ID mismatch
- Fixes #1243

### Root Cause
In vllm 0.16.0, [`InputProcessor.assign_request_id()`](https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/input_processor.py) now creates **internal** request IDs (with a random suffix) that are distinct from the user-provided **external** request IDs:

```python
request.external_req_id = request.request_id  # save original as external
request.request_id = f"{request.external_req_id}-{random_uuid():.8}"  # new internal ID
```

Our code was reading request IDs from `output_processor.request_states.keys()` (which are now **internal** IDs) and passing them to `engine.abort()` with `internal=False` (the default). The abort looked them up in the `external_req_ids` mapping, found nothing, and **silently did nothing**. Requests completed normally with `finish_reason="length"` instead of `"abort"`. This broke fully async RL's pause/resume flow, which relies on abort returning partial outputs with `finish_reason="abort"` so the retry loop can re-submit with accumulated tokens.

Related vllm changes:
- vllm-project/vllm#32103
- vllm-project/vllm#32351
- vllm-project/vllm#34125
- vllm-project/vllm#34528

### Fix
Add a `_get_unfinished_request_ids()` static method on `BaseVLLMInferenceEngine` that:
- Uses `output_processor.external_req_ids.keys()` when available (vllm 0.16.0+)
- Falls back to `output_processor.request_states.keys()` for older vllm versions

Applied to all three abort call sites:
1. `AsyncVLLMInferenceEngine.abort_generation()` — used by fully async pause/resume
2. `AsyncVLLMInferenceEngine.sleep()` — cleanup before sleep
3. `VLLMInferenceEngine.sleep()` — sync engine cleanup before sleep

### Test plan
- [x] `test_abort_generation_vllm_engine` — passes (was failing with `assert 'length' == 'abort'`)
- [x] `test_continue_generation_vllm_engine_chat_completion` — passes
- [x] `test_continue_generation_generate_vllm_engine_generation` — passes
- [x] E2E fully async gsm8k (`gsm8k_fully_async_ci` project) — ran ~12 training steps successfully with pause/resume working correctly

Light blue is the run after this fix (our nightly gsm8k fully async CI): https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_fully_async_ci

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>
Purpose
Relevant RFC: #32103
Follow-up to #32351; enables pause and resume when the API server count is > 1. This is necessary for data-parallel deployments.
To keep a consistent pause state across multiple `AsyncLLM` instances, we push all pause logic down to the `EngineCore` level. Pause and resume requests received by one `AsyncLLM` are broadcast to all `EngineCore`s, so all engine instances share the same paused state, and any `AsyncLLM` can be queried for the current paused state. New requests submitted while paused behave the same way, i.e. they are frozen until resume.

To ensure that pause only returns when the operation is completely finished (`mode=wait` finishes processing, `mode=abort` finishes processing all aborts), I introduce the `FutureUtility` mechanism, which allows a custom callback function to be specified that runs every step and decides when to return the utility output and unblock the utility caller. An example is given for `mode=wait`: a return value of `DeferredUtility` lets the busy loop know to stash the callback function and call it every step, checking for the finish condition.
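The deferred-utility pattern described above can be sketched abstractly: the busy loop stashes a per-call check, polls it once per engine step, and only delivers the result (unblocking the caller) once the check reports completion. All names below are illustrative, not the actual vllm classes:

```python
from typing import Any, Callable, Optional

class DeferredUtilityResult:
    """Returned by a utility op that cannot complete within one step.

    `check` runs once per step; a non-None return value resolves the call.
    """
    def __init__(self, check: Callable[[], Optional[Any]]):
        self.check = check

def busy_loop_steps(pending: list, max_steps: int = 10) -> list:
    """Each step, poll every stashed check; resolved results are collected
    and their checks dropped, mirroring how a core loop would unblock callers."""
    results = []
    for _ in range(max_steps):
        still_pending = []
        for deferred in pending:
            out = deferred.check()
            if out is None:
                still_pending.append(deferred)  # not finished; retry next step
            else:
                results.append(out)             # finished; unblock the caller
        pending = still_pending
        if not pending:
            break
    return results

# Example: a "pause" that completes once an abort counter drains to zero.
remaining = [2]
def pause_done():
    remaining[0] -= 1
    return "paused" if remaining[0] <= 0 else None

print(busy_loop_steps([DeferredUtilityResult(pause_done)]))  # ['paused']
```

The key property is that the utility handler returns immediately (with a deferred marker) so the step loop is never blocked, while the caller's future resolves only when the finish condition holds.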
Support for DPEP is not included in this PR and will come in a separate one.
Test Plan
Unit tests
Test Result
passing
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.