[Core] Move pause and resume functions into engine#34125

Merged
vllm-bot merged 25 commits into vllm-project:main from hao-aaron:dp-pause-resume
Feb 13, 2026
Conversation

@hao-aaron hao-aaron commented Feb 9, 2026

Purpose

Relevant RFC: #32103

Follow up to #32351; enables pause and resume when the API server count is > 1. This is necessary for data parallel deployments.

To keep a consistent pause state across multiple AsyncLLM instances, we push all pause logic down to the EngineCore level. Pause and resume requests received by one AsyncLLM are broadcast to all EngineCores, so all engine instances share the same paused state, and any AsyncLLM can query the current paused state. New requests submitted while paused behave the same way: they are held until the engine is resumed.
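As a rough illustration of the broadcast idea (all names here are hypothetical stand-ins, not vLLM's actual classes), a pause received by one front-end can be fanned out to every engine core and only acknowledged once all of them confirm:

```python
import asyncio

class EngineCoreClientSketch:
    """Toy stand-in for a per-engine client; names are invented."""

    def __init__(self, engine_id: int) -> None:
        self.engine_id = engine_id
        self.paused = False

    async def pause(self) -> bool:
        # In the real system this would be an RPC into the EngineCore
        # process; here we just flip local state.
        self.paused = True
        return self.paused

async def pause_all(clients: list[EngineCoreClientSketch]) -> bool:
    # Broadcast pause to every engine so they all share the same state;
    # report success only once every engine has acknowledged.
    results = await asyncio.gather(*(c.pause() for c in clients))
    return all(results)

clients = [EngineCoreClientSketch(i) for i in range(4)]
assert asyncio.run(pause_all(clients))
assert all(c.paused for c in clients)
```

Because every engine core ends up in the same state, any front-end instance can answer "is the deployment paused?" without coordinating with its peers.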

To ensure that pause only returns when the operation is completely finished (mode=wait finishes processing in-flight requests, mode=abort finishes processing all aborts), I introduce the UtilityFuture mechanism, which lets a utility specify a custom callback that runs every step and decides when to return the utility output and unblock the caller. An example is given for mode=wait:

        elif mode == "wait":
            # wait: PAUSE_WAIT so adds are queued but step() still runs to drain.
            self._scheduler_pause_state = PauseState.PAUSE_WAIT

            def _step_wait() -> None:
                if self.scheduler.has_unfinished_requests():
                    return
                self._scheduler_pause_state = PauseState.PAUSE_KEEP
                if clear_cache:
                    self.reset_prefix_cache()
                    self.reset_mm_cache()
                    self.reset_encoder_cache()
                future.set_result(None)

            return UtilityFuture(future, _step_wait)

Returning a UtilityFuture lets the busy loop know to stash the callback function and call it every step, checking for the finish condition.
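The deferred-utility pattern can be sketched roughly like this (class and method names are invented for illustration; vLLM's actual busy loop and UtilityFuture differ in detail):

```python
from concurrent.futures import Future
from typing import Callable

class UtilityFutureSketch:
    """A future plus a per-step callback that decides when to resolve it."""

    def __init__(self, future: Future, step_fn: Callable[[], None]) -> None:
        self.future = future
        self.step_fn = step_fn

class BusyLoopSketch:
    def __init__(self) -> None:
        self.pending: list[UtilityFutureSketch] = []
        self.unfinished = 3  # pretend 3 requests are still draining

    def submit_wait_pause(self) -> Future:
        future: Future = Future()

        def _step_wait() -> None:
            # Resolve only once all in-flight requests have drained.
            if self.unfinished == 0:
                future.set_result(None)

        self.pending.append(UtilityFutureSketch(future, _step_wait))
        return future

    def step(self) -> None:
        self.unfinished = max(0, self.unfinished - 1)  # drain one request
        # The loop stashes each callback and re-runs it every step,
        # dropping it once its future has resolved.
        for uf in list(self.pending):
            uf.step_fn()
            if uf.future.done():
                self.pending.remove(uf)

loop = BusyLoopSketch()
fut = loop.submit_wait_pause()
while not fut.done():
    loop.step()
```

The caller blocks on the future, so pause returns only after the finish condition the callback checks for has actually been reached.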

Support for DPEP is not included in this PR and will come in a separate one.

Test Plan

Unit tests

Test Result

passing


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@mergify

mergify bot commented Feb 9, 2026

Documentation preview: https://vllm--34125.org.readthedocs.build/en/34125/

@mergify mergify bot added documentation Improvements or additions to documentation v1 labels Feb 9, 2026
@hao-aaron

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the pause/resume functionality to support data-parallel deployments by centralizing the logic within EngineCore. Pause and resume commands are now broadcast to all engine instances, ensuring a consistent state. A new DeferredUtilityResult mechanism is introduced to handle asynchronous operations that require multiple steps to complete, making the pause operation more robust. The changes are well-tested with new unit tests and a new example for data-parallel pause/resume. My review identifies a couple of areas for improvement: one related to a potentially unreliable synchronization mechanism using queue.qsize(), and another concerning a possibly outdated NotImplementedError that may be overly restrictive with the new architecture.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant refactoring to enable pause and resume functionality in data parallel deployments. The core logic for pausing and resuming is now centralized in EngineCore, and these actions are broadcast to all engine instances, ensuring a consistent state. A new DeferredUtilityResult mechanism is introduced to handle asynchronous operations that need to wait for specific conditions, which is a clean and effective pattern. The changes are well-implemented and supported by new and updated tests. My main concern is a potential IndexError in the new example test script, which could cause it to crash under certain conditions.

hao-aaron and others added 2 commits February 8, 2026 23:26
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>

@kouroshHakha kouroshHakha left a comment


Did the first pass. Overall looks good. Needs some touch ups in the implementation of the deferred utility to bring down its complexity

Signed-off-by: ahao-anyscale <ahao@anyscale.com>

@njhill njhill left a comment


Thanks @hao-aaron, overall I like the approach.

Have a couple of initial simplification comments from a first pass.

@njhill njhill changed the title Dp pause resume [Core] Move pause and resume functions into engine Feb 10, 2026
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
@hao-aaron
Copy link
Contributor Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a significant and well-executed refactoring that moves the pause and resume logic into the EngineCore. This is a crucial step for enabling this functionality in data-parallel deployments. The introduction of the UtilityFuture mechanism for handling deferred operations is a solid design choice for managing asynchronous completion of pause modes. The code is cleaner, and the logic is now centralized, which greatly improves maintainability. I've found one critical issue related to aborting requests that are submitted while the engine is paused, which I've detailed in a specific comment. Otherwise, the changes look excellent.

Signed-off-by: hao-aaron <ahao@anyscale.com>
@mergify

mergify bot commented Feb 12, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @hao-aaron.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 12, 2026
Signed-off-by: Aaron Hao <ahao@anyscale.com>
@mergify mergify bot removed the needs-rebase label Feb 12, 2026

@njhill njhill left a comment


Thanks @hao-aaron!

@mergify

mergify bot commented Feb 13, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @hao-aaron.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 13, 2026
Signed-off-by: Aaron Hao <ahao@anyscale.com>
@mergify mergify bot removed the needs-rebase label Feb 13, 2026
@mergify

mergify bot commented Feb 13, 2026

Hi @hao-aaron, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: hao-aaron <ahao@anyscale.com>
@vllm-bot vllm-bot merged commit dddbff4 into vllm-project:main Feb 13, 2026
47 of 50 checks passed
@hao-aaron hao-aaron mentioned this pull request Feb 17, 2026
5 tasks
wzhao18 pushed a commit to wzhao18/vllm that referenced this pull request Feb 18, 2026
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
markmc added a commit to markmc/vllm that referenced this pull request Feb 18, 2026
Looks like this got changed during vllm-project#34125 iterations, but the docs
got out of sync.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
ZJY0516 pushed a commit to ZJY0516/vllm that referenced this pull request Feb 23, 2026
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
SumanthRH pushed a commit to NovaSky-AI/SkyRL that referenced this pull request Mar 2, 2026

### Summary

- Fix `abort_generation()` and `sleep()` abort logic that broke silently
after the vllm 0.16.0 bump (#1240)
- Add backward-compatible `_get_unfinished_request_ids()` helper to
resolve internal vs external request ID mismatch
- Fixes #1243

### Root Cause

In vllm 0.16.0,
[`InputProcessor.assign_request_id()`](https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/input_processor.py)
now creates **internal** request IDs (with a random suffix) that are
distinct from the user-provided **external** request IDs:

```python
request.external_req_id = request.request_id                       # save original as external
request.request_id = f"{request.external_req_id}-{random_uuid():.8}"  # new internal ID
```

Our code was reading request IDs from
`output_processor.request_states.keys()` (which are now **internal**
IDs) and passing them to `engine.abort()` with `internal=False` (the
default). The abort looked them up in the `external_req_ids` mapping,
found nothing, and **silently did nothing**. Requests completed normally
with `finish_reason="length"` instead of `"abort"`.

This broke fully async RL's pause/resume flow, which relies on abort
returning partial outputs with `finish_reason="abort"` so the retry loop
can re-submit with accumulated tokens.
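A toy model of the silent no-op (the dictionaries and abort function here are illustrative stand-ins, not SkyRL or vLLM code):

```python
import uuid

# Toy model of the internal/external ID split described above.
external_id = "req-1"
internal_id = f"{external_id}-{uuid.uuid4().hex[:8]}"

external_req_ids = {external_id: internal_id}  # abort(internal=False) looks here
request_states = {internal_id: "RUNNING"}      # output processor keys by internal ID

def abort_sketch(req_id: str, internal: bool = False) -> bool:
    """Returns True if anything was actually aborted."""
    if internal:
        return request_states.pop(req_id, None) is not None
    mapped = external_req_ids.get(req_id)  # misses when given an internal ID
    if mapped is None:
        return False                       # silent no-op: the bug in question
    return request_states.pop(mapped, None) is not None

# Passing the *internal* ID with internal=False silently aborts nothing:
assert abort_sketch(internal_id) is False
# Passing the external ID resolves the mapping and aborts the request:
assert abort_sketch(external_id) is True
```

The failure mode is nasty precisely because nothing raises: the lookup miss just means zero requests are aborted and generation runs to completion.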

Related vllm changes:
- vllm-project/vllm#32103
- vllm-project/vllm#32351
- vllm-project/vllm#34125
- vllm-project/vllm#34528

### Fix

Add a `_get_unfinished_request_ids()` static method on
`BaseVLLMInferenceEngine` that:
- Uses `output_processor.external_req_ids.keys()` when available (vllm
0.16.0+)
- Falls back to `output_processor.request_states.keys()` for older vllm
versions

Applied to all three abort call sites:
1. `AsyncVLLMInferenceEngine.abort_generation()` — used by fully async
pause/resume
2. `AsyncVLLMInferenceEngine.sleep()` — cleanup before sleep
3. `VLLMInferenceEngine.sleep()` — sync engine cleanup before sleep
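A minimal sketch of such a version-tolerant helper, assuming the attribute names from the description above (`external_req_ids`, `request_states`); treat it as illustrative rather than the actual SkyRL implementation:

```python
def get_unfinished_request_ids(output_processor) -> list[str]:
    """Prefer external request IDs (vllm 0.16.0+); fall back to the
    request-state keys used by older vllm versions."""
    external = getattr(output_processor, "external_req_ids", None)
    if external is not None:
        return list(external.keys())
    return list(output_processor.request_states.keys())

# Minimal stand-ins to exercise both branches:
class NewOutputProcessor:          # vllm 0.16.0+ shape
    external_req_ids = {"req-1": "req-1-abcd1234"}
    request_states = {"req-1-abcd1234": object()}

class OldOutputProcessor:          # pre-0.16.0 shape
    request_states = {"req-2": object()}

assert get_unfinished_request_ids(NewOutputProcessor()) == ["req-1"]
assert get_unfinished_request_ids(OldOutputProcessor()) == ["req-2"]
```

The `getattr` probe keeps the call sites identical across vllm versions: they always receive IDs in whatever namespace `engine.abort()` expects by default.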

### Test plan

- [x] `test_abort_generation_vllm_engine` — passes (was failing with
`assert 'length' == 'abort'`)
- [x] `test_continue_generation_vllm_engine_chat_completion` — passes
- [x] `test_continue_generation_generate_vllm_engine_generation` —
passes
- [x] E2E fully async gsm8k (`gsm8k_fully_async_ci` project) — ran ~12
training steps successfully with pause/resume working correctly

Light blue is the run after this fix (our nightly gsm8k fully async CI)
https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_fully_async_ci

<img width="2163" height="976" alt="image"
src="https://github.com/user-attachments/assets/eaece0dc-ca53-4dd1-b3d1-2f6e308a8a47"
/>



Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

Labels

documentation Improvements or additions to documentation ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants