
[Core] Cleanup engine pause/sleep logic #34528

Merged
vllm-bot merged 21 commits into vllm-project:main from njhill:cleanup-pausesleep
Feb 25, 2026

Conversation

@njhill
Member

@njhill njhill commented Feb 13, 2026

This is a follow-on from #33195, which I didn't get a chance to review, and its subsequent merge with #34125:

  • Always pause scheduler prior to sleeping (regardless of level)
  • Centralize cache resetting
  • Also support sleep/pause/resume with inline-engine cases (VLLM_ENABLE_V1_MULTIPROCESSING=0 / external launcher mode) - i.e. move from EngineCoreProc to EngineCore
  • Deduplicate the added LLM.enqueue logic
  • Add optional pause-mode arg to sleep method
  • Clean up OutputProcessor._requests_drained logic missed from [Core] Move pause and resume functions into engine #34125
  • Incorporate changes for DPEP support (including @hao-aaron's test updates from [RL] Pause and Resume for DPEP #34544)
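The first two bullets can be sketched roughly as follows (a simplified illustration only, not the actual vLLM implementation; the `pause_scheduler` and `sleep` names come from this PR's diff, while the attributes and control flow here are assumptions):

```python
class EngineCore:
    """Toy sketch of the pause-before-sleep flow described above."""

    def __init__(self) -> None:
        self.paused = False
        self.prefix_cache_cleared = False

    def pause_scheduler(self, clear_cache: bool = False) -> None:
        # Stop scheduling new steps; cache resetting is centralized
        # here instead of being duplicated in callers.
        self.paused = True
        if clear_cache:
            self.prefix_cache_cleared = True

    def sleep(self, level: int = 1) -> None:
        # Always pause the scheduler first, regardless of sleep level;
        # only level >= 1 also drops the prefix cache.
        self.pause_scheduler(clear_cache=level >= 1)
        if level >= 1:
            pass  # offload or free model weights here


engine = EngineCore()
engine.sleep(level=0)
print(engine.paused, engine.prefix_cache_cleared)  # True False
```

Note that a level-0 sleep still pauses the scheduler but retains the prefix cache, which matches the `clear_prefix_cache = level >= 1` snippet discussed in the review below.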

cc @jaewonlee-fb @houseroad @zhuohan123 @hao-aaron

Resolves #31619
Resolves #15483
Replaces #30186, #28721

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the engine's pause and sleep logic, centralizing it in the EngineCore base class and improving the separation of concerns between request submission and execution. While the overall refactoring is a positive change for code clarity and maintainability, I've identified a critical logic bug in the new pause_scheduler implementation that could lead to deadlocks.

Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill force-pushed the cleanup-pausesleep branch from baab562 to f440baf Compare February 13, 2026 18:06
@njhill njhill marked this pull request as ready for review February 13, 2026 18:16
@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 13, 2026
@njhill njhill force-pushed the cleanup-pausesleep branch from b1f19ef to f088cf2 Compare February 13, 2026 19:12
@njhill njhill requested a review from houseroad February 13, 2026 19:12
@njhill njhill force-pushed the cleanup-pausesleep branch from f088cf2 to 9df4dd0 Compare February 13, 2026 19:28
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill force-pushed the cleanup-pausesleep branch from 9df4dd0 to 9d94abb Compare February 13, 2026 19:41
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Contributor

@hao-aaron hao-aaron left a comment


LGTM, just some clarifying questions

Comment on lines +694 to +698

```python
# Pause scheduler before sleeping.
clear_prefix_cache = level >= 1
pause_future = self.pause_scheduler(mode=mode, clear_cache=clear_prefix_cache)
if level < 1:
    return pause_future
```
Contributor


What are your thoughts on removing the whole pause/resume and merging with sleep/wakeup

Member Author


I think it would be a good idea, but we should put some thought into the API, and we'll need to keep the existing ones anyhow to begin with for backwards compatibility. So we can consider that as a follow-on change.

```python
self.model_executor.sleep(level)

# Pause scheduler before sleeping.
clear_prefix_cache = level >= 1
```
Contributor


Are we okay with sleep level=0 always retaining prefix cache?

Member Author


I think it's fine

@njhill
Member Author

njhill commented Feb 14, 2026

Test failures are all known existing unrelated errors

@mergify

mergify bot commented Feb 18, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot removed the needs-rebase label Feb 19, 2026

# Conflicts:
#	vllm/entrypoints/llm.py

Signed-off-by: Nick Hill <nickhill123@gmail.com>
@mergify mergify bot removed the needs-rebase label Feb 22, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Aaron Hao <ahao@anyscale.com>

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Aaron Hao <ahao@anyscale.com>

Signed-off-by: Nick Hill <nickhill123@gmail.com>

Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill mentioned this pull request Feb 24, 2026
5 tasks
@njhill
Member Author

njhill commented Feb 24, 2026

CI failures are known issues on main.

@DarkLight1337
Member

I think https://buildkite.com/vllm/ci/builds/52826/steps/canvas?sid=019c8d4d-b78f-43f0-810a-461f7d2befce&tab=output is related, it's not the spec decoding test that is failing.

Member

@markmc markmc left a comment


Nice work, lgtm now 👍

(Obvs need to sort out test failures before merging)

@njhill
Member Author

njhill commented Feb 24, 2026

I think https://buildkite.com/vllm/ci/builds/52826/steps/canvas?sid=019c8d4d-b78f-43f0-810a-461f7d2befce&tab=output is related, it's not the spec decoding test that is failing.

Ah thanks @DarkLight1337, will check that, it was the flaky spec decode test that failed before I retried it!

Contributor

@hao-aaron hao-aaron left a comment


LGTM! thanks for adding my tests

@njhill
Member Author

njhill commented Feb 24, 2026

Only failure is known unrelated audio model failure on main.

@vllm-bot vllm-bot merged commit dbf0da8 into vllm-project:main Feb 25, 2026
50 of 53 checks passed
@njhill njhill deleted the cleanup-pausesleep branch February 25, 2026 16:12
tom-zju pushed a commit to tom-zju/vllm that referenced this pull request Feb 26, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
SumanthRH pushed a commit to NovaSky-AI/SkyRL that referenced this pull request Mar 2, 2026

### Summary

- Fix `abort_generation()` and `sleep()` abort logic that broke silently
after the vllm 0.16.0 bump (#1240)
- Add backward-compatible `_get_unfinished_request_ids()` helper to
resolve internal vs external request ID mismatch
- Fixes #1243

### Root Cause

In vllm 0.16.0,
[`InputProcessor.assign_request_id()`](https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/input_processor.py)
now creates **internal** request IDs (with a random suffix) that are
distinct from the user-provided **external** request IDs:

```python
request.external_req_id = request.request_id                       # save original as external
request.request_id = f"{request.external_req_id}-{random_uuid():.8}"  # new internal ID
```

Our code was reading request IDs from
`output_processor.request_states.keys()` (which are now **internal**
IDs) and passing them to `engine.abort()` with `internal=False` (the
default). The abort looked them up in the `external_req_ids` mapping,
found nothing, and **silently did nothing**. Requests completed normally
with `finish_reason="length"` instead of `"abort"`.

This broke fully async RL's pause/resume flow, which relies on abort
returning partial outputs with `finish_reason="abort"` so the retry loop
can re-submit with accumulated tokens.
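
The silent no-op described above can be reproduced with a toy model of the two ID maps (illustrative only; `ToyOutputProcessor` and `abort` here are stand-ins for vllm's actual `OutputProcessor` and abort path, not its real code):

```python
import uuid


class ToyOutputProcessor:
    """Stand-in for the internal/external request-ID maps described above."""

    def __init__(self) -> None:
        self.request_states: dict[str, str] = {}    # keyed by internal ID
        self.external_req_ids: dict[str, str] = {}  # external -> internal

    def add_request(self, external_id: str) -> None:
        internal_id = f"{external_id}-{uuid.uuid4().hex[:8]}"
        self.request_states[internal_id] = "running"
        self.external_req_ids[external_id] = internal_id


def abort(proc: ToyOutputProcessor, req_ids: list[str], internal: bool = False) -> list[str]:
    # With internal=False, IDs are resolved through the external map first.
    aborted = []
    for rid in req_ids:
        key = rid if internal else proc.external_req_ids.get(rid)
        if key in proc.request_states:
            proc.request_states[key] = "aborted"
            aborted.append(key)
    return aborted


proc = ToyOutputProcessor()
proc.add_request("req-1")

# Bug: passing *internal* IDs with internal=False matches nothing,
# so the abort silently does nothing.
assert abort(proc, list(proc.request_states)) == []

# Passing external IDs aborts as expected.
assert len(abort(proc, list(proc.external_req_ids))) == 1
```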

Related vllm changes:
- vllm-project/vllm#32103
- vllm-project/vllm#32351
- vllm-project/vllm#34125
- vllm-project/vllm#34528

### Fix

Add a `_get_unfinished_request_ids()` static method on
`BaseVLLMInferenceEngine` that:
- Uses `output_processor.external_req_ids.keys()` when available (vllm
0.16.0+)
- Falls back to `output_processor.request_states.keys()` for older vllm
versions
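
A version-tolerant helper along these lines might look like the following (a sketch under the stated assumptions, not SkyRL's exact code; only the method name and the two attribute names come from the description above):

```python
class BaseVLLMInferenceEngine:
    @staticmethod
    def _get_unfinished_request_ids(output_processor) -> list[str]:
        # vllm 0.16.0+ exposes external_req_ids (user-visible IDs mapped
        # to internal ones); older versions only have request_states,
        # whose keys were still the external IDs.
        ext = getattr(output_processor, "external_req_ids", None)
        if ext is not None:
            return list(ext.keys())
        return list(output_processor.request_states.keys())
```

Callers then pass these external IDs to `engine.abort()` with the default `internal=False`, so the lookup succeeds on both old and new vllm versions.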

Applied to all three abort call sites:
1. `AsyncVLLMInferenceEngine.abort_generation()` — used by fully async
pause/resume
2. `AsyncVLLMInferenceEngine.sleep()` — cleanup before sleep
3. `VLLMInferenceEngine.sleep()` — sync engine cleanup before sleep

### Test plan

- [x] `test_abort_generation_vllm_engine` — passes (was failing with
`assert 'length' == 'abort'`)
- [x] `test_continue_generation_vllm_engine_chat_completion` — passes
- [x] `test_continue_generation_generate_vllm_engine_generation` —
passes
- [x] E2E fully async gsm8k (`gsm8k_fully_async_ci` project) — ran ~12
training steps successfully with pause/resume working correctly

Light blue is the run after this fix (our nightly gsm8k fully async CI)
https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_fully_async_ci

<img width="2163" height="976" alt="image"
src="https://github.com/user-attachments/assets/eaece0dc-ca53-4dd1-b3d1-2f6e308a8a47"
/>



Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>

Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed v1


Development

Successfully merging this pull request may close these issues.

[Bug]: vLLM engine crashes then restarts and loads the model on sleep if a chat request is made

5 participants