[Core] Cleanup engine pause/sleep logic #34528
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Code Review
This pull request refactors the engine's pause and sleep logic, centralizing it in the EngineCore base class and improving the separation of concerns between request submission and execution. While the overall refactoring is a positive change for code clarity and maintainability, I've identified a critical logic bug in the new pause_scheduler implementation that could lead to deadlocks.
hao-aaron left a comment:
LGTM, just some clarifying questions.
```python
# Pause scheduler before sleeping.
clear_prefix_cache = level >= 1
pause_future = self.pause_scheduler(mode=mode, clear_cache=clear_prefix_cache)
if level < 1:
    return pause_future
```
What are your thoughts on removing the whole pause/resume API and merging it with sleep/wakeup?
I think it would be a good idea, but we should put some thought into the API, and in any case we'll need to keep the existing ones to begin with for backwards compatibility. So we can consider that as a follow-on change.
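The control flow in the quoted hunk can be sketched as a small self-contained example. This is an illustration only: the `SketchEngine` class, its attributes, and the resolved-`Future` pause signal are hypothetical stand-ins for vLLM's actual engine classes, kept just to show how `level` selects between "pause only" and "pause, then sleep and clear the prefix cache".

```python
from concurrent.futures import Future


class SketchEngine:
    """Hypothetical stand-in illustrating the pause-then-sleep flow
    from the quoted diff; not vLLM's real EngineCore."""

    def __init__(self) -> None:
        self.paused = False
        self.prefix_cache_cleared = False
        self.asleep = False

    def pause_scheduler(self, mode: str, clear_cache: bool) -> Future:
        # Stop scheduling new requests; optionally drop the prefix cache.
        self.paused = True
        self.prefix_cache_cleared = clear_cache
        fut: Future = Future()
        fut.set_result(None)  # stands in for "in-flight work has drained"
        return fut

    def sleep(self, level: int, mode: str = "wait") -> Future:
        # Pause scheduler before sleeping; level >= 1 also clears prefix cache.
        clear_prefix_cache = level >= 1
        pause_future = self.pause_scheduler(mode=mode, clear_cache=clear_prefix_cache)
        if level < 1:
            # Level 0: pause only, no executor sleep.
            return pause_future
        pause_future.result()  # wait for the pause to complete
        self.asleep = True  # real code would call model_executor.sleep(level)
        return pause_future


engine = SketchEngine()
engine.sleep(level=0)  # pauses, keeps prefix cache, does not sleep
engine2 = SketchEngine()
engine2.sleep(level=1)  # pauses, clears prefix cache, then sleeps
```

The sketch also makes the level-0 question above concrete: with `clear_cache = level >= 1`, a level-0 sleep always retains the prefix cache.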
```python
self.model_executor.sleep(level)

# Pause scheduler before sleeping.
clear_prefix_cache = level >= 1
```
Are we okay with sleep level=0 always retaining the prefix cache?
Test failures are all known, existing, unrelated errors.

This pull request has merge conflicts that must be resolved before it can be merged.
# Conflicts: # vllm/entrypoints/llm.py
Co-authored-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com>
CI failures are known issues on main.

I think https://buildkite.com/vllm/ci/builds/52826/steps/canvas?sid=019c8d4d-b78f-43f0-810a-461f7d2befce&tab=output is related; it's not the spec decoding test that is failing.
markmc left a comment:
Nice work, lgtm now 👍
(Obviously we still need to sort out the test failures before merging.)
Ah thanks @DarkLight1337, will check that; it was the flaky spec decode test that failed before I retried it!
hao-aaron left a comment:
LGTM! Thanks for adding my tests.

The only failure is the known, unrelated audio model failure on main.
### Summary

- Fix `abort_generation()` and `sleep()` abort logic that broke silently after the vllm 0.16.0 bump (#1240)
- Add a backward-compatible `_get_unfinished_request_ids()` helper to resolve the internal vs external request ID mismatch
- Fixes #1243

### Root Cause

In vllm 0.16.0, [`InputProcessor.assign_request_id()`](https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/input_processor.py) now creates **internal** request IDs (with a random suffix) that are distinct from the user-provided **external** request IDs:

```python
request.external_req_id = request.request_id  # save original as external
request.request_id = f"{request.external_req_id}-{random_uuid():.8}"  # new internal ID
```

Our code was reading request IDs from `output_processor.request_states.keys()` (which are now **internal** IDs) and passing them to `engine.abort()` with `internal=False` (the default). The abort looked them up in the `external_req_ids` mapping, found nothing, and **silently did nothing**. Requests completed normally with `finish_reason="length"` instead of `"abort"`.

This broke fully async RL's pause/resume flow, which relies on abort returning partial outputs with `finish_reason="abort"` so the retry loop can re-submit with accumulated tokens.

Related vllm changes:

- vllm-project/vllm#32103
- vllm-project/vllm#32351
- vllm-project/vllm#34125
- vllm-project/vllm#34528

### Fix

Add a `_get_unfinished_request_ids()` static method on `BaseVLLMInferenceEngine` that:

- Uses `output_processor.external_req_ids.keys()` when available (vllm 0.16.0+)
- Falls back to `output_processor.request_states.keys()` for older vllm versions

Applied to all three abort call sites:

1. `AsyncVLLMInferenceEngine.abort_generation()` — used by fully async pause/resume
2. `AsyncVLLMInferenceEngine.sleep()` — cleanup before sleep
3. `VLLMInferenceEngine.sleep()` — sync engine cleanup before sleep

### Test plan

- [x] `test_abort_generation_vllm_engine` — passes (was failing with `assert 'length' == 'abort'`)
- [x] `test_continue_generation_vllm_engine_chat_completion` — passes
- [x] `test_continue_generation_generate_vllm_engine_generation` — passes
- [x] E2E fully async gsm8k (`gsm8k_fully_async_ci` project) — ran ~12 training steps successfully with pause/resume working correctly

Light blue is the run after this fix (our nightly gsm8k fully async CI): https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_fully_async_ci

<img width="2163" height="976" alt="image" src="https://github.com/user-attachments/assets/eaece0dc-ca53-4dd1-b3d1-2f6e308a8a47" />

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
This is a follow-on from #33195, which I didn't get a chance to review, and the subsequent merging with #34125.
- (`VLLM_ENABLE_V1_MULTIPROCESSING=0` / external launcher mode), i.e. move from `EngineCoreProc` to `EngineCore`
- `LLM.enqueue` logic
- `OutputProcessor._requests_drained` logic missed from "[Core] Move pause and resume functions into engine" #34125

cc @jaewonlee-fb @houseroad @zhuohan123 @hao-aaron
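The refactor direction described above (shared pause/sleep logic living on the base engine class rather than only the multiprocess subclass) can be illustrated generically. The `BaseCore`/`ProcCore` names and methods below are hypothetical stand-ins, not vLLM's actual classes; the point is only that logic defined on the base class is reachable both in-process and via the subclass's IPC path.

```python
class BaseCore:
    """Hypothetical stand-in for a base engine class: pause logic lives
    here so both in-process and multiprocess engines share it."""

    def __init__(self) -> None:
        self.paused = False

    def pause(self) -> None:
        self.paused = True


class ProcCore(BaseCore):
    """Stand-in for the multiprocess variant: it only adds dispatch
    plumbing, reusing inherited logic instead of redefining it."""

    def handle_rpc(self, method: str) -> None:
        getattr(self, method)()  # dispatch an incoming request by name


in_process = BaseCore()  # e.g. the VLLM_ENABLE_V1_MULTIPROCESSING=0 path
in_process.pause()

multiproc = ProcCore()
multiproc.handle_rpc("pause")  # same shared logic, reached via dispatch
```

Keeping the logic on the base class means the in-process path cannot silently drift out of sync with the multiprocess one.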
Resolves #31619
Resolves #15483
Replaces #30186, #28721