[Feat](sleep mode): implement sleep and wake_up APIs for engine lifecycle management#1160
Flink-ddd wants to merge 2 commits into
Signed-off-by: vensen <vensenmu@gmail.com>
cf33769 to 294a798
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 294a7980a8
294a798 to e47653c
@knlnguyen1802 PTAL
@Flink-ddd I think you had better raise an RFC first, because this feature relates to RL, where the inference engine and the training engine share the same device. Besides, I have some questions:
Details
```
root@deepseek-v3-2-vllm-85c4fdb9f9-6nzg9:/proj-tango-pvc/users/zhipeng.wang/workspace/vllm-omni# python3 test.py
2.Running Baseline Generation...
```
Hi @princepride, thanks for your feedback! Regarding your concerns:
About the OOM error: I checked your log, and the OOM happened during the initialization of Stage-2 (EngineCore failed to start), before any sleep or wake command was issued. Lowering gpu_memory_utilization or using separate devices should fix this initialization issue.
About OmniDiffusion: Yes, OmniDiffusion currently lacks a sleep method. Sleep mode support in Omni is still incomplete and needs further work.
About the RFC: I'd be happy to raise an RFC. Let's discuss how to build a complete sleep mode.
@Flink-ddd You can take a look at this PR #355 If you want to add
Hi @knlnguyen1802, sure. I saw that PR #355 already contains some methods for sleep and wake-up. I'd like to plan them out and submit an RFC first, and then we can discuss the Omni sleep mode functionality.
Sure. Please also notify me when you submit the new RFC, thanks.
Sure, thanks.
@Flink-ddd Any updates? Is there an RFC now?
Hi @Gaohan123, I'm running a demo for verification and will be able to submit an RFC soon. Thanks.
@princepride @hsliuustc0106 @Gaohan123 @knlnguyen1802 I've submitted the RFC, PTAL. Thank you for your time.
@vllm-omni-reviewer
@Flink-ddd Hello, any updates?
Hi @Gaohan123, I'm considering closing this PR, because the sleep mode ACK work will cover this functionality completely. I'll open a new PR for sleep mode ACK soon; I'm currently testing and integrating it.
Purpose
Currently, the Omni orchestrator lacks programmatic control for sleep and wake_up states. This prevents users from releasing VRAM during idle periods in a multi-stage distributed environment.
This PR implements the necessary infrastructure to broadcast lifecycle commands from the Orchestrator down to the distributed workers, enabling efficient memory management without losing model state.
Key Changes
Orchestrator API: Added sleep(level) and wake_up() to the Omni entry point to manage all active stages.
Instruction Forwarding: Updated StageController and OmniStageTaskType to support standardized lifecycle task signaling across processes.
Worker Integration: Enabled the underlying workers to receive and execute memory release/recovery instructions via the established task queue.
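The forwarding path described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `TaskType`, `Task`, `ToyStage`, and `ToyOrchestrator` are hypothetical stand-ins (for `OmniStageTaskType`, the stage workers, and the `Omni` entry point respectively), threads replace processes, and the per-stage acknowledgement queue is an assumption made for the example.

```python
import queue
import threading
from dataclasses import dataclass
from enum import Enum, auto


class TaskType(Enum):
    # Hypothetical stand-in for the lifecycle members of OmniStageTaskType.
    SLEEP = auto()
    WAKE_UP = auto()
    SHUTDOWN = auto()


@dataclass
class Task:
    type: TaskType
    level: int = 1  # only meaningful for SLEEP


class ToyStage:
    """Toy stage: a worker thread draining a task queue (not vllm-omni code)."""

    def __init__(self, name):
        self.name = name
        self.state = "awake"
        self.sleep_level = None
        self.in_q = queue.Queue()   # orchestrator -> stage
        self.ack_q = queue.Queue()  # stage -> orchestrator (assumed for this sketch)
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            task = self.in_q.get()
            if task.type is TaskType.SLEEP:
                # A real worker would release GPU memory here.
                self.state, self.sleep_level = "asleep", task.level
            elif task.type is TaskType.WAKE_UP:
                # A real worker would restore memory here.
                self.state, self.sleep_level = "awake", None
            self.ack_q.put(task.type.name)
            if task.type is TaskType.SHUTDOWN:
                return


class ToyOrchestrator:
    """Broadcasts one lifecycle task to every stage and waits for all acks."""

    def __init__(self, stages):
        self.stages = list(stages)

    def _broadcast(self, task, timeout=5.0):
        for stage in self.stages:
            stage.in_q.put(task)
        return [stage.ack_q.get(timeout=timeout) for stage in self.stages]

    def sleep(self, level=1):
        return self._broadcast(Task(TaskType.SLEEP, level))

    def wake_up(self):
        return self._broadcast(Task(TaskType.WAKE_UP))

    def shutdown(self):
        return self._broadcast(Task(TaskType.SHUTDOWN))


stages = [ToyStage(f"stage-{i}") for i in range(3)]
omni = ToyOrchestrator(stages)
omni.sleep(level=2)
print([s.state for s in stages])
omni.wake_up()
print([s.state for s in stages])
omni.shutdown()
```

Waiting for an acknowledgement from every stage before returning is one possible design; it keeps `sleep()` and `wake_up()` synchronous from the caller's point of view, at the cost of blocking on the slowest stage.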
Test Result
The implementation was verified using Qwen/Qwen2.5-Omni-3B with FP8 quantization enabled, on a machine with two RTX A6000 GPUs.
Verification logic:
Before the fix, calling engine.sleep() failed with an AttributeError.
After applying the fix, the engine successfully entered Level 2 sleep (VRAM release was verified).
After wake_up(), the model produced token IDs identical to the baseline, confirming that the internal quantization states and KV caches are preserved correctly in the Omni architecture.
Initially, the orchestrator crashed as it could not handle lifecycle commands.
After the fix, rerunning test.py showed the program correctly recognizing and executing the sleep-related logic.
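The shape of the round-trip check described under "Verification logic" can be sketched as follows. This is not the actual test.py, which is not included in this PR description: `FakeEngine` is a hypothetical deterministic stand-in for the engine, and `check_sleep_roundtrip` is an illustrative helper name. With a real model, the comparison is only meaningful if decoding is deterministic (for example, greedy sampling).

```python
import hashlib


class FakeEngine:
    """Deterministic stand-in for the engine; not the vllm-omni API."""

    def __init__(self):
        self.asleep = False

    def generate(self, prompt, max_tokens=8):
        if self.asleep:
            raise RuntimeError("engine is asleep; call wake_up() first")
        # Derive fake "token IDs" from the prompt so repeated calls agree.
        digest = hashlib.sha256(prompt.encode("utf-8")).digest()
        return list(digest[:max_tokens])

    def sleep(self, level=1):
        self.asleep = True

    def wake_up(self):
        self.asleep = False


def check_sleep_roundtrip(engine, prompt, level=2):
    """Return True if output after sleep -> wake_up matches the baseline."""
    baseline = engine.generate(prompt)
    engine.sleep(level)
    engine.wake_up()
    after = engine.generate(prompt)
    return baseline == after


engine = FakeEngine()
print(check_sleep_roundtrip(engine, "Hello, Omni!"))
```

Comparing full token ID lists rather than decoded text avoids false matches caused by detokenization normalizing different IDs to the same string.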