[Docs] Guide for using sleep mode and enable sleep mode #660
Merged: hsliuustc0106 merged 12 commits into vllm-project:main from knlnguyen1802:sleep_wakeup_docs on Jan 6, 2026
- 925bb6c Add docs for wakeup and sleep (knlnguyen1802)
- a294c13 Update docs (knlnguyen1802)
- 3231730 Update nav.yaml (knlnguyen1802)
- abb17f6 Fix pre-commit (knlnguyen1802)
- f924bda Update docs (knlnguyen1802)
- d3a6950 Fix pre-commit (knlnguyen1802)
- b54e2c3 Update docs/features/sleep_mode.md (knlnguyen1802)
- 4944af2 Update docs/features/sleep_mode.md (knlnguyen1802)
- 4b849b5 Update docs/features/sleep_mode.md (knlnguyen1802)
- 3c1b322 Fix docs gen (knlnguyen1802)
- 3f58466 Fix docs gen and pre-commit (knlnguyen1802)
- 9acafed Resolve comment (knlnguyen1802)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| # Sleep Mode | ||
|
|
||
| vLLM-Omni’s **Sleep Mode** allows you to temporarily release most GPU memory used by a model—such as model weights and key-value (KV) caches (for autoregressive models)—**without stopping the server or unloading the Docker container**. | ||
|
|
||
| This feature is inherited from [vLLM’s Sleep Mode](https://blog.vllm.ai/2025/10/26/sleep-mode.html), which provides zero-reload model switching for multi-model serving. | ||
|
hsliuustc0106 marked this conversation as resolved.
|
||
|
|
||
| It is especially useful in **RLHF**, **training**, or **cost-saving scenarios**, where GPU resources must be freed between inference workloads. | ||
|
|
||
| --- | ||
|
|
||
| ## Omni Model | ||
|
|
||
| Omni models inherit this feature from vLLM’s Sleep Mode. | ||
|
|
||
| This means: | ||
|
|
||
| - Both Level 1 and Level 2 sleep are supported, so both model weights and the KV cache can be released and reset. | ||
|
|
||
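As a rough mental model of the two levels, the toy class below follows upstream vLLM's description of sleep levels: Level 1 offloads weights to CPU memory and discards the KV cache, while Level 2 discards both. `ToyEngine` and its fields are hypothetical and only track state in plain Python; they are not vLLM or vLLM-Omni APIs, and the real behavior may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical state tracker; not a vLLM or vLLM-Omni class."""

    # One of: "gpu", "cpu", "discarded", "gpu_uninitialized"
    weights: str = "gpu"
    kv_cache: list = field(default_factory=lambda: ["req-0", "req-1"])

    def sleep(self, level: int = 1) -> None:
        if level not in (1, 2):
            raise ValueError("level must be 1 or 2")
        # Both levels discard the KV cache.
        self.kv_cache = []
        # Level 1 keeps a CPU copy of the weights; Level 2 drops them.
        self.weights = "cpu" if level == 1 else "discarded"

    def wake_up(self) -> None:
        if self.weights == "cpu":
            # Level 1: the CPU copy is moved back to the GPU.
            self.weights = "gpu"
        elif self.weights == "discarded":
            # Level 2: memory is re-allocated, but the caller must
            # reload or update the weights before serving again.
            self.weights = "gpu_uninitialized"


engine = ToyEngine()
engine.sleep(level=1)
engine.wake_up()
print(engine.weights, engine.kv_cache)
```

The practical difference in this sketch is what the caller has to do after waking: nothing extra after Level 1, a weight reload after Level 2.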
| ## Diffusion Model Extension | ||
|
|
||
| We added Sleep Mode support for **diffusion models**, which previously lacked this functionality. | ||
| In diffusion pipelines, Sleep Mode currently offloads only **model weight memory**, because these models typically do not use KV caches. | ||
|
|
||
| This means: | ||
|
Member (review comment): These are not rendered correctly
|
||
|
|
||
| - Diffusion models can now enter Level 1 sleep. | ||
| - Pipeline states (e.g., noise schedulers, buffers) remain intact after waking. | ||
| - Useful for releasing VRAM between image generation or training cycles. | ||
|
|
||
| --- | ||
|
|
||
| ## Enable Sleep Mode | ||
| To enable sleep mode, set `enable_sleep_mode` to `True` in `engine_args`. | ||
|
|
||
|
|
||
| Example: | ||
| ```python | ||
| omni = Omni(model=..., enable_sleep_mode=True) | ||
| ``` | ||
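Once sleep mode is enabled, a sleep/wake cycle can be wrapped so that the engine always wakes up again, even if the work done in between fails. The sketch below assumes the engine object exposes `sleep(level=...)` and `wake_up()`, as upstream vLLM's `LLM` class does; whether `Omni` exposes the same method names is an assumption here. `asleep` and `FakeEngine` are hypothetical helpers, and the stub exists only so the example runs without a GPU.

```python
from contextlib import contextmanager


@contextmanager
def asleep(engine, level=1):
    """Keep `engine` asleep for the duration of the `with` block."""
    engine.sleep(level=level)
    try:
        yield engine
    finally:
        # Always wake the engine, even if the body raised.
        engine.wake_up()


class FakeEngine:
    """Hypothetical stub that records calls instead of touching a GPU."""

    def __init__(self):
        self.calls = []

    def sleep(self, level=1):
        self.calls.append(("sleep", level))

    def wake_up(self):
        self.calls.append(("wake_up",))


engine = FakeEngine()
with asleep(engine, level=1):
    pass  # e.g. run a training step while the GPU memory is free
print(engine.calls)
```

Using `try`/`finally` keeps the server from being left asleep after an exception in the training or generation step.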