add sleep and wake_up api in sleep model by lengrongfu · Pull Request #2742 · vllm-project/vllm-omni

lengrongfu · 2026-04-13T14:55:36Z

Purpose

Currently we can enable Sleep Mode, but there is no API to trigger sleep and wake_up, so this PR adds one.

Test Plan

Start the server:

$ python3 -m vllm_omni.entrypoints.cli.main serve /home/jovyan/Wan2.2-T2V-A14B-Diffusers/ --omni --enable-sleep-mode

Call the sleep API:

$ curl -X POST 'localhost:8000/sleep'

The server log then shows: Sleep mode (process-scoped) freed 64.23 GiB memory, 0.71 GiB memory is still in use.

Call the wake_up API:

$ curl -X POST 'localhost:8000/wake_up'

Test Result

Signed-off-by: rongfu.leng <lenronfu@gmail.com>

chatgpt-codex-connector · 2026-04-13T14:55:41Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

hsliuustc0106 · 2026-04-13T21:06:29Z

PR #2742 - add sleep and wake_up api

OVERALL: NO BLOCKERS (missing doc)
VERDICT: COMMENT

Correctness: PASS
Reliability: PASS
Breaking: PASS
Tests: PASS (curl commands present)
Documentation: ISSUES - missing API doc for /sleep and /wake_up
Security: PASS

Summary: Adds /sleep and /wake_up API endpoints (36 additions). The test plan has curl commands, but the test result is empty. Please update the API documentation for the new endpoints.

hsliuustc0106 · 2026-04-14T09:37:47Z

BLOCKER scan:

Correctness: PASS
Reliability/Safety: PASS
Breaking Changes: PASS (new API)
Test Coverage: PASS
Documentation: PASS (docs updated)
Security: PASS

OVERALL: NO BLOCKERS

VERDICT: COMMENT

Simple and straightforward API additions. A few minor suggestions:

- Consider adding try-except blocks around the engine_client calls to return proper HTTP error codes instead of 500
- The tags parameter in wake_up(): when it is an empty list, you set it to None. This is correct but could be clarified with a comment
- Consider adding a GET endpoint to query the current sleep level, not just is_sleeping status
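To illustrate the second suggestion, here is a minimal sketch of how the empty-list handling could be made explicit. The helper name `normalize_tags` is hypothetical and not taken from the PR, and the comment assumes that the engine treats None as "wake up everything".

```python
from typing import Optional


def normalize_tags(tags: Optional[list[str]]) -> Optional[list[str]]:
    """Hypothetical helper: map a missing or empty tag list to None.

    Assumption: the engine interprets None as "wake up everything",
    so an empty list parsed from the request is not forwarded as-is.
    """
    if not tags:
        return None
    # Copy so later mutation of the caller's list cannot affect the engine call.
    return list(tags)
```

With a helper like this, the intent lives in one documented place rather than in an inline conditional inside the handler.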

yenuo26 · 2026-04-16T12:14:36Z

        )
+
+
+@router.post("/sleep")


Could you add some unit test cases in tests/entrypoints?

lishunyang12

Review of "add sleep and wake_up api in sleep model"

Thanks for adding the HTTP API surface for sleep/wake_up -- this is a useful feature. I have several concerns that should be addressed before merging.

1. Missing error handling in `/sleep` endpoint -- no return value on success

The /sleep handler catches exceptions and returns an error, but on the success path it returns nothing (implicitly None, which FastAPI serializes as a 200 response with a JSON null body). Compare with /wake_up, which explicitly returns Response(status_code=200). Both endpoints should return an explicit, consistent response. Suggest returning a JSONResponse with a body like {"status": "ok"} from both, or at least a Response(status_code=200).

2. No guard for `enable_sleep_mode`

The sleep/wake_up/sleep_info endpoints are registered unconditionally on the router. If the server was started without --enable-sleep-mode, calling /sleep will still invoke engine_client.sleep() which may produce unexpected behavior or crash. The endpoints should either:

- Be conditionally registered only when sleep mode is enabled, or
- Check whether sleep mode is enabled and return 400 Bad Request with a clear message otherwise.
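A framework-agnostic sketch of the second option follows. The function name `check_sleep_mode`, the way the `enabled` flag reaches the handler, and the error wording are all assumptions for illustration; the PR's actual wiring is not shown here.

```python
from http import HTTPStatus
from typing import Optional, Tuple


def check_sleep_mode(enabled: bool) -> Optional[Tuple[int, dict]]:
    """Return an error (status, body) pair if sleep mode is off, else None."""
    if not enabled:
        return (
            HTTPStatus.BAD_REQUEST.value,
            {"error": "Sleep mode is not enabled; start the server with --enable-sleep-mode."},
        )
    return None
```

Each handler would call this first and return the error pair as its HTTP response before touching the engine client.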

3. `/wake_up` has no error handling

Unlike /sleep and /sleep_info, the /wake_up handler has no try/except block. If engine_client.wake_up(tags) raises, the user gets an unhandled 500 with a stack trace. Please wrap it in the same pattern used by the other two endpoints.
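One way to keep the three handlers consistent is to route every engine call through a single wrapper. The sketch below uses only the standard library; `run_engine_call` and `FakeEngine` are hypothetical names, and the 500 status plus the `{"status": "ok"}` body are assumptions about what the final API should return.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def run_engine_call(
    call: Callable[[], Awaitable[None]], action: str
) -> Tuple[int, dict]:
    """Await an engine call and map the outcome to a (status, body) pair."""
    try:
        await call()
    except Exception as exc:  # broad on purpose: this is the HTTP boundary
        return 500, {"error": f"Failed to {action} engine: {exc}"}
    return 200, {"status": "ok"}


class FakeEngine:
    """Stand-in for the real engine client, for illustration only."""

    async def sleep(self) -> None:
        return None

    async def wake_up(self) -> None:
        raise RuntimeError("device busy")


engine = FakeEngine()
sleep_result = asyncio.run(run_engine_call(engine.sleep, "sleep"))
wake_result = asyncio.run(run_engine_call(engine.wake_up, "wake up"))
```

Here `sleep_result` carries the explicit success body and `wake_result` carries a structured error instead of a stack trace.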

4. `/sleep_info` crashes with `AttributeError` when `engine_client` is None

engine_client = getattr(raw_request.app.state, "engine_client", None)
level = await engine_client.sleep_level()  # AttributeError if None

If engine_client is None, the next line will raise AttributeError: 'NoneType' object has no attribute 'sleep_level'. Either remove the getattr fallback (since all other endpoints access raw_request.app.state.engine_client directly), or add a None check.
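If the getattr fallback is kept, the None case needs an explicit branch. A minimal sketch, assuming a hypothetical helper `require_engine_client` and a 503 status chosen purely for illustration:

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def require_engine_client(
    app_state: Any,
) -> Tuple[Optional[Any], Optional[Tuple[int, dict]]]:
    """Return (client, None) when present, or (None, error) when missing."""
    client = getattr(app_state, "engine_client", None)
    if client is None:
        return None, (503, {"error": "Engine client is not initialized."})
    return client, None


# SimpleNamespace stands in for raw_request.app.state in this sketch.
missing_client, missing_error = require_engine_client(SimpleNamespace())
present_client, present_error = require_engine_client(
    SimpleNamespace(engine_client=object())
)
```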

5. API inconsistency: docs say `GET /is_sleeping`, code provides `GET /sleep_info`

The documentation added to docs/features/sleep_mode.md advertises GET /is_sleeping, but the actual endpoint is GET /sleep_info. These must match. I'd suggest using /sleep_info since it returns more than a boolean.

6. `_sleep_level` sentinel value `-1` is confusing

Using -1 to mean "not sleeping" is a magic number. The sleep_info endpoint returns -1 as the level when the engine is awake, which is not documented and confusing for API consumers. Consider using None (which serializes to null in JSON) as the "not sleeping" sentinel, which the docstring already describes as the intended behavior: "null -- engine is awake".
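To show what the None sentinel looks like on the wire, here is a small sketch using the standard json module. The field names `is_sleeping` and `level` are assumptions about the response shape, not confirmed by the PR.

```python
import json
from typing import Optional


def sleep_info_body(level: Optional[int]) -> str:
    """Build a JSON body; a level of None means the engine is awake."""
    return json.dumps({"is_sleeping": level is not None, "level": level})


awake = sleep_info_body(None)  # '{"is_sleeping": false, "level": null}'
asleep = sleep_info_body(1)    # '{"is_sleeping": true, "level": 1}'
```

API consumers can then test for null rather than learning that -1 carries a special meaning.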

7. `sleep_level()` method is added but not declared in any abstract base / protocol

The new sleep_level() method is added to AsyncOmni but there is no corresponding declaration in the EngineClient protocol or abstract base class (if one exists). This means type checkers won't catch missing implementations in other engine client classes. Please add it to the relevant protocol/ABC as well.
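If the base class cannot be changed, one possible alternative is a small structural protocol declared next to the endpoints. This is only a sketch: `SupportsSleepLevel`, `FakeAsyncOmni`, and `OtherClient` are hypothetical names, and the `Optional[int]` return type is an assumption.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class SupportsSleepLevel(Protocol):
    """Structural type for any client that can report its sleep level."""

    async def sleep_level(self) -> Optional[int]: ...


class FakeAsyncOmni:
    """Illustrative client that implements the method."""

    async def sleep_level(self) -> Optional[int]:
        return None


class OtherClient:
    """Illustrative client that does not implement the method."""


has_method = isinstance(FakeAsyncOmni(), SupportsSleepLevel)
lacks_method = isinstance(OtherClient(), SupportsSleepLevel)
```

Annotating the handler's client with such a protocol lets a type checker flag clients that lack the method. The runtime isinstance check only verifies that the attribute exists, not its signature.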

8. Minor: `level` query parameter is parsed from string without validation

level = raw_request.query_params.get("level", "1")
await engine_client.sleep(int(level))

If a user sends ?level=abc, this raises an unhandled ValueError inside the try block, which gets caught but returns a misleading "Failed to sleep engine" message. Consider validating the level explicitly and returning a 400 for invalid input.
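Explicit validation could look like the sketch below. The helper name `parse_level` is hypothetical, and the set of accepted levels (1 and 2) is an assumption that should be checked against what the engine actually supports.

```python
from typing import Optional, Tuple

# Assumption: these are the sleep levels the engine accepts.
VALID_LEVELS = (1, 2)


def parse_level(
    raw: Optional[str], default: int = 1
) -> Tuple[Optional[int], Optional[str]]:
    """Return (level, None) on success or (None, message) for a 400 response."""
    if raw is None:
        return default, None
    try:
        level = int(raw)
    except ValueError:
        return None, f"Invalid sleep level {raw!r}: expected an integer."
    if level not in VALID_LEVELS:
        return None, f"Invalid sleep level {level}: expected one of {VALID_LEVELS}."
    return level, None
```

The handler would return the message with a 400 status before entering the try block, so "Failed to sleep engine" is reserved for genuine engine failures.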

Overall the feature direction is good, but the inconsistencies and missing error handling need to be cleaned up before this is ready to merge.

Replacing with inline comments

lengrongfu · 2026-04-18T07:59:37Z

@lishunyang12 thanks for your comments. All the other suggested changes have been completed; only point 7 remains unaddressed. This is because the EngineClient ABC (Abstract Base Class) is defined within vLLM itself. Aside from modifying vLLM, do you have any other suggestions for how to proceed?

lengrongfu · 2026-04-22T07:15:52Z

#2022

add sleep and wake_up api in sleep model

482c47f

Signed-off-by: rongfu.leng <lenronfu@gmail.com>

lengrongfu requested a review from hsliuustc0106 as a code owner April 13, 2026 14:55

add /sleep_info api, remove /is_sleep api

97c932d

Signed-off-by: rongfu.leng <lenronfu@gmail.com>

yenuo26 reviewed Apr 16, 2026

lishunyang12 previously requested changes Apr 16, 2026

add unit test

1706b1e

Signed-off-by: rongfu.leng <lenronfu@gmail.com>

opt code and docs

e2047e2

Signed-off-by: rongfu.leng <lenronfu@gmail.com>

lengrongfu added 2 commits April 18, 2026 08:00

fix docs

7f66815

Signed-off-by: rongfu.leng <lenronfu@gmail.com>

fix docs

0e6a995

Signed-off-by: rongfu.leng <lenronfu@gmail.com>

lengrongfu closed this Apr 22, 2026

lengrongfu wants to merge 6 commits into vllm-project:main from lengrongfu:feat/add-sleep-api

4 participants
