Skip to content

add sleep and wake_up api in sleep model#2742

Closed
lengrongfu wants to merge 6 commits into
vllm-project:mainfrom
lengrongfu:feat/add-sleep-api
Closed

add sleep and wake_up api in sleep model#2742
lengrongfu wants to merge 6 commits into
vllm-project:mainfrom
lengrongfu:feat/add-sleep-api

Conversation

@lengrongfu

Copy link
Copy Markdown
Contributor

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Current wen can set Sleep Mode, but not api to set sleep and wake_up, so we need add it.

Test Plan

  1. start serve
$ python3 -m vllm_omni.entrypoints.cli.main serve /home/jovyan/Wan2.2-T2V-A14B-Diffusers/ --omni  --enable-sleep-mode
  1. call sleep api
$ curl -X POST 'localhost:8000/sleep'

we can look a log Sleep mode (process-scoped) freed 64.23 GiB memory, 0.71 GiB memory is still in use.

  1. call wake_up api
$ curl -X POST 'localhost:8000/wake_up'

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: rongfu.leng <lenronfu@gmail.com>
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@hsliuustc0106

Copy link
Copy Markdown
Collaborator

PR #2742 - add sleep and wake_up api

OVERALL: NO BLOCKERS (missing doc)
VERDICT: COMMENT

Correctness: PASS
Reliability: PASS
Breaking: PASS
Tests: PASS (curl commands present)
Documentation: ISSUES - missing API doc for /sleep and /wake_up
Security: PASS

Summary: Add /sleep and /wake_up API endpoints. 36 add. Test plan has curl commands, test result empty. Please update API documentation for new endpoints.

@hsliuustc0106

Copy link
Copy Markdown
Collaborator

BLOCKER scan:

  • Correctness: PASS
  • Reliability/Safety: PASS
  • Breaking Changes: PASS (new API)
  • Test Coverage: PASS
  • Documentation: PASS (docs updated)
  • Security: PASS

OVERALL: NO BLOCKERS

VERDICT: COMMENT

Simple and straightforward API additions. A few minor suggestions:

  1. Consider adding try-except blocks around the engine_client calls to return proper HTTP error codes instead of 500
  2. The tags parameter in wake_up() - when it's an empty list, you set it to None. This is correct but could be clarified with a comment
  3. Consider adding a GET endpoint to query the current sleep level, not just is_sleeping status

Signed-off-by: rongfu.leng <lenronfu@gmail.com>
)


@router.post("/sleep")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some unit test cases in tests/entrypoints?

@lishunyang12 lishunyang12 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of "add sleep and wake_up api in sleep model"

Thanks for adding the HTTP API surface for sleep/wake_up -- this is a useful feature. I have several concerns that should be addressed before merging.


1. Missing error handling in /sleep endpoint -- no return value on success

The /sleep handler catches exceptions and returns an error, but on the success path it returns nothing (implicitly None, which FastAPI turns into a 200 with no body). Compare with /wake_up which explicitly returns Response(status_code=200). Both endpoints should return an explicit, consistent response. Suggest returning a JSONResponse with a body like {"status": "ok"} from both, or at least a Response(status_code=200).

2. No guard for enable_sleep_mode

The sleep/wake_up/sleep_info endpoints are registered unconditionally on the router. If the server was started without --enable-sleep-mode, calling /sleep will still invoke engine_client.sleep() which may produce unexpected behavior or crash. The endpoints should either:

  • Be conditionally registered only when sleep mode is enabled, or
  • Check whether sleep mode is enabled and return 400 Bad Request with a clear message otherwise.

3. /wake_up has no error handling

Unlike /sleep and /sleep_info, the /wake_up handler has no try/except block. If engine_client.wake_up(tags) raises, the user gets an unhandled 500 with a stack trace. Please wrap it in the same pattern used by the other two endpoints.

4. /sleep_info crashes with AttributeError when engine_client is None

engine_client = getattr(raw_request.app.state, "engine_client", None)
level = await engine_client.sleep_level()  # AttributeError if None

If engine_client is None, the next line will raise AttributeError: 'NoneType' object has no attribute 'sleep_level'. Either remove the getattr fallback (since all other endpoints access raw_request.app.state.engine_client directly), or add a None check.

5. API inconsistency: docs say GET /is_sleeping, code provides GET /sleep_info

The documentation added to docs/features/sleep_mode.md advertises GET /is_sleeping, but the actual endpoint is GET /sleep_info. These must match. I'd suggest using /sleep_info since it returns more than a boolean.

6. _sleep_level sentinel value -1 is confusing

Using -1 to mean "not sleeping" is a magic number. The sleep_info endpoint returns -1 as the level when the engine is awake, which is not documented and confusing for API consumers. Consider using None (which serializes to null in JSON) as the "not sleeping" sentinel, which the docstring already describes as the intended behavior: "null -- engine is awake".

7. sleep_level() method is added but not declared in any abstract base / protocol

The new sleep_level() method is added to AsyncOmni but there is no corresponding declaration in the EngineClient protocol or abstract base class (if one exists). This means type checkers won't catch missing implementations in other engine client classes. Please add it to the relevant protocol/ABC as well.

8. Minor: level query parameter is parsed from string without validation

level = raw_request.query_params.get("level", "1")
await engine_client.sleep(int(level))

If a user sends ?level=abc, this raises an unhandled ValueError inside the try block, which gets caught but returns a misleading "Failed to sleep engine" message. Consider validating the level explicitly and returning a 400 for invalid input.


Overall the feature direction is good, but the inconsistencies and missing error handling need to be cleaned up before this is ready to merge.

Signed-off-by: rongfu.leng <lenronfu@gmail.com>
@lishunyang12 lishunyang12 dismissed their stale review April 16, 2026 14:56

Replacing with inline comments

Signed-off-by: rongfu.leng <lenronfu@gmail.com>
@lengrongfu

Copy link
Copy Markdown
Contributor Author

@lishunyang12 thanks you commens, all other suggested changes have been completed; only point 7 remains unaddressed. This is because the EngineClient ABC (Abstract Base Class) is defined within vLLM itself. Aside from modifying vLLM, do you have any other suggestions for how to proceed?

Signed-off-by: rongfu.leng <lenronfu@gmail.com>
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
@lengrongfu

Copy link
Copy Markdown
Contributor Author

#2022

@lengrongfu lengrongfu closed this Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants