add sleep and wake_up api in sleep model#2742
Conversation
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
PR #2742 - add sleep and wake_up api OVERALL: NO BLOCKERS (missing doc) Correctness: PASS Summary: Add /sleep and /wake_up API endpoints. 36 add. Test plan has curl commands, test result empty. Please update API documentation for new endpoints. |
|
BLOCKER scan:
OVERALL: NO BLOCKERS VERDICT: COMMENT Simple and straightforward API additions. A few minor suggestions:
|
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
| ) | ||
|
|
||
|
|
||
| @router.post("/sleep") |
There was a problem hiding this comment.
Could you add some unit test cases in tests/entrypoints?
lishunyang12
left a comment
There was a problem hiding this comment.
Review of "add sleep and wake_up api in sleep model"
Thanks for adding the HTTP API surface for sleep/wake_up -- this is a useful feature. I have several concerns that should be addressed before merging.
1. Missing error handling in /sleep endpoint -- no return value on success
The /sleep handler catches exceptions and returns an error, but on the success path it returns nothing (implicitly None, which FastAPI turns into a 200 with no body). Compare with /wake_up which explicitly returns Response(status_code=200). Both endpoints should return an explicit, consistent response. Suggest returning a JSONResponse with a body like {"status": "ok"} from both, or at least a Response(status_code=200).
2. No guard for enable_sleep_mode
The sleep/wake_up/sleep_info endpoints are registered unconditionally on the router. If the server was started without --enable-sleep-mode, calling /sleep will still invoke engine_client.sleep() which may produce unexpected behavior or crash. The endpoints should either:
- Be conditionally registered only when sleep mode is enabled, or
- Check whether sleep mode is enabled and return
400 Bad Requestwith a clear message otherwise.
3. /wake_up has no error handling
Unlike /sleep and /sleep_info, the /wake_up handler has no try/except block. If engine_client.wake_up(tags) raises, the user gets an unhandled 500 with a stack trace. Please wrap it in the same pattern used by the other two endpoints.
4. /sleep_info crashes with AttributeError when engine_client is None
engine_client = getattr(raw_request.app.state, "engine_client", None)
level = await engine_client.sleep_level() # AttributeError if NoneIf engine_client is None, the next line will raise AttributeError: 'NoneType' object has no attribute 'sleep_level'. Either remove the getattr fallback (since all other endpoints access raw_request.app.state.engine_client directly), or add a None check.
5. API inconsistency: docs say GET /is_sleeping, code provides GET /sleep_info
The documentation added to docs/features/sleep_mode.md advertises GET /is_sleeping, but the actual endpoint is GET /sleep_info. These must match. I'd suggest using /sleep_info since it returns more than a boolean.
6. _sleep_level sentinel value -1 is confusing
Using -1 to mean "not sleeping" is a magic number. The sleep_info endpoint returns -1 as the level when the engine is awake, which is not documented and confusing for API consumers. Consider using None (which serializes to null in JSON) as the "not sleeping" sentinel, which the docstring already describes as the intended behavior: "null -- engine is awake".
7. sleep_level() method is added but not declared in any abstract base / protocol
The new sleep_level() method is added to AsyncOmni but there is no corresponding declaration in the EngineClient protocol or abstract base class (if one exists). This means type checkers won't catch missing implementations in other engine client classes. Please add it to the relevant protocol/ABC as well.
8. Minor: level query parameter is parsed from string without validation
level = raw_request.query_params.get("level", "1")
await engine_client.sleep(int(level))If a user sends ?level=abc, this raises an unhandled ValueError inside the try block, which gets caught but returns a misleading "Failed to sleep engine" message. Consider validating the level explicitly and returning a 400 for invalid input.
Overall the feature direction is good, but the inconsistencies and missing error handling need to be cleaned up before this is ready to merge.
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
|
@lishunyang12 thanks you commens, all other suggested changes have been completed; only point 7 remains unaddressed. This is because the |
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
Current wen can set Sleep Mode, but not api to set sleep and wake_up, so we need add it.
Test Plan
we can look a log
Sleep mode (process-scoped) freed 64.23 GiB memory, 0.71 GiB memory is still in use.Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)