[Core] Add sleep level 0 mode with enqueue/wait pattern #33195

Merged
zhuohan123 merged 5 commits into vllm-project:main from jaewonlee-fb:sleep-level-0
Feb 13, 2026
@jaewonlee-fb jaewonlee-fb commented Jan 27, 2026

Summary

  • Add level 0 sleep mode that pauses scheduling. See details below. We believe this is a useful feature for precise scheduling of requests and debugging.
  • Add enqueue() and wait_for_completion() methods to the offline LLM class for explicit control over request scheduling

Level 0 Sleep

  • Pauses scheduling but keeps accepting requests
  • No GPU memory changes (unlike level 1/2)
  • Wake up with tags=["scheduling"] to resume

Use Case

Enables batched inference patterns where all requests are queued first, then processed together.
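
The queue-first, process-later pattern can be sketched with a small stand-in class. This is a toy model, not vLLM code: `ToyLLM`, its internals, and the string outputs are hypothetical. Only the method names `enqueue()`, `wait_for_completion()`, `sleep(level=0)` and `wake_up(tags=["scheduling"])` come from this PR's description, and the real signatures and blocking behavior may differ.

```python
from collections import deque


class ToyLLM:
    """Hypothetical stand-in that mimics the enqueue/wait pattern."""

    def __init__(self):
        self.pending = deque()  # accepted but not yet processed
        self.scheduling_paused = False

    def sleep(self, level=0):
        # Only level 0 is modeled: pause scheduling, touch nothing else.
        if level == 0:
            self.scheduling_paused = True

    def wake_up(self, tags=None):
        # Resume scheduling when the "scheduling" tag is requested
        # (treating tags=None as "wake everything" is an assumption).
        if tags is None or "scheduling" in tags:
            self.scheduling_paused = False

    def enqueue(self, prompts):
        # Requests are accepted even while scheduling is paused.
        self.pending.extend(prompts)
        return len(self.pending)

    def wait_for_completion(self):
        # Toy choice: raise instead of blocking while paused.
        if self.scheduling_paused:
            raise RuntimeError("scheduling is paused; call wake_up() first")
        outputs = [f"output for {p!r}" for p in self.pending]
        self.pending.clear()
        return outputs


llm = ToyLLM()
llm.sleep(level=0)                      # pause scheduling
llm.enqueue(["prompt A", "prompt B"])   # still accepted
llm.enqueue(["prompt C"])
llm.wake_up(tags=["scheduling"])        # resume
results = llm.wait_for_completion()     # all three processed together
print(len(results))
```

In this sketch nothing is processed until `wake_up()` is called, so all three prompts reach the processing step as one batch.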

Test plan

No-op by default; it could be used with the offline-inference LLM class as a start.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces a new 'sleep level 0' mode, which allows pausing the engine's scheduling without offloading model weights or KV cache from GPU memory. This is implemented by introducing a scheduling_paused flag in EngineCore and modifying the step, sleep, wake_up, run_busy_loop, and _process_input_queue methods to respect this state. The LLM.generate method is refactored to use enqueue and wait_for_completion for a more flexible request handling pattern. The changes appear to correctly implement the intended functionality, providing a new mechanism for fine-grained control over engine activity without incurring the overhead of full memory offload. No critical or high-severity issues were identified.
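
The engine-side behavior summarized in this review can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyEngineCore`, `add_request()`, and `run_until_idle()` are hypothetical, and the method bodies do not reproduce vLLM's `EngineCore`. The sketch only shows the idea of a `scheduling_paused` flag that lets input processing continue while `step()` does no scheduling work.

```python
import queue


class ToyEngineCore:
    """Hypothetical model of a core loop with a scheduling_paused flag."""

    def __init__(self):
        self.input_queue = queue.Queue()  # requests arriving from clients
        self.waiting = []                 # requests known to the scheduler
        self.finished = []
        self.scheduling_paused = False

    def add_request(self, request_id):
        self.input_queue.put(request_id)

    def _process_input_queue(self):
        # Inputs are drained regardless of the pause flag, so requests
        # keep being accepted while scheduling is paused.
        while not self.input_queue.empty():
            self.waiting.append(self.input_queue.get_nowait())

    def step(self):
        # While paused (or with nothing waiting), report no work done.
        if self.scheduling_paused or not self.waiting:
            return False
        self.finished.append(self.waiting.pop(0))
        return True

    def run_until_idle(self, max_iters=100):
        # Bounded stand-in for a busy loop; returns the number of steps run.
        steps = 0
        for _ in range(max_iters):
            self._process_input_queue()
            if not self.step():
                break
            steps += 1
        return steps


core = ToyEngineCore()
core.scheduling_paused = True
for rid in ("req-0", "req-1", "req-2"):
    core.add_request(rid)
paused_steps = core.run_until_idle()    # requests accepted, none scheduled
core.scheduling_paused = False
resumed_steps = core.run_until_idle()   # queued requests are now processed
```

While the flag is set, the three requests move from the input queue to the waiting list but no step runs; clearing the flag lets the loop process them in arrival order.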

mergify bot commented Jan 27, 2026

Hi @jaewonlee-fb, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

@jaewonlee-fb jaewonlee-fb force-pushed the sleep-level-0 branch 2 times, most recently from e6a63aa to 63a8b06 Compare January 27, 2026 21:31
mergify bot commented Jan 31, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jaewonlee-fb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot commented Feb 2, 2026

Hi @jaewonlee-fb, the pre-commit checks have failed.

CPU memory pressure.
"""
self.reset_prefix_cache()
if level > 0:

What's the behavior of level 0?

Will this cause any breakage if users were already using level 0 before?


return self.wait_for_completion(use_tqdm=use_tqdm)

def enqueue(

Where do we expect to call this function?

@houseroad houseroad left a comment

Could you explain the usage of these new functions?

@houseroad houseroad added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Feb 4, 2026
mergify bot commented Feb 6, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jaewonlee-fb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot commented Feb 6, 2026

Hi @jaewonlee-fb, the pre-commit checks have failed.

mergify bot commented Feb 7, 2026

Hi @jaewonlee-fb, the pre-commit checks have failed.

mergify bot commented Feb 9, 2026

Hi @jaewonlee-fb, the pre-commit checks have failed.

mergify bot commented Feb 10, 2026

Hi @jaewonlee-fb, the pre-commit checks have failed.

@jaewonlee-fb jaewonlee-fb force-pushed the sleep-level-0 branch 2 times, most recently from 7dfe329 to cc187a3 Compare February 10, 2026 21:19
Add level 0 sleep mode that pauses scheduling without touching GPU memory.
This enables batched inference patterns where all requests are queued first,
then processed together. Also adds enqueue() and wait_for_completion() methods
to LLM class for explicit control over request scheduling.

Level 0 sleep:
- Pauses scheduling but keeps accepting requests
- No GPU memory changes (unlike level 1/2)
- Wake up with tags=["scheduling"] to resume

Also adds profile_prefix parameter to start_profile() for custom trace naming.

Signed-off-by: Jaewon Lee <jaewon@meta.com>
Signed-off-by: Jaewon Lee <jaewon@meta.com>
Level 0 sleep should only pause scheduling without any side effects.
The sync path (llm.py) correctly guards reset_prefix_cache with
`if level > 0:`, but the async path was missing this check.

Signed-off-by: Jaewon Lee <jaewon@meta.com>
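
The guard described in this commit message can be sketched as follows. This is a toy illustration under stated assumptions, not the code from llm.py or the async path: `ToySleeper` and its counter are hypothetical, and only the `if level > 0:` check around `reset_prefix_cache()` is taken from the PR.

```python
class ToySleeper:
    """Hypothetical object showing the level guard around the cache reset."""

    def __init__(self):
        self.prefix_cache_resets = 0
        self.scheduling_paused = False

    def reset_prefix_cache(self):
        # Count calls so the side effect is observable in this sketch.
        self.prefix_cache_resets += 1

    def sleep(self, level=1):
        if level > 0:
            # Levels 1/2 change GPU memory, so the prefix cache is reset.
            self.reset_prefix_cache()
        else:
            # Level 0 only pauses scheduling; no other side effects.
            self.scheduling_paused = True


sleeper = ToySleeper()
sleeper.sleep(level=0)
resets_after_level0 = sleeper.prefix_cache_resets
sleeper.sleep(level=1)
resets_after_level1 = sleeper.prefix_cache_resets
```

Applying the same check in both the sync and async paths keeps level 0 free of cache side effects regardless of which entry point is used.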
Signed-off-by: Jaewon Lee <jaewon@meta.com>
@njhill njhill self-requested a review February 12, 2026 21:57
@zhuohan123 zhuohan123 merged commit aa181c9 into vllm-project:main Feb 13, 2026
46 of 50 checks passed
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Feb 19, 2026
…#33195)

Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
…#33195)

Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
…#33195)

Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed), v1
