[Core] Add sleep level 0 mode with enqueue/wait pattern #33195
zhuohan123 merged 5 commits into vllm-project:main
Conversation
Code Review
The pull request introduces a new 'sleep level 0' mode, which allows pausing the engine's scheduling without offloading model weights or KV cache from GPU memory. This is implemented by introducing a scheduling_paused flag in EngineCore and modifying the step, sleep, wake_up, run_busy_loop, and _process_input_queue methods to respect this state. The LLM.generate method is refactored to use enqueue and wait_for_completion for a more flexible request handling pattern. The changes appear to correctly implement the intended functionality, providing a new mechanism for fine-grained control over engine activity without incurring the overhead of full memory offload. No critical or high-severity issues were identified.
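The pause/resume flow described in the review can be sketched with a toy model. The names below loosely mirror the PR description (scheduling_paused, step, sleep, wake_up); the class and its bodies are invented for illustration and are not vLLM's actual implementation:

```python
from collections import deque

class MiniEngineCore:
    """Toy model of an engine core with a 'sleep level 0' pause flag."""

    def __init__(self):
        self.scheduling_paused = False
        self.waiting = deque()   # requests accepted but not yet scheduled
        self.finished = []

    def add_request(self, req):
        # Requests are always accepted, even while paused.
        self.waiting.append(req)

    def sleep(self, level=1):
        if level == 0:
            # Level 0: pause scheduling only; no weight/KV-cache offload.
            self.scheduling_paused = True
        else:
            raise NotImplementedError("levels 1/2 offload memory (not modeled)")

    def wake_up(self, tags=None):
        if tags is None or "scheduling" in tags:
            self.scheduling_paused = False

    def step(self):
        # While paused, step() is a no-op: nothing gets scheduled.
        if self.scheduling_paused or not self.waiting:
            return None
        req = self.waiting.popleft()
        self.finished.append(req)
        return req

engine = MiniEngineCore()
engine.sleep(level=0)
engine.add_request("r1")
engine.add_request("r2")
assert engine.step() is None       # paused: requests queue up, nothing runs
engine.wake_up(tags=["scheduling"])
while engine.step():               # drain the whole queue as one batch
    pass
print(engine.finished)             # ['r1', 'r2']
```

The key property the review calls out is visible here: unlike levels 1/2, nothing is offloaded, so sleeping and waking carry no memory-movement overhead.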
Hi @jaewonlee-fb, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Force-pushed from e6a63aa to 63a8b06
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 63a8b06 to 7937cde
    CPU memory pressure.
    """
    self.reset_prefix_cache()
    if level > 0:
What's the behavior of level 0?
Will this cause any breakage if users used level 0 before?
    return self.wait_for_completion(use_tqdm=use_tqdm)
    def enqueue(
where do we expect to call this function?
houseroad left a comment:
Could you explain the usage of these new functions?
Force-pushed from 014e33a to 611888e
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 86d66c1 to f5a74b1
Force-pushed from f5a74b1 to f1ece8b
Force-pushed from b2ac14a to 65fca50
Force-pushed from 7dfe329 to cc187a3
Add level 0 sleep mode that pauses scheduling without touching GPU memory. This enables batched inference patterns where all requests are queued first, then processed together. Also adds enqueue() and wait_for_completion() methods to the LLM class for explicit control over request scheduling.

Level 0 sleep:
- Pauses scheduling but keeps accepting requests
- No GPU memory changes (unlike level 1/2)
- Wake up with tags=["scheduling"] to resume

Also adds a profile_prefix parameter to start_profile() for custom trace naming.

Signed-off-by: Jaewon Lee <jaewon@meta.com>
Signed-off-by: Jaewon Lee <jaewon@meta.com>
Level 0 sleep should only pause scheduling without any side effects. The sync path (llm.py) correctly guards reset_prefix_cache with `if level > 0:`, but the async path was missing this check. Signed-off-by: Jaewon Lee <jaewon@meta.com>
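The fix described above can be illustrated with a toy stand-in (invented class and counters; only the `if level > 0:` guard around `reset_prefix_cache` comes from the commit message):

```python
class SleepPathDemo:
    """Toy illustration of guarding side effects behind the sleep level."""

    def __init__(self):
        self.cache_resets = 0
        self.paused = False

    def reset_prefix_cache(self):
        self.cache_resets += 1

    def sleep(self, level: int = 1):
        # Level 0 must have no side effects beyond pausing scheduling,
        # so the prefix-cache reset is guarded in both sync and async paths.
        if level > 0:
            self.reset_prefix_cache()
        self.paused = True

d = SleepPathDemo()
d.sleep(level=0)
assert d.cache_resets == 0   # level 0: scheduling paused, cache untouched
d.sleep(level=1)
assert d.cache_resets == 1   # level >= 1: cache reset before offload
```

Without the guard on the async path, a level 0 sleep would silently drop the prefix cache even though no memory was offloaded, which is exactly the asymmetry the commit removes.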
Signed-off-by: Jaewon Lee <jaewon@meta.com>
Force-pushed from cc187a3 to b5fcb4c
…#33195) Signed-off-by: Jaewon Lee <jaewon@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
Summary

- Adds a level 0 sleep mode that pauses scheduling without touching GPU memory
- Adds enqueue() and wait_for_completion() methods to the offline LLM class for explicit request scheduling control

Level 0 Sleep

- Pauses scheduling but keeps accepting requests
- No GPU memory changes (unlike level 1/2)
- Wake up with tags=["scheduling"] to resume

Use Case

Enables batched inference patterns where all requests are queued first, then processed together.
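The enqueue-then-wait pattern can be sketched with a stub in place of the real LLM class. The method names (enqueue, wait_for_completion, generate) follow the PR summary; the bodies here are invented, since the real methods drive the engine rather than transform strings:

```python
class StubLLM:
    """Toy stand-in for the offline LLM class, showing how generate()
    decomposes into enqueue() + wait_for_completion()."""

    def __init__(self):
        self._pending = []

    def enqueue(self, prompts):
        # Queue requests without running the engine.
        self._pending.extend(prompts)

    def wait_for_completion(self, use_tqdm=False):
        # Process everything queued so far as one batch.
        outputs = [p.upper() for p in self._pending]  # fake "generation"
        self._pending.clear()
        return outputs

    def generate(self, prompts, use_tqdm=False):
        # The refactored generate() is just enqueue + wait.
        self.enqueue(prompts)
        return self.wait_for_completion(use_tqdm=use_tqdm)

llm = StubLLM()
llm.enqueue(["hello"])        # queue from multiple call sites...
llm.enqueue(["world"])
print(llm.wait_for_completion())   # ['HELLO', 'WORLD']
```

This split is what makes the pattern useful with level 0 sleep: requests can accumulate while scheduling is paused, then run as a single batch after wake_up.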
Test plan
No-op by default; could be used by the offline inference LLM class as a start.