[MISC][Bugfix] Use less CPU when message queue has been empty for some time#16226
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
6d50290 to
24bf316
Compare
I was experiencing 100% CPU load on 0.8.3; after applying this PR it's now fixed. Thank you.
Any chance this can get merged?
24bf316 to
2eea42d
Compare
Rebased on latest
2eea42d to
5aa28ba
Compare
@youkaichao, @WoosukKwon, @mgoin Could you please take a look? I cannot request reviews, so I'm pinging directly. Thank you very much.
5aa28ba to
447a8f4
Compare
+1 to this. I'm using
@RawthiL FYI you can apply this PR to a recent release using Docker as follows. Dockerfile: vllm.patch can be generated by checking out this branch and doing In docker-compose, add the following for the vllm service: EDIT: Attached the patch as a file, as the command to generate it is now more complex.
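For illustration only (the actual attachment from this comment is not reproduced here), a patched image could be built along these lines; the base image tag, install-path lookup, and patch strip level are assumptions, not the commenter's exact files:

```dockerfile
# Illustrative sketch; base image tag and install path are assumptions.
FROM vllm/vllm-openai:v0.8.3

# vllm.patch: produced from the PR branch, e.g. `git diff main...HEAD > vllm.patch`
COPY vllm.patch /tmp/vllm.patch

# Apply the patch to the installed vllm package (path and strip level may differ).
RUN cd "$(python3 -c 'import vllm, os; print(os.path.dirname(vllm.__file__))')" \
    && patch -p2 < /tmp/vllm.patch
```

In docker-compose, the vllm service would then use `build: .` (or an image built from this Dockerfile) instead of pulling the stock image.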
+1
any update on this @youkaichao @WoosukKwon @mgoin?
+1 |
3 similar comments
+1
+1
+1
Could you change the behavior so it is off by default, preserving the original behavior? I don't think we want to disturb existing deployments that rely on fast responses regardless of continuous load. I think it is okay to add an opt-in config
Agree with @mgoin that we can provide an option to enable the behavior, but we should not use it by default, since normally we expect the server to be always busy (if it is idle for quite a long time, an upper-level monitor should shut down the server).
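A minimal sketch of the opt-in idle backoff under discussion; the function name, threshold, and sleep interval below are illustrative, not the PR's actual code:

```python
import time

def recv_with_idle_backoff(try_recv, *, idle_threshold_s=10.0, max_sleep_s=0.1):
    """Poll try_recv() until it yields a message.

    While messages are arriving, spin exactly as before. Once the queue
    has been empty for longer than idle_threshold_s, sleep max_sleep_s
    between polls, so the extra wake-up latency is bounded by max_sleep_s.
    """
    idle_since = time.monotonic()
    while True:
        msg = try_recv()
        if msg is not None:
            return msg
        if time.monotonic() - idle_since > idle_threshold_s:
            time.sleep(max_sleep_s)
```

With the defaults above, a fully idle worker wakes at most ten times per second instead of spinning at 100% CPU, while a busy worker never sleeps.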
I was wondering whether we can just use zmq instead of the shm spin there? (It already supports both and falls back to zmq for large messages or remote cases anyhow.) I suspect the perf difference may not be noticeable. And there could also be a config toggle for that in case there is any difference in latency.
@youkaichao Consider our use case: the vLLM server gets a few thousand queries a day, mostly in working hours, and the average inference takes 5 seconds, so a few milliseconds more latency has no relevance. Starting vLLM takes more than a minute, so shutting it down is not an option. Saturating 2 CPU cores increases the system's power usage by 2.5x.
+1 |
Will do, thanks. |
Using zmq does not necessarily mean that there won't be a busy loop. I fixed a very similar issue in sglang's zmq code path (sgl-project/sglang#6026).
@p12tic there won't be any busy loop if zmq is used properly. Maybe they were reading nonblocking in a busy loop but that isn't what we do anywhere. I think we should just switch to that unless there's a measurable performance difference to using shm. |
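To illustrate the point being made here: a properly blocking recv parks the waiting thread in the kernel instead of spinning. The sketch below uses a stdlib socketpair rather than zmq, but a blocking `zmq.Socket.recv()` behaves the same way, consuming no CPU while idle:

```python
import socket
import threading
import time

# A blocking recv consumes no CPU while waiting for data, unlike a
# nonblocking recv retried in a tight loop (the busy wait this PR addresses).
a, b = socket.socketpair()

def sender() -> None:
    time.sleep(0.05)  # simulate a producer that is idle for a while
    a.sendall(b"ready")

threading.Thread(target=sender, daemon=True).start()
data = b.recv(1024)  # parks in the kernel until data arrives; no spinning
```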
447a8f4 to
dcae20d
Compare
@p12tic are you ok with updating this to use an env var instead of a CLI arg?
Sorry, I only added a reaction to your comment.
In setups with long inactivity periods it is desirable to reduce system power consumption when vllm is doing nothing. The simplest approach is to reduce the polling frequency when there has been no activity for a certain period of time. Fixes: vllm-project#14799 Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
58d4caa to
219f47a
Compare
@njhill Sorry for the delay again; I've implemented and tested the env-variable-based solution.
Actually @p12tic would you be ok with adding a test for this? It can just be a copy of one of the existing simple tests but with the env var set; otherwise we're not exercising that path at all.
219f47a to
62eb078
Compare
Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
62eb078 to
2a1d615
Compare
@njhill Done, let me know if I should have added the test in a more appropriate place.
@p12tic thank you so much for this patch. I've been following this thread, and your patience and persistence are greatly appreciated. Thanks to the vLLM team for merging this patch; I believe it will be beneficial for a lot of users who have smaller deployments.
[MISC][Bugfix] Use less CPU when message queue has been empty for some time (vllm-project#16226) Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
In setups with long inactivity periods it is desirable to reduce system power consumption when vLLM is doing nothing. This would leave more CPU thermal headroom when a request eventually comes, especially when multiple GPUs are connected, as each GPU would otherwise pin one thread at 100% CPU usage.
I didn't include any configuration knobs because I couldn't think of anyone who would be adversely impacted by this change. It introduces a maximum additional latency of 100 ms when vLLM workers are inactive for 10 seconds or more. Someone running an idle vLLM instance likely wouldn't care about a small amount of additional latency. But please let me know if you think otherwise. EDIT: The final version of the PR adds a new environment variable, VLLM_SLEEP_WHEN_IDLE, to enable the CPU usage reduction. Set VLLM_SLEEP_WHEN_IDLE=1 to reduce power consumption at the expense of a small amount of additional latency.
FIX #14799, #16968, #16660
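Conceptually, the opt-in works like the sketch below; the constant values and helper name are hypothetical, and only the VLLM_SLEEP_WHEN_IDLE variable name comes from the PR:

```python
import os
import time

# Hypothetical constants; the merged implementation may use different values.
IDLE_THRESHOLD_S = 10.0  # start sleeping only after this much inactivity
MAX_SLEEP_S = 0.1        # bounds the extra wake-up latency to ~100 ms

def maybe_sleep(idle_for_s: float) -> None:
    """Sleep between queue polls only when opted in and idle long enough."""
    if os.getenv("VLLM_SLEEP_WHEN_IDLE", "0") == "1" and idle_for_s > IDLE_THRESHOLD_S:
        time.sleep(MAX_SLEEP_S)
```

With the variable unset, the polling loop behaves exactly as before, matching the maintainers' request that the new behavior be off by default.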