[CI][CPU] Fix worker shutdown crash on CPU backend by haosdent · Pull Request #41130 · vllm-project/vllm

haosdent · 2026-04-28T12:09:29Z

Purpose

Fix a CPU CI regression introduced by #38503. The new Worker.shutdown() → model_runner.shutdown() → _cleanup_profiling_kv_cache() path calls torch.accelerator.synchronize() and torch.accelerator.empty_cache() directly. Both raise "RuntimeError: Cannot access accelerator device when none is available." on CPU-only hosts, so every CPU worker crashes on teardown and the suite times out (Buildkite #63257, exit 124).

Route both calls through the existing _sync_device() virtualization hook plus a matching new _empty_cache() hook, with no-op overrides on CPUModelRunner. XPU is unaffected (torch.accelerator works there).
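The hook pattern described above can be illustrated with a minimal, torch-free sketch. It is not the vLLM code: FakeAccelerator, GPUModelRunnerSketch, CPUModelRunnerSketch and the profiling_kv_cache attribute are hypothetical stand-ins, and the order of sync, free, and empty-cache inside _cleanup_profiling_kv_cache() is an assumption. In the real change, the base hooks wrap torch.accelerator.synchronize() and torch.accelerator.empty_cache() in GPUModelRunner, and CPUModelRunner overrides them as no-ops.

```python
from __future__ import annotations


class FakeAccelerator:
    """Hypothetical stand-in for torch.accelerator.

    Raises the same RuntimeError message quoted in the PR when no device exists.
    """

    def __init__(self, available: bool) -> None:
        self.available = available
        self.calls: list[str] = []  # records which device calls succeeded

    def _check(self) -> None:
        if not self.available:
            raise RuntimeError(
                "Cannot access accelerator device when none is available."
            )

    def synchronize(self) -> None:
        self._check()
        self.calls.append("synchronize")

    def empty_cache(self) -> None:
        self._check()
        self.calls.append("empty_cache")


class GPUModelRunnerSketch:
    """Base runner: cleanup touches the device only through overridable hooks."""

    def __init__(self, accelerator: FakeAccelerator) -> None:
        self.accelerator = accelerator
        self.profiling_kv_cache: list[object] | None = [object()]  # hypothetical state

    def _sync_device(self) -> None:
        self.accelerator.synchronize()

    def _empty_cache(self) -> None:
        self.accelerator.empty_cache()

    def _cleanup_profiling_kv_cache(self) -> None:
        # Calling the hooks instead of the accelerator directly lets
        # backends without a device opt out. Ordering is an assumption.
        self._sync_device()
        self.profiling_kv_cache = None
        self._empty_cache()

    def shutdown(self) -> None:
        self._cleanup_profiling_kv_cache()


class CPUModelRunnerSketch(GPUModelRunnerSketch):
    """CPU runner: there is no accelerator to sync or free, so both hooks are no-ops."""

    def _sync_device(self) -> None:
        pass

    def _empty_cache(self) -> None:
        pass


if __name__ == "__main__":
    # On a CPU-only host the CPU runner now shuts down cleanly.
    CPUModelRunnerSketch(FakeAccelerator(available=False)).shutdown()
```

Because XPU inherits the base hooks unchanged, only backends that lack a usable torch.accelerator device need overrides.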

Test Plan

Test Result

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request introduces a new private method, _empty_cache, to both CPUModelRunner and GPUModelRunner classes for standardized cache management. The GPUModelRunner's implementation wraps torch.accelerator.empty_cache(), while the CPU version is a no-op. Additionally, the _cleanup_profiling_kv_cache method in GPUModelRunner is refactored to utilize these new and existing private methods (_empty_cache and _sync_device), abstracting direct calls to torch.accelerator functions for improved encapsulation and consistency. There is no feedback to provide.

PR vllm-project#38503 added a model_runner.shutdown() call from Worker.shutdown(), which invokes _cleanup_profiling_kv_cache() containing direct torch.accelerator.synchronize() and torch.accelerator.empty_cache() calls. On CPU-only hosts these raise "Cannot access accelerator device when none is available." causing every CPU CI worker to crash on teardown and the suite to time out. Route both calls through the existing _sync_device() virtualization hook plus a matching new _empty_cache() hook, with no-op overrides on CPUModelRunner. XPU keeps inheriting the GPU defaults since torch.accelerator works there. Signed-off-by: haosdent <haosdent@gmail.com>

bigPYJ1151 · 2026-04-29T05:18:43Z

Thanks for the fix :) But it duplicates with #41034

haosdent · 2026-04-29T05:23:08Z

thanks @bigPYJ1151

mergify Bot added the cpu (Related to CPU backends), v1, and bug (Something isn't working) labels Apr 28, 2026

haosdent changed the title ~~[WIP][Bugfix][CPU] Fix worker shutdown crash on CPU backend~~ [Bugfix][CPU] Fix worker shutdown crash on CPU backend Apr 28, 2026

haosdent marked this pull request as ready for review April 28, 2026 12:13

haosdent requested review from bigPYJ1151, njhill and xuechendi as code owners April 28, 2026 12:13

claude Bot reviewed Apr 28, 2026


haosdent changed the title ~~[Bugfix][CPU] Fix worker shutdown crash on CPU backend~~ [CI][CPU] Fix worker shutdown crash on CPU backend Apr 28, 2026

gemini-code-assist Bot reviewed Apr 28, 2026


haosdent force-pushed the ci-8b99b352 branch from 916719f to c1ad777 April 28, 2026 13:10

haosdent mentioned this pull request Apr 29, 2026

[CI] De-flake test_chat_completion_n_parameter_non_streaming #41147

Merged

bigPYJ1151 closed this Apr 29, 2026

haosdent wants to merge 1 commit into vllm-project:main from haosdent:ci-8b99b352
2 participants