[CI][CPU] Fix worker shutdown crash on CPU backend#41130

Closed
haosdent wants to merge 1 commit intovllm-project:mainfrom
haosdent:ci-8b99b352

Conversation

@haosdent (Contributor) commented Apr 28, 2026

Purpose

Fix CPU CI regression from #38503. The new Worker.shutdown() → model_runner.shutdown() → _cleanup_profiling_kv_cache() path calls torch.accelerator.synchronize() / empty_cache() directly, which raise RuntimeError: Cannot access accelerator device when none is available. on CPU-only hosts. Every CPU worker crashes on teardown and the suite times out (Buildkite #63257, exit 124).

Route both calls through the existing _sync_device() virtualization hook plus a matching new _empty_cache() hook, with no-op overrides on CPUModelRunner. XPU is unaffected (torch.accelerator works there).
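The hook pattern described above can be sketched roughly as follows. The class and hook names (`GPUModelRunner`, `CPUModelRunner`, `_sync_device`, `_empty_cache`, `_cleanup_profiling_kv_cache`) come from the PR description; the method bodies are illustrative, not vLLM's actual implementation:

```python
class GPUModelRunner:
    """Sketch of the virtualization-hook pattern from the PR description.

    Backend-specific device interaction lives in small overridable
    hooks instead of direct torch.accelerator calls.
    """

    def _sync_device(self) -> None:
        # Existing hook: wait for outstanding accelerator work.
        # (torch imported locally here so the CPU subclass never needs it.)
        import torch
        torch.accelerator.synchronize()

    def _empty_cache(self) -> None:
        # New hook: release cached accelerator memory.
        import torch
        torch.accelerator.empty_cache()

    def _cleanup_profiling_kv_cache(self) -> None:
        # Route through the hooks rather than calling torch.accelerator
        # directly, so backend subclasses can override device behavior.
        self._sync_device()
        self._empty_cache()


class CPUModelRunner(GPUModelRunner):
    # CPU-only hosts have no accelerator device; no-op overrides mean
    # shutdown no longer raises "Cannot access accelerator device".
    def _sync_device(self) -> None:
        pass

    def _empty_cache(self) -> None:
        pass
```

XPU keeps inheriting the GPU defaults, since `torch.accelerator` works there.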

Test Plan

Test Result

@mergify (Bot) added labels cpu (Related to CPU backends), v1, and bug (Something isn't working) on Apr 28, 2026
@haosdent changed the title [WIP][Bugfix][CPU] Fix worker shutdown crash on CPU backend → [Bugfix][CPU] Fix worker shutdown crash on CPU backend on Apr 28, 2026
@haosdent marked this pull request as ready for review April 28, 2026 12:13

@claude (Bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@haosdent changed the title [Bugfix][CPU] Fix worker shutdown crash on CPU backend → [CI][CPU] Fix worker shutdown crash on CPU backend on Apr 28, 2026

@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces a new private method, _empty_cache, to both CPUModelRunner and GPUModelRunner classes for standardized cache management. The GPUModelRunner's implementation wraps torch.accelerator.empty_cache(), while the CPU version is a no-op. Additionally, the _cleanup_profiling_kv_cache method in GPUModelRunner is refactored to utilize these new and existing private methods (_empty_cache and _sync_device), abstracting direct calls to torch.accelerator functions for improved encapsulation and consistency. There is no feedback to provide.

PR vllm-project#38503 added a model_runner.shutdown() call from Worker.shutdown(),
which invokes _cleanup_profiling_kv_cache() containing direct
torch.accelerator.synchronize() and torch.accelerator.empty_cache()
calls. On CPU-only hosts these raise "Cannot access accelerator device
when none is available.", causing every CPU CI worker to crash on
teardown and the suite to time out.

Route both calls through the existing _sync_device() virtualization
hook plus a matching new _empty_cache() hook, with no-op overrides on
CPUModelRunner. XPU keeps inheriting the GPU defaults since
torch.accelerator works there.
Signed-off-by: haosdent <haosdent@gmail.com>
@bigPYJ1151 (Member) commented:

Thanks for the fix :) But it duplicates with #41034

@bigPYJ1151 closed this Apr 29, 2026
@haosdent (Contributor, Author) commented:

thanks @bigPYJ1151


Labels

bug (Something isn't working)
cpu (Related to CPU backends)
v1
