
Revert "[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache" (#30681)#36076

Draft
zhewenl wants to merge 1 commit into vllm-project:main from zhewenl:auto-revert/pr-30681

Conversation

Collaborator

@zhewenl zhewenl commented Mar 5, 2026

Revert of #30681

This reverts the merge commit for PR #30681 which replaced torch.cuda.empty_cache with torch.accelerator.empty_cache across the codebase.

Reason

This PR is linked to 1 new CI failure in nightly build #54530:

  • Distributed Tests (4 GPUs): test_torchrun_example_moe.py fails with a KV cache memory error: available memory 0.49 GiB < needed 0.50 GiB. Replacing torch.cuda.empty_cache with torch.accelerator.empty_cache may change GPU memory reclamation behavior, causing this marginal shortfall.
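For context, the reported shortfall is tiny. A quick calculation (the 0.49/0.50 GiB figures come from the CI failure above; nothing here reflects vLLM's actual allocator internals):

```python
# Illustrative arithmetic only, using the GiB figures from the CI failure log.
GIB = 1024 ** 3

available = 0.49 * GIB  # memory left for the KV cache in the failing run
needed = 0.50 * GIB     # memory the KV cache manager asked for

shortfall_mib = (needed - available) / 1024 ** 2
print(f"shortfall: {shortfall_mib:.2f} MiB")  # roughly 10 MiB, a marginal miss
```

A gap this small is consistent with a change in when cached GPU blocks are released, rather than a real capacity regression.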

Auto-generated

This revert PR was auto-generated by the CI failure analyzer. Please review before merging.


mergify bot commented Mar 5, 2026

Documentation preview: https://vllm--36076.org.readthedocs.build/en/36076/

The mergify bot added the labels documentation (Improvements or additions to documentation), performance (Performance-related issues), nvidia, structured-output, and v1 on Mar 5, 2026.
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request reverts a previous change that replaced torch.cuda.empty_cache with torch.accelerator.empty_cache and caused CI failures. The revert is mostly mechanical, but in some platform-agnostic files it correctly uses a platform abstraction (current_platform.empty_cache()) instead of hardcoding torch.cuda.empty_cache, which is a good improvement. However, I've identified a critical issue in vllm/v1/worker/xpu_model_runner.py where a monkey-patch is not reverted, potentially leading to side effects.
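The current_platform.empty_cache() indirection mentioned above can be sketched as follows. This is a self-contained illustration of the dispatch idea only; the class and function names here are hypothetical, and vLLM's real current_platform object (in vllm.platforms) is considerably more involved:

```python
# Hypothetical sketch of a platform abstraction for cache cleanup.
# The real torch calls are shown in comments so this runs without torch.
class Platform:
    def empty_cache(self) -> None:
        raise NotImplementedError


class CudaPlatform(Platform):
    def empty_cache(self) -> None:
        # In real code: torch.cuda.empty_cache()
        pass


class XpuPlatform(Platform):
    def empty_cache(self) -> None:
        # In real code: torch.xpu.empty_cache()
        pass


def detect_platform() -> Platform:
    # Real detection would probe which torch backend is available.
    return CudaPlatform()


current_platform = detect_platform()
current_platform.empty_cache()  # callers never hardcode torch.cuda.*
```

The point of the indirection is that call sites stay backend-neutral, so a revert like this one only has to touch the platform classes, not every caller.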

if supports_xpu_graph():
    torch.cuda.graph = torch.xpu.graph
    torch.cuda.CUDAGraph = torch.xpu.XPUGraph
    torch.cuda.empty_cache = torch.xpu.empty_cache

critical: This monkey-patch is not reverted in the finally block, making it permanent for the process. This can cause unexpected behavior if other parts of the code expect the original torch.cuda.empty_cache; the same issue exists for torch.cuda.graph and torch.cuda.CUDAGraph in the original code. A context manager should restore the original state on exit: save the original attributes before patching and restore them in the finally block.


mergify bot commented Mar 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhewenl.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 5, 2026
Collaborator

jikunshang commented Mar 9, 2026

#30681 ran all tests; Distributed Tests (4 GPUs) passed, see https://buildkite.com/vllm/ci/builds/54293#019cb62a-38d7-4e77-a962-89d3fb0de589
The strange thing is that it runs ibm-research/PowerMoE-3b instead of microsoft/Phi-mini-MoE-instruct.
Oh sorry, I didn't check the full log; it shows only some chat template errors.

My PR log:

[2026-03-04T00:25:02Z] INFO 03-04 00:25:02 [decorators.py:588] saved AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/0da436ac5f91ca7287450564ca8ac58a52973ee129f411ac065de87300f1e07d/rank_0_0/model
[2026-03-04T00:25:02Z] INFO 03-04 00:25:02 [gpu_worker.py:424] Available KV cache memory: 4.87 GiB
[2026-03-04T00:25:02Z] INFO 03-04 00:25:02 [kv_cache_utils.py:1314] GPU KV cache size: 21,888 tokens

