Revert "[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache" (#30681)#36076
zhewenl wants to merge 1 commit into vllm-project:main

Conversation
Revert "[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache`" (vllm-project#30681). This reverts commit 16d2ad1.
Documentation preview: https://vllm--36076.org.readthedocs.build/en/36076/
Code Review
This pull request reverts a previous change that replaced `torch.cuda.empty_cache` with `torch.accelerator.empty_cache`, which caused CI failures. The revert is mostly mechanical, but in some platform-agnostic files it correctly uses a platform abstraction (`current_platform.empty_cache()`) instead of hardcoding `torch.cuda.empty_cache`, which is a good improvement. However, I've identified a critical issue in `vllm/v1/worker/xpu_model_runner.py` where a monkey-patch is not reverted, potentially leading to side effects.
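For illustration, a minimal sketch of what such a platform dispatch can look like. The helper name `empty_device_cache` and its fallback chain are assumptions for this sketch, not vLLM's actual `current_platform` implementation:

```python
import torch


def empty_device_cache() -> None:
    # Hypothetical helper: clear the cache of whichever accelerator
    # backend is available, instead of hardcoding torch.cuda.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    elif hasattr(torch, "xpu") and torch.xpu.is_available():
        torch.xpu.empty_cache()
    # On CPU-only builds there is no device cache to clear.


empty_device_cache()
```

Callers then stay platform-agnostic, and the CUDA-specific call lives in exactly one place.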
```python
if supports_xpu_graph():
    torch.cuda.graph = torch.xpu.graph
    torch.cuda.CUDAGraph = torch.xpu.XPUGraph
    torch.cuda.empty_cache = torch.xpu.empty_cache
```
This monkey-patch is not reverted in the `finally` block, making it permanent for the process. This can cause unexpected behavior if other parts of the code expect the original `torch.cuda.empty_cache`. The same issue exists for `torch.cuda.graph` and `torch.cuda.CUDAGraph` in the original code. A context manager should restore the original state upon exit: please save the original attributes before patching and restore them in the `finally` block.
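One way to make the patch reversible is a generic context manager that snapshots the originals and always restores them on exit. A minimal sketch (demonstrated on a stand-in namespace rather than `torch.cuda`, so it stands alone; in the real code the target would be `torch.cuda` and the patches `torch.xpu.graph`, `torch.xpu.XPUGraph`, `torch.xpu.empty_cache`):

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_patch(obj, **patches):
    # Snapshot the original attributes, apply the patches, and
    # restore the originals in finally no matter how we exit.
    originals = {name: getattr(obj, name) for name in patches}
    try:
        for name, value in patches.items():
            setattr(obj, name, value)
        yield obj
    finally:
        for name, value in originals.items():
            setattr(obj, name, value)


# Stand-in for torch.cuda in this self-contained example.
fake_cuda = SimpleNamespace(empty_cache=lambda: "cuda")

with temporary_patch(fake_cuda, empty_cache=lambda: "xpu"):
    assert fake_cuda.empty_cache() == "xpu"  # patched inside the block
assert fake_cuda.empty_cache() == "cuda"     # original restored on exit
```

The `finally` clause is what the current code is missing: it guarantees restoration even if the body raises.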
This pull request has merge conflicts that must be resolved before it can be merged.
#30681 ran all tests. My PR log:
Revert of #30681
This reverts the merge commit for PR #30681, which replaced `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` across the codebase.

Reason

This PR is linked to 1 new CI failure in nightly build #54530: `test_torchrun_example_moe.py` fails with a KV cache memory error: available memory 0.49 GiB < needed 0.50 GiB. The replacement of `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` may affect GPU memory reclamation behavior, causing this marginal shortfall.

Auto-generated

This revert PR was auto-generated by the CI failure analyzer. Please review before merging.