[Hardware] Replace memory related torch.cuda APIs #37031
hmellor merged 7 commits into vllm-project:main
Conversation
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Code Review
This pull request continues the effort to abstract away hardware-specific APIs by replacing several torch.cuda memory management functions with their torch.accelerator counterparts. The changes are consistently applied across various parts of the codebase, including benchmarks, tests, and utility modules. A corresponding update to the pre-commit hook ensures these new standards are maintained. My review identifies one area where a CUDA-specific conditional check remains, limiting the hardware-agnostic benefit of the refactoring on other platforms like ROCm. I've provided a suggestion to address this.
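A minimal sketch of the replacement pattern described above (assuming a PyTorch build where `torch.accelerator` exposes these memory counterparts, which this PR relies on; the helper name is illustrative, not from the diff):

```python
import torch

def measure_peak_memory(fn) -> int:
    # Device-agnostic peak-memory measurement; before this PR the
    # equivalent code paths called torch.cuda.* directly.
    torch.accelerator.empty_cache()
    torch.accelerator.reset_peak_memory_stats()
    fn()
    torch.accelerator.synchronize()
    # Dispatches to whichever accelerator backend is active
    # (CUDA, ROCm, XPU, ...), with the same semantics as
    # torch.cuda.max_memory_allocated() on CUDA.
    return torch.accelerator.max_memory_allocated()
```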
```diff
 # to have test coverage on peak memory for online quantization.
 if current_platform.is_cuda():
-    peak_memory = torch.cuda.max_memory_allocated()
+    peak_memory = torch.accelerator.max_memory_allocated()
```
The change to `torch.accelerator.max_memory_allocated()` is correct, but it sits inside an `if current_platform.is_cuda():` block on line 66. Since `torch.accelerator` is designed to be device-agnostic (working on CUDA, ROCm, etc.), this condition is now too restrictive and will prevent peak-memory logging on other GPU platforms such as ROCm.
To ensure this logging works on all supported GPU-like devices, consider broadening the condition. For example:

```python
if current_platform.is_cuda_alike():
```
I think we should use `if not current_platform.is_cpu()` here.
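A minimal sketch of that reply's suggestion applied to the hunk above (surrounding context abbreviated):

```python
# Log peak memory on any non-CPU platform; torch.accelerator dispatches
# to whichever accelerator backend is active (CUDA, ROCm, XPU, ...).
if not current_platform.is_cpu():
    peak_memory = torch.accelerator.max_memory_allocated()
```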
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
### What this PR does / why we need it?
1. Fix "TypeError: get_attn_backend() remove variable": [Refactor `check_and_update_config`](vllm-project/vllm#35122)
2. Fix [Rename `compile_ranges_split_points` to `compile_ranges_endpoints`](vllm-project/vllm#36027)
3. Fix "RuntimeError: device_allocator not a DeviceAllocator": [Replace memory related torch.cuda APIs](vllm-project/vllm#37031)
4. Fix [Support multiple KV groups in OffloadingSpec](vllm-project/vllm#36610): removed self.offloaded_block_size and changed self.gpu_block_size from a scalar to a tuple of per-group block sizes, adding block_size_factor.
5. Fix [Consolidate SupportsEagle](vllm-project/vllm#36063): renamed get_eagle3_aux_hidden_state_layers() to get_eagle3_default_aux_hidden_state_layers() and added a supports_eagle3() guard before calling it (sketched below).

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
E2E
- vLLM version: v0.17.0
- vLLM main: vllm-project/vllm@8a68046

---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
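For item 5, a minimal sketch of the guard pattern it describes (the function names come from the message above; the import path and call site are assumptions for illustration):

```python
# Assumed import path; the actual location may differ.
from vllm.model_executor.models.interfaces import supports_eagle3

def eagle3_aux_layers(model) -> tuple[int, ...]:
    # Only query the renamed accessor when the model actually implements
    # EAGLE3 support; otherwise report no aux hidden-state layers.
    if supports_eagle3(model):
        return model.get_eagle3_default_aux_hidden_state_layers()
    return ()
```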
Purpose
part of #30679
This PR replaces the APIs below with their torch.accelerator counterparts:
Test Plan
CI
Test Result