
[Hardware] Replace memory related torch.cuda APIs #37031

Merged
hmellor merged 7 commits into vllm-project:main from jikunshang:kunshang/acc_mem
Mar 16, 2026

Conversation

@jikunshang (Collaborator) commented Mar 14, 2026

Purpose

Part of #30679.
This PR replaces the following torch.cuda APIs with their torch.accelerator counterparts (a brief sketch of the substitution follows the list):

  • torch.cuda.memory_reserved
  • torch.cuda.memory_allocated
  • torch.cuda.max_memory_allocated
  • torch.cuda.max_memory_reserved
  • torch.cuda.reset_peak_memory_stats
  • torch.cuda.memory_stats
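
As a rough illustration of the substitution, here is a minimal sketch. The `measure_peak_memory` helper is hypothetical, and it assumes a PyTorch build whose torch.accelerator module exposes these memory APIs, as this PR relies on:

```python
import torch

# Hypothetical helper, not part of this PR: report peak accelerator memory
# for a callable via the device-agnostic torch.accelerator APIs that stand
# in for the torch.cuda.* calls listed above.
def measure_peak_memory(fn) -> tuple[float, float]:
    torch.accelerator.reset_peak_memory_stats()  # was torch.cuda.reset_peak_memory_stats
    fn()
    peak_allocated = torch.accelerator.max_memory_allocated()  # was torch.cuda.max_memory_allocated
    peak_reserved = torch.accelerator.max_memory_reserved()    # was torch.cuda.max_memory_reserved
    return peak_allocated / 2**30, peak_reserved / 2**30  # GiB
```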

Test Plan

CI

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
@mergify bot added the performance (Performance-related issues), nvidia, and v1 labels Mar 14, 2026
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request continues the effort to abstract away hardware-specific APIs by replacing several torch.cuda memory management functions with their torch.accelerator counterparts. The changes are consistently applied across various parts of the codebase, including benchmarks, tests, and utility modules. A corresponding update to the pre-commit hook ensures these new standards are maintained. My review identifies one area where a CUDA-specific conditional check remains, limiting the hardware-agnostic benefit of the refactoring on other platforms like ROCm. I've provided a suggestion to address this.
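
The pre-commit update itself is not shown here; as a rough illustration of what such a guard can look like, a hypothetical standalone checker (not the hook actually added in this PR) might scan files like this:

```python
import re
import sys

# Hypothetical checker, for illustration only: flag torch.cuda memory calls
# that now have torch.accelerator counterparts per this PR.
BANNED = re.compile(
    r"torch\.cuda\.(memory_reserved|memory_allocated|max_memory_allocated"
    r"|max_memory_reserved|reset_peak_memory_stats|memory_stats)\b"
)

def main(paths: list[str]) -> int:
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if BANNED.search(line):
                    print(f"{path}:{lineno}: use the torch.accelerator equivalent")
                    status = 1
    return status

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```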

  # to have test coverage on peak memory for online quantization.
  if current_platform.is_cuda():
-     peak_memory = torch.cuda.max_memory_allocated()
+     peak_memory = torch.accelerator.max_memory_allocated()

Severity: high

The change to torch.accelerator.max_memory_allocated() is correct, but it sits inside an if current_platform.is_cuda(): block on line 66. Since torch.accelerator is designed to be device-agnostic (working on CUDA, ROCm, etc.), this condition is now too restrictive and will prevent peak-memory logging on other GPU platforms like ROCm.

To ensure this logging works on all supported GPU-like devices, consider broadening the condition. For example:

if current_platform.is_cuda_alike():

@jikunshang (Collaborator, Author) replied:

I think we should use if not current_platform.is_cpu() here.
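
A minimal sketch of that proposed guard, assuming vLLM's usual current_platform import path (illustration only, not the merged diff):

```python
import torch
from vllm.platforms import current_platform  # vLLM's platform abstraction

# Sketch of the proposed condition: torch.accelerator dispatches to
# whichever backend is active (CUDA, ROCm, XPU, ...), so only CPU-only
# runs need to be excluded from peak-memory logging.
peak_memory = 0
if not current_platform.is_cpu():
    peak_memory = torch.accelerator.max_memory_allocated()
```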

@jikunshang added the ready (ONLY add when PR is ready to merge/full CI is needed) and ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs) labels Mar 16, 2026
Comment thread: vllm/v1/worker/gpu_worker.py
@github-project-automation bot moved this to Ready in NVIDIA Mar 16, 2026
@hmellor enabled auto-merge (squash) March 16, 2026 09:58
@hmellor merged commit 747b068 into vllm-project:main Mar 16, 2026
168 checks passed
@github-project-automation bot moved this from Ready to Done in NVIDIA Mar 16, 2026
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Mar 18, 2026
### What this PR does / why we need it?

1. Fix "TypeError: get_attn_backend() remove variable": [Refactor `check_and_update_config`](vllm-project/vllm#35122)

2. Fix [Rename `compile_ranges_split_points` to `compile_ranges_endpoints`](vllm-project/vllm#36027)

3. Fix "RuntimeError: device_allocator not a DeviceAllocator": [Replace memory related torch.cuda APIs](vllm-project/vllm#37031)

4. Fix [Support multiple KV groups in OffloadingSpec](vllm-project/vllm#36610): it removed self.offloaded_block_size and changed self.gpu_block_size from a scalar to a tuple of per-group block sizes, adding block_size_factor.

5. Fix [Consolidate SupportsEagle](vllm-project/vllm#36063): it renamed get_eagle3_aux_hidden_state_layers() to get_eagle3_default_aux_hidden_state_layers() and added a supports_eagle3() guard before calling it (see the sketch after this list).
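
A hedged sketch of the guard described in item 5; the import path and the `resolve_aux_layers` wrapper are assumptions for illustration, not the actual vllm-ascend diff:

```python
# Assumed import path, for illustration only.
from vllm.model_executor.models.interfaces import supports_eagle3

def resolve_aux_layers(model) -> tuple[int, ...]:
    # Guard before calling the renamed accessor, as item 5 describes:
    # only models that implement EAGLE3 support expose the default
    # aux-hidden-state layers.
    if supports_eagle3(model):
        return model.get_eagle3_default_aux_hidden_state_layers()
    return ()
```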

### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
E2E


- vLLM version: v0.17.0
- vLLM main:
vllm-project/vllm@8a68046

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Mar 25, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
lihaokun-2026 pushed a commit to lihaokun-2026/vllm-ascend that referenced this pull request Mar 29, 2026
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Apr 1, 2026
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>

Labels

nvidia · performance (Performance-related issues) · ready (ONLY add when PR is ready to merge/full CI is needed) · ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs) · v1

Projects

Status: Done


2 participants