fix ROCm/HIP support for large memory allocation #14

YangWang92 · 2025-05-25T13:10:58Z

Due to internal constraints in ROCm/HIP, large virtual-memory allocations must be performed chunk-wise. In addition, avoid ROCm 6.4.0: an internal bug in that release prevents memory from being released. This commit follows the approach introduced in vllm-project/vllm#12695.
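To illustrate what "chunk-wise" means here, below is a minimal, HIP-free sketch of the bookkeeping only. It is not the code from this PR: `Chunk`, `round_up`, and `plan_chunks` are hypothetical names, and the chunk size and granularity are assumed inputs. In a real HIP build the granularity would typically be queried with `hipMemGetAllocationGranularity`, and each planned chunk would be backed by its own physical handle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of a large allocation: byte offset into the reserved
// virtual address range, and the number of bytes backing that piece.
struct Chunk {
    std::size_t offset;
    std::size_t size;
};

// Round `n` up to the next multiple of `granularity`.
inline std::size_t round_up(std::size_t n, std::size_t granularity) {
    assert(granularity > 0);
    return (n + granularity - 1) / granularity * granularity;
}

// Hypothetical helper: split `total_bytes` (padded to `granularity`)
// into consecutive chunks of at most `chunk_bytes` each.
inline std::vector<Chunk> plan_chunks(std::size_t total_bytes,
                                      std::size_t chunk_bytes,
                                      std::size_t granularity) {
    assert(granularity > 0);
    assert(chunk_bytes > 0 && chunk_bytes % granularity == 0);

    std::vector<Chunk> chunks;
    const std::size_t padded = round_up(total_bytes, granularity);
    for (std::size_t off = 0; off < padded; off += chunk_bytes) {
        const std::size_t size = std::min(chunk_bytes, padded - off);
        chunks.push_back({off, size});
        // Assumption: in a HIP build, this is where one physical handle
        // would be created for `size` bytes (hipMemCreate), mapped at
        // base + off (hipMemMap), and made accessible (hipMemSetAccess).
    }
    return chunks;
}
```

For example, `plan_chunks(5, 4, 2)` pads the request to 6 bytes and yields two chunks: offset 0 with size 4, and offset 4 with size 2. Freeing would walk the same list in reverse, unmapping and releasing each chunk.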

EDIT by tom to make github happy:
Close #9

fzyzcjy

Great job! First, one thought:

fzyzcjy · 2025-05-25T13:19:43Z

csrc/torch_memory_saver_hip.cpp

I wonder whether we could unify the CUDA and HIP versions to avoid code duplication.

It is possible to unify the CUDA and HIP versions, but the current HIP version requires chunk-wise allocation. Should I merge them?

Here are the examples from vLLM's pull request:
https://github.com/vllm-project/vllm/pull/12695/files#diff-80ab2c32370c400e78803f0ea6c73ec130cf9fb6050ec2728de56bea731cf880
https://github.com/vllm-project/vllm/pull/12695/files#diff-b8401ec6ffbf88a5594fa627fddbf2e3ca5742318c6882520f99d3fc211cbd3e

As discussed in DM, I wonder whether it is possible to make a layered cake: one low-level layer that provides a unified API and hides the differences between CUDA and HIP, and a high-level layer that consumes it.
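One possible shape for that layering, as a rough sketch rather than a proposal for the actual code: every name below (`DeviceMemoryBackend`, `FakeBackend`, `ManagedRegion`, `max_chunk_bytes`) is hypothetical, and the backend is a stand-in that only counts handles so the example runs without CUDA or HIP. The idea is that the chunking requirement becomes a property of the low-level layer, so the high-level layer has a single code path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Low-level layer (hypothetical): the only place that would know
// whether it is talking to CUDA or HIP.
class DeviceMemoryBackend {
public:
    virtual ~DeviceMemoryBackend() = default;
    // Largest size allowed for one physical allocation; 0 means no limit.
    virtual std::size_t max_chunk_bytes() const = 0;
    virtual int create(std::size_t bytes) = 0;  // returns a handle id
    virtual void release(int handle) = 0;
};

// Stand-in backend for illustration only; it records sizes and counts
// live handles instead of calling a GPU driver.
class FakeBackend : public DeviceMemoryBackend {
public:
    explicit FakeBackend(std::size_t max_chunk) : max_chunk_(max_chunk) {}
    std::size_t max_chunk_bytes() const override { return max_chunk_; }
    int create(std::size_t bytes) override {
        sizes_.push_back(bytes);
        ++live_;
        return static_cast<int>(sizes_.size()) - 1;
    }
    void release(int /*handle*/) override {
        assert(live_ > 0);
        --live_;
    }
    int live() const { return live_; }

private:
    std::size_t max_chunk_;
    std::vector<std::size_t> sizes_;
    int live_ = 0;
};

// High-level layer: same logic for every backend. A backend with no
// limit gets one allocation; a limited backend gets several chunks.
class ManagedRegion {
public:
    ManagedRegion(DeviceMemoryBackend& backend, std::size_t total_bytes)
        : backend_(backend) {
        const std::size_t limit = backend.max_chunk_bytes();
        const std::size_t step = (limit == 0) ? total_bytes : limit;
        for (std::size_t done = 0; done < total_bytes; done += step) {
            handles_.push_back(
                backend.create(std::min(step, total_bytes - done)));
        }
    }
    ~ManagedRegion() {
        for (int h : handles_) backend_.release(h);
    }
    ManagedRegion(const ManagedRegion&) = delete;
    ManagedRegion& operator=(const ManagedRegion&) = delete;

    std::size_t chunk_count() const { return handles_.size(); }

private:
    DeviceMemoryBackend& backend_;
    std::vector<int> handles_;
};
```

With `FakeBackend cuda_like(0)` and `FakeBackend hip_like(4)`, a 10-byte `ManagedRegion` uses one handle on the first and three on the second, and both release everything on destruction. A compile-time switch (for example on a HIP platform macro) could replace the virtual interface if the indirection is unwanted.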

ExtremeViscent and others added 4 commits April 1, 2025 11:04

AMD ver

1053036

fix hip support with chunk-wise allocation

656ea00

improve example output

b49e31d

Merge remote-tracking branch 'amd/master'

4e775e6

fzyzcjy reviewed May 25, 2025

fzyzcjy mentioned this pull request May 27, 2025

fix ROCm/HIP support for large memory allocation. ExtremeViscent/torch_memory_saver#1

Merged

fzyzcjy mentioned this pull request Jul 23, 2025

Utilize torch_memory_saver THUDM/slime#42

Closed

yushengsu-thu mentioned this pull request Oct 8, 2025

[Hardware Support] AMD - ROCM #43

Merged
