fix(rocm): remove workaround causing invalid argument on Qwen3.5 with TP=2 by aaab8b · Pull Request #40686 · vllm-project/vllm

aaab8b · 2026-04-23T07:53:23Z

This PR removes a ROCm workaround in cumem_allocator.cpp. Keeping this code causes an invalid argument error during wake_up for the Qwen3.5 model when running with TP=2 on ROCm platforms.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

mergify · 2026-04-23T07:54:44Z

Hi @aaab8b, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

gemini-code-assist

Code Review

This pull request removes a ROCm-specific workaround in csrc/cumem_allocator.cpp that cycled virtual address reservations to force physical VRAM release. I have no feedback to provide.

aaab8b · 2026-04-23T07:56:21Z

This script reproduces the problem on ROCm:

import torch
from vllm import LLM

if __name__ == '__main__':
    llm = LLM(model="Qwen/Qwen3.5-4B",
              enable_sleep_mode=True,
              tensor_parallel_size=2)

    def run_inference(prompt):
        outputs = llm.generate(prompt)
        for output in outputs:
            print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text[:60]!r}")

    print("\n===== Round 1 =====")
    run_inference("San Francisco is")

    print("\n===== Sleep (level 1, TP=2) =====")
    llm.sleep(level=1)
    torch.cuda.empty_cache()

    # Report how much VRAM each GPU has free after sleeping.
    for i in range(2):
        free, total = torch.cuda.mem_get_info(i)
        print(f"  GPU {i}: free={free/1024**3:.2f} GiB / total={total/1024**3:.1f} GiB")

    # Check that the freed VRAM is really usable by allocating 90% of each GPU.
    print("\n===== Allocate 90% VRAM on GPU 0 and 1 =====")
    for i in range(2):
        free, total = torch.cuda.mem_get_info(i)
        alloc_bytes = int(total * 0.9)
        try:
            t = torch.empty(alloc_bytes, dtype=torch.uint8, device=f"cuda:{i}")
            print(f"  GPU {i}: allocated {alloc_bytes/1024**3:.2f} GiB ✓")
            del t
            torch.cuda.empty_cache()
        except torch.cuda.OutOfMemoryError:
            print(f"  GPU {i}: FAILED (free={free/1024**3:.2f} GiB) ✗")

    # The invalid argument error is raised here while the workaround is present.
    print("\n===== Wake up =====")
    llm.wake_up()
    run_inference("Paris is")
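The 90% allocation step in the script above can be preceded by a cheap arithmetic check on the numbers returned by torch.cuda.mem_get_info. The sketch below is only an illustration: the helper names (fmt_gib, fits_after_sleep, report) are hypothetical and not part of vLLM or PyTorch, and a passing check does not guarantee that the real allocation succeeds.

```python
GIB = 1024 ** 3


def fmt_gib(num_bytes: int) -> str:
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / GIB:.2f} GiB"


def fits_after_sleep(free_bytes: int, total_bytes: int, fraction: float = 0.9) -> bool:
    """Return True if `fraction` of total device memory fits in the free memory.

    This mirrors the script's check: after sleep, an allocation of
    int(total * fraction) bytes should fit if the VRAM was released.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return free_bytes >= int(total_bytes * fraction)


def report(device_index: int, free_bytes: int, total_bytes: int) -> str:
    """Build a one-line summary in the same style as the script's output."""
    verdict = "should fit" if fits_after_sleep(free_bytes, total_bytes) else "will not fit"
    return (
        f"GPU {device_index}: free={fmt_gib(free_bytes)} / "
        f"total={fmt_gib(total_bytes)}, 90% allocation {verdict}"
    )


if __name__ == "__main__":
    # Made-up numbers for illustration; on a real device these would come
    # from torch.cuda.mem_get_info(device_index).
    print(report(0, 95 * GIB, 100 * GIB))
    print(report(1, 80 * GIB, 100 * GIB))
```

In the reproduction script, the values passed to report would be the (free, total) pair read for each GPU right after llm.sleep(level=1).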

tjtanaa

LGTM

… TP=2 (vllm-project#40686) Co-authored-by: Test User <test@example.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

… TP=2 (vllm-project#40686) Co-authored-by: Test User <test@example.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>

… TP=2 (vllm-project#40686) Co-authored-by: Test User <test@example.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Signed-off-by: Libin Tang <libin.tang@intel.com>

… TP=2 (vllm-project#40686) Co-authored-by: Test User <test@example.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

fix(rocm): remove workaround causing invalid argument on Qwen3.5 with TP=2 · 387d828

claude Bot reviewed Apr 23, 2026

mergify Bot added the qwen (Related to Qwen models) and rocm (Related to AMD ROCm) labels Apr 23, 2026

github-project-automation Bot added this to AMD Apr 23, 2026

github-project-automation Bot moved this to Todo in AMD Apr 23, 2026

gemini-code-assist Bot reviewed Apr 23, 2026

tjtanaa approved these changes Apr 28, 2026

Merge branch 'main' into fix-rocm-qwen-wakeup

c016adf

tjtanaa enabled auto-merge (squash) April 28, 2026 05:16

github-actions Bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 28, 2026

Merge branch 'main' into fix-rocm-qwen-wakeup

f994b6a

vllm-bot merged commit 66d1cc0 into vllm-project:main May 6, 2026
147 of 152 checks passed

github-project-automation Bot moved this from Todo to Done in AMD May 6, 2026

vllm-bot merged 3 commits into vllm-project:main from aaab8b:fix-rocm-qwen-wakeup

3 participants
