
fix(rocm): remove workaround causing invalid argument on Qwen3.5 with TP=2 #40686

Merged
vllm-bot merged 3 commits into vllm-project:main from aaab8b:fix-rocm-qwen-wakeup on May 6, 2026


@aaab8b aaab8b commented Apr 23, 2026

Contributor

This PR removes a ROCm workaround in cumem_allocator.cpp. With the workaround in place, wake_up fails with an invalid argument error for the Qwen3.5 model when running with TP=2 on ROCm platforms.

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the qwen (Related to Qwen models) and rocm (Related to AMD ROCm) labels Apr 23, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 23, 2026
mergify Bot commented Apr 23, 2026

Contributor

Hi @aaab8b, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request removes a ROCm-specific workaround in csrc/cumem_allocator.cpp that cycled virtual address reservations to force physical VRAM release. I have no feedback to provide.

aaab8b commented Apr 23, 2026

Contributor Author

This script reproduces the problem on ROCm:

import torch
from vllm import LLM

# Guard the entry point: with TP=2, worker processes may re-import this module.
if __name__ == "__main__":
    llm = LLM(
        model="Qwen/Qwen3.5-4B",
        enable_sleep_mode=True,
        tensor_parallel_size=2,
    )

    def run_inference(prompt):
        outputs = llm.generate(prompt)
        for output in outputs:
            print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text[:60]!r}")

    print("\n===== Round 1 =====")
    run_inference("San Francisco is")

    # Put the engine to sleep, then release PyTorch's cached blocks.
    print("\n===== Sleep (level 1, TP=2) =====")
    llm.sleep(level=1)
    torch.cuda.empty_cache()

    # Report free memory on both GPUs after sleeping.
    for i in range(2):
        free, total = torch.cuda.mem_get_info(i)
        print(f"  GPU {i}: free={free/1024**3:.2f} GiB / total={total/1024**3:.1f} GiB")

    # Check that the memory was really released by allocating 90% of each GPU.
    print("\n===== Allocate 90% VRAM on GPU 0 and 1 =====")
    for i in range(2):
        free, total = torch.cuda.mem_get_info(i)
        alloc_bytes = int(total * 0.9)
        try:
            t = torch.empty(alloc_bytes, dtype=torch.uint8, device=f"cuda:{i}")
            print(f"  GPU {i}: allocated {alloc_bytes/1024**3:.2f} GiB ✓")
            del t
            torch.cuda.empty_cache()
        except torch.cuda.OutOfMemoryError:
            print(f"  GPU {i}: FAILED (free={free/1024**3:.2f} GiB) ✗")

    # With the workaround in place, wake_up is where the invalid argument error occurs.
    print("\n===== Wake up =====")
    llm.wake_up()
    run_inference("Paris is")
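
For a regression check, the sleep/wake cycle in the script above could be wrapped so that a failing wake_up is reported instead of ending the run. The following is only a minimal sketch and is not part of this PR: `check_wake_up`, `WakeUpResult`, and `FakeEngine` are hypothetical names, and the only assumption about the engine is that it exposes `sleep(level=...)` and `wake_up()` as the script uses them. The stand-in raises a `RuntimeError` to imitate the invalid argument error; the exception type actually raised on ROCm is not stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class WakeUpResult:
    """Outcome of one sleep/wake cycle (hypothetical helper type)."""
    ok: bool
    error: str = ""


def check_wake_up(engine, level: int = 1) -> WakeUpResult:
    """Run one sleep/wake cycle and report whether wake_up succeeded."""
    engine.sleep(level=level)
    try:
        engine.wake_up()
    except Exception as exc:
        # Broad catch on purpose: the real exception type is an assumption.
        return WakeUpResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    return WakeUpResult(ok=True)


class FakeEngine:
    """Hypothetical stand-in for vllm.LLM so the sketch runs without a GPU."""

    def __init__(self, fail_on_wake: bool = False):
        self.fail_on_wake = fail_on_wake
        self.asleep = False

    def sleep(self, level: int = 1) -> None:
        self.asleep = True

    def wake_up(self) -> None:
        if self.fail_on_wake:
            # Imitates the failure described in this PR.
            raise RuntimeError("invalid argument")
        self.asleep = False


if __name__ == "__main__":
    print(check_wake_up(FakeEngine()))
    print(check_wake_up(FakeEngine(fail_on_wake=True)))
```

In a real run, the `llm` object from the script would be passed in place of `FakeEngine`, inside the same `if __name__ == "__main__":` guard.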

@tjtanaa tjtanaa left a comment

Member


LGTM

@tjtanaa tjtanaa enabled auto-merge (squash) April 28, 2026 05:16
@github-actions github-actions Bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Apr 28, 2026
@vllm-bot vllm-bot merged commit 66d1cc0 into vllm-project:main May 6, 2026
147 of 152 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD May 6, 2026
chaojun-zhang pushed a commit to chaojun-zhang/vllm that referenced this pull request May 6, 2026
… TP=2 (vllm-project#40686)

Co-authored-by: Test User <test@example.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
ikaadil pushed a commit to ikaadil/vllm that referenced this pull request May 7, 2026
… TP=2 (vllm-project#40686)

Co-authored-by: Test User <test@example.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
… TP=2 (vllm-project#40686)

Co-authored-by: Test User <test@example.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Signed-off-by: Libin Tang <libin.tang@intel.com>
weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026
… TP=2 (vllm-project#40686)

Co-authored-by: Test User <test@example.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
… TP=2 (vllm-project#40686)

Co-authored-by: Test User <test@example.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
… TP=2 (vllm-project#40686)

Co-authored-by: Test User <test@example.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
… TP=2 (vllm-project#40686)

Co-authored-by: Test User <test@example.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Labels

qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm)

Projects

Status: Done


3 participants