
Conversation

Contributor

@ganyi1996ppo ganyi1996ppo commented Nov 13, 2025

Purpose

The following error message appeared after PR #27005 was merged:

ting down executor.
(EngineCore_DP0 pid=2001222) Process EngineCore_DP0:
(EngineCore_DP0 pid=2001222) Traceback (most recent call last):
(EngineCore_DP0 pid=2001222)   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2001222)     self.run()
(EngineCore_DP0 pid=2001222)   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2001222)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2001222)   File "/home/ygan/vllm/vllm/v1/engine/core.py", line 859, in run_engine_core
(EngineCore_DP0 pid=2001222)     raise e
(EngineCore_DP0 pid=2001222)   File "/home/ygan/vllm/vllm/v1/engine/core.py", line 846, in run_engine_core
(EngineCore_DP0 pid=2001222)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2001222)   File "/home/ygan/vllm/vllm/v1/engine/core.py", line 619, in __init__
(EngineCore_DP0 pid=2001222)     super().__init__(
(EngineCore_DP0 pid=2001222)   File "/home/ygan/vllm/vllm/v1/engine/core.py", line 110, in __init__
(EngineCore_DP0 pid=2001222)     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=2001222)   File "/home/ygan/vllm/vllm/v1/engine/core.py", line 244, in _initialize_kv_caches
(EngineCore_DP0 pid=2001222)     self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=2001222)   File "/home/ygan/vllm/vllm/v1/executor/abstract.py", line 116, in initialize_from_config
(EngineCore_DP0 pid=2001222)     self.collective_rpc("compile_or_warm_up_model")
(EngineCore_DP0 pid=2001222)   File "/home/ygan/vllm/vllm/v1/executor/multiproc_executor.py", line 307, in collective_rpc
(EngineCore_DP0 pid=2001222)     return aggregate(get_response())
(EngineCore_DP0 pid=2001222)   File "/home/ygan/vllm/vllm/v1/executor/multiproc_executor.py", line 290, in get_response
(EngineCore_DP0 pid=2001222)     raise RuntimeError(
(EngineCore_DP0 pid=2001222) RuntimeError: Worker failed with error 'get_cu_count() missing 1 required positional argument: 'cls'', please check the stack trace above for the root cause

The call to get_cu_count failed because the function still declared a cls parameter left over from when it was a classmethod. This PR removes that leftover parameter to restore functionality on the ROCm platform.
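A minimal standalone sketch of the failure mode (using a stand-in return value, not vLLM's actual torch.cuda lookup): a function that keeps a leftover cls parameter after being demoted from a classmethod fails on every plain call, which is exactly the TypeError in the traceback above.

```python
# Sketch of the bug; the constant 64 stands in for the real
# torch.cuda.get_device_properties(...).multi_processor_count lookup.

def get_cu_count_buggy(cls, device_id: int = 0) -> int:
    """Leftover `cls` parameter from a former @classmethod."""
    return 64  # stand-in CU count

def get_cu_count_fixed(device_id: int = 0) -> int:
    """The fix: drop the `cls` parameter entirely."""
    return 64  # stand-in CU count

try:
    get_cu_count_buggy()  # plain call, nothing is bound to `cls`
except TypeError as exc:
    print(exc)  # ... missing 1 required positional argument: 'cls'

print(get_cu_count_fixed())  # works once the parameter is gone
```

Because get_cu_count is exported as a module-level function rather than a method, Python never binds anything to the first parameter, so callers would have had to pass a dummy argument just to satisfy the signature; removing it is the right fix.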

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a TypeError in the get_cu_count function by removing an unnecessary cls parameter. I've also identified a potential improvement within the same function. The current implementation could lead to unintended CUDA context initialization. I've provided a suggestion to use an existing utility function from the same file to make it safer in multiprocessing environments.

Comment on lines +27 to 29
def get_cu_count(device_id: int = 0) -> int:
"""Returns the total number of compute units (CU) on single GPU."""
return torch.cuda.get_device_properties(device_id).multi_processor_count

high

While removing the unused cls parameter is correct, the function can be further improved. The direct call to torch.cuda.get_device_properties will initialize the CUDA context, which can cause issues in multiprocessing environments. This file provides a safer utility, cuda_get_device_properties, which avoids this side effect. Using it here would make the function more robust.

Suggested change
-def get_cu_count(device_id: int = 0) -> int:
-    """Returns the total number of compute units (CU) on single GPU."""
-    return torch.cuda.get_device_properties(device_id).multi_processor_count
+def get_cu_count(device_id: int = 0) -> int:
+    """Returns the total number of compute units (CU) on single GPU."""
+    return cuda_get_device_properties(device_id, ("multi_processor_count",))[0]
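For context, the pattern behind cuda_get_device_properties is to read device properties in a short-lived child process so the parent process never initializes a CUDA context. A rough sketch of that idea follows; get_device_properties_sketch, _query, and the fake property table are hypothetical names for illustration, not vLLM's actual implementation (which queries torch.cuda inside the child).

```python
import multiprocessing as mp

# Fake property table standing in for torch.cuda.get_device_properties,
# so this sketch runs without a GPU.
_FAKE_PROPS = {"multi_processor_count": 64, "total_memory": 16 << 30}

def _query(device_id, names, conn):
    # In the real helper, the CUDA query happens here, inside the child
    # process, keeping the parent free of any CUDA context.
    conn.send(tuple(_FAKE_PROPS[n] for n in names))
    conn.close()

def get_device_properties_sketch(device_id, names):
    """Fetch the requested property names via a child process."""
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=_query, args=(device_id, names, child_conn))
    proc.start()
    result = parent_conn.recv()
    proc.join()
    return result

if __name__ == "__main__":
    print(get_device_properties_sketch(0, ("multi_processor_count",))[0])
```

The design choice matters in vLLM's multiprocessing executor: once a parent process holds a CUDA context, forked workers can inherit it in a broken state, so property queries that merely need a number are safer done out-of-process.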

@ganyi1996ppo
Contributor Author

@tjtanaa @wangxiyuan @gshtras @HAIAI please take a look

@ganyi1996ppo ganyi1996ppo changed the title [BugFix] Fix get_cu_count missing variable error [BugFix][ROCm] Fix get_cu_count missing variable error Nov 13, 2025
@mergify mergify bot added the rocm Related to AMD ROCm label Nov 13, 2025
Collaborator

@tjtanaa tjtanaa left a comment


Thank you for the fix.

Contributor

@wangxiyuan wangxiyuan left a comment


Oh, my silly mistake

@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 13, 2025
@tjtanaa tjtanaa enabled auto-merge (squash) November 13, 2025 02:46
@ganyi1996ppo
Contributor Author

Oh, my silly mistake

haha, that's fine, I spotted it in time!

@tjtanaa tjtanaa merged commit 7dca0c9 into vllm-project:main Nov 13, 2025
53 checks passed
@gshtras gshtras deleted the ganyi/fix_get_cu_count branch November 13, 2025 17:07
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025

Labels

ready — ONLY add when PR is ready to merge/full CI is needed
rocm — Related to AMD ROCm

3 participants