[ROCm][BugFix] Remove the usage of device_info from aiter
#28383
Conversation
Signed-off-by: ganyi <[email protected]>
Code Review
This pull request correctly addresses a ModuleNotFoundError by removing a dependency on aiter.ops.triton.utils.device_info. However, the removal of the get_num_sms() call and its replacement with total_tokens for NUM_PRGMS might introduce a performance regression. The original logic was likely intended to optimize the number of launched Triton programs based on the available hardware compute units. I've provided a suggestion to restore this optimization using vLLM's platform abstraction layer, which should resolve the import error while preserving performance.
```diff
  num_heads = key_cache.shape[2]
- NUM_PRGMS = num_programs(total_tokens)
+ NUM_PRGMS = total_tokens
```
While removing the dependency on aiter.ops.triton.utils.device_info.get_num_sms fixes the ModuleNotFoundError, changing NUM_PRGMS to total_tokens could lead to a significant performance regression. The original logic min(total_tokens, get_num_sms()) capped the number of Triton programs to the number of streaming multiprocessors (SMs) or compute units (CUs) to optimize execution. By setting NUM_PRGMS = total_tokens, you might be launching an excessive number of programs (e.g., one per token), which can be inefficient.
A better approach would be to use vLLM's platform abstraction to get the number of compute units. You can replace get_num_sms() with current_platform.get_cu_count() to preserve the optimization.
```diff
- NUM_PRGMS = total_tokens
+ NUM_PRGMS = min(total_tokens, current_platform.get_cu_count())
```
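The effect of the suggested clamp can be illustrated with a minimal sketch (`pick_num_programs` and the CU count of 304 are hypothetical stand-ins for illustration, not code from this PR):

```python
def pick_num_programs(total_tokens: int, cu_count: int) -> int:
    """Cap the Triton grid size at the device compute-unit count.

    Launching more programs than the device has CUs adds launch overhead
    without extra parallelism, so the grid is clamped the way
    min(total_tokens, current_platform.get_cu_count()) would clamp it.
    """
    return min(total_tokens, cu_count)

print(pick_num_programs(50_000, 304))  # large batch: capped at the CU count, 304
print(pick_num_programs(128, 304))     # small batch: uses the token count, 128
```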
@ganyi1996ppo Does this advice help? If it doesn't, overall it looks good to me.
It seems Gemini's suggestion is correct. I have double-checked `get_num_sms()` from aiter against vLLM's `get_cu_count()`, and they are the same.
vLLM (line 454 in d0e186c):
```python
return torch.cuda.get_device_properties(device_id).multi_processor_count
```
and AITER:
Thanks for the comments, that's better indeed!
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
NUM_PRGMS = total_tokens
BLOCK_SIZE = block_size(key_cache, head_dim)
grid = lambda meta: (NUM_PRGMS,)
cp_mha_gather_cache_kernel[grid](
```
Avoid recompiling Triton kernel for every token count
The new NUM_PRGMS = total_tokens value is passed as tl.constexpr, so Triton specializes and caches a separate kernel for every distinct total_tokens encountered. During decoding the token count fluctuates almost every invocation, which now forces a JIT compilation on every call and will quickly thrash the compile cache and slow down inference. The previous code bounded NUM_PRGMS to the device SM count, keeping the number of compiled variants small and stable. Consider clamping NUM_PRGMS to a fixed upper limit (e.g., SMs or another constant) rather than the raw token count to avoid repeated compilations.
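The caching behavior described above can be modeled with a plain-Python sketch (the `compiled_cache` dict and `launch` function are hypothetical stand-ins for Triton's JIT specialization on `tl.constexpr` arguments, not Triton internals):

```python
# Model a JIT cache keyed on a tl.constexpr-like value: every distinct
# value compiles (and caches) a separate kernel variant.
compiled_cache = {}

def launch(num_prgms_constexpr: int) -> str:
    if num_prgms_constexpr not in compiled_cache:
        # stand-in for an expensive Triton JIT compilation
        compiled_cache[num_prgms_constexpr] = f"variant_{len(compiled_cache)}"
    return compiled_cache[num_prgms_constexpr]

CU_COUNT = 80  # hypothetical device SM/CU count

# Passing the raw token count: one compiled variant per distinct value.
for total_tokens in (17, 33, 129, 257):
    launch(total_tokens)
print(len(compiled_cache))  # 4 variants

# Clamping to the CU count: all counts above the cap share one variant.
compiled_cache.clear()
for total_tokens in (17, 33, 129, 257):
    launch(min(total_tokens, CU_COUNT))
print(len(compiled_cache))  # 3 variants (17, 33, and 80)
```

Dropping `NUM_PRGMS` from the `tl.constexpr` parameter list avoids the specialization entirely, since plain runtime arguments do not key the compile cache.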
Remove NUM_PRGMS from tl.constexpr
Signed-off-by: ganyi <[email protected]>
Signed-off-by: ganyi <[email protected]>
tjtanaa
left a comment
LGTM. Thank you @ganyi1996ppo
Thanks for the thoughtful review!
@ganyi1996ppo I will monitor for a while, as there is a discussion on two other PRs about whether it is appropriate to use
Signed-off-by: ganyi <[email protected]>
/gemini review
Code Review
This pull request addresses a critical bug where a ModuleNotFoundError for aiter.ops.triton.utils.device_info was causing crashes for users. The fix correctly removes this problematic dependency. The usage of get_num_sms is replaced with current_platform.get_cu_count(), which is a more robust way to get the number of compute units from within the vLLM framework. Additionally, the logic for calculating num_programs has been improved to be based on total_tokens instead of head_dim, which is more appropriate for token-level parallelization in the Triton kernel. The change from a constexpr to a runtime parameter for the number of programs is also a necessary correctness fix. The changes are well-implemented and resolve the reported issue while also improving the kernel's logic.
DarkLight1337
left a comment
This PR is in conflict with #27005. You need to update the import of `get_cu_count`.
…ject#28383) Signed-off-by: ganyi <[email protected]> Signed-off-by: George D. Torres <[email protected]>
…ject#28383) Signed-off-by: ganyi <[email protected]> Signed-off-by: Bram Wasti <[email protected]>
Purpose
Many users reported encountering the following error after #25763 was merged.
We noticed that many users' aiter installations do not have this module, so this PR removes its usage to maintain backward compatibility with aiter.
Test Plan
gsm8k
Test Result
gsm8k result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.