[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. #22521

vllmellm · 2025-08-08T12:34:34Z

Purpose

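Integrate the AITER RoPE ops into the RotaryEmbedding module on ROCm. In the benchmarks below, this mainly lowers time to first token (TTFT), while throughput and inter-token latency stay roughly unchanged.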

Benchmark Results

meta-llama/Meta-Llama-3-8B-Instruct

Metric	With Aiter Rope	Without Aiter Rope
Request Throughput (req/s)	2.52	2.53
Output Token Thpt (tok/s)	2,089.46	1,995.06
Total Token Thpt (tok/s)	4,608.15	4,525.02
Mean TTFT (ms)	431.57	499.96
Median TTFT (ms)	131.37	179.53
P99 TTFT (ms)	1,423.43	1,557.27
Mean TPOT (ms)	23.32	25.15
Median TPOT (ms)	16.01	16.12
P99 TPOT (ms)	184.30	206.32
Mean ITL (ms)	15.95	16.01
Median ITL (ms)	15.24	15.07
P99 ITL (ms)	49.05	50.28

deepseek-ai/DeepSeek-V2-Lite-Chat

Metric	With Aiter Rope	Without Aiter Rope
Request Throughput (req/s)	3.12	3.07
Output Token Thpt (tok/s)	2,993.27	2,986.46
Total Token Thpt (tok/s)	6,112.77	6,055.55
Mean TTFT (ms)	127.90	204.32
Median TTFT (ms)	81.33	109.66
P99 TTFT (ms)	453.34	852.63
Mean TPOT (ms)	11.70	11.66
Median TPOT (ms)	11.42	11.52
P99 TPOT (ms)	14.14	13.11
Mean ITL (ms)	11.68	11.70
Median ITL (ms)	10.82	10.90
P99 ITL (ms)	39.48	39.37
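
To make the two tables easier to compare, the relative change of a few metrics can be computed directly from the reported numbers. The snippet below is only an illustrative sketch: the values are copied from the tables above, and the helper names (`percent_change`, `summarize`) are hypothetical and not part of vLLM or the benchmark script.

```python
# Values copied from the two benchmark tables above, stored as
# (with AITER RoPE, without AITER RoPE). Only a subset of metrics is shown.
RESULTS = {
    "meta-llama/Meta-Llama-3-8B-Instruct": {
        "Output Token Thpt (tok/s)": (2089.46, 1995.06),
        "Mean TTFT (ms)": (431.57, 499.96),
        "Mean TPOT (ms)": (23.32, 25.15),
    },
    "deepseek-ai/DeepSeek-V2-Lite-Chat": {
        "Output Token Thpt (tok/s)": (2993.27, 2986.46),
        "Mean TTFT (ms)": (127.90, 204.32),
        "Mean TPOT (ms)": (11.70, 11.66),
    },
}


def percent_change(with_aiter: float, without_aiter: float) -> float:
    """Relative change of the AITER run versus the baseline, in percent."""
    return (with_aiter - without_aiter) / without_aiter * 100.0


def summarize(results: dict) -> dict:
    """Return {model: {metric: percent change rounded to 2 decimals}}."""
    return {
        model: {
            name: round(percent_change(with_aiter, without_aiter), 2)
            for name, (with_aiter, without_aiter) in metrics.items()
        }
        for model, metrics in results.items()
    }


if __name__ == "__main__":
    for model, metrics in summarize(RESULTS).items():
        print(model)
        for name, change in metrics.items():
            # Negative is better for latency metrics, positive for throughput.
            print(f"  {name}: {change:+.2f}%")
```

Read this way, mean TTFT drops by roughly 14% for Llama-3-8B-Instruct and roughly 37% for DeepSeek-V2-Lite-Chat, while the throughput and TPOT changes are much smaller. With only 50 prompts per run, small differences may be within run-to-run noise.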

IMPORTANT NOTE: You must pass --compilation-config '{ "custom_ops": ["+rotary_embedding"]}' to enable this custom op.
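
Because the flag value is a JSON string, quoting mistakes are easy to make when the command is assembled by a script. A minimal sketch of building the argument programmatically is shown below; the helper names (`compilation_config_arg`, `serve_command`) are hypothetical, and only the flag name and the `custom_ops` value are taken from this PR.

```python
import json
import shlex


def compilation_config_arg(custom_ops):
    """Return the flag and its JSON payload as two argv items."""
    return ["--compilation-config", json.dumps({"custom_ops": list(custom_ops)})]


def serve_command(model_name: str) -> str:
    """Build a shell-safe `vllm serve` command line enabling the custom op.

    Environment variables such as VLLM_ROCM_USE_AITER=1 are not handled
    here and must be set separately.
    """
    argv = ["vllm", "serve", model_name]
    argv += compilation_config_arg(["+rotary_embedding"])
    # shlex.join quotes the JSON payload so the shell passes it as one argument.
    return shlex.join(argv)


if __name__ == "__main__":
    print(serve_command("meta-llama/Meta-Llama-3-8B-Instruct"))
```

Splitting the printed command with `shlex.split` gives back the original argument list, which is a quick way to confirm that the JSON survives shell quoting.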

Benchmark Setting

python vllm/benchmarks/benchmark_serving.py --backend vllm --model "$model_name" --dataset-name random --num-prompts 50 --request-rate 10 --random-input-len 1000 --random-output-len 1000

Test Plan

Test the models that are affected by this change, using lm_eval on the gsm8k dataset.

Environment Setting

Step 1: run vllm serve

VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
VLLM_ROCM_USE_AITER_RMSNORM=0 \
VLLM_ROCM_USE_AITER_LINEAR=0 \
SAFETENSORS_FAST_GPU=1 \
vllm serve $MODEL_NAME --compilation-config '{ "custom_ops": ["+rotary_embedding"]}' --trust-remote-code --swap-space 16 --distributed-executor-backend mp

Step 2: run lm_eval

lm_eval --model local-completions --tasks gsm8k --model_args model=$MODEL_NAME,base_url=http://localhost:8000/v1/completions --trust_remote_code --num_fewshot 5 --batch_size 256

Tested models:

meta-llama/Meta-Llama-3-8B-Instruct (tests Llama3RotaryEmbedding)
deepseek-ai/DeepSeek-V2-Lite-Chat (tests DeepseekScalingRotaryEmbedding)
tencent/Hunyuan-A13B-Pretrain (tests DynamicNTKAlphaRotaryEmbedding)
NousResearch/Yarn-Mistral-7b-128k (tests YaRNScalingRotaryEmbedding)
Qwen/Qwen3-235B-A22B-FP8
mistralai/Mixtral-8x7B-Instruct-v0.1
mistralai/Mistral-7B-Instruct-v0.1

Test Result

meta-llama/Meta-Llama-3-8B-Instruct

Tasks	Version	Filter	n-shot	Metric	Value (Before)	Value (After)	Stderr (Before)	Stderr (After)
gsm8k	3	flexible-extract	5	exact_match	0.7566	0.7528	0.0118	0.0119
		strict-match	5	exact_match	0.7589	0.7551	0.0118	0.0118

deepseek-ai/DeepSeek-V2-Lite-Chat (-tp 1)

Tasks	Version	Filter	n-shot	Metric	Value (Before)	Value (After)	Stderr (Before)	Stderr (After)
gsm8k	3	flexible-extract	5	exact_match	0.6626	0.6611	0.0130	0.0130
		strict-match	5	exact_match	0.6566	0.6520	0.0131	0.0131

tencent/Hunyuan-A13B-Pretrain (-tp 2)

Tasks	Version	Filter	n-shot	Metric	Value (Before)	Value (After)	Stderr (Before)	Stderr (After)
gsm8k	3	flexible-extract	5	exact_match	0.6422	0.6346	0.0132	0.0133
		strict-match	5	exact_match	0.3548	0.3450	0.0132	0.0131

NousResearch/Yarn-Mistral-7b-128k (-tp 2)

Tasks	Version	Filter	n-shot	Metric	Value (Before)	Value (After)	Stderr (Before)	Stderr (After)
gsm8k	3	flexible-extract	5	exact_match	0.2790	0.2813	0.0124	0.0124
		strict-match	5	exact_match	0.2767	0.2790	0.0123	0.0124

Qwen/Qwen3-235B-A22B-FP8

Tasks	Version	Filter	n-shot	Metric	Value (Before)	Value (After)	Stderr (Before)	Stderr (After)
gsm8k	3	flexible-extract	5	exact_match	0.8802	0.8741	0.0089	0.0091
		strict-match	5	exact_match	0.8605	0.8544	0.0095	0.0097

mistralai/Mixtral-8x7B-Instruct-v0.1

Tasks	Version	Filter	n-shot	Metric	Value (Before)	Value (After)	Stderr (Before)	Stderr (After)
gsm8k	3	flexible-extract	5	exact_match	0.6497	0.6475	0.0131	0.0132
		strict-match	5	exact_match	0.6452	0.6429	0.0132	0.0132

mistralai/Mistral-7B-Instruct-v0.1 (-tp 2)

Tasks	Version	Filter	n-shot	Metric	Value (Before)	Value (After)	Stderr (Before)	Stderr (After)
gsm8k	3	flexible-extract	5	exact_match	0.3404	0.3381	0.0131	0.013
		strict-match	5	exact_match	0.3336	0.3328	0.0130	0.013
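
One way to read the before/after tables is to compare each accuracy difference with the reported standard errors. The sketch below does this for the flexible-extract rows. It assumes the two runs can be treated as independent samples, which is a simplification because both runs score the same gsm8k questions, and the helper names (`z_score`, `noisy_rows`) are hypothetical.

```python
import math

# (model, value before, value after, stderr before, stderr after),
# copied from the flexible-extract rows of the tables above.
ROWS = [
    ("meta-llama/Meta-Llama-3-8B-Instruct", 0.7566, 0.7528, 0.0118, 0.0119),
    ("deepseek-ai/DeepSeek-V2-Lite-Chat", 0.6626, 0.6611, 0.0130, 0.0130),
    ("tencent/Hunyuan-A13B-Pretrain", 0.6422, 0.6346, 0.0132, 0.0133),
    ("NousResearch/Yarn-Mistral-7b-128k", 0.2790, 0.2813, 0.0124, 0.0124),
    ("Qwen/Qwen3-235B-A22B-FP8", 0.8802, 0.8741, 0.0089, 0.0091),
    ("mistralai/Mixtral-8x7B-Instruct-v0.1", 0.6497, 0.6475, 0.0131, 0.0132),
    ("mistralai/Mistral-7B-Instruct-v0.1", 0.3404, 0.3381, 0.0131, 0.013),
]


def z_score(before: float, after: float, se_before: float, se_after: float) -> float:
    """Difference in accuracy divided by the combined standard error."""
    # math.hypot(a, b) == sqrt(a**2 + b**2), the stderr of a difference
    # under the independence assumption stated above.
    return (after - before) / math.hypot(se_before, se_after)


def noisy_rows(rows, threshold: float = 1.0):
    """Return the models whose |z| exceeds the threshold."""
    return [model for model, *values in rows if abs(z_score(*values)) > threshold]


if __name__ == "__main__":
    for model, *values in ROWS:
        print(f"{model}: z = {z_score(*values):+.2f}")
    print("models beyond 1 combined stderr:", noisy_rows(ROWS))
```

Under this assumption, every flexible-extract difference is smaller than one combined standard error, which is consistent with the AITER RoPE path not changing gsm8k accuracy in a measurable way.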

Signed-off-by: vllmellm <[email protected]>

github-actions · 2025-08-08T12:34:41Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request integrates AI Tensor Engine for ROCm (AITER) Rotary Position Embedding (RoPE) operations to improve performance. The changes look promising and the benchmark results are positive. I've found a critical bug that could lead to a runtime error, and a typo in a function name that should be corrected for maintainability. Please see my detailed comments.

vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py

vllm/model_executor/layers/rotary_embedding/rocm_aiter_rope_ops.py

DarkLight1337

LGTM, can you merge from main to fix CI?

vllmellm added 12 commits August 5, 2025 09:39

integrate aiter rope module

29d44d1

Signed-off-by: vllmellm <[email protected]>

fix device type

8af1c71

Signed-off-by: vllmellm <[email protected]>

add aiter rope hip forward func

2ec0a28

Signed-off-by: vllmellm <[email protected]>

fix bugs in function registeration and invocation

ab16de6

Signed-off-by: vllmellm <[email protected]>

fix shape mismatch

f778d2a

Signed-off-by: vllmellm <[email protected]>

fix pre-commit issues

607f62b

Signed-off-by: vllmellm <[email protected]>

clean up forward_hip

2e22977

Signed-off-by: vllmellm <[email protected]>

bugfix

7f016e7

Signed-off-by: vllmellm <[email protected]>

improve code

04458da

Signed-off-by: vllmellm <[email protected]>

Merge remote-tracking branch 'origin/main' into aiter-rope

54c56af

bugfix

3234686

Signed-off-by: vllmellm <[email protected]>

remove unnecessary command

9708eb2

Signed-off-by: vllmellm <[email protected]>

mergify bot added the rocm Related to AMD ROCm label Aug 8, 2025

gemini-code-assist bot reviewed Aug 8, 2025

vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py (outdated, resolved)

vllm/model_executor/layers/rotary_embedding/rocm_aiter_rope_ops.py (outdated, resolved)

fix miss-spelling function name

6abeb5a

Signed-off-by: vllmellm <[email protected]>

vllmellm marked this pull request as ready for review August 8, 2025 12:49

tjtanaa mentioned this pull request Aug 8, 2025

[Feature] [ROCm]: AITER Kernel Integration #14964

Open

61 tasks

DarkLight1337 reviewed Aug 10, 2025

vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py (resolved)

DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 11, 2025

DarkLight1337 approved these changes Aug 11, 2025

Merge remote-tracking branch 'origin/main' into aiter-rope

5c8101a

vllm-bot merged commit 9c97a1c into vllm-project:main Aug 11, 2025
37 of 41 checks passed

paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025

[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. (vllm…

d627c0a

…-project#22521) Signed-off-by: vllmellm <[email protected]> Signed-off-by: Paul Pak <[email protected]>

diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025

[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. (vllm…

3fb860b

…-project#22521) Signed-off-by: vllmellm <[email protected]> Signed-off-by: Diego-Castan <[email protected]>

yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Aug 19, 2025

[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. (vllm…

17d7096

…-project#22521) Signed-off-by: vllmellm <[email protected]>

vllmellm mentioned this pull request Aug 26, 2025

[Feature] [ROCm]: AITER Kernel Integration vllmellm/vllm#51

Open

61 tasks

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025

[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. (vllm…

2b2c323

…-project#22521) Signed-off-by: vllmellm <[email protected]>

xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025

[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. (vllm…

12025f7

…-project#22521) Signed-off-by: vllmellm <[email protected]> Signed-off-by: Xiao Yu <[email protected]>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. (vllm…

e7156ad

…-project#22521) Signed-off-by: vllmellm <[email protected]>
