[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE#35180
gshtras merged 11 commits into vllm-project:main from
Conversation
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Code Review
This pull request enables the AITER RoPE custom op and RoPE+KVCache fusion for ROCm. The changes look good overall, enabling the new custom op and updating relevant parts of the code, including tests and configuration. However, I found a critical issue in the configuration that prevents the new RoPE+KVCache fusion from being enabled by default, which contradicts the goal of this PR. Please see the detailed comment.
vllm/config/vllm.py
Outdated
return (
    cfg.compilation_config.is_custom_op_enabled("rotary_embedding")
    and cfg.compilation_config.use_inductor_graph_partition
)
The condition and cfg.compilation_config.use_inductor_graph_partition will cause this function to return False for the default optimization levels (O2, O3), as use_inductor_graph_partition is set to False for them. This effectively disables the fuse_rope_kvcache feature that this pull request intends to enable. To ensure the fusion is enabled by default as intended, this condition should be removed.
return cfg.compilation_config.is_custom_op_enabled("rotary_embedding")
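To make the bot's point concrete, here is a toy sketch (the config class below is a simplified stand-in, not vllm's real `CompilationConfig`) showing that the extra `use_inductor_graph_partition` condition turns the check off at the default optimization levels, where that flag is False:

```python
# Toy stand-in for the config object discussed above (not vllm's real class).
from dataclasses import dataclass, field


@dataclass
class CompilationConfig:
    custom_ops: list[str] = field(default_factory=list)
    use_inductor_graph_partition: bool = False  # False at O2/O3 by default

    def is_custom_op_enabled(self, name: str) -> bool:
        # Simplified: vllm uses "+name"/"-name" entries to toggle custom ops.
        return f"+{name}" in self.custom_ops


# Defaults at O2/O3: rotary_embedding enabled, inductor partition off.
cc = CompilationConfig(custom_ops=["+rotary_embedding"])

# Condition from the diff: always False under the defaults above.
original = cc.is_custom_op_enabled("rotary_embedding") and cc.use_inductor_graph_partition

# Suggested condition: drops the partition check, so the fusion stays enabled.
suggested = cc.is_custom_op_enabled("rotary_embedding")

print(original, suggested)  # False True
```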
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Hi @Rohan138, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
return (
    rocm_aiter_ops.is_enabled()
    and cfg.compilation_config.is_custom_op_enabled("rotary_embedding")
    and cfg.compilation_config.use_inductor_graph_partition
)
So are you guys using inductor graph partition on ROCm by default? Otherwise we should also return true here if using dynamo partition and the kv cache op is not in splitting_ops.
(Somehow GH ate my original PR comment that explained this)
This PR is necessary but not sufficient to actually enable this fusion by default. We also need:
- [ROCm] Add extra step in config initialization to populate custom ops before compilation config init #34848, to actually enable the rotary_embedding custom op before this check
- [Feature]: Remove attention layer name from unified_kv_cache_update #33267, to remove layer_name from unified_kv_cache_update (and thus remove the op from splitting_ops if using dynamo partition)
- Some offline discussion on improving our pass management depending on platform, AITER/FI enabled, etc.
> return true if dynamo partition and kv cache op not in splitting ops
https://github.com/vllm-project/vllm/blob/main/vllm/config/compilation.py#1001 is called in https://github.com/vllm-project/vllm/blob/main/vllm/config/vllm.py#L961 after the defaults are set in https://github.com/vllm-project/vllm/blob/main/vllm/config/vllm.py#L807. So if inductor partition is not enabled, we would return true for this, then append kv cache to splitting ops, which would silently break the fusion.
Links are broken, but I know what you mean. However, if splitting_ops=[] is passed, kvcache won't be added, so it should still work. So this check should be: if inductor_partition or len(splitting_ops) == 0.
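A minimal sketch of the amended check proposed here (a plain function over illustrative arguments, not vllm's actual config API):

```python
# Sketch of the proposed check: the fusion is safe when inductor graph
# partition is enabled, or when the user passed splitting_ops=[] so the
# kv-cache op will never be appended to the splitting list.
def can_fuse_rope_kvcache(rope_op_enabled: bool,
                          use_inductor_graph_partition: bool,
                          splitting_ops: list[str]) -> bool:
    if not rope_op_enabled:
        return False
    return use_inductor_graph_partition or len(splitting_ops) == 0


# Dynamo partition with explicit splitting_ops=[]: fusion still allowed.
print(can_fuse_rope_kvcache(True, False, []))  # True
# Dynamo partition with the kv-cache op in splitting_ops: fusion disabled.
print(can_fuse_rope_kvcache(True, False, ["unified_kv_cache_update"]))  # False
```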
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
…ROCm/vllm into fused_aiter_triton_rope_kvcache
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
…project#35180) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
…project#35180) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: xjx <493337577@qq.com>
@Rohan138 is the lm eval score from DeepSeek-R1? The GSM8K score should be at
…project#35180) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Purpose
Follow-up to #25135 and #33443 to fix and enable the AITER RoPE custom op by default, and to enable RoPE+KVCache fusion. During prefill (q.shape[0] > 256), we thus use the AITER unfused RoPE kernel instead of the vllm native custom op, which gives another 1% uplift on gpt-oss. This is also a mild prerequisite for MLA RoPE+KVCache fusion on ROCm, since there currently isn't a vllm native custom op for DeepseekScalingRotaryEmbedding.
Test Plan
Before:
Test Result
After:
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.