
[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops#34307

Merged
vllm-bot merged 54 commits into vllm-project:main from EmbeddedLLM:fusionpasscionly
Mar 3, 2026

Conversation

Collaborator

@tjtanaa tjtanaa commented Feb 11, 2026

Purpose

Split away from #34244 to focus on fusion pass only.

Test Plan

Triggered the tests on AMD CI.

Test Result

https://buildkite.com/vllm/amd-ci/builds/4599/steps/canvas


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

tjtanaa and others added 15 commits February 9, 2026 08:28
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@mergify mergify Bot added ci/build rocm Related to AMD ROCm labels Feb 11, 2026
@mergify
Contributor

mergify Bot commented Feb 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tjtanaa.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds new fusion test cases for ROCm to the CI pipeline. The changes look good overall, with modifications to CI configuration and test files to support ROCm-specific backends and features. I've found one issue in the CI configuration that needs to be addressed.

Comment thread .buildkite/test-amd.yaml Outdated
- rocm-smi
# Run just llama3 (fp8) for all config combinations
- "pytest -v -s tests/compile/fusions_e2e/test_tp1_quant.py -k 'llama-3'"
- "pytest -v -s tests/compile/fusions_e2e/test_tp1_quant.py -k 'inductor_partition and not +rms_norm and not +quant_fp8' -k 'inductor_partition and not +rms_norm and +quant_fp8 and qwen3' -k 'llama-3'"
Contributor


Severity: high

The pytest command on this line appears to have a logical error in its -k expressions. The combination of 'not +quant_fp8' and '+quant_fp8 and qwen3' in separate -k arguments will result in no tests being selected because pytest combines multiple -k options with an AND operator, and these two expressions are mutually exclusive.

Given that the comment on line 1763 states the goal is to "Run just llama3 (fp8) for all config combinations", and line 1764 already runs all llama-3 tests, this line seems redundant and incorrect. It might be a copy-paste error.

I suggest removing this line to fix the test step.
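The mutual exclusivity flagged here can be verified with plain boolean logic. Below is a hypothetical checker that only approximates pytest's -k matching (real -k matching is word-based, not substring-based); the point is that the first two expressions can never both be true, so ANDing all three -k options selects nothing:

```python
def matches(test_id: str) -> bool:
    """Approximate the three -k expressions from the CI command.

    pytest combines all -k options on one command line with AND,
    so a test is selected only if every expression matches.
    (Sketch only: real -k matching is word-based, not substring-based.)
    """
    k1 = ("inductor_partition" in test_id
          and "+rms_norm" not in test_id
          and "+quant_fp8" not in test_id)
    k2 = ("inductor_partition" in test_id
          and "+rms_norm" not in test_id
          and "+quant_fp8" in test_id
          and "qwen3" in test_id)
    k3 = "llama-3" in test_id
    return k1 and k2 and k3

# k1 demands "+quant_fp8" be absent while k2 demands it be present,
# so the conjunction is unsatisfiable for every possible test id.
candidates = [
    "test_e2e[llama-3-inductor_partition]",
    "test_e2e[llama-3-inductor_partition-+quant_fp8]",
    "test_e2e[qwen3-inductor_partition-+quant_fp8-llama-3]",
]
print([matches(c) for c in candidates])  # [False, False, False]
```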

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 11, 2026
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Comment thread tests/compile/fusions_e2e/test_tp1_quant.py Outdated
Comment thread tests/compile/fusions_e2e/test_tp1_quant.py
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Comment thread tests/compile/fusions_e2e/models.py
Comment thread tests/compile/fusions_e2e/models.py
Comment thread tests/compile/fusions_e2e/test_tp1_quant.py Outdated
+ [
(TestSiluMulNvfp4QuantModel, False, None),
(TestSiluMulGroupFp8QuantModel, False, None),
pytest.param(
Collaborator


I think this was added for ROCm; we don't have this for CUDA: #25693

Collaborator Author

@tjtanaa tjtanaa Mar 3, 2026


We have removed the test case (TestSiluMulGroupFp8QuantModel, False, None). If you look back at #25693, it hardcodes a search for matching AITER ops to replace with a fused AITER op.

A valid fusion pass requires us to enable +quant_fp8.

Comment on lines +256 to +257
m.setenv("VLLM_ROCM_USE_AITER", "1")
rocm_aiter_ops.refresh_env_variables()
Collaborator


This is a bit ugly, can we just do monkeypatch.setenv() at the start of the test?

Collaborator Author


Let's keep it here so that we don't need a redundant IS_AITER_FOUND check. If we moved the monkeypatch to the start, we would still need to check IS_AITER_FOUND there.

    with set_current_vllm_config(config), monkeypatch.context() as m:
        fusion_passes = [ActivationQuantFusionPass(config)]
        if IS_AITER_FOUND and model_class is TestSiluMulGroupFp8QuantModel:
            from vllm._aiter_ops import rocm_aiter_ops
            from vllm.compilation.passes.fusion.rocm_aiter_fusion import (
                RocmAiterSiluMulFp8GroupQuantFusionPass,
            )

            m.setenv("VLLM_ROCM_USE_AITER", "1")
            rocm_aiter_ops.refresh_env_variables()
            fusion_passes += [RocmAiterSiluMulFp8GroupQuantFusionPass(config)]
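The scoping trade-off being discussed, setting the env variable only inside the AITER branch so it is cleaned up with the context, can be sketched with a minimal stand-in for pytest's monkeypatch context. The scoped_setenv helper below is hypothetical, not pytest's implementation:

```python
import contextlib
import os

@contextlib.contextmanager
def scoped_setenv(name: str, value: str):
    # Stand-in for monkeypatch.context() + setenv(): set the variable
    # for the duration of the block, then restore the previous state.
    old = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = old

before = os.environ.get("VLLM_ROCM_USE_AITER")
with scoped_setenv("VLLM_ROCM_USE_AITER", "1"):
    # Inside the block, the AITER-only code path sees the flag...
    assert os.environ["VLLM_ROCM_USE_AITER"] == "1"
# ...and afterwards the environment is back to its original state.
assert os.environ.get("VLLM_ROCM_USE_AITER") == before
```

Setting the variable mid-test this way keeps the env change, the refresh_env_variables() call, and the extra pass registration behind a single guard, which is the rationale given above for not hoisting it.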

Comment thread .buildkite/test-amd.yaml
- VLLM_TEST_CLEAN_GPU_MEMORY=1 pytest -v -s tests/compile/passes/distributed/test_async_tp.py
- pytest -v -s tests/compile/passes/distributed/test_sequence_parallelism.py
- pytest -v -s tests/compile/passes/distributed/test_fusion_all_reduce.py
# TODO: this test is not supported on ROCm; there are aiter kernels for this.
Collaborator


Can you quote this issue (and let's make a sub-issue?): #25179

Collaborator Author


Created

Comment thread .buildkite/test-amd.yaml
Comment on lines 1525 to 1526
- VLLM_TEST_CLEAN_GPU_MEMORY=1 pytest -v -s tests/compile/passes/distributed/test_async_tp.py
- pytest -v -s tests/compile/passes/distributed/test_sequence_parallelism.py
Collaborator


These are actually passing? I'm surprised

Collaborator Author


Yes, the tests still pass, but the logs are not useful. The fused ops just call torch.ops.symm_mem, which exists on ROCm even though it doesn't work there.

tests/compile/fusions_e2e/test_tp2_async_tp.py also passes, but that doesn't mean this feature works on ROCm.

Comment thread .buildkite/test-amd.yaml Outdated
Comment on lines +1674 to +1675
- "pytest -v -s tests/compile/passes/test_fusion_attn.py"
- "pytest -v -s tests/compile/passes/test_silu_mul_quant_fusion.py"
Collaborator


This is duplicating the for loop above (seems like both are mi325_1): PyTorch Compilation Passes Unit Tests

Collaborator Author


@ProExpertProg We are just following .buildkite/test_areas/compile.yaml and .buildkite/test_areas/pytorch.yaml, so the CUDA tests also have redundant runs.

Collaborator


Yeah, but on CUDA we duplicate on purpose because we need to run the FP4 fusion tests on Blackwell. I don't think it makes sense to replicate that on ROCm and run duplicate tests on the exact same GPU.

Collaborator Author

@tjtanaa tjtanaa Mar 3, 2026


OK, I have removed the redundant tests. I commented out the whole test group and left it in place, since there are TODOs for this test group.

Collaborator

@ProExpertProg ProExpertProg left a comment


A few minor notes, otherwise LGTM!

tjtanaa and others added 4 commits March 2, 2026 06:56
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@vllm-bot vllm-bot merged commit fb7fdc4 into vllm-project:main Mar 3, 2026
58 of 60 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD Mar 3, 2026
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
vllm-project#34307)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Mar 12, 2026
vllm-project#34307)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
vllm-project#34307)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants