
[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops#34307

Merged
vllm-bot merged 54 commits into vllm-project:main from EmbeddedLLM:fusionpasscionly
Mar 3, 2026

Conversation

Collaborator

@tjtanaa tjtanaa commented Feb 11, 2026

Purpose

Split away from #34244 to focus on fusion pass only.

Test Plan

Triggered the tests on AMD CI.

Test Result

https://buildkite.com/vllm/amd-ci/builds/4599/steps/canvas


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

tjtanaa and others added 15 commits February 9, 2026 08:28
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@mergify mergify Bot added ci/build rocm Related to AMD ROCm labels Feb 11, 2026
@mergify
Contributor

mergify Bot commented Feb 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tjtanaa.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds new fusion test cases for ROCm to the CI pipeline. The changes look good overall, with modifications to CI configuration and test files to support ROCm-specific backends and features. I've found one issue in the CI configuration that needs to be addressed.

Comment thread .buildkite/test-amd.yaml Outdated
- rocm-smi
# Run just llama3 (fp8) for all config combinations
- "pytest -v -s tests/compile/fusions_e2e/test_tp1_quant.py -k 'llama-3'"
- "pytest -v -s tests/compile/fusions_e2e/test_tp1_quant.py -k 'inductor_partition and not +rms_norm and not +quant_fp8' -k 'inductor_partition and not +rms_norm and +quant_fp8 and qwen3' -k 'llama-3'"
Contributor


Severity: high

The pytest command on this line appears to have a logical error in its -k expressions. The combination of 'not +quant_fp8' and '+quant_fp8 and qwen3' in separate -k arguments will result in no tests being selected because pytest combines multiple -k options with an AND operator, and these two expressions are mutually exclusive.

Given that the comment on line 1763 states the goal is to "Run just llama3 (fp8) for all config combinations", and line 1764 already runs all llama-3 tests, this line seems redundant and incorrect. It might be a copy-paste error.

I suggest removing this line to fix the test step.
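The mutual exclusivity flagged here can be verified with plain boolean logic. Below is a hypothetical checker that only approximates pytest's -k matching (real -k matching is word-based, not substring-based); the point is that the first two expressions can never both be true, so ANDing all three -k options selects nothing:

```python
def matches(test_id: str) -> bool:
    """Approximate the three -k expressions from the CI command.

    pytest combines all -k options on one command line with AND,
    so a test is selected only if every expression matches.
    (Sketch only: real -k matching is word-based, not substring-based.)
    """
    k1 = ("inductor_partition" in test_id
          and "+rms_norm" not in test_id
          and "+quant_fp8" not in test_id)
    k2 = ("inductor_partition" in test_id
          and "+rms_norm" not in test_id
          and "+quant_fp8" in test_id
          and "qwen3" in test_id)
    k3 = "llama-3" in test_id
    return k1 and k2 and k3

# k1 demands "+quant_fp8" be absent while k2 demands it be present,
# so the conjunction is unsatisfiable for every possible test id.
candidates = [
    "test_e2e[llama-3-inductor_partition]",
    "test_e2e[llama-3-inductor_partition-+quant_fp8]",
    "test_e2e[qwen3-inductor_partition-+quant_fp8-llama-3]",
]
print([matches(c) for c in candidates])  # [False, False, False]
```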

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 11, 2026
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Comment thread tests/compile/fusions_e2e/test_tp1_quant.py Outdated
Comment thread tests/compile/fusions_e2e/test_tp1_quant.py
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Comment thread tests/compile/fusions_e2e/models.py
Comment thread tests/compile/fusions_e2e/models.py
Comment thread tests/compile/fusions_e2e/test_tp1_quant.py Outdated
+ [
(TestSiluMulNvfp4QuantModel, False, None),
(TestSiluMulGroupFp8QuantModel, False, None),
pytest.param(
Collaborator


I think this was added for ROCm; we don't have this for CUDA: #25693

Collaborator Author

@tjtanaa tjtanaa Mar 3, 2026


We have removed the test case (TestSiluMulGroupFp8QuantModel, False, None). If you look back at #25693, it hardcodes a search for matching AITER ops to replace with a fused AITER op.

A valid fusion pass requires us to enable +quant_fp8.

Comment on lines +256 to +257
m.setenv("VLLM_ROCM_USE_AITER", "1")
rocm_aiter_ops.refresh_env_variables()
Collaborator


This is a bit ugly, can we just do monkeypatch.setenv() at the start of the test?

Collaborator Author


Let's keep it here so that we don't need a redundant IS_AITER_FOUND check. If we moved the monkeypatch to the start, we would still need to check IS_AITER_FOUND there.

    with set_current_vllm_config(config), monkeypatch.context() as m:
        fusion_passes = [ActivationQuantFusionPass(config)]
        if IS_AITER_FOUND and model_class is TestSiluMulGroupFp8QuantModel:
            from vllm._aiter_ops import rocm_aiter_ops
            from vllm.compilation.passes.fusion.rocm_aiter_fusion import (
                RocmAiterSiluMulFp8GroupQuantFusionPass,
            )

            m.setenv("VLLM_ROCM_USE_AITER", "1")
            rocm_aiter_ops.refresh_env_variables()
            fusion_passes += [RocmAiterSiluMulFp8GroupQuantFusionPass(config)]
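The scoping trade-off being discussed, setting the env variable only inside the AITER branch so it is cleaned up with the context, can be sketched with a minimal stand-in for pytest's monkeypatch context. The scoped_setenv helper below is hypothetical, not pytest's implementation:

```python
import contextlib
import os

@contextlib.contextmanager
def scoped_setenv(name: str, value: str):
    # Stand-in for monkeypatch.context() + setenv(): set the variable
    # for the duration of the block, then restore the previous state.
    old = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = old

before = os.environ.get("VLLM_ROCM_USE_AITER")
with scoped_setenv("VLLM_ROCM_USE_AITER", "1"):
    # Inside the block, the AITER-only code path sees the flag...
    assert os.environ["VLLM_ROCM_USE_AITER"] == "1"
# ...and afterwards the environment is back to its original state.
assert os.environ.get("VLLM_ROCM_USE_AITER") == before
```

Setting the variable mid-test this way keeps the env change, the refresh_env_variables() call, and the extra pass registration behind a single guard, which is the rationale given above for not hoisting it.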

Comment thread .buildkite/test-amd.yaml
- VLLM_TEST_CLEAN_GPU_MEMORY=1 pytest -v -s tests/compile/passes/distributed/test_async_tp.py
- pytest -v -s tests/compile/passes/distributed/test_sequence_parallelism.py
- pytest -v -s tests/compile/passes/distributed/test_fusion_all_reduce.py
# TODO: this test is not supported on ROCm; there are aiter kernels for this.
Collaborator


Can you quote this issue (and let's make a sub-issue?): #25179

Collaborator Author


Created

Comment thread .buildkite/test-amd.yaml
Comment on lines 1525 to 1526
- VLLM_TEST_CLEAN_GPU_MEMORY=1 pytest -v -s tests/compile/passes/distributed/test_async_tp.py
- pytest -v -s tests/compile/passes/distributed/test_sequence_parallelism.py
Collaborator


These are actually passing? I'm surprised

Collaborator Author


Yes, the tests still pass, but the logs are not useful. The fused ops just call torch.ops.symm_mem, which exists on ROCm even though it doesn't work there.

tests/compile/fusions_e2e/test_tp2_async_tp.py also passes, but that doesn't mean this feature works on ROCm.

Comment thread .buildkite/test-amd.yaml Outdated
Comment on lines +1674 to +1675
- "pytest -v -s tests/compile/passes/test_fusion_attn.py"
- "pytest -v -s tests/compile/passes/test_silu_mul_quant_fusion.py"
Collaborator


This is duplicating the for loop above (seems like both are mi325_1): PyTorch Compilation Passes Unit Tests

Collaborator Author


@ProExpertProg We are just following .buildkite/test_areas/compile.yaml and .buildkite/test_areas/pytorch.yaml, so the CUDA tests also have redundant runs.

Collaborator


Yeah, but on CUDA we duplicate on purpose because we need to run the FP4 fusion tests on Blackwell. I don't think it makes sense to replicate that on ROCm and run duplicate tests on the exact same GPU.

Collaborator Author

@tjtanaa tjtanaa Mar 3, 2026


OK, I have removed the redundant tests. I commented out the whole test group and left it in place, since there are TODOs for this test group.

Collaborator

@ProExpertProg ProExpertProg left a comment


A few minor notes, otherwise LGTM!

tjtanaa and others added 4 commits March 2, 2026 06:56
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@vllm-bot vllm-bot merged commit fb7fdc4 into vllm-project:main Mar 3, 2026
58 of 60 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD Mar 3, 2026
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
vllm-project#34307)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Mar 12, 2026
vllm-project#34307)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
vllm-project#34307)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants