[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops#34307
vllm-bot merged 54 commits into vllm-project:main
Conversation
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request adds new fusion test cases for ROCm to the CI pipeline. The changes look good overall, with modifications to CI configuration and test files to support ROCm-specific backends and features. I've found one issue in the CI configuration that needs to be addressed.
```yaml
- rocm-smi
# Run just llama3 (fp8) for all config combinations
- "pytest -v -s tests/compile/fusions_e2e/test_tp1_quant.py -k 'llama-3'"
- "pytest -v -s tests/compile/fusions_e2e/test_tp1_quant.py -k 'inductor_partition and not +rms_norm and not +quant_fp8' -k 'inductor_partition and not +rms_norm and +quant_fp8 and qwen3' -k 'llama-3'"
```
The pytest command on this line appears to have a logical error in its -k expressions. The combination of 'not +quant_fp8' and '+quant_fp8 and qwen3' in separate -k arguments will result in no tests being selected because pytest combines multiple -k options with an AND operator, and these two expressions are mutually exclusive.
Given that the comment on line 1763 states the goal is to "Run just llama3 (fp8) for all config combinations", and line 1764 already runs all llama-3 tests, this line seems redundant and incorrect. It might be a copy-paste error.
I suggest removing this line to fix the test step.
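To illustrate why this selects nothing: pytest evaluates each `-k` expression against a test's keywords and ANDs repeated `-k` options together. Below is a simplified, hypothetical mini-model of that selection logic (not pytest's real parser; `matches` and `selected` are illustrative helpers), showing that the two mutually exclusive expressions reject every test:

```python
def matches(keywords: set[str], expression: str) -> bool:
    """Evaluate a simplified -k expression ('and', 'or', 'not') over a keyword set.

    Each identifier is replaced by its membership truth value, then the
    boolean expression is evaluated. (Parentheses omitted for brevity.)
    """
    tokens = []
    for tok in expression.split():
        if tok in ("and", "or", "not"):
            tokens.append(tok)
        else:
            tokens.append(str(tok in keywords))
    return bool(eval(" ".join(tokens)))


def selected(keywords: set[str], k_options: list[str]) -> bool:
    # pytest combines repeated -k options with AND
    return all(matches(keywords, expr) for expr in k_options)


# A test either has the quant_fp8 keyword or it doesn't, so it can never
# satisfy both 'not quant_fp8' and 'quant_fp8 and qwen3' at once:
print(selected({"llama-3", "quant_fp8"}, ["not quant_fp8", "quant_fp8 and qwen3"]))  # False
print(selected({"llama-3"}, ["not quant_fp8", "quant_fp8 and qwen3"]))               # False
```

Either expression alone can match tests, but their conjunction is unsatisfiable, which is why the step would collect zero tests.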
```python
+ [
+     (TestSiluMulNvfp4QuantModel, False, None),
+     (TestSiluMulGroupFp8QuantModel, False, None),
+     pytest.param(
```
I think this was added for ROCm, we don't have this for CUDA: #25693
We have removed that test case, (TestSiluMulGroupFp8QuantModel, False, None). If you look back at #25693, they hardcoded the search for the matching of AITER ops into a fused AITER op.
The valid fusion pass requires us to enable +quant_fp8.
```python
m.setenv("VLLM_ROCM_USE_AITER", "1")
rocm_aiter_ops.refresh_env_variables()
```
This is a bit ugly, can we just do monkeypatch.setenv() at the start of the test?
Let's keep it here so that we don't redundantly check IS_AITER_FOUND; if we moved the monkeypatch to the start of the test, we would still need the IS_AITER_FOUND check there.
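For readers unfamiliar with the scoping being discussed: `monkeypatch.context()` sets the environment variable only for the duration of the `with` block and restores it on exit. A minimal stdlib stand-in (the `setenv_scoped` helper here is hypothetical, not vLLM or pytest code) sketches the same behavior:

```python
import os
from contextlib import contextmanager


@contextmanager
def setenv_scoped(name: str, value: str):
    """Stdlib stand-in for pytest's monkeypatch.context() + setenv():
    the variable is set only inside the block and restored afterwards."""
    old = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = old


# Ensure a clean starting state for the demonstration.
os.environ.pop("VLLM_ROCM_USE_AITER", None)
with setenv_scoped("VLLM_ROCM_USE_AITER", "1"):
    inside = os.environ["VLLM_ROCM_USE_AITER"]
print(inside, "VLLM_ROCM_USE_AITER" in os.environ)  # 1 False
```

Because the variable is scoped to the block, other parametrizations of the test are unaffected, which is the point of keeping the `setenv` next to the AITER-specific branch.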
```python
with set_current_vllm_config(config), monkeypatch.context() as m:
    fusion_passes = [ActivationQuantFusionPass(config)]
+   if IS_AITER_FOUND and model_class is TestSiluMulGroupFp8QuantModel:
        from vllm._aiter_ops import rocm_aiter_ops
        from vllm.compilation.passes.fusion.rocm_aiter_fusion import (
            RocmAiterSiluMulFp8GroupQuantFusionPass,
        )
        m.setenv("VLLM_ROCM_USE_AITER", "1")
        rocm_aiter_ops.refresh_env_variables()
+       fusion_passes += [RocmAiterSiluMulFp8GroupQuantFusionPass(config)]
```

```yaml
- VLLM_TEST_CLEAN_GPU_MEMORY=1 pytest -v -s tests/compile/passes/distributed/test_async_tp.py
- pytest -v -s tests/compile/passes/distributed/test_sequence_parallelism.py
- pytest -v -s tests/compile/passes/distributed/test_fusion_all_reduce.py
# TODO: this test is not supported on ROCm, there are aiter kernels for this.
```
Can you quote this issue (and let's make a sub-issue?): #25179
```yaml
- VLLM_TEST_CLEAN_GPU_MEMORY=1 pytest -v -s tests/compile/passes/distributed/test_async_tp.py
- pytest -v -s tests/compile/passes/distributed/test_sequence_parallelism.py
```
These are actually passing? I'm surprised
Yeah. The tests still pass, but the logs are not useful: the fused ops just call torch.ops.symm_mem, which exists on ROCm even though it doesn't work.
tests/compile/fusions_e2e/test_tp2_async_tp.py also passes, but that doesn't mean this feature works on ROCm.
```yaml
- "pytest -v -s tests/compile/passes/test_fusion_attn.py"
- "pytest -v -s tests/compile/passes/test_silu_mul_quant_fusion.py"
```
This is duplicating the for loop above (it seems both run on mi325_1): PyTorch Compilation Passes Unit Tests.
@ProExpertProg We are just following .buildkite/test_areas/compile.yaml and .buildkite/test_areas/pytorch.yaml, so the CUDA tests also have redundant runs.
Yeah but on CUDA we duplicate on purpose because we need to run FP4 fusion tests on blackwell. I don't think it makes sense to replicate on ROCm and run duplicate tests on the exact same GPU.
Ok, I have removed the redundant tests. I commented out the whole test group and left it in place, since there are TODOs for this test group.
ProExpertProg left a comment:
A few minor notes, otherwise LGTM!
(vllm-project#34307) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Purpose
Split away from #34244 to focus on the fusion passes only.
Test Plan
Triggered the tests on AMD CI.
Test Result
https://buildkite.com/vllm/amd-ci/builds/4599/steps/canvas
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.