[Bugfix] Add regression test for allreduce RMS fusion with PP#35960
[Bugfix] Add regression test for allreduce RMS fusion with PP#35960robellliu-dev wants to merge 6 commits into
Conversation
Signed-off-by: Codex <codex@openai.com>
Signed-off-by: Codex <codex@openai.com>
…eate-pr [Bugfix] Add regression test for allreduce RMS fusion with PP
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
There was a problem hiding this comment.
Code Review
This pull request adds a regression test to ensure enable_allreduce_rms_fusion is correctly disabled when pipeline parallelism is used. The added test is correct and addresses the issue. I've suggested an improvement to make the test more comprehensive by using parameterization to cover all gating conditions in the function, which will enhance its robustness against future regressions.
| def test_enable_allreduce_rms_fusion_disabled_for_pp(): | ||
| cfg = VllmConfig( | ||
| parallel_config=ParallelConfig( | ||
| tensor_parallel_size=2, | ||
| pipeline_parallel_size=1, | ||
| data_parallel_size=1, | ||
| ) | ||
| ) | ||
|
|
||
| with ( | ||
| patch("vllm.utils.flashinfer.has_flashinfer", return_value=True), | ||
| patch.object(current_platform, "is_cuda", return_value=True), | ||
| patch.object(current_platform, "is_device_capability", return_value=True), | ||
| ): | ||
| assert enable_allreduce_rms_fusion(cfg) | ||
|
|
||
| cfg.parallel_config.pipeline_parallel_size = 2 | ||
| assert not enable_allreduce_rms_fusion(cfg) |
There was a problem hiding this comment.
The current test correctly covers the pipeline parallelism case as intended. However, to make it more robust and prevent future regressions, it would be beneficial to cover all gating conditions within enable_allreduce_rms_fusion, including tensor_parallel_size and data_parallel_size.
A parameterized test would be a clean way to test all ParallelConfig combinations and improve the test's clarity and maintainability.
@pytest.mark.parametrize(
("parallel_config", "should_be_enabled"),
[
# Should be enabled with only TP > 1
(ParallelConfig(tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1), True),
# Should be disabled with TP <= 1
(ParallelConfig(tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1), False),
# Should be disabled with PP > 1
(ParallelConfig(tensor_parallel_size=2, pipeline_parallel_size=2, data_parallel_size=1), False),
# Should be disabled with DP > 1
(ParallelConfig(tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=2), False),
],
ids=["TP-only", "No-TP", "With-PP", "With-DP"]
)
def test_enable_allreduce_rms_fusion_gating(parallel_config, should_be_enabled):
cfg = VllmConfig(parallel_config=parallel_config)
with (
patch("vllm.utils.flashinfer.has_flashinfer", return_value=True),
patch.object(current_platform, "is_cuda", return_value=True),
patch.object(current_platform, "is_device_capability", return_value=True),
):
assert enable_allreduce_rms_fusion(cfg) is should_be_enabledSigned-off-by: Codex <codex@openai.com>
|
This pull request has merge conflicts that must be resolved before it can be |
|
This pull request has merge conflicts that must be resolved before it can be |
Motivation
Prevent a regression where enable_allreduce_rms_fusion would be incorrectly enabled when pipeline_parallel_size > 1, covering the scenario reported in issue #32730.
Ensure the gating logic for enabling the allreduce+RMS fusion is explicitly tested for tensor-parallel-only and pipeline-parallel cases.
Description
Added a new unit test test_enable_allreduce_rms_fusion_disabled_for_pp in tests/test_config.py that exercises enable_allreduce_rms_fusion under different ParallelConfig settings.
Imported enable_allreduce_rms_fusion into the test module and mocked vllm.utils.flashinfer.has_flashinfer and current_platform capability checks to isolate config gating logic.
Commit uses the [Bugfix] PR title prefix and includes a Signed-off-by: header to satisfy the project's DCO requirement.
Testing
Ran ruff check tests/test_config.py and it passed.
Ran python -m py_compile tests/test_config.py and it succeeded.
Attempted pytest -q tests/test_config.py -k enable_allreduce_rms_fusion_disabled_for_pp, but the run was blocked in this environment due to a missing test dependency (tblib) and network restrictions preventing installation, so the pytest run could not be completed.