[CI][torch.compile] Reduce e2e fusion test time#33293
ProExpertProg merged 30 commits into vllm-project:main
Conversation
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Code Review
This pull request aims to reduce test time on Blackwell GPUs by modifying the CI pipeline. It changes the pytest command for `test_fusion_attn.py` to only run `test_attention_quant_pattern`, effectively disabling the `test_attn_quant` integration test. While this does reduce test time, I have a concern that disabling `test_attn_quant` may leave a gap in test coverage for attention quantization on full models for Blackwell. I've added a comment with more details.
```python
)
assert config.compilation_config.cudagraph_mode == CUDAGraphMode.NONE
assert config.compilation_config.pass_config.fuse_gemm_comms is True
assert config.compilation_config.pass_config.enable_qk_norm_rope_fusion is True
```
Should we start standardizing the terms?

`fuse_gemm_comms` -> `gemm_comms_fusion` (I saw most of them follow the `{custom_op_name1}_{custom_op_name2}_fusion` format)

`enable_qk_norm_rope_fusion` -> `qk_norm_rope_fusion`?
For pass config we did standardize on `fuse_<op>_<op>`, but the norm-rope one landed during the renaming. We should just fix that one; I haven't had a chance yet.
This PR is confusing because there are so many more tests, and in some cases the total time increased, like the sequence-parallel tests going from 1@1hr to 2@40min. Can we just keep the same number of tests but make them faster?
Where do you see 2h40mins? There are two tests at 40 mins each: one on L40 and one on H100 (nightly only), which is the same as before. Also, the tests used to be ~1h each because they also included the unit tests, which were moved into the distributed unit tests. To clarify, one of the SP tests came from distributed, which is why that one is now 40 mins faster.

For the E2E tests: they've been breaking a lot, so I wanted some signal on all changes in vLLM, and that's why I tried to keep them under 20 mins. I don't think the old grouping made sense, and with test areas I feel it's fine to have more CI tests because they're organized much better.

Do you have a specific proposal for grouping the tests? We can remove the L40 SP tests and run H100 in their place instead of on nightly if you prefer.
tjtanaa left a comment:

LGTM as well. The `PassConfig` discussion is not related to this PR.
```python
if self.parallel_config.tensor_parallel_size == 1:
    logger.warning("Sequence Parallelism requires TP>1, disabling")
    self.compilation_config.pass_config.enable_sp = False
    self.compilation_config.pass_config.fuse_gemm_comms = False
```
What happened before? An error?
Yes: because all-reduce is a no-op at TP=1, the SP pattern would match on just the RMSNorm, and during replacement tracing the lack of TP would cause an error.
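To make the guard concrete, here is a minimal standalone sketch of the TP=1 check quoted in the diff above: when tensor parallelism is off, all-reduce is a no-op, so the SP-related passes are disabled up front instead of being left to fail during replacement tracing. The `PassConfig` dataclass and `check_sp` helper here are illustrative stand-ins, not vLLM's actual API.

```python
# Illustrative sketch (not vLLM's real config classes): disable
# SP-related fusion passes when tensor parallelism is off, since
# all-reduce is a no-op at TP=1 and pattern replacement would fail.
from dataclasses import dataclass


@dataclass
class PassConfig:
    enable_sp: bool = True
    fuse_gemm_comms: bool = True


def check_sp(pass_config: PassConfig, tensor_parallel_size: int) -> PassConfig:
    """Disable SP passes when TP=1 (all-reduce would be a no-op)."""
    if tensor_parallel_size == 1:
        pass_config.enable_sp = False
        pass_config.fuse_gemm_comms = False
    return pass_config


cfg = check_sp(PassConfig(), tensor_parallel_size=1)
print(cfg.enable_sp, cfg.fuse_gemm_comms)  # -> False False
```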
```python
# Disable compile cache to make sure custom passes run.
# Otherwise, we can't verify fusion happened through the logs.
monkeypatch.setenv("VLLM_DISABLE_COMPILE_CACHE", "1")
```
The test coverage looked reasonable to me. I think we do want E2E tests around so that we can be sure these things are working or not.

Some ideas to further reduce the test time (that we could pursue in the future):
- If the compile cache is off, we have a cold compile each time. If the goal is just to check that the fusion happened, we could figure out how to disable Inductor Triton kernel generation (which is a sizable chunk of the compile time).
- We can avoid checking logs (there's probably some I/O there). Instead, there's a way to get the Inductor graphs. We just want to check that the custom pass ran successfully and that there is a new fused custom op in the graph, right?
Yeah, that sounds good to me. I think we should still run the graph to check it's not broken, but agreed that skipping Triton would be nice. Also, if we had a way to pass the counters between processes, that would be much better than log parsing.

Are you okay with doing these in a follow-up?
Yes, a follow-up is good.

> Also if we had a way to pass the counters between processes that would be much better instead of log parsing.

Setting the vLLM multiprocessing env var to 0 seemed to work for me to retrieve PyTorch's counters (in `test_cold_start.py`).
That only works for TP=1 :/
Waiting for H100 runner availability before merging.

Okay, found the culprit for the `test_async_tp.py` failure, so I feel good about merging.
Purpose
Fusion E2E tests are out of control: they have poor coverage but also take a long time in CI.
This PR simultaneously improves coverage, splits up the tests, and cuts running times by reducing `n_hidden_layers` and using dummy weights. The old E2E tests are removed completely in favor of a new `fusions_e2e` directory. We add utilities to make it easier to add models and fusions in the future.

In CI, the E2E fusion tests are now split into "quick" (all models, single config) and "sweep" (single model, sweeping all of the config). "Quick" tests run on any change in vLLM and are limited to <15 mins. Sweep tests only run on specific changes to compilation/model-forward code.
Additionally, distributed compilation tests are pulled out of the distributed suite and moved into the distributed compilation test group.
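The quick/sweep split described above can be illustrated with a tiny parametrization sketch: "quick" covers every model under one representative config, while "sweep" covers every config on one representative model, so the two matrices together stay far smaller than the full cross product. Model and config names below are placeholders, not the PR's actual test parameters.

```python
# Illustrative quick/sweep matrix split (placeholder names, not the
# PR's real models or configs): quick = all models x one config,
# sweep = one model x all configs, vs. the full cross product.
MODELS = ["llama", "qwen", "mixtral"]
CONFIGS = ["fp8", "int8", "none"]

quick = [(m, CONFIGS[0]) for m in MODELS]   # all models, single config
sweep = [(MODELS[0], c) for c in CONFIGS]   # single model, all configs
full = [(m, c) for m in MODELS for c in CONFIGS]

print(len(quick), len(sweep), len(full))  # -> 3 3 9
```

Each list could feed a `pytest.mark.parametrize` in the quick and sweep test files respectively, which is how the CI time stays bounded as models and configs are added.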
Follow-ups
Before
- Distributed
- Compile
- PyTorch
- SP

After
- Distributed
- Compile
- PyTorch
- SP
Test Plan
CI
Test Result
Looks good