[CI] Add Flux2 Klein Tests #2027

Merged
hsliuustc0106 merged 4 commits into vllm-project:main from alex-jw-brooks:flux2_tests
Mar 22, 2026

Conversation

@alex-jw-brooks
Contributor

Purpose

Adds Flux2 tests for #1832

Test Plan

For the 4B model, I'm not 100% sure the tests can run on the L4 cards, but it's very close, so I think it's worth a try. I also took a pass at the buildkite config, since it doesn't look like L4 card models have been added to it yet.

Tests added are (see the sketch after this list):

  • cache_dit + cpu offload (1 L4 card)
  • cache_dit + ring 2 + ulysses 2 + fp8 (4 L4 cards)
  • cache_dit + tp + cfg parallel + gguf (4 L4 cards)
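
A minimal sketch of what one of these cases looks like as an arg list, modeled on the flag style quoted later in the review; only `--ulysses-degree` and `--quantization` appear in the diff, so the other flag spellings are assumptions, not the actual test code:

```python
# Illustrative only: mirrors the list-of-CLI-args style quoted in the review.
# "--ulysses-degree" and "--quantization" come from the diff; the other flag
# names are assumed and may not match the real test file.
FP8_PARALLEL_CASE = [
    "--ring-degree", "2",      # assumed spelling for the ring attention degree
    "--ulysses-degree", "2",   # from the quoted diff
    "--quantization", "fp8",   # from the quoted diff
]
```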

Test Result

Tests pass locally

@fhfuih @nuclearwu can you please take a look? I think this should cover everything for Flux2 Klein except HSDP at the moment, but I can also reduce the cases if needed.

Signed-off-by: Alex Brooks <albrooks@redhat.com>
@hsliuustc0106
Collaborator

please change the test label from test-nightly to test-merge so that we can test it before merging.

Signed-off-by: Alex Brooks <albrooks@redhat.com>
@alex-jw-brooks
Contributor Author

Sure, thanks @hsliuustc0106! I moved it into test-merge and made it depend on upload-merge-pipeline for now.

@fhfuih
Contributor

fhfuih commented Mar 20, 2026

I wonder whether an L4 can hold this model at runtime, but let's see. Waiting for a "ready" label to get CI started.

"--ulysses-degree",
"2",
"--quantization",
"fp8",
Contributor


Does Flux 2 Klein support FP8? As per #1217, I only see GGUF. Correct me if I'm wrong and there has been a recent update.

Contributor Author


Yes, it does, at least for some layers. I think it was added as a side effect of this PR, since it pushes the vLLM quant config down through the DiT layers. I checked that the quant post-processing is called for the DiT layers, and memory with it on is ~5 GB lower 🙂
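
As a hedged illustration of that kind of check (not the actual verification run here), assuming the DiT is reachable as a plain torch.nn.Module, one could count how many parameters actually landed in FP8 after load:

```python
import torch

def count_fp8_params(module: torch.nn.Module) -> int:
    """Rough proxy for whether the quant config reached the DiT layers:
    counts parameters stored in FP8 dtypes (requires torch >= 2.1)."""
    fp8_dtypes = (torch.float8_e4m3fn, torch.float8_e5m2)
    return sum(p.numel() for p in module.parameters() if p.dtype in fp8_dtypes)
```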

NEGATIVE_PROMPT = "blurry, low quality"


# Currently Flux2 tests target Flux2 Klein.
Contributor


Could you also add an HSDP test case (#1900 merged just 4 days ago)? Thanks!

Contributor Author


Yup! I just went ahead and combined it with the CPU offload test

Signed-off-by: Alex Brooks <albrooks@redhat.com>
@alex-jw-brooks
Contributor Author

alex-jw-brooks commented Mar 20, 2026

Thanks @fhfuih, I think it'll be close, so it's worth a try. From my local testing, memory seems to float around 18-22 GB of VRAM for this model when quant is not used 🤞

Since the tests in this effort mostly focus on making sure the models run without exploding under different acceleration methods/combinations, we could also consider running them with randomly initialized models for tests that aren't explicitly testing the outputs, to keep the CI from getting too heavy. What do you think?

This is similar to how libraries like transformers use a small, randomly initialized model created from a config for their basic tests, with heavy @slow tests to actually verify outputs (example).
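
For reference, the transformers pattern alluded to here looks roughly like this (Llama is just an arbitrary example architecture; the tiny config values are placeholders):

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny, randomly initialized model built purely from a config: fast to
# construct, good enough for "does it run" tests, useless for output quality.
config = LlamaConfig(
    hidden_size=16,
    intermediate_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    vocab_size=1000,
)
model = LlamaForCausalLM(config)
```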

@wtomin added the ready label to trigger buildkite CI on Mar 20, 2026
@wtomin
Collaborator

wtomin commented Mar 20, 2026

Since the tests in this effort mostly focus on making sure the models run without exploding under different acceleration methods/combinations, we could also consider running them with randomly initialized models for tests that aren't explicitly testing the outputs, to keep the CI from getting too heavy. What do you think?

I think it is a good idea. I guess we can reuse the Hugging Face internal testing checkpoints, for example https://huggingface.co/hf-internal-testing/tiny-flux2-klein.

Only functionality will be checked in these tests, not speed/accuracy, so I think we can tolerate random weights. @fhfuih, what do you think?

@fhfuih
Contributor

fhfuih commented Mar 20, 2026

Only functionality will be checked in these tests, not speed/accuracy, so I think we can tolerate random weights. @fhfuih, what do you think?

Yeah, I also agree. Random weights should not break the diffusion features.


@alex-jw-brooks, apologies for any confusion, but after some internal discussion we decided to reduce the number of test cases for non-high-priority models. Could you settle on a recommended feature combination for this model and edit the test script to include only that combination? If you need help finding a good combination of diffusion features, see whether the AI skill in hsliuustc0106/vllm-omni-skills#19 can help, or search this repo for the PRs that introduced this model or the relevant features (for example code snippets).

CC @yangjianjuan

Signed-off-by: Alex Brooks <albrooks@redhat.com>
@alex-jw-brooks
Contributor Author

alex-jw-brooks commented Mar 21, 2026

Sounds good, thanks @fhfuih @wtomin. For Flux2, Ulysses SP and FP8 with cache_dit generally give a good speedup (a bit under 2x), so I've updated the test to cover that combination with TP2 for coverage, since TP only adds a little overhead for this configuration. It should also fit on L4s, since memory looks to be around 17 GB per GPU 🤞

For tiny-model testing, maybe let's save it for a follow-up? I think it would be better to add it generically in a separate PR so that new models can be added easily, parametrizing some common test configurations over those models to validate them (kind of like our common multimodal tests in vLLM).
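
A loose sketch of what that follow-up could look like; every name here is a placeholder, not existing vllm-omni code:

```python
import pytest

# Placeholder model list; tiny checkpoints like the one linked above
# could slot in here once a generic harness exists.
TINY_MODELS = ["hf-internal-testing/tiny-flux2-klein"]

@pytest.mark.parametrize("model_id", TINY_MODELS)
def test_pipeline_runs(model_id):
    # Hypothetical shared helper from the imagined follow-up PR: build the
    # pipeline with default acceleration settings and run one tiny generation.
    ...
```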

@hsliuustc0106
Collaborator

common multimodal tests

I think https://github.com/vllm-project/vllm/blob/main/tests/models/multimodal/generation/test_common.py will help a lot; please discuss under RFC #1623.

Collaborator

@hsliuustc0106 left a comment


lgtm

@hsliuustc0106 merged commit a5574a2 into vllm-project:main Mar 22, 2026
8 checks passed
spencerr221 added a commit to spencerr221/vllm-omni that referenced this pull request Mar 24, 2026
Add e2e tests for Stable Diffusion 3.5 medium model following the same
pattern as Flux2 Klein tests. This PR adds test coverage for SD3.5 with
commonly used acceleration features.

Tests added:
* cache_dit + cfg_parallel + tp (4 L4 cards)

The test configuration uses:
- Cache-DiT for faster inference
- CFG-Parallel (size=2) for classifier-free guidance parallelization
- Tensor-Parallel (size=2) for distributed inference

This combination provides good performance improvements while keeping
memory usage reasonable (~18-22GB VRAM per GPU based on similar models).

Test parameters:
- Resolution: 1024x1024 (standard for SD3.5)
- Inference steps: 28 (recommended for quality)
- Guidance scale: 4.5 (SD3.5 default)

Following the pattern from PR vllm-project#2027, only one test case is included
to keep CI lightweight while ensuring the model works with key
acceleration features.

Signed-off-by: LiuBingyu <liubingyu62@gmail.com>
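
For illustration, the SD3.5 case described in that commit would look roughly like this in the arg-list style used earlier in this thread; only the numeric values come from the commit message, and the flag spellings are assumptions:

```python
# Hypothetical arg list for the SD3.5 test case sketched in the commit above.
SD35_CASE = [
    "--tensor-parallel-size", "2",    # TP size from the commit message
    "--cfg-parallel-size", "2",       # assumed spelling for CFG-Parallel
    "--height", "1024",               # 1024x1024, standard for SD3.5
    "--width", "1024",
    "--num-inference-steps", "28",    # recommended for quality
    "--guidance-scale", "4.5",        # SD3.5 default
]
```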