[CI] Add Flux2 Klein Tests #2027
Conversation
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Please change the test label from test-nightly to test-merge so that we can test it before merging.
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Sure, thanks @hsliuustc0106 - moved it to test-merge.
I wonder if an L4 can hold this model at runtime, but let's see. Waiting for a "ready" label to get CI started.
    "--ulysses-degree",
    "2",
    "--quantization",
    "fp8",
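For context, a quoted flag hunk like this typically ends up in the argument list the test passes when launching the model. A minimal sketch (the `build_server_args` helper and the model id are hypothetical, not from this PR; only the flag names/values come from the diff):

```python
# Hypothetical helper: assemble the CLI flags used by this test case.
# Only the flag names/values come from the quoted diff above; the helper
# name and model id are illustrative.
def build_server_args(model: str) -> list[str]:
    return [
        model,
        "--ulysses-degree", "2",   # Ulysses sequence parallelism degree
        "--quantization", "fp8",   # FP8 quantization (some DiT layers)
    ]

args = build_server_args("flux2-klein")  # placeholder model id
```

The flat list form mirrors how such flags are usually handed to a subprocess launcher in e2e tests.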
Does Flux 2 Klein support FP8? As per #1217 I only see GGUF. Correct me if I'm wrong and there is a recent update.
Yes, it does, at least for some layers - I think it was added as a side effect of this PR, since it pushes the vLLM quant config down through the DiT layers. I checked that the quant post-processing is called for the DiT layers, and memory with it on is ~5 GB lower 🙂
    NEGATIVE_PROMPT = "blurry, low quality"
    # Currently Flux2 tests target Flux2 Klein.
Could you also add an HSDP test case (#1900 was just merged 4 days ago)? Thanks!
Yup! I just went ahead and combined it with the CPU offload test.
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Thanks @fhfuih, I think it'll be close, so it's worth a try. From my local testing, memory floats around 18-22 GB VRAM for this model when quantization is not used 🤞 Since the tests in this effort mostly focus on making sure we can run the models without exploding under different acceleration methods/combinations, we could also consider running the tests that aren't explicitly checking outputs with randomly initialized models, to keep the CI from getting too heavy. What do you think? This is similar to the way libs like transformers use a small, randomly initialized model created from a config for their basic tests, with heavier runs reserved for slow tests.
I think it is a good idea. I guess we can reuse Hugging Face internal testing checkpoints, for example https://huggingface.co/hf-internal-testing/tiny-flux2-klein. Only functionality will be checked in these tests, not speed/accuracy, so I think we can tolerate the random weights. @fhfuih What do you think?
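The tiny-checkpoint pattern discussed here boils down to shrinking the model config so a randomly initialized model is cheap to build. A plain-Python sketch (the `TinyDiTConfig` class and every dimension value are made up for illustration; real Flux2 Klein dimensions are far larger):

```python
from dataclasses import dataclass


@dataclass
class TinyDiTConfig:
    # Hypothetical tiny config for functional-only CI tests.
    hidden_size: int = 32
    num_layers: int = 2
    num_attention_heads: int = 4


def tiny_config_for_tests() -> TinyDiTConfig:
    # A model built from this config would have randomly initialized weights,
    # so tests using it can only verify that features run end to end,
    # not output quality or accuracy.
    return TinyDiTConfig()


cfg = tiny_config_for_tests()
```

Checkpoints like hf-internal-testing/tiny-flux2-klein are generated from exactly this kind of shrunken config.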
Yeah, I also agree. Random weights should not break the diffusion features. @alex-jw-brooks apologies for any confusion caused, but after some internal discussion we decided to reduce the number of test cases for non-high-priority models. Could you help settle on a recommended feature combination for this model, and edit the test script to include only that combination? If you need help finding a good combination of diffusion features, see whether this AI skill (hsliuustc0106/vllm-omni-skills#19) can help, or search this repo for the PRs that introduced this model or the relevant features (for example code snippets).
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Sounds good, thanks @fhfuih @wtomin - for Flux2, Ulysses SP & FP8 with Cache-DiT generally give a good speedup (a bit less than 2x), so I've updated the test to cover that combination with TP2, since TP for this configuration only adds a little overhead. It should also fit on L4s, since memory looks to be around 17 GB per GPU 🤞 For tiny-model testing, maybe let's save it for a follow-up? I think it would be better to have a separate PR that adds it generically, so that new models can be added easily, and just parametrize some common tests over configurations for those models to validate them (kind of like our common multimodal tests in vLLM).
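The single settled combination could be encoded as one parametrized case so future models just append to the list. A sketch (the `FEATURE_COMBOS` structure and key names are illustrative, not the actual test file; the values match the combination described above):

```python
# Illustrative only: the one feature combination kept for CI, per the
# discussion above. Key names are made up, not from the real test script.
FEATURE_COMBOS = [
    {
        "ulysses_degree": 2,        # Ulysses sequence parallelism
        "quantization": "fp8",      # FP8 for (some) DiT layers
        "cache_dit": True,          # Cache-DiT acceleration
        "tensor_parallel_size": 2,  # TP2 for coverage
    },
]
```

In a pytest-based suite, a list like this would typically feed `@pytest.mark.parametrize` so the suite stays at exactly one case per model.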
I think https://github.com/vllm-project/vllm/blob/main/tests/models/multimodal/generation/test_common.py will help a lot; please discuss under RFC #1623.
Add e2e tests for the Stable Diffusion 3.5 medium model, following the same pattern as the Flux2 Klein tests. This PR adds test coverage for SD3.5 with commonly used acceleration features.

Tests added:
* cache_dit + cfg_parallel + tp (4 L4 cards)

The test configuration uses:
- Cache-DiT for faster inference
- CFG-Parallel (size=2) for classifier-free guidance parallelization
- Tensor-Parallel (size=2) for distributed inference

This combination provides good performance improvements while keeping memory usage reasonable (~18-22 GB VRAM per GPU, based on similar models).

Test parameters:
- Resolution: 1024x1024 (standard for SD3.5)
- Inference steps: 28 (recommended for quality)
- Guidance scale: 4.5 (SD3.5 default)

Following the pattern from PR vllm-project#2027, only one test case is included to keep CI lightweight while ensuring the model works with key acceleration features.

Signed-off-by: LiuBingyu <liubingyu62@gmail.com>
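The SD3.5 test parameters listed above can be gathered in one place for the test. A sketch (the `SD35_TEST_PARAMS` dict and its key names are illustrative, not the actual test code; the values are copied from the description above):

```python
# Values copied from the PR description; the dict layout and key names
# are illustrative, not the actual test code.
SD35_TEST_PARAMS = {
    "height": 1024,
    "width": 1024,              # 1024x1024, standard for SD3.5
    "num_inference_steps": 28,  # recommended for quality
    "guidance_scale": 4.5,      # SD3.5 default
    "cfg_parallel_size": 2,     # classifier-free guidance parallelization
    "tensor_parallel_size": 2,  # distributed inference
}
```

Keeping the numbers in one dict makes it easy to spot when a CI case drifts from the documented configuration.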
Purpose
Adds Flux2 tests for #1832
Test Plan
For the 4B model, I'm not 100% sure the tests can run on the L4 cards, but it's very close, so I think it's worth a try; I also took a pass at the Buildkite config, since it doesn't look like it has been set up for L4 card models yet.
Tests added are:
Test Result
Tests pass locally
@fhfuih @nuclearwu can you please take a look? I think this should cover everything for Flux2 Klein except HSDP at the moment, but I can also reduce cases if needed.