
[model] support FLUX.1-Kontext-dev #561

Merged
Gaohan123 merged 15 commits into vllm-project:main from RuixiangMa:flux1
Mar 23, 2026
Conversation

@RuixiangMa RuixiangMa commented Dec 31, 2025

Purpose

FLUX.1-Kontext-dev model support

Test Plan

  • Model Loading Test
  • Inference Test

Test Result

Hardware: 4 × NVIDIA RTX 4090 (24 GB)

vllm serve black-forest-labs/FLUX.1-Kontext-dev --omni --port 8004 --enable_cpu_offload --tensor_parallel_size 4

curl -s -X POST "http://localhost:8004/v1/images/edits" -F "image=@test.jpg" -F "prompt=Change the sky to orange sunset." -F "guidance_scale=1.0" -F "num_inference_steps=50" -F "n=1" -F "size=1024x1024" -F "output_format=png" | jq -r '.data[0].b64_json' | base64 --decode > output.png
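
The same request flow can be sketched in Python. The form fields mirror the curl call above; the actual HTTP POST (e.g. via `requests`) is left as a comment since it needs a running server, and the helper names here are hypothetical:

```python
import base64

def build_edit_form(prompt, size="1024x1024", steps=50):
    # Multipart form fields matching the curl invocation above; values are
    # strings because the endpoint accepts multipart/form-data.
    return {
        "prompt": prompt,
        "guidance_scale": "1.0",
        "num_inference_steps": str(steps),
        "n": "1",
        "size": size,
        "output_format": "png",
    }

def save_first_image(response_json, path="output.png"):
    # Equivalent of `jq -r '.data[0].b64_json' | base64 --decode`.
    with open(path, "wb") as f:
        f.write(base64.b64decode(response_json["data"][0]["b64_json"]))

# To actually send the request (untested sketch, needs a running server):
#   import requests
#   with open("test.jpg", "rb") as img:
#       resp = requests.post("http://localhost:8004/v1/images/edits",
#                            files={"image": img},
#                            data=build_edit_form("Change the sky to orange sunset."))
#   save_first_image(resp.json())
```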

|                  | TP=2   | TP=4   |
|------------------|--------|--------|
| Time (s/img)     | 56.185 | 49.583 |
| Peak Memory (GiB)| 20.4   | 17.7   |

(Target and output images omitted.)

E2E UT:

tests/e2e/offline_inference/test_flux_kontext.py::test_flux_kontext_text_to_image PASSED [ 50%]
tests/e2e/offline_inference/test_flux_kontext.py::test_flux_kontext_image_edit PASSED [100%]


tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_text_to_image[parallel_001] PASSED [ 20%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_image_edit[parallel_001] PASSED [ 40%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_image_edit_no_negative[parallel_001] PASSED [ 60%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_high_resolution[parallel_001] PASSED [ 80%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_multiple_outputs[parallel_001] PASSED [100%]

@RuixiangMa RuixiangMa marked this pull request as draft December 31, 2025 04:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment thread vllm_omni/diffusion/models/flux_kontext/flux_kontext_transformer.py Outdated
Comment thread vllm_omni/diffusion/models/flux/pipeline_flux_kontext.py Outdated
Comment thread vllm_omni/diffusion/models/flux/pipeline_flux_kontext.py Outdated
@RuixiangMa RuixiangMa changed the title from "[model] support FLUX.1-Kontext-dev model" to "[model] support FLUX.1-Kontext-dev" on Dec 31, 2025
Signed-off-by: Lancer <maruxiiang6688@gmail.com>
@hsliuustc0106
Collaborator

@Bounty-hunter @wtomin PTAL

@david6666666 david6666666 mentioned this pull request Jan 16, 2026
63 tasks
@RuixiangMa RuixiangMa force-pushed the flux1 branch 2 times, most recently from 65f308f to b260332, on January 22, 2026 08:58
Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa RuixiangMa force-pushed the flux1 branch 3 times, most recently from 503c037 to e9b6432, on January 25, 2026 15:08
@RuixiangMa RuixiangMa marked this pull request as ready for review January 25, 2026 15:08

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9b6432d70


Comment thread vllm_omni/diffusion/models/flux/pipeline_flux_kontext.py
Comment thread vllm_omni/diffusion/models/flux/pipeline_flux_kontext.py
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
@hsliuustc0106
Collaborator

@david6666666 @ZJY0516

@hsliuustc0106
Collaborator

If you use offloading, please also share your latency numbers here in a table.

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

done

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

RuixiangMa commented Feb 6, 2026

cc @hsliuustc0106

Refactored flux_kontext and moved its components into the flux folder.

Collaborator

@lishunyang12 lishunyang12 left a comment


Thanks for the contribution — left some thoughts inline, mostly around opportunities to reuse existing code.

freqs_sin = torch.cat(sin_out, dim=-1).to(ids.device)
return freqs_cos, freqs_sin


Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


These classes (FluxPosEmbed, ColumnParallelApproxGELU, FeedForward) look identical to the ones in flux_transformer.py — would it be possible to import them from there instead of redefining?

return hidden_states


class FluxKontextAttention(nn.Module):
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


FluxKontextAttention looks very similar to FluxAttention with the main difference being attention_mask support. Would it be feasible to extend FluxAttention or parameterize it instead? Just trying to reduce the maintenance burden.
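
The parameterization being suggested can be illustrated with a toy, framework-free sketch: one class whose forward accepts an optional attention_mask, rather than a near-duplicate class. All names are hypothetical; the real change would extend the existing torch-based FluxAttention module.

```python
import math

class AttentionSketch:
    """One attention class handling both cases via an optional mask."""

    def forward(self, scores, attention_mask=None):
        # scores: attention logits; attention_mask: booleans, True = keep.
        if attention_mask is not None:
            scores = [s if keep else float("-inf")
                      for s, keep in zip(scores, attention_mask)]
        # Softmax over the (possibly masked) logits; exp(-inf) == 0.0,
        # so masked positions receive zero weight.
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]
```

With `attention_mask=None` the behavior is identical to the unmasked class, so the Kontext variant would not need its own copy.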

num_single_layers: int = 38,
attention_head_dim: int = 128,
num_attention_heads: int = 24,
hidden_size: int = 3072,
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


I noticed hidden_size is accepted as a parameter but only used in a divisibility check — inner_dim is computed independently from num_attention_heads * attention_head_dim. Is hidden_size still needed, or should the check verify hidden_size == inner_dim?
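
The stricter check proposed here could look like the following sketch (function and parameter names are hypothetical; defaults match the diff hunk above):

```python
def check_dims(num_attention_heads=24, attention_head_dim=128, hidden_size=3072):
    # inner_dim is derived from heads * head_dim; if hidden_size is kept as a
    # parameter, verify it agrees rather than only checking divisibility.
    inner_dim = num_attention_heads * attention_head_dim
    if hidden_size != inner_dim:
        raise ValueError(
            f"hidden_size ({hidden_size}) must equal "
            f"num_attention_heads * attention_head_dim ({inner_dim})"
        )
    return inner_dim
```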

ids = torch.cat((txt_ids, img_ids), dim=0)
image_rotary_emb = self.pos_embed(ids)

for _, block in enumerate(self.transformer_blocks):
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


Nit: since the index from enumerate(...) is unused, this could be simplified to for block in self.transformer_blocks:. Same on line 522.

logger = init_logger(__name__)


def calculate_shift(
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


These utility functions (calculate_shift, retrieve_timesteps, retrieve_latents) seem identical to the ones in pipeline_flux.py. Would it make sense to share them via a utility module?
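
As a sketch of the shared-module idea: the duplicated helpers could live in one place (a hypothetical `flux/utils.py`) and be imported by both pipelines. The `calculate_shift` formula below follows the convention used by diffusers' Flux pipeline and is an assumption, not a copy of this repo's code:

```python
# vllm_omni/diffusion/models/flux/utils.py (hypothetical shared module,
# imported by both pipeline_flux.py and pipeline_flux_kontext.py)

def calculate_shift(image_seq_len, base_seq_len=256, max_seq_len=4096,
                    base_shift=0.5, max_shift=1.15):
    # Linear interpolation of the timestep-schedule shift (mu) as a function
    # of the image sequence length.
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b
```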


model_path = download_weights_from_hf_specific(model_name, None, ["*"])

vae_config_path = os.path.join(model_path, "vae/config.json")
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


If vae/config.json doesn't exist at this path, this would raise a raw FileNotFoundError. Might be nice to add a friendlier error message, but not critical.
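
A friendlier failure could be a small guard like this sketch (function name hypothetical):

```python
import os

def load_vae_config(model_path):
    # Fail with a clearer message when the checkpoint layout is missing the
    # VAE config, instead of a bare FileNotFoundError from open().
    vae_config_path = os.path.join(model_path, "vae", "config.json")
    if not os.path.exists(vae_config_path):
        raise FileNotFoundError(
            f"Expected VAE config at {vae_config_path!r}; the checkpoint at "
            f"{model_path!r} may be incomplete or not a FLUX.1-Kontext layout."
        )
    return vae_config_path
```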

) -> DiffusionOutput:
# Handle multiple prompts - only take the first one, similar to Flux2KleinPipeline
if len(req.prompts) > 1:
logger.warning(
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


Small thing — logger.warning() takes the second string as a format argument rather than concatenating it, so "Taking only the first prompt for now." might get silently dropped. Concatenating the strings should fix it.
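
The pitfall and fix can be sketched as follows: `logging` uses lazy %-style formatting, so extra positional strings become format args rather than being concatenated, and a second string with no matching `%s` placeholder never reaches the message. Folding everything into one format string avoids this (logger name and function are illustrative):

```python
import logging

logger = logging.getLogger("flux_kontext_demo")

def warn_multiple_prompts(n):
    # Correct: one format string, with values passed as lazy format args.
    # Broken variant the review refers to would be:
    #   logger.warning("Received multiple prompts.",
    #                  "Taking only the first prompt for now.")
    # where the second string is treated as a % argument and dropped.
    logger.warning(
        "Received %d prompts but this pipeline uses only one. "
        "Taking only the first prompt for now.", n
    )
```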

raise ValueError(f"`max_sequence_length` cannot be greater than 512 but is {max_sequence_length}")

@staticmethod
def _prepare_latent_image_ids(batch_size, height, width, device, dtype):
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


Nit: _prepare_latent_image_ids accepts batch_size but doesn't seem to use it. Same pattern exists in FluxPipeline so maybe it's inherited, but could be cleaned up.

if isinstance(first_prompt, str)
else (first_prompt.get("prompt") or "")
if first_prompt
else prompt
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


The walrus-operator chain here is a bit hard to follow — would a simpler if/elif work instead? Just a readability suggestion.
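
An if/elif rewrite of that chain might look like this sketch (variable names follow the diff hunk; the exact fallback behavior is an assumption reconstructed from the visible branches):

```python
def extract_prompt_text(first_prompt, prompt=""):
    # Flattened version of the nested conditional-expression chain:
    # a plain string is used as-is, a truthy dict-like prompt object
    # contributes its "prompt" field, and anything falsy falls back.
    if isinstance(first_prompt, str):
        return first_prompt
    elif first_prompt:
        return first_prompt.get("prompt") or ""
    else:
        return prompt
```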

@RuixiangMa
Contributor Author

> Thanks for the contribution — left some thoughts inline, mostly around opportunities to reuse existing code.

Exactly, there's actually a ton of reusable code in the Flux models, including Flux2. I kept them separate for independence, but I'm planning to refactor and abstract the common parts if needed.

@lishunyang12
Collaborator

> Thanks for the contribution — left some thoughts inline, mostly around opportunities to reuse existing code.
>
> Exactly, there's actually a ton of reusable code in the Flux models, including Flux2. I kept them separate for independence, but I'm planning to refactor and abstract the common parts if needed.

Looking forward to your follow-up work!

RuixiangMa and others added 4 commits February 26, 2026 17:06
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <402430575@qq.com>
@RuixiangMa
Contributor Author

@lishunyang12 @hsliuustc0106 The PR's been open a while. Could someone take a look?

@nuclearwu
Contributor

nuclearwu commented Mar 12, 2026

Code Review: Critical Issues Only

🔴 Critical Issues (Must Fix)

1. Zero Test Coverage

+837 lines of new code
+0 test files

Risk: No regression protection for:

  • FluxKontextPipeline initialization
  • Image preprocessing pipeline
  • VAE encoder/decoder integration with Kontext model
  • Text encoder (CLIP + T5) integration
  • TP sharding logic for FluxKontextTransformer2DModel
  • Weight loading with FluxKontextTransformer2DModel.load_weights

Required Tests:

# tests/diffusion/models/flux/test_kontext_pipeline.py
def test_kontext_pipeline_initialization():
    """Verify FluxKontextPipeline initializes correctly"""

def test_image_to_image_editing():
    """Verify image editing produces correct outputs"""

def test_weight_loading_kontext():
    """Verify weights load correctly from HF checkpoint"""

def test_text_encoding_kontext():
    """Verify CLIP + T5 text encoders work together"""

def test_vae_scaling_factor_kontext():
    """Verify VAE scale factor is computed correctly"""

2. Missing Memory Profiling

| Target Image | TP=2 | TP=4 |
|--------------|------|------|
| Time         | | |
| Memory       |   |  |

Missing Information:

  • GPU memory usage for each TP config
  • Peak memory during inference
  • Minimum GPU memory requirements
  • gpu_memory_utilization tuning guidance

Required: Add memory profiling similar to PR #1629:

## Memory Profiling (FLUX.1-Kontext-dev, 1024x1024, 50 steps)

| Config | GPU Memory | Peak Memory | Status |
|--------|------------|-------------|--------|
| TP=2, 2x RTX 4090 24GB | ~22 GB | ~23 GB | ✅ Works |
| TP=4, 4x RTX 4090 24GB | ~12 GB | ~13 GB | ✅ Works |

**Minimum Requirements:**
- TP=2: 2× RTX 4090 24GB or equivalent
- TP=4: 4× RTX 4090 24GB or equivalent
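
One way to collect such peak-memory numbers is sketched below (assumes PyTorch with a CUDA device; only the unit conversion runs without a GPU, and the helper names are illustrative):

```python
def bytes_to_gib(num_bytes):
    # Convert a byte count to GiB for reporting in the table.
    return num_bytes / (1 << 30)

def measure_peak_gib(run_inference):
    # Reset CUDA's peak-memory counter, run one inference call, and report
    # the peak allocation in GiB. Requires torch and a CUDA device.
    import torch
    torch.cuda.reset_peak_memory_stats()
    run_inference()
    return bytes_to_gib(torch.cuda.max_memory_allocated())
```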

📝 Required Changes

| Priority | Item |
|----------|------|
| BLOCKER  | Add unit tests (pipeline initialization, weight loading, image editing) |
| BLOCKER  | Add memory profiling for TP=2 and TP=4 configurations |

Verdict

| Rating | Notes |
|--------|-------|
| CHANGES_REQUESTED ⚠️ | Good architecture, but zero tests and missing memory profiling |

Rationale:

  • Excellent code refactoring with FluxPipelineMixin
  • Complete pipeline implementation with proper inheritance
  • Performance benchmarks provided
  • But 837 lines with 0 tests is unacceptable for production code
  • Memory profiling is essential for users to plan hardware

Post-fix: Once tests are added and memory profiling is provided, this is an APPROVE.


Reference PR: #1629 (FLUX.2-dev)
Review Date: 2026-03-12

@Gaohan123
Collaborator

@wtomin @SamitHuang @ZJY0516 PTAL

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 18, 2026
@wtomin
Collaborator

wtomin commented Mar 18, 2026

No test files found in tests/ directory. Please add:

  • tests/e2e/offline_inference/test_flux_kontext.py
  • tests/e2e/online_serve/test_flux_kontext_expansion.py

This model seems to support TP. Update documentation tables:

  • docs/user_guide/diffusion_acceleration.md
  • docs/user_guide/diffusion/parallelism_acceleration.md

Can you add peak VRAM measurement and e2e latency to PR body? It would be great if you can compare it with diffusers' performance.

Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

RuixiangMa commented Mar 18, 2026

> No test files found in tests/ directory. Please add:
>
> • tests/e2e/offline_inference/test_flux_kontext.py
> • tests/e2e/online_serve/test_flux_kontext_expansion.py
>
> This model seems to support TP. Update documentation tables:
>
> • docs/user_guide/diffusion_acceleration.md
> • docs/user_guide/diffusion/parallelism_acceleration.md
>
> Can you add peak VRAM measurement and e2e latency to PR body? It would be great if you can compare it with diffusers' performance.

Updated! Also, I can't test the diffusers baseline; I only have a few RTX 4090s (24 GB).

I focused on offloading and parallelism, mostly driven by GPU constraints 🤭.

@wtomin
Collaborator

wtomin commented Mar 20, 2026

Good enough. Please resolve the conflicts.

Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Mar 20, 2026
@@ -0,0 +1,126 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator


Please check your tests against the guidance in #1623.

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

> Good enough. Resolve the conflicts pls.

fixed

@lishunyang12
Collaborator

@wtomin is already covering the main points. Looks like conflicts are resolved — I'll defer to their review.

Signed-off-by: Gao Han <hgaoaf@connect.ust.hk>
@Gaohan123 Gaohan123 enabled auto-merge (squash) March 23, 2026 16:58
@Gaohan123 Gaohan123 merged commit 9bf8f8c into vllm-project:main Mar 23, 2026
7 of 8 checks passed
EDIT_PROMPT = "Transform this modern, geometrist image into a Vincent van Gogh style impressionist painting."
NEGATIVE_PROMPT = "blurry, low quality, modern, geometrist"
MODEL = "black-forest-labs/FLUX.1-Kontext-dev"

Collaborator


Hello, since you didn't add a hardware mark, CI will not pick up this case. For example:

SINGLE_CARD_FEATURE_MARKS = hardware_marks(res={"cuda": "H100"})
PARALLEL_FEATURE_MARKS = hardware_marks(res={"cuda": "H100"}, num_cards=2)
