
[model] support FLUX.1-Kontext-dev #561

Merged
Gaohan123 merged 15 commits into vllm-project:main from RuixiangMa:flux1
Mar 23, 2026
Conversation

@RuixiangMa RuixiangMa commented Dec 31, 2025

Purpose

FLUX.1-Kontext-dev model support

Test Plan

  • Model Loading Test
  • Inference Test

Test Result

Hardware: 4 × NVIDIA RTX 4090 (24 GB)

vllm serve black-forest-labs/FLUX.1-Kontext-dev --omni --port 8004 --enable_cpu_offload --tensor_parallel_size 4

curl -s -X POST "http://localhost:8004/v1/images/edits" -F "image=@test.jpg" -F "prompt=Change the sky to orange sunset." -F "guidance_scale=1.0" -F "num_inference_steps=50" -F "n=1" -F "size=1024x1024" -F "output_format=png" | jq -r '.data[0].b64_json' | base64 --decode > output.png
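
The same request flow can be sketched in Python. The form fields mirror the curl call above; the actual HTTP POST (e.g. via `requests`) is left as a comment since it needs a running server, and the helper names here are hypothetical:

```python
import base64

def build_edit_form(prompt, size="1024x1024", steps=50):
    # Multipart form fields matching the curl invocation above; values are
    # strings because the endpoint accepts multipart/form-data.
    return {
        "prompt": prompt,
        "guidance_scale": "1.0",
        "num_inference_steps": str(steps),
        "n": "1",
        "size": size,
        "output_format": "png",
    }

def save_first_image(response_json, path="output.png"):
    # Equivalent of `jq -r '.data[0].b64_json' | base64 --decode`.
    with open(path, "wb") as f:
        f.write(base64.b64decode(response_json["data"][0]["b64_json"]))

# To actually send the request (untested sketch, needs a running server):
#   import requests
#   with open("test.jpg", "rb") as img:
#       resp = requests.post("http://localhost:8004/v1/images/edits",
#                            files={"image": img},
#                            data=build_edit_form("Change the sky to orange sunset."))
#   save_first_image(resp.json())
```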

|                  | TP=2   | TP=4   |
|------------------|--------|--------|
| Time (s/img)     | 56.185 | 49.583 |
| Peak Memory (GiB)| 20.4   | 17.7   |

(Target and output images omitted.)

E2E UT:

tests/e2e/offline_inference/test_flux_kontext.py::test_flux_kontext_text_to_image PASSED [ 50%]
tests/e2e/offline_inference/test_flux_kontext.py::test_flux_kontext_image_edit PASSED [100%]


tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_text_to_image[parallel_001] PASSED [ 20%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_image_edit[parallel_001] PASSED [ 40%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_image_edit_no_negative[parallel_001] PASSED [ 60%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_high_resolution[parallel_001] PASSED [ 80%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_multiple_outputs[parallel_001] PASSED [100%]

@RuixiangMa RuixiangMa marked this pull request as draft December 31, 2025 04:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment thread vllm_omni/diffusion/models/flux_kontext/flux_kontext_transformer.py Outdated
Comment thread vllm_omni/diffusion/models/flux/pipeline_flux_kontext.py Outdated
Comment thread vllm_omni/diffusion/models/flux/pipeline_flux_kontext.py Outdated
@RuixiangMa RuixiangMa changed the title from "[model] support FLUX.1-Kontext-dev model" to "[model] support FLUX.1-Kontext-dev" on Dec 31, 2025
Signed-off-by: Lancer <maruxiiang6688@gmail.com>
@hsliuustc0106
Collaborator

@Bounty-hunter @wtomin PTAL

@david6666666 david6666666 mentioned this pull request Jan 16, 2026
63 tasks
@RuixiangMa RuixiangMa force-pushed the flux1 branch 2 times, most recently from 65f308f to b260332, on January 22, 2026 08:58
Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa RuixiangMa force-pushed the flux1 branch 3 times, most recently from 503c037 to e9b6432, on January 25, 2026 15:08
@RuixiangMa RuixiangMa marked this pull request as ready for review January 25, 2026 15:08

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9b6432d70


Comment thread vllm_omni/diffusion/models/flux/pipeline_flux_kontext.py
Comment thread vllm_omni/diffusion/models/flux/pipeline_flux_kontext.py
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
@hsliuustc0106
Collaborator

@david6666666 @ZJY0516

@hsliuustc0106
Collaborator

If you use offloading, please also share your latency numbers here in a table.

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

done

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

RuixiangMa commented Feb 6, 2026

cc @hsliuustc0106

Refactored flux_kontext and moved its components into the flux folder.

Collaborator

@lishunyang12 lishunyang12 left a comment


Thanks for the contribution — left some thoughts inline, mostly around opportunities to reuse existing code.

freqs_sin = torch.cat(sin_out, dim=-1).to(ids.device)
return freqs_cos, freqs_sin


Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


These classes (FluxPosEmbed, ColumnParallelApproxGELU, FeedForward) look identical to the ones in flux_transformer.py — would it be possible to import them from there instead of redefining?

return hidden_states


class FluxKontextAttention(nn.Module):
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


FluxKontextAttention looks very similar to FluxAttention with the main difference being attention_mask support. Would it be feasible to extend FluxAttention or parameterize it instead? Just trying to reduce the maintenance burden.
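
The parameterization being suggested can be illustrated with a toy, framework-free sketch: one class whose forward accepts an optional attention_mask, rather than a near-duplicate class. All names are hypothetical; the real change would extend the existing torch-based FluxAttention module.

```python
import math

class AttentionSketch:
    """One attention class handling both cases via an optional mask."""

    def forward(self, scores, attention_mask=None):
        # scores: attention logits; attention_mask: booleans, True = keep.
        if attention_mask is not None:
            scores = [s if keep else float("-inf")
                      for s, keep in zip(scores, attention_mask)]
        # Softmax over the (possibly masked) logits; exp(-inf) == 0.0,
        # so masked positions receive zero weight.
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]
```

With `attention_mask=None` the behavior is identical to the unmasked class, so the Kontext variant would not need its own copy.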

num_single_layers: int = 38,
attention_head_dim: int = 128,
num_attention_heads: int = 24,
hidden_size: int = 3072,
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


I noticed hidden_size is accepted as a parameter but only used in a divisibility check — inner_dim is computed independently from num_attention_heads * attention_head_dim. Is hidden_size still needed, or should the check verify hidden_size == inner_dim?
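
The stricter check proposed here could look like the following sketch (function and parameter names are hypothetical; defaults match the diff hunk above):

```python
def check_dims(num_attention_heads=24, attention_head_dim=128, hidden_size=3072):
    # inner_dim is derived from heads * head_dim; if hidden_size is kept as a
    # parameter, verify it agrees rather than only checking divisibility.
    inner_dim = num_attention_heads * attention_head_dim
    if hidden_size != inner_dim:
        raise ValueError(
            f"hidden_size ({hidden_size}) must equal "
            f"num_attention_heads * attention_head_dim ({inner_dim})"
        )
    return inner_dim
```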

ids = torch.cat((txt_ids, img_ids), dim=0)
image_rotary_emb = self.pos_embed(ids)

for _, block in enumerate(self.transformer_blocks):
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


Nit: since the index from enumerate(...) is unused, this could be simplified to for block in self.transformer_blocks:. Same on line 522.

logger = init_logger(__name__)


def calculate_shift(
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


These utility functions (calculate_shift, retrieve_timesteps, retrieve_latents) seem identical to the ones in pipeline_flux.py. Would it make sense to share them via a utility module?
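
As a sketch of the shared-module idea: the duplicated helpers could live in one place (a hypothetical `flux/utils.py`) and be imported by both pipelines. The `calculate_shift` formula below follows the convention used by diffusers' Flux pipeline and is an assumption, not a copy of this repo's code:

```python
# vllm_omni/diffusion/models/flux/utils.py (hypothetical shared module,
# imported by both pipeline_flux.py and pipeline_flux_kontext.py)

def calculate_shift(image_seq_len, base_seq_len=256, max_seq_len=4096,
                    base_shift=0.5, max_shift=1.15):
    # Linear interpolation of the timestep-schedule shift (mu) as a function
    # of the image sequence length.
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b
```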


model_path = download_weights_from_hf_specific(model_name, None, ["*"])

vae_config_path = os.path.join(model_path, "vae/config.json")
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


If vae/config.json doesn't exist at this path, this would raise a raw FileNotFoundError. Might be nice to add a friendlier error message, but not critical.
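
A friendlier failure could be a small guard like this sketch (function name hypothetical):

```python
import os

def load_vae_config(model_path):
    # Fail with a clearer message when the checkpoint layout is missing the
    # VAE config, instead of a bare FileNotFoundError from open().
    vae_config_path = os.path.join(model_path, "vae", "config.json")
    if not os.path.exists(vae_config_path):
        raise FileNotFoundError(
            f"Expected VAE config at {vae_config_path!r}; the checkpoint at "
            f"{model_path!r} may be incomplete or not a FLUX.1-Kontext layout."
        )
    return vae_config_path
```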

) -> DiffusionOutput:
# Handle multiple prompts - only take the first one, similar to Flux2KleinPipeline
if len(req.prompts) > 1:
logger.warning(
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


Small thing — logger.warning() takes the second string as a format argument rather than concatenating it, so "Taking only the first prompt for now." might get silently dropped. Concatenating the strings should fix it.
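
The pitfall and fix can be sketched as follows: `logging` uses lazy %-style formatting, so extra positional strings become format args rather than being concatenated, and a second string with no matching `%s` placeholder never reaches the message. Folding everything into one format string avoids this (logger name and function are illustrative):

```python
import logging

logger = logging.getLogger("flux_kontext_demo")

def warn_multiple_prompts(n):
    # Correct: one format string, with values passed as lazy format args.
    # Broken variant the review refers to would be:
    #   logger.warning("Received multiple prompts.",
    #                  "Taking only the first prompt for now.")
    # where the second string is treated as a % argument and dropped.
    logger.warning(
        "Received %d prompts but this pipeline uses only one. "
        "Taking only the first prompt for now.", n
    )
```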

raise ValueError(f"`max_sequence_length` cannot be greater than 512 but is {max_sequence_length}")

@staticmethod
def _prepare_latent_image_ids(batch_size, height, width, device, dtype):
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


Nit: _prepare_latent_image_ids accepts batch_size but doesn't seem to use it. Same pattern exists in FluxPipeline so maybe it's inherited, but could be cleaned up.

if isinstance(first_prompt, str)
else (first_prompt.get("prompt") or "")
if first_prompt
else prompt
Collaborator

@lishunyang12 lishunyang12 Feb 22, 2026


The walrus-operator chain here is a bit hard to follow — would a simpler if/elif work instead? Just a readability suggestion.
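
An if/elif rewrite of that chain might look like this sketch (variable names follow the diff hunk; the exact fallback behavior is an assumption reconstructed from the visible branches):

```python
def extract_prompt_text(first_prompt, prompt=""):
    # Flattened version of the nested conditional-expression chain:
    # a plain string is used as-is, a truthy dict-like prompt object
    # contributes its "prompt" field, and anything falsy falls back.
    if isinstance(first_prompt, str):
        return first_prompt
    elif first_prompt:
        return first_prompt.get("prompt") or ""
    else:
        return prompt
```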

@RuixiangMa
Contributor Author

> Thanks for the contribution — left some thoughts inline, mostly around opportunities to reuse existing code.

Exactly, there's actually a ton of reusable code in the Flux models, including Flux2. I kept them separate for independence, but I'm planning to refactor and abstract the common parts if needed.

@lishunyang12
Collaborator

> Thanks for the contribution — left some thoughts inline, mostly around opportunities to reuse existing code.
>
> Exactly, there's actually a ton of reusable code in the Flux models, including Flux2. I kept them separate for independence, but I'm planning to refactor and abstract the common parts if needed.

Looking forward to your follow-up work!

RuixiangMa and others added 4 commits February 26, 2026 17:06
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <402430575@qq.com>
@RuixiangMa
Contributor Author

@lishunyang12 @hsliuustc0106 The PR's been open a while. Could someone take a look?

@nuclearwu
Contributor

nuclearwu commented Mar 12, 2026

Code Review: Critical Issues Only

🔴 Critical Issues (Must Fix)

1. Zero Test Coverage

+837 lines of new code
+0 test files

Risk: No regression protection for:

  • FluxKontextPipeline initialization
  • Image preprocessing pipeline
  • VAE encoder/decoder integration with Kontext model
  • Text encoder (CLIP + T5) integration
  • TP sharding logic for FluxKontextTransformer2DModel
  • Weight loading with FluxKontextTransformer2DModel.load_weights

Required Tests:

# tests/diffusion/models/flux/test_kontext_pipeline.py
def test_kontext_pipeline_initialization():
    """Verify FluxKontextPipeline initializes correctly"""

def test_image_to_image_editing():
    """Verify image editing produces correct outputs"""

def test_weight_loading_kontext():
    """Verify weights load correctly from HF checkpoint"""

def test_text_encoding_kontext():
    """Verify CLIP + T5 text encoders work together"""

def test_vae_scaling_factor_kontext():
    """Verify VAE scale factor is computed correctly"""

2. Missing Memory Profiling

| Target Image | TP=2 | TP=4 |
|--------------|------|------|
| Time         | | |
| Memory       |   |  |

Missing Information:

  • GPU memory usage for each TP config
  • Peak memory during inference
  • Minimum GPU memory requirements
  • gpu_memory_utilization tuning guidance

Required: Add memory profiling similar to PR #1629:

## Memory Profiling (FLUX.1-Kontext-dev, 1024x1024, 50 steps)

| Config | GPU Memory | Peak Memory | Status |
|--------|------------|-------------|--------|
| TP=2, 2x RTX 4090 24GB | ~22 GB | ~23 GB | ✅ Works |
| TP=4, 4x RTX 4090 24GB | ~12 GB | ~13 GB | ✅ Works |

**Minimum Requirements:**
- TP=2: 2× RTX 4090 24GB or equivalent
- TP=4: 4× RTX 4090 24GB or equivalent
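
One way to collect such peak-memory numbers is sketched below (assumes PyTorch with a CUDA device; only the unit conversion runs without a GPU, and the helper names are illustrative):

```python
def bytes_to_gib(num_bytes):
    # Convert a byte count to GiB for reporting in the table.
    return num_bytes / (1 << 30)

def measure_peak_gib(run_inference):
    # Reset CUDA's peak-memory counter, run one inference call, and report
    # the peak allocation in GiB. Requires torch and a CUDA device.
    import torch
    torch.cuda.reset_peak_memory_stats()
    run_inference()
    return bytes_to_gib(torch.cuda.max_memory_allocated())
```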

📝 Required Changes

| Priority | Item |
|----------|------|
| BLOCKER  | Add unit tests (pipeline initialization, weight loading, image editing) |
| BLOCKER  | Add memory profiling for TP=2 and TP=4 configurations |

Verdict

| Rating | Notes |
|--------|-------|
| CHANGES_REQUESTED ⚠️ | Good architecture, but zero tests and missing memory profiling |

Rationale:

  • Excellent code refactoring with FluxPipelineMixin
  • Complete pipeline implementation with proper inheritance
  • Performance benchmarks provided
  • But 837 lines with 0 tests is unacceptable for production code
  • Memory profiling is essential for users to plan hardware

Post-fix: Once tests are added and memory profiling is provided, this is an APPROVE.


Reference PR: #1629 (FLUX.2-dev)
Review Date: 2026-03-12

@Gaohan123
Collaborator

@wtomin @SamitHuang @ZJY0516 PTAL

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 18, 2026
@wtomin
Collaborator

wtomin commented Mar 18, 2026

No test files found in tests/ directory. Please add:

  • tests/e2e/offline_inference/test_flux_kontext.py
  • tests/e2e/online_serve/test_flux_kontext_expansion.py

This model seems to support TP. Update documentation tables:

  • docs/user_guide/diffusion_acceleration.md
  • docs/user_guide/diffusion/parallelism_acceleration.md

Can you add peak VRAM measurement and e2e latency to PR body? It would be great if you can compare it with diffusers' performance.

Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

RuixiangMa commented Mar 18, 2026

> No test files found in tests/ directory. Please add:
>
> • tests/e2e/offline_inference/test_flux_kontext.py
> • tests/e2e/online_serve/test_flux_kontext_expansion.py
>
> This model seems to support TP. Update documentation tables:
>
> • docs/user_guide/diffusion_acceleration.md
> • docs/user_guide/diffusion/parallelism_acceleration.md
>
> Can you add peak VRAM measurement and e2e latency to PR body? It would be great if you can compare it with diffusers' performance.

Updated! Also, I can't test the diffusers baseline; I only have a few RTX 4090s (24 GB).

I focused on offloading and parallelism, mostly driven by GPU constraints 🤭.

@wtomin
Collaborator

wtomin commented Mar 20, 2026

Good enough. Please resolve the conflicts.

Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Mar 20, 2026
@@ -0,0 +1,126 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator


Please check your tests against the guidance in #1623.

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

> Good enough. Resolve the conflicts pls.

fixed

@lishunyang12
Collaborator

@wtomin is already covering the main points. Looks like conflicts are resolved — I'll defer to their review.

Signed-off-by: Gao Han <hgaoaf@connect.ust.hk>
@Gaohan123 Gaohan123 enabled auto-merge (squash) March 23, 2026 16:58
@Gaohan123 Gaohan123 merged commit 9bf8f8c into vllm-project:main Mar 23, 2026
7 of 8 checks passed
EDIT_PROMPT = "Transform this modern, geometrist image into a Vincent van Gogh style impressionist painting."
NEGATIVE_PROMPT = "blurry, low quality, modern, geometrist"
MODEL = "black-forest-labs/FLUX.1-Kontext-dev"

Collaborator


Hello, since you didn't add a hardware mark, CI will not pick up this case. For example:

SINGLE_CARD_FEATURE_MARKS = hardware_marks(res={"cuda": "H100"})
PARALLEL_FEATURE_MARKS = hardware_marks(res={"cuda": "H100"}, num_cards=2)
