[model] support FLUX.1-Kontext-dev #561
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Signed-off-by: Lancer <maruxiiang6688@gmail.com>

@Bounty-hunter @wtomin PTAL
Force-pushed from 65f308f to b260332.
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Force-pushed from 503c037 to e9b6432.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e9b6432d70
Signed-off-by: Lancer <maruixiang6688@gmail.com>
If you use offloading, please also share your latency here with a table.

Done.
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Refactored flux_kontext; moved components into the flux folder.
```python
freqs_sin = torch.cat(sin_out, dim=-1).to(ids.device)
return freqs_cos, freqs_sin
```

These classes (FluxPosEmbed, ColumnParallelApproxGELU, FeedForward) look identical to the ones in flux_transformer.py. Would it be possible to import them from there instead of redefining them?
```python
class FluxKontextAttention(nn.Module):
```

FluxKontextAttention looks very similar to FluxAttention, with the main difference being attention_mask support. Would it be feasible to extend FluxAttention or parameterize it instead? Just trying to reduce the maintenance burden.
```python
num_single_layers: int = 38,
attention_head_dim: int = 128,
num_attention_heads: int = 24,
hidden_size: int = 3072,
```

I noticed hidden_size is accepted as a parameter but only used in a divisibility check; inner_dim is computed independently from num_attention_heads * attention_head_dim. Is hidden_size still needed, or should the check verify hidden_size == inner_dim?
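The stricter check could look something like this. `check_dims` is a hypothetical free-standing helper sketching the suggestion, not the model's actual constructor code:

```python
def check_dims(hidden_size: int, num_attention_heads: int, attention_head_dim: int) -> int:
    """Verify hidden_size matches the independently computed inner_dim,
    instead of only checking divisibility (hypothetical helper)."""
    inner_dim = num_attention_heads * attention_head_dim
    if hidden_size != inner_dim:
        raise ValueError(
            f"hidden_size ({hidden_size}) must equal "
            f"num_attention_heads * attention_head_dim ({inner_dim})"
        )
    return inner_dim
```

With the defaults quoted above, 24 * 128 == 3072, so the check passes; a mismatched hidden_size now fails loudly instead of silently being ignored.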
```python
ids = torch.cat((txt_ids, img_ids), dim=0)
image_rotary_emb = self.pos_embed(ids)

for _, block in enumerate(self.transformer_blocks):
```

Nit: since the index from enumerate(...) is unused, this could be simplified to `for block in self.transformer_blocks:`. Same on line 522.
```python
logger = init_logger(__name__)


def calculate_shift(
```

These utility functions (calculate_shift, retrieve_timesteps, retrieve_latents) seem identical to the ones in pipeline_flux.py. Would it make sense to share them via a utility module?
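For reference, `calculate_shift` is small enough to live in one shared utility module. The sketch below follows the implementation used in diffusers' Flux pipelines; the default values are assumed from the upstream pipeline and may differ in this repo:

```python
def calculate_shift(
    image_seq_len: int,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.16,
) -> float:
    """Linearly interpolate the timestep-schedule shift (mu) as a
    function of the image sequence length, as in diffusers' Flux pipelines."""
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b
```

At `base_seq_len` it returns `base_shift` and at `max_seq_len` it returns `max_shift`, interpolating linearly in between, so sharing one copy keeps the schedule identical across pipelines.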
```python
model_path = download_weights_from_hf_specific(model_name, None, ["*"])

vae_config_path = os.path.join(model_path, "vae/config.json")
```

If vae/config.json doesn't exist at this path, this would raise a raw FileNotFoundError. Might be nice to add a friendlier error message, but not critical.
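A friendlier failure could look like the following. `load_vae_config_path` is a hypothetical helper sketched for illustration; only the `vae/config.json` layout comes from the quoted diff:

```python
import os


def load_vae_config_path(model_path: str) -> str:
    """Return the VAE config path, failing with a descriptive message
    instead of a raw FileNotFoundError (hypothetical helper)."""
    vae_config_path = os.path.join(model_path, "vae", "config.json")
    if not os.path.isfile(vae_config_path):
        raise FileNotFoundError(
            f"Expected VAE config at '{vae_config_path}'. The checkpoint at "
            f"'{model_path}' may be incomplete or not laid out like a "
            f"FLUX-style Hugging Face repo."
        )
    return vae_config_path
```

The error now names both the missing file and the checkpoint root, which makes a partially downloaded snapshot much easier to diagnose.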
```python
) -> DiffusionOutput:
    # Handle multiple prompts - only take the first one, similar to Flux2KleinPipeline
    if len(req.prompts) > 1:
        logger.warning(
```

Small thing: logger.warning() takes the second string as a format argument rather than concatenating it, so "Taking only the first prompt for now." might get silently dropped. Concatenating the strings should fix it.
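A minimal demonstration of the pitfall and the two correct alternatives; the logger name, handler, and messages are illustrative, not the pipeline's actual code:

```python
import logging

logger = logging.getLogger("demo")
logger.setLevel(logging.WARNING)

captured = []


class ListHandler(logging.Handler):
    """Collects rendered messages so the behavior is easy to inspect."""

    def emit(self, record):
        captured.append(record.getMessage())


logger.addHandler(ListHandler())

# Pitfall: a second positional argument is treated as a %-style format
# argument, not appended text. Without a matching placeholder in the
# first string, the argument is lost and logging reports an internal
# formatting error instead of the intended message.

# Fix 1: use lazy %-formatting explicitly:
logger.warning("Multiple prompts given; %s", "taking only the first prompt.")

# Fix 2: concatenate adjacent string literals into one message:
logger.warning("Multiple prompts given; "
               "taking only the first prompt.")
```

Both calls render the same full message; either fix preserves the second sentence.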
```python
raise ValueError(f"`max_sequence_length` cannot be greater than 512 but is {max_sequence_length}")

@staticmethod
def _prepare_latent_image_ids(batch_size, height, width, device, dtype):
```

Nit: _prepare_latent_image_ids accepts batch_size but doesn't seem to use it. The same pattern exists in FluxPipeline, so maybe it's inherited, but it could be cleaned up.
```python
if isinstance(first_prompt, str)
else (first_prompt.get("prompt") or "")
if first_prompt
else prompt
```

The nested conditional-expression chain here is a bit hard to follow; would a simpler if/elif work instead? Just a readability suggestion.
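An if/elif version of the quoted expression could read like this; the `extract_prompt` wrapper and its signature are hypothetical, sketched from the quoted diff context:

```python
def extract_prompt(first_prompt, prompt):
    """Readable if/elif equivalent of the nested conditional expression.

    first_prompt may be a plain string, a dict-like entry with a "prompt"
    key, or a falsy value; `prompt` is the fallback. (Hypothetical helper.)
    """
    if isinstance(first_prompt, str):
        return first_prompt
    elif first_prompt:  # e.g. a dict-like request entry
        return first_prompt.get("prompt") or ""
    else:
        return prompt
```

Each branch now maps to one case (string, dict-like, missing), which is much easier to scan than the stacked ternaries.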
Exactly, there's actually a ton of reusable code across the Flux models, including Flux2. I kept them separate for independence, but I'm planning to refactor and abstract the common parts if needed.

Looking forward to your follow-up work!
Signed-off-by: Lancer <maruixiang6688@gmail.com>

@lishunyang12 @hsliuustc0106 The PR's been open a while. Could someone take a look?
Code Review: Critical Issues Only

🔴 Critical Issues (Must Fix)

1. Zero Test Coverage

Risk: No regression protection for:
Required Tests:

```python
# tests/diffusion/models/flux/test_kontext_pipeline.py
def test_kontext_pipeline_initialization():
    """Verify FluxKontextPipeline initializes correctly"""

def test_image_to_image_editing():
    """Verify image editing produces correct outputs"""

def test_weight_loading_kontext():
    """Verify weights load correctly from HF checkpoint"""

def test_text_encoding_kontext():
    """Verify CLIP + T5 text encoders work together"""

def test_vae_scaling_factor_kontext():
    """Verify VAE scale factor is computed correctly"""
```

2. Missing Memory Profiling

Missing Information:
Required: Add memory profiling similar to PR #1629:

## Memory Profiling (FLUX.1-Kontext-dev, 1024x1024, 50 steps)

| Config | GPU Memory | Peak Memory | Status |
|--------|------------|-------------|--------|
| TP=2, 2x RTX 4090 24GB | ~22 GB | ~23 GB | ✅ Works |
| TP=4, 4x RTX 4090 24GB | ~12 GB | ~13 GB | ✅ Works |

**Minimum Requirements:**
- TP=2: 2× RTX 4090 24GB or equivalent
- TP=4: 4× RTX 4090 24GB or equivalent

📝 Required Changes
Verdict

Rationale:

Post-fix: Once tests are added and memory profiling is provided, this is an APPROVE.

Reference PR: #1629 (FLUX.2-dev)
@wtomin @SamitHuang @ZJY0516 PTAL
No test files found in the tests/ directory. Please add:

This model seems to support TP. Update the documentation tables:

Can you add a peak VRAM measurement and e2e latency to the PR body? It would be great if you could compare it with diffusers' performance.
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Updated! Also, I can't test the diffusers baseline; I only have a few 4090s (24GB). I focus on offload and parallelism, mostly driven by GPU constraints 🤭.
Good enough. Please resolve the conflicts.
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
```diff
@@ -0,0 +1,126 @@
+# SPDX-License-Identifier: Apache-2.0
```

Please check your tests against the guidance from #1623.

Fixed.
@wtomin is already covering the main points. Looks like the conflicts are resolved; I'll defer to their review.
Signed-off-by: Gao Han <hgaoaf@connect.ust.hk>
```python
EDIT_PROMPT = "Transform this modern, geometrist image into a Vincent van Gogh style impressionist painting."
NEGATIVE_PROMPT = "blurry, low quality, modern, geometrist"
MODEL = "black-forest-labs/FLUX.1-Kontext-dev"
```

Hello, because you didn't add a hardware mark, the CI will not pick up this case. Like this:

```python
SINGLE_CARD_FEATURE_MARKS = hardware_marks(res={"cuda": "H100"})
PARALLEL_FEATURE_MARKS = hardware_marks(res={"cuda": "H100"}, num_cards=2)
```
Purpose

FLUX.1-Kontext-dev model support

Test Plan

Test Result

4 × NVIDIA 4090 (24G)

```shell
vllm serve black-forest-labs/FLUX.1-Kontext-dev --omni --port 8004 --enable_cpu_offload --tensor_parallel_size 4

curl -s -X POST "http://localhost:8004/v1/images/edits" \
  -F "image=@test.jpg" \
  -F "prompt=Change the sky to orange sunset." \
  -F "guidance_scale=1.0" \
  -F "num_inference_steps=50" \
  -F "n=1" \
  -F "size=1024x1024" \
  -F "output_format=png" \
  | jq -r '.data[0].b64_json' | base64 --decode > output.png
```

E2E UT:

tests/e2e/offline_inference/test_flux_kontext.py::test_flux_kontext_text_to_image PASSED [ 50%]
tests/e2e/offline_inference/test_flux_kontext.py::test_flux_kontext_image_edit PASSED [100%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_text_to_image[parallel_001] PASSED [ 20%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_image_edit[parallel_001] PASSED [ 40%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_image_edit_no_negative[parallel_001] PASSED [ 60%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_high_resolution[parallel_001] PASSED [ 80%]
tests/e2e/online_serving/test_flux_kontext_expansion.py::test_flux_kontext_multiple_outputs[parallel_001] PASSED [100%]