
[Model]: add FLUX.2-dev model #1629

Merged
hsliuustc0106 merged 21 commits into vllm-project:main from nuclearwu:flux2
Mar 11, 2026

Conversation

@nuclearwu
Contributor

@nuclearwu nuclearwu commented Mar 3, 2026


Purpose

support https://huggingface.co/black-forest-labs/FLUX.2-dev

Test Plan

vLLM-Omni:
Text-to-Image:

python examples/offline_inference/text_to_image/text_to_image.py \
  --model /workspace/cache/ymttest/johnjan/models/black-forest-labs/FLUX___2-dev/ \
  --prompt "a lovely bunny holding a sign that says 'vllm-omni'" \
  --seed 42 \
  --tensor-parallel-size 2 \
  --num-images-per-prompt 1 \
  --num-inference-steps 50 \
  --guidance-scale 4.0 \
  --height 1024 \
  --width 1024 \
  --output outputs/flux2-dev.png

Online Serving:

MODEL_NAME_OR_PATH=/workspace/cache/ymttest/johnjan/models/black-forest-labs/FLUX___2-dev/

vllm serve ${MODEL_NAME_OR_PATH} \
   --omni \
   --port 8092 \
   --tensor-parallel-size 1 \
   --vae_use_slicing \
   --vae_use_tiling \
   --enable-cpu-offload

Memory Profile:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1 > memory.log &
NVIDIA_SMI_PID=$!

echo "Memory monitoring started with PID: $NVIDIA_SMI_PID"

# Run inference
python examples/offline_inference/text_to_image/text_to_image.py \
  --model /workspace/cache/ymttest/johnjan/models/black-forest-labs/FLUX___2-dev/ \
  --prompt "a lovely bunny holding a sign that says 'vllm-omni'" \
  --seed 42 \
  --tensor-parallel-size 1 \
  --num-images-per-prompt 1 \
  --num-inference-steps 50 \
  --guidance-scale 4.0 \
  --height 1024 \
  --width 1024 \
  --output outputs/flux2-dev.png

kill -9 $NVIDIA_SMI_PID
echo "Memory monitoring stopped"

# Analyze peak
python -c "
import pandas as pd
df = pd.read_csv('memory.log')
df.iloc[:,0] = df.iloc[:,0].str.replace(' MiB', '').astype(float)
print(f'Peak memory: {df.iloc[:,0].max()} MiB')
print(f'Total samples: {len(df)}')
"

Image-to-Image:

python examples/offline_inference/image_to_image/image_edit.py \
    --model /workspace/cache/ymttest/johnjan/models/black-forest-labs/FLUX___2-dev/ \
    --image outputs/flux2-dev.png \
    --prompt "replace the bunny in the image with dog." \
    --output outputs/flux2-dev-edit.png \
    --seed 42 \
    --tensor-parallel-size 2 \
    --num-inference-steps 50 \
    --guidance-scale 4.0

Test Result

vLLM-Omni:
Reproduced with 4xA800.

| Model/TP | diffusers | TP=1 | TP=1 & --enable-cpu-offload | TP=2 | TP=4 |
|---|---|---|---|---|---|
| Flux.2-dev | flux2-dev | OOM | flux2-dev | flux2-dev | flux2-dev |
| Time | 104.9411 s/img | OOM | 89.8067 s/img | 39.1087 s/img | 29.0770 s/img |

Online Serving:

curl -s http://localhost:8092/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "a lovely bunny holding a sign that says '\''vllm-omni'\''"}
    ],
    "extra_body": {
      "height": 1024,
      "width": 1024,
      "num_inference_steps": 50,
      "guidance_scale": 4.0,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > flux.2-dev.png
flux2-dev
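For clients not written in shell, the base64 data-URL decoding at the end of the pipeline can be done in a few lines of Python. A minimal sketch of just that parsing step (the endpoint and response layout are as in the curl example above; `decode_image_data_url` is an illustrative helper, not part of vLLM-Omni):

```python
import base64

def decode_image_data_url(data_url: str) -> bytes:
    """Decode the base64 payload of a data URL like 'data:image/png;base64,<payload>'.

    Mirrors the `cut -d',' -f2- | base64 -d` step of the shell pipeline above.
    """
    header, _, payload = data_url.partition(",")
    if "base64" not in header:
        raise ValueError(f"not a base64 data URL: {header!r}")
    return base64.b64decode(payload)

# Round-trip a tiny payload the way the shell pipeline does with the real image;
# in practice the data URL comes from .choices[0].message.content[0].image_url.url.
png_bytes = decode_image_data_url(
    "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n").decode()
)
```

The decoded bytes can then be written straight to `flux.2-dev.png` with `open(..., "wb")`.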

Memory Profiling (FLUX.2-dev, 1024x1024, 50 steps):

| Config | GPU Memory | Peak Memory | Status |
|--------|------------|-------------|--------|
| TP=1, 1× A800 80GB | OOM | - | ❌ Insufficient |
| TP=1, 1× A800 80GB & --enable-cpu-offload | 66696 MiB | 67352 MiB | ✅ Works |
| TP=2, 2× A800 80GB | 81112 MiB | 81182 MiB | ✅ Works |
| TP=4, 4× A800 80GB | 68160 MiB | 81116 MiB | ✅ Works |

TP=1 OOM Explanation:
The OOM on a single A800 (80GB) at TP=1 is inevitable because the total size of the FLUX.2-dev weights is approximately 112.6 GB (including ~64.3 GB for the Transformer and ~48.0 GB for the T5-XXL text encoder), which significantly exceeds the 80 GB VRAM capacity. However, enabling CPU offload with --enable-cpu-offload at TP=1 allows FLUX.2-dev to run, and it works well.

Minimum Requirements:

  • TP=1: 1x A800 80GB & --enable-cpu-offload or equivalent
  • TP=2: 2× A800 80GB or equivalent
  • TP=4: 4× A800 80GB or equivalent
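These requirements follow from simple arithmetic, assuming the ~64.3 GB transformer and ~48.0 GB text encoder shard evenly across TP ranks. This is an approximation only: activations, the VAE, and CUDA overhead add to it (which is why the measured TP=2 peak above is higher than the weights alone):

```python
def per_rank_weight_gb(tp: int, transformer_gb: float = 64.3, encoder_gb: float = 48.0) -> float:
    """Approximate per-GPU weight footprint if both components shard evenly over TP ranks."""
    return (transformer_gb + encoder_gb) / tp

for tp in (1, 2, 4):
    need = per_rank_weight_gb(tp)
    # Weights-only estimate vs. an 80 GB card; real peaks include activations and VAE.
    print(f"TP={tp}: ~{need:.1f} GB of weights per rank -> {'fits' if need < 80 else 'exceeds'} 80 GB")
```

This reproduces the qualitative result in the table: TP=1 cannot hold the weights at all, while TP=2 and TP=4 leave headroom for activations.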

Image-to-Image:

python examples/offline_inference/image_to_image/image_edit.py \
    --model /workspace/cache/ymttest/johnjan/models/black-forest-labs/FLUX___2-dev/ \
    --image outputs/flux2-dev.png \
    --prompt "replace the bunny in the image with dog." \
    --output outputs/flux2-dev-edit.png \
    --seed 42 \
    --tensor-parallel-size 2 \
    --num-inference-steps 50 \
    --guidance-scale 4.0
flux2-dev-edit
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@nuclearwu nuclearwu requested a review from hsliuustc0106 as a code owner March 3, 2026 07:03
@nuclearwu nuclearwu closed this Mar 3, 2026
@nuclearwu nuclearwu reopened this Mar 3, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a53145a246


Comment thread vllm_omni/diffusion/models/flux2/pipeline_flux2.py Outdated
Comment thread vllm_omni/diffusion/models/flux2/pipeline_flux2.py
@mergify mergify Bot mentioned this pull request Mar 3, 2026
5 tasks
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
# Conflicts:
#	docs/user_guide/diffusion_acceleration.md
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@nuclearwu
Contributor Author

cc @hsliuustc0106 @ZJY0516 @wtomin

Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@hsliuustc0106
Collaborator

why oom but diffusers works fine?

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


Architectural Code Review

📋 Summary

| Item | Details |
|---|---|
| PR | [Model]: add FLUX.2-dev model |
| Author | @nuclearwu |
| Scale | +1871 lines (2 new files, 6 modified) |
| Status | Needs Changes |

✅ Strengths

1. Complete Model Implementation

  • Full transformer architecture (764 lines)
  • Complete pipeline (1081 lines)
  • Proper registry integration

2. Performance Benchmarks

| Config | Time | vs diffusers |
|---|---|---|
| TP=2 | 39.1 s/img | 2.7× faster |
| TP=4 | 29.1 s/img | 3.6× faster |

3. Architecture Patterns

  • Proper Mixin composition: CFGParallelMixin, SupportImageInput
  • Fused QKV+MLP projection: Flux2ParallelSelfAttention
  • RoPE integration with RotaryEmbedding(is_neox_style=False)

🔴 Critical Issues

1. Zero Test Coverage

+1871 lines of new code
+0 test files

Risk: No regression protection for:

  • Weight loading (load_weights with stacked_params_mapping)
  • TP sharding logic
  • Image preprocessing pipeline
  • Text encoder integration

Required:

# tests/diffusion/models/flux2/test_flux2_transformer.py
def test_weight_loading_tp2():
    """Verify weights load correctly with TP=2"""
    
def test_rope_position_embedding():
    """Verify RoPE produces correct embeddings for 4D coords"""

def test_packed_module_mapping():
    """Verify to_qkv packing matches HF checkpoint"""

2. Weight Loading Typo

# flux2_transformer.py:716
if "to_qkvkv_mlp_proj" in name:  # ❌ Typo: qkvkv
    name = name.replace("to_qkvkv_mlp_proj", "to_qkv_mlp_proj")

Questions:

  • What HF checkpoint has this typo?
  • Is this a diffusers bug or model-specific?

Fix: Add comment explaining the source, or fix upstream if possible.
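A checkpoint-name remap of this kind is usually a small substitution applied before parameter matching in `load_weights`; a generic, hypothetical sketch (not the actual FLUX.2 implementation):

```python
def remap_checkpoint_name(name: str) -> str:
    """Normalize known checkpoint-name quirks before matching against model params.

    Illustrative only: the real load_weights should carry a comment citing
    where the 'qkvkv' spelling originates (HF checkpoint vs. diffusers export).
    """
    fixes = {"to_qkvkv_mlp_proj": "to_qkv_mlp_proj"}
    for bad, good in fixes.items():
        if bad in name:
            name = name.replace(bad, good)
    return name
```

Keeping the quirks in one mapping makes each entry a natural place for the attribution comment the review asks for.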

3. TP=1 OOM Without Explanation

| Model/TP | TP=1 | TP=2 | TP=4 |
|---|---|---|---|
| Flux.2-dev | OOM | ✅ | ✅ |

Missing:

  • Memory requirement estimate
  • Minimum GPU memory for each TP config
  • gpu_memory_utilization tuning guidance

🟡 Significant Concerns

4. Code Attribution from diffusers

# pipeline_flux2.py:27-33
from diffusers import AutoencoderKLFlux2, FlowMatchEulerDiscreteScheduler
from diffusers.pipelines.flux2.pipeline_flux2 import UPSAMPLING_MAX_IMAGE_SIZE
from diffusers.pipelines.flux2.system_messages import SYSTEM_MESSAGE, ...

Large sections appear copied from diffusers:

  • retrieve_timesteps (67 lines) - copied with "Copied from diffusers"
  • retrieve_latents (12 lines) - copied with "Copied from diffusers"
  • _validate_and_process_images (33 lines) - "Adapted from diffusers"

Concerns:

  • Are all copied sections properly attributed?
  • License compatibility (Apache 2.0 vs diffusers license)?

Recommendation: Audit all copied code for proper attribution headers.

5. Hardcoded Magic Values

# pipeline_flux2.py:52-55
max_aspect_ratio: int = 8,
min_side_length: int = 64,
max_area: int = 1024 * 1024,
# pipeline_flux2.py:304
max_length=2048,  # ❌ Why 2048?
# pipeline_flux2.py:454
scale: int = 10,  # ❌ Why 10?

Fix: Document each constant or make configurable.
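As a sketch of the suggested fix, the values above could become documented module-level constants. The values are copied from the snippet; the names are illustrative, not the real API:

```python
# Named, documented equivalents of the magic values flagged above (hypothetical names).
MAX_ASPECT_RATIO = 8        # reject inputs wider/taller than 8:1
MIN_SIDE_LENGTH = 64        # smallest allowed image edge, in pixels
MAX_AREA = 1024 * 1024      # cap total pixel count at 1 MP
MAX_TEXT_TOKENS = 2048      # text-encoder sequence-length cap (pipeline_flux2.py:304)
GUIDANCE_EMBED_SCALE = 10   # scale flagged at pipeline_flux2.py:454

print(MAX_AREA, MAX_TEXT_TOKENS)
```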

6. Inconsistent Error Handling

# pipeline_flux2.py:560
if latents.dtype != self.vae.dtype:
    latents = latents.to(self.vae.dtype)

vs.

# pipeline_flux2.py:563
image = self.vae.decode(latents, return_dict=False)[0]  # No dtype check

🟢 Minor Suggestions

7. Missing Type Hints

# flux2_transformer.py:714
def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:

vs.

# pipeline_flux2.py:733
def load_weights(self, weights):  # ❌ No type hints

8. Docstring Gaps

def _prepare_latent_ids(self, latents):
    """Missing docstring for complex coordinate generation"""

🏗️ Architecture Impact Analysis

Registry Integration:

# registry.py - Correct pattern
_DIFFUSION_PIPELINES["Flux2Pipeline"] = ("flux2", "pipeline_flux2", "Flux2Pipeline")
_POST_PROCESS_FUNCS["Flux2Pipeline"] = "get_flux2_post_process_func"

TP Sharding:

# Correct use of vLLM parallel layers
QKVParallelLinear, MergedColumnParallelLinear, RowParallelLinear

Attention Backend:

  • Uses vllm_omni.diffusion.attention.layer.Attention
  • Properly integrates RoPE

📝 Required Changes

| Priority | Item |
|---|---|
| BLOCKER | Add unit tests (weight loading, TP sharding, preprocessing) |
| BLOCKER | Document/fix `to_qkvkv_mlp_proj` typo source |
| IMPORTANT | Add memory requirements documentation |
| IMPORTANT | Audit diffusers code attribution |
| SUGGESTED | Document magic constants |

Verdict

| Rating | Notes |
|---|---|
| CHANGES_REQUESTED ⚠️ | Solid implementation, but zero tests is unacceptable |

Rationale:

  • Good architecture patterns and performance
  • Complete model integration
  • But 1871 lines with 0 tests is a maintenance liability

Post-fix: Once tests are added, this is an APPROVE.


Reviewed by: vllm-omni-reviewer MCP tool 🦐

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


Additional Feedback: Memory Profiling Required

Good point from review — this PR should include memory profiling for different TP configurations.

Why This Matters

  1. TP=1 OOM is unexplained — Users need to know minimum GPU memory requirements
  2. Capacity planning — Users need to choose the right GPU/TP config
  3. gpu_memory_utilization tuning — Users need guidance on memory fraction settings

Suggested Memory Report Format

## Memory Profiling (FLUX.2-dev, 1024x1024, 50 steps)

| Config | GPU Memory | Peak Memory | Status |
|--------|------------|-------------|--------|
| TP=1, 1x A100 80GB | OOM | - | ❌ Insufficient |
| TP=2, 2x A100 80GB | ~45 GB | ~52 GB | ✅ Works |
| TP=4, 4x A100 80GB | ~25 GB | ~30 GB | ✅ Works |

**Minimum Requirements:**
- TP=2: 2× A100 80GB or equivalent
- TP=4: 4× A100 40GB or equivalent

How to Profile

# Enable memory tracking
export VLLM_ATTENTION_BACKEND=FLASHINFER
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1 > memory.log &

# Run inference
python examples/offline_inference/text_to_image/text_to_image.py \
  --model black-forest-labs/FLUX.2-dev \
  --tensor-parallel-size 2 \
  --height 1024 --width 1024 \
  --num-inference-steps 50

# Analyze peak
python -c "
import pandas as pd
df = pd.read_csv('memory.log')
used = df.iloc[:,0].str.replace(' MiB', '').astype(float)
print(f'Peak memory: {used.max()} MiB')
"

Additional Metrics to Include

  • Model weights memory (fixed overhead)
  • Activation memory (depends on batch size, resolution)
  • KV cache memory (if applicable)
  • VAE encoder/decoder memory

This information is essential for users to decide if their hardware can run the model.
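If the weights-vs-activation split has to be approximated from the same `nvidia-smi` log, one rough heuristic is to treat the first sample after model load as the fixed weight overhead and the maximum as the peak. A hedged sketch (`split_memory_profile` is illustrative; it assumes monitoring starts after weights are loaded):

```python
def split_memory_profile(rows):
    """Split nvidia-smi samples ('NNN MiB, MMM MiB') into (baseline, peak) MiB.

    Heuristic only: first sample ~ fixed weight overhead, peak - baseline
    ~ activation/VAE transients. Skips non-numeric rows such as the CSV header.
    """
    used = []
    for r in rows:
        head = r.split(",")[0].strip()
        if head.endswith("MiB"):          # data rows look like '66696 MiB'
            used.append(float(head[:-3]))
    return used[0], max(used)

samples = [
    "memory.used [MiB], memory.total [MiB]",  # header row is skipped
    "66000 MiB, 81920 MiB",
    "66696 MiB, 81920 MiB",
    "67352 MiB, 81920 MiB",
]
base, peak = split_memory_profile(samples)
print(f"weights ~{base:.0f} MiB, peak {peak:.0f} MiB, transient ~{peak - base:.0f} MiB")
```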


🦐 vllm-omni-reviewer

Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@nuclearwu
Contributor Author

why oom but diffusers works fine?

@hsliuustc0106 The OOM on a single A800 (80GB) at TP=1 is inevitable because the total size of the FLUX.2-dev weights is approximately 112.6 GB (including ~64.3 GB for the Transformer and ~48.0 GB for the T5-XXL text encoder), which significantly exceeds the 80 GB VRAM capacity. diffusers works fine, however, because it saves VRAM by offloading the model to CPU.

import torch
import time
from modelscope import Flux2Pipeline

device = "cuda"
dtype = torch.bfloat16

pipe = Flux2Pipeline.from_pretrained("/workspace/cache/ymttest/johnjan/models/black-forest-labs/FLUX___2-dev/", torch_dtype=dtype)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU

prompt = "a lovely bunny holding a sign that says 'vllm-omni'"
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=3,
    max_sequence_length=512,
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]
generation_start = time.perf_counter()
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]
generation_end = time.perf_counter()
generation_time = generation_end - generation_start
print(f"Total generation time: {generation_time:.4f} seconds ({generation_time * 1000:.2f} ms)")
image.save("outputs/flux2-dev-diffusers.png")

Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@nuclearwu nuclearwu requested a review from hsliuustc0106 March 4, 2026 09:03
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Collaborator

@lishunyang12 lishunyang12 left a comment


nit inline

Comment thread vllm_omni/diffusion/models/flux2/pipeline_flux2.py
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@nuclearwu nuclearwu requested a review from lishunyang12 March 5, 2026 01:30
Collaborator

@lishunyang12 lishunyang12 left a comment


left a couple of comments

Comment thread vllm_omni/diffusion/models/flux2/flux2_transformer.py
Comment thread vllm_omni/diffusion/models/flux2/pipeline_flux2.py
@lishunyang12
Collaborator

The PR is in good shape overall. Fix those and I will leave the remaining items to the maintainers.

Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@nuclearwu nuclearwu mentioned this pull request Mar 10, 2026
63 tasks
@nuclearwu
Contributor Author

cc @hsliuustc0106

Comment thread tests/diffusion/models/flux2/test_flux2_transformer_tp.py
@wtomin
Copy link
Copy Markdown
Collaborator

wtomin commented Mar 10, 2026

Overall, it's good:

  • Comprehensive PR body — Best-in-class documentation with memory profiling, benchmarks, and clear minimum requirements
  • Clean implementation — No diffusers Mixin, pure vLLM-Omni abstractions
  • Tensor Parallel support — Properly implemented with QKVParallelLinear
  • CPU Offload support — Enables single-GPU deployment (80GB+)
  • Unit test coverage — focused TP unit tests

⚠️ Minor Suggestions:

  • Would be better if you can edit examples/offline_inference/text_to_image/README.md and add a CLI inference example for the Flux.2-dev model. In particular, mention the memory constraint (>80GB); CPU offloading and other memory-optimization methods are therefore highly recommended.
  • Do you consider supporting other memory optimization methods (such as quantization) for FLUX.2-dev model?
  • Would be great if you can test its online serving functionality. Just to double check it's working.

@wtomin wtomin added the new model add new model label Mar 10, 2026
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@hsliuustc0106
Collaborator

fix dco and solve @wtomin's comments

@nuclearwu
Contributor Author

Overall, it's good:

  • Comprehensive PR body — Best-in-class documentation with memory profiling, benchmarks, and clear minimum requirements
  • Clean implementation — No diffusers Mixin,pure vLLM-Omni abstractions
  • Tensor Parallel support — Properly implemented with QKVParallelLinear
  • CPU Offload support — Enables single-GPU deployment (80GB+)
  • Unit test coverage — focused TP unit tests

⚠️ Minor Suggestions:

  • Would be better if you can edit examples/offline_inference/text_to_image/README.md and add a CLI inference example of Flux.2-dev model. Especially mention about the memory constraint (>80GB), therefore cpu-offloading and other memory optimization methods are highly recommended.
  • Do you consider supporting other memory optimization methods (such as quantization) for FLUX.2-dev model?
  • Would be great if you can test its online serving functionality. Just to double check it's working.


@wtomin Thank you for your review, I will consider supporting quantization in the future. The rest have all been revised.

@nuclearwu
Contributor Author

fix dco and solve @wtomin's comments

@hsliuustc0106 done

@nuclearwu nuclearwu requested a review from wtomin March 11, 2026 01:52
Collaborator

@wtomin wtomin left a comment


LGTM.

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Mar 11, 2026
# Conflicts:
#	docs/user_guide/diffusion/parallelism_acceleration.md
@wtomin wtomin requested review from ZJY0516 and removed request for lishunyang12 March 11, 2026 02:54
@wtomin
Collaborator

wtomin commented Mar 11, 2026

Solve the conflicts please. @nuclearwu

Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@nuclearwu
Contributor Author

Solve the conflicts please. @nuclearwu

@wtomin done

Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
@hsliuustc0106 hsliuustc0106 merged commit 4d89eba into vllm-project:main Mar 11, 2026
6 of 7 checks passed
@wtomin wtomin mentioned this pull request Mar 12, 2026
1 task
@jannikstdl

Does VLLM Omni Support Flux2-dev Image to Image in the API Server?

@wtomin
Collaborator

wtomin commented Mar 13, 2026

@jannikstdl Please take this image-to-image tutorial as a reference, changing the model name to Flux2-dev. If you encounter any errors, feel free to raise an issue.

@jannikstdl

Update: it already supports image editing with FLUX.2-dev in the API server, running vLLM-Omni 0.16.0.
Thanks!

