[Model] Add edit preprocessor for HunyuanImage3 by sjuxax · Pull Request #1644 · vllm-project/vllm-omni

sjuxax · 2026-03-04T00:58:23Z

Purpose

Allows HunyuanImage-3.0-Instruct to be used to edit images.

Test Plan

I added some tests and also tried it manually with the ComfyUI extension.

See the new tests/diffusion/test_hunyuan_image3_edit_preprocess.py.

Test Result

Everything seems to be working well.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
The test results. Please paste the results comparison before and after, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.

Wire HunyuanImage3 into the /images/edit path with full conditional-image preprocessing and request plumbing.

- add get_hunyuan_image_3_pre_process_func and register it for HunyuanImage3ForCausalMM
- normalize edit inputs from PIL/ndarray/tensor, resize/crop for VAE, and build VAE+ViT JointImageInfo payloads
- serialize/deserialize conditional image info so async RPC transport remains compatible
- propagate batch_cond_image_info through forward -> prepare_model_inputs
- make vae_encode accept 3D/4D image tensors by normalizing to (B, C, T, H, W)
- declare HunyuanImage3Pipeline.support_image_input = True
- implement LightProjector.forward to unblock vision aligner calls during edit generation
- extend module discovery/layerwise hints for Hunyuan model offload path
- add regression tests for preprocess payload construction and LightProjector callability

Co-authored-by: Codex <codex@openai.com>
Signed-off-by: Jeff Cook <jeff@jeffcook.io>

Prevent stage IPC serialization failures on image edit requests by coercing numpy scalar metadata (e.g. np.int64) into plain Python scalars before attaching conditional image payloads to prompts.

- coerce ImageInfo payload scalar fields via helper
- normalize target/base/ratio values to int during preprocess
- handle numpy scalar values in payload decode helper
- extend preprocess regression test to cover numpy int64 metadata

Co-authored-by: Codex <codex@openai.com>
Signed-off-by: Jeff Cook <jeff@jeffcook.io>

Signed-off-by: Jeff Cook <jeff@jeffcook.io>

hsliuustc0106 · 2026-03-04T01:24:51Z

PR #1644 Review: Add edit preprocessor for HunyuanImage3

📊 Overall Assessment: 8.5/10

This is a well-structured PR that adds image editing support to the HunyuanImage3 model. The implementation is comprehensive, handles edge cases properly, and includes good test coverage.

✅ Strengths

1. Complete End-to-End Implementation

✅ Preprocessing pipeline: Full image input handling (PIL, ndarray, tensor)
✅ IPC compatibility: Numpy scalar normalization for multi-process serialization
✅ Batch processing: Proper batch_cond_image_info propagation through the pipeline
✅ VAE flexibility: Accepts 3D/4D/5D image tensors with automatic normalization
✅ Model registration: Properly registers preprocessor in diffusion registry

2. Robust Image Input Handling

def _to_pil_image(image: Any) -> PILImage.Image:
    # Handles PIL, str, ndarray, tensor
    # Normalizes dtype, channels, dimensions
    ...

✅ Good: Comprehensive conversion logic handles multiple input formats gracefully.
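The review only shows the signature of _to_pil_image. As a rough illustration of what such a converter involves, here is a minimal sketch; the name to_pil_image, the channels-first detection heuristic, and the assumption that floating-point pixels lie in [0, 1] are ours, not necessarily what the PR does. Tensor-like inputs (anything with a .detach() method, such as torch.Tensor) are routed through NumPy, so the sketch itself only needs Pillow and NumPy.

```python
from typing import Any

import numpy as np
from PIL import Image as PILImage


def to_pil_image(image: Any) -> PILImage.Image:
    """Illustrative converter: PIL / path / ndarray / tensor-like -> RGB PIL image."""
    if isinstance(image, PILImage.Image):
        return image.convert("RGB")
    if isinstance(image, str):
        # Strings are treated as file paths; convert() loads the pixels
        # before the file handle is closed.
        with PILImage.open(image) as img:
            return img.convert("RGB")
    # Tensor-like objects (e.g. torch.Tensor) are moved to a CPU numpy array.
    if hasattr(image, "detach"):
        image = image.detach().cpu().numpy()
    if not isinstance(image, np.ndarray):
        raise TypeError(
            f"Unsupported image input type: {type(image)}. "
            "Expected PIL.Image, numpy.ndarray, torch.Tensor, or str (path)."
        )
    arr = image
    if arr.ndim == 4:
        # Accept a batch of exactly one image: (1, ...) -> (...)
        if arr.shape[0] != 1:
            raise ValueError("Expected a single image, got a batch")
        arr = arr[0]
    if arr.ndim == 3 and arr.shape[0] in (1, 3, 4) and arr.shape[-1] not in (1, 3, 4):
        # Heuristic: channels-first (C, H, W) -> channels-last (H, W, C).
        arr = np.transpose(arr, (1, 2, 0))
    if arr.ndim == 3 and arr.shape[-1] == 1:
        # Drop a singleton channel so PIL treats it as grayscale.
        arr = arr[..., 0]
    if np.issubdtype(arr.dtype, np.floating):
        # Assumption: float pixels are in [0, 1]; scale to 0..255.
        arr = (np.clip(arr, 0.0, 1.0) * 255.0).round().astype(np.uint8)
    elif arr.dtype != np.uint8:
        # Other integer dtypes are assumed to already be in 0..255.
        arr = np.clip(arr, 0, 255).astype(np.uint8)
    return PILImage.fromarray(arr).convert("RGB")
```

Normalizing everything to an RGB PIL image up front lets the later VAE and ViT branches share one resize/crop path regardless of how the caller supplied the image.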

3. IPC Serialization Fix

def _to_python_scalar(value: Any) -> Any:
    if isinstance(value, np.generic):
        return value.item()
    return value

✅ Excellent: Critical fix for numpy scalar serialization in multi-process environments. This prevents IPC failures that would be hard to debug.

4. Proper Batch Validation

if any(has_cond_image) and not all(has_cond_image):
    raise ValueError(
        "When batching Hunyuan image editing requests, "
        "every prompt must include input image(s)."
    )

✅ Good: Clear error message for inconsistent batch inputs.
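The all-or-none rule can be exercised on its own. In this sketch the prompt layout (a dict with an optional "images" list) and the function name validate_cond_image_batch are hypothetical; only the condition and the error message come from the PR.

```python
from typing import Any


def validate_cond_image_batch(prompts: list[dict[str, Any]]) -> bool:
    """Return True for an all-edit batch, False for a pure text-to-image batch.

    Raises ValueError when only some prompts carry input images.
    """
    has_cond_image = [bool(prompt.get("images")) for prompt in prompts]
    if any(has_cond_image) and not all(has_cond_image):
        raise ValueError(
            "When batching Hunyuan image editing requests, "
            "every prompt must include input image(s)."
        )
    # Past the check, any() is True only if every prompt has images.
    return any(has_cond_image)
```

Rejecting mixed batches keeps the conditional-image tensors aligned one-to-one with prompts, instead of padding or special-casing prompts that have no image.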

5. Test Coverage

✅ Preprocessing test with numpy int64 metadata
✅ LightProjector callability test
✅ Payload roundtrip validation
✅ Tests marked with pytest.mark.cpu for CI

🔍 Code Review

Main Components

1. Image Preprocessing (`get_hunyuan_image_3_pre_process_func`)

Strengths:

✅ Proper VAE resizing with center crop
✅ VAE tensor extraction
✅ ViT tensor extraction with spatial shapes
✅ JointImageInfo construction

Minor issue:

target_width = int(target_width)
target_height = int(target_height)

The values already pass through _to_python_scalar, so the explicit int() calls are redundant. They are harmless, though, and make the intent clearer.

2. VAE Encoding (`vae_encode`)

if image.ndim == 3:
    image = image.unsqueeze(0)
if image.ndim == 4:
    image = image.unsqueeze(2)

✅ Good: Flexible dimension handling for VAE input.
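Because the two checks are independent if statements rather than if/elif, a 3D input is promoted twice and ends up 5D. The sketch below reproduces that logic with numpy.expand_dims standing in for torch.Tensor.unsqueeze so it runs without PyTorch; the function name and the final ndim check are illustrative additions, not part of the PR.

```python
import numpy as np


def normalize_to_bcthw(image: np.ndarray) -> np.ndarray:
    """Promote (C, H, W) or (B, C, H, W) to (B, C, T, H, W) with T == 1."""
    if image.ndim == 3:
        image = np.expand_dims(image, 0)  # (C, H, W) -> (1, C, H, W)
    if image.ndim == 4:
        image = np.expand_dims(image, 2)  # (B, C, H, W) -> (B, C, 1, H, W)
    if image.ndim != 5:
        raise ValueError(f"Expected a 3D, 4D or 5D image tensor, got {image.ndim}D")
    return image
```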

3. Payload Serialization/Deserialization

def _image_info_to_payload(image_info: ImageInfo) -> dict[str, Any]:
    return {
        "image_type": image_info.image_type,
        "image_tensor": image_info.image_tensor,
        "image_width": _to_python_scalar(image_info.image_width),
        ...
    }

✅ Excellent: Clean serialization with numpy scalar normalization.
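A self-contained round trip of the same idea, assuming a reduced, hypothetical ImageInfo with only three scalar fields (the real class also carries image_tensor and more, and the "vae" type string is made up). Both directions coerce numpy scalars, mirroring the commit note that the payload decode helper handles them too.

```python
from dataclasses import asdict, dataclass
from typing import Any

import numpy as np


@dataclass
class ImageInfo:
    # Hypothetical subset of fields (image_tensor and others omitted).
    image_type: str
    image_width: Any
    image_height: Any


def _to_python_scalar(value: Any) -> Any:
    return value.item() if isinstance(value, np.generic) else value


def image_info_to_payload(info: ImageInfo) -> dict[str, Any]:
    # Coerce numpy scalars so the payload holds only plain Python values.
    return {key: _to_python_scalar(value) for key, value in asdict(info).items()}


def image_info_from_payload(payload: dict[str, Any]) -> ImageInfo:
    # Decode defensively as well, in case a numpy scalar slipped through.
    return ImageInfo(**{key: _to_python_scalar(value) for key, value in payload.items()})
```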

4. LightProjector.forward

def forward(self, x):
    return self.layers(x)

✅ Good: Simple implementation to unblock vision aligner calls.

⚠️ Suggestions for Improvement

1. Missing Type Hints

Some helper functions lack type hints:

# Current
def _resize_and_crop_center(image, target_width, target_height):

# Suggested
def _resize_and_crop_center(
    image: PILImage.Image,
    target_width: int,
    target_height: int
) -> PILImage.Image:

2. Documentation

The PR description says "allows HunyuanImage-3.0-Instruct to be used to edit images" but doesn't specify:

What types of edits are supported?
Input requirements (image format, resolution limits)
Example usage

Suggestion: Add example in PR description:

from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="tencent/HunyuanImage-3.0-Instruct")
outputs = omni.generate(
    prompt="Change the sky to sunset colors",
    images=["input.jpg"],
)

3. Error Messages Could Be More Specific

raise TypeError(f"Unsupported image input type: {type(image)}")

Could include supported types:

raise TypeError(
    f"Unsupported image input type: {type(image)}. "
    "Expected PIL.Image, numpy.ndarray, torch.Tensor, or str (path)."
)

4. Magic Numbers

scale = max(target_width / src_width, target_height / src_height)

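There is no comment explaining why max is used instead of min. Taking the larger scale factor makes the resized image cover the target area in both dimensions, so the overflow can be center-cropped; min would instead fit the image inside the target and leave empty margins.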

Suggestion: Add comment:

# Use max scale to ensure image covers target area, then center crop
scale = max(target_width / src_width, target_height / src_height)

5. Test Coverage Gap

Tests cover:

✅ Preprocessing logic
✅ LightProjector callability
❌ Missing: End-to-end integration test with actual model
❌ Missing: Multi-image batch test
❌ Missing: Error case tests (invalid inputs)

Suggestion: Add integration test (even if slow/optional):

import pytest

from vllm_omni.entrypoints.omni import Omni


@pytest.mark.slow
def test_hunyuan_image3_edit_e2e():
    """End-to-end test for image editing."""
    omni = Omni(model="tencent/HunyuanImage-3.0-Instruct")
    # Test actual image editing

🐛 Potential Issues

1. Memory Usage with Multiple Images

cond_image_infos = [_build_cond_joint_image(image) for image in image_list]

No validation of image list length. Processing many images could cause OOM.

Suggestion: Add validation:

if len(image_list) > MAX_IMAGES_PER_REQUEST:
    raise ValueError(
        f"Too many images: {len(image_list)}. "
        f"Maximum supported: {MAX_IMAGES_PER_REQUEST}"
    )
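Since MAX_IMAGES_PER_REQUEST is not defined anywhere in the PR, here is a self-contained variant that takes the limit as a parameter. The default of 4 is an arbitrary placeholder, not a known model limit; running the check before the list comprehension means no image is decoded for a request that will be rejected anyway.

```python
from collections.abc import Sequence
from typing import Any

# Hypothetical placeholder limit; not a value taken from the PR or the model.
MAX_IMAGES_PER_REQUEST = 4


def validate_image_count(
    image_list: Sequence[Any], max_images: int = MAX_IMAGES_PER_REQUEST
) -> None:
    """Reject requests that carry more conditional images than allowed."""
    if len(image_list) > max_images:
        raise ValueError(
            f"Too many images: {len(image_list)}. "
            f"Maximum supported: {max_images}"
        )
```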

2. No Image Size Validation

first_image_w, first_image_h = _to_pil_image(image_list[0]).size
if request.sampling_params.width is None:
    request.sampling_params.width = int(first_image_w)

No validation that image size is within model's supported range.

Suggestion: Add size bounds checking (if model has limits).
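One way to combine the existing default-size logic with a bounds check, sketched with a minimal stand-in SamplingParams dataclass. MIN_SIDE and MAX_SIDE are hypothetical values chosen for illustration; the supported range for HunyuanImage3 would have to come from the model itself.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds for illustration only.
MIN_SIDE = 256
MAX_SIDE = 2048


@dataclass
class SamplingParams:
    # Minimal stand-in for the request's sampling params.
    width: Optional[int] = None
    height: Optional[int] = None


def apply_default_size(params: SamplingParams, first_image_size: tuple[int, int]) -> None:
    """Default width/height to the first input image (PIL (w, h) order), then bounds-check."""
    first_w, first_h = first_image_size
    if params.width is None:
        params.width = int(first_w)
    if params.height is None:
        params.height = int(first_h)
    for name, value in (("width", params.width), ("height", params.height)):
        if not MIN_SIDE <= value <= MAX_SIDE:
            raise ValueError(
                f"Requested {name}={value} is outside the supported range "
                f"[{MIN_SIDE}, {MAX_SIDE}]."
            )
```

Checking after the defaults are filled in means an oversized input image is caught even when the user never set width or height explicitly.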

3. Unused `cfg_factor` Parameter

def vae_encode(self, image, cfg_factor=1):
    ...
    # cfg_factor is never used

Either document why it's there or remove it.

📋 Testing

Existing Tests ✅

Preprocessing test with mocks
LightProjector forward pass
Payload roundtrip serialization

Missing Tests ❌

Multi-image batch processing
Invalid input handling (wrong types, corrupt images)
Different image formats (JPEG, PNG, WebP)
Edge cases (1x1 images, very large images)
Integration test with model inference

🔄 Consistency with Other Models

Checking against similar models (Qwen-Image-Edit, LongCat-Image-Edit):

✅ Similar preprocessing pattern
✅ Consistent payload structure
✅ Same error handling approach
⚠️ Different dimension handling (some models use different VAE conventions)

📝 Documentation Updates Needed

PR checklist shows documentation update is unchecked. Should update:

✅ supported_models.md: Add HunyuanImage3 to image editing models
✅ examples/: Add image editing example
✅ Model card: Document edit capabilities

🎯 Merge Recommendation

Ready to Merge with Minor Changes ⚠️

Rationale:

✅ Solid implementation of image editing support
✅ Critical IPC fix for numpy scalars
✅ Good test coverage for core functionality
⚠️ Missing integration test (optional but recommended)
⚠️ Documentation updates needed (should add before merge)

Required before merge:

✅ Update supported_models.md to list HunyuanImage3 for image editing
⚠️ Add example usage to PR description or examples/
⚠️ Consider adding input validation (image count, size bounds)

Optional improvements for follow-up:

Add type hints to helper functions
Add integration test (can be in separate PR)
Add input validation for edge cases

🎉 Summary

Pros:

✅ Complete end-to-end implementation
✅ Excellent IPC serialization handling
✅ Good test coverage for core logic
✅ Proper batch validation
✅ Clean code structure

Cons:

⚠️ Missing integration test
⚠️ Documentation updates needed
⚠️ Minor: No input validation for edge cases
⚠️ Minor: Some type hints missing

Overall: This is a high-quality PR that successfully adds image editing support to HunyuanImage3. The implementation is robust, handles edge cases well, and the IPC serialization fix is particularly valuable. With minor documentation updates, this is ready to merge.

🦐 Reviewed by AI Assistant with vLLM-Omni skills

- Add image editing capability for HunyuanImage3 model
- Document conditional image preprocessing pipeline
- Note IPC serialization fix for numpy scalars

Source: vllm-project/vllm-omni#1644

lishunyang12

LGTM — solid edit preprocessor with good test coverage.


usberkeley · 2026-03-13T02:53:29Z

look good, thanks Jeff

Gaohan123 · 2026-03-14T14:19:52Z

@wtomin @SamitHuang @princepride PTAL

princepride · 2026-03-14T14:22:36Z

@Semmer2 @usberkeley @nussejzz please also take a look😊


wtomin · 2026-03-20T12:55:32Z

Please provide at least one offline inference script and one online serving script. Please also include the generated images, VRAM usage, and end-to-end latency in your PR body.

You may also need to update the docs, e.g. supported_models.md and image_to_image.md.

wtomin · 2026-03-31T04:08:37Z

Please resolve these conflicts.

skf-1999 · 2026-04-09T03:14:41Z

+    assert request.sampling_params.height == 16
+
+
+def test_hunyuan_image3_light_projector_is_callable():


Is this test required?

Signed-off-by: Samit <285365963@qq.com>

SamitHuang

Conflicts resolved. LGTM.

wtomin · 2026-04-14T07:54:31Z

Can you create a doc examples/offline_inference/hunyuan_image3/image_to_image.md to further explain how to use it? @sjuxax

Gaohan123 · 2026-04-30T08:22:24Z

Please check PR #3107; it duplicates this PR.

hsliuustc0106 · 2026-05-06T14:38:14Z

Closed since #3107 was merged.

sjuxax requested a review from hsliuustc0106 as a code owner March 4, 2026 00:58

sjuxax and others added 2 commits March 3, 2026 18:03

Fix pre-commit errors.

d1aeb33

Signed-off-by: Jeff Cook <jeff@jeffcook.io>

sjuxax force-pushed the hunyuanimage3-edit branch from 620eb35 to d1aeb33 March 4, 2026 01:17

hsliuustc0106 mentioned this pull request Mar 4, 2026

[Auto-Update] Add HunyuanImage3 image editing support from PR #1644 hsliuustc0106/vllm-omni-skills#1

Closed

lishunyang12 approved these changes Mar 4, 2026


sjuxax added 2 commits March 4, 2026 09:24

Merge branch 'main' of https://github.com/vllm-project/vllm-omni into hunyuanimage3-edit

c338f57

Merge remote-tracking branch 'origin/main' into hunyuanimage3-edit

0135a4f

sjuxax force-pushed the hunyuanimage3-edit branch from 20b9f89 to 0135a4f March 10, 2026 13:56

hsliuustc0106 added the ready label to trigger buildkite CI label Mar 10, 2026

sjuxax added 2 commits March 10, 2026 21:58

Merge remote-tracking branch 'origin/main' into hunyuanimage3-edit

0e3c4aa

Merge branch 'main' into hunyuanimage3-edit

0458c26

Gaohan123 added this to the v0.18.0 milestone Mar 14, 2026

sjuxax added 2 commits March 15, 2026 17:42

Merge remote-tracking branch 'origin/main' into hunyuanimage3-edit

147da90

Merge branch 'main' of https://github.com/vllm-project/vllm-omni into hunyuanimage3-edit

4295b3d

This comment was marked as resolved.


skf-1999 reviewed Apr 9, 2026


Merge branch 'main' into hunyuanimage3-edit

a8d78e2

Signed-off-by: Samit <285365963@qq.com>

SamitHuang approved these changes Apr 11, 2026


Gaohan123 modified the milestones: v0.18.0, v0.20.0 Apr 14, 2026

Gaohan123 removed this from the v0.20.0 milestone Apr 30, 2026

hsliuustc0106 closed this May 6, 2026



