[Model] Add edit preprocessor for HunyuanImage3 #1644

Closed
sjuxax wants to merge 10 commits into vllm-project:main from sjuxax:hunyuanimage3-edit

@sjuxax sjuxax commented Mar 4, 2026

Purpose

Allows HunyuanImage-3.0-Instruct to be used to edit images.

Test Plan

Some tests have been added, and I have tried this manually with the ComfyUI extension. All seems to be working well.

See the new tests/diffusion/test_hunyuan_image3_edit_preprocess.py.

Test Result

All seems to be working well.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

Wire HunyuanImage3 into the /images/edit path with full conditional-image
preprocessing and request plumbing.

- add get_hunyuan_image_3_pre_process_func and register it for HunyuanImage3ForCausalMM
- normalize edit inputs from PIL/ndarray/tensor, resize/crop for VAE, and build VAE+ViT JointImageInfo payloads
- serialize/deserialize conditional image info so async RPC transport remains compatible
- propagate batch_cond_image_info through forward -> prepare_model_inputs
- make vae_encode accept 3D/4D image tensors by normalizing to (B, C, T, H, W)
- declare HunyuanImage3Pipeline.support_image_input = True
- implement LightProjector.forward to unblock vision aligner calls during edit generation
- extend module discovery/layerwise hints for Hunyuan model offload path
- add regression tests for preprocess payload construction and LightProjector callability

Co-authored-by: Codex <codex@openai.com>
Signed-off-by: Jeff Cook <jeff@jeffcook.io>
@sjuxax sjuxax requested a review from hsliuustc0106 as a code owner March 4, 2026 00:58
sjuxax and others added 2 commits March 3, 2026 18:03
Prevent stage IPC serialization failures on image edit requests by coercing numpy scalar metadata (e.g. np.int64) into plain Python scalars before attaching conditional image payloads to prompts.
- coerce ImageInfo payload scalar fields via helper
- normalize target/base/ratio values to int during preprocess
- handle numpy scalar values in payload decode helper
- extend preprocess regression test to cover numpy int64 metadata

Co-authored-by: Codex <codex@openai.com>
Signed-off-by: Jeff Cook <jeff@jeffcook.io>
Signed-off-by: Jeff Cook <jeff@jeffcook.io>
@sjuxax sjuxax force-pushed the hunyuanimage3-edit branch from 620eb35 to d1aeb33 Compare March 4, 2026 01:17
@hsliuustc0106
Collaborator

PR #1644 Review: Add edit preprocessor for HunyuanImage3

📊 Overall Assessment: 8.5/10

This is a well-structured PR that adds image editing support to the HunyuanImage3 model. The implementation is comprehensive, handles edge cases properly, and includes good test coverage.


✅ Strengths

1. Complete End-to-End Implementation

  • Preprocessing pipeline: Full image input handling (PIL, ndarray, tensor)
  • IPC compatibility: Numpy scalar normalization for multi-process serialization
  • Batch processing: Proper batch_cond_image_info propagation through the pipeline
  • VAE flexibility: Accepts 3D/4D/5D image tensors with automatic normalization
  • Model registration: Properly registers preprocessor in diffusion registry

2. Robust Image Input Handling

def _to_pil_image(image: Any) -> PILImage.Image:
    # Handles PIL, str, ndarray, tensor
    # Normalizes dtype, channels, dimensions
    ...  # body elided in this review

✅ Good: Comprehensive conversion logic handles multiple input formats gracefully.

3. IPC Serialization Fix

def _to_python_scalar(value: Any) -> Any:
    if isinstance(value, np.generic):
        return value.item()
    return value

✅ Excellent: Critical fix for numpy scalar serialization in multi-process environments. This prevents IPC failures that would be hard to debug.

4. Proper Batch Validation

if any(has_cond_image) and not all(has_cond_image):
    raise ValueError(
        "When batching Hunyuan image editing requests, "
        "every prompt must include input image(s)."
    )

✅ Good: Clear error message for inconsistent batch inputs.
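
The all-or-nothing rule can be exercised in isolation. The sketch below wraps the check from the PR in a hypothetical function, `validate_cond_image_batch`, which is not a name from the PR; the return value (whether the batch is an edit batch) is also an assumption added for illustration.

```python
def validate_cond_image_batch(has_cond_image: list[bool]) -> bool:
    """Return True for an edit batch, False for a text-only batch.

    Raises ValueError when only some prompts carry input images.
    """
    if any(has_cond_image) and not all(has_cond_image):
        raise ValueError(
            "When batching Hunyuan image editing requests, "
            "every prompt must include input image(s)."
        )
    return any(has_cond_image)


# Usage: uniform batches pass, mixed batches are rejected.
print(validate_cond_image_batch([True, True]))    # edit batch
print(validate_cond_image_batch([False, False]))  # text-only batch
try:
    validate_cond_image_batch([True, False])
except ValueError as exc:
    print(f"rejected: {exc}")
```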

5. Test Coverage

  • ✅ Preprocessing test with numpy int64 metadata
  • ✅ LightProjector callability test
  • ✅ Payload roundtrip validation
  • ✅ Tests marked with pytest.mark.cpu for CI

🔍 Code Review

Main Components

1. Image Preprocessing (get_hunyuan_image_3_pre_process_func)

Strengths:

  • ✅ Proper VAE resizing with center crop
  • ✅ VAE tensor extraction
  • ✅ ViT tensor extraction with spatial shapes
  • ✅ JointImageInfo construction

Minor issue:

target_width = int(target_width)
target_height = int(target_height)

These values already pass through _to_python_scalar, so the explicit int() calls are redundant. They are harmless, though, and add clarity.

2. VAE Encoding (vae_encode)

if image.ndim == 3:
    image = image.unsqueeze(0)
if image.ndim == 4:
    image = image.unsqueeze(2)

✅ Good: Flexible dimension handling for VAE input.
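
The effect of the two `unsqueeze` calls is easiest to see on shapes alone. The sketch below mirrors them on plain tuples, with no tensors involved; `normalized_shape` is a hypothetical name, and the final `ValueError` for other ranks is an assumption, not something shown in the PR.

```python
def normalized_shape(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Mirror the two unsqueeze steps on a shape tuple."""
    if len(shape) == 3:
        # (C, H, W) -> (1, C, H, W), like image.unsqueeze(0)
        shape = (1, *shape)
    if len(shape) == 4:
        # (B, C, H, W) -> (B, C, 1, H, W), like image.unsqueeze(2)
        shape = (*shape[:2], 1, *shape[2:])
    if len(shape) != 5:
        # Assumed guard: anything else cannot be read as (B, C, T, H, W).
        raise ValueError(f"Expected a 3D, 4D, or 5D image tensor, got shape {shape}")
    return shape


print(normalized_shape((3, 16, 16)))        # single image
print(normalized_shape((2, 3, 16, 16)))     # batch of images
print(normalized_shape((2, 3, 1, 16, 16)))  # already (B, C, T, H, W)
```

Because the second check runs after the first, a 3D input passes through both steps and a 5D input passes through neither.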

3. Payload Serialization/Deserialization

def _image_info_to_payload(image_info: ImageInfo) -> dict[str, Any]:
    return {
        "image_type": image_info.image_type,
        "image_tensor": image_info.image_tensor,
        "image_width": _to_python_scalar(image_info.image_width),
        ...
    }

✅ Excellent: Clean serialization with numpy scalar normalization.
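
A roundtrip through plain dicts might look like the sketch below. The `ImageInfo` dataclass here is a stand-in that models only the three fields visible above, the nested list stands in for a tensor, and `pickle` stands in for the RPC transport; all three are assumptions for illustration, as are the function names without the leading underscore.

```python
import pickle
from dataclasses import dataclass
from typing import Any


@dataclass
class ImageInfo:
    """Stand-in for the real ImageInfo; only three fields are modeled."""

    image_type: str
    image_tensor: Any
    image_width: int


def image_info_to_payload(info: ImageInfo) -> dict[str, Any]:
    """Flatten the object into a dict of plain values."""
    return {
        "image_type": info.image_type,
        "image_tensor": info.image_tensor,
        "image_width": int(info.image_width),
    }


def image_info_from_payload(payload: dict[str, Any]) -> ImageInfo:
    """Rebuild the object on the receiving side."""
    return ImageInfo(
        image_type=payload["image_type"],
        image_tensor=payload["image_tensor"],
        image_width=int(payload["image_width"]),
    )


original = ImageInfo(image_type="vae", image_tensor=[[0.0, 0.5], [1.0, 0.25]], image_width=2)
wire_bytes = pickle.dumps(image_info_to_payload(original))
restored = image_info_from_payload(pickle.loads(wire_bytes))
print(restored == original)
```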

4. LightProjector.forward

def forward(self, x):
    return self.layers(x)

✅ Good: Simple implementation to unblock vision aligner calls.


⚠️ Suggestions for Improvement

1. Missing Type Hints

Some helper functions lack type hints:

# Current
def _resize_and_crop_center(image, target_width, target_height):
    ...

# Suggested
def _resize_and_crop_center(
    image: PILImage.Image,
    target_width: int,
    target_height: int,
) -> PILImage.Image:
    ...

2. Documentation

The PR description says "allows HunyuanImage-3.0-Instruct to be used to edit images" but doesn't specify:

  • What types of edits are supported?
  • Input requirements (image format, resolution limits)
  • Example usage

Suggestion: Add example in PR description:

from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="tencent/HunyuanImage-3.0")
outputs = omni.generate(
    prompt="Change the sky to sunset colors",
    images=["input.jpg"],
)

3. Error Messages Could Be More Specific

raise TypeError(f"Unsupported image input type: {type(image)}")

Could include supported types:

raise TypeError(
    f"Unsupported image input type: {type(image)}. "
    f"Expected PIL.Image, numpy.ndarray, torch.Tensor, or str (path)."
)

4. Magic Numbers

scale = max(target_width / src_width, target_height / src_height)

No explanation of why we use max instead of min or other strategies.

Suggestion: Add comment:

# Use max scale to ensure image covers target area, then center crop
scale = max(target_width / src_width, target_height / src_height)
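
A worked example makes the choice of `max` concrete. The sketch below computes only the resized size and the center-crop box, using plain arithmetic; the function name and the rounding strategy are assumptions, since the body of `_resize_and_crop_center` is not shown in this review.

```python
def cover_resize_and_crop_box(
    src_width: int, src_height: int, target_width: int, target_height: int
) -> tuple[tuple[int, int], tuple[int, int, int, int]]:
    """Return (resized_size, crop_box) for a cover-then-center-crop resize."""
    # max picks the larger ratio, so both resized sides reach the target.
    scale = max(target_width / src_width, target_height / src_height)
    # Clamp to the target so rounding can never leave a side one pixel short.
    resized_width = max(target_width, round(src_width * scale))
    resized_height = max(target_height, round(src_height * scale))
    left = (resized_width - target_width) // 2
    top = (resized_height - target_height) // 2
    crop_box = (left, top, left + target_width, top + target_height)
    return (resized_width, resized_height), crop_box


# 1920x1080 landscape into a 1024x1024 target.
resized, box = cover_resize_and_crop_box(1920, 1080, 1024, 1024)
print(resized)  # (1820, 1024)
print(box)      # (398, 0, 1422, 1024)
```

With `min` instead, the same input would scale to 1024x576, which leaves the height short of 1024 and would require padding rather than cropping.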

5. Test Coverage Gap

Tests cover:

  • ✅ Preprocessing logic
  • ✅ LightProjector callability
  • Missing: End-to-end integration test with actual model
  • Missing: Multi-image batch test
  • Missing: Error case tests (invalid inputs)

Suggestion: Add integration test (even if slow/optional):

import pytest
from vllm_omni.entrypoints.omni import Omni


@pytest.mark.slow
def test_hunyuan_image3_edit_e2e():
    """End-to-end test for image editing."""
    omni = Omni(model="tencent/HunyuanImage-3.0")
    # Test actual image editing

🐛 Potential Issues

1. Memory Usage with Multiple Images

cond_image_infos = [_build_cond_joint_image(image) for image in image_list]

No validation of image list length. Processing many images could cause OOM.

Suggestion: Add validation:

if len(image_list) > MAX_IMAGES_PER_REQUEST:
    raise ValueError(
        f"Too many images: {len(image_list)}. "
        f"Maximum supported: {MAX_IMAGES_PER_REQUEST}"
    )

2. No Image Size Validation

first_image_w, first_image_h = _to_pil_image(image_list[0]).size
if request.sampling_params.width is None:
    request.sampling_params.width = int(first_image_w)

No validation that image size is within model's supported range.

Suggestion: Add size bounds checking (if model has limits).
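
One possible shape for such a check is sketched below. The bounds `MIN_IMAGE_SIDE` and `MAX_IMAGE_SIDE` are hypothetical placeholders and are not taken from the model; the function name is also hypothetical, and defaulting the height from the first image is assumed to mirror the width handling shown above.

```python
from typing import Optional

MIN_IMAGE_SIDE = 16    # hypothetical lower bound, not a documented model limit
MAX_IMAGE_SIDE = 4096  # hypothetical upper bound, not a documented model limit


def resolve_output_size(
    first_image_size: tuple[int, int],
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> tuple[int, int]:
    """Default missing dimensions from the first image, then bounds-check."""
    first_w, first_h = first_image_size
    width = int(first_w) if width is None else int(width)
    height = int(first_h) if height is None else int(height)
    for name, value in (("width", width), ("height", height)):
        if not MIN_IMAGE_SIDE <= value <= MAX_IMAGE_SIDE:
            raise ValueError(
                f"Image {name} {value} is outside the supported range "
                f"[{MIN_IMAGE_SIDE}, {MAX_IMAGE_SIDE}]."
            )
    return width, height


print(resolve_output_size((1024, 768)))            # both defaulted
print(resolve_output_size((1024, 768), width=512)) # explicit width kept
```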

3. Unused cfg_factor Parameter

def vae_encode(self, image, cfg_factor=1):
    ...
    # cfg_factor is never used

Either document why it's there or remove it.


📋 Testing

Existing Tests ✅

  • Preprocessing test with mocks
  • LightProjector forward pass
  • Payload roundtrip serialization

Missing Tests ❌

  • Multi-image batch processing
  • Invalid input handling (wrong types, corrupt images)
  • Different image formats (JPEG, PNG, WebP)
  • Edge cases (1x1 images, very large images)
  • Integration test with model inference

🔄 Consistency with Other Models

Checking against similar models (Qwen-Image-Edit, LongCat-Image-Edit):

  • ✅ Similar preprocessing pattern
  • ✅ Consistent payload structure
  • ✅ Same error handling approach
  • ⚠️ Different dimension handling (some models use different VAE conventions)

📝 Documentation Updates Needed

PR checklist shows documentation update is unchecked. Should update:

  • supported_models.md: Add HunyuanImage3 to image editing models
  • examples/: Add image editing example
  • ✅ Model card: Document edit capabilities

🎯 Merge Recommendation

Ready to Merge with Minor Changes ⚠️

Rationale:

  1. Solid implementation of image editing support
  2. Critical IPC fix for numpy scalars
  3. Good test coverage for core functionality
  4. ⚠️ Missing integration test (optional but recommended)
  5. ⚠️ Documentation updates needed (should add before merge)

Required before merge:

  1. ✅ Update supported_models.md to list HunyuanImage3 for image editing
  2. ⚠️ Add example usage to PR description or examples/
  3. ⚠️ Consider adding input validation (image count, size bounds)

Optional improvements for follow-up:

  • Add type hints to helper functions
  • Add integration test (can be in separate PR)
  • Add input validation for edge cases

🎉 Summary

Pros:

  • ✅ Complete end-to-end implementation
  • ✅ Excellent IPC serialization handling
  • ✅ Good test coverage for core logic
  • ✅ Proper batch validation
  • ✅ Clean code structure

Cons:

  • ⚠️ Missing integration test
  • ⚠️ Documentation updates needed
  • ⚠️ Minor: No input validation for edge cases
  • ⚠️ Minor: Some type hints missing

Overall: This is a high-quality PR that successfully adds image editing support to HunyuanImage3. The implementation is robust, handles edge cases well, and the IPC serialization fix is particularly valuable. With minor documentation updates, this is ready to merge.


🦐 Reviewed by AI Assistant with vLLM-Omni skills

hsliuustc0106 added a commit to hsliuustc0106/vllm-omni-skills that referenced this pull request Mar 4, 2026
- Add image editing capability for HunyuanImage3 model
- Document conditional image preprocessing pipeline
- Note IPC serialization fix for numpy scalars

Source: vllm-project/vllm-omni#1644
Collaborator

@lishunyang12 lishunyang12 left a comment

LGTM — solid edit preprocessor with good test coverage.

@sjuxax sjuxax force-pushed the hunyuanimage3-edit branch from 20b9f89 to 0135a4f Compare March 10, 2026 13:56
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Mar 10, 2026
@usberkeley
Contributor

Looks good, thanks Jeff.

@Gaohan123
Collaborator

@wtomin @SamitHuang @princepride PTAL

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 14, 2026
@princepride
Collaborator

princepride commented Mar 14, 2026

@Semmer2 @usberkeley @nussejzz please also take a look😊

@wtomin
Collaborator

wtomin commented Mar 20, 2026

Please provide at least one offline inference script and one online serving script. Please also include the generated images, VRAM usage, and e2e latency in your PR body.

You may also need to update the documentation: supported_models.md, image_to_image.md, etc.

@wtomin
Collaborator

wtomin commented Mar 31, 2026

Please resolve these conflicts.

skf-1999

This comment was marked as resolved.

assert request.sampling_params.height == 16


def test_hunyuan_image3_light_projector_is_callable():
Contributor

Is this test required?

Signed-off-by: Samit <285365963@qq.com>
Collaborator

@SamitHuang SamitHuang left a comment

Conflicts resolved. LGTM

@wtomin
Collaborator

wtomin commented Apr 14, 2026

Can you create a doc examples/offline_inference/hunyuan_image3/image_to_image.md to further explain how to use it? @sjuxax

@Gaohan123 Gaohan123 modified the milestones: v0.18.0, v0.20.0 Apr 14, 2026
@Gaohan123
Collaborator

Gaohan123 commented Apr 30, 2026

Please check PR #3107, as it is a duplicate.

@Gaohan123 Gaohan123 removed this from the v0.20.0 milestone Apr 30, 2026
@hsliuustc0106
Collaborator

Closed since #3107 has merged.

Labels

ready label to trigger buildkite CI

9 participants