[Feature] online HunyuanImage-3.0 IT2I (image editing) support #3410

Merged
hsliuustc0106 merged 1 commit into vllm-project:main from skf-1999:feat/image-edit-api on May 8, 2026

Contributor

@skf-1999 skf-1999 commented May 7, 2026

Purpose

Support online HunyuanImage-3.0 IT2I (image editing) inference
This PR needs to run on top of the bug fix from PR #3395.

Test Plan

Online Inference

vllm serve "/data/HunyuanImage-3.0-Instruct" \
    --omni \
    --port "8091" \
    --tensor_parallel_size 8 \
    --stage-configs-path vllm_omni/model_executor/stage_configs/hunyuan_image3_it2i.yaml \
    --enforce-eager

Online Request

curl -X POST http://localhost:8091/v1/images/edits \
  -F "image=@/data/s00957182/0506/edit_dog.png" \
  -F "prompt=新年宠物海报,Q版圆润的可爱标题\"新年快乐汪\",副标题\"HAPPY NEW YEAR\"。 鱼眼镜头,背景是房间门口,近景,上传的主体歪头笑,围着红色围巾,戴着红色毛线帽,高清,绒毛细节,面部特写。 宝丽莱相纸,超现实主义,写实主义,胶片摄影,打印颗粒感肌理。肌理,超写实,复古感。" \
  -F "bot_task=it2i_think" \
  -F "n=1" \
  -F "num_inference_steps=50" \
  -F "guidance_scale=2.5" \
  -F "seed=42" \
  | jq -r '.data[0].b64_json' \
  | base64 -d > result.png

English translation of the prompt: New Year pet poster with a cute, rounded chibi-style title "新年快乐汪" ("Happy New Year, woof") and the subtitle "HAPPY NEW YEAR". Fisheye lens, background is a room doorway, close shot, the uploaded subject tilts its head and smiles, wearing a red scarf and a red knit hat, high definition, fur detail, facial close-up. Polaroid paper, surrealism, realism, film photography, printed grain texture. Texture, hyperrealistic, retro feel.

bot_task can be either "it2i_think" or "it2i_recaption".
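
For clients that script the request instead of using curl, the form fields and the response decoding can be sketched in Python as below. This is only a sketch: `build_edit_fields` and `decode_first_image` are hypothetical helper names, the client-side check on `bot_task` is an assumption (the PR does not say how the server treats other values), and the defaults simply mirror the curl example above.

```python
import base64

# The two values named in the PR description; rejecting anything else on the
# client side is an assumption of this sketch, not documented server behavior.
ALLOWED_BOT_TASKS = ("it2i_think", "it2i_recaption")


def build_edit_fields(prompt, bot_task="it2i_think", n=1,
                      num_inference_steps=50, guidance_scale=2.5, seed=42):
    """Return the text form fields from the curl example as strings."""
    if bot_task not in ALLOWED_BOT_TASKS:
        raise ValueError(
            f"bot_task must be one of {ALLOWED_BOT_TASKS}, got {bot_task!r}"
        )
    return {
        "prompt": prompt,
        "bot_task": bot_task,
        "n": str(n),
        "num_inference_steps": str(num_inference_steps),
        "guidance_scale": str(guidance_scale),
        "seed": str(seed),
    }


def decode_first_image(response_json):
    """Mirror `jq -r '.data[0].b64_json' | base64 -d` on a parsed response."""
    return base64.b64decode(response_json["data"][0]["b64_json"])
```

The returned dictionary can be passed as the form data of any multipart-capable HTTP client, with the image attached as the `image` file field.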

Test Result

An example for "it2i_think":
[Image: result11]

@skf-1999 skf-1999 requested a review from hsliuustc0106 as a code owner May 7, 2026 09:10

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fff228c533

lora_dict = _get_lora_from_json_str(lora)
_parse_lora_request(lora_dict)
extra_body["lora"] = lora_dict
if hunyuan_task is not None:

P1: Initialize hunyuan_task before conditional extraction

hunyuan_task is only assigned inside the JSON-only branch, but it is later referenced unconditionally when building extra_body for multi-stage edits. For non-JSON requests (the existing multipart/form-data path), this raises UnboundLocalError at runtime once len(stage_configs) > 1, so regular image-edit requests to multi-stage pipelines will fail before dispatch.

"""
# Handle JSON request
json_data = None
if raw_request.headers.get("Content-Type") == "application/json":

P2: Relax JSON content-type check to handle charset parameters

The JSON parsing path only runs when Content-Type is exactly application/json. Many clients send valid JSON headers like application/json; charset=utf-8; those requests skip this branch, leaving form fields unset and causing false 422 validation errors instead of processing the JSON payload. This makes the new JSON mode unreliable across common HTTP clients.

@skf-1999 skf-1999 force-pushed the feat/image-edit-api branch from fff228c to 1e6d0d2 Compare May 7, 2026 09:16
lora_dict = _get_lora_from_json_str(lora)
_parse_lora_request(lora_dict)
extra_body["lora"] = lora_dict
if hunyuan_task is not None:
Collaborator

UnboundLocalError: hunyuan_task is only assigned in the JSON branch (lines 1703-1722) but is referenced unconditionally at line 1936. For non-JSON requests using multi-stage pipelines, this will crash before dispatch. Initialize it before the JSON branch.

Contributor Author

Already addressed.
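
The failure mode discussed in this thread can be reproduced in isolation. The sketch below is not the PR's handler: the two function names and the `multi_stage` flag (standing in for `len(stage_configs) > 1`) are hypothetical. It only illustrates why a name bound in one branch must be initialized before that branch.

```python
def build_extra_body_buggy(json_data, multi_stage):
    # Bug pattern: the name is bound only when a JSON body was parsed.
    if json_data is not None:
        hunyuan_task = json_data.get("hunyuan_task")
    extra_body = {}
    # Raises UnboundLocalError when json_data is None and multi_stage is True.
    if multi_stage and hunyuan_task is not None:
        extra_body["hunyuan_task"] = hunyuan_task
    return extra_body


def build_extra_body_fixed(json_data, multi_stage):
    hunyuan_task = None  # bind the name on every code path first
    if json_data is not None:
        hunyuan_task = json_data.get("hunyuan_task")
    extra_body = {}
    if multi_stage and hunyuan_task is not None:
        extra_body["hunyuan_task"] = hunyuan_task
    return extra_body
```

With a single stage the buggy version short-circuits before touching the unbound name, which is why the problem only surfaces on multi-stage pipelines.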

"""
# Handle JSON request
json_data = None
if raw_request.headers.get("Content-Type") == "application/json":
Collaborator

JSON content-type check is too strict. Many clients send "application/json; charset=utf-8" instead of just "application/json". Those requests will skip this branch and cause 422 errors instead of processing the JSON payload. Use "application/json" in raw_request.headers.get("Content-Type", "") to handle charset parameters.
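
One way to relax the check is to compare only the media type and ignore any parameters. The helper below is a sketch with a hypothetical name (`is_json_content_type`); it is not taken from the PR and differs slightly from the substring test suggested above.

```python
def is_json_content_type(header_value):
    """Return True when the media type is application/json, ignoring parameters."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "application/json"
```

Compared with a plain substring test, this also rejects headers that merely mention `application/json` inside a parameter, and it tolerates a missing header.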

layers = extra_body.get("layers")
resolution = extra_body.get("resolution")
hunyuan_task = extra_body.get("hunyuan_task")

Collaborator

Debug print statement should be removed before merge.

Contributor Author

Removed.

# `attention_mask` to `pixel_attention_mask` so the dict key must match
# the expected forward signature.
- vit_kwargs = {"spatial_shapes": [], "pixel_attention_mask": []}
+ vit_kwargs = {"spatial_shapes": [], "attention_mask": []}
Collaborator

The comment above says "transformers >=5.54 renamed the kwarg from attention_mask to pixel_attention_mask", but this change does the opposite. Are you using an old transformers version? If so, please add a version constraint or explain why this is safe.

Contributor Author

This PR relies on the bug fix from PR #3395 to run. In fact, this change is a bug fix for the previous implementation rather than online support. I will delete it later.
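
If the keyword name really does differ between library versions, one defensive option is to pick the key from the callee's signature instead of hard-coding it. The sketch below is an illustration only: `pick_mask_kwarg`, `forward_old`, and `forward_new` are hypothetical names, and the two stand-in functions are not the real ViT forward methods.

```python
import inspect


def pick_mask_kwarg(forward_fn):
    """Return the mask keyword that forward_fn actually accepts."""
    params = inspect.signature(forward_fn).parameters
    for name in ("pixel_attention_mask", "attention_mask"):
        if name in params:
            return name
    raise TypeError("forward accepts neither mask keyword")


# Hypothetical stand-ins for the two signatures discussed in this thread.
def forward_old(pixel_values, spatial_shapes, attention_mask):
    return "old"


def forward_new(pixel_values, spatial_shapes, pixel_attention_mask):
    return "new"


# Build the kwargs dict with whichever key the target signature expects.
mask_key = pick_mask_kwarg(forward_new)
vit_kwargs = {"spatial_shapes": [], mask_key: []}
```

An explicit version constraint on the dependency, as the reviewer suggests, is the simpler alternative when only one signature needs to be supported.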

images.append(img)
except Exception as e:
raise ValueError(f"Failed to open uploaded file: {e}")
# 4. Local file path
Collaborator

Local file path support via os.path.exists needs security review. This allows arbitrary file access from the server filesystem. Consider restricting to a whitelist directory or adding path validation to prevent directory traversal attacks.

Contributor Author

Thanks for the suggestion. Local file upload has been removed; images are now passed via base64 instead.
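
Accepting base64 instead of a server-side path means the handler never has to open a file named by the client. A minimal sketch of such decoding follows; `decode_image_payload` is a hypothetical helper, and support for data URLs is an assumption rather than something the PR states.

```python
import base64
import binascii


def decode_image_payload(value):
    """Decode a bare base64 string or a base64 data URL into raw image bytes."""
    if value.startswith("data:"):
        # Split "data:image/png;base64,<payload>" at the first comma.
        header, sep, value = value.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("expected a base64 data URL")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 image data: {exc}") from exc
```

The decoded bytes can then be handed to an image library from memory, so no filesystem path supplied by the client is ever resolved.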

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Contributor

Have you rebased onto main? This has been removed in main.

Contributor Author

Not yet, will rebase now.

# Extract parameters from JSON
image = json_data.get("image")
prompt = json_data.get("prompt")
model = json_data.get("model")
Contributor

Image edit is a standard interface. Why add these fields? Just reuse the existing input parameters (image, prompt).

Contributor Author

Fixed. Removed all standard field extractions from the JSON block (image, prompt, model, n, size, etc.). These are already declared as Form(...) / File(...) parameters in the function signature and are correctly parsed by FastAPI for standard multipart/form-data requests.

lora = json_data.get("lora")
layers = json_data.get("layers")
resolution = json_data.get("resolution")
hunyuan_task = json_data.get("hunyuan_task")
Contributor

just use bot_task?

Contributor Author

Fixed. Renamed hunyuan_task to bot_task.
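
After these two review rounds, only the non-standard fields would still need to be read from a JSON body, with the task field now named `bot_task`. The sketch below shows that idea under stated assumptions: `EXTENSION_FIELDS` and `extract_extension_fields` are hypothetical names, and the field list is taken from the snippet quoted above with `hunyuan_task` renamed, not from the merged code.

```python
# Non-standard fields from the quoted snippet, with hunyuan_task renamed.
EXTENSION_FIELDS = ("lora", "layers", "resolution", "bot_task")


def extract_extension_fields(json_data):
    """Pick only extension fields; standard ones stay with the form parameters."""
    if not json_data:
        return {}
    return {
        key: json_data[key]
        for key in EXTENSION_FIELDS
        if json_data.get(key) is not None
    }
```

Standard fields such as `image` and `prompt` are deliberately ignored here, since the thread above leaves them to the function's existing form and file parameters.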

@skf-1999 skf-1999 force-pushed the feat/image-edit-api branch 2 times, most recently from 4bf4543 to 8e13eda Compare May 8, 2026 06:20
@skf-1999 skf-1999 requested a review from tzhouam as a code owner May 8, 2026 06:20
@skf-1999 skf-1999 force-pushed the feat/image-edit-api branch from 8e13eda to 8851bbb Compare May 8, 2026 06:35
Contributor

@Bounty-hunter Bounty-hunter left a comment

LGTM

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label May 8, 2026
@hsliuustc0106 hsliuustc0106 merged commit 039a09a into vllm-project:main May 8, 2026
8 checks passed
Signed-off-by: skf1999 <13234016272@163.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026