[Doc] Improve diffusion generation parameter docs for online serving by SamitHuang · Pull Request #2051 · vllm-project/vllm-omni

SamitHuang · 2026-03-20T16:14:40Z

Purpose

Improve documentation clarity and consistency for passing generation parameters
(e.g., num_inference_steps, height, width, guidance_scale) to diffusion
models via the /v1/chat/completions endpoint.

Currently, docs and examples exclusively show a nested "extra_body": {...} JSON
format for curl requests. This format is non-standard and causes confusion because:

The OpenAI Python SDK's extra_body= parameter flattens fields into the
top-level request body — it does not send a literal "extra_body" JSON key.
Users following SDK conventions silently lose all generation parameters (related: [bugfix] /chat/completion doesn't read extra_body for diffusion model #2042).
The extra_body naming collides with the SDK keyword, leading users to assume
they are equivalent when they are not.
No documentation page explains how parameter passing works across different
client libraries.

This PR addresses the confusion by:

Adding a cross-cutting Diffusion Chat API guide
that explains the three supported parameter formats (top-level fields, OpenAI SDK
extra_body=, and legacy nested "extra_body" JSON key), with examples for curl,
Python requests, and the OpenAI SDK.
Updating model-specific docs (text_to_image.md, image_to_image.md,
glm_image.md) to show top-level parameter format in curl examples and add
OpenAI SDK code samples.
Updating example scripts (openai_chat_client.py and run_curl_*.sh for
glm_image and image_to_image) to use top-level parameter format for consistency
with the SDK behavior.

Related: #2042 (code fix for model_extra fallback in diffusion handler)

Test Plan

Documentation-only changes plus example script updates.
Verified locally with mkdocs serve that the new page renders correctly and all
cross-references resolve.
The example scripts use the same parameter keys as before; only the JSON nesting
level changes (top-level instead of nested in extra_body). This is fully backward
compatible after [bugfix] /chat/completion doesn't read extra_body for diffusion model #2042 merges.

Test Result

N/A — documentation and example scripts only.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
The test results. Please paste the results comparison before and after, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb0b93d009

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-20T16:17:53Z

+    payload: dict[str, object] = {
+        "messages": messages,
        "height": height,
        "width": width,
        "num_inference_steps": steps,


Keep diffusion params in extra_body for raw JSON clients

This change moves generation parameters to top-level JSON for requests/curl payloads, but the diffusion chat handler still reads only request.extra_body (see vllm_omni/entrypoints/openai/serving_chat.py around lines 2039-2071). In the current server behavior, fields like height, width, num_inference_steps, guidance_scale, and seed sent at top level are ignored, so these examples silently run with defaults and break reproducibility/tuning for users following the updated scripts.

Useful? React with 👍 / 👎.

fhfuih · 2026-03-21T04:24:03Z

I think the /chat/completion API problem may not be limited to diffusion handler. When I curl a BAGEL service, I can also observe extra_body appear in both request body and model_extra (although not flattened). But that still depends on whether we should frame BAGEL (and later models like GLM, with mixed stages) as diffusion models in our documentation, and expect their users to refer to this paged with "Diffusion" in its title

Add a cross-cutting Diffusion Chat API guide explaining how to pass generation parameters (num_inference_steps, height, width, etc.) via /v1/chat/completions across different clients (curl, OpenAI SDK, Python requests). Update model-specific docs and example READMEs to add OpenAI SDK examples and cross-reference the new guide. Add Qwen-Image-Layered guidance to image_to_image docs with curl, SDK, and Python examples, covering its model-specific parameters (layers, resolution, cfg_scale) and multi-image response format. Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

hsliuustc0106 · 2026-03-22T01:46:22Z

@princepride

lishunyang12

Left a few comments. The central guide page is a good addition but there are some internal inconsistencies to fix.

lishunyang12 · 2026-03-22T17:27:05Z

+        with open(f"layer_{i}.png", "wb") as f:
+            f.write(base64.b64decode(b64_data))
+    ```
+


Same here — the Python requests example uses nested "extra_body". Per the new guide, raw HTTP clients should use top-level fields.

Unified to use extra_body for chat/completions endpoint

princepride · 2026-03-23T02:07:47Z

Out of curiosity, have you test glm-image recently? I remember it has some bug after last time refactor😂

fhfuih

I think it is really helpful that this PR clarify

The format ambiguity in /chat/completion endpoint
The concurrent support of /chat/competion and /image/XXX for image generation/editing tasks
Qwen Image Layered usage

But there might be a better way to organize doc pages and the information.

Left some comments above. I think

Whether to create a new page docs/serving/diffusion_chat_api.md requires some immediate discussion & consideration. Plus this doc page somehow mentions irrelevant content and repeating Generation Parameter Reference
For other comments in other pages, the content refinement is less urgent. Can simply resolve formatting suggestions and get this PR on first. In the future, there can be a systematic planning of online doc refinement, similar to #1244 (the offline one)

Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com> Signed-off-by: Samit <285365963@qq.com>

- Remove `--cfg-parallel-size` from param tables (server CLI flag, not request param) - Reframe diffusion_chat_api.md: nested `extra_body` is the primary format for curl/requests, remove "Legacy" label and endpoint overview table - Update curl/requests examples in the guide to use nested `extra_body` - Remove video-specific params from the chat API guide (out of scope) - Unify note wording across READMEs for SDK vs JSON `extra_body` - Fix glm_image.md parameter table intro for consistency Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

Remove duplicated examples, parameter table, and image-editing section that overlap with model-specific docs. Keep only the unique content: the extra_body SDK-vs-JSON explanation and the "ignored fields" warning. Add links to model-specific guides for full examples. Addresses fhfuih's review feedback about single source of truth. Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

Remove inline curl and OpenAI SDK code blocks that duplicate the general text-to-image and image-to-image guides. Keep only the model-specific script examples (openai_chat_client.py, run_curl_*.sh) and link to the general guides for other request methods. Addresses fhfuih's review feedback. Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

Signed-off-by: Samit <285365963@qq.com>

Update the Generation Parameters sections in all model-specific docs to clarify that /v1/images/generations and /v1/images/edits accept parameters as top-level fields, while /v1/chat/completions requires them inside extra_body. Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

fhfuih

LGTM! And also confirm that the doc update request mentioned in #2053 is also included

gcanlin

Thanks for the doc cleanup. I checked the documented defaults against the implementation and found a few mismatches:

The dedicated image endpoints do not accept all /v1/chat/completions extra_body field names as top-level fields. In particular, /v1/images/generations and /v1/images/edits use size / n for dimensions and count, while chat uses height / width / num_outputs_per_prompt.
Qwen-Image-Edit's effective default guidance_scale is 1.0, not 7.5.
GLM-Image does not have a fixed default seed of 42. If seed is omitted, chat leaves it unset, while /v1/images/* generates one server-side.

I left inline suggestions on the affected doc lines. The same fixes should probably also be mirrored in the corresponding examples/online_serving/*/README.md files, since they duplicate the same wording/tables.

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

…llm-project#2051) Signed-off-by: samithuang <285365963@qq.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com> Co-authored-by: gcanlin <canlinguosdu@gmail.com>

SamitHuang requested a review from hsliuustc0106 as a code owner March 20, 2026 16:14

chatgpt-codex-connector Bot reviewed Mar 20, 2026

View reviewed changes

SamitHuang force-pushed the docs/diffusion-extra-body-guide branch 2 times, most recently from 2f73c32 to 843df29 Compare March 20, 2026 16:33

SamitHuang mentioned this pull request Mar 20, 2026

[bugfix] /chat/completion doesn't read extra_body for diffusion model #2042

Merged

5 tasks

SamitHuang requested review from david6666666 and wtomin March 20, 2026 16:34

gcanlin reviewed Mar 20, 2026

View reviewed changes

Comment thread docs/serving/diffusion_chat_api.md Outdated

gcanlin reviewed Mar 20, 2026

View reviewed changes

Comment thread examples/online_serving/image_to_image/README.md

SamitHuang force-pushed the docs/diffusion-extra-body-guide branch from 843df29 to 537953a Compare March 21, 2026 07:17

SamitHuang added the ready label to trigger buildkite CI label Mar 21, 2026

lishunyang12 reviewed Mar 22, 2026

View reviewed changes