Skip to content

[Doc] Improve diffusion generation parameter docs for online serving#2051

Merged
gcanlin merged 9 commits intovllm-project:mainfrom
SamitHuang:docs/diffusion-extra-body-guide
Mar 26, 2026
Merged

[Doc] Improve diffusion generation parameter docs for online serving#2051
gcanlin merged 9 commits intovllm-project:mainfrom
SamitHuang:docs/diffusion-extra-body-guide

Conversation

@SamitHuang
Copy link
Copy Markdown
Collaborator

@SamitHuang SamitHuang commented Mar 20, 2026

Purpose

Improve documentation clarity and consistency for passing generation parameters
(e.g., num_inference_steps, height, width, guidance_scale) to diffusion
models via the /v1/chat/completions endpoint.

Currently, docs and examples exclusively show a nested "extra_body": {...} JSON
format for curl requests. This format is non-standard and causes confusion because:

  1. The OpenAI Python SDK's extra_body= parameter flattens fields into the
    top-level request body — it does not send a literal "extra_body" JSON key.
    Users following SDK conventions silently lose all generation parameters (related: [bugfix] /chat/completion doesn't read extra_body for diffusion model #2042).
  2. The extra_body naming collides with the SDK keyword, leading users to assume
    they are equivalent when they are not.
  3. No documentation page explains how parameter passing works across different
    client libraries.

This PR addresses the confusion by:

  • Adding a cross-cutting Diffusion Chat API guide
    that explains the three supported parameter formats (top-level fields, OpenAI SDK
    extra_body=, and legacy nested "extra_body" JSON key), with examples for curl,
    Python requests, and the OpenAI SDK.
  • Updating model-specific docs (text_to_image.md, image_to_image.md,
    glm_image.md) to show top-level parameter format in curl examples and add
    OpenAI SDK code samples.
  • Updating example scripts (openai_chat_client.py and run_curl_*.sh for
    glm_image and image_to_image) to use top-level parameter format for consistency
    with the SDK behavior.

Related: #2042 (code fix for model_extra fallback in diffusion handler)

Test Plan

  • Documentation-only changes plus example script updates.
  • Verified locally with mkdocs serve that the new page renders correctly and all
    cross-references resolve.
  • The example scripts use the same parameter keys as before; only the JSON nesting
    level changes (top-level instead of nested in extra_body). This is fully backward
    compatible after [bugfix] /chat/completion doesn't read extra_body for diffusion model #2042 merges.

Test Result

N/A — documentation and example scripts only.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb0b93d009

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 70 to 74
payload: dict[str, object] = {
"messages": messages,
"height": height,
"width": width,
"num_inference_steps": steps,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep diffusion params in extra_body for raw JSON clients

This change moves generation parameters to top-level JSON for requests/curl payloads, but the diffusion chat handler still reads only request.extra_body (see vllm_omni/entrypoints/openai/serving_chat.py around lines 2039-2071). In the current server behavior, fields like height, width, num_inference_steps, guidance_scale, and seed sent at top level are ignored, so these examples silently run with defaults and break reproducibility/tuning for users following the updated scripts.

Useful? React with 👍 / 👎.

@SamitHuang SamitHuang force-pushed the docs/diffusion-extra-body-guide branch 2 times, most recently from 2f73c32 to 843df29 Compare March 20, 2026 16:33
Comment thread docs/serving/diffusion_chat_api.md Outdated
Comment thread examples/online_serving/image_to_image/README.md
@fhfuih
Copy link
Copy Markdown
Contributor

fhfuih commented Mar 21, 2026

I think the /chat/completion API problem may not be limited to diffusion handler. When I curl a BAGEL service, I can also observe extra_body appear in both request body and model_extra (although not flattened). But that still depends on whether we should frame BAGEL (and later models like GLM, with mixed stages) as diffusion models in our documentation, and expect their users to refer to this paged with "Diffusion" in its title

Add a cross-cutting Diffusion Chat API guide explaining how to pass
generation parameters (num_inference_steps, height, width, etc.) via
/v1/chat/completions across different clients (curl, OpenAI SDK,
Python requests). Update model-specific docs and example READMEs to
add OpenAI SDK examples and cross-reference the new guide.

Add Qwen-Image-Layered guidance to image_to_image docs with curl,
SDK, and Python examples, covering its model-specific parameters
(layers, resolution, cfg_scale) and multi-image response format.

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
@SamitHuang SamitHuang force-pushed the docs/diffusion-extra-body-guide branch from 843df29 to 537953a Compare March 21, 2026 07:17
@SamitHuang SamitHuang added the ready label to trigger buildkite CI label Mar 21, 2026
@hsliuustc0106
Copy link
Copy Markdown
Collaborator

@princepride

Copy link
Copy Markdown
Collaborator

@lishunyang12 lishunyang12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments. The central guide page is a good addition but there are some internal inconsistencies to fix.

Comment thread docs/serving/diffusion_chat_api.md Outdated
Comment thread docs/serving/diffusion_chat_api.md Outdated
Comment thread docs/user_guide/examples/online_serving/image_to_image.md
with open(f"layer_{i}.png", "wb") as f:
f.write(base64.b64decode(b64_data))
```

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here — the Python requests example uses nested "extra_body". Per the new guide, raw HTTP clients should use top-level fields.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unified to use extra_body for chat/completions endpoint

Comment thread docs/user_guide/examples/online_serving/text_to_image.md
Comment thread examples/online_serving/text_to_image/README.md Outdated
Comment thread examples/online_serving/text_to_image/README.md Outdated
Comment thread docs/serving/diffusion_chat_api.md
Comment thread examples/online_serving/glm_image/README.md
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, have you test glm-image recently? I remember it has some bug after last time refactor😂

Copy link
Copy Markdown
Contributor

@fhfuih fhfuih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is really helpful that this PR clarify

  • The format ambiguity in /chat/completion endpoint
  • The concurrent support of /chat/competion and /image/XXX for image generation/editing tasks
  • Qwen Image Layered usage

But there might be a better way to organize doc pages and the information.

Left some comments above. I think

  • Whether to create a new page docs/serving/diffusion_chat_api.md requires some immediate discussion & consideration. Plus this doc page somehow mentions irrelevant content and repeating Generation Parameter Reference
  • For other comments in other pages, the content refinement is less urgent. Can simply resolve formatting suggestions and get this PR on first. In the future, there can be a systematic planning of online doc refinement, similar to #1244 (the offline one)

SamitHuang and others added 5 commits March 23, 2026 23:07
Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: Samit <285365963@qq.com>
- Remove `--cfg-parallel-size` from param tables (server CLI flag, not request param)
- Reframe diffusion_chat_api.md: nested `extra_body` is the primary
  format for curl/requests, remove "Legacy" label and endpoint overview table
- Update curl/requests examples in the guide to use nested `extra_body`
- Remove video-specific params from the chat API guide (out of scope)
- Unify note wording across READMEs for SDK vs JSON `extra_body`
- Fix glm_image.md parameter table intro for consistency

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
Remove duplicated examples, parameter table, and image-editing section
that overlap with model-specific docs. Keep only the unique content:
the extra_body SDK-vs-JSON explanation and the "ignored fields" warning.
Add links to model-specific guides for full examples.

Addresses fhfuih's review feedback about single source of truth.

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
Remove inline curl and OpenAI SDK code blocks that duplicate the general
text-to-image and image-to-image guides. Keep only the model-specific
script examples (openai_chat_client.py, run_curl_*.sh) and link to the
general guides for other request methods.

Addresses fhfuih's review feedback.

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Update the Generation Parameters sections in all model-specific docs
to clarify that /v1/images/generations and /v1/images/edits accept
parameters as top-level fields, while /v1/chat/completions requires
them inside extra_body.

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
Copy link
Copy Markdown
Contributor

@fhfuih fhfuih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! And also confirm that the doc update request mentioned in #2053 is also included

Copy link
Copy Markdown
Collaborator

@gcanlin gcanlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc cleanup. I checked the documented defaults against the implementation and found a few mismatches:

  • The dedicated image endpoints do not accept all /v1/chat/completions extra_body field names as top-level fields. In particular, /v1/images/generations and /v1/images/edits use size / n for dimensions and count, while chat uses height / width / num_outputs_per_prompt.
  • Qwen-Image-Edit's effective default guidance_scale is 1.0, not 7.5.
  • GLM-Image does not have a fixed default seed of 42. If seed is omitted, chat leaves it unset, while /v1/images/* generates one server-side.

I left inline suggestions on the affected doc lines. The same fixes should probably also be mirrored in the corresponding examples/online_serving/*/README.md files, since they duplicate the same wording/tables.

Comment thread docs/user_guide/examples/online_serving/text_to_image.md Outdated
Comment thread docs/user_guide/examples/online_serving/image_to_image.md Outdated
Comment thread docs/user_guide/examples/online_serving/image_to_image.md Outdated
Comment thread docs/user_guide/examples/online_serving/glm_image.md Outdated
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin enabled auto-merge (squash) March 26, 2026 08:41
@gcanlin gcanlin merged commit d96d63c into vllm-project:main Mar 26, 2026
5 of 6 checks passed
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
…llm-project#2051)

Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
@oglok oglok mentioned this pull request Apr 22, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants