[Doc] Improve diffusion generation parameter docs for online serving#2051
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bb0b93d009
```python
payload: dict[str, object] = {
    "messages": messages,
    "height": height,
    "width": width,
    "num_inference_steps": steps,
```
Keep diffusion params in `extra_body` for raw JSON clients
This change moves generation parameters to top-level JSON for requests/curl payloads, but the diffusion chat handler still reads only request.extra_body (see vllm_omni/entrypoints/openai/serving_chat.py around lines 2039-2071). In the current server behavior, fields like height, width, num_inference_steps, guidance_scale, and seed sent at top level are ignored, so these examples silently run with defaults and break reproducibility/tuning for users following the updated scripts.
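A minimal sketch of a raw JSON payload matching the server behavior described above, where generation parameters must be nested under `"extra_body"`. The model name, prompt, and parameter values are placeholders for illustration, not taken from the repo:

```python
import json

# Illustrative sketch only: model name, prompt, and values are placeholders.
payload: dict[str, object] = {
    "model": "diffusion-model-placeholder",
    "messages": [{"role": "user", "content": "a watercolor fox"}],
    # Per the review above, the diffusion chat handler reads generation
    # parameters from the nested "extra_body" object, not from top level,
    # so a raw JSON client must nest them like this:
    "extra_body": {
        "height": 1024,
        "width": 768,
        "num_inference_steps": 30,
        "guidance_scale": 4.0,
        "seed": 1234,
    },
}

print(json.dumps(payload, indent=2))
```

A payload shaped like this would be POSTed to `/v1/chat/completions`; sending the same fields at top level is what the review warns gets silently ignored.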
Force-pushed from 2f73c32 to 843df29.
I think the /chat/completions API problem may not be limited to the diffusion handler. When I curl a BAGEL service, I can also observe `extra_body` appearing in both the request body and `model_extra` (although not flattened). But that still depends on whether we should frame BAGEL (and later models like GLM, with mixed stages) as diffusion models in our documentation, and expect their users to refer to this page with "Diffusion" in its title.
Add a cross-cutting Diffusion Chat API guide explaining how to pass generation parameters (`num_inference_steps`, `height`, `width`, etc.) via `/v1/chat/completions` across different clients (curl, OpenAI SDK, Python `requests`). Update model-specific docs and example READMEs to add OpenAI SDK examples and cross-reference the new guide. Add Qwen-Image-Layered guidance to the image_to_image docs with curl, SDK, and Python examples, covering its model-specific parameters (`layers`, `resolution`, `cfg_scale`) and multi-image response format.

Made-with: Cursor
Signed-off-by: samithuang <285365963@qq.com>
Force-pushed from 843df29 to 537953a.
lishunyang12 left a comment
Left a few comments. The central guide page is a good addition but there are some internal inconsistencies to fix.
```python
with open(f"layer_{i}.png", "wb") as f:
    f.write(base64.b64decode(b64_data))
```
Same here — the Python requests example uses nested "extra_body". Per the new guide, raw HTTP clients should use top-level fields.
Unified to use nested `extra_body` for the chat/completions endpoint.
Out of curiosity, have you tested glm-image recently? I remember it had some bugs after the last refactor 😂
I think it is really helpful that this PR clarifies:
- The format ambiguity in the /chat/completions endpoint
- The concurrent support of /chat/completions and /images/XXX for image generation/editing tasks
- Qwen Image Layered usage

But there might be a better way to organize the doc pages and the information. Left some comments above. I think:
- Whether to create a new page `docs/serving/diffusion_chat_api.md` requires some immediate discussion and consideration. Plus, this doc page currently mentions some irrelevant content and repeats the Generation Parameter Reference.
- For the other comments on other pages, the content refinement is less urgent. We can simply resolve the formatting suggestions and get this PR in first. In the future, there can be a systematic plan for online doc refinement, similar to #1244 (the offline one).
Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: Samit <285365963@qq.com>
- Remove `--cfg-parallel-size` from param tables (server CLI flag, not request param)
- Reframe diffusion_chat_api.md: nested `extra_body` is the primary format for curl/requests; remove the "Legacy" label and the endpoint overview table
- Update curl/requests examples in the guide to use nested `extra_body`
- Remove video-specific params from the chat API guide (out of scope)
- Unify note wording across READMEs for SDK vs JSON `extra_body`
- Fix glm_image.md parameter table intro for consistency

Made-with: Cursor
Signed-off-by: samithuang <285365963@qq.com>
Remove duplicated examples, parameter table, and image-editing section that overlap with model-specific docs. Keep only the unique content: the `extra_body` SDK-vs-JSON explanation and the "ignored fields" warning. Add links to model-specific guides for full examples. Addresses fhfuih's review feedback about a single source of truth.

Made-with: Cursor
Signed-off-by: samithuang <285365963@qq.com>
Remove inline curl and OpenAI SDK code blocks that duplicate the general text-to-image and image-to-image guides. Keep only the model-specific script examples (`openai_chat_client.py`, `run_curl_*.sh`) and link to the general guides for other request methods. Addresses fhfuih's review feedback.

Made-with: Cursor
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Update the Generation Parameters sections in all model-specific docs to clarify that `/v1/images/generations` and `/v1/images/edits` accept parameters as top-level fields, while `/v1/chat/completions` requires them inside `extra_body`.

Made-with: Cursor
Signed-off-by: samithuang <285365963@qq.com>
gcanlin left a comment
Thanks for the doc cleanup. I checked the documented defaults against the implementation and found a few mismatches:
- The dedicated image endpoints do not accept all `/v1/chat/completions` `extra_body` field names as top-level fields. In particular, `/v1/images/generations` and `/v1/images/edits` use `size`/`n` for dimensions and count, while chat uses `height`/`width`/`num_outputs_per_prompt`.
- Qwen-Image-Edit's effective default `guidance_scale` is `1.0`, not `7.5`.
- GLM-Image does not have a fixed default seed of `42`. If `seed` is omitted, chat leaves it unset, while `/v1/images/*` generates one server-side.
I left inline suggestions on the affected doc lines. The same fixes should probably also be mirrored in the corresponding examples/online_serving/*/README.md files, since they duplicate the same wording/tables.
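The field-name mismatch called out in this review can be illustrated with a small hypothetical translation helper. The helper is not part of the codebase, and the `"WIDTHxHEIGHT"` string format for `size` is an assumption made for illustration:

```python
def chat_params_to_images_params(params: dict) -> dict:
    """Hypothetical helper: translate chat-style extra_body field names
    to the dedicated /v1/images/* field names noted in the review:
    height/width -> size, num_outputs_per_prompt -> n.
    Other fields are passed through unchanged."""
    out = dict(params)
    height = out.pop("height", None)
    width = out.pop("width", None)
    if height is not None and width is not None:
        # "WIDTHxHEIGHT" ordering is assumed here for illustration.
        out["size"] = f"{width}x{height}"
    if "num_outputs_per_prompt" in out:
        out["n"] = out.pop("num_outputs_per_prompt")
    return out

print(chat_params_to_images_params(
    {"height": 1024, "width": 768, "num_outputs_per_prompt": 2, "seed": 7}
))
```

A mapping like this is only a sketch of the naming gap; the actual accepted fields are defined by the server's request schemas.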
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
…llm-project#2051)

Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Purpose
Improve documentation clarity and consistency for passing generation parameters (e.g., `num_inference_steps`, `height`, `width`, `guidance_scale`) to diffusion models via the `/v1/chat/completions` endpoint.

Currently, docs and examples exclusively show a nested `"extra_body": {...}` JSON format for curl requests. This format is non-standard and causes confusion because:

- The OpenAI SDK's `extra_body=` parameter flattens fields into the top-level request body; it does not send a literal `"extra_body"` JSON key. Users following SDK conventions silently lose all generation parameters (related: [bugfix] /chat/completion doesn't read extra_body for diffusion model #2042).
- The `extra_body` naming collides with the SDK keyword, leading users to assume the two are equivalent across client libraries when they are not.
This PR addresses the confusion by:

- Adding a new guide page that explains the three supported parameter formats (top-level fields, OpenAI SDK `extra_body=`, and legacy nested `"extra_body"` JSON key), with examples for curl, Python `requests`, and the OpenAI SDK.
- Updating model-specific docs (`text_to_image.md`, `image_to_image.md`, `glm_image.md`) to show the top-level parameter format in curl examples and add OpenAI SDK code samples.
- Updating example scripts (`openai_chat_client.py` and `run_curl_*.sh` for `glm_image` and `image_to_image`) to use the top-level parameter format for consistency with the SDK behavior.
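The SDK flattening behavior described above can be sketched without a server. This is a simplified simulation of the merge, not the OpenAI SDK's actual serialization code:

```python
def build_request_body(messages, extra_body=None):
    """Simplified simulation (not the SDK's actual code) of how an
    extra_body= keyword argument is serialized: its keys are flattened
    into the top-level JSON request body."""
    body = {"messages": messages}
    body.update(extra_body or {})
    return body

body = build_request_body(
    [{"role": "user", "content": "a watercolor fox"}],
    extra_body={"height": 1024, "width": 768, "num_inference_steps": 30},
)

# No literal "extra_body" key survives; the fields land at top level.
assert "extra_body" not in body
assert body["height"] == 1024
```

This is why a server that reads only a nested `request.extra_body` object silently drops every parameter an SDK user passes via `extra_body=`.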
Related: #2042 (code fix for `model_extra` fallback in the diffusion handler)

Test Plan

- Verified with `mkdocs serve` that the new page renders correctly and all cross-references resolve.
- The example scripts change only at the payload level (top-level instead of nested in `extra_body`). This is fully backward compatible after [bugfix] /chat/completion doesn't read extra_body for diffusion model #2042 merges.
Test Result
N/A — documentation and example scripts only.
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.