From 537953a31eaa0d1d99e9d7b8cec74ef141a2a005 Mon Sep 17 00:00:00 2001
From: samithuang <285365963@qq.com>
Date: Fri, 20 Mar 2026 16:14:07 +0000
Subject: [PATCH 1/7] [Doc] Improve diffusion generation parameter docs for
 online serving

Add a cross-cutting Diffusion Chat API guide explaining how to pass
generation parameters (num_inference_steps, height, width, etc.) via
/v1/chat/completions across different clients (curl, OpenAI SDK,
Python requests). Update model-specific docs and example READMEs to
add OpenAI SDK examples and cross-reference the new guide.

Add Qwen-Image-Layered guidance to image_to_image docs with curl,
SDK, and Python examples, covering its model-specific parameters
(layers, resolution, cfg_scale) and multi-image response format.

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
---
 docs/.nav.yml                                 |   1 +
 docs/serving/diffusion_chat_api.md            | 230 ++++++++++++++++++
 .../examples/online_serving/glm_image.md      |  31 ++-
 .../examples/online_serving/image_to_image.md | 194 ++++++++++++++-
 .../examples/online_serving/text_to_image.md  |  51 +++-
 examples/online_serving/glm_image/README.md   |  35 ++-
 .../online_serving/image_to_image/README.md   | 141 ++++++++++-
 .../online_serving/text_to_image/README.md    |  41 +++-
 8 files changed, 705 insertions(+), 19 deletions(-)
 create mode 100644 docs/serving/diffusion_chat_api.md

diff --git a/docs/.nav.yml b/docs/.nav.yml
index bfa9365f6f6..14725cf1511 100644
--- a/docs/.nav.yml
+++ b/docs/.nav.yml
@@ -6,6 +6,7 @@ nav:
     - getting_started/installation/*
   - Serving:
     - OpenAI-Compatible API:
+      - Diffusion Chat API: serving/diffusion_chat_api.md
       - Image Generation: serving/image_generation_api.md
       - Image Edit: serving/image_edit_api.md
       - Text to Speech: serving/speech_api.md
diff --git a/docs/serving/diffusion_chat_api.md b/docs/serving/diffusion_chat_api.md
new file mode 100644
index 00000000000..41e9eada9f7
--- /dev/null
+++ b/docs/serving/diffusion_chat_api.md
@@ -0,0 +1,230 @@
+# Diffusion Chat Completions API
+
+vLLM-Omni supports generating images via the `/v1/chat/completions` endpoint using diffusion models.
+This page explains how to pass generation parameters (such as `num_inference_steps`, `height`, `width`)
+to diffusion models through this endpoint across different client libraries.
+
+!!! tip
+    For text-to-image generation without chat context, the dedicated
+    [`/v1/images/generations`](image_generation_api.md) endpoint accepts these
+    parameters as top-level fields and may be simpler for your use case.
+
+## API Endpoints Overview
+
+vLLM-Omni provides multiple endpoints for diffusion models. Each has its own parameter-passing
+convention:
+
+| Endpoint | Use Case | Parameter Format |
+|----------|----------|-----------------|
+| `/v1/chat/completions` | Image gen/edit via chat | Extra top-level JSON fields, or `extra_body` in the OpenAI SDK (see below) |
+| `/v1/images/generations` | Dedicated text-to-image | Top-level JSON fields |
+| `/v1/images/edits` | Dedicated image editing | Multipart form fields |
+| `/v1/videos` | Video generation | Multipart form fields |
+
+## Passing Generation Parameters via `/v1/chat/completions`
+
+The `/v1/chat/completions` endpoint follows the OpenAI Chat API schema, which does not natively
+include diffusion-specific fields like `num_inference_steps` or `height`. vLLM-Omni accepts
+these as **extra fields** on the request body.
+
+There are two supported methods depending on your client:
+
+### Method 1: Using curl or Python `requests`
+
+Put generation parameters as **top-level fields** in the JSON body alongside `messages`:
+
+=== "curl"
+
+    ```bash
+    curl -s http://localhost:8091/v1/chat/completions \
+      -H "Content-Type: application/json" \
+      -d '{
+        "messages": [
+          {"role": "user", "content": "A beautiful landscape painting"}
+        ],
+        "height": 1024,
+        "width": 1024,
+        "num_inference_steps": 50,
+        "true_cfg_scale": 4.0,
+        "seed": 42
+      }' | jq -r '.choices[0].message.content[0].image_url.url' \
+         | cut -d',' -f2- | base64 -d > output.png
+    ```
+
+=== "Python requests"
+
+    ```python
+    import requests
+    import base64
+
+    payload = {
+        "messages": [
+            {"role": "user", "content": "A beautiful landscape painting"}
+        ],
+        "height": 1024,
+        "width": 1024,
+        "num_inference_steps": 50,
+        "true_cfg_scale": 4.0,
+        "seed": 42,
+    }
+
+    resp = requests.post(
+        "http://localhost:8091/v1/chat/completions",
+        json=payload,
+        timeout=300,
+    )
+    data = resp.json()
+
+    img_url = data["choices"][0]["message"]["content"][0]["image_url"]["url"]
+    _, b64_data = img_url.split(",", 1)
+    with open("output.png", "wb") as f:
+        f.write(base64.b64decode(b64_data))
+    ```
+
+### Method 2: Using the OpenAI Python SDK
+
+The OpenAI Python SDK uses the `extra_body` keyword argument to pass non-standard fields.
+The SDK automatically merges these into the top-level request body:
+
+```python
+from openai import OpenAI
+import base64
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen-Image",
+    messages=[
+        {"role": "user", "content": "A beautiful landscape painting"}
+    ],
+    extra_body={
+        "height": 1024,
+        "width": 1024,
+        "num_inference_steps": 50,
+        "true_cfg_scale": 4.0,
+        "seed": 42,
+    },
+)
+
+img_url = response.choices[0].message.content[0].image_url.url
+_, b64_data = img_url.split(",", 1)
+with open("output.png", "wb") as f:
+    f.write(base64.b64decode(b64_data))
+```
+
+### Legacy Format: Nested `extra_body` in JSON
+
+You may see examples that nest generation parameters inside an `"extra_body"` key in the
+JSON body. This format is still supported for backward compatibility:
+
+```json
+{
+  "messages": [{"role": "user", "content": "A beautiful landscape painting"}],
+  "extra_body": {
+    "height": 1024,
+    "width": 1024,
+    "num_inference_steps": 50
+  }
+}
+```
+
+Both formats (top-level fields and nested `extra_body`) are accepted.
+
+!!! note "About the `ignored fields` warning"
+    When sending non-standard fields, you may see a log message like:
+
+    ```
+    WARNING: The following fields were present in the request but ignored: {'height', 'width', ...}
+    ```
+
+    This warning is **harmless**. It is emitted by vLLM's request validation layer because
+    these fields are not part of the standard OpenAI `ChatCompletionRequest` schema.
+    The fields are still stored internally and correctly forwarded to the diffusion pipeline.
+
+## Image Editing (Image-to-Image)
+
+For image editing, include both text and image in the message content:
+
+=== "curl"
+
+    ```bash
+    IMG_B64=$(base64 -w0 input.png)
+
+    curl -s http://localhost:8092/v1/chat/completions \
+      -H "Content-Type: application/json" \
+      -d '{
+        "messages": [{
+          "role": "user",
+          "content": [
+            {"type": "text", "text": "Convert to watercolor style"},
+            {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG_B64"'"}}
+          ]
+        }],
+        "num_inference_steps": 50,
+        "guidance_scale": 1,
+        "seed": 42
+      }' | jq -r '.choices[0].message.content[0].image_url.url' \
+         | cut -d',' -f2- | base64 -d > output.png
+    ```
+
+=== "OpenAI SDK"
+
+    ```python
+    import base64
+    from openai import OpenAI
+
+    client = OpenAI(base_url="http://localhost:8092/v1", api_key="none")
+
+    with open("input.png", "rb") as f:
+        img_b64 = base64.b64encode(f.read()).decode()
+
+    response = client.chat.completions.create(
+        model="Qwen/Qwen-Image-Edit",
+        messages=[{
+            "role": "user",
+            "content": [
+                {"type": "text", "text": "Convert to watercolor style"},
+                {"type": "image_url", "image_url": {
+                    "url": f"data:image/png;base64,{img_b64}"
+                }},
+            ],
+        }],
+        extra_body={
+            "num_inference_steps": 50,
+            "guidance_scale": 1,
+            "seed": 42,
+        },
+    )
+
+    img_url = response.choices[0].message.content[0].image_url.url
+    _, b64_data = img_url.split(",", 1)
+    with open("output.png", "wb") as f:
+        f.write(base64.b64decode(b64_data))
+    ```
+
+## Generation Parameters Reference
+
+The following parameters are accepted as extra fields on `/v1/chat/completions` for
+diffusion models:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `height` | int | Output image height in pixels |
+| `width` | int | Output image width in pixels |
+| `size` | str | Output size in "WxH" format (alternative to separate height/width) |
+| `num_inference_steps` | int | Number of denoising steps |
+| `guidance_scale` | float | Classifier-free guidance scale |
+| `true_cfg_scale` | float | True CFG scale (Qwen-Image specific) |
+| `seed` | int | Random seed for reproducibility |
+| `negative_prompt` | str | Text describing what to avoid |
+| `num_outputs_per_prompt` | int | Number of images to generate (default: 1) |
+| `num_frames` | int | Number of frames (video models) |
+| `guidance_scale_2` | float | Secondary guidance scale (Wan2.2 models) |
+| `layers` | int | Number of layers to generate (Qwen-Image-Layered, default: 4) |
+| `resolution` | int | Resolution for dimension calculation (Qwen-Image-Layered, 640 or 1024) |
+| `lora` | object | Per-request LoRA adapter configuration |
+
+!!! info "Model-specific defaults"
+    When a parameter is not specified, the underlying diffusion pipeline applies its own
+    model-specific default. For example, `num_inference_steps` defaults to 50 for most models
+    but may differ for turbo/distilled variants.
diff --git a/docs/user_guide/examples/online_serving/glm_image.md b/docs/user_guide/examples/online_serving/glm_image.md
index c0d1764801a..d170a5511a4 100644
--- a/docs/user_guide/examples/online_serving/glm_image.md
+++ b/docs/user_guide/examples/online_serving/glm_image.md
@@ -104,6 +104,32 @@ curl -s http://localhost:8091/v1/chat/completions \
   }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
 ```
 
+**Using OpenAI SDK**
+
+```python
+from openai import OpenAI
+import base64
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
+
+response = client.chat.completions.create(
+    model="zai-org/GLM-Image",
+    messages=[{"role": "user", "content": "A beautiful sunset over the ocean"}],
+    extra_body={
+        "height": 1024,
+        "width": 1024,
+        "num_inference_steps": 50,
+        "guidance_scale": 1.5,
+        "seed": 42,
+    },
+)
+
+img_url = response.choices[0].message.content[0].image_url.url
+_, b64_data = img_url.split(",", 1)
+with open("output.png", "wb") as f:
+    f.write(base64.b64decode(b64_data))
+```
+
 Or use the script:
 
 ```bash
@@ -156,7 +182,10 @@ Or use the script:
 bash run_curl_image_edit.sh input.png "Convert to watercolor style"
 ```
 
-## Generation Parameters (extra_body)
+## Generation Parameters
+
+These can be passed as top-level fields in curl/requests, or via `extra_body` in the OpenAI SDK.
+See the [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md) for details.
 
 | Parameter             | Type  | Default | Description                         |
 | --------------------- | ----- | ------- | ----------------------------------- |
diff --git a/docs/user_guide/examples/online_serving/image_to_image.md b/docs/user_guide/examples/online_serving/image_to_image.md
index 6be2a4a7e82..6b446749739 100644
--- a/docs/user_guide/examples/online_serving/image_to_image.md
+++ b/docs/user_guide/examples/online_serving/image_to_image.md
@@ -69,10 +69,49 @@ cat <<EOF > request.json
 }
 EOF
 
-curl -s http://localhost:8092/v1/chat/completions   -H "Content-Type: application/json"   -d @request.json | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2 | base64 -d > output.png
+curl -s http://localhost:8092/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d @request.json \
+  | jq -r '.choices[0].message.content[0].image_url.url' \
+  | cut -d',' -f2 | base64 -d > output.png
 ```
 
-### Method 2: Using Python Client
+### Method 2: Using OpenAI Python SDK
+
+```python
+import base64
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8092/v1", api_key="none")
+
+with open("input.png", "rb") as f:
+    img_b64 = base64.b64encode(f.read()).decode()
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen-Image-Edit",
+    messages=[{
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "Convert to watercolor style"},
+            {"type": "image_url", "image_url": {
+                "url": f"data:image/png;base64,{img_b64}"
+            }},
+        ],
+    }],
+    extra_body={
+        "num_inference_steps": 50,
+        "guidance_scale": 1,
+        "seed": 42,
+    },
+)
+
+img_url = response.choices[0].message.content[0].image_url.url
+_, b64_data = img_url.split(",", 1)
+with open("output.png", "wb") as f:
+    f.write(base64.b64decode(b64_data))
+```
+
+### Method 3: Using Python Client Script
 
 ```bash
 python openai_chat_client.py --input input.png --prompt "Convert to oil painting style" --output output.png
@@ -81,7 +120,7 @@ python openai_chat_client.py --input input.png --prompt "Convert to oil painting
 python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene" --output output.png
 ```
 
-### Method 3: Using Gradio Demo
+### Method 4: Using Gradio Demo
 
 ```bash
 python gradio_demo.py
@@ -124,7 +163,7 @@ python gradio_demo.py
 
 ### Image Editing with Parameters
 
-Use `extra_body` to pass generation parameters:
+Wrap generation parameters inside `extra_body` in the request JSON:
 
 ```json
 {
@@ -147,6 +186,149 @@ Use `extra_body` to pass generation parameters:
 }
 ```
 
+!!! tip "Using the OpenAI SDK"
+    When using the OpenAI Python SDK, pass these parameters via the `extra_body`
+    keyword argument. The SDK merges them into the top-level request body automatically:
+
+    ```python
+    client.chat.completions.create(
+        model="Qwen/Qwen-Image-Edit",
+        messages=[...],
+        extra_body={"num_inference_steps": 50, "guidance_scale": 7.5, "seed": 42},
+    )
+    ```
+
+    For details on how generation parameters are handled across different clients, see the
+    [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md).
+
+### Layered Image Generation (Qwen-Image-Layered)
+
+Qwen-Image-Layered generates multiple decomposed layers from a reference image and a text prompt.
+Start the server with:
+
+```bash
+vllm serve Qwen/Qwen-Image-Layered --omni --port 8093
+```
+
+=== "curl"
+
+    ```bash
+    IMG_B64=$(base64 -w0 input.png)
+
+    curl -sS http://localhost:8093/v1/chat/completions \
+      -H "Content-Type: application/json" \
+      -d "$(jq -n --arg img "$IMG_B64" '{
+        messages: [{
+          role: "user",
+          content: [
+            {type: "image_url", image_url: {url: ("data:image/png;base64," + $img)}},
+            {type: "text", text: "a rabbit"}
+          ]
+        }],
+        extra_body: {
+          num_inference_steps: 50,
+          cfg_scale: 4.0,
+          seed: 0,
+          layers: 4,
+          resolution: 640
+        }
+      }')" \
+      | jq -r '.choices[0].message.content[] | .image_url.url | split(",")[1]' \
+      | while IFS= read -r b64; do
+          i=$((i + 1)); echo "$b64" | base64 -d > "layer_${i}.png"
+        done
+    ```
+
+=== "OpenAI SDK"
+
+    ```python
+    import base64
+    from openai import OpenAI
+
+    client = OpenAI(base_url="http://localhost:8093/v1", api_key="none")
+
+    with open("input.png", "rb") as f:
+        img_b64 = base64.b64encode(f.read()).decode()
+
+    response = client.chat.completions.create(
+        model="Qwen/Qwen-Image-Layered",
+        messages=[{
+            "role": "user",
+            "content": [
+                {"type": "image_url", "image_url": {
+                    "url": f"data:image/png;base64,{img_b64}"
+                }},
+                {"type": "text", "text": "a rabbit"},
+            ],
+        }],
+        extra_body={
+            "num_inference_steps": 50,
+            "cfg_scale": 4.0,
+            "seed": 0,
+            "layers": 4,
+            "resolution": 640,
+        },
+    )
+
+    for i, item in enumerate(response.choices[0].message.content):
+        _, b64_data = item.image_url.url.split(",", 1)
+        with open(f"layer_{i}.png", "wb") as f:
+            f.write(base64.b64decode(b64_data))
+    ```
+
+=== "Python requests"
+
+    ```python
+    import base64
+    import requests
+
+    with open("input.png", "rb") as f:
+        img_b64 = base64.b64encode(f.read()).decode()
+
+    payload = {
+        "messages": [{
+            "role": "user",
+            "content": [
+                {"type": "image_url", "image_url": {
+                    "url": f"data:image/png;base64,{img_b64}"
+                }},
+                {"type": "text", "text": "a rabbit"},
+            ],
+        }],
+        "extra_body": {
+            "num_inference_steps": 50,
+            "cfg_scale": 4.0,
+            "seed": 0,
+            "layers": 4,
+            "resolution": 640,
+        },
+    }
+
+    resp = requests.post(
+        "http://localhost:8093/v1/chat/completions",
+        json=payload,
+        timeout=600,
+    )
+    data = resp.json()
+
+    for i, item in enumerate(data["choices"][0]["message"]["content"]):
+        _, b64_data = item["image_url"]["url"].split(",", 1)
+        with open(f"layer_{i}.png", "wb") as f:
+            f.write(base64.b64decode(b64_data))
+    ```
+
+The response contains multiple images in `choices[0].message.content` — one per generated layer.
+
+#### Qwen-Image-Layered Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `layers` | int | 4 | Number of layers to decompose the image into |
+| `resolution` | int | 640 | Resolution for dimension calculation (640 or 1024) |
+| `cfg_scale` | float | 4.0 | Classifier-free guidance scale (alias for `true_cfg_scale`) |
+| `num_inference_steps` | int | 50 | Number of denoising steps |
+| `seed` | int | None | Random seed for reproducibility |
+
 ### Multi-Image Editing (Qwen-Image-Edit-2509)
 
 Provide multiple images in `content` (order matters):
@@ -166,7 +348,7 @@ Provide multiple images in `content` (order matters):
 }
 ```
 
-## Generation Parameters (extra_body)
+## Generation Parameters
 
 | Parameter                | Type  | Default | Description                           |
 | ------------------------ | ----- | ------- | ------------------------------------- |
@@ -178,6 +360,8 @@ Provide multiple images in `content` (order matters):
 | `seed`                   | int   | None    | Random seed (reproducible)            |
 | `negative_prompt`        | str   | None    | Negative prompt                       |
 | `num_outputs_per_prompt` | int   | 1       | Number of images to generate          |
+| `layers`                 | int   | 4       | Number of layers (Qwen-Image-Layered) |
+| `resolution`             | int   | 640     | Resolution, 640 or 1024 (Qwen-Image-Layered) |
 
 ## Response Format
 
diff --git a/docs/user_guide/examples/online_serving/text_to_image.md b/docs/user_guide/examples/online_serving/text_to_image.md
index 7931294883e..5ea4ba51156 100644
--- a/docs/user_guide/examples/online_serving/text_to_image.md
+++ b/docs/user_guide/examples/online_serving/text_to_image.md
@@ -71,13 +71,39 @@ curl -s http://localhost:8091/v1/chat/completions \
   }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
 ```
 
-### Method 2: Using Python Client
+### Method 2: Using OpenAI Python SDK
+
+```python
+from openai import OpenAI
+import base64
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen-Image",
+    messages=[{"role": "user", "content": "A beautiful landscape painting"}],
+    extra_body={
+        "height": 1024,
+        "width": 1024,
+        "num_inference_steps": 50,
+        "true_cfg_scale": 4.0,
+        "seed": 42,
+    },
+)
+
+img_url = response.choices[0].message.content[0].image_url.url
+_, b64_data = img_url.split(",", 1)
+with open("output.png", "wb") as f:
+    f.write(base64.b64decode(b64_data))
+```
+
+### Method 3: Using Python Client Script
 
 ```bash
 python openai_chat_client.py --prompt "A beautiful landscape painting" --output output.png
 ```
 
-### Method 3: Using Gradio Demo
+### Method 4: Using Gradio Demo
 
 ```bash
 python gradio_demo.py
@@ -151,7 +177,7 @@ lora_adapter/
 
 ### Generation with Parameters
 
-Use `extra_body` to pass generation parameters:
+Wrap generation parameters inside `extra_body` in the request JSON:
 
 ```json
 {
@@ -168,6 +194,21 @@ Use `extra_body` to pass generation parameters:
 }
 ```
 
+!!! tip "Using the OpenAI SDK"
+    When using the OpenAI Python SDK, pass these parameters via the `extra_body`
+    keyword argument. The SDK merges them into the top-level request body automatically:
+
+    ```python
+    client.chat.completions.create(
+        model="Qwen/Qwen-Image",
+        messages=[...],
+        extra_body={"height": 1024, "width": 1024, "num_inference_steps": 50},
+    )
+    ```
+
+    For details on how generation parameters are handled across different clients, see the
+    [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md).
+
 ### Multimodal Input (Text + Structured Content)
 
 ```json
@@ -183,7 +224,7 @@ Use `extra_body` to pass generation parameters:
 }
 ```
 
-## Generation Parameters (extra_body)
+## Generation Parameters
 
 | Parameter                | Type  | Default | Description                    |
 | ------------------------ | ----- | ------- | ------------------------------ |
@@ -195,7 +236,7 @@ Use `extra_body` to pass generation parameters:
 | `seed`                   | int   | None    | Random seed (reproducible)     |
 | `negative_prompt`        | str   | None    | Negative prompt                |
 | `num_outputs_per_prompt` | int   | 1       | Number of images to generate   |
-| `--cfg-parallel-size`.   | int   | 1       | Number of GPUs for CFG parallelism |
+| `--cfg-parallel-size`    | int   | 1       | Number of GPUs for CFG parallelism |
 
 ## Response Format
 
diff --git a/examples/online_serving/glm_image/README.md b/examples/online_serving/glm_image/README.md
index 5efeba8068c..80dfcb2926c 100644
--- a/examples/online_serving/glm_image/README.md
+++ b/examples/online_serving/glm_image/README.md
@@ -101,6 +101,36 @@ curl -s http://localhost:8091/v1/chat/completions \
   }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
 ```
 
+**Using OpenAI SDK**
+
+```python
+from openai import OpenAI
+import base64
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
+
+response = client.chat.completions.create(
+    model="zai-org/GLM-Image",
+    messages=[{"role": "user", "content": "A beautiful sunset over the ocean"}],
+    extra_body={
+        "height": 1024,
+        "width": 1024,
+        "num_inference_steps": 50,
+        "guidance_scale": 1.5,
+        "seed": 42,
+    },
+)
+
+img_url = response.choices[0].message.content[0].image_url.url
+_, b64_data = img_url.split(",", 1)
+with open("output.png", "wb") as f:
+    f.write(base64.b64decode(b64_data))
+```
+
+> **Note:** The OpenAI SDK's `extra_body` keyword merges parameters into the top-level
+> request body. This is different from placing a literal `"extra_body"` key in the JSON
+> (as shown in the curl example), but both formats are supported by the server.
+
 Or use the script:
 
 ```bash
@@ -153,7 +183,10 @@ Or use the script:
 bash run_curl_image_edit.sh input.png "Convert to watercolor style"
 ```
 
-## Generation Parameters (extra_body)
+## Generation Parameters
+
+These parameters can be passed inside `extra_body` in the curl JSON, or via the
+`extra_body` keyword argument when using the OpenAI Python SDK.
 
 | Parameter             | Type  | Default | Description                         |
 | --------------------- | ----- | ------- | ----------------------------------- |
diff --git a/examples/online_serving/image_to_image/README.md b/examples/online_serving/image_to_image/README.md
index f69fa8b4286..c5a1cf9ea52 100644
--- a/examples/online_serving/image_to_image/README.md
+++ b/examples/online_serving/image_to_image/README.md
@@ -69,7 +69,46 @@ EOF
 curl -s http://localhost:8092/v1/chat/completions   -H "Content-Type: application/json"   -d @request.json | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2 | base64 -d > output.png
 ```
 
-### Method 2: Using Python Client
+### Method 2: Using OpenAI Python SDK
+
+```python
+import base64
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8092/v1", api_key="none")
+
+with open("input.png", "rb") as f:
+    img_b64 = base64.b64encode(f.read()).decode()
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen-Image-Edit",
+    messages=[{
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "Convert to watercolor style"},
+            {"type": "image_url", "image_url": {
+                "url": f"data:image/png;base64,{img_b64}"
+            }},
+        ],
+    }],
+    extra_body={
+        "num_inference_steps": 50,
+        "guidance_scale": 1,
+        "seed": 42,
+    },
+)
+
+img_url = response.choices[0].message.content[0].image_url.url
+_, b64_data = img_url.split(",", 1)
+with open("output.png", "wb") as f:
+    f.write(base64.b64decode(b64_data))
+```
+
+> **Note:** The OpenAI SDK's `extra_body` keyword merges parameters into the top-level
+> request body. This is different from placing a literal `"extra_body"` key in the JSON
+> (as shown in the curl example), but both formats are supported by the server.
+
+### Method 3: Using Python Client Script
 
 ```bash
 python openai_chat_client.py --input input.png --prompt "Convert to oil painting style" --output output.png
@@ -78,7 +117,7 @@ python openai_chat_client.py --input input.png --prompt "Convert to oil painting
 python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene" --output output.png
 ```
 
-### Method 3: Using Gradio Demo
+### Method 4: Using Gradio Demo
 
 ```bash
 python gradio_demo.py
@@ -144,6 +183,97 @@ Use `extra_body` to pass generation parameters:
 }
 ```
 
+### Layered Image Generation (Qwen-Image-Layered)
+
+Qwen-Image-Layered generates multiple decomposed layers from a reference image and a text prompt.
+Start the server with:
+
+```bash
+vllm serve Qwen/Qwen-Image-Layered --omni --port 8093
+```
+
+**Using curl**
+
+```bash
+IMG_B64=$(base64 -w0 input.png)
+
+curl -sS http://localhost:8093/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d "$(jq -n --arg img "$IMG_B64" '{
+    messages: [{
+      role: "user",
+      content: [
+        {type: "image_url", image_url: {url: ("data:image/png;base64," + $img)}},
+        {type: "text", text: "a rabbit"}
+      ]
+    }],
+    extra_body: {
+      num_inference_steps: 50,
+      cfg_scale: 4.0,
+      seed: 0,
+      layers: 4,
+      resolution: 640
+    }
+  }')" \
+  | jq -r '.choices[0].message.content[] | .image_url.url | split(",")[1]' \
+  | while IFS= read -r b64; do
+      i=$((i + 1)); echo "$b64" | base64 -d > "layer_${i}.png"
+    done
+```
+
+**Using Python**
+
+```python
+import base64
+import requests
+
+with open("input.png", "rb") as f:
+    img_b64 = base64.b64encode(f.read()).decode()
+
+payload = {
+    "messages": [{
+        "role": "user",
+        "content": [
+            {"type": "image_url", "image_url": {
+                "url": f"data:image/png;base64,{img_b64}"
+            }},
+            {"type": "text", "text": "a rabbit"},
+        ],
+    }],
+    "extra_body": {
+        "num_inference_steps": 50,
+        "cfg_scale": 4.0,
+        "seed": 0,
+        "layers": 4,
+        "resolution": 640,
+    },
+}
+
+resp = requests.post(
+    "http://localhost:8093/v1/chat/completions",
+    json=payload,
+    timeout=600,
+)
+data = resp.json()
+
+for i, item in enumerate(data["choices"][0]["message"]["content"]):
+    _, b64_data = item["image_url"]["url"].split(",", 1)
+    with open(f"layer_{i}.png", "wb") as f:
+        f.write(base64.b64decode(b64_data))
+```
+
+The response contains multiple images in `choices[0].message.content` — one per generated layer.
+
+#### Qwen-Image-Layered Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `layers` | int | 4 | Number of layers to decompose the image into |
+| `resolution` | int | 640 | Resolution for dimension calculation (640 or 1024) |
+| `cfg_scale` | float | 4.0 | Classifier-free guidance scale (alias for `true_cfg_scale`) |
+| `num_inference_steps` | int | 50 | Number of denoising steps |
+| `seed` | int | None | Random seed for reproducibility |
+
 ### Multi-Image Editing (Qwen-Image-Edit-2509)
 
 Provide multiple images in `content` (order matters):
@@ -163,7 +293,10 @@ Provide multiple images in `content` (order matters):
 }
 ```
 
-## Generation Parameters (extra_body)
+## Generation Parameters
+
+These parameters can be passed inside `extra_body` in the curl JSON, or via the
+`extra_body` keyword argument when using the OpenAI Python SDK.
 
 | Parameter                | Type  | Default | Description                           |
 | ------------------------ | ----- | ------- | ------------------------------------- |
@@ -175,6 +308,8 @@ Provide multiple images in `content` (order matters):
 | `seed`                   | int   | None    | Random seed (reproducible)            |
 | `negative_prompt`        | str   | None    | Negative prompt                       |
 | `num_outputs_per_prompt` | int   | 1       | Number of images to generate          |
+| `layers`                 | int   | 4       | Number of layers (Qwen-Image-Layered) |
+| `resolution`             | int   | 640     | Resolution, 640 or 1024 (Qwen-Image-Layered) |
 
 ## Response Format
 
diff --git a/examples/online_serving/text_to_image/README.md b/examples/online_serving/text_to_image/README.md
index 140036d00c7..528c22cf9eb 100644
--- a/examples/online_serving/text_to_image/README.md
+++ b/examples/online_serving/text_to_image/README.md
@@ -45,13 +45,43 @@ curl -s http://localhost:8091/v1/chat/completions \
   }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
 ```
 
-### Method 2: Using Python Client
+### Method 2: Using OpenAI Python SDK
+
+```python
+from openai import OpenAI
+import base64
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen-Image",
+    messages=[{"role": "user", "content": "A beautiful landscape painting"}],
+    extra_body={
+        "height": 1024,
+        "width": 1024,
+        "num_inference_steps": 50,
+        "true_cfg_scale": 4.0,
+        "seed": 42,
+    },
+)
+
+img_url = response.choices[0].message.content[0].image_url.url
+_, b64_data = img_url.split(",", 1)
+with open("output.png", "wb") as f:
+    f.write(base64.b64decode(b64_data))
+```
+
+> **Note:** The OpenAI SDK's `extra_body` keyword merges parameters into the top-level
+> request body. This is different from placing a literal `"extra_body"` key in the JSON
+> (as shown in the curl example), but both formats are supported by the server.
+
+### Method 3: Using Python Client Script
 
 ```bash
 python openai_chat_client.py --prompt "A beautiful landscape painting" --output output.png
 ```
 
-### Method 3: Using Gradio Demo
+### Method 4: Using Gradio Demo
 
 ```bash
 python gradio_demo.py
@@ -157,7 +187,10 @@ Use `extra_body` to pass generation parameters:
 }
 ```
 
-## Generation Parameters (extra_body)
+## Generation Parameters
+
+These parameters can be passed inside `extra_body` in the curl JSON, or via the
+`extra_body` keyword argument when using the OpenAI Python SDK.
 
 | Parameter                | Type  | Default | Description                    |
 | ------------------------ | ----- | ------- | ------------------------------ |
@@ -169,7 +202,7 @@ Use `extra_body` to pass generation parameters:
 | `seed`                   | int   | None    | Random seed (reproducible)     |
 | `negative_prompt`        | str   | None    | Negative prompt                |
 | `num_outputs_per_prompt` | int   | 1       | Number of images to generate   |
-| `--cfg-parallel-size`.   | int   | 1       | Number of GPUs for CFG parallelism |
+| `--cfg-parallel-size`    | int   | 1       | Number of GPUs for CFG parallelism |
 
 ## Response Format
 

From 824420dd43abf90bc316bb52e9043c473c801736 Mon Sep 17 00:00:00 2001
From: Samit <285365963@qq.com>
Date: Mon, 23 Mar 2026 23:07:10 +0800
Subject: [PATCH 2/7] Update examples/online_serving/text_to_image/README.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: Samit <285365963@qq.com>
---
 examples/online_serving/text_to_image/README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/examples/online_serving/text_to_image/README.md b/examples/online_serving/text_to_image/README.md
index 528c22cf9eb..7c92b431c18 100644
--- a/examples/online_serving/text_to_image/README.md
+++ b/examples/online_serving/text_to_image/README.md
@@ -71,9 +71,10 @@ with open("output.png", "wb") as f:
     f.write(base64.b64decode(b64_data))
 ```
 
-> **Note:** The OpenAI SDK's `extra_body` keyword merges parameters into the top-level
-> request body. This is different from placing a literal `"extra_body"` key in the JSON
-> (as shown in the curl example), but both formats are supported by the server.
+!!! note
+    The OpenAI SDK's `extra_body` keyword merges parameters into the top-level
+    request body. This is different from placing a literal `"extra_body"` key in the JSON
+    (as shown in the curl example), but both formats are supported by the server.
 
 ### Method 3: Using Python Client Script
 

From 718be6584eef613b7f1b435a32c923f5a6864c78 Mon Sep 17 00:00:00 2001
From: samithuang <285365963@qq.com>
Date: Mon, 23 Mar 2026 15:12:18 +0000
Subject: [PATCH 3/7] docs: address PR review feedback

- Remove `--cfg-parallel-size` from param tables (server CLI flag, not request param)
- Reframe diffusion_chat_api.md: nested `extra_body` is the primary
  format for curl/requests, remove "Legacy" label and endpoint overview table
- Update curl/requests examples in the guide to use nested `extra_body`
- Remove video-specific params from the chat API guide (out of scope)
- Unify note wording across READMEs for SDK vs JSON `extra_body`
- Fix glm_image.md parameter table intro for consistency

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
---
 docs/serving/diffusion_chat_api.md            | 78 +++++++------------
 .../examples/online_serving/glm_image.md      |  3 +-
 .../examples/online_serving/text_to_image.md  |  1 -
 examples/online_serving/glm_image/README.md   |  8 +-
 .../online_serving/image_to_image/README.md   |  8 +-
 .../online_serving/text_to_image/README.md    |  8 +-
 6 files changed, 45 insertions(+), 61 deletions(-)
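
For reference, a minimal sketch of the two request shapes this patch documents:
the flattened body produced by the OpenAI SDK's `extra_body` keyword argument and
the nested `"extra_body"` key used with curl/requests. The helper names
(`sdk_style_payload`, `nested_payload`, `collect_generation_params`) are
hypothetical and are not taken from vLLM-Omni; the merging logic is only an
assumption about how a server could accept both shapes, not the actual
implementation.

```python
import json

GEN_PARAMS = {"height": 1024, "width": 1024, "num_inference_steps": 50, "seed": 42}
MESSAGES = [{"role": "user", "content": "A beautiful landscape painting"}]


def sdk_style_payload(messages, params):
    """Body shape after the SDK merges extra_body fields into the top level."""
    return {"messages": messages, **params}


def nested_payload(messages, params):
    """Body shape for curl/requests: a literal "extra_body" key in the JSON."""
    return {"messages": messages, "extra_body": dict(params)}


def collect_generation_params(payload):
    """Hypothetical normalizer that gathers parameters from either shape."""
    standard_keys = {"messages", "model", "extra_body"}
    # Top-level extras (SDK shape) first, then anything nested under extra_body.
    params = {k: v for k, v in payload.items() if k not in standard_keys}
    params.update(payload.get("extra_body") or {})
    return params


if __name__ == "__main__":
    for build in (sdk_style_payload, nested_payload):
        body = build(MESSAGES, GEN_PARAMS)
        print(build.__name__, json.dumps(body))
        print("  params seen:", collect_generation_params(body))
```

Both shapes yield the same parameter dict under this sketch, which is the
behavior the unified note wording describes.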

diff --git a/docs/serving/diffusion_chat_api.md b/docs/serving/diffusion_chat_api.md
index 41e9eada9f7..b0dfec2e9a2 100644
--- a/docs/serving/diffusion_chat_api.md
+++ b/docs/serving/diffusion_chat_api.md
@@ -9,29 +9,17 @@ to diffusion models through this endpoint across different client libraries.
     [`/v1/images/generations`](image_generation_api.md) endpoint accepts these
     parameters as top-level fields and may be simpler for your use case.
 
-## API Endpoints Overview
-
-vLLM-Omni provides multiple endpoints for diffusion models. Each has its own parameter-passing
-convention:
-
-| Endpoint | Use Case | Parameter Format |
-|----------|----------|-----------------|
-| `/v1/chat/completions` | Image gen/edit via chat | Generation params in `extra_body` (see below) |
-| `/v1/images/generations` | Dedicated text-to-image | Top-level JSON fields |
-| `/v1/images/edits` | Dedicated image editing | Multipart form fields |
-| `/v1/videos` | Video generation | Multipart form fields |
-
-## Passing Generation Parameters via `/v1/chat/completions`
+## Passing Generation Parameters
 
 The `/v1/chat/completions` endpoint follows the OpenAI Chat API schema, which does not natively
 include diffusion-specific fields like `num_inference_steps` or `height`. vLLM-Omni accepts
 these as **extra fields** on the request body.
 
-There are two supported methods depending on your client:
+How you pass these fields depends on your client:
 
-### Method 1: Using curl or Python `requests`
+### Using curl or Python `requests`
 
-Put generation parameters as **top-level fields** in the JSON body alongside `messages`:
+Wrap generation parameters inside an `"extra_body"` key in the JSON body:
 
 === "curl"
 
@@ -42,11 +30,13 @@ Put generation parameters as **top-level fields** in the JSON body alongside `me
         "messages": [
           {"role": "user", "content": "A beautiful landscape painting"}
         ],
-        "height": 1024,
-        "width": 1024,
-        "num_inference_steps": 50,
-        "true_cfg_scale": 4.0,
-        "seed": 42
+        "extra_body": {
+          "height": 1024,
+          "width": 1024,
+          "num_inference_steps": 50,
+          "true_cfg_scale": 4.0,
+          "seed": 42
+        }
       }' | jq -r '.choices[0].message.content[0].image_url.url' \
          | cut -d',' -f2- | base64 -d > output.png
     ```
@@ -61,11 +51,13 @@ Put generation parameters as **top-level fields** in the JSON body alongside `me
         "messages": [
             {"role": "user", "content": "A beautiful landscape painting"}
         ],
-        "height": 1024,
-        "width": 1024,
-        "num_inference_steps": 50,
-        "true_cfg_scale": 4.0,
-        "seed": 42,
+        "extra_body": {
+            "height": 1024,
+            "width": 1024,
+            "num_inference_steps": 50,
+            "true_cfg_scale": 4.0,
+            "seed": 42,
+        },
     }
 
     resp = requests.post(
@@ -81,7 +73,7 @@ Put generation parameters as **top-level fields** in the JSON body alongside `me
         f.write(base64.b64decode(b64_data))
     ```
 
-### Method 2: Using the OpenAI Python SDK
+### Using the OpenAI Python SDK
 
 The OpenAI Python SDK uses the `extra_body` keyword argument to pass non-standard fields.
 The SDK automatically merges these into the top-level request body:
@@ -112,23 +104,11 @@ with open("output.png", "wb") as f:
     f.write(base64.b64decode(b64_data))
 ```
 
-### Legacy Format: Nested `extra_body` in JSON
-
-You may see examples that nest generation parameters inside an `"extra_body"` key in the
-JSON body. This format is still supported for backward compatibility:
-
-```json
-{
-  "messages": [{"role": "user", "content": "A beautiful landscape painting"}],
-  "extra_body": {
-    "height": 1024,
-    "width": 1024,
-    "num_inference_steps": 50
-  }
-}
-```
-
-Both formats (top-level fields and nested `extra_body`) are accepted.
+!!! note "SDK `extra_body` vs. JSON `extra_body`"
+    The OpenAI SDK's `extra_body` keyword argument and the literal `"extra_body"` key in
+    curl/requests JSON serve the same purpose but work differently under the hood.
+    The SDK flattens `extra_body` fields into the top-level request body, while the JSON
+    approach nests them. Both are handled correctly by the server.
 
 !!! note "About the `ignored fields` warning"
     When sending non-standard fields, you may see a log message like:
@@ -160,9 +140,11 @@ For image editing, include both text and image in the message content:
             {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG_B64"'"}}
           ]
         }],
-        "num_inference_steps": 50,
-        "guidance_scale": 1,
-        "seed": 42
+        "extra_body": {
+          "num_inference_steps": 50,
+          "guidance_scale": 1,
+          "seed": 42
+        }
       }' | jq -r '.choices[0].message.content[0].image_url.url' \
          | cut -d',' -f2 | base64 -d > output.png
     ```
@@ -218,8 +200,6 @@ diffusion models:
 | `seed` | int | Random seed for reproducibility |
 | `negative_prompt` | str | Text describing what to avoid |
 | `num_outputs_per_prompt` | int | Number of images to generate (default: 1) |
-| `num_frames` | int | Number of frames (video models) |
-| `guidance_scale_2` | float | Secondary guidance scale (Wan2.2 models) |
 | `layers` | int | Number of layers to generate (Qwen-Image-Layered, default: 4) |
 | `resolution` | int | Resolution for dimension calculation (Qwen-Image-Layered, 640 or 1024) |
 | `lora` | object | Per-request LoRA adapter configuration |
diff --git a/docs/user_guide/examples/online_serving/glm_image.md b/docs/user_guide/examples/online_serving/glm_image.md
index d170a5511a4..45dbb53dbac 100644
--- a/docs/user_guide/examples/online_serving/glm_image.md
+++ b/docs/user_guide/examples/online_serving/glm_image.md
@@ -184,7 +184,8 @@ bash run_curl_image_edit.sh input.png "Convert to watercolor style"
 
 ## Generation Parameters
 
-These can be passed as top-level fields in curl/requests, or via `extra_body` in the OpenAI SDK.
+These can be passed inside `extra_body` in the curl JSON, or via the
+`extra_body` keyword argument when using the OpenAI Python SDK.
 See the [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md) for details.
 
 | Parameter             | Type  | Default | Description                         |
diff --git a/docs/user_guide/examples/online_serving/text_to_image.md b/docs/user_guide/examples/online_serving/text_to_image.md
index 5ea4ba51156..73eb9613c9b 100644
--- a/docs/user_guide/examples/online_serving/text_to_image.md
+++ b/docs/user_guide/examples/online_serving/text_to_image.md
@@ -236,7 +236,6 @@ Wrap generation parameters inside `extra_body` in the request JSON:
 | `seed`                   | int   | None    | Random seed (reproducible)     |
 | `negative_prompt`        | str   | None    | Negative prompt                |
 | `num_outputs_per_prompt` | int   | 1       | Number of images to generate   |
-| `--cfg-parallel-size`    | int   | 1       | Number of GPUs for CFG parallelism |
 
 ## Response Format
 
diff --git a/examples/online_serving/glm_image/README.md b/examples/online_serving/glm_image/README.md
index 80dfcb2926c..16685ee5db7 100644
--- a/examples/online_serving/glm_image/README.md
+++ b/examples/online_serving/glm_image/README.md
@@ -127,9 +127,11 @@ with open("output.png", "wb") as f:
     f.write(base64.b64decode(b64_data))
 ```
 
-> **Note:** The OpenAI SDK's `extra_body` keyword merges parameters into the top-level
-> request body. This is different from placing a literal `"extra_body"` key in the JSON
-> (as shown in the curl example), but both formats are supported by the server.
+!!! note
+    The OpenAI SDK's `extra_body` keyword argument merges parameters into the
+    top-level request body automatically. When using curl or Python `requests`,
+    wrap generation parameters inside a literal `"extra_body"` key in the JSON
+    instead (as shown in the curl example above).
 
 Or use the script:
 
diff --git a/examples/online_serving/image_to_image/README.md b/examples/online_serving/image_to_image/README.md
index c5a1cf9ea52..d9cae7e27c4 100644
--- a/examples/online_serving/image_to_image/README.md
+++ b/examples/online_serving/image_to_image/README.md
@@ -104,9 +104,11 @@ with open("output.png", "wb") as f:
     f.write(base64.b64decode(b64_data))
 ```
 
-> **Note:** The OpenAI SDK's `extra_body` keyword merges parameters into the top-level
-> request body. This is different from placing a literal `"extra_body"` key in the JSON
-> (as shown in the curl example), but both formats are supported by the server.
+!!! note
+    The OpenAI SDK's `extra_body` keyword argument merges parameters into the
+    top-level request body automatically. When using curl or Python `requests`,
+    wrap generation parameters inside a literal `"extra_body"` key in the JSON
+    instead (as shown in the curl example above).
 
 ### Method 3: Using Python Client Script
 
diff --git a/examples/online_serving/text_to_image/README.md b/examples/online_serving/text_to_image/README.md
index 7c92b431c18..2f88e339a6c 100644
--- a/examples/online_serving/text_to_image/README.md
+++ b/examples/online_serving/text_to_image/README.md
@@ -72,9 +72,10 @@ with open("output.png", "wb") as f:
 ```
 
 !!! note
-    The OpenAI SDK's `extra_body` keyword merges parameters into the top-level
-    request body. This is different from placing a literal `"extra_body"` key in the JSON
-    (as shown in the curl example), but both formats are supported by the server.
+    The OpenAI SDK's `extra_body` keyword argument merges parameters into the
+    top-level request body automatically. When using curl or Python `requests`,
+    wrap generation parameters inside a literal `"extra_body"` key in the JSON
+    instead (as shown in the curl example above).
 
 ### Method 3: Using Python Client Script
 
@@ -203,7 +204,6 @@ These parameters can be passed inside `extra_body` in the curl JSON, or via the
 | `seed`                   | int   | None    | Random seed (reproducible)     |
 | `negative_prompt`        | str   | None    | Negative prompt                |
 | `num_outputs_per_prompt` | int   | 1       | Number of images to generate   |
-| `--cfg-parallel-size`    | int   | 1       | Number of GPUs for CFG parallelism |
 
 ## Response Format
 

From 77ecb0e6ea54a0ff8abcf56acf0664ceeefc553a Mon Sep 17 00:00:00 2001
From: samithuang <285365963@qq.com>
Date: Mon, 23 Mar 2026 15:25:08 +0000
Subject: [PATCH 4/7] docs: slim down diffusion_chat_api.md to avoid content
 duplication

Remove duplicated examples, parameter table, and image-editing section
that overlap with model-specific docs. Keep only the unique content:
the extra_body SDK-vs-JSON explanation and the "ignored fields" warning.
Add links to model-specific guides for full examples.

Addresses fhfuih's review feedback about single source of truth.

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
---
 docs/serving/diffusion_chat_api.md | 218 ++++++-----------------------
 1 file changed, 43 insertions(+), 175 deletions(-)

diff --git a/docs/serving/diffusion_chat_api.md b/docs/serving/diffusion_chat_api.md
index b0dfec2e9a2..d0e2990ad6c 100644
--- a/docs/serving/diffusion_chat_api.md
+++ b/docs/serving/diffusion_chat_api.md
@@ -1,210 +1,78 @@
 # Diffusion Chat Completions API
 
-vLLM-Omni supports generating images via the `/v1/chat/completions` endpoint using diffusion models.
-This page explains how to pass generation parameters (such as `num_inference_steps`, `height`, `width`)
-to diffusion models through this endpoint across different client libraries.
+vLLM-Omni supports generating and editing images via the `/v1/chat/completions`
+endpoint using diffusion models. This page explains how to pass generation
+parameters (such as `num_inference_steps`, `height`, `width`) to diffusion
+models through this endpoint.
 
 !!! tip
-    For text-to-image generation without chat context, the dedicated
-    [`/v1/images/generations`](image_generation_api.md) endpoint accepts these
-    parameters as top-level fields and may be simpler for your use case.
+    For dedicated endpoints that accept generation parameters as top-level
+    fields, see [Image Generation API](image_generation_api.md) and
+    [Image Edit API](image_edit_api.md).
 
 ## Passing Generation Parameters
 
-The `/v1/chat/completions` endpoint follows the OpenAI Chat API schema, which does not natively
-include diffusion-specific fields like `num_inference_steps` or `height`. vLLM-Omni accepts
-these as **extra fields** on the request body.
+The `/v1/chat/completions` endpoint follows the OpenAI Chat API schema, which
+does not natively include diffusion-specific fields like `num_inference_steps`
+or `height`. How you pass these extra fields depends on your client.
 
-How you pass these fields depends on your client:
-
-### Using curl or Python `requests`
+### curl / Python `requests`
 
 Wrap generation parameters inside an `"extra_body"` key in the JSON body:
 
-=== "curl"
-
-    ```bash
-    curl -s http://localhost:8091/v1/chat/completions \
-      -H "Content-Type: application/json" \
-      -d '{
-        "messages": [
-          {"role": "user", "content": "A beautiful landscape painting"}
-        ],
-        "extra_body": {
-          "height": 1024,
-          "width": 1024,
-          "num_inference_steps": 50,
-          "true_cfg_scale": 4.0,
-          "seed": 42
-        }
-      }' | jq -r '.choices[0].message.content[0].image_url.url' \
-         | cut -d',' -f2- | base64 -d > output.png
-    ```
-
-=== "Python requests"
-
-    ```python
-    import requests
-    import base64
-
-    payload = {
-        "messages": [
-            {"role": "user", "content": "A beautiful landscape painting"}
-        ],
-        "extra_body": {
-            "height": 1024,
-            "width": 1024,
-            "num_inference_steps": 50,
-            "true_cfg_scale": 4.0,
-            "seed": 42,
-        },
+```bash
+curl -s http://localhost:8091/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      {"role": "user", "content": "A beautiful landscape painting"}
+    ],
+    "extra_body": {
+      "num_inference_steps": 50,
+      "seed": 42
     }
+  }'
+```
 
-    resp = requests.post(
-        "http://localhost:8091/v1/chat/completions",
-        json=payload,
-        timeout=300,
-    )
-    data = resp.json()
-
-    img_url = data["choices"][0]["message"]["content"][0]["image_url"]["url"]
-    _, b64_data = img_url.split(",", 1)
-    with open("output.png", "wb") as f:
-        f.write(base64.b64decode(b64_data))
-    ```
-
-### Using the OpenAI Python SDK
+### OpenAI Python SDK
 
-The OpenAI Python SDK uses the `extra_body` keyword argument to pass non-standard fields.
-The SDK automatically merges these into the top-level request body:
+Use the `extra_body` **keyword argument**. The SDK automatically merges these
+fields into the top-level request body:
 
 ```python
-from openai import OpenAI
-import base64
-
-client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
-
 response = client.chat.completions.create(
     model="Qwen/Qwen-Image",
-    messages=[
-        {"role": "user", "content": "A beautiful landscape painting"}
-    ],
+    messages=[{"role": "user", "content": "A beautiful landscape painting"}],
     extra_body={
-        "height": 1024,
-        "width": 1024,
         "num_inference_steps": 50,
-        "true_cfg_scale": 4.0,
         "seed": 42,
     },
 )
-
-img_url = response.choices[0].message.content[0].image_url.url
-_, b64_data = img_url.split(",", 1)
-with open("output.png", "wb") as f:
-    f.write(base64.b64decode(b64_data))
 ```
 
 !!! note "SDK `extra_body` vs. JSON `extra_body`"
-    The OpenAI SDK's `extra_body` keyword argument and the literal `"extra_body"` key in
-    curl/requests JSON serve the same purpose but work differently under the hood.
-    The SDK flattens `extra_body` fields into the top-level request body, while the JSON
-    approach nests them. Both are handled correctly by the server.
+    These two `extra_body` usages look similar but work differently under the
+    hood. The SDK flattens the dict into the top-level request JSON, while the
+    curl/requests approach sends it as a nested `"extra_body"` key. Both are
+    handled correctly by the server.
 
 !!! note "About the `ignored fields` warning"
-    When sending non-standard fields, you may see a log message like:
+    You may see a log message like:
 
     ```
     WARNING: The following fields were present in the request but ignored: {'height', 'width', ...}
     ```
 
-    This warning is **harmless**. It is emitted by vLLM's request validation layer because
-    these fields are not part of the standard OpenAI `ChatCompletionRequest` schema.
-    The fields are still stored internally and correctly forwarded to the diffusion pipeline.
-
-## Image Editing (Image-to-Image)
-
-For image editing, include both text and image in the message content:
-
-=== "curl"
-
-    ```bash
-    IMG_B64=$(base64 -w0 input.png)
-
-    curl -s http://localhost:8092/v1/chat/completions \
-      -H "Content-Type: application/json" \
-      -d '{
-        "messages": [{
-          "role": "user",
-          "content": [
-            {"type": "text", "text": "Convert to watercolor style"},
-            {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG_B64"'"}}
-          ]
-        }],
-        "extra_body": {
-          "num_inference_steps": 50,
-          "guidance_scale": 1,
-          "seed": 42
-        }
-      }' | jq -r '.choices[0].message.content[0].image_url.url' \
-         | cut -d',' -f2 | base64 -d > output.png
-    ```
+    This is **harmless**. It is emitted by vLLM's request validation layer
+    because these fields are not part of the standard OpenAI
+    `ChatCompletionRequest` schema. The fields are still stored internally
+    and correctly forwarded to the diffusion pipeline.
 
-=== "OpenAI SDK"
-
-    ```python
-    import base64
-    from openai import OpenAI
-
-    client = OpenAI(base_url="http://localhost:8092/v1", api_key="none")
-
-    with open("input.png", "rb") as f:
-        img_b64 = base64.b64encode(f.read()).decode()
-
-    response = client.chat.completions.create(
-        model="Qwen/Qwen-Image-Edit",
-        messages=[{
-            "role": "user",
-            "content": [
-                {"type": "text", "text": "Convert to watercolor style"},
-                {"type": "image_url", "image_url": {
-                    "url": f"data:image/png;base64,{img_b64}"
-                }},
-            ],
-        }],
-        extra_body={
-            "num_inference_steps": 50,
-            "guidance_scale": 1,
-            "seed": 42,
-        },
-    )
-
-    img_url = response.choices[0].message.content[0].image_url.url
-    _, b64_data = img_url.split(",", 1)
-    with open("output.png", "wb") as f:
-        f.write(base64.b64decode(b64_data))
-    ```
+## Model-Specific Examples
+
+For complete examples with full request/response details, see the model-specific
+guides:
 
-## Generation Parameters Reference
-
-The following parameters are accepted as extra fields on `/v1/chat/completions` for
-diffusion models:
-
-| Parameter | Type | Description |
-|-----------|------|-------------|
-| `height` | int | Output image height in pixels |
-| `width` | int | Output image width in pixels |
-| `size` | str | Output size in "WxH" format (alternative to separate height/width) |
-| `num_inference_steps` | int | Number of denoising steps |
-| `guidance_scale` | float | Classifier-free guidance scale |
-| `true_cfg_scale` | float | True CFG scale (Qwen-Image specific) |
-| `seed` | int | Random seed for reproducibility |
-| `negative_prompt` | str | Text describing what to avoid |
-| `num_outputs_per_prompt` | int | Number of images to generate (default: 1) |
-| `layers` | int | Number of layers to generate (Qwen-Image-Layered, default: 4) |
-| `resolution` | int | Resolution for dimension calculation (Qwen-Image-Layered, 640 or 1024) |
-| `lora` | object | Per-request LoRA adapter configuration |
-
-!!! info "Model-specific defaults"
-    When a parameter is not specified, the underlying diffusion pipeline applies its own
-    model-specific default. For example, `num_inference_steps` defaults to 50 for most models
-    but may differ for turbo/distilled variants.
+- [Text-to-Image (Qwen-Image)](../user_guide/examples/online_serving/text_to_image.md)
+- [Image-to-Image (Qwen-Image-Edit, Qwen-Image-Layered)](../user_guide/examples/online_serving/image_to_image.md)
+- [GLM-Image](../user_guide/examples/online_serving/glm_image.md)

From da4b4fd48cce74544681463cbf74da295918cc55 Mon Sep 17 00:00:00 2001
From: samithuang <285365963@qq.com>
Date: Mon, 23 Mar 2026 15:29:30 +0000
Subject: [PATCH 5/7] docs: simplify glm_image docs to avoid repeating generic
 request methods

Remove inline curl and OpenAI SDK code blocks that duplicate the general
text-to-image and image-to-image guides. Keep only the model-specific
script examples (openai_chat_client.py, run_curl_*.sh) and link to the
general guides for other request methods.

Addresses fhfuih's review feedback.

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
---
 .../examples/online_serving/glm_image.md      |  94 ++--------------
 examples/online_serving/glm_image/README.md   | 100 ++----------------
 2 files changed, 12 insertions(+), 182 deletions(-)

diff --git a/docs/user_guide/examples/online_serving/glm_image.md b/docs/user_guide/examples/online_serving/glm_image.md
index 45dbb53dbac..bc151d6f84b 100644
--- a/docs/user_guide/examples/online_serving/glm_image.md
+++ b/docs/user_guide/examples/online_serving/glm_image.md
@@ -73,115 +73,33 @@ The default yaml configuration deploys AR on GPU 0 and DiT on GPU 1. You can use
 
 ### Text-to-Image
 
-Generate images from text prompts:
-
-**Using Python client**
-
 ```bash
 python openai_chat_client.py \
     --prompt "A photorealistic mountain landscape at sunset" \
     --height 1024 \
     --width 1024 \
     --output landscape.png
-```
-
-**Using curl**
 
-```bash
-curl -s http://localhost:8091/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "messages": [
-      {"role": "user", "content": "A beautiful sunset over the ocean with sailing boats"}
-    ],
-    "extra_body": {
-      "height": 1024,
-      "width": 1024,
-      "num_inference_steps": 50,
-      "guidance_scale": 1.5,
-      "seed": 42
-    }
-  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
-```
-
-**Using OpenAI SDK**
-
-```python
-from openai import OpenAI
-import base64
-
-client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
-
-response = client.chat.completions.create(
-    model="zai-org/GLM-Image",
-    messages=[{"role": "user", "content": "A beautiful sunset over the ocean"}],
-    extra_body={
-        "height": 1024,
-        "width": 1024,
-        "num_inference_steps": 50,
-        "guidance_scale": 1.5,
-        "seed": 42,
-    },
-)
-
-img_url = response.choices[0].message.content[0].image_url.url
-_, b64_data = img_url.split(",", 1)
-with open("output.png", "wb") as f:
-    f.write(base64.b64decode(b64_data))
-```
-
-Or use the script:
-
-```bash
+# Or use the curl script:
 bash run_curl_text_to_image.sh "A futuristic city skyline at night"
 ```
 
 ### Image-to-Image (Image Editing)
 
-Edit images with text instructions:
-
-**Using Python client**
-
 ```bash
 python openai_chat_client.py \
     --prompt "Convert this image to watercolor style" \
     --image input.png \
     --output watercolor.png
-```
-
-**Using curl**
-
-```bash
-IMG_B64=$(base64 < input.png | tr -d '\n')
-
-curl -s http://localhost:8091/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d @- <<EOF | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
-{
-  "messages": [{
-    "role": "user",
-    "content": [
-      {"type": "text", "text": "Convert this image to watercolor style"},
-      {"type": "image_url", "image_url": {"url": "data:image/png;base64,'$IMG_B64'"}}
-    ]
-  }],
-  "extra_body": {
-    "height": 1024,
-    "width": 1024,
-    "num_inference_steps": 50,
-    "guidance_scale": 1.5,
-    "seed": 42
-  }
-}
-EOF
-```
 
-Or use the script:
-
-```bash
+# Or use the curl script:
 bash run_curl_image_edit.sh input.png "Convert to watercolor style"
 ```
 
+For general-purpose request methods (curl, OpenAI SDK, Python `requests`), see
+the [Text-to-Image](text_to_image.md) and [Image-to-Image](image_to_image.md)
+guides.
+
 ## Generation Parameters
 
 These can be passed inside `extra_body` in the curl JSON, or via the
diff --git a/examples/online_serving/glm_image/README.md b/examples/online_serving/glm_image/README.md
index 16685ee5db7..2a7e301e70e 100644
--- a/examples/online_serving/glm_image/README.md
+++ b/examples/online_serving/glm_image/README.md
@@ -70,121 +70,33 @@ The default yaml configuration deploys AR on GPU 0 and DiT on GPU 1. You can use
 
 ### Text-to-Image
 
-Generate images from text prompts:
-
-**Using Python client**
-
 ```bash
 python openai_chat_client.py \
     --prompt "A photorealistic mountain landscape at sunset" \
     --height 1024 \
     --width 1024 \
     --output landscape.png
-```
-
-**Using curl**
 
-```bash
-curl -s http://localhost:8091/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "messages": [
-      {"role": "user", "content": "A beautiful sunset over the ocean with sailing boats"}
-    ],
-    "extra_body": {
-      "height": 1024,
-      "width": 1024,
-      "num_inference_steps": 50,
-      "guidance_scale": 1.5,
-      "seed": 42
-    }
-  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
-```
-
-**Using OpenAI SDK**
-
-```python
-from openai import OpenAI
-import base64
-
-client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")
-
-response = client.chat.completions.create(
-    model="zai-org/GLM-Image",
-    messages=[{"role": "user", "content": "A beautiful sunset over the ocean"}],
-    extra_body={
-        "height": 1024,
-        "width": 1024,
-        "num_inference_steps": 50,
-        "guidance_scale": 1.5,
-        "seed": 42,
-    },
-)
-
-img_url = response.choices[0].message.content[0].image_url.url
-_, b64_data = img_url.split(",", 1)
-with open("output.png", "wb") as f:
-    f.write(base64.b64decode(b64_data))
-```
-
-!!! note
-    The OpenAI SDK's `extra_body` keyword argument merges parameters into the
-    top-level request body automatically. When using curl or Python `requests`,
-    wrap generation parameters inside a literal `"extra_body"` key in the JSON
-    instead (as shown in the curl example above).
-
-Or use the script:
-
-```bash
+# Or use the curl script:
 bash run_curl_text_to_image.sh "A futuristic city skyline at night"
 ```
 
 ### Image-to-Image (Image Editing)
 
-Edit images with text instructions:
-
-**Using Python client**
-
 ```bash
 python openai_chat_client.py \
     --prompt "Convert this image to watercolor style" \
     --image input.png \
     --output watercolor.png
-```
-
-**Using curl**
 
-```bash
-IMG_B64=$(base64 < input.png | tr -d '\n')
-
-curl -s http://localhost:8091/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d @- <<EOF | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
-{
-  "messages": [{
-    "role": "user",
-    "content": [
-      {"type": "text", "text": "Convert this image to watercolor style"},
-      {"type": "image_url", "image_url": {"url": "data:image/png;base64,'$IMG_B64'"}}
-    ]
-  }],
-  "extra_body": {
-    "height": 1024,
-    "width": 1024,
-    "num_inference_steps": 50,
-    "guidance_scale": 1.5,
-    "seed": 42
-  }
-}
-EOF
-```
-
-Or use the script:
-
-```bash
+# Or use the curl script:
 bash run_curl_image_edit.sh input.png "Convert to watercolor style"
 ```
 
+For general-purpose request methods (curl, OpenAI SDK, Python `requests`), see
+the [Text-to-Image](../text_to_image/README.md) and
+[Image-to-Image](../image_to_image/README.md) guides.
+
 ## Generation Parameters
 
 These parameters can be passed inside `extra_body` in the curl JSON, or via the

From 337287a23d6fdc1f6f2ed0c9b40b920dd13f57fc Mon Sep 17 00:00:00 2001
From: samithuang <285365963@qq.com>
Date: Tue, 24 Mar 2026 07:49:18 +0000
Subject: [PATCH 6/7] docs: mention dedicated endpoints support top-level
 parameters

Update the Generation Parameters sections in all model-specific docs
to clarify that /v1/images/generations and /v1/images/edits accept
parameters as top-level fields, while /v1/chat/completions requires
them inside extra_body.

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
---
 docs/user_guide/examples/online_serving/glm_image.md     | 9 ++++++---
 .../user_guide/examples/online_serving/image_to_image.md | 6 ++++++
 docs/user_guide/examples/online_serving/text_to_image.md | 6 ++++++
 examples/online_serving/glm_image/README.md              | 6 ++++--
 examples/online_serving/image_to_image/README.md         | 6 ++++--
 examples/online_serving/text_to_image/README.md          | 6 ++++--
 6 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/docs/user_guide/examples/online_serving/glm_image.md b/docs/user_guide/examples/online_serving/glm_image.md
index bc151d6f84b..37d7de6a64c 100644
--- a/docs/user_guide/examples/online_serving/glm_image.md
+++ b/docs/user_guide/examples/online_serving/glm_image.md
@@ -102,9 +102,12 @@ guides.
 
 ## Generation Parameters
 
-These can be passed inside `extra_body` in the curl JSON, or via the
-`extra_body` keyword argument when using the OpenAI Python SDK.
-See the [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md) for details.
+When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
+JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the
+[Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)).
+When using the dedicated [`/v1/images/generations`](../../../../serving/image_generation_api.md)
+or [`/v1/images/edits`](../../../../serving/image_edit_api.md) endpoints, pass
+them as top-level fields directly.
 
 | Parameter             | Type  | Default | Description                         |
 | --------------------- | ----- | ------- | ----------------------------------- |
diff --git a/docs/user_guide/examples/online_serving/image_to_image.md b/docs/user_guide/examples/online_serving/image_to_image.md
index 6b446749739..da6cbf220de 100644
--- a/docs/user_guide/examples/online_serving/image_to_image.md
+++ b/docs/user_guide/examples/online_serving/image_to_image.md
@@ -350,6 +350,12 @@ Provide multiple images in `content` (order matters):
 
 ## Generation Parameters
 
+When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
+JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the
+[Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)).
+When using the dedicated [`/v1/images/edits`](../../../../serving/image_edit_api.md)
+endpoint, pass them as top-level form fields directly.
+
 | Parameter                | Type  | Default | Description                           |
 | ------------------------ | ----- | ------- | ------------------------------------- |
 | `height`                 | int   | None    | Output image height in pixels         |
diff --git a/docs/user_guide/examples/online_serving/text_to_image.md b/docs/user_guide/examples/online_serving/text_to_image.md
index a1f2b8c9997..9d29cd5063c 100644
--- a/docs/user_guide/examples/online_serving/text_to_image.md
+++ b/docs/user_guide/examples/online_serving/text_to_image.md
@@ -226,6 +226,12 @@ Wrap generation parameters inside `extra_body` in the request JSON:
 
 ## Generation Parameters
 
+When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
+JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the
+[Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)).
+When using the dedicated [`/v1/images/generations`](../../../../serving/image_generation_api.md)
+endpoint, pass them as top-level JSON fields directly.
+
 | Parameter                | Type  | Default | Description                    |
 | ------------------------ | ----- | ------- | ------------------------------ |
 | `height`                 | int   | None    | Image height in pixels         |
diff --git a/examples/online_serving/glm_image/README.md b/examples/online_serving/glm_image/README.md
index 2a7e301e70e..54a4708a606 100644
--- a/examples/online_serving/glm_image/README.md
+++ b/examples/online_serving/glm_image/README.md
@@ -99,8 +99,10 @@ the [Text-to-Image](../text_to_image/README.md) and
 
 ## Generation Parameters
 
-These parameters can be passed inside `extra_body` in the curl JSON, or via the
-`extra_body` keyword argument when using the OpenAI Python SDK.
+When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
+JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK.
+When using the dedicated `/v1/images/generations` or `/v1/images/edits`
+endpoints, pass them as top-level fields directly.
 
 | Parameter             | Type  | Default | Description                         |
 | --------------------- | ----- | ------- | ----------------------------------- |
diff --git a/examples/online_serving/image_to_image/README.md b/examples/online_serving/image_to_image/README.md
index d9cae7e27c4..1d0a1d3961d 100644
--- a/examples/online_serving/image_to_image/README.md
+++ b/examples/online_serving/image_to_image/README.md
@@ -297,8 +297,10 @@ Provide multiple images in `content` (order matters):
 
 ## Generation Parameters
 
-These parameters can be passed inside `extra_body` in the curl JSON, or via the
-`extra_body` keyword argument when using the OpenAI Python SDK.
+When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
+JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK.
+When using the dedicated `/v1/images/edits` endpoint, pass them as top-level
+form fields directly.
 
 | Parameter                | Type  | Default | Description                           |
 | ------------------------ | ----- | ------- | ------------------------------------- |
diff --git a/examples/online_serving/text_to_image/README.md b/examples/online_serving/text_to_image/README.md
index af7e5857722..af27bc05602 100644
--- a/examples/online_serving/text_to_image/README.md
+++ b/examples/online_serving/text_to_image/README.md
@@ -214,8 +214,10 @@ Use `extra_body` to pass generation parameters:
 
 ## Generation Parameters
 
-These parameters can be passed inside `extra_body` in the curl JSON, or via the
-`extra_body` keyword argument when using the OpenAI Python SDK.
+When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
+JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK.
+When using the dedicated `/v1/images/generations` endpoint, pass them as
+top-level JSON fields directly.
 
 | Parameter                | Type  | Default | Description                    |
 | ------------------------ | ----- | ------- | ------------------------------ |

From b07b9dca13b1b2e09cc2bf9f39f7f2e7965d5b5e Mon Sep 17 00:00:00 2001
From: gcanlin <canlinguosdu@gmail.com>
Date: Thu, 26 Mar 2026 08:40:32 +0000
Subject: [PATCH 7/7] docs: fix diffusion parameter defaults

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
---
 docs/user_guide/examples/online_serving/glm_image.md      | 5 +++--
 docs/user_guide/examples/online_serving/image_to_image.md | 6 ++++--
 docs/user_guide/examples/online_serving/text_to_image.md  | 4 +++-
 examples/online_serving/glm_image/README.md               | 6 ++++--
 examples/online_serving/image_to_image/README.md          | 8 +++++---
 examples/online_serving/text_to_image/README.md           | 6 ++++--
 6 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/docs/user_guide/examples/online_serving/glm_image.md b/docs/user_guide/examples/online_serving/glm_image.md
index 37d7de6a64c..f7027b906db 100644
--- a/docs/user_guide/examples/online_serving/glm_image.md
+++ b/docs/user_guide/examples/online_serving/glm_image.md
@@ -107,7 +107,8 @@ JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the
 [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)).
 When using the dedicated [`/v1/images/generations`](../../../../serving/image_generation_api.md)
 or [`/v1/images/edits`](../../../../serving/image_edit_api.md) endpoints, pass
-them as top-level fields directly.
+the supported generation controls as top-level fields directly. Set image
+dimensions with `size` instead of `height` or `width`, and the count with `n`.
 
 | Parameter             | Type  | Default | Description                         |
 | --------------------- | ----- | ------- | ----------------------------------- |
@@ -115,7 +116,7 @@ them as top-level fields directly.
 | `width`               | int   | 1024    | Image width in pixels               |
 | `num_inference_steps` | int   | 50      | Number of diffusion denoising steps |
 | `guidance_scale`      | float | 1.5     | Classifier-free guidance scale      |
-| `seed`                | int   | 42      | Random seed for reproducibility     |
+| `seed`                | int   | None    | Optional random seed; `/v1/images/*` generates one server-side if omitted |
 | `negative_prompt`     | str   | None    | Negative prompt                     |
 
 ## Response Format
diff --git a/docs/user_guide/examples/online_serving/image_to_image.md b/docs/user_guide/examples/online_serving/image_to_image.md
index da6cbf220de..b19e9462da0 100644
--- a/docs/user_guide/examples/online_serving/image_to_image.md
+++ b/docs/user_guide/examples/online_serving/image_to_image.md
@@ -354,7 +354,9 @@ When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
 JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the
 [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)).
 When using the dedicated [`/v1/images/edits`](../../../../serving/image_edit_api.md)
-endpoint, pass them as top-level form fields directly.
+endpoint, pass the supported generation controls as top-level form fields
+directly. For image dimensions and count, use `size` and `n` rather than
+`height`, `width`, or `num_outputs_per_prompt`.
 
 | Parameter                | Type  | Default | Description                           |
 | ------------------------ | ----- | ------- | ------------------------------------- |
@@ -362,7 +364,7 @@ endpoint, pass them as top-level form fields directly.
 | `width`                  | int   | None    | Output image width in pixels          |
 | `size`                   | str   | None    | Output image size (e.g., "1024x1024") |
 | `num_inference_steps`    | int   | 50      | Number of denoising steps             |
-| `guidance_scale`         | float | 7.5     | CFG guidance scale                    |
+| `guidance_scale`         | float | 1.0     | CFG guidance scale                    |
 | `seed`                   | int   | None    | Random seed (reproducible)            |
 | `negative_prompt`        | str   | None    | Negative prompt                       |
 | `num_outputs_per_prompt` | int   | 1       | Number of images to generate          |
diff --git a/docs/user_guide/examples/online_serving/text_to_image.md b/docs/user_guide/examples/online_serving/text_to_image.md
index 9d29cd5063c..2e79749b3b2 100644
--- a/docs/user_guide/examples/online_serving/text_to_image.md
+++ b/docs/user_guide/examples/online_serving/text_to_image.md
@@ -230,7 +230,9 @@ When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
 JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the
 [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)).
 When using the dedicated [`/v1/images/generations`](../../../../serving/image_generation_api.md)
-endpoint, pass them as top-level JSON fields directly.
+endpoint, pass the supported generation controls as top-level JSON fields
+directly. For image dimensions and count, use `size` and `n` rather than
+`height`, `width`, or `num_outputs_per_prompt`.
 
 | Parameter                | Type  | Default | Description                    |
 | ------------------------ | ----- | ------- | ------------------------------ |
diff --git a/examples/online_serving/glm_image/README.md b/examples/online_serving/glm_image/README.md
index 54a4708a606..13ed00861da 100644
--- a/examples/online_serving/glm_image/README.md
+++ b/examples/online_serving/glm_image/README.md
@@ -102,7 +102,9 @@ the [Text-to-Image](../text_to_image/README.md) and
 When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
 JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK.
 When using the dedicated `/v1/images/generations` or `/v1/images/edits`
-endpoints, pass them as top-level fields directly.
+endpoints, pass the supported generation controls as top-level fields directly.
+Set image dimensions with `size` instead of `height` or `width`, and the count
+with `n`.
 
 | Parameter             | Type  | Default | Description                         |
 | --------------------- | ----- | ------- | ----------------------------------- |
@@ -110,7 +112,7 @@ endpoints, pass them as top-level fields directly.
 | `width`               | int   | 1024    | Image width in pixels               |
 | `num_inference_steps` | int   | 50      | Number of diffusion denoising steps |
 | `guidance_scale`      | float | 1.5     | Classifier-free guidance scale      |
-| `seed`                | int   | 42      | Random seed for reproducibility     |
+| `seed`                | int   | None    | Optional random seed; `/v1/images/*` generates one server-side if omitted |
 | `negative_prompt`     | str   | None    | Negative prompt                     |
 
 ## Response Format
diff --git a/examples/online_serving/image_to_image/README.md b/examples/online_serving/image_to_image/README.md
index 1d0a1d3961d..789258473fd 100644
--- a/examples/online_serving/image_to_image/README.md
+++ b/examples/online_serving/image_to_image/README.md
@@ -299,8 +299,10 @@ Provide multiple images in `content` (order matters):
 
 When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
 JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK.
-When using the dedicated `/v1/images/edits` endpoint, pass them as top-level
-form fields directly.
+When using the dedicated `/v1/images/edits` endpoint, pass the supported
+generation controls as top-level form fields directly. For image dimensions and
+count, use `size` and `n` rather than `height`, `width`, or
+`num_outputs_per_prompt`.
 
 | Parameter                | Type  | Default | Description                           |
 | ------------------------ | ----- | ------- | ------------------------------------- |
@@ -308,7 +310,7 @@ form fields directly.
 | `width`                  | int   | None    | Output image width in pixels          |
 | `size`                   | str   | None    | Output image size (e.g., "1024x1024") |
 | `num_inference_steps`    | int   | 50      | Number of denoising steps             |
-| `guidance_scale`         | float | 7.5     | CFG guidance scale                    |
+| `guidance_scale`         | float | 1.0     | CFG guidance scale                    |
 | `seed`                   | int   | None    | Random seed (reproducible)            |
 | `negative_prompt`        | str   | None    | Negative prompt                       |
 | `num_outputs_per_prompt` | int   | 1       | Number of images to generate          |
diff --git a/examples/online_serving/text_to_image/README.md b/examples/online_serving/text_to_image/README.md
index af27bc05602..87b6a56438e 100644
--- a/examples/online_serving/text_to_image/README.md
+++ b/examples/online_serving/text_to_image/README.md
@@ -216,8 +216,10 @@ Use `extra_body` to pass generation parameters:
 
 When using `/v1/chat/completions`, pass these inside `extra_body` in the curl
 JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK.
-When using the dedicated `/v1/images/generations` endpoint, pass them as
-top-level JSON fields directly.
+When using the dedicated `/v1/images/generations` endpoint, pass the supported
+generation controls as top-level JSON fields directly. For image dimensions and
+count, use `size` and `n` rather than `height`, `width`, or
+`num_outputs_per_prompt`.
 
 | Parameter                | Type  | Default | Description                    |
 | ------------------------ | ----- | ------- | ------------------------------ |