Merged
1 change: 1 addition & 0 deletions docs/.nav.yml
@@ -6,6 +6,7 @@ nav:
- getting_started/installation/*
- Serving:
- OpenAI-Compatible API:
- Diffusion Chat API: serving/diffusion_chat_api.md
- Image Generation: serving/image_generation_api.md
- Image Edit: serving/image_edit_api.md
- Text to Speech: serving/speech_api.md
78 changes: 78 additions & 0 deletions docs/serving/diffusion_chat_api.md
Comment thread
SamitHuang marked this conversation as resolved.
@@ -0,0 +1,78 @@
# Diffusion Chat Completions API

vLLM-Omni supports generating and editing images with diffusion models through
the `/v1/chat/completions` endpoint. This page explains how to pass generation
parameters (such as `num_inference_steps`, `height`, and `width`) to diffusion
models through this endpoint.

!!! tip
    For dedicated endpoints that accept generation parameters as top-level
    fields, see [Image Generation API](image_generation_api.md) and
    [Image Edit API](image_edit_api.md).

## Passing Generation Parameters

The `/v1/chat/completions` endpoint follows the OpenAI Chat API schema, which
does not natively include diffusion-specific fields like `num_inference_steps`
or `height`. How you pass these extra fields depends on your client.

### curl / Python `requests`

Wrap generation parameters inside an `"extra_body"` key in the JSON body:

```bash
curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "A beautiful landscape painting"}
    ],
    "extra_body": {
      "num_inference_steps": 50,
      "seed": 42
    }
  }'
```
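
The same nested body can be sent from Python with `requests`. The following is a minimal sketch, assuming the server from the curl example is listening on `localhost:8091`. The helper names `build_chat_payload` and `send_chat_request` are hypothetical, and `requests` must be installed separately.

```python
def build_chat_payload(prompt, **generation_params):
    """Build a chat body with generation parameters nested under "extra_body"."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        # Nested key, mirroring the curl example above.
        "extra_body": dict(generation_params),
    }


def send_chat_request(payload, base_url="http://localhost:8091"):
    """POST the payload to the chat endpoint and return the decoded JSON."""
    import requests  # third-party; imported lazily so payloads can be built without it

    # The base URL is an assumption taken from the curl example.
    response = requests.post(
        f"{base_url}/v1/chat/completions", json=payload, timeout=600
    )
    response.raise_for_status()
    return response.json()


payload = build_chat_payload(
    "A beautiful landscape painting",
    num_inference_steps=50,
    seed=42,
)
```

With a server running, `send_chat_request(payload)` would return the parsed response body.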

### OpenAI Python SDK

Use the `extra_body` **keyword argument**. The SDK automatically merges these
fields into the top-level request body:

```python
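from openai import OpenAI

# Assumes the server from the curl example; the API key is a placeholder.
client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen-Image",
    messages=[{"role": "user", "content": "A beautiful landscape painting"}],
    # The SDK merges these fields into the top-level request body.
    extra_body={
        "num_inference_steps": 50,
        "seed": 42,
    },
)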
```

!!! note "SDK `extra_body` vs. JSON `extra_body`"
    These two `extra_body` usages look similar but work differently under the
    hood. The SDK flattens the dict into the top-level request JSON, while the
    curl/requests approach sends it as a nested `"extra_body"` key. Both are
    handled correctly by the server.
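
To make the difference concrete, the sketch below builds both request bodies by hand. The helper names are hypothetical, and `sdk_style_body` only imitates the flattening described in the note; it is not the SDK's own code.

```python
def nested_body(messages, extra):
    """curl/requests style: parameters stay under a nested "extra_body" key."""
    return {"messages": messages, "extra_body": dict(extra)}


def sdk_style_body(messages, extra, model=None):
    """Imitates the flattening described above: extras become top-level keys."""
    body = {"messages": messages}
    if model is not None:
        body["model"] = model
    body.update(extra)
    return body


messages = [{"role": "user", "content": "A beautiful landscape painting"}]
extra = {"num_inference_steps": 50, "seed": 42}

nested = nested_body(messages, extra)
flat = sdk_style_body(messages, extra, model="Qwen/Qwen-Image")

print(sorted(nested))  # ['extra_body', 'messages']
print(sorted(flat))    # ['messages', 'model', 'num_inference_steps', 'seed']
```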

!!! note "About the `ignored fields` warning"
    You may see a log message like:

    ```
    WARNING: The following fields were present in the request but ignored: {'height', 'width', ...}
    ```

    This is **harmless**. It is emitted by vLLM's request validation layer
    because these fields are not part of the standard OpenAI
    `ChatCompletionRequest` schema. The fields are still stored internally
    and correctly forwarded to the diffusion pipeline.

## Model-Specific Examples

For complete examples with full request/response details, see the model-specific
guides:

- [Text-to-Image (Qwen-Image)](../user_guide/examples/online_serving/text_to_image.md)
- [Image-to-Image (Qwen-Image-Edit, Qwen-Image-Layered)](../user_guide/examples/online_serving/image_to_image.md)
- [GLM-Image](../user_guide/examples/online_serving/glm_image.md)
78 changes: 15 additions & 63 deletions docs/user_guide/examples/online_serving/glm_image.md
Collaborator


Out of curiosity, have you tested glm-image recently? I remember it had some bugs after the last refactor 😂

@@ -73,98 +73,50 @@ The default yaml configuration deploys AR on GPU 0 and DiT on GPU 1. You can use

### Text-to-Image

Generate images from text prompts:

**Using Python client**

```bash
python openai_chat_client.py \
--prompt "A photorealistic mountain landscape at sunset" \
--height 1024 \
--width 1024 \
--output landscape.png
```

**Using curl**

```bash
curl -s http://localhost:8091/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "A beautiful sunset over the ocean with sailing boats"}
],
"extra_body": {
"height": 1024,
"width": 1024,
"num_inference_steps": 50,
"guidance_scale": 1.5,
"seed": 42
}
}' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
```
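
The `jq | cut | base64 -d` pipeline above can also be written in Python. This is a minimal sketch, assuming the response carries the image as a base64 data URL at `choices[0].message.content[0].image_url.url` (the path used by the `jq` filter). `extract_image_bytes` is a hypothetical helper name, and the response below is a stand-in, not real server output.

```python
import base64


def extract_image_bytes(response_json):
    """Return the raw image bytes from the first choice of a chat response."""
    url = response_json["choices"][0]["message"]["content"][0]["image_url"]["url"]
    # A data URL looks like "data:image/png;base64,<payload>";
    # keep everything after the first comma, as `cut -d',' -f2-` does.
    _, _, encoded = url.partition(",")
    return base64.b64decode(encoded)


# Stand-in response with the assumed shape (only the PNG signature as payload).
fake_png = b"\x89PNG\r\n\x1a\n"
fake_response = {
    "choices": [{"message": {"content": [{"image_url": {
        "url": "data:image/png;base64," + base64.b64encode(fake_png).decode("ascii")
    }}]}}]
}

image_bytes = extract_image_bytes(fake_response)
```

Writing `image_bytes` to a file opened with `open("output.png", "wb")` would give the same result as the shell redirect.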

Or use the script:

```bash
# Or use the curl script:
bash run_curl_text_to_image.sh "A futuristic city skyline at night"
```

### Image-to-Image (Image Editing)

Edit images with text instructions:

**Using Python client**

```bash
python openai_chat_client.py \
--prompt "Convert this image to watercolor style" \
--image input.png \
--output watercolor.png
```

**Using curl**

```bash
IMG_B64=$(base64 < input.png | tr -d '\n')

curl -s http://localhost:8091/v1/chat/completions \
-H "Content-Type: application/json" \
-d @- <<EOF | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
{
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "Convert this image to watercolor style"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,${IMG_B64}"}}
]
}],
"extra_body": {
"height": 1024,
"width": 1024,
"num_inference_steps": 50,
"guidance_scale": 1.5,
"seed": 42
}
}
EOF
# Or use the curl script:
bash run_curl_image_edit.sh input.png "Convert to watercolor style"
```

Or use the script:
For general-purpose request methods (curl, OpenAI SDK, Python `requests`), see
the [Text-to-Image](text_to_image.md) and [Image-to-Image](image_to_image.md)
guides.

```bash
bash run_curl_image_edit.sh input.png "Convert to watercolor style"
```
## Generation Parameters

## Generation Parameters (extra_body)
When using `/v1/chat/completions`, pass these parameters inside `extra_body` in the curl
JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the
[Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)).
When using the dedicated [`/v1/images/generations`](../../../../serving/image_generation_api.md)
or [`/v1/images/edits`](../../../../serving/image_edit_api.md) endpoints, pass
the supported generation controls directly as top-level fields. On those endpoints, set image
dimensions with `size` (rather than `height` and `width`) and the number of images with `n`.

| Parameter | Type | Default | Description |
| --------------------- | ----- | ------- | ----------------------------------- |
| `height` | int | 1024 | Image height in pixels |
| `width` | int | 1024 | Image width in pixels |
| `num_inference_steps` | int | 50 | Number of diffusion denoising steps |
| `guidance_scale` | float | 1.5 | Classifier-free guidance scale |
| `seed` | int | 42 | Random seed for reproducibility |
| `seed` | int | None | Optional random seed; `/v1/images/*` generates one server-side if omitted |
| `negative_prompt` | str | None | Negative prompt |
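
As a sketch of how the two parameter styles relate, the helper below converts chat-style parameters into a body for the dedicated image endpoints. It assumes that `size` takes the OpenAI-style `"<width>x<height>"` string and that the remaining fields keep their names as top-level keys. `to_images_body` is a hypothetical name, so check the linked endpoint guides before relying on this mapping.

```python
def to_images_body(prompt, height=1024, width=1024, n=1, **generation_params):
    """Build a top-level request body for the dedicated image endpoints."""
    # Assumption: `size` is "<width>x<height>", as in the OpenAI Images API.
    body = {"prompt": prompt, "size": f"{width}x{height}", "n": n}
    # Drop unset values so the server can apply its own defaults,
    # for example a server-generated seed.
    body.update({k: v for k, v in generation_params.items() if v is not None})
    return body


body = to_images_body(
    "A photorealistic mountain landscape at sunset",
    height=768,
    width=1024,
    num_inference_steps=50,
    guidance_scale=1.5,
    seed=None,
)
```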

## Response Format