diff --git a/docs/.nav.yml b/docs/.nav.yml index 48917d526ed..af80816e409 100644 --- a/docs/.nav.yml +++ b/docs/.nav.yml @@ -6,6 +6,7 @@ nav: - getting_started/installation/* - Serving: - OpenAI-Compatible API: + - Diffusion Chat API: serving/diffusion_chat_api.md - Image Generation: serving/image_generation_api.md - Image Edit: serving/image_edit_api.md - Text to Speech: serving/speech_api.md diff --git a/docs/serving/diffusion_chat_api.md b/docs/serving/diffusion_chat_api.md new file mode 100644 index 00000000000..d0e2990ad6c --- /dev/null +++ b/docs/serving/diffusion_chat_api.md @@ -0,0 +1,78 @@ +# Diffusion Chat Completions API + +vLLM-Omni supports generating and editing images via the `/v1/chat/completions` +endpoint using diffusion models. This page explains how to pass generation +parameters (such as `num_inference_steps`, `height`, `width`) to diffusion +models through this endpoint. + +!!! tip + For dedicated endpoints that accept generation parameters as top-level + fields, see [Image Generation API](image_generation_api.md) and + [Image Edit API](image_edit_api.md). + +## Passing Generation Parameters + +The `/v1/chat/completions` endpoint follows the OpenAI Chat API schema, which +does not natively include diffusion-specific fields like `num_inference_steps` +or `height`. How you pass these extra fields depends on your client. + +### curl / Python `requests` + +Wrap generation parameters inside an `"extra_body"` key in the JSON body: + +```bash +curl -s http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "messages": [ + {"role": "user", "content": "A beautiful landscape painting"} + ], + "extra_body": { + "num_inference_steps": 50, + "seed": 42 + } + }' +``` + +### OpenAI Python SDK + +Use the `extra_body` **keyword argument**. 
The SDK automatically merges these +fields into the top-level request body: + +```python +response = client.chat.completions.create( + model="Qwen/Qwen-Image", + messages=[{"role": "user", "content": "A beautiful landscape painting"}], + extra_body={ + "num_inference_steps": 50, + "seed": 42, + }, +) +``` + +!!! note "SDK `extra_body` vs. JSON `extra_body`" + These two `extra_body` usages look similar but work differently under the + hood. The SDK flattens the dict into the top-level request JSON, while the + curl/requests approach sends it as a nested `"extra_body"` key. Both are + handled correctly by the server. + +!!! note "About the `ignored fields` warning" + You may see a log message like: + + ``` + WARNING: The following fields were present in the request but ignored: {'height', 'width', ...} + ``` + + This is **harmless**. It is emitted by vLLM's request validation layer + because these fields are not part of the standard OpenAI + `ChatCompletionRequest` schema. The fields are still stored internally + and correctly forwarded to the diffusion pipeline. + +## Model-Specific Examples + +For complete examples with full request/response details, see the model-specific +guides: + +- [Text-to-Image (Qwen-Image)](../user_guide/examples/online_serving/text_to_image.md) +- [Image-to-Image (Qwen-Image-Edit, Qwen-Image-Layered)](../user_guide/examples/online_serving/image_to_image.md) +- [GLM-Image](../user_guide/examples/online_serving/glm_image.md) diff --git a/docs/user_guide/examples/online_serving/glm_image.md b/docs/user_guide/examples/online_serving/glm_image.md index c0d1764801a..f7027b906db 100644 --- a/docs/user_guide/examples/online_serving/glm_image.md +++ b/docs/user_guide/examples/online_serving/glm_image.md @@ -73,90 +73,42 @@ The default yaml configuration deploys AR on GPU 0 and DiT on GPU 1. 
You can use ### Text-to-Image -Generate images from text prompts: - -**Using Python client** - ```bash python openai_chat_client.py \ --prompt "A photorealistic mountain landscape at sunset" \ --height 1024 \ --width 1024 \ --output landscape.png -``` -**Using curl** - -```bash -curl -s http://localhost:8091/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "messages": [ - {"role": "user", "content": "A beautiful sunset over the ocean with sailing boats"} - ], - "extra_body": { - "height": 1024, - "width": 1024, - "num_inference_steps": 50, - "guidance_scale": 1.5, - "seed": 42 - } - }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png -``` - -Or use the script: - -```bash +# Or use the curl script: bash run_curl_text_to_image.sh "A futuristic city skyline at night" ``` ### Image-to-Image (Image Editing) -Edit images with text instructions: - -**Using Python client** - ```bash python openai_chat_client.py \ --prompt "Convert this image to watercolor style" \ --image input.png \ --output watercolor.png -``` - -**Using curl** -```bash -IMG_B64=$(base64 < input.png | tr -d '\n') - -curl -s http://localhost:8091/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d @- < output.png -{ - "messages": [{ - "role": "user", - "content": [ - {"type": "text", "text": "Convert this image to watercolor style"}, - {"type": "image_url", "image_url": {"url": "data:image/png;base64,'$IMG_B64'"}} - ] - }], - "extra_body": { - "height": 1024, - "width": 1024, - "num_inference_steps": 50, - "guidance_scale": 1.5, - "seed": 42 - } -} -EOF +# Or use the curl script: +bash run_curl_image_edit.sh input.png "Convert to watercolor style" ``` -Or use the script: +For general-purpose request methods (curl, OpenAI SDK, Python `requests`), see +the [Text-to-Image](text_to_image.md) and [Image-to-Image](image_to_image.md) +guides. 
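The guides linked above pass generation parameters in one of two body shapes. The following is a minimal sketch of the difference. The helper names are hypothetical, and `sdk_style_body` only approximates the OpenAI SDK's merge of its `extra_body` keyword argument with `dict.update`; it does not reproduce the SDK's own code.

```python
import json

# Example values taken from the Diffusion Chat API guide.
MESSAGES = [{"role": "user", "content": "A beautiful landscape painting"}]
PARAMS = {"num_inference_steps": 50, "seed": 42}


def nested_body(messages: list, params: dict) -> dict:
    """curl / `requests` shape: parameters nested under a literal "extra_body" key."""
    return {"messages": messages, "extra_body": dict(params)}


def sdk_style_body(model: str, messages: list, extra_body: dict) -> dict:
    """Approximation of the OpenAI SDK shape: extra_body merged into the top level."""
    body = {"model": model, "messages": messages}
    body.update(extra_body)  # hypothetical stand-in for the SDK's internal merge
    return body


if __name__ == "__main__":
    print(json.dumps(nested_body(MESSAGES, PARAMS), indent=2))
    print(json.dumps(sdk_style_body("Qwen/Qwen-Image", MESSAGES, PARAMS), indent=2))
```

According to the Diffusion Chat API guide, the server handles both shapes, so use whichever one your client produces naturally.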
-```bash -bash run_curl_image_edit.sh input.png "Convert to watercolor style" -``` +## Generation Parameters -## Generation Parameters (extra_body) +When using `/v1/chat/completions`, pass these inside `extra_body` in the curl +JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the +[Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)). +When using the dedicated [`/v1/images/generations`](../../../../serving/image_generation_api.md) +or [`/v1/images/edits`](../../../../serving/image_edit_api.md) endpoints, pass +the supported generation controls as top-level fields directly. For image +dimensions and count, use `size` and `n` rather than `height` or `width`. | Parameter | Type | Default | Description | | --------------------- | ----- | ------- | ----------------------------------- | @@ -164,7 +116,7 @@ bash run_curl_image_edit.sh input.png "Convert to watercolor style" | `width` | int | 1024 | Image width in pixels | | `num_inference_steps` | int | 50 | Number of diffusion denoising steps | | `guidance_scale` | float | 1.5 | Classifier-free guidance scale | -| `seed` | int | 42 | Random seed for reproducibility | +| `seed` | int | None | Optional random seed; `/v1/images/*` generates one server-side if omitted | | `negative_prompt` | str | None | Negative prompt | ## Response Format diff --git a/docs/user_guide/examples/online_serving/image_to_image.md b/docs/user_guide/examples/online_serving/image_to_image.md index 6be2a4a7e82..b19e9462da0 100644 --- a/docs/user_guide/examples/online_serving/image_to_image.md +++ b/docs/user_guide/examples/online_serving/image_to_image.md @@ -69,10 +69,49 @@ cat < request.json } EOF -curl -s http://localhost:8092/v1/chat/completions -H "Content-Type: application/json" -d @request.json | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2 | base64 -d > output.png +curl -s http://localhost:8092/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d 
@request.json \ + | jq -r '.choices[0].message.content[0].image_url.url' \ + | cut -d',' -f2 | base64 -d > output.png ``` -### Method 2: Using Python Client +### Method 2: Using OpenAI Python SDK + +```python +import base64 +from openai import OpenAI + +client = OpenAI(base_url="http://localhost:8092/v1", api_key="none") + +with open("input.png", "rb") as f: + img_b64 = base64.b64encode(f.read()).decode() + +response = client.chat.completions.create( + model="Qwen/Qwen-Image-Edit", + messages=[{ + "role": "user", + "content": [ + {"type": "text", "text": "Convert to watercolor style"}, + {"type": "image_url", "image_url": { + "url": f"data:image/png;base64,{img_b64}" + }}, + ], + }], + extra_body={ + "num_inference_steps": 50, + "guidance_scale": 1, + "seed": 42, + }, +) + +img_url = response.choices[0].message.content[0].image_url.url +_, b64_data = img_url.split(",", 1) +with open("output.png", "wb") as f: + f.write(base64.b64decode(b64_data)) +``` + +### Method 3: Using Python Client Script ```bash python openai_chat_client.py --input input.png --prompt "Convert to oil painting style" --output output.png @@ -81,7 +120,7 @@ python openai_chat_client.py --input input.png --prompt "Convert to oil painting python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene" --output output.png ``` -### Method 3: Using Gradio Demo +### Method 4: Using Gradio Demo ```bash python gradio_demo.py @@ -124,7 +163,7 @@ python gradio_demo.py ### Image Editing with Parameters -Use `extra_body` to pass generation parameters: +Wrap generation parameters inside `extra_body` in the request JSON: ```json { @@ -147,6 +186,149 @@ Use `extra_body` to pass generation parameters: } ``` +!!! tip "Using the OpenAI SDK" + When using the OpenAI Python SDK, pass these parameters via the `extra_body` + keyword argument. 
The SDK merges them into the top-level request body automatically: + + ```python + client.chat.completions.create( + model="Qwen/Qwen-Image-Edit", + messages=[...], + extra_body={"num_inference_steps": 50, "guidance_scale": 7.5, "seed": 42}, + ) + ``` + + For details on how generation parameters are handled across different clients, see the + [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md). + +### Layered Image Generation (Qwen-Image-Layered) + +Qwen-Image-Layered generates multiple decomposed layers from a reference image and a text prompt. +Start the server with: + +```bash +vllm serve Qwen/Qwen-Image-Layered --omni --port 8093 +``` + +=== "curl" + + ```bash + IMG_B64=$(base64 < input.png | tr -d '\n') + + printf '%s' "$IMG_B64" | jq -Rs '. as $img | { + messages: [{ + role: "user", + content: [ + {type: "image_url", image_url: {url: ("data:image/png;base64," + $img)}}, + {type: "text", text: "a rabbit"} + ] + }], + extra_body: { + num_inference_steps: 50, + cfg_scale: 4.0, + seed: 0, + layers: 4, + resolution: 640 + } + }' \ + | curl -sS http://localhost:8093/v1/chat/completions \ + -H "Content-Type: application/json" --data-binary @- \ + | jq -r '.choices[0].message.content[] | .image_url.url | split(",")[1]' \ + | while IFS= read -r b64; do + echo "$b64" | base64 -d > "layer_$((i++)).png" + done + ``` + +=== "OpenAI SDK" + + ```python + import base64 + from openai import OpenAI + + client = OpenAI(base_url="http://localhost:8093/v1", api_key="none") + + with open("input.png", "rb") as f: + img_b64 = base64.b64encode(f.read()).decode() + + response = client.chat.completions.create( + model="Qwen/Qwen-Image-Layered", + messages=[{ + "role": "user", + "content": [ + {"type": "image_url", "image_url": { + "url": f"data:image/png;base64,{img_b64}" + }}, + {"type": "text", "text": "a rabbit"}, + ], + }], + extra_body={ + "num_inference_steps": 50, + "cfg_scale": 4.0, + "seed": 0, + "layers": 4, + "resolution": 640, + }, + ) + + for i, item in
enumerate(response.choices[0].message.content): + _, b64_data = item.image_url.url.split(",", 1) + with open(f"layer_{i}.png", "wb") as f: + f.write(base64.b64decode(b64_data)) + ``` + +=== "Python requests" + + ```python + import base64 + import requests + + with open("input.png", "rb") as f: + img_b64 = base64.b64encode(f.read()).decode() + + payload = { + "messages": [{ + "role": "user", + "content": [ + {"type": "image_url", "image_url": { + "url": f"data:image/png;base64,{img_b64}" + }}, + {"type": "text", "text": "a rabbit"}, + ], + }], + "extra_body": { + "num_inference_steps": 50, + "cfg_scale": 4.0, + "seed": 0, + "layers": 4, + "resolution": 640, + }, + } + + resp = requests.post( + "http://localhost:8093/v1/chat/completions", + json=payload, + timeout=600, + ) + data = resp.json() + + for i, item in enumerate(data["choices"][0]["message"]["content"]): + _, b64_data = item["image_url"]["url"].split(",", 1) + with open(f"layer_{i}.png", "wb") as f: + f.write(base64.b64decode(b64_data)) + ``` + +The response contains multiple images in `choices[0].message.content` — one per generated layer. 
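Each layer arrives as a base64 data URL, so the decoding loops in the tabs above all perform the same step. Below is a minimal stand-alone sketch of that step. It operates on the plain-dict response returned by `requests`, and `decode_data_url` and `save_layers` are hypothetical helpers. The sketch assumes that every image part has the `image_url.url` shape shown above and skips any part that does not. It writes the same `layer_{i}.png` file names as the Python examples.

```python
import base64
from pathlib import Path


def decode_data_url(url: str) -> bytes:
    """Decode a "data:<mime>;base64,<payload>" URL into raw bytes."""
    header, sep, payload = url.partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError(f"not a base64 data URL: {url[:40]!r}")
    return base64.b64decode(payload)


def save_layers(data: dict, out_dir: str = ".", prefix: str = "layer") -> list[Path]:
    """Write each image part of choices[0].message.content to <prefix>_<i>.png."""
    content = data["choices"][0]["message"]["content"]
    # Keep only parts that carry an image; the numbering follows image order.
    images = [item for item in content if "image_url" in item]
    paths = []
    for i, item in enumerate(images):
        path = Path(out_dir) / f"{prefix}_{i}.png"
        path.write_bytes(decode_data_url(item["image_url"]["url"]))
        paths.append(path)
    return paths
```

With the `requests` tab, `save_layers(resp.json())` replaces the final loop.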
+ +#### Qwen-Image-Layered Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `layers` | int | 4 | Number of layers to decompose | +| `resolution` | int | 640 | Resolution for dimension calculation (640 or 1024) | +| `cfg_scale` | float | 4.0 | Classifier-free guidance scale (alias for `true_cfg_scale`) | +| `num_inference_steps` | int | 50 | Number of denoising steps | +| `seed` | int | None | Random seed for reproducibility | + ### Multi-Image Editing (Qwen-Image-Edit-2509) Provide multiple images in `content` (order matters): @@ -166,7 +348,15 @@ Provide multiple images in `content` (order matters): } ``` -## Generation Parameters (extra_body) +## Generation Parameters + +When using `/v1/chat/completions`, pass these inside `extra_body` in the curl +JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the +[Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)). +When using the dedicated [`/v1/images/edits`](../../../../serving/image_edit_api.md) +endpoint, pass the supported generation controls as top-level form fields +directly. For image dimensions and count, use `size` and `n` rather than +`height`, `width`, or `num_outputs_per_prompt`. 
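When you move a request from `/v1/chat/completions` to `/v1/images/edits`, you can apply the renames described above mechanically. Below is a minimal sketch using a hypothetical helper. It assumes that `size` follows the OpenAI `WIDTHxHEIGHT` order (so `"1024x768"` is 1024 pixels wide) and that other keys, such as `seed`, carry over unchanged. Check the Image Edit API page for the fields the endpoint actually accepts.

```python
def to_images_fields(params: dict) -> dict:
    """Rename chat-style extra_body params to /v1/images/* style fields.

    height/width -> size ("WIDTHxHEIGHT"), num_outputs_per_prompt -> n.
    Every other key is copied unchanged (an assumption, not a documented rule).
    """
    fields = dict(params)
    height = fields.pop("height", None)
    width = fields.pop("width", None)
    if (height is None) != (width is None):
        raise ValueError("set both height and width, or neither")
    if height is not None:
        if "size" in fields:
            raise ValueError("pass either size or height/width, not both")
        fields["size"] = f"{width}x{height}"
    if "num_outputs_per_prompt" in fields:
        fields["n"] = fields.pop("num_outputs_per_prompt")
    return fields


print(to_images_fields({"height": 768, "width": 1024, "num_outputs_per_prompt": 2, "seed": 42}))
# {'seed': 42, 'size': '1024x768', 'n': 2}
```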
| Parameter | Type | Default | Description | | ------------------------ | ----- | ------- | ------------------------------------- | @@ -174,10 +364,12 @@ Provide multiple images in `content` (order matters): | `width` | int | None | Output image width in pixels | | `size` | str | None | Output image size (e.g., "1024x1024") | | `num_inference_steps` | int | 50 | Number of denoising steps | -| `guidance_scale` | float | 7.5 | CFG guidance scale | +| `guidance_scale` | float | 1.0 | CFG guidance scale | | `seed` | int | None | Random seed (reproducible) | | `negative_prompt` | str | None | Negative prompt | | `num_outputs_per_prompt` | int | 1 | Number of images to generate | +| `layers` | int | 4 | Number of layers (Qwen-Image-Layered) | +| `resolution` | int | 640 | Resolution, 640 or 1024 (Qwen-Image-Layered) | ## Response Format diff --git a/docs/user_guide/examples/online_serving/text_to_image.md b/docs/user_guide/examples/online_serving/text_to_image.md index 39be6e5f087..2e79749b3b2 100644 --- a/docs/user_guide/examples/online_serving/text_to_image.md +++ b/docs/user_guide/examples/online_serving/text_to_image.md @@ -71,13 +71,39 @@ curl -s http://localhost:8091/v1/chat/completions \ }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png ``` -### Method 2: Using Python Client +### Method 2: Using OpenAI Python SDK + +```python +from openai import OpenAI +import base64 + +client = OpenAI(base_url="http://localhost:8091/v1", api_key="none") + +response = client.chat.completions.create( + model="Qwen/Qwen-Image", + messages=[{"role": "user", "content": "A beautiful landscape painting"}], + extra_body={ + "height": 1024, + "width": 1024, + "num_inference_steps": 50, + "true_cfg_scale": 4.0, + "seed": 42, + }, +) + +img_url = response.choices[0].message.content[0].image_url.url +_, b64_data = img_url.split(",", 1) +with open("output.png", "wb") as f: + f.write(base64.b64decode(b64_data)) +``` + +### Method 3: Using 
Python Client Script ```bash python openai_chat_client.py --prompt "A beautiful landscape painting" --output output.png ``` -### Method 3: Using Gradio Demo +### Method 4: Using Gradio Demo ```bash python gradio_demo.py @@ -151,7 +177,7 @@ lora_adapter/ ### Generation with Parameters -Use `extra_body` to pass generation parameters: +Wrap generation parameters inside `extra_body` in the request JSON: ```json { @@ -168,6 +194,21 @@ Use `extra_body` to pass generation parameters: } ``` +!!! tip "Using the OpenAI SDK" + When using the OpenAI Python SDK, pass these parameters via the `extra_body` + keyword argument. The SDK merges them into the top-level request body automatically: + + ```python + client.chat.completions.create( + model="Qwen/Qwen-Image", + messages=[...], + extra_body={"height": 1024, "width": 1024, "num_inference_steps": 50}, + ) + ``` + + For details on how generation parameters are handled across different clients, see the + [Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md). + ### Multimodal Input (Text + Structured Content) ```json @@ -183,19 +224,26 @@ Use `extra_body` to pass generation parameters: } ``` -## Generation Parameters (extra_body) - -| Parameter | Type | Default | Description | -| ------------------------ | ----- | ------- | ---------------------------------- | -| `height` | int | None | Image height in pixels | -| `width` | int | None | Image width in pixels | -| `size` | str | None | Image size (e.g., "1024x1024") | -| `num_inference_steps` | int | 50 | Number of denoising steps | -| `true_cfg_scale` | float | 4.0 | Qwen-Image CFG scale | -| `seed` | int | None | Random seed (reproducible) | -| `negative_prompt` | str | None | Negative prompt | -| `num_outputs_per_prompt` | int | 1 | Number of images to generate | -| `--cfg-parallel-size`. 
| int | 1 | Number of GPUs for CFG parallelism | +## Generation Parameters + +When using `/v1/chat/completions`, pass these inside `extra_body` in the curl +JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK (see the +[Diffusion Chat API guide](../../../../serving/diffusion_chat_api.md)). +When using the dedicated [`/v1/images/generations`](../../../../serving/image_generation_api.md) +endpoint, pass the supported generation controls as top-level JSON fields +directly. For image dimensions and count, use `size` and `n` rather than +`height`, `width`, or `num_outputs_per_prompt`. + +| Parameter | Type | Default | Description | +| ------------------------ | ----- | ------- | ------------------------------ | +| `height` | int | None | Image height in pixels | +| `width` | int | None | Image width in pixels | +| `size` | str | None | Image size (e.g., "1024x1024") | +| `num_inference_steps` | int | 50 | Number of denoising steps | +| `true_cfg_scale` | float | 4.0 | Qwen-Image CFG scale | +| `seed` | int | None | Random seed (reproducible) | +| `negative_prompt` | str | None | Negative prompt | +| `num_outputs_per_prompt` | int | 1 | Number of images to generate | ## Response Format diff --git a/examples/online_serving/glm_image/README.md b/examples/online_serving/glm_image/README.md index 5efeba8068c..13ed00861da 100644 --- a/examples/online_serving/glm_image/README.md +++ b/examples/online_serving/glm_image/README.md @@ -70,90 +70,41 @@ The default yaml configuration deploys AR on GPU 0 and DiT on GPU 1. 
You can use ### Text-to-Image -Generate images from text prompts: - -**Using Python client** - ```bash python openai_chat_client.py \ --prompt "A photorealistic mountain landscape at sunset" \ --height 1024 \ --width 1024 \ --output landscape.png -``` -**Using curl** - -```bash -curl -s http://localhost:8091/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "messages": [ - {"role": "user", "content": "A beautiful sunset over the ocean with sailing boats"} - ], - "extra_body": { - "height": 1024, - "width": 1024, - "num_inference_steps": 50, - "guidance_scale": 1.5, - "seed": 42 - } - }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png -``` - -Or use the script: - -```bash +# Or use the curl script: bash run_curl_text_to_image.sh "A futuristic city skyline at night" ``` ### Image-to-Image (Image Editing) -Edit images with text instructions: - -**Using Python client** - ```bash python openai_chat_client.py \ --prompt "Convert this image to watercolor style" \ --image input.png \ --output watercolor.png -``` - -**Using curl** -```bash -IMG_B64=$(base64 < input.png | tr -d '\n') - -curl -s http://localhost:8091/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d @- < output.png -{ - "messages": [{ - "role": "user", - "content": [ - {"type": "text", "text": "Convert this image to watercolor style"}, - {"type": "image_url", "image_url": {"url": "data:image/png;base64,'$IMG_B64'"}} - ] - }], - "extra_body": { - "height": 1024, - "width": 1024, - "num_inference_steps": 50, - "guidance_scale": 1.5, - "seed": 42 - } -} -EOF +# Or use the curl script: +bash run_curl_image_edit.sh input.png "Convert to watercolor style" ``` -Or use the script: +For general-purpose request methods (curl, OpenAI SDK, Python `requests`), see +the [Text-to-Image](../text_to_image/README.md) and +[Image-to-Image](../image_to_image/README.md) guides. 
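Every image-to-image request in these guides embeds the input image as a base64 data URL inside an `image_url` content part. Below is a minimal sketch that builds such a message from a local file. The helper names are hypothetical, and the sketch infers the MIME type from the file extension with `mimetypes`, which assumes your input files have conventional image extensions.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an image_url content part."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"


def edit_message(prompt: str, image_path: str) -> list[dict]:
    """One user message: a text instruction followed by a single input image."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
        ],
    }]
```

Pass the result as `messages=` to the OpenAI SDK or as `"messages"` in a `requests` payload.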
-```bash -bash run_curl_image_edit.sh input.png "Convert to watercolor style" -``` +## Generation Parameters -## Generation Parameters (extra_body) +When using `/v1/chat/completions`, pass these inside `extra_body` in the curl +JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK. +When using the dedicated `/v1/images/generations` or `/v1/images/edits` +endpoints, pass the supported generation controls as top-level fields directly. +For image dimensions and count, use `size` and `n` rather than `height` or +`width`. | Parameter | Type | Default | Description | | --------------------- | ----- | ------- | ----------------------------------- | @@ -161,7 +112,7 @@ bash run_curl_image_edit.sh input.png "Convert to watercolor style" | `width` | int | 1024 | Image width in pixels | | `num_inference_steps` | int | 50 | Number of diffusion denoising steps | | `guidance_scale` | float | 1.5 | Classifier-free guidance scale | -| `seed` | int | 42 | Random seed for reproducibility | +| `seed` | int | None | Optional random seed; `/v1/images/*` generates one server-side if omitted | | `negative_prompt` | str | None | Negative prompt | ## Response Format diff --git a/examples/online_serving/image_to_image/README.md b/examples/online_serving/image_to_image/README.md index f69fa8b4286..789258473fd 100644 --- a/examples/online_serving/image_to_image/README.md +++ b/examples/online_serving/image_to_image/README.md @@ -69,7 +69,48 @@ EOF curl -s http://localhost:8092/v1/chat/completions -H "Content-Type: application/json" -d @request.json | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2 | base64 -d > output.png ``` -### Method 2: Using Python Client +### Method 2: Using OpenAI Python SDK + +```python +import base64 +from openai import OpenAI + +client = OpenAI(base_url="http://localhost:8092/v1", api_key="none") + +with open("input.png", "rb") as f: + img_b64 = base64.b64encode(f.read()).decode() + +response = client.chat.completions.create( + 
model="Qwen/Qwen-Image-Edit", + messages=[{ + "role": "user", + "content": [ + {"type": "text", "text": "Convert to watercolor style"}, + {"type": "image_url", "image_url": { + "url": f"data:image/png;base64,{img_b64}" + }}, + ], + }], + extra_body={ + "num_inference_steps": 50, + "guidance_scale": 1, + "seed": 42, + }, +) + +img_url = response.choices[0].message.content[0].image_url.url +_, b64_data = img_url.split(",", 1) +with open("output.png", "wb") as f: + f.write(base64.b64decode(b64_data)) +``` + +!!! note + The OpenAI SDK's `extra_body` keyword argument merges parameters into the + top-level request body automatically. When using curl or Python `requests`, + wrap generation parameters inside a literal `"extra_body"` key in the JSON + instead (as shown in the curl example above). + +### Method 3: Using Python Client Script ```bash python openai_chat_client.py --input input.png --prompt "Convert to oil painting style" --output output.png @@ -78,7 +119,7 @@ python openai_chat_client.py --input input.png --prompt "Convert to oil painting python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene" --output output.png ``` -### Method 3: Using Gradio Demo +### Method 4: Using Gradio Demo ```bash python gradio_demo.py @@ -144,6 +185,97 @@ Use `extra_body` to pass generation parameters: } ``` +### Layered Image Generation (Qwen-Image-Layered) + +Qwen-Image-Layered generates multiple decomposed layers from a reference image and a text prompt. 
+Start the server with: + +```bash +vllm serve Qwen/Qwen-Image-Layered --omni --port 8093 +``` + +**Using curl** + +```bash +IMG_B64=$(base64 < input.png | tr -d '\n') + +printf '%s' "$IMG_B64" | jq -Rs '. as $img | { + messages: [{ + role: "user", + content: [ + {type: "image_url", image_url: {url: ("data:image/png;base64," + $img)}}, + {type: "text", text: "a rabbit"} + ] + }], + extra_body: { + num_inference_steps: 50, + cfg_scale: 4.0, + seed: 0, + layers: 4, + resolution: 640 + } + }' \ + | curl -sS http://localhost:8093/v1/chat/completions \ + -H "Content-Type: application/json" --data-binary @- \ + | jq -r '.choices[0].message.content[] | .image_url.url | split(",")[1]' \ + | while IFS= read -r b64; do + echo "$b64" | base64 -d > "layer_$((i++)).png" + done +``` + +**Using Python** + +```python +import base64 +import requests + +with open("input.png", "rb") as f: + img_b64 = base64.b64encode(f.read()).decode() + +payload = { + "messages": [{ + "role": "user", + "content": [ + {"type": "image_url", "image_url": { + "url": f"data:image/png;base64,{img_b64}" + }}, + {"type": "text", "text": "a rabbit"}, + ], + }], + "extra_body": { + "num_inference_steps": 50, + "cfg_scale": 4.0, + "seed": 0, + "layers": 4, + "resolution": 640, + }, +} + +resp = requests.post( + "http://localhost:8093/v1/chat/completions", + json=payload, + timeout=600, +) +data = resp.json() + +for i, item in enumerate(data["choices"][0]["message"]["content"]): + _, b64_data = item["image_url"]["url"].split(",", 1) + with open(f"layer_{i}.png", "wb") as f: + f.write(base64.b64decode(b64_data)) +``` + +The response contains multiple images in `choices[0].message.content` — one per generated layer.
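The parameter table that follows lists the defaults and the two accepted `resolution` values. Below is a minimal client-side sketch that fills in those defaults and rejects an unsupported resolution before a long request is sent. The helper name is hypothetical, and the `layers >= 1` check is an assumed constraint that the table does not state.

```python
# Defaults copied from the Qwen-Image-Layered parameter table.
LAYERED_DEFAULTS = {
    "num_inference_steps": 50,
    "cfg_scale": 4.0,
    "layers": 4,
    "resolution": 640,
}
ALLOWED_RESOLUTIONS = (640, 1024)


def layered_extra_body(**overrides) -> dict:
    """Build the extra_body dict for a Qwen-Image-Layered chat request."""
    body = {**LAYERED_DEFAULTS, **overrides}
    if body["resolution"] not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if not isinstance(body["layers"], int) or body["layers"] < 1:
        raise ValueError("layers must be a positive integer")  # assumed constraint
    return body


print(layered_extra_body(seed=0, resolution=1024))
# {'num_inference_steps': 50, 'cfg_scale': 4.0, 'layers': 4, 'resolution': 1024, 'seed': 0}
```

Put the returned dict under `"extra_body"` in the `requests` payload, or pass it through the SDK's `extra_body=` keyword argument.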
+ +#### Qwen-Image-Layered Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `layers` | int | 4 | Number of layers to decompose | +| `resolution` | int | 640 | Resolution for dimension calculation (640 or 1024) | +| `cfg_scale` | float | 4.0 | Classifier-free guidance scale (alias for `true_cfg_scale`) | +| `num_inference_steps` | int | 50 | Number of denoising steps | +| `seed` | int | None | Random seed for reproducibility | + ### Multi-Image Editing (Qwen-Image-Edit-2509) Provide multiple images in `content` (order matters): @@ -163,7 +295,14 @@ Provide multiple images in `content` (order matters): } ``` -## Generation Parameters (extra_body) +## Generation Parameters + +When using `/v1/chat/completions`, pass these inside `extra_body` in the curl +JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK. +When using the dedicated `/v1/images/edits` endpoint, pass the supported +generation controls as top-level form fields directly. For image dimensions and +count, use `size` and `n` rather than `height`, `width`, or +`num_outputs_per_prompt`. 
| Parameter | Type | Default | Description | | ------------------------ | ----- | ------- | ------------------------------------- | @@ -171,10 +310,12 @@ Provide multiple images in `content` (order matters): | `width` | int | None | Output image width in pixels | | `size` | str | None | Output image size (e.g., "1024x1024") | | `num_inference_steps` | int | 50 | Number of denoising steps | -| `guidance_scale` | float | 7.5 | CFG guidance scale | +| `guidance_scale` | float | 1.0 | CFG guidance scale | | `seed` | int | None | Random seed (reproducible) | | `negative_prompt` | str | None | Negative prompt | | `num_outputs_per_prompt` | int | 1 | Number of images to generate | +| `layers` | int | 4 | Number of layers (Qwen-Image-Layered) | +| `resolution` | int | 640 | Resolution, 640 or 1024 (Qwen-Image-Layered) | ## Response Format diff --git a/examples/online_serving/text_to_image/README.md b/examples/online_serving/text_to_image/README.md index 52091045f11..87b6a56438e 100644 --- a/examples/online_serving/text_to_image/README.md +++ b/examples/online_serving/text_to_image/README.md @@ -68,13 +68,45 @@ curl -s http://localhost:8091/v1/chat/completions \ }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png ``` -### Method 2: Using Python Client +### Method 2: Using OpenAI Python SDK + +```python +from openai import OpenAI +import base64 + +client = OpenAI(base_url="http://localhost:8091/v1", api_key="none") + +response = client.chat.completions.create( + model="Qwen/Qwen-Image", + messages=[{"role": "user", "content": "A beautiful landscape painting"}], + extra_body={ + "height": 1024, + "width": 1024, + "num_inference_steps": 50, + "true_cfg_scale": 4.0, + "seed": 42, + }, +) + +img_url = response.choices[0].message.content[0].image_url.url +_, b64_data = img_url.split(",", 1) +with open("output.png", "wb") as f: + f.write(base64.b64decode(b64_data)) +``` + +!!! 
note + The OpenAI SDK's `extra_body` keyword argument merges parameters into the + top-level request body automatically. When using curl or Python `requests`, + wrap generation parameters inside a literal `"extra_body"` key in the JSON + instead (as shown in the curl example above). + +### Method 3: Using Python Client Script ```bash python openai_chat_client.py --prompt "A beautiful landscape painting" --output output.png ``` -### Method 3: Using Gradio Demo +### Method 4: Using Gradio Demo ```bash python gradio_demo.py @@ -180,19 +212,25 @@ Use `extra_body` to pass generation parameters: } ``` -## Generation Parameters (extra_body) - -| Parameter | Type | Default | Description | -| ------------------------ | ----- | ------- | ---------------------------------- | -| `height` | int | None | Image height in pixels | -| `width` | int | None | Image width in pixels | -| `size` | str | None | Image size (e.g., "1024x1024") | -| `num_inference_steps` | int | 50 | Number of denoising steps | -| `true_cfg_scale` | float | 4.0 | Qwen-Image CFG scale | -| `seed` | int | None | Random seed (reproducible) | -| `negative_prompt` | str | None | Negative prompt | -| `num_outputs_per_prompt` | int | 1 | Number of images to generate | -| `--cfg-parallel-size`. | int | 1 | Number of GPUs for CFG parallelism | +## Generation Parameters + +When using `/v1/chat/completions`, pass these inside `extra_body` in the curl +JSON, or via the `extra_body` keyword argument in the OpenAI Python SDK. +When using the dedicated `/v1/images/generations` endpoint, pass the supported +generation controls as top-level JSON fields directly. For image dimensions and +count, use `size` and `n` rather than `height`, `width`, or +`num_outputs_per_prompt`. 
+ +| Parameter | Type | Default | Description | +| ------------------------ | ----- | ------- | ------------------------------ | +| `height` | int | None | Image height in pixels | +| `width` | int | None | Image width in pixels | +| `size` | str | None | Image size (e.g., "1024x1024") | +| `num_inference_steps` | int | 50 | Number of denoising steps | +| `true_cfg_scale` | float | 4.0 | Qwen-Image CFG scale | +| `seed` | int | None | Random seed (reproducible) | +| `negative_prompt` | str | None | Negative prompt | +| `num_outputs_per_prompt` | int | 1 | Number of images to generate | ## Response Format