
[Docs] Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU #2950

Open
nainiu258 wants to merge 8 commits into vllm-project:main from nainiu258:main

Conversation

@nainiu258

Summary

Adds a community recipe for serving Z.ai GLM-Image with vLLM-Omni: text-to-image (T2I) via the OpenAI-compatible online API, including 1× and 2× NVIDIA A800 80GB deployment notes and links to the canonical user guide and examples/online_serving/glm_image clients.

Changes

| File | Description |
| --- | --- |
| `recipes/GLM/GLM-Image.md` | New recipe: vendor/model context, when to use it, GPU sections (default 2× A800 80GB split vs 1× A800 80GB with custom stage YAML), environment, verification, and operational notes; references upstream docs and related discussion (#2888). Model id documented as `GLM/GLM-Image` (aligned with HF-style naming in the recipe). |
| `recipes/README.md` | Index entry linking to `GLM/GLM-Image.md` for 1×/2× A800 80GB image generation. |
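
For context, a minimal sketch of the default launch this recipe targets, assembled from the commands discussed later in this thread (port, model id, and served model name are taken from those comments; adjust to your environment):

# Sketch from commands quoted later in this thread; not the recipe's verbatim command.
CUDA_VISIBLE_DEVICES=0,1 vllm serve zai-org/GLM-Image \
    --omni \
    --port 8091 \
    --served-model-name glm-image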

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository-wide code reviews.

@hsliuustc0106
Collaborator

I just merged #2320; you can test it locally again and paste the test results.

@nainiu258
Author

> I just merged #2320; you can test it locally again and paste the test results.

Seems like nothing changed in my end-to-end case:

time python examples/online_serving/glm_image/openai_chat_client.py \
    --prompt "A cute cat sitting on a window sill" \
    --output glm_image_output.png \
    --server http://localhost:8091
Mode: text-to-image
Prompt: A cute cat sitting on a window sill
Sending text-to-image request to http://localhost:8091...
Image saved to: glm_image_output.png
Size: 1034.0 KB

real    1m2.037s

2x A800:
GPU memory: GPU0 48774 MiB, GPU1 23900 MiB

And here's the output of DiffusionPipelineProfiler:

GlmImagePipeline.text_encoder.forward took 0.008926s
GlmImagePipeline.text_encoder.forward took 0.007685s
GlmImagePipeline.diffuse took 33.394773s
GlmImagePipeline.vae.decode took 0.351598s
GlmImagePipeline.forward took 33.766904s

The result on 1x A800 is the same

@hsliuustc0106
Collaborator

> Seems like nothing changed in my end-to-end case: […] The result on 1x A800 is the same.

Why is there no AR-part time? GLM-Image first does understanding and then does image generation. cc @JaredforReal

@hsliuustc0106
Collaborator

check #2834

@nainiu258
Author

> check #2834

Got it!

@nainiu258 changed the title from "Add recipe for GLM-Image on 2x A800 GPUs and 1xA800 GPUs" to "Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU" on Apr 20, 2026
@nainiu258
Author

Stage 0 (AR) takes ~25 s and stage 1 (diffusion) takes ~33 s; the split is almost the same on 2x A800 and 1x A800.

[Overall Summary]

| Metric | Value |
| --- | --- |
| e2e_wall_time_ms | 61,148.679 |
| e2e_stage_0_wall_time_ms | 24,708.760 |
| e2e_stage_1_wall_time_ms | 33,787.442 |
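
(For reference, the two stage times sum to 24,708.760 + 33,787.442 = 58,496.202 ms, so roughly 2.65 s of the 61,148.679 ms end-to-end time falls outside the two stages, presumably request handling and image encode/decode.)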

Collaborator

@hsliuustc0106 left a comment


BLOCKING:

  • Gate Check — DCO is failing. Please sign off your commits before proceeding.
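
For anyone hitting the same DCO failure on their own PRs, one common way to add the missing sign-offs is a signed-off rebase (a hedged sketch, not project-specific guidance; the commit count here is hypothetical and should match your branch):

# Re-sign the last 8 commits and force-push the PR branch (adjust the count and branch name).
git rebase --signoff HEAD~8
git push --force-with-lease origin main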

@JaredforReal
Contributor

JaredforReal commented Apr 21, 2026

@nainiu258 @hsliuustc0106 I used benchmarks/diffusion/duffusion_benchmark_service.py and served GLM-Image with the --enable-diffusion-pipeline-profiler flag, which gives profiling info for the DiT stage but not the AR stage.
I think we need a benchmark specifically for GLM-Image; I will work on it.

Signed-off-by: nainiu258 <cperfect02@163.com>
Signed-off-by: nainiu258 <cperfect02@163.com>

Signed-off-by: nainiu258 <cperfect02@163.com>
Signed-off-by: nainiu258 <cperfect02@163.com>
Signed-off-by: nainiu258 <cperfect02@163.com>
@nainiu258 force-pushed the main branch 3 times, most recently from 49de5ce to 47bfea4 on April 21, 2026 at 07:41
@nainiu258
Author

> BLOCKING:
>
> * **Gate Check** — DCO is failing. Please sign off your commits before proceeding.

fixed

Signed-off-by: nainiu258 <cperfect02@163.com>
@nainiu258 changed the title from "Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU" to "[Docs] Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU" on Apr 21, 2026
Collaborator

@hsliuustc0106 left a comment


please check #2977

Comment thread on recipes/GLM/GLM-Image.md (outdated)
Overall summary from the run’s metrics. Rough wall-time split: **Stage 0 (AR)** ~**25 s**,
**Stage 1 (diffusion)** ~**34 s** (see `e2e_stage_*_wall_time_ms` below).

| Field | Value |
Collaborator


There is some problem with these metrics; cc @bjf-frz, could you fix it ASAP?

Signed-off-by: nainiu258 <cperfect02@163.com>
Signed-off-by: nainiu258 <cperfect02@163.com>
Comment thread on recipes/GLM/GLM-Image.md (outdated)

- Upstream or canonical docs:
[`docs/user_guide/examples/online_serving/glm_image.md`](../../docs/user_guide/examples/online_serving/glm_image.md)
- Related example under `examples/`:
Contributor


no longer exists now

Author


/docs/user_guide/examples/offline_inference/glm_image.md

Can we replace it with this?

Comment thread on recipes/GLM/GLM-Image.md (outdated)
vllm serve zai-org/GLM-Image \
--omni \
--port 8091 \
--stage-configs-path vllm_omni/deploy/glm_image_single_gpu.yaml
Contributor

@JaredforReal Apr 22, 2026


We will deprecate --stage-configs-path and switch to --deploy-config and --stage-overrides; please double-check, thanks.

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Contributor

@JaredforReal left a comment


Seems like there are gaps after the config refactoring, PTAL.

@nainiu258
Author

> Seems like there are gaps after the config refactoring, PTAL.

Could you please tell us how to run an example once the server is ready?

@JaredforReal
Contributor

i2i curl example

jq -n --rawfile img <(base64 -i land.png | tr -d '\n') '{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": ("data:image/png;base64," + $img)}},
        {"type": "text", "text": "make it cartoon style"}
      ],
      "modalities": ["image"],
    }
  ],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "true_cfg_scale": 4.0,
    "seed": 42
  }
}' | curl -s http://172.18.67.228:8091/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d @- | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > land-cartoon.png

t2i curl example

curl -s http://172.18.69.133:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "A beautiful landscape painting"}
    ],
    "extra_body": {
      "height": 1920,
      "width": 1920,
      "num_inference_steps": 50,
      "true_cfg_scale": 1.5,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > land.png

Change the host:port and input/output file paths accordingly @nainiu258
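
One caveat with the pipelines above: if the request fails, the jq extraction silently writes an empty or garbage file. A hedged sketch that keeps the raw response around for debugging (request.json here is a hypothetical file holding the same JSON body as the t2i example; the response shape is assumed to match the examples above):

# Save the raw JSON first so API errors stay visible instead of being piped away.
curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json > response.json
# Extract the data-URL payload and decode it (same response shape as the examples above).
jq -r '.choices[0].message.content[0].image_url.url' response.json \
  | cut -d',' -f2- | base64 -d > land.png
file land.png  # should report: PNG image data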

@JaredforReal
Contributor

@nainiu258 i2i online serving has some cache and processor problems for now; I will let you know when they're fixed.

@JaredforReal
Contributor

@nainiu258 GLM-Image is working fine in vllm-omni, can you pull the latest commit and give it another try? Thanks!

@nainiu258
Author

> @nainiu258 GLM-Image is working fine in vllm-omni, can you pull the latest commit and give it another try? Thanks!

got it

Signed-off-by: nainiu258 <cperfect02@163.com>
@nainiu258
Author

@JaredforReal there is something wrong with the arguments --stage-0-devices and --stage-1-devices

vllm serve zai-org/GLM-Image --omni --port 8091 \
    --stage-0-devices 0 --stage-1-devices 0
usage: vllm [-h] [-v] {serve,bench} ...
vllm: error: unrecognized arguments: --stage-0-devices 0 --stage-1-devices 0

@JaredforReal
Contributor

@nainiu258 maybe you should just use CUDA_VISIBLE_DEVICES=0,1 vllm serve zai-org/GLM-Image --omni --port 8091 --served-model-name glm-image; I don't think we have a flag like --stage-0-devices.

@nainiu258
Author

> @nainiu258 maybe you should just use CUDA_VISIBLE_DEVICES=0,1 vllm serve zai-org/GLM-Image --omni --port 8091 --served-model-name glm-image; I don't think we have a flag like --stage-0-devices.

I just found it in docs/user_guide/examples/online_serving/glm_image.md.

@JaredforReal
Contributor

@nainiu258, the user guide is outdated after a lot of refactoring. I will work on it

@herotai214

> @nainiu258 maybe you should just use CUDA_VISIBLE_DEVICES=0,1 vllm serve zai-org/GLM-Image --omni --port 8091 --served-model-name glm-image; I don't think we have a flag like --stage-0-devices.

@nainiu258 I think we now need to use flags like --stage-overrides '{"0": {"devices": "0"}, "1": {"devices": "0"}}'; you can give it a try.
I am also working on properly passing CLI config to GLM-Image (and other multi-stage models) (#3384).

Let's see if you can continue working on this GLM-Image recipe with this help, and hopefully we can update those outdated docs too!
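
Putting the pieces from this thread together, a hedged sketch of the single-GPU launch (untested; --stage-overrides semantics as described in the comment above, port and served model name from earlier comments):

# Pin both stages to GPU 0 per the --stage-overrides suggestion above (untested sketch).
CUDA_VISIBLE_DEVICES=0 vllm serve zai-org/GLM-Image \
    --omni \
    --port 8091 \
    --served-model-name glm-image \
    --stage-overrides '{"0": {"devices": "0"}, "1": {"devices": "0"}}'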
