
[Docs] Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU #2950

Open
nainiu258 wants to merge 8 commits into vllm-project:main from nainiu258:main

Conversation

@nainiu258

Summary

Adds a community recipe for serving Z.ai GLM-Image with vLLM-Omni: text-to-image (T2I) via the OpenAI-compatible online API, including 1× and 2× NVIDIA A800 80GB deployment notes and links to the canonical user guide and examples/online_serving/glm_image clients.

Changes

| File | Description |
| --- | --- |
| `recipes/GLM/GLM-Image.md` | New recipe: vendor/model context, when to use it, GPU sections (default 2× A800 80GB split vs 1× A800 80GB with custom stage YAML), environment, verification, and operational notes; references upstream docs and related discussion (#2888). Model id documented as `GLM/GLM-Image` (aligned with HF-style naming in the recipe). |
| `recipes/README.md` | Index entry linking to `GLM/GLM-Image.md` for 1×/2× A800 80GB image generation. |
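
For context, a minimal sketch of the default launch this recipe targets, assembled from the commands discussed later in this thread (port, model id, and served model name are taken from those comments; adjust to your environment):

# Sketch from commands quoted later in this thread; not the recipe's verbatim command.
CUDA_VISIBLE_DEVICES=0,1 vllm serve zai-org/GLM-Image \
    --omni \
    --port 8091 \
    --served-model-name glm-image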

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository-wide code reviews.

@hsliuustc0106
Collaborator

I just merged #2320; you can test it locally again and paste the test results.

@nainiu258
Author

> I just merged #2320; you can test it locally again and paste the test results.

Seems like nothing changed in my end-to-end case:

time python examples/online_serving/glm_image/openai_chat_client.py \
    --prompt "A cute cat sitting on a window sill" \
    --output glm_image_output.png \
    --server http://localhost:8091
Mode: text-to-image
Prompt: A cute cat sitting on a window sill
Sending text-to-image request to http://localhost:8091...
Image saved to: glm_image_output.png
Size: 1034.0 KB

real    1m2.037s

2x A800:
GPU memory: GPU0 48774 MiB, GPU1 23900 MiB

And here's the output of DiffusionPipelineProfiler:

GlmImagePipeline.text_encoder.forward took 0.008926s
GlmImagePipeline.text_encoder.forward took 0.007685s
GlmImagePipeline.diffuse took 33.394773s
GlmImagePipeline.vae.decode took 0.351598s
GlmImagePipeline.forward took 33.766904s

The result on 1x A800 is the same

@hsliuustc0106
Collaborator

> Seems like nothing changed in my end-to-end case: […] The result on 1x A800 is the same.

Why is there no AR-part time? GLM-Image first does understanding and then does image generation. cc @JaredforReal

@hsliuustc0106
Collaborator

check #2834

@nainiu258
Author

> check #2834

Got it!

@nainiu258 changed the title from "Add recipe for GLM-Image on 2x A800 GPUs and 1xA800 GPUs" to "Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU" on Apr 20, 2026
@nainiu258
Author

Stage 0 (AR) takes ~25 s and stage 1 (diffusion) takes ~33 s; the split is almost the same on 2x A800 and 1x A800.

[Overall Summary]

| Metric | Value |
| --- | --- |
| e2e_wall_time_ms | 61,148.679 |
| e2e_stage_0_wall_time_ms | 24,708.760 |
| e2e_stage_1_wall_time_ms | 33,787.442 |
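
(For reference, the two stage times sum to 24,708.760 + 33,787.442 = 58,496.202 ms, so roughly 2.65 s of the 61,148.679 ms end-to-end time falls outside the two stages, presumably request handling and image encode/decode.)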

Collaborator

@hsliuustc0106 left a comment


BLOCKING:

  • Gate Check — DCO is failing. Please sign off your commits before proceeding.
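
For anyone hitting the same DCO failure on their own PRs, one common way to add the missing sign-offs is a signed-off rebase (a hedged sketch, not project-specific guidance; the commit count here is hypothetical and should match your branch):

# Re-sign the last 8 commits and force-push the PR branch (adjust the count and branch name).
git rebase --signoff HEAD~8
git push --force-with-lease origin main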

@JaredforReal
Contributor

JaredforReal commented Apr 21, 2026

@nainiu258 @hsliuustc0106 I used benchmarks/diffusion/duffusion_benchmark_service.py and served GLM-Image with the --enable-diffusion-pipeline-profiler flag, which gives profiling info for the DiT stage but not the AR stage.
I think we need a benchmark specifically for GLM-Image; I will work on it.

Signed-off-by: nainiu258 <cperfect02@163.com>
Signed-off-by: nainiu258 <cperfect02@163.com>

Signed-off-by: nainiu258 <cperfect02@163.com>
Signed-off-by: nainiu258 <cperfect02@163.com>
Signed-off-by: nainiu258 <cperfect02@163.com>
@nainiu258 force-pushed the main branch 3 times, most recently from 49de5ce to 47bfea4 on April 21, 2026 at 07:41
@nainiu258
Author

> BLOCKING:
>
> * **Gate Check** — DCO is failing. Please sign off your commits before proceeding.

fixed

Signed-off-by: nainiu258 <cperfect02@163.com>
@nainiu258 changed the title from "Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU" to "[Docs] Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU" on Apr 21, 2026
Collaborator

@hsliuustc0106 left a comment


please check #2977

Comment thread on recipes/GLM/GLM-Image.md (outdated)
Overall summary from the run’s metrics. Rough wall-time split: **Stage 0 (AR)** ~**25 s**,
**Stage 1 (diffusion)** ~**34 s** (see `e2e_stage_*_wall_time_ms` below).

| Field | Value |
Collaborator


There is some problem with these metrics; cc @bjf-frz, could you fix it ASAP?

Signed-off-by: nainiu258 <cperfect02@163.com>
Signed-off-by: nainiu258 <cperfect02@163.com>
Comment thread on recipes/GLM/GLM-Image.md (outdated)

- Upstream or canonical docs:
[`docs/user_guide/examples/online_serving/glm_image.md`](../../docs/user_guide/examples/online_serving/glm_image.md)
- Related example under `examples/`:
Contributor


no longer exists now

Author


/docs/user_guide/examples/offline_inference/glm_image.md

Can we replace it with this?

Comment thread on recipes/GLM/GLM-Image.md (outdated)
vllm serve zai-org/GLM-Image \
--omni \
--port 8091 \
--stage-configs-path vllm_omni/deploy/glm_image_single_gpu.yaml
Contributor

@JaredforReal Apr 22, 2026


We will deprecate --stage-configs-path and switch to --deploy-config and --stage-overrides; please double-check, thanks.

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Contributor

@JaredforReal left a comment


Seems like there are gaps after the config refactoring, PTAL.

@nainiu258
Author

> Seems like there are gaps after the config refactoring, PTAL.

Could you please tell us how to run an example once the server is ready?

@JaredforReal
Contributor

i2i curl example

jq -n --rawfile img <(base64 -i land.png | tr -d '\n') '{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": ("data:image/png;base64," + $img)}},
        {"type": "text", "text": "make it cartoon style"}
      ],
      "modalities": ["image"],
    }
  ],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "true_cfg_scale": 4.0,
    "seed": 42
  }
}' | curl -s http://172.18.67.228:8091/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d @- | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > land-cartoon.png

t2i curl example

curl -s http://172.18.69.133:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "A beautiful landscape painting"}
    ],
    "extra_body": {
      "height": 1920,
      "width": 1920,
      "num_inference_steps": 50,
      "true_cfg_scale": 1.5,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > land.png

Change the host:port and input/output file paths accordingly @nainiu258
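
One caveat with the pipelines above: if the request fails, the jq extraction silently writes an empty or garbage file. A hedged sketch that keeps the raw response around for debugging (request.json here is a hypothetical file holding the same JSON body as the t2i example; the response shape is assumed to match the examples above):

# Save the raw JSON first so API errors stay visible instead of being piped away.
curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json > response.json
# Extract the data-URL payload and decode it (same response shape as the examples above).
jq -r '.choices[0].message.content[0].image_url.url' response.json \
  | cut -d',' -f2- | base64 -d > land.png
file land.png  # should report: PNG image data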

@JaredforReal
Contributor

@nainiu258 i2i online serving has some cache and processor problems for now; I will let you know when they're fixed.

@JaredforReal
Contributor

@nainiu258 GLM-Image is working fine in vllm-omni, can you pull the latest commit and give it another try? Thanks!

@nainiu258
Author

> @nainiu258 GLM-Image is working fine in vllm-omni, can you pull the latest commit and give it another try? Thanks!

got it

Signed-off-by: nainiu258 <cperfect02@163.com>
@nainiu258
Author

@JaredforReal there is something wrong with the arguments --stage-0-devices and --stage-1-devices

vllm serve zai-org/GLM-Image --omni --port 8091 \
    --stage-0-devices 0 --stage-1-devices 0
usage: vllm [-h] [-v] {serve,bench} ...
vllm: error: unrecognized arguments: --stage-0-devices 0 --stage-1-devices 0

@JaredforReal
Contributor

@nainiu258 maybe you should just use CUDA_VISIBLE_DEVICES=0,1 vllm serve zai-org/GLM-Image --omni --port 8091 --served-model-name glm-image; I don't think we have a flag like --stage-0-devices.

@nainiu258
Author

> @nainiu258 maybe you should just use CUDA_VISIBLE_DEVICES=0,1 vllm serve zai-org/GLM-Image --omni --port 8091 --served-model-name glm-image; I don't think we have a flag like --stage-0-devices.

I just found it in docs/user_guide/examples/online_serving/glm_image.md.

@JaredforReal
Contributor

@nainiu258, the user guide is outdated after a lot of refactoring. I will work on it

@herotai214

> @nainiu258 maybe you should just use CUDA_VISIBLE_DEVICES=0,1 vllm serve zai-org/GLM-Image --omni --port 8091 --served-model-name glm-image; I don't think we have a flag like --stage-0-devices.

@nainiu258 I think we now need to use flags like --stage-overrides '{"0": {"devices": "0"}, "1": {"devices": "0"}}'; you can give it a try.
I am also working on properly passing CLI config to GLM-Image (and other multi-stage models) (#3384).

Let's see if you can continue working on this GLM-Image recipe with this help, and hopefully we can update those outdated docs too!
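
Putting the pieces from this thread together, a hedged sketch of the single-GPU launch (untested; --stage-overrides semantics as described in the comment above, port and served model name from earlier comments):

# Pin both stages to GPU 0 per the --stage-overrides suggestion above (untested sketch).
CUDA_VISIBLE_DEVICES=0 vllm serve zai-org/GLM-Image \
    --omni \
    --port 8091 \
    --served-model-name glm-image \
    --stage-overrides '{"0": {"devices": "0"}, "1": {"devices": "0"}}'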
