From 877d61eee05647f32a98765d6ddcbf9d249ac198 Mon Sep 17 00:00:00 2001 From: nainiu258 Date: Mon, 20 Apr 2026 13:27:00 +0800 Subject: [PATCH 1/9] docs(recipes): add GLM-Image recipe Signed-off-by: nainiu258 --- recipes/GLM/GLM-Image.md | 154 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 154 insertions(+) create mode 100644 recipes/GLM/GLM-Image.md diff --git a/recipes/GLM/GLM-Image.md b/recipes/GLM/GLM-Image.md new file mode 100644 index 00000000000..cd8963234b3 --- /dev/null +++ b/recipes/GLM/GLM-Image.md @@ -0,0 +1,154 @@ +# GLM-Image for text-to-image and image editing on 2× or 1× A800 80GB + +## Summary + +- Vendor: Z.ai +- Model: `zai-org/GLM-Image` +- Task: Text-to-image (T2I) and image-to-image / editing (I2I) +- Mode: Online serving with the OpenAI-compatible API +- Maintainer: Community + +## When to use this recipe + +Use this recipe when you want a known-good starting point for serving +`zai-org/GLM-Image` with vLLM-Omni on **two 80 GB NVIDIA A800** GPUs (Ampere-class, +same default layout as the upstream **2×A100 80GB** example: Stage 0 AR on GPU 0, +Stage 1 diffusion on GPU 1) and validate the deployment with the existing +`examples/online_serving/glm_image` clients. For **one** A800 80 GB GPU, follow +the **1× A800 80GB** section below (custom stage YAML required). + +## References + +- Upstream or canonical docs: + [`docs/user_guide/examples/online_serving/glm_image.md`](../../docs/user_guide/examples/online_serving/glm_image.md) +- Related example under `examples/`: + [`examples/online_serving/glm_image/README.md`](../../examples/online_serving/glm_image/README.md) +- Related issue or discussion: + [#2888](https://github.com/vllm-project/vllm-omni/pull/2888) + +## Hardware Support + +This recipe documents **dual-GPU** and **single-GPU** CUDA layouts on A800 80 GB +for the same software stack. Add more platforms (for example ROCm / NPU) as +community validation lands. 
+ +## GPU + +### 2× A800 80GB + +#### Environment + +These versions were taken from a working **editable** install: activate `vllm-omni/.venv` (or your equivalent), then align `pip` / Git with the rows below when reproducing this recipe. + +- OS: Linux +- Python: 3.12 +- Driver / runtime: NVIDIA CUDA stack with **two** A800 80 GB GPUs visible (set `CUDA_VISIBLE_DEVICES` on your host if needed) +- vLLM: **0.19.0** +- vLLM-Omni: **0.19.0rc2.dev138+g38d5f2d53** (editable install from this repo; Git **`38d5f2d5`**, `git describe` ≈ **`v0.19.0rc1-138-g38d5f2d5`**) +- Transformers: **5.5.4** (same `.venv` as above; required so `glm_image` configs load for Stage 0) + +#### Command + +Start the server from the repository root: + +```bash +vllm serve zai-org/GLM-Image --omni --port 8091 +``` + +To use the bundled stage config explicitly (same default as above): + +```bash +vllm serve zai-org/GLM-Image \ + --omni \ + --port 8091 \ + --stage-configs-path vllm_omni/model_executor/stage_configs/glm_image.yaml +``` + +#### Verification + +Run one of the existing example clients after the server is ready: + +```bash +python examples/online_serving/glm_image/openai_chat_client.py \ + --prompt "A cute cat sitting on a window sill" \ + --output glm_image_output.png \ + --server http://localhost:8091 +``` +After the command finishes, check for the output files: + +```bash +ls glm_image_output.png +``` + +#### Notes + +- Memory usage: Roughly **~18 GiB + KV** on Stage 0 (AR) and **~20 GiB** on Stage 1 (DiT+VAE) per the user guide; two 80 GB cards match the default split. +- Key flags: `--omni` is required; `--stage-configs-path` is optional unless you use a custom YAML (for example single-GPU). +- Keep **Transformers ≥ 5.5.1** (this recipe used **5.5.4**) so `glm_image` configs resolve; otherwise Stage 0 can fail at `ModelConfig` validation. +- Known limitations: This starter recipe follows the dual-GPU online path documented under `examples/online_serving/glm_image`. 
The first request may be slower due to warmup. +- Generation time: ~62s end-to-end on 2× A800 80GB (50 inference steps, 1024×1024, chat completions client). + +### 1× A800 80GB + +Default `glm_image.yaml` pins Stage 0 to GPU **0** and Stage 1 to GPU **1**. +On a single card, both stages must use the **same** device id. + +#### Environment + +Same software stack as **2× A800 80GB** (Python **3.12**, vLLM **0.19.0**, +vLLM-Omni **0.19.0rc2.dev138+g38d5f2d53**, Transformers **5.5.4**), but only **one** +A800 80 GB GPU visible (often `CUDA_VISIBLE_DEVICES=0`). + +#### Command + +1. Copy the stock stage file and point **Stage 1** at the same GPU as Stage 0: + +```bash +cp vllm_omni/model_executor/stage_configs/glm_image.yaml \ + vllm_omni/model_executor/stage_configs/glm_image_single_gpu.yaml +``` + +In `glm_image_single_gpu.yaml`, Stage 0 already has `runtime.devices: "0"`. +Under **Stage 1** (`stage_id: 1`), change only the device line from `"1"` to `"0"`: + +```yaml + - stage_id: 1 + stage_type: diffusion + runtime: + process: true + devices: "0" # was "1" in the default dual-GPU file + requires_multimodal_data: true +``` + +2. Start the server with your file: + +```bash +vllm serve zai-org/GLM-Image \ + --omni \ + --port 8091 \ + --stage-configs-path vllm_omni/model_executor/stage_configs/glm_image_single_gpu.yaml +``` + +If you hit **OOM**, lower Stage 0 `engine_args.gpu_memory_utilization` in the same +YAML (for example from `0.6` to `0.5` or `0.45`) and retry; see the +[GLM-Image user guide FAQ](../../docs/user_guide/examples/online_serving/glm_image.md#faq). + +#### Verification + +Same commands as **2× A800 80GB** (Python client and `curl` smoke test); only +the server startup line changes because of `--stage-configs-path`. + +#### Notes + +- Memory usage: AR and diffusion **time-share** one 80 GB GPU; peak usage is + higher than the dual-GPU split. 
The user-guide ballpark (~48 GiB + KV for AR, + ~22 GiB for DiT+VAE) ~72G for inference in total +- Key flags: **`--stage-configs-path`** is **required** for single-GPU; the + default bundle still targets two GPUs. +- Keep Transformers **≥ 5.5.1** (here + **5.5.4**) for `glm_image` support. +- Known limitations: Stages no longer run on separate devices in parallel; + throughput differs from the 2× recipe. +- Generation time: ~62s end-to-end on 2× A800 80GB (50 inference steps, 1024×1024, chat completions client). + + From 6c2e34e99f1f1e637b152dad27f8d26e993071b4 Mon Sep 17 00:00:00 2001 From: nainiu258 Date: Mon, 20 Apr 2026 13:38:12 +0800 Subject: [PATCH 2/9] docs(recipes): link GLM-Image recipe from recipes README Signed-off-by: nainiu258 Signed-off-by: nainiu258 --- recipes/README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/recipes/README.md b/recipes/README.md index 01ecc41f185..539db67df2f 100644 --- a/recipes/README.md +++ b/recipes/README.md @@ -30,6 +30,9 @@ recipes/ - [`Wan-AI/Wan2.2-I2V.md`](./Wan-AI/Wan2.2-I2V.md): image-to-video serving recipe for Wan2.2 14B on `8x Ascend NPU (A2/A3)` +- [`GLM/GLM-Image.md`](./GLM/GLM-Image.md): online serving recipe for + image generation on `1x A800 80GB` and `2x A800 80GB` + Within a single recipe file, include different hardware support sections such as `GPU`, `ROCm`, and `NPU`, and add concrete tested configurations like `1x A100 80GB` or `2x L40S` inside those sections when applicable. 
From 91280ddb0df62df5b4f2ee9b819286ee6dd38a1c Mon Sep 17 00:00:00 2001 From: nainiu258 Date: Mon, 20 Apr 2026 13:51:00 +0800 Subject: [PATCH 3/9] docs(recipes): update GLM-Image model id and tidy whitespace in recipe Signed-off-by: nainiu258 --- recipes/GLM/GLM-Image.md | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/recipes/GLM/GLM-Image.md b/recipes/GLM/GLM-Image.md index cd8963234b3..175529312ad 100644 --- a/recipes/GLM/GLM-Image.md +++ b/recipes/GLM/GLM-Image.md @@ -2,8 +2,8 @@ ## Summary -- Vendor: Z.ai -- Model: `zai-org/GLM-Image` +- Vendor: Z.ai +- Model: `GLM/GLM-Image` - Task: Text-to-image (T2I) and image-to-image / editing (I2I) - Mode: Online serving with the OpenAI-compatible API - Maintainer: Community @@ -11,7 +11,7 @@ ## When to use this recipe Use this recipe when you want a known-good starting point for serving -`zai-org/GLM-Image` with vLLM-Omni on **two 80 GB NVIDIA A800** GPUs (Ampere-class, +`GLM/GLM-Image` with vLLM-Omni on **two 80 GB NVIDIA A800** GPUs (Ampere-class, same default layout as the upstream **2×A100 80GB** example: Stage 0 AR on GPU 0, Stage 1 diffusion on GPU 1) and validate the deployment with the existing `examples/online_serving/glm_image` clients. For **one** A800 80 GB GPU, follow @@ -83,7 +83,7 @@ ls glm_image_output.png #### Notes - Memory usage: Roughly **~18 GiB + KV** on Stage 0 (AR) and **~20 GiB** on Stage 1 (DiT+VAE) per the user guide; two 80 GB cards match the default split. -- Key flags: `--omni` is required; `--stage-configs-path` is optional unless you use a custom YAML (for example single-GPU). +- Key flags: `--omni` is required; `--stage-configs-path` is optional unless you use a custom YAML (for example single-GPU). - Keep **Transformers ≥ 5.5.1** (this recipe used **5.5.4**) so `glm_image` configs resolve; otherwise Stage 0 can fail at `ModelConfig` validation. 
- Known limitations: This starter recipe follows the dual-GPU online path documented under `examples/online_serving/glm_image`. The first request may be slower due to warmup. - Generation time: ~62s end-to-end on 2× A800 80GB (50 inference steps, 1024×1024, chat completions client). @@ -148,7 +148,5 @@ the server startup line changes because of `--stage-configs-path`. - Keep Transformers **≥ 5.5.1** (here **5.5.4**) for `glm_image` support. - Known limitations: Stages no longer run on separate devices in parallel; - throughput differs from the 2× recipe. + throughput differs from the 2× recipe. - Generation time: ~62s end-to-end on 2× A800 80GB (50 inference steps, 1024×1024, chat completions client). - - From 47bfea4038f57751b544f8018ec9e536eab9289d Mon Sep 17 00:00:00 2001 From: nainiu258 Date: Mon, 20 Apr 2026 20:36:18 +0800 Subject: [PATCH 4/9] docs:add GLM-Image E2E metrics and stage timing Signed-off-by: nainiu258 --- recipes/GLM/GLM-Image.md | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/recipes/GLM/GLM-Image.md b/recipes/GLM/GLM-Image.md index 175529312ad..bb4900c24ff 100644 --- a/recipes/GLM/GLM-Image.md +++ b/recipes/GLM/GLM-Image.md @@ -80,13 +80,29 @@ After the command finishes, check for the output files: ls glm_image_output.png ``` +#### Sample end-to-end metrics + +One representative **offline** GLM-Image E2E run on this recipe’s **2× A800 80GB**. +Overall summary from the run’s metrics. Rough wall-time split: **Stage 0 (AR)** ~**25 s**, +**Stage 1 (diffusion)** ~**34 s** (see `e2e_stage_*_wall_time_ms` below). 
+ +| Field | Value | +| --- | ---: | +| e2e_requests | 1 | +| e2e_wall_time_ms | 61,148.679 | +| e2e_total_tokens | 1,300 | +| e2e_avg_time_per_request_ms | 61,148.679 | +| e2e_avg_tokens_per_s | 21.260 | +| e2e_stage_0_wall_time_ms | 24,708.760 | +| e2e_stage_1_wall_time_ms | 33,787.442 | + #### Notes - Memory usage: Roughly **~18 GiB + KV** on Stage 0 (AR) and **~20 GiB** on Stage 1 (DiT+VAE) per the user guide; two 80 GB cards match the default split. - Key flags: `--omni` is required; `--stage-configs-path` is optional unless you use a custom YAML (for example single-GPU). - Keep **Transformers ≥ 5.5.1** (this recipe used **5.5.4**) so `glm_image` configs resolve; otherwise Stage 0 can fail at `ModelConfig` validation. - Known limitations: This starter recipe follows the dual-GPU online path documented under `examples/online_serving/glm_image`. The first request may be slower due to warmup. -- Generation time: ~62s end-to-end on 2× A800 80GB (50 inference steps, 1024×1024, chat completions client). +- Generation time: about **61 s** wall time end-to-end for the sample above (50 inference steps, 1024×1024). ### 1× A800 80GB @@ -135,7 +151,7 @@ YAML (for example from `0.6` to `0.5` or `0.45`) and retry; see the #### Verification -Same commands as **2× A800 80GB** (Python client and `curl` smoke test); only +Same commands as **2× A800 80GB** ; only the server startup line changes because of `--stage-configs-path`. #### Notes @@ -149,4 +165,4 @@ the server startup line changes because of `--stage-configs-path`. **5.5.4**) for `glm_image` support. - Known limitations: Stages no longer run on separate devices in parallel; throughput differs from the 2× recipe. -- Generation time: ~62s end-to-end on 2× A800 80GB (50 inference steps, 1024×1024, chat completions client). +- Generation time: ~62s end-to-end on 1× A800 80GB (50 inference steps, 1024×1024, chat completions client). 
From 40fc45afc6fbb9ead2c60307f9eb3ee22285124e Mon Sep 17 00:00:00 2001 From: nainiu258 Date: Tue, 21 Apr 2026 16:52:47 +0800 Subject: [PATCH 5/9] Replace the position of yaml Signed-off-by: nainiu258 --- recipes/GLM/GLM-Image.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/recipes/GLM/GLM-Image.md b/recipes/GLM/GLM-Image.md index bb4900c24ff..50bb5b8c22f 100644 --- a/recipes/GLM/GLM-Image.md +++ b/recipes/GLM/GLM-Image.md @@ -61,7 +61,7 @@ To use the bundled stage config explicitly (same default as above): vllm serve zai-org/GLM-Image \ --omni \ --port 8091 \ - --stage-configs-path vllm_omni/model_executor/stage_configs/glm_image.yaml + --stage-configs-path vllm_omni/deploy/glm_image.yaml ``` #### Verification @@ -120,8 +120,8 @@ A800 80 GB GPU visible (often `CUDA_VISIBLE_DEVICES=0`). 1. Copy the stock stage file and point **Stage 1** at the same GPU as Stage 0: ```bash -cp vllm_omni/model_executor/stage_configs/glm_image.yaml \ - vllm_omni/model_executor/stage_configs/glm_image_single_gpu.yaml +cp vllm_omni/deploy/glm_image.yaml \ + vllm_omni/deploy/glm_image_single_gpu.yaml ``` In `glm_image_single_gpu.yaml`, Stage 0 already has `runtime.devices: "0"`. 
From aa295bed3cc77cd63b4742a34ef722a0a342cd5c Mon Sep 17 00:00:00 2001 From: nainiu258 Date: Wed, 22 Apr 2026 09:36:22 +0800 Subject: [PATCH 6/9] fix: update yaml position Signed-off-by: nainiu258 --- recipes/GLM/GLM-Image.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/recipes/GLM/GLM-Image.md b/recipes/GLM/GLM-Image.md index 50bb5b8c22f..32dbb8875a7 100644 --- a/recipes/GLM/GLM-Image.md +++ b/recipes/GLM/GLM-Image.md @@ -142,7 +142,7 @@ Under **Stage 1** (`stage_id: 1`), change only the device line from `"1"` to `"0 vllm serve zai-org/GLM-Image \ --omni \ --port 8091 \ - --stage-configs-path vllm_omni/model_executor/stage_configs/glm_image_single_gpu.yaml + --stage-configs-path vllm_omni/deploy/glm_image_single_gpu.yaml ``` If you hit **OOM**, lower Stage 0 `engine_args.gpu_memory_utilization` in the same From 47409d1a99569a6845ecb5dc055f3e02b504eac1 Mon Sep 17 00:00:00 2001 From: nainiu258 Date: Wed, 22 Apr 2026 15:32:29 +0800 Subject: [PATCH 7/9] fixed instruction of start server Signed-off-by: nainiu258 --- recipes/GLM/GLM-Image.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/recipes/GLM/GLM-Image.md b/recipes/GLM/GLM-Image.md index 32dbb8875a7..962152b144a 100644 --- a/recipes/GLM/GLM-Image.md +++ b/recipes/GLM/GLM-Image.md @@ -61,7 +61,7 @@ To use the bundled stage config explicitly (same default as above): vllm serve zai-org/GLM-Image \ --omni \ --port 8091 \ - --stage-configs-path vllm_omni/deploy/glm_image.yaml + --deploy-config vllm_omni/deploy/glm_image.yaml ``` #### Verification @@ -142,7 +142,7 @@ Under **Stage 1** (`stage_id: 1`), change only the device line from `"1"` to `"0 vllm serve zai-org/GLM-Image \ --omni \ --port 8091 \ - --stage-configs-path vllm_omni/deploy/glm_image_single_gpu.yaml + --deploy-config vllm_omni/deploy/glm_image_single_gpu.yaml ``` If you hit **OOM**, lower Stage 0 `engine_args.gpu_memory_utilization` in the same From 21abb4bdcfc5fa1b43a49f748830492b749bacdd Mon Sep 17 00:00:00 
2001 From: nainiu258 Date: Tue, 28 Apr 2026 18:38:37 +0800 Subject: [PATCH 8/9] docs: update GLM-Image recipe Signed-off-by: nainiu258 --- recipes/GLM/GLM-Image.md | 96 +++++++++------------------------------- 1 file changed, 20 insertions(+), 76 deletions(-) diff --git a/recipes/GLM/GLM-Image.md b/recipes/GLM/GLM-Image.md index 962152b144a..525f395619a 100644 --- a/recipes/GLM/GLM-Image.md +++ b/recipes/GLM/GLM-Image.md @@ -1,10 +1,10 @@ -# GLM-Image for text-to-image and image editing on 2× or 1× A800 80GB +# GLM-Image for text-to-image and image editing on 2× A800 80GB ## Summary - Vendor: Z.ai - Model: `GLM/GLM-Image` -- Task: Text-to-image (T2I) and image-to-image / editing (I2I) +- Task: Text-to-image (T2I) and image-to-image - Mode: Online serving with the OpenAI-compatible API - Maintainer: Community @@ -14,21 +14,18 @@ Use this recipe when you want a known-good starting point for serving `GLM/GLM-Image` with vLLM-Omni on **two 80 GB NVIDIA A800** GPUs (Ampere-class, same default layout as the upstream **2×A100 80GB** example: Stage 0 AR on GPU 0, Stage 1 diffusion on GPU 1) and validate the deployment with the existing -`examples/online_serving/glm_image` clients. For **one** A800 80 GB GPU, follow -the **1× A800 80GB** section below (custom stage YAML required). +`examples/online_serving/glm_image` clients. ## References - Upstream or canonical docs: [`docs/user_guide/examples/online_serving/glm_image.md`](../../docs/user_guide/examples/online_serving/glm_image.md) -- Related example under `examples/`: - [`examples/online_serving/glm_image/README.md`](../../examples/online_serving/glm_image/README.md) - Related issue or discussion: [#2888](https://github.com/vllm-project/vllm-omni/pull/2888) ## Hardware Support -This recipe documents **dual-GPU** and **single-GPU** CUDA layouts on A800 80 GB +This recipe documents **dual-GPU** CUDA layouts on A800 80 GB for the same software stack. Add more platforms (for example ROCm / NPU) as community validation lands. 
@@ -69,15 +66,25 @@ vllm serve zai-org/GLM-Image \ Run one of the existing example clients after the server is ready: ```bash -python examples/online_serving/glm_image/openai_chat_client.py \ - --prompt "A cute cat sitting on a window sill" \ - --output glm_image_output.png \ - --server http://localhost:8091 +curl -s http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "messages": [ + {"role": "user", "content": "A beautiful landscape painting"} + ], + "extra_body": { + "height": 1920, + "width": 1920, + "num_inference_steps": 50, + "true_cfg_scale": 1.5, + "seed": 42 + } + }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > land.png ``` After the command finishes, check for the output files: ```bash -ls glm_image_output.png +ls land.png ``` #### Sample end-to-end metrics @@ -98,71 +105,8 @@ Overall summary from the run’s metrics. Rough wall-time split: **Stage 0 (AR)* #### Notes -- Memory usage: Roughly **~18 GiB + KV** on Stage 0 (AR) and **~20 GiB** on Stage 1 (DiT+VAE) per the user guide; two 80 GB cards match the default split. +- Memory usage: Roughly **~38 GiB + KV** on Stage 0 (AR) and **~20 GiB** on Stage 1 (DiT+VAE) per the user guide; two 80 GB cards match the default split. - Key flags: `--omni` is required; `--stage-configs-path` is optional unless you use a custom YAML (for example single-GPU). - Keep **Transformers ≥ 5.5.1** (this recipe used **5.5.4**) so `glm_image` configs resolve; otherwise Stage 0 can fail at `ModelConfig` validation. - Known limitations: This starter recipe follows the dual-GPU online path documented under `examples/online_serving/glm_image`. The first request may be slower due to warmup. - Generation time: about **61 s** wall time end-to-end for the sample above (50 inference steps, 1024×1024). - -### 1× A800 80GB - -Default `glm_image.yaml` pins Stage 0 to GPU **0** and Stage 1 to GPU **1**. 
-On a single card, both stages must use the **same** device id. - -#### Environment - -Same software stack as **2× A800 80GB** (Python **3.12**, vLLM **0.19.0**, -vLLM-Omni **0.19.0rc2.dev138+g38d5f2d53**, Transformers **5.5.4**), but only **one** -A800 80 GB GPU visible (often `CUDA_VISIBLE_DEVICES=0`). - -#### Command - -1. Copy the stock stage file and point **Stage 1** at the same GPU as Stage 0: - -```bash -cp vllm_omni/deploy/glm_image.yaml \ - vllm_omni/deploy/glm_image_single_gpu.yaml -``` - -In `glm_image_single_gpu.yaml`, Stage 0 already has `runtime.devices: "0"`. -Under **Stage 1** (`stage_id: 1`), change only the device line from `"1"` to `"0"`: - -```yaml - - stage_id: 1 - stage_type: diffusion - runtime: - process: true - devices: "0" # was "1" in the default dual-GPU file - requires_multimodal_data: true -``` - -2. Start the server with your file: - -```bash -vllm serve zai-org/GLM-Image \ - --omni \ - --port 8091 \ - --deploy-config vllm_omni/deploy/glm_image_single_gpu.yaml -``` - -If you hit **OOM**, lower Stage 0 `engine_args.gpu_memory_utilization` in the same -YAML (for example from `0.6` to `0.5` or `0.45`) and retry; see the -[GLM-Image user guide FAQ](../../docs/user_guide/examples/online_serving/glm_image.md#faq). - -#### Verification - -Same commands as **2× A800 80GB** ; only -the server startup line changes because of `--stage-configs-path`. - -#### Notes - -- Memory usage: AR and diffusion **time-share** one 80 GB GPU; peak usage is - higher than the dual-GPU split. The user-guide ballpark (~48 GiB + KV for AR, - ~22 GiB for DiT+VAE) ~72G for inference in total -- Key flags: **`--stage-configs-path`** is **required** for single-GPU; the - default bundle still targets two GPUs. -- Keep Transformers **≥ 5.5.1** (here - **5.5.4**) for `glm_image` support. -- Known limitations: Stages no longer run on separate devices in parallel; - throughput differs from the 2× recipe. 
-- Generation time: ~62s end-to-end on 1× A800 80GB (50 inference steps, 1024×1024, chat completions client). From 2e33b11faf9013a5347da029cc8fada260cb4bd6 Mon Sep 17 00:00:00 2001 From: nainiu258 <101917677+nainiu258@users.noreply.github.com> Date: Thu, 28 May 2026 19:10:27 +0800 Subject: [PATCH 9/9] Apply suggestion from @hsliuustc0106 Co-authored-by: Hongsheng Liu Signed-off-by: nainiu258 <101917677+nainiu258@users.noreply.github.com> --- recipes/GLM/GLM-Image.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/recipes/GLM/GLM-Image.md b/recipes/GLM/GLM-Image.md index 525f395619a..bd11558c92d 100644 --- a/recipes/GLM/GLM-Image.md +++ b/recipes/GLM/GLM-Image.md @@ -1,4 +1,4 @@ -# GLM-Image for text-to-image and image editing on 2× A800 80GB +# GLM-Image for text-to-image and image editing ## Summary
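
Reviewer aid for the verification step introduced in PATCH 8/9: the `jq | cut | base64 -d` pipeline there can be mirrored in Python, which may be easier to debug when the response is not what the pipeline expects. This is a minimal sketch that assumes the response shape implied by the recipe's `jq` path (`.choices[0].message.content[0].image_url.url` holding a `data:` URL); that shape is not verified against a live server here, and `extract_image_bytes` is a hypothetical helper, not part of vLLM-Omni.

```python
import base64
import json


def extract_image_bytes(response_body: str) -> bytes:
    """Decode the image from a chat-completions response body.

    Assumes the layout used by the recipe's jq filter:
    choices[0].message.content[0].image_url.url is a base64 data URL.
    """
    payload = json.loads(response_body)
    data_url = payload["choices"][0]["message"]["content"][0]["image_url"]["url"]
    # Like `cut -d',' -f2-`: keep everything after the first comma.
    header, sep, encoded = data_url.partition(",")
    if not sep:
        raise ValueError("expected a data URL of the form 'data:<mime>;base64,<payload>'")
    return base64.b64decode(encoded)


if __name__ == "__main__":
    # Stand-in response built locally so the sketch runs without a server.
    fake_image = b"\x89PNG\r\n\x1a\n" + b"not-a-real-image"
    fake_url = "data:image/png;base64," + base64.b64encode(fake_image).decode("ascii")
    fake_body = json.dumps(
        {"choices": [{"message": {"content": [{"type": "image_url", "image_url": {"url": fake_url}}]}}]}
    )
    with open("land.png", "wb") as fh:
        fh.write(extract_image_bytes(fake_body))
```

Unlike the shell pipeline, the helper raises a `ValueError` when the URL has no comma separator instead of passing the raw string on to the base64 decoder, so a malformed response fails loudly rather than producing a corrupt file.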