Closed
54 commits
fc54ff9
refine glm-image implementation
JaredforReal Jan 15, 2026
91bff06
implement GLM Image vllm AR
JaredforReal Jan 15, 2026
21df56b
init multistage
JaredforReal Jan 15, 2026
b02a12a
revert attention_mask in GlmImageAttention forward()
JaredforReal Jan 15, 2026
d569790
init and registry
JaredforReal Jan 15, 2026
234e49f
implement stage config and stage input processor
JaredforReal Jan 15, 2026
2320cf8
fix image2image error
JaredforReal Jan 15, 2026
6b9b486
implement a pre processor func
JaredforReal Jan 15, 2026
2d92e22
fix image2image error
JaredforReal Jan 15, 2026
fbb2ac8
update stage config
JaredforReal Jan 15, 2026
6f0e4de
Merge branch 'dev/rebase_0.14.0' into perf/glm-image
JaredforReal Jan 15, 2026
0e5366f
implement example offline end2end files
JaredforReal Jan 15, 2026
dc8f2c2
modify dit configs
JaredforReal Jan 15, 2026
ad0da64
fix end2end offline examples
JaredforReal Jan 15, 2026
974ed22
support sub folder and model arch
JaredforReal Jan 15, 2026
f5551f2
fix import error
JaredforReal Jan 16, 2026
88439b6
tokenizer
JaredforReal Jan 16, 2026
6676c96
fix BaseDummyInputsBuilders
JaredforReal Jan 16, 2026
324348c
tokenizer sub dir
JaredforReal Jan 16, 2026
c0877a3
fix text2image
JaredforReal Jan 16, 2026
277fd0d
fix i2i dummy inputs
JaredforReal Jan 16, 2026
4883184
fix gate up load weight
JaredforReal Jan 16, 2026
393dfd2
fix gate up load weight
JaredforReal Jan 16, 2026
aa4b586
get transformer/config.json
JaredforReal Jan 16, 2026
daa57c5
add glm image mrope
JaredforReal Jan 16, 2026
82460fa
fix glm_image spelling
JaredforReal Jan 16, 2026
e1d946f
fix
JaredforReal Jan 16, 2026
468cd04
fix compute_logits
JaredforReal Jan 16, 2026
59df495
fix glm image stage input processors
JaredforReal Jan 16, 2026
396883c
fix stage input
JaredforReal Jan 16, 2026
2a36a47
fix stage input
JaredforReal Jan 16, 2026
31fa960
debug
JaredforReal Jan 16, 2026
1b15a94
diffusion temperature 1.0
JaredforReal Jan 16, 2026
67ec0af
end2end params temp
JaredforReal Jan 16, 2026
15c6a36
apply_chat_template, prepocessor text
JaredforReal Jan 16, 2026
36bd0f7
get processor config
JaredforReal Jan 16, 2026
186f149
debug logging
JaredforReal Jan 16, 2026
69197a4
use processor.tokenizer
JaredforReal Jan 16, 2026
c22875b
use temperature 0.9 and 0.15 top_p
JaredforReal Jan 16, 2026
e37bfcb
align image_grid_thw with transformers
JaredforReal Jan 16, 2026
ac8a81b
fix params
JaredforReal Jan 16, 2026
a6ae872
fix mrope calc
JaredforReal Jan 16, 2026
48881ed
fix import
JaredforReal Jan 16, 2026
2dda88f
add debug logging
JaredforReal Jan 16, 2026
e676c93
more logs
JaredforReal Jan 16, 2026
fc540f7
fix config
JaredforReal Jan 16, 2026
585cecd
more logs
JaredforReal Jan 16, 2026
9620875
correct text-to-image detection for M-RoPE position computation
JaredforReal Jan 16, 2026
0dad161
override config detection
JaredforReal Jan 16, 2026
ed232c7
use a straight detection of mrope_section
JaredforReal Jan 16, 2026
eff86c1
use get_model
JaredforReal Jan 16, 2026
e93c18b
cleanup: remove debug logging and simplify docstrings
JaredforReal Jan 16, 2026
ad849f0
feat: add profiling points for stage timing analysis
JaredforReal Jan 16, 2026
44f2d30
try implement i2i mode
JaredforReal Jan 20, 2026
138 changes: 138 additions & 0 deletions examples/offline_inference/glm_image/README.md
@@ -0,0 +1,138 @@
# GLM-Image Multistage End-to-End Inference

This example demonstrates how to run GLM-Image with the vLLM-Omni multistage architecture.

## Architecture

GLM-Image uses a 2-stage pipeline:

```
┌─────────────────────────────────────────────────────────────┐
│                     GLM-Image Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Stage 0 (AR Model)             Stage 1 (Diffusion)         │
│  ┌─────────────────┐            ┌─────────────────────┐     │
│  │ vLLM-optimized  │            │  GlmImagePipeline   │     │
│  │ GlmImageFor     │   prior    │  ┌───────────────┐  │     │
│  │ Conditional     │──tokens───►│  │ DiT Denoiser  │  │     │
│  │ Generation      │            │  └───────────────┘  │     │
│  │ (9B AR model)   │            │          │          │     │
│  └─────────────────┘            │          ▼          │     │
│           ▲                     │  ┌───────────────┐  │     │
│           │                     │  │  VAE Decode   │──┼──► Image
│      Text/Image                 │  └───────────────┘  │     │
│         Input                   └─────────────────────┘     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
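
The data flow in the diagram can be sketched in plain Python. This is a toy illustration only, not the vLLM-Omni API: `Request`, `ar_stage`, `diffusion_stage`, and `run_pipeline` are hypothetical stand-ins, and the "prior tokens" are dummy values rather than real model output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    prompt: str
    image_path: Optional[str] = None  # set only for image-to-image


def ar_stage(request: Request) -> List[int]:
    """Stand-in for Stage 0: returns dummy 'prior tokens'.

    A real AR model would generate these autoregressively from the prompt
    (and the input image, if any).
    """
    return [ord(ch) % 256 for ch in request.prompt]


def diffusion_stage(prior_tokens: List[int], height: int, width: int) -> dict:
    """Stand-in for Stage 1: pretends to denoise and decode an image."""
    return {
        "height": height,
        "width": width,
        "num_prior_tokens": len(prior_tokens),
    }


def run_pipeline(request: Request, height: int = 1024, width: int = 1024) -> dict:
    prior_tokens = ar_stage(request)  # Stage 0 output ...
    return diffusion_stage(prior_tokens, height, width)  # ... feeds Stage 1


result = run_pipeline(Request(prompt="A sunset"))
print(result)
```

The point of the sketch is the hand-off: Stage 1 never sees the prompt directly here, only what Stage 0 emits, which is why the two stages can be placed on different GPUs.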

## Features

- **vLLM-optimized AR**: Uses PagedAttention and tensor parallelism for faster prior token generation
- **Flexible deployment**: AR and Diffusion stages can run on different GPUs
- **Text-to-Image**: Generate images from text descriptions
- **Image-to-Image**: Edit existing images with text prompts

## Usage

### Text-to-Image

```bash
python end2end.py \
--model-path /path/to/glm-image \
--config-path ../../../vllm_omni/model_executor/stage_configs/glm_image.yaml \
--prompt "A beautiful sunset over the ocean with sailing boats" \
--height 1024 \
--width 1024 \
--output output_t2i.png
```

### Image-to-Image (Image Editing)

```bash
python end2end.py \
--model-path /path/to/glm-image \
--config-path ../../../vllm_omni/model_executor/stage_configs/glm_image.yaml \
--prompt "Transform this scene into a winter wonderland" \
--image input.png \
--output output_i2i.png
```

### With Custom Parameters

```bash
python end2end.py \
--model-path /path/to/glm-image \
--config-path ../../../vllm_omni/model_executor/stage_configs/glm_image.yaml \
--prompt "A photorealistic cat sitting on a window sill" \
--height 1024 \
--width 1024 \
--num-inference-steps 50 \
--guidance-scale 1.5 \
--seed 42 \
--output output.png
```
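
For orientation, the flags used in the commands above could be parsed roughly as follows. This is a sketch, not the actual `end2end.py`: the flag names come from the commands in this README, while the defaults, the `build_parser` and `detect_mode` helpers, and the rule "an `--image` argument selects image-to-image" are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="GLM-Image end-to-end example (sketch)")
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--config-path", required=True)
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--image", default=None, help="input image for image-to-image")
    # The defaults below are assumptions, not values taken from end2end.py.
    parser.add_argument("--height", type=int, default=1024)
    parser.add_argument("--width", type=int, default=1024)
    parser.add_argument("--num-inference-steps", type=int, default=50)
    parser.add_argument("--guidance-scale", type=float, default=1.5)
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--output", default="output.png")
    return parser


def detect_mode(args: argparse.Namespace) -> str:
    """Assumed rule: an input image means image-to-image."""
    return "image-to-image" if args.image else "text-to-image"


base = ["--model-path", "/path/to/glm-image", "--config-path", "glm_image.yaml"]
t2i_args = build_parser().parse_args(base + ["--prompt", "A cat", "--seed", "42"])
i2i_args = build_parser().parse_args(base + ["--prompt", "Make it snowy", "--image", "input.png"])

print(detect_mode(t2i_args), t2i_args.seed)
print(detect_mode(i2i_args), i2i_args.image)
```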

## Shell Scripts

### Run Text-to-Image

```bash
./run_t2i.sh
```

### Run Image-to-Image

```bash
./run_i2i.sh --image /path/to/input.png
```

## Stage Configuration

The stage config (`glm_image.yaml`) defines:

- **Stage 0 (AR)**: Uses `GPUARWorker` with the vLLM engine
  - Model: `GlmImageForConditionalGeneration`
  - Output: `token_ids` (prior tokens)

- **Stage 1 (Diffusion)**: Uses the diffusion engine
  - Model: `GlmImagePipeline`
  - Output: Generated image

See `vllm_omni/model_executor/stage_configs/glm_image.yaml` for full configuration.

## Comparison with Single-Stage

| Aspect      | Single-Stage (transformers)       | Multistage (vLLM)   |
| ----------- | --------------------------------- | ------------------- |
| AR Model    | transformers native               | vLLM PagedAttention |
| Memory      | Higher (no KV cache optimization) | Lower (optimized)   |
| Throughput  | Lower                             | Higher              |
| Flexibility | Single GPU                        | Multi-GPU support   |

## Troubleshooting

### OOM Error

Try reducing memory usage:

```yaml
# In glm_image.yaml, adjust:
gpu_memory_utilization: 0.5 # Reduce from 0.6
```
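
To get a feel for what the change means, the arithmetic can be sketched as below. In vLLM, `gpu_memory_utilization` is the fraction of a GPU's memory that an engine instance may use; the 80 GiB device size and the `memory_budget_gib` helper are assumptions for illustration, and how the stage config forwards this value to each stage is not shown here.

```python
def memory_budget_gib(total_gib: float, gpu_memory_utilization: float) -> float:
    """Approximate memory budget for one engine, in GiB."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return total_gib * gpu_memory_utilization


TOTAL_GIB = 80.0  # assumed device size

for fraction in (0.6, 0.5):
    budget = memory_budget_gib(TOTAL_GIB, fraction)
    print(f"gpu_memory_utilization={fraction}: about {budget:.0f} GiB")
```

If both stages share one GPU, the memory left over after the AR engine's budget is what remains for the diffusion stage, so lowering the fraction trades KV cache capacity for headroom.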

### Slow Initialization

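The first run loads the model weights, which can be slow on slow storage; subsequent runs are faster. If initialization times out, increase the stage init timeout: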

```bash
--stage-init-timeout 900 # Increase timeout for slow storage
```

## Requirements

- vLLM-Omni with GLM-Image support
- CUDA-capable GPU (recommended: H100/A100 with 80 GB)
- GLM-Image model weights