Merged
89 commits
33964bf
init
JaredforReal Jan 21, 2026
8936152
init end2end file
JaredforReal Jan 21, 2026
8edc4e9
registry
JaredforReal Jan 21, 2026
589b866
deal with diffusion stage
JaredforReal Jan 21, 2026
4e3ae79
update for new mmencoderattn
JaredforReal Jan 21, 2026
88a2a23
remove |image| for i2i
JaredforReal Jan 21, 2026
80ad31b
use self.get_model()
JaredforReal Jan 21, 2026
fe20716
detailed log
JaredforReal Jan 21, 2026
83e9fe9
add get_language_model()
JaredforReal Jan 21, 2026
96ff710
add get_language_model()
JaredforReal Jan 21, 2026
e3d3489
extract_moltimodal_outputs
JaredforReal Jan 21, 2026
ee7eef3
add custom mrope
JaredforReal Jan 21, 2026
00ccb97
fix i2i mrope
JaredforReal Jan 21, 2026
054abf9
fix i2i mrope
JaredforReal Jan 21, 2026
d32b844
add model_subdir to stage_worker_async
JaredforReal Jan 21, 2026
4968725
align import with vllm 0.14.0
JaredforReal Jan 21, 2026
f4b44e6
fix
JaredforReal Jan 25, 2026
d9d0ea4
simplify
JaredforReal Jan 26, 2026
1c8c89e
revert
JaredforReal Jan 26, 2026
829bc87
reimplement mrope
JaredforReal Jan 26, 2026
d9deaf1
fix
JaredforReal Jan 26, 2026
2828c3b
fix
JaredforReal Jan 26, 2026
f67f9f2
fix i2i
JaredforReal Jan 26, 2026
8fc9a5d
fix
JaredforReal Jan 26, 2026
163eca3
fix
JaredforReal Jan 27, 2026
6d4b146
fix profile
JaredforReal Jan 27, 2026
2477035
remove supports_multimodal_raw_input_only
JaredforReal Jan 27, 2026
c68723e
fix i2i get_mrope_input_positions
JaredforReal Jan 27, 2026
58ca1dc
debug
JaredforReal Jan 27, 2026
729bbbd
debug
JaredforReal Jan 27, 2026
df6c27c
debug
JaredforReal Jan 27, 2026
e1997d1
debug
JaredforReal Jan 27, 2026
cee1380
fix
JaredforReal Jan 27, 2026
968b97a
fix
JaredforReal Jan 27, 2026
987e8bb
fix
JaredforReal Jan 27, 2026
2c0c2ff
fix
JaredforReal Jan 27, 2026
26b9f84
fix
JaredforReal Jan 27, 2026
f41d101
fix
JaredforReal Jan 27, 2026
7965a15
debug
JaredforReal Jan 27, 2026
09bacfa
debug
JaredforReal Jan 27, 2026
ca0eb2a
fix
JaredforReal Jan 27, 2026
6bc98e3
fix
JaredforReal Jan 27, 2026
b4b7d4d
fix
JaredforReal Jan 27, 2026
342f059
rename
JaredforReal Jan 27, 2026
18909cd
Merge branch 'main' into perf/glm-image-main
JaredforReal Jan 27, 2026
0592a80
remove logging debug codes
JaredforReal Jan 28, 2026
56226d2
remove transformers AR
JaredforReal Jan 28, 2026
cb59851
fix dummy warmup
JaredforReal Jan 28, 2026
d9bf80b
restore unnecessary change
JaredforReal Jan 28, 2026
902192b
clean up
JaredforReal Jan 28, 2026
c7b2be1
change for batch support
JaredforReal Jan 29, 2026
8824f88
Merge branch 'main' into perf/glm-image-main
JaredforReal Jan 29, 2026
1c157dd
update configs
JaredforReal Jan 29, 2026
ea437ba
update sampling_params
JaredforReal Jan 29, 2026
aeffa9a
update end2end
JaredforReal Jan 29, 2026
c48b0e4
try fix ar2diffusion
JaredforReal Jan 29, 2026
d82987b
pre-commit
JaredforReal Jan 29, 2026
f3e2210
fix serving chat
JaredforReal Jan 29, 2026
70dcb49
fix serving chat
JaredforReal Jan 29, 2026
f356b7f
fix serving chat
JaredforReal Jan 29, 2026
2c19670
fix serving chat
JaredforReal Jan 29, 2026
6a021f5
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 2, 2026
8f13a5f
move _resolve_model_tokenizer_paths to stage utils
JaredforReal Feb 2, 2026
dd01391
fix pipeline
JaredforReal Feb 2, 2026
4033fc6
accept some review
JaredforReal Feb 2, 2026
36661f8
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 4, 2026
4e006aa
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 4, 2026
80b14de
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 9, 2026
45e85fe
revert gpu_ar_model_runner
JaredforReal Feb 9, 2026
7874a19
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 9, 2026
401f626
fit new api
JaredforReal Feb 9, 2026
69f0f61
remove multimodal_config in MMEncoderAttention
JaredforReal Feb 9, 2026
ec8a4a6
remove attn_backend_override in get_vit_attn_backend
JaredforReal Feb 9, 2026
7469f65
Merge branch 'main' into perf/glm-image-main
princepride Feb 9, 2026
dcc2b5e
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 10, 2026
84f7c6c
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 12, 2026
8d0fb13
update online serving examples
JaredforReal Feb 12, 2026
1053dcb
add model support
JaredforReal Feb 12, 2026
6e8b904
pre-commit
JaredforReal Feb 12, 2026
61bce1b
replace GlmImageTextMLP as Qwen2MLP from vllm
JaredforReal Feb 12, 2026
61e1e6a
fix import
JaredforReal Feb 12, 2026
74aed42
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 13, 2026
60c473d
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 25, 2026
ab2d2b7
pre-commit
JaredforReal Feb 25, 2026
f0f0a3a
Merge branch 'main' into perf/glm-image-main
hsliuustc0106 Feb 25, 2026
b380c0e
Merge branch 'main' into perf/glm-image-main
hsliuustc0106 Feb 25, 2026
78b72f1
Merge branch 'main' into perf/glm-image-main
hsliuustc0106 Feb 26, 2026
6b1b74c
refactor _calc_mrope_positions()
JaredforReal Feb 26, 2026
29cbd29
Merge branch 'main' into perf/glm-image-main
JaredforReal Feb 26, 2026
1 change: 1 addition & 0 deletions docs/models/supported_models.md
@@ -41,6 +41,7 @@ th {
|`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-CustomVoice | `Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice` |
|`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-VoiceDesign | `Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign` |
|`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-Base | `Qwen/Qwen3-TTS-12Hz-0.6B-Base` |
|`GlmImageForConditionalGeneration` | GLM-Image | `zai-org/GLM-Image` |
|`NextStep11Pipeline` | NextStep-1.1 | `stepfun-ai/NextStep-1.1` |


138 changes: 138 additions & 0 deletions examples/offline_inference/glm_image/README.md
@@ -0,0 +1,138 @@
# GLM-Image Multistage End-to-End Inference

This example demonstrates how to run GLM-Image with the vLLM-Omni multistage architecture.

## Architecture

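GLM-Image runs as a two-stage pipeline: an autoregressive (AR) model turns the text (and optional image) input into prior tokens, and a diffusion pipeline turns those prior tokens into the final image.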

```
┌─────────────────────────────────────────────────────────────┐
│ GLM-Image Pipeline │
├─────────────────────────────────────────────────────────────┤
│ │
│ Stage 0 (AR Model) Stage 1 (Diffusion) │
│ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ vLLM-optimized │ │ GlmImagePipeline │ │
│ │ GlmImageFor │ prior │ ┌───────────────┐ │ │
│ │ Conditional │──tokens───►│ │ DiT Denoiser │ │ │
│ │ Generation │ │ └───────────────┘ │ │
│ │ (9B AR model) │ │ │ │ │
│ └─────────────────┘ │ ▼ │ │
│ ▲ │ ┌───────────────┐ │ │
│ │ │ │ VAE Decode │──┼──► Image
│ Text/Image │ └───────────────┘ │ │
│ Input └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
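The hand-off between the two stages can be sketched in plain Python. This is a toy model, not vLLM-Omni code: `ARRequest`, `PriorTokens`, `stage0_ar`, `stage1_diffusion` and `run_pipeline` are hypothetical names, and both stages are stubs that only mimic the data flow in the diagram (text/image in, prior token IDs between the stages, image out).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ARRequest:
    """Hypothetical Stage 0 input: a prompt plus an optional source image (i2i)."""
    prompt: str
    image_path: Optional[str] = None


@dataclass
class PriorTokens:
    """Hypothetical Stage 0 output / Stage 1 input: the prior token IDs."""
    token_ids: List[int]


def stage0_ar(request: ARRequest) -> PriorTokens:
    # Stub for the 9B AR model: derive fake token IDs from the prompt so the
    # hand-off can be exercised without a GPU. image_path is carried but
    # ignored by this stub.
    token_ids = [ord(ch) % 256 for ch in request.prompt]
    if not token_ids:
        raise ValueError("prompt must not be empty")
    return PriorTokens(token_ids=token_ids)


def stage1_diffusion(prior: PriorTokens, height: int, width: int) -> List[List[int]]:
    # Stub for DiT denoising + VAE decode: build a height x width grid of
    # 8-bit "pixels" that depends only on the prior tokens.
    n = len(prior.token_ids)
    return [
        [prior.token_ids[(y * width + x) % n] for x in range(width)]
        for y in range(height)
    ]


def run_pipeline(request: ARRequest, height: int, width: int) -> List[List[int]]:
    prior = stage0_ar(request)                     # Stage 0: text/image -> prior tokens
    return stage1_diffusion(prior, height, width)  # Stage 1: prior tokens -> image
```

In this sketch Stage 1 consumes only the token IDs and the output size; the real diffusion stage may take additional inputs.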

## Features

- **vLLM-optimized AR**: The AR stage runs on the vLLM engine, using PagedAttention and tensor parallelism to generate prior tokens faster
- **Flexible deployment**: The AR and diffusion stages can be placed on different GPUs
- **Text-to-Image**: Generate images from text descriptions
- **Image-to-Image**: Edit existing images with text prompts

## Usage

### Text-to-Image

```bash
python end2end.py \
--model-path /path/to/glm-image \
  --config-path ../../../vllm_omni/model_executor/stage_configs/glm_image.yaml \
--prompt "A beautiful sunset over the ocean with sailing boats" \
--height 1024 \
--width 1024 \
--output output_t2i.png
```

### Image-to-Image (Image Editing)

```bash
python end2end.py \
--model-path /path/to/glm-image \
  --config-path ../../../vllm_omni/model_executor/stage_configs/glm_image.yaml \
--prompt "Transform this scene into a winter wonderland" \
--image input.png \
--output output_i2i.png
```
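If you drive `end2end.py` from Python (for example, to loop over many prompts), the commands above can be assembled as argument lists. `build_end2end_cmd` and `run` are hypothetical helpers, not part of this example: they only use flags that appear in the commands above, add `--image` for image-to-image, and leave the config path to the caller.

```python
import shlex
import subprocess
from typing import List, Optional


def build_end2end_cmd(
    model_path: str,
    config_path: str,
    prompt: str,
    output: str,
    image: Optional[str] = None,
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> List[str]:
    """Build the end2end.py argv for text-to-image, or image-to-image if `image` is set."""
    cmd = [
        "python", "end2end.py",
        "--model-path", model_path,
        "--config-path", config_path,
        "--prompt", prompt,
    ]
    if image is not None:  # image-to-image: pass the source image
        cmd += ["--image", image]
    if height is not None:
        cmd += ["--height", str(height)]
    if width is not None:
        cmd += ["--width", str(width)]
    cmd += ["--output", output]
    return cmd


def run(cmd: List[str]) -> None:
    print(shlex.join(cmd))           # show the equivalent shell command
    subprocess.run(cmd, check=True)  # raise CalledProcessError on a non-zero exit
```

Passing a list instead of a single string with `shell=True` avoids quoting problems with prompts that contain spaces or quotes.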

### With Custom Parameters

```bash
python end2end.py \
--model-path /path/to/glm-image \
  --config-path ../../../vllm_omni/model_executor/stage_configs/glm_image.yaml \
--prompt "A photorealistic cat sitting on a window sill" \
--height 1024 \
--width 1024 \
--num-inference-steps 50 \
--guidance-scale 1.5 \
--seed 42 \
--output output.png
```
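`--guidance-scale` sets how strongly the diffusion stage follows its conditioning. Assuming Stage 1 applies standard classifier-free guidance (this README does not specify the formula), each denoising step blends an unconditional prediction and a conditional one as `eps = eps_uncond + s * (eps_cond - eps_uncond)`, where `s` is the guidance scale. A minimal sketch on plain lists; `cfg_combine` is a hypothetical name:

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], guidance_scale: float) -> List[float]:
    """Standard classifier-free guidance, element-wise: uncond + s * (cond - uncond)."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same shape")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# s = 1.0 returns the conditional prediction unchanged; s > 1.0 pushes the
# result further away from the unconditional prediction.
print(cfg_combine([0.0, 1.0], [1.0, 3.0], 1.5))  # [1.5, 4.0]
```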

## Shell Scripts

### Run Text-to-Image

```bash
./run_t2i.sh
```

### Run Image-to-Image

```bash
./run_i2i.sh --image /path/to/input.png
```

## Stage Configuration

The stage config (`glm_image.yaml`) defines:

- **Stage 0 (AR)**: uses `GPUARWorker` with the vLLM engine
  - Model: `GlmImageForConditionalGeneration`
  - Output: `token_ids` (prior tokens)
- **Stage 1 (Diffusion)**: uses the diffusion engine
  - Model: `GlmImagePipeline`
  - Output: generated image

See `vllm_omni/model_executor/stage_configs/glm_image.yaml` for full configuration.

## Comparison with Single-Stage

| Aspect      | Single-Stage (transformers)   | Multistage (vLLM)      |
| ----------- | ----------------------------- | ---------------------- |
| AR model    | transformers native           | vLLM PagedAttention    |
| Memory      | Higher (unoptimized KV cache) | Lower (paged KV cache) |
| Throughput  | Lower                         | Higher                 |
| Flexibility | Single GPU                    | Multi-GPU support      |

## Troubleshooting

### OOM Error

Lower the GPU memory fraction for the affected stage in `glm_image.yaml`:

```yaml
gpu_memory_utilization: 0.5 # reduced from 0.6
```
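As a rough check on what this change buys: assuming `gpu_memory_utilization` means, as in vLLM, the fraction of the device's total memory a stage may use, 0.6 on an 80 GB GPU allows about 48 GB and 0.5 about 40 GB. If several stages share one GPU, their fractions together should not exceed 1.0. `stage_budgets_gb` is a hypothetical helper for that arithmetic, and the stage names are illustrative labels, not keys from `glm_image.yaml`.

```python
from typing import Dict


def stage_budgets_gb(total_gb: float, utilization: Dict[str, float]) -> Dict[str, float]:
    """Memory budget (GB) per stage for stages placed on the same GPU."""
    for stage, frac in utilization.items():
        if not 0.0 < frac <= 1.0:
            raise ValueError(f"{stage}: gpu_memory_utilization must be in (0, 1], got {frac}")
    requested = sum(utilization.values())
    if requested > 1.0:
        # Co-located stages would together ask for more than the whole device.
        raise ValueError(f"stages on this GPU request {requested:.2f} of its memory (> 1.0)")
    return {stage: total_gb * frac for stage, frac in utilization.items()}


print(stage_budgets_gb(80, {"ar": 0.5}))  # {'ar': 40.0}
```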

### Slow Initialization

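The first run loads the model weights from disk and can be slow; subsequent runs are faster. If a stage times out during initialization (for example, on slow storage), raise the timeout by adding this flag to the `end2end.py` command:

```bash
--stage-init-timeout 900
```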

## Requirements

- vLLM-Omni with GLM-Image support
- CUDA-capable GPU (recommended: H100/A100 with 80 GB)
- GLM-Image model weights