[Perf] GLM Image #920
Merged: hsliuustc0106 merged 89 commits into vllm-project:main from JaredforReal:perf/glm-image-main on Feb 27, 2026.
New file (diff hunk @@ -0,0 +1,138 @@):

# GLM-Image Multistage End-to-End Inference

This example demonstrates how to run GLM-Image with the vLLM-Omni multistage architecture.

## Architecture

GLM-Image uses a 2-stage pipeline:

```
┌─────────────────────────────────────────────────────────────┐
│                     GLM-Image Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Stage 0 (AR Model)             Stage 1 (Diffusion)         │
│  ┌─────────────────┐            ┌─────────────────────┐     │
│  │ vLLM-optimized  │            │  GlmImagePipeline   │     │
│  │ GlmImageFor     │   prior    │  ┌───────────────┐  │     │
│  │ Conditional     │──tokens───►│  │ DiT Denoiser  │  │     │
│  │ Generation      │            │  └───────────────┘  │     │
│  │ (9B AR model)   │            │          │          │     │
│  └─────────────────┘            │          ▼          │     │
│           ▲                     │  ┌───────────────┐  │     │
│           │                     │  │  VAE Decode   │──┼──► Image
│      Text/Image                 │  └───────────────┘  │     │
│         Input                   └─────────────────────┘     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
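The hand-off between the two stages can be pictured with the minimal sketch below. It is not vLLM-Omni code: `PriorTokens`, `ar_stage`, and `diffusion_stage` are hypothetical stand-ins, the AR stage fakes its token IDs, and the diffusion stage only records what it would condition on. It shows the contract the diagram implies: Stage 1 consumes the prior token IDs that Stage 0 emits, and the two stages communicate through a queue, so each side can run independently.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class PriorTokens:
    """Hypothetical message passed from Stage 0 (AR) to Stage 1 (diffusion)."""
    request_id: int
    token_ids: list[int]
    height: int
    width: int


def ar_stage(prompts, out_q):
    """Stand-in for Stage 0: emits fake prior token IDs for each prompt."""
    for i, prompt in enumerate(prompts):
        # A real AR model decodes prior tokens autoregressively; we fake them here.
        token_ids = [len(word) for word in prompt.split()]
        out_q.put(PriorTokens(i, token_ids, height=1024, width=1024))
    out_q.put(None)  # sentinel: no more requests


def diffusion_stage(in_q, results):
    """Stand-in for Stage 1: consumes prior tokens and records a description."""
    while (msg := in_q.get()) is not None:
        # A real pipeline would run DiT denoising conditioned on msg.token_ids,
        # then VAE-decode the latents into pixels.
        results[msg.request_id] = (
            f"{msg.width}x{msg.height} image from {len(msg.token_ids)} prior tokens"
        )


handoff = queue.Queue()
results = {}
consumer = threading.Thread(target=diffusion_stage, args=(handoff, results))
consumer.start()
ar_stage(["A beautiful sunset over the ocean", "A cat"], handoff)
consumer.join()
print(results)
```

The sentinel `None` tells the consumer to stop; in a real deployment the stages would be separate processes, possibly on different GPUs, rather than threads.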

## Features

- **vLLM-optimized AR**: Uses PagedAttention and tensor parallelism for faster prior token generation
- **Flexible deployment**: AR and Diffusion stages can run on different GPUs
- **Text-to-Image**: Generate images from text descriptions
- **Image-to-Image**: Edit existing images with text prompts

## Usage

### Text-to-Image

```bash
python end2end.py \
  --model-path /path/to/glm-image \
  --config-path ../../vllm_omni/model_executor/stage_configs/glm_image.yaml \
  --prompt "A beautiful sunset over the ocean with sailing boats" \
  --height 1024 \
  --width 1024 \
  --output output_t2i.png
```

### Image-to-Image (Image Editing)

```bash
python end2end.py \
  --model-path /path/to/glm-image \
  --config-path ../../vllm_omni/model_executor/stage_configs/glm_image.yaml \
  --prompt "Transform this scene into a winter wonderland" \
  --image input.png \
  --output output_i2i.png
```
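If you drive `end2end.py` from another Python script, the two invocations above differ mainly in whether `--image` is passed. The helper below, `build_end2end_cmd`, is a hypothetical convenience wrapper (not part of the repository) that assembles the argument list from the flags documented in this README; it does not validate values and makes no assumptions about the script's defaults.

```python
import shlex
import subprocess
from pathlib import Path


def build_end2end_cmd(model_path, config_path, prompt, output,
                      image=None, height=None, width=None):
    """Assemble an end2end.py argv using only the flags shown in this README.

    Passing `image` selects image-to-image (editing); otherwise text-to-image.
    Height and width are added only when both are given.
    """
    cmd = [
        "python", "end2end.py",
        "--model-path", str(model_path),
        "--config-path", str(config_path),
        "--prompt", prompt,
    ]
    if image is not None:
        cmd += ["--image", str(image)]
    if height is not None and width is not None:
        cmd += ["--height", str(height), "--width", str(width)]
    cmd += ["--output", str(output)]
    return cmd


config = Path("../../vllm_omni/model_executor/stage_configs/glm_image.yaml")
t2i = build_end2end_cmd("/path/to/glm-image", config,
                        "A beautiful sunset over the ocean with sailing boats",
                        "output_t2i.png", height=1024, width=1024)
i2i = build_end2end_cmd("/path/to/glm-image", config,
                        "Transform this scene into a winter wonderland",
                        "output_i2i.png", image="input.png")
print(shlex.join(i2i))
# subprocess.run(t2i, check=True)  # uncomment to actually launch a run
```

Passing the argument list to `subprocess.run` instead of a single shell string avoids quoting problems with prompts that contain spaces or quotes.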

### With Custom Parameters

```bash
python end2end.py \
  --model-path /path/to/glm-image \
  --config-path ../../vllm_omni/model_executor/stage_configs/glm_image.yaml \
  --prompt "A photorealistic cat sitting on a window sill" \
  --height 1024 \
  --width 1024 \
  --num-inference-steps 50 \
  --guidance-scale 1.5 \
  --seed 42 \
  --output output.png
```
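To compare settings, one option is to hold `--seed` and `--num-inference-steps` fixed and vary a single parameter such as `--guidance-scale`, so that differences between outputs come from that parameter rather than from sampling noise. This assumes `end2end.py` derives all of its randomness from `--seed`. The hypothetical `guidance_sweep` helper below only generates the command lines; the guidance values are arbitrary examples, not recommendations.

```python
import shlex


def guidance_sweep(prompt, scales, seed=42, steps=50, size=1024):
    """Yield one end2end.py argv per guidance scale, holding seed and steps fixed."""
    for scale in scales:
        yield [
            "python", "end2end.py",
            "--model-path", "/path/to/glm-image",
            "--config-path", "../../vllm_omni/model_executor/stage_configs/glm_image.yaml",
            "--prompt", prompt,
            "--height", str(size),
            "--width", str(size),
            "--num-inference-steps", str(steps),
            "--guidance-scale", str(scale),
            "--seed", str(seed),
            # Encode the varied parameter in the filename so runs don't overwrite each other.
            "--output", f"output_gs{scale}.png",
        ]


cmds = list(guidance_sweep("A photorealistic cat sitting on a window sill", [1.0, 1.5, 2.0]))
for cmd in cmds:
    print(shlex.join(cmd))
```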

## Shell Scripts

### Run Text-to-Image

```bash
./run_t2i.sh
```

### Run Image-to-Image

```bash
./run_i2i.sh --image /path/to/input.png
```

## Stage Configuration

The stage config (`glm_image.yaml`) defines:

- **Stage 0 (AR)**: Uses `GPUARWorker` with the vLLM engine
  - Model: `GlmImageForConditionalGeneration`
  - Output: `token_ids` (prior tokens)
- **Stage 1 (Diffusion)**: Uses the diffusion engine
  - Model: `GlmImagePipeline`
  - Output: Generated image

See `vllm_omni/model_executor/stage_configs/glm_image.yaml` for the full configuration.
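The two-stage layout described above can be summarized as data. The sketch below is a hypothetical in-memory model, not the schema of `glm_image.yaml`: `StageSpec`, its fields, and `check_chain` are illustrative names. It encodes the stages as listed and checks the one invariant the description implies, namely that Stage 1 consumes the `token_ids` that Stage 0 produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """Illustrative description of one pipeline stage (not the real YAML schema)."""
    stage_id: int
    worker: str     # execution backend for the stage
    model: str
    consumes: str   # what the stage reads (from the user or the previous stage)
    produces: str   # what the stage hands on (to the next stage or the caller)


GLM_IMAGE_STAGES = [
    StageSpec(0, "GPUARWorker (vLLM engine)", "GlmImageForConditionalGeneration",
              consumes="text/image input", produces="token_ids"),
    StageSpec(1, "diffusion engine", "GlmImagePipeline",
              consumes="token_ids", produces="image"),
]


def check_chain(stages):
    """Raise if a stage does not consume what the previous stage produces."""
    for prev, nxt in zip(stages, stages[1:]):
        if prev.produces != nxt.consumes:
            raise ValueError(
                f"stage {prev.stage_id} -> {nxt.stage_id}: "
                f"{prev.produces!r} != {nxt.consumes!r}"
            )
    return stages[-1].produces


print(check_chain(GLM_IMAGE_STAGES))  # image
```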

## Comparison with Single-Stage

| Aspect      | Single-Stage (transformers)       | Multistage (vLLM)      |
| ----------- | --------------------------------- | ---------------------- |
| AR Model    | transformers native               | vLLM PagedAttention    |
| Memory      | Higher (no KV cache optimization) | Lower (paged KV cache) |
| Throughput  | Lower                             | Higher                 |
| Flexibility | Single GPU                        | Multi-GPU support      |
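One common source of the memory difference is how KV-cache memory is reserved. A scheme that preallocates a contiguous buffer of the maximum sequence length for every request wastes the unused tail, whereas PagedAttention allocates fixed-size blocks on demand, so the waste per sequence is at most one partially filled block. The sketch below compares those two reservation strategies by counting token slots; it does not model the transformers cache specifically, and the block size of 16, the 4096-token maximum, and the request lengths are assumptions chosen for illustration.

```python
import math


def contiguous_slots(seq_lens, max_len):
    """Token slots reserved when every request preallocates max_len."""
    return len(seq_lens) * max_len


def paged_slots(seq_lens, block_size=16):
    """Token slots reserved when KV memory is allocated in fixed-size blocks."""
    return sum(math.ceil(n / block_size) * block_size for n in seq_lens)


# Hypothetical batch: four requests of different lengths, 4096-token maximum.
lens = [300, 1100, 2050, 700]
print(contiguous_slots(lens, 4096), paged_slots(lens))  # 16384 4176
```

Here the four requests actually use 4150 slots, so block allocation wastes only 26 slots versus 12234 for the preallocated layout; real savings depend on the workload.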

## Troubleshooting

### OOM Error

Try reducing memory usage by lowering `gpu_memory_utilization`:

```yaml
# In glm_image.yaml, adjust:
gpu_memory_utilization: 0.5 # Reduce from 0.6
```
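In vLLM, `gpu_memory_utilization` is the fraction of total GPU memory the engine may claim for model weights, activations, and KV cache, so lowering it shrinks the KV-cache headroom and leaves more memory for anything else on the same device (for example, the diffusion stage, if both stages share a GPU). The rough estimate below assumes an 80 GB GPU, a 9B-parameter AR model stored in bf16 (2 bytes per parameter), and a fixed 4 GB activation allowance; `kv_headroom_gb` is a hypothetical helper and its numbers are illustrative, not measurements.

```python
def kv_headroom_gb(total_gb, utilization, n_params_b, bytes_per_param=2, activation_gb=4.0):
    """Rough GB left for the KV cache inside the vLLM budget (illustrative only)."""
    budget = total_gb * utilization
    # 9e9 params x 2 bytes = 18e9 bytes, i.e. about 18 GB (decimal) of weights.
    weights = n_params_b * bytes_per_param
    return budget - weights - activation_gb


for util in (0.6, 0.5):
    outside_vllm = 80 * (1 - util)  # memory the engine will not claim
    print(f"util={util}: KV headroom ~{kv_headroom_gb(80, util, 9):.1f} GB, "
          f"outside vLLM ~{outside_vllm:.1f} GB")
```

If the headroom estimate approaches zero, the engine has little room for KV cache, so a lower setting trades AR-stage capacity for memory elsewhere.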

### Slow Initialization

The first run loads model weights and can be slow; subsequent runs are faster. On slow storage, increase the stage initialization timeout:

```bash
--stage-init-timeout 900 # Increase timeout for slow storage
```

## Requirements

- vLLM-Omni with GLM-Image support
- CUDA-capable GPU (recommended: H100/A100 with 80 GB)
- GLM-Image model weights