[Feat] support TP for GLM-Image #1918
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Does the TP work for AR, DiT, or both?
BTW, I remember the AR part is not running as fast as expected even at TP=1. Can you help debug it? cc @JaredforReal
GLM-Image AR likely supports TP via vLLM. I only considered DiT, and the results are also for DiT only.
@RuixiangMa GLM-Image is not supported in the vLLM main repo, so we need to work on vllm-omni to speed up the AR part.
hsliuustc0106 left a comment
Blocker Scan
| Category | Result |
|---|---|
| Correctness | ✅ PASS |
| Reliability/Safety | ✅ PASS |
| Breaking Changes | ✅ PASS |
| Test Coverage | ⚠️ CONCERN |
| Documentation | ✅ PASS |
| Security | ✅ PASS |
OVERALL: 1 CONCERN FOUND
⚠️ Missing Test Coverage
This PR adds tensor parallelism support for GLM-Image but includes no test files. Other models with TP support have dedicated tests:
- `tests/diffusion/models/flux2/test_flux2_transformer_tp.py`
- `tests/diffusion/models/z_image/test_zimage_tp_constraints.py`
Consider adding a test to verify:
- TP constraints validation (similar to `test_zimage_tp_constraints.py`)
- Basic TP inference correctness
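As a rough illustration of what a TP constraints test could look like, here is a minimal, self-contained sketch. It assumes the usual tensor-parallel requirement that sharded dimensions divide evenly by the TP size. The helper `validate_tp_constraints` and its parameters are hypothetical and are not taken from this PR or from vllm-omni, and the actual constraints for GLM-Image may differ.

```python
# Hypothetical sketch: none of these names come from the PR or from vllm-omni.
def validate_tp_constraints(num_attention_heads: int,
                            ffn_hidden_size: int,
                            tp_size: int) -> None:
    """Raise ValueError if the given dimensions cannot be split evenly across ranks."""
    if tp_size < 1:
        raise ValueError(f"tp_size must be >= 1, got {tp_size}")
    if num_attention_heads % tp_size != 0:
        raise ValueError(
            f"num_attention_heads={num_attention_heads} is not divisible by tp_size={tp_size}"
        )
    if ffn_hidden_size % tp_size != 0:
        raise ValueError(
            f"ffn_hidden_size={ffn_hidden_size} is not divisible by tp_size={tp_size}"
        )


def rejects(num_attention_heads: int, ffn_hidden_size: int, tp_size: int) -> bool:
    """Return True if validation fails for the given (assumed) configuration."""
    try:
        validate_tp_constraints(num_attention_heads, ffn_hidden_size, tp_size)
    except ValueError:
        return True
    return False


def test_valid_tp_sizes_are_accepted() -> None:
    # Illustrative dimensions only, not GLM-Image's real configuration.
    for tp_size in (1, 2, 4):
        assert not rejects(32, 16384, tp_size)


def test_indivisible_heads_are_rejected() -> None:
    # 30 heads cannot be sharded evenly over 4 ranks.
    assert rejects(30, 16384, 4)


if __name__ == "__main__":
    test_valid_tp_sizes_are_accepted()
    test_indivisible_heads_are_rejected()
    print("ok")
```

A real test would read the dimensions from the model config and would likely pair this with a small end-to-end check that compares TP=1 and TP>1 outputs.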
💡 Performance Comparison - Minor Gap
Per the performance comparison checklist for [Feature] PRs affecting performance, the PR provides:
- ✅ Latency comparison (TP=1: 46.8s → TP=4: 32.2s)
- ✅ VRAM comparison (TP=1: 23.1GiB → TP=4: 14.8GiB)
- ✅ Output consistency (same image across TP sizes)
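To put the reported numbers in context, the short sketch below derives the speedup, the scaling efficiency, and the relative VRAM reduction from the figures quoted above. The function name `tp_scaling_summary` is hypothetical, and the calculation simply compares the two reported VRAM values without assuming whether they are per-GPU or aggregate.

```python
def tp_scaling_summary(base_latency_s: float, tp_latency_s: float,
                       base_vram_gib: float, tp_vram_gib: float,
                       tp_size: int) -> dict:
    """Summarize TP scaling from two benchmark runs (hypothetical helper)."""
    speedup = base_latency_s / tp_latency_s
    return {
        "speedup": speedup,
        # Fraction of ideal linear scaling achieved at this TP size.
        "efficiency": speedup / tp_size,
        # Relative drop in the reported VRAM figure.
        "vram_reduction": 1.0 - tp_vram_gib / base_vram_gib,
    }


# Figures quoted in the review: TP=1 vs TP=4.
summary = tp_scaling_summary(46.8, 32.2, 23.1, 14.8, tp_size=4)
print(f"speedup: {summary['speedup']:.2f}x")                # speedup: 1.45x
print(f"efficiency: {summary['efficiency']:.0%}")           # efficiency: 36%
print(f"VRAM reduction: {summary['vram_reduction']:.0%}")   # VRAM reduction: 36%
```

A speedup of about 1.45x on four GPUs is well below linear scaling, which is one reason the GPU model, interconnect, and exact commands are worth recording alongside the numbers.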
For completeness, consider documenting:
- GPU model and CUDA version used for benchmarks
- Exact commands used
VERDICT: COMMENT (non-blocking - tests recommended but implementation looks correct)
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com> Signed-off-by: Lancer <402430575@qq.com>
Signed-off-by: Lancer <402430575@qq.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com> Signed-off-by: Lancer <402430575@qq.com> Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Purpose
Close #911.
Enable tensor parallelism (TP) support for the GLM-Image diffusion model.
Test Plan
Test Result