[Feat] support quantization for GLM-IMAGE #2292
hsliuustc0106 merged 8 commits into vllm-project:main
Conversation
Signed-off-by: Lancer <maruixiang6688@gmail.com>
david6666666
left a comment
The corresponding doc files (fp8.md, etc.) also need to be updated.
    cfg.engine_args.lora_scale = lora_scale
    # Prefer explicit quantization_config; fallback to legacy --quantization.
    quantization_config = kwargs.get("quantization_config")
    if quantization_config is None:
Do we need to add this logic?
AsyncOmniEngine._resolve_stage_configs() follows the multi-stage config path, where top-level quantization kwargs are not automatically propagated to each diffusion stage's engine_args.
It is not strictly required if we decide to enforce quantization_config only, but without it, the legacy --quantization flag can be silently ineffective in the diffusion stages.
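To make the fallback concrete, here is a minimal sketch of the behavior being discussed: if no explicit quantization_config is provided, fall back to the legacy --quantization value so it still reaches a stage's engine_args. The helper name and the plain-dict shapes of kwargs and stage_engine_args are illustrative assumptions, not the actual vllm-omni structures.

```python
def resolve_stage_quantization(kwargs: dict, stage_engine_args: dict) -> dict:
    """Hypothetical sketch: propagate quantization settings to one stage.

    Prefers an explicit quantization_config; falls back to the legacy
    --quantization flag (a plain method string such as "fp8") so it is
    not silently dropped on the multi-stage config path.
    """
    quantization_config = kwargs.get("quantization_config")
    if quantization_config is None:
        # Legacy flag: a bare method string.
        quantization_config = kwargs.get("quantization")
    if quantization_config is not None:
        stage_engine_args["quantization_config"] = quantization_config
    return stage_engine_args
```

With this in place, both `quantization_config={"method": "fp8"}` and the legacy `quantization="fp8"` end up on the stage's engine_args.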
Signed-off-by: Lancer <maruixiang6688@gmail.com>
lishunyang12
left a comment
Left a couple of comments.
    # Load transformer (DiT)
    logger.info("Loading GlmImageTransformer2DModel (DiT)...")
    self.transformer = GlmImageTransformer2DModel(od_config=od_config)
    logger.info("GLM diffusion quantization_config: %s", od_config.quantization_config)
This looks like a debug leftover. Either drop it or downgrade to logger.debug.
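For reference, a downgrade to debug level could look like the sketch below. The wrapper function is purely illustrative; the point is that the message only appears when debug logging is enabled, instead of on every model load.

```python
import logging

logger = logging.getLogger(__name__)

def log_quant_config(quantization_config) -> None:
    # debug instead of info: the config is only printed when debug
    # logging is enabled, so it no longer spams normal startup logs.
    logger.debug("GLM diffusion quantization_config: %s", quantization_config)
```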
    # Normalize legacy string value to dict-like quantization config.
    if isinstance(quantization_config, str):
        quantization_config = {"method": quantization_config}
OmniDiffusionConfig.__post_init__ and from_kwargs already handle the "quantization" -> "quantization_config" mapping and the str -> QuantizationConfig conversion. This block duplicates that logic at the engine level.
Also, wrapping the string in {"method": quantization_config} is an unnecessary indirection -- cfg.engine_args.quantization_config accepts a plain string and build_quant_config handles it. I would drop this entire addition and keep the existing one-liner.
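To illustrate why the engine-level block is redundant, here is a rough sketch of the kind of normalization the comment attributes to OmniDiffusionConfig: the legacy "quantization" key is mapped to "quantization_config", and a plain string is converted to a structured config. The class and field names below are simplified stand-ins, not the actual vllm-omni source.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class QuantConfig:
    """Stand-in for a structured quantization config (e.g. method="fp8")."""
    method: str

@dataclass
class DiffusionConfig:
    quantization_config: Optional[Any] = None

    def __post_init__(self) -> None:
        # str -> structured config, mirroring what build_quant_config
        # is said to handle, so callers can pass a plain "fp8".
        if isinstance(self.quantization_config, str):
            self.quantization_config = QuantConfig(method=self.quantization_config)

    @classmethod
    def from_kwargs(cls, **kwargs):
        # Legacy "quantization" key maps onto "quantization_config".
        if "quantization" in kwargs and "quantization_config" not in kwargs:
            kwargs["quantization_config"] = kwargs.pop("quantization")
        return cls(**kwargs)
```

Since both entry points already normalize the value, re-wrapping the string as `{"method": ...}` at the engine level adds nothing.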
We are collecting quantitative results for layerwise precision degradation. If you want to merge this PR, please refer to #1470.
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Quantization is quite important for optimizing GLM-IMAGE performance. Please check #1470 for reference.
Sorry I missed the comment @lishunyang12, I will add the related tests.
BTW, will we support quantization for the AR parts?
Purpose
Test Plan
Test Result
Quantization Quality Benchmark for GPU
1024×1024, 50 steps
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation edits to ./docs.