
[Feat] support quantization for GLM-IMAGE #2292

Merged
hsliuustc0106 merged 8 commits into vllm-project:main from RuixiangMa:glmquantization
Apr 24, 2026

Conversation

Contributor

@RuixiangMa commented Mar 28, 2026


Purpose

Test Plan

Test Result

| Metric | No quantization | Quantization |
| --- | --- | --- |
| Image | glm_image_t2i_output | glm_image_t2i_output |

Quantization Quality Benchmark for GPU

1024×1024 resolution, 50 steps

| Config | Avg Time | Speedup | Memory (GiB) | Mem Reduction | Mean LPIPS |
| --- | --- | --- | --- | --- | --- |
| BF16 baseline | 46.847 s | - | 22.66 | - | (ref) |
| FP8 | 40.155 s | 1.17x | 17.05 | 25.02% | 0.0911 |
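For context, Mean LPIPS measures perceptual difference from the BF16 reference output (lower means the quantized images are closer to baseline). The benchmark harness itself is not part of this PR, so the snippet below is only a hedged sketch of how such a column could be computed, assuming the `lpips` PyPI package and image tensors scaled to [-1, 1]:

```python
# Illustrative only: computing a Mean LPIPS column from paired outputs.
# Assumes the `lpips` PyPI package; this is not the actual test harness.
import lpips
import torch

loss_fn = lpips.LPIPS(net="alex")  # standard AlexNet backbone

def mean_lpips(ref_images: torch.Tensor, test_images: torch.Tensor) -> float:
    """Both inputs: (N, 3, H, W) tensors scaled to [-1, 1]."""
    with torch.no_grad():
        scores = loss_fn(ref_images, test_images)  # per-image distances
    return scores.mean().item()
```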

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and test commands, and state the reason if your code doesn't require additional test scripts. For test file guidelines, see the test style doc.
  • The test results. Please paste the before/after results comparison, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Collaborator

@david6666666 left a comment


The corresponding fp8.md and related files in the docs also need to be updated.

```python
cfg.engine_args.lora_scale = lora_scale
# Prefer explicit quantization_config; fallback to legacy --quantization.
quantization_config = kwargs.get("quantization_config")
if quantization_config is None:
```
Collaborator


Do we need to add this logic?

Contributor Author

@RuixiangMa commented Mar 30, 2026


AsyncOmniEngine._resolve_stage_configs() follows the multi-stage config path, where top-level quantization kwargs are not automatically propagated to each diffusion stage's engine_args.

It is not strictly required if we decide to enforce quantization_config only, but without it the legacy --quantization flag can be silently ineffective in the diffusion stage.
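To make the trade-off concrete, here is a minimal sketch of the fallback being discussed. The `cfg` and `kwargs` names follow the diff hunk above; the legacy `"quantization"` key name is an assumption, not confirmed API:

```python
# Sketch of the fallback path (the legacy key name is assumed).
quantization_config = kwargs.get("quantization_config")
if quantization_config is None:
    # Fall back to the legacy --quantization value so it is not
    # silently dropped on the multi-stage diffusion path.
    quantization_config = kwargs.get("quantization")
if quantization_config is not None:
    # _resolve_stage_configs() does not propagate top-level kwargs,
    # so copy the value into each stage's engine_args explicitly.
    cfg.engine_args.quantization_config = quantization_config
```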

Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Collaborator

@lishunyang12 left a comment


Left a couple of comments.

```python
# Load transformer (DiT)
logger.info("Loading GlmImageTransformer2DModel (DiT)...")
self.transformer = GlmImageTransformer2DModel(od_config=od_config)
logger.info("GLM diffusion quantization_config: %s", od_config.quantization_config)
```
Collaborator


This looks like a debug leftover. Either drop it or downgrade to logger.debug.

Contributor Author


Yes, removed it.

Comment thread on vllm_omni/engine/async_omni_engine.py (outdated):

```python
# Normalize legacy string value to dict-like quantization config.
if isinstance(quantization_config, str):
    quantization_config = {"method": quantization_config}
```
Collaborator


OmniDiffusionConfig.__post_init__ and from_kwargs already handle the "quantization" -> "quantization_config" mapping and the str -> QuantizationConfig conversion, so this block duplicates that logic at the engine level.

Also, wrapping the string in {"method": quantization_config} is an unnecessary indirection: cfg.engine_args.quantization_config accepts a plain string, and build_quant_config handles it. I would drop this entire addition and keep the existing one-liner.
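For illustration, the "existing one-liner" the reviewer prefers would simply pass the value through and let `build_quant_config` normalize it later. The exact shape below is an assumption based on this thread, not the actual diff:

```python
# Assumed shape of the preferred one-liner: no {"method": ...} wrapping,
# since a plain string is accepted and build_quant_config normalizes it.
cfg.engine_args.quantization_config = quantization_config
```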

Contributor Author


Thanks, done.

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@lishunyang12
Collaborator

We are collecting quantitative results for layerwise precision degradation. If you want to merge this PR, please refer to #1470.

@lishunyang12 added the quantization label Apr 15, 2026
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
@hsliuustc0106
Collaborator

Quantization is quite important for optimizing GLM-Image performance. Please check #1470 for a reference.

@RuixiangMa
Contributor Author

> Quantization is quite important for optimizing GLM-Image performance. Please check #1470 for a reference.

Sorry, I missed the comment, @lishunyang12. I will add the related tests.

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@david6666666 added and removed the ready label (trigger Buildkite CI) Apr 23, 2026
Collaborator

@hsliuustc0106 left a comment


lgtm

@hsliuustc0106 merged commit 345504b into vllm-project:main Apr 24, 2026
8 checks passed
@hsliuustc0106
Collaborator

By the way, will we support quantization for the AR parts?
