[Feat] support quantization for GLM-IMAGE #2292
hsliuustc0106 merged 8 commits into vllm-project:main
Conversation
Signed-off-by: Lancer <maruixiang6688@gmail.com>
david6666666
left a comment
The corresponding doc files (fp8.md, etc.) also need to be updated.
    cfg.engine_args.lora_scale = lora_scale
    # Prefer explicit quantization_config; fallback to legacy --quantization.
    quantization_config = kwargs.get("quantization_config")
    if quantization_config is None:
Do we need to add this logic?
AsyncOmniEngine._resolve_stage_configs() follows the multi-stage config path, where top-level quantization kwargs are not automatically propagated to each diffusion stage's engine_args.
It is not strictly required if we decide to enforce quantization_config only, but without it, the legacy --quantization flag can be silently ineffective in the diffusion stages.
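To make the fallback concrete, here is a minimal sketch of the behavior being discussed: if no explicit quantization_config is provided, fall back to the legacy --quantization value so it still reaches a stage's engine_args. The helper name and the plain-dict shapes of kwargs and stage_engine_args are illustrative assumptions, not the actual vllm-omni structures.

```python
def resolve_stage_quantization(kwargs: dict, stage_engine_args: dict) -> dict:
    """Hypothetical sketch: propagate quantization settings to one stage.

    Prefers an explicit quantization_config; falls back to the legacy
    --quantization flag (a plain method string such as "fp8") so it is
    not silently dropped on the multi-stage config path.
    """
    quantization_config = kwargs.get("quantization_config")
    if quantization_config is None:
        # Legacy flag: a bare method string.
        quantization_config = kwargs.get("quantization")
    if quantization_config is not None:
        stage_engine_args["quantization_config"] = quantization_config
    return stage_engine_args
```

With this in place, both `quantization_config={"method": "fp8"}` and the legacy `quantization="fp8"` end up on the stage's engine_args.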
Signed-off-by: Lancer <maruixiang6688@gmail.com>
lishunyang12
left a comment
Left a couple of comments.
    # Load transformer (DiT)
    logger.info("Loading GlmImageTransformer2DModel (DiT)...")
    self.transformer = GlmImageTransformer2DModel(od_config=od_config)
    logger.info("GLM diffusion quantization_config: %s", od_config.quantization_config)
This looks like a debug leftover. Either drop it or downgrade to logger.debug.
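For reference, a downgrade to debug level could look like the sketch below. The wrapper function is purely illustrative; the point is that the message only appears when debug logging is enabled, instead of on every model load.

```python
import logging

logger = logging.getLogger(__name__)

def log_quant_config(quantization_config) -> None:
    # debug instead of info: the config is only printed when debug
    # logging is enabled, so it no longer spams normal startup logs.
    logger.debug("GLM diffusion quantization_config: %s", quantization_config)
```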
    # Normalize legacy string value to dict-like quantization config.
    if isinstance(quantization_config, str):
        quantization_config = {"method": quantization_config}
OmniDiffusionConfig.__post_init__ and from_kwargs already handle the "quantization" -> "quantization_config" mapping and the str -> QuantizationConfig conversion. This block duplicates that logic at the engine level.
Also, wrapping the string in {"method": quantization_config} is an unnecessary indirection -- cfg.engine_args.quantization_config accepts a plain string and build_quant_config handles it. I would drop this entire addition and keep the existing one-liner.
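To illustrate why the engine-level block is redundant, here is a rough sketch of the kind of normalization the comment attributes to OmniDiffusionConfig: the legacy "quantization" key is mapped to "quantization_config", and a plain string is converted to a structured config. The class and field names below are simplified stand-ins, not the actual vllm-omni source.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class QuantConfig:
    """Stand-in for a structured quantization config (e.g. method="fp8")."""
    method: str

@dataclass
class DiffusionConfig:
    quantization_config: Optional[Any] = None

    def __post_init__(self) -> None:
        # str -> structured config, mirroring what build_quant_config
        # is said to handle, so callers can pass a plain "fp8".
        if isinstance(self.quantization_config, str):
            self.quantization_config = QuantConfig(method=self.quantization_config)

    @classmethod
    def from_kwargs(cls, **kwargs):
        # Legacy "quantization" key maps onto "quantization_config".
        if "quantization" in kwargs and "quantization_config" not in kwargs:
            kwargs["quantization_config"] = kwargs.pop("quantization")
        return cls(**kwargs)
```

Since both entry points already normalize the value, re-wrapping the string as `{"method": ...}` at the engine level adds nothing.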
We are collecting quantitative results for layerwise precision degradation. If you want to merge this PR, please refer to #1470.
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Quantization is quite important for optimizing GLM-IMAGE performance. Please check #1470 for reference.
Sorry I missed the comment @lishunyang12, I will add the related tests.
BTW, will we support quantization for the AR parts?
Purpose
Test Plan
Test Result
Quantization Quality Benchmark for GPU
1024×1024, 50 steps
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation edits to ./docs.