[Diffusion] [AMD] Online MXFP4 and FP8 Quantization for Multimodal Generation #21431
Co-authored-by: Bowen Bao <bowenbao@amd.com>
mickqian left a comment:
Could you also mention this new server arg in cli.md, quantization.md, or other related places?
Added documentation in
…fig fix for zimage, and mxfp4 perf improvements
@mickqian Friendly ping for review, all comments addressed, hoping to land this PR soon!
Definitely, I'm happy either way as long as the functionality lands! I originally didn't notice this PR before I had already created a new one.
## Online Quantization

Online quantization applies quantization to unquantized models at load time. This is useful for when pre-quantized checkpoints are not available.
nit: add (on-the-fly / load-time quantization) as well
/tag-and-rerun-ci
@ColinZ22 please fix lint checks
Fixed, @mickqian @wisclmy0611 Re-review would be greatly appreciated! Hoping to land this PR soon.
@amd-bot ci-status
Motivation
Adds online MXFP4 (for AMD GPUs) and FP8 quantization for multimodal (image and video) generation with models such as Z-Image-Turbo and Wan 2.2.
Modifications
- `--quantization` server argument that loads an unquantized model and quantizes its weights and activations to MXFP4.
- `--quantization-ignored-layers` server argument that skips certain layers during online quantization (keeping them in full precision).
- `Mxfp4Config` and `Mxfp4LinearMethod` classes that use AITER dynamic MXFP4 quantization and MXFP4 GEMM kernels.
- `--quantization`.
Usage Example
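For intuition about what online MXFP4 quantization with an ignore list does to a model's weights, here is a minimal pure-Python sketch. It is not the PR's implementation, which relies on AITER kernels on AMD GPUs and also quantizes activations dynamically. The helper names `quantize_block_mxfp4`, `should_skip`, and `quantize_weights_online`, the layer names, and the substring matching rule for ignored layers are all assumptions made for illustration. The numerics follow the OCP Microscaling convention: 32-element blocks share one power-of-two scale, and each element is rounded to an FP4 E2M1 value.

```python
import math

# Magnitudes representable in FP4 E2M1 (sign is stored separately).
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32   # elements sharing one scale in the MX formats
E2M1_EMAX = 2     # exponent of the largest power of two in E2M1 (4 = 2**2)


def quantize_block_mxfp4(block):
    """Fake-quantize one block; returns (shared_exponent, dequantized values)."""
    amax = max((abs(v) for v in block), default=0.0)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # floor(log2(amax)) computed exactly via frexp, then shifted by E2M1_EMAX.
    shared_exp = (math.frexp(amax)[1] - 1) - E2M1_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # E8M0 scale range
    scale = 2.0 ** shared_exp
    out = []
    for v in block:
        # Saturate to the largest FP4 magnitude, then round to nearest value.
        mag = min(abs(v) / scale, FP4_E2M1_VALUES[-1])
        nearest = min(FP4_E2M1_VALUES, key=lambda q: abs(q - mag))
        out.append(math.copysign(nearest, v) * scale)
    return shared_exp, out


def should_skip(layer_name, ignored_layers):
    # Assumption: substring match. The real matching rule may differ.
    return any(pattern in layer_name for pattern in ignored_layers)


def quantize_weights_online(weights, ignored_layers=()):
    """weights maps a layer name to a flat list of floats; returns a new dict."""
    result = {}
    for name, values in weights.items():
        if should_skip(name, ignored_layers):
            result[name] = list(values)  # kept in full precision
            continue
        dequantized = []
        for start in range(0, len(values), BLOCK_SIZE):
            _, block = quantize_block_mxfp4(values[start:start + BLOCK_SIZE])
            dequantized.extend(block)
        result[name] = dequantized
    return result


# Hypothetical layer names and weights, for illustration only.
weights = {
    "blocks.0.attn.to_q": [0.11, -0.42, 0.9, 1.7],
    "final_layer.proj": [0.11, -0.42, 0.9, 1.7],
}
quantized = quantize_weights_online(weights, ignored_layers=("final_layer",))
print(quantized)
```

The sketch returns dequantized floats so the rounding error is easy to inspect; a real kernel would instead store packed 4-bit codes plus one 8-bit scale per block.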
To quantize a diffusion model online to FP8 or MXFP4, add the `--quantization` argument.
Generation Quality Comparison
Prompt 1: "A cat sitting at the top of a mountain looking down at a futuristic city"
Prompt 2: "A crowd of people of various age at a busy outdoor marketplace"
Prompt 3: "A young child blowing dandelion seeds, golden hour lighting"
Prompt 4: "A city street at sunset with snow-capped mountain in the distant background"
Performance Benchmarking
Model: Z-Image-Turbo
Dataset: 200 images from HuggingFace Parti-Prompts
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci