[Quantization] Enable FP8 online quantization for Qwen-image-edit text encoder#3484
xRay2016 wants to merge 1 commit into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b2b8d2fe73
|
LGTM. This enables per-component quantization for diffusion models, which is a valuable capability. The parameter approach is clean and lets per-component quantization config entries work correctly. The weight mapping from HF keys to the vLLM structure is also well done. |
|
@fhfuih PTAL |
b2b8d2f to 839f2ce
|
@hsliuustc0106 @fhfuih Hi, do you have any remaining comments on this PR? If not, could you please approve it so we can merge? Thanks! |
|
Please show a visual comparison. |
Hi @lishunyang12, here is a visual comparison. Both images use the same input image, prompt, seed, resolution, inference steps, and sampling settings. I don't observe any obvious visual degradation in the quantized output in these examples.
BF16 baseline
FP8 text encoder
|
12479e0 to 11e7e7d
|
Hi! Just following up on this PR. I’ve addressed the previous feedback and would appreciate another review when you have time. Please let me know if there’s anything else you’d like me to change. If everything looks good, an approval would be appreciated. Thanks! |
Signed-off-by: XRay2016 <1150722393@qq.com>
fa482b9 to 7a4d078


ref #1854
Purpose
Support FP8 quantization for the Qwen-Image-Edit text encoder, following #1338.
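As a rough illustration of the per-component idea, the sketch below walks a toy module tree and swaps linear layers only under components whose config entry names a method. The config shape mirrors the `Method` line in the test output of this PR; everything else (`Linear`, `Block`, `method_for`, `replace_linears`) is a hypothetical stand-in written for this example, not vLLM-Omni code, and the real replacement logic may differ.

```python
from typing import List, Optional

# Per-component config, in the shape printed by the quality test's "Method" line.
QUANT_CONFIG = {"text_encoder": {"method": "fp8"}, "transformer": None, "vae": None}


class Linear:
    """Toy stand-in for a linear layer (hypothetical, not torch.nn.Linear)."""

    def __init__(self, quant_method: Optional[str] = None):
        self.quant_method = quant_method


class Block:
    """Toy container holding named child modules (hypothetical)."""

    def __init__(self, **children):
        self.children = dict(children)


def method_for(qualified_name: str, config: dict) -> Optional[str]:
    """Look up the method by the first path segment of a qualified layer name."""
    component = qualified_name.split(".", 1)[0]
    entry = config.get(component)
    return entry["method"] if entry else None


def replace_linears(module: Block, config: dict, root_prefix: str) -> List[str]:
    """Recursively replace Linear children; return the qualified names replaced."""
    replaced = []
    for name, child in module.children.items():
        qualified = f"{root_prefix}.{name}" if root_prefix else name
        if isinstance(child, Linear):
            method = method_for(qualified, config)
            if method is not None:
                # Assumption: a quantized layer is modeled as a Linear tagged with the method.
                module.children[name] = Linear(quant_method=method)
                replaced.append(qualified)
        elif isinstance(child, Block):
            replaced.extend(replace_linears(child, config, qualified))
    return replaced


text_encoder = Block(layer0=Block(q_proj=Linear(), k_proj=Linear()))
vae = Block(conv_out=Linear())
print(replace_linears(text_encoder, QUANT_CONFIG, "text_encoder"))
print(replace_linears(vae, QUANT_CONFIG, "vae"))
```

Without the root prefix, a layer inside the text encoder would be named `layer0.q_proj` and could not be matched against a `text_encoder` entry, which is presumably why the prefix is threaded through the recursion.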
This PR updates the Qwen Image Edit diffusion pipeline to:
- text_encoder, transformer, and vae through component weight sources.
- create_transformers_model(...) so vLLM-Omni quantization replacement can be applied.
- root_prefix path for recursive linear replacement, allowing quantization config entries such as text_encoder.
It also adds a quantization quality test case for Qwen/Qwen-Image-Edit with FP8 applied to the text encoder.
Test Plan
Run the targeted full-model quality test:
Run the e2e qwen image edit test:
Test Result
The result of the full-model quality test is:
============================================================
Quantization Quality: fp8_qwen_image_edit_text_encoder
============================================================
Baseline: Qwen/Qwen-Image-Edit
Quantized: Qwen/Qwen-Image-Edit
Method: {'text_encoder': {'method': 'fp8'}, 'transformer': None, 'vae': None}
LPIPS: 0.0167 (threshold: 0.15)
PSNR: 33.3159 dB (higher is better)
MAE: 0.014689 (lower is better)
BF16 memory: 56.87 GiB
Quant memory: 49.58 GiB (13% reduction)
Result: PASS
============================================================
PASSED PASSED
The result of the e2e Qwen image edit test is:
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
--- Running Summary
============== 1 passed, 19 warnings in 188.01s (0:03:08) ==============
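For readers who want to sanity-check the PSNR, MAE, and memory figures reported in the test result, here is a minimal sketch of how such metrics are commonly computed. It assumes pixel values normalized to [0, 1] and flat sequences of floats; the function names are hypothetical and the PR's actual test harness may compute these differently. LPIPS is omitted because it requires a pretrained network.

```python
import math
from typing import Sequence


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equally sized pixel sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def psnr(a: Sequence[float], b: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; max_value=1.0 assumes [0, 1] pixels."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def memory_reduction(baseline_gib: float, quant_gib: float) -> float:
    """Fraction of memory saved relative to the baseline."""
    return (baseline_gib - quant_gib) / baseline_gib


# Memory figures taken from the test output above.
print(f"{memory_reduction(56.87, 49.58):.0%}")  # 13%
```

The memory check reproduces the reported 13% reduction: (56.87 - 49.58) / 56.87 is roughly 0.128.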