[Quantization] Enable FP8 online quantization for Qwen-image-edit text encoder by xRay2016 · Pull Request #3484 · vllm-project/vllm-omni

xRay2016 · 2026-05-10T17:07:44Z

ref #1854

Purpose

Support FP8 quantization for the Qwen-Image-Edit text encoder, following the approach in #1338.

This PR updates the Qwen Image Edit diffusion pipeline to:

Load text_encoder, transformer, and vae through component weight sources.
Instantiate the text encoder with create_transformers_model(...) so vLLM-Omni quantization replacement can be applied.
Add a root_prefix path for recursive linear replacement, allowing quantization config entries such as text_encoder.
Map HuggingFace Qwen-Image-Edit text encoder weight names to the vLLM model structure during weight loading.
Load the VAE from config so it is populated through the unified weight loader.
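The root_prefix idea from the list above can be sketched in isolation. This is a toy illustration, not the vLLM-Omni implementation: the classes Linear, QuantLinear, and Module and the function replace_linears are hypothetical stand-ins, and the assumption that a config entry is matched on the first segment of the dotted path is mine. It shows why a sub-model handed in on its own needs a prefix before an entry such as text_encoder can match.

```python
from dataclasses import dataclass, field


@dataclass
class Linear:
    """Hypothetical stand-in for a dense linear layer."""
    in_features: int
    out_features: int


@dataclass
class QuantLinear:
    """Hypothetical stand-in for the quantized replacement layer."""
    in_features: int
    out_features: int
    method: str


@dataclass
class Module:
    """Minimal container with named children (loosely like nn.Module)."""
    children: dict = field(default_factory=dict)


def replace_linears(module, quant_config, root_prefix=""):
    """Recursively swap Linear children for QuantLinear.

    quant_config maps a component name (assumed here to be the first
    segment of the dotted path) to {"method": ...} or None.
    root_prefix is prepended to every child name, so a sub-model passed
    in on its own is still matched under its component name.
    Returns the dotted paths of the layers that were replaced.
    """
    replaced = []
    for name, child in list(module.children.items()):
        path = f"{root_prefix}.{name}" if root_prefix else name
        if isinstance(child, Linear):
            entry = quant_config.get(path.split(".", 1)[0])
            if entry is not None:
                module.children[name] = QuantLinear(
                    child.in_features, child.out_features, entry["method"]
                )
                replaced.append(path)
        elif isinstance(child, Module):
            replaced += replace_linears(child, quant_config, root_prefix=path)
    return replaced


# Same per-component shape as the Method line in the test result.
config = {"text_encoder": {"method": "fp8"}, "transformer": None, "vae": None}
text_encoder = Module({"layers": Module({"q_proj": Linear(8, 8), "norm": object()})})
replaced = replace_linears(text_encoder, config, root_prefix="text_encoder")
```

Without root_prefix, the first path segment in this toy would be "layers", which has no config entry, so nothing would be replaced.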

It also adds a quantization quality test case for Qwen/Qwen-Image-Edit with FP8 applied to the text encoder.
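Of the pipeline changes listed above, the weight-name mapping is easy to illustrate on its own. The sketch below assumes the mapping is a simple prefix rewrite applied while streaming (name, tensor) pairs. The prefixes in PREFIX_RULES are invented placeholders, not the real HuggingFace or vLLM parameter names, and remap_name and remap_weights are hypothetical helpers.

```python
# Hypothetical (checkpoint_prefix, runtime_prefix) rules. These names are
# placeholders for illustration only; the real parameter names differ.
PREFIX_RULES = [
    ("text_encoder.checkpoint_block.", "text_encoder.runtime_block."),
]


def remap_name(name, rules=PREFIX_RULES):
    """Rewrite the first matching prefix; leave other names untouched."""
    for src, dst in rules:
        if name.startswith(src):
            return dst + name[len(src):]
    return name


def remap_weights(weights, rules=PREFIX_RULES):
    """Lazily rename (name, tensor) pairs as they are streamed to a loader."""
    for name, tensor in weights:
        yield remap_name(name, rules), tensor
```

Writing the remapper as a generator keeps the whole checkpoint from being materialized at once, which matters when the mapping sits in front of a unified weight loader.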

Test Plan

Run the targeted full-model quality test:

VLLM_OMNI_QUALITY_OUTPUT_DIR=/tmp/modelopt_quality_outputs \
.venv/bin/python -m pytest tests/diffusion/quantization/test_quantization_quality.py \
  -v -m "" -k "fp8_qwen_image_edit_text_encoder"

Run the e2e Qwen-Image-Edit test:

.venv/bin/python -m pytest tests/e2e/accuracy/test_qwen_image_edit.py::test_qwen_image_edit_single_matches_diffusers -v -s

Test Result

The result of the full-model quality test is:

============================================================
Quantization Quality: fp8_qwen_image_edit_text_encoder
============================================================
  Baseline:      Qwen/Qwen-Image-Edit
  Quantized:     Qwen/Qwen-Image-Edit
  Method:        {'text_encoder': {'method': 'fp8'}, 'transformer': None, 'vae': None}
  LPIPS:         0.0167  (threshold: 0.15)
  PSNR:          33.3159 dB  (higher is better)
  MAE:           0.014689  (lower is better)
  BF16 memory:   56.87 GiB
  Quant memory:  49.58 GiB  (13% reduction)
  Result:        PASS
============================================================

PASSED
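For readers unfamiliar with the numbers in the report, the sketch below shows the standard definitions of MAE and PSNR and the arithmetic behind the memory reduction. It assumes pixel values are flattened floats in [0, 1]; the helper names are hypothetical, and the actual test may compute these differently. LPIPS is omitted because it requires a learned network.

```python
import math


def mae(a, b):
    """Mean absolute error between two equal-length pixel sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def reduction_percent(baseline_gib, quant_gib):
    """Relative memory saving of the quantized run, in percent."""
    return 100.0 * (baseline_gib - quant_gib) / baseline_gib


# Memory figures taken from the report above: 56.87 GiB vs 49.58 GiB.
saving = reduction_percent(56.87, 49.58)
```

Rounded to a whole number, the saving computed from the two memory figures agrees with the 13% shown in the report.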


The result of the e2e Qwen-Image-Edit test is:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
--- Running Summary
============== 1 passed, 19 warnings in 188.01s (0:03:08) ==============

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2b8d2fe73

hsliuustc0106 · 2026-05-10T21:11:01Z

LGTM. This enables per-component quantization for diffusion models, which is a valuable capability.

The parameter approach is clean and enables quantization config entries like to work correctly. Weight mapping from HF keys to vLLM structure is also well done.

hsliuustc0106 · 2026-05-11T02:03:51Z

@fhfuih PTAL

xRay2016 · 2026-05-12T15:29:08Z

@hsliuustc0106 @fhfuih Hi, do you have any remaining comments on this PR? If not, could you please approve it so we can merge? Thanks!

hsliuustc0106 · 2026-05-12T15:54:27Z

@david6666666 @lishunyang12 PTAL

lishunyang12 · 2026-05-12T16:07:39Z

Show visual comparison

xRay2016 · 2026-05-12T17:18:13Z

Show visual comparison

Hi, @lishunyang12

Here is a visual comparison generated by tests/diffusion/quantization/test_quantization_quality.py.

Both images use the same input image, prompt, seed, resolution, inference steps, and sampling settings. I don’t observe obvious visual degradation from the quantized output in these examples.

BF16 baseline

fp8_qwen_image_edit_text_encoder_baseline

FP8 text encoder

fp8_qwen_image_edit_text_encoder_quantized

fhfuih

I am not very familiar with quantization, but with reference to #1338 this PR looks generally fine. I left some small comments on the tests.

xRay2016 · 2026-05-21T15:05:05Z

Hi! Just following up on this PR. I’ve addressed the previous feedback and would appreciate another review when you have time.

Please let me know if there’s anything else you’d like me to change. If everything looks good, an approval would be appreciated. Thanks!

Signed-off-by: XRay2016 <1150722393@qq.com>

xRay2016 requested review from Isotr0py, RuixiangMa, SamitHuang, ZJY0516, david6666666, princepride, wtomin and yenuo26 as code owners May 10, 2026 17:07

chatgpt-codex-connector Bot reviewed May 10, 2026

Comment thread vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py

xRay2016 force-pushed the pr-3279-fp8-text-encoder-v3 branch from b2b8d2f to 839f2ce Compare May 12, 2026 15:24

hsliuustc0106 requested a review from lishunyang12 May 12, 2026 15:54

fhfuih reviewed May 13, 2026

xRay2016 force-pushed the pr-3279-fp8-text-encoder-v3 branch from 12479e0 to 11e7e7d Compare May 17, 2026 05:21

lishunyang12 reviewed May 17, 2026

Comment thread tests/helpers/fixtures/inputs.py Outdated

feat:support fp8 quantization qwen image edit

7a4d078

Signed-off-by: XRay2016 <1150722393@qq.com>

xRay2016 force-pushed the pr-3279-fp8-text-encoder-v3 branch from fa482b9 to 7a4d078 Compare May 21, 2026 15:21

xRay2016 wants to merge 1 commit into vllm-project:main from xRay2016:pr-3279-fp8-text-encoder-v3
