
[Quantization] Enable FP8 online quantization for Qwen-image-edit text encoder #3484

Open
xRay2016 wants to merge 1 commit into vllm-project:main from xRay2016:pr-3279-fp8-text-encoder-v3

@xRay2016 xRay2016 commented May 10, 2026

ref #1854

Purpose

Support FP8 quantization for the Qwen-Image-Edit text encoder, following the approach in #1338.

This PR updates the Qwen Image Edit diffusion pipeline to:

  • Load text_encoder, transformer, and vae through component weight sources.
  • Instantiate the text encoder with create_transformers_model(...) so vLLM-Omni quantization replacement can be applied.
  • Add a root_prefix path for recursive linear replacement, allowing quantization config entries such as text_encoder.
  • Map HuggingFace Qwen-Image-Edit text encoder weight names to the vLLM model structure during weight loading.
  • Load the VAE from config so it is populated through the unified weight loader.
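
As a rough illustration of why a root prefix matters for the recursive linear replacement listed above, here is a minimal pure-Python sketch. The Linear and Module classes and the replace_linears function are hypothetical stand-ins written for this example; they are not the vLLM-Omni implementation, and the real code may match names differently.

```python
from dataclasses import dataclass, field


@dataclass
class Linear:
    """Stand-in for a linear layer (hypothetical, not a vLLM class)."""
    quantized: bool = False


@dataclass
class Module:
    """Stand-in for a container module holding named children."""
    children: dict = field(default_factory=dict)


def replace_linears(module, quantized_prefixes, root_prefix=""):
    """Recursively swap Linear layers whose qualified name falls under one of
    the configured prefixes. Returns the qualified names that were replaced."""
    replaced = []
    for name, child in list(module.children.items()):
        qualified = f"{root_prefix}.{name}" if root_prefix else name
        if isinstance(child, Linear):
            if any(qualified == p or qualified.startswith(p + ".")
                   for p in quantized_prefixes):
                module.children[name] = Linear(quantized=True)
                replaced.append(qualified)
        else:
            replaced.extend(replace_linears(child, quantized_prefixes, qualified))
    return replaced


# The text encoder is built as a standalone model, so its internal names do
# not start with "text_encoder" unless a root prefix is supplied.
text_encoder = Module({
    "layers": Module({"0": Module({"q_proj": Linear(), "o_proj": Linear()})}),
    "norm": Module(),
})

without_prefix = replace_linears(text_encoder, {"text_encoder"})
with_prefix = replace_linears(text_encoder, {"text_encoder"},
                              root_prefix="text_encoder")
```

In this sketch, the config entry text_encoder matches nothing when the walk starts from an empty prefix, and matches every linear layer once the walk starts from root_prefix="text_encoder".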

It also adds a quantization quality test case for Qwen/Qwen-Image-Edit with FP8 applied to the text encoder.
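
The weight-name mapping mentioned in the list above can be pictured as a prefix rewrite applied while weights stream from the checkpoint. The sketch below is an assumption for illustration only: the prefix pairs in PREFIX_MAP and the helper names are hypothetical, since the actual mapping in this PR is not shown in the thread.

```python
# Hypothetical prefix pairs (checkpoint prefix -> model prefix). The real
# mapping used by the PR is not shown in this conversation.
PREFIX_MAP = [
    ("model.language_model.", "language_model.model."),
    ("model.visual.", "visual."),
]


def remap_name(name, prefix_map=PREFIX_MAP):
    """Return the renamed key, or the key unchanged if no prefix matches."""
    for old, new in prefix_map:
        if name.startswith(old):
            return new + name[len(old):]
    return name


def remap_weights(weights):
    """Lazily rename (name, tensor) pairs as they come from the checkpoint."""
    for name, tensor in weights:
        yield remap_name(name), tensor


# Strings stand in for tensors so the example runs without any dependencies.
checkpoint = [
    ("model.language_model.layers.0.mlp.up_proj.weight", "w0"),
    ("model.visual.blocks.0.attn.qkv.weight", "w1"),
    ("lm_head.weight", "w2"),
]
remapped = dict(remap_weights(checkpoint))
```

Using a generator keeps the rename lazy, so no second copy of the weights has to be held in memory during loading.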

Test Plan

Run the targeted full-model quality test:

VLLM_OMNI_QUALITY_OUTPUT_DIR=/tmp/modelopt_quality_outputs \
.venv/bin/python -m pytest tests/diffusion/quantization/test_quantization_quality.py \
  -v -m "" -k "fp8_qwen_image_edit_text_encoder"

Run the e2e qwen image edit test:

.venv/bin/python -m pytest tests/e2e/accuracy/test_qwen_image_edit.py::test_qwen_image_edit_single_matches_diffusers -v -s

Test Result

Result of the full-model quality test:

============================================================
Quantization Quality: fp8_qwen_image_edit_text_encoder
============================================================
  Baseline:      Qwen/Qwen-Image-Edit
  Quantized:     Qwen/Qwen-Image-Edit
  Method:        {'text_encoder': {'method': 'fp8'}, 'transformer': None, 'vae': None}
  LPIPS:         0.0167  (threshold: 0.15)
  PSNR:          33.3159 dB  (higher is better)
  MAE:           0.014689  (lower is better)
  BF16 memory:   56.87 GiB
  Quant memory:  49.58 GiB  (13% reduction)
  Result:        PASS
============================================================

PASSED
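
For readers unfamiliar with the metrics in the report above, here is a minimal sketch of how MAE, PSNR, and the memory reduction percentage are conventionally computed. It assumes pixel values normalized to [0, 1] and flat lists of floats; the test's own implementation may differ, and LPIPS is omitted because it requires a learned network.

```python
import math


def mae(a, b):
    """Mean absolute error between two equally sized pixel sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


def reduction_percent(baseline_gib, quant_gib):
    """Relative memory saving of the quantized run, in percent."""
    return 100 * (baseline_gib - quant_gib) / baseline_gib


# Toy four-pixel "images" (illustrative values, not data from the PR).
baseline = [0.0, 0.5, 1.0, 0.25]
quantized = [0.1, 0.5, 1.0, 0.25]

toy_mae = mae(baseline, quantized)
toy_psnr = psnr(baseline, quantized)

# Memory figures taken from the report above: 56.87 GiB -> 49.58 GiB.
saving = reduction_percent(56.87, 49.58)
```

With the reported memory figures, the saving works out to roughly 12.8%, which the report rounds to 13%.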

Result of the e2e Qwen Image Edit test:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
--- Running Summary
============== 1 passed, 19 warnings in 188.01s (0:03:08) ==============

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2b8d2fe73

Comment thread vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py
@hsliuustc0106
Collaborator

LGTM. This enables per-component quantization for diffusion models, which is a valuable capability.

The root_prefix parameter approach is clean and enables quantization config entries like text_encoder to work correctly. The weight mapping from HF keys to the vLLM structure is also well done.

@hsliuustc0106
Collaborator

@fhfuih PTAL

@xRay2016 xRay2016 force-pushed the pr-3279-fp8-text-encoder-v3 branch from b2b8d2f to 839f2ce, May 12, 2026 15:24
@xRay2016
Author

@hsliuustc0106 @fhfuih Hi, do you have any remaining comments on this PR? If not, could you please approve it so we can merge? Thanks!

@hsliuustc0106
Collaborator

@david6666666 @lishunyang12 PTAL

@lishunyang12
Collaborator

Please show a visual comparison.

@xRay2016
Author

> Show visual comparison

Hi, @lishunyang12

Here is a visual comparison generated by tests/diffusion/quantization/test_quantization_quality.py.

Both images use the same input image, prompt, seed, resolution, inference steps, and sampling settings. I don’t observe obvious visual degradation from the quantized output in these examples.

BF16 baseline

[Image: fp8_qwen_image_edit_text_encoder_baseline]

FP8 text encoder

[Image: fp8_qwen_image_edit_text_encoder_quantized]

Contributor

@fhfuih fhfuih left a comment

I am not very familiar with quantization, but with reference to #1338 this PR looks generally fine. I left some small comments on the tests.

Comment thread tests/diffusion/quantization/test_quantization_quality.py
Comment thread tests/diffusion/quantization/test_quantization_quality.py
Comment thread tests/diffusion/quantization/test_quantization_quality.py Outdated
Comment thread vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image_edit.py Outdated
@xRay2016 xRay2016 force-pushed the pr-3279-fp8-text-encoder-v3 branch from 12479e0 to 11e7e7d, May 17, 2026 05:21
Comment thread tests/helpers/fixtures/inputs.py Outdated
@xRay2016
Author

Hi! Just following up on this PR. I’ve addressed the previous feedback and would appreciate another review when you have time.

Please let me know if there’s anything else you’d like me to change. If everything looks good, an approval would be appreciated. Thanks!

Signed-off-by: XRay2016 <1150722393@qq.com>
@xRay2016 xRay2016 force-pushed the pr-3279-fp8-text-encoder-v3 branch from fa482b9 to 7a4d078, May 21, 2026 15:21
4 participants