[CI][BugFix] Fix and Validate FP8 Z-Image quality gate #3929
Merged
Signed-off-by: WeiQing Chen <david6666666@users.noreply.github.com>
congw729
reviewed
May 28, 2026
Collaborator
Author
@yuanheng-zhao @RuixiangMa ptal thx
yuanheng-zhao
approved these changes
May 28, 2026
Collaborator
yuanheng-zhao
left a comment
LGTM. Btw, are the final 15 attention/FFN blocks ignored because of their weighted influence on the generated output diff?
Collaborator
Author
Yes.
tzhouam
pushed a commit
that referenced
this pull request
May 29, 2026
Signed-off-by: WeiQing Chen <david6666666@users.noreply.github.com> Co-authored-by: WeiQing Chen <david6666666@users.noreply.github.com>
Summary
Unskips `test_quantization_quality[fp8_z_image]` and updates the Z-Image FP8 quality prompt to the requested long floating-archipelago prompt.

Following PR #3279's text-encoder FP8 path, this uses regular online FP8 quantization. The text encoder is FP8 for the early/mid blocks, with only the final 8 text-encoder blocks listed in `ignored_layers` for BF16 fallback. The Z-Image transformer also keeps the final 15 main attention/FFN blocks in BF16 through `ignored_layers`.

The final PR diff is intentionally small and only changes `tests/diffusion/quantization/test_quantization_quality.py`. The visual artifacts are linked below and are not part of the final file diff.

E2E Result
Command:
Result: passed with `max_lpips=0.15`.

I also tested the simpler "text encoder fully FP8 + transformer final 15 fallback" variant. It failed the requested 0.15 gate with LPIPS `0.199243`, so the final config keeps text encoder layers 28-35 as BF16 fallback.

Quantized Config
    {
        "method": "fp8",
        "ignored_layers": [
            "img_mlp",
            "layers.15..29.{attention.to_qkv,attention.to_out.0,feed_forward.w13,feed_forward.w2}",
            "model.layers.28..35.{self_attn.q_proj,self_attn.k_proj,self_attn.v_proj,self_attn.o_proj,mlp.gate_proj,mlp.up_proj,mlp.down_proj}",
        ],
    }

Note: `img_mlp` appears in Qwen-Image but not in Z-Image, so it is included for parity with PR #1034 but does not match a Z-Image layer by itself.

Visual Comparison
BF16 baseline:
FP8 quantized:
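For readers unfamiliar with the `15..29` and `{a,b}` notation in the Quantized Config section, here is a minimal sketch of how that shorthand could be expanded into explicit layer names. It assumes the ranges are inclusive (which matches "final 15" transformer blocks and "final 8" text-encoder blocks) and that each brace group lists alternative suffixes. `expand_pattern` is a hypothetical helper for illustration; it is not part of vLLM or of the test file, and how `ignored_layers` entries are actually matched is not shown here.

```python
import re
from itertools import product

# Matches either an inclusive range "M..N" or a brace group "{a,b,c}".
_TOKEN_RE = re.compile(r"(\d+)\.\.(\d+)|\{([^{}]*)\}")


def expand_pattern(pattern: str) -> list[str]:
    """Expand shorthand such as 'layers.15..29.{a,b}' into explicit names.

    Hypothetical helper: assumes ranges are inclusive on both ends.
    """
    parts: list[list[str]] = []
    pos = 0
    for match in _TOKEN_RE.finditer(pattern):
        if match.start() > pos:
            # Literal text between tokens is kept unchanged.
            parts.append([pattern[pos:match.start()]])
        if match.group(1) is not None:
            lo, hi = int(match.group(1)), int(match.group(2))
            parts.append([str(i) for i in range(lo, hi + 1)])
        else:
            parts.append([alt.strip() for alt in match.group(3).split(",")])
        pos = match.end()
    if pos < len(pattern):
        parts.append([pattern[pos:]])
    # Cartesian product of all alternatives, joined back into dotted names.
    return ["".join(combo) for combo in product(*parts)]


transformer_layers = expand_pattern(
    "layers.15..29.{attention.to_qkv,attention.to_out.0,"
    "feed_forward.w13,feed_forward.w2}"
)
print(len(transformer_layers))  # 15 blocks x 4 modules
print(transformer_layers[0])
```

Under these assumptions the transformer entry covers 15 blocks with 4 modules each, and the text-encoder entry covers 8 blocks with 7 modules each.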
Validation
`test_fp8_config.py`: 28 passed.

The local E2E environment used `vllm==0.21.0`, `torch==2.11.0+cu130`, `lpips==0.1.4`, and a single NVIDIA B300 GPU via `CUDA_VISIBLE_DEVICES=0`.