[Quantization] Enable FP8 online quantization for Z-image text encoder by Isotr0py · Pull Request #1338 · vllm-project/vllm-omni

Isotr0py · 2026-02-11T14:46:44Z

Purpose

Follow-up PR for [Feature]: FP8 Quantization Support for DiT #1034

Test Plan

python examples/offline_inference/text_to_image/text_to_image.py --model /mnt/data0/LLM/Z-Image-Turbo/ --width 512 --height 512 --tensor-parallel-size 2

Test Result

Main branch

Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:01<00:03,  1.78s/it]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:03<00:01,  1.96s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:04<00:00,  1.57s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:04<00:00,  1.65s/it]

[Stage-0] INFO 02-11 22:49:28 [diffusers_loader.py:256] Loading weights took 5.02 seconds
[Stage-0] INFO 02-11 22:49:28 [diffusers_loader.py:256] Loading weights took 5.15 seconds
[Stage-0] INFO 02-11 22:49:29 [diffusion_model_runner.py:102] Model loading took 10.7946 GiB and 10.276793 seconds

PR

Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:01<00:03,  1.80s/it]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:03<00:01,  1.96s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:04<00:00,  1.54s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:04<00:00,  1.64s/it]

Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:01<00:02,  1.42s/it]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:02<00:01,  1.32s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00,  1.26it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00,  1.06it/s]

[Stage-0] INFO 02-11 22:42:52 [diffusers_loader.py:257] Loading weights took 7.75 seconds
[Stage-0] INFO 02-11 22:42:52 [diffusers_loader.py:257] Loading weights took 7.88 seconds
[Stage-0] INFO 02-11 22:42:53 [diffusion_model_runner.py:102] Model loading took 7.7586 GiB and 11.094385 seconds
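
The two runs can be compared directly from the "Model loading took ..." lines. The snippet below is only arithmetic on the figures reported in those logs; the variable names are illustrative and not part of vllm-omni.

```python
# Figures copied from the "Model loading took ..." log lines above.
main_gib, main_seconds = 10.7946, 10.276793
pr_gib, pr_seconds = 7.7586, 11.094385

saved_gib = main_gib - pr_gib               # memory no longer held after loading
saved_pct = 100 * saved_gib / main_gib      # relative to the main branch
extra_seconds = pr_seconds - main_seconds   # additional model loading time

print(f"memory saved: {saved_gib:.3f} GiB ({saved_pct:.1f}%)")
print(f"extra load time: {extra_seconds:.2f} s")
```

On these numbers the PR run holds about 3.04 GiB (roughly 28%) less memory after loading, at the cost of about 0.8 s of extra load time.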

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d37a560ffb

lishunyang12

Nice work enabling FP8 for the Z-image text encoder! The create_transformers_model + recursive_replace_linear pattern is clean and reusable. The weight loader changes to support model.safetensors.index.json are a practical fix too.

Left a couple of small comments below.

lishunyang12 · 2026-02-21T05:26:25Z

+            filter(lambda f: file_exists(model_name_or_path, f, revision=revision), possible_index_files)
+        )
+        assert len(available_index_file) <= 1, (
+            f"Multiple index files found in {model_name_or_path} with subfolder {subfolder}: {available_index_file}"


I was wondering about the assert len(available_index_file) <= 1 here. If a model repo somehow ships both diffusion_pytorch_model.safetensors.index.json and model.safetensors.index.json in the same subfolder, this will crash with an AssertionError and no user-friendly message.

Could we either:

1. Use if len(...) > 1: raise ValueError(...) so the error survives python -O, or
2. Pick one with a defined priority (e.g., prefer diffusion_pytorch_model over model) and log a warning?

Option 2 might be more resilient since we can't always control what model authors upload.

model repo somehow ships both diffusion_pytorch_model.safetensors.index.json and model.safetensors.index.json in the same subfolder

I think this is quite a rare case for a diffusion pipeline with multiple components, especially since diffusion_pytorch_model.safetensors.index.json and model.safetensors.index.json are two different styles of index file, used by diffusers and transformers respectively.

I'd prefer option 1 for now, until we actually encounter a repo that mixes the two index files.
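
For reference, option 1 might look roughly like the sketch below. It assumes the candidate index files have already been filtered down to those that exist; select_index_file is a hypothetical helper for illustration, not the function in diffusers_loader.py. The point is only that an explicit raise still runs under python -O, whereas an assert is stripped.

```python
from __future__ import annotations


def select_index_file(available: list[str], location: str) -> str | None:
    """Return the single index file found at `location`, or None if there is none.

    Hypothetical helper for illustration; not the vllm-omni implementation.
    """
    if len(available) > 1:
        # Unlike `assert`, this check is still executed under `python -O`.
        raise ValueError(f"Multiple index files found in {location}: {available}")
    return available[0] if available else None
```

Calling it with both diffusion_pytorch_model.safetensors.index.json and model.safetensors.index.json raises ValueError; a single entry is returned as-is, and an empty list yields None.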

lishunyang12 · 2026-02-21T05:26:25Z

+
+    def _recursive_replace(module: nn.Module, prefix: str):
+        for child_name, child_module in module.named_children():
+            new_module = child_module


Nice utility! One small thing I noticed: recursive_replace_linear always sets style = "replicate" for every nn.Linear in the model. This works correctly for FP8 quantization today, but if this utility is later reused for tensor-parallel text encoders, we'd need per-layer style selection.

Would it be worth accepting an optional style_map: dict[str, Style] parameter (defaulting to None = all replicate) to make this future-proof? Not a blocker at all -- just thinking about reusability since the function name suggests general-purpose use.

but if this utility is later reused for tensor-parallel text encoders, we'd need per-layer style selection.

Would it be worth accepting an optional style_map: dict[str, Style] parameter (defaulting to None = all replicate) to make this future-proof?

We could reuse the tp_plan from the Transformers model, as vLLM's Transformers backend does, but I would like to leave that to a follow-up PR because it can make things quite complicated:
https://github.com/vllm-project/vllm/blob/bebfe55b1c17c2e0fedb1b402df1dddfc1a04684/vllm/model_executor/models/transformers/base.py#L285-L296
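
As a rough idea of what per-layer style selection could look like, the sketch below resolves a style from a pattern-to-style mapping and falls back to "replicate". Everything here is an assumption for illustration: resolve_style is a hypothetical helper, and the "colwise"/"rowwise" names and "layers.*..." patterns are borrowed from the tp_plan convention in Transformers rather than from this PR, which only uses "replicate".

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def resolve_style(
    qualified_name: str,
    style_map: dict[str, str] | None = None,
    default: str = "replicate",
) -> str:
    """Return the style of the first pattern matching the layer name.

    Hypothetical helper: with style_map=None every layer is replicated,
    which matches the behaviour described in the review comment.
    """
    for pattern, style in (style_map or {}).items():
        if fnmatchcase(qualified_name, pattern):
            return style
    return default


# Example mapping in the spirit of a tp_plan (assumed, not taken from the PR).
example_map = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
}
```

A recursive replacement routine could call resolve_style with the prefix it already tracks for each child module, so layers without an entry keep today's behaviour.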

lishunyang12 · 2026-02-21T05:26:25Z

-        return loader.load_weights(weights)
+        loaded_weights = loader.load_weights(weights)
+        loaded_weights |= {f"vae.{name}" for name, _ in self.vae.named_parameters()}
+        return loaded_weights


I was curious about this: we're adding all VAE parameter names to loaded_weights so the weight loader doesn't complain about "unloaded" weights, but the VAE was already loaded via AutoencoderKL.from_pretrained() above. This makes sense as a workaround.

But I noticed that self.vae is loaded with from_pretrained (which uses HF's default dtype and device handling), while the text encoder now goes through create_transformers_model (which uses od_config.dtype and meta init). Could there be a dtype mismatch between the two if od_config.dtype differs from the default? Probably fine in practice since VAE is typically float32 anyway, but wanted to flag it.
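
The loaded_weights bookkeeping in the diff above can be sketched with plain sets. This assumes the loader compares the names it loaded against every parameter name in the pipeline and reports the difference; find_unloaded and the parameter names are made up for illustration.

```python
def find_unloaded(all_param_names: set[str], loaded: set[str]) -> set[str]:
    """Names the loader would report as not loaded (hypothetical check)."""
    return all_param_names - loaded


# Toy parameter names; a real pipeline has many more.
all_params = {
    "text_encoder.embed.weight",
    "transformer.proj.weight",
    "vae.decoder.conv_in.weight",
}
loaded_weights = {"text_encoder.embed.weight", "transformer.proj.weight"}
vae_param_names = ["decoder.conv_in.weight"]  # stands in for vae.named_parameters()

missing_before = find_unloaded(all_params, loaded_weights)
# Mark the VAE parameters as loaded, since they were populated elsewhere.
loaded_weights |= {f"vae.{name}" for name in vae_param_names}
missing_after = find_unloaded(all_params, loaded_weights)
```

Before the union the VAE parameter shows up as missing; afterwards nothing does, which is why the workaround silences the check without touching the VAE weights themselves.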

I see. Let's replace .from_pretrained with the diffusion loader so the VAE weights are loaded through it as well.

Could there be a dtype mismatch between the two if od_config.dtype differs from the default? Probably fine in practice since VAE is typically float32 anyway, but wanted to flag it.

Latents are usually cast to the VAE's dtype before decoding, so I don't think a dtype mismatch will be a critical issue here:

vllm-omni/vllm_omni/diffusion/models/z_image/pipeline_z_image.py

Lines 634 to 640 in efbe411

if output_type == "latent":
    image = latents
else:
    latents = latents.to(self.vae.dtype)
    latents = (latents / self.vae.config.scaling_factor) + self.vae.config.shift_factor

    image = self.vae.decode(latents, return_dict=False)[0]

Isotr0py · 2026-02-21T13:24:33Z

+        vae_config = AutoencoderKL.load_config(model, subfolder="vae", local_files_only=local_files_only)
+        self.vae = AutoencoderKL.from_config(vae_config).to(self._execution_device)


Actually, the FP8 kernel is not suitable for the VAE, so let's not convert it with the vLLM quantization layer for now:

[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678] Error executing method 'generate'. This might cause issues in distributed execution.
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678] Traceback (most recent call last):
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/vllm_omni/diffusion/worker/diffusion_worker.py", line 674, in execute_method
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return func(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/vllm_omni/diffusion/worker/diffusion_worker.py", line 163, in generate
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return self.execute_model(request, self.od_config)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/vllm_omni/diffusion/worker/diffusion_worker.py", line 185, in execute_model
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return self.model_runner.execute_model(req)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return func(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/vllm_omni/diffusion/worker/diffusion_model_runner.py", line 196, in execute_model
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     output = self.pipeline.forward(req)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]              ^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/vllm_omni/diffusion/models/z_image/pipeline_z_image.py", line 667, in forward
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     image = self.vae.decode(latents, return_dict=False)[0]
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return method(self, *args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 237, in decode
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     decoded = self._decode(z).sample
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]               ^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 208, in _decode
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     dec = self.decoder(z)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]           ^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/diffusers/models/autoencoders/vae.py", line 298, in forward
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     sample = self.mid_block(sample, latent_embeds)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 745, in forward
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     hidden_states = attn(hidden_states, temb=temb)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/diffusers/models/attention_processor.py", line 605, in forward
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return self.processor(
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/diffusers/models/attention_processor.py", line 2740, in __call__
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     query = attn.to_q(hidden_states)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]             ^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 413, in forward
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     output = self.quant_method.apply(self, x, bias)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/fp8.py", line 501, in apply
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return apply_fp8_marlin_linear(
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py", line 69, in apply_fp8_marlin_linear
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     output = ops.marlin_gemm(
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]              ^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/vllm/_custom_ops.py", line 1246, in marlin_gemm
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return torch.ops._C.marlin_gemm(
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]   File "/home/mozf/develop-projects/vllm-omni/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1209, in __call__
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]     return self._op(*args, **kwargs)
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678]            ^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-21 21:20:29 [diffusion_worker.py:678] RuntimeError: A.stride(1) is not 1
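
The final error says the kernel expects its input matrix A to have unit stride along dimension 1, i.e. to be laid out row-major. The thread does not say which operation in the VAE produced the offending layout, so the pure-Python sketch below only illustrates, with hypothetical helper names, how a transposed view of a contiguous matrix ends up with stride(1) != 1.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a freshly allocated tensor of `shape`."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def transpose_strides(strides, dim0, dim1):
    """Strides of a transposed view: no data moves, only the two strides swap."""
    swapped = list(strides)
    swapped[dim0], swapped[dim1] = swapped[dim1], swapped[dim0]
    return tuple(swapped)


rows, cols = 4, 8
base = contiguous_strides((rows, cols))   # contiguous: last dimension has stride 1
view = transpose_strides(base, 0, 1)      # transposed view: last dimension does not
```

In PyTorch, calling .contiguous() on such a view copies the data back into row-major order; whether that would be an appropriate fix here is not something the thread settles.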

lishunyang12

Followed up on the latest changes -- the VAE loader migration and ValueError fix look good, left a few more nits.

Gaohan123 · 2026-03-17T15:16:41Z

@Isotr0py Please resolve conflicts. Thanks!

Isotr0py · 2026-04-23T07:29:04Z

@lishunyang12 Can we merge this PR in v0.20.0?

Isotr0py added 3 commits February 11, 2026 21:09

draft

e790e33

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

update

0fe72c7

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix

51070f4

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

lishunyang12 mentioned this pull request Feb 11, 2026

[RFC] Q1 Quantization Support #1057

Closed

Isotr0py added 6 commits February 12, 2026 22:43

update

4bb6989

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix

f6ebce1

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix

8a6d59f

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

revert weights tracking

59eb19d

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

code format

c0b8d16

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Merge remote-tracking branch 'upstream/main' into refine-diffusion-lo…

d37a560

…ader Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py marked this pull request as ready for review February 12, 2026 15:41

Isotr0py requested a review from hsliuustc0106 as a code owner February 12, 2026 15:41

Isotr0py requested review from SamitHuang, ZJY0516 and david6666666 February 12, 2026 15:41

chatgpt-codex-connector Bot reviewed Feb 12, 2026

Comment thread vllm_omni/diffusion/models/z_image/pipeline_z_image.py Outdated

Isotr0py added 3 commits February 12, 2026 23:48

fix codex

e0810eb

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix device

fd75eae

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

clean

d35ba01

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

lishunyang12 mentioned this pull request Feb 20, 2026

[Quantization] Enable FP8 weight storage for Qwen Image VAE and text encoder #1414

Closed

3 tasks

lishunyang12 reviewed Feb 21, 2026

Isotr0py added 3 commits February 21, 2026 20:49

Merge branch 'main' into refine-diffusion-loader

e81520f

vae use loader

53c28f5

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

clean

b644f56

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py commented Feb 21, 2026

raise value error for multiple index

75fd6f7

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

lishunyang12 reviewed Feb 21, 2026

Comment thread vllm_omni/diffusion/model_loader/diffusers_loader.py Outdated

lishunyang12 reviewed Feb 21, 2026

Comment thread vllm_omni/diffusion/model_loader/diffusers_loader.py

lishunyang12 reviewed Feb 21, 2026

Comment thread vllm_omni/diffusion/models/utils.py

This was referenced Mar 11, 2026

[Core] Unified quantization framework #1764

Merged

[RFC]: Continuous Quantization Support #1854

Open

Gaohan123 added this to the v0.18.0 milestone Mar 17, 2026

Merge remote-tracking branch 'upstream/main' into refine-diffusion-lo…

44b6e06

…ader Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

lishunyang12 mentioned this pull request Mar 17, 2026

[RFC]: Extend FP8 Quantization to Text Encoders and VAE in Diffusion Models #1044

Open

1 task

Isotr0py added 4 commits March 18, 2026 01:21

fix

865218b

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Merge branch 'main' into refine-diffusion-loader

9cb7a5f

Merge remote-tracking branch 'upstream/main' into refine-diffusion-lo…

5f12f2a

…ader Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix

65e8f4d

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Gaohan123 modified the milestones: v0.18.0, v0.20.0 Apr 14, 2026

lishunyang12 mentioned this pull request Apr 19, 2026

[Quant] Wire quant_config through HunyuanVideo-1.5 and Wan2.2 DiT for online FP8 #2920

Open

10 tasks

Merge remote-tracking branch 'upstream/main' into refine-diffusion-lo…

2248cce

…ader Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

hsliuustc0106 added the ready (label to trigger buildkite CI) label Apr 24, 2026

Isotr0py added 3 commits April 30, 2026 12:18

update doc

0b6fe43

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Merge remote-tracking branch 'upstream/main' into refine-diffusion-lo…

570bbd3

…ader Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

update doc

97f2466

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

lishunyang12 approved these changes Apr 30, 2026

hsliuustc0106 merged commit ac66282 into vllm-project:main Apr 30, 2026
6 of 8 checks passed

hsliuustc0106 mentioned this pull request Apr 30, 2026

Revert "[Quantization] Enable FP8 online quantization for Z-image text encoder" #3272

Merged

Isotr0py mentioned this pull request Apr 30, 2026

[Quantization] Redo Z-Image text encoder FP8 online quantization #3279

Merged

5 tasks

xiaohajiayou pushed a commit to xiaohajiayou/vllm-omni that referenced this pull request Apr 30, 2026

[Quantization] Enable FP8 online quantization for Z-image text encoder (

8a86778

vllm-project#1338) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[Quantization] Enable FP8 online quantization for Z-image text encoder (

1bdd0ad

vllm-project#1338) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

BeatSeat pushed a commit to BeatSeat/vllm-omni that referenced this pull request May 2, 2026

[Quantization] Enable FP8 online quantization for Z-image text encoder (

1cf6685

vllm-project#1338) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

xRay2016 mentioned this pull request May 10, 2026

[Quantization] Enable FP8 online quantization for Qwen-image-edit text encoder #3484

Open

5 tasks

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Quantization] Enable FP8 online quantization for Z-image text encoder (

f2041f3

vllm-project#1338) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
