[Bugfix] Add bias to SKIP_TENSORS to fix online FP8 for models with biased linears #39666

alankessler wants to merge 12 commits into
Conversation
…True

The layerwise reload mechanism wraps weight loaders for all tensors not in SKIP_TENSORS. This prevents bias parameters from loading correctly during online FP8 quantization, leaving them as zeros. Qwen2 is the most visible case (bias=True on qkv_proj), but any architecture with biased linear layers is affected.

Fixes: vllm-project#39663
Related: vllm-project#37334, vllm-project#38746

Signed-off-by: Alan Kessler <alankessler@gmail.com>
Code Review
This pull request introduces a regression test for online FP8 quantization on models with bias and updates the model loader to skip 'bias' tensors during reload to prevent output corruption. A review comment suggests also skipping 'w13_bias' and 'w2_bias' to ensure consistency and prevent similar issues in MoE layers.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: alankessler <alankessler@gmail.com>
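If adopted, the suggestion plausibly amounts to extending the same constant. A hedged sketch (the real set lives in vLLM's model loader and may contain other entries; only the MoE bias names come from the review comment above):

```python
# Hedged sketch of the review suggestion: also skip the MoE bias names
# "w13_bias" and "w2_bias" cited in the comment above, so MoE bias
# parameters load from the checkpoint the same way plain bias now does.
SKIP_TENSORS = {"weight_scale", "input_scale", "bias", "w13_bias", "w2_bias"}
```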
It looks like it's blocked on the pre-run check label gate. Could someone add the "ready" label to kick off the CI, please? Thank you!
@mgoin hi, would you please verify this to kick off CI? The contribution instructions say to ping you all if a PR hasn't moved in 7 days; it's been about 2 weeks. Thanks!
Thanks for finding this issue! Cc @kylesayrs
Failing checks are all unrelated to this PR: a schemathesis fuzzer flake, a seeded-sampling mismatch in test_cpu_offload, and a CUDA OOM in the fusion-e2e-tp2 suite.
Purpose
Fix online FP8 quantization producing garbage output for models with `bias=True` on linear layers (e.g. the Qwen2 family).

The layerwise reload mechanism (`initialize_online_processing`) wraps weight loaders for all tensors not in `SKIP_TENSORS`. This prevents bias parameters from loading correctly: they stay at zero instead of being populated from the checkpoint. Qwen2's `qkv_proj` has `bias=True` with values up to 147.0, so a zeroed bias completely corrupts the Q/K/V projections.

Same class of bug as #37334 and #38746.
Fixes #39663. Likely related to #27364 and #24025.
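For context, a minimal sketch of the gate described above, assuming a set-based `SKIP_TENSORS` check; the set contents besides `"bias"` and the helper names (`make_deferred_fp8_loader`, `named_params`) are illustrative, not vLLM's actual API:

```python
# Hypothetical sketch of the skip gate; only SKIP_TENSORS and
# initialize_online_processing are names taken from the PR itself.
SKIP_TENSORS = {"weight_scale", "input_scale", "bias"}  # the fix adds "bias"

def initialize_online_processing(named_params, make_deferred_fp8_loader):
    """Wrap weight loaders for online FP8, leaving SKIP_TENSORS alone."""
    for name, param in named_params:
        if name.rsplit(".", 1)[-1] in SKIP_TENSORS:
            continue  # bias keeps its stock loader and loads from checkpoint
        # everything else gets the layerwise-reload wrapper
        param.weight_loader = make_deferred_fp8_loader(param.weight_loader)
```

Before the fix, bias fell through to the wrapper and was never populated, which is why affected models emitted garbage.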
Test Plan
New integration test: loads `Qwen/Qwen2-0.5B` in BF16 (baseline) and in FP8 online, then compares logprobs with `check_logprobs_close`. A sketch of the pattern follows below.
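A hedged sketch of such a test, in the style of vLLM's model-comparison tests; it assumes the repo's `vllm_runner` fixture and the `check_logprobs_close` helper, and the prompts and arguments here are illustrative, not the PR's actual test code:

```python
import pytest
from tests.models.utils import check_logprobs_close  # path may differ by repo layout

MODEL = "Qwen/Qwen2-0.5B"
PROMPTS = ["The capital of France is"]

@pytest.mark.parametrize("max_tokens,num_logprobs", [(32, 5)])
def test_online_fp8_with_bias(vllm_runner, max_tokens, num_logprobs):
    # Baseline: unquantized BF16 run.
    with vllm_runner(MODEL, dtype="bfloat16") as m:
        baseline = m.generate_greedy_logprobs(PROMPTS, max_tokens, num_logprobs)
    # Online FP8 quantization of the same checkpoint.
    with vllm_runner(MODEL, quantization="fp8") as m:
        fp8_out = m.generate_greedy_logprobs(PROMPTS, max_tokens, num_logprobs)
    # Fails before the fix (zeroed bias corrupts Q/K/V), passes after.
    check_logprobs_close(
        outputs_0_lst=baseline,
        outputs_1_lst=fp8_out,
        name_0="bf16",
        name_1="fp8",
    )
```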
Test Result

Without fix: the test fails; FP8 output is `vpn.lua之心_burg.numpy...`

With fix: the test passes; FP8 logprobs match BF16.
Full `test_fp8.py` suite: 15 passed, 4 skipped, 0 failed (CUDA, RTX 5060 Ti, vLLM 0.19.0).

Also manually verified on Intel XPU (Arc Pro B70) with Qwen2.5-0.5B, Phi-2, GPT-2, and Mistral-Nemo (the bug was originally found because this failed on Intel XPU under Qwen and QwQ); all produce correct output with the fix, with no regressions.