[Bugfix] Skip bias tensors in online FP8 quantization pipeline #39962
r266-tech wants to merge 2 commits into
Conversation
Code Review
This pull request addresses a bug where bias parameters were incorrectly handled by the online loader during FP8 quantization. It adds `"bias"` to the `SKIP_TENSORS` set in `vllm/model_executor/model_loader/reload/meta.py` to ensure bias follows the normal load path. A regression test has also been added to verify that bias parameters are skipped during meta capture. I have no feedback to provide.
I haven't been involved in this and haven't empirically validated anything!
@pstefa1707 Apologies for the mistaken @-mention; you weren't involved in this. I confused handles from another context. The actual reporter who provided the repro/validation is @alankessler in #39663. I'm correcting the PR body now.
You're really eager to poach my PR, huh @r266-tech #39666
Hi @alankessler, you're right to call this out. My apologies. I didn't internalize that you already had an open fix when I resubmitted this (I filed #39665 the same day you opened #39666, then resubmitted as #39962 after closing mine), and I should have deferred to your PR from the start. Closing this in favor of #39666, which was yours first and addresses the same fix. Sorry for the noise, and for the mistaken @-mention of @pstefa1707 earlier in the thread; that came from confusing handles while drafting the body and isn't an excuse. Rooting for #39666 to land.
Summary
Add `"bias"` to `SKIP_TENSORS` in `vllm/model_executor/model_loader/reload/meta.py` so bias parameters bypass `make_online_process_loader` wrapping during `initialize_online_processing`. Fixes #39663. Resubmit of #39665 (auto-closed by my notification cleanup without maintainer review).
Problem
With `--quantization fp8` on BF16 checkpoints, models that register `bias=True` linear layers (Qwen2/2.5, GPT-2, Phi, etc.) produce garbage output: bias tensors get wrapped by the online processing pipeline but never materialize; they silently stay at zero.
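A minimal repro sketch along the lines of #39663 (the model choice is illustrative; any BF16 checkpoint with `bias=True` linear layers should reproduce it):

```python
from vllm import LLM, SamplingParams

# Qwen2.5 registers bias=True on its QKV projections, so online FP8
# quantization left those bias tensors at zero before this fix.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", quantization="fp8")
out = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)  # garbage before the fix, coherent after
```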
Root cause

`initialize_online_processing` in `layerwise.py` wraps weight loaders for all tensors not in `SKIP_TENSORS`. Bias parameters (1D, small) don't need FP8 quantization and are not designed to flow through the deferred loading pipeline. The same class of bug was previously addressed for `e_score_correction_bias` (already in the set).

Reporter @alankessler filed #39663 with the full env capture and repro (Qwen2/2.5 producing garbage output under `--quantization fp8`).
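To make the mechanism concrete, here is a self-contained toy of the exact-name skip; the loop and helper name are assumptions about the shape of the real `initialize_online_processing`, not its actual code:

```python
import torch

# Toy stand-in: only the two names mentioned in this PR are shown.
SKIP_TENSORS = {"e_score_correction_bias", "bias"}

def names_wrapped_for_online_fp8(module: torch.nn.Module) -> list[str]:
    """Return the parameter names that would get the deferred online loader."""
    wrapped = []
    for name, _param in module.named_parameters(recurse=False):
        if name in SKIP_TENSORS:  # exact-name match: bias takes the normal path
            continue
        wrapped.append(name)      # everything else enters the online pipeline
    return wrapped

layer = torch.nn.Linear(16, 16, bias=True)
assert names_wrapped_for_online_fp8(layer) == ["weight"]  # bias is skipped
```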
Fix

Add `"bias"` to the `SKIP_TENSORS` set. vLLM parallel linear layers (`ColumnParallelLinear`, `QKVParallelLinear`, `RowParallelLinear`, etc.) register bias as exactly `self.bias` via `Parameter(...)` / `register_parameter("bias", ...)`, so the exact-name match in `SKIP_TENSORS` targets precisely the affected tensors.
Test

Added `test_capture_layer_to_meta_skips_bias`, a CPU-only unit test that verifies:

- `"bias"` is present in `SKIP_TENSORS`
- `torch.nn.Linear(bias=True)` → `capture_layer_to_meta` drops the bias but keeps the weight

The negative case (accidental over-skip) is constrained by the existing `test_reload_lifecycle` test, which exercises end-to-end capture/restore/materialize on a `torch.nn.Linear` and would fail if weight tensors were incorrectly skipped.
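A sketch of what the added test plausibly looks like (the `capture_layer_to_meta` signature and its return shape are assumptions; only the behaviors listed above are confirmed):

```python
import torch

from vllm.model_executor.model_loader.reload.meta import (
    SKIP_TENSORS, capture_layer_to_meta)


def test_capture_layer_to_meta_skips_bias():
    # The skip set itself contains the new entry.
    assert "bias" in SKIP_TENSORS

    # Capturing a bias=True Linear keeps the weight and drops the bias
    # (assumes the capture result is keyed by parameter name).
    layer = torch.nn.Linear(4, 4, bias=True)
    captured = capture_layer_to_meta(layer)
    assert "weight" in captured
    assert "bias" not in captured
```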
Notes

- No new mechanism: this reuses the same exact-name skip pattern as the existing `e_score_correction_bias` entry (also a narrow exact-name skip).