
[Bugfix] Skip bias tensors in online FP8 quantization pipeline #39962

Closed
r266-tech wants to merge 2 commits into vllm-project:main from r266-tech:fix/fp8-online-quant-skip-bias-v2


Conversation

@r266-tech
Contributor

r266-tech commented Apr 16, 2026

Summary

Add "bias" to SKIP_TENSORS in vllm/model_executor/model_loader/reload/meta.py so bias parameters bypass make_online_process_loader wrapping during initialize_online_processing.

Fixes #39663. Resubmit of #39665, which was accidentally auto-closed during my notification cleanup before it received maintainer review.

Problem

With --quantization fp8 on BF16 checkpoints, models that register bias=True linear layers (Qwen2/2.5, GPT-2, Phi, etc.) produce garbage output: bias tensors get wrapped by the online processing pipeline but never materialize — they silently stay at zero.
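
A minimal repro along these lines (the exact model and arguments are my assumption, not copied from the issue):

```python
# Hypothetical repro: load a BF16 checkpoint with online FP8 quantization and
# observe degraded output for a model that registers bias=True linear layers.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", quantization="fp8")
out = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)  # garbage before the fix, coherent text after
```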

Root cause

initialize_online_processing in layerwise.py wraps weight loaders for all tensors not in SKIP_TENSORS. Bias parameters (1D, small) don't need FP8 quantization and are not designed to flow through the deferred loading pipeline. Same class of bug previously addressed for e_score_correction_bias (already in the set).
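
A simplified sketch of the decision this describes (illustrative only; the real code in layerwise.py and the signature of make_online_process_loader differ):

```python
# Before this PR: only the MoE correction bias was exempt from wrapping.
SKIP_TENSORS = {"e_score_correction_bias"}

def make_online_process_loader(loader):
    # Stand-in for the real wrapper that defers loading for online FP8 processing.
    def deferred_loader(*args, **kwargs):
        return loader(*args, **kwargs)
    return deferred_loader

def maybe_wrap_loader(param_name, loader):
    # Exact-name match: tensors listed in SKIP_TENSORS keep their normal weight
    # loader; everything else gets wrapped and materializes later.
    if param_name in SKIP_TENSORS:
        return loader
    return make_online_process_loader(loader)
```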

Reporter @alankessler filed #39663 with the full env capture and repro (Qwen2/2.5 producing garbage output under --quantization fp8).

Fix

Add "bias" to the SKIP_TENSORS set. vLLM parallel linear layers (ColumnParallelLinear, QKVParallelLinear, RowParallelLinear, etc.) register bias as exactly self.bias via Parameter(...) / register_parameter("bias", ...), so the exact-name match in SKIP_TENSORS targets precisely the affected tensors.
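
The change amounts to one entry in the set (the other contents of SKIP_TENSORS shown here are assumed, not copied from the repo):

```python
# vllm/model_executor/model_loader/reload/meta.py (sketch)
SKIP_TENSORS = {
    "bias",                     # added: exact-name match for layer bias parameters
    "e_score_correction_bias",  # existing entry using the same skip pattern
}
```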

Test

Added test_capture_layer_to_meta_skips_bias — CPU-only unit test that verifies:

  1. "bias" is present in SKIP_TENSORS
  2. capture_layer_to_meta on a torch.nn.Linear(bias=True) layer drops the bias but keeps the weight

The negative case (accidental over-skip) is constrained by the existing test_reload_lifecycle test, which exercises end-to-end capture/restore/materialize on a torch.nn.Linear and would fail if weight tensors were incorrectly skipped.
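
A rough sketch of what the new test checks (the signature and return shape of capture_layer_to_meta here are assumptions; the real test lives in vLLM's test suite):

```python
import torch

def test_capture_layer_to_meta_skips_bias():
    from vllm.model_executor.model_loader.reload.meta import (
        SKIP_TENSORS,
        capture_layer_to_meta,
    )

    # 1. the skip set itself contains the new entry
    assert "bias" in SKIP_TENSORS

    # 2. capturing a bias=True Linear keeps the weight but leaves the bias alone
    layer = torch.nn.Linear(8, 8, bias=True)
    captured = capture_layer_to_meta(layer)  # assumed: returns captured tensor names
    assert "weight" in captured
    assert "bias" not in captured
```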

Notes

  • 1-line functional change (SKIP_TENSORS kept alphabetized to minimize diff noise). No test infra changes, no new deps.
  • The previous submission, [Bugfix] Skip bias tensors in online FP8 quantization pipeline #39665, drew a WEAK_CONCERNS flag from Codex adversarial review about "bias" being a generic name. The mitigating evidence: the reporter's repro, plus the existing e_score_correction_bias entry, which is precedent for the same narrow exact-name skip.

r266-tech requested a review from 22quinn as a code owner April 16, 2026 04:14
mergify bot added the bug Something isn't working label Apr 16, 2026
@gemini-code-assist
Contributor

gemini-code-assist Bot left a comment


Code Review

This pull request addresses a bug where bias parameters were incorrectly handled by the online loader during FP8 quantization. It adds "bias" to the SKIP_TENSORS set in vllm/model_executor/model_loader/reload/meta.py to ensure it follows the normal load path. A regression test has also been added to verify that bias parameters are skipped during meta capture. I have no feedback to provide.

@pstefa1707

I haven't been involved in this and haven't empirically validated anything!
Maybe a typo on username?

@r266-tech
Contributor Author

@pstefa1707 Apologies for the mistaken @-mention — you weren't involved in this. I confused handles from another context. The actual reporter who provided the repro/validation is @alankessler in #39663. I'm correcting the PR body now.

@alankessler

You’re really eager to poach my PR, huh @r266-tech #39666

@r266-tech
Contributor Author

Hi @alankessler — you're right to call this out. My apologies. I didn't internalize that you already had an open fix when I resubmitted this (I filed #39665 the same day you opened #39666 and then resubmitted as #39962 after closing mine), and I should have deferred to your PR from the start.

Closing this in favor of #39666, which was yours first and addresses the same fix. Sorry for the noise and for the mistaken @-mention of @pstefa1707 earlier in the thread — that came from confusing handles while drafting the body and isn't an excuse. Rooting for #39666 to land.

r266-tech closed this Apr 23, 2026

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Online FP8 quantization drops bias weights, which breaks Qwen2 and other models with bias=True

3 participants