
[Bugfix] Skip bias tensors in online FP8 quantization pipeline#39665

Closed
r266-tech wants to merge 1 commit into vllm-project:main from r266-tech:fix/fp8-online-quant-skip-bias

Conversation

@r266-tech
Contributor

Summary

Add "bias" to SKIP_TENSORS in vllm/model_executor/model_loader/reload/meta.py so bias parameters bypass make_online_process_loader wrapping during initialize_online_processing.

Problem: When using --quantization fp8 on BF16 checkpoints, models with bias=True linear layers (Qwen2/2.5, GPT-2, Phi, etc.) produce garbage output. The bias tensors get wrapped by the online processing pipeline but never materialize — they silently stay at zero.

Root cause: initialize_online_processing in layerwise.py wraps weight loaders for all tensors not in SKIP_TENSORS. Bias parameters (1D, small) don't need FP8 quantization and are not designed to flow through the deferred loading pipeline. Same class of bug as #37334 and #38746.
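
To make the failure mode concrete, here is a minimal sketch of the class of check involved (the helper name and the set contents below are illustrative assumptions, not the actual vLLM code):

```python
# Illustrative sketch only; SKIP_TENSORS contents and the helper name are
# hypothetical, not the real vllm implementation.
SKIP_TENSORS = {"e_score_correction_bias"}  # before this PR: no plain "bias"

def needs_online_processing(param_name: str) -> bool:
    """True if this parameter's weight loader gets wrapped for deferred FP8 handling."""
    leaf = param_name.rsplit(".", 1)[-1]  # e.g. "bias", "weight"
    return leaf not in SKIP_TENSORS

# A standard Qwen2 bias parameter gets wrapped and deferred but never
# materialized, so it silently stays at its zero-initialized value:
assert needs_online_processing("model.layers.0.self_attn.qkv_proj.bias")
```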

Fix: A one-line addition of "bias" to the skip set. All vLLM parallel linear layers (ColumnParallelLinear, QKVParallelLinear, RowParallelLinear, etc.) register bias as self.bias via Parameter(...) or register_parameter("bias", None), so the exact string "bias" matches all standard bias parameters.

Note: Custom non-standard bias parameters with different names (e.g., e_score_correction_bias) are already individually listed in SKIP_TENSORS. This fix targets the standard nn.Linear-style bias that all parallel linear layers inherit.
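
For reference, a minimal sketch of what the one-line change looks like (the surrounding entries are placeholders; only the "bias" line reflects this PR):

```python
# vllm/model_executor/model_loader/reload/meta.py (sketch; existing entries
# are abbreviated placeholders, only the "bias" addition is from this PR)
SKIP_TENSORS = {
    "e_score_correction_bias",  # non-standard bias already listed individually
    "bias",                     # new: covers the standard nn.Linear-style bias
}
```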

Fixes #39663

@r266-tech r266-tech requested a review from 22quinn as a code owner April 13, 2026 04:19
@mergify mergify Bot added the bug label Apr 13, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the metadata keys in vllm/model_executor/model_loader/reload/meta.py by adding "bias" to the existing set. There are no review comments, and I have no feedback to provide.

Add "bias" to SKIP_TENSORS so bias parameters are not wrapped by
make_online_process_loader during initialize_online_processing.
Without this, bias tensors on models with bias=True (Qwen2/2.5,
GPT-2, Phi, etc.) silently stay at zero when using --quantization fp8
on BF16 checkpoints.

Fixes vllm-project#39663

Signed-off-by: r266-tech <183631678+r266-tech@users.noreply.github.com>
@mergify mergify Bot added the ci/build, deepseek, frontend, llama, multi-modality, mistral, performance, qwen, gpt-oss, nvidia, rocm, intel-gpu, cpu, structured-output, speculative-decoding, v1, tpu, and tool-calling labels Apr 13, 2026
@mergify mergify Bot added the kv-connector label Apr 13, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: Online FP8 quantization drops bias weights, which breaks Qwen2 and other models with bias=True
