[Quantization] Support FP8 MoE bias for models like GPT-OSS #34906

simon-mo merged 4 commits into vllm-project:main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request introduces support for FP8 MoE bias in models like GPT-OSS-120B. The changes include registering bias parameters in Fp8MoEMethod.create_weights, passing them to the fused_experts kernel in apply, and adding a safety guard for the FusedMoEModularKernel.
While the logic is sound, there are a few critical issues regarding the data types used for bias parameters and potential typos in attribute names that could lead to runtime errors or precision loss. Specifically, biases should typically remain in the model's original high precision (e.g., BF16) rather than being quantized to FP8, and there are references to self.moe which should likely be self.moe_config.
```python
        set_weight_attrs(w2_weight, extra_weight_attrs)

        # BIASES (for models like GPT-OSS that have biased MoE)
        if self.moe.has_bias:
```
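To make the pattern behind this diff concrete, here is a simplified, illustrative sketch (not the exact vLLM code) of registering the MoE bias parameters alongside the FP8 weights in `create_weights()`. The parameter names `w13_bias`/`w2_bias` come from the PR; the function signature and shapes are assumptions. Per the review comment above, the biases stay in the model's original high precision (e.g. bfloat16) rather than being quantized to FP8:

```python
import torch


def register_moe_biases(layer, num_experts, intermediate_size, hidden_size,
                        params_dtype, extra_weight_attrs, set_weight_attrs):
    """Illustrative sketch: register per-expert MoE biases on `layer`.

    `set_weight_attrs` is passed in to mirror vLLM's helper of the same
    name; `params_dtype` is the model's original dtype (e.g. bfloat16),
    NOT an FP8 dtype, so no precision is lost on the biases.
    """
    # w13_bias covers the fused gate/up projection (hence 2x intermediate).
    w13_bias = torch.nn.Parameter(
        torch.zeros(num_experts, 2 * intermediate_size, dtype=params_dtype),
        requires_grad=False)
    layer.register_parameter("w13_bias", w13_bias)
    set_weight_attrs(w13_bias, extra_weight_attrs)

    # w2_bias covers the down projection back to the hidden size.
    w2_bias = torch.nn.Parameter(
        torch.zeros(num_experts, hidden_size, dtype=params_dtype),
        requires_grad=False)
    layer.register_parameter("w2_bias", w2_bias)
    set_weight_attrs(w2_bias, extra_weight_attrs)
```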
Can you double check this?
This is a false positive. `self.moe` is inherited from the parent class `FusedMoEMethodBase`:

```python
class FusedMoEMethodBase(QuantizeMethodBase):
    def __init__(self, moe: FusedMoEConfig):
        super().__init__()
        self.moe: FusedMoEConfig = moe
        self.moe_quant_config: FusedMoEQuantConfig | None = None
        self.moe_mk: mk.FusedMoEModularKernel | None = None
```
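The inheritance pattern being discussed can be reduced to a minimal, self-contained sketch: the base class sets `self.moe`, so a subclass like `Fp8MoEMethod` can read `self.moe.has_bias` without defining the attribute itself. Class names mirror the vLLM classes, but the bodies here are simplified stand-ins:

```python
from dataclasses import dataclass


@dataclass
class FusedMoEConfig:
    has_bias: bool = False


class FusedMoEMethodBase:
    def __init__(self, moe: FusedMoEConfig):
        # The attribute the static check flagged: set once on the base class.
        self.moe: FusedMoEConfig = moe


class Fp8MoEMethod(FusedMoEMethodBase):
    def create_weights(self) -> str:
        # `self.moe` resolves via normal attribute lookup on the base class,
        # so this is valid even though Fp8MoEMethod never assigns it.
        if self.moe.has_bias:
            return "registering w13_bias/w2_bias"
        return "no bias parameters"
```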
Hi @jasperjiaguo, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then commit the changes and push to your branch.
GPT-OSS-120B has biased MoE layers (`gate_up_proj_bias`, `down_proj_bias`). When serving the BF16 model with `--quantization fp8`, `Fp8MoEMethod` and `Fp8OnlineMoEMethod` don't register bias parameters, causing weight loading failures. This adds bias support to both FP8 MoE method classes:

- Register `w13_bias`/`w2_bias` in `Fp8MoEMethod.create_weights()` when `moe.has_bias` is set
- Inject biases into quant_config via `get_fused_moe_quant_config()`
- Register biases in `Fp8OnlineMoEMethod.create_weights()` using the original (unpatched) `weight_loader`

Tested on 4xH200 with GPT-OSS-120B BF16 + vLLM 0.15.1:

- `vllm serve --quantization fp8` loads and serves successfully
- TRITON FP8 MoE backend selected correctly
- GSM8K accuracy: 0.834 (FP8) vs 0.848 (BF16)
- 1.5x throughput improvement with FP8

Companion PR: sgl-project/sglang#18988

Signed-off-by: jasperjiaguo <jasperg662@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jasperjiaguo could you share the exact command to reproduce the test plan in this PR?
Summary

GPT-OSS-120B has biased MoE layers (`gate_up_proj_bias`, `down_proj_bias`). When serving the BF16 model with `--quantization fp8`, `Fp8MoEMethod` doesn't register bias parameters, causing weight loading failures.

This PR adds bias support to `Fp8MoEMethod`:

- Register `w13_bias`/`w2_bias` in `create_weights()` when `FusedMoEConfig.has_bias` is set
- Pass the biases to `fused_experts()` in `apply()`
- Guard against the unsupported `FusedMoEModularKernel` + bias combination (consistent with `UnquantizedFusedMoEMethod`)

Test plan

Tested on 4×H200 with GPT-OSS-120B BF16 model:

- `vllm serve --quantization fp8` loads successfully with biased MoE
- TRITON FP8 MoE backend (`fused_moe_kernel`) confirmed via nsys profiling

Related

Companion PR: sgl-project/sglang#18988
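The safety-guard bullet above can be sketched as a small standalone check. This is a hedged illustration, not the actual vLLM code: the function name and both arguments are hypothetical, and it only shows the shape of the guard (reject the modular-kernel path when MoE biases are present, consistent with `UnquantizedFusedMoEMethod`):

```python
def check_modular_kernel_bias_support(using_modular_kernel: bool,
                                      has_bias: bool) -> None:
    """Illustrative guard: FusedMoEModularKernel + bias is unsupported.

    Raising early here turns a silent wrong-result or kernel crash into a
    clear configuration error; bias-free and non-modular paths pass through.
    """
    if using_modular_kernel and has_bias:
        raise NotImplementedError(
            "FusedMoEModularKernel does not support biased MoE layers; "
            "use the fused_experts path instead.")
```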
🤖 Generated with Claude Code