
Add support for ModelOpt MXFP8 models #31603

Closed

danisereb wants to merge 2 commits into vllm-project:main from danisereb:support_mxfp8_basic


Conversation

@danisereb (Contributor) commented Jan 1, 2026

Purpose

Add support for ModelOpt MXFP8 models.

Test Plan

Test a model that was converted to MXFP8 using ModelOpt.
https://huggingface.co/nvidia/OpenMath2-Llama3.1-8B

Test Result

Eval command:

export MODEL_PATH=/my_home/hf_models/nvidia/OpenMath2-Llama3.1-8B-MXFP8

lm_eval \
  --model vllm \
  --model_args pretrained=$MODEL_PATH,max_model_len=4096,enforce_eager=True \
  --tasks gsm8k \
  --batch_size auto

Benchmark command:

export MODEL_PATH=/my_home/hf_models/nvidia/OpenMath2-Llama3.1-8B-MXFP8

vllm bench throughput --model $MODEL_PATH \
--tensor-parallel-size 1 \
--load-format dummy \
--enforce-eager \
--trust-remote-code \
--async-scheduling \
--backend vllm \
--dataset-name random \
--num-prompts 16 \
--input-len 1000 \
--output-len 1000

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist bot left a comment


Code Review

This pull request adds basic support for MXFP8 quantized models. The changes include adding mxfp8 to quantization configurations, implementing Mxfp8Config for linear layers and MoE layers, and adding utility functions for MXFP8 operations.

The implementation for linear layers uses torch._scaled_mm for performance. The MoE implementation currently falls back to dequantizing weights to BF16, as noted in the PR description.
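As background for the BF16 fallback path mentioned above: the MX format stores FP8 (E4M3) elements in 32-element blocks that share a power-of-two E8M0 scale, so dequantization multiplies each block by its decoded scale. A rough sketch of that operation in numpy (an illustration of the format, not the PR's code):

```python
import numpy as np

BLOCK = 32  # MXFP8 shares one E8M0 scale per 32-element block

def dequantize_mxfp8(elems, scale_exp):
    """Dequantize MXFP8 blocks to float (a BF16 stand-in).

    elems:     decoded FP8 (E4M3) element values, shape (n_blocks, BLOCK)
    scale_exp: E8M0 block scale exponents (uint8), shape (n_blocks,)
    """
    # E8M0 encodes 2**(exp - 127); exponent 127 means a scale of 1.0
    scales = np.exp2(scale_exp.astype(np.float32) - 127.0)
    return elems * scales[:, None]

# One block of ones with scale exponent 128 -> every element becomes 2.0
deq = dequantize_mxfp8(np.ones((1, BLOCK), dtype=np.float32),
                       np.array([128], dtype=np.uint8))
```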

I've found two critical issues:

  1. In the MoE implementation, there's incorrect slicing logic for weight scales when expert parallelism is used, which would lead to errors.
  2. The MXFP8 linear layer implementation is missing the bias addition.
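The second issue is mechanical: in a quantized linear forward pass, the bias must be added after the scaled matmul. A minimal sketch of the expected shape of the fix, using plain float math as a stand-in for torch._scaled_mm (names and shapes are illustrative, not the PR's code):

```python
import numpy as np

def mxfp8_linear(x, w_deq, bias=None):
    """Sketch of a quantized linear forward pass.

    x:      activations, shape (batch, in_dim)
    w_deq:  dequantized weight, shape (out_dim, in_dim)
    bias:   optional bias, shape (out_dim,)
    """
    y = x @ w_deq.T  # stand-in for the scaled FP8 matmul
    if bias is not None:
        y = y + bias  # the step the review flagged as missing
    return y

x = np.ones((2, 3), dtype=np.float32)
w = np.ones((4, 3), dtype=np.float32)
b = np.full(4, 0.5, dtype=np.float32)
out = mxfp8_linear(x, w, b)
```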

Please address these issues. Otherwise, the changes look good and are a good step towards full MXFP8 support.
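The first issue can be illustrated in generic terms: under expert parallelism each rank owns a contiguous slice of experts, so per-expert weight scales must be sliced along the expert axis together with the weights themselves. A hypothetical sketch (shapes and helper names invented for illustration, not the PR's tensors):

```python
import numpy as np

# Hypothetical MoE shapes: 8 experts, each with a (rows, groups) scale tensor
num_experts, rows, groups = 8, 4, 2
w_scale = np.arange(num_experts * rows * groups,
                    dtype=np.float32).reshape(num_experts, rows, groups)

def local_expert_scales(w_scale, ep_rank, ep_size):
    """Slice per-expert scales along the expert axis for this EP rank."""
    per_rank = w_scale.shape[0] // ep_size
    start = ep_rank * per_rank
    return w_scale[start:start + per_rank]

# Rank 1 of 2 owns experts 4..7, so its scales are w_scale[4:8]
local = local_expert_scales(w_scale, ep_rank=1, ep_size=2)
```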

@robertgshaw2-redhat
Copy link
Copy Markdown
Collaborator

The PR generally looks good.

However, we are actively trying to deprecate the long tail of quantization integrations to focus on our core integrations.

We support MXFP8 in llm-compressor/compressed-tensors. Would you be open to adding this as a compressed-tensors backend rather than as a new discrete quantization integration?

@danisereb danisereb force-pushed the support_mxfp8_basic branch from 3ff19c5 to fa4ac0c Compare January 15, 2026 11:52

mergify bot commented Jan 15, 2026

Documentation preview: https://vllm--31603.org.readthedocs.build/en/31603/

@mergify mergify bot added the documentation Improvements or additions to documentation label Jan 15, 2026
@danisereb danisereb changed the title Add basic support for mxfp8 quantized models Add support for ModelOpt MXFP8 models Jan 15, 2026
@danisereb danisereb force-pushed the support_mxfp8_basic branch 3 times, most recently from caeae4d to 4bf4d13 Compare January 19, 2026 15:35
@danisereb danisereb force-pushed the support_mxfp8_basic branch from 4bf4d13 to 054c113 Compare January 28, 2026 17:44

mergify bot commented Jan 28, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @danisereb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 28, 2026
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
@danisereb danisereb force-pushed the support_mxfp8_basic branch from 054c113 to d2f5a05 Compare January 29, 2026 14:54
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
@mergify mergify bot removed the needs-rebase label Jan 29, 2026

mergify bot commented Feb 3, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @danisereb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 3, 2026
@danisereb (Contributor, Author) commented

No longer relevant; a new PR will be opened if required.

@danisereb danisereb closed this Feb 3, 2026

Labels

documentation (Improvements or additions to documentation), needs-rebase
