Add support for ModelOpt MXFP8 models #31603
danisereb wants to merge 2 commits into vllm-project:main
Conversation
Code Review
This pull request adds basic support for MXFP8 quantized models. The changes include adding mxfp8 to quantization configurations, implementing Mxfp8Config for linear layers and MoE layers, and adding utility functions for MXFP8 operations.
The implementation for linear layers uses torch._scaled_mm for performance. The MoE implementation currently falls back to dequantizing weights to BF16, as noted in the PR description.
I've found two critical issues:
- In the MoE implementation, there's incorrect slicing logic for weight scales when expert parallelism is used, which would lead to errors.
- The MXFP8 linear layer implementation is missing the bias addition.
Please address these issues. Otherwise, the changes look good and are a good step towards full MXFP8 support.
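For reviewers unfamiliar with the format: MXFP8 (per the OCP Microscaling spec) stores FP8 values in blocks of 32 that share one E8M0 power-of-two scale, so the MoE BF16-dequantization fallback mentioned above amounts to broadcasting each block scale over its 32 elements. A minimal numpy sketch of that idea (function and variable names are illustrative, not the PR's actual code):

```python
import numpy as np

BLOCK = 32  # MX block size per the OCP Microscaling (MX) spec

def dequant_mx_blocks(q_vals: np.ndarray, e8m0_scales: np.ndarray) -> np.ndarray:
    """Dequantize MX-format data: each block of 32 values shares one E8M0
    scale (a power of two, biased by 127). `q_vals` holds the FP8 payload
    already decoded to float, shape (..., K); scales have shape (..., K // 32)."""
    scales = np.exp2(e8m0_scales.astype(np.float64) - 127.0)
    # Broadcast each block's scale over its 32 elements along the last axis.
    return q_vals * np.repeat(scales, BLOCK, axis=-1)

# Tiny example: one row of 64 ones split into two blocks,
# scaled by 2^1 (code 128) and 2^-1 (code 126) respectively.
q = np.ones((1, 64))
s = np.array([[128, 126]])
w = dequant_mx_blocks(q, s)
```

This is only a sketch of the per-block math under the assumptions above; the actual kernel-level layout in the PR may differ.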
The PR generally looks good. However, we are actively trying to deprecate the long tail of quantization integrations to focus on our core integrations. We support MXFP8 in llm-compressor/compressed-tensors. Would you be open to adding this as a compressed-tensors backend rather than as a new discrete quantization integration?
Force-pushed from 3ff19c5 to fa4ac0c
Documentation preview: https://vllm--31603.org.readthedocs.build/en/31603/
Force-pushed from caeae4d to 4bf4d13
Force-pushed from 4bf4d13 to 054c113
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Force-pushed from 054c113 to d2f5a05
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Not relevant, a new PR will be opened if required.
Purpose
Add support for ModelOpt MXFP8 models.
Test Plan
Test a model that was converted to MXFP8 using ModelOpt.
https://huggingface.co/nvidia/OpenMath2-Llama3.1-8B
Test Result
Eval command:
Benchmark command:
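The eval and benchmark commands above did not survive extraction and are left blank. As a hedged placeholder only, serving and evaluating such a checkpoint with vLLM would look roughly like the following; the `--quantization mxfp8` flag and the lm-eval invocation are assumptions, not the commands actually used in this PR:

```shell
# Assumed invocation, not the PR author's exact command:
# serve the MXFP8 checkpoint (quantization method name assumed)
vllm serve nvidia/OpenMath2-Llama3.1-8B --quantization mxfp8

# then evaluate against the OpenAI-compatible endpoint (lm-eval assumed)
lm_eval --model local-completions \
  --model_args base_url=http://localhost:8000/v1/completions,model=nvidia/OpenMath2-Llama3.1-8B \
  --tasks gsm8k
```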
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.