
Support mxfp4 for GPT-OSS#8843

Merged
zhyncs merged 9 commits into main from gpt-oss-mxfp4 on Aug 6, 2025


@Ying1123
Contributor

Ying1123 commented Aug 6, 2025

No description provided.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Ying1123 marked this pull request as draft on August 6, 2025 04:00
zhyncs marked this pull request as ready for review on August 6, 2025 04:49
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

zhyncs merged commit 168033d into main on Aug 6, 2025
9 of 57 checks passed
zhyncs deleted the gpt-oss-mxfp4 branch on August 6, 2025 07:05
@zhyncs
Collaborator

zhyncs commented Aug 6, 2025

Thanks to the NVIDIA Solution Architect Team for their great work! @zhuofan1123 @liz-badada @xutizhou @linhu-nv



- class MxFp4Config(QuantizationConfig):
+ class Mxfp4Config(QuantizationConfig):

@Ying1123

Thank you for the work on the code!

Is it OK to rename MxFp4Config to Mxfp4Config?
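If the concern is that the rename breaks code that still imports the old spelling, one common option is to keep a module-level alias. This is a hypothetical sketch, not what the PR does: `QuantizationConfig` here is a stub standing in for SGLang's real base class.

```python
class QuantizationConfig:
    """Hypothetical stub standing in for SGLang's QuantizationConfig base class."""


class Mxfp4Config(QuantizationConfig):
    """Class under its new spelling after the rename."""


# Backward-compatible alias: code that still does `import MxFp4Config`
# gets the very same class object, so isinstance/issubclass checks agree.
MxFp4Config = Mxfp4Config
```

Because the alias is the same object rather than a subclass, there is only one class to maintain, and the old name can be dropped later once callers have migrated.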

zhyncs mentioned this pull request on Aug 6, 2025
6 tasks
@EduardDurech
Contributor

EduardDurech commented Aug 6, 2025

I'm using meta-llama/Llama-3.2-3B-Instruct, which is not quantized, yet creating the engine fails:

    engine    = sgl.Engine(model_path=a.model, tp_size=int(os.getenv("TP")))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sglang/python/sglang/api.py", line 44, in Engine
    from sglang.srt.entrypoints.engine import Engine
  File "/sglang/python/sglang/srt/entrypoints/engine.py", line 42, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/sglang/python/sglang/srt/managers/data_parallel_controller.py", line 38, in <module>
    from sglang.srt.managers.scheduler import run_scheduler_process
  File "/sglang/python/sglang/srt/managers/scheduler.py", line 37, in <module>
    from sglang.srt.configs.model_config import ModelConfig
  File "/sglang/python/sglang/srt/configs/model_config.py", line 31, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/sglang/python/sglang/srt/layers/quantization/__init__.py", line 69, in <module>
    from sglang.srt.layers.quantization.mxfp4 import Mxfp4Config
  File "/sglang/python/sglang/srt/layers/quantization/mxfp4.py", line 36, in <module>
    from flashinfer import (
ImportError: cannot import name 'mxfp8_quantize' from 'flashinfer' (/usr/local/lib/python3.12/dist-packages/flashinfer/__init__.py). Did you mean: 'fp4_quantize'?

I assume we need to rebuild all our libraries, since the installed flashinfer does not export mxfp8_quantize.
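The traceback shows why an unquantized model is affected: `quantization/__init__.py` imports `mxfp4.py`, which imports from `flashinfer` at module load, so a flashinfer build without `mxfp8_quantize` breaks engine startup for every model. A minimal sketch of deferring that failure until MXFP4 is actually used, assuming hypothetical helpers `load_optional_symbols` and `require_mxfp4_kernels` (neither is part of SGLang or flashinfer):

```python
import importlib
from typing import Any, Dict, Optional


def load_optional_symbols(module_name: str, *names: str) -> Optional[Dict[str, Any]]:
    """Return {name: object} if the module imports and exports every name, else None.

    Hypothetical helper: lets a quantization backend module stay importable
    even when an optional dependency is missing or too old.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Dependency is not installed at all.
        return None
    if any(not hasattr(module, name) for name in names):
        # Dependency is installed, but this version lacks some symbols.
        return None
    return {name: getattr(module, name) for name in names}


def require_mxfp4_kernels() -> Dict[str, Any]:
    """Resolve the flashinfer kernels lazily; raise only on the MXFP4 path."""
    symbols = load_optional_symbols("flashinfer", "mxfp8_quantize")
    if symbols is None:
        raise ImportError(
            "MXFP4 support needs a flashinfer build that exports "
            "mxfp8_quantize; please upgrade flashinfer."
        )
    return symbols
```

With this pattern, a model that never requests MXFP4 would start normally and the error would surface only when MXFP4 is selected. Upgrading flashinfer to a build that exports `mxfp8_quantize` is still the direct fix for the error above.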

narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
Co-authored-by: Co-author fzyzcjy <ch271828n@outlook.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: zhuofan1123 <zhuofanl@nvidia.com>
Co-authored-by: liz-badada <jinyanc@nvidia.com>
Co-authored-by: xutizhou <xutingz@nvidia.com>
Co-authored-by: linhu-nv <linhu@nvidia.com>
MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025
Co-authored-by: Co-author fzyzcjy <ch271828n@outlook.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: zhuofan1123 <zhuofanl@nvidia.com>
Co-authored-by: liz-badada <jinyanc@nvidia.com>
Co-authored-by: xutizhou <xutingz@nvidia.com>
Co-authored-by: linhu-nv <linhu@nvidia.com>

5 participants