Support mxfp4 for GPT-OSS by Ying1123 · Pull Request #8843 · sgl-project/sglang

Ying1123 · 2025-08-06T03:59:42Z

No description provided.

gemini-code-assist · 2025-08-06T03:59:45Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

python/sglang/srt/layers/moe/fused_moe_triton/layer.py

zhyncs · 2025-08-06T07:05:31Z

Thanks to the NVIDIA Solution Architect Team for their great work! @zhuofan1123 @liz-badada @xutizhou @linhu-nv

Byeong-Chan · 2025-08-06T08:12:55Z

python/sglang/srt/layers/quantization/fp4.py



-class MxFp4Config(QuantizationConfig):
+class Mxfp4Config(QuantizationConfig):


@Ying1123

Thank you for your work on this code!

Is the rename from MxFp4Config to Mxfp4Config intentional?

EduardDurech · 2025-08-06T20:48:52Z

Why does this fail? I'm using meta-llama/Llama-3.2-3B-Instruct, which is not quantized:

    engine    = sgl.Engine(model_path=a.model, tp_size=int(os.getenv("TP")))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sglang/python/sglang/api.py", line 44, in Engine
    from sglang.srt.entrypoints.engine import Engine
  File "/sglang/python/sglang/srt/entrypoints/engine.py", line 42, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/sglang/python/sglang/srt/managers/data_parallel_controller.py", line 38, in <module>
    from sglang.srt.managers.scheduler import run_scheduler_process
  File "/sglang/python/sglang/srt/managers/scheduler.py", line 37, in <module>
    from sglang.srt.configs.model_config import ModelConfig
  File "/sglang/python/sglang/srt/configs/model_config.py", line 31, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/sglang/python/sglang/srt/layers/quantization/__init__.py", line 69, in <module>
    from sglang.srt.layers.quantization.mxfp4 import Mxfp4Config
  File "/sglang/python/sglang/srt/layers/quantization/mxfp4.py", line 36, in <module>
    from flashinfer import (
ImportError: cannot import name 'mxfp8_quantize' from 'flashinfer' (/usr/local/lib/python3.12/dist-packages/flashinfer/__init__.py). Did you mean: 'fp4_quantize'?

I assume we need to rebuild all our libraries.
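
The traceback shows `mxfp4.py` importing `mxfp8_quantize` from `flashinfer` at module load, so an older `flashinfer` build can break startup even when the model is not quantized. A minimal sketch for checking an installation before launching follows; `has_symbol` and `mxfp4_available` are hypothetical helper names, not part of sglang or flashinfer.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers a missing package as well as a broken import.
        return False
    return hasattr(module, symbol)


def mxfp4_available() -> bool:
    """Hypothetical check: does the installed flashinfer export the
    function named in the ImportError above?"""
    return has_symbol("flashinfer", "mxfp8_quantize")


if __name__ == "__main__":
    if mxfp4_available():
        print("flashinfer exposes mxfp8_quantize")
    else:
        print("flashinfer is missing mxfp8_quantize; an upgrade or rebuild is likely needed")
```

If the check fails, upgrading or rebuilding `flashinfer` is the likely fix, which matches the assumption in the comment above.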

Co-authored-by: Co-author fzyzcjy <ch271828n@outlook.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: zhuofan1123 <zhuofanl@nvidia.com>
Co-authored-by: liz-badada <jinyanc@nvidia.com>
Co-authored-by: xutizhou <xutingz@nvidia.com>
Co-authored-by: linhu-nv <linhu@nvidia.com>

mxfp4

f90f4d2

Ying1123 requested review from BBuf, HaiShaw, ch-wan, ispobock, kushanam, merrymercy and zhyncs as code owners August 6, 2025 03:59

Ying1123 marked this pull request as draft August 6, 2025 04:00

zhyncs marked this pull request as ready for review August 6, 2025 04:49

fzyzcjy and others added 6 commits August 6, 2025 12:50

fmt

f97f050

Merge branch 'main' into gpt-oss-mxfp4

9afa5f3

fix ci

3eace91

Merge branch 'gpt-oss-mxfp4' of https://github.com/sgl-project/sglang into gpt-oss-mxfp4

75b3c2e

fmt

0222012

minor cleanup

008d7d0

fzyzcjy mentioned this pull request Aug 6, 2025

[Tracking] OpenAI gpt-oss Day 0 Support #8833

Closed

Merge branch 'main' into gpt-oss-mxfp4

efb69fb

zhyncs reviewed Aug 6, 2025

python/sglang/srt/layers/moe/fused_moe_triton/layer.py (outdated)

upd

8a6ff4e

zhyncs approved these changes Aug 6, 2025

zhyncs merged commit 168033d into main Aug 6, 2025
9 of 57 checks passed

zhyncs deleted the gpt-oss-mxfp4 branch August 6, 2025 07:05

Byeong-Chan reviewed Aug 6, 2025

zhyncs mentioned this pull request Aug 6, 2025

Add model gpt-oss #8822

Closed

6 tasks

weedge mentioned this pull request Aug 11, 2025

feat: add sglang + openai gpt-oss serve and benchmark on modal ai-bot-pro/achatbot#183

Merged

zhyncs merged 9 commits into main from gpt-oss-mxfp4

5 participants


