Mistral Large 3 NVFP4 support by dcampora · Pull Request #14485 · sgl-project/sglang

dcampora · 2025-12-05T07:01:21Z

Support Mistral Large 3 NVFP4.

Depends on #14466.

GSM8K test results:

SGLANG_ENABLE_JIT_DEEPGEMM=0 \
python3 -m sglang.launch_server \
--model mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4 \
--kv-cache-dtype fp8_e4m3 \
--tensor-parallel-size 8 \
--disable-radix-cache \
--stream-interval 20 \
--mem-fraction-static 0.9 \
--attention-backend trtllm_mla \
--model-loader-extra-config '{"enable_multithread_load": true}' \
--max-running-requests 1024 \
--cuda-graph-max-bs 1024 \
--chat-template mistral

lm_eval \
--model local-chat-completions \
--model_args model=mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4,\
base_url=http://127.0.0.1:30000/v1/chat/completions,\
num_concurrent=128,timeout=999999,max_gen_toks=8192 \
--tasks gsm8k \
--batch_size 128 \
--apply_chat_template \
--num_fewshot 8

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     8|exact_match|↑  |0.9249|±  |0.0073|
|     |       |strict-match    |     8|exact_match|↑  |0.7104|±  |0.0125|

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process

Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

gemini-code-assist · 2025-12-05T07:01:25Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

JustinTong0323 · 2025-12-05T20:10:04Z

/tag-and-rerun-ci

elvischenv · 2025-12-09T03:41:02Z

Before merging main, the server can be launched and the accuracy is good. After merging, I got lots of rope related issues:

[Bugfix] Fix KeyError for Mistral-Large-3 rope_scaling config #14627

`rope_parameters`'s factor field must be a float >= 1, got 36
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1

  File "/workspace/sglang/python/sglang/srt/model_loader/__init__.py", line 28, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/sglang/python/sglang/srt/model_loader/loader.py", line 595, in load_model
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/workspace/sglang/python/sglang/srt/model_loader/loader.py", line 263, in _initialize_model
    return model_class(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/sglang/python/sglang/srt/models/pixtral.py", line 87, in __init__
    self.vision_args = VisionEncoderArgs(**vision_args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: VisionEncoderArgs.__init__() missing 1 required positional argument: 'rope_theta'

Update: all these issues are caused by transformers update: 8200fb5

ispobock · 2025-12-09T07:51:35Z

/tag-and-rerun-ci

python/sglang/srt/configs/model_config.py

Fridge003 · 2025-12-09T19:23:08Z

@elvischenv @dcampora
This PR also handled the rope issue. Is it conflicting with your code?
#14745

elvischenv · 2025-12-10T02:18:25Z

@elvischenv @dcampora This PR also handled the rope issue. Is it conflicting with your code? #14745

@Fridge003 It won't have conflicts, but that fix is quite ugly. Like for ml3, there are lots of sub attributes in rope_scaling:

rope_scaling: {'rope_type': 'yarn', 'mscale_all_dim': 1, 'factor': 36.0, 'original_max_position_embeddings': 8192.0, 'beta_fast': 32.0, 'beta_slow': 1.0}

That fix will pull all these sub attributes to the top level of the config, which is unnecessary and error-prone. Not all attributes are needed to be pulled, it would better just pull rope_theta for resolving that issue. @yhyang201

Fridge003 · 2025-12-10T22:08:47Z

@elvischenv We reverted the transformers version to 4.57 and I removed the logics in model_config.py. Please check whether it works on your side

python/sglang/srt/models/pixtral.py

python/sglang/srt/utils/mistral_utils.py

Fridge003 · 2025-12-11T06:15:43Z

/tag-and-rerun-ci

ispobock · 2025-12-12T07:28:27Z

/tag-and-rerun-ci

…n_eagle3_npu * 'main' of https://github.com/sgl-project/sglang: (25 commits) [NPU] perf update with kvcache nz & w4a8 quant (sgl-project#14423) [PP Prefill][NIXL] Fix PP mode transfer completion tracking to wait for all ranks (sgl-project#15027) Fix GLM-4.6 tool calls don't support streaming output for arguments i… (sgl-project#13989) feature: adding nightly wheel workflow and indexer (sgl-project#14924) [diffusion] feat: Improve LoRA compatibility by adding unified format detection and diffusers-based normalization (sgl-project#14659) [Fix] Disable trtllm moe backend for draft model for a qucik fix (sgl-project#15002) [diffusion] fix: use NDRotaryEmbedding in flux_2 (sgl-project#15034) Mistral Large 3 NVFP4 support (sgl-project#14485) call check_quantized_moe_compatibility after initialize (sgl-project#13876) Add sgl_router_attempt_http_responses_total for single attempt information (sgl-project#15037) Add error code in prometheus metrics and add X-SMG-Error-Code header (sgl-project#15036) Provide more fine grained error reason for reqwest error (sgl-project#15032) Tiny change http router response format to unify (sgl-project#15031) Tiny unify grpc existing error responses into new format (sgl-project#15030) Add `code` field and unify error responses for router (sgl-project#15028) Super tiny remove unused log_request (sgl-project#15035) Fix decode OOM caused by retraction (sgl-project#14939) [CI]Add gb200 runner back (sgl-project#15024) Add a special label for b200 CI runner that can run kernel tests (sgl-project#15033) Fix regression caused by fa3 block_table (sgl-project#15009) ... # Conflicts: # python/sglang/srt/hardware_backend/npu/attention/ascend_backend.py

Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

elvischenv and others added 7 commits December 4, 2025 07:01

Support eagle

d6322e0

Fixes for running eagle

0c9e202

Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

Merge branch 'main' into elvis/eagle

0575231

Added w4a16 loading support.

1402ae4

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

Adding w4a4 support for compressed tensors.

b8b4cc6

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

Do not change sgl kernel.

c54fc52

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

add compressed tensors w4a4 nvfp4 moe support

ab9fe7a

dcampora requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg, Fridge003, HaiShaw, Ying1123, ch-wan, ispobock and merrymercy as code owners December 5, 2025 07:01

github-actions bot added quant LLM Quantization deepseek blackwell SM100/SM120 labels Dec 5, 2025

dcampora changed the title ~~Mistral Large 3 Eagle and NVFP4 support~~ Mistral Large 3 NVFP4 support Dec 5, 2025

JustinTong0323 and others added 4 commits December 5, 2025 15:47

Merge branch 'main' into dcampora/nvfp4_support

09d4eba

lint

fdf7a4e

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

fix marlin undefined name

6b67c6c

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Merge branch 'main' into dcampora/nvfp4_support

1c5b478

github-actions bot added the run-ci label Dec 5, 2025

JustinTong0323 added 2 commits December 5, 2025 21:35

Merge branch 'main' into dcampora/nvfp4_support

c4b3399

Merge branch 'main' into dcampora/nvfp4_support

da20243

elvischenv added 2 commits December 8, 2025 23:16

fix rope issue

527e399

Merge remote-tracking branch 'origin/main' into dcampora/nvfp4_support

a54c7f5

elvischenv requested review from fzyzcjy and zhyncs as code owners December 9, 2025 07:27

add assertion for yarn

022553d

ispobock approved these changes Dec 9, 2025

View reviewed changes

elvischenv reviewed Dec 9, 2025

View reviewed changes

python/sglang/srt/configs/model_config.py Outdated Show resolved Hide resolved

Remove assertion

bc11677

Merge branch 'main' into dcampora/nvfp4_support

e044562

yhyang201 mentioned this pull request Dec 10, 2025

Revert transformers to 4.57.1 #14801

Merged

6 tasks

Merge branch 'main' into dcampora/nvfp4_support

b2683a0

elvischenv reviewed Dec 11, 2025

View reviewed changes

python/sglang/srt/models/pixtral.py Outdated Show resolved Hide resolved

python/sglang/srt/utils/mistral_utils.py Outdated Show resolved Hide resolved

elvischenv and others added 4 commits December 11, 2025 09:21

Revert rope fix

e2c42cb

Revert rope fix

b8a2643

Merge branch 'main' into dcampora/nvfp4_support

8879db9

Merge branch 'main' into dcampora/nvfp4_support

1e24516

Merge branch 'main' into dcampora/nvfp4_support

2e724b2

update copyright

1fcb827

ispobock merged commit f6031ad into sgl-project:main Dec 13, 2025
190 of 207 checks passed

elvischenv mentioned this pull request Dec 13, 2025

Mistral Large 3 NVFP4 TRTLLM MoE support #15049

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mistral Large 3 NVFP4 support#14485

Mistral Large 3 NVFP4 support#14485
ispobock merged 29 commits intosgl-project:mainfrom
dcampora:dcampora/nvfp4_support

dcampora commented Dec 5, 2025 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Dec 5, 2025

Uh oh!

JustinTong0323 commented Dec 5, 2025

Uh oh!

elvischenv commented Dec 9, 2025 •

edited

Loading

Uh oh!

ispobock commented Dec 9, 2025

Uh oh!

Uh oh!

Fridge003 commented Dec 9, 2025

Uh oh!

elvischenv commented Dec 10, 2025

Uh oh!

Fridge003 commented Dec 10, 2025

Uh oh!

Uh oh!

Uh oh!

Fridge003 commented Dec 11, 2025

Uh oh!

ispobock commented Dec 12, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

dcampora commented Dec 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Uh oh!

gemini-code-assist bot commented Dec 5, 2025

Uh oh!

JustinTong0323 commented Dec 5, 2025

Uh oh!

elvischenv commented Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ispobock commented Dec 9, 2025

Uh oh!

Uh oh!

Fridge003 commented Dec 9, 2025

Uh oh!

elvischenv commented Dec 10, 2025

Uh oh!

Fridge003 commented Dec 10, 2025

Uh oh!

Uh oh!

Uh oh!

Fridge003 commented Dec 11, 2025

Uh oh!

ispobock commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

dcampora commented Dec 5, 2025 •

edited

Loading

elvischenv commented Dec 9, 2025 •

edited

Loading

ispobock commented Dec 12, 2025 •

edited

Loading