Mistral Large 3 NVFP4 support #14485

Merged
ispobock merged 29 commits into sgl-project:main from dcampora:dcampora/nvfp4_support on Dec 13, 2025
Conversation

@dcampora
Contributor

@dcampora dcampora commented Dec 5, 2025

Add support for Mistral Large 3 NVFP4.

Depends on #14466.

  • GSM8K test results:
SGLANG_ENABLE_JIT_DEEPGEMM=0 \
python3 -m sglang.launch_server \
--model mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4 \
--kv-cache-dtype fp8_e4m3 \
--tensor-parallel-size 8 \
--disable-radix-cache \
--stream-interval 20 \
--mem-fraction-static 0.9 \
--attention-backend trtllm_mla \
--model-loader-extra-config '{"enable_multithread_load": true}' \
--max-running-requests 1024 \
--cuda-graph-max-bs 1024 \
--chat-template mistral

lm_eval \
--model local-chat-completions \
--model_args model=mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4,\
base_url=http://127.0.0.1:30000/v1/chat/completions,\
num_concurrent=128,timeout=999999,max_gen_toks=8192 \
--tasks gsm8k \
--batch_size 128 \
--apply_chat_template \
--num_fewshot 8
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     8|exact_match|↑  |0.9249|±  |0.0073|
|     |       |strict-match    |     8|exact_match|↑  |0.7104|±  |0.0125|

Checklist

elvischenv and others added 7 commits December 4, 2025 07:01
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
@github-actions github-actions bot added quant LLM Quantization deepseek blackwell SM100/SM120 labels Dec 5, 2025
@dcampora dcampora changed the title Mistral Large 3 Eagle and NVFP4 support Mistral Large 3 NVFP4 support Dec 5, 2025
JustinTong0323 and others added 4 commits December 5, 2025 15:47
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@JustinTong0323
Collaborator

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Dec 5, 2025
@elvischenv
Contributor

elvischenv commented Dec 9, 2025

Before merging main, the server could be launched and the accuracy was good. After merging, I got a lot of rope-related issues:

  1. [Bugfix] Fix KeyError for Mistral-Large-3 rope_scaling config #14627
`rope_parameters`'s factor field must be a float >= 1, got 36
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
  File "/workspace/sglang/python/sglang/srt/model_loader/__init__.py", line 28, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/workspace/sglang/python/sglang/srt/model_loader/loader.py", line 595, in load_model
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/workspace/sglang/python/sglang/srt/model_loader/loader.py", line 263, in _initialize_model
    return model_class(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/sglang/python/sglang/srt/models/pixtral.py", line 87, in __init__
    self.vision_args = VisionEncoderArgs(**vision_args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: VisionEncoderArgs.__init__() missing 1 required positional argument: 'rope_theta'

Update: all these issues are caused by transformers update: 8200fb5
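
The integer-vs-float validation failures above could be worked around by coercing the affected rope_scaling fields before validation. A minimal hypothetical sketch (normalize_rope_scaling is not part of sglang or transformers; the field names are taken from the error messages above):

```python
# Hypothetical sketch (not the actual fix): coerce integer-valued
# rope_scaling fields to float so a strict validator accepts them.
FLOAT_FIELDS = (
    "factor",
    "beta_fast",
    "beta_slow",
    "mscale_all_dim",
    "original_max_position_embeddings",
)

def normalize_rope_scaling(rope_scaling: dict) -> dict:
    """Return a copy with known numeric sub-fields cast to float."""
    fixed = dict(rope_scaling)
    for key in FLOAT_FIELDS:
        if key in fixed and isinstance(fixed[key], int):
            fixed[key] = float(fixed[key])
    return fixed

cfg = {"rope_type": "yarn", "factor": 36, "beta_fast": 32, "beta_slow": 1}
print(normalize_rope_scaling(cfg))
# {'rope_type': 'yarn', 'factor': 36.0, 'beta_fast': 32.0, 'beta_slow': 1.0}
```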

@ispobock
Collaborator

ispobock commented Dec 9, 2025

/tag-and-rerun-ci

@Fridge003
Collaborator

@elvischenv @dcampora
This PR also handled the rope issue. Does it conflict with your code?
#14745

@elvischenv
Contributor

@elvischenv @dcampora This PR also handled the rope issue. Is it conflicting with your code? #14745

@Fridge003 It won't have conflicts, but that fix is quite ugly. For ML3, for example, there are lots of sub-attributes in rope_scaling:

rope_scaling: {'rope_type': 'yarn', 'mscale_all_dim': 1, 'factor': 36.0, 'original_max_position_embeddings': 8192.0, 'beta_fast': 32.0, 'beta_slow': 1.0}

That fix pulls all of these sub-attributes up to the top level of the config, which is unnecessary and error-prone. Not all attributes need to be pulled; it would be better to pull only rope_theta to resolve that issue. @yhyang201
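
The narrower fix suggested here could look roughly like the sketch below (promote_rope_theta is a hypothetical helper for illustration, not the code in #14745): promote only rope_theta to the top level and leave the other rope_scaling sub-attributes nested.

```python
# Hypothetical sketch of the narrower fix: copy only rope_theta to the top
# level of the config when it is nested, leaving the other rope_scaling
# sub-attributes (factor, beta_fast, ...) where they are.
def promote_rope_theta(config: dict) -> dict:
    fixed = dict(config)
    scaling = fixed.get("rope_scaling") or {}
    # Only promote when rope_theta is missing at the top level.
    if "rope_theta" not in fixed and "rope_theta" in scaling:
        fixed["rope_theta"] = scaling["rope_theta"]
    return fixed
```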

@yhyang201 yhyang201 mentioned this pull request Dec 10, 2025
6 tasks
@Fridge003
Collaborator

@elvischenv We reverted the transformers version to 4.57 and I removed the logic in model_config.py. Please check whether it works on your side.

@Fridge003
Collaborator

/tag-and-rerun-ci

@ispobock
Collaborator

ispobock commented Dec 12, 2025

/tag-and-rerun-ci

@ispobock ispobock merged commit f6031ad into sgl-project:main Dec 13, 2025
190 of 207 checks passed
Liwansi added a commit to iforgetmyname/sglang that referenced this pull request Dec 13, 2025
…n_eagle3_npu

* 'main' of https://github.com/sgl-project/sglang: (25 commits)
  [NPU] perf update with kvcache nz & w4a8 quant (sgl-project#14423)
  [PP Prefill][NIXL] Fix PP mode transfer completion tracking to wait for all ranks (sgl-project#15027)
  Fix GLM-4.6 tool calls don't support streaming output for arguments i… (sgl-project#13989)
  feature: adding nightly wheel workflow and indexer (sgl-project#14924)
  [diffusion] feat: Improve LoRA compatibility by adding unified format detection and diffusers-based normalization (sgl-project#14659)
  [Fix] Disable trtllm moe backend for draft model for a qucik fix (sgl-project#15002)
  [diffusion] fix: use NDRotaryEmbedding in flux_2   (sgl-project#15034)
  Mistral Large 3 NVFP4 support (sgl-project#14485)
  call check_quantized_moe_compatibility after initialize (sgl-project#13876)
  Add sgl_router_attempt_http_responses_total for single attempt information (sgl-project#15037)
  Add error code in prometheus metrics and add X-SMG-Error-Code header (sgl-project#15036)
  Provide more fine grained error reason for reqwest error (sgl-project#15032)
  Tiny change http router response format to unify (sgl-project#15031)
  Tiny unify grpc existing error responses into new format (sgl-project#15030)
  Add `code` field and unify error responses for router (sgl-project#15028)
  Super tiny remove unused log_request (sgl-project#15035)
  Fix decode OOM caused by retraction (sgl-project#14939)
  [CI]Add gb200 runner back (sgl-project#15024)
  Add a special label for b200 CI runner that can run kernel tests (sgl-project#15033)
  Fix regression caused by fa3 block_table (sgl-project#15009)
  ...

# Conflicts:
#	python/sglang/srt/hardware_backend/npu/attention/ascend_backend.py
Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 17, 2025
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>