[TEMP FOR DOCKER BUILD - WILL DELETE LATER] Add Mistral Small 4 support with patched transformers by dougyster · Pull Request #20713 · sgl-project/sglang

dougyster · 2026-03-16T21:12:01Z

Summary

- Adds Mistral Small 4 (119B) model support, based on @JustinTong0323's work in "Add Mistral Small 4 (Pixtral) support" (#20708)
- Adds a patched transformers install to the Dockerfile to fix the Mistral tekken tokenizer vocab offset bug

Changes

- All SGLang-side Mistral 4 changes (config loading, vision processor, reasoning parser, chat template fallback)
- Dockerfile: installs dougyster/transformers@mistral-4-patch, which includes:
  - HuggingFace transformers main (with Mistral 4 model support from "Add Mistral 4", huggingface/transformers#44760)
  - Tekken tokenizer fix: correct vocab ID offset by num_special_tokens
  - Tekken converter: use full tokenizer_object instead of bare vocab+merges
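The vocab ID offset named in the tokenizer fix can be pictured with a small sketch. It assumes the Tekken layout in which special tokens occupy the first `num_special_tokens` IDs and every regular token's ID is its BPE rank shifted by that count. The helper name and the toy vocabulary are hypothetical and are not taken from the patch.

```python
def build_vocab(special_tokens, ranked_tokens):
    """Map tokens to IDs: special tokens first, then ranks shifted by their count.

    Assumption: special tokens take IDs 0..n-1 and a regular token with
    BPE rank r gets ID r + n, where n = len(special_tokens).
    """
    vocab = {tok: idx for idx, tok in enumerate(special_tokens)}
    num_special_tokens = len(special_tokens)
    for rank, tok in enumerate(ranked_tokens):
        # Without this shift, regular tokens would collide with special IDs.
        vocab[tok] = rank + num_special_tokens
    return vocab


toy_vocab = build_vocab(["<unk>", "<s>", "</s>"], ["a", "b"])
print(toy_vocab)
```

If the shift is dropped, the token with rank 0 lands on the same ID as the first special token, which is the kind of mismatch the fix is described as correcting.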

Test plan

- Build Docker image from this branch
- Verify Mistral-Small-4-119B-2603 loads and generates correct output with --tp 2
- Verify tokenizer produces correct token IDs

🤖 Generated with Claude Code

…processor

- Use patch_size * spatial_merge_size as the effective patch size in PixtralImageProcessor so images resize to multiples of 28 (not 14), matching PatchMerger requirements with spatial_merge_size=2
- Remove manual _resize and get_patch_grid_size methods, relying on the correctly configured HF image processor instead
- Add multi-image offset splitting for per-image MultimodalDataItem
- Remove unused torch import
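The arithmetic behind the effective patch size can be sketched as follows. The sketch assumes patch_size=14 (implied by "28 (not 14)" with spatial_merge_size=2) and assumes dimensions are rounded up to the next multiple; it ignores any longest-edge downscaling the real image processor may apply. The function names are hypothetical.

```python
import math


def effective_patch_size(patch_size: int, spatial_merge_size: int) -> int:
    # The merger combines spatial_merge_size x spatial_merge_size patches,
    # so each image side must be a multiple of this product.
    return patch_size * spatial_merge_size


def resized_shape(height: int, width: int, patch_size: int = 14,
                  spatial_merge_size: int = 2) -> tuple[int, int]:
    """Round each side up to a multiple of the effective patch size (assumed policy)."""
    eff = effective_patch_size(patch_size, spatial_merge_size)
    return (math.ceil(height / eff) * eff, math.ceil(width / eff) * eff)


print(resized_shape(100, 225))
```

With a plain patch size of 14, a 100-pixel side would become 112 either way, but a 225-pixel side would become 238, which gives 17 patches and cannot be split evenly into 2x2 groups.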

- Add --model flag (default "default") to avoid hardcoded model name
- Add --reasoning-effort flag passed as top-level request field
- Support local image paths via base64 data URI encoding
- Pass reasoning_effort and model as explicit parameters instead of smuggling through sampling_params dict

gemini-code-assist · 2026-03-16T21:12:06Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

… tokens Mistral's tokenizer defines [THINK] (id=34) and [/THINK] (id=35) as special tokens. When skip_special_tokens=True (the default), these tokens are stripped during decoding, making the reasoning parser unable to detect thinking boundaries and split reasoning_content from content. This is an upstream issue in the Mistral checkpoint/tokenizer config — reasoning markers should not be special tokens (cf. DeepSeek's <think>/</think> which are regular tokens and work without workarounds). As a workaround, disable skip_special_tokens when the Mistral reasoning parser is active and reasoning_effort is set.
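The effect described above can be reproduced with a toy splitter. This is a sketch under assumptions: the parser name `"mistral"`, the function names, and the flag logic are hypothetical stand-ins and not the SGLang implementation.

```python
THINK_START = "[THINK]"
THINK_END = "[/THINK]"


def split_reasoning(text: str):
    """Return (reasoning_content, content); reasoning is None if no marker is found."""
    start = text.find(THINK_START)
    if start == -1:
        return None, text
    body_start = start + len(THINK_START)
    end = text.find(THINK_END, body_start)
    if end == -1:
        # The closing marker has not been generated yet: everything is reasoning.
        return text[body_start:].strip(), ""
    content = text[:start] + text[end + len(THINK_END):]
    return text[body_start:end].strip(), content.strip()


def effective_skip_special_tokens(reasoning_parser, reasoning_effort, requested=True):
    # Hypothetical workaround: keep special tokens in the decoded text only
    # when the Mistral parser is active and a reasoning effort was requested.
    if reasoning_parser == "mistral" and reasoning_effort is not None:
        return False
    return requested


# Markers kept during decoding: the split works.
print(split_reasoning("[THINK]2+2=4[/THINK]The answer is 4."))
# Markers stripped during decoding: reasoning leaks into the content.
print(split_reasoning("2+2=4The answer is 4."))
```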

The EAGLE draft model for Mistral Small 4 (mistralai/Mistral-Small-4-119B-2603-eagle) uses dense MLA layers without MoE, unlike the Mistral Large 3 EAGLE which has MoE. This caused three issues:

1. `adapt_config_dict` in mistral_utils.py did not handle dense EAGLE models (moe=null in params.json), falling through to an unsupported architecture. Fix: add a branch for `is_eagle and not is_moe` that sets model_type=deepseek_v3 with all-dense MoE overrides (first_k_dense_replace=num_layers).
2. `_remap_mistral_yarn_args` did not include rope_theta in rope_scaling, causing transformers yarn validation to fail. Fix: copy rope_theta into the rope_scaling dict.
3. `MistralLarge3ForCausalLMEagle.__init__` set `self.model_cls` but `DeepseekV2ForCausalLM.__init__` hardcodes `self.model = DeepseekV2Model`, so the EAGLE fc layer was never created. The draft model ran without fusing token embeddings with target hidden states, producing garbage draft tokens (accept rate 0.25). Fix: call super().__init__() then replace self.model with MistralLarge3EagleModel which has the fc layer.

Accept rate: 0.25 -> 0.83.
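The third issue is a general constructor pitfall, and it can be shown with toy classes. The class names below are hypothetical stand-ins with no real layers; they only mirror the pattern described in the commit message, where the parent constructor hardcodes the inner model class.

```python
class BaseModel:
    def __init__(self):
        self.has_fc = False  # the plain model has no fusion layer


class EagleModel(BaseModel):
    def __init__(self):
        super().__init__()
        self.has_fc = True  # stands in for the EAGLE fc layer


class BaseForCausalLM:
    def __init__(self):
        # Hardcoded inner model: any model_cls set by a subclass is ignored.
        self.model = BaseModel()


class BrokenEagleForCausalLM(BaseForCausalLM):
    def __init__(self):
        self.model_cls = EagleModel  # set, but never read by the parent
        super().__init__()


class FixedEagleForCausalLM(BaseForCausalLM):
    def __init__(self):
        super().__init__()
        # Replace the inner model after the parent has finished initializing.
        self.model = EagleModel()


print(BrokenEagleForCausalLM().model.has_fc)  # False
print(FixedEagleForCausalLM().model.has_fc)   # True
```

The broken variant constructs without error, which is consistent with the symptom being a low accept rate rather than a crash.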

JustinTong0323 and others added 21 commits February 28, 2026 13:57

Add Mistral4/Pixtral support changes

7eeffcb

lint

c3297fc

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Add special handling for mistral 4

296fcd5

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

add reasoning parser for mistral

c7457c9

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Set default reasoning_effort to None in ChatCompletionRequest

557d6fd

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

fix: Add activation type mapping for FlashInfer in moe_runner

2c4349f

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

fix: add reasoning request handling for mistral 4

0322d01

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

fix: streamline vision config handling in get_processor function

4802ecc

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

fix: adjust patch grid size calculation to incorporate spatial merge size

04a8673

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

cleanup: remove redundant activation_type mapping and unused ncols variable

0f1471e

The flashinfer trtllm_fp8_per_tensor_scale_moe already defaults activation_type to Swiglu (3), which matches Mistral-Small-4's silu+gated config. Also replace unused ncols with _ in pixtral processor.

fix reasoning trace having answer and benchmark getting no answers eval with 0% accuracy when thinking

e10aa5a

possible fix for -HF chkpt

2041c65

LeanStral works

01c72d6

Merge branch 'main' into mistral4-support

0da0ae3

lint

afe8772

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

fix: update model name in MistralDetector docstring (2602 -> 2603)

d508481

fix: expose mistral load format and update MistralDetector docstring

34a699f

fix: use correct custom op name for trtllm_fp8_per_tensor_scale_moe_wrapper

7da7666

feat: auto-detect Mistral native format and set load_format='mistral'

c04df33

dougyster requested review from Fridge003, HaiShaw, JustinTong0323, Ying1123, ispobock, merrymercy, mickqian, yhyang201 and yuan-luo as code owners March 16, 2026 21:12

dougyster requested review from BBuf, CatherineSue, Edwardf0t1, ch-wan, ishandhanani, slin1237 and yctseng0211 as code owners March 16, 2026 21:12

dougyster changed the title from "Add Mistral Small 4 support with patched transformers" to "[TEMP FOR DOCKER BUILD - WILL DELETE LATER] Add Mistral Small 4 support with patched transformers" Mar 16, 2026

JustinTong0323 added 2 commits March 16, 2026 22:04

lint

e22540b

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

fix: add defaults to PretrainedConfig subclass annotations for transformers v5 compat

bbc7267

dougyster force-pushed the mistral4-support branch from bd7113b to b40d2f9 Compare March 16, 2026 22:18

github-actions Bot added the deepseek label Mar 16, 2026

JustinTong0323 and others added 3 commits March 16, 2026 23:48

fix: only pass reasoning_effort to chat template when explicitly set

638f439

Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>

fix: support multiple consecutive compact tool calls in Mistral detector

77675d1

Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>

dougyster force-pushed the mistral4-support branch from b40d2f9 to 89d23b3 Compare March 17, 2026 02:58

JustinTong0323 and others added 3 commits March 17, 2026 04:08

Merge branch 'main' into mistral4-support

943abd5

init for mistral4

c0bf47b

dougyster force-pushed the mistral4-support branch from 89d23b3 to c0bf47b Compare March 17, 2026 05:44

replace with transformers main

124c1e1

dougyster closed this Mar 18, 2026

dougyster wants to merge 30 commits into main from mistral4-support