
Flashinfer MOE FP8 support for Mistral Large 3. #15422

Merged
Fridge003 merged 1 commit into sgl-project:main from dcampora:dcampora/support_fp8_trtllm_moe
Feb 25, 2026

Conversation

@dcampora
Contributor

@dcampora dcampora commented Dec 18, 2025

Motivation

This PR brings in Flashinfer MOE FP8 support for Mistral Large 3.

It requires an upcoming release of flashinfer to work.

Modifications

Accuracy Tests

Without EP8:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     8|exact_match|↑  |0.9257|±  |0.0072|
|     |       |strict-match    |     8|exact_match|↑  |0.6012|±  |0.0135|

With EP8:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     8|exact_match|↑  |0.9287|±  |0.0071|
|     |       |strict-match    |     8|exact_match|↑  |0.5861|±  |0.0136|
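One way to read the two tables side by side is to compare each EP8 score against its non-EP8 counterpart relative to the reported standard errors. The sketch below is not part of the PR; it copies the values from the tables above and assumes, for simplicity, that the two runs can be treated as independent so that their standard errors combine in quadrature. The `compare` helper and the dictionary keys are illustrative names.

```python
import math

# (value, stderr) pairs copied from the gsm8k tables above.
results = {
    "flexible-extract": {"without_ep8": (0.9257, 0.0072), "with_ep8": (0.9287, 0.0071)},
    "strict-match": {"without_ep8": (0.6012, 0.0135), "with_ep8": (0.5861, 0.0136)},
}


def compare(baseline, candidate):
    """Return (difference, combined stderr, z) for two (value, stderr) pairs.

    Assumption: the runs are independent, so the stderrs add in quadrature.
    """
    diff = candidate[0] - baseline[0]
    combined = math.sqrt(baseline[1] ** 2 + candidate[1] ** 2)
    return diff, combined, diff / combined


summary = {}
for name, runs in results.items():
    summary[name] = compare(runs["without_ep8"], runs["with_ep8"])
    diff, combined, z = summary[name]
    print(f"{name}: diff={diff:+.4f}, combined stderr={combined:.4f}, z={z:+.2f}")
```

Under that independence assumption, both differences are smaller than one combined standard error, so these numbers alone do not indicate an accuracy change from enabling EP8.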

Benchmarking and Profiling

Checklist

@github-actions github-actions bot added the quant LLM Quantization label Dec 18, 2025
@dcampora dcampora force-pushed the dcampora/support_fp8_trtllm_moe branch from 0c733ae to 6fe36c1 Compare December 19, 2025 07:04
@Fridge003 Fridge003 mentioned this pull request Dec 21, 2025
6 tasks
@Fridge003
Collaborator

Hi @dcampora flashinfer has been upgraded. Can you please take a look again

@elvischenv
Contributor

> Hi @dcampora flashinfer has been upgraded. Can you please take a look again

Hi @Fridge003, this PR depends on flashinfer-ai/flashinfer@36380e2, which is not part of 0.6.1 but is included in 0.6.2.

@Fridge003
Collaborator

@dcampora @elvischenv 0.6.2 is already included on the latest main branch
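The version requirement discussed in this thread (the needed commit is absent from flashinfer 0.6.1 and present in 0.6.2) amounts to a minimum-version check. The following is a minimal sketch of such a check, not code from the PR; `parse_version`, `meets_minimum`, and `MIN_FLASHINFER` are hypothetical names, and the parser only handles plain dotted numeric versions.

```python
# First release reported in this thread to contain the required commit.
MIN_FLASHINFER = "0.6.2"


def parse_version(text):
    """Turn a dotted release string such as '0.6.2' into a tuple of ints.

    Assumption: no pre-release or local suffixes (e.g. 'rc1', '+cu128').
    """
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum=MIN_FLASHINFER):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("0.6.1"))  # False
print(meets_minimum("0.6.2"))  # True
```

Comparing integer tuples rather than raw strings avoids lexicographic mistakes such as "0.10.0" sorting before "0.6.2".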

@elvischenv elvischenv force-pushed the dcampora/support_fp8_trtllm_moe branch from 6fe36c1 to e2a92bd Compare February 1, 2026 07:49
@elvischenv elvischenv force-pushed the dcampora/support_fp8_trtllm_moe branch from e2a92bd to 3fc6ba6 Compare February 1, 2026 09:15
@elvischenv
Contributor

elvischenv commented Feb 1, 2026

Hi @Fridge003, could you help remove "draft" from the PR's title and enable CI for this PR? I don't have permission to update the title.

@elvischenv elvischenv force-pushed the dcampora/support_fp8_trtllm_moe branch from 3fc6ba6 to f68b5a0 Compare February 3, 2026 17:26
@Fridge003 Fridge003 changed the title from "Draft: Flashinfer MOE FP8 support for Mistral Large 3." to "Flashinfer MOE FP8 support for Mistral Large 3." Feb 4, 2026
@Fridge003
Collaborator

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Feb 4, 2026
@elvischenv elvischenv force-pushed the dcampora/support_fp8_trtllm_moe branch from 3281fce to 9ac6bc0 Compare February 9, 2026 02:09
@elvischenv elvischenv force-pushed the dcampora/support_fp8_trtllm_moe branch from 9ac6bc0 to ce6c78b Compare February 24, 2026 05:52
@Fridge003 Fridge003 merged commit 3501904 into sgl-project:main Feb 25, 2026
262 of 280 checks passed
klhhhhh pushed a commit to klhhhhh/sglang that referenced this pull request Feb 26, 2026
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>