Fix Mistral Large 3 nightly test #25407

Merged
Fridge003 merged 1 commit into sgl-project:main from bzhng-development:brayden/fix-so-random
May 16, 2026

b8zhong (Collaborator) commented May 15, 2026

python /sgl-workspace/sglang/test/registered/8-gpu-models/test_mistral_large3.py


============================================================
Mistral-Large-3 Results Summary
Dataset: gsm8k
Baseline: 0.85
============================================================

Model 1: mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4
  Performance: PASS, output: 3773.9 tok/s
  Accuracy: PASS
  Score: 0.957

============================================================
OVERALL: ALL TESTS PASSED
============================================================

.
----------------------------------------------------------------------
Ran 1 test in 2366.479s

OK

CI States

Latest PR Test: ❌ Missing run-ci label — add it to run CI tests.
Latest PR Test (Extra): ❌ Blocked: run-ci is required first.

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request modifies the quantization process in the MoE scheme to ensure that the input scale tensor passed to fp4_quantize has a shape of [1], which is a requirement for the cute-dsl backend. Feedback was provided regarding a potential edge case where slicing with [:1] could result in an empty tensor (shape [0]) if the source tensor is empty, specifically in distributed environments where a rank might have no local experts.
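
The edge case described above can be illustrated with plain Python sequences, which follow the same slicing rule as a 1-D torch tensor: `[:1]` on an empty sequence returns an empty sequence instead of raising. This is a minimal sketch, not the PR's code. Lists stand in for the scale tensor, and `take_first_scale` and `check_single_scale` are hypothetical names introduced only for illustration.

```python
def take_first_scale(scales):
    """Return the `[:1]` slice of `scales` (hypothetical stand-in for the slicing in the PR)."""
    return scales[:1]


def check_single_scale(scales):
    """Hypothetical guard: return the `[:1]` slice, or raise if it is not length 1."""
    head = take_first_scale(scales)
    if len(head) != 1:
        raise ValueError(
            f"expected a scale of length 1 after slicing, got length {len(head)}"
        )
    return head


# A rank that owns experts: the slice holds exactly one element.
print(take_first_scale([0.5, 0.5, 0.5]))  # [0.5]

# A rank with no local experts: slicing does not raise, it silently returns empty.
print(take_first_scale([]))  # []
```

Whether an explicit guard like `check_single_scale` is appropriate depends on how the real code path handles ranks without local experts, which this thread does not specify.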

Jiminator added a commit to Jiminator/sglang that referenced this pull request May 15, 2026
…5407

The Mistral-Large-3 B200 nightly partition has been red because of
TWO independent regressions sharing the same job. Keeping them in one
document is misleading — different root causes, different fixes,
different PRs. This split:

- Creates mistral_large3_tp8_mtp_b200_bisect_report.md with all
  TP8+MTP-specific content (root cause d2c1034 / PR sgl-project#24436, the
  _resolve_speculative_algorithm_alias crash on Mistral-native-format
  drafts, the AutoConfig.from_pretrained ValueError, the empirical
  one-commit bisect d2c1034 vs f1395af, the proposed try/except fix,
  the maintainer-ready server log block, and the CI-visibility table).

- Strips the same content out of
  mistral_large3_nvfp4_b200_bisect_report.md, replacing it with
  cross-references in the header, Open Items, follow-up note, and TL;DR.

- Adds a PR sgl-project#25407 verification section to BOTH documents (NVFP4 doc
  records that PR sgl-project#25407 fixes its issue with gsm8k 0.957; TP8+MTP doc
  records that PR sgl-project#25407 explicitly does NOT touch server_args.py and
  the failure remains identical).

Run summary on PR sgl-project#25407 head e3fb4ee (1574s wall time, 8x B200,
flashinfer 0.6.11.post1, sglang-kernel 0.4.2.post2+cu130, torch 2.11.0):
  - TP8        PASS  gsm8k 0.953
  - TP8+MTP    FAIL  unchanged ValueError (server_args.py:329)
  - NVFP4      PASS  gsm8k 0.957

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Jiminator (Collaborator) commented

/tag-and-rerun-ci

Jiminator (Collaborator) commented May 15, 2026

I think this is fine as a quick hotfix, but it might be best to fix it at the source in compressed_tensors_w4a4_nvfp4_moe.py in process_weights_after_loading(). Is the expand operation necessary now that FI expects a tensor with numel = 1?

Fridge003 merged commit d523ae1 into sgl-project:main on May 16, 2026
260 of 320 checks passed
Fridge003 pushed a commit that referenced this pull request May 16, 2026
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
3 participants