Fix Mistral Large 3 nightly test #25407
Merged
Fridge003 merged 1 commit on May 16, 2026
Contributor
Code Review
This pull request modifies the quantization process in the MoE scheme to ensure that the input scale tensor passed to fp4_quantize has a shape of [1], which is a requirement for the cute-dsl backend. Feedback was provided regarding a potential edge case where slicing with [:1] could result in an empty tensor (shape [0]) if the source tensor is empty, specifically in distributed environments where a rank might have no local experts.
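The edge case raised in the review can be illustrated without any GPU code. The sketch below uses plain Python lists as a stand-in for a 1-D scale tensor, since slicing with `[:1]` behaves the same way on both: it never raises, but it returns an empty result when the source is empty. The helper name `first_scale` and the fallback value `1.0` are hypothetical and are not taken from the PR; the right fallback for a rank with no local experts is a decision for the maintainers.

```python
from typing import List, Sequence


def first_scale(scales: Sequence[float], fallback: float = 1.0) -> List[float]:
    """Return a length-1 list holding the first scale.

    `fallback` is a hypothetical placeholder used only when `scales` is
    empty (for example, a rank that holds no local experts).
    """
    head = list(scales[:1])  # slicing never raises, but may come back empty
    if len(head) == 0:
        # Without this guard the caller would receive a length-0 result
        # where a length-1 result is required.
        return [fallback]
    return head


print(first_scale([0.5, 0.5, 0.5]))  # [0.5]
print(first_scale([]))               # [1.0]
print([][:1])                        # [] (the unguarded slice is empty)
```

With a real tensor the same guard would check the element count before slicing, so that the value handed to the quantization call always has exactly one element.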
Jiminator added a commit to Jiminator/sglang that referenced this pull request May 15, 2026
…5407

The Mistral-Large-3 B200 nightly partition has been red because of TWO independent regressions sharing the same job. Keeping them in one document is misleading: different root causes, different fixes, different PRs.

This split:
- Creates mistral_large3_tp8_mtp_b200_bisect_report.md with all TP8+MTP-specific content (root cause d2c1034 / PR sgl-project#24436, the _resolve_speculative_algorithm_alias crash on Mistral-native-format drafts, the AutoConfig.from_pretrained ValueError, the empirical one-commit bisect d2c1034 vs f1395af, the proposed try/except fix, the maintainer-ready server log block, and the CI-visibility table).
- Strips the same content out of mistral_large3_nvfp4_b200_bisect_report.md, replacing it with cross-references in the header, Open Items, follow-up note, and TL;DR.
- Adds a PR sgl-project#25407 verification section to BOTH documents (NVFP4 doc records that PR sgl-project#25407 fixes its issue with gsm8k 0.957; TP8+MTP doc records that PR sgl-project#25407 explicitly does NOT touch server_args.py and the failure remains identical).

Run summary on PR sgl-project#25407 head e3fb4ee (1574s wall time, 8x B200, flashinfer 0.6.11.post1, sglang-kernel 0.4.2.post2+cu130, torch 2.11.0):
- TP8 PASS gsm8k 0.953
- TP8+MTP FAIL unchanged ValueError (server_args.py:329)
- NVFP4 PASS gsm8k 0.957

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator
/tag-and-rerun-ci
Collaborator
I think this is fine as a quick hotfix, but it might be best to fix it at the source in compressed_tensors_w4a4_nvfp4_moe.py in process_weights_after_loading(). Is the expand operation necessary now that FI expects a tensor with numel = 1?
Fridge003 approved these changes May 16, 2026
Fridge003 pushed a commit that referenced this pull request May 16, 2026
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
python /sgl-workspace/sglang/test/registered/8-gpu-models/test_mistral_large3.py
CI States
Latest PR Test: ❌ Missing run-ci label. Add it to run CI tests.
Latest PR Test (Extra): ❌ Blocked. run-ci is required first.