[Bugfix] Fix Mistral Large 3 NVFP4 TRTLLM MoE #18065
Fridge003 merged 4 commits into sgl-project:main
Conversation
Summary of Changes (Gemini Code Assist): This pull request resolves a critical bug that broke Mistral Large 3 NVFP4 MoE support, a regression introduced by a prior TRTLLM MoE refactoring. The fix consolidates the Mixture-of-Experts (MoE) application logic into a single code path.
Code Review
This pull request fixes a bug in Mistral Large 3 NVFP4 MoE support by refactoring the apply method in CompressedTensorsW4A4Nvfp4MoEMethod. The logic from the now-removed apply_with_router_logits method has been correctly integrated into apply, conditioned on self.use_flashinfer_trtllm. This change unifies the two execution paths and ensures consistent return types. The fix is sound. I have one suggestion to further improve maintainability by splitting the large apply method into smaller, more focused private helper methods.
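The unified dispatch described above can be sketched as follows. This is a minimal illustration, not the actual sglang implementation: only the method name `apply`, the flag `use_flashinfer_trtllm`, and the removed `apply_with_router_logits` come from the PR; the class name, arguments, and helper methods are hypothetical.

```python
# Hypothetical sketch of the unified-dispatch pattern the review describes:
# one public apply() that branches on the backend flag, with each path
# factored into a private helper (the maintainability suggestion above).
class Nvfp4MoEMethodSketch:
    def __init__(self, use_flashinfer_trtllm: bool):
        self.use_flashinfer_trtllm = use_flashinfer_trtllm

    def apply(self, hidden_states, router_logits):
        # Single entry point: branch on the flag instead of exposing a
        # separate apply_with_router_logits method, so both paths share
        # one signature and one return type.
        if self.use_flashinfer_trtllm:
            return self._apply_trtllm(hidden_states, router_logits)
        return self._apply_default(hidden_states, router_logits)

    def _apply_trtllm(self, hidden_states, router_logits):
        # Placeholder for the FlashInfer TRTLLM fused-MoE path.
        return ("trtllm", hidden_states, router_logits)

    def _apply_default(self, hidden_states, router_logits):
        # Placeholder for the standard fused-MoE path.
        return ("default", hidden_states, router_logits)
```

Callers never need to know which backend is active; they call `apply` and the method routes internally, which is what keeps the two execution paths consistent.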
python/sglang/srt/layers/quantization/compressed_tensors/compressed_tensors_moe.py
b8zhong left a comment:
Thanks. Sorry, we will be more careful in the future.
/tag-and-rerun-ci
Force-pushed from d819dbb to 0855503.
Motivation
The TRTLLM MoE refactoring PR (#15151) broke Mistral Large 3 NVFP4 MoE support (#15049); this PR fixes the issue.
Modifications
Accuracy Tests
Benchmarking and Profiling
Same as the results in #15049.
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci