[CI] Add Mistral Large 3 Eagle basic PR test #14526
Conversation
Add a PR CI test for the mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle model. The test includes:
- GSM8K accuracy test (threshold 0.90)
- Single batch speed test (threshold 50 tok/s)

Eagle-specific configuration:
- --speculative-moe-runner-backend flashinfer_trtllm
- --kv-cache-dtype auto (to avoid low AR with FP8 kv cache)
- --attention-backend trtllm_mla
- --tp 8

Runs on 8-gpu-b200 via the per-commit-8-gpu-b200 suite.
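For context, the flags above roughly correspond to a server launch like the following. This is a hypothetical sketch: the `sglang.launch_server` entry point and the model path are assumptions, not taken from the PR diff, and the actual test wires these arguments up programmatically.

```shell
# Hypothetical launch sketch; entry point and exact argument plumbing are assumed.
python3 -m sglang.launch_server \
  --model-path mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle \
  --tp 8 \
  --attention-backend trtllm_mla \
  --kv-cache-dtype auto \
  --speculative-moe-runner-backend flashinfer_trtllm
```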
Summary of Changes

Hello @alisonshao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request integrates a new continuous integration test for the Mistral-Large-3-675B-Instruct-2512-Eagle model. The primary goal is to automatically validate the model's performance, specifically its GSM8K accuracy and single-batch inference speed, within an 8-GPU B200 environment, ensuring stable and efficient operation.
Code Review
This pull request adds a new CI test for the mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle model, which is a great addition for ensuring model stability. The changes look good and follow existing patterns. I've added a few suggestions to improve the robustness of the new test file, mainly around resource cleanup and URL parsing. Overall, this is a solid contribution.
The review comment is anchored on this cleanup code in the new test file:

```python
kill_process_tree(cls.process.pid)
# Clean up environment variable
if "SGLANG_ENABLE_JIT_DEEPGEMM" in os.environ:
    del os.environ["SGLANG_ENABLE_JIT_DEEPGEMM"]
```
Using a try...finally block ensures that the environment variable is cleaned up even if kill_process_tree raises an exception. This makes the test cleanup more robust.
```python
try:
    kill_process_tree(cls.process.pid)
finally:
    # Clean up environment variable
    if "SGLANG_ENABLE_JIT_DEEPGEMM" in os.environ:
        del os.environ["SGLANG_ENABLE_JIT_DEEPGEMM"]
```

A separate review thread is anchored on these benchmark arguments:

```python
parallel=1400,
max_new_tokens=512,
host="http://127.0.0.1",
port=int(self.base_url.split(":")[-1]),
```
Another thread is anchored on the accuracy assertion and the speed test:

```python
self.assertGreater(metrics["accuracy"], 0.90)

def test_bs_1_speed(self):
    args = BenchArgs(port=int(self.base_url.split(":")[-1]), max_new_tokens=2048)
```
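The review also suggested more robust URL parsing than `self.base_url.split(":")[-1]`, which breaks if the URL omits an explicit port. A sketch using the standard library's `urllib.parse` (the `port_from_url` helper name and default are illustrative, not from the PR):

```python
from urllib.parse import urlsplit

def port_from_url(url: str, default: int = 80) -> int:
    """Extract the port from a URL, falling back to a default."""
    parts = urlsplit(url)
    return parts.port if parts.port is not None else default

# Naive splitting breaks on URLs without an explicit port:
# "http://127.0.0.1".split(":")[-1] -> "//127.0.0.1" (not a port)
print(port_from_url("http://127.0.0.1:30000"))  # 30000
print(port_from_url("http://127.0.0.1"))        # 80
```

`urlsplit` validates that the port is numeric and in range, so malformed URLs fail loudly instead of producing a bogus port string.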
/tag-and-rerun-ci
/rerun-stage unit-test-backend-8-gpu-b200
✅ Triggered. Check the Actions tab for progress.
Can we move the eagle basic test to nightly?
Yes, in fact this is already moved to nightly with #14525. I will close this PR.
Summary
- Add a CI test for the mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle model
- Runs in the per-commit-8-gpu-b200 suite

Eagle-specific configuration
- --speculative-moe-runner-backend flashinfer_trtllm
- --kv-cache-dtype auto (to avoid low AR with FP8 kv cache)
- --attention-backend trtllm_mla
- --tp 8

Related PRs