
[CI] Add Mistral Large 3 Eagle basic PR test#14526

Closed

alisonshao wants to merge 8 commits into main from add-eagle-pr-ci-test

Conversation

@alisonshao
Collaborator

Summary

  • Add PR CI test for mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle model
  • Test includes GSM8K accuracy and single batch speed evaluation
  • Runs on 8-gpu-b200 via the per-commit-8-gpu-b200 suite

Eagle-specific configuration

  • --speculative-moe-runner-backend flashinfer_trtllm
  • --kv-cache-dtype auto (to avoid a low acceptance rate with FP8 KV cache)
  • --attention-backend trtllm_mla
  • --tp 8

Related PRs

Add PR CI test for mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle model.

The test includes:
- GSM8K accuracy test (threshold 0.90)
- Single batch speed test (threshold 50 tok/s)
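Taken together, the server arguments listed above would correspond to a launch command roughly like the following. This is a sketch, not the test file itself: the `python -m sglang.launch_server` entry point and the exact flag spelling are assumptions based on the flags named in this PR.

```shell
# Hypothetical launch command assembling the Eagle-specific flags from this PR.
python -m sglang.launch_server \
  --model-path mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle \
  --speculative-moe-runner-backend flashinfer_trtllm \
  --kv-cache-dtype auto \
  --attention-backend trtllm_mla \
  --tp 8
```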
@gemini-code-assist
Contributor

Summary of Changes

Hello @alisonshao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates a new continuous integration test for the Mistral-Large-3-675B-Instruct-2512-Eagle model. The primary goal is to automatically validate the model's performance, specifically its GSM8K accuracy and single-batch inference speed, within an 8-GPU B200 environment, ensuring its stable and efficient operation.

Highlights

  • New CI Test for Mistral Large 3 Eagle: A dedicated CI test has been introduced for the mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle model.
  • Performance and Accuracy Evaluation: The new test suite includes evaluations for GSM8K accuracy and single batch speed to ensure model quality.
  • 8-GPU B200 Execution: The test is configured to run on the per-commit-8-gpu-b200 suite, leveraging an 8-GPU setup.
  • Eagle-Specific Configurations: Specific server arguments are applied for the Eagle model, including --speculative-moe-runner-backend flashinfer_trtllm, --kv-cache-dtype auto, --attention-backend trtllm_mla, and --tp 8.

@alisonshao

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@gemini-code-assist bot left a comment

Code Review

This pull request adds a new CI test for the mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle model, which is a great addition for ensuring model stability. The changes look good and follow existing patterns. I've added a few suggestions to improve the robustness of the new test file, mainly around resource cleanup and URL parsing. Overall, this is a solid contribution.

Comment on lines +51 to +54
kill_process_tree(cls.process.pid)
# Clean up environment variable
if "SGLANG_ENABLE_JIT_DEEPGEMM" in os.environ:
del os.environ["SGLANG_ENABLE_JIT_DEEPGEMM"]

medium

Using a try...finally block ensures that the environment variable is cleaned up even if kill_process_tree raises an exception. This makes the test cleanup more robust.

        try:
            kill_process_tree(cls.process.pid)
        finally:
            # Clean up environment variable
            if "SGLANG_ENABLE_JIT_DEEPGEMM" in os.environ:
                del os.environ["SGLANG_ENABLE_JIT_DEEPGEMM"]

parallel=1400,
max_new_tokens=512,
host="http://127.0.0.1",
port=int(self.base_url.split(":")[-1]),

medium

Using urlparse is more robust for extracting the port from the base URL compared to string splitting. This avoids potential issues if the URL format changes (e.g., using IPv6 addresses).

Please also add from urllib.parse import urlparse at the top of the file.

            port=urlparse(self.base_url).port,

self.assertGreater(metrics["accuracy"], 0.90)

def test_bs_1_speed(self):
args = BenchArgs(port=int(self.base_url.split(":")[-1]), max_new_tokens=2048)

medium

Similarly, using urlparse here for both host and port improves robustness and consistency. This avoids relying on the default host value in BenchArgs.

        parsed_url = urlparse(self.base_url)
        args = BenchArgs(host=parsed_url.hostname, port=parsed_url.port, max_new_tokens=2048)
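The robustness argument behind both `urlparse` suggestions can be shown with a small standalone example (the URL here is illustrative, not taken from the PR). A base URL that carries a path segment breaks the `split(":")[-1]` approach, while `urlparse` still extracts the host and port cleanly:

```python
from urllib.parse import urlparse

# Illustrative base URL; a trailing path is enough to break naive splitting.
base_url = "http://127.0.0.1:30000/v1"

parsed = urlparse(base_url)
print(parsed.hostname)  # 127.0.0.1
print(parsed.port)      # 30000

# Naive approach: everything after the last ':' includes the path,
# so int() on it would raise ValueError.
print(base_url.split(":")[-1])  # 30000/v1
```

The same failure mode applies to IPv6 literals such as `http://[::1]`, where the last `:`-separated segment is not a port at all.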

@github-actions

This comment was marked as outdated.

@alisonshao
Collaborator Author

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Dec 6, 2025
@alisonshao

This comment was marked as off-topic.

@alisonshao
Collaborator Author

/rerun-stage unit-test-backend-8-gpu-b200

@github-actions
Contributor

github-actions bot commented Dec 7, 2025

✅ Triggered unit-test-backend-8-gpu-b200 to run independently (skipping dependencies).

Check the Actions tab for progress.

@alisonshao
Collaborator Author

alisonshao commented Dec 7, 2025

@Fridge003
Collaborator

Can we move the eagle basic test to nightly?

@alisonshao
Collaborator Author

alisonshao commented Dec 12, 2025

> Can we move the eagle basic test to nightly?

Yes, in fact this has already been moved to nightly with #14525. I will close this PR.

@alisonshao alisonshao closed this Dec 12, 2025
@alisonshao alisonshao deleted the add-eagle-pr-ci-test branch December 12, 2025 07:31