Conversation
Summary of Changes
Hello @ppraneth, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've reviewed this pull request, which primarily focuses on extending our model compatibility by adding support for Bailing Mixture-of-Experts (MoE) models. This integration includes the inclusionAI/Ling-lite and Ling-plus models, enhancing our framework's ability to handle these advanced architectures for improved performance and flexibility.
Highlights
- New Model Support: I've added comprehensive support for Bailing Mixture-of-Experts (MoE) models, specifically integrating `inclusionAI/Ling-lite` and `Ling-plus`, which significantly expands the range of advanced models our system can serve.
- Core MoE Architecture Implementation: A new Python module (`python/sglang/srt/models/bailing_moe.py`) has been introduced. This module defines the complete Bailing MoE model architecture, including its attention, MLP, and expert layers, adapted from the vLLM project to ensure compatibility and performance.
- Specialized Weight Loading: I've implemented custom weight-loading logic within the new Bailing MoE model. This ensures correct handling of the parameter structures unique to MoE models, such as stacked and expert-specific weights, facilitating seamless model integration.
- Documentation and Testing: `docs/supported_models/generative_models.md` has been updated to reflect the newly supported Ling MoE models. Additionally, a dedicated test case for `inclusionAI/Ling-lite` has been added to `test/srt/models/test_generation_models.py` to validate its proper functionality and integration within our generation framework.
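The expert-weight handling mentioned above usually comes down to remapping per-expert checkpoint names onto a few stacked parameters. The sketch below illustrates that pattern as commonly seen in vLLM-derived model files; the function name and the `experts.w13_weight`/`experts.w2_weight` targets are hypothetical placeholders, not the actual identifiers in `bailing_moe.py`:

```python
# Illustrative sketch of MoE expert-weight remapping. Per-expert checkpoint
# tensors (gate_proj/up_proj/down_proj) are folded into two stacked
# parameters, with gate_proj and up_proj fused into one tensor.
EXPERT_PARAMS_MAPPING = [
    # (stacked param name, checkpoint weight key, shard id within the fusion)
    ("experts.w13_weight", "gate_proj", 0),
    ("experts.w13_weight", "up_proj", 1),
    ("experts.w2_weight", "down_proj", 0),
]

def map_expert_weight(ckpt_name: str):
    """Translate a per-expert checkpoint name such as
    'model.layers.0.mlp.experts.3.gate_proj.weight' into
    (stacked_param_name, expert_id, shard_id), or None if the
    tensor is not an expert weight."""
    if ".mlp.experts." not in ckpt_name:
        return None
    prefix, rest = ckpt_name.split(".mlp.experts.", 1)
    expert_id_str, weight_name = rest.split(".", 1)  # e.g. "3", "gate_proj.weight"
    for stacked, ckpt_key, shard_id in EXPERT_PARAMS_MAPPING:
        if weight_name.startswith(ckpt_key):
            return (f"{prefix}.mlp.{stacked}", int(expert_id_str), shard_id)
    return None

print(map_expert_weight("model.layers.0.mlp.experts.3.gate_proj.weight"))
# → ('model.layers.0.mlp.experts.w13_weight', 3, 0)
```

The real loader additionally copies each mapped tensor into the correct expert row of the stacked parameter; this sketch only shows the name-resolution step.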
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request adds support for the Bailing MoE model, including the model implementation, a test case, and documentation updates. The implementation appears correct and follows the project's patterns. I've provided a few suggestions to improve the documentation's clarity and the code's maintainability. Overall, great work!
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ppraneth Great Job! I'll take a look.
@ant-yy Please help take a look.
@ppraneth Can you provide your detailed benchmark information, such as the GPU used, bench scripts, etc.?
I ran it on an H100. vLLM:
Thank you very much for your reply. Can you provide results from
https://github.com/sgl-project/sglang/tree/main/benchmark/mmlu
I am using an A800 locally and followed the instructions to conduct the testing. The conclusions are as follows:
For my environment, please refer to the attachment. Once again, thank you for your helpful response.
SGLang vs. vLLM: Verified MMLU Benchmark on Ling-lite Model
Comparison with Official MMLU Scores
For `inclusionAI/Ling-lite`:
Ling-lite Benchmark Results
Detailed Subject-by-Subject Comparison (Ling-lite)
Technical Configurations
SGLang Benchmark Commands
vLLM Benchmark Commands
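The exact commands used are not reproduced above; a plausible sketch of the SGLang side, following the public `benchmark/mmlu` directory (the model path comes from this PR, while the port and script flags are assumptions), would look like:

```shell
# Launch the SGLang server with the new model
# (--trust-remote-code and the port are assumed values)
python -m sglang.launch_server --model-path inclusionAI/Ling-lite \
    --trust-remote-code --port 30000

# In a second shell, from benchmark/mmlu: fetch the MMLU data
# and run the benchmark client against the running server
bash download_data.sh
python bench_sglang.py --nsub 10
```

The vLLM run would follow the analogous `bench_other.py`/server pairing in the same directory; consult the benchmark README for the authoritative flags.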
Motivation
Closes #8621
Modifications
Checklist