adapt ant moving to A2 single machine #362
Summary of Changes

Hello @luanyundu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Mixture-of-Experts (MoE) 'ant moving' functionality by implementing a multi-round processing strategy. This adaptation allows the system to efficiently handle substantially larger batch sizes and sequence lengths on a single A2 machine, which was previously a limitation. The changes involve a comprehensive overhaul of the underlying data flow, memory management, and kernel dispatching to seamlessly manage token processing across multiple rounds.
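As a rough illustration of the multi-round idea in the summary above, the sketch below splits a token batch into fixed-size rounds. It is not taken from the PR: the function name `split_into_rounds` and its parameters are hypothetical, and the actual NPU kernels handle rounds inside the dispatch and combine operators rather than in Python.

```python
def split_into_rounds(num_tokens: int, tokens_per_round: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) token ranges, one per round.

    Hypothetical helper for illustration only; it is not part of sgl-kernel-npu.
    """
    if num_tokens < 0 or tokens_per_round <= 0:
        raise ValueError("num_tokens must be >= 0 and tokens_per_round must be > 0")
    return [
        (start, min(start + tokens_per_round, num_tokens))
        for start in range(0, num_tokens, tokens_per_round)
    ]


# Example: 10 tokens with at most 4 tokens per round need 3 rounds,
# and the last round is only partially filled.
rounds = split_into_rounds(10, 4)
```

The last range can be shorter than the others, which is why any per-round buffer has to be sized for the largest round, not for the average.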
Code Review
This pull request introduces significant changes to support multi-round processing for long sequences, referred to as "ant moving", on A2 single machines. The changes are extensive, touching operator definitions, tiling logic, and kernel implementations across numerous files. Key modifications include refactoring utility code into a shared header, updating operator APIs, and implementing new multi-round logic in the MoE kernels. A new kernel implementation for multi-round combine operation has been added. Overall, the changes appear to correctly implement the new feature. However, I've identified a critical issue related to potential memory corruption and a medium-severity issue regarding a logic mismatch in a check. Please see the detailed comments for suggestions.
…n 256 Co-authored-by: WSEmma <wusemma@163.com>
* upstream/main:
  CI execution requirements for separating a2 and a3 (sgl-project#367)
  Fix the bug that total expert num greater than 256 or local expert num is less than 8 (sgl-project#364)
  adapt ant moving to A2 single machine (sgl-project#362)
…-npu into sgl-cmake2
* 'sgl-cmake2' of https://github.com/1329009851/sgl-kernel-npu:
  CI execution requirements for separating a2 and a3 (sgl-project#367)
  Fix the bug that total expert num greater than 256 or local expert num is less than 8 (sgl-project#364)
  adapt ant moving to A2 single machine (sgl-project#362)
  reset ci -- run test mixed running for experts on a2. (sgl-project#365)
  Revert "Build the deepep package with the chip model included. (sgl-project#274)" (sgl-project#363)
  fix:buffer control (sgl-project#361)
  Build the deepep package with the chip model included. (sgl-project#274)
  bugfix wrong packages build dir (sgl-project#360)
  bump version to 2026.02.01 (sgl-project#359)
  Cover the workflows cases on a3 (sgl-project#321)
  release follows naming convention (sgl-project#356)
  Modify notifydispatch to support DEEPEP_NORMAL_LONG_SEQ_ROUND up to 128. (sgl-project#352)
  fix the hanging bug (sgl-project#355)
  [Bugfix] Fix build script working with cann 8.5.0 (sgl-project#354)
  Modify the description of DeepEP in the README file. (sgl-project#348)
  Revert "Add scripts for building CMake files (sgl-project#344)" (sgl-project#353)
  Add scripts for building CMake files (sgl-project#344)
  Support x86_64 and aarch64 binary release (sgl-project#325)
  add function for deep-ep tests (sgl-project#301)
  [Doc] Improved README.md content and English grammar and integrated the DeepWiki badge for Ask AI (sgl-project#345)
* adapt ant moving to A2 single machine
* fix CI bug that misalign when localExpertsNum less than 8 or more than 256

Co-authored-by: WSEmma <wusemma@163.com>
---------
Co-authored-by: WSEmma <wusemma@163.com>
Use DEEPEP_NORMAL_LONG_SEQ_ROUND, DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS, and DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ to control whether ant moving is enabled. The code has been verified with 32k tokens in test_intranode.py and test_normal_and_low_latency.py.
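A minimal sketch of how these variables might be set before running the tests. The variable names come from the description above, but their exact semantics are assumptions here: that DEEPEP_NORMAL_LONG_SEQ_ROUND is the number of rounds, that DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS is the token budget per round, and that "1" enables DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ. The helper `long_seq_env` and the value 4096 are hypothetical; the cap of 128 rounds is taken from the title of sgl-project#352.

```python
import os

MAX_ROUNDS = 128  # upper bound mentioned in the title of sgl-project#352


def long_seq_env(total_tokens: int, per_round_tokens: int) -> dict[str, str]:
    """Build the environment settings for a given token count.

    Hypothetical helper; the meaning assigned to each variable is an assumption.
    """
    if total_tokens <= 0 or per_round_tokens <= 0:
        raise ValueError("token counts must be positive")
    # Ceiling division: rounds needed to cover every token.
    rounds = (total_tokens + per_round_tokens - 1) // per_round_tokens
    if rounds > MAX_ROUNDS:
        raise ValueError(f"{rounds} rounds exceeds the assumed limit of {MAX_ROUNDS}")
    return {
        "DEEPEP_NORMAL_LONG_SEQ_ROUND": str(rounds),
        "DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS": str(per_round_tokens),
        "DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ": "1",  # assumed "on" value
    }


# 32k tokens as in the tests above; 4096 tokens per round is an arbitrary example.
env = long_seq_env(32 * 1024, 4096)
os.environ.update(env)  # must happen before the DeepEP buffer is created
```

Check the repository README for the accepted values before relying on these settings.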