Register allgather/reducescatter buffers with symm memory #12572
merrymercy merged 6 commits into sgl-project:main from
Summary of Changes (Gemini Code Assist, addressed to @nvcastet)

This pull request significantly refactors and enhances the symmetric memory allocation strategy for distributed operations within the SGLang framework. By introducing a more flexible context manager and integrating it across key components such as linear layers, MoE layers, and attention mechanisms, the changes aim to optimize memory management and improve the efficiency of distributed tensor computations, particularly for
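The PR title names the two collectives whose output buffers are being registered. The sketch below shows only their semantics over simulated ranks in plain Python, with no GPUs and no communication library. It is illustrative, and the function names are hypothetical rather than SGLang or NCCL APIs. One relevant property: each rank's output size follows from the input shape and the world size alone, so every rank can allocate an identically sized buffer. That is presumably what makes these outputs good candidates for symmetric-memory registration.

```python
from typing import List


def all_gather(shards: List[List[float]]) -> List[List[float]]:
    """Every rank receives the concatenation of all ranks' shards, in rank order."""
    size = len(shards[0])
    assert all(len(s) == size for s in shards), "all ranks must contribute equal-size shards"
    gathered = [x for shard in shards for x in shard]
    # Each rank gets its own copy; the output size (world * shard size) is the same everywhere.
    return [list(gathered) for _ in shards]


def reduce_scatter(inputs: List[List[float]]) -> List[List[float]]:
    """Element-wise sum across ranks, after which rank r keeps the r-th equal chunk."""
    world = len(inputs)
    n = len(inputs[0])
    assert all(len(x) == n for x in inputs), "all ranks must contribute equal-size inputs"
    assert n % world == 0, "input length must divide evenly by the world size"
    summed = [sum(col) for col in zip(*inputs)]
    chunk = n // world
    return [summed[r * chunk:(r + 1) * chunk] for r in range(world)]


# Two simulated ranks.
gathered = all_gather([[1.0, 2.0], [3.0, 4.0]])
scattered = reduce_scatter([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]])
```

Here `gathered` holds `[1.0, 2.0, 3.0, 4.0]` for both ranks. `scattered` gives rank 0 the chunk `[11.0, 22.0]` and rank 1 the chunk `[33.0, 44.0]`.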
Code Review
This pull request refactors the usage of symmetric memory by introducing a SymmetricMemoryContext context manager and a use_symmetric_memory factory function. This removes the need for manual tensor tagging with .tag(), leading to a cleaner and less error-prone API. The changes correctly wrap tensor allocations that need to be in symmetric memory, and add support for reduce_scatter with this mechanism. Overall, the changes are a significant improvement. I've found one critical issue and one minor suggestion for improvement.
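The review describes replacing per-tensor `.tag()` calls with a context manager returned by a factory function. Below is a minimal, self-contained sketch of that pattern in plain Python. `SymmetricMemoryContext` and `use_symmetric_memory` reuse the names from the review, but their signatures and bodies here are hypothetical and are not SGLang's implementation. `ToyAllocator`, `Buffer`, and the `enabled` flag are also hypothetical, invented for illustration. The idea is that any allocation made while the context is active is served from a "symmetric" pool, so the allocation site itself decides placement.

```python
from contextlib import nullcontext
from dataclasses import dataclass
from typing import List


@dataclass
class Buffer:
    # Hypothetical stand-in for a tensor; `symmetric` records which pool served it.
    nbytes: int
    symmetric: bool = False


class ToyAllocator:
    """Hypothetical allocator: serves from a 'symmetric' pool while a context is active."""

    def __init__(self) -> None:
        self.active_depth = 0  # supports nested contexts
        self.symmetric_pool: List[Buffer] = []
        self.regular_pool: List[Buffer] = []

    def empty(self, nbytes: int) -> Buffer:
        buf = Buffer(nbytes, symmetric=self.active_depth > 0)
        (self.symmetric_pool if buf.symmetric else self.regular_pool).append(buf)
        return buf


class SymmetricMemoryContext:
    """Hypothetical: while entered, allocations from `allocator` go to the symmetric pool."""

    def __init__(self, allocator: ToyAllocator) -> None:
        self.allocator = allocator

    def __enter__(self) -> "SymmetricMemoryContext":
        self.allocator.active_depth += 1
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.allocator.active_depth -= 1  # always restored, even if the body raised
        return False  # never swallow exceptions


def use_symmetric_memory(allocator: ToyAllocator, enabled: bool = True):
    """Hypothetical factory: a real context when enabled, otherwise a no-op context."""
    return SymmetricMemoryContext(allocator) if enabled else nullcontext()


# The collective's output buffer is allocated inside the context,
# so no follow-up .tag() call is needed on the tensor.
alloc = ToyAllocator()
with use_symmetric_memory(alloc):
    gather_out = alloc.empty(4096)  # served by the symmetric pool
scratch = alloc.empty(512)  # outside the context: regular pool
with use_symmetric_memory(alloc, enabled=False):
    fallback = alloc.empty(256)  # disabled: nullcontext, regular pool
```

Because the factory returns `contextlib.nullcontext()` when disabled, call sites can keep a single `with` statement whether or not symmetric memory is available. Relying on `__exit__` means the allocator state is restored even if an allocation inside the block raises.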
9818f8b to 576426d
e194e3f to cad8929
Motivation
Rebased version of #9358 on top of main (including #12524).
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist