
Register allgather/reducescatter buffers with symm memory #12572

Merged
merrymercy merged 6 commits into sgl-project:main from nvcastet:fp4_allgather_fixed
Nov 5, 2025

Conversation

@nvcastet
Collaborator

@nvcastet nvcastet commented Nov 4, 2025

Motivation

Rebased version of #9358 on top of main (including #12524)

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @nvcastet, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors and enhances the symmetric memory allocation strategy for distributed operations within the SGLang framework. By introducing a more flexible context manager and integrating it across key components like linear layers, MoE layers, and attention mechanisms, the changes aim to optimize memory management and improve the efficiency of distributed tensor computations, particularly for allgather and reducescatter buffers. The update also includes the addition of a new custom reduce_scatter_tensor operation to streamline distributed data handling.

Highlights

  • Symmetric Memory Context Manager Refactor: The use_symmetric_memory class has been renamed to SymmetricMemoryContext, and a new use_symmetric_memory factory function now wraps it. The factory conditionally enables symmetric memory allocation, returning a nullcontext when symmetric memory is globally disabled, explicitly disabled at the call site, or when the world size is 1.
  • Integration of Symmetric Memory in Tensor Allocations: The new use_symmetric_memory context manager has been widely integrated across various parts of the codebase, including linear layers, MoE layers (Cutlass, Triton, FP8, MXFP4), attention mechanisms, and vocabulary parallel embedding. This ensures that tensors involved in distributed operations are allocated with symmetric memory when appropriate, optimizing performance.
  • New reduce_scatter_tensor Custom Operation: A new custom operation, reg_reduce_scatter_tensor, along with its fake implementation, has been added and registered. This provides a standardized way to perform reduce-scatter operations within the framework, leveraging PyNCCL for symmetric memory when available.
  • Dynamic Symmetric Memory Allocation for DP Attention Buffers: Data Parallel (DP) attention buffers (global_dp_buffer and local_dp_buffer) now utilize symmetric memory allocation. The local_dp_buffer's symmetric memory usage is conditionally disabled if dp_max_padding is not enabled, allowing for more granular control over memory allocation strategies.
  • Removal of Manual tag Method: The explicit tag method for marking tensors for symmetric memory has been removed. The new context manager approach implicitly handles the symmetric memory allocation, simplifying usage and reducing boilerplate.
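
The factory pattern described in the highlights can be sketched as follows. This is a hypothetical, single-process illustration based only on the PR summary; the names `SymmetricMemoryContext` and `use_symmetric_memory` come from the summary, but the body is a stand-in (the real SGLang implementation registers allocations with NCCL symmetric memory rather than flipping a flag).

```python
from contextlib import nullcontext

class SymmetricMemoryContext:
    """Sketch: context that marks tensor allocations for symmetric memory."""

    def __init__(self):
        self.active = False

    def __enter__(self):
        # The real implementation would redirect allocations to a
        # symmetric-memory pool here; this sketch only records state.
        self.active = True
        return self

    def __exit__(self, *exc):
        self.active = False
        return False

def use_symmetric_memory(enabled: bool, world_size: int, disabled: bool = False):
    """Return a symmetric-memory context, or a no-op context when it cannot help:
    when the feature is globally off, explicitly disabled, or world size is 1."""
    if not enabled or disabled or world_size == 1:
        return nullcontext()
    return SymmetricMemoryContext()
```

A caller wraps its tensor allocation in `with use_symmetric_memory(...)` and no longer needs to tag tensors manually, which is the boilerplate reduction the last highlight refers to.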
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the usage of symmetric memory by introducing a SymmetricMemoryContext context manager and a use_symmetric_memory factory function. This removes the need for manual tensor tagging with .tag(), leading to a cleaner and less error-prone API. The changes correctly wrap tensor allocations that need to be in symmetric memory, and add support for reduce_scatter with this mechanism. The overall changes are a significant improvement. I've found one critical issue and one minor suggestion for improvement.
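
The review mentions that reduce_scatter is now supported through this mechanism. As a minimal, single-process illustration of what a reduce-scatter computes (the math only, not the PyNCCL collective or its API): each of the `world_size` ranks contributes a full-length tensor, the contributions are summed element-wise, and each rank receives one contiguous shard of the sum. All names below are illustrative.

```python
def reduce_scatter_sketch(inputs: list[list[int]], world_size: int) -> list[list[int]]:
    """Sum the per-rank inputs element-wise, then split the result into
    world_size contiguous shards (one per rank). Input length must be
    evenly divisible by world_size, as with reduce_scatter_tensor."""
    reduced = [sum(vals) for vals in zip(*inputs)]  # element-wise reduction
    shard = len(reduced) // world_size
    return [reduced[r * shard:(r + 1) * shard] for r in range(world_size)]
```

For example, with two ranks contributing `[1, 2, 3, 4]` and `[10, 20, 30, 40]`, rank 0 receives the first half of the sum and rank 1 the second half.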

@nvcastet nvcastet force-pushed the fp4_allgather_fixed branch from 9818f8b to 576426d Compare November 4, 2025 20:33
@nvcastet nvcastet force-pushed the fp4_allgather_fixed branch from e194e3f to cad8929 Compare November 4, 2025 23:17
@nvcastet nvcastet enabled auto-merge (squash) November 4, 2025 23:20
@merrymercy merrymercy disabled auto-merge November 5, 2025 01:11
@merrymercy merrymercy merged commit 2340798 into sgl-project:main Nov 5, 2025
125 of 155 checks passed

3 participants