
add doc for #14386 (#14655)

Merged
Fridge003 merged 2 commits into sgl-project:main from liusy58:gpu_hash
Jan 9, 2026

Conversation


@liusy58 liusy58 commented Dec 8, 2025

Motivation

Add docs for #14386


@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 8, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @liusy58, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enhancing the documentation and clarifying the implementation of a multi-modal feature hashing optimization. It introduces and documents the SGLANG_MM_BUFFER_SIZE_MB environment variable, which allows users to configure a GPU buffer for faster hash computations while efficiently managing GPU memory. Additionally, explanatory comments have been added to the relevant Python file to provide better insight into this optimization's functionality.

Highlights

  • Documentation Update: Added documentation for a new environment variable, SGLANG_MM_BUFFER_SIZE_MB, which controls the size of a preallocated GPU buffer for multi-modal feature hashing optimization.
  • Multi-modal Feature Hashing Optimization: Introduced an optimization where feature tensors are temporarily moved to the GPU for faster hash computation when SGLANG_MM_BUFFER_SIZE_MB is set to a positive value, helping to avoid out-of-memory issues.
  • Code Clarity: Added inline comments in schedule_batch.py to explain the mechanism and benefits of the multi-modal feature hashing optimization.
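The hashing pattern described in the highlights above can be sketched as follows. This is a hypothetical, CPU-only illustration of the control flow, not SGLang's actual implementation: the function name `hash_feature` is invented here, and `hashlib.sha256` stands in for the GPU hash kernel so the sketch runs anywhere.

```python
import hashlib
import os

# Hypothetical sketch of the optimization's control flow -- NOT SGLang's
# actual code. In the real implementation the hash runs on a tensor staged
# into a preallocated CUDA buffer; hashlib stands in for that kernel here.
BUFFER_MB = int(os.environ.get("SGLANG_MM_BUFFER_SIZE_MB", "0"))


def hash_feature(feature_bytes: bytes) -> str:
    """Hash a multi-modal feature's raw bytes."""
    buffer_bytes = BUFFER_MB * 1024 * 1024
    if 0 < len(feature_bytes) <= buffer_bytes:
        # Real code: copy the feature into the GPU buffer, hash it there,
        # then drop the GPU copy so the feature itself stays on CPU and
        # no extra GPU memory is held between requests.
        return hashlib.sha256(feature_bytes).hexdigest()
    # Buffer disabled (0) or feature larger than the buffer: hash on CPU.
    return hashlib.sha256(feature_bytes).hexdigest()


print(hash_feature(b"\x00" * 16))
```

Both branches produce the same digest; only where the computation happens differs, which is why the feature can be hashed on GPU and still live on CPU afterwards.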


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds documentation and a code comment for a new feature that optimizes multi-modal feature hashing by using a pre-allocated GPU buffer. The changes are clear and help explain the functionality of the SGLANG_MM_BUFFER_SIZE_MB environment variable. My review includes a suggestion to improve the readability of the new documentation.

| Environment Variable | Description | Default |
| --- | --- | --- |
| `SGLANG_SCHEDULER_RECV_SKIPPER_WEIGHT_DECODE` | Weight increment for decode forward mode in scheduler recv skipper. Works with `--scheduler-recv-interval` to control polling frequency during decode phase. | `1` |
| `SGLANG_SCHEDULER_RECV_SKIPPER_WEIGHT_VERIFY` | Weight increment for target verify forward mode in scheduler recv skipper. Works with `--scheduler-recv-interval` to control polling frequency during verification phase. | `1` |
| `SGLANG_SCHEDULER_RECV_SKIPPER_WEIGHT_NONE` | Weight increment when forward mode is None in scheduler recv skipper. Works with `--scheduler-recv-interval` to control polling frequency when no specific forward mode is active. | `1` |
| `SGLANG_MM_BUFFER_SIZE_MB` | Size of preallocated GPU buffer (in MB) for multi-modal feature hashing optimization. When set to a positive value, temporarily moves features to GPU for faster hash computation, then moves them back to CPU to save GPU memory. Larger features benefit more from GPU hashing. Set to `0` to disable. | `0` |
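As a quick sanity check of the `SGLANG_MM_BUFFER_SIZE_MB` semantics documented in the table above, a process would read the variable like this (a standalone sketch, not SGLang code):

```python
import os

# 0 (the documented default) disables the preallocated GPU hashing buffer;
# any positive value is interpreted as the buffer size in megabytes.
buffer_mb = int(os.environ.get("SGLANG_MM_BUFFER_SIZE_MB", "0"))
gpu_hashing_enabled = buffer_mb > 0
print(f"buffer={buffer_mb} MB, GPU hashing enabled: {gpu_hashing_enabled}")
```

For example, launching the server with `SGLANG_MM_BUFFER_SIZE_MB=256` in the environment would enable a 256 MB buffer.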

Severity: medium

The description for SGLANG_MM_BUFFER_SIZE_MB is quite long, which can make the table cell difficult to read. Breaking the description into multiple lines using <br> tags would improve readability.

Suggested change
| `SGLANG_MM_BUFFER_SIZE_MB` | Size of preallocated GPU buffer (in MB) for multi-modal feature hashing optimization. When set to a positive value, temporarily moves features to GPU for faster hash computation, then moves them back to CPU to save GPU memory. Larger features benefit more from GPU hashing. Set to `0` to disable. | `0` |
| `SGLANG_MM_BUFFER_SIZE_MB` | Size of preallocated GPU buffer (in MB) for multi-modal feature hashing optimization.<br>When set to a positive value, it temporarily moves features to GPU for faster hash computation, then moves them back to CPU to save GPU memory.<br>Larger features benefit more from GPU hashing.<br>Set to `0` to disable. | `0` |

@yhyang201
Collaborator

/tag-and-rerun-ci

@Fridge003 Fridge003 merged commit 068abe7 into sgl-project:main Jan 9, 2026
24 of 71 checks passed

Labels

documentation (Improvements or additions to documentation), run-ci


3 participants