Merged
1 change: 1 addition & 0 deletions docs/references/environment_variables.md
@@ -37,6 +37,7 @@ SGLang supports various environment variables that can be used to configure its
| `SGLANG_SCHEDULER_RECV_SKIPPER_WEIGHT_DECODE` | Weight increment for decode forward mode in scheduler recv skipper. Works with `--scheduler-recv-interval` to control polling frequency during decode phase. | `1` |
| `SGLANG_SCHEDULER_RECV_SKIPPER_WEIGHT_VERIFY` | Weight increment for target verify forward mode in scheduler recv skipper. Works with `--scheduler-recv-interval` to control polling frequency during verification phase. | `1` |
| `SGLANG_SCHEDULER_RECV_SKIPPER_WEIGHT_NONE` | Weight increment when forward mode is None in scheduler recv skipper. Works with `--scheduler-recv-interval` to control polling frequency when no specific forward mode is active. | `1` |
| `SGLANG_MM_BUFFER_SIZE_MB` | Size of preallocated GPU buffer (in MB) for multi-modal feature hashing optimization. When set to a positive value, temporarily moves features to GPU for faster hash computation, then moves them back to CPU to save GPU memory. Larger features benefit more from GPU hashing. Set to `0` to disable. | `0` |
medium

The description for `SGLANG_MM_BUFFER_SIZE_MB` is quite long, which makes the table cell hard to read. Breaking it into multiple lines with `<br>` tags would improve readability.

Suggested change
| `SGLANG_MM_BUFFER_SIZE_MB` | Size of preallocated GPU buffer (in MB) for multi-modal feature hashing optimization. When set to a positive value, temporarily moves features to GPU for faster hash computation, then moves them back to CPU to save GPU memory. Larger features benefit more from GPU hashing. Set to `0` to disable. | `0` |
| `SGLANG_MM_BUFFER_SIZE_MB` | Size of preallocated GPU buffer (in MB) for multi-modal feature hashing optimization.<br>When set to a positive value, it temporarily moves features to GPU for faster hash computation, then moves them back to CPU to save GPU memory.<br>Larger features benefit more from GPU hashing.<br>Set to `0` to disable. | `0` |

| `SGLANG_MM_PRECOMPUTE_HASH` | Enable precomputing of hash values for MultimodalDataItem | `false` |
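
For readers skimming the table, the gating that the `SGLANG_MM_BUFFER_SIZE_MB` row describes (a positive value enables the GPU hashing path, `0` disables it) can be sketched as follows. This is a minimal standalone illustration, not SGLang's `envs` module: the helper names are hypothetical, and both the fallback to `0` on invalid or negative input and the 1 MB = 1024 * 1024 bytes conversion are assumptions.

```python
import os

# Variable name taken from the documentation table; everything else is illustrative.
ENV_NAME = "SGLANG_MM_BUFFER_SIZE_MB"


def mm_buffer_size_mb(environ=None) -> int:
    """Hypothetical helper: parse the buffer size in MB, defaulting to 0 (disabled)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_NAME, "0")
    try:
        size = int(raw)
    except ValueError:
        # Assumption: an unparsable value is treated as "disabled".
        return 0
    # Assumption: negative values are clamped to 0 (disabled).
    return max(size, 0)


def gpu_hashing_enabled(environ=None) -> bool:
    """The optimization is active only for a positive buffer size."""
    return mm_buffer_size_mb(environ) > 0


def buffer_size_bytes(environ=None) -> int:
    """Assumption: 1 MB is taken as 1024 * 1024 bytes."""
    return mm_buffer_size_mb(environ) * 1024 * 1024
```

Passing a plain dict instead of `os.environ` keeps the helpers easy to check, for example `gpu_hashing_enabled({"SGLANG_MM_BUFFER_SIZE_MB": "256"})`.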


3 changes: 3 additions & 0 deletions python/sglang/srt/managers/schedule_batch.py
@@ -360,6 +360,9 @@ def from_dict(obj: dict):
ret.mm_items = [item for item in ret.mm_items if item.is_valid()]

if envs.SGLANG_MM_BUFFER_SIZE_MB.get() > 0:
    # Multi-modal feature hashing optimization:
    # When SGLANG_MM_BUFFER_SIZE_MB > 0, we temporarily move feature tensors to GPU
    # for faster hash computation, while avoiding OOM issues.
    from sglang.srt.managers.mm_utils import (
        init_feature_buffer,
        is_feature_buffer_initialized,