4 changes: 2 additions & 2 deletions docs/advanced_features/hicache_design.md
@@ -121,9 +121,9 @@ Specifically, **LMCache**, an efficient KV cache layer for enterprise-scale LLM

- **`--enable-hierarchical-cache`**: Enable hierarchical cache functionality. This is required to use HiCache.

-- **`--hicache-ratio HICACHE_RATIO`**: The ratio of the size of host KV cache memory pool to the size of device pool. For example, a value of 2 means the host memory pool is twice as large as the device memory pool. The minimum allowed value is 2.
+- **`--hicache-ratio HICACHE_RATIO`**: The ratio of the size of host KV cache memory pool to the size of device pool. For example, a value of 2 means the host memory pool is twice as large as the device memory pool. The value of this parameter must be greater than 1, as the current implementation requires the host memory allocated for the KV cache to be larger than the device memory allocated for the KV cache.

-- **`--hicache-size HICACHE_SIZE`**: The size of host KV cache memory pool in gigabytes. This parameter overrides `hicache-ratio` if set. For example, `--hicache-size 30` allocates 30GB for the host memory pool **for each rank**. If there are 8 ranks, then the total memory size is 240GB.
+- **`--hicache-size HICACHE_SIZE`**: The size of host KV cache memory pool in gigabytes. This parameter overrides `hicache-ratio` if set. For example, `--hicache-size 30` allocates 30GB (1GB = 1e9 bytes) for the host memory pool **for each rank**. If there are 8 ranks, then the total memory size is 240GB. Just like `hicache-ratio`, the value of this parameter must be larger than the size of device memory allocated for KV cache.

**Note**: `--hicache-ratio` and `--hicache-size` are two critical parameters. In general, a larger HiCache size leads to a higher cache hit rate, which improves prefill performance. However, the relationship between cache size and hit rate is not linear. Once most reusable KV data—especially hot tokens—are already cached, further increasing the size may yield only marginal performance gains. Users can set these parameters based on their workload characteristics and performance requirements.
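
To make the sizing rules in this hunk concrete, here is a minimal Python sketch of the arithmetic. It is not SGLang's implementation: `host_pool_bytes` and `total_host_bytes` are hypothetical helpers written for illustration. The only facts taken from the doc text are that `--hicache-size` overrides `--hicache-ratio`, that 1GB = 1e9 bytes, that sizes are per rank, and that the host pool must be larger than the device pool.

```python
GB = 10**9  # the doc text defines 1GB as 1e9 bytes


def host_pool_bytes(device_pool_bytes, hicache_ratio=None, hicache_size_gb=None):
    """Hypothetical helper: per-rank host KV cache pool size in bytes."""
    if hicache_size_gb is not None:
        # An explicit size takes precedence over the ratio.
        host = int(hicache_size_gb * GB)
    elif hicache_ratio is not None:
        host = int(device_pool_bytes * hicache_ratio)
    else:
        raise ValueError("set hicache_ratio or hicache_size_gb")
    # The doc requires the host pool to be larger than the device pool.
    if host <= device_pool_bytes:
        raise ValueError("host KV pool must be larger than the device KV pool")
    return host


def total_host_bytes(per_rank_bytes, num_ranks):
    """Hypothetical helper: the pool is allocated once per rank."""
    return per_rank_bytes * num_ranks


per_rank = host_pool_bytes(10 * GB, hicache_size_gb=30)
total = total_host_bytes(per_rank, num_ranks=8)
```

With a 10GB device pool assumed purely for the example, `--hicache-size 30` on 8 ranks yields the 240GB total quoted in the doc, while a ratio of exactly 1 is rejected.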

11 changes: 11 additions & 0 deletions python/sglang/srt/environ.py
@@ -194,6 +194,17 @@ class Envs:
     SGLANG_MOONCAKE_CUSTOM_MEM_POOL = EnvStr(None)
     ENABLE_ASCEND_TRANSFER_WITH_MOONCAKE = EnvBool(False)

+    # Mooncake Store
+    SGLANG_HICACHE_MOONCAKE_CONFIG_PATH = EnvStr(None)
+    MOONCAKE_MASTER = EnvStr(None)
+    MOONCAKE_LOCAL_HOSTNAME = EnvStr("localhost")
+    MOONCAKE_TE_META_DATA_SERVER = EnvStr("P2PHANDSHAKE")
+    MOONCAKE_GLOBAL_SEGMENT_SIZE = EnvStr("4gb")
+    MOONCAKE_PROTOCOL = EnvStr("tcp")
+    MOONCAKE_DEVICE = EnvStr("")
+    MOONCAKE_MASTER_METRICS_PORT = EnvInt(9003)
+    MOONCAKE_CHECK_SERVER = EnvBool(False)
+
     # AMD & ROCm
     SGLANG_USE_AITER = EnvBool(False)
     SGLANG_ROCM_FUSED_DECODE_MLA = EnvBool(False)
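
For readers unfamiliar with the `Envs` registry, the sketch below shows roughly what the new Mooncake Store entries amount to: each variable is read from the process environment and falls back to the default listed in the hunk. It deliberately uses plain `os.environ` rather than SGLang's `EnvStr`/`EnvInt`/`EnvBool` classes. `read_mooncake_env` is a hypothetical helper, and the set of strings accepted as "true" is an assumption, not SGLang's parsing rule.

```python
import os

# Names and defaults copied from the hunk above.
MOONCAKE_DEFAULTS = {
    "SGLANG_HICACHE_MOONCAKE_CONFIG_PATH": None,
    "MOONCAKE_MASTER": None,
    "MOONCAKE_LOCAL_HOSTNAME": "localhost",
    "MOONCAKE_TE_META_DATA_SERVER": "P2PHANDSHAKE",
    "MOONCAKE_GLOBAL_SEGMENT_SIZE": "4gb",
    "MOONCAKE_PROTOCOL": "tcp",
    "MOONCAKE_DEVICE": "",
    "MOONCAKE_MASTER_METRICS_PORT": 9003,
    "MOONCAKE_CHECK_SERVER": False,
}


def read_mooncake_env(environ=None):
    """Hypothetical helper: resolve each variable, falling back to its default."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for name, default in MOONCAKE_DEFAULTS.items():
        raw = environ.get(name)
        if raw is None:
            resolved[name] = default
        elif isinstance(default, bool):
            # Check bool before int, since bool is a subclass of int.
            # Assumption: these strings count as true; SGLang may differ.
            resolved[name] = raw.strip().lower() in ("1", "true", "yes", "on")
        elif isinstance(default, int):
            resolved[name] = int(raw)
        else:
            resolved[name] = raw
    return resolved


defaults_only = read_mooncake_env({})
overridden = read_mooncake_env(
    {"MOONCAKE_PROTOCOL": "rdma", "MOONCAKE_MASTER_METRICS_PORT": "9100"}
)
```

Passing a dict instead of relying on `os.environ` keeps the sketch easy to test; the `"rdma"` value is only an illustrative override.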