254 changes: 254 additions & 0 deletions docs/advanced_features/checkpoint_engine.md
# Checkpoint Engine Integration

The SGLang checkpoint engine integration loads model weights through a distributed checkpoint loading system. By parallelizing weight loading across multiple processes and nodes, it significantly reduces model loading time, especially for large models and multi-node setups.

## Overview

The checkpoint engine integration allows SGLang to:
- Load model weights in parallel using multiple processes
- Distribute weight loading across multiple nodes to increase effective disk bandwidth
- Overlap weight loading with other initialization tasks like CUDA graph capture
- Support both single-node and multi-node deployments

## Installation

First, install the checkpoint engine package:

```bash
pip install 'checkpoint-engine[p2p]'
```
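A quick way to confirm the package is importable before launching anything is a small Python check. This helper is a sketch, not part of SGLang or checkpoint-engine:

```python
import importlib.util

def is_installed(pkg: str) -> bool:
    """Return True if `pkg` can be imported in the current environment."""
    return importlib.util.find_spec(pkg) is not None

if not is_installed("checkpoint_engine"):
    print("checkpoint-engine is missing; install it with: pip install 'checkpoint-engine[p2p]'")
```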

## Architecture

The system consists of two main components:

1. **SGLang Server**: Runs with `--wait-for-initial-weights` flag to wait for weights before becoming ready
2. **Checkpoint Engine Workers**: Separate processes (managed by torchrun) that load and distribute model weights

The checkpoint engine uses a parameter server architecture with support for:
- **Broadcast mode**: Weights are broadcast from loading processes to inference processes
- **P2P mode**: Direct peer-to-peer weight transfer between processes
- **All mode**: Combination of both broadcast and P2P methods
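The three modes above can be thought of as a small dispatch: `all` expands into both concrete transfer methods. This is an illustrative sketch of that logic, not the actual checkpoint-engine implementation:

```python
# Valid values for --update-method, per the modes described above.
VALID_METHODS = {"broadcast", "p2p", "all"}

def resolve_update_methods(method: str) -> set:
    """Expand an --update-method value into the concrete transfer modes to run.

    "all" combines broadcast and p2p; anything else must be one of the two
    concrete modes.
    """
    if method not in VALID_METHODS:
        raise ValueError(f"unknown update method: {method!r}")
    return {"broadcast", "p2p"} if method == "all" else {method}
```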

## Usage Examples

### Single Node Setup

**Terminal 1 - Launch SGLang Server:**
```bash
python -m sglang.launch_server \
--model-path Qwen/Qwen3-8B \
--tp 8 \
--load-format dummy \
--wait-for-initial-weights
```

**Terminal 2 - Run Checkpoint Engine:**

Using the sglang entrypoint (recommended):
```bash
python -m sglang.srt.checkpoint_engine.update \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 8
```

Using torchrun directly:
```bash
torchrun --nproc-per-node 8 \
examples/checkpoint_engine/update.py \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 8
```
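The torchrun invocation above differs from the multi-node variants below only in the rank and rendezvous flags. A hypothetical helper that assembles the command list makes that parameterization explicit (the flag names are taken from the examples in this document):

```python
def torchrun_update_cmd(nproc: int, ckpt: str, parallel: int,
                        nnodes: int = 1, node_rank: int = 0,
                        master_addr: str = "localhost") -> list:
    """Build the torchrun command for examples/checkpoint_engine/update.py.

    Single-node runs omit the rendezvous flags; multi-node runs add
    --nnodes/--node-rank/--master-addr/--master-port.
    """
    cmd = ["torchrun", "--nproc-per-node", str(nproc)]
    if nnodes > 1:
        cmd += ["--nnodes", str(nnodes), "--node-rank", str(node_rank),
                "--master-addr", master_addr, "--master-port", "29500"]
    cmd += ["examples/checkpoint_engine/update.py",
            "--update-method", "broadcast",
            "--checkpoint-path", ckpt,
            "--inference-parallel-size", str(parallel)]
    return cmd
```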

### Multi-Node Setup (2 Nodes)

In this setup, each node runs its own independent TP=8 server, while the checkpoint engine workers on both nodes cooperate to load the weights from disk.

**Node 0:**

Launch SGLang server:
```bash
python -m sglang.launch_server \
--model-path Qwen/Qwen3-8B \
--tp 8 \
--load-format dummy \
--wait-for-initial-weights \
--host [IP]
```

Run checkpoint engine:

Using sglang entrypoint (recommended):
```bash
python -m sglang.srt.checkpoint_engine.update \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 8
```

Using torchrun directly:
```bash
torchrun --nproc-per-node 8 \
--nnodes 2 \
--node-rank 0 \
--master-addr [IP] \
--master-port 29500 \
examples/checkpoint_engine/update.py \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 8
```

**Node 1:**

Launch SGLang server:
```bash
python -m sglang.launch_server \
--model-path Qwen/Qwen3-8B \
--tp 8 \
--load-format dummy \
--wait-for-initial-weights \
--host [IP]
```

Run checkpoint engine:

Using sglang entrypoint (recommended):
```bash
python -m sglang.srt.checkpoint_engine.update \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 8
```

Using torchrun directly:
```bash
torchrun --nproc-per-node 8 \
--nnodes 2 \
--node-rank 1 \
--master-addr [IP] \
--master-port 29500 \
examples/checkpoint_engine/update.py \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 8
```
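Because `--wait-for-initial-weights` keeps the server from serving until weights arrive, a small readiness poll is handy when scripting these launches. The sketch below uses only the standard library; the health endpoint path is an assumption, so adjust it to your server's actual route:

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url: str, timeout: float = 600.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse.

    Returns True once the server responds OK, False on timeout. The exact
    health route (e.g. /health) depends on your server configuration.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry until the deadline
        time.sleep(interval)
    return False
```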

### Multi-Node Setup with Tensor Parallelism (TP=16)

Here a single server spans both nodes (8 GPUs each) via `--dist-init-addr`, so the checkpoint engine's `--inference-parallel-size` is set to 16 to match the server's total tensor-parallel world size.

**Node 0:**

Launch SGLang server:
```bash
python -m sglang.launch_server \
--model-path Qwen/Qwen3-8B \
--tp 16 \
--load-format dummy \
--wait-for-initial-weights \
--host [IP] \
--dist-init-addr [IP]:9120 \
--nnodes 2 \
--node-rank 0
```

Run checkpoint engine:

Using sglang entrypoint (recommended):
```bash
python -m sglang.srt.checkpoint_engine.update \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 16
```

Using torchrun directly:
```bash
torchrun --nproc-per-node 8 \
--nnodes 2 \
--node-rank 0 \
--master-addr [IP] \
--master-port 29500 \
examples/checkpoint_engine/update.py \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 16
```

**Node 1:**

Launch SGLang server:
```bash
python -m sglang.launch_server \
--model-path Qwen/Qwen3-8B \
--tp 16 \
--load-format dummy \
--wait-for-initial-weights \
--host [IP] \
--dist-init-addr [IP]:9120 \
--nnodes 2 \
--node-rank 1
```

Run checkpoint engine:

Using sglang entrypoint (recommended):
```bash
python -m sglang.srt.checkpoint_engine.update \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 16
```

Using torchrun directly:
```bash
torchrun --nproc-per-node 8 \
--nnodes 2 \
--node-rank 1 \
--master-addr [IP] \
--master-port 29500 \
examples/checkpoint_engine/update.py \
--update-method broadcast \
--checkpoint-path /path/to/Qwen/Qwen3-8B/ \
--inference-parallel-size 16
```
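As the examples suggest, `--inference-parallel-size` tracks the tensor-parallel world size of one logical server: two independent TP=8 servers each use 8, while one TP=16 server spanning two nodes uses 16. A one-line sketch of that relationship:

```python
def inference_parallel_size(gpus_per_node: int, nodes_per_server: int) -> int:
    """Tensor-parallel world size of one logical server.

    This mirrors the examples above; it is a reading of the docs, not an
    API of checkpoint-engine itself.
    """
    return gpus_per_node * nodes_per_server
```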

## Configuration Options

### SGLang Server Options

- `--load-format dummy`: Initialize the server with placeholder (dummy) weights so that startup tasks can overlap with real weight loading
- `--wait-for-initial-weights`: Wait for checkpoint engine to provide weights before becoming ready
- `--host`: Host address for multi-node setups
- `--dist-init-addr`: Distributed initialization address for tensor parallelism

### Checkpoint Engine Options

- `--update-method`: Weight update method (`broadcast`, `p2p`, or `all`)
- `--checkpoint-path`: Path to model checkpoint directory
- `--inference-parallel-size`: Number of inference parallel processes
- `--endpoint`: SGLang server endpoint (default: `http://localhost:19730`)
- `--checkpoint-name`: Name for the checkpoint (default: `my-checkpoint-iter-0`)
- `--save-metas-file`: File to save checkpoint metadata
- `--load-metas-file`: File to load checkpoint metadata from
- `--uds`: Unix domain socket path for communication
- `--weight-version`: Version identifier for weights
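For orientation, the options listed above map naturally onto an `argparse` parser. This is a hedged sketch only; the real entrypoint (`sglang.srt.checkpoint_engine.update`) may declare its arguments differently, and the `--inference-parallel-size` default here is an assumption:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser mirroring the documented checkpoint engine options."""
    p = argparse.ArgumentParser(description="checkpoint engine update (sketch)")
    p.add_argument("--update-method", choices=["broadcast", "p2p", "all"],
                   default="broadcast")
    p.add_argument("--checkpoint-path", required=True)
    p.add_argument("--inference-parallel-size", type=int, default=8)  # assumed default
    p.add_argument("--endpoint", default="http://localhost:19730")
    p.add_argument("--checkpoint-name", default="my-checkpoint-iter-0")
    p.add_argument("--save-metas-file")
    p.add_argument("--load-metas-file")
    p.add_argument("--uds")
    p.add_argument("--weight-version")
    return p
```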

## Performance Benefits

The checkpoint engine provides significant time savings in two main aspects:

1. **Multi-node Loading**: Each node only loads a portion of weights from disk, effectively increasing disk bandwidth. More participating nodes provide greater acceleration. Preliminary tests show 20-second acceleration when loading DeepSeek-R1 on H20-3e with two nodes.

2. **Single Process Optimization**: Using dummy format allows overlapping disk-to-CPU transfer with CUDA graph capture and other initialization tasks, providing additional time savings.
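The multi-node benefit can be modeled with back-of-envelope arithmetic: if each node reads a `1/nnodes` slice of the checkpoint from its own disk, load time scales inversely with the node count. The numbers below are purely illustrative, not measurements:

```python
def load_seconds(checkpoint_gb: float, per_node_gbps: float, nnodes: int) -> float:
    """Idealized load time when each node reads a 1/nnodes slice of the weights."""
    return checkpoint_gb / (per_node_gbps * nnodes)

# e.g. a 600 GB checkpoint at 5 GB/s per node: 120 s on one node, 60 s on two.
```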

## Troubleshooting

- Ensure checkpoint engine package is installed: `pip install 'checkpoint-engine[p2p]'`
- Verify network connectivity between nodes in multi-node setups
- Check that the checkpoint path contains valid model files
- Monitor logs for connection errors between SGLang server and checkpoint engine
- Use `--sleep-time` parameter to add delays if needed for debugging
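For the network-connectivity check, a standard-library probe of the relevant host and port (e.g. the torchrun master address and port from the examples above) is often enough to rule out firewall or routing problems. This helper is a generic sketch, not part of SGLang:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```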

## References

- [Checkpoint Engine Repository](https://github.com/MoonshotAI/checkpoint-engine)
1 change: 1 addition & 0 deletions docs/index.rst
advanced_features/router.md
advanced_features/deterministic_inference.md
advanced_features/observability.md
advanced_features/checkpoint_engine.md

.. toctree::
:maxdepth: 1
9 changes: 9 additions & 0 deletions python/sglang/srt/checkpoint_engine/__init__.py
"""
Checkpoint engine module for SGLang.
This module provides functionality for updating model weights via checkpoint engine.
"""

from sglang.srt.checkpoint_engine.update import main

__all__ = ["main"]