13 changes: 1 addition & 12 deletions docs/README.md
@@ -14,18 +14,6 @@ Running large language models across multiple GPUs and nodes requires orchestrat
- **Parameter sweeps** - Run grid searches across configurations with a single command
- **Profiling support** - Built-in torch/nsys profiling modes

## Architecture Overview

`srtctl` orchestrates distributed inference using SGLang workers in either **disaggregated** or **aggregated** mode.

**Disaggregated Mode** separates prefill and decode into specialized workers:

- Prefill workers handle the initial prompt processing
- Decode workers handle token generation
- Frontend distribution via nginx load balancer (default) or sglang_router

**Aggregated Mode** runs combined prefill+decode on each worker; it is simpler to operate but potentially less efficient for high-throughput scenarios.
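
For intuition only, a disaggregated deployment might be described along these lines. The key names below are hypothetical sketches, not the actual srtctl schema; see the job config examples in [Installation](installation.md) for real options:

```yaml
# Hypothetical sketch of a disaggregated layout (illustrative key names only)
mode: disaggregated        # vs. aggregated
prefill:
  workers: 2               # nodes that process incoming prompts
decode:
  workers: 4               # nodes that stream generated tokens
frontend: nginx            # default load balancer; sglang_router is the alternative
```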

## How It Works

When you run `srtctl apply -f config.yaml`, the tool:
@@ -54,3 +42,4 @@ Once allocated, workers launch inside containers, discover each other through ET
- [Parameter Sweeps](sweeps.md) - Run grid searches across configurations
- [Profiling](profiling.md) - Performance analysis with torch/nsys
- [Analyzing Results](analyzing.md) - Dashboard and visualization
- [SGLang Router](sglang-router.md) - Alternative to Dynamo for PD disaggregation
14 changes: 8 additions & 6 deletions docs/SUMMARY.md
@@ -1,8 +1,10 @@
# Table of contents

* [Introduction](README.md)
* [Installation](installation.md)
* [Profiling](profiling.md)
* [Monitoring](monitoring.md)
* [Parameter Sweeps](sweeps.md)
* [Analying](analyzing.md)
- [Introduction](README.md)
- [Installation](installation.md)
- [SGLang Router](sglang-router.md)
- [Profiling](profiling.md)
- [Monitoring](monitoring.md)
- [Parameter Sweeps](sweeps.md)
- [Analyzing](analyzing.md)
- [SLURM FAQ](slurm-faq.md)
19 changes: 1 addition & 18 deletions docs/analyzing.md
@@ -6,22 +6,5 @@ uv run streamlit run analysis/dashboard/app.py

```bash
# Another way to launch dashboard
make dashboard
```
Opens interactive dashboard at http://localhost:8501


## Features

### 📊 Interactive Dashboard

- **Pareto Analysis** - TPS/GPU vs TPS/User tradeoffs
- **Latency Breakdown** - TTFT, TPOT, ITL across concurrency levels
- **Node Metrics** - Runtime metrics from prefill/decode nodes
- **Config Comparison** - Side-by-side configuration diffs
- **Run Comparison** - Performance deltas between runs

### 🚀 SLURM Job Submission

- Disaggregated (prefill/decode) or aggregated mode
- Multiple frontends with nginx load balancing (default)
- Automated benchmarking with sa-bench
- Job metadata tracking
Opens interactive dashboard at http://localhost:8501
133 changes: 34 additions & 99 deletions docs/installation.md
@@ -18,6 +18,8 @@ pip install -e .

## Gather your cluster user and target partition

These commands may not work on every cluster. If they fail, check your site documentation or ask an AI assistant for the equivalent commands on your cluster.

```bash
# user
sacctmgr -nP show assoc where user=$(whoami) format=account
```

@@ -27,6 +29,8 @@ sinfo

## Run Setup

If you are deploying onto Grace-based systems (GH200, GB200, etc.), use the `aarch64` architecture; otherwise use `x86_64`.

```bash
make setup ARCH=aarch64 # or ARCH=x86_64
```
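
If you are unsure which architecture your compute nodes use, one way to check is to run `uname -m` on a compute node (the partition name below is illustrative):

```bash
# Prints aarch64 on Grace-based nodes, x86_64 otherwise
srun --partition=batch --nodes=1 uname -m
```
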
@@ -42,8 +46,6 @@ The setup will:
3. Create `srtslurm.yaml` with your settings
4. Auto-detect and set `srtctl_root` path

Dynamo 0.7.0 is now available on PyPI and will be installed automatically from pip when workers start.

## Configure srtslurm.yaml

After setup, edit `srtslurm.yaml` to add model paths, containers, and cluster-specific settings:
@@ -56,7 +58,6 @@ The `model_paths` section maps short aliases to full filesystem paths:
```yaml
model_paths:
  deepseek-r1: "/mnt/lustre/models/DeepSeek-R1"
  deepseek-r1-fp4: "/mnt/lustre/models/deepseek-r1-0528-fp4-v2"
  llama-70b: "/mnt/lustre/models/Llama-3-70B"
```

Models must be accessible from all compute nodes (typically on a shared filesystem like Lustre or GPFS).
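
A quick way to confirm a model path is actually visible from the compute nodes is to list it through `srun` (partition and path below are illustrative):

```bash
# Should list config.json, tokenizer files, and weight shards if the path is mounted
srun --partition=batch --nodes=1 ls /mnt/lustre/models/DeepSeek-R1
```
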
@@ -67,15 +68,14 @@ The `containers` section maps version aliases to `.sqsh` container images:

```yaml
containers:
  latest: "/mnt/containers/lmsysorg+sglang+v0.5.5.sqsh"
  stable: "/mnt/containers/lmsysorg+sglang+v0.5.4.sqsh"
  container1: "/mnt/containers/lmsysorg+sglang+v0.5.5.sqsh"
  container2: "/mnt/containers/lmsysorg+sglang+v0.5.4.sqsh"
```

To create a container image from Docker:

```bash
enroot import docker://lmsysorg/sglang:v0.5.5
mv lmsysorg+sglang+v0.5.5.sqsh /mnt/containers/
```
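
After the import completes, move the resulting `.sqsh` into your shared containers directory if needed and register it under an alias in `srtslurm.yaml`; the path below is illustrative:

```yaml
containers:
  latest: "/mnt/containers/lmsysorg+sglang+v0.5.5.sqsh"
```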

### Cloud Sync (Optional)
@@ -91,42 +91,43 @@ cloud:

Then use `make sync-to-cloud` or `make sync-run RUN_ID=<run_id>`.
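
For example (the run ID below is a hypothetical placeholder; use the ID printed when your job was submitted):

```bash
# Push all local results to the configured bucket
make sync-to-cloud

# Push a single run's results
make sync-run RUN_ID=2024-01-15-deepseek-r1-sweep
```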

### Cluster Compatibility Settings

Some SLURM clusters don't support certain SBATCH directives. If you encounter errors during job submission, you may need to adjust these settings:

#### GPU Resource Specification

If you see this error when submitting jobs:

```
sbatch: error: Invalid generic resource (gres) specification
```

Your cluster doesn't support the `--gpus-per-node` directive. Disable it in `srtslurm.yaml`:

```yaml
use_gpus_per_node_directive: false
```

This will omit the `#SBATCH --gpus-per-node` directive from generated job scripts while keeping all other functionality intact.

#### Segment-Based Scheduling

If you see this error when submitting jobs:

```
sbatch: error: Invalid --segment specification
```

Your cluster doesn't support the `--segment` directive for topology-aware scheduling. Disable it in `srtslurm.yaml`:

```yaml
use_segment_sbatch_directive: false
```

The `--segment` directive ensures all allocated nodes are within the same network segment/switch for optimal interconnect performance between prefill and decode workers. If your cluster doesn't support it, SLURM will still allocate nodes but may scatter them across the cluster.

### Complete srtslurm.yaml Reference

Here's a complete example of all available options:

```yaml
# Default SLURM settings
default_account: "your-account"
default_partition: "batch"
default_time_limit: "4:00:00"

# Resource defaults
gpus_per_node: 4

# SLURM directive compatibility
use_gpus_per_node_directive: true # Set false if cluster doesn't support --gpus-per-node
use_segment_sbatch_directive: true # Set false if cluster doesn't support --segment

# Path to srtctl repo root (auto-set by make setup)
srtctl_root: "/path/to/srtctl"

# Model path aliases
model_paths:
  deepseek-r1: "/models/DeepSeek-R1"
  llama-70b: "/models/Llama-3-70B"

# Container aliases
containers:
  latest: "/containers/sglang-latest.sqsh"
  stable: "/containers/sglang-stable.sqsh"

# Cloud sync settings (optional)
cloud:
  endpoint_url: "https://s3.example.com"
  bucket: "benchmark-results"
  prefix: "my-team/"
```

## Create a Job Config

Create `configs/my-job.yaml`:
@@ -178,7 +179,7 @@ benchmark:

```yaml
  isl: 1024
  osl: 1024
  concurrencies: [256, 512]
  req_rate: "inf" # Request rate, use "inf" for max throughput
  req_rate: "inf"
```

### Backend Options
@@ -195,35 +196,6 @@ backend:

```yaml
  use_sglang_router: false # Default: false. Use sglang_router for load balancing
```
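
To try the sglang_router frontend instead of the default nginx load balancer, flip the flag shown above (other backend keys unchanged); see [SGLang Router](sglang-router.md) for details:

```yaml
backend:
  use_sglang_router: true
```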

## Profiling (torch / nsys)

You can enable profiling via a top-level `profiling` section in your job YAML:

```yaml
profiling:
  type: "torch" # one of: "none", "torch", "nsys"
  isl: 1024
  osl: 128
  concurrency: 24
  start_step: 0 # optional
  stop_step: 50 # optional

benchmark:
  type: "manual" # Required - profiling and benchmarking are mutually exclusive
```

See [Profiling](profiling.md) for detailed configuration options, constraints, and output file locations.

## Validate with Dry Run

Always validate before submitting:

```bash
srtctl dry-run -f configs/my-job.yaml
```

This validates your config, resolves aliases, generates all files, and saves them to `dry-runs/` without submitting to SLURM.
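
You can then inspect the generated artifacts before submitting; the exact directory layout may vary by version:

```bash
# List the most recent dry-run output
ls -lt dry-runs/ | head
```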

## Submit the Job

```bash
@@ -285,43 +257,6 @@ You can run custom initialization scripts on worker nodes before starting SGLan
srtctl apply -f configs/my-job.yaml --setup-script custom-setup.sh
```

The script will be executed on each worker node (prefill, decode, and aggregated) before installing Dynamo from PyPI and starting the SGLang workers. The script must be located in the `configs/` directory, which is mounted into containers at `/configs/`.
The script will be executed on each worker node (prefill, decode, or aggregated) before installing Dynamo from PyPI and starting the SGLang workers. The script must be located in the `configs/` directory, which is mounted into containers at `/configs/`.

**Note**: Setup scripts only run when you explicitly specify `--setup-script`. No default setup script will run if this flag is omitted.
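
A minimal `configs/custom-setup.sh` might look like the following; the contents are purely illustrative (export environment variables, install an extra package, etc.):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative pre-launch setup that runs on each worker node
export HF_HOME=/mnt/lustre/cache/huggingface   # hypothetical cache location
pip install --quiet some-extra-package         # hypothetical dependency
```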

## Complete srtslurm.yaml Reference

Here's a complete example of all available options:

```yaml
# Default SLURM settings
default_account: "your-account"
default_partition: "batch"
default_time_limit: "4:00:00"

# Resource defaults
gpus_per_node: 4

# SLURM directive compatibility
use_gpus_per_node_directive: true # Set false if cluster doesn't support --gpus-per-node
use_segment_sbatch_directive: true # Set false if cluster doesn't support --segment

# Path to srtctl repo root (auto-set by make setup)
srtctl_root: "/path/to/srtctl"

# Model path aliases
model_paths:
  deepseek-r1: "/models/DeepSeek-R1"
  llama-70b: "/models/Llama-3-70B"

# Container aliases
containers:
  latest: "/containers/sglang-latest.sqsh"
  stable: "/containers/sglang-stable.sqsh"

# Cloud sync settings (optional)
cloud:
  endpoint_url: "https://s3.example.com"
  bucket: "benchmark-results"
  prefix: "my-team/"
```
6 changes: 5 additions & 1 deletion docs/profiling.md
@@ -1,12 +1,16 @@
# Profiling

srtctl supports two profiling backends for performance analysis: **Torch Profiler** and **NVIDIA Nsight Systems (nsys)**. Profiling helps identify bottlenecks in prefill and decode operations.
srtctl supports two profiling backends for performance analysis: **Torch Profiler** and **NVIDIA Nsight Systems (nsys)**.

## Quick Start

Add a `profiling` section to your job YAML:

```yaml
# must set benchmark type to "manual"
benchmark:
  type: "manual"

profiling:
  type: "torch" # or "nsys"
  isl: 1024