Merged
Changes from 7 commits
4 changes: 2 additions & 2 deletions components/backends/sglang/deploy/README.md
Original file line number Diff line number Diff line change
@@ -74,7 +74,7 @@ extraPodSpec:

Before using these templates, ensure you have:

1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../docs/guides/dynamo_deploy/dynamo_cloud.md)
1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md)
2. **Kubernetes cluster with GPU support**
3. **Container registry access** for SGLang runtime images
4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
@@ -159,4 +159,4 @@ Common issues and solutions:
3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds`
4. **Out of memory**: Increase memory limits or reduce model batch size

For additional support, refer to the [deployment troubleshooting guide](../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
2 changes: 1 addition & 1 deletion components/backends/sglang/slurm_jobs/README.md
@@ -1 +1 @@
Please refer to [Deploying Dynamo with SGLang on SLURM](../../../../../docs/components/backends/sglang/slurm_jobs/README.md) for more details.
Please refer to [Deploying Dynamo with SGLang on SLURM](../../../../docs/components/backends/sglang/slurm_jobs/README.md) for more details.
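To see why the number of `../` segments matters in these hunks, here is a minimal sketch (not part of the PR) that resolves a relative Markdown link against the directory of the file that contains it. The helper names `resolve_link` and `escapes_repo` are hypothetical. The sketch assumes repository-root-relative POSIX paths and ignores any `#fragment` suffix.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative link against the directory of source_file.

    Both paths are assumed to be relative to the repository root.
    """
    target = link.split("#", 1)[0]  # drop an optional #fragment
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, target))


def escapes_repo(resolved: str) -> bool:
    """True if the resolved path climbs above the repository root."""
    return resolved == ".." or resolved.startswith("../")


source = "components/backends/sglang/slurm_jobs/README.md"
target = "docs/components/backends/sglang/slurm_jobs/README.md"

# Five "../" segments climb one level above the repository root.
old = resolve_link(source, "../" * 5 + target)
# Four "../" segments land exactly on the repository root.
new = resolve_link(source, "../" * 4 + target)

print(old, escapes_repo(old))
# ../docs/components/backends/sglang/slurm_jobs/README.md True
print(new, escapes_repo(new))
# docs/components/backends/sglang/slurm_jobs/README.md False
```

The same check applies to the other hunks. For example, `../../../architecture/disagg_serving.md` from `components/backends/trtllm/README.md` resolves to `architecture/disagg_serving.md` at the repository root, while the updated link resolves under `docs/`.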
20 changes: 10 additions & 10 deletions components/backends/trtllm/README.md
@@ -49,12 +49,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | TensorRT-LLM | Notes |
|---------|--------------|-------|
| [**Disaggregated Serving**](../../../architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
| [**KV-Aware Routing**](../../../architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../../architecture/sla_planner.md) | 🚧 | Planned |
| [**Load Based Planner**](../../../architecture/load_planner.md) | 🚧 | Planned |
| [**KVBM**](../../../architecture/kvbm_architecture.md) | 🚧 | Planned |
| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | 🚧 | Planned |
| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | Planned |
| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |

### Large Scale P/D and WideEP Features

@@ -180,14 +180,14 @@ Below we provide a selected list of advanced examples. Please open up an issue i

### Multinode Deployment

For comprehensive instructions on multinode serving, see the [multinode-examples.md](./multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](./llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.
For comprehensive instructions on multinode serving, see the [multinode-examples.md](../../../docs/components/backends/trtllm/multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](../../../docs/components/backends/trtllm/llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.

### Speculative Decoding
- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](./llama4_plus_eagle.md)**
- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](../../../docs/components/backends/trtllm/llama4_plus_eagle.md)**

### Kubernetes Deployment

For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](deploy/README.md)
For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../docs/components/backends/trtllm/deploy/README.md)

### Client

@@ -216,7 +216,7 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh

## KV Cache Transfer in Disaggregated Serving

Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](./kv-cache-tranfer.md).
Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](../../../docs/components/backends/trtllm/kv-cache-tranfer.md).

## Request Migration

14 changes: 7 additions & 7 deletions components/backends/vllm/README.md
@@ -35,12 +35,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | vLLM | Notes |
|---------|------|-------|
| [**Disaggregated Serving**](../../../architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
| [**KV-Aware Routing**](../../../architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../../architecture/sla_planner.md) | ✅ | |
| [**Load Based Planner**](../../../architecture/load_planner.md) | 🚧 | WIP |
| [**KVBM**](../../../architecture/kvbm_architecture.md) | 🚧 | WIP |
| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | ✅ | |
| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | WIP |
| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | WIP |

### Large Scale P/D and WideEP Features

@@ -152,7 +152,7 @@ Below we provide a selected list of advanced deployments. Please open up an issu

### Kubernetes Deployment

For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [vLLM Kubernetes Deployment Guide](deploy/README.md)
For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [vLLM Kubernetes Deployment Guide](../../../docs/components/backends/vllm/deploy/README.md)

## Configuration

2 changes: 1 addition & 1 deletion deploy/inference-gateway/README.md
@@ -71,7 +71,7 @@ kubectl get gateway inference-gateway -n my-model

3. **Deploy model**

Follow the steps in [model deployment](../../components/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.
Follow the steps in [model deployment](../../docs/components/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.

Sample commands to deploy model:
```bash
1 change: 1 addition & 0 deletions docs/API/nixl_connect/README.md
@@ -64,6 +64,7 @@ sequenceDiagram
RemoteWorker -->> LocalWorker: Notify completion (unblock awaiter)
```


## Python Classes

- [Connector](connector.md)
5 changes: 5 additions & 0 deletions docs/index.rst
@@ -44,7 +44,11 @@ Get started with Dynamo locally in just a few commands:

# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
<<<<<<< HEAD

=======

>>>>>>> 167c793df4ea3877c40f90d1c8ff627c8c13f8b7
# Create virtual environment and install Dynamo
uv venv venv
source venv/bin/activate
@@ -143,6 +147,7 @@ The examples below assume you build the latest image yourself from source. If us
Writing Python Workers in Dynamo <guides/backend.md>
Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>
Configuring Metrics for Observability <guides/metrics.md>

.. toctree::
:hidden:
1 change: 0 additions & 1 deletion examples/README.md
@@ -38,7 +38,6 @@ Learn fundamental Dynamo concepts through these introductory examples:
- **[Quickstart](basics/quickstart/README.md)** - Simple aggregated serving example with vLLM backend
- **[Disaggregated Serving](basics/disaggregated_serving/README.md)** - Prefill/decode separation for enhanced performance and scalability
- **[Multi-node](basics/multinode/README.md)** - Distributed inference across multiple nodes and GPUs
- **[Multimodal](basics/multimodal/README.md)** - Multimodal model deployment with E/P/D disaggregated serving

## Deployment Examples

2 changes: 1 addition & 1 deletion examples/runtime/hello_world/README.md
@@ -103,7 +103,7 @@ Hello star!

## Deployment to Kubernetes

Follow the [Quickstart Guide](../../../guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
Then deploy to kubernetes using

```bash