components/backends/trtllm/README.md (1 addition, 1 deletion)

@@ -61,7 +61,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
| Feature | TensorRT-LLM | Notes |
|--------------------|--------------|-----------------------------------------------------------------------|
| **WideEP** | ✅ | |
- | **DP Rank Routing**| ✅ | |
+ | **Attention DP** | ✅ | |
Contributor: Let's be consistent with the table across all 3 backends if possible.

Contributor Author: Great catch, thank you. Will update all of them.

| **GB200 Support** | ✅ | |

## Quick Start
docs/index.rst (19 additions, 39 deletions)

@@ -17,29 +17,29 @@
Welcome to NVIDIA Dynamo
========================

- The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve all AI models—across any framework, architecture, or deployment scale.
+ The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve generative AI and reasoning models—across any framework, architecture, or deployment scale. Built in Rust for performance and in Python for extensibility, Dynamo is fully open source and driven by a transparent, OSS (Open Source Software)-first development approach.

Large language models are quickly outgrowing the memory and compute budget of any single GPU. Tensor-parallelism solves the capacity problem by spreading each layer across many GPUs—and sometimes many servers—but it creates a new one: how do you coordinate those shards, route requests, and share KV cache fast enough to feel like one accelerator? This orchestration gap is exactly what NVIDIA Dynamo is built to close.
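
To make that orchestration gap concrete, here is a toy, GPU-free sketch of the kind of decision a KV-cache-aware router has to make: prefer the worker that already holds the longest cached prefix of the incoming request (less prefill to redo), then break ties on load. This is an illustration only, not Dynamo's actual API; all names here are hypothetical.

```python
# Hypothetical illustration only -- not Dynamo's API.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    active_requests: int = 0
    # Block-aligned token prefixes this worker has cached (toy stand-in for KV state).
    cached_prefixes: set[tuple[int, ...]] = field(default_factory=set)

def cached_prefix_len(worker: Worker, tokens: list[int], block: int = 16) -> int:
    """Length of the longest block-aligned prefix of `tokens` in the worker's cache."""
    best = 0
    for end in range(block, len(tokens) + 1, block):
        if tuple(tokens[:end]) in worker.cached_prefixes:
            best = end
    return best

def route(workers: list[Worker], tokens: list[int]) -> Worker:
    # More reused cache means less prefill; fewer active requests means less queueing.
    return max(workers, key=lambda w: (cached_prefix_len(w, tokens), -w.active_requests))

# Example: w2 already holds a 16-token cached prefix of the request, so it wins
# despite carrying more load.
w1, w2 = Worker("w1"), Worker("w2", active_requests=3)
req = list(range(40))
w2.cached_prefixes.add(tuple(req[:16]))
print(route([w1, w2], req).name)  # -> "w2"
```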

.. admonition:: 💎 Discover the latest developments!
:class: seealso

This guide is a snapshot of the `Dynamo GitHub Repository <https://github.com/ai-dynamo/dynamo>`_ at a specific point in time. For the latest information and examples, see:

- `Dynamo README <https://github.com/ai-dynamo/dynamo/blob/main/README.md>`_
- - `Architecture and features doc <https://github.com/ai-dynamo/dynamo/blob/main/docs/architecture/>`_
- - `Usage guides <https://github.com/ai-dynamo/dynamo/tree/main/docs/guides>`_
- - `Dynamo examples repo <https://github.com/ai-dynamo/examples>`_
+ - `Architecture and Features <https://github.com/ai-dynamo/dynamo/blob/main/docs/architecture/>`_
+ - `Usage Guides <https://github.com/ai-dynamo/dynamo/tree/main/docs/guides>`_
+ - `Dynamo Examples <https://github.com/ai-dynamo/dynamo/tree/main/examples>`_


Quick Start
-----------------
Follow the :doc:`Quick Guide to install Dynamo Platform <guides/dynamo_deploy/quickstart>`.


- Dive in: Examples
+ Dive in: Dynamo Examples
-----------------

The examples below assume you build the latest image yourself from source. If you use a prebuilt image, follow the examples from the corresponding branch.

.. grid:: 1 2 2 2
:gutter: 3
:margin: 0
@@ -49,25 +49,25 @@ The examples below assume you build the latest image yourself from source. If us
:link: examples/runtime/hello_world/README
:link-type: doc

- Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph
+ Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph using Python bindings.

- .. grid-item-card:: :doc:`LLM Serving with VLLM <components/backends/vllm/README>`
+ .. grid-item-card:: :doc:`LLM Serving with vLLM <components/backends/vllm/README>`
:link: components/backends/vllm/README
:link-type: doc

- Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with VLLM.
+ Examples and reference implementations for deploying LLM inference workflows in various configurations with vLLM.

- .. grid-item-card:: :doc:`Multinode with SGLang <components/backends/sglang/docs/multinode-examples>`
- :link: components/backends/sglang/docs/multinode-examples
+ .. grid-item-card:: :doc:`Deploy DeepSeek R1 Disaggregated with WideEP using SGLang <components/backends/sglang/docs/dsr1-wideep-gb200>`
+ :link: components/backends/sglang/docs/dsr1-wideep-gb200
:link-type: doc

- Demonstrates disaggregated serving on several nodes.
+ Demonstrates disaggregated serving of DeepSeek R1 with Wide Expert Parallelism (WideEP) using SGLang.

- .. grid-item-card:: :doc:`TensorRT-LLM <components/backends/trtllm/README>`
+ .. grid-item-card:: :doc:`Deploy with TensorRT-LLM <components/backends/trtllm/README>`
:link: components/backends/trtllm/README
:link-type: doc

- Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.
+ Presents TensorRT-LLM examples and reference implementations for deploying LLMs in various configurations.
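
For a feel of what the Hello World card above demonstrates, here is a minimal, GPU-unaware three-stage pipeline written in plain Python asyncio. It illustrates the graph idea only; the stage names (`frontend`, `middle`, `backend`) are hypothetical and the snippet does not use Dynamo's Python bindings.

```python
# Toy illustration of a GPU-unaware, three-stage graph (hypothetical names;
# plain asyncio, not Dynamo's Python bindings).
import asyncio
from collections.abc import AsyncIterator

async def backend(text: str) -> AsyncIterator[str]:
    # Pretend to "generate" by streaming words back one at a time.
    for word in text.split():
        yield word + "!"

async def middle(text: str) -> AsyncIterator[str]:
    # Transform the request, then stream the backend's response through.
    async for chunk in backend(text.upper()):
        yield chunk

async def frontend() -> None:
    async for chunk in middle("hello world"):
        print(chunk)  # HELLO! WORLD!

asyncio.run(frontend())
```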


.. toctree::
@@ -92,42 +92,21 @@ The examples below assume you build the latest image yourself from source. If us
:hidden:
:caption: Using Dynamo

Running Inference Graphs Locally (dynamo-run) <guides/dynamo_run.md>
Deploying Inference Graphs <guides/dynamo_deploy/README.md>

.. toctree::
:hidden:
:caption: Usage Guides

Writing Python Workers in Dynamo <guides/backend.md>
Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
KV Cache Router Performance Tuning <guides/kv_router_perf_tuning.md>
Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>
Configuring Metrics for Observability <guides/metrics.md>

.. toctree::
:hidden:
:caption: Deployment Guides

Dynamo Deploy Quickstart <guides/dynamo_deploy/quickstart.md>
Dynamo Cloud Kubernetes Platform <guides/dynamo_deploy/dynamo_cloud.md>
Deploying Dynamo on Kubernetes <guides/dynamo_deploy/README.md>
Manual Helm Deployment <deploy/helm/README.md>
GKE Setup Guide <guides/dynamo_deploy/gke_setup.md>
Minikube Setup Guide <guides/dynamo_deploy/minikube.md>
Model Caching with Fluid <guides/dynamo_deploy/model_caching_with_fluid.md>

.. toctree::
:hidden:
:caption: Benchmarking

Planner Benchmark Example <guides/planner_benchmark/README.md>


.. toctree::
:hidden:
:caption: API

NIXL Connect API <API/nixl_connect/README.md>

.. toctree::
:hidden:
:caption: Examples
@@ -136,13 +115,14 @@ The examples below assume you build the latest image yourself from source. If us
LLM Deployment Examples using VLLM <components/backends/vllm/README.md>
Multinode Examples using SGLang <components/backends/sglang/docs/multinode-examples.md>
LLM Deployment Examples using TensorRT-LLM <components/backends/trtllm/README.md>
Planner Benchmark Example <guides/planner_benchmark/README.md>

.. toctree::
:hidden:
:caption: Reference


Glossary <dynamo_glossary.md>
NIXL Connect API <API/nixl_connect/README.md>
KVBM Reading <architecture/kvbm_reading.md>

