
Commit 98708c4

docs: add image to front page readme (#1320)

Signed-off-by: Faradawn Yang <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Parent: 46b8c66

File tree

5 files changed: 19 additions, 2 deletions

README.md (19 additions, 2 deletions)
@@ -14,8 +14,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-
-# NVIDIA Dynamo
+![Dynamo banner](./docs/images/frontpage-banner.png)
 
 [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![GitHub Release](https://img.shields.io/github/v/release/ai-dynamo/dynamo)](https://github.com/ai-dynamo/dynamo/releases/latest)
@@ -25,8 +24,24 @@ limitations under the License.
 
 ### 📢 **Please join us for our** [ **first Dynamo in-person meetup with vLLM and SGLang leads**](https://events.nvidia.com/nvidiadynamousermeetups) **on 6/5 (Thu) in SF!** ###
 
+
+### The Era of Multi-Node, Multi-GPU
+
+![GPU Evolution](./docs/images/frontpage-gpu-evolution.png)
+
+
+Large language models are quickly outgrowing the memory and compute budget of any single GPU. Tensor-parallelism solves the capacity problem by spreading each layer across many GPUs—and sometimes many servers—but it creates a new one: how do you coordinate those shards, route requests, and share KV cache fast enough to feel like one accelerator? This orchestration gap is exactly what NVIDIA Dynamo is built to close.
+
+![Multi Node Multi-GPU topology](./docs/images/frontpage-gpu-vertical.png)
+
+
+
+### Introducing NVIDIA Dynamo
+
 NVIDIA Dynamo is a high-throughput low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments. Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLang or others) and captures LLM-specific capabilities such as:
 
+![Dynamo architecture](./docs/images/frontpage-architecture.png)
+
 - **Disaggregated prefill & decode inference** – Maximizes GPU throughput and facilitates trade off between throughput and latency.
 - **Dynamic GPU scheduling** – Optimizes performance based on fluctuating demand
 - **LLM-aware request routing** – Eliminates unnecessary KV cache re-computation
@@ -35,6 +50,8 @@ NVIDIA Dynamo is a high-throughput low-latency inference framework designed for
 
 Built in Rust for performance and in Python for extensibility, Dynamo is fully open-source and driven by a transparent, OSS (Open Source Software) first development approach.
 
+
+
 ### Installation
 
 The following examples require a few system level packages.
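The "Era of Multi-Node, Multi-GPU" paragraph in the diff describes tensor parallelism: one layer's weight matrix is split across devices, each device computes its slice of the output, and a collective (an all-gather) stitches the slices back together. A toy, pure-Python sketch of that idea (no GPUs or real collectives; `matmul` and `split_columns` are invented helpers, not Dynamo APIs):

```python
# Toy illustration of column-wise tensor parallelism: shard a weight
# matrix across hypothetical "devices", compute partial outputs, then
# concatenate them ("all-gather") to recover the full-layer result.

def matmul(x, W):
    """x: input vector (list); W: matrix as a list of rows -> output vector."""
    cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(W, shards):
    """Column-shard W across `shards` devices (assumes cols divisible by shards)."""
    per = len(W[0]) // shards
    return [[row[s * per:(s + 1) * per] for row in W] for s in range(shards)]

# A 2x4 weight matrix, sharded across 2 hypothetical devices
W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x = [1, 1]

partials = [matmul(x, Ws) for Ws in split_columns(W, 2)]  # each device's slice
y = [v for p in partials for v in p]                      # "all-gather" step

assert y == matmul(x, W)  # sharded result matches the single-device result
```

The capacity win is that no device ever materializes the full `W`; the cost, as the paragraph notes, is the coordination needed to make the shards behave like one accelerator.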
docs/images/frontpage-banner.png (1.65 MB)