### 📢 **Please join us for our [first Dynamo in-person meetup with vLLM and SGLang leads](https://events.nvidia.com/nvidiadynamousermeetups) on 6/5 (Thu) in SF!**
Large language models are quickly outgrowing the memory and compute budget of any single GPU. Tensor parallelism solves the capacity problem by spreading each layer across many GPUs, and sometimes many servers, but it creates a new one: how do you coordinate those shards, route requests, and share the KV cache fast enough to feel like one accelerator? This orchestration gap is exactly what NVIDIA Dynamo is built to close.
NVIDIA Dynamo is a high-throughput, low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments. Dynamo is inference-engine agnostic (it supports TRT-LLM, vLLM, SGLang, and others) and provides LLM-specific capabilities such as:

- **Disaggregated prefill & decode inference** – maximizes GPU throughput and lets you trade off throughput against latency
- **Dynamic GPU scheduling** – optimizes performance based on fluctuating demand
- **LLM-aware request routing** – eliminates unnecessary KV cache re-computation
- **Accelerated data transfer** – reduces inference response time using NIXL
- **KV cache offloading** – leverages memory hierarchies across GPUs, CPU, SSD, and object storage
Built in Rust for performance and in Python for extensibility, Dynamo is fully open source and driven by a transparent, OSS-first development approach.
### Installation
The following examples require a few system-level packages.
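As a sketch of what that setup can look like on Ubuntu (assuming apt-based package management and the `ai-dynamo` package on PyPI; adjust package names for other distributions):

```bash
# Install Python tooling and UCX (used by Dynamo for accelerated data transfer)
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -yq python3-dev python3-pip python3-venv libucx0

# Create an isolated virtual environment and install Dynamo with all optional extras
python3 -m venv venv
source venv/bin/activate
pip install "ai-dynamo[all]"
```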
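Once installed, a quick sanity check is to serve a model locally. The sketch below assumes the `dynamo` CLI entry point installed by the package above, vLLM as the backend engine, and an illustrative Hugging Face model identifier; the exact CLI syntax may differ across Dynamo releases:

```bash
# Start an interactive chat session against a locally served model,
# using vLLM as the inference engine (the model is downloaded from
# Hugging Face on first run)
dynamo run out=vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
```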