README.md (+4 −4)
@@ -30,7 +30,7 @@ High-throughput, low-latency inference framework designed for serving generative
 ## Latest News
-- [08/05] Deploy `openai/gpt-oss-120b` with disaggregated serving on NVIDIA Blackwell GPUs using Dynamo [➡️ link](./components/backends/trtllm/gpt-oss.md)
+- [08/05] Deploy `openai/gpt-oss-120b` with disaggregated serving on NVIDIA Blackwell GPUs using Dynamo [➡️ link](./docs/backends/trtllm/gpt-oss.md)
 ## The Era of Multi-GPU, Multi-Node
@@ -65,9 +65,9 @@ Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLang
 To learn more about each framework and their capabilities, check out each framework's README!
 Built in Rust for performance and in Python for extensibility, Dynamo is fully open-source and driven by a transparent, OSS (Open Source Software) first development approach.
components/README.md (+3 −3)
@@ -23,9 +23,9 @@ This directory contains the core components that make up the Dynamo inference framework
 Dynamo supports multiple inference engines (with a focus on SGLang, vLLM, and TensorRT-LLM), each with their own deployment configurations and capabilities:
-- **[vLLM](backends/vllm/README.md)** - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
-- **[SGLang](backends/sglang/README.md)** - Structured generation language framework with ZMQ-based communication
-- **[TensorRT-LLM](backends/trtllm/README.md)** - NVIDIA's optimized LLM inference engine with TensorRT acceleration
+- **[vLLM](/docs/backends/vllm/README.md)** - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
+- **[SGLang](/docs/backends/sglang/README.md)** - Structured generation language framework with ZMQ-based communication
+- **[TensorRT-LLM](/docs/backends/trtllm/README.md)** - NVIDIA's optimized LLM inference engine with TensorRT acceleration
 Each engine provides launch scripts for different deployment patterns in their respective `/launch` & `/deploy` directories.
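Whichever backend the launch scripts start, clients reach it through Dynamo's OpenAI-compatible frontend. A minimal client-side sketch is below; the endpoint address `http://localhost:8000` and the served model name `openai/gpt-oss-120b` are assumptions for illustration, not taken from this diff:

```python
# Minimal sketch: query a Dynamo deployment through its OpenAI-compatible API.
# Assumes a backend (vLLM, SGLang, or TensorRT-LLM) was started via its launch
# scripts and the frontend listens on localhost:8000 (assumed address).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed frontend address
    api_key="not-needed-locally",         # local deployments typically ignore this
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed model name, from the news item above
    messages=[{"role": "user", "content": "Say hello from Dynamo."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```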