From 300fc711a172d9133b320602b491295028b111be Mon Sep 17 00:00:00 2001
From: Ryan McCormick
Date: Mon, 13 Oct 2025 12:01:16 -0700
Subject: [PATCH] docs: Fix typos and remove incomplete section from vllm multinode doc

Signed-off-by: Ryan McCormick
---
 docs/backends/vllm/multi-node.md | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/docs/backends/vllm/multi-node.md b/docs/backends/vllm/multi-node.md
index 6ffbf969d2..1c614fe6ce 100644
--- a/docs/backends/vllm/multi-node.md
+++ b/docs/backends/vllm/multi-node.md
@@ -73,7 +73,7 @@ python -m dynamo.vllm \
 
 Deploy prefill and decode workers on separate nodes for optimized resource utilization:
 
-**Node 1**: Run ingress and prefill workers
+**Node 1**: Run ingress and decode worker
 ```bash
 # Start ingress
 python -m dynamo.frontend --router-mode kv &
@@ -85,7 +85,7 @@ python -m dynamo.vllm \
   --enforce-eager
 ```
 
-**Node 2**: Run decode workers
+**Node 2**: Run prefill worker
 ```bash
 # Start decode worker
 python -m dynamo.vllm \
@@ -94,14 +94,3 @@ python -m dynamo.vllm \
   --enforce-eager \
   --is-prefill-worker
 ```
-
-## Large Model Deployment
-
-For models requiring more GPUs than available on a single node such as tensor-parallel-size 16:
-
-**Node 1**: First part of tensor-parallel model
-```bash
-# Start ingress
-python -m dynamo.frontend --router-mode kv &
-```
-