Merged
15 changes: 2 additions & 13 deletions docs/backends/vllm/multi-node.md
@@ -73,7 +73,7 @@ python -m dynamo.vllm \

Deploy prefill and decode workers on separate nodes for optimized resource utilization:

-**Node 1**: Run ingress and prefill workers
+**Node 1**: Run ingress and decode worker
```bash
# Start ingress
python -m dynamo.frontend --router-mode kv &
@@ -85,7 +85,7 @@ python -m dynamo.vllm \
--enforce-eager
```

-**Node 2**: Run decode workers
+**Node 2**: Run prefill worker
```bash
# Start decode worker
python -m dynamo.vllm \
@@ -94,14 +94,3 @@ python -m dynamo.vllm \
--enforce-eager \
--is-prefill-worker
```

-## Large Model Deployment
-
-For models requiring more GPUs than available on a single node such as tensor-parallel-size 16:
-
-**Node 1**: First part of tensor-parallel model
-```bash
-# Start ingress
-python -m dynamo.frontend --router-mode kv &
-```
