Commit 4a045d0 (parent b246cde)

Update README.md to remove trailing whitespace and enhance clarity in deployment instructions
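For context, the cleanup this commit describes can be reproduced with a one-line sweep. This is a sketch of the general technique, not necessarily how the commit was produced (GNU `sed` syntax; BSD/macOS `sed` needs `-i ''`):

```bash
# Strip trailing spaces and tabs from every line of the README, in place.
sed -i 's/[[:space:]]*$//' "examples/deployments/LLM Router/README.md"
```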

1 file changed (+17 −17 lines):


examples/deployments/LLM Router/README.md

Lines changed: 17 additions & 17 deletions
````diff
@@ -46,33 +46,33 @@ graph TB
     subgraph "Ingress Layer"
         LB[Load Balancer/Ingress]
     end
-
+
     subgraph "LLM Router (Helm)"
         RC[Router Controller]
         RS[Router Server + GPU]
     end
-
+
     subgraph "Dynamo Platform - Shared Frontend Architecture"
         FE[Shared Frontend Service]
         PR[Processor]
-
+
         subgraph "Model 1 Workers"
             VW1[VllmDecodeWorker-8B + GPU]
             PW1[VllmPrefillWorker-8B + GPU]
         end
-
+
         subgraph "Model 2 Workers"
             VW2[VllmDecodeWorker-70B + GPU]
             PW2[VllmPrefillWorker-70B + GPU]
         end
-
+
         subgraph "Model 3 Workers"
             VW3[VllmDecodeWorker-Mixtral + GPU]
             PW3[VllmPrefillWorker-Mixtral + GPU]
         end
     end
 end
-
+
     LB --> RC
     RC --> RS
     RS --> FE
@@ -83,7 +83,7 @@ graph TB
     PR --> PW1
     PR --> PW2
     PR --> PW3
-
+
     style LB fill:#e1f5fe
     style RC fill:#f3e5f5
     style RS fill:#f3e5f5
@@ -331,7 +331,7 @@ The deployment now uses a **shared frontend architecture** that splits the origi
 
 **Key Advantages:**
 - **Resource Efficiency**: Single frontend serves all models, reducing CPU and memory overhead
-- **Simplified Operations**: One service to monitor, scale, and maintain instead of multiple frontends
+- **Simplified Operations**: One service to monitor, scale, and maintain instead of multiple frontends
 - **Better Load Distribution**: Intelligent request routing across all available model workers
 - **Cost Optimization**: Fewer running services means lower infrastructure costs
 - **Unified API Gateway**: Single endpoint for all models with consistent OpenAI API interface
@@ -641,7 +641,7 @@ graph LR
     C --> D[Setup Router]
     D --> E[Configure Access]
     E --> F[Test Integration]
-
+
     style A fill:#e3f2fd
     style B fill:#f3e5f5
     style C fill:#e8f5e8
@@ -747,7 +747,7 @@ envsubst < disagg.yaml | kubectl apply -f - -n ${NAMESPACE}
 
 **Current Setup**: We deploy 3 models that cover most use cases:
 - **Llama-3.1-8B**: Fast model for simple tasks
-- **Llama-3.1-70B**: Powerful model for complex tasks
+- **Llama-3.1-70B**: Powerful model for complex tasks
 - **Mixtral-8x22B**: Creative model for conversational tasks
 
 **To add more models**, follow this pattern:
@@ -762,7 +762,7 @@ export MODEL_SUFFIX=phi-3-mini
 # Deploy using aggregated workers
 envsubst < agg.yaml | kubectl apply -f - -n ${NAMESPACE}
 
-# OR deploy using disaggregated workers
+# OR deploy using disaggregated workers
 envsubst < disagg.yaml | kubectl apply -f - -n ${NAMESPACE}
 ```
 
@@ -894,7 +894,7 @@ nano ../examples/deployments/LLM\ Router/llm-router-values-override.yaml
 # Update line ~34: Replace "YOUR_REGISTRY_HERE/" with your actual registry:
 # Examples:
 # - "nvcr.io/nvidia/" (if you have access to NVIDIA's public registry)
-# - "your-company-registry.com/llm-router/" (for private registries)
+# - "your-company-registry.com/llm-router/" (for private registries)
 # - "docker.io/your-username/" (for Docker Hub)
 
 # Also update imagePullSecrets name to match your registry credentials
@@ -1050,11 +1050,11 @@ The LLM Router controller:
 
 ### Router Configuration
 
-The `router-config-dynamo.yaml` configures routing policies to our deployed models.
+The `router-config-dynamo.yaml` configures routing policies to our deployed models.
 
 **Current Setup**: The configuration routes to different models based on task complexity and type:
 - `meta-llama/Llama-3.1-8B-Instruct` - Fast model for simple tasks (8B parameters)
-- `meta-llama/Llama-3.1-70B-Instruct` - Powerful model for complex tasks (70B parameters)
+- `meta-llama/Llama-3.1-70B-Instruct` - Powerful model for complex tasks (70B parameters)
 - `mistralai/Mixtral-8x22B-Instruct-v0.1` - Creative model for conversational tasks (8x22B parameters)
 
 **Note**: This guide shows the full 3-model production setup. For testing/development, you can start with fewer models (e.g., just Llama-8B + Qwen-0.6B) and add more as needed. The router will work with any subset of the configured models.
@@ -1101,7 +1101,7 @@ curl -X POST http://localhost:8084/v1/chat/completions \
   -d '{
     "messages": [
       {
-        "role": "user",
+        "role": "user",
         "content": "Write a Python function to calculate fibonacci numbers"
       }
     ],
@@ -1119,7 +1119,7 @@ curl -X POST http://localhost:8084/v1/chat/completions \
   -d '{
     "messages": [
      {
-        "role": "user",
+        "role": "user",
         "content": "Explain quantum computing in simple terms"
      }
    ],
@@ -1220,4 +1220,4 @@ docker compose -f deploy/metrics/docker-compose.yml down
 - [LLM Router GitHub Repository](https://github.com/NVIDIA-AI-Blueprints/llm-router)
 - [LLM Router Helm Chart](https://github.com/NVIDIA-AI-Blueprints/llm-router/tree/main/deploy/helm/llm-router)
 - [Kubernetes Documentation](https://kubernetes.io/docs/)
-- [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/overview.html)
+- [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/overview.html)
````
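To double-check the result of a whitespace-only commit like this one, something along these lines works (a sketch; `git diff --check` is standard Git, and the commit hashes and file path are taken from this page):

```bash
# Flag whitespace errors (e.g., trailing spaces) on lines added between the
# parent commit and this commit; no output means the cleanup is complete.
git diff --check b246cde 4a045d0 -- "examples/deployments/LLM Router/README.md"

# Independently scan the file at this commit for any remaining trailing whitespace.
git show "4a045d0:examples/deployments/LLM Router/README.md" \
  | grep -nE '[[:space:]]+$' || echo "no trailing whitespace"
```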
