# Also update imagePullSecrets name to match your registry credentials
### Router Configuration

The `router-config-dynamo.yaml` defines the routing policies for our deployed models.

**Current Setup**: The configuration routes to different models based on task complexity and type:

- `meta-llama/Llama-3.1-8B-Instruct` - Fast model for simple tasks (8B parameters)
- `meta-llama/Llama-3.1-70B-Instruct` - Powerful model for complex tasks (70B parameters)
- `mistralai/Mixtral-8x22B-Instruct-v0.1` - Creative model for conversational tasks (8x22B parameters)

**Note**: This guide shows the full 3-model production setup. For testing/development, you can start with fewer models (e.g., just Llama-8B + Qwen-0.6B) and add more as needed. The router will work with any subset of the configured models.
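As a sketch, a minimal `router-config-dynamo.yaml` for this setup might map task categories to the models above. The exact schema depends on your LLM Router release; the `policies`/`routes` field names below are illustrative, not authoritative:

```yaml
# Illustrative only -- consult your LLM Router release for the exact schema.
policies:
  - name: task_router
    routes:
      - category: simple          # quick Q&A, short completions
        model: meta-llama/Llama-3.1-8B-Instruct
      - category: complex         # reasoning, code, analysis
        model: meta-llama/Llama-3.1-70B-Instruct
      - category: conversational  # open-ended chat, creative writing
        model: mistralai/Mixtral-8x22B-Instruct-v0.1
```

If you start with a smaller model subset, delete the routes whose models are not deployed; the remaining routes continue to work.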
```bash
curl -X POST http://localhost:8084/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function to calculate fibonacci numbers"
      }
    ]
  }'
```
```bash
curl -X POST http://localhost:8084/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ]
  }'
```
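Because the router exposes an OpenAI-compatible `/v1/chat/completions` endpoint, the same requests can be issued from Python with only the standard library. This is a minimal sketch; the URL and port are taken from the curl examples above, and the helper names are our own:

```python
import json
import urllib.request

# Router endpoint from this guide's curl examples.
ROUTER_URL = "http://localhost:8084/v1/chat/completions"

def build_chat_payload(prompt: str) -> dict:
    """Build a chat-completions request body for the router."""
    return {"messages": [{"role": "user", "content": prompt}]}

def ask_router(prompt: str) -> dict:
    """POST the prompt to the router and return the parsed JSON response."""
    req = urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (with the router from this guide running on localhost:8084):
#   reply = ask_router("Explain quantum computing in simple terms")
#   print(reply["choices"][0]["message"]["content"])
```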
```bash
docker compose -f deploy/metrics/docker-compose.yml down
```