Merged
8 changes: 8 additions & 0 deletions .agents/configs/model-routing-table.json
@@ -3,6 +3,7 @@
"_docs": "See .agents/tools/context/model-routing.md for routing rules, .agents/tools/ai-assistants/fallback-chains.md for integration.",

"tiers": {
"local": { "models": ["local/llama.cpp"], "fallback": "haiku", "cost": 0 },
"haiku": { "models": ["anthropic/claude-haiku-4-5"] },
"flash": { "models": ["anthropic/claude-haiku-4-5"] },
"sonnet": { "models": ["anthropic/claude-sonnet-4-6"] },
@@ -14,6 +15,13 @@
},

"providers": {
"local": {
"endpoint": "http://localhost:8080/v1/chat/completions",
"key_env": null,
"probe_path": "/v1/models",
"probe_timeout_seconds": 3,
"_comment": "llama.cpp or compatible OpenAI-API server. No API key needed. Use local-model-helper.sh status to check."
},
"anthropic": {
"endpoint": "https://api.anthropic.com/v1/messages",
"key_env": "ANTHROPIC_API_KEY",
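The `probe_path` and `probe_timeout_seconds` fields added above are enough to write a health check for the local provider. A minimal sketch, assuming an OpenAI-compatible server at the configured endpoint; `probe_local` is a hypothetical helper name, not code from this PR:

```python
import urllib.error
import urllib.request


def probe_local(base_url="http://localhost:8080", probe_path="/v1/models",
                timeout=3.0):
    """Return True if an OpenAI-compatible server answers the probe endpoint."""
    try:
        with urllib.request.urlopen(base_url + probe_path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not running, port closed, or probe timed out.
        return False
```

When the probe fails, a router would apply the `"fallback": "haiku"` entry from the tiers table, or fail closed for privacy-constrained tasks as described in the routing doc below.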
12 changes: 6 additions & 6 deletions .agents/tools/context/model-routing.md
@@ -22,7 +22,7 @@ model: haiku
- **Purpose**: Route tasks to the cheapest model that can handle them well
- **Philosophy**: Use the smallest model that produces acceptable quality
- **Default**: sonnet (best balance of cost/capability for most tasks)
- - **Cost spectrum**: local (free) -> flash -> haiku -> sonnet -> pro -> opus (highest)
+ - **Cost spectrum**: local (free) -> haiku -> flash -> sonnet -> pro -> opus (highest)

## Model Tiers

@@ -51,7 +51,7 @@ model: haiku
**Fallback behaviour**: If a local model is not running or not installed, the routing depends on why `local` was selected:

- **Privacy/on-device requirement**: FAIL — do not route to cloud. Return an error instructing the user to start the local server or pass `--allow-cloud` to explicitly override.
- - **Cost optimisation or experimentation**: Fall back to `flash` (cheapest cloud tier by blended cost).
+ - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain). Local has no same-tier fallback — it skips directly to the cheapest cloud tier.

medium

For better conciseness and consistency with other fallback descriptions, consider integrating the full explanation of the local tier's fallback behavior into the parenthetical note.

Suggested change
- **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain). Local has no same-tier fallback — it skips directly to the cheapest cloud tier.
- **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain — local has no same-tier fallback).


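The two fallback branches above can be sketched as a small resolver. A hedged sketch only; `resolve_local_fallback`, the `reason` values, and `allow_cloud` are hypothetical names, not code from this repository:

```python
def resolve_local_fallback(reason, allow_cloud=False):
    """Decide what happens when `local` was selected but no server is available.

    reason is why the local tier was chosen: "privacy" or "cost".
    """
    if reason == "privacy" and not allow_cloud:
        # Fail closed: privacy/on-device work must never route to the cloud.
        raise RuntimeError(
            "local server unavailable: start it or pass --allow-cloud")
    # Cost optimisation / experimentation: haiku is the next tier in the chain.
    return "haiku"
```

Under this sketch, a cost-motivated request silently degrades to `haiku`, while a privacy-motivated one surfaces an error unless the override flag was given.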
### Use `flash` when:

@@ -106,7 +106,7 @@ tools:

Valid values: `local`, `haiku`, `flash`, `sonnet`, `pro`, `opus`

- > **Note**: The `local` tier requires `local-model-helper.sh` to be set up and a model server running. If no local server is available, `local` in frontmatter will fall back to `haiku`. See `tools/local-models/local-models.md` for setup.
+ > **Note**: The `local` tier requires `local-model-helper.sh` to be set up and a model server running. If no local server is available, `local` in frontmatter falls back to `haiku` (next tier in the routing chain — local has no same-tier fallback). See `tools/local-models/local-models.md` for setup.

When `model:` is absent, `sonnet` is assumed (the default tier).
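For example, an agent file that opts into the local tier would carry frontmatter like the following (illustrative fragment; only the `model:` key is documented above):

```yaml
---
model: local   # falls back to haiku when no local server is running
---
```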

Expand All @@ -131,7 +131,7 @@ Concrete model subagents are defined across these paths (`tools/ai-assistants/mo

| Tier | Subagent | Primary Model | Fallback |
|------|----------|---------------|----------|
- | `local` | `tools/local-models/local-models.md` | llama.cpp (user GGUF) | FAIL (privacy) or flash (cost) |
+ | `local` | `tools/local-models/local-models.md` | llama.cpp (user GGUF) | FAIL (privacy) or haiku (cost) |
| `flash` | `models/flash.md` | gemini-2.5-flash | gpt-4.1-mini |
| `haiku` | `models/haiku.md` | claude-haiku-4-5 | gemini-2.5-flash |
| `sonnet` | `models/sonnet.md` | claude-sonnet-4 | gpt-4.1 |
@@ -182,7 +182,7 @@ Each tier defines a primary model and a fallback from a different provider. When

| Tier | Primary | Fallback | When to Fallback |
|------|---------|----------|------------------|
- | `local` | llama.cpp (localhost) | flash (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to flash only for cost-optimisation use cases. |
+ | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in chain) for cost-optimisation use cases. No same-tier fallback exists — local skips directly to cloud. |

medium

To improve conciseness and consistency with other fallback explanations, integrate the full fallback description into the parenthetical note. This avoids repeating the 'no same-tier fallback' concept in a separate sentence.

Suggested change
| `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in chain) for cost-optimisation use cases. No same-tier fallback exists — local skips directly to cloud. |
| `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in the routing chain — local has no same-tier fallback) for cost-optimisation use cases. |

| `flash` | gemini-2.5-flash | gpt-4.1-mini | No Google key |
| `haiku` | claude-haiku-4-5 | gemini-2.5-flash | No Anthropic key |
| `sonnet` | claude-sonnet-4 | gpt-4.1 | No Anthropic key |
@@ -253,7 +253,7 @@ Is the task privacy/on-device constrained?
→ NO: Is the task bulk/offline where local saves cost?
→ YES: Is a local model running and capable enough?
→ YES: local
- → NO: flash (cheapest cloud fallback)
+ → NO: haiku (next tier in chain — local has no same-tier fallback)

medium

For consistency with other descriptions of the local tier fallback in this document, consider using the full phrase "next tier in the routing chain".

Suggested change
→ NO: haiku (next tier in chain — local has no same-tier fallback)
→ NO: haiku (next tier in the routing chain — local has no same-tier fallback)

→ NO: Is the task simple classification/formatting?
→ YES: haiku
→ NO: Does it need >50K tokens of context?
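The branches visible in the decision tree above can be sketched as a function. A sketch only: `choose_tier` and its boolean parameters are hypothetical names, and the branches cut off by the hunk simply fall through to the documented default tier:

```python
def choose_tier(bulk_offline, local_ready, simple_task):
    """Walk the non-privacy branches shown in the tree; deeper branches are
    truncated in this hunk."""
    if bulk_offline:
        if local_ready:
            return "local"   # local model running and capable enough
        return "haiku"       # next tier in the routing chain
    if simple_task:
        return "haiku"       # simple classification/formatting
    # Branches past "Does it need >50K tokens of context?" are cut off in the
    # diff above; fall through to the document-wide default.
    return "sonnet"
```

The privacy-constrained branch is also outside this hunk; per the fallback table earlier in the doc, it would return `local` or fail closed rather than fall through here.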