t1338.1: Align local tier fallback to haiku and add local to routing table#2385
marcusquinn merged 1 commit into main
Conversation
…le (t1338.1)

- Fix fallback chain: local → haiku (not flash), matching AGENTS.md ordering local→haiku→flash→sonnet→pro→opus. Local has no same-tier fallback; it skips directly to the cheapest cloud tier (haiku) for cost-optimisation cases, or FAILs closed for privacy/on-device requirements.
- Add local tier to model-routing-table.json with localhost provider entry (llama.cpp OpenAI-compatible API on port 8080, no API key needed).
- Update all 6 references in model-routing.md: cost spectrum, routing rules, frontmatter note, model-specific subagents table, fallback routing table, and decision flowchart.

Ref #2320
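Based on this description, the new entries in `model-routing-table.json` would look roughly like the fragment below. Only `fallback: "haiku"`, `cost: 0`, the llama.cpp OpenAI-compatible endpoint on port 8080, and the no-API-key requirement are stated in the PR; the surrounding key names are illustrative, since the actual schema is not shown here.

```json
{
  "tiers": {
    "local": {
      "provider": "local",
      "fallback": "haiku",
      "cost": 0
    }
  },
  "providers": {
    "local": {
      "base_url": "http://localhost:8080/v1",
      "api": "openai-compatible",
      "api_key_required": false
    }
  }
}
```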
Summary of Changes

Hello @alex-solovyev, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request refines the model routing logic by integrating a dedicated "local" tier and standardizing its fallback behavior. The changes ensure that local model usage is properly configured and that its fallback to cloud tiers (specifically "haiku") is consistent with the overall routing strategy, improving both cost optimization and adherence to privacy requirements.
Activity
No actionable comments were generated in the recent review.
Walkthrough

This PR introduces support for local model inference by adding a new "local" tier and provider configuration to the model routing system. The local tier uses a localhost:8080 OpenAI-compatible endpoint (llama.cpp) with zero cost, falling back to the "haiku" tier when unavailable. Documentation is updated to reflect the revised fallback chain and routing behavior.
Estimated Code Review Effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Thu Feb 26 18:23:42 UTC 2026 by AI DevOps Framework Code Review Monitoring
Code Review
This pull request aligns the local tier fallback to `haiku` and adds the `local` tier to the model routing table. The changes in .agents/configs/model-routing-table.json correctly define the `local` tier and provider, and the updates in .agents/tools/context/model-routing.md consistently reflect the new fallback chain. The remaining comments suggest minor documentation improvements for consistency and conciseness in the descriptions of the local tier's fallback behavior.
```diff
  - **Privacy/on-device requirement**: FAIL — do not route to cloud. Return an error instructing the user to start the local server or pass `--allow-cloud` to explicitly override.
- - **Cost optimisation or experimentation**: Fall back to `flash` (cheapest cloud tier by blended cost).
+ - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain). Local has no same-tier fallback — it skips directly to the cheapest cloud tier.
```
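The "server not running" condition behind this fallback can be sketched as a plain HTTP probe of the llama.cpp endpoint. This is a hypothetical helper for illustration; the framework's actual detection logic is not shown in this PR, and `/v1/models` is simply a lightweight route on the OpenAI-compatible API surface.

```python
import urllib.request


def local_server_available(base_url: str = "http://localhost:8080",
                           timeout: float = 1.0) -> bool:
    """Probe the llama.cpp OpenAI-compatible endpoint for availability."""
    try:
        # A 200 from /v1/models means the server is up and serving the API.
        with urllib.request.urlopen(f"{base_url}/v1/models",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused or timeout: treat the local tier as unavailable,
        # which triggers the haiku fallback (or FAIL for privacy tasks).
        return False
```

A router would call this before selecting the `local` tier; no API key is sent, matching the provider entry described above.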
For better conciseness and consistency with other fallback descriptions, consider integrating the full explanation of the local tier's fallback behavior into the parenthetical note.
```diff
- - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain). Local has no same-tier fallback — it skips directly to the cheapest cloud tier.
+ - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain — local has no same-tier fallback).
```
```diff
  | Tier | Primary | Fallback | When to Fallback |
  |------|---------|----------|------------------|
- | `local` | llama.cpp (localhost) | flash (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to flash only for cost-optimisation use cases. |
+ | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in chain) for cost-optimisation use cases. No same-tier fallback exists — local skips directly to cloud. |
```
To improve conciseness and consistency with other fallback explanations, integrate the full fallback description into the parenthetical note. This avoids repeating the 'no same-tier fallback' concept in a separate sentence.
```diff
- | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in chain) for cost-optimisation use cases. No same-tier fallback exists — local skips directly to cloud. |
+ | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in the routing chain — local has no same-tier fallback) for cost-optimisation use cases. |
```
```diff
  → YES: Is a local model running and capable enough?
    → YES: local
-   → NO: flash (cheapest cloud fallback)
+   → NO: haiku (next tier in chain — local has no same-tier fallback)
```
For consistency with other descriptions of the local tier fallback in this document, consider using the full phrase "next tier in the routing chain".
```diff
-   → NO: haiku (next tier in chain — local has no same-tier fallback)
+   → NO: haiku (next tier in the routing chain — local has no same-tier fallback)
```
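The fallback behavior debated above can be summarized in a small sketch. The `CHAIN` ordering comes from AGENTS.md as quoted in this PR; the function itself is a hypothetical helper, not code from the framework.

```python
# Routing chain as defined in AGENTS.md (per this PR's description).
CHAIN = ["local", "haiku", "flash", "sonnet", "pro", "opus"]


def resolve_fallback(tier: str, privacy_required: bool = False) -> str:
    """Return the tier to route to when `tier` is unavailable."""
    if tier == "local" and privacy_required:
        # Privacy/on-device tasks fail closed rather than leaking to cloud.
        raise RuntimeError("local model unavailable; refusing cloud fallback "
                           "(pass --allow-cloud to explicitly override)")
    i = CHAIN.index(tier)
    if i + 1 >= len(CHAIN):
        raise RuntimeError(f"no fallback beyond {tier!r}")
    # local has no same-tier fallback: it skips straight to haiku,
    # the cheapest cloud tier and the next entry in the chain.
    return CHAIN[i + 1]
```

Under this sketch, `resolve_fallback("local")` yields `"haiku"`, matching the corrected flowchart, while a privacy-constrained request raises instead of routing to cloud.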



Summary
The local tier fallback was documented as `flash`, but AGENTS.md defines the routing chain as `local→haiku→flash→sonnet→pro→opus`.

- Updated all 6 references in `model-routing.md` to fall back to `haiku` instead of `flash`.
- Added a `local` tier entry with `fallback: "haiku"` and `cost: 0`, plus a `local` provider entry for llama.cpp's OpenAI-compatible API (localhost:8080, no API key).

Changes

- `.agents/tools/context/model-routing.md`: `flash` → `haiku` for local tier fallback
- `.agents/configs/model-routing-table.json`: `local` tier and `local` provider entries

Verification

- No `flash` fallback references for the local tier remain
- Fallback chain matches AGENTS.md: `local→haiku→flash→sonnet→pro→opus`

Ref #2320