
t1338.1: Align local tier fallback to haiku and add local to routing table #2385

Merged
marcusquinn merged 1 commit into main from feature/t1338.1 on Feb 26, 2026
Conversation

@alex-solovyev (Collaborator) commented Feb 26, 2026

Summary

  • Fix fallback chain inconsistency: Local tier previously fell back to flash, but AGENTS.md defines the routing chain as local→haiku→flash→sonnet→pro→opus. Updated all 6 references in model-routing.md to fall back to haiku instead of flash.
  • Add local tier to model-routing-table.json: Added local tier entry with fallback: "haiku" and cost: 0, plus a local provider entry for llama.cpp's OpenAI-compatible API (localhost:8080, no API key).
  • Consistent semantics: Local has no same-tier fallback (no alternative local provider). For privacy/on-device tasks it FAILs closed; for cost-optimisation it skips directly to haiku (cheapest cloud tier in the chain).

Changes

| File | What changed |
|------|--------------|
| .agents/tools/context/model-routing.md | Updated cost spectrum, routing rules fallback, frontmatter note, model-specific subagents table, fallback routing table, and decision flowchart — all flash→haiku for local tier fallback |
| .agents/configs/model-routing-table.json | Added local tier and local provider entries |

Verification

  • JSON validated (python3 json.load)
  • All 6 local-fallback references in model-routing.md confirmed consistent (haiku, not flash)
  • No stale flash fallback references for local tier remain
  • Cost spectrum matches AGENTS.md ordering: local→haiku→flash→sonnet→pro→opus
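The `python3 json.load` check above can be scripted. The snippet below is a sketch that validates a sample shaped like the new entries described in this PR; the field names beyond those stated in the summary (models, fallback, cost, endpoint, null API key) are illustrative assumptions, not the verbatim contents of `.agents/configs/model-routing-table.json`.

```python
import json

# A sample shaped like the new entries described in this PR; the overall
# structure is an assumption, not the verbatim routing-table file.
sample = """
{
  "tiers": {
    "local": {"models": ["local/llama.cpp"], "fallback": "haiku", "cost": 0}
  },
  "providers": {
    "local": {"endpoint": "http://localhost:8080/v1/chat/completions", "api_key": null}
  }
}
"""

# json.loads raises json.JSONDecodeError on invalid input, mirroring the
# `python3 json.load` validation step noted above.
table = json.loads(sample)
assert table["tiers"]["local"]["fallback"] == "haiku"
assert table["tiers"]["local"]["cost"] == 0
assert table["providers"]["local"]["api_key"] is None
print("model-routing-table sample is valid JSON")
```

Running the same `json.load` against the real file catches both syntax errors and a missing `local` tier before the config ships.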

Ref #2320

Summary by CodeRabbit

  • New Features
    • Added support for local AI model server integration, enabling users to run models locally with automatic fallback to cloud-based services when unavailable.

…le (t1338.1)

- Fix fallback chain: local → haiku (not flash), matching AGENTS.md ordering
  local→haiku→flash→sonnet→pro→opus. Local has no same-tier fallback;
  it skips directly to the cheapest cloud tier (haiku) for cost-optimisation
  cases, or FAILs closed for privacy/on-device requirements.
- Add local tier to model-routing-table.json with localhost provider entry
  (llama.cpp OpenAI-compatible API on port 8080, no API key needed).
- Update all 6 references in model-routing.md: cost spectrum, routing rules,
  frontmatter note, model-specific subagents table, fallback routing table,
  and decision flowchart.

Ref #2320
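The fallback semantics in the commit message above can be sketched in Python. `local_server_up` and `resolve_tier` are hypothetical helper names, and the `/v1/models` probe URL is an assumption (llama.cpp's server is OpenAI-compatible, but the PR does not specify probe details).

```python
import urllib.request
import urllib.error

# Routing chain from AGENTS.md: local→haiku→flash→sonnet→pro→opus
FALLBACK_CHAIN = ["local", "haiku", "flash", "sonnet", "pro", "opus"]

def local_server_up(url="http://localhost:8080/v1/models", timeout=1.0):
    """Probe the local llama.cpp server (probe URL is an assumption)."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except (urllib.error.URLError, OSError):
        return False

def resolve_tier(requested, privacy_required=False, server_up=None):
    """Resolve the 'local' tier per this PR's semantics: fail closed for
    privacy/on-device tasks, otherwise skip straight to haiku, since
    local has no same-tier fallback."""
    if requested != "local":
        return requested
    if server_up is None:
        server_up = local_server_up()
    if server_up:
        return "local"
    if privacy_required:
        raise RuntimeError("local server unavailable; failing closed "
                           "(privacy/on-device requirement)")
    return "haiku"  # next tier in the chain; cheapest cloud tier
```

Note that the resolver never tries another local provider: an unavailable local server either fails closed or jumps directly to the cheapest cloud tier.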
@gemini-code-assist

Summary of Changes

Hello @alex-solovyev, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the model routing logic by integrating a dedicated "local" tier and standardizing its fallback behavior. The changes ensure that local model usage is properly configured and that its fallback to cloud tiers (specifically "haiku") is consistent with the overall routing strategy, improving both cost optimization and adherence to privacy requirements.

Highlights

  • Local Tier Fallback Alignment: The local tier's fallback mechanism has been updated to consistently route to "haiku" instead of "flash", aligning with the defined "local→haiku→flash→sonnet→pro→opus" routing chain.
  • Local Tier Configuration: A new "local" tier entry was added to "model-routing-table.json" with "fallback: "haiku"" and "cost: 0", along with a "local" provider entry for llama.cpp's OpenAI-compatible API.
  • Routing Semantics Clarification: Clarified that the "local" tier has no same-tier fallback, directly skipping to "haiku" for cost optimization, while failing closed for privacy-sensitive tasks.
Changelog
  • .agents/configs/model-routing-table.json
    • Added the "local" tier definition.
    • Included the "local" provider configuration for llama.cpp.
  • .agents/tools/context/model-routing.md
    • Updated the cost spectrum to reflect "local -> haiku -> flash".
    • Modified fallback behavior descriptions for the "local" tier.
    • Adjusted notes and tables to show "haiku" as the fallback for "local".
    • Updated the decision flowchart for "local" tier fallback.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


coderabbitai bot commented Feb 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b86e0a0 and 27ed700.

📒 Files selected for processing (2)
  • .agents/configs/model-routing-table.json
  • .agents/tools/context/model-routing.md

Walkthrough

This PR introduces support for local model inference by adding a new "local" tier and provider configuration to the model routing system. The local tier uses a localhost:8080 OpenAI-compatible endpoint (llama.cpp) with zero cost, falling back to the "haiku" tier when unavailable. Documentation is updated to reflect the revised fallback chain and routing behavior.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| Model Routing Configuration: .agents/configs/model-routing-table.json | Adds new "local" tier (models: ["local/llama.cpp"], fallback: "haiku", cost: 0) and "local" provider (endpoint: http://localhost:8080/v1/chat/completions, null API key, probe settings included). |
| Model Routing Documentation: .agents/tools/context/model-routing.md | Updates cost spectrum ordering and fallback behavior to reflect haiku as the primary fallback for the local tier (replacing flash); clarifies that local skips directly to cloud with no same-tier fallback; adjusts routing chain descriptions throughout. |

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

Local models now run offline with pride, 🚀
No API keys needed, just localhost inside,
When local stumbles, haiku takes the call,
A routing chain to handle them all. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately and specifically describes the main changes: aligning the local tier fallback from flash to haiku and adding the local tier to the routing configuration. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@github-actions
Contributor

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 89 code smells

[INFO] Recent monitoring activity:
Thu Feb 26 18:23:38 UTC 2026: Code review monitoring started
Thu Feb 26 18:23:39 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 89

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 89
  • VULNERABILITIES: 0

Generated on: Thu Feb 26 18:23:42 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request successfully aligns the local tier fallback to "haiku" and integrates the "local" tier into the model routing table. The changes in .agents/configs/model-routing-table.json correctly define the "local" tier and provider. The updates in .agents/tools/context/model-routing.md consistently reflect the new fallback chain. The review comments suggest minor improvements for documentation consistency and conciseness in the explanations of the local tier's fallback behavior, and all comments have been retained as they align with good documentation practices and do not contradict any provided rules.


```diff
  - **Privacy/on-device requirement**: FAIL — do not route to cloud. Return an error instructing the user to start the local server or pass `--allow-cloud` to explicitly override.
- - **Cost optimisation or experimentation**: Fall back to `flash` (cheapest cloud tier by blended cost).
+ - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain). Local has no same-tier fallback — it skips directly to the cheapest cloud tier.
```


Severity: medium

For better conciseness and consistency with other fallback descriptions, consider integrating the full explanation of the local tier's fallback behavior into the parenthetical note.

Suggested change:

```diff
- - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain). Local has no same-tier fallback — it skips directly to the cheapest cloud tier.
+ - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain — local has no same-tier fallback).
```

```diff
  | Tier | Primary | Fallback | When to Fallback |
  |------|---------|----------|------------------|
- | `local` | llama.cpp (localhost) | flash (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to flash only for cost-optimisation use cases. |
+ | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in chain) for cost-optimisation use cases. No same-tier fallback exists — local skips directly to cloud. |
```


Severity: medium

To improve conciseness and consistency with other fallback explanations, integrate the full fallback description into the parenthetical note. This avoids repeating the 'no same-tier fallback' concept in a separate sentence.

Suggested change:

```diff
- | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in chain) for cost-optimisation use cases. No same-tier fallback exists — local skips directly to cloud. |
+ | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in the routing chain — local has no same-tier fallback) for cost-optimisation use cases. |
```

```diff
  → YES: Is a local model running and capable enough?
  → YES: local
- → NO: flash (cheapest cloud fallback)
+ → NO: haiku (next tier in chain — local has no same-tier fallback)
```


Severity: medium

For consistency with other descriptions of the local tier fallback in this document, consider using the full phrase next tier in the routing chain.

Suggested change:

```diff
- → NO: haiku (next tier in chain — local has no same-tier fallback)
+ → NO: haiku (next tier in the routing chain — local has no same-tier fallback)
```
