
t1338.1: Align local tier fallback to haiku and add local to routing table #2385

Merged
marcusquinn merged 1 commit into main from feature/t1338.1 on Feb 26, 2026
Conversation

@alex-solovyev (Collaborator) commented Feb 26, 2026

Summary

  • Fix fallback chain inconsistency: Local tier previously fell back to flash, but AGENTS.md defines the routing chain as local→haiku→flash→sonnet→pro→opus. Updated all 6 references in model-routing.md to fall back to haiku instead of flash.
  • Add local tier to model-routing-table.json: Added local tier entry with fallback: "haiku" and cost: 0, plus a local provider entry for llama.cpp's OpenAI-compatible API (localhost:8080, no API key).
  • Consistent semantics: Local has no same-tier fallback (no alternative local provider). For privacy/on-device tasks it FAILs closed; for cost-optimisation it skips directly to haiku (cheapest cloud tier in the chain).

Changes

| File | What changed |
|------|--------------|
| .agents/tools/context/model-routing.md | Updated cost spectrum, routing rules fallback, frontmatter note, model-specific subagents table, fallback routing table, and decision flowchart — all flash→haiku for local tier fallback |
| .agents/configs/model-routing-table.json | Added local tier and local provider entries |

Verification

  • JSON validated (python3 json.load)
  • All 6 local-fallback references in model-routing.md confirmed consistent (haiku, not flash)
  • No stale flash fallback references for local tier remain
  • Cost spectrum matches AGENTS.md ordering: local→haiku→flash→sonnet→pro→opus
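The `python3 json.load` check above can be scripted. The snippet below is a sketch that validates a sample shaped like the new entries described in this PR; the field names beyond those stated in the summary (models, fallback, cost, endpoint, null API key) are illustrative assumptions, not the verbatim contents of `.agents/configs/model-routing-table.json`.

```python
import json

# A sample shaped like the new entries described in this PR; the overall
# structure is an assumption, not the verbatim routing-table file.
sample = """
{
  "tiers": {
    "local": {"models": ["local/llama.cpp"], "fallback": "haiku", "cost": 0}
  },
  "providers": {
    "local": {"endpoint": "http://localhost:8080/v1/chat/completions", "api_key": null}
  }
}
"""

# json.loads raises json.JSONDecodeError on invalid input, mirroring the
# `python3 json.load` validation step noted above.
table = json.loads(sample)
assert table["tiers"]["local"]["fallback"] == "haiku"
assert table["tiers"]["local"]["cost"] == 0
assert table["providers"]["local"]["api_key"] is None
print("model-routing-table sample is valid JSON")
```

Running the same `json.load` against the real file catches both syntax errors and a missing `local` tier before the config ships.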

Ref #2320

Summary by CodeRabbit

  • New Features
    • Added support for local AI model server integration, enabling users to run models locally with automatic fallback to cloud-based services when unavailable.

…le (t1338.1)

- Fix fallback chain: local → haiku (not flash), matching AGENTS.md ordering
  local→haiku→flash→sonnet→pro→opus. Local has no same-tier fallback;
  it skips directly to the cheapest cloud tier (haiku) for cost-optimisation
  cases, or FAILs closed for privacy/on-device requirements.
- Add local tier to model-routing-table.json with localhost provider entry
  (llama.cpp OpenAI-compatible API on port 8080, no API key needed).
- Update all 6 references in model-routing.md: cost spectrum, routing rules,
  frontmatter note, model-specific subagents table, fallback routing table,
  and decision flowchart.

Ref #2320
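The fallback semantics in the commit message above can be sketched in Python. `local_server_up` and `resolve_tier` are hypothetical helper names, and the `/v1/models` probe URL is an assumption (llama.cpp's server is OpenAI-compatible, but the PR does not specify probe details).

```python
import urllib.request
import urllib.error

# Routing chain from AGENTS.md: local→haiku→flash→sonnet→pro→opus
FALLBACK_CHAIN = ["local", "haiku", "flash", "sonnet", "pro", "opus"]

def local_server_up(url="http://localhost:8080/v1/models", timeout=1.0):
    """Probe the local llama.cpp server (probe URL is an assumption)."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except (urllib.error.URLError, OSError):
        return False

def resolve_tier(requested, privacy_required=False, server_up=None):
    """Resolve the 'local' tier per this PR's semantics: fail closed for
    privacy/on-device tasks, otherwise skip straight to haiku, since
    local has no same-tier fallback."""
    if requested != "local":
        return requested
    if server_up is None:
        server_up = local_server_up()
    if server_up:
        return "local"
    if privacy_required:
        raise RuntimeError("local server unavailable; failing closed "
                           "(privacy/on-device requirement)")
    return "haiku"  # next tier in the chain; cheapest cloud tier
```

Note that the resolver never tries another local provider: an unavailable local server either fails closed or jumps directly to the cheapest cloud tier.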
@gemini-code-assist

Summary of Changes

Hello @alex-solovyev, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the model routing logic by integrating a dedicated "local" tier and standardizing its fallback behavior. The changes ensure that local model usage is properly configured and that its fallback to cloud tiers (specifically "haiku") is consistent with the overall routing strategy, improving both cost optimization and adherence to privacy requirements.

Highlights

  • Local Tier Fallback Alignment: The local tier's fallback mechanism has been updated to consistently route to "haiku" instead of "flash", aligning with the defined "local→haiku→flash→sonnet→pro→opus" routing chain.
  • Local Tier Configuration: A new "local" tier entry was added to "model-routing-table.json" with "fallback: "haiku"" and "cost: 0", along with a "local" provider entry for llama.cpp's OpenAI-compatible API.
  • Routing Semantics Clarification: Clarified that the "local" tier has no same-tier fallback, directly skipping to "haiku" for cost optimization, while failing closed for privacy-sensitive tasks.
Changelog
  • .agents/configs/model-routing-table.json
    • Added the "local" tier definition.
    • Included the "local" provider configuration for llama.cpp.
  • .agents/tools/context/model-routing.md
    • Updated the cost spectrum to reflect "local -> haiku -> flash".
    • Modified fallback behavior descriptions for the "local" tier.
    • Adjusted notes and tables to show "haiku" as the fallback for "local".
    • Updated the decision flowchart for "local" tier fallback.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


coderabbitai bot commented Feb 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b86e0a0 and 27ed700.

📒 Files selected for processing (2)
  • .agents/configs/model-routing-table.json
  • .agents/tools/context/model-routing.md

Walkthrough

This PR introduces support for local model inference by adding a new "local" tier and provider configuration to the model routing system. The local tier uses a localhost:8080 OpenAI-compatible endpoint (llama.cpp) with zero cost, falling back to the "haiku" tier when unavailable. Documentation is updated to reflect the revised fallback chain and routing behavior.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| Model Routing Configuration: .agents/configs/model-routing-table.json | Adds new "local" tier (models: ["local/llama.cpp"], fallback: "haiku", cost: 0) and "local" provider (endpoint: http://localhost:8080/v1/chat/completions, null API key, probe settings included). |
| Model Routing Documentation: .agents/tools/context/model-routing.md | Updates cost spectrum ordering and fallback behavior to reflect haiku as the primary fallback for the local tier (replacing flash); clarifies that local skips directly to cloud with no same-tier fallback; adjusts routing chain descriptions throughout. |

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

Local models now run offline with pride, 🚀
No API keys needed, just localhost inside,
When local stumbles, haiku takes the call,
A routing chain to handle them all. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately and specifically describes the main changes: aligning the local tier fallback from flash to haiku and adding the local tier to the routing configuration. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@github-actions
Contributor

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 89 code smells

[INFO] Recent monitoring activity:
Thu Feb 26 18:23:38 UTC 2026: Code review monitoring started
Thu Feb 26 18:23:39 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 89

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 89
  • VULNERABILITIES: 0

Generated on: Thu Feb 26 18:23:42 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request successfully aligns the local tier fallback to "haiku" and integrates the "local" tier into the model routing table. The changes in .agents/configs/model-routing-table.json correctly define the "local" tier and provider. The updates in .agents/tools/context/model-routing.md consistently reflect the new fallback chain. The review comments suggest minor improvements for documentation consistency and conciseness in the explanations of the local tier's fallback behavior, and all comments have been retained as they align with good documentation practices and do not contradict any provided rules.


```diff
  - **Privacy/on-device requirement**: FAIL — do not route to cloud. Return an error instructing the user to start the local server or pass `--allow-cloud` to explicitly override.
- - **Cost optimisation or experimentation**: Fall back to `flash` (cheapest cloud tier by blended cost).
+ - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain). Local has no same-tier fallback — it skips directly to the cheapest cloud tier.
```


Severity: medium

For better conciseness and consistency with other fallback descriptions, consider integrating the full explanation of the local tier's fallback behavior into the parenthetical note.

Suggested change:

```diff
- - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain). Local has no same-tier fallback — it skips directly to the cheapest cloud tier.
+ - **Cost optimisation or experimentation**: Fall back to `haiku` (next tier in the routing chain — local has no same-tier fallback).
```

```diff
  | Tier | Primary | Fallback | When to Fallback |
  |------|---------|----------|------------------|
- | `local` | llama.cpp (localhost) | flash (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to flash only for cost-optimisation use cases. |
+ | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in chain) for cost-optimisation use cases. No same-tier fallback exists — local skips directly to cloud. |
```


Severity: medium

To improve conciseness and consistency with other fallback explanations, integrate the full fallback description into the parenthetical note. This avoids repeating the 'no same-tier fallback' concept in a separate sentence.

Suggested change:

```diff
- | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in chain) for cost-optimisation use cases. No same-tier fallback exists — local skips directly to cloud. |
+ | `local` | llama.cpp (localhost) | haiku (cost-only) or FAIL (privacy) | Server not running, no model installed. Fails closed for privacy/on-device tasks; falls back to haiku (next tier in the routing chain — local has no same-tier fallback) for cost-optimisation use cases. |
```

```diff
  → YES: Is a local model running and capable enough?
  → YES: local
- → NO: flash (cheapest cloud fallback)
+ → NO: haiku (next tier in chain — local has no same-tier fallback)
```


Severity: medium

For consistency with other descriptions of the local tier fallback in this document, consider using the full phrase next tier in the routing chain.

Suggested change:

```diff
- → NO: haiku (next tier in chain — local has no same-tier fallback)
+ → NO: haiku (next tier in the routing chain — local has no same-tier fallback)
```
