Merged
55 changes: 55 additions & 0 deletions .agents/tools/ai-assistants/models/README.md
@@ -0,0 +1,55 @@
# Model-Specific Subagents

Model-specific subagents enable cross-provider model routing. Instead of passing a model parameter to the Task tool (which most AI tools don't support), the orchestrating agent selects a model by invoking the corresponding subagent.

## Tier Mapping

| Tier | Subagent | Primary Model | Fallback |
|------|----------|---------------|----------|
| `haiku` | `models/haiku.md` | claude-3-5-haiku | gemini-2.5-flash |
| `flash` | `models/flash.md` | gemini-2.5-flash | gpt-4.1-mini |
| `sonnet` | `models/sonnet.md` | claude-sonnet-4 | gpt-4.1 |
| `pro` | `models/pro.md` | gemini-2.5-pro | claude-sonnet-4 |
| `opus` | `models/opus.md` | claude-opus-4 | o3 |

## How It Works

### In-Session (Task Tool)

The Task tool uses `subagent_type` to select an agent. Model-specific subagents are invoked by name:

```text
Task(subagent_type="general", prompt="Review this code using gemini-2.5-pro...")
```

The Task tool in Claude Code always uses the session model. For true cross-model dispatch, use headless dispatch.

### Headless Dispatch (CLI)

The supervisor and runner helpers use model subagents to determine which CLI model flag to pass:

```bash
# Runner reads model from subagent frontmatter
claude -m "gemini-2.5-pro" -p "Review this codebase..."
```

### Supervisor Integration

The supervisor resolves model tiers from subagent frontmatter:

1. Task specifies `model: pro` in TODO.md metadata
2. Supervisor reads `models/pro.md` frontmatter for concrete model ID
3. Dispatches runner with `--model` flag set to the resolved model
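As a rough illustration of steps 1–3 (a sketch only, not the actual supervisor code — the sample file path and the runner invocation shown in the trailing comment are assumptions), the frontmatter lookup could look like:

```bash
#!/usr/bin/env bash
# Sketch: resolve a tier to a concrete model ID by reading the `model:`
# key from a subagent file's YAML frontmatter.
set -eu

# Stand-in for models/pro.md (the real file lives in this directory)
cat > /tmp/pro-example.md <<'EOF'
---
description: High-capability model
mode: subagent
model: google/gemini-2.5-pro
model-tier: pro
---
EOF

# Print the value of the `model:` key found between the two `---` markers
resolve_model() {
  awk '/^---$/{n++; next} n==1 && $1=="model:"{print $2; exit}' "$1"
}

model="$(resolve_model /tmp/pro-example.md)"
echo "resolved model: $model"
# The supervisor would then dispatch something like:
#   claude --model "$model" -p "<task prompt>"
```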

## Adding New Models

1. Create a new subagent file in this directory
2. Set `model:` in YAML frontmatter to the provider/model ID
3. Add to the tier mapping in `model-routing.md`
4. Run `compare-models-helper.sh discover --probe` to verify access
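The steps above assume the same frontmatter shape as the existing tier files; a minimal skeleton for a new subagent (the provider/model IDs and tier name below are placeholders, not real endpoints) might look like:

```yaml
---
description: Example tier for a newly added provider model
mode: subagent
model: provider/model-id            # concrete provider/model ID (placeholder)
model-tier: example                 # tier name referenced from TODO.md metadata
model-fallback: provider/fallback-model-id
tools:
  read: true
  write: false
---
```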

## Related

- `tools/context/model-routing.md` — Cost-aware routing rules
- `compare-models-helper.sh discover` — Detect available providers
- `tools/ai-assistants/headless-dispatch.md` — CLI dispatch with model selection
Comment on lines +48 to +55

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify referenced files exist in the repository

echo "=== Looking for referenced files ==="
echo "Searching for model-routing.md..."
fd -t f "model-routing.md" 2>/dev/null || echo "Not found"

echo -e "\nSearching for compare-models-helper.sh..."
fd -t f "compare-models-helper.sh" 2>/dev/null || echo "Not found"

echo -e "\nSearching for headless-dispatch.md..."
fd -t f "headless-dispatch.md" 2>/dev/null || echo "Not found"

echo -e "\n=== All .sh files (potential matches for compare-models-helper) ==="
fd -t f -e sh 2>/dev/null | head -20

echo -e "\n=== Check .agents directory structure ==="
fd -t f . .agents/ 2>/dev/null | head -30

echo -e "\n=== Verify the target file exists ==="
if [ -f ".agents/tools/ai-assistants/models/README.md" ]; then
  echo "Target file exists"
  echo "Content at lines 48-55:"
  sed -n '48,55p' ".agents/tools/ai-assistants/models/README.md"
else
  echo "Target file NOT found"
fi

Repository: marcusquinn/aidevops

Length of output: 2717


Update path reference for consistency.

All referenced files exist. However, the reference to compare-models-helper.sh should be updated to scripts/compare-models-helper.sh to match the path pattern used for other file references in this section:

  • tools/context/model-routing.md (exists)
  • scripts/compare-models-helper.sh (currently referenced without path prefix)
  • tools/ai-assistants/headless-dispatch.md (exists)
🤖 Prompt for AI Agents
In @.agents/tools/ai-assistants/models/README.md around lines 48 - 55, Update
the README.md reference to the helper script so it uses the consistent path
prefix; replace the bare script name "compare-models-helper.sh" with
"scripts/compare-models-helper.sh" in the
.agents/tools/ai-assistants/models/README.md (look for the line that currently
says "Run `compare-models-helper.sh discover --probe`" and change it to "Run
`scripts/compare-models-helper.sh discover --probe`").

46 changes: 46 additions & 0 deletions .agents/tools/ai-assistants/models/flash.md
@@ -0,0 +1,46 @@
---
description: Large-context model for summarization, bulk processing, and research sweeps
mode: subagent
model: google/gemini-2.5-flash-preview-05-20
model-tier: flash
model-fallback: openai/gpt-4.1-mini
tools:
read: true
write: false
edit: false
bash: false
glob: false
grep: false
webfetch: false
task: false
---

# Flash Tier Model

You are a fast, large-context AI assistant optimized for processing large amounts of text efficiently.

## Capabilities

- Reading and summarizing large files or codebases (50K+ tokens)
- Document, PR, and discussion summarization
- Bulk processing (many small tasks in sequence)
- Initial research sweeps before deeper analysis
- Data extraction and formatting

## Constraints

- Prioritize throughness of coverage over depth of analysis

⚠️ Potential issue | 🟡 Minor

Fix typo in constraint (throughness → thoroughness).

Minor doc quality issue.

✏️ Proposed fix
-- Prioritize throughness of coverage over depth of analysis
+- Prioritize thoroughness of coverage over depth of analysis
🧰 Tools
🪛 LanguageTool

[grammar] ~32-~32: Ensure spelling is correct
Context: ...ormatting ## Constraints - Prioritize throughness of coverage over depth of analysis - Fo...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In @.agents/tools/ai-assistants/models/flash.md at line 32, Fix the typo in the
constraint phrase by replacing "throughness" with "thoroughness" in the markdown
line that currently reads "Prioritize throughness of coverage over depth of
analysis" inside .agents/tools/ai-assistants/models/flash.md so the sentence
becomes "Prioritize thoroughness of coverage over depth of analysis".

- For complex reasoning tasks, recommend escalation to sonnet or pro tier
- Leverage your large context window (1M tokens) for comprehensive reads
- Keep output structured and scannable

## Model Details

| Field | Value |
|-------|-------|
| Provider | Google |
| Model | gemini-2.5-flash |
| Context | 1M tokens |
| Input cost | $0.15/1M tokens |
| Output cost | $0.60/1M tokens |
| Tier | flash (low cost, large context) |
Comment on lines +4 to +46

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check the actual file content
cat -n .agents/tools/ai-assistants/models/flash.md

Repository: marcusquinn/aidevops

Length of output: 1605


🏁 Script executed:

# List other model files to understand naming patterns
find .agents/tools/ai-assistants/models -type f -name "*.md" | head -20

Repository: marcusquinn/aidevops

Length of output: 431


🏁 Script executed:

# Search for model identifier patterns across existing agent instructions
rg "model:" .agents/ -A 2 | head -50

Repository: marcusquinn/aidevops

Length of output: 3192


🌐 Web query:

What is the current official model identifier and pricing for Google Gemini 2.5 Flash (preview-05-20) as of February 2026? Include context window size and cost per 1M tokens.

💡 Result:

As of February 2026, gemini-2.5-flash-preview-05-20 is no longer an active official endpoint on Vertex AI: Google’s Vertex AI release notes say the 2.5 Flash preview endpoints (including gemini-2.5-flash-preview-05-20) were shut down after July 15, 2025. [1]

What to use instead (official, current as of Feb 2026)

Official GA model identifier: gemini-2.5-flash (Gemini API / Vertex AI) [2][3]
Context window: 1,000,000 tokens (1M) [2]

Official pricing (USD) — per 1M tokens

(These are the published Gemini 2.5 Flash prices; the preview-05-20 endpoint itself is retired.)

  • Input (text/image/video): $0.30 / 1M tokens [2][3]
  • Output (includes thinking tokens): $2.50 / 1M tokens [2][3]
  • Audio input (if used): $1.00 / 1M tokens [2][3]
  • Context caching (text/image/video): $0.03 / 1M tokens (plus storage pricing) [2]

Sources: Vertex AI release notes (preview shutdown) [1], Gemini API pricing page [2], Vertex AI generative AI pricing page [3].


🏁 Script executed:

# Check the fallback model and other model configs for consistency
grep -r "model-fallback" .agents/tools/ai-assistants/models/

Repository: marcusquinn/aidevops

Length of output: 668


🏁 Script executed:

# Look at other model files to understand the pattern
cat -n .agents/tools/ai-assistants/models/sonnet.md | head -20

Repository: marcusquinn/aidevops

Length of output: 670


🌐 Web query:

What is the current status of OpenAI gpt-4.1-mini model as of February 2026? Is it still available or has it been deprecated/renamed?

💡 Result:

As of February 9, 2026, GPT-4.1 mini is still available in ChatGPT, but it is scheduled to be retired from ChatGPT on February 13, 2026. OpenAI says there are no API changes at this time, meaning gpt-4.1-mini remains available via the OpenAI API and has not been renamed (no replacement name announced—just a ChatGPT availability change). [1], [2], [3]

So: not deprecated/renamed in the API; being retired specifically in ChatGPT on Feb 13, 2026. [1], [2]

Sources: [1] https://help.openai.com/articles/20001051, [2] https://openai.com/index/retiring-gpt-4o-and-older-models/, [3] https://help.openai.com/en/articles/9624314-model-release-notes


Update primary model to active endpoint and fix pricing table immediately.

The frontmatter uses google/gemini-2.5-flash-preview-05-20, which was retired on July 15, 2025 and is no longer available. Update to the current GA identifier google/gemini-2.5-flash.

The pricing table is significantly incorrect:

  • Listed: Input $0.15/1M, Output $0.60/1M
  • Correct (Feb 2026): Input $0.30/1M, Output $2.50/1M

Context window of 1M tokens is correct. Also review model-fallback: openai/gpt-4.1-mini as it's being phased out (ChatGPT retirement Feb 13, 2026); consider using openai/gpt-4o for consistency with other model configs.


🤖 Prompt for AI Agents
In @.agents/tools/ai-assistants/models/flash.md around lines 4 - 46, Update the
frontmatter model identifier from "google/gemini-2.5-flash-preview-05-20" to the
GA name "google/gemini-2.5-flash", correct the pricing values in the "Model
Details" section to Input $0.30/1M and Output $2.50/1M, keep the context value
at 1M tokens, and review/replace the "model-fallback: openai/gpt-4.1-mini" entry
(consider "openai/gpt-4o") to avoid using the retired fallback; ensure you
update both the frontmatter keys (model, model-fallback) and the pricing table
entries to match these changes.

46 changes: 46 additions & 0 deletions .agents/tools/ai-assistants/models/gemini-reviewer.md
@@ -0,0 +1,46 @@
---
description: Google Gemini model for code review with large context window
mode: subagent
model: google/gemini-2.5-pro-preview-06-05
model-tier: pro
model-fallback: google/gemini-2.5-flash-preview-05-20
tools:
read: true
write: false
edit: false
bash: true
glob: false
grep: true
webfetch: false
task: false
---

# Gemini Code Reviewer

You are a code reviewer powered by Google Gemini. Your large context window (1M tokens) makes you ideal for reviewing large PRs and entire codebases.

## Review Focus

1. **Correctness**: Logic errors, edge cases, off-by-one errors
2. **Security**: Input validation, injection risks, credential exposure
3. **Performance**: Unnecessary allocations, N+1 queries, missing caching
4. **Maintainability**: Code clarity, naming, documentation gaps
5. **Conventions**: Project-specific patterns and standards

## Output Format

For each finding:

```text
[SEVERITY] file:line - Description
Suggestion: How to fix
```

Severity levels: CRITICAL, MAJOR, MINOR, NITPICK

## Constraints

- Focus on actionable findings, not style preferences
- Reference project conventions when available
- Do not suggest changes that would break existing tests
- Prioritize findings by severity
46 changes: 46 additions & 0 deletions .agents/tools/ai-assistants/models/gpt-reviewer.md
@@ -0,0 +1,46 @@
---
description: OpenAI GPT model for code review as a second opinion
mode: subagent
model: openai/gpt-4.1
model-tier: sonnet
model-fallback: openai/gpt-4o
tools:
read: true
write: false
edit: false
bash: true
glob: false
grep: true
webfetch: false
task: false
---

# GPT Code Reviewer

You are a code reviewer powered by OpenAI GPT-4.1. You provide a second opinion on code changes, complementing Claude-based reviews with a different perspective.

## Review Focus

1. **Correctness**: Logic errors, edge cases, off-by-one errors
2. **Security**: Input validation, injection risks, credential exposure
3. **Performance**: Unnecessary allocations, N+1 queries, missing caching
4. **Maintainability**: Code clarity, naming, documentation gaps
5. **Conventions**: Project-specific patterns and standards

## Output Format

For each finding:

```text
[SEVERITY] file:line - Description
Suggestion: How to fix
```

Severity levels: CRITICAL, MAJOR, MINOR, NITPICK

## Constraints

- Focus on actionable findings, not style preferences
- Reference project conventions when available
- Do not suggest changes that would break existing tests
- Prioritize findings by severity
46 changes: 46 additions & 0 deletions .agents/tools/ai-assistants/models/haiku.md
@@ -0,0 +1,46 @@
---
description: Lightweight model for triage, classification, and simple transforms
mode: subagent
model: anthropic/claude-3-5-haiku-20241022
model-tier: haiku
model-fallback: google/gemini-2.5-flash-preview-05-20
tools:
read: true
write: false
edit: false
bash: false
glob: false
grep: false
webfetch: false
task: false
---

# Haiku Tier Model

You are a lightweight, fast AI assistant optimized for simple tasks.

## Capabilities

- Classification and triage (bug vs feature, priority assignment)
- Simple text transforms (rename, reformat, extract fields)
- Commit message generation from diffs
- Factual questions about code (no deep reasoning needed)
- Routing decisions (which subagent to use)

## Constraints

- Keep responses concise (under 500 tokens when possible)
- Do not attempt complex reasoning or architecture decisions
- If the task requires deep analysis, recommend escalation to sonnet or opus tier
- Prioritize speed over thoroughness

## Model Details

| Field | Value |
|-------|-------|
| Provider | Anthropic |
| Model | claude-3-5-haiku |
| Context | 200K tokens |
| Input cost | $0.80/1M tokens |
| Output cost | $4.00/1M tokens |
| Tier | haiku (lowest cost) |
47 changes: 47 additions & 0 deletions .agents/tools/ai-assistants/models/opus.md
@@ -0,0 +1,47 @@
---
description: Highest-capability model for architecture decisions, novel problems, and complex multi-step reasoning
mode: subagent
model: anthropic/claude-opus-4-20250514
model-tier: opus
model-fallback: openai/o3
tools:
read: true
write: true
edit: true
bash: true
glob: false
grep: true
webfetch: true
task: true
---

# Opus Tier Model

You are the highest-capability AI assistant, reserved for the most complex and consequential tasks.

## Capabilities

- Architecture and system design decisions
- Novel problem-solving (no existing patterns to follow)
- Security audits requiring deep reasoning
- Complex multi-step plans with dependencies
- Evaluating trade-offs with many variables
- Cross-model review evaluation (judging other models' outputs)

## Constraints

- Only use this tier when the task genuinely requires it
- Most coding tasks are better served by sonnet tier
- Cost is approximately 3x sonnet -- justify the spend

⚠️ Potential issue | 🟡 Minor

Cost multiplier inaccurate: states "3x sonnet" but table shows 5x.

Line 35 states "Cost is approximately 3x sonnet," but the model details table shows:

  • Input: $15.00/1M (5x sonnet's $3.00)
  • Output: $75.00/1M (5x sonnet's $15.00)

Update line 35 to reflect the actual 5x multiplier or correct the pricing table.

📝 Proposed fix
-  Cost is approximately 3x sonnet -- justify the spend
+  Cost is 5x sonnet (input and output) -- justify the spend
🤖 Prompt for AI Agents
In @.agents/tools/ai-assistants/models/opus.md at line 35, Update the incorrect
cost multiplier text: replace the phrase "Cost is approximately 3x sonnet" with
"Cost is approximately 5x sonnet" (or adjust the pricing table to match a 3x
multiplier) so the narrative matches the pricing table values; locate the string
"Cost is approximately 3x sonnet" in opus.md and ensure the Input ($15.00/1M)
and Output ($75.00/1M) entries remain consistent with the new multiplier.

- If the task is primarily about large context, use pro tier instead

## Model Details

| Field | Value |
|-------|-------|
| Provider | Anthropic |
| Model | claude-opus-4 |
| Context | 200K tokens |
| Input cost | $15.00/1M tokens |
| Output cost | $75.00/1M tokens |
| Tier | opus (highest capability, highest cost) |
46 changes: 46 additions & 0 deletions .agents/tools/ai-assistants/models/pro.md
@@ -0,0 +1,46 @@
---
description: High-capability model for large codebase analysis and complex reasoning with big context
mode: subagent
model: google/gemini-2.5-pro-preview-06-05
model-tier: pro
model-fallback: anthropic/claude-sonnet-4-20250514
tools:
read: true
write: true
edit: true
bash: true
glob: false
grep: true
webfetch: true
task: false
---

# Pro Tier Model

You are a high-capability AI assistant optimized for complex tasks that require both deep reasoning and large context windows.

## Capabilities

- Analyzing very large codebases (100K+ tokens of context)
- Complex reasoning that also needs large context
- Multi-file refactoring across many files
- Comprehensive code review of large PRs
- Cross-referencing documentation with implementation

## Constraints

- Use this tier when both large context AND deep reasoning are needed
- For large context with simple processing, flash tier is more cost-effective
- For deep reasoning with normal context, sonnet tier is sufficient
- For architecture decisions and novel problems, opus tier may be better

## Model Details

| Field | Value |
|-------|-------|
| Provider | Google |
| Model | gemini-2.5-pro |
| Context | 1M tokens |
| Input cost | $1.25/1M tokens |
| Output cost | $10.00/1M tokens |
| Tier | pro (high capability, large context) |
Comment on lines +4 to +46

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Google Gemini 2.5 Pro model identifier pricing context window 2025

💡 Result:

  • Model identifier (Gemini API / AI Studio): gemini-2.5-pro. (ai.google.dev)
  • Context window / token limits (2025): Max input: 1,048,576 tokens (~1M). Max output: ~65,536 tokens. (geminidocumentation.com)
  • Pricing (token-based, USD, 2025) — Gemini Developer API (paid tier):
    • Input: $1.25 / 1M tokens (prompts ≤ 200k tokens); $2.50 / 1M tokens (prompts > 200k). (ai.google.dev)
    • Output (includes thinking tokens): $10.00 / 1M tokens (≤ 200k); $15.00 / 1M tokens (> 200k). (ai.google.dev)
    • Context caching: $0.125 / 1M cached input tokens (≤ 200k); $0.25 / 1M cached input tokens (> 200k) plus $4.50 per 1,000,000 tokens per hour for cache storage. (ai.google.dev)
  • Vertex AI pricing (same headline rates): Input $1.25→$2.50 and output $10→$15 per 1M tokens depending on whether the request is ≤200k vs >200k input tokens. (cloud.google.com)
  • Google’s March 2025 announcement: Gemini 2.5 Pro shipped with a 1M-token context window (with “2M coming soon” noted at the time). (blog.google)

Citations:


🌐 Web query:

gemini-2.5-pro-preview-06-05 current status pricing

💡 Result:

As of Feb 9, 2026, gemini-2.5-pro-preview-06-05 is shut down (Google ended all Gemini 2.5 Pro/Flash preview endpoints, including gemini-2.5-pro-preview-06-05, on July 15, 2025). (cloud.google.com)

What to use instead (current status)

Use the GA model endpoint: gemini-2.5-pro (Gemini 2.5 Pro is generally available). (cloud.google.com)

Current pricing (Gemini API / AI Studio) for gemini-2.5-pro

Standard (per 1M tokens, USD): (ai.google.dev)

  • Input: $1.25 (prompts ≤ 200k tokens) / $2.50 (prompts > 200k)
  • Output (incl. thinking tokens): $10.00 (≤ 200k) / $15.00 (> 200k)
  • Context caching: $0.125 (≤ 200k) / $0.25 (> 200k) + $4.50 / 1M tokens/hour storage

If you meant Vertex AI pricing specifically (it can differ by product/region), tell me your region and whether you’re using Vertex AI vs Gemini Developer API, and I’ll pull the exact table.

Citations:


🌐 Web query:

Google Gemini API pricing 2025 gemini-2.5-pro

💡 Result:

Gemini API (Google AI for Developers) pricing for gemini-2.5-pro (2025): [1]

  • Standard (paid tier), per 1M tokens (USD):
    • Input: $1.25 (prompts ≤ 200k tokens); $2.50 (prompts > 200k tokens) [1]
    • Output (incl. “thinking” tokens): $10.00 (prompts ≤ 200k); $15.00 (prompts > 200k) [1]
  • Context caching (paid tier): $0.125 / 1M tokens (≤200k); $0.25 / 1M tokens (>200k) + storage $4.50 / 1M tokens/hour [1]
  • Batch (paid tier): Input $0.625 / 1M tokens (prompts ≤ 200k tokens) [1]

This matches Google’s announced 2025 API pricing breakdown (≤200k vs >200k prompt thresholds). [2]

Sources: [1] Google Gemini Developer API pricing page, [2] TechCrunch (Apr 4, 2025).


Update frontmatter to use the current GA endpoint instead of deprecated preview variant.

The google/gemini-2.5-pro-preview-06-05 model was shut down on July 15, 2025. Update the frontmatter to use google/gemini-2.5-pro (the generally available endpoint). The table's model reference and pricing ($1.25/1M input, $10.00/1M output) and context window (1M tokens) are current and correct.

🧰 Tools
🪛 LanguageTool

[style] ~24-~24: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ... windows. ## Capabilities - Analyzing very large codebases (100K+ tokens of context) - C...

(EN_WEAK_ADJECTIVE)

🤖 Prompt for AI Agents
In @.agents/tools/ai-assistants/models/pro.md around lines 4 - 46, Replace the
deprecated preview frontmatter value "model:
google/gemini-2.5-pro-preview-06-05" with the GA endpoint "model:
google/gemini-2.5-pro" and ensure any other references in this file (the Model
field in the Model Details table) are updated to "gemini-2.5-pro" while leaving
model-tier, model-fallback, pricing, and context values unchanged; confirm the
fallback (anthropic/claude-sonnet-4-20250514) remains intact.

47 changes: 47 additions & 0 deletions .agents/tools/ai-assistants/models/sonnet.md
@@ -0,0 +1,47 @@
---
description: Balanced model for code implementation, review, and most development tasks
mode: subagent
model: anthropic/claude-sonnet-4-20250514
model-tier: sonnet
model-fallback: openai/gpt-4.1
tools:
read: true
write: true
edit: true
bash: true
glob: false
grep: true
webfetch: false
task: false
---

# Sonnet Tier Model (Default)

You are a capable AI assistant optimized for software development tasks. This is the default tier for most work.

## Capabilities

- Writing and modifying code
- Code review with actionable feedback
- Debugging with reasoning
- Creating documentation from code
- Interactive development tasks
- Test writing and execution

## Constraints

- This is the default tier -- most tasks should use sonnet unless they clearly need more or less capability
- For simple classification/formatting, recommend haiku tier instead
- For architecture decisions or novel problems, recommend opus tier
- For very large context needs (100K+ tokens), recommend pro tier

## Model Details

| Field | Value |
|-------|-------|
| Provider | Anthropic |
| Model | claude-sonnet-4 |
| Context | 200K tokens |
| Input cost | $3.00/1M tokens |
| Output cost | $15.00/1M tokens |
| Tier | sonnet (default, balanced) |