Skip to content

Conversation

@marcusquinn
Copy link
Owner

Summary

  • Wire dead code: escalate_model_on_failure() (t314) was defined but never called. Now wired into all 3 failure paths (retry, hung worker kill, DB orphan) so failed tasks auto-escalate to stronger models before retry
  • Improve classifier: Added 15 new opus-tier patterns to classify_task_complexity() for pre-commit hooks, CI workflows, GitHub Actions, enforcement tasks, module extraction, supervisor work, etc. Added pre-check for module-level operations that would otherwise match sonnet-tier patterns
  • Auto-upgrade safeguard: When cmd_add() receives a task with explicit model:sonnet but the classifier recommends opus, it auto-upgrades to opus with a logged warning. This prevents complex tasks from being dispatched to underpowered models
  • Upgraded queued tasks: Changed 6 queued tasks in supervisor DB from sonnet to opus. Updated TODO.md model tags for all complex t316-t319 tasks

What this ensures going forward

  1. Failed tasks escalate automatically — retry on sonnet fails → next retry uses opus
  2. Complex tasks can't sneak through as sonnet — the classifier catches them at add time
  3. Pattern-based routing improves over time — new patterns cover real-world task types we've seen fail on sonnet

Testing

  • bash -n syntax check: PASS
  • shellcheck -S warning: Only pre-existing SC2034 warnings (unused variables)
  • Classifier tested against 20 real task descriptions — correctly routes 18/20 to opus (2 correctly stay sonnet: AGENTS.md update, workflow docs update)

Closes #1256

…s, improve classifier, add auto-upgrade safeguard

- Wire escalate_model_on_failure() (t314) into 3 failure paths where it was
  defined but never called (dead code):
  1. Retry path: escalate before re-prompt so retry uses stronger model
  2. Hung worker kill: escalate before self-heal retry
  3. DB orphan: escalate before self-heal retry

- Improve classify_task_complexity() with 15 new opus-tier patterns:
  pre-commit hooks, CI checks/workflows, GitHub Actions, edge cases,
  enforcement, guards, wiring, end-to-end testing, multi-file changes,
  module extraction, supervisor work, diff/staged parsing

- Add pre-check for module-level operations that would otherwise match
  sonnet-tier extract.*function pattern

- Add auto-upgrade safeguard in cmd_add(): when a task has explicit
  model:sonnet but classify_task_complexity() recommends opus, auto-upgrade
  to opus with a logged warning. Safety-first: complex tasks should never
  run on weaker models just because they were manually tagged wrong.
@gemini-code-assist
Copy link

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and intelligence of the task management system by implementing automatic model escalation for failed tasks and improving the task complexity classification. It ensures that complex tasks are always routed to appropriate, more powerful models, even if initially misconfigured, thereby reducing failures and improving overall system efficiency and reliability.

Highlights

  • Model Escalation Wiring: The escalate_model_on_failure() function, previously defined but unused, has been integrated into three critical failure paths: task retry, hung worker kill, and database orphan detection. This ensures that tasks automatically escalate to stronger models (e.g., Opus) upon failure before being retried.
  • Improved Task Complexity Classifier: The classify_task_complexity() function has been enhanced with 15 new Opus-tier patterns, covering common complex tasks like pre-commit hooks, CI workflows, GitHub Actions, and module extraction. A pre-check was also added to correctly classify module-level operations as Opus-tier, preventing misclassification.
  • Auto-Upgrade Safeguard for Task Dispatch: A new safeguard in cmd_add() automatically upgrades a task's model to Opus if it was explicitly set to Sonnet but the complexity classifier determines it requires an Opus-tier model. This prevents complex tasks from being assigned to underpowered models, logging a warning while prioritizing safety.
Changelog
  • .agents/scripts/supervisor-helper.sh
    • Added a model routing safeguard in cmd_add() to auto-upgrade tasks from Sonnet to Opus if the classifier recommends a stronger model.
    • Introduced a pre-check in classify_task_complexity() to correctly identify module-level operations as Opus-tier tasks.
    • Expanded the opus_patterns array in classify_task_complexity() with 15 new patterns for improved classification accuracy.
    • Wired escalate_model_on_failure() into the task retry logic within cmd_pulse().
    • Integrated escalate_model_on_failure() into the hung worker process cleanup logic in cmd_pulse().
    • Applied escalate_model_on_failure() to the database orphan task handling in cmd_pulse().
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 15 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Feb 12 13:57:54 UTC 2026: Code review monitoring started
Thu Feb 12 13:57:55 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 15

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 15
  • VULNERABILITIES: 0

Generated on: Thu Feb 12 13:57:57 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively wires up the escalate_model_on_failure function, improves the task classifier, and adds a valuable safeguard to prevent complex tasks from being run with underpowered models. The changes align well with making the supervisor more robust and intelligent. My review includes a few suggestions to improve comment and log clarity, ensure adherence to the repository's shell scripting style guide, and correct a potential logic inconsistency in one of the new failure handling paths.

Comment on lines 10497 to 10499
attempt_self_heal "$orphan_id" "failed" "No worker process found" "${batch_id:-}" 2>>"$SUPERVISOR_LOG" || true
# Auto-escalate model on failure before self-heal retry (t314 wiring)
escalate_model_on_failure "$orphan_id" 2>>"$SUPERVISOR_LOG" || true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The call to escalate_model_on_failure for DB orphan tasks occurs after attempt_self_heal. This is inconsistent with the handling of hung workers (line 10477), where escalation happens before the self-heal attempt. The accompanying comment also states the escalation happens 'before self-heal retry'.

To ensure the model is escalated before a retry is queued, and for consistency across failure paths, this call should be moved before attempt_self_heal.

Suggested change
attempt_self_heal "$orphan_id" "failed" "No worker process found" "${batch_id:-}" 2>>"$SUPERVISOR_LOG" || true
# Auto-escalate model on failure before self-heal retry (t314 wiring)
escalate_model_on_failure "$orphan_id" 2>>"$SUPERVISOR_LOG" || true
# Auto-escalate model on failure before self-heal retry (t314 wiring)
escalate_model_on_failure "$orphan_id" 2>>"$SUPERVISOR_LOG" || true
attempt_self_heal "$orphan_id" "failed" "No worker process found" "${batch_id:-}" 2>>"$SUPERVISOR_LOG" || true

Comment on lines 1981 to 1991
# Model routing safeguard: warn if explicit model:sonnet conflicts with complexity classifier
# This catches tasks that are tagged model:sonnet but are actually complex enough for opus.
# The warning is logged but the explicit model is respected (user intent takes priority).
if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then
local auto_tier
auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")
if [[ "$auto_tier" == "opus" ]]; then
log_warn "Task $task_id: explicit model:$model but classifier recommends opus — consider upgrading for complex tasks"
# Auto-upgrade to opus when classifier disagrees with explicit sonnet (safety-first)
model="opus"
log_info "Task $task_id: auto-upgraded to model:opus (classifier override)"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment on lines 1981-1983 and the warning log on line 1988 are inconsistent with the code's behavior. The comment states that the user's explicit model choice is respected, but the code proceeds to auto-upgrade it. The log message also suggests upgrading is optional ('consider upgrading') for a mandatory action.

To improve clarity and avoid confusion, please update the comment and log message to accurately reflect that the model is being automatically upgraded based on the classifier's recommendation.

Suggested change
# Model routing safeguard: warn if explicit model:sonnet conflicts with complexity classifier
# This catches tasks that are tagged model:sonnet but are actually complex enough for opus.
# The warning is logged but the explicit model is respected (user intent takes priority).
if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then
local auto_tier
auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")
if [[ "$auto_tier" == "opus" ]]; then
log_warn "Task $task_id: explicit model:$model but classifier recommends opus — consider upgrading for complex tasks"
# Auto-upgrade to opus when classifier disagrees with explicit sonnet (safety-first)
model="opus"
log_info "Task $task_id: auto-upgraded to model:opus (classifier override)"
# Model routing safeguard: auto-upgrade to opus if classifier disagrees with an explicit weaker model.
# This prevents complex tasks from being dispatched to underpowered models. A warning is logged.
if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then
local auto_tier
auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")
if [[ "$auto_tier" == "opus" ]]; then
log_warn "Task $task_id: Explicit model '$model' overridden. Classifier recommended 'opus' due to task complexity."
# Auto-upgrade to opus when classifier disagrees with explicit sonnet (safety-first)
model="opus"
log_info "Task $task_id: auto-upgraded to model:opus (classifier override)"

# The warning is logged but the explicit model is respected (user intent takes priority).
if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then
local auto_tier
auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of 2>/dev/null for blanket error suppression violates the repository style guide (rule #50), which states that 2>/dev/null is only acceptable when redirecting to log files. Any potential errors from classify_task_complexity are being silently discarded.

To adhere to the style guide and improve observability, please redirect stderr to the supervisor's log file instead.

Suggested change
auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")
auto_tier=$(classify_task_complexity "$description" "" 2>>"$SUPERVISOR_LOG" || echo "")
References
  1. Rule docs: update branch creation to recommend worktrees for parallel sessions #50: 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. (link)

… redirect

- Move escalate_model_on_failure before attempt_self_heal in DB orphan path
  for consistency with hung worker path (Gemini high-priority)
- Fix comment and log message to accurately reflect auto-upgrade behavior
  instead of suggesting it's optional (Gemini medium-priority)
- Redirect classify_task_complexity stderr to SUPERVISOR_LOG instead of
  /dev/null per style guide rule #50 (Gemini medium-priority)
@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 15 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Feb 12 14:04:07 UTC 2026: Code review monitoring started
Thu Feb 12 14:04:08 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 15

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 15
  • VULNERABILITIES: 0

Generated on: Thu Feb 12 14:04:10 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
Copy link

@marcusquinn marcusquinn merged commit c6013ed into main Feb 12, 2026
10 checks passed
marcusquinn added a commit that referenced this pull request Feb 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

t320: Fix model routing: wire escalate_model_on_failure, improve classifier, add auto-upgrade safeguard

1 participant