fix(t320): wire escalate_model_on_failure, improve classifier, add auto-upgrade safeguard #1257

marcusquinn · 2026-02-12T13:57:24Z

Summary

Wire dead code: escalate_model_on_failure() (t314) was defined but never called. It is now wired into all 3 failure paths (retry, hung worker kill, DB orphan), so failed tasks auto-escalate to a stronger model before retry.
Improve classifier: Added 15 new opus-tier patterns to classify_task_complexity() for pre-commit hooks, CI workflows, GitHub Actions, enforcement tasks, module extraction, supervisor work, etc. Added a pre-check for module-level operations that would otherwise match sonnet-tier patterns.
Auto-upgrade safeguard: When cmd_add() receives a task with explicit model:sonnet but the classifier recommends opus, it auto-upgrades to opus with a logged warning. This prevents complex tasks from being dispatched to underpowered models.
Upgraded queued tasks: Changed 6 queued tasks in the supervisor DB from sonnet to opus. Updated TODO.md model tags for all complex t316-t319 tasks.
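The classifier change is mostly about ordering: a module-level pre-check runs first, then opus-tier patterns, then sonnet-tier patterns. The sketch below illustrates that ordering under stated assumptions: `classify_tier_sketch` and every pattern in it are hypothetical, drawn loosely from the categories listed above, and are not the actual contents of `classify_task_complexity()`. Printing an empty string when nothing matches is also an assumption for this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: ordering of a module-level pre-check, opus-tier
# patterns and sonnet-tier patterns. Patterns are illustrative only.

classify_tier_sketch() {
	local description="$1"
	local lower
	lower=$(printf '%s' "$description" | tr '[:upper:]' '[:lower:]')

	# Pre-check: module-level operations are opus-tier and must be caught
	# before the broader sonnet-tier "extract.*function" pattern can match.
	local module_re='extract.*module|module.*(extract|split|refactor)'
	if [[ "$lower" =~ $module_re ]]; then
		echo "opus"
		return 0
	fi

	local -a opus_patterns=(
		'pre-commit'
		'ci (check|workflow)'
		'github actions?'
		'edge case'
		'enforce'
		'guard'
		'wir(e|ing)'
		'end-to-end'
		'multi-file'
		'supervisor'
		'git diff|staged'
	)
	local -a sonnet_patterns=(
		'extract.*function'
		'typo'
	)

	local pattern
	for pattern in "${opus_patterns[@]}"; do
		if [[ "$lower" =~ $pattern ]]; then
			echo "opus"
			return 0
		fi
	done
	for pattern in "${sonnet_patterns[@]}"; do
		if [[ "$lower" =~ $pattern ]]; then
			echo "sonnet"
			return 0
		fi
	done

	# No match: print nothing, meaning "no recommendation" (assumption).
	echo ""
}
```

Because the pre-check runs first, a description such as "Extract functions into a new module" resolves to opus even though it also matches the sonnet-tier `extract.*function` pattern.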

What this ensures going forward

Failed tasks escalate automatically — retry on sonnet fails → next retry uses opus
Complex tasks can't sneak through as sonnet — the classifier catches them at add time
Pattern-based routing improves over time — new patterns cover real-world task types we've seen fail on sonnet
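Escalation on failure can be pictured as a one-step climb up a tier ladder. The following is a hypothetical sketch: `escalate_model_on_failure_sketch`, `next_tier`, the haiku/sonnet/opus ladder and the in-memory `TASK_MODEL` array are assumptions for illustration, whereas the real `escalate_model_on_failure()` works against the supervisor DB. It needs bash 4+ for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of escalate-on-failure. The tier ladder and the
# in-memory TASK_MODEL store are assumptions; the real function uses the DB.

declare -A TASK_MODEL=()

# Return the next stronger tier; opus (or anything unknown) stays put.
next_tier() {
	case "$1" in
		haiku) echo "sonnet" ;;
		sonnet) echo "opus" ;;
		*) echo "$1" ;;
	esac
}

# Move a task one tier up; defaults to sonnet if no model is recorded.
escalate_model_on_failure_sketch() {
	local task_id="$1"
	local current="${TASK_MODEL[$task_id]:-sonnet}"
	local next
	next=$(next_tier "$current")
	if [[ "$next" != "$current" ]]; then
		TASK_MODEL[$task_id]="$next"
		echo "escalated $task_id: $current -> $next" >&2
	fi
	# Best-effort: never fail the caller's failure-handling path.
	return 0
}
```

Calling it twice on the same task leaves it on opus, since there is no stronger tier to move to. Always returning 0 mirrors how the PR's call sites treat escalation as best-effort (`|| true`).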

Testing

bash -n syntax check: PASS
shellcheck -S warning: Only pre-existing SC2034 warnings (unused variables)
Classifier tested against 20 real task descriptions — correctly routes 18/20 to opus (2 correctly stay sonnet: AGENTS.md update, workflow docs update)

…s, improve classifier, add auto-upgrade safeguard

- Wire escalate_model_on_failure() (t314) into 3 failure paths where it was defined but never called (dead code):
  1. Retry path: escalate before re-prompt so retry uses stronger model
  2. Hung worker kill: escalate before self-heal retry
  3. DB orphan: escalate before self-heal retry
- Improve classify_task_complexity() with 15 new opus-tier patterns: pre-commit hooks, CI checks/workflows, GitHub Actions, edge cases, enforcement, guards, wiring, end-to-end testing, multi-file changes, module extraction, supervisor work, diff/staged parsing
- Add pre-check for module-level operations that would otherwise match sonnet-tier extract.*function pattern
- Add auto-upgrade safeguard in cmd_add(): when a task has explicit model:sonnet but classify_task_complexity() recommends opus, auto-upgrade to opus with a logged warning. Safety-first: complex tasks should never run on weaker models just because they were manually tagged wrong.

gemini-code-assist · 2026-02-12T13:57:38Z

Summary of Changes


This pull request significantly enhances the robustness and intelligence of the task management system by implementing automatic model escalation for failed tasks and improving the task complexity classification. It ensures that complex tasks are always routed to appropriate, more powerful models, even if initially misconfigured, thereby reducing failures and improving overall system efficiency and reliability.

Highlights

Model Escalation Wiring: The escalate_model_on_failure() function, previously defined but unused, has been integrated into three critical failure paths: task retry, hung worker kill, and database orphan detection. This ensures that tasks automatically escalate to stronger models (e.g., Opus) upon failure before being retried.
Improved Task Complexity Classifier: The classify_task_complexity() function has been enhanced with 15 new Opus-tier patterns, covering common complex tasks like pre-commit hooks, CI workflows, GitHub Actions, and module extraction. A pre-check was also added to correctly classify module-level operations as Opus-tier, preventing misclassification.
Auto-Upgrade Safeguard for Task Dispatch: A new safeguard in cmd_add() automatically upgrades a task's model to Opus if it was explicitly set to Sonnet but the complexity classifier determines it requires an Opus-tier model. This prevents complex tasks from being assigned to underpowered models, logging a warning while prioritizing safety.

Changelog

.agents/scripts/supervisor-helper.sh
- Added a model routing safeguard in cmd_add() to auto-upgrade tasks from Sonnet to Opus if the classifier recommends a stronger model.
- Introduced a pre-check in classify_task_complexity() to correctly identify module-level operations as Opus-tier tasks.
- Expanded the opus_patterns array in classify_task_complexity() with 15 new patterns for improved classification accuracy.
- Wired escalate_model_on_failure() into the task retry logic within cmd_pulse().
- Integrated escalate_model_on_failure() into the hung worker process cleanup logic in cmd_pulse().
- Applied escalate_model_on_failure() to the database orphan task handling in cmd_pulse().


github-actions · 2026-02-12T13:57:59Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 15 code smells

[INFO] Recent monitoring activity:
Thu Feb 12 13:57:54 UTC 2026: Code review monitoring started
Thu Feb 12 13:57:55 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 15

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 15
VULNERABILITIES: 0

Generated on: Thu Feb 12 13:57:57 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

gemini-code-assist

Code Review

This pull request effectively wires up the escalate_model_on_failure function, improves the task classifier, and adds a valuable safeguard to prevent complex tasks from being run with underpowered models. The changes align well with making the supervisor more robust and intelligent. My review includes a few suggestions to improve comment and log clarity, ensure adherence to the repository's shell scripting style guide, and correct a potential logic inconsistency in one of the new failure handling paths.

gemini-code-assist · 2026-02-12T14:02:19Z

.agents/scripts/supervisor-helper.sh

 					attempt_self_heal "$orphan_id" "failed" "No worker process found" "${batch_id:-}" 2>>"$SUPERVISOR_LOG" || true
+					# Auto-escalate model on failure before self-heal retry (t314 wiring)
+					escalate_model_on_failure "$orphan_id" 2>>"$SUPERVISOR_LOG" || true


The call to escalate_model_on_failure for DB orphan tasks occurs after attempt_self_heal. This is inconsistent with the handling of hung workers (line 10477), where escalation happens before the self-heal attempt. The accompanying comment also states the escalation happens 'before self-heal retry'.

To ensure the model is escalated before a retry is queued, and for consistency across failure paths, this call should be moved before attempt_self_heal.

Suggested change

-					attempt_self_heal "$orphan_id" "failed" "No worker process found" "${batch_id:-}" 2>>"$SUPERVISOR_LOG" || true
-					# Auto-escalate model on failure before self-heal retry (t314 wiring)
-					escalate_model_on_failure "$orphan_id" 2>>"$SUPERVISOR_LOG" || true
+					# Auto-escalate model on failure before self-heal retry (t314 wiring)
+					escalate_model_on_failure "$orphan_id" 2>>"$SUPERVISOR_LOG" || true
+					attempt_self_heal "$orphan_id" "failed" "No worker process found" "${batch_id:-}" 2>>"$SUPERVISOR_LOG" || true
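Why the order matters can be shown with two stand-in functions. This is a hypothetical sketch, assuming the self-heal step snapshots the task's current model into the retry it queues; `escalate`, `self_heal`, `MODEL` and `RETRY_QUEUE` are illustrative names, not functions or variables from supervisor-helper.sh. It needs bash 4+.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: escalation must run before the self-heal retry is
# queued if the retry captures the model at enqueue time (assumption).

declare -A MODEL=()
RETRY_QUEUE=()

# Stand-in escalation: sonnet -> opus, anything else unchanged.
escalate() {
	[[ "${MODEL[$1]}" == "sonnet" ]] && MODEL[$1]="opus"
	return 0
}

# Stand-in self-heal: enqueue "task:model" using the model set right now.
self_heal() {
	RETRY_QUEUE+=("$1:${MODEL[$1]}")
}

# Original DB orphan order: heal first, escalate second.
MODEL[t_a]="sonnet"
self_heal t_a
escalate t_a

# Suggested order (matches the hung-worker path): escalate, then heal.
MODEL[t_b]="sonnet"
escalate t_b
self_heal t_b

printf '%s\n' "${RETRY_QUEUE[@]}"
# t_a:sonnet
# t_b:opus
```

Under that assumption, the original order queues the retry on the same model that just failed, and the escalation only takes effect on a later retry.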

gemini-code-assist · 2026-02-12T14:02:20Z

.agents/scripts/supervisor-helper.sh

+	# Model routing safeguard: warn if explicit model:sonnet conflicts with complexity classifier
+	# This catches tasks that are tagged model:sonnet but are actually complex enough for opus.
+	# The warning is logged but the explicit model is respected (user intent takes priority).
+	if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then
+		local auto_tier
+		auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")
+		if [[ "$auto_tier" == "opus" ]]; then
+			log_warn "Task $task_id: explicit model:$model but classifier recommends opus — consider upgrading for complex tasks"
+			# Auto-upgrade to opus when classifier disagrees with explicit sonnet (safety-first)
+			model="opus"
+			log_info "Task $task_id: auto-upgraded to model:opus (classifier override)"


The comment on lines 1981-1983 and the warning log on line 1988 are inconsistent with the code's behavior. The comment states that the user's explicit model choice is respected, but the code proceeds to auto-upgrade it. The log message also suggests upgrading is optional ('consider upgrading') for a mandatory action.

To improve clarity and avoid confusion, please update the comment and log message to accurately reflect that the model is being automatically upgraded based on the classifier's recommendation.

Suggested change

-	# Model routing safeguard: warn if explicit model:sonnet conflicts with complexity classifier
-	# This catches tasks that are tagged model:sonnet but are actually complex enough for opus.
-	# The warning is logged but the explicit model is respected (user intent takes priority).
-	if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then
-		local auto_tier
-		auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")
-		if [[ "$auto_tier" == "opus" ]]; then
-			log_warn "Task $task_id: explicit model:$model but classifier recommends opus — consider upgrading for complex tasks"
-			# Auto-upgrade to opus when classifier disagrees with explicit sonnet (safety-first)
-			model="opus"
-			log_info "Task $task_id: auto-upgraded to model:opus (classifier override)"
+	# Model routing safeguard: auto-upgrade to opus if classifier disagrees with an explicit weaker model.
+	# This prevents complex tasks from being dispatched to underpowered models. A warning is logged.
+	if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then
+		local auto_tier
+		auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")
+		if [[ "$auto_tier" == "opus" ]]; then
+			log_warn "Task $task_id: Explicit model '$model' overridden. Classifier recommended 'opus' due to task complexity."
+			# Auto-upgrade to opus when classifier disagrees with explicit sonnet (safety-first)
+			model="opus"
+			log_info "Task $task_id: auto-upgraded to model:opus (classifier override)"
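A minimal sketch of the safeguard as it reads after the review fixes, with the classifier and loggers stubbed out. `classify_stub`, `resolve_model` and the stub `log_warn`/`log_info` are hypothetical stand-ins; only the guard condition, the stderr redirect to `SUPERVISOR_LOG`, and the override mirror the suggested change.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the cmd_add() model safeguard. classify_stub and
# the log helpers are stand-ins, not the real supervisor-helper.sh functions.

SUPERVISOR_LOG="${SUPERVISOR_LOG:-${TMPDIR:-/tmp}/supervisor-sketch.log}"

log_warn() { echo "[WARN] $*" >&2; }
log_info() { echo "[INFO] $*" >&2; }

# Stand-in classifier: anything mentioning pre-commit or CI is opus-tier.
classify_stub() {
	case "$1" in
		*pre-commit* | *CI*) echo "opus" ;;
		*) echo "sonnet" ;;
	esac
}

# Print the model that should actually be used for a task.
resolve_model() {
	local task_id="$1" description="$2" model="$3"
	if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then
		local auto_tier
		auto_tier=$(classify_stub "$description" 2>>"$SUPERVISOR_LOG" || echo "")
		if [[ "$auto_tier" == "opus" ]]; then
			log_warn "Task $task_id: Explicit model '$model' overridden. Classifier recommended 'opus' due to task complexity."
			model="opus"
			log_info "Task $task_id: auto-upgraded to model:opus (classifier override)"
		fi
	fi
	echo "$model"
}
```

Logging goes to stderr, so `$(resolve_model ...)` captures only the resolved model. A task already on an opus model skips the classifier entirely.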

gemini-code-assist · 2026-02-12T14:02:20Z

.agents/scripts/supervisor-helper.sh

+	# The warning is logged but the explicit model is respected (user intent takes priority).
+	if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then
+		local auto_tier
+		auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")


The use of 2>/dev/null for blanket error suppression violates the repository style guide (rule #50), which states that 2>/dev/null is only acceptable when redirecting to log files. Any potential errors from classify_task_complexity are being silently discarded.

To adhere to the style guide and improve observability, please redirect stderr to the supervisor's log file instead.

Suggested change

-		auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "")
+		auto_tier=$(classify_task_complexity "$description" "" 2>>"$SUPERVISOR_LOG" || echo "")

References

Rule #50: 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression.
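The difference between the two redirections is easy to see with a classifier stand-in that writes a diagnostic and fails. `flaky_classifier` is hypothetical. The point is that `2>>"$SUPERVISOR_LOG"` applies only to the command inside the substitution, so stdout is still captured and the `|| echo ""` fallback still yields an empty tier, while the diagnostic survives in the log instead of being discarded.

```bash
#!/usr/bin/env bash
# Sketch: stderr redirection inside a command substitution.
# flaky_classifier is a hypothetical stand-in that logs an error and fails.

SUPERVISOR_LOG=$(mktemp)
trap 'rm -f "$SUPERVISOR_LOG"' EXIT

flaky_classifier() {
	echo "classifier: pattern file missing" >&2
	return 1
}

# Blanket suppression: the diagnostic is lost for good.
tier_silent=$(flaky_classifier 2>/dev/null || echo "")

# Style-guide form: the diagnostic lands in the log; stdout is still captured.
tier_logged=$(flaky_classifier 2>>"$SUPERVISOR_LOG" || echo "")

echo "silent='$tier_silent' logged='$tier_logged'"
cat "$SUPERVISOR_LOG"
```

Both variables end up empty, but only the second call leaves a trace: the log holds exactly one line, from the redirected call.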

… redirect

- Move escalate_model_on_failure before attempt_self_heal in DB orphan path for consistency with hung worker path (Gemini high-priority)
- Fix comment and log message to accurately reflect auto-upgrade behavior instead of suggesting it's optional (Gemini medium-priority)
- Redirect classify_task_complexity stderr to SUPERVISOR_LOG instead of /dev/null per style guide rule #50 (Gemini medium-priority)

github-actions · 2026-02-12T14:04:11Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 15 code smells

[INFO] Recent monitoring activity:
Thu Feb 12 14:04:07 UTC 2026: Code review monitoring started
Thu Feb 12 14:04:08 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 15

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 15
VULNERABILITIES: 0

Generated on: Thu Feb 12 14:04:10 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-12T14:04:57Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud


gemini-code-assist bot reviewed Feb 12, 2026


marcusquinn merged commit c6013ed into main Feb 12, 2026
10 checks passed

marcusquinn added a commit that referenced this pull request Feb 12, 2026

chore: mark t320 complete (pr:#1257), remove merge conflict duplicate…

4533fdd

…s for t318/t319
