-
Notifications
You must be signed in to change notification settings - Fork 6
fix(t320): wire escalate_model_on_failure, improve classifier, add auto-upgrade safeguard #1257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…s, improve classifier, add auto-upgrade safeguard - Wire escalate_model_on_failure() (t314) into 3 failure paths where it was defined but never called (dead code): 1. Retry path: escalate before re-prompt so retry uses stronger model 2. Hung worker kill: escalate before self-heal retry 3. DB orphan: escalate before self-heal retry - Improve classify_task_complexity() with 15 new opus-tier patterns: pre-commit hooks, CI checks/workflows, GitHub Actions, edge cases, enforcement, guards, wiring, end-to-end testing, multi-file changes, module extraction, supervisor work, diff/staged parsing - Add pre-check for module-level operations that would otherwise match sonnet-tier extract.*function pattern - Add auto-upgrade safeguard in cmd_add(): when a task has explicit model:sonnet but classify_task_complexity() recommends opus, auto-upgrade to opus with a logged warning. Safety-first: complex tasks should never run on weaker models just because they were manually tagged wrong.
Summary of ChangesHello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness and intelligence of the task management system by implementing automatic model escalation for failed tasks and improving the task complexity classification. It ensures that complex tasks are always routed to appropriate, more powerful models, even if initially misconfigured, thereby reducing failures and improving overall system efficiency and reliability. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Thu Feb 12 13:57:57 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively wires up the escalate_model_on_failure function, improves the task classifier, and adds a valuable safeguard to prevent complex tasks from being run with underpowered models. The changes align well with making the supervisor more robust and intelligent. My review includes a few suggestions to improve comment and log clarity, ensure adherence to the repository's shell scripting style guide, and correct a potential logic inconsistency in one of the new failure handling paths.
.agents/scripts/supervisor-helper.sh
Outdated
| attempt_self_heal "$orphan_id" "failed" "No worker process found" "${batch_id:-}" 2>>"$SUPERVISOR_LOG" || true | ||
| # Auto-escalate model on failure before self-heal retry (t314 wiring) | ||
| escalate_model_on_failure "$orphan_id" 2>>"$SUPERVISOR_LOG" || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to escalate_model_on_failure for DB orphan tasks occurs after attempt_self_heal. This is inconsistent with the handling of hung workers (line 10477), where escalation happens before the self-heal attempt. The accompanying comment also states the escalation happens 'before self-heal retry'.
To ensure the model is escalated before a retry is queued, and for consistency across failure paths, this call should be moved before attempt_self_heal.
| attempt_self_heal "$orphan_id" "failed" "No worker process found" "${batch_id:-}" 2>>"$SUPERVISOR_LOG" || true | |
| # Auto-escalate model on failure before self-heal retry (t314 wiring) | |
| escalate_model_on_failure "$orphan_id" 2>>"$SUPERVISOR_LOG" || true | |
| # Auto-escalate model on failure before self-heal retry (t314 wiring) | |
| escalate_model_on_failure "$orphan_id" 2>>"$SUPERVISOR_LOG" || true | |
| attempt_self_heal "$orphan_id" "failed" "No worker process found" "${batch_id:-}" 2>>"$SUPERVISOR_LOG" || true |
.agents/scripts/supervisor-helper.sh
Outdated
| # Model routing safeguard: warn if explicit model:sonnet conflicts with complexity classifier | ||
| # This catches tasks that are tagged model:sonnet but are actually complex enough for opus. | ||
| # The warning is logged but the explicit model is respected (user intent takes priority). | ||
| if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then | ||
| local auto_tier | ||
| auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "") | ||
| if [[ "$auto_tier" == "opus" ]]; then | ||
| log_warn "Task $task_id: explicit model:$model but classifier recommends opus — consider upgrading for complex tasks" | ||
| # Auto-upgrade to opus when classifier disagrees with explicit sonnet (safety-first) | ||
| model="opus" | ||
| log_info "Task $task_id: auto-upgraded to model:opus (classifier override)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on lines 1981-1983 and the warning log on line 1988 are inconsistent with the code's behavior. The comment states that the user's explicit model choice is respected, but the code proceeds to auto-upgrade it. The log message also suggests upgrading is optional ('consider upgrading') for a mandatory action.
To improve clarity and avoid confusion, please update the comment and log message to accurately reflect that the model is being automatically upgraded based on the classifier's recommendation.
| # Model routing safeguard: warn if explicit model:sonnet conflicts with complexity classifier | |
| # This catches tasks that are tagged model:sonnet but are actually complex enough for opus. | |
| # The warning is logged but the explicit model is respected (user intent takes priority). | |
| if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then | |
| local auto_tier | |
| auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "") | |
| if [[ "$auto_tier" == "opus" ]]; then | |
| log_warn "Task $task_id: explicit model:$model but classifier recommends opus — consider upgrading for complex tasks" | |
| # Auto-upgrade to opus when classifier disagrees with explicit sonnet (safety-first) | |
| model="opus" | |
| log_info "Task $task_id: auto-upgraded to model:opus (classifier override)" | |
| # Model routing safeguard: auto-upgrade to opus if classifier disagrees with an explicit weaker model. | |
| # This prevents complex tasks from being dispatched to underpowered models. A warning is logged. | |
| if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then | |
| local auto_tier | |
| auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "") | |
| if [[ "$auto_tier" == "opus" ]]; then | |
| log_warn "Task $task_id: Explicit model '$model' overridden. Classifier recommended 'opus' due to task complexity." | |
| # Auto-upgrade to opus when classifier disagrees with explicit sonnet (safety-first) | |
| model="opus" | |
| log_info "Task $task_id: auto-upgraded to model:opus (classifier override)" |
.agents/scripts/supervisor-helper.sh
Outdated
| # The warning is logged but the explicit model is respected (user intent takes priority). | ||
| if [[ -n "$description" && "$model" != "anthropic/claude-opus-4-6" && "$model" != "opus" ]]; then | ||
| local auto_tier | ||
| auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of 2>/dev/null for blanket error suppression violates the repository style guide (rule #50), which states that 2>/dev/null is only acceptable when redirecting to log files. Any potential errors from classify_task_complexity are being silently discarded.
To adhere to the style guide and improve observability, please redirect stderr to the supervisor's log file instead.
| auto_tier=$(classify_task_complexity "$description" "" 2>/dev/null || echo "") | |
| auto_tier=$(classify_task_complexity "$description" "" 2>>"$SUPERVISOR_LOG" || echo "") |
References
- Rule docs: update branch creation to recommend worktrees for parallel sessions #50:
2>/dev/nullis acceptable ONLY when redirecting to log files, not blanket suppression. (link)
… redirect - Move escalate_model_on_failure before attempt_self_heal in DB orphan path for consistency with hung worker path (Gemini high-priority) - Fix comment and log message to accurately reflect auto-upgrade behavior instead of suggesting it's optional (Gemini medium-priority) - Redirect classify_task_complexity stderr to SUPERVISOR_LOG instead of /dev/null per style guide rule #50 (Gemini medium-priority)
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Thu Feb 12 14:04:10 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|



Summary
escalate_model_on_failure()(t314) was defined but never called. Now wired into all 3 failure paths (retry, hung worker kill, DB orphan) so failed tasks auto-escalate to stronger models before retryclassify_task_complexity()for pre-commit hooks, CI workflows, GitHub Actions, enforcement tasks, module extraction, supervisor work, etc. Added pre-check for module-level operations that would otherwise match sonnet-tier patternscmd_add()receives a task with explicitmodel:sonnetbut the classifier recommends opus, it auto-upgrades to opus with a logged warning. This prevents complex tasks from being dispatched to underpowered modelsWhat this ensures going forward
Testing
bash -nsyntax check: PASSshellcheck -S warning: Only pre-existing SC2034 warnings (unused variables)Closes #1256