feat: auto-escalate model on worker failure + extend timeouts (t314) by marcusquinn · Pull Request #1215 · marcusquinn/aidevops

marcusquinn · 2026-02-12T03:51:52Z

Summary

- Auto-escalate the task model to the next tier (e.g., sonnet→opus) when workers fail, so retries use a more capable model
- Increase hung worker timeout from 30min to 1h and max runtime from 2h to 4h
- Record failure patterns for future routing decisions

Problem

Workers dispatched with model:sonnet for complex tasks (t311.2, t311.3, t303) consistently hung at the 30min timeout. On retry, the same sonnet model was used again, producing the same failure. The existing quality gate escalation only triggers for completed tasks that fail quality checks — it never fires for hung/crashed workers.

Changes

| Change | Before | After |
|---|---|---|
| `SUPERVISOR_WORKER_TIMEOUT` | 1800s (30min) | 3600s (1h) |
| `SUPERVISOR_WORKER_MAX_RUNTIME` | 7200s (2h) | 14400s (4h) |
| Model on failure | Same model reused | Auto-escalate via `get_next_tier()` |
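To make the two timeout rows concrete, here is a minimal sketch of how the new defaults could be applied. Only the two variable names and their values come from this PR. The `classify_worker` function, its arguments, and the idea that "hung" is measured as seconds since the worker last showed activity are assumptions for illustration, not the supervisor's actual code.

```bash
#!/usr/bin/env bash
# Defaults from this PR; an already-exported value takes precedence.
SUPERVISOR_WORKER_TIMEOUT="${SUPERVISOR_WORKER_TIMEOUT:-3600}"          # hung detection: 1h (was 1800)
SUPERVISOR_WORKER_MAX_RUNTIME="${SUPERVISOR_WORKER_MAX_RUNTIME:-14400}" # max runtime: 4h (was 7200)

# classify_worker IDLE_SECONDS RUNTIME_SECONDS   (hypothetical helper)
# Prints "max_runtime", "hung", or "ok".
# IDLE_SECONDS is assumed to be the time since the worker last made progress.
classify_worker() {
    local idle_seconds="$1" runtime_seconds="$2"
    if (( runtime_seconds >= SUPERVISOR_WORKER_MAX_RUNTIME )); then
        echo "max_runtime"
    elif (( idle_seconds >= SUPERVISOR_WORKER_TIMEOUT )); then
        echo "hung"
    else
        echo "ok"
    fi
}
```

With the new defaults, a worker that has been idle for 45 minutes (`classify_worker 2700 2700`) is still reported as `ok`. Under the old 1800s threshold the same worker would have been treated as hung.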

The new function `escalate_model_on_failure()` is called from `attempt_self_heal()` before diagnostic subtasks are created. It uses the existing `escalation_depth`/`max_escalation` columns (from t132.6) to prevent infinite escalation.
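The depth guard described above can be sketched as follows. This is not the code from the diff: the real function reads and writes the `escalation_depth`/`max_escalation` columns, whereas this sketch takes them as plain arguments. The `get_next_tier` shown here is a stand-in that covers only the two transitions named in this PR (haiku→sonnet, sonnet→opus), and returning non-zero at the top tier is an assumption.

```bash
#!/usr/bin/env bash
# Stand-in for the repository's get_next_tier(); only the two transitions
# mentioned in the PR are modelled. Failing at the top tier is an assumption.
get_next_tier() {
    case "$1" in
        haiku)  echo "sonnet" ;;
        sonnet) echo "opus" ;;
        *)      return 1 ;;
    esac
}

# escalate_model_on_failure MODEL ESCALATION_DEPTH MAX_ESCALATION
# Prints "<next_model> <new_depth>" and returns 0 when escalation is allowed.
# Returns 1 and prints nothing when the depth cap is reached or no higher tier exists.
escalate_model_on_failure() {
    local model="$1" depth="$2" max="$3" next
    if (( depth >= max )); then
        return 1
    fi
    next="$(get_next_tier "$model")" || return 1
    echo "$next $(( depth + 1 ))"
}

# Usage: a sonnet task failing for the first time with a cap of 2 escalations.
if result="$(escalate_model_on_failure sonnet 0 2)"; then
    read -r new_model new_depth <<<"$result"
    echo "retrying with $new_model (depth $new_depth)"
fi
```

Checking the depth before looking up the next tier means a task that has already used its escalation budget is re-queued on its current model rather than climbing indefinitely.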

Testing

- `bash -n` syntax check: PASS
- ShellCheck: no new warnings
- Diff: 90 insertions, 3 deletions

Ref #1212

… (t314)

When workers fail (hung, crashed, max runtime), automatically escalate the task's model to the next tier via get_next_tier() before re-queuing. Previously, retries repeated with the same underpowered model. Now: sonnet failures auto-escalate to opus, haiku to sonnet, etc.

Also doubles worker timeouts:
- Hung detection: 30min -> 1h (SUPERVISOR_WORKER_TIMEOUT)
- Max runtime: 2h -> 4h (SUPERVISOR_WORKER_MAX_RUNTIME)

Complex refactoring tasks (t311.2, t311.3) consistently hit the 30min hung timeout. The new defaults give workers adequate time while still catching truly stuck processes.

gemini-code-assist · 2026-02-12T03:51:56Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai · 2026-02-12T03:52:02Z

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 56 seconds before requesting another review.

github-actions · 2026-02-12T03:52:24Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 15 code smells

[INFO] Recent monitoring activity:
Thu Feb 12 03:52:20 UTC 2026: Code review monitoring started
Thu Feb 12 03:52:20 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 15

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 15
VULNERABILITIES: 0

Generated on: Thu Feb 12 03:52:22 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-12T03:53:04Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

t303 (#1216) and t311.2 (#1218) were branched before t314 (#1215) merged, so their squash-merges overwrote the timeout changes. Restoring:
- SUPERVISOR_WORKER_TIMEOUT: 1800 -> 3600 (1h)
- SUPERVISOR_WORKER_MAX_RUNTIME: 7200 -> 14400 (4h)

marcusquinn merged commit 73703d3 into main Feb 12, 2026
10 of 11 checks passed

marcusquinn added a commit that referenced this pull request Feb 12, 2026

chore: mark t314 complete in TODO.md (#1215)

c90bdf2

github-actions bot mentioned this pull request Feb 12, 2026

t314: Auto-escalate model to opus on worker failure + extend worker timeout #1214

Closed

marcusquinn mentioned this pull request Feb 12, 2026

hotfix: restore t314 timeout values overwritten by concurrent PR merges #1219

Merged

marcusquinn mentioned this pull request Feb 19, 2026

t1248: Fix success rate metric — exclude cancelled tasks from failure count #1983

Merged

marcusquinn deleted the feature/t314 branch February 21, 2026 01:59

marcusquinn merged 1 commit into main from feature/t314
