feat: auto-escalate model on worker failure + extend timeouts (t314) #1215
marcusquinn merged 1 commit into main
… (t314)

When workers fail (hung, crashed, max runtime), automatically escalate the task's model to the next tier via get_next_tier() before re-queuing. Previously, retries repeated with the same underpowered model. Now: sonnet failures auto-escalate to opus, haiku to sonnet, etc.

Also doubles worker timeouts:
- Hung detection: 30min -> 1h (SUPERVISOR_WORKER_TIMEOUT)
- Max runtime: 2h -> 4h (SUPERVISOR_WORKER_MAX_RUNTIME)

Complex refactoring tasks (t311.2, t311.3) consistently hit the 30min hung timeout. The new defaults give workers adequate time while still catching truly stuck processes.
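To illustrate how the two limits interact, here is a minimal sketch of a worker check using the new defaults. The function name `classify_worker`, the use of seconds as the unit, and the idea of passing idle and run times as arguments are all assumptions for illustration; the actual supervisor code may measure and compare these differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the supervisor's real implementation.
# Assumption: both settings are expressed in seconds.
SUPERVISOR_WORKER_TIMEOUT="${SUPERVISOR_WORKER_TIMEOUT:-3600}"          # hung detection: 1h
SUPERVISOR_WORKER_MAX_RUNTIME="${SUPERVISOR_WORKER_MAX_RUNTIME:-14400}" # max runtime: 4h

# classify_worker <seconds_since_last_activity> <seconds_since_start>
# Prints one of: ok, hung, max_runtime
classify_worker() {
  local idle_secs="$1" run_secs="$2"
  if (( run_secs >= SUPERVISOR_WORKER_MAX_RUNTIME )); then
    echo "max_runtime"
  elif (( idle_secs >= SUPERVISOR_WORKER_TIMEOUT )); then
    echo "hung"
  else
    echo "ok"
  fi
}
```

Under these assumptions, a worker that has been idle for 40 minutes (`classify_worker 2400 2400`) is still reported as `ok`, whereas the previous 30min limit would have flagged it as hung.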
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Thu Feb 12 03:52:22 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring



Summary

Problem

Workers dispatched with model:sonnet for complex tasks (t311.2, t311.3, t303) consistently hung at the 30min timeout. On retry, the same sonnet model was used again, producing the same failure. The existing quality gate escalation only triggers for completed tasks that fail quality checks; it never fires for hung or crashed workers.

Changes
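- SUPERVISOR_WORKER_TIMEOUT (hung detection): 30min -> 1h
- SUPERVISOR_WORKER_MAX_RUNTIME (max runtime): 2h -> 4h
- Model on retry: escalated to the next tier via get_next_tier() instead of reusing the same model

New function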
escalate_model_on_failure() is called from attempt_self_heal() before creating diagnostic subtasks. It uses the existing escalation_depth/max_escalation columns (from t132.6) to prevent infinite escalation.

Testing

bash -n syntax check: PASS

Ref #1212
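The escalation guard described under "New function" can be sketched roughly as follows. This is a hypothetical illustration, not the code from this PR: `next_tier_sketch` and `escalate_sketch` are made-up names, the tier order haiku -> sonnet -> opus is taken from the commit message, and the depth and cap are passed as plain arguments rather than read from the escalation_depth/max_escalation columns.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of tier escalation with a depth cap.
# Names and argument passing are assumptions; only the tier order comes from the PR text.

# next_tier_sketch <model>: print the next tier, or nothing if there is none.
next_tier_sketch() {
  case "$1" in
    haiku)  echo "sonnet" ;;
    sonnet) echo "opus" ;;
    *)      echo "" ;;
  esac
}

# escalate_sketch <model> <escalation_depth> <max_escalation>
# Prints "<model> <depth>" to use for the retry.
escalate_sketch() {
  local model="$1" depth="$2" max="$3" next
  next="$(next_tier_sketch "$model")"
  if (( depth < max )) && [[ -n "$next" ]]; then
    # Room to escalate: move up one tier and record the extra depth.
    echo "$next $(( depth + 1 ))"
  else
    # Cap reached or already at the top tier: retry with the same model.
    echo "$model $depth"
  fi
}
```

For example, `escalate_sketch sonnet 0 2` prints `opus 1`, while `escalate_sketch sonnet 2 2` prints `sonnet 2` because the cap has been reached. Keeping the depth check next to the tier lookup is what stops a repeatedly failing task from escalating forever.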