-
Notifications
You must be signed in to change notification settings - Fork 6
fix(supervisor): health check 503 false positive on JSON timestamps #488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The bare '503' pattern in check_model_health() matched inside JSON
timestamps (e.g. 1770503633269), causing every health probe to fail
with 'provider error detected' even when the model responded correctly.
Replaced with anchored patterns ('HTTP 503', '503 Service',
'"status": 503') that only match actual HTTP 503 errors. Also
tightened 'over.*usage' to 'over[_ -]?usage' to prevent greedy
matching across JSON fields.
Verified: 5 test cases (2 true positives, 3 true negatives).
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
WalkthroughEnhanced the model health probe failure detection pattern in Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Feb 7 22:54:04 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
TODO.md (1)
58-64:⚠️ Potential issue | 🟡 MinorTask hierarchy inconsistency: parent marked complete while subtasks remain pending.
Task t148 is marked as completed (
[x]withcompleted:2026-02-07), but all six subtasks (t148.1 through t148.6) remain uncompleted ([ ]). This violates task tracking hierarchy rules where a parent task should only be marked complete when all its subtasks are done.Either the subtasks should be marked complete if the work is finished, or the parent task should remain uncompleted until the subtasks are addressed.
🔧 Proposed fix
Option 1: If subtasks are actually complete, mark them:
-- [ ] t148.1 Add check_review_threads() to fetch unresolved threads via GraphQL ~1h blocked-by:none -- [ ] t148.2 Add triage_review_feedback() to classify threads by severity ~1.5h blocked-by:t148.1 -- [ ] t148.3 Add review_triage state to supervisor state machine ~30m blocked-by:t148.1 -- [ ] t148.4 Modify cmd_pr_lifecycle to include triage before merge ~1h blocked-by:t148.2,t148.3 -- [ ] t148.5 Add worker dispatch for fixing valid review feedback ~1.5h blocked-by:t148.4 -- [ ] t148.6 Add --skip-review-triage emergency bypass flag ~15m blocked-by:t148.4 +- [x] t148.1 Add check_review_threads() to fetch unresolved threads via GraphQL ~1h blocked-by:none completed:2026-02-07 +- [x] t148.2 Add triage_review_feedback() to classify threads by severity ~1.5h blocked-by:t148.1 completed:2026-02-07 +- [x] t148.3 Add review_triage state to supervisor state machine ~30m blocked-by:t148.1 completed:2026-02-07 +- [x] t148.4 Modify cmd_pr_lifecycle to include triage before merge ~1h blocked-by:t148.2,t148.3 completed:2026-02-07 +- [x] t148.5 Add worker dispatch for fixing valid review feedback ~1.5h blocked-by:t148.4 completed:2026-02-07 +- [x] t148.6 Add --skip-review-triage emergency bypass flag ~15m blocked-by:t148.4 completed:2026-02-07Option 2: If subtasks are not complete, unmark the parent:
-- [x] t148 Supervisor: add review-triage phase before PR merge `#plan` `#orchestration` `#quality` → [todo/PLANS.md] ~6h (ai:4h test:1.5h read:30m) logged:2026-02-07 ref:GH#437 completed:2026-02-07 +- [ ] t148 Supervisor: add review-triage phase before PR merge `#plan` `#orchestration` `#quality` → [todo/PLANS.md] ~6h (ai:4h test:1.5h read:30m) logged:2026-02-07 ref:GH#437
🤖 Fix all issues with AI agents
In @.agents/scripts/supervisor-helper.sh:
- Around line 1810-1813: The grep pattern in the probe check that inspects
"$probe_result" uses the non-portable escape \s* (see the if ... grep -qiE
line); change that to the POSIX-portable character class [[:space:]]* so the
pattern becomes portable on BSD/macOS and GNU grep while preserving the intended
matching of optional whitespace around the "status" JSON field.
Address CodeRabbit review: \s is GNU grep only, [[:space:]] works on macOS/BSD/Linux. Also resolves merge conflict with main.
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Feb 7 23:14:40 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
…agnostic subtask newline parsing Bug 1: evaluate_worker() checked backend_error_count before exit_code==0, causing tasks with clean exits but backend error strings in log content to be classified as retry:backend_infrastructure_error. Moved backend error check inside the non-zero exit code guard (Tier 2 heuristics). Also moved backend_error_count grep to tail-only search with anchored 503 pattern (consistent with health check fix in PR #488). Bug 2: create_diagnostic_subtask() embedded raw log tail (with newlines) into the task description stored in SQLite. When cmd_next returned this as tab-separated output, embedded newlines caused EXIT:0 and DIAGNOSTIC_CONTEXT_END to be parsed as separate task IDs. Fixed by replacing newlines/tabs with spaces in failure_context before storage. Defense: Added task ID validation guard in both pulse dispatch loops to skip malformed IDs (empty, containing spaces/colons, non-alphanumeric).
…agnostic subtask newline parsing (#490) Bug 1: evaluate_worker() checked backend_error_count before exit_code==0, causing tasks with clean exits but backend error strings in log content to be classified as retry:backend_infrastructure_error. Moved backend error check inside the non-zero exit code guard (Tier 2 heuristics). Also moved backend_error_count grep to tail-only search with anchored 503 pattern (consistent with health check fix in PR #488). Bug 2: create_diagnostic_subtask() embedded raw log tail (with newlines) into the task description stored in SQLite. When cmd_next returned this as tab-separated output, embedded newlines caused EXIT:0 and DIAGNOSTIC_CONTEXT_END to be parsed as separate task IDs. Fixed by replacing newlines/tabs with spaces in failure_context before storage. Defense: Added task ID validation guard in both pulse dispatch loops to skip malformed IDs (empty, containing spaces/colons, non-alphanumeric).



Summary
503pattern incheck_model_health()matched inside JSON timestamps (e.g.1770503633269), causing every health probe to report provider error even when the model responded correctlyHTTP 503,503 Service,"status": 503) that only match actual HTTP errorsover.*usagetoover[_ -]?usageto prevent greedy matching across JSON fieldsRoot Cause
opencode run --format jsonreturns JSON lines with Unix timestamps. The timestamp1770503633269contains the substring503, which the oldgrep -qi '503'pattern matched.Testing
5 test cases verified:
1770503...→ no match (fixed)HTTP 503 Service Unavailable→ match (correct)All Antigravity endpoints failed→ match (correct)Quota protection: over usage→ match (correct)Summary by CodeRabbit