t233: Pre-dispatch model availability check with distinct exit codes#959
t233: Pre-dispatch model availability check with distinct exit codes#959marcusquinn merged 2 commits intomainfrom
Conversation
…(t233) - Enhanced check_model_health() to propagate exit codes from model-availability-helper.sh: exit 0=healthy, 1=unavailable, 2=rate-limited, 3=invalid-key/credits (previously collapsed all failures to exit 1) - Updated cmd_dispatch() to handle distinct codes with specific log messages - Updated cmd_reprompt() to handle distinct codes (still returns EX_TEMPFAIL=75) - Added missing health check to dispatch_review_fix_worker() (previously had none) - Added secondary provider check after model resolution in cmd_dispatch() (catches case where health tier provider differs from dispatch model provider) - Slow-path probe now distinguishes credits/billing (exit 3) from quota/rate-limit (exit 2) from generic provider errors (exit 1) Prevents wasted dispatch attempts when provider is rate-limited or down. Root cause: 9 failures from ambiguous_ai_unavailable + backend_quota_error were caused by workers spawned against unavailable providers.
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the robustness of AI model dispatch and retry mechanisms by introducing more granular availability checks. Previously, various provider failures were collapsed into a single error, leading to wasted worker attempts against unavailable or rate-limited AI services. The changes now allow the system to distinguish between transient issues like rate limiting, persistent problems like invalid API keys, and general unavailability, enabling more intelligent deferral or blocking of tasks. This reduces unnecessary resource consumption and provides clearer diagnostics for operational issues. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Tue Feb 10 17:50:41 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
There was a problem hiding this comment.
Code Review
This pull request is a significant improvement to the supervisor's robustness by introducing a pre-dispatch model availability check with distinct exit codes. Propagating specific failure reasons like rate-limiting and invalid API keys from check_model_health() will prevent wasting worker resources on doomed dispatches. The updates to cmd_dispatch(), cmd_reprompt(), and the addition of a missing check in dispatch_review_fix_worker() are well-executed and clearly address the problems outlined. My review includes a few suggestions to improve consistency and fully align the changes with the PR's goal of handling distinct exit codes, particularly in the new secondary provider check. I've also noted a few violations of the repository's shell scripting style guide regarding error stream handling. Overall, this is a valuable change that hardens the dispatch logic.
.agents/scripts/supervisor-helper.sh
Outdated
| "$availability_helper" check "$resolved_provider" --quiet 2>/dev/null || resolved_avail_exit=$? | ||
| if [[ "$resolved_avail_exit" -ne 0 ]]; then | ||
| log_warn "Resolved model provider '$resolved_provider' unavailable (exit $resolved_avail_exit) for $task_id — deferring dispatch" | ||
| return 3 | ||
| fi |
There was a problem hiding this comment.
This secondary availability check has two issues:
- It uses
2>/dev/nullfor blanket error suppression, which violates the repository style guide (rule docs: update branch creation to recommend worktrees for parallel sessions #50). - It doesn't differentiate between failure types from the availability helper (e.g., rate-limit vs. invalid key), treating all non-zero exits as "unavailable". This undermines the main goal of this PR, which is to propagate and handle distinct exit codes.
The logic should be expanded to handle distinct exit codes with specific logging, and stderr should not be blanket suppressed.
| "$availability_helper" check "$resolved_provider" --quiet 2>/dev/null || resolved_avail_exit=$? | |
| if [[ "$resolved_avail_exit" -ne 0 ]]; then | |
| log_warn "Resolved model provider '$resolved_provider' unavailable (exit $resolved_avail_exit) for $task_id — deferring dispatch" | |
| return 3 | |
| fi | |
| "$availability_helper" check "$resolved_provider" --quiet || resolved_avail_exit=$? | |
| if [[ "$resolved_avail_exit" -ne 0 ]]; then | |
| case "$resolved_avail_exit" in | |
| 2) | |
| log_warn "Resolved model provider '$resolved_provider' is rate-limited (exit $resolved_avail_exit) for $task_id — deferring dispatch" | |
| ;; | |
| 3) | |
| log_error "Resolved model provider '$resolved_provider' has invalid key/credits (exit $resolved_avail_exit) for $task_id — blocking dispatch" | |
| ;; | |
| *) | |
| log_warn "Resolved model provider '$resolved_provider' unavailable (exit $resolved_avail_exit) for $task_id — deferring dispatch" | |
| ;; | |
| esac | |
| return 3 | |
| fi |
References
- Rule docs: update branch creation to recommend worktrees for parallel sessions #50:
2>/dev/nullis acceptable ONLY when redirecting to log files, not blanket suppression. The code uses2>/dev/nullto suppress potential errors from the helper script without logging. (link) - In shell scripts, capture a command's exit code in a variable instead of using
$?directly in conditionals. This aligns with ShellCheck SC2181 and improves clarity for multi-way branches.
There was a problem hiding this comment.
Fixed in ed1fc41: Removed blanket 2>/dev/null and expanded the secondary availability check to handle distinct exit codes (2=rate-limited, 3=invalid-key/credits, *=generic unavailable) with specific logging for each. This now correctly propagates the exit code semantics that are the core goal of t233.
.agents/scripts/supervisor-helper.sh
Outdated
| if echo "$probe_result" | grep -qiE 'CreditsError|Insufficient balance' 2>/dev/null; then | ||
| log_warn "Model health check FAILED: billing/credits exhausted (slow path)" | ||
| return 3 # t233: credits = invalid key equivalent (won't resolve without human action) | ||
| fi | ||
| if echo "$probe_result" | grep -qiE 'Quota protection|over[_ -]?usage|quota reset|429|too many requests|rate.limit' 2>/dev/null; then | ||
| log_warn "Model health check FAILED: quota/rate limited (slow path)" | ||
| return 2 # t233: rate-limited = defer dispatch, retry soon | ||
| fi | ||
| if echo "$probe_result" | grep -qiE 'endpoints failed|"status":[[:space:]]*503|HTTP 503|503 Service|service unavailable' 2>/dev/null; then | ||
| log_warn "Model health check FAILED: provider error detected (slow path)" | ||
| return 1 | ||
| fi |
There was a problem hiding this comment.
The grep commands in this block use 2>/dev/null to suppress stderr. This is a violation of the repository style guide (rule #50), which prohibits blanket error suppression and allows 2>/dev/null only when redirecting to log files. While the risk of grep erroring here is low, adhering to the style guide improves script robustness and debuggability. Please consider removing the suppression or redirecting stderr to a log if appropriate.
References
- Rule docs: update branch creation to recommend worktrees for parallel sessions #50:
2>/dev/nullis acceptable ONLY when redirecting to log files, not blanket suppression. The code uses2>/dev/nullto suppress potential errors fromgrepwithout logging. (link)
There was a problem hiding this comment.
Fixed in ed1fc41: Removed 2>/dev/null from all three grep commands in the failure-pattern detection block. These pipe from echo so stderr errors are unlikely, but removing the suppression aligns with the style guide and improves debuggability.
| case "$health_exit" in | ||
| 2) log_warn "Provider rate-limited for $task_id review-fix — deferring to next pulse" ;; | ||
| 3) log_error "API key invalid/credits exhausted for $task_id review-fix" ;; | ||
| *) log_error "Provider unavailable for $task_id review-fix — deferring" ;; | ||
| esac |
There was a problem hiding this comment.
This case statement uses a compact, single-line format for each branch. For consistency with other case statements in this file and for better readability, please format it using multiple lines.
| case "$health_exit" in | |
| 2) log_warn "Provider rate-limited for $task_id review-fix — deferring to next pulse" ;; | |
| 3) log_error "API key invalid/credits exhausted for $task_id review-fix" ;; | |
| *) log_error "Provider unavailable for $task_id review-fix — deferring" ;; | |
| esac | |
| case "$health_exit" in | |
| 2) | |
| log_warn "Provider rate-limited for $task_id review-fix — deferring to next pulse" | |
| ;; | |
| 3) | |
| log_error "API key invalid/credits exhausted for $task_id review-fix" | |
| ;; | |
| *) | |
| log_error "Provider unavailable for $task_id review-fix — deferring" | |
| ;; | |
| esac |
There was a problem hiding this comment.
Fixed in ed1fc41: Reformatted the compact single-line case branches to multi-line format, consistent with the rest of the file (e.g., the dispatch_worker case statement at line ~4615).
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Tue Feb 10 18:00:58 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|



Summary
check_model_health()to propagate distinct exit codes frommodel-availability-helper.sh(0=healthy, 1=unavailable, 2=rate-limited, 3=invalid-key) instead of collapsing all failures to exit 1cmd_dispatch()andcmd_reprompt()to handle distinct codes with specific log messages and appropriate defer/block behaviordispatch_review_fix_worker()(previously spawned workers with no availability check at all)cmd_dispatch()to catch cases where the dispatch model's provider differs from the health tier providerProblem
9 wasted worker failures from
ambiguous_ai_unavailable+backend_quota_errorcaused by workers being spawned against unavailable/rate-limited providers. The root cause wascheck_model_health()collapsing exit codes 2 (rate-limited) and 3 (invalid-key) from the availability helper into a singlereturn 1, preventing callers from distinguishing transient rate-limits from permanent key failures.Additionally,
dispatch_review_fix_worker()had no health check at all, spawning review-fix workers blindly regardless of provider status.Changes
check_model_health()cmd_dispatch()cmd_reprompt()dispatch_review_fix_worker()Testing
bash -nsyntax check: PASSshellcheck --severity=warning: No new warnings (all existing are pre-existing SC2034 unused vars in unrelated functions)if ! check_model_healthstill work since non-zero is still falsy)