t233: Pre-dispatch model availability check with distinct exit codes #959

Merged
marcusquinn merged 2 commits into main from feature/t233
Feb 10, 2026
@marcusquinn (Owner)

Summary

  • Enhanced check_model_health() to propagate distinct exit codes from model-availability-helper.sh (0=healthy, 1=unavailable, 2=rate-limited, 3=invalid-key) instead of collapsing all failures to exit 1
  • Updated cmd_dispatch() and cmd_reprompt() to handle distinct codes with specific log messages and appropriate defer/block behavior
  • Added missing health check to dispatch_review_fix_worker() (previously spawned workers with no availability check at all)
  • Added secondary provider check after model resolution in cmd_dispatch() to catch cases where the dispatch model's provider differs from the health tier provider

Problem

Nine worker failures (ambiguous_ai_unavailable + backend_quota_error) were wasted because workers were spawned against unavailable or rate-limited providers. The root cause was check_model_health() collapsing exit codes 2 (rate-limited) and 3 (invalid-key) from the availability helper into a single return 1, which prevented callers from distinguishing transient rate limits from permanent key failures.

Additionally, dispatch_review_fix_worker() had no health check at all, spawning review-fix workers blindly regardless of provider status.
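
The difference between collapsing and propagating can be sketched as follows. This is a minimal illustration, not the actual check_model_health() code: `fake_helper`, `check_collapsed`, and `check_propagated` are hypothetical names, and `fake_helper` only stands in for model-availability-helper.sh.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for model-availability-helper.sh:
# it simply returns the exit code it is given (0, 1, 2 or 3).
fake_helper() {
    return "$1"
}

# Old behaviour (sketch): every failure collapses to exit 1.
check_collapsed() {
    if ! fake_helper "$1"; then
        return 1
    fi
    return 0
}

# New behaviour (sketch): capture the helper's exit code and pass it through.
check_propagated() {
    local rc=0
    fake_helper "$1" || rc=$?
    return "$rc"
}

for code in 0 1 2 3; do
    collapsed=0
    propagated=0
    check_collapsed "$code" || collapsed=$?
    check_propagated "$code" || propagated=$?
    echo "helper=$code collapsed=$collapsed propagated=$propagated"
done
```

With the collapsed variant, helper codes 2 and 3 both surface as 1, so a caller cannot tell a rate limit from an invalid key.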

Changes

| Function | Change |
| --- | --- |
| check_model_health() | Propagates exit 2 (rate-limited) and exit 3 (invalid-key) from availability helper; slow-path probe now distinguishes credits/billing (exit 3) from quota (exit 2) from generic errors (exit 1) |
| cmd_dispatch() | Handles distinct exit codes with case statement; adds secondary availability check for resolved model's provider when it differs from health tier |
| cmd_reprompt() | Handles distinct exit codes with case statement; still returns EX_TEMPFAIL (75) to avoid double-incrementing retry counter |
| dispatch_review_fix_worker() | New: pre-dispatch availability check (was completely missing) |
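
As a rough sketch of the caller-side pattern described for cmd_reprompt(): log a distinct message per code, then return a single EX_TEMPFAIL. The function name `handle_health_exit`, the message wording, and the assumption that every failure maps to EX_TEMPFAIL are illustrative only; the real function is not shown in this PR description.

```bash
#!/usr/bin/env bash
# EX_TEMPFAIL as stated in this PR (75, the sysexits.h value).
EX_TEMPFAIL=75

# Hypothetical caller-side handler: one log line per health exit code,
# then a single "temporary failure" return for any non-zero code.
handle_health_exit() {
    local health_exit="$1"
    local task_id="$2"
    case "$health_exit" in
    0)
        return 0
        ;;
    2)
        echo "WARN: provider rate-limited for $task_id, deferring" >&2
        ;;
    3)
        echo "ERROR: API key invalid or credits exhausted for $task_id" >&2
        ;;
    *)
        echo "WARN: provider unavailable for $task_id, deferring" >&2
        ;;
    esac
    return "$EX_TEMPFAIL"
}

# Usage: a rate-limited provider is logged as such, but the caller sees 75.
rc=0
handle_health_exit 2 "t233" || rc=$?
echo "handler returned $rc"
```

Keeping the return value uniform while varying only the log line lets the retry accounting stay unchanged.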

Testing

  • bash -n syntax check: PASS
  • shellcheck --severity=warning: No new warnings (all existing are pre-existing SC2034 unused vars in unrelated functions)
  • No functional test changes needed — the exit code propagation is backward-compatible (callers that used if ! check_model_health still work since non-zero is still falsy)
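
The backward-compatibility claim can be checked with a small sketch. The `check_model_health` stub below is hypothetical and always reports "rate-limited"; `legacy_caller` and `new_caller` are illustrative names, not functions from the script.

```bash
#!/usr/bin/env bash
# Hypothetical stub: pretend the provider is rate-limited (exit 2).
check_model_health() {
    return 2
}

# Legacy caller: only distinguishes success from failure.
legacy_caller() {
    if ! check_model_health; then
        echo "unhealthy"
        return 1
    fi
    echo "healthy"
}

# Newer caller: captures the exit code so it can branch on it.
new_caller() {
    local health_exit=0
    check_model_health || health_exit=$?
    echo "exit=$health_exit"
}

legacy_caller || true # prints "unhealthy"
new_caller            # prints "exit=2"
```

Because `if !` treats any non-zero status as failure, the legacy caller behaves the same whether the function returns 1, 2, or 3.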

…(t233)

- Enhanced check_model_health() to propagate exit codes from model-availability-helper.sh:
  exit 0=healthy, 1=unavailable, 2=rate-limited, 3=invalid-key/credits
  (previously collapsed all failures to exit 1)
- Updated cmd_dispatch() to handle distinct codes with specific log messages
- Updated cmd_reprompt() to handle distinct codes (still returns EX_TEMPFAIL=75)
- Added missing health check to dispatch_review_fix_worker() (previously had none)
- Added secondary provider check after model resolution in cmd_dispatch()
  (catches case where health tier provider differs from dispatch model provider)
- Slow-path probe now distinguishes credits/billing (exit 3) from quota/rate-limit
  (exit 2) from generic provider errors (exit 1)

Prevents wasted dispatch attempts when provider is rate-limited or down.
Root cause: 9 failures from ambiguous_ai_unavailable + backend_quota_error
were caused by workers spawned against unavailable providers.
@coderabbitai (Contributor)

coderabbitai bot commented Feb 10, 2026

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 59 seconds before requesting another review.


@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness of AI model dispatch and retry mechanisms by introducing more granular availability checks. Previously, various provider failures were collapsed into a single error, leading to wasted worker attempts against unavailable or rate-limited AI services. The changes now allow the system to distinguish between transient issues like rate limiting, persistent problems like invalid API keys, and general unavailability, enabling more intelligent deferral or blocking of tasks. This reduces unnecessary resource consumption and provides clearer diagnostics for operational issues.

Highlights

  • Enhanced Model Health Check: The check_model_health() function now propagates distinct exit codes (0=healthy, 1=unavailable, 2=rate-limited, 3=invalid-key) from model-availability-helper.sh, providing more granular status information.
  • Improved Dispatch and Reprompt Handling: The cmd_dispatch() and cmd_reprompt() functions have been updated to interpret these distinct exit codes, allowing for specific log messages and appropriate deferral or blocking behavior based on the availability status.
  • Added Health Check for Review Fix Workers: A previously missing pre-dispatch availability check has been added to dispatch_review_fix_worker(), preventing the blind spawning of workers against unavailable or rate-limited providers.
  • Secondary Provider Availability Check: A new check was introduced in cmd_dispatch() to verify the availability of the resolved model's specific provider, especially when it differs from the initial health tier provider.

Changelog

  • .agents/scripts/supervisor-helper.sh
    • Updated check_model_health() to propagate distinct exit codes (2 for rate-limited, 3 for invalid API key) from the availability helper.
    • Modified check_model_health()'s slow path to differentiate between billing/credits exhaustion (exit 3), quota/rate limiting (exit 2), and generic provider errors (exit 1).
    • Refactored cmd_dispatch() to handle the new distinct exit codes from check_model_health() using a case statement for specific logging and return behavior.
    • Added a secondary availability check in cmd_dispatch() to verify the resolved model's provider if it differs from the health tier provider.
    • Updated cmd_reprompt() to process the distinct exit codes from check_model_health() with tailored log messages, while maintaining EX_TEMPFAIL return for transient issues.
    • Implemented a pre-dispatch availability check in dispatch_review_fix_worker() to prevent spawning workers when the AI provider is unavailable.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 36 code smells

[INFO] Recent monitoring activity:
Tue Feb 10 17:50:38 UTC 2026: Code review monitoring started
Tue Feb 10 17:50:39 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 36

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 36
  • VULNERABILITIES: 0

Generated on: Tue Feb 10 17:50:41 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@gemini-code-assist (bot) left a comment

Code Review

This pull request is a significant improvement to the supervisor's robustness by introducing a pre-dispatch model availability check with distinct exit codes. Propagating specific failure reasons like rate-limiting and invalid API keys from check_model_health() will prevent wasting worker resources on doomed dispatches. The updates to cmd_dispatch(), cmd_reprompt(), and the addition of a missing check in dispatch_review_fix_worker() are well-executed and clearly address the problems outlined. My review includes a few suggestions to improve consistency and fully align the changes with the PR's goal of handling distinct exit codes, particularly in the new secondary provider check. I've also noted a few violations of the repository's shell scripting style guide regarding error stream handling. Overall, this is a valuable change that hardens the dispatch logic.

Comment on lines 4727 to 4731

    "$availability_helper" check "$resolved_provider" --quiet 2>/dev/null || resolved_avail_exit=$?
    if [[ "$resolved_avail_exit" -ne 0 ]]; then
        log_warn "Resolved model provider '$resolved_provider' unavailable (exit $resolved_avail_exit) for $task_id — deferring dispatch"
        return 3
    fi

high

This secondary availability check has two issues:

  1. It uses 2>/dev/null for blanket error suppression, which violates the repository style guide (rule #50).
  2. It doesn't differentiate between failure types from the availability helper (e.g., rate-limit vs. invalid key), treating all non-zero exits as "unavailable". This undermines the main goal of this PR, which is to propagate and handle distinct exit codes.

The logic should be expanded to handle distinct exit codes with specific logging, and stderr should not be blanket suppressed.

Suggested change

    - "$availability_helper" check "$resolved_provider" --quiet 2>/dev/null || resolved_avail_exit=$?
    - if [[ "$resolved_avail_exit" -ne 0 ]]; then
    -     log_warn "Resolved model provider '$resolved_provider' unavailable (exit $resolved_avail_exit) for $task_id — deferring dispatch"
    -     return 3
    - fi
    + "$availability_helper" check "$resolved_provider" --quiet || resolved_avail_exit=$?
    + if [[ "$resolved_avail_exit" -ne 0 ]]; then
    +     case "$resolved_avail_exit" in
    +     2)
    +         log_warn "Resolved model provider '$resolved_provider' is rate-limited (exit $resolved_avail_exit) for $task_id — deferring dispatch"
    +         ;;
    +     3)
    +         log_error "Resolved model provider '$resolved_provider' has invalid key/credits (exit $resolved_avail_exit) for $task_id — blocking dispatch"
    +         ;;
    +     *)
    +         log_warn "Resolved model provider '$resolved_provider' unavailable (exit $resolved_avail_exit) for $task_id — deferring dispatch"
    +         ;;
    +     esac
    +     return 3
    + fi

References
  1. Rule #50: 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. The code uses 2>/dev/null to suppress potential errors from the helper script without logging.
  2. In shell scripts, capture a command's exit code in a variable instead of using $? directly in conditionals. This aligns with ShellCheck SC2181 and improves clarity for multi-way branches.

@marcusquinn (Owner, Author) replied:

Fixed in ed1fc41: Removed blanket 2>/dev/null and expanded the secondary availability check to handle distinct exit codes (2=rate-limited, 3=invalid-key/credits, *=generic unavailable) with specific logging for each. This now correctly propagates the exit code semantics that are the core goal of t233.
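
For readers following the thread, the secondary check can be sketched roughly as below. Everything here is an assumption made for illustration: `availability_check` is a stub standing in for the helper invocation, the provider names are invented, and `secondary_provider_check` is not a function in the script. The `return 3` mirrors the reviewed hunk; its meaning inside cmd_dispatch() is not stated in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical stub for the availability helper: "provider-a" is healthy,
# "provider-b" is rate-limited (2), "provider-c" has a bad key (3),
# anything else is unavailable (1).
availability_check() {
    case "$1" in
    provider-a)
        return 0
        ;;
    provider-b)
        return 2
        ;;
    provider-c)
        return 3
        ;;
    *)
        return 1
        ;;
    esac
}

# Sketch: only probe the resolved provider when it differs from the
# provider that the health tier already checked.
secondary_provider_check() {
    local health_provider="$1"
    local resolved_provider="$2"
    local resolved_avail_exit=0

    if [[ "$resolved_provider" == "$health_provider" ]]; then
        return 0 # already covered by the primary health check
    fi

    availability_check "$resolved_provider" || resolved_avail_exit=$?
    case "$resolved_avail_exit" in
    0)
        return 0
        ;;
    2)
        echo "WARN: '$resolved_provider' is rate-limited" >&2
        ;;
    3)
        echo "ERROR: '$resolved_provider' has invalid key/credits" >&2
        ;;
    *)
        echo "WARN: '$resolved_provider' unavailable" >&2
        ;;
    esac
    return 3
}
```

Capturing the status with `|| resolved_avail_exit=$?` keeps the branch readable and avoids testing `$?` directly, in line with the SC2181 reference above.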

Comment on lines 3801 to 3812

    if echo "$probe_result" | grep -qiE 'CreditsError|Insufficient balance' 2>/dev/null; then
        log_warn "Model health check FAILED: billing/credits exhausted (slow path)"
        return 3 # t233: credits = invalid key equivalent (won't resolve without human action)
    fi
    if echo "$probe_result" | grep -qiE 'Quota protection|over[_ -]?usage|quota reset|429|too many requests|rate.limit' 2>/dev/null; then
        log_warn "Model health check FAILED: quota/rate limited (slow path)"
        return 2 # t233: rate-limited = defer dispatch, retry soon
    fi
    if echo "$probe_result" | grep -qiE 'endpoints failed|"status":[[:space:]]*503|HTTP 503|503 Service|service unavailable' 2>/dev/null; then
        log_warn "Model health check FAILED: provider error detected (slow path)"
        return 1
    fi

medium

The grep commands in this block use 2>/dev/null to suppress stderr. This is a violation of the repository style guide (rule #50), which prohibits blanket error suppression and allows 2>/dev/null only when redirecting to log files. While the risk of grep erroring here is low, adhering to the style guide improves script robustness and debuggability. Please consider removing the suppression or redirecting stderr to a log if appropriate.

References
  1. Rule #50: 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. The code uses 2>/dev/null to suppress potential errors from grep without logging.

@marcusquinn (Owner, Author) replied:

Fixed in ed1fc41: Removed 2>/dev/null from all three grep commands in the failure-pattern detection block. These pipe from echo so stderr errors are unlikely, but removing the suppression aligns with the style guide and improves debuggability.
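
A self-contained sketch of the failure-pattern classification, without stderr suppression, might look like this. The regexes are copied from the reviewed hunk; the function name `classify_probe_result`, the use of a here-string instead of `echo |`, and the final `return 0` fallback are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: map probe output to the exit codes described in this PR
# (3 = credits/billing, 2 = quota/rate limit, 1 = generic provider error).
classify_probe_result() {
    local probe_result="$1"
    if grep -qiE 'CreditsError|Insufficient balance' <<<"$probe_result"; then
        return 3 # needs human action, so treated like an invalid key
    fi
    if grep -qiE 'Quota protection|over[_ -]?usage|quota reset|429|too many requests|rate.limit' <<<"$probe_result"; then
        return 2 # transient: defer and retry later
    fi
    if grep -qiE 'endpoints failed|"status":[[:space:]]*503|HTTP 503|503 Service|service unavailable' <<<"$probe_result"; then
        return 1 # generic provider error
    fi
    return 0 # assumption: no known failure pattern means healthy
}

# Usage: print the code assigned to a few sample probe outputs.
for sample in 'HTTP 429 Too Many Requests' 'CreditsError: Insufficient balance' '{"status": 503}'; do
    rc=0
    classify_probe_result "$sample" || rc=$?
    echo "$rc <- $sample"
done
```

The order of the checks matters: output that mentions both a credits error and a rate limit is classified as 3, since the credits test runs first.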

Comment on lines 6191 to 6195

    case "$health_exit" in
    2) log_warn "Provider rate-limited for $task_id review-fix — deferring to next pulse" ;;
    3) log_error "API key invalid/credits exhausted for $task_id review-fix" ;;
    *) log_error "Provider unavailable for $task_id review-fix — deferring" ;;
    esac

medium

This case statement uses a compact, single-line format for each branch. For consistency with other case statements in this file and for better readability, please format it using multiple lines.

Suggested change

    - case "$health_exit" in
    - 2) log_warn "Provider rate-limited for $task_id review-fix — deferring to next pulse" ;;
    - 3) log_error "API key invalid/credits exhausted for $task_id review-fix" ;;
    - *) log_error "Provider unavailable for $task_id review-fix — deferring" ;;
    - esac
    + case "$health_exit" in
    + 2)
    +     log_warn "Provider rate-limited for $task_id review-fix — deferring to next pulse"
    +     ;;
    + 3)
    +     log_error "API key invalid/credits exhausted for $task_id review-fix"
    +     ;;
    + *)
    +     log_error "Provider unavailable for $task_id review-fix — deferring"
    +     ;;
    + esac

@marcusquinn (Owner, Author) replied:

Fixed in ed1fc41: Reformatted the compact single-line case branches to multi-line format, consistent with the rest of the file (e.g., the dispatch_worker case statement at line ~4615).

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 36 code smells

[INFO] Recent monitoring activity:
Tue Feb 10 18:00:55 UTC 2026: Code review monitoring started
Tue Feb 10 18:00:56 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 36

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 36
  • VULNERABILITIES: 0

Generated on: Tue Feb 10 18:00:58 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@marcusquinn marcusquinn merged commit 0516cfb into main Feb 10, 2026
18 checks passed
@marcusquinn marcusquinn deleted the feature/t233 branch February 10, 2026 18:05
marcusquinn added a commit that referenced this pull request Feb 10, 2026