fix(supervisor): detect gh auth failures to prevent clean_exit_no_signal retry waste (t198)#924

Closed
marcusquinn wants to merge 1 commit into main from feature/t198

marcusquinn (Owner) commented Feb 10, 2026

Summary

  • Root cause analysis of 560 worker logs identified that 79% of clean_exit_no_signal cases are workers that completed work but couldn't push/create PR due to expired GitHub auth
  • These burned 3 retries each at Opus cost (~$15-30 per task) with identical failures
  • Added blocked:gh_auth_expired detection so auth failures block immediately instead of wasting retries

Root Cause Analysis

| Category | Count | % | Status |
|---|---|---|---|
| Backend errors (short logs) | 169 | 30% | Already handled |
| PR created, no signal | 127 | 23% | Already handled (t192) |
| Success with signal | 92 | 16% | Working correctly |
| No EXIT line (killed) | 65 | 12% | Edge case |
| clean_exit_no_signal candidates | 48 | 9% | This PR |
| Short logs (other) | 24 | 4% | Edge case |
| Non-zero exit | 22 | 4% | Already handled |

Of the 48 clean_exit_no_signal candidates:

  • 38 (79%): GitHub auth expired — worker completed work, couldn't push → NOW BLOCKED
  • 7: Context exhaustion / short logs — legitimate retries
  • 2: Task already done — handled by task_obsolete detection
  • 1: Shell unavailable
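
The rounded percentages above can be rechecked from the raw counts. This is a minimal sketch; `pct` is a hypothetical helper written for this note and is not part of supervisor-helper.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: integer percentage of count/total, rounded half up.
pct() {
    local count="$1" total="$2"
    # (200*count + total) / (2*total) is round-half-up in integer arithmetic.
    echo $(( (200 * count + total) / (2 * total) ))
}

# Counts taken from the breakdown above.
echo "gh auth expired, share of candidates: $(pct 38 48)%"
echo "candidates, share of all 560 logs: $(pct 48 560)%"
```

With the counts reported here this prints 79% and 9%, matching the figures quoted in the summary and the table.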

Changes

  1. extract_log_metadata(): Added gh_auth_failure detection scanning full log for specific gh auth patterns
  2. evaluate_worker(): Added blocked:gh_auth_expired check before clean_exit_no_signal fallback
  3. Tests: 4 new test cases in both test suites (all passing, zero regressions)
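
A simplified sketch of how the first two changes fit together. The function names prefixed with `sketch_` are hypothetical stand-ins, since the real `extract_log_metadata()` and `evaluate_worker()` do considerably more; only the grep pattern, the `gh_auth_failure` key, and the verdict strings are taken from this PR.

```shell
#!/usr/bin/env bash
# Pattern copied from the grep invocation in this PR.
GH_AUTH_PATTERN='gh auth (status|login).*fail|authentication token.*expired|not logged in|try authenticating|gh: To use .* in a non-interactive context|could not authenticate|failed to authenticate'

# Hypothetical stand-in for the new part of extract_log_metadata():
# scan the full log and print gh_auth_failure=true|false.
sketch_extract_gh_auth_failure() {
    local log_file="$1"
    local gh_auth_failure="false"
    if [ -f "$log_file" ] && grep -qiE "$GH_AUTH_PATTERN" "$log_file"; then
        gh_auth_failure="true"
    fi
    echo "gh_auth_failure=$gh_auth_failure"
}

# Hypothetical stand-in for the new check in evaluate_worker(), applied to a
# worker that exited cleanly without a completion signal.
sketch_evaluate_clean_exit() {
    local log_file="$1"
    local meta
    meta="$(sketch_extract_gh_auth_failure "$log_file")"
    if [ "$meta" = "gh_auth_failure=true" ]; then
        echo "blocked:gh_auth_expired"   # block now instead of retrying
    else
        echo "clean_exit_no_signal"      # existing fallback (retried)
    fi
}

# Usage: a made-up log where the work finished but the push failed.
log="$(mktemp)"
printf '%s\n' 'edited 3 files' 'error: authentication token has expired' > "$log"
sketch_evaluate_clean_exit "$log"
rm -f "$log"
```

The ordering is the point of the change: the auth check runs before the fallback, so a log with an auth failure never reaches the retry path.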

Testing

  • ShellCheck: zero new violations
  • test-supervisor-state-machine.sh: 4 new tests pass, 13 pre-existing failures unchanged
  • test-dispatch-worktree-evaluate.sh: 1 new test passes, 12 pre-existing failures unchanged

Closes #817

Summary by CodeRabbit

Release Notes

  • New Features

    • Added detection for GitHub authentication failures in task logs
    • Task evaluation now recognizes expired or invalid GitHub credentials and prevents unnecessary retries
  • Tests

    • Added test coverage for GitHub authentication failure detection and blocking behavior

…nal retry waste (t198)

Root cause analysis of 560 worker logs found 79% (38/48) of clean_exit_no_signal
cases are workers that completed work but couldn't push/create PR due to expired
GitHub auth. These burned 3 retries each at Opus cost with identical failures.

Changes:
- extract_log_metadata(): add gh_auth_failure detection scanning full log for
  specific gh auth patterns (safe from false positives — patterns are specific)
- evaluate_worker(): add blocked:gh_auth_expired check before clean_exit_no_signal
  fallback, so auth failures block immediately instead of wasting retries
- Tests: 4 new test cases covering gh auth failure detection in both test suites

Log analysis breakdown (560 total logs):
- 169 backend errors (short logs, already handled)
- 127 PR created but no signal (already handled by t192)
- 92 success with signal
- 57 clean_exit_no_signal candidates (48 after filtering)
  - 38 gh auth failures (NOW BLOCKED instead of retried)
  - 2 task already done (handled by task_obsolete detection)
  - 7 context exhaustion / short logs (legitimate retries)
  - 1 shell unavailable

coderabbitai bot (Contributor) commented Feb 10, 2026

Walkthrough

This PR adds GitHub authentication failure detection to supervisor task evaluation logic. The extract_log_metadata function now scans task logs for authentication-related error phrases and exposes a gh_auth_failure flag. The evaluate_worker function uses this flag to immediately return blocked:gh_auth_expired verdict when authentication is invalid, preventing unnecessary retries. Supporting integration and state machine tests validate the detection and blocking behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Supervisor Helper Script: .agents/scripts/supervisor-helper.sh | Added GitHub authentication failure detection in extract_log_metadata to scan logs for auth-related phrases and expose gh_auth_failure metadata flag. Modified evaluate_worker to check this flag and return blocked:gh_auth_expired verdict when authentication fails, short-circuiting further evaluation and preventing task retries. |
| Dispatch Worktree Test: tests/test-dispatch-worktree-evaluate.sh | Added integration test section simulating worker completion with expired GitHub authentication. Test validates that evaluation produces blocked:gh_auth_expired verdict and includes assertion logic for auth failure detection. |
| Supervisor State Machine Test: tests/test-supervisor-state-machine.sh | Added two test scenarios (t198e, t198f) to verify behavior when GitHub authentication fails during task evaluation. Both test cases assert that authentication failures trigger blocked:gh_auth_expired verdict rather than retry logic. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🔐 A shield guards the gates, where credentials once fell,
Auth whispers expose their expired tales to tell,
No retry shall follow what GitHub denied,
Blocked verdicts rise swift—let stale tokens subside.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: detecting GitHub auth failures to prevent wasteful retries in the clean_exit_no_signal pattern. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #817 are met: gh_auth_failure detection added to extract_log_metadata(), blocked:gh_auth_expired check added to evaluate_worker(), and comprehensive test coverage provided. |
| Out of Scope Changes check | ✅ Passed | All changes directly address issue #817 objectives; no out-of-scope modifications detected across supervisor helper, dispatch tests, and state-machine tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves resource efficiency by addressing a common cause of wasted computational cycles: tasks failing due to expired GitHub authentication. By implementing proactive detection of these authentication issues, the system now immediately blocks such tasks, preventing unnecessary retries and associated costs. This change allows for more efficient resolution of credential problems and optimizes overall task processing.

Highlights

  • GitHub Auth Failure Detection: Implemented logic to detect GitHub authentication failures in worker logs by scanning for specific patterns.
  • Prevented Wasted Retries: Introduced a mechanism to immediately block tasks with GitHub authentication failures, preventing costly and repetitive retries.
  • New Test Cases: Added four new test cases across two test suites to validate the new detection and blocking logic for GitHub authentication issues.

Changelog
  • .agents/scripts/supervisor-helper.sh
    • Added gh_auth_failure variable and grep logic within extract_log_metadata() to identify GitHub authentication failure patterns in worker logs.
    • Introduced a conditional check in evaluate_worker() to set the task status to blocked:gh_auth_expired when a clean exit occurs with no signal but GitHub authentication failure is detected.
  • tests/test-dispatch-worktree-evaluate.sh
    • Added a new integration test case (integ-t198b) to simulate a worker completing its work but failing to push due to expired GitHub authentication, verifying the blocked:gh_auth_expired status.
  • tests/test-supervisor-state-machine.sh
    • Added two new test cases (eval-t198e and eval-t198f) to confirm that tasks exhibiting various GitHub authentication failure messages are correctly identified and blocked as blocked:gh_auth_expired.

Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.
@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 34 code smells

[INFO] Recent monitoring activity:
Tue Feb 10 13:19:52 UTC 2026: Code review monitoring started
Tue Feb 10 13:19:53 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 34

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 34
  • VULNERABILITIES: 0

Generated on: Tue Feb 10 13:19:56 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.agents/scripts/supervisor-helper.sh:
- Around line 4769-4780: The current full-log grep that sets
gh_auth_failure=true is too broad and matches generic phrases; update the grep
in the block that defines gh_auth_failure to restrict matches to explicit auth
failure lines only (references: variable gh_auth_failure, the grep invocation
using "$log_file"): match patterns like '^.*gh auth (status|login).*fail',
'(^|[[:space:]])gh:', 'fatal: Authentication failed', 'remote: Permission denied',
and 'authentication token.*expired' (case-insensitive), and remove or
de-prioritize generic phrases such as "not logged in" or "failed to
authenticate" unless they appear in those auth-specific contexts; keep the
2>/dev/null and echo "gh_auth_failure=$gh_auth_failure" behavior.

Comment on lines +4769 to +4780
# GitHub auth failure detection (t198): workers that complete their work but
# can't push/create a PR because gh auth is expired. This is the #1 cause of
# clean_exit_no_signal (79% of cases in production logs). Without this check,
# the supervisor retries them 3x at Opus cost, each failing identically.
# Search the FULL log (not just tail) because auth failures appear in tool
# output mid-log, not at the end. This is safe because gh auth patterns are
# specific enough to avoid false positives from documentation content.
local gh_auth_failure="false"
if grep -qiE 'gh auth (status|login).*fail|authentication token.*expired|not logged in|try authenticating|gh: To use .* in a non-interactive context|could not authenticate|failed to authenticate' "$log_file" 2>/dev/null; then
    gh_auth_failure="true"
fi
echo "gh_auth_failure=$gh_auth_failure"

⚠️ Potential issue | 🟠 Major

Tighten auth-failure regex to avoid false positives in full-log scan.

Generic phrases like “not logged in” can appear in docs or code samples within logs, which would incorrectly block a task. Scope matches to gh/git auth error lines (e.g., gh auth, gh:, fatal: Authentication failed, remote: Permission denied) before setting gh_auth_failure=true.

Suggested refinement
-    if grep -qiE 'gh auth (status|login).*fail|authentication token.*expired|not logged in|try authenticating|gh: To use .* in a non-interactive context|could not authenticate|failed to authenticate' "$log_file" 2>/dev/null; then
+    if grep -qiE '(gh auth (status|login).*(fail|not logged in|authenticate)|gh:.*(not logged in|authenticate)|GitHub CLI.*(not logged in|authenticate)|fatal:.*authentication failed|remote:.*(authentication|permission) denied)' "$log_file" 2>/dev/null; then
         gh_auth_failure="true"
     fi
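
To make the false-positive concern concrete, the sketch below runs both patterns against two sample lines. The patterns are copied from the diff above; the `matches` helper and both sample lines are made up for illustration and are not verbatim gh or git output.

```shell
#!/usr/bin/env bash
# Original pattern from the PR and the tightened pattern from the suggestion.
BROAD='gh auth (status|login).*fail|authentication token.*expired|not logged in|try authenticating|gh: To use .* in a non-interactive context|could not authenticate|failed to authenticate'
TIGHT='(gh auth (status|login).*(fail|not logged in|authenticate)|gh:.*(not logged in|authenticate)|GitHub CLI.*(not logged in|authenticate)|fatal:.*authentication failed|remote:.*(authentication|permission) denied)'

# Hypothetical helper: matches PATTERN LINE prints "match" or "no match".
matches() {
    if printf '%s\n' "$2" | grep -qiE "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Illustrative lines: documentation text quoted in a log, and a git-style error.
doc_line='README: if you are not logged in, the dashboard shows a login form'
git_line='fatal: Authentication failed for the remote repository'

echo "broad vs doc line: $(matches "$BROAD" "$doc_line")"
echo "tight vs doc line: $(matches "$TIGHT" "$doc_line")"
echo "broad vs git line: $(matches "$BROAD" "$git_line")"
echo "tight vs git line: $(matches "$TIGHT" "$git_line")"
```

On these samples the broad pattern flags the documentation line and misses the git error, while the tightened pattern does the reverse. One trade-off worth checking before adopting the suggestion: the tightened pattern no longer contains the `authentication token.*expired` alternative, so a bare "authentication token has expired" line would stop matching.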

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a significant source of wasted retries by detecting GitHub authentication failures early and blocking the task. The logic is sound and the implementation is clean. The addition of new tests to cover this case is also a great improvement. I have one minor suggestion to better align with the repository's shell scripting style guide.

Comment on lines +4777 to +4779
if grep -qiE 'gh auth (status|login).*fail|authentication token.*expired|not logged in|try authenticating|gh: To use .* in a non-interactive context|could not authenticate|failed to authenticate' "$log_file" 2>/dev/null; then
    gh_auth_failure="true"
fi

medium

This use of 2>/dev/null to suppress grep's error output violates the repository's style guide (line 50), which states that 2>/dev/null is only acceptable when redirecting to log files. A safer and more idiomatic way to handle cases where the log file might not exist is to check for the file's existence before calling grep.

Suggested change
-if grep -qiE 'gh auth (status|login).*fail|authentication token.*expired|not logged in|try authenticating|gh: To use .* in a non-interactive context|could not authenticate|failed to authenticate' "$log_file" 2>/dev/null; then
+if [[ -f "$log_file" ]] && grep -qiE 'gh auth (status|login).*fail|authentication token.*expired|not logged in|try authenticating|gh: To use .* in a non-interactive context|could not authenticate|failed to authenticate' "$log_file"; then
     gh_auth_failure="true"
 fi

References
  1. The style guide at line 50 bans blanket suppression of stderr with 2>/dev/null, allowing it only when redirecting to log files. The current code uses it to hide potential 'file not found' errors from grep.
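
A small sketch of the guard in isolation, assuming a hypothetical `has_pattern` helper and a path that does not exist. It uses the POSIX `[ -f ... ]` test, which behaves the same as `[[ -f ... ]]` for this check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the file exists and contains the pattern.
has_pattern() {
    local pattern="$1" log_file="$2"
    # The existence test short-circuits, so grep never sees a missing file
    # and there is no "No such file" message to suppress.
    [ -f "$log_file" ] && grep -qiE "$pattern" "$log_file"
}

missing="/tmp/does-not-exist-$$.log"
if has_pattern 'could not authenticate' "$missing"; then
    echo "auth failure detected"
else
    echo "no auth failure detected"
fi
```

For a missing log this takes the else branch without writing anything to stderr. Errors other than a missing file, such as an unreadable log, are still reported by grep rather than being hidden.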

marcusquinn (Owner, Author) commented

Closing: t198 was already completed in PR #834 (merged 2026-02-10). This PR was created by a re-dispatch of an already-completed task.

Development

Successfully merging this pull request may close these issues.

t198: Investigate clean_exit_no_signal retry pattern — root cause analysis
