fix: add recovery logic to auto-update cmd_check() for diverged/dirty repos by alex-solovyev · Pull Request #4245 · marcusquinn/aidevops

alex-solovyev · 2026-03-12T21:47:27Z

Summary

Port recovery safeguards from cmd_update() (aidevops.sh) into cmd_check() (auto-update-helper.sh) to prevent indefinite update failures
Add authenticated gh api as primary version check method (5000 req/hr vs 60 unauthenticated)

Problem

`auto-update-helper.sh check` uses `git pull --ff-only` but has no recovery path when the working tree is dirty or the branch has diverged. Logs showed 130+ failed pulls across multiple version ranges (each stuck for 3 to 30+ hours) and 106 "remote=unknown" errors caused by unauthenticated API rate limiting.

The manual aidevops update command already handles both failure modes correctly — this PR ports that logic to the auto-updater.

Changes

`cmd_check()` recovery (lines 1067-1110)

Ensure on main branch: handles detached HEAD left by interrupted updates
Clean dirty working tree: `git reset HEAD -- .` followed by `git checkout -- .` (this is the install dir, so local changes are never intentional)
Fall back to `git reset --hard origin/main` when `--ff-only` fails (mirrors cmd_update())
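
The three recovery steps above can be sketched roughly as follows. This is a minimal illustration, not the code from auto-update-helper.sh: the function name `recover_and_pull` and its single directory argument are hypothetical, while the `origin` remote and `main` branch are taken from the description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the recovery sequence; not the actual cmd_check() code.
# Usage: recover_and_pull /path/to/install-dir
recover_and_pull() {
  local repo_dir="$1"
  local branch

  # 1. Ensure we are on main. symbolic-ref prints nothing on a detached HEAD.
  branch="$(git -C "$repo_dir" symbolic-ref --short -q HEAD || true)"
  if [[ "$branch" != "main" ]]; then
    git -C "$repo_dir" checkout -q main || return 1
  fi

  # 2. Discard staged and unstaged changes to tracked files.
  if [[ -n "$(git -C "$repo_dir" status --porcelain --untracked-files=no)" ]]; then
    git -C "$repo_dir" reset -q HEAD -- . || return 1
    git -C "$repo_dir" checkout -q -- . || return 1
  fi

  # 3. Try a fast-forward pull; if the branch has diverged, hard-reset to the remote.
  git -C "$repo_dir" fetch -q origin main || return 1
  if ! git -C "$repo_dir" pull -q --ff-only origin main; then
    git -C "$repo_dir" reset -q --hard origin/main || return 1
  fi
}
```

The hard reset discards local commits, which is only acceptable here because the directory is an install location rather than a working checkout.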

`get_remote_version()` authenticated fallback (lines 304-345)

Method 1: gh api (authenticated, 5000 req/hr) — new
Method 2: unauthenticated curl + jq (60 req/hr) — existing
Method 3: raw.githubusercontent.com CDN fallback — existing
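
A fallback chain like the one listed above could look something like this. It is a sketch under assumptions: the fetcher function names, the `first_version` helper, and the idea that the version lives in a top-level `VERSION` file are all hypothetical and not confirmed by the PR; only the repository name and the method order come from the description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: try each fetcher in order and print the first non-empty result.
# The VERSION path and all function names below are assumptions for illustration.
REPO="marcusquinn/aidevops"

fetch_via_gh() { # Method 1: authenticated (5000 req/hr)
  command -v gh >/dev/null 2>&1 || return 1
  gh api "repos/${REPO}/contents/VERSION" --jq '.content' 2>/dev/null | base64 -d 2>/dev/null
}

fetch_via_curl() { # Method 2: unauthenticated API (60 req/hr)
  curl -fsS --max-time 10 "https://api.github.com/repos/${REPO}/contents/VERSION" 2>/dev/null |
    jq -r '.content // empty' 2>/dev/null | base64 -d 2>/dev/null
}

fetch_via_cdn() { # Method 3: CDN, may serve a cached copy
  curl -fsS --max-time 10 "https://raw.githubusercontent.com/${REPO}/main/VERSION" 2>/dev/null
}

first_version() {
  local fetcher version
  for fetcher in "$@"; do
    # Strip whitespace so an empty or failed response falls through to the next method.
    version="$("$fetcher" | tr -d '[:space:]')" || true
    if [[ -n "$version" ]]; then
      printf '%s\n' "$version"
      return 0
    fi
  done
  printf 'unknown\n'
  return 1
}
```

Called as `first_version fetch_via_gh fetch_via_curl fetch_via_cdn`, the helper only reaches the rate-limited unauthenticated endpoint when `gh` is missing or fails.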

Verification

bash -n syntax check: pass
shellcheck: clean (only SC1091 info for external source)
Logic mirrors proven cmd_update() implementation (aidevops.sh:656-718)
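
The two static checks listed above can be reproduced locally with something like the following. The `verify_script` wrapper is a hypothetical helper, and using `--severity=warning` to keep info-level findings such as SC1091 from failing the run is an assumption about how "clean" was judged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: syntax-check a script, then lint it if shellcheck is installed.
verify_script() {
  local script="$1"

  # bash -n parses the file without executing it.
  bash -n "$script" || return 1

  # Only warnings and errors fail the check; info-level notes are tolerated.
  if command -v shellcheck >/dev/null 2>&1; then
    shellcheck --severity=warning "$script" || return 1
  fi
}
```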

References

fix: aidevops update dirties git working tree, blocking next update #2286 (setup.sh leaving dirty tree)
aidevops update not pulling latest version - stuck on 2.125 while 2.128.3 available #2288 (ff-only fallback in manual update)

Closes #4142

Summary by CodeRabbit

Chores
- Updated internal script documentation to reflect improved fallback strategy for version fetching.

coderabbitai · 2026-03-12T21:47:38Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9842dbe8-bfd9-49ed-b8cc-a94435400d68

📥 Commits

Reviewing files that changed from the base of the PR and between fa8b034 and a7364f5.

📒 Files selected for processing (1)

.agents/scripts/auto-update-helper.sh

Walkthrough

The pull request updates the documentation comment in get_remote_version() to reflect a revised three-tier fallback strategy for fetching remote versions: authenticated GitHub CLI API, then unauthenticated curl, then raw.githubusercontent.com CDN.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Version Fetch Documentation `.agents/scripts/auto-update-helper.sh` | Updated comment in `get_remote_version()` function to document multi-step fallback approach: authenticated `gh api` → unauthenticated `curl` → raw.githubusercontent.com CDN, replacing previous note about avoiding CDN cache via API endpoint. |

Estimated Code Review Effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly Related PRs

GH#4142: fix auto-update recovery for dirty tree, diverged branch, and API rate limits #4144: Implements the actual multi-step get_remote_version strategy (authenticated gh → curl → CDN fallback) that this PR's comment now documents.

Suggested Labels

external-contributor, needs-maintainer-review

Poem

📝 A comment refined, a path made clear,
Three fallbacks blessed: gh, curl, then sphere—
The remote version dance, now well-explained,
DevOps wisdom in prose sustained. 🚀

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title accurately reflects the primary change: adding recovery logic to cmd_check() for handling diverged/dirty repos, matching the core objective. |
| Linked Issues check | ✅ Passed | PR substantially addresses `#4142`'s requirements: ports recovery safeguards (ensure main branch, clean dirty tree, ff-only fallback), adds authenticated gh api with curl/CDN fallbacks. |
| Out of Scope Changes check | ✅ Passed | Changes are narrowly scoped to auto-update-helper.sh with comment-only modifications to get_remote_version() documenting the multi-step recovery strategy. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

gemini-code-assist · 2026-03-12T21:47:55Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and efficiency of the system's auto-update and worker management components. It resolves critical issues where the auto-updater could get stuck due to Git repository inconsistencies or API rate limits, ensuring more reliable updates. Concurrently, it introduces a sophisticated mechanism to identify and manage 'thrashing' workers, optimizing resource utilization and preventing unproductive computational cycles.

Highlights

Auto-Update Recovery Logic: Implemented robust recovery safeguards in cmd_check() within auto-update-helper.sh to prevent indefinite update failures. This includes ensuring the repository is on the main branch, cleaning dirty working trees, and falling back to git reset --hard origin/main if git pull --ff-only fails.
Enhanced Remote Version Check: Prioritized authenticated gh api for get_remote_version to leverage a higher rate limit (5000 req/hr) and reduce 'remote=unknown' errors, falling back to unauthenticated curl and CDN if gh is unavailable or fails.
Zero-Commit Thrash Detection: Introduced a new 'zero-commit thrash' guardrail in worker-watchdog.sh to detect and terminate workers that produce many messages over a long runtime without making any commits, preventing resource waste and infinite loops.
Dynamic GitHub Issue Labeling: Modified the post_kill_github_update function to dynamically apply status:blocked for 'thrash' kills (with retry guidance) and status:available for other recoverable kill reasons (idle, stall, runtime), ensuring appropriate issue re-dispatch behavior.
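
The zero-commit thrash guardrail described in the highlights amounts to a three-way threshold test. The sketch below is illustrative only: the two environment variable names appear in the changelog, but the default values, the unit (seconds), and the `is_zero_commit_thrash` function with its argument order are assumptions.

```shell
#!/usr/bin/env bash
# Variable names come from the PR changelog; the defaults here are assumed values.
WORKER_THRASH_ELAPSED_THRESHOLD="${WORKER_THRASH_ELAPSED_THRESHOLD:-3600}" # seconds
WORKER_THRASH_MESSAGE_THRESHOLD="${WORKER_THRASH_MESSAGE_THRESHOLD:-500}"

# Hypothetical predicate: succeeds only when a worker has run long enough,
# produced many messages, and still has no commits.
is_zero_commit_thrash() {
  local elapsed_seconds="$1" message_count="$2" commit_count="$3"
  ((commit_count == 0)) || return 1
  ((elapsed_seconds >= WORKER_THRASH_ELAPSED_THRESHOLD)) || return 1
  ((message_count >= WORKER_THRASH_MESSAGE_THRESHOLD)) || return 1
  return 0
}
```

With these assumed defaults, a worker at 1h49m (6540 seconds) with 2141 messages and 0 commits would be flagged, while the same worker with a single commit would not.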

Changelog

.agents/scripts/auto-update-helper.sh
- Added authenticated gh api as the primary method for get_remote_version to improve reliability and rate limit usage.
- Implemented logic in cmd_check to ensure the Git repository is on the main branch, handling detached HEAD states.
- Added steps to clean a dirty working tree in cmd_check using `git reset HEAD -- .` and `git checkout -- .`.
- Introduced a fallback mechanism in cmd_check to perform a git reset --hard origin/main if git pull --ff-only fails.
.agents/scripts/tests/test-worker-stall-diagnosis.sh
- Added a set_struggle_stub function to facilitate testing of struggle ratio calculations.
- Included new test cases (test_thrashing_guard_detects_zero_commit_high_message, test_thrashing_guard_skips_workers_with_commits) for the zero-commit thrashing detection logic.
- Added tests (test_post_kill_marks_thrash_as_blocked, test_post_kill_marks_runtime_as_available) to verify correct GitHub issue labeling after worker kills.
.agents/scripts/worker-watchdog.sh
- Introduced a new 'Zero-commit thrash' failure mode, increasing the total detected failure modes from three to four.
- Defined new environment variables WORKER_THRASH_ELAPSED_THRESHOLD and WORKER_THRASH_MESSAGE_THRESHOLD for configuring thrash detection.
- Implemented the check_zero_commit_thrashing function to identify workers exhibiting zero-commit, high-message thrashing behavior.
- Updated the post_kill_github_update function to handle the 'thrash' reason, applying status:blocked and providing specific retry guidance.
- Modified the post_kill_github_update function to remove various stale status labels before adding the appropriate new status label (status:available or status:blocked).
- Adjusted the cmd_check function to incorporate the new check_zero_commit_thrashing logic into the worker monitoring sequence.
- Updated the cmd_status output and help text to reflect the new thrash guardrail and its parameters.
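
The label handling described for post_kill_github_update reduces to a mapping from kill reason to status label. A minimal sketch, assuming a hypothetical helper named `status_label_for_kill` (the reasons and label names are the ones listed in this summary):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a kill reason to the status label the issue should receive.
status_label_for_kill() {
  local reason="$1"
  case "$reason" in
  thrash)
    # Thrash kills need a changed plan before re-dispatch.
    printf 'status:blocked\n'
    ;;
  idle | stall | runtime)
    # Recoverable kills can be re-dispatched immediately.
    printf 'status:available\n'
    ;;
  *)
    return 1
    ;;
  esac
}
```

The result could then be passed to something like `gh issue edit <number> --add-label "$label"`, after removing stale status labels with `--remove-label`.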

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/code-review-monitoring.yml

Activity

Recovery safeguards from cmd_update() (aidevops.sh) were ported into cmd_check() (auto-update-helper.sh).
Authenticated gh api was added as the primary version check method to improve rate limit usage.
The pull request addresses and closes issue auto-update-helper.sh: pull --ff-only fails silently on diverged/dirty repo, blocks updates indefinitely #4142, which reported 'remote=unknown' errors from unauthenticated API rate limiting and indefinite update failures.
It also references issues fix: aidevops update dirties git working tree, blocking next update #2286 (setup.sh leaving dirty tree) and aidevops update not pulling latest version - stuck on 2.125 while 2.128.3 available #2288 (ff-only fallback in manual update) as related problems solved by the new recovery logic.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces two main improvements. First, it enhances the auto-update script's robustness by adding recovery logic for diverged or dirty git repositories and by using authenticated GitHub API calls to prevent rate-limiting issues. This makes the auto-update process more resilient. Second, it implements a new 'zero-commit thrash' detection mechanism in the worker watchdog to identify and terminate worker processes that are consuming resources without making progress. While the changes are sound, I've identified a minor issue in the user-facing notification for the new thrashing guardrail, which could lead to confusion. My feedback includes a suggestion to address this.

gemini-code-assist · 2026-03-12T21:49:45Z

.agents/scripts/worker-watchdog.sh

+${destination_text}
+
+**Retry guidance:** Post a blocker update describing a changed plan (or newly unblocked dependency), then move the issue back to \`status:available\` before re-dispatch.


The 'Retry guidance' is currently included in all kill notification comments. However, this guidance is specific to thrash kills that result in a status:blocked label. For other kill reasons (idle, stall, runtime), the issue is moved to status:available, making the guidance to 'move the issue back to status:available' confusing and redundant.

To improve clarity, the retry guidance should only be displayed when the kill reason is thrash.

Suggested change

-${destination_text}
-
-**Retry guidance:** Post a blocker update describing a changed plan (or newly unblocked dependency), then move the issue back to \`status:available\` before re-dispatch.
+${destination_text}$([[ "$reason" == "thrash" ]] && printf '%s' "\n\n**Retry guidance:** Post a blocker update describing a changed plan (or newly unblocked dependency), then move the issue back to \`status:available\` before re-dispatch.")
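
One detail worth checking in a one-liner of this shape: in bash, `printf '%s' "\n\n..."` prints the backslash sequences literally, because escapes are only interpreted in the format string. A hedged alternative is sketched below; the `build_kill_comment` function and its arguments are hypothetical and not the code that was merged.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: append retry guidance only for thrash kills.
build_kill_comment() {
  local reason="$1" destination_text="$2"
  local body="$destination_text"

  if [[ "$reason" == "thrash" ]]; then
    # Newlines go in the printf format string so they are actually interpreted.
    body+="$(printf '\n\n%s' '**Retry guidance:** Post a blocker update describing a changed plan (or newly unblocked dependency), then move the issue back to `status:available` before re-dispatch.')"
  fi

  printf '%s\n' "$body"
}
```

Using an explicit `if` also avoids a non-zero exit status from `[[ ... ]] && ...` leaking out of a command substitution when the script runs under `set -e`.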

alex-solovyev · 2026-03-12T22:33:37Z

Worker killed after 1h49m with 0 commits (struggle_ratio: 2141).

Model: sonnet (anthropic/claude-sonnet-4-6)
Branch: bugfix/auto-update-ff-only-recovery
Reason: Thrashing — 2141 messages, 0 commits after 1h49m
Diagnosis: Merge conflict resolution likely requires interactive rebase that the worker cannot complete headlessly
Next action: Re-dispatch with explicit conflict resolution instructions

alex-solovyev · 2026-03-12T22:34:38Z

Dispatching worker.

Model: sonnet (anthropic/claude-sonnet-4-6)
Branch: bugfix/auto-update-ff-only-recovery
Scope: Resolve merge conflict in .github/workflows/code-review-monitoring.yml, rebase onto origin/main
Attempt: 2 of 3
Direction: Use git rebase --onto, not interactive rebase. Accept both sides of conflict (PR changes + main changes). Push with --force-with-lease.
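
The non-interactive rebase in that direction might be run along these lines. This is a sketch only: `rebase_branch_onto_main` is a hypothetical helper, it assumes the replayed commits apply cleanly, and it does not attempt the "accept both sides" conflict resolution, which would still need handling if the rebase stops.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replay a branch onto origin/main without an interactive rebase.
# Usage: rebase_branch_onto_main /path/to/repo branch-name
rebase_branch_onto_main() {
  local repo_dir="$1" branch="$2"
  local base

  git -C "$repo_dir" fetch -q origin main || return 1

  # Commits after the merge base are the ones to replay onto origin/main.
  base="$(git -C "$repo_dir" merge-base origin/main "$branch")" || return 1
  git -C "$repo_dir" rebase -q --onto origin/main "$base" "$branch" || return 1

  # --force-with-lease refuses to overwrite remote commits we have not seen.
  git -C "$repo_dir" push -q --force-with-lease origin "$branch"
}
```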

alex-solovyev · 2026-03-12T22:36:42Z

@coderabbitai review

coderabbitai · 2026-03-12T22:36:55Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

github-actions · 2026-03-12T22:37:12Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 404 code smells

[INFO] Recent monitoring activity:
Thu Mar 12 22:37:09 UTC 2026: Code review monitoring started
Thu Mar 12 22:37:09 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 404

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 404
VULNERABILITIES: 0

Generated on: Thu Mar 12 22:37:11 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

… repos

Port safeguards from cmd_update() (aidevops.sh) into cmd_check() (auto-update-helper.sh) to prevent indefinite update failures:

1. Ensure on main branch before pull (handles detached HEAD)
2. Clean dirty working tree before pull (git reset + checkout)
3. Fall back to git reset --hard origin/main when ff-only fails
4. Use authenticated gh api for version checks (5000 req/hr vs 60)

Root cause: 130+ failed pulls and 106 remote=unknown errors logged over weeks because cmd_check() had no recovery path; it just retried the same failing git pull --ff-only every 10 minutes.

Closes #4142

alex-solovyev · 2026-03-12T22:40:12Z

@coderabbitai review

coderabbitai · 2026-03-12T22:40:23Z

✅ Actions performed

Review triggered.

github-actions · 2026-03-12T22:40:32Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 404 code smells

[INFO] Recent monitoring activity:
Thu Mar 12 22:40:28 UTC 2026: Code review monitoring started
Thu Mar 12 22:40:29 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 404

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 404
VULNERABILITIES: 0

Generated on: Thu Mar 12 22:40:31 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-03-12T22:41:16Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Addresses Gemini review suggestion on PR #4245: the 'Retry guidance' text was shown for all kill reasons (idle, stall, runtime, thrash). For non-thrash kills the issue moves to status:available, making the guidance to 'move back to status:available' confusing and redundant. Now the retry guidance is conditionally appended only when reason == thrash.

alex-solovyev · 2026-03-12T22:49:58Z

@coderabbitai review

coderabbitai · 2026-03-12T22:50:08Z

✅ Actions performed

Review triggered.

alex-solovyev · 2026-03-12T22:52:43Z

@coderabbitai review

coderabbitai · 2026-03-12T22:52:52Z

✅ Actions performed

Review triggered.

alex-solovyev · 2026-03-12T22:55:10Z

Completed via PR #4245.

Model: sonnet (anthropic/claude-sonnet-4-6)
Attempts: 1
Duration: ~36 minutes (PID 1090721)

github-actions bot added the bug label (Auto-created from TODO.md tag) Mar 12, 2026

gemini-code-assist bot reviewed Mar 12, 2026

alex-solovyev mentioned this pull request Mar 12, 2026

[Supervisor:alex-solovyev] 13 PRs, 16 assigned, 6 workers at 06:15 UTC #2646

Closed

marcusquinn mentioned this pull request Mar 12, 2026

[Supervisor:marcusquinn] 0 PRs, 0 assigned, 1 worker at 21:32 UTC #2645

Open

alex-solovyev force-pushed the bugfix/auto-update-ff-only-recovery branch from 21cd837 to c6fbc99 Compare March 12, 2026 22:36

alex-solovyev force-pushed the bugfix/auto-update-ff-only-recovery branch from c6fbc99 to a7364f5 Compare March 12, 2026 22:39

coderabbitai bot approved these changes Mar 12, 2026

alex-solovyev added the needs-review-fixes label Mar 12, 2026

alex-solovyev merged commit f214586 into main Mar 12, 2026
20 checks passed

alex-solovyev deleted the bugfix/auto-update-ff-only-recovery branch March 12, 2026 22:46

github-actions bot mentioned this pull request Mar 12, 2026

auto-update-helper.sh: pull --ff-only fails silently on diverged/dirty repo, blocks updates indefinitely #4142

Closed

alex-solovyev removed the bug (Auto-created from TODO.md tag) and needs-review-fixes labels Mar 12, 2026

coderabbitai bot mentioned this pull request Mar 14, 2026

fix(auto-update): detect script drift when VERSION matches to prevent stale pulse (GH#4727) #4749

Merged
