t1263: Add stale-claim auto-recovery to supervisor pulse #1982
marcusquinn merged 5 commits into main
Adds Phase 0.5e stale-claim auto-recovery function that detects tasks with assignee: and/or started: fields in TODO.md that have no active worker, no active worktree, and claim age >24h. Strips the fields so auto-pickup can re-dispatch them. Respects t1017 assignee ownership rule: only unclaims tasks where assignee matches the local user identity.
Adds Phase 0.5e between Phase 0.5d (auto-unblock) and Phase 0.6 (queue reconciliation). Iterates all known repos and calls recover_stale_claims() on each TODO.md. Runs every pulse to promptly detect and recover stuck tasks.
Adds "supervisor-helper.sh stale-claims [--repo path]" command for manual stale-claim recovery. Also adds help text entry.
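As a rough sketch of the pulse wiring described in these commits, the phase can be pictured as a loop over repo paths that hands each one to the recovery function. The names `run_stale_claim_phase` and `recover_stale_claims_stub` are hypothetical stand-ins, and skipping repos that have no TODO.md is an assumption rather than something the PR states.

```shell
# Stand-in for the PR's recover_stale_claims(); it only reports the repo it received.
recover_stale_claims_stub() {
	echo "checked: $1"
}

# Hypothetical Phase 0.5e driver: visit every repo path passed as an argument.
run_stale_claim_phase() {
	local repo
	for repo in "$@"; do
		# Assumption: a repo without a TODO.md has nothing to recover.
		if [ -f "$repo/TODO.md" ]; then
			recover_stale_claims_stub "$repo"
		fi
	done
}
```

Calling `run_stale_claim_phase /path/to/repo-a /path/to/repo-b` would print one `checked:` line per repo that contains a TODO.md.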
Walkthrough
This PR introduces automatic stale-claim recovery to the supervisor system. When tasks remain claimed (via assignee/started fields) for over 24 hours without active workers or worktrees, they are automatically unclaimed, allowing re-dispatch. Integration spans documentation, CLI, supervisor pulse flow, and core recovery logic.
Sequence Diagram(s)
sequenceDiagram
participant Pulse as Supervisor Pulse
participant TodoSync as todo-sync.sh
participant DB as Supervisor DB
participant Git as Git Worktrees
participant TODO as TODO.md
participant Unclaim as cmd_unclaim
Pulse->>TodoSync: Phase 0.5e: recover_stale_claims(repo)
TodoSync->>TODO: Read and parse tasks
TodoSync->>DB: Query active tasks
TodoSync->>Git: List active worktrees
rect rgba(100, 150, 200, 0.5)
Note over TodoSync: Filter & Classify
TodoSync->>TodoSync: Check ownership (identity match)
TodoSync->>TodoSync: Verify not in DB (external/active)
TodoSync->>TodoSync: Verify no worktree
TodoSync->>TodoSync: Check started: age > threshold
end
rect rgba(200, 150, 100, 0.5)
Note over TodoSync,Unclaim: Recover Stale Claims
loop For each stale claim
TodoSync->>Unclaim: cmd_unclaim --force (remove assignee/started)
Unclaim->>TODO: Strip fields
end
end
TodoSync->>Pulse: Return summary (recovered/skipped counts)
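The "Filter & Classify" box in the diagram can be read as an ordered chain of checks. The sketch below is only an illustration of that ordering: `classify_claim`, its boolean-style arguments, and the result labels are hypothetical, and the real function gathers these inputs from TODO.md, the supervisor DB, and git rather than taking them as parameters.

```shell
# Hypothetical classifier mirroring the four checks in the diagram.
#   $1 owned         "true" if the assignee matches the local identity
#   $2 in_db         "true" if the supervisor DB lists the task as active
#   $3 has_worktree  "true" if a git worktree exists for the task
#   $4 claim_age     seconds since the started: timestamp
#   $5 threshold     staleness threshold in seconds
classify_claim() {
	local owned="$1" in_db="$2" has_worktree="$3" claim_age="$4" threshold="$5"

	if [ "$owned" != "true" ]; then echo "skip_external"; return 0; fi
	if [ "$in_db" = "true" ]; then echo "skip_active"; return 0; fi
	if [ "$has_worktree" = "true" ]; then echo "skip_worktree"; return 0; fi
	# Only claims strictly older than the threshold count as stale.
	if [ "$claim_age" -le "$threshold" ]; then echo "skip_young"; return 0; fi
	echo "stale"
}
```

Only the `stale` result would lead to the `cmd_unclaim --force` step in the second box; every other label corresponds to a skipped count in the returned summary.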
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Thu Feb 19 23:03:46 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Adds documentation for the new stale-claim auto-recovery feature including configuration (SUPERVISOR_STALE_CLAIM_SECONDS), manual CLI command, and the detection criteria.
Summary of Changes
This pull request addresses a critical issue where tasks claimed by interactive sessions could become permanently stuck if the session terminated or the user moved on without completing them. By integrating a new auto-recovery phase into the supervisor's pulse cycle, the system can now intelligently detect and reset these stale claims, ensuring that tasks remain dispatchable and the overall task flow is maintained without manual intervention.
Code Review
This pull request introduces Phase 0.5e stale-claim auto-recovery to the supervisor pulse cycle. The implementation is robust, incorporating safety checks for local user identity and cross-referencing with the supervisor database and active git worktrees. I have identified a few issues related to variable scoping, logic inconsistencies regarding malformed claims, regex extraction order, and efficiency that should be addressed to ensure full correctness and adherence to the project's style guide and established shell scripting practices.
Actionable comments posted: 3
🧹 Nitpick comments (1)
.agents/scripts/supervisor/todo-sync.sh (1)
159-162: whoami called on every loop iteration (minor efficiency nit).
local_user=$(whoami ...) and identity_user="${identity%%@*}" are recomputed for every task with an assignee. Both values are invariant across the loop and could be computed once above the while loop.
♻️ Proposed refactor
+	local local_user
+	local_user=$(whoami 2>/dev/null || echo "")
+	local gh_user="${_CACHED_GH_USERNAME:-}"
+	local identity_user="${identity%%@*}"
+
 	while IFS= read -r line; do
 		...
 		if [[ "$is_local_user" == "false" ]]; then
-			local local_user
-			local_user=$(whoami 2>/dev/null || echo "")
-			local gh_user="${_CACHED_GH_USERNAME:-}"
-			local identity_user="${identity%%@*}"
 			if [[ "$assignee" == "$local_user" ]] || ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/todo-sync.sh around lines 159 - 162, Move the invariant computations out of the per-task loop: compute local_user (from whoami) and identity_user (from identity variable via "${identity%%@*}") once before the while loop instead of inside it; remove the per-iteration declarations/localizations inside the loop and reference those precomputed variables where the code currently uses local_user and identity_user so you avoid calling whoami and re-parsing identity on every task iteration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 835-852: The code unclaims tasks that have only started: without
an assignee because the grep in recover_stale_claims (todo-sync.sh) matches
either assignee: or started:, then downstream logic uses [[ -n "$assignee" ]] to
gate owner checks but still proceeds to unclaim started-only tasks; fix by
treating started-only tasks as untouchable: in todo-sync.sh inside
recover_stale_claims, after extracting the assignee variable (assignee) add an
explicit guard that if assignee is empty (i.e. started-only claim) then skip
processing/unclaiming and log/continue, or alternatively tighten the upstream
grep to only select lines containing assignee: (so only tasks with an assignee
are considered). Ensure you reference recover_stale_claims, the grep pattern,
and the assignee variable when making the change.
In @.agents/scripts/supervisor/todo-sync.sh:
- Around line 148-177: The ownership check currently only runs when assignee is
non-empty, which lets tasks with a started: timestamp but no assignee be
recovered incorrectly; update the t1017 logic to treat a missing assignee as
unverifiable ownership and skip recovery: before running the is_local_user
checks, detect if assignee is empty (or unset) and increment
skipped_external/log and continue; otherwise proceed with the existing
comparisons using is_local_user, identity, _CACHED_GH_USERNAME, whoami and the
username-prefix checks (assignee, identity_user, ${assignee%%@*}) so only tasks
with a verifiable assignee are unclaimed.
- Around line 203-225: The code treats an unparseable started_ts as age 0 and
falls through to unclaim; change the parsing block around
started_ts/started_epoch so that if started_ts is non-empty but started_epoch
remains 0 (parse failure) you log a clear message and conservatively skip the
task instead of treating it as stale: after the existing date parsing (variables
started_ts and started_epoch) add a branch that checks [[ -n "$started_ts" &&
"$started_epoch" -eq 0 ]] then log_verbose (mentioning the unparseable
started_ts) increment a skipped_unknown counter (or reuse skipped_young) and
continue; keep the existing branch for empty started_ts (the else) intact and
leave the normal claim_age logic when started_epoch > 0 unchanged.
---
Nitpick comments:
In @.agents/scripts/supervisor/todo-sync.sh:
- Around line 159-162: Move the invariant computations out of the per-task loop:
compute local_user (from whoami) and identity_user (from identity variable via
"${identity%%@*}") once before the while loop instead of inside it; remove the
per-iteration declarations/localizations inside the loop and reference those
precomputed variables where the code currently uses local_user and identity_user
so you avoid calling whoami and re-parsing identity on every task iteration.
Auto-dismissed: bot review does not block autonomous pipeline
- Guard empty assignee as untouchable in recover_stale_claims (critical): tasks with started: but no assignee: are now skipped instead of silently unclaimed, protecting external contributors' claims
- Treat unparseable started: timestamps conservatively (high): parse failures now skip the task instead of falling through to unclaim
- Use tail -1 for started: extraction to match assignee: pattern (medium)
- Move loop-invariant identity vars outside while loop (medium)
- Separate _sc_repo declaration and assignment (medium)
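The first three fixes in this commit can be illustrated with a small, hedged sketch. It assumes fields appear as space-delimited `key:value` tokens and that started: holds a UTC ISO 8601 timestamp such as `2026-02-19T23:03:46Z`; neither format is confirmed by the PR text. The helpers `extract_field`, `started_epoch`, and `claim_age_or_skip` are hypothetical names, not the functions in todo-sync.sh.

```shell
# Return the value of the LAST "key:value" token on a task line (tail -1),
# so assignee: and started: are extracted the same way.
extract_field() {
	local line="$1" key="$2"
	printf '%s\n' "$line" | { grep -oE "${key}:[^ ]+" || true; } | tail -1 | cut -d: -f2-
}

# Convert an assumed ISO 8601 UTC timestamp to epoch seconds; print 0 on any parse failure.
started_epoch() {
	local ts="$1" plain epoch
	case "$ts" in
		[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z) ;;
		*) echo 0; return 0 ;;
	esac
	plain=$(printf '%s' "${ts%Z}" | tr 'T' ' ')
	# Try GNU date first, then BSD/macOS date; fall back to 0 if both fail.
	epoch=$(date -u -d "$plain" +%s 2>/dev/null) ||
		epoch=$(date -u -j -f '%Y-%m-%d %H:%M:%S' "$plain" +%s 2>/dev/null) ||
		epoch=0
	echo "$epoch"
}

# Print the claim age in seconds, or a skip reason when the claim must not be touched.
claim_age_or_skip() {
	local line="$1" now="$2" assignee started epoch
	assignee=$(extract_field "$line" assignee)
	started=$(extract_field "$line" started)
	# Started-only claims have no verifiable owner, so they are left alone.
	if [ -z "$assignee" ]; then echo "skip_no_assignee"; return 0; fi
	# A missing started: field is handled by a separate branch in the PR; not modeled here.
	if [ -z "$started" ]; then echo "no_started"; return 0; fi
	epoch=$(started_epoch "$started")
	# An unparseable timestamp is skipped rather than treated as age 0.
	if [ "$epoch" -eq 0 ]; then echo "skip_unparseable"; return 0; fi
	echo $((now - epoch))
}
```

A caller could pass `"$(date +%s)"` as the second argument and compare a numeric result against the staleness threshold. The design point is that every uncertain case resolves to a skip, never to an unclaim.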



Summary
Adds Phase 0.5e stale-claim auto-recovery to the supervisor pulse cycle (t1263).
Problem: When interactive sessions claim tasks (adding assignee: + started: to TODO.md) but die or move on without completing them, the tasks become permanently stuck. Auto-pickup skips them because they have assignee/started fields, but no worker is running. This caused 15 aidevops tasks to sit idle for 2+ days.

Solution: New Phase 0.5e in the pulse cycle that detects stale claims and strips the fields so auto-pickup can re-dispatch them.

Detection Criteria (ALL must be true)
- Task is open ([ ]) in TODO.md with assignee: and/or started: fields
- No active worker for the task
- No active worktree for the task
- Claim age exceeds 24h (configurable via SUPERVISOR_STALE_CLAIM_SECONDS)

Safety
- Respects the t1017 assignee ownership rule: only unclaims tasks where the assignee matches the local user identity
- Uses cmd_unclaim --force for the actual field stripping (reuses proven logic)

Changes
- .agents/scripts/supervisor/todo-sync.sh: Added recover_stale_claims() function
- .agents/scripts/supervisor/pulse.sh: Wired Phase 0.5e between Phase 0.5d (auto-unblock) and Phase 0.6 (queue reconciliation)
- .agents/scripts/supervisor-helper.sh: Added stale-claims CLI command + help text
- .agents/AGENTS.md: Documented the feature

Configuration
- SUPERVISOR_STALE_CLAIM_SECONDS: threshold in seconds (default: 86400 = 24h)
- supervisor-helper.sh stale-claims [--repo path] (manual recovery command)

Ref #1978
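A minimal sketch of how the threshold setting might be resolved, assuming SUPERVISOR_STALE_CLAIM_SECONDS is read from the environment. The function name `stale_claim_threshold` is hypothetical, and falling back to the default for non-numeric values is an added assumption, not behavior described in the PR.

```shell
# Hypothetical resolver for the staleness threshold in seconds.
stale_claim_threshold() {
	# Default of 86400 seconds (24h) is taken from the PR description.
	local value="${SUPERVISOR_STALE_CLAIM_SECONDS:-86400}"
	case "$value" in
		# Assumption: anything that is not a plain non-negative integer falls back to the default.
		'' | *[!0-9]*) value=86400 ;;
	esac
	echo "$value"
}
```

Under the same assumption, a 12-hour threshold for a one-off manual run would look like `SUPERVISOR_STALE_CLAIM_SECONDS=43200 supervisor-helper.sh stale-claims --repo /path/to/repo`.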
Summary by CodeRabbit