t1263: Add stale-claim auto-recovery to supervisor pulse by marcusquinn · Pull Request #1982 · marcusquinn/aidevops

marcusquinn · 2026-02-19T23:03:06Z

Summary

Adds Phase 0.5e stale-claim auto-recovery to the supervisor pulse cycle (t1263).

Problem: When interactive sessions claim tasks (adding assignee: + started: to TODO.md) but die or move on without completing them, the tasks become permanently stuck. Auto-pickup skips them because they have assignee/started fields, but no worker is running. This caused 15 aidevops tasks to sit idle for 2+ days.

Solution: New Phase 0.5e in the pulse cycle that detects stale claims and strips the fields so auto-pickup can re-dispatch them.

Detection Criteria (ALL must be true)

Task is open [ ] in TODO.md with assignee: and/or started: fields
Task is NOT in the supervisor DB as running/dispatched/evaluating/queued/pr_review/review_triage/merging
No active git worktree exists for the task
Claim age exceeds threshold (default: 24h, configurable via SUPERVISOR_STALE_CLAIM_SECONDS)
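
The four criteria read as a single conjunction. The following is a minimal sketch of that check, not the PR's code: the function name `is_stale_claim` and its arguments are hypothetical, and the real `recover_stale_claims()` gathers these inputs from TODO.md, the supervisor DB, and the worktree list. Only the `SUPERVISOR_STALE_CLAIM_SECONDS` variable and its 86400 default come from this PR.

```shell
# Hypothetical helper: succeeds (returns 0) only when ALL criteria hold.
# $1 = "true" if the open task carries assignee:/started: fields
# $2 = "true" if the supervisor DB tracks the task in an active state
# $3 = "true" if a git worktree exists for the task
# $4 = claim time, $5 = current time (both seconds since the epoch)
is_stale_claim() {
	local has_claim_fields="$1" in_db_active="$2" has_worktree="$3"
	local started_epoch="$4" now_epoch="$5"
	local threshold="${SUPERVISOR_STALE_CLAIM_SECONDS:-86400}"

	[ "$has_claim_fields" = "true" ] || return 1
	[ "$in_db_active" = "false" ] || return 1
	[ "$has_worktree" = "false" ] || return 1
	[ $((now_epoch - started_epoch)) -gt "$threshold" ] || return 1
	return 0
}
```

Each failed criterion returns early, so a task is only reported as stale when nothing indicates that someone is still working on it.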

Safety

Respects t1017 assignee ownership rule: only unclaims tasks where assignee matches the local user (exact match + fuzzy match on username/GitHub handle)
External contributors' claims are NEVER touched
Uses existing cmd_unclaim --force for the actual field stripping (reuses proven logic)
Logs all recoveries for observability
Records SELF_HEAL_PATTERN to pattern tracker
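
The ownership rule might look roughly like the sketch below. The function name `assignee_is_local` and its argument order are hypothetical. The comparisons (exact match, then the part before `@`) follow the variables quoted in the review of this PR (`identity`, `local_user`, `gh_user`, `${identity%%@*}`, `${assignee%%@*}`). Treating an empty assignee as unverifiable reflects the follow-up fix rather than the first revision.

```shell
# Hypothetical helper: succeeds only when the assignee can be tied to the local user.
# $1 = assignee from TODO.md, $2 = configured identity (e.g. user@host),
# $3 = local username (from whoami), $4 = GitHub handle (may be empty).
assignee_is_local() {
	local assignee="$1" identity="$2" local_user="$3" gh_user="$4"
	local identity_user="${identity%%@*}" # identity without "@host"
	local assignee_user="${assignee%%@*}" # assignee without "@host"

	# No assignee means ownership cannot be verified: never treat it as local.
	[ -n "$assignee" ] || return 1

	# Exact matches first.
	[ "$assignee" = "$identity" ] && return 0
	[ "$assignee" = "$local_user" ] && return 0
	[ -n "$gh_user" ] && [ "$assignee" = "$gh_user" ] && return 0

	# Fuzzy match: compare only the username part before "@".
	[ -n "$identity_user" ] && [ "$assignee_user" = "$identity_user" ] && return 0
	return 1
}
```

Anything that does not match falls through to `return 1`, which keeps external contributors' claims out of reach by default.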

Changes

.agents/scripts/supervisor/todo-sync.sh: Added recover_stale_claims() function
.agents/scripts/supervisor/pulse.sh: Wired Phase 0.5e between Phase 0.5d (auto-unblock) and Phase 0.6 (queue reconciliation)
.agents/scripts/supervisor-helper.sh: Added stale-claims CLI command + help text
.agents/AGENTS.md: Documented the feature

Configuration

SUPERVISOR_STALE_CLAIM_SECONDS — threshold in seconds (default: 86400 = 24h)
Manual check: supervisor-helper.sh stale-claims [--repo path]
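
As an illustration of the threshold setting, the default could be resolved as shown here. The helper name `stale_claim_threshold` is hypothetical, and falling back to the default for non-numeric values is an assumption made for this sketch. Only the variable name and the 86400 default come from this PR.

```shell
# Hypothetical helper: print the stale-claim threshold in seconds.
stale_claim_threshold() {
	local value="${SUPERVISOR_STALE_CLAIM_SECONDS:-86400}"
	case "$value" in
	'' | *[!0-9]*) value=86400 ;; # assumption: ignore non-numeric overrides
	esac
	printf '%s\n' "$value"
}

# Example: lower the threshold to 12 hours for the current shell session.
# export SUPERVISOR_STALE_CLAIM_SECONDS=43200
```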

Ref #1978

Summary by CodeRabbit

New Features
- Automatic stale-claim recovery: tasks claimed for over 24 hours with no active worker or worktree are now automatically unclaimed, allowing reassignment
- New supervisor command for manual stale-claim detection and recovery

Adds Phase 0.5e stale-claim auto-recovery function that detects tasks with assignee: and/or started: fields in TODO.md that have no active worker, no active worktree, and claim age >24h. Strips the fields so auto-pickup can re-dispatch them. Respects t1017 assignee ownership rule: only unclaims tasks where assignee matches the local user identity.

Adds Phase 0.5e between Phase 0.5d (auto-unblock) and Phase 0.6 (queue reconciliation). Iterates all known repos and calls recover_stale_claims() on each TODO.md. Runs every pulse to promptly detect and recover stuck tasks.
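
The per-repo iteration could be sketched as follows. This is an assumption-laden illustration: `run_stale_claim_phase` is a hypothetical name, the repo list is passed as arguments rather than read from the supervisor DB, and the recovery function is injected so the loop can be exercised without the real `recover_stale_claims()`.

```shell
# Hypothetical Phase 0.5e driver: call a recovery function for every
# repository that has a TODO.md, and keep going if one repository fails.
# $1 = name of the recovery function, remaining args = repository paths.
run_stale_claim_phase() {
	local recover_fn="$1" repo
	shift
	for repo in "$@"; do
		if [ -f "$repo/TODO.md" ]; then
			"$recover_fn" "$repo" ||
				echo "Phase 0.5e: recovery failed for $repo" >&2
		fi
	done
}
```

Logging a failure instead of aborting means one broken repository would not stop the rest of the pulse in this sketch.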

Adds "supervisor-helper.sh stale-claims [--repo path]" command for manual stale-claim recovery. Also adds help text entry.

coderabbitai · 2026-02-19T23:03:15Z

Walkthrough

This PR introduces automatic stale-claim recovery to the supervisor system. When tasks remain claimed (via assignee/started fields) for over 24 hours without active workers or worktrees, they are automatically unclaimed, allowing re-dispatch. Integration spans documentation, CLI, supervisor pulse flow, and core recovery logic.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation `.agents/AGENTS.md` | Adds section documenting stale-claim auto-recovery feature (t1263), including detection criteria (no active worker, no worktree, claim age > 24h), unclaim behavior, and manual invocation via `supervisor-helper.sh stale-claims`. |
| CLI Interface `.agents/scripts/supervisor-helper.sh` | Introduces new `stale-claims` subcommand with optional `--repo <path>` parameter; routes to `recover_stale_claims` after ensuring supervisor DB initialization and resolving repository path. |
| Supervisor Pulse Integration `.agents/scripts/supervisor/pulse.sh` | Adds Phase 0.5e to invoke stale-claim recovery during pulse cycle; scans TODO.md files and calls `recover_stale_claims` per repository to auto-unclaim aged claims. |
| Recovery Logic `.agents/scripts/supervisor/todo-sync.sh` | Implements `recover_stale_claims()` function that validates TODO.md, gathers active worktrees and supervisor DB tasks, filters claims by ownership and age against configurable `SUPERVISOR_STALE_CLAIM_SECONDS` threshold (default 86400s), and invokes `cmd_unclaim --force` on stale entries. |

Sequence Diagram(s)

sequenceDiagram
    participant Pulse as Supervisor Pulse
    participant TodoSync as todo-sync.sh
    participant DB as Supervisor DB
    participant Git as Git Worktrees
    participant TODO as TODO.md
    participant Unclaim as cmd_unclaim

    Pulse->>TodoSync: Phase 0.5e: recover_stale_claims(repo)
    TodoSync->>TODO: Read and parse tasks
    TodoSync->>DB: Query active tasks
    TodoSync->>Git: List active worktrees
    
    rect rgba(100, 150, 200, 0.5)
        Note over TodoSync: Filter & Classify
        TodoSync->>TodoSync: Check ownership (identity match)
        TodoSync->>TodoSync: Verify not in DB (external/active)
        TodoSync->>TodoSync: Verify no worktree
        TodoSync->>TodoSync: Check started: age > threshold
    end
    
    rect rgba(200, 150, 100, 0.5)
        Note over TodoSync,Unclaim: Recover Stale Claims
        loop For each stale claim
            TodoSync->>Unclaim: cmd_unclaim --force (remove assignee/started)
            Unclaim->>TODO: Strip fields
        end
    end
    
    TodoSync->>Pulse: Return summary (recovered/skipped counts)
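
The "List active worktrees" step in the diagram can be approximated with `git worktree list --porcelain`, which prints a `branch refs/heads/<name>` line for each worktree that has a branch checked out. The sketch below assumes task branches carry the task ID as a whole path or dash-delimited token (for example `feature/t1263`, the branch of this PR). The function name and the matching rule are hypothetical, and the PR's actual check may differ.

```shell
# Hypothetical helper: reads `git worktree list --porcelain` output on stdin
# and succeeds if any checked-out branch names the given task ID.
worktree_output_has_task() {
	local task_id="$1" line branch
	while IFS= read -r line; do
		case "$line" in
		"branch refs/heads/"*)
			branch="${line#branch refs/heads/}"
			# Match the ID as a whole token so t126 does not match t1263.
			case "$branch" in
			"$task_id" | */"$task_id" | "$task_id"-* | */"$task_id"-*) return 0 ;;
			esac
			;;
		esac
	done
	return 1
}

# Usage (inside a repository):
#   git worktree list --porcelain | worktree_output_has_task t1263
```

Reading from stdin keeps the parsing separate from the git call, so the matching rule can be checked against captured output.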

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat: provider-agnostic task claiming via TODO.md (t165) #627 — Introduces the TODO.md-based claim/unclaim flow that recover_stale_claims depends on for reading assignee/started fields and invoking cmd_unclaim.
t1261: Fix dispatch stall from orphaned DB tasks #1974 — Also modifies .agents/scripts/supervisor/todo-sync.sh to add automatic recovery logic for problematic tasks (orphaned DB records), sharing similar patterns and integration points.
feat: add supervisor-helper.sh with SQLite schema and state machine (t128.1) #376 — Establishes the supervisor-helper.sh command framework that this PR extends with the new stale-claims subcommand.

Poem

🌊 When claims grow stale, like bread left long,
The pulse now hums a healing song—
Abandonment turned to freedom's call,
Tasks reborn to serve us all! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly describes the main change: adding stale-claim auto-recovery to the supervisor pulse cycle, matching the PR's core objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

github-actions · 2026-02-19T23:03:47Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 23:03:44 UTC 2026: Code review monitoring started
Thu Feb 19 23:03:44 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 30
VULNERABILITIES: 0

Generated on: Thu Feb 19 23:03:46 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

Adds documentation for the new stale-claim auto-recovery feature including configuration (SUPERVISOR_STALE_CLAIM_SECONDS), manual CLI command, and the detection criteria.

github-actions · 2026-02-19T23:07:10Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 23:07:05 UTC 2026: Code review monitoring started
Thu Feb 19 23:07:06 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 30
VULNERABILITIES: 0

Generated on: Thu Feb 19 23:07:08 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

gemini-code-assist · 2026-02-19T23:07:10Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where tasks claimed by interactive sessions could become permanently stuck if the session terminated or the user moved on without completing them. By integrating a new auto-recovery phase into the supervisor's pulse cycle, the system can now intelligently detect and reset these stale claims, ensuring that tasks remain dispatchable and the overall task flow is maintained without manual intervention.

Highlights

Stale Claim Auto-Recovery: Introduced Phase 0.5e into the supervisor pulse cycle to automatically detect and recover tasks that have been claimed but are no longer actively worked on, preventing them from becoming permanently stuck.
Detection Criteria: Stale claims are identified based on four conditions: the task is open with assignee/started fields, it's not tracked as active in the supervisor database, no active Git worktree exists for it, and its claim age exceeds a configurable threshold (defaulting to 24 hours).
Safety Measures: The recovery mechanism adheres to assignee ownership rules, only unclaiming tasks assigned to the local user and never touching claims by external contributors. It reuses existing, proven cmd_unclaim --force logic for stripping fields.
New Functionality and Command: A new recover_stale_claims() function was added to handle the recovery logic, and a supervisor-helper.sh stale-claims command was introduced for manual triggering and inspection.
Configuration and Documentation: The stale claim threshold is configurable via SUPERVISOR_STALE_CLAIM_SECONDS, and the new feature is documented in AGENTS.md.

Changelog

.agents/AGENTS.md
- Added documentation for the new stale-claim auto-recovery feature, explaining its purpose, detection criteria, and configuration options.
.agents/scripts/supervisor-helper.sh
- Introduced a new stale-claims command to allow manual detection and recovery of stale TODO.md claims.
- Implemented the logic for the stale-claims command, including parsing repository paths and calling the recover_stale_claims function.
.agents/scripts/supervisor/pulse.sh
- Integrated the recover_stale_claims function into the supervisor's pulse cycle as Phase 0.5e.
- Added logic to iterate through all repositories and apply stale claim recovery during each pulse.
.agents/scripts/supervisor/todo-sync.sh
- Implemented the recover_stale_claims function, which identifies stale tasks based on assignee, started timestamp, supervisor DB status, and active Git worktrees.
- Included safety checks to ensure only local user claims are processed and external claims are skipped.
- Utilized cmd_unclaim --force to strip assignee and started fields from stale tasks.
- Added logging for recovery actions and integration with the pattern tracker for observability.

gemini-code-assist

Code Review

This pull request introduces Phase 0.5e stale-claim auto-recovery to the supervisor pulse cycle. The implementation is robust, incorporating safety checks for local user identity and cross-referencing with the supervisor database and active git worktrees. I have identified a few issues related to variable scoping, logic inconsistencies regarding malformed claims, regex extraction order, and efficiency that should be addressed to ensure full correctness and adherence to the project's style guide and established shell scripting practices.

.agents/scripts/supervisor/todo-sync.sh

.agents/scripts/supervisor-helper.sh

.agents/scripts/supervisor/todo-sync.sh

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

.agents/scripts/supervisor/todo-sync.sh (1)

159-162: whoami called on every loop iteration — minor efficiency nit.

local_user=$(whoami ...) and identity_user="${identity%%@*}" are recomputed for every task with an assignee. Both values are invariant across the loop and could be computed once above the while loop.

♻️ Proposed refactor

+	local local_user
+	local_user=$(whoami 2>/dev/null || echo "")
+	local gh_user="${_CACHED_GH_USERNAME:-}"
+	local identity_user="${identity%%@*}"
+
 	while IFS= read -r line; do
 		...
 		if [[ "$is_local_user" == "false" ]]; then
-			local local_user
-			local_user=$(whoami 2>/dev/null || echo "")
-			local gh_user="${_CACHED_GH_USERNAME:-}"
-			local identity_user="${identity%%@*}"
 			if [[ "$assignee" == "$local_user" ]] || ...

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/todo-sync.sh around lines 159 - 162, Move the
invariant computations out of the per-task loop: compute local_user (from
whoami) and identity_user (from identity variable via "${identity%%@*}") once
before the while loop instead of inside it; remove the per-iteration
declarations/localizations inside the loop and reference those precomputed
variables where the code currently uses local_user and identity_user so you
avoid calling whoami and re-parsing identity on every task iteration.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 835-852: The code unclaims tasks that have only started: without
an assignee because the grep in recover_stale_claims (todo-sync.sh) matches
either assignee: or started:, then downstream logic uses [[ -n "$assignee" ]] to
gate owner checks but still proceeds to unclaim started-only tasks; fix by
treating started-only tasks as untouchable: in todo-sync.sh inside
recover_stale_claims, after extracting the assignee variable (assignee) add an
explicit guard that if assignee is empty (i.e. started-only claim) then skip
processing/unclaiming and log/continue, or alternatively tighten the upstream
grep to only select lines containing assignee: (so only tasks with an assignee
are considered). Ensure you reference recover_stale_claims, the grep pattern,
and the assignee variable when making the change.

In @.agents/scripts/supervisor/todo-sync.sh:
- Around line 148-177: The ownership check currently only runs when assignee is
non-empty, which lets tasks with a started: timestamp but no assignee be
recovered incorrectly; update the t1017 logic to treat a missing assignee as
unverifiable ownership and skip recovery: before running the is_local_user
checks, detect if assignee is empty (or unset) and increment
skipped_external/log and continue; otherwise proceed with the existing
comparisons using is_local_user, identity, _CACHED_GH_USERNAME, whoami and the
username-prefix checks (assignee, identity_user, ${assignee%%@*}) so only tasks
with a verifiable assignee are unclaimed.
- Around line 203-225: The code treats an unparseable started_ts as age 0 and
falls through to unclaim; change the parsing block around
started_ts/started_epoch so that if started_ts is non-empty but started_epoch
remains 0 (parse failure) you log a clear message and conservatively skip the
task instead of treating it as stale: after the existing date parsing (variables
started_ts and started_epoch) add a branch that checks [[ -n "$started_ts" &&
"$started_epoch" -eq 0 ]] then log_verbose (mentioning the unparseable
started_ts) increment a skipped_unknown counter (or reuse skipped_young) and
continue; keep the existing branch for empty started_ts (the else) intact and
leave the normal claim_age logic when started_epoch > 0 unchanged.

---

Nitpick comments:
In @.agents/scripts/supervisor/todo-sync.sh:
- Around line 159-162: Move the invariant computations out of the per-task loop:
compute local_user (from whoami) and identity_user (from identity variable via
"${identity%%@*}") once before the while loop instead of inside it; remove the
per-iteration declarations/localizations inside the loop and reference those
precomputed variables where the code currently uses local_user and identity_user
so you avoid calling whoami and re-parsing identity on every task iteration.

.agents/scripts/supervisor/pulse.sh

.agents/scripts/supervisor/todo-sync.sh

Auto-dismissed: bot review does not block autonomous pipeline

- Guard empty assignee as untouchable in recover_stale_claims (critical): tasks with started: but no assignee: are now skipped instead of silently unclaimed, protecting external contributors' claims
- Treat unparseable started: timestamps conservatively (high): parse failures now skip the task instead of falling through to unclaim
- Use tail -1 for started: extraction to match assignee: pattern (medium)
- Move loop-invariant identity vars outside while loop (medium)
- Separate _sc_repo declaration and assignment (medium)
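
The timestamp fix amounts to skipping a task whenever its age cannot be established. A sketch of that idea follows, assuming `started:` holds an ISO 8601 UTC timestamp such as `2026-02-17T09:00:00Z` (the field format is not shown in this PR) and trying GNU `date -d` before BSD `date -j -f`. The function names and the `stale`/`young`/`unknown` labels are hypothetical, and treating an empty timestamp as `unknown` is an assumption of this sketch.

```shell
# Hypothetical helper: print epoch seconds for an ISO 8601 UTC timestamp,
# or 0 when the value is empty or neither GNU nor BSD date can parse it.
parse_started_epoch() {
	local ts="$1" epoch
	if [ -z "$ts" ]; then
		echo 0
		return 0
	fi
	epoch=$(date -u -d "$ts" +%s 2>/dev/null) ||
		epoch=$(date -u -j -f '%Y-%m-%dT%H:%M:%SZ' "$ts" +%s 2>/dev/null) ||
		epoch=0
	printf '%s\n' "$epoch"
}

# Classify a claim by age: "stale", "young", or "unknown" (skip, do not unclaim).
# $1 = started: value, $2 = now (epoch seconds), $3 = threshold in seconds.
classify_claim_age() {
	local started_ts="$1" now_epoch="$2" threshold="$3" started_epoch
	started_epoch=$(parse_started_epoch "$started_ts")
	if [ "$started_epoch" -eq 0 ]; then
		echo "unknown" # missing or unparseable timestamp: stay conservative
	elif [ $((now_epoch - started_epoch)) -gt "$threshold" ]; then
		echo "stale"
	else
		echo "young"
	fi
}
```

A caller would unclaim only on `stale`, so a parse failure can never be mistaken for an old claim.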

github-actions · 2026-02-19T23:54:29Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 23:54:25 UTC 2026: Code review monitoring started
Thu Feb 19 23:54:26 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 30
VULNERABILITIES: 0

Generated on: Thu Feb 19 23:54:28 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-19T23:55:13Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

marcusquinn added 3 commits February 19, 2026 23:01

feat: add stale-claims CLI command to supervisor-helper.sh (t1263)

8c77699

Adds "supervisor-helper.sh stale-claims [--repo path]" command for manual stale-claim recovery. Also adds help text entry.

docs: document Phase 0.5e stale-claim recovery in AGENTS.md (t1263)

c1361bb

Adds documentation for the new stale-claim auto-recovery feature including configuration (SUPERVISOR_STALE_CLAIM_SECONDS), manual CLI command, and the detection criteria.

marcusquinn marked this pull request as ready for review February 19, 2026 23:06

gemini-code-assist bot reviewed Feb 19, 2026

coderabbitai bot previously requested changes Feb 19, 2026

marcusquinn mentioned this pull request Feb 19, 2026

[Supervisor:marcusquinn] 0 queued, 0 working, 1 in review at 14:46 UTC #1502

Closed

coderabbitai bot approved these changes Feb 20, 2026

marcusquinn merged commit 3c7fa63 into main Feb 20, 2026
35 checks passed

marcusquinn deleted the feature/t1263 branch February 20, 2026 00:28

marcusquinn added a commit that referenced this pull request Feb 20, 2026

chore: mark t1263 complete in TODO.md (#1982)

37eba68

marcusquinn mentioned this pull request Feb 20, 2026

t1263: Add self-healing TODO/DB reconciliation to supervisor pulse #1978

Closed

marcusquinn mentioned this pull request Mar 7, 2026

quality-debt: PR #1982 review feedback (medium) #3634

Open
