
t1263: Add stale-claim auto-recovery to supervisor pulse #1982

Merged: marcusquinn merged 5 commits into main from feature/t1263 on Feb 20, 2026

marcusquinn (Owner) commented Feb 19, 2026

Summary

Adds Phase 0.5e stale-claim auto-recovery to the supervisor pulse cycle (t1263).

Problem: When interactive sessions claim tasks (adding assignee: + started: to TODO.md) but die or move on without completing them, the tasks become permanently stuck. Auto-pickup skips them because they have assignee/started fields, but no worker is running. This caused 15 aidevops tasks to sit idle for 2+ days.

Solution: New Phase 0.5e in the pulse cycle that detects stale claims and strips the fields so auto-pickup can re-dispatch them.

Detection Criteria (ALL must be true)

  1. Task is open [ ] in TODO.md with assignee: and/or started: fields
  2. Task is NOT in the supervisor DB as running/dispatched/evaluating/queued/pr_review/review_triage/merging
  3. No active git worktree exists for the task
  4. Claim age exceeds threshold (default: 24h, configurable via SUPERVISOR_STALE_CLAIM_SECONDS)
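
The four criteria above can be read as a single conjunction. The following is a minimal sketch of that reading, not the code from todo-sync.sh: the function name `is_stale_claim` and its argument order are hypothetical, and the first three checks are passed in as precomputed `true`/`false` strings rather than derived from TODO.md, the supervisor DB, or `git worktree`.

```bash
# Hypothetical helper (not from the PR): succeeds only when ALL four
# detection criteria hold for one task.
is_stale_claim() {
    local has_claim_fields="$1" # "true" if the open task has assignee:/started:
    local in_active_db="$2"     # "true" if the DB lists the task in an active state
    local has_worktree="$3"     # "true" if a git worktree exists for the task
    local started_epoch="$4"    # claim time, seconds since the epoch
    local now_epoch="$5"        # current time, seconds since the epoch
    local threshold="${SUPERVISOR_STALE_CLAIM_SECONDS:-86400}"

    [ "$has_claim_fields" = "true" ] || return 1
    [ "$in_active_db" = "false" ] || return 1
    [ "$has_worktree" = "false" ] || return 1

    # Criterion 4: the claim must be strictly older than the threshold.
    local claim_age=$((now_epoch - started_epoch))
    [ "$claim_age" -gt "$threshold" ]
}
```

Because every check returns early on failure, a task that is active in the DB or has a worktree is never treated as stale, however old its claim is.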

Safety

  • Respects t1017 assignee ownership rule: only unclaims tasks where assignee matches the local user (exact match + fuzzy match on username/GitHub handle)
  • External contributors' claims are NEVER touched
  • Uses existing cmd_unclaim --force for the actual field stripping (reuses proven logic)
  • Logs all recoveries for observability
  • Records SELF_HEAL_PATTERN to pattern tracker
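
As an illustration of the ownership rule, here is a hedged sketch of an exact-plus-fuzzy identity match. The function name `assignee_is_local` and its parameters are assumptions made for this example; the real check lives inside `recover_stale_claims()` and may compare further values.

```bash
# Hypothetical sketch of the t1017 ownership check: succeed only when the
# assignee can be matched to the local user.
assignee_is_local() {
    local assignee="$1" identity="$2" local_user="$3" gh_user="$4"
    local assignee_user="${assignee%%@*}" # "user@host" -> "user"
    local candidate

    for candidate in "$identity" "${identity%%@*}" "$local_user" "$gh_user"; do
        # Empty candidates are skipped so an empty assignee can never match.
        [ -n "$candidate" ] || continue
        if [ "$assignee" = "$candidate" ] || [ "$assignee_user" = "$candidate" ]; then
            return 0
        fi
    done
    return 1
}
```

With an identity of `marcus@laptop`, the assignees `marcus@laptop` and `marcus` both match, while `alice` and an empty assignee do not, so those claims would be left alone.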

Changes

  • .agents/scripts/supervisor/todo-sync.sh: Added recover_stale_claims() function
  • .agents/scripts/supervisor/pulse.sh: Wired Phase 0.5e between Phase 0.5d (auto-unblock) and Phase 0.6 (queue reconciliation)
  • .agents/scripts/supervisor-helper.sh: Added stale-claims CLI command + help text
  • .agents/AGENTS.md: Documented the feature

Configuration

  • SUPERVISOR_STALE_CLAIM_SECONDS — threshold in seconds (default: 86400 = 24h)
  • Manual check: supervisor-helper.sh stale-claims [--repo path]
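
A small sketch of how the threshold might be resolved from the environment. The `stale_claim_threshold` helper is hypothetical, and the fallback to the default on non-numeric input is an assumption of this example; the PR does not say how invalid values are handled.

```bash
# Hypothetical helper: print the effective stale-claim threshold in seconds.
stale_claim_threshold() {
    local value="${SUPERVISOR_STALE_CLAIM_SECONDS:-86400}"
    case "$value" in
        # Assumption: anything that is not a plain non-negative integer
        # falls back to the documented default of 86400 (24h).
        '' | *[!0-9]*) value=86400 ;;
    esac
    echo "$value"
}
```

For a one-off manual check with a 12-hour threshold, the variable can be set on the command line: `SUPERVISOR_STALE_CLAIM_SECONDS=43200 supervisor-helper.sh stale-claims --repo /path/to/repo`, where the repo path is a placeholder.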

Ref #1978

Summary by CodeRabbit

  • New Features
    • Automatic stale-claim recovery: tasks claimed for over 24 hours with no active worker or worktree are now automatically unclaimed, allowing reassignment
    • New supervisor command for manual stale-claim detection and recovery

Adds Phase 0.5e stale-claim auto-recovery function that detects tasks with
assignee: and/or started: fields in TODO.md that have no active worker,
no active worktree, and claim age >24h. Strips the fields so auto-pickup
can re-dispatch them. Respects t1017 assignee ownership rule: only unclaims
tasks where assignee matches the local user identity.

Adds Phase 0.5e between Phase 0.5d (auto-unblock) and Phase 0.6 (queue
reconciliation). Iterates all known repos and calls recover_stale_claims()
on each TODO.md. Runs every pulse to promptly detect and recover stuck tasks.

Adds "supervisor-helper.sh stale-claims [--repo path]" command for manual
stale-claim recovery. Also adds help text entry.
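
The per-repo iteration described in the pulse commit message could look roughly like the sketch below. It is not the code from pulse.sh: `run_stale_claim_phase` is a hypothetical name, the repo list is passed as arguments instead of being read from the supervisor DB, and `recover_stale_claims` is a stand-in stub that assumes the real function takes the repo path as its argument.

```bash
# Stand-in stub for the real recover_stale_claims() in todo-sync.sh
# (assumption: it takes the repo path as its only argument).
recover_stale_claims() {
    echo "checking $1/TODO.md"
}

# Hypothetical Phase 0.5e driver: visit every known repo once per pulse.
run_stale_claim_phase() {
    local repo
    for repo in "$@"; do
        # Repos without a TODO.md are skipped instead of failing the pulse.
        [ -f "$repo/TODO.md" ] || continue
        recover_stale_claims "$repo" ||
            echo "stale-claim recovery failed for $repo" >&2
    done
}
```

Reporting a per-repo failure and continuing, rather than aborting, is a design choice of this sketch so that one broken repo cannot block recovery in the others.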
coderabbitai bot (Contributor) commented Feb 19, 2026


Walkthrough

This PR introduces automatic stale-claim recovery to the supervisor system. When tasks remain claimed (via assignee/started fields) for over 24 hours without active workers or worktrees, they are automatically unclaimed, allowing re-dispatch. Integration spans documentation, CLI, supervisor pulse flow, and core recovery logic.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: .agents/AGENTS.md | Adds section documenting stale-claim auto-recovery feature (t1263), including detection criteria (no active worker, no worktree, claim age > 24h), unclaim behavior, and manual invocation via supervisor-helper.sh stale-claims. |
| CLI Interface: .agents/scripts/supervisor-helper.sh | Introduces new stale-claims subcommand with optional --repo <path> parameter; routes to recover_stale_claims after ensuring supervisor DB initialization and resolving repository path. |
| Supervisor Pulse Integration: .agents/scripts/supervisor/pulse.sh | Adds Phase 0.5e to invoke stale-claim recovery during pulse cycle; scans TODO.md files and calls recover_stale_claims per repository to auto-unclaim aged claims. |
| Recovery Logic: .agents/scripts/supervisor/todo-sync.sh | Implements recover_stale_claims() function that validates TODO.md, gathers active worktrees and supervisor DB tasks, filters claims by ownership and age against configurable SUPERVISOR_STALE_CLAIM_SECONDS threshold (default 86400s), and invokes cmd_unclaim --force on stale entries. |

Sequence Diagram(s)

sequenceDiagram
    participant Pulse as Supervisor Pulse
    participant TodoSync as todo-sync.sh
    participant DB as Supervisor DB
    participant Git as Git Worktrees
    participant TODO as TODO.md
    participant Unclaim as cmd_unclaim

    Pulse->>TodoSync: Phase 0.5e: recover_stale_claims(repo)
    TodoSync->>TODO: Read and parse tasks
    TodoSync->>DB: Query active tasks
    TodoSync->>Git: List active worktrees
    
    rect rgba(100, 150, 200, 0.5)
        Note over TodoSync: Filter & Classify
        TodoSync->>TodoSync: Check ownership (identity match)
        TodoSync->>TodoSync: Verify not in DB (external/active)
        TodoSync->>TodoSync: Verify no worktree
        TodoSync->>TodoSync: Check started: age > threshold
    end
    
    rect rgba(200, 150, 100, 0.5)
        Note over TodoSync,Unclaim: Recover Stale Claims
        loop For each stale claim
            TodoSync->>Unclaim: cmd_unclaim --force (remove assignee/started)
            Unclaim->>TODO: Strip fields
        end
    end
    
    TodoSync->>Pulse: Return summary (recovered/skipped counts)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🌊 When claims grow stale, like bread left long,
The pulse now hums a healing song—
Abandonment turned to freedom's call,
Tasks reborn to serve us all! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly describes the main change: adding stale-claim auto-recovery to the supervisor pulse cycle, matching the PR's core objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 23:03:44 UTC 2026: Code review monitoring started
Thu Feb 19 23:03:44 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 30
  • VULNERABILITIES: 0

Generated on: Thu Feb 19 23:03:46 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Adds documentation for the new stale-claim auto-recovery feature including
configuration (SUPERVISOR_STALE_CLAIM_SECONDS), manual CLI command, and
the detection criteria.
@marcusquinn marcusquinn marked this pull request as ready for review February 19, 2026 23:06
@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 23:07:05 UTC 2026: Code review monitoring started
Thu Feb 19 23:07:06 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 30
  • VULNERABILITIES: 0

Generated on: Thu Feb 19 23:07:08 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where tasks claimed by interactive sessions could become permanently stuck if the session terminated or the user moved on without completing them. By integrating a new auto-recovery phase into the supervisor's pulse cycle, the system can now intelligently detect and reset these stale claims, ensuring that tasks remain dispatchable and the overall task flow is maintained without manual intervention.

Highlights

  • Stale Claim Auto-Recovery: Introduced Phase 0.5e into the supervisor pulse cycle to automatically detect and recover tasks that have been claimed but are no longer actively worked on, preventing them from becoming permanently stuck.
  • Detection Criteria: Stale claims are identified based on four conditions: the task is open with assignee/started fields, it's not tracked as active in the supervisor database, no active Git worktree exists for it, and its claim age exceeds a configurable threshold (defaulting to 24 hours).
  • Safety Measures: The recovery mechanism adheres to assignee ownership rules, only unclaiming tasks assigned to the local user and never touching claims by external contributors. It reuses existing, proven cmd_unclaim --force logic for stripping fields.
  • New Functionality and Command: A new recover_stale_claims() function was added to handle the recovery logic, and a supervisor-helper.sh stale-claims command was introduced for manual triggering and inspection.
  • Configuration and Documentation: The stale claim threshold is configurable via SUPERVISOR_STALE_CLAIM_SECONDS, and the new feature is documented in AGENTS.md.
Changelog
  • .agents/AGENTS.md
    • Added documentation for the new stale-claim auto-recovery feature, explaining its purpose, detection criteria, and configuration options.
  • .agents/scripts/supervisor-helper.sh
    • Introduced a new stale-claims command to allow manual detection and recovery of stale TODO.md claims.
    • Implemented the logic for the stale-claims command, including parsing repository paths and calling the recover_stale_claims function.
  • .agents/scripts/supervisor/pulse.sh
    • Integrated the recover_stale_claims function into the supervisor's pulse cycle as Phase 0.5e.
    • Added logic to iterate through all repositories and apply stale claim recovery during each pulse.
  • .agents/scripts/supervisor/todo-sync.sh
    • Implemented the recover_stale_claims function, which identifies stale tasks based on assignee, started timestamp, supervisor DB status, and active Git worktrees.
    • Included safety checks to ensure only local user claims are processed and external claims are skipped.
    • Utilized cmd_unclaim --force to strip assignee and started fields from stale tasks.
    • Added logging for recovery actions and integration with the pattern tracker for observability.

gemini-code-assist bot left a comment

Code Review

This pull request introduces Phase 0.5e stale-claim auto-recovery to the supervisor pulse cycle. The implementation is robust, incorporating safety checks for local user identity and cross-referencing with the supervisor database and active git worktrees. I have identified a few issues related to variable scoping, logic inconsistencies regarding malformed claims, regex extraction order, and efficiency that should be addressed to ensure full correctness and adherence to the project's style guide and established shell scripting practices.

coderabbitai bot (Contributor) previously requested changes Feb 19, 2026

Actionable comments posted: 3

🧹 Nitpick comments (1)
.agents/scripts/supervisor/todo-sync.sh (1)

159-162: whoami called on every loop iteration — minor efficiency nit.

local_user=$(whoami ...) and identity_user="${identity%%@*}" are recomputed for every task with an assignee. Both values are invariant across the loop and could be computed once above the while loop.

♻️ Proposed refactor
+	local local_user
+	local_user=$(whoami 2>/dev/null || echo "")
+	local gh_user="${_CACHED_GH_USERNAME:-}"
+	local identity_user="${identity%%@*}"
+
 	while IFS= read -r line; do
 		...
 		if [[ "$is_local_user" == "false" ]]; then
-			local local_user
-			local_user=$(whoami 2>/dev/null || echo "")
-			local gh_user="${_CACHED_GH_USERNAME:-}"
-			local identity_user="${identity%%@*}"
 			if [[ "$assignee" == "$local_user" ]] || ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/todo-sync.sh around lines 159 - 162, Move the
invariant computations out of the per-task loop: compute local_user (from
whoami) and identity_user (from identity variable via "${identity%%@*}") once
before the while loop instead of inside it; remove the per-iteration
declarations/localizations inside the loop and reference those precomputed
variables where the code currently uses local_user and identity_user so you
avoid calling whoami and re-parsing identity on every task iteration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 835-852: The code unclaims tasks that have only started: without
an assignee because the grep in recover_stale_claims (todo-sync.sh) matches
either assignee: or started:, then downstream logic uses [[ -n "$assignee" ]] to
gate owner checks but still proceeds to unclaim started-only tasks; fix by
treating started-only tasks as untouchable: in todo-sync.sh inside
recover_stale_claims, after extracting the assignee variable (assignee) add an
explicit guard that if assignee is empty (i.e. started-only claim) then skip
processing/unclaiming and log/continue, or alternatively tighten the upstream
grep to only select lines containing assignee: (so only tasks with an assignee
are considered). Ensure you reference recover_stale_claims, the grep pattern,
and the assignee variable when making the change.

In @.agents/scripts/supervisor/todo-sync.sh:
- Around line 148-177: The ownership check currently only runs when assignee is
non-empty, which lets tasks with a started: timestamp but no assignee be
recovered incorrectly; update the t1017 logic to treat a missing assignee as
unverifiable ownership and skip recovery: before running the is_local_user
checks, detect if assignee is empty (or unset) and increment
skipped_external/log and continue; otherwise proceed with the existing
comparisons using is_local_user, identity, _CACHED_GH_USERNAME, whoami and the
username-prefix checks (assignee, identity_user, ${assignee%%@*}) so only tasks
with a verifiable assignee are unclaimed.
- Around line 203-225: The code treats an unparseable started_ts as age 0 and
falls through to unclaim; change the parsing block around
started_ts/started_epoch so that if started_ts is non-empty but started_epoch
remains 0 (parse failure) you log a clear message and conservatively skip the
task instead of treating it as stale: after the existing date parsing (variables
started_ts and started_epoch) add a branch that checks [[ -n "$started_ts" &&
"$started_epoch" -eq 0 ]] then log_verbose (mentioning the unparseable
started_ts) increment a skipped_unknown counter (or reuse skipped_young) and
continue; keep the existing branch for empty started_ts (the else) intact and
leave the normal claim_age logic when started_epoch > 0 unchanged.
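
The two conservative guards requested above (skip when the assignee is empty, skip when started: cannot be parsed) can be sketched as a small classifier. This is an illustration under assumptions, not the merged code: the function names, the result labels, and the ISO 8601 UTC timestamp format are hypothetical, and treating an empty started: value as a skip is a choice of this sketch, since the thread does not say what the existing branch does.

```bash
# Parse a timestamp to epoch seconds; prints 0 when parsing fails.
# Tries GNU date syntax first, then BSD/macOS syntax (assumed format:
# 2026-02-17T10:00:00Z).
parse_started_epoch() {
    local ts="$1" epoch
    epoch=$(date -u -d "$ts" +%s 2>/dev/null) ||
        epoch=$(date -u -j -f "%Y-%m-%dT%H:%M:%SZ" "$ts" +%s 2>/dev/null) ||
        epoch=0
    echo "${epoch:-0}"
}

# Hypothetical classifier: prints one label per claim.
classify_claim() {
    local assignee="$1" started_ts="$2" now_epoch="$3" threshold="$4"
    local started_epoch

    # Guard 1: without an assignee, ownership cannot be verified.
    if [ -z "$assignee" ]; then echo "skip_no_assignee"; return 0; fi
    # Assumption of this sketch: an empty started: value is also skipped.
    if [ -z "$started_ts" ]; then echo "skip_no_timestamp"; return 0; fi

    # Guard 2: a parse failure must not be read as "very old".
    started_epoch=$(parse_started_epoch "$started_ts")
    if [ "$started_epoch" -eq 0 ]; then echo "skip_unparseable"; return 0; fi

    if [ $((now_epoch - started_epoch)) -gt "$threshold" ]; then
        echo "stale"
    else
        echo "skip_young"
    fi
}
```

Only the `stale` label would lead to an unclaim; every other outcome leaves the task untouched, which is the conservative default the review asks for.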


@marcusquinn marcusquinn dismissed coderabbitai[bot]’s stale review February 19, 2026 23:34

Auto-dismissed: bot review does not block autonomous pipeline

- Guard empty assignee as untouchable in recover_stale_claims (critical):
  tasks with started: but no assignee: are now skipped instead of silently
  unclaimed, protecting external contributors' claims
- Treat unparseable started: timestamps conservatively (high): parse failures
  now skip the task instead of falling through to unclaim
- Use tail -1 for started: extraction to match assignee: pattern (medium)
- Move loop-invariant identity vars outside while loop (medium)
- Separate _sc_repo declaration and assignment (medium)
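
To illustrate the `tail -1` change, here is a hedged sketch of extracting a field from a task line so that the last occurrence wins. The `extract_field` helper and the `name:value` line format are assumptions made for this example; the actual patterns in todo-sync.sh are not shown in this thread.

```bash
# Hypothetical helper: print the value of the LAST "name:value" token on a
# task line, or nothing if the field is absent.
extract_field() {
    local name="$1" line="$2"
    printf '%s\n' "$line" |
        grep -oE "${name}:[^[:space:]]+" | # every name:value token, one per line
        tail -1 |                          # keep only the last occurrence
        cut -d: -f2-                       # drop "name:", keep colons in the value
}
```

Using the same "last occurrence" rule for both assignee: and started: keeps the two values consistent when a line happens to carry a field more than once.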
@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 23:54:25 UTC 2026: Code review monitoring started
Thu Feb 19 23:54:26 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 30
  • VULNERABILITIES: 0

Generated on: Thu Feb 19 23:54:28 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring


@marcusquinn marcusquinn merged commit 3c7fa63 into main Feb 20, 2026
35 checks passed
@marcusquinn marcusquinn deleted the feature/t1263 branch February 20, 2026 00:28