diff --git a/TODO.md b/TODO.md
index 6c7bc2d87..e098a9c1c 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1548,7 +1548,7 @@ t019.3.4,Update AGENTS.md with Beads integration docs,,beads,1h,45m,2025-12-21T1
   - [x] t1120.1 Extract platform-agnostic functions from issue-sync-helper.sh (parse, compose, ref management) ~1h ref:GH#1676 assignee:marcusquinn started:2026-02-20T19:52:54Z pr:#2029 completed:2026-02-20
   - [x] t1120.2 Add Gitea API adapter functions (create/close/edit/list/search/labels) ~1h ref:GH#1677 assignee:marcusquinn started:2026-02-20T21:24:40Z pr:#2031 completed:2026-02-20
   - [x] t1120.3 Add platform detection from git remote URL + multi-platform push/pull/close ~1h ref:GH#1678 #auto-dispatch model:sonnet assignee:marcusquinn started:2026-02-18T22:07:40Z pr:#1815 completed:2026-02-18
-  - [ ] t1120.4 Test with dual-hosted repo (GitHub + Gitea sync) ~30m ref:GH#1679 assignee:marcusquinn started:2026-02-20T22:21:08Z
+  - [ ] t1120.4 Test with dual-hosted repo (GitHub + Gitea sync) ~30m ref:GH#1679 assignee:marcusquinn started:2026-02-20T22:21:08Z blocked-by:t1120.2
 - [x] t1121 Fix tea CLI TTY requirement in non-interactive mode #bugfix #git ~30m model:haiku ref:GH#1680 assignee:marcusquinn started:2026-02-18T22:08:17Z logged:2026-02-18 #auto-dispatch pr:#1814 completed:2026-02-18
   - `tea issues list` fails with `huh: could not open a new TTY: open /dev/tty: device not configured` when run from scripts/CI
@@ -1757,46 +1757,25 @@ t019.3.4,Update AGENTS.md with Beads integration docs,,beads,1h,45m,2025-12-21T1
 - [x] t1246 Auto-unblock tasks when blockers are verified — verify t1243 deployment covers audit findings #bugfix #auto-dispatch #self-improvement ~30m model:sonnet category:verification — Issue audit found 3 high-severity unblock opportunities: t1224.5 (blocked by t1224.2,t1224.4 — both verified), t1224.7 (same blockers), t1224.8 (blocked by t1224.3 — verified). Task t1243 was just completed (auto-unblock detection), but these 3 tasks are still showing as blocked. Verify that the t1243 PR's auto-unblock Phase is active in the pulse cycle and has processed these specific cases. If not, manually unblock them and investigate why the new Phase didn't catch them. ref:GH#1937 assignee:marcusquinn started:2026-02-19T06:58:10Z pr:#1938 completed:2026-02-19
-<<<<<<< Updated upstream
-
-- [x] # t1247 Auto-unblock tasks when blocker transitions to deployed/verified status #bugfix #auto-dispatch #self-improvement ~1h model:sonnet category:automation — Audit found t1224.9 still marked as despite t1224.6 being deployed and verified. Task t1246 was supposed to verify auto-unblock coverage (completed successfully), yet t1224.9 remains blocked. The pulse cycle's auto-unblock phase may have a gap where it doesn't process all blocked tasks after a blocker is verified. Investigate whether the unblock phase runs after every verification or only on specific triggers, and ensure all downstream tasks are unblocked atomically when a blocker completes. ref:GH#1942 assignee:marcusquinn started:2026-02-19T07:45:47Z pr:#1945 completed:2026-02-19
+- [x] t1247 Auto-unblock tasks when blocker transitions to deployed/verified status #bugfix #auto-dispatch #self-improvement ~1h model:sonnet category:automation — Audit found t1224.9 still marked as despite t1224.6 being deployed and verified. Task t1246 was supposed to verify auto-unblock coverage (completed successfully), yet t1224.9 remains blocked. The pulse cycle's auto-unblock phase may have a gap where it doesn't process all blocked tasks after a blocker is verified. Investigate whether the unblock phase runs after every verification or only on specific triggers, and ensure all downstream tasks are unblocked atomically when a blocker completes. ref:GH#1942 assignee:marcusquinn started:2026-02-19T07:45:47Z pr:#1945 completed:2026-02-19
 - [x] t1248 Investigate 7-day success rate drop from 94% overall to 89% #bugfix #auto-dispatch #self-improvement ~1h model:sonnet category:reliability — Overall success rate is 94% (920/978) but the 7-day window shows 89% (451 completed, 53 failed). The recent worker outcomes show a cluster of worker_never_started:no_sentinel failures on Feb 13 and worker hung timeouts on Feb 12. Task t1190 exists to investigate worker_never_started but the hung worker pattern (t311.2, t311.3, t303 all timing out at ~1800s) may indicate a separate issue — possibly tasks that exceed the timeout threshold or get stuck in loops. Investigate whether the hung workers share common characteristics (task type, model tier, repo) and whether the timeout threshold needs adjustment or the tasks need better scoping. ref:GH#1944 assignee:marcusquinn started:2026-02-19T22:42:00Z pr:#1983 completed:2026-02-19
- > > > > > > > Stashed changes
-
 - [x] t1249 Add stale-evaluating root cause analysis to pulse cycle #self-improvement #auto-dispatch #bugfix #orchestration ~2h model:sonnet category:automation — Multiple recently completed tasks (t1246, t1244, t1243, t1224.5, t004.1, t1242, t1241, t1240, t1224.4, t1239, t1238, t1224.3) required 'Stale evaluating recovery (Phase 0.7)' intervention. While the self-healing works, the root cause of tasks getting stuck in 'evaluating' state should be diagnosed. Possible causes: evaluator timeout, race condition in status transitions, or worker completing but evaluation not triggering. Add instrumentation to log why tasks enter stale-evaluating state (time between worker completion and evaluation start, evaluator exit codes) so the root cause can be fixed rather than just recovered from. assignee:marcusquinn started:2026-02-19T08:07:35Z ref:GH#1946 pr:#1949 completed:2026-02-19
-<<<<<<< Updated upstream
-
-- [x] # t1250 Reduce stale-evaluating recovery frequency by improving worker completion reporting #bugfix #auto-dispatch #self-improvement ~2h model:sonnet category:reliability — 8 of 15 recently completed tasks required 'Stale evaluating recovery (Phase 0.7)' intervention. While the recovery mechanism works, the high frequency suggests workers are completing but not reporting status back to the supervisor cleanly. Root cause investigation needed: are workers exiting before the status update commits? Is there a race condition between worker completion and supervisor polling? Fixing this would eliminate unnecessary recovery cycles and reduce pulse overhead. ref:GH#1947 assignee:marcusquinn pr:#1950 completed:2026-02-19
-
- > > > > > > > Stashed changes
+- [x] t1250 Reduce stale-evaluating recovery frequency by improving worker completion reporting #bugfix #auto-dispatch #self-improvement ~2h model:sonnet category:reliability — 8 of 15 recently completed tasks required 'Stale evaluating recovery (Phase 0.7)' intervention. While the recovery mechanism works, the high frequency suggests workers are completing but not reporting status back to the supervisor cleanly. Root cause investigation needed: are workers exiting before the status update commits? Is there a race condition between worker completion and supervisor polling? Fixing this would eliminate unnecessary recovery cycles and reduce pulse overhead. ref:GH#1947 assignee:marcusquinn pr:#1950 completed:2026-02-19
 - [x] t1251 Investigate frequent stale-evaluating recovery pattern in supervisor #bugfix #auto-dispatch #self-improvement ~2h model:sonnet category:reliability — 11 of 15 recently completed tasks (73%) required 'Stale evaluating recovery (Phase 0.7)' intervention. This suggests tasks are routinely getting stuck in 'evaluating' state and requiring the recovery mechanism to unstick them. While the recovery works, the root cause should be investigated — likely the evaluation step is timing out, hanging, or failing silently. Fixing this would reduce task completion latency and eliminate unnecessary retry overhead (~35min avg wasted per stale recovery based on duration data). assignee:marcusquinn started:2026-02-19T09:04:58Z ref:GH#1951 pr:#1952 completed:2026-02-19
-<<<<<<< Updated upstream
-
-- [x] # t1252 Investigate and reduce stale-evaluating recovery frequency #bugfix #auto-dispatch #self-improvement ~2h model:sonnet category:reliability — 73% of recently completed tasks (11/15) required 'Stale evaluating recovery (Phase 0.7)' intervention. While t1251 was dispatched to investigate this pattern, the root cause should be verified as fixed. If the evaluation step is still routinely timing out or hanging, add instrumentation: (1) log evaluation start/end timestamps, (2) add a watchdog timer that detects evaluation hangs within 60s instead of waiting for the full stale timeout, (3) emit a metric for evaluation duration so trends are visible. Expected benefit: eliminate ~35min avg wasted per stale recovery, reducing task completion latency by ~25% for affected tasks. ref:GH#1954 assignee:marcusquinn pr:#1955 completed:2026-02-19
- > > > > > > > Stashed changes
+- [x] t1252 Investigate and reduce stale-evaluating recovery frequency #bugfix #auto-dispatch #self-improvement ~2h model:sonnet category:reliability — 73% of recently completed tasks (11/15) required 'Stale evaluating recovery (Phase 0.7)' intervention. While t1251 was dispatched to investigate this pattern, the root cause should be verified as fixed. If the evaluation step is still routinely timing out or hanging, add instrumentation: (1) log evaluation start/end timestamps, (2) add a watchdog timer that detects evaluation hangs within 60s instead of waiting for the full stale timeout, (3) emit a metric for evaluation duration so trends are visible. Expected benefit: eliminate ~35min avg wasted per stale recovery, reducing task completion latency by ~25% for affected tasks. ref:GH#1954 assignee:marcusquinn pr:#1955 completed:2026-02-19
-<<<<<<< Updated upstream
-
-- [x] # t1253 Investigate awardsapp dispatch stall — 15 subtasks dispatchable but none being picked up #bugfix #auto-dispatch #self-improvement ~1h model:sonnet category:automation — The awardsapp repo has 15 dispatchable subtasks (t004.3-t004.5, t005.2-t005.6, t007.2-t007.8) but no workers are running and none are being picked up. This could indicate: (1) awardsapp tasks are not in the supervisor batch, (2) cross-repo dispatch fairness is not routing to awardsapp, (3) the @marcus assignee on parent tasks is blocking subtask dispatch, or (4) the subtasks lack #auto-dispatch tags. Investigate the root cause and ensure the dispatch pipeline handles cross-repo queues correctly. Check if parent task assignee propagates to subtasks and blocks auto-pickup. ref:GH#1956 assignee:marcusquinn started:2026-02-19T10:58:53Z pr:#1959 completed:2026-02-19
+- [x] t1253 Investigate awardsapp dispatch stall — 15 subtasks dispatchable but none being picked up #bugfix #auto-dispatch #self-improvement ~1h model:sonnet category:automation — The awardsapp repo has 15 dispatchable subtasks (t004.3-t004.5, t005.2-t005.6, t007.2-t007.8) but no workers are running and none are being picked up. This could indicate: (1) awardsapp tasks are not in the supervisor batch, (2) cross-repo dispatch fairness is not routing to awardsapp, (3) the @marcus assignee on parent tasks is blocking subtask dispatch, or (4) the subtasks lack #auto-dispatch tags. Investigate the root cause and ensure the dispatch pipeline handles cross-repo queues correctly. Check if parent task assignee propagates to subtasks and blocks auto-pickup. ref:GH#1956 assignee:marcusquinn started:2026-02-19T10:58:53Z pr:#1959 completed:2026-02-19
 - [x] t1254 Add stale-evaluating root cause fix based on t1251 investigation findings #bugfix #self-improvement #auto-dispatch ~1h model:sonnet category:reliability — t1251 investigated the frequent stale-evaluating recovery pattern (73% of recently completed tasks required Phase 0.7 intervention). The investigation is now complete and verified. Review the PR from t1251 (PR #1952) to determine: (1) whether the root cause was identified and fixed, or (2) whether only the investigation was done and a follow-up implementation task is needed. If the fix was implemented, verify the stale-evaluating rate drops in the next 24h. If only diagnosed, create a targeted fix task. Currently ~35min avg is wasted per stale recovery — eliminating this saves ~6h/day at current throughput (15 tasks/day \* 35min). assignee:marcusquinn started:2026-02-19T10:59:34Z ref:GH#1957 pr:#1958 completed:2026-02-19
- > > > > > > > Stashed changes
-
-<<<<<<< Updated upstream
-
-- [x] # t1255 Investigate awardsapp cross-repo dispatch — 15 tasks dispatchable but 0 workers running #bugfix #self-improvement #auto-dispatch ~2h model:sonnet category:reliability — The awardsapp repo has 15 dispatchable subtasks (t004.3-t004.5, t005.2-t005.6, t007.2-t007.8) but the supervisor shows 0 running workers and these tasks aren't being picked up. t1253 was created to investigate but this may indicate a systemic cross-repo dispatch issue — possibly the supervisor pulse only scans the primary repo, or awardsapp tasks lack required metadata for dispatch. This improvement should verify: (1) supervisor pulse scans all registered repos, (2) awardsapp tasks have correct repo path in supervisor DB, (3) cross-repo concurrency fairness (t1188.2) is functioning. If t1253 already covers this scope, link and close this as duplicate. ref:GH#1960 assignee:marcusquinn pr:#1961 completed:2026-02-19
- > > > > > > > Stashed changes
-
-<<<<<<< Updated upstream
-- [x] # t1256 Add stale-evaluating root cause analysis to pulse Phase 0.7 #bugfix #auto-dispatch #self-improvement ~2h model:sonnet category:automation — 14 of 15 recently completed tasks required 'Stale evaluating recovery (Phase 0.7)' intervention. While the self-healing works, the high frequency suggests a systemic issue — tasks are routinely getting stuck in evaluating state rather than transitioning cleanly. Root cause analysis should determine why tasks stall in evaluating (possible causes: evaluator timeout, race condition in status transition, slow PR merge detection) and fix the upstream issue rather than relying on recovery. This would reduce retry overhead and improve average task completion time. ref:GH#1962 assignee:marcusquinn pr:#1963 completed:2026-02-19
+- [x] t1255 Investigate awardsapp cross-repo dispatch — 15 tasks dispatchable but 0 workers running #bugfix #self-improvement #auto-dispatch ~2h model:sonnet category:reliability — The awardsapp repo has 15 dispatchable subtasks (t004.3-t004.5, t005.2-t005.6, t007.2-t007.8) but the supervisor shows 0 running workers and these tasks aren't being picked up. t1253 was created to investigate but this may indicate a systemic cross-repo dispatch issue — possibly the supervisor pulse only scans the primary repo, or awardsapp tasks lack required metadata for dispatch. This improvement should verify: (1) supervisor pulse scans all registered repos, (2) awardsapp tasks have correct repo path in supervisor DB, (3) cross-repo concurrency fairness (t1188.2) is functioning. If t1253 already covers this scope, link and close this as duplicate. ref:GH#1960 assignee:marcusquinn pr:#1961 completed:2026-02-19
- > > > > > > > Stashed changes
+- [x] t1256 Add stale-evaluating root cause analysis to pulse Phase 0.7 #bugfix #auto-dispatch #self-improvement ~2h model:sonnet category:automation — 14 of 15 recently completed tasks required 'Stale evaluating recovery (Phase 0.7)' intervention. While the self-healing works, the high frequency suggests a systemic issue — tasks are routinely getting stuck in evaluating state rather than transitioning cleanly. Root cause analysis should determine why tasks stall in evaluating (possible causes: evaluator timeout, race condition in status transition, slow PR merge detection) and fix the upstream issue rather than relying on recovery. This would reduce retry overhead and improve average task completion time. ref:GH#1962 assignee:marcusquinn pr:#1963 completed:2026-02-19
 - [ ] t1257 Add sequential dependency enforcement for t1120 subtask chain #bugfix #auto-dispatch #self-improvement ~15m model:haiku category:process — t1120.1, t1120.2, and t1120.4 are all eligible for auto-dispatch but have an implicit sequential dependency (extract functions → add adapter → test). Without explicit blocked-by fields, they could be dispatched simultaneously and t1120.2/t1120.4 would fail because t1120.1 hasn't landed yet. Add to t1120.2 and to t1120.4 in TODO.md to prevent wasted dispatch cycles. ref:GH#1964 assignee:marcusquinn started:2026-02-20T21:25:14Z
 - [x] t1258 Investigate high volume of stale evaluating recovery events #bugfix #auto-dispatch #self-improvement ~2h model:sonnet category:reliability — 15 of 15 recently completed tasks in the last 24h show 'Stale evaluating recovery (Phase 0.7)' notes. This suggests tasks are routinely getting stuck in 'evaluating' state and requiring Phase 0.7 recovery rather than completing normally. Root cause investigation needed: is the evaluation step timing out? Is there a race condition between worker completion and evaluation? The recovery mechanism works but shouldn't be the primary completion path. Check supervisor-helper.sh evaluate flow, worker sentinel timing, and whether evaluation is blocking on external calls (GitHub API rate limits, PR merge checks). assignee:marcusquinn started:2026-02-19T13:35:18Z ref:GH#1965 pr:#1966 completed:2026-02-19