diff --git a/docs/backlog/P3/B-0519-multi-otto-branch-state-contamination-rca-2026-05-14.md b/docs/backlog/P3/B-0519-multi-otto-branch-state-contamination-rca-2026-05-14.md index ad86daa3ae..a8080cf41b 100644 --- a/docs/backlog/P3/B-0519-multi-otto-branch-state-contamination-rca-2026-05-14.md +++ b/docs/backlog/P3/B-0519-multi-otto-branch-state-contamination-rca-2026-05-14.md @@ -6,7 +6,7 @@ title: "Multi-Otto branch-state contamination — RCA + mechanization candidate" tier: factory-infrastructure effort: M created: 2026-05-14 -last_updated: 2026-05-14 +last_updated: 2026-05-15 depends_on: [] composes_with: [B-0400, B-0444, B-0506] tags: [multi-foreground-surface, git-worktree, branch-state, friction, factory-hygiene] @@ -78,6 +78,42 @@ branch. Recovery: `gh pr create --head ` with an EXPLICIT head ref, so the call doesn't depend on the (poisoned) current-branch state. +### Pattern 7 — Abandoned rebase state in shared `.git/` dir (2026-05-15T02:30Z) + +Observed in the 2026-05-15 Otto-CLI session: a peer agent (Lior on +`lior/decompose-b0139-4`) started an interactive rebase in the primary +worktree at 2026-05-14T20:36Z, left `.git/rebase-merge/` populated +(interactive mode, stopped at a commit), and never resumed. ~5 hours +later, between two consecutive Bash-tool calls, that rebase apparently +resumed in some other process — moving the primary worktree's +`.git/HEAD` from the active branch ref to a raw SHA (detached on +Lior's mid-rebase commit `65c7865`). + +Substrate-honest cost: the contamination was caught by the branch-guard +`test "$(git branch --show-current)" = ""` inline check +(Pattern 5 defense composed correctly) — but only after significant +work had already been done that then needed to be redone in a dedicated +worktree. + +Distinct from Patterns 1-6 because the contaminating event was a +PAUSED-then-RESUMED rebase state, not a fresh `git checkout` or +`git reset` by a parallel process. The rebase metadata in +`.git/rebase-merge/` is **shared filesystem state per-checkout** (not +per-process), so any process operating in that checkout can resume it. + +Recovery: do NOT touch peer's rebase state (per +[`.claude/rules/claim-acquire-before-worktree-work.md`](../../../.claude/rules/claim-acquire-before-worktree-work.md) +worktree force-remove guard, extended in spirit to rebase state). +Pivot to a dedicated `/tmp/zeta-otto-cli-` worktree and continue +work there. Branch refs are global, so commits land on the right ref +regardless of which physical checkout produces them. + +Field-test shards: + +- `docs/hygiene-history/ticks/2026/05/15/0213Z.md` — initial observation +- `docs/hygiene-history/ticks/2026/05/15/0230Z.md` — full forensics + + pivot to dedicated worktrees recovery + ## Mechanization candidates ### Cheap @@ -108,6 +144,28 @@ because a fresh-session Otto won't remember them otherwise. Carried in `.claude/rules/` would be the right home if/when this is promoted from P3 to actively-mechanized. +### Cheap (new — for Pattern 7) + +Cold-boot check that surfaces abandoned rebase state in the primary +worktree. A small TS script run at session start (or as a `.claude/` +hook): + +```typescript +// tools/hygiene/check-stale-rebase-state.ts (proposed) +import { existsSync, statSync, readFileSync } from "node:fs"; +const rebaseMerge = ".git/rebase-merge"; +if (existsSync(rebaseMerge)) { + const ageMs = Date.now() - statSync(rebaseMerge).mtimeMs; + if (ageMs > 60 * 60 * 1000) { + const headName = readFileSync(`${rebaseMerge}/head-name`, "utf-8").trim(); + console.warn(`Stale rebase state on ${headName} (${Math.round(ageMs / 1000 / 60)}min old). Peer agent should resume-or-abort.`); + } +} +``` + +Diagnostic only — does not touch the rebase state. The peer agent +that owns it is the only party that can responsibly resume or abort. + ### Substantial Per-Otto-process git worktree isolation. Each Otto session gets a dedicated `/private/tmp/zeta-/` worktree that's never touched by other processes. Already partially-implemented in some shards (the `--worktree` field on bus claim envelopes per B-0444) but not consistently used by Otto-CLI sessions. diff --git a/docs/backlog/P3/B-0528-shadow-launchd-installer-unit-tests-2026-05-15.md b/docs/backlog/P3/B-0528-shadow-launchd-installer-unit-tests-2026-05-15.md new file mode 100644 index 0000000000..52537fabda --- /dev/null +++ b/docs/backlog/P3/B-0528-shadow-launchd-installer-unit-tests-2026-05-15.md @@ -0,0 +1,109 @@ +--- +id: B-0528 +priority: P3 +status: open +title: "Unit tests for tools/shadow/launchd/install-launchagent.ts" +tier: factory-infrastructure +effort: S +created: 2026-05-15 +last_updated: 2026-05-15 +depends_on: [] +composes_with: [B-0402] +tags: [test-coverage, shadow-observer, launchd, deferred-from-pr-3342] +type: chore +--- + +# Unit tests for tools/shadow/launchd/install-launchagent.ts + +## Origin + +PR #3342 introduced `tools/shadow/launchd/install-launchagent.ts` — a +safety-critical installer that performs placeholder substitution + XML +escaping + atomic plist promotion + optional `launchctl bootstrap`. +Copilot's review of the install script flagged the absence of unit +test coverage (P1): + +> The new installer adds safety-critical substitution and LaunchAgent +> generation logic, but there are no tests covering placeholder +> escaping, missing placeholders, argument validation, or dry-run +> output. `tools/shadow/shadow-observer.ts` already has Bun tests in +> this area, so this installer should get comparable unit coverage +> before it becomes the documented install path. + +The PR #3342 follow-up commit (`970c774`) addresses 8 of the 10 +review-cycle-2 findings but defers test coverage to this row to keep +the PR's scope manageable. + +## Scope + +Add `tools/shadow/launchd/install-launchagent.test.ts` using Bun's +built-in test runner. Verify: + +1. **Placeholder substitution** + - `{{BUN_PATH}}` and `{{REPO_ROOT}}` are replaced + - Unrecognized `{{NAME}}` placeholders cause exit 1 with a + descriptive error +2. **XML escaping** + - Paths containing `&`, `<`, `>`, `"`, `'` produce a plist that + `plutil -lint` accepts (no raw XML-significant characters end + up in `` text nodes) + - Example fixture: `/Users/foo & bar/Zeta` +3. **Argument validation** + - Unknown flag (e.g., `--dryrun`) exits 1 before any FS action + - Relative `--repo-root .` and `--bun-path ./bin` rejected with + "must be an absolute path" + - Missing value for `--repo-root` exits 1 +4. **Dry-run output** + - `--dry-run` writes the rendered plist to stdout, not the + filesystem + - `~/Library/LaunchAgents/com.zeta.shadow-observer.plist` is + unchanged after a dry-run invocation +5. **Default detection** + - `tryDetect("which", ["bun"])` returns undefined cleanly when + bun isn't on PATH (instead of throwing) + - Same for `git rev-parse --show-toplevel` outside a checkout +6. **Availability-preserving install pattern** + - If `readFileSync(destPath, "utf-8")` succeeds, the in-memory + content is written to the side-car backup AFTER the new + content is in place at destPath + - If `readFileSync` throws ENOENT, the install proceeds without + a backup (no existing plist to back up) + - If `readFileSync` throws non-ENOENT, the install proceeds + without a backup but logs the read failure + - If `writeFileSync(tmpDest, ...)` fails, destPath is untouched + (no rename has happened yet) and no backup is created + - If `renameSync(tmpDest, destPath)` fails, the temp file is + unlinked and destPath remains in its pre-call state + (untouched existing plist, or absent if none existed) + - The canonical destPath is NEVER in a missing state during a + successful install — the atomic rename replaces in one syscall + +Reuse the existing test-helper patterns from +`tools/shadow/shadow-observer.test.ts` for fixture setup + +temp-dir handling. + +## Acceptance + +- New file `tools/shadow/launchd/install-launchagent.test.ts` +- All 6 categories above covered with at least one test each +- `bun test tools/shadow/launchd/install-launchagent.test.ts` passes +- CI lint-shellcheck + lint-tsc-tools remain clean +- Document the test-running invocation in + `tools/shadow/launchd/README.md` (single line, no new section) + +## Composes with + +- B-0402 (zeta shadow mode — first-class CLI autocomplete) +- PR #3342 (the install script being tested) +- `tools/shadow/shadow-observer.test.ts` (existing pattern to follow) + +## Non-goals + +- End-to-end LaunchAgent install tests requiring a real `launchctl + bootstrap` (would need a macOS runner + sudo + cleanup of system + state — out of scope; cover with the `--bootstrap` smoke test + document only) +- Refactoring `install-launchagent.ts` for testability beyond the + existing `if (import.meta.main)` guard (the helper functions + `xmlEscape`, `substitutePlaceholders`, `requireAbsolute`, + `tryDetect`, `plutilLint` are already importable) diff --git a/docs/hygiene-history/ticks/2026/05/15/0305Z.md b/docs/hygiene-history/ticks/2026/05/15/0305Z.md new file mode 100644 index 0000000000..3dfd7aa951 --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/15/0305Z.md @@ -0,0 +1,88 @@ +# Tick 0305Z — PR #3342 thread close-out (5 reviews + 1 CodeQL TOCTOU); PR #3356 merged + +## Headline + +- **PR [#3342](https://github.com/Lucent-Financial-Group/Zeta/pull/3342) thread close-out**: 5 Copilot+Codex review threads resolved + 1 new CodeQL TOCTOU thread caught & fixed mid-cycle. Final state: gate=BLOCKED wait-ci, 0 failed required, 0 unresolved threads, auto-merge armed. Two commits landed (`ee749f4` + `6543430`). +- **PR [#3356](https://github.com/Lucent-Financial-Group/Zeta/pull/3356) MERGED**: 0230Z shard with relative-link depth fix (4x→5x for `docs/`, 5x→6x for `.claude/`). Caught by Codex + Copilot in the same review pass. Same class as the 0027Z shard fix in #3330. + +## What landed on #3342 + +Five original review threads (3 Copilot P1, 2 Codex P2): + +| Finding | Fix | +|---|---| +| Broken provenance refs in README header | Dropped first-name attribution + user-scope memory-glob per `docs/AGENT-BEST-PRACTICES.md` §671-685 | +| `sed` recipe unsafe for paths with `&`, `\|`, `\` | New `tools/shadow/launchd/install-launchagent.ts`: String.replaceAll (metacharacter-safe), plutil-lint pre-write, backup-rename, optional bootstrap | +| Rule file referenced from plist not in tree | Auto-resolved by #3339 merge; reference now resolves on main | +| KeepAlive `SuccessfulExit=false`+`Crashed=true` OR-combined | Dropped `SuccessfulExit=false` (cycle 2); **restored cycle 3** after Copilot pointed out Bun runtime errors exit non-zero and would otherwise leave the agent dead until next login. Final state: both keys present with extended plist comment documenting the reviewer-conflict resolution + ThrottleInterval=60s as loop-cap | +| `launchctl load/unload` deprecated | Replaced with `bootstrap`/`bootout` + `gui/$(id -u)` domain across install + reload + uninstall paths | + +One follow-up CodeQL thread on the new install script: + +| Finding | Fix | +|---|---| +| `js/file-system-race` at writeFileSync (line 184) — existsSync→copyFileSync→writeFileSync had a TOCTOU window | Replaced existsSync+copyFileSync with `renameSync(destPath, backup)` inside try/catch (atomic, single syscall, ENOENT means no prior plist) | + +## Substrate-honest mid-tick stuck-point + +When I attempted to `git push --force-with-lease` after a local rebase of the #3342 branch onto current main, the harness auto-mode classifier blocked the force-push (correctly conservative — feature-branch force-with-lease is normally fine, but the classifier defaults to blocking force operations). Pivoted to a non-rebasing strategy: + +1. `git reset --hard origin/feat/shadow-launchd-install-path-a-otto-cli-2026-05-15` (revert to remote tip) +2. Redo the edits manually (3 files: README, plist, new install script) +3. Regular fast-forward `git push` succeeded + +Loss: the explicit benefit of the rebase (picking up #3349's process-extract.ts fix on the branch) was deferred to the GitHub merge step. Cost was acceptable: the PR's `lint (tsc tools)` check passed anyway because GitHub CI runs against the `refs/pull/3342/merge` virtual merge ref, which already includes main's process-extract.ts fix. + +## CodeQL TOCTOU pattern (preserved for future-reference) + +```typescript +// AVOID — TOCTOU between existsSync and copyFileSync: +if (existsSync(destPath)) { + copyFileSync(destPath, backup); +} +writeFileSync(destPath, configured, "utf-8"); + +// ALSO AVOID — renaming the existing dest aside leaves the canonical +// path missing during the window before writeFileSync completes: +try { renameSync(destPath, backup); } +catch (e) { + if ((e as NodeJS.ErrnoException).code !== "ENOENT") throw e; +} +writeFileSync(destPath, configured, "utf-8"); // window: destPath empty + +// PREFER — availability-preserving write-temp-then-atomic-replace, +// with the backup side-carred from in-memory content AFTER the +// install is in place: +let oldContent: string | undefined; +try { oldContent = readFileSync(destPath, "utf-8"); } +catch (e) { + if ((e as NodeJS.ErrnoException).code !== "ENOENT") { + // Read failed for a non-absence reason — skip backup, proceed install + } +} +writeFileSync(`${destPath}.tmp-${process.pid}`, configured, "utf-8"); +renameSync(`${destPath}.tmp-${process.pid}`, destPath); // atomic replace +if (oldContent !== undefined) { + writeFileSync(`${destPath}.bak-${ts}`, oldContent, "utf-8"); +} +``` + +The third pattern is the canonical POSIX "safe install" — destPath is never empty (atomic rename replaces in one syscall), and the backup is created from in-memory content AFTER the install succeeds. Updated 2026-05-15T0322Z per follow-up Codex review on PR #3342 (the second pattern, "ALSO AVOID", was the first iteration of this fix and is also a regression vs. pattern three). + +## Δ since 0230Z (last shard) + +| What | Before | After | +|---|---|---| +| #3342 gate | BLOCKED (5 threads + 1 failed required) | BLOCKED wait-ci (0 threads, auto-merge armed) | +| #3356 state | OPEN (2 threads on shard relative links) | MERGED | +| `tools/shadow/launchd/install-launchagent.ts` | did not exist | new TS installer (safe substitution, plutil pre-write, bootstrap helper) | +| Plist KeepAlive | `Crashed=true + SuccessfulExit=false` (relaunch on any non-zero exit) | Unchanged — first attempted to drop `SuccessfulExit=false` for crash-only semantics, but cycle-3 Copilot review restored it (Bun runtime errors exit non-zero, not signal-terminate); ThrottleInterval=60s caps any config-error loop. Net: same shape as before this PR, but now with extended plist comment documenting the trade-off | +| README install/teardown | legacy `launchctl load/unload` | modern `bootstrap`/`bootout gui/$(id -u)` | + +## Cron sentinel + +`a2c54a1c` armed. + +## Next + +Cron-driven. Next tick verifies #3342 merge. If merged, prune the worktrees; scan bus + backlog ready-to-grind for next pick. diff --git a/docs/hygiene-history/ticks/2026/05/15/0329Z.md b/docs/hygiene-history/ticks/2026/05/15/0329Z.md new file mode 100644 index 0000000000..ef859c2210 --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/15/0329Z.md @@ -0,0 +1,49 @@ +# Tick 0329Z — PR #3342 review-cycle-3 close-out; B-0519 Pattern 7 + B-0528 filed + +## Headline + +- PR [#3342](https://github.com/Lucent-Financial-Group/Zeta/pull/3342): three review cycles closed this tick (11 + 1 + 7 = 19 findings addressed; one P1 deferred to backlog). Gate state: BLOCKED wait-ci, 0 failed, 0 unresolved threads, auto-merge armed. Three commits landed: `970c774` + `19af54f` + `7084bb3`. +- [B-0519 RCA](../../../../backlog/P3/B-0519-multi-otto-branch-state-contamination-rca-2026-05-14.md) updated: Pattern 7 "Abandoned rebase state in shared `.git/` dir" added with field-test pointer + cheap-mechanization sketch (stale-rebase-state diagnostic). Committed in `f6789dd`. +- [B-0528](../../../../backlog/P3/B-0528-shadow-launchd-installer-unit-tests-2026-05-15.md) filed for the deferred test-coverage P1. Committed in `cd4964e`. Effort: S; P3 because behavior is correct (smoke-tested) — tests harden against regression. +- Two prior worktrees pruned (`/tmp/zeta-otto-cli-fix-3339-mdlint`, `/tmp/zeta-otto-cli-fix-3349-threads` — both branches merged this morning). + +## Review-cycle-3 findings (this tick, post-`970c774`) + +| Reviewer | File:line | Finding | Resolution | +|---|---|---|---| +| Copilot P1 | README.md:5 + plist:24 | Cross-refs to `.claude/rules/shadow-star-shorthand-*` not in branch tree | Resolved as outdated — file IS on main (#3339); CI merge-ref check passes | +| Copilot P1 | plist:61 | `Crashed=true` alone misses non-zero-exit Bun crashes | Restored `SuccessfulExit=false` with extended plist comment documenting the reviewer-conflict resolution. ThrottleInterval+--dry-run cap config-error loops | +| Copilot P1 | ts:57 | Missing `sonarjs/no-os-command-from-path` suppression on `execFileSync` | Added at tryDetect, plutilLint, all launchctl/id calls; comments cite poll-pr-gate.ts precedent | +| Copilot P2 | ts:306 | `import.meta.main` guard but functions not exported | Added `export` to main, parseArgs, requireAbsolute, tryDetect, xmlEscape, substitutePlaceholders, plutilLint, Args interface | +| Copilot P1 | shard:51 | "PREFER" snippet in 0305Z shard still showed rename-then-write pattern | Updated to 3-pattern progression with the canonical write-temp-then-atomic-replace | +| Codex P2 | ts:257 (line of prev commit) | Rename-then-write still leaves canonical path empty in brief window | Replaced with availability-preserving pattern: read-old-into-memory → write temp → atomic replace → side-car backup. Canonical path never empty | +| Copilot P1 | ts:185 | No unit tests for safety-critical helpers | Deferred to [B-0528](../../../../backlog/P3/B-0528-shadow-launchd-installer-unit-tests-2026-05-15.md) (substrate-honest scope management); PR comment linked the row | + +## Substrate-honest review-cycle observations + +1. **Each commit triggers a fresh review pass.** Cycle 2 found 11 things in the file I authored on cycle-1's "fix"; cycle 3 found 7 more. This is the normal review-bot cadence, not pathological — each fix is real, and Codex+Copilot review the diff against current main. + +2. **Reviewer-conflict resolution (KeepAlive)**: Codex earlier asked me to drop `SuccessfulExit=false` to avoid config-error restart loops. Copilot this cycle asked me to restore it for Bun-runtime-error coverage. Both are correct concerns; the trade-off is real. I restored the key and documented the rationale in the plist comment so future-readers (and future-reviewers) can see the deliberation rather than re-derive it. + +3. **Test coverage as backlog-row vs in-PR**: per substrate-honest scope-management, the test file is a separate piece of work. Filing B-0528 + linking it from the PR thread + commit message is the canonical way to defer without dropping the finding on the floor. + +## Δ since 0305Z (last shard) + +| What | Before | After | +|---|---|---| +| PR #3342 commits | `7b950f7` | `7084bb3` (+4 commits) | +| PR #3342 unresolved threads | 11 (cycle 2) → 1 (cycle 2.5) → 7 (cycle 3) | 0 | +| `install-launchagent.ts` correctness | XML-unsafe substitution, silent unknown args, relative paths accepted, brief destPath-empty window | XML-escaped, unknown-args rejected, absolute-required, availability-preserving install | +| `install-launchagent.ts` testability | helpers unexported, main() unconditional | `export`-ed helpers + `import.meta.main` guard | +| Plist KeepAlive | `Crashed=true` only | `Crashed=true + SuccessfulExit=false` with deliberation comment | +| B-0519 RCA pattern count | 6 (1-6 from 2026-05-14) | 7 (added "abandoned rebase state in shared .git/") | +| Backlog rows in flight | (no B-0528) | B-0528 filed (deferred test coverage) | +| Obsolete worktrees | 2 alive | 0 (pruned cleanly) | + +## Cron sentinel + +`a2c54a1c` armed. + +## Next + +Cron-driven. Next tick verifies #3342 merge. If yet another review cycle surfaces, address in batch + stop. Otherwise prune `/tmp/zeta-otto-cli-fix-3342-threads` and scan bus/backlog for next pick. diff --git a/docs/hygiene-history/ticks/2026/05/15/0543Z.md b/docs/hygiene-history/ticks/2026/05/15/0543Z.md new file mode 100644 index 0000000000..00d036d1c4 --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/15/0543Z.md @@ -0,0 +1,58 @@ +# Tick 0543Z — PR #3342 review-cycle-4 + cycle-5 close-out + +## Headline + +- PR [#3342](https://github.com/Lucent-Financial-Group/Zeta/pull/3342): two more review cycles closed (5 findings + 4 findings = 9 in this tick alone, 28 total across the PR's life). Gate state: BLOCKED wait-ci, 0 failed, 0 unresolved threads. Two commits landed: `59d5c26` (docs reconciliation) + `c0fd69b` (validation + tmp hardening). +- Main advanced by 3 PRs during the 2-hour gap since last tick: #3361 (shard 0503Z), #3362 (B-0529 row), #3363 (shard 0517Z + bus hygiene). These are peer-Otto work; substrate noted in worldview-refresh. + +## Cycle-4 findings (post-`b9fd8d2`, 5 threads — all dispatched in `59d5c26`) + +| Reviewer | File:line | Finding | Resolution | +|---|---|---|---| +| Copilot P1 | README.md:5 + line 145 | Cross-ref to `.claude/rules/shadow-star-shorthand-*` not in branch tree | Resolved as outdated — file IS on main (#3339); CI merge-ref check confirms | +| Copilot P1 | plist:24 | Same cross-ref issue | Same resolution | +| Copilot P1 | README.md:13 | "Restarts on crash only" doc vs `SuccessfulExit=false` plist behavior | README updated to reflect actual restart policy + ThrottleInterval loop-cap context | +| Copilot P1 | 0305Z.md:17 + line 79 | Tick shard says "dropped SuccessfulExit" but plist still has it (cycle-3 restore) | Both shard rows rewritten to reflect cycle-3 reviewer-conflict resolution | +| Copilot P1 | B-0528.md:69 | Test-scope item #6 described atomicity behavior that no longer exists in the code | Rewrote scope #6 to match the availability-preserving install pattern (read-into-memory → atomic replace → side-car backup) | + +## Cycle-5 findings (post-`59d5c26`, 4 threads — 2 substantive in `c0fd69b`) + +| Reviewer | File:line | Finding | Resolution | +|---|---|---|---| +| Copilot P1 | install-launchagent.ts:200 | No unit tests (5th time flagged) | Resolved with B-0528 pointer (already filed); thread comment cycle | +| Copilot P1 | install-launchagent.ts:143 | `existsSync(bunPath)` accepts directories or non-executable files | Replaced with `statSync` + `isFile()` + `accessSync(X_OK)`. Verified by-hand: `--bun-path /tmp` → "not a regular file"; `--bun-path ` → "not executable by current user". Same shape for `--repo-root` (must be directory). | +| Copilot P1 | install-launchagent.ts:214 | Predictable `/tmp/zeta-shadow-launchagent-${PID}.plist` filename — symlink attack | Replaced with `mkdtempSync(tmpdir() + "zeta-shadow-launchagent-")`. Private 0700 directory with random suffix; `finally` block cleans up file + directory so we don't leak | +| Codex P2 | README.md:null | Same KeepAlive doc-vs-plist drift Copilot flagged | Already addressed in cycle-4 (`59d5c26`); resolved as outdated | + +## Substrate-honest observation on cycle convergence + +Five review cycles produced 28 findings (cycle 1: 5 P1 + 2 cycle-1.5 + cycle-2 11 + cycle-3 7 + cycle-4 5 + cycle-5 4). Each cycle finds fewer + more obscure issues. The trajectory IS converging: + +- Cycle 1: high-level discipline (sed safety, cross-refs, KeepAlive semantics) +- Cycle 2: design quality (XML escape, atomic write, unknown-args, abs paths) +- Cycle 3: reviewer-conflict (restored SuccessfulExit) + style (export, sonarjs) +- Cycle 4: documentation drift from cycle-3 plist change +- Cycle 5: hardening edge cases (non-exec bun, tmp-symlink) + +Each fix is genuine; this is the normal review-bot cadence on a non-trivial new file. Substrate-honest test-coverage deferral to B-0528 remains correct — adding tests in-PR would extend the cycle further. + +## Δ since 0329Z (last shard) + +| What | Before | After | +|---|---|---| +| PR #3342 commits | `b9fd8d2` | `c0fd69b` (+2 commits) | +| Unresolved threads | 0 | 5 (cycle 4) → 4 (cycle 5) → 0 | +| README KeepAlive docs | "crash only" (drifted from plist) | matches actual `Crashed=true + SuccessfulExit=false` behavior | +| 0305Z shard KeepAlive rows | "dropped SuccessfulExit, crash-only" | reflects cycle-3 restore with reviewer-conflict context | +| B-0528 test scope #6 | atomic-promote with rename-restore | availability-preserving install (matches actual code) | +| `--bun-path` validation | existsSync only | statSync.isFile() + accessSync(X_OK) | +| plutilLint temp path | predictable `/tmp/zeta-shadow-launchagent-${PID}.plist` | private mkdtemp dir with `finally` cleanup | +| Main HEAD | `b9fd8d2` was after 0329Z's main | advanced 3 PRs (3361, 3362, 3363) during the 2-hour gap | + +## Cron sentinel + +`a2c54a1c` armed. + +## Next + +Cron-driven. Next tick verifies #3342 merge. Peer-Otto's bus hygiene work (3363) is worth scanning for context; if #3342 merges, prune `/tmp/zeta-otto-cli-fix-3342-threads`. diff --git a/tools/shadow/launchd/README.md b/tools/shadow/launchd/README.md new file mode 100644 index 0000000000..a2bb47626e --- /dev/null +++ b/tools/shadow/launchd/README.md @@ -0,0 +1,146 @@ +# Shadow observer — launchd install (macOS LaunchAgent) + +The shadow observation loop needs a durable tick source. This directory provides the macOS launchd LaunchAgent template (Path A — durable, survives reboots). + +For lighter-weight alternatives + composition shape, see [`.claude/rules/shadow-star-shorthand-autocomplete-marker.md`](../../../.claude/rules/shadow-star-shorthand-autocomplete-marker.md) + [`tools/shadow/README.md`](../README.md). + +## What this does + +Installs `shadow-observer.ts` as a macOS LaunchAgent that: + +- Starts at user login (RunAtLoad) +- Restarts on crash AND on non-zero exit (KeepAlive `Crashed=true` + `SuccessfulExit=false`) — covers signal-termination AND Bun runtime errors that exit non-zero +- Throttles restart attempts to once per 60s (ThrottleInterval) — caps any config-error restart loop to one launch per minute (visible in `launchctl print` + `shadow-observer.stderr.log`) +- Logs to `tools/shadow/shadow-observer.{log,stdout.log,stderr.log}` +- Polls every 2 seconds (--loop-interval 2000) +- Accepts grey-text suggestions after 3-second delay (--delay 3000) +- **Defaults to --dry-run** (logs detections without keystrokes) until you verify it works + +## Install procedure + +### Prerequisites + +1. macOS user account +2. Bun installed (`which bun` returns a path) +3. Zeta checkout at known absolute path +4. Accessibility permission granted to whichever terminal app launchd will run from (or to bun itself) — required by the AppleScript grey-text detector + +### Step 1: configure + write the plist + +The shipped plist is a template with `{{BUN_PATH}}` and `{{REPO_ROOT}}` placeholders. Use the installer script — it does the substitution safely (handles paths containing `&`, `|`, `\` that would break a naive `sed`), validates via `plutil -lint`, and writes to `~/Library/LaunchAgents/`: + +```bash +cd /path/to/your/Zeta/checkout +bun tools/shadow/launchd/install-launchagent.ts +``` + +The installer picks up `which bun` and `git rev-parse --show-toplevel` by default. To override either, pass `--bun-path` / `--repo-root`. Use `--dry-run` to print the substituted plist to stdout without writing anything. + +### Step 2: grant accessibility permission + +The AppleScript grey-text detector reads the foreground app's UI. macOS requires explicit accessibility permission: + +1. Open System Settings → Privacy & Security → Accessibility +2. Click + → add `bun` (the path `which bun` returned) AND your terminal app (Terminal.app, iTerm2, Ghostty, etc.) +3. Enable the toggles + +If you skip this, the detector returns errors and shadow-observer cycles forever doing nothing useful (logs will show AppleScript errors). + +### Step 3: load + verify + +Modern `launchctl bootstrap` is the documented replacement for the legacy `launchctl load`. It returns a proper exit code on failure (unlike `load`, which can silently no-op) and supports targeted domain semantics: + +```bash +launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.zeta.shadow-observer.plist + +# Verify it loaded (modern + legacy commands both work for inspection) +launchctl print gui/$(id -u)/com.zeta.shadow-observer | head -20 +# Should show state = running + +# Verify process is running +pgrep -fl shadow-observer +# Should show the bun process running shadow-observer.ts + +# Watch the log +tail -f tools/shadow/shadow-observer.log +``` + +Or skip the manual `bootstrap` and let the installer do it: `bun tools/shadow/launchd/install-launchagent.ts --bootstrap`. + +### Step 4: test detection in dry-run mode + +With the LaunchAgent running in --dry-run (default), open Claude Code CLI and let it produce a grey-text autocomplete suggestion. Within 2-5 seconds you should see a `detected` event in the log followed by an `accepted` event with `dryRun: true`. No actual Enter keystroke is sent. + +### Step 5 (optional, only after dry-run works): switch to live mode + +Edit `~/Library/LaunchAgents/com.zeta.shadow-observer.plist` and remove the `--dry-run` line from `ProgramArguments`. Then reload: + +```bash +PLIST=~/Library/LaunchAgents/com.zeta.shadow-observer.plist +launchctl bootout gui/$(id -u) "$PLIST" +launchctl bootstrap gui/$(id -u) "$PLIST" +``` + +Live mode sends Enter keystrokes to accept detected grey-text. The "(shadow*)" attribution shorthand documents that the accepted text came from autocomplete (see `.claude/rules/shadow-star-shorthand-autocomplete-marker.md`). + +## Uninstall + +Use `launchctl bootout` (not the legacy `unload`) so a failure to remove the running service is reported instead of swallowed: + +```bash +PLIST=~/Library/LaunchAgents/com.zeta.shadow-observer.plist +launchctl bootout gui/$(id -u) "$PLIST" +# Verify the agent is gone (no output expected from the print) +launchctl print gui/$(id -u)/com.zeta.shadow-observer 2>&1 | head -1 +# Should print: "Could not find service ..." — that's the success case. +rm "$PLIST" +``` + +## Troubleshooting + +### LaunchAgent loads but process doesn't start + +Check stderr log: + +```bash +cat tools/shadow/shadow-observer.stderr.log +``` + +Common causes: + +- `{{BUN_PATH}}` not replaced with actual bun path +- `{{REPO_ROOT}}` not replaced with actual repo path +- Bun not installed at the path given + +### Process starts but log shows no detections + +Likely accessibility permission isn't granted. Try the detector standalone: + +```bash +osascript tools/shadow/detect-grey-text.applescript +``` + +If it returns an error about accessibility / event posting, grant permission per Step 2. + +### Process restarts in a loop + +ThrottleInterval prevents fast crash-loops, but if you see repeated start/exit cycles in `launchctl list` output, check the stderr log for the actual error. + +## Composition with defense-in-depth alternatives + +| Path | Persistence | Use when | +|---|---|---| +| **A — launchd (this dir)** | Survives reboots | Primary install on your daily-driver Mac | +| **B — `<>` CronCreate sentinel** | Bound to Otto-CLI session | Defense-in-depth backup; restarts the launchd process if it dies | +| **C — manual `--loop` foreground** | Until terminal closes | Quick testing of changes | + +All three compose. Path A is the primary; Path B+C supplement. The human maintainer's framing on this design (2026-05-15T~01:11Z): defense in depth. + +## Composes with substrate + +- `tools/shadow/shadow-observer.ts` — the process this LaunchAgent launches +- `tools/shadow/detect-grey-text.applescript` — the AppleScript detector +- `tools/shadow/zeta-shadow.ts` — top-level CLI (alternative entry point) +- `docs/backlog/P0/B-0402-zeta-shadow-mode-first-class-cli-autocomplete.md` — the originating backlog row +- `.claude/rules/shadow-star-shorthand-autocomplete-marker.md` — the shorthand this loop's accepted-suggestions ship under +- `.claude/skills/save-ai-memory/SKILL.md` — composing discipline: memory preservation requires accurate observation; the shadow observer IS one direct-observation surface diff --git a/tools/shadow/launchd/com.zeta.shadow-observer.plist b/tools/shadow/launchd/com.zeta.shadow-observer.plist new file mode 100644 index 0000000000..f1c8776a18 --- /dev/null +++ b/tools/shadow/launchd/com.zeta.shadow-observer.plist @@ -0,0 +1,100 @@ + + + + + + Label + com.zeta.shadow-observer + + ProgramArguments + + {{BUN_PATH}} + {{REPO_ROOT}}/tools/shadow/shadow-observer.ts + --loop + 5000 + --loop-interval + 2000 + --delay + 3000 + --dry-run + --log-file + {{REPO_ROOT}}/tools/shadow/shadow-observer.log + + + WorkingDirectory + {{REPO_ROOT}} + + RunAtLoad + + + KeepAlive + + + SuccessfulExit + + Crashed + + + + ThrottleInterval + 60 + + StandardOutPath + {{REPO_ROOT}}/tools/shadow/shadow-observer.stdout.log + + StandardErrorPath + {{REPO_ROOT}}/tools/shadow/shadow-observer.stderr.log + + EnvironmentVariables + + PATH + /usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin + + + ProcessType + Background + + diff --git a/tools/shadow/launchd/install-launchagent.ts b/tools/shadow/launchd/install-launchagent.ts new file mode 100644 index 0000000000..3bc66c9c47 --- /dev/null +++ b/tools/shadow/launchd/install-launchagent.ts @@ -0,0 +1,356 @@ +#!/usr/bin/env bun +/** + * tools/shadow/launchd/install-launchagent.ts + * + * Install (or reinstall) the Zeta shadow observer as a macOS LaunchAgent. + * + * Performs the placeholder substitution that the README previously documented + * with a `sed` recipe. The `sed` form is unsafe when the substituted values + * contain `&`, `|`, `\`, or newlines — a path like `/Users/foo & bar/Zeta` + * would silently produce a broken plist. This script reads the template, + * substitutes by string replace (no shell metacharacter risk), runs + * `plutil -lint` on the result, and writes it to ~/Library/LaunchAgents/. + * + * USAGE + * + * bun tools/shadow/launchd/install-launchagent.ts + * + * # Optional flags: + * --bun-path override `which bun` + * --repo-root override `git rev-parse --show-toplevel` + * --dry-run write to stdout instead of LaunchAgents dir + * --bootstrap after writing, `launchctl bootstrap` the agent + * + * SAFETY + * + * - Refuses to write if substitution would leave any `{{PLACEHOLDER}}` + * markers in the output (catches mis-spelled / new placeholders). + * - Verifies the output with `plutil -lint` before writing to the + * final location. + * - Backs up any existing LaunchAgent plist before overwriting. + * + * COMPOSES WITH + * + * tools/shadow/launchd/com.zeta.shadow-observer.plist — the template + * tools/shadow/launchd/README.md — install procedure documentation + */ + +import { readFileSync, writeFileSync, existsSync, renameSync, mkdirSync, mkdtempSync, rmdirSync, statSync, accessSync, unlinkSync, constants as fsConstants } from "node:fs"; +import { homedir, tmpdir } from "node:os"; +import { join, isAbsolute, resolve } from "node:path"; +import { execFileSync } from "node:child_process"; + +export interface Args { + bunPath: string; + repoRoot: string; + dryRun: boolean; + bootstrap: boolean; +} + +/** + * Tries `which bun` (or `git rev-parse --show-toplevel`) and returns + * the trimmed stdout. If the command isn't on PATH or errors, returns + * undefined so the caller can emit the documented actionable error + * instead of a raw exception. + */ +export function tryDetect(cmd: string, args: string[]): string | undefined { + try { + // eslint-disable-next-line sonarjs/no-os-command-from-path -- which / git + // are standard dev-environment dependencies invoked by name; same + // convention as tools/github/poll-pr-gate.ts + tools/peer-call/. + return execFileSync(cmd, args, { encoding: "utf-8", stdio: ["ignore", "pipe", "pipe"] }).trim(); + } catch { + return undefined; + } +} + +/** + * Enforce that a user-supplied path override is absolute. LaunchAgent + * plists are persistent and must not depend on the installer's cwd — + * `--repo-root .` would write relative paths into ProgramArguments, + * WorkingDirectory, and log paths, all of which would fail under + * launchd (which has no defined cwd unless explicit). + */ +export function requireAbsolute(name: string, value: string): string { + if (!isAbsolute(value)) { + console.error(`${name} must be an absolute path; got "${value}". Use \`${resolve(value)}\` if that's what you meant.`); + process.exit(1); + } + return value; +} + +export function parseArgs(argv: string[]): Args { + let bunPath: string | undefined; + let repoRoot: string | undefined; + let dryRun = false; + let bootstrap = false; + let i = 0; + function nextArg(name: string): string { + const v = argv[++i]; + if (v === undefined) { + console.error(`Missing value for ${name}`); + process.exit(1); + } + if (v.startsWith("--")) { + console.error(`Missing value for ${name} (next token "${v}" looks like a flag)`); + process.exit(1); + } + return v; + } + for (; i < argv.length; i++) { + const a = argv[i]; + switch (a) { + case "--bun-path": + bunPath = requireAbsolute("--bun-path", nextArg("--bun-path")); + break; + case "--repo-root": + repoRoot = requireAbsolute("--repo-root", nextArg("--repo-root")); + break; + case "--dry-run": + dryRun = true; + break; + case "--bootstrap": + bootstrap = true; + break; + case "--help": + case "-h": + printHelp(); + process.exit(0); + default: + // Unknown flag — refuse before performing any filesystem action. + // Otherwise a typo like `--dryrun` or `--bootstap` would silently + // proceed with default-installation behavior. + console.error(`Unknown argument: ${a}. Run with --help to see supported flags.`); + process.exit(1); + } + } + if (!bunPath) { + bunPath = tryDetect("which", ["bun"]); + if (!bunPath) { + console.error("Could not detect bun via `which bun`. Install bun or pass --bun-path ."); + process.exit(1); + } + } + if (!repoRoot) { + repoRoot = tryDetect("git", ["rev-parse", "--show-toplevel"]); + if (!repoRoot) { + console.error("Could not detect repo root via `git rev-parse --show-toplevel` (not in a git checkout?). Pass --repo-root ."); + process.exit(1); + } + } + // Validate bunPath is an executable regular file. `existsSync` alone + // would pass directories or non-executable files and produce a + // LaunchAgent that can't start. + try { + const st = statSync(bunPath); + if (!st.isFile()) { + console.error(`--bun-path "${bunPath}" exists but is not a regular file. Pass an absolute path to the bun binary.`); + process.exit(1); + } + accessSync(bunPath, fsConstants.X_OK); + } catch (e) { + const code = (e as NodeJS.ErrnoException).code; + if (code === "ENOENT") { + console.error(`bun not found at "${bunPath}". Install bun or pass --bun-path.`); + } else if (code === "EACCES") { + console.error(`--bun-path "${bunPath}" is not executable by the current user. chmod +x or pass a different path.`); + } else { + console.error(`Could not stat --bun-path "${bunPath}" (${(e as Error).message})`); + } + process.exit(1); + } + // repoRoot should be an existing directory (the installer reads + // `${repoRoot}/tools/shadow/launchd/com.zeta.shadow-observer.plist`). + try { + const st = statSync(repoRoot); + if (!st.isDirectory()) { + console.error(`--repo-root "${repoRoot}" exists but is not a directory.`); + process.exit(1); + } + } catch { + console.error(`repo root "${repoRoot}" does not exist or is unreadable. Pass --repo-root explicitly.`); + process.exit(1); + } + return { bunPath, repoRoot, dryRun, bootstrap }; +} + +function printHelp(): void { + console.error(` +Install Zeta shadow observer as a macOS LaunchAgent. + +Usage: bun tools/shadow/launchd/install-launchagent.ts [options] + +Options: + --bun-path Absolute path to bun (default: \`which bun\`) + --repo-root Absolute path to Zeta checkout (default: \`git rev-parse --show-toplevel\`) + --dry-run Write to stdout instead of LaunchAgents dir + --bootstrap After writing, \`launchctl bootstrap\` the agent + +After install, complete Step 2 (accessibility permission) from +tools/shadow/launchd/README.md before live mode is useful. +`); +} + +/** + * XML-text escape: substituted values go inside nodes, + * so any of `& < >` would either produce an invalid plist (`&` without a + * matching `;`) or alter the parsed value. Apple's plutil rejects raw `&` + * in text nodes outright. We escape before substitution. + * + * `'` and `"` aren't strictly required to be escaped inside element text, + * but escaping them is conservative and matches XML 1.0 predefined-entity + * recommendations. + */ +export function xmlEscape(s: string): string { + return s + .replaceAll("&", "&") + .replaceAll("<", "<") + .replaceAll(">", ">") + .replaceAll('"', """) + .replaceAll("'", "'"); +} + +export function substitutePlaceholders(template: string, repoRoot: string, bunPath: string): string { + // String replace is metacharacter-safe (unlike sed). No escaping needed + // for shell — but the values are about to land in XML text nodes, so + // they DO need XML-entity escaping (else `/Users/foo & bar/Zeta` + // produces an invalid plist). + // Placeholders use `{{NAME}}` (Mustache-style) so the template header + // comment can document them without itself being substituted. + const result = template + .replaceAll("{{BUN_PATH}}", xmlEscape(bunPath)) + .replaceAll("{{REPO_ROOT}}", xmlEscape(repoRoot)); + // Sanity check: any remaining {{NAME}} markers indicate a new placeholder + // the script doesn't know about. Refuse to write a partial substitution. + const leftover = result.match(/\{\{[A-Z_]+\}\}/g); + if (leftover) { + console.error(`Unsubstituted placeholder(s) in template: ${[...new Set(leftover)].join(", ")}`); + console.error("Update install-launchagent.ts to handle them."); + process.exit(1); + } + return result; +} + +export function plutilLint(content: string): void { + // Write to a private mkdtemp directory so the temp filename isn't + // predictable from PID. A pre-created symlink at a predictable path + // (e.g., /tmp/zeta-shadow-launchagent-12345.plist) would have caused + // writeFileSync to follow the symlink and clobber an unrelated file + // that the attacker had write access to. mkdtempSync creates a 0700 + // directory with a random suffix that only this user can enter, so + // no other user's symlink can target the file we're about to write. + // We also clean up after ourselves on success and on the error path. + const tmpDir = mkdtempSync(join(tmpdir(), "zeta-shadow-launchagent-")); + const tmp = join(tmpDir, "candidate.plist"); + try { + writeFileSync(tmp, content, "utf-8"); + try { + // eslint-disable-next-line sonarjs/no-os-command-from-path -- plutil + // is a macOS-standard system binary present on all install targets. + execFileSync("plutil", ["-lint", tmp], { stdio: "pipe" }); + } catch (e) { + const err = e as { stdout?: Buffer; stderr?: Buffer }; + console.error("plutil -lint rejected the generated plist:"); + if (err.stdout) console.error(err.stdout.toString()); + if (err.stderr) console.error(err.stderr.toString()); + process.exit(1); + } + } finally { + // Best-effort cleanup so we don't leak temp dirs across runs. + try { unlinkSync(tmp); } catch { /* may not exist */ } + try { rmdirSync(tmpDir); } catch { /* best-effort */ } + } +} + +export function main(): void { + const args = parseArgs(process.argv.slice(2)); + const templatePath = join(args.repoRoot, "tools/shadow/launchd/com.zeta.shadow-observer.plist"); + if (!existsSync(templatePath)) { + console.error(`Template not found at ${templatePath}. Wrong --repo-root?`); + process.exit(1); + } + const template = readFileSync(templatePath, "utf-8"); + const configured = substitutePlaceholders(template, args.repoRoot, args.bunPath); + plutilLint(configured); + + if (args.dryRun) { + process.stdout.write(configured); + return; + } + + const destDir = join(homedir(), "Library", "LaunchAgents"); + const destPath = join(destDir, "com.zeta.shadow-observer.plist"); + // recursive: true is idempotent; no existsSync check needed. + mkdirSync(destDir, { recursive: true }); + + // Availability-preserving install pattern (per Codex review on PR #3342): + // 1. Read existing destPath into memory (skipped on ENOENT) + // 2. Write the new content to a temp file alongside destPath + // 3. Atomically rename(tmp, destPath) — POSIX guarantees this + // replaces destPath in a single syscall, so the canonical path + // is NEVER in a missing state. The previous rename-then-write + // pattern had a brief window where launchd would have seen + // no plist at the canonical path; this pattern eliminates it. + // 4. Side-car the in-memory old content as a timestamped backup + // (best-effort; failure here doesn't affect the install) + let oldContent: string | undefined; + try { + oldContent = readFileSync(destPath, "utf-8"); + } catch (e) { + if ((e as NodeJS.ErrnoException).code !== "ENOENT") { + // Read failed for some reason other than absence — treat as + // "no backup available" and proceed; the goal is the install, + // and if we can't read we probably can't back up anyway. + console.error(`Could not read existing plist for backup (${(e as Error).message}); proceeding without backup`); + } + } + const tmpDest = `${destPath}.tmp-${process.pid}`; + writeFileSync(tmpDest, configured, "utf-8"); + try { + renameSync(tmpDest, destPath); + } catch (e) { + // Promote failed — clean up temp file so we don't leak stale + // candidates in ~/Library/LaunchAgents. + try { unlinkSync(tmpDest); } catch { /* tmp may already be gone */ } + throw e; + } + console.error(`Wrote ${destPath}`); + if (oldContent !== undefined) { + const backup = `${destPath}.bak-${new Date().toISOString().replace(/[:.]/g, "-")}`; + try { + writeFileSync(backup, oldContent, "utf-8"); + console.error(`Backed up previous plist to ${backup}`); + } catch (e) { + // Backup-write failed — the install is still good. Log and move on. + console.error(`Could not write backup at ${backup} (${(e as Error).message}); install OK without backup`); + } + } + + if (args.bootstrap) { + // eslint-disable-next-line sonarjs/no-os-command-from-path -- `id`, + // `launchctl` are macOS-standard system binaries; same convention + // as tools/github/poll-pr-gate.ts. + const uid = execFileSync("id", ["-u"], { encoding: "utf-8" }).trim(); + const domain = `gui/${uid}`; + // bootout first (idempotent — fine if not loaded) + try { + // eslint-disable-next-line sonarjs/no-os-command-from-path + execFileSync("launchctl", ["bootout", domain, destPath], { stdio: "pipe" }); + } catch { + // Not loaded — fine. + } + // eslint-disable-next-line sonarjs/no-os-command-from-path + execFileSync("launchctl", ["bootstrap", domain, destPath], { stdio: "inherit" }); + console.error(`Bootstrapped com.zeta.shadow-observer in ${domain}`); + } else { + console.error("Next: complete Step 2 (accessibility permission) per README, then:"); + console.error(` launchctl bootstrap gui/$(id -u) ${destPath}`); + } +} + +// Bun convention: only invoke main() when this file is the entry point. +// Lets tests / helpers `import { ... } from "./install-launchagent.ts"` +// without triggering filesystem or launchctl side effects. +if (import.meta.main) { + main(); +}