Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ title: "Multi-Otto branch-state contamination — RCA + mechanization candidate"
tier: factory-infrastructure
effort: M
created: 2026-05-14
last_updated: 2026-05-14
last_updated: 2026-05-15
depends_on: []
composes_with: [B-0400, B-0444, B-0506]
tags: [multi-foreground-surface, git-worktree, branch-state, friction, factory-hygiene]
Expand Down Expand Up @@ -78,6 +78,42 @@ branch.
Recovery: `gh pr create --head <my-branch>` with an EXPLICIT head ref,
so the call doesn't depend on the (poisoned) current-branch state.

### Pattern 7 — Abandoned rebase state in shared `.git/` dir (2026-05-15T02:30Z)

Observed in the 2026-05-15 Otto-CLI session: a peer agent (Lior on
`lior/decompose-b0139-4`) started an interactive rebase in the primary
worktree at 2026-05-14T20:36Z, left `.git/rebase-merge/` populated
(interactive mode, stopped at a commit), and never resumed. ~5 hours
later, between two consecutive Bash-tool calls, that rebase apparently
resumed in some other process — moving the primary worktree's
`.git/HEAD` from the active branch ref to a raw SHA (detached on
Lior's mid-rebase commit `65c7865`).

Substrate-honest cost: the contamination was caught by the branch-guard
`test "$(git branch --show-current)" = "<expected>"` inline check
(Pattern 5 defense composed correctly) — but only after significant
work had already been done that then needed to be redone in a dedicated
worktree.

Distinct from Patterns 1-6 because the contaminating event was a
PAUSED-then-RESUMED rebase state, not a fresh `git checkout` or
`git reset` by a parallel process. The rebase metadata in
`.git/rebase-merge/` is **shared filesystem state per-checkout** (not
per-process), so any process operating in that checkout can resume it.

Recovery: do NOT touch peer's rebase state (per
[`.claude/rules/claim-acquire-before-worktree-work.md`](../../../.claude/rules/claim-acquire-before-worktree-work.md)
worktree force-remove guard, extended in spirit to rebase state).
Pivot to a dedicated `/tmp/zeta-otto-cli-<task>` worktree and continue
work there. Branch refs are global, so commits land on the right ref
regardless of which physical checkout produces them.

Field-test shards:

- `docs/hygiene-history/ticks/2026/05/15/0213Z.md` — initial observation
- `docs/hygiene-history/ticks/2026/05/15/0230Z.md` — full forensics +
pivot to dedicated worktrees recovery
Comment on lines +111 to +115

## Mechanization candidates

### Cheap
Expand Down Expand Up @@ -108,6 +144,28 @@ because a fresh-session Otto won't remember them otherwise. Carried in
`.claude/rules/` would be the right home if/when this is promoted from
P3 to actively-mechanized.

### Cheap (new — for Pattern 7)

Cold-boot check that surfaces abandoned rebase state in the primary
worktree. A small TS script run at session start (or as a `.claude/`
hook):

```typescript
// tools/hygiene/check-stale-rebase-state.ts (proposed)
import { existsSync, statSync, readFileSync } from "node:fs";
const rebaseMerge = ".git/rebase-merge";
if (existsSync(rebaseMerge)) {
const ageMs = Date.now() - statSync(rebaseMerge).mtimeMs;
if (ageMs > 60 * 60 * 1000) {
const headName = readFileSync(`${rebaseMerge}/head-name`, "utf-8").trim();
console.warn(`Stale rebase state on ${headName} (${Math.round(ageMs / 1000 / 60)}min old). Peer agent should resume-or-abort.`);
}
}
```

Diagnostic only — does not touch the rebase state. The peer agent
that owns it is the only party that can responsibly resume or abort.

### Substantial

Per-Otto-process git worktree isolation. Each Otto session gets a dedicated `/private/tmp/zeta-<session-id>/` worktree that's never touched by other processes. Already partially-implemented in some shards (the `--worktree` field on bus claim envelopes per B-0444) but not consistently used by Otto-CLI sessions.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
---
id: B-0528
priority: P3
status: open
title: "Unit tests for tools/shadow/launchd/install-launchagent.ts"
tier: factory-infrastructure
effort: S
created: 2026-05-15
last_updated: 2026-05-15
depends_on: []
composes_with: [B-0402]
tags: [test-coverage, shadow-observer, launchd, deferred-from-pr-3342]
type: chore
---

# Unit tests for tools/shadow/launchd/install-launchagent.ts

## Origin

PR #3342 introduced `tools/shadow/launchd/install-launchagent.ts` — a
safety-critical installer that performs placeholder substitution + XML
escaping + atomic plist promotion + optional `launchctl bootstrap`.
Copilot's review of the install script flagged the absence of unit
test coverage (P1):

> The new installer adds safety-critical substitution and LaunchAgent
> generation logic, but there are no tests covering placeholder
> escaping, missing placeholders, argument validation, or dry-run
> output. `tools/shadow/shadow-observer.ts` already has Bun tests in
> this area, so this installer should get comparable unit coverage
> before it becomes the documented install path.

The PR #3342 follow-up commit (`970c774`) addresses 8 of the 10
review-cycle-2 findings but defers test coverage to this row to keep
the PR's scope manageable.

## Scope

Add `tools/shadow/launchd/install-launchagent.test.ts` using Bun's
built-in test runner. Verify:

1. **Placeholder substitution**
- `{{BUN_PATH}}` and `{{REPO_ROOT}}` are replaced
- Unrecognized `{{NAME}}` placeholders cause exit 1 with a
descriptive error
2. **XML escaping**
- Paths containing `&`, `<`, `>`, `"`, `'` produce a plist that
`plutil -lint` accepts (no raw XML-significant characters end
up in `<string>` text nodes)
Comment on lines +46 to +49
- Example fixture: `/Users/foo & bar/Zeta`
3. **Argument validation**
- Unknown flag (e.g., `--dryrun`) exits 1 before any FS action
- Relative `--repo-root .` and `--bun-path ./bin` rejected with
"must be an absolute path"
- Missing value for `--repo-root` exits 1
4. **Dry-run output**
- `--dry-run` writes the rendered plist to stdout, not the
filesystem
- `~/Library/LaunchAgents/com.zeta.shadow-observer.plist` is
unchanged after a dry-run invocation
5. **Default detection**
- `tryDetect("which", ["bun"])` returns undefined cleanly when
bun isn't on PATH (instead of throwing)
- Same for `git rev-parse --show-toplevel` outside a checkout
6. **Availability-preserving install pattern**
- If `readFileSync(destPath, "utf-8")` succeeds, the in-memory
content is written to the side-car backup AFTER the new
content is in place at destPath
- If `readFileSync` throws ENOENT, the install proceeds without
a backup (no existing plist to back up)
- If `readFileSync` throws non-ENOENT, the install proceeds
without a backup but logs the read failure
- If `writeFileSync(tmpDest, ...)` fails, destPath is untouched
(no rename has happened yet) and no backup is created
- If `renameSync(tmpDest, destPath)` fails, the temp file is
unlinked and destPath remains in its pre-call state
(untouched existing plist, or absent if none existed)
- The canonical destPath is NEVER in a missing state during a
successful install — the atomic rename replaces in one syscall

Reuse the existing test-helper patterns from
`tools/shadow/shadow-observer.test.ts` for fixture setup +
temp-dir handling.

## Acceptance

- New file `tools/shadow/launchd/install-launchagent.test.ts`
- All 6 categories above covered with at least one test each
- `bun test tools/shadow/launchd/install-launchagent.test.ts` passes
- CI lint-shellcheck + lint-tsc-tools remain clean
- Document the test-running invocation in
`tools/shadow/launchd/README.md` (single line, no new section)

## Composes with

- B-0402 (zeta shadow mode — first-class CLI autocomplete)
- PR #3342 (the install script being tested)
- `tools/shadow/shadow-observer.test.ts` (existing pattern to follow)

## Non-goals

- End-to-end LaunchAgent install tests requiring a real `launchctl
bootstrap` (would need a macOS runner + sudo + cleanup of system
state — out of scope; cover with the `--bootstrap` smoke test
document only)
- Refactoring `install-launchagent.ts` for testability beyond the
existing `if (import.meta.main)` guard (the helper functions
`xmlEscape`, `substitutePlaceholders`, `requireAbsolute`,
`tryDetect`, `plutilLint` are already importable)
88 changes: 88 additions & 0 deletions docs/hygiene-history/ticks/2026/05/15/0305Z.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
# Tick 0305Z — PR #3342 thread close-out (5 reviews + 1 CodeQL TOCTOU); PR #3356 merged

## Headline

- **PR [#3342](https://github.com/Lucent-Financial-Group/Zeta/pull/3342) thread close-out**: 5 Copilot+Codex review threads resolved + 1 new CodeQL TOCTOU thread caught & fixed mid-cycle. Final state: gate=BLOCKED wait-ci, 0 failed required, 0 unresolved threads, auto-merge armed. Two commits landed (`ee749f4` + `6543430`).
- **PR [#3356](https://github.com/Lucent-Financial-Group/Zeta/pull/3356) MERGED**: 0230Z shard with relative-link depth fix (4x→5x for `docs/`, 5x→6x for `.claude/`). Caught by Codex + Copilot in the same review pass. Same class as the 0027Z shard fix in #3330.

## What landed on #3342

Five original review threads (3 Copilot P1, 2 Codex P2):

| Finding | Fix |
|---|---|
| Broken provenance refs in README header | Dropped first-name attribution + user-scope memory-glob per `docs/AGENT-BEST-PRACTICES.md` §671-685 |
| `sed` recipe unsafe for paths with `&`, `\|`, `\` | New `tools/shadow/launchd/install-launchagent.ts`: String.replaceAll (metacharacter-safe), plutil-lint pre-write, backup-rename, optional bootstrap |
| Rule file referenced from plist not in tree | Auto-resolved by #3339 merge; reference now resolves on main |
| KeepAlive `SuccessfulExit=false`+`Crashed=true` OR-combined | Dropped `SuccessfulExit=false` (cycle 2); **restored cycle 3** after Copilot pointed out Bun runtime errors exit non-zero and would otherwise leave the agent dead until next login. Final state: both keys present with extended plist comment documenting the reviewer-conflict resolution + ThrottleInterval=60s as loop-cap |
| `launchctl load/unload` deprecated | Replaced with `bootstrap`/`bootout` + `gui/$(id -u)` domain across install + reload + uninstall paths |

One follow-up CodeQL thread on the new install script:

| Finding | Fix |
|---|---|
| `js/file-system-race` at writeFileSync (line 184) — existsSync→copyFileSync→writeFileSync had a TOCTOU window | Replaced existsSync+copyFileSync with `renameSync(destPath, backup)` inside try/catch (atomic, single syscall, ENOENT means no prior plist) |

## Substrate-honest mid-tick stuck-point

When I attempted to `git push --force-with-lease` after a local rebase of the #3342 branch onto current main, the harness auto-mode classifier blocked the force-push (correctly conservative — feature-branch force-with-lease is normally fine, but the classifier defaults to blocking force operations). Pivoted to a non-rebasing strategy:

1. `git reset --hard origin/feat/shadow-launchd-install-path-a-otto-cli-2026-05-15` (revert to remote tip)
2. Redo the edits manually (3 files: README, plist, new install script)
3. Regular fast-forward `git push` succeeded

Loss: the explicit benefit of the rebase (picking up #3349's process-extract.ts fix on the branch) was deferred to the GitHub merge step. Cost was acceptable: the PR's `lint (tsc tools)` check passed anyway because GitHub CI runs against the `refs/pull/3342/merge` virtual merge ref, which already includes main's process-extract.ts fix.

## CodeQL TOCTOU pattern (preserved for future-reference)

```typescript
// AVOID — TOCTOU between existsSync and copyFileSync:
if (existsSync(destPath)) {
copyFileSync(destPath, backup);
}
writeFileSync(destPath, configured, "utf-8");

// ALSO AVOID — renaming the existing dest aside leaves the canonical
// path missing during the window before writeFileSync completes:
try { renameSync(destPath, backup); }
catch (e) {
if ((e as NodeJS.ErrnoException).code !== "ENOENT") throw e;
}
writeFileSync(destPath, configured, "utf-8"); // window: destPath empty

// PREFER — availability-preserving write-temp-then-atomic-replace,
// with the backup side-carred from in-memory content AFTER the
// install is in place:
let oldContent: string | undefined;
try { oldContent = readFileSync(destPath, "utf-8"); }
catch (e) {
if ((e as NodeJS.ErrnoException).code !== "ENOENT") {
// Read failed for a non-absence reason — skip backup, proceed install
}
}
writeFileSync(`${destPath}.tmp-${process.pid}`, configured, "utf-8");
renameSync(`${destPath}.tmp-${process.pid}`, destPath); // atomic replace
if (oldContent !== undefined) {
writeFileSync(`${destPath}.bak-${ts}`, oldContent, "utf-8");
}
```

The third pattern is the canonical POSIX "safe install" — destPath is never empty (atomic rename replaces in one syscall), and the backup is created from in-memory content AFTER the install succeeds. Updated 2026-05-15T0322Z per follow-up Codex review on PR #3342 (the second pattern, "ALSO AVOID", was the first iteration of this fix and is also a regression vs. pattern three).

## Δ since 0230Z (last shard)

| What | Before | After |
|---|---|---|
| #3342 gate | BLOCKED (5 threads + 1 failed required) | BLOCKED wait-ci (0 threads, auto-merge armed) |
| #3356 state | OPEN (2 threads on shard relative links) | MERGED |
| `tools/shadow/launchd/install-launchagent.ts` | did not exist | new TS installer (safe substitution, plutil pre-write, bootstrap helper) |
| Plist KeepAlive | `Crashed=true + SuccessfulExit=false` (relaunch on any non-zero exit) | Unchanged — first attempted to drop `SuccessfulExit=false` for crash-only semantics, but cycle-3 Copilot review restored it (Bun runtime errors exit non-zero, not signal-terminate); ThrottleInterval=60s caps any config-error loop. Net: same shape as before this PR, but now with extended plist comment documenting the trade-off |
| README install/teardown | legacy `launchctl load/unload` | modern `bootstrap`/`bootout gui/$(id -u)` |

## Cron sentinel

`a2c54a1c` armed.

## Next

Cron-driven. Next tick verifies #3342 merge. If merged, prune the worktrees; scan bus + backlog ready-to-grind for next pick.
49 changes: 49 additions & 0 deletions docs/hygiene-history/ticks/2026/05/15/0329Z.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
# Tick 0329Z — PR #3342 review-cycle-3 close-out; B-0519 Pattern 7 + B-0528 filed

## Headline

- PR [#3342](https://github.com/Lucent-Financial-Group/Zeta/pull/3342): three review cycles closed this tick (11 + 1 + 7 = 19 findings addressed; one P1 deferred to backlog). Gate state: BLOCKED wait-ci, 0 failed, 0 unresolved threads, auto-merge armed. Three commits landed: `970c774` + `19af54f` + `7084bb3`.
- [B-0519 RCA](../../../../backlog/P3/B-0519-multi-otto-branch-state-contamination-rca-2026-05-14.md) updated: Pattern 7 "Abandoned rebase state in shared `.git/` dir" added with field-test pointer + cheap-mechanization sketch (stale-rebase-state diagnostic). Committed in `f6789dd`.
- [B-0528](../../../../backlog/P3/B-0528-shadow-launchd-installer-unit-tests-2026-05-15.md) filed for the deferred test-coverage P1. Committed in `cd4964e`. Effort: S; P3 because behavior is correct (smoke-tested) — tests harden against regression.
Comment on lines +6 to +7
- Two prior worktrees pruned (`/tmp/zeta-otto-cli-fix-3339-mdlint`, `/tmp/zeta-otto-cli-fix-3349-threads` — both branches merged this morning).

## Review-cycle-3 findings (this tick, post-`970c774`)

| Reviewer | File:line | Finding | Resolution |
|---|---|---|---|
| Copilot P1 | README.md:5 + plist:24 | Cross-refs to `.claude/rules/shadow-star-shorthand-*` not in branch tree | Resolved as outdated — file IS on main (#3339); CI merge-ref check passes |
| Copilot P1 | plist:61 | `Crashed=true` alone misses non-zero-exit Bun crashes | Restored `SuccessfulExit=false` with extended plist comment documenting the reviewer-conflict resolution. ThrottleInterval+--dry-run cap config-error loops |
| Copilot P1 | ts:57 | Missing `sonarjs/no-os-command-from-path` suppression on `execFileSync` | Added at tryDetect, plutilLint, all launchctl/id calls; comments cite poll-pr-gate.ts precedent |
| Copilot P2 | ts:306 | `import.meta.main` guard but functions not exported | Added `export` to main, parseArgs, requireAbsolute, tryDetect, xmlEscape, substitutePlaceholders, plutilLint, Args interface |
| Copilot P1 | shard:51 | "PREFER" snippet in 0305Z shard still showed rename-then-write pattern | Updated to 3-pattern progression with the canonical write-temp-then-atomic-replace |
| Codex P2 | ts:257 (line of prev commit) | Rename-then-write still leaves canonical path empty in brief window | Replaced with availability-preserving pattern: read-old-into-memory → write temp → atomic replace → side-car backup. Canonical path never empty |
| Copilot P1 | ts:185 | No unit tests for safety-critical helpers | Deferred to [B-0528](../../../../backlog/P3/B-0528-shadow-launchd-installer-unit-tests-2026-05-15.md) (substrate-honest scope management); PR comment linked the row |

## Substrate-honest review-cycle observations

1. **Each commit triggers a fresh review pass.** Cycle 2 found 11 things in the file I authored on cycle-1's "fix"; cycle 3 found 7 more. This is the normal review-bot cadence, not pathological — each fix is real, and Codex+Copilot review the diff against current main.

2. **Reviewer-conflict resolution (KeepAlive)**: Codex earlier asked me to drop `SuccessfulExit=false` to avoid config-error restart loops. Copilot this cycle asked me to restore it for Bun-runtime-error coverage. Both are correct concerns; the trade-off is real. I restored the key and documented the rationale in the plist comment so future-readers (and future-reviewers) can see the deliberation rather than re-derive it.

3. **Test coverage as backlog-row vs in-PR**: per substrate-honest scope-management, the test file is a separate piece of work. Filing B-0528 + linking it from the PR thread + commit message is the canonical way to defer without dropping the finding on the floor.

## Δ since 0305Z (last shard)

| What | Before | After |
|---|---|---|
| PR #3342 commits | `7b950f7` | `7084bb3` (+4 commits) |
| PR #3342 unresolved threads | 11 (cycle 2) → 1 (cycle 2.5) → 7 (cycle 3) | 0 |
| `install-launchagent.ts` correctness | XML-unsafe substitution, silent unknown args, relative paths accepted, brief destPath-empty window | XML-escaped, unknown-args rejected, absolute-required, availability-preserving install |
| `install-launchagent.ts` testability | helpers unexported, main() unconditional | `export`-ed helpers + `import.meta.main` guard |
| Plist KeepAlive | `Crashed=true` only | `Crashed=true + SuccessfulExit=false` with deliberation comment |
| B-0519 RCA pattern count | 6 (1-6 from 2026-05-14) | 7 (added "abandoned rebase state in shared .git/") |
| Backlog rows in flight | (no B-0528) | B-0528 filed (deferred test coverage) |
| Obsolete worktrees | 2 alive | 0 (pruned cleanly) |

## Cron sentinel

`a2c54a1c` armed.

## Next

Cron-driven. Next tick verifies #3342 merge. If yet another review cycle surfaces, address in batch + stop. Otherwise prune `/tmp/zeta-otto-cli-fix-3342-threads` and scan bus/backlog for next pick.
Loading
Loading