feat(b-0530): cron-sentinel-mutex — detect concurrent Otto-CLI sessions by AceHack · Pull Request #3375 · Lucent-Financial-Group/Zeta

AceHack · 2026-05-15T06:47:26Z

Summary

Implements the cheap-effort mitigation from B-0530 + the worktree-prune-race root-cause analysis landed in PR #3370. Closes the "narration-over-action drift" Lior flagged in PR #3373 by shipping the action-side parity proof for the documentation already on main.

What lands

tools/orchestrator-checks/cron-sentinel-mutex.ts (103 lines) — diagnostic that detects concurrent Otto-CLI claude-code sessions via spawnSync("pgrep", ["-afl", "claude-code"]) (same args-as-array pattern as sibling verify-branch.ts, no shell, no injection).
tools/orchestrator-checks/cron-sentinel-mutex.test.ts (91 lines) — 8 tests covering empty/self-exclusion/ancestor-exclusion/malformed-input/format paths.

Design choices

Diagnostic, not a gate: returns a structured MutexResult; callers (the <<autonomous-loop>> tick body) decide whether to defer. Exit-0 in the no-peers case, exit-(1+peer-count) clamped to 250 for shell composition. JSON output via --json.
Ancestor filtering: skip processes that lack the claude-code stdio-mode flags (--output-format/--input-format); avoids flagging the Claude.app helper / disclaimer wrappers as peers.
Importable: export-ed helpers + if (import.meta.main) main() guard so tests can import without side effects.

Test plan

bunx tsc --noEmit clean
bun test: 8 pass / 0 fail (15 expect calls)
semgrep --config .semgrep.yml --error: 0 findings
Live smoke detected 3 concurrent claude-code sessions (real-world validation)
CI green
Auto-merge fires

Next step (not in this PR)

Wire the mutex into the <<autonomous-loop>> substrate so the tick body invokes it at the top and bus-publishes a deferral envelope when peers are detected. That's a follow-up commit; this PR ships the building block.

🤖 Generated with Claude Code

Implements the cheap-effort mitigation from docs/backlog/P3/B-0530-cron-sentinel-mutex-prevent-otto-cli-self-contention-2026-05-15.md + the worktree-prune-race root-cause analysis landed in PR #3370. This is the action-side parity proof for the narration filed in B-0530 and Pattern 8 of B-0519: Lior's antigravity check at 0220Z (PR #3373) flagged "narration-over-action drift" — listing mitigations without implementing them. This commit closes that gap. The mutex is a diagnostic, not a gate: it returns a structured MutexResult that callers (the <<autonomous-loop>> tick body) use to decide whether to defer git-mutating work. Empty peer list → proceed normally. Non-empty → bus-publish a deferral envelope and skip git ops this tick. Implementation: - tools/orchestrator-checks/cron-sentinel-mutex.ts (103 lines) - spawnSync("pgrep", ["-afl", "claude-code"]) with args-as-array (no shell, no injection — same pattern as sibling verify-branch.ts) - Excludes self-PID; excludes ancestors that lack the claude-code stdio flags --output-format / --input-format - Exports checkPeerSessions() + formatResult() for testability - if (import.meta.main) main() guard so imports don't trigger side effects - --json output for shell composition - tools/orchestrator-checks/cron-sentinel-mutex.test.ts (91 lines) - 8 tests covering: no peers, exclude-self, exclude-ancestors, empty-stdout, malformed-lines, self-with-matching-flags, formatResult-empty, formatResult-multi-peer Verified: - bunx tsc --noEmit clean - bun test: 8 pass / 0 fail - semgrep --config .semgrep.yml --error: 0 findings - Live smoke detected 3 concurrent claude-code sessions (real-world validation of the diagnostic) Next step (not in this PR): wire this into the autonomous-loop substrate so the <<autonomous-loop>> tick body invokes the mutex at the top and defers when peers are detected. Filed as B-0530 follow-up; this PR ships the building block. Co-Authored-By: Claude <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7bcd1a5e41

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot

Pull request overview

Adds a new Bun/TypeScript diagnostic under tools/orchestrator-checks/ to detect concurrent claude-code (Otto-CLI) sessions by scanning the process list, with unit tests intended to cover parsing/filtering/formatting cases. This supports the broader goal of mitigating multi-session self-contention that can lead to git worktree/pack lock races.

Changes:

Introduce cron-sentinel-mutex.ts to detect peer claude-code processes (PID + cmdline capture) and return a structured MutexResult.
Add --json output mode and diagnostic exit-code behavior for shell composition.
Add bun:test unit tests covering common parsing and filtering scenarios plus output formatting.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
tools/orchestrator-checks/cron-sentinel-mutex.ts	Implements the mutex diagnostic via `spawnSync("pgrep", ...)`, filters peers, formats output, and provides a CLI entrypoint.
tools/orchestrator-checks/cron-sentinel-mutex.test.ts	Adds unit tests for peer detection rules and formatting.

Comments suppressed due to low confidence (3)

tools/orchestrator-checks/cron-sentinel-mutex.ts:41

P0: checkPeerSessions ignores spawnSync failures (e.g., pgrep missing/permission errors). In those cases stdout will be empty and the function returns peerDetected=false, silently failing open and defeating the mutex. Consider surfacing an error in the result and/or treating spawn errors as a detected-peer/blocked condition so contention doesn’t recur unnoticed.

  const result = spawnFn("pgrep", ["-afl", PROCESS_NAME_PATTERN], {
    encoding: "utf8",
  });

  const stdout = typeof result.stdout === "string" ? result.stdout : "";
  const lines = stdout.split("\n").filter((line) => line.trim().length > 0);

tools/orchestrator-checks/cron-sentinel-mutex.ts:58

P1: The ancestor-filtering comment says to "require the ... flags" (plural), but the condition only rejects processes that have neither flag. That means a process with only --output-format (or only --input-format) will be treated as a peer, which may reintroduce false positives. If the intent is to require both stdio-mode flags, change the predicate accordingly (and align the comment).

    // Skip ancestors / disclaimer-helper / finder-service matches:
    // require the claude-code stdio-mode flags to mark a real peer.
    const cmdline = match[2] ?? "";
    if (!cmdline.includes("--output-format") && !cmdline.includes("--input-format")) {
      continue;
    }

tools/orchestrator-checks/cron-sentinel-mutex.ts:101

P1: --json mode always exits 0 (process.exit(0)), which contradicts the documented diagnostic exit-code scheme and makes it hard to use --json in shell composition (you have to parse JSON to know whether peers exist). Consider emitting JSON but preserving the same exit code as main() (or document that --json intentionally forces success).

if (import.meta.main) {
  const args = process.argv.slice(2);
  if (args.includes("--json")) {
    const r = checkPeerSessions();
    console.log(JSON.stringify(r, null, 2));
    process.exit(0);
  }

…js disable Two code-review findings from PR #3375 Codex review: 1. P1: `checkPeerSessions` now checks `result.error` (spawn failure, e.g., pgrep binary missing) and `result.status > 1` (pgrep runtime error). Previously both silently returned `peerDetected=false`, masking an unknown mutex state. Now surfaces `pgrepError` in `MutexResult` so callers know the check itself failed. 2. P0: Add `eslint-disable-next-line sonarjs/no-os-command-from-path` before the `spawnFn("pgrep", ...)` call. Rationale inline: pgrep is a known system binary, args-array form prevents shell injection. Tests: +3 new cases covering spawn-error, status-2 exit, and formatResult error-message rendering. All 11 tests pass. Co-Authored-By: Claude <noreply@anthropic.com>

AceHack · 2026-05-15T07:02:27Z

Fixed in the latest commit (0b3d03b):

Thread 1 (P1 — distinguish pgrep failures): Added checks for result.error (spawn failure, e.g., binary missing) and result.status > 1 (pgrep runtime error). Both cases now return a structured pgrepError field instead of silently reporting no peers. formatResult also surfaces the error with a "mutex state unknown" message. +3 new tests covering spawn-error, status-2 exit, and error-format rendering.

Thread 2 (P0 — sonarjs/no-os-command-from-path): Added // eslint-disable-next-line sonarjs/no-os-command-from-path before the spawnFn("pgrep", ...) call with rationale comment explaining that pgrep is a known system binary and the args-array form prevents shell injection.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b3d03b1af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Codex P1 finding [SEb1] on PR #3375: `main()` only branched on `peerDetected`, so when `checkPeerSessions()` reported `pgrepError` (set by peer-Otto's commit 0b3d03b), the CLI still exited 0 even though the mutex check itself failed. Shell callers gating on the exit code would proceed as if there were no peers. Extracted exit-code mapping into `mainResult(r: MutexResult)` so it is unit-testable without process.exit. Added `PGREP_ERROR_EXIT = 251` constant for "pgrep failed, mutex state unknown" — distinct from the 0..250 peer-count range so shell callers can branch on it explicitly. Exit code matrix: 0 = no peers, no error (safe to proceed) 1..250 = 1 + peer count (caller should defer) 251 (new) = pgrep error / unknown state (caller should defer) Tests: +5 new cases covering all four mainResult branches plus error-takes-precedence-over-peers. 16/16 pass (8 original + 3 from peer's 0b3d03b + 5 new). Composes with peer-Otto's 0b3d03b which added pgrepError tracking and the sonarjs suppression. This commit closes the last of the 3 PR #3375 review threads (#R3Mn + #R4ji peer-fixed already; #SEb1 fixed here). Co-Authored-By: Claude <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b746bf7c4b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Codex P1 finding on PR #3375: the --json branch always called process.exit(0), so shell callers using --json AND $? (set -e scripts, wrappers branching on status) would treat peerDetected=true and pgrepError as success — bypassing the mutex protection in exactly the scenarios the non-JSON path now signals via mainResult. Fix: --json branch now calls process.exit(mainResult(r)) for the same exit-code semantics as the non-JSON path. Callers can use stdout (structured JSON) AND $? (numeric status) together. Verified: tsc + 16/16 tests still pass; live smoke confirms exit code reflects peer-detected status when --json is used. Co-Authored-By: Claude <noreply@anthropic.com>

…3390) * feat(autonomous-loop): wire cron-sentinel-mutex into Step 1 refresh Closes the PR #3375 "Next step (not in this PR): wire this into the autonomous-loop substrate so the <<autonomous-loop>> tick body invokes the mutex at the top and defers when peers are detected." Added to docs/AUTONOMOUS-LOOP-PER-TICK.md Step 1 (Refresh): - New `cron-sentinel-mutex.ts --json` bullet in the refresh list - New "When peers are detected" sub-section with 4 deferral steps: 1. Avoid `git worktree add` (worktree-prune-race rationale) 2. Continue with non-git-mutating work (bus, audits, planning) 3. Bus-publish a deferral envelope if substrate matters past tick 4. Re-check next tick (contention windows resolve in 1-3 min) - Special case: exit code 251 (PGREP_ERROR_EXIT) — proceed but log Per the 3-surface canonical convergence, this update propagates to Otto-CLI (auto-loaded next cold-boot), Otto-Desktop routine (cites this file), and B-0448 cloud routine (when shipped — will cite this file). The discipline is ADVISORY, not a hard gate: the mutex reports state, the tick body decides. Matches the design of B-0530 (the mutex is a diagnostic returning structured MutexResult, not a process gate). Composes with: - PR #3370 (worktree-prune-race root cause + B-0519 Pattern 8) - PR #3375 (mutex implementation) - PR #3377 (borrow-on-existing pattern — alternative when peer contention is encountered) - PR #3386 (bulk rule-link depth fix across affected shards) Co-Authored-By: Claude <noreply@anthropic.com> * fix(autonomous-loop): correct cron-sentinel-mutex exit-code range and exit-251 guidance - Exit code range for peerDetected=true is 2..250 (Math.min(1+peerCount,250)), not 1..250; exit 1 is unreachable when peers are detected - Replace `{..., ...}` JSON placeholder in bus.ts publish example with valid JSON so the command doesn't hard-fail when copy-pasted - Exit 251 (PGREP_ERROR_EXIT) means pgrep failed and state is unknown; treat as peer-detected for git-mutating ops (defer worktree add), matching the 'caller should defer' comment in the implementation Addresses Codex P1 and 3× Copilot P1 threads on PR #3390. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>

…3423) Implements all 6 acceptance categories from docs/backlog/P3/B-0528-shadow-launchd-installer-unit-tests-2026-05-15.md (the deferred test-coverage P1 from PR #3375's review): 1. Placeholder substitution (3 tests): replace, multi-replace, unrecognized {{NAME}} exits 1 2. XML escaping (4 tests): five predefined entities, safe chars untouched, & escaped first to avoid double-escape, integration with substitutePlaceholders producing plutil-valid plist for paths containing & < > 3. Argument validation (6 tests, via subprocess): unknown flag rejected, relative --repo-root rejected, relative --bun-path rejected, --repo-root with no value rejected, --repo-root followed by a flag rejected, requireAbsolute happy-path 4. Dry-run output (1 test, via subprocess): renders to stdout matching the repo template, plutil-valid output 5. Default detection (3 tests): tryDetect for which-bun works, nonexistent-binary returns undefined (no throw), git outside checkout returns undefined 6. Availability-preserving install pattern (2 tests for plutilLint safe path, with substrate-honest comment explaining why the atomic-rename failure branches aren't reachable without main() refactoring; deferred for follow-up if reviewers flag it) Bun's test runner; subprocess invocations use spawnSync with args-as-array (no shell, no injection — same convention as the implementation under test). Total: 19 tests / 39 expect calls / all passing. Closes the P1 finding Copilot flagged on PR #3375 line 185 about missing test coverage for safety-critical XML escape / placeholder refusal / argument validation / dry-run output. Co-authored-by: Claude <noreply@anthropic.com>

…ate drift catch) Second drift-catch this session (paired with PR #3733 closing B-0506). Same pattern: B-0530 (cron-sentinel mutex) mechanized 2026-05-15 by PR #3375 — `tools/orchestrator-checks/cron-sentinel-mutex.ts` implements all 6 acceptance criteria — but the row's `status` stayed `open`. Live verification this tick: the tool correctly reports `peerDetected: true` for the two Claude Desktop processes (PIDs 2706 + 2710); the peer-handling clause in `docs/AUTONOMOUS-LOOP-PER-TICK.md` §1 routed 3 consecutive ticks around `git worktree add` without contention failures. Substrate-honest minimal commit: this tick was repeatedly losing working-tree edits to Lior-gemini's global-lock-cleanup pass (step 8 of his loop prompt) running concurrently. Adopting the canary-rule mitigation: atomic Edit-then-chain stage+commit in a single Bash invocation. Resolution-section content + BACKLOG.md regen deferred to a follow-up tick or a follow-up commit on this branch when Lior's cleanup window allows it. Co-Authored-By: Claude <noreply@anthropic.com>

… drift catch) (#3737) * chore(b-0530): close row — mechanization shipped via PR #3375 (substrate drift catch) Second drift-catch this session (paired with PR #3733 closing B-0506). Same pattern: B-0530 (cron-sentinel mutex) mechanized 2026-05-15 by PR #3375 — `tools/orchestrator-checks/cron-sentinel-mutex.ts` implements all 6 acceptance criteria — but the row's `status` stayed `open`. Live verification this tick: the tool correctly reports `peerDetected: true` for the two Claude Desktop processes (PIDs 2706 + 2710); the peer-handling clause in `docs/AUTONOMOUS-LOOP-PER-TICK.md` §1 routed 3 consecutive ticks around `git worktree add` without contention failures. Substrate-honest minimal commit: this tick was repeatedly losing working-tree edits to Lior-gemini's global-lock-cleanup pass (step 8 of his loop prompt) running concurrently. Adopting the canary-rule mitigation: atomic Edit-then-chain stage+commit in a single Bash invocation. Resolution-section content + BACKLOG.md regen deferred to a follow-up tick or a follow-up commit on this branch when Lior's cleanup window allows it. Co-Authored-By: Claude <noreply@anthropic.com> * chore(b-0530): add Resolution section + BACKLOG.md regen Follow-up to bfe5601 (status: open → closed). Adds the Resolution section mapping all 6 acceptance criteria to shipped artifacts plus live-verification evidence from this tick. Regenerates docs/BACKLOG.md via `BACKLOG_WRITE_FORCE=1 bun tools/backlog/generate-index.ts` — single-line `[ ]` → `[x]` toggle for B-0530. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 15, 2026 06:47

AceHack enabled auto-merge (squash) May 15, 2026 06:47

Copilot started reviewing on behalf of AceHack May 15, 2026 06:48 View session

chatgpt-codex-connector Bot reviewed May 15, 2026

View reviewed changes

Comment thread tools/orchestrator-checks/cron-sentinel-mutex.ts

Copilot AI reviewed May 15, 2026

View reviewed changes

Comment thread tools/orchestrator-checks/cron-sentinel-mutex.ts

chatgpt-codex-connector Bot reviewed May 15, 2026

View reviewed changes

Comment thread tools/orchestrator-checks/cron-sentinel-mutex.ts

Copilot AI review requested due to automatic review settings May 15, 2026 08:00

chatgpt-codex-connector Bot reviewed May 15, 2026

View reviewed changes

Comment thread tools/orchestrator-checks/cron-sentinel-mutex.ts Outdated

AceHack merged commit 7767788 into main May 15, 2026
25 checks passed

AceHack deleted the feat/cron-sentinel-mutex-b0530-otto-cli-2026-05-15 branch May 15, 2026 08:21

AceHack review requested due to automatic review settings May 15, 2026 08:29

AceHack mentioned this pull request May 15, 2026

feat(autonomous-loop): wire cron-sentinel-mutex into Step 1 refresh #3390

Merged

4 tasks

AceHack mentioned this pull request May 15, 2026

feat(b-0528): unit tests for shadow launchd installer (6 categories, 19 tests) #3423

Merged

4 tasks

AceHack mentioned this pull request May 16, 2026

chore(b-0530): close row — mechanization shipped via PR #3375 (second drift catch) #3737

Merged

4 tasks

This was referenced May 16, 2026

backlog(B-0558): worktree-pool primitive — pre-allocated isolated sideticks per Otto identity #3817

Closed

memory(feedback): saturation-ceiling session arc — 4-PR substrate quartet from fresh-cold-boot Otto-CLI #3832

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(b-0530): cron-sentinel-mutex — detect concurrent Otto-CLI sessions#3375

feat(b-0530): cron-sentinel-mutex — detect concurrent Otto-CLI sessions#3375
AceHack merged 4 commits into
mainfrom
feat/cron-sentinel-mutex-b0530-otto-cli-2026-05-15

AceHack commented May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

AceHack commented May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AceHack commented May 15, 2026

Summary

What lands

Design choices

Test plan

Next step (not in this PR)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

AceHack commented May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants