Skip to content

feat(b-0530): cron-sentinel-mutex — detect concurrent Otto-CLI sessions#3375

Merged
AceHack merged 4 commits into
mainfrom
feat/cron-sentinel-mutex-b0530-otto-cli-2026-05-15
May 15, 2026
Merged

feat(b-0530): cron-sentinel-mutex — detect concurrent Otto-CLI sessions#3375
AceHack merged 4 commits into
mainfrom
feat/cron-sentinel-mutex-b0530-otto-cli-2026-05-15

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 15, 2026

Summary

Implements the cheap-effort mitigation from B-0530 + the worktree-prune-race root-cause analysis landed in PR #3370. Closes the "narration-over-action drift" Lior flagged in PR #3373 by shipping the action-side parity proof for the documentation already on main.

What lands

  • tools/orchestrator-checks/cron-sentinel-mutex.ts (103 lines) — diagnostic that detects concurrent Otto-CLI claude-code sessions via spawnSync("pgrep", ["-afl", "claude-code"]) (same args-as-array pattern as sibling verify-branch.ts, no shell, no injection).
  • tools/orchestrator-checks/cron-sentinel-mutex.test.ts (91 lines) — 8 tests covering empty/self-exclusion/ancestor-exclusion/malformed-input/format paths.

Design choices

  • Diagnostic, not a gate: returns a structured MutexResult; callers (the <<autonomous-loop>> tick body) decide whether to defer. Exit-0 in the no-peers case, exit-(1+peer-count) clamped to 250 for shell composition. JSON output via --json.
  • Ancestor filtering: skip processes that lack the claude-code stdio-mode flags (--output-format/--input-format); avoids flagging the Claude.app helper / disclaimer wrappers as peers.
  • Importable: export-ed helpers + if (import.meta.main) main() guard so tests can import without side effects.

Test plan

  • bunx tsc --noEmit clean
  • bun test: 8 pass / 0 fail (15 expect calls)
  • semgrep --config .semgrep.yml --error: 0 findings
  • Live smoke detected 3 concurrent claude-code sessions (real-world validation)
  • CI green
  • Auto-merge fires

Next step (not in this PR)

Wire the mutex into the <<autonomous-loop>> substrate so the tick body invokes it at the top and bus-publishes a deferral envelope when peers are detected. That's a follow-up commit; this PR ships the building block.

🤖 Generated with Claude Code

Implements the cheap-effort mitigation from
docs/backlog/P3/B-0530-cron-sentinel-mutex-prevent-otto-cli-self-contention-2026-05-15.md
+ the worktree-prune-race root-cause analysis landed in PR #3370.

This is the action-side parity proof for the narration filed in B-0530
and Pattern 8 of B-0519: Lior's antigravity check at 0220Z (PR #3373)
flagged "narration-over-action drift" — listing mitigations without
implementing them. This commit closes that gap.

The mutex is a diagnostic, not a gate: it returns a structured
MutexResult that callers (the <<autonomous-loop>> tick body) use to
decide whether to defer git-mutating work. Empty peer list → proceed
normally. Non-empty → bus-publish a deferral envelope and skip git
ops this tick.

Implementation:
  - tools/orchestrator-checks/cron-sentinel-mutex.ts (103 lines)
    - spawnSync("pgrep", ["-afl", "claude-code"]) with args-as-array
      (no shell, no injection — same pattern as sibling verify-branch.ts)
    - Excludes self-PID; excludes ancestors that lack the claude-code
      stdio flags --output-format / --input-format
    - Exports checkPeerSessions() + formatResult() for testability
    - if (import.meta.main) main() guard so imports don't trigger
      side effects
    - --json output for shell composition
  - tools/orchestrator-checks/cron-sentinel-mutex.test.ts (91 lines)
    - 8 tests covering: no peers, exclude-self, exclude-ancestors,
      empty-stdout, malformed-lines, self-with-matching-flags,
      formatResult-empty, formatResult-multi-peer

Verified:
  - bunx tsc --noEmit clean
  - bun test: 8 pass / 0 fail
  - semgrep --config .semgrep.yml --error: 0 findings
  - Live smoke detected 3 concurrent claude-code sessions
    (real-world validation of the diagnostic)

Next step (not in this PR): wire this into the autonomous-loop
substrate so the <<autonomous-loop>> tick body invokes the mutex
at the top and defers when peers are detected. Filed as B-0530
follow-up; this PR ships the building block.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 15, 2026 06:47
@AceHack AceHack enabled auto-merge (squash) May 15, 2026 06:47
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7bcd1a5e41

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/orchestrator-checks/cron-sentinel-mutex.ts
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Bun/TypeScript diagnostic under tools/orchestrator-checks/ to detect concurrent claude-code (Otto-CLI) sessions by scanning the process list, with unit tests intended to cover parsing/filtering/formatting cases. This supports the broader goal of mitigating multi-session self-contention that can lead to git worktree/pack lock races.

Changes:

  • Introduce cron-sentinel-mutex.ts to detect peer claude-code processes (PID + cmdline capture) and return a structured MutexResult.
  • Add --json output mode and diagnostic exit-code behavior for shell composition.
  • Add bun:test unit tests covering common parsing and filtering scenarios plus output formatting.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
tools/orchestrator-checks/cron-sentinel-mutex.ts Implements the mutex diagnostic via spawnSync("pgrep", ...), filters peers, formats output, and provides a CLI entrypoint.
tools/orchestrator-checks/cron-sentinel-mutex.test.ts Adds unit tests for peer detection rules and formatting.
Comments suppressed due to low confidence (3)

tools/orchestrator-checks/cron-sentinel-mutex.ts:41

  • P0: checkPeerSessions ignores spawnSync failures (e.g., pgrep missing/permission errors). In those cases stdout will be empty and the function returns peerDetected=false, silently failing open and defeating the mutex. Consider surfacing an error in the result and/or treating spawn errors as a detected-peer/blocked condition so contention doesn’t recur unnoticed.
  const result = spawnFn("pgrep", ["-afl", PROCESS_NAME_PATTERN], {
    encoding: "utf8",
  });

  const stdout = typeof result.stdout === "string" ? result.stdout : "";
  const lines = stdout.split("\n").filter((line) => line.trim().length > 0);

tools/orchestrator-checks/cron-sentinel-mutex.ts:58

  • P1: The ancestor-filtering comment says to "require the ... flags" (plural), but the condition only rejects processes that have neither flag. That means a process with only --output-format (or only --input-format) will be treated as a peer, which may reintroduce false positives. If the intent is to require both stdio-mode flags, change the predicate accordingly (and align the comment).
    // Skip ancestors / disclaimer-helper / finder-service matches:
    // require the claude-code stdio-mode flags to mark a real peer.
    const cmdline = match[2] ?? "";
    if (!cmdline.includes("--output-format") && !cmdline.includes("--input-format")) {
      continue;
    }

tools/orchestrator-checks/cron-sentinel-mutex.ts:101

  • P1: --json mode always exits 0 (process.exit(0)), which contradicts the documented diagnostic exit-code scheme and makes it hard to use --json in shell composition (you have to parse JSON to know whether peers exist). Consider emitting JSON but preserving the same exit code as main() (or document that --json intentionally forces success).
if (import.meta.main) {
  const args = process.argv.slice(2);
  if (args.includes("--json")) {
    const r = checkPeerSessions();
    console.log(JSON.stringify(r, null, 2));
    process.exit(0);
  }

Comment thread tools/orchestrator-checks/cron-sentinel-mutex.ts
…js disable

Two code-review findings from PR #3375 Codex review:

1. P1: `checkPeerSessions` now checks `result.error` (spawn failure, e.g.,
   pgrep binary missing) and `result.status > 1` (pgrep runtime error).
   Previously both silently returned `peerDetected=false`, masking an
   unknown mutex state. Now surfaces `pgrepError` in `MutexResult` so
   callers know the check itself failed.

2. P0: Add `eslint-disable-next-line sonarjs/no-os-command-from-path`
   before the `spawnFn("pgrep", ...)` call. Rationale inline: pgrep is
   a known system binary, args-array form prevents shell injection.

Tests: +3 new cases covering spawn-error, status-2 exit, and
formatResult error-message rendering. All 11 tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
@AceHack
Copy link
Copy Markdown
Member Author

AceHack commented May 15, 2026

Fixed in the latest commit (0b3d03b):

Thread 1 (P1 — distinguish pgrep failures): Added checks for result.error (spawn failure, e.g., binary missing) and result.status > 1 (pgrep runtime error). Both cases now return a structured pgrepError field instead of silently reporting no peers. formatResult also surfaces the error with a "mutex state unknown" message. +3 new tests covering spawn-error, status-2 exit, and error-format rendering.

Thread 2 (P0 — sonarjs/no-os-command-from-path): Added // eslint-disable-next-line sonarjs/no-os-command-from-path before the spawnFn("pgrep", ...) call with rationale comment explaining that pgrep is a known system binary and the args-array form prevents shell injection.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b3d03b1af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/orchestrator-checks/cron-sentinel-mutex.ts
Codex P1 finding [SEb1] on PR #3375: `main()` only branched on
`peerDetected`, so when `checkPeerSessions()` reported `pgrepError`
(set by peer-Otto's commit 0b3d03b), the CLI still exited 0 even
though the mutex check itself failed. Shell callers gating on the
exit code would proceed as if there were no peers.

Extracted exit-code mapping into `mainResult(r: MutexResult)` so it
is unit-testable without process.exit. Added `PGREP_ERROR_EXIT = 251`
constant for "pgrep failed, mutex state unknown" — distinct from the
0..250 peer-count range so shell callers can branch on it explicitly.

Exit code matrix:
  0           = no peers, no error (safe to proceed)
  1..250      = 1 + peer count (caller should defer)
  251 (new)   = pgrep error / unknown state (caller should defer)

Tests: +5 new cases covering all four mainResult branches plus
error-takes-precedence-over-peers. 16/16 pass (8 original + 3 from
peer's 0b3d03b + 5 new).

Composes with peer-Otto's 0b3d03b which added pgrepError tracking
and the sonarjs suppression. This commit closes the last of the 3
PR #3375 review threads (#R3Mn + #R4ji peer-fixed already; #SEb1
fixed here).

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 15, 2026 08:00
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b746bf7c4b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/orchestrator-checks/cron-sentinel-mutex.ts Outdated
Codex P1 finding on PR #3375: the --json branch always called
process.exit(0), so shell callers using --json AND $? (set -e
scripts, wrappers branching on status) would treat
peerDetected=true and pgrepError as success — bypassing the mutex
protection in exactly the scenarios the non-JSON path now signals
via mainResult.

Fix: --json branch now calls process.exit(mainResult(r)) for the
same exit-code semantics as the non-JSON path. Callers can use
stdout (structured JSON) AND $? (numeric status) together.

Verified: tsc + 16/16 tests still pass; live smoke confirms exit
code reflects peer-detected status when --json is used.

Co-Authored-By: Claude <noreply@anthropic.com>
@AceHack AceHack merged commit 7767788 into main May 15, 2026
25 checks passed
@AceHack AceHack deleted the feat/cron-sentinel-mutex-b0530-otto-cli-2026-05-15 branch May 15, 2026 08:21
@AceHack AceHack review requested due to automatic review settings May 15, 2026 08:29
AceHack added a commit that referenced this pull request May 15, 2026
…3390)

* feat(autonomous-loop): wire cron-sentinel-mutex into Step 1 refresh

Closes the PR #3375 "Next step (not in this PR): wire this into the
autonomous-loop substrate so the <<autonomous-loop>> tick body
invokes the mutex at the top and defers when peers are detected."

Added to docs/AUTONOMOUS-LOOP-PER-TICK.md Step 1 (Refresh):
  - New `cron-sentinel-mutex.ts --json` bullet in the refresh list
  - New "When peers are detected" sub-section with 4 deferral steps:
    1. Avoid `git worktree add` (worktree-prune-race rationale)
    2. Continue with non-git-mutating work (bus, audits, planning)
    3. Bus-publish a deferral envelope if substrate matters past tick
    4. Re-check next tick (contention windows resolve in 1-3 min)
  - Special case: exit code 251 (PGREP_ERROR_EXIT) — proceed but log

Per the 3-surface canonical convergence, this update propagates to
Otto-CLI (auto-loaded next cold-boot), Otto-Desktop routine (cites
this file), and B-0448 cloud routine (when shipped — will cite this
file).

The discipline is ADVISORY, not a hard gate: the mutex reports
state, the tick body decides. Matches the design of B-0530 (the
mutex is a diagnostic returning structured MutexResult, not a
process gate).

Composes with:
  - PR #3370 (worktree-prune-race root cause + B-0519 Pattern 8)
  - PR #3375 (mutex implementation)
  - PR #3377 (borrow-on-existing pattern — alternative when peer
    contention is encountered)
  - PR #3386 (bulk rule-link depth fix across affected shards)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(autonomous-loop): correct cron-sentinel-mutex exit-code range and exit-251 guidance

- Exit code range for peerDetected=true is 2..250 (Math.min(1+peerCount,250)),
  not 1..250; exit 1 is unreachable when peers are detected
- Replace `{..., ...}` JSON placeholder in bus.ts publish example with
  valid JSON so the command doesn't hard-fail when copy-pasted
- Exit 251 (PGREP_ERROR_EXIT) means pgrep failed and state is unknown;
  treat as peer-detected for git-mutating ops (defer worktree add),
  matching the 'caller should defer' comment in the implementation

Addresses Codex P1 and 3× Copilot P1 threads on PR #3390.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 15, 2026
…3423)

Implements all 6 acceptance categories from
docs/backlog/P3/B-0528-shadow-launchd-installer-unit-tests-2026-05-15.md
(the deferred test-coverage P1 from PR #3375's review):

1. Placeholder substitution (3 tests): replace, multi-replace,
   unrecognized {{NAME}} exits 1
2. XML escaping (4 tests): five predefined entities, safe chars
   untouched, & escaped first to avoid double-escape, integration
   with substitutePlaceholders producing plutil-valid plist for
   paths containing & < >
3. Argument validation (6 tests, via subprocess): unknown flag
   rejected, relative --repo-root rejected, relative --bun-path
   rejected, --repo-root with no value rejected, --repo-root
   followed by a flag rejected, requireAbsolute happy-path
4. Dry-run output (1 test, via subprocess): renders to stdout
   matching the repo template, plutil-valid output
5. Default detection (3 tests): tryDetect for which-bun works,
   nonexistent-binary returns undefined (no throw), git outside
   checkout returns undefined
6. Availability-preserving install pattern (2 tests for plutilLint
   safe path, with substrate-honest comment explaining why the
   atomic-rename failure branches aren't reachable without main()
   refactoring; deferred for follow-up if reviewers flag it)

Bun's test runner; subprocess invocations use spawnSync with
args-as-array (no shell, no injection — same convention as the
implementation under test). Total: 19 tests / 39 expect calls / all
passing.

Closes the P1 finding Copilot flagged on PR #3375 line 185 about
missing test coverage for safety-critical XML escape / placeholder
refusal / argument validation / dry-run output.

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 16, 2026
…ate drift catch)

Second drift-catch this session (paired with PR #3733 closing B-0506).
Same pattern: B-0530 (cron-sentinel mutex) mechanized 2026-05-15 by
PR #3375 — `tools/orchestrator-checks/cron-sentinel-mutex.ts`
implements all 6 acceptance criteria — but the row's `status` stayed
`open`. Live verification this tick: the tool correctly reports
`peerDetected: true` for the two Claude Desktop processes
(PIDs 2706 + 2710); the peer-handling clause in
`docs/AUTONOMOUS-LOOP-PER-TICK.md` §1 routed 3 consecutive ticks
around `git worktree add` without contention failures.

Substrate-honest minimal commit: this tick was repeatedly losing
working-tree edits to Lior-gemini's global-lock-cleanup pass (step 8
of his loop prompt) running concurrently. Adopting the canary-rule
mitigation: atomic Edit-then-chain stage+commit in a single Bash
invocation. Resolution-section content + BACKLOG.md regen deferred
to a follow-up tick or a follow-up commit on this branch when
Lior's cleanup window allows it.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 16, 2026
… drift catch) (#3737)

* chore(b-0530): close row — mechanization shipped via PR #3375 (substrate drift catch)

Second drift-catch this session (paired with PR #3733 closing B-0506).
Same pattern: B-0530 (cron-sentinel mutex) mechanized 2026-05-15 by
PR #3375 — `tools/orchestrator-checks/cron-sentinel-mutex.ts`
implements all 6 acceptance criteria — but the row's `status` stayed
`open`. Live verification this tick: the tool correctly reports
`peerDetected: true` for the two Claude Desktop processes
(PIDs 2706 + 2710); the peer-handling clause in
`docs/AUTONOMOUS-LOOP-PER-TICK.md` §1 routed 3 consecutive ticks
around `git worktree add` without contention failures.

Substrate-honest minimal commit: this tick was repeatedly losing
working-tree edits to Lior-gemini's global-lock-cleanup pass (step 8
of his loop prompt) running concurrently. Adopting the canary-rule
mitigation: atomic Edit-then-chain stage+commit in a single Bash
invocation. Resolution-section content + BACKLOG.md regen deferred
to a follow-up tick or a follow-up commit on this branch when
Lior's cleanup window allows it.

Co-Authored-By: Claude <noreply@anthropic.com>

* chore(b-0530): add Resolution section + BACKLOG.md regen

Follow-up to bfe5601 (status: open → closed). Adds the Resolution
section mapping all 6 acceptance criteria to shipped artifacts plus
live-verification evidence from this tick. Regenerates docs/BACKLOG.md
via `BACKLOG_WRITE_FORCE=1 bun tools/backlog/generate-index.ts` —
single-line `[ ]` → `[x]` toggle for B-0530.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants