Skip to content

fix(adapter-codex): make CodexAdapter dispatchable against live codex 0.133.0#182

Merged
thejustinwalsh merged 19 commits into
mainfrom
middle-issue-177
Jun 3, 2026
Merged

fix(adapter-codex): make CodexAdapter dispatchable against live codex 0.133.0#182
thejustinwalsh merged 19 commits into
mainfrom
middle-issue-177

Conversation

@thejustinwalsh

@thejustinwalsh thejustinwalsh commented May 29, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #177. Resolves #183.

Makes the Codex adapter genuinely dispatchable against the real codex-cli 0.133.0 binary. Reverse-engineered the real surface (binary schema + many live tmux runs), fixed the four gaps the issue listed, then closed the fifth — full live dispatch — via the maintainer-approved Option A: an additive optional AgentAdapter.startsSessionOnFirstPrompt flag plus a dispatcher prompt-first launch branch. All five verified live end-to-end.

The keystone finding: codex 0.133.0's hooks are modelled on Claude Code's — same PascalCase event names, same payload fields, same Bash shell-tool name. The adapter's invented startup/turn-start/command/turn-end/shutdown taxonomy fired zero hooks; the real schema fires end-to-end.

Status

  • Phase 1: Hooks schema + trust (the keystone) — real [hooks], PascalCase events, matcher groups, Bash-scoped PR-ready gate; enterAutoMode answers the trust dialogs
  • Phase 2: CODEX_HOME + auth reachability — buildLaunchCommand sets CODEX_HOME=<worktree>/.codex; installHooks symlinks the operator's auth.json
  • Phase 3: Sandbox key — sandbox_mode (verified accepted under --strict-config; bare sandbox is rejected)
  • Phase 4: Stop + rate-limit from the real session artifact — structured rate_limits in the rollout token_count events
  • Phase 5: Full live dispatch — prompt-first launch ordering (Option A); the heartbeat reaches agent.stopped end-to-end through the real adapter

Resolution of the launch-ordering blocker (Option A, #183)

Live dispatch revealed interactive codex creates no session — and fires no SessionStart — until the first prompt is submitted (Claude fires it at boot). The dispatcher's fixed enterAutoMode → awaitSessionStart → sendText(prompt) order therefore deadlocked codex: the prompt that would unblock the wait was only sent after it. This is the launch→drive ordering, not the codex adapter's internals — so it needed a shared-code change the issue's "no interface change" headline reserved for the maintainer. The maintainer approved Option A ("Approved: A" + approved label on #177).

What shipped:

  • packages/core/src/adapter.ts — additive, optional, documented AgentAdapter.startsSessionOnFirstPrompt?: boolean. Absent/false ⇒ today's boot-triggered order. The interface only grows an optional field; every existing adapter and call site compiles unchanged.
  • packages/dispatcher/src/workflows/implementation.ts — the launch→drive step branches on the flag: a flagged adapter dismisses the boot dialogs, sends the prompt, then awaits session.started; an unflagged adapter keeps await-first, then send. The hook server already stashes a session.started that lands before the gate is parked, so prompt-first is race-safe. The Claude path's call sequence is byte-for-byte unchanged (regression test).
  • packages/adapters/codex/src/index.ts — codex sets startsSessionOnFirstPrompt: true, and enterAutoMode now returns the instant the composer is ready (detectReadyForInput — the OpenAI Codex (vX.Y.Z) welcome banner, captured live) instead of idling to the 90s boot deadline. That promptness is load-bearing now that the dispatcher awaits enterAutoMode before sending the prompt.

What changed (gaps 1–4)

  • packages/adapters/codex/src/hooks.ts — real [hooks] schema (PascalCase events as matcher groups, type="command"), sandbox_mode, worktree pre-trust ([projects."<wt>"] trust_level="trusted"), the Bash-scoped PR-ready gate, and an auth.json symlink into the worktree CODEX_HOME.
  • packages/adapters/codex/src/index.tsbuildLaunchCommand sets CODEX_HOME; enterAutoMode answers the hooks-trust and directory-trust dialogs.
  • packages/adapters/codex/src/classify.ts — rate-limit detection reads the structured rate_limits block (precise resetAt from the epoch resets_at); textual regex retained as a fallback.
  • packages/adapters/codex/src/transcript.ts — docs only; the existing parse already matches the confirmed rollout schema.
  • packages/core/src/hook-script.ts + packages/cli/src/bootstrap-assets/hooks/hook.shcurl -o /dev/null so the dispatcher's ok body doesn't pollute hook stdout (codex parses hook stdout as JSON). Strictly better for both adapters; drift test stays green.
  • packages/adapters/codex/test/adapter.test.ts — rewritten against the real schema and live-captured payloads; adds the ready-probe and flag tests.
  • packages/adapters/codex/scripts/verify-live-hooks.ts — live end-to-end heartbeat probe, now driving the real prompt-first order (manual/CI-gated).

Acceptance criteria

  • Criterion 1 — the real codex 0.133.0 [hooks] schema is read off the binary and the adapter emits the events the watchdog/stop-classifier consume; the heartbeat is observed end-to-end. Evidence: packages/adapters/codex/src/hooks.ts, packages/adapters/codex/test/adapter.test.ts, and the live e2e probe packages/adapters/codex/scripts/verify-live-hooks.ts (PASS — all five normalized events).
  • Criterion 2 — buildLaunchCommand sets CODEX_HOME so the adapter config loads, with auth reachable via the auth.json symlink. Evidence: packages/adapters/codex/test/adapter.test.ts.
  • Criterion 3 — the sandbox policy is applied via the real sandbox_mode key, verified under --strict-config (bare sandbox rejected). Evidence: packages/adapters/codex/test/adapter.test.ts.
  • Criterion 4 — stop classification + rate-limit detection read the real rollout artifact (structured rate_limits). Evidence: packages/adapters/codex/test/adapter.test.ts.
  • Criterion 5 (integration) — a full live dispatch through the prompt-first launch order dispatches codex into the running dispatcher and drives the heartbeat end-to-end (enterAutoMode → prompt → session.startedagent.stopped) against real codex 0.133.0; closes the deferred CodexAdapter #60/Verify the adapter abstraction holds across both adapters #63 criterion and resolves feat(dispatcher,core): launch-ordering accommodation for prompt-triggered-session adapters (codex live dispatch) #183. Evidence: the live e2e probe packages/adapters/codex/scripts/verify-live-hooks.ts (PASS) and the dispatcher ordering test packages/dispatcher/test/implementation-workflow.test.ts.

Verification

  • Live prompt-first dispatch (Criterion 5). verify-live-hooks.ts drives the real adapter in the dispatcher's prompt-first order against codex 0.133.0. PASS: enterAutoMode returned in 1317 ms (composer-ready, not the 90 s deadline); session.started did not fire before the prompt (proving prompt-triggered); full heartbeat session.started → turn.started → tool.pre → tool.post → agent.stopped; session.started carried a real transcript_path rollout; the agent ran the shell command and stopped.
  • Both launch orderings + the Claude regression. packages/dispatcher/test/implementation-workflow.test.ts asserts the prompt-first adapter sends the prompt before awaiting session.started (no deadlock, against a gate that only resolves once the prompt is sent), the boot-first adapter keeps the await-first sequence unchanged, and that await-first ordering deadlocks a prompt-triggered CLI (why the flag exists). packages/adapters/claude/test/adapter.test.ts locks that Claude leaves the flag unset.
  • Gaps 1–4 as previously: hooks fire end-to-end, CODEX_HOME/auth, sandbox_mode under --strict-config, structured rate_limits — all live-confirmed and unit-tested.
  • bun test (1261 pass), bun run typecheck, bun run lint, bun run format all green.

Why these changes

codex's prompt-triggered session is a genuine cross-CLI difference, not a codex-internal quirk — so the accommodation belongs at the AgentAdapter seam, expressed as a capability the dispatcher honors. Option A keeps it additive: a single optional flag, a localized dispatcher branch, zero change to the Claude path. enterAutoMode resolving on composer-ready (rather than the boot deadline) is what lets the now-awaited dialog dismissal feed straight into the prompt without stalling every launch.

Stumbling points

  • The old live verify script waited for session.started before sending the prompt — it always fell through its 90 s wait and then sent, so it "passed" without ever exercising the real dispatcher order. That masked the deadlock. The rewrite is a faithful mirror of implementation.ts and asserts the prompt-triggered invariant explicitly.
  • {kind:"done"} in the dispatcher test stub triggers the review-changes park (waiting-human), not completed; the ordering tests use bare-stop to isolate the launch order from the post-drive flow.

Decisions

See planning/issues/177/decisions.md → "RESOLUTION (2026-06-03)" for the Option A rationale and the full candidate space; the earlier entries cover gaps 1–4.

Branch

Rebased onto current main (clean, MERGEABLE) — now includes both the merged #178 poller fix and Epic #190's epic-store / dispatcher refactor (the epicNumberepicRef rename in the adapter installHooks/buildPromptText params is adapted accordingly). Option A's launch-ordering branch was re-applied onto #190's restructured implementation.ts (new-work-as-base); the launch sequence and the Claude path are behaviorally unchanged from the live-verified state.

Recovery note: this PR was briefly closed when middle's open-PR divergence reconciler, firing as #190 merged, reset this branch to main instead of rebasing the codex commits on top (the deep #190/#177 interleave in implementation.ts). No code was lost — the approved work is recovered intact and re-verified (typecheck/lint/format/1261 tests green). The reconciler defect is filed as a follow-up.

Follow-up issues

Summary by CodeRabbit

  • New Features

    • Adapters can opt into "start on first prompt" behavior; Codex improves interactive boot handling and readiness detection.
  • Tests

    • Added live E2E verification and expanded tests for launch ordering, prompt/boot dialogs, hooks, transcripts, and rate-limit handling; regression added for Claude ordering.
  • Bug Fixes

    • Hook responses no longer leak to adapter stdout.
  • Documentation

    • Added decision log and planning docs for Codex changes; installer now handles auth linking and updated hook configs.

@coderabbitai

coderabbitai Bot commented May 29, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Adds an optional AgentAdapter.startsSessionOnFirstPrompt flag and branches dispatcher launch ordering to support prompt-first adapters; updates Codex adapter readiness probing, hook TOML generation, auth symlinking, and structured transcript rate-limit detection; expands tests, adds a live verification script, and records planning/decision docs.

Changes

Prompt-first adapter capability and dispatcher launch ordering

Layer / File(s) Summary
AgentAdapter.startsSessionOnFirstPrompt contract
packages/core/src/adapter.ts
Adds startsSessionOnFirstPrompt?: boolean to AgentAdapter documenting prompt-first semantics.
Dispatcher launch-order branching on adapter capability
packages/dispatcher/src/workflows/implementation.ts
Branches drive ordering on adapter.startsSessionOnFirstPrompt, introduces promptText/sendPrompt, sends prompt before awaiting SessionStart for prompt-first adapters, otherwise awaits SessionStart first.
Dispatcher workflow tests for prompt-first ordering
packages/dispatcher/test/implementation-workflow.test.ts
Adds ordering harness and tests validating prompt-first, boot-first, and deadlock scenarios with on-prompt vs immediate SessionGate behaviors.
Claude adapter regression test for startsSessionOnFirstPrompt
packages/adapters/claude/test/adapter.test.ts
Adds a test asserting claudeAdapter.startsSessionOnFirstPrompt is falsy.

Codex adapter boot-mode readiness and hook configuration

Layer / File(s) Summary
Codex boot dialog detection and enterAutoMode polling
packages/adapters/codex/src/index.ts
Adds pane probes (hooks trust, dir trust, ready banner), exposes detection helpers, and rewrites enterAutoMode to answer dialogs and return on composer-ready. Ensures CODEX_HOME points to worktree .codex.
Codex hook TOML generation with matcher groups and event mapping
packages/adapters/codex/src/hooks.ts
Generates matcher-group array-of-tables TOML, updates CODEX_EVENT_MAP to PascalCase events, uses sandbox_mode, pre-trusts worktree, registers PR-ready gate as second PreToolUse matcher scoped to Bash, and adds linkAuth for auth symlinking.
Shared hook script response body discard and docs
packages/cli/src/bootstrap-assets/hooks/hook.sh, packages/core/src/hook-script.ts
Adds -o /dev/null to curl in hook scripts and documents why hook stdout must be free of dispatcher response bodies.

Codex transcript and rate-limit handling

Layer / File(s) Summary
Codex structured rate-limit detection from rollout transcripts
packages/adapters/codex/src/classify.ts
Adds structured-first rate-limit detector reading token_count rate_limits, converts resets_at to ISO, computes limit via rate_limit_reached_type/used_percent >= 100, falls back to regex; updates classifyStop and detectRateLimit.
Codex transcript schema documentation updates
packages/adapters/codex/src/transcript.ts
Updates inline docs for resolveTranscriptPath and readTranscriptState to match codex 0.133.0 rollout shapes.

Tests and live verification

Layer / File(s) Summary
Codex adapter test rewrites and expansions
packages/adapters/codex/test/adapter.test.ts
Extensive test updates: launch/env assertions including CODEX_HOME, prompt framing updates (epicRef: "177"), real-shaped transcript fixtures, structured rate_limits tests, installHooks TOML parsing and PR-ready gate assertions, auth symlink tests, and prompt/dialog detection tests.
Live end-to-end Codex adapter verification script
packages/adapters/codex/scripts/verify-live-hooks.ts
New Bun/TS script that installs hooks, launches real codex under tmux against a local dispatcher stand-in, drives enterAutoMode/prompt-first interaction, records hook events, validates ordering and required events, and emits diagnostics on failure.

Planning and decisions

Layer / File(s) Summary
Issue 177 plan and decisions
planning/issues/177/plan.md, planning/issues/177/decisions.md
Adds plan and decision records documenting the prompt-first resolution, Codex hook/config choices, auth symlink behavior, structured rate-limit decision, and discovery that Codex SessionStart can be prompt-triggered.

Sequence Diagram (high-level prompt-first launch flow):

sequenceDiagram
  participant Dispatcher
  participant CodexAdapter
  participant HookServer
  Dispatcher->>CodexAdapter: enterAutoMode()
  Dispatcher->>CodexAdapter: sendPrompt(promptText)
  CodexAdapter->>HookServer: emit SessionStart hook
  HookServer->>Dispatcher: POST /hooks/SessionStart
  Dispatcher->>CodexAdapter: await running-state update
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related issues

Possibly related PRs

Suggested labels

bug

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.92% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: making the CodexAdapter work with live codex 0.133.0 by addressing real integration gaps through reverse-engineering and implementation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@thejustinwalsh

Copy link
Copy Markdown
Owner Author

Approved: A

@thejustinwalsh thejustinwalsh force-pushed the middle-issue-177 branch 2 times, most recently from c0d6d85 to 72e25e6 Compare June 3, 2026 08:45
* arrives before the gate is parked, so prompt-first is race-safe. See
* `packages/dispatcher/src/workflows/implementation.ts` launch→drive.
*/
readonly startsSessionOnFirstPrompt?: boolean;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision (Option A, #183): additive optional flag — the AgentAdapter signature only grows. Absent/false keeps today's boot-triggered order, so every existing adapter and call site is unchanged. Maintainer-authorized ("Approved: A" + approved label on #177). Race-safety rests on the hook server stashing a session.started that lands before the gate parks (hook-server.ts #deliver/#await). Rationale: planning/issues/177/decisions.md → RESOLUTION (2026-06-03).

// order forks on whether the CLI fires SessionStart at boot or only once
// the first prompt is submitted (AgentAdapter.startsSessionOnFirstPrompt).
let startPayload: HookPayload;
if (adapter.startsSessionOnFirstPrompt) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why prompt-first here: codex creates no session — and fires no SessionStart — until the first prompt is submitted, so await-first deadlocks it (the prompt that unblocks the wait is only sent after). The flagged path dismisses dialogs, sends the prompt, then awaits; enterAutoMode is awaited (not fire-and-forget) so a needs-login throw fails the launch fast instead of feeding the prompt into a login screen. The unflagged (Claude) arm below is byte-for-byte the prior sequence — covered by the regression test in implementation-workflow.test.ts.

@thejustinwalsh thejustinwalsh marked this pull request as ready for review June 3, 2026 08:49
@thejustinwalsh thejustinwalsh added the ready-for-review All phases done and verified — PR ready for final human review and merge label Jun 3, 2026
@thejustinwalsh

Copy link
Copy Markdown
Owner Author

Reviewer's brief — PR #182 (Epic #177, codex live-dispatch)

The codex adapter is now functionally dispatchable against real codex-cli 0.133.0. Gaps 1–4 landed earlier; this brief focuses on the final piece — criterion 5, the prompt-first launch ordering (Option A, resolving #183), which the maintainer authorized ("Approved: A" + approved label).

How to run it

bun install
bun run typecheck && bun run lint && bun test     # 1194 pass, all green
# Live e2e (needs the codex 0.133.0 binary, a signed-in ~/.codex, tmux, network):
bun run packages/adapters/codex/scripts/verify-live-hooks.ts   # exits 0 = PASS

What to verify (and what "correct" looks like)

  • The interface change is genuinely additive. packages/core/src/adapter.tsstartsSessionOnFirstPrompt?: boolean is optional; no existing adapter or call site changed signature. Claude must not set it (packages/adapters/claude/test/adapter.test.ts asserts this).
  • The Claude path is unchanged. packages/dispatcher/src/workflows/implementation.ts launch→drive: the else (unflagged) arm is the prior enterAutoMode (parallel) → awaitSessionStart → updateWorkflow(running) → sendPrompt sequence. The ordering test asserts the exact call order [enter-auto-mode, await-session-start, send-prompt] for an unflagged adapter, and [enter-auto-mode, send-prompt, await-session-start] for a flagged one.
  • Prompt-first is race-safe. The flagged arm sends the prompt before awaiting session.started; this is safe only because the hook server stashes a session.started that arrives before a waiter parks (packages/dispatcher/src/hook-server.ts #deliver/#await). Worth a look that this reasoning holds.
  • The live heartbeat. The e2e probe drives the real adapter in the dispatcher's prompt-first order and asserts: enterAutoMode returns promptly on composer-ready (not the 90 s deadline), session.started is not seen before the prompt (proving prompt-triggered), and the full session.started → turn.started → tool.pre → tool.post → agent.stopped heartbeat with a real transcript_path.

How to review it

The five commits are split by concern: core flag → codex (flag + enterAutoMode composer-ready return) → dispatcher branch → claude regression test → live-verify rewrite → decisions log. The acceptance-criteria evidence table is in the PR body; rationale and the full candidate space are in planning/issues/177/decisions.md → "RESOLUTION (2026-06-03)".

Fragile bits that want extra eyes

  • enterAutoMode now returns on a positive pane signal (detectReadyForInput, the OpenAI Codex (vX.Y.Z) banner) rather than idling to the boot deadline. The ready-check sits after the dialog probes (which continue), and a dialog screen never renders the banner — so a still-open dialog is always answered first. The regex /OpenAI Codex\s*\(v/i and that ordering are the load-bearing assumptions; both are grounded in live captures off 0.133.0.
  • enterAutoMode is now awaited on the flagged path (vs fire-and-forget for Claude) — a needs-login throw fails the launch fast; the same outer catch kills the tmux session, so no leak.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/adapters/codex/scripts/verify-live-hooks.ts`:
- Around line 46-157: The test creates runtime resources (tmux session and Bun
server) in main() but doesn't guarantee cleanup on error; wrap the critical
region after creating server and starting the tmux session in a try/finally so
we always run cleanup: move the logic from after server/tmux creation
(everything that drives enterAutoMode, prompt send, waits, and capturePane) into
the try block, and in the finally block ensure you call
sh(["tmux","kill-session","-t", SESSION]) only if the session was started and
await server.stop(true) only if server was created, handling any errors from
those teardown calls so cleanup always runs; reference functions/vars: main(),
server, SESSION, codexAdapter.enterAutoMode, sh(), capturePane().
- Around line 42-44: The helper sh() currently awaits Bun.spawn(...).exited but
ignores the subprocess exit result; update sh to capture the exited result from
Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }), check the exit code
(result.exitCode or exited.status depending on API) and throw an error or
otherwise fail fast when it is non-zero so tmux lifecycle failures
(new-session/send-keys/kill-session) are not silently ignored; reference the sh
function and the Bun.spawn invocation when locating where to add the exit-code
check and error handling.

In `@planning/issues/177/decisions.md`:
- Line 293: Several ATX headings in this document (for example the heading that
currently reads "`#177` labels are still just `phase:10`, `dogfood` (**no
`approved**`)" and the other headings noted) are missing a space after the
leading '#' which triggers markdownlint MD018; fix them by adding a single space
between the '#' characters and the heading text (e.g., change "`#Heading`" to "#
Heading") for each affected heading in decisions.md so they conform to MD018.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2c121e60-62da-4928-9de7-825d1085fbe2

📥 Commits

Reviewing files that changed from the base of the PR and between 421425d and 72e25e6.

📒 Files selected for processing (14)
  • packages/adapters/claude/test/adapter.test.ts
  • packages/adapters/codex/scripts/verify-live-hooks.ts
  • packages/adapters/codex/src/classify.ts
  • packages/adapters/codex/src/hooks.ts
  • packages/adapters/codex/src/index.ts
  • packages/adapters/codex/src/transcript.ts
  • packages/adapters/codex/test/adapter.test.ts
  • packages/cli/src/bootstrap-assets/hooks/hook.sh
  • packages/core/src/adapter.ts
  • packages/core/src/hook-script.ts
  • packages/dispatcher/src/workflows/implementation.ts
  • packages/dispatcher/test/implementation-workflow.test.ts
  • planning/issues/177/decisions.md
  • planning/issues/177/plan.md

Comment thread packages/adapters/codex/scripts/verify-live-hooks.ts
Comment thread packages/adapters/codex/scripts/verify-live-hooks.ts
Comment thread planning/issues/177/decisions.md
@thejustinwalsh

Copy link
Copy Markdown
Owner Author

🔁 Reconciled with main (rebased) after 7279b25

…OME + auth

The Codex adapter's hook config was a start-generous guess that fired zero
hooks. Reverse-engineered the real codex 0.133.0 surface (binary schema + live
runs): its hooks are modelled on Claude Code's.

- hooks.ts: emit the real `[hooks]` block — PascalCase event names
  (SessionStart/UserPromptSubmit/PreToolUse/PostToolUse/Stop/SubagentStop) as
  Claude-style matcher groups (`type = "command"`), not the fictional
  startup/turn-start/command/turn-end/shutdown taxonomy. Use `sandbox_mode`
  (bare `sandbox` is rejected under --strict-config). Pre-trust the worktree
  (`[projects."<wt>"] trust_level = "trusted"`) to skip the directory dialog.
  Keep the PR-ready gate as a second PreToolUse group scoped to the Bash tool
  (Codex's shell tool is named "Bash"). Symlink the operator's auth.json into
  the worktree CODEX_HOME.
- index.ts: buildLaunchCommand sets CODEX_HOME=<worktree>/.codex so the config
  is actually loaded; enterAutoMode answers the hooks-trust ("Trust all and
  continue") and directory-trust boot dialogs (the bypass flag does not suppress
  them interactively).
- classify.ts: detect rate limits from the structured `rate_limits` block in the
  rollout's `token_count` events (rate_limit_reached_type / used_percent >= 100),
  yielding a precise `resetAt` from the epoch `resets_at` instead of "unknown".
  A healthy structured block is authoritative; the textual regex is a fallback
  only when no structured block is present.
- transcript.ts: doc-only — the existing parse already matches the confirmed
  0.133.0 rollout schema (transcript_path, assistant-message turns, function_call
  tool names, info.total_token_usage context fill).
Replace the assertions encoding the fictional hook taxonomy with the real one:
PascalCase events as matcher groups, sandbox_mode, worktree pre-trust, CODEX_HOME
in the launch env, the auth symlink, the Bash-scoped PR-ready gate group, the
trust-dialog detectors, and structured-plus-fallback rate-limit classification —
all using payload/rollout fixtures captured from live codex 0.133.0 runs.
…s clean

Codex parses each hook's stdout as the hook's structured JSON output, so the
dispatcher's plain `ok` response body — echoed to stdout by curl — made codex
log `hook returned invalid <event> JSON output` on every fire (observed live as
"Stop hook (failed)"). `curl -o /dev/null` discards the body, leaving clean
empty stdout that both codex and Claude treat as a no-op. The byte-identical
bootstrap-assets mirror is updated in lockstep (drift test stays green).
A manual/CI-gated probe (not in `bun test` — needs the real codex binary, a
signed-in CODEX_HOME, tmux, and network) that drives the REAL adapter
(installHooks + buildLaunchCommand + enterAutoMode) against live codex and
asserts the normalized heartbeat (session.started w/ transcript_path,
turn.started, tool.pre, tool.post, agent.stopped) reaches a local dispatcher
stand-in. Captures the evidence for gaps 1-4. Also records the decisions-log
finding that codex's interactive SessionStart is prompt-triggered (the
launch-ordering blocker for full live dispatch).
Additive optional capability flag for CLIs that create no session — and fire no
SessionStart/ready hook — until the first prompt is submitted (codex 0.133.0).
Absent/false keeps the boot-triggered launch order; the dispatcher honors it to
send the prompt before awaiting the ready hook. Race-safe because the hook
server stashes a ready hook that arrives before the gate is parked.
…oMode on composer-ready

codex creates no session until the first prompt, so it sets the flag and the
dispatcher must send the prompt before awaiting session.started. enterAutoMode
now resolves the instant the composer-ready welcome banner appears
(detectReadyForInput, captured live off 0.133.0) instead of idling to the 90s
boot deadline — load-bearing now that the dispatcher awaits it before sending the
prompt. Tests cover the ready probe (real banner vs dialogs) and the flag.
…dapters

Branch launch->drive on AgentAdapter.startsSessionOnFirstPrompt: a flagged
adapter (codex) dismisses the boot dialogs, sends the prompt, THEN awaits
session.started; an unflagged adapter (claude) keeps await-first then send. The
hook server stashes a session.started arriving before the gate parks, so
prompt-first is race-safe. Fixes the codex live-dispatch deadlock (#177 criterion
5, #183). Tests assert both orderings, that the Claude path call sequence is
unchanged, and that await-first deadlocks a prompt-triggered CLI.
… unset

Locks the #183 regression at the adapter level: Claude fires SessionStart at
boot, so it must not set the flag — the dispatcher keeps await-first order for it.
…rify

The probe now mirrors implementation.ts: await enterAutoMode -> send prompt ->
await session.started, and asserts session.started is prompt-triggered (absent
before the prompt) and enterAutoMode returns promptly. The old await-first wait
always fell through its 90s window then sent, masking the deadlock.
…s param

Epic #190 renamed the adapter installHooks/buildPromptText parameter
epicNumber → epicRef (string-keyed Epic refs). The recovered live-verify
script still passed epicNumber; align it with the current signature.
@thejustinwalsh

Copy link
Copy Markdown
Owner Author

Reviewer's brief — PR #182 restored & re-verified (Epic #177, codex live-dispatch)

PR #182 was approved, ready-for-review, and MERGEABLE, then briefly closed when middle's open-PR divergence reconciler — firing as Epic #190 merged to main — reset its branch to main and closed it instead of rebasing the codex commits on top (the deep #190/#177 interleave in implementation.ts). No work was lost. The approved work has been recovered and re-established:

  • Recovered from dangling commits at the ready-for-review tip (72e25e6), rebased cleanly onto current main (7279b25, which now carries feat(epic-store): file-backed Epic store (opt-in hybrid) #190's epic-store/dispatcher refactor). The one real conflict — Option A's launch-ordering branch vs. feat(epic-store): file-backed Epic store (opt-in hybrid) #190's restructured implementation.ts — was resolved new-work-as-base (Option A's intent re-applied onto feat(epic-store): file-backed Epic store (opt-in hybrid) #190's shape); feat(epic-store): file-backed Epic store (opt-in hybrid) #190's epicNumberepicRef rename was adapted in the adapter installHooks/buildPromptText call sites and the live-verify script.
  • Re-verified: bun run typecheck, bun run lint, bun run format, and bun test (1261 pass, 0 fail) all green. PR is MERGEABLE / CLEAN, not draft.
  • The codex launch behavior and the Claude path are behaviorally unchanged from the previously live-verified state (only the cross-cutting epicRef rename was applied), so the prior live e2e evidence for criterion 5 (verify-live-hooks.ts PASS against codex 0.133.0) still holds; the deterministic dispatcher ordering tests (implementation-workflow.test.ts) cover both launch orderings and the await-first deadlock.

How to review

The substantive review is unchanged from the original brief on this PR (Option A: additive AgentAdapter.startsSessionOnFirstPrompt? + dispatcher prompt-first branch; gaps 1–4 in the codex adapter). New since then is only the rebase onto #190's main — focus a second look on packages/dispatcher/src/workflows/implementation.ts (the launch→drive fork) to confirm the merge preserved Option A's ordering and #190's gateway/epicRef changes together.

Run it

bun install && bun run typecheck && bun run lint && bun test

Live (operator-only, needs codex 0.133.0): bun packages/adapters/codex/scripts/verify-live-hooks.ts.

Fragile / extra eyes

The human does the final review and merge — the workflow does not merge.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
packages/adapters/codex/scripts/verify-live-hooks.ts (2)

46-157: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing try/finally for deterministic cleanup.

If any step throws before lines 156-157, the tmux session and Bun server remain open. The past review flagged this and indicated it was addressed, but the current code lacks the try/finally wrapper.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/adapters/codex/scripts/verify-live-hooks.ts` around lines 46 - 157,
Wrap the runtime sequence inside main() (everything after starting the Bun.serve
and before the final tmux kill/server.stop) in a try/finally so cleanup always
runs: ensure the finally block awaits sh(["tmux","kill-session","-t", SESSION])
(or ignores errors if no session) and awaits server.stop(true); reference the
Bun.serve instance named server and the SESSION/session-related tmux usage and
the sh() invocations (newSessionArgs/send-keys/killing) so the tmux session and
Bun server are deterministically terminated even if codexAdapter.enterAutoMode,
buildLaunchCommand, sh, or any later loop throws.

42-44: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

sh() still ignores subprocess exit codes.

The helper awaits Bun.spawn(...).exited but doesn't check the result. A failing tmux command (e.g., new-session) would be silently ignored, leading to misleading harness outcomes. The past review flagged this and indicated it was fixed, but the current code still shows the unfixed version.

Suggested fix (matching the prior resolution)
 async function sh(args: string[]): Promise<void> {
-  await Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }).exited;
+  const code = await Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }).exited;
+  if (code !== 0) {
+    throw new Error(`Command failed (exit ${code}): ${args.join(" ")}`);
+  }
 }
+
+/** Best-effort shell: swallows non-zero exits (for cleanup commands). */
+async function quiet(fn: () => Promise<void>): Promise<void> {
+  try { await fn(); } catch { /* swallow */ }
+}

Then use quiet(() => sh(["tmux", "kill-session", ...])) for the stale-session cleanup at line 93.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/adapters/codex/scripts/verify-live-hooks.ts` around lines 42 - 44,
The sh helper currently awaits Bun.spawn(...).exited but ignores the spawn
result, so failing subprocesses are silently ignored; change sh to await the
spawn result (const res = await Bun.spawn(args, { stdout: "ignore", stderr:
"ignore" }).exited) and throw an error if res.exitCode !== 0 (include
args/exitCode in the error message) so callers see failures; also update the
stale tmux cleanup call (the tmux kill-session invocation) to use quiet(() =>
sh(["tmux", "kill-session", ...])) so non-fatal cleanup failures are suppressed
while all other sh callers will now surface errors.
🧹 Nitpick comments (3)
packages/dispatcher/src/workflows/implementation.ts (1)

501-510: 💤 Low value

Redundant String() coercion.

epicRef is already typed as string (line 501), so String(epicRef) at line 504 is a no-op. This may be leftover from when epicRef was numeric.

🧹 Suggested cleanup
 function readPlanBody(worktreePath: string, epicRef: string): string {
   try {
     return readFileSync(
-      join(worktreePath, "planning", "issues", String(epicRef), "plan.md"),
+      join(worktreePath, "planning", "issues", epicRef, "plan.md"),
       "utf8",
     );
   } catch {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/src/workflows/implementation.ts` around lines 501 - 510,
In readPlanBody(worktreePath: string, epicRef: string) remove the redundant
String(epicRef) coercion when building the path — use epicRef directly in
join(worktreePath, "planning", "issues", epicRef, "plan.md"); keep the try/catch
and return behavior unchanged and update any nearby comments if they mention
numeric coercion.
packages/adapters/codex/test/adapter.test.ts (2)

645-678: 💤 Low value

Test mutates process.env.CODEX_HOME without isolation from parallel test runs.

These tests directly mutate process.env.CODEX_HOME. If Bun runs tests in parallel (default behavior), another test reading process.env.CODEX_HOME could see the wrong value. The try/finally restore is good for sequential safety but not parallel safety.

Consider using test.serial or wrapping these in a dedicated describe block with beforeEach/afterEach that saves and restores the env var, ensuring the test framework serializes them.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/adapters/codex/test/adapter.test.ts` around lines 645 - 678, These
tests mutate process.env.CODEX_HOME (in the two tests around installInto and the
auth.json assertions) which can race with other tests; update the tests to avoid
global mutation by either converting them to run serially (use test.serial for
the two cases) or move them into a dedicated describe block and implement
beforeEach/afterEach that save the previous CODEX_HOME and restore it after each
test, then remove the inline try/finally blocks; reference the existing test
names and the installInto call to locate and change the tests.

328-341: 💤 Low value

Consider adding a test case for used_percent > 100.

The test covers exactly 100%, but the implementation logic (per PR objectives) triggers on used_percent >= 100. A test with used_percent: 105 would verify the > case explicitly, though the current test is likely sufficient since both use the same code path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/adapters/codex/test/adapter.test.ts` around lines 328 - 341, Add a
second unit test in packages/adapters/codex/test/adapter.test.ts that mirrors
the existing "structured rate_limits at/over 100% used → rate-limited even
without reached_type" test but sets primary.used_percent to 105 (or any value
>100) to explicitly validate the >100 branch; use the same helpers
(writeMiddleDir, writeRolloutWithRateLimits) and call codexAdapter.classifyStop
with the same args, asserting result.kind === "rate-limited".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@packages/adapters/codex/scripts/verify-live-hooks.ts`:
- Around line 46-157: Wrap the runtime sequence inside main() (everything after
starting the Bun.serve and before the final tmux kill/server.stop) in a
try/finally so cleanup always runs: ensure the finally block awaits
sh(["tmux","kill-session","-t", SESSION]) (or ignores errors if no session) and
awaits server.stop(true); reference the Bun.serve instance named server and the
SESSION/session-related tmux usage and the sh() invocations
(newSessionArgs/send-keys/killing) so the tmux session and Bun server are
deterministically terminated even if codexAdapter.enterAutoMode,
buildLaunchCommand, sh, or any later loop throws.
- Around line 42-44: The sh helper currently awaits Bun.spawn(...).exited but
ignores the spawn result, so failing subprocesses are silently ignored; change
sh to await the spawn result (const res = await Bun.spawn(args, { stdout:
"ignore", stderr: "ignore" }).exited) and throw an error if res.exitCode !== 0
(include args/exitCode in the error message) so callers see failures; also
update the stale tmux cleanup call (the tmux kill-session invocation) to use
quiet(() => sh(["tmux", "kill-session", ...])) so non-fatal cleanup failures are
suppressed while all other sh callers will now surface errors.

---

Nitpick comments:
In `@packages/adapters/codex/test/adapter.test.ts`:
- Around line 645-678: These tests mutate process.env.CODEX_HOME (in the two
tests around installInto and the auth.json assertions) which can race with other
tests; update the tests to avoid global mutation by either converting them to
run serially (use test.serial for the two cases) or move them into a dedicated
describe block and implement beforeEach/afterEach that save the previous
CODEX_HOME and restore it after each test, then remove the inline try/finally
blocks; reference the existing test names and the installInto call to locate and
change the tests.
- Around line 328-341: Add a second unit test in
packages/adapters/codex/test/adapter.test.ts that mirrors the existing
"structured rate_limits at/over 100% used → rate-limited even without
reached_type" test but sets primary.used_percent to 105 (or any value >100) to
explicitly validate the >100 branch; use the same helpers (writeMiddleDir,
writeRolloutWithRateLimits) and call codexAdapter.classifyStop with the same
args, asserting result.kind === "rate-limited".

In `@packages/dispatcher/src/workflows/implementation.ts`:
- Around line 501-510: In readPlanBody(worktreePath: string, epicRef: string)
remove the redundant String(epicRef) coercion when building the path — use
epicRef directly in join(worktreePath, "planning", "issues", epicRef,
"plan.md"); keep the try/catch and return behavior unchanged and update any
nearby comments if they mention numeric coercion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8c1e5797-db4d-4891-a0fb-ffbd43227a00

📥 Commits

Reviewing files that changed from the base of the PR and between 1060996 and 0aec625.

📒 Files selected for processing (14)
  • packages/adapters/claude/test/adapter.test.ts
  • packages/adapters/codex/scripts/verify-live-hooks.ts
  • packages/adapters/codex/src/classify.ts
  • packages/adapters/codex/src/hooks.ts
  • packages/adapters/codex/src/index.ts
  • packages/adapters/codex/src/transcript.ts
  • packages/adapters/codex/test/adapter.test.ts
  • packages/cli/src/bootstrap-assets/hooks/hook.sh
  • packages/core/src/adapter.ts
  • packages/core/src/hook-script.ts
  • packages/dispatcher/src/workflows/implementation.ts
  • packages/dispatcher/test/implementation-workflow.test.ts
  • planning/issues/177/decisions.md
  • planning/issues/177/plan.md
✅ Files skipped from review due to trivial changes (4)
  • planning/issues/177/plan.md
  • packages/cli/src/bootstrap-assets/hooks/hook.sh
  • packages/adapters/codex/src/transcript.ts
  • planning/issues/177/decisions.md

thejustinwalsh added a commit that referenced this pull request Jun 3, 2026
A clean rebase that drops all of a PR's commits (each empty against the new
main) exits 0 with HEAD == origin/main; applySuccess then force-pushed that
empty HEAD over the branch, silently emptying an approved PR (#182).

- tryRebaseOntoMain: detect the dropped-all-commits case after a clean rebase,
  restore the worktree to its pre-rebase HEAD, and return droppedAllCommits so
  the orchestrator escalates instead of pushing.
- applySuccess: keystone last-line guard refuses to push when the remote branch
  has commits ahead of main but the local HEAD has none.
- reconcileOpenPRs: route droppedAllCommits straight to applyDemoteToWork
  (skip the merge fallback) with a specific escalation reason.
- GitOps gains revListCount + restore-only resetHard.

Regression tests cover the helper guard, the applySuccess refusal, and the
end-to-end escalation (branch not reset to main, PR not closed).

Closes #201
thejustinwalsh added a commit that referenced this pull request Jun 3, 2026
A clean rebase that drops all of a PR's commits (each empty against the new
main) exits 0 with HEAD == origin/main; applySuccess then force-pushed that
empty HEAD over the branch, silently emptying an approved PR (#182).

- tryRebaseOntoMain: detect the dropped-all-commits case after a clean rebase,
  restore the worktree to its pre-rebase HEAD, and return droppedAllCommits so
  the orchestrator escalates instead of pushing.
- applySuccess: keystone last-line guard refuses to push when the remote branch
  has commits ahead of main but the local HEAD has none.
- reconcileOpenPRs: route droppedAllCommits straight to applyDemoteToWork
  (skip the merge fallback) with a specific escalation reason.
- GitOps gains revListCount + restore-only resetHard.

Regression tests cover the helper guard, the applySuccess refusal, and the
end-to-end escalation (branch not reset to main, PR not closed).

Closes #201
thejustinwalsh added a commit that referenced this pull request Jun 3, 2026
A clean rebase that drops all of a PR's commits (each empty against the new
main) exits 0 with HEAD == origin/main; applySuccess then force-pushed that
empty HEAD over the branch, silently emptying an approved PR (#182).

- tryRebaseOntoMain: detect the dropped-all-commits case after a clean rebase,
  restore the worktree to its pre-rebase HEAD, and return droppedAllCommits so
  the orchestrator escalates instead of pushing.
- applySuccess: keystone last-line guard refuses to push when the remote branch
  has commits ahead of main but the local HEAD has none.
- reconcileOpenPRs: route droppedAllCommits straight to applyDemoteToWork
  (skip the merge fallback) with a specific escalation reason.
- GitOps gains revListCount + restore-only resetHard.

Regression tests cover the helper guard, the applySuccess refusal, and the
end-to-end escalation (branch not reset to main, PR not closed).

Closes #201
@thejustinwalsh thejustinwalsh merged commit 8fbd926 into main Jun 3, 2026
1 check passed
@thejustinwalsh thejustinwalsh deleted the middle-issue-177 branch June 3, 2026 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-review All phases done and verified — PR ready for final human review and merge

Projects

None yet

1 participant