fix(adapter-codex): make CodexAdapter dispatchable against live codex 0.133.0 by thejustinwalsh · Pull Request #182 · thejustinwalsh/middle

thejustinwalsh · 2026-05-29T05:23:14Z

Summary

Closes #177. Resolves #183.

Makes the Codex adapter genuinely dispatchable against the real codex-cli 0.133.0 binary. Reverse-engineered the real surface (binary schema + many live tmux runs), fixed the four gaps the issue listed, then closed the fifth — full live dispatch — via the maintainer-approved Option A: an additive optional AgentAdapter.startsSessionOnFirstPrompt flag plus a dispatcher prompt-first launch branch. All five verified live end-to-end.

The keystone finding: codex 0.133.0's hooks are modelled on Claude Code's — same PascalCase event names, same payload fields, same Bash shell-tool name. The adapter's invented startup/turn-start/command/turn-end/shutdown taxonomy fired zero hooks; the real schema fires end-to-end.

Status

Phase 1: Hooks schema + trust (the keystone) — real [hooks], PascalCase events, matcher groups, Bash-scoped PR-ready gate; enterAutoMode answers the trust dialogs
Phase 2: CODEX_HOME + auth reachability — buildLaunchCommand sets CODEX_HOME=<worktree>/.codex; installHooks symlinks the operator's auth.json
Phase 3: Sandbox key — sandbox_mode (verified accepted under --strict-config; bare sandbox is rejected)
Phase 4: Stop + rate-limit from the real session artifact — structured rate_limits in the rollout token_count events
Phase 5: Full live dispatch — prompt-first launch ordering (Option A); the heartbeat reaches agent.stopped end-to-end through the real adapter

Resolution of the launch-ordering blocker (Option A, #183)

Live dispatch revealed interactive codex creates no session — and fires no SessionStart — until the first prompt is submitted (Claude fires it at boot). The dispatcher's fixed enterAutoMode → awaitSessionStart → sendText(prompt) order therefore deadlocked codex: the prompt that would unblock the wait was only sent after it. This is the launch→drive ordering, not the codex adapter's internals — so it needed a shared-code change the issue's "no interface change" headline reserved for the maintainer. The maintainer approved Option A ("Approved: A" + approved label on #177).

What shipped:

packages/core/src/adapter.ts — additive, optional, documented AgentAdapter.startsSessionOnFirstPrompt?: boolean. Absent/false ⇒ today's boot-triggered order. The interface only grows an optional field; every existing adapter and call site compiles unchanged.
packages/dispatcher/src/workflows/implementation.ts — the launch→drive step branches on the flag: a flagged adapter dismisses the boot dialogs, sends the prompt, then awaits session.started; an unflagged adapter keeps await-first, then send. The hook server already stashes a session.started that lands before the gate is parked, so prompt-first is race-safe. The Claude path's call sequence is byte-for-byte unchanged (regression test).
packages/adapters/codex/src/index.ts — codex sets startsSessionOnFirstPrompt: true, and enterAutoMode now returns the instant the composer is ready (detectReadyForInput — the OpenAI Codex (vX.Y.Z) welcome banner, captured live) instead of idling to the 90s boot deadline. That promptness is load-bearing now that the dispatcher awaits enterAutoMode before sending the prompt.

What changed (gaps 1–4)

packages/adapters/codex/src/hooks.ts — real [hooks] schema (PascalCase events as matcher groups, type="command"), sandbox_mode, worktree pre-trust ([projects."<wt>"] trust_level="trusted"), the Bash-scoped PR-ready gate, and an auth.json symlink into the worktree CODEX_HOME.
packages/adapters/codex/src/index.ts — buildLaunchCommand sets CODEX_HOME; enterAutoMode answers the hooks-trust and directory-trust dialogs.
packages/adapters/codex/src/classify.ts — rate-limit detection reads the structured rate_limits block (precise resetAt from the epoch resets_at); textual regex retained as a fallback.
packages/adapters/codex/src/transcript.ts — docs only; the existing parse already matches the confirmed rollout schema.
packages/core/src/hook-script.ts + packages/cli/src/bootstrap-assets/hooks/hook.sh — curl -o /dev/null so the dispatcher's ok body doesn't pollute hook stdout (codex parses hook stdout as JSON). Strictly better for both adapters; drift test stays green.
packages/adapters/codex/test/adapter.test.ts — rewritten against the real schema and live-captured payloads; adds the ready-probe and flag tests.
packages/adapters/codex/scripts/verify-live-hooks.ts — live end-to-end heartbeat probe, now driving the real prompt-first order (manual/CI-gated).

Acceptance criteria

Criterion 1 — the real codex 0.133.0 [hooks] schema is read off the binary and the adapter emits the events the watchdog/stop-classifier consume; the heartbeat is observed end-to-end. Evidence: packages/adapters/codex/src/hooks.ts, packages/adapters/codex/test/adapter.test.ts, and the live e2e probe packages/adapters/codex/scripts/verify-live-hooks.ts (PASS — all five normalized events).
Criterion 2 — buildLaunchCommand sets CODEX_HOME so the adapter config loads, with auth reachable via the auth.json symlink. Evidence: packages/adapters/codex/test/adapter.test.ts.
Criterion 3 — the sandbox policy is applied via the real sandbox_mode key, verified under --strict-config (bare sandbox rejected). Evidence: packages/adapters/codex/test/adapter.test.ts.
Criterion 4 — stop classification + rate-limit detection read the real rollout artifact (structured rate_limits). Evidence: packages/adapters/codex/test/adapter.test.ts.
Criterion 5 (integration) — a full live dispatch through the prompt-first launch order dispatches codex into the running dispatcher and drives the heartbeat end-to-end (enterAutoMode → prompt → session.started → agent.stopped) against real codex 0.133.0; closes the deferred CodexAdapter #60/Verify the adapter abstraction holds across both adapters #63 criterion and resolves feat(dispatcher,core): launch-ordering accommodation for prompt-triggered-session adapters (codex live dispatch) #183. Evidence: the live e2e probe packages/adapters/codex/scripts/verify-live-hooks.ts (PASS) and the dispatcher ordering test packages/dispatcher/test/implementation-workflow.test.ts.

Verification

Live prompt-first dispatch (Criterion 5). verify-live-hooks.ts drives the real adapter in the dispatcher's prompt-first order against codex 0.133.0. PASS: enterAutoMode returned in 1317 ms (composer-ready, not the 90 s deadline); session.started did not fire before the prompt (proving prompt-triggered); full heartbeat session.started → turn.started → tool.pre → tool.post → agent.stopped; session.started carried a real transcript_path rollout; the agent ran the shell command and stopped.
Both launch orderings + the Claude regression. packages/dispatcher/test/implementation-workflow.test.ts asserts the prompt-first adapter sends the prompt before awaiting session.started (no deadlock, against a gate that only resolves once the prompt is sent), the boot-first adapter keeps the await-first sequence unchanged, and that await-first ordering deadlocks a prompt-triggered CLI (why the flag exists). packages/adapters/claude/test/adapter.test.ts locks that Claude leaves the flag unset.
Gaps 1–4 as previously: hooks fire end-to-end, CODEX_HOME/auth, sandbox_mode under --strict-config, structured rate_limits — all live-confirmed and unit-tested.
bun test (1261 pass), bun run typecheck, bun run lint, bun run format all green.

Why these changes

codex's prompt-triggered session is a genuine cross-CLI difference, not a codex-internal quirk — so the accommodation belongs at the AgentAdapter seam, expressed as a capability the dispatcher honors. Option A keeps it additive: a single optional flag, a localized dispatcher branch, zero change to the Claude path. enterAutoMode resolving on composer-ready (rather than the boot deadline) is what lets the now-awaited dialog dismissal feed straight into the prompt without stalling every launch.

Stumbling points

The old live verify script waited for session.started before sending the prompt — it always fell through its 90 s wait and then sent, so it "passed" without ever exercising the real dispatcher order. That masked the deadlock. The rewrite is a faithful mirror of implementation.ts and asserts the prompt-triggered invariant explicitly.
{kind:"done"} in the dispatcher test stub triggers the review-changes park (waiting-human), not completed; the ordering tests use bare-stop to isolate the launch order from the post-drive flow.

Decisions

See planning/issues/177/decisions.md → "RESOLUTION (2026-06-03)" for the Option A rationale and the full candidate space; the earlier entries cover gaps 1–4.

Branch

Rebased onto current main (clean, MERGEABLE) — now includes both the merged #178 poller fix and Epic #190's epic-store / dispatcher refactor (the epicNumber → epicRef rename in the adapter installHooks/buildPromptText params is adapted accordingly). Option A's launch-ordering branch was re-applied onto #190's restructured implementation.ts (new-work-as-base); the launch sequence and the Claude path are behaviorally unchanged from the live-verified state.

Recovery note: this PR was briefly closed when middle's open-PR divergence reconciler, firing as #190 merged, reset this branch to main instead of rebasing the codex commits on top (the deep #190/#177 interleave in implementation.ts). No code was lost — the approved work is recovered intact and re-verified (typecheck/lint/format/1261 tests green). The reconciler defect is filed as a follow-up.

Follow-up issues

feat(dispatcher,core): launch-ordering accommodation for prompt-triggered-session adapters (codex live dispatch) #183 (launch-ordering accommodation) — resolved in this PR per its own "within PR fix(adapter-codex): make CodexAdapter dispatchable against live codex 0.133.0 #182" path, now that the maintainer authorized the additive interface change.
fix(dispatcher): open-PR divergence reconciler reset an approved branch to main and closed the PR (data-loss) #201 (standalone) — the open-PR divergence reconciler reset this branch to main and closed the PR while feat(epic-store): file-backed Epic store (opt-in hybrid) #190 merged (the incident in the Recovery note above). Standalone: it's a dispatcher-reconciler defect, outside the codex-adapter surface.

Summary by CodeRabbit

New Features
- Adapters can opt into "start on first prompt" behavior; Codex improves interactive boot handling and readiness detection.
Tests
- Added live E2E verification and expanded tests for launch ordering, prompt/boot dialogs, hooks, transcripts, and rate-limit handling; regression added for Claude ordering.
Bug Fixes
- Hook responses no longer leak to adapter stdout.
Documentation
- Added decision log and planning docs for Codex changes; installer now handles auth linking and updated hook configs.

coderabbitai · 2026-05-29T05:23:20Z

📝 Walkthrough

Walkthrough

Adds an optional AgentAdapter.startsSessionOnFirstPrompt flag and branches dispatcher launch ordering to support prompt-first adapters; updates Codex adapter readiness probing, hook TOML generation, auth symlinking, and structured transcript rate-limit detection; expands tests, adds a live verification script, and records planning/decision docs.

Changes

Prompt-first adapter capability and dispatcher launch ordering

Layer / File(s)	Summary
AgentAdapter.startsSessionOnFirstPrompt contract `packages/core/src/adapter.ts`	Adds `startsSessionOnFirstPrompt?: boolean` to `AgentAdapter` documenting prompt-first semantics.
Dispatcher launch-order branching on adapter capability `packages/dispatcher/src/workflows/implementation.ts`	Branches drive ordering on `adapter.startsSessionOnFirstPrompt`, introduces `promptText`/`sendPrompt`, sends prompt before awaiting `SessionStart` for prompt-first adapters, otherwise awaits `SessionStart` first.
Dispatcher workflow tests for prompt-first ordering `packages/dispatcher/test/implementation-workflow.test.ts`	Adds ordering harness and tests validating prompt-first, boot-first, and deadlock scenarios with on-prompt vs immediate SessionGate behaviors.
Claude adapter regression test for startsSessionOnFirstPrompt `packages/adapters/claude/test/adapter.test.ts`	Adds a test asserting `claudeAdapter.startsSessionOnFirstPrompt` is falsy.

Codex adapter boot-mode readiness and hook configuration

Layer / File(s)	Summary
Codex boot dialog detection and enterAutoMode polling `packages/adapters/codex/src/index.ts`	Adds pane probes (hooks trust, dir trust, ready banner), exposes detection helpers, and rewrites `enterAutoMode` to answer dialogs and return on composer-ready. Ensures `CODEX_HOME` points to worktree `.codex`.
Codex hook TOML generation with matcher groups and event mapping `packages/adapters/codex/src/hooks.ts`	Generates matcher-group array-of-tables TOML, updates CODEX_EVENT_MAP to PascalCase events, uses `sandbox_mode`, pre-trusts worktree, registers PR-ready gate as second `PreToolUse` matcher scoped to `Bash`, and adds `linkAuth` for auth symlinking.
Shared hook script response body discard and docs `packages/cli/src/bootstrap-assets/hooks/hook.sh`, `packages/core/src/hook-script.ts`	Adds `-o /dev/null` to `curl` in hook scripts and documents why hook stdout must be free of dispatcher response bodies.

Codex transcript and rate-limit handling

Layer / File(s)	Summary
Codex structured rate-limit detection from rollout transcripts `packages/adapters/codex/src/classify.ts`	Adds structured-first rate-limit detector reading `token_count` `rate_limits`, converts `resets_at` to ISO, computes limit via `rate_limit_reached_type`/`used_percent >= 100`, falls back to regex; updates `classifyStop` and `detectRateLimit`.
Codex transcript schema documentation updates `packages/adapters/codex/src/transcript.ts`	Updates inline docs for `resolveTranscriptPath` and `readTranscriptState` to match codex 0.133.0 rollout shapes.

Tests and live verification

Layer / File(s)	Summary
Codex adapter test rewrites and expansions `packages/adapters/codex/test/adapter.test.ts`	Extensive test updates: launch/env assertions including `CODEX_HOME`, prompt framing updates (`epicRef: "177"`), real-shaped transcript fixtures, structured `rate_limits` tests, `installHooks` TOML parsing and PR-ready gate assertions, auth symlink tests, and prompt/dialog detection tests.
Live end-to-end Codex adapter verification script `packages/adapters/codex/scripts/verify-live-hooks.ts`	New Bun/TS script that installs hooks, launches real `codex` under `tmux` against a local dispatcher stand-in, drives enterAutoMode/prompt-first interaction, records hook events, validates ordering and required events, and emits diagnostics on failure.

Planning and decisions

Layer / File(s)	Summary
Issue 177 plan and decisions `planning/issues/177/plan.md`, `planning/issues/177/decisions.md`	Adds plan and decision records documenting the prompt-first resolution, Codex hook/config choices, auth symlink behavior, structured rate-limit decision, and discovery that Codex SessionStart can be prompt-triggered.

Sequence Diagram (high-level prompt-first launch flow):

sequenceDiagram
  participant Dispatcher
  participant CodexAdapter
  participant HookServer
  Dispatcher->>CodexAdapter: enterAutoMode()
  Dispatcher->>CodexAdapter: sendPrompt(promptText)
  CodexAdapter->>HookServer: emit SessionStart hook
  HookServer->>Dispatcher: POST /hooks/SessionStart
  Dispatcher->>CodexAdapter: await running-state update

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related issues

Make the CodexAdapter functionally dispatchable against live codex 0.133.0 #177 — This PR implements the Codex-specific changes and the startsSessionOnFirstPrompt capability tracked by Issue #177.

Possibly related PRs

thejustinwalsh/middle#73 — Related dispatcher/adapter launch flow work that this PR extends.
thejustinwalsh/middle#175 — Prior Codex adapter rate-limit/classify work that this PR builds upon.

Suggested labels

bug

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 76.92% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: making the CodexAdapter work with live codex 0.133.0 by addressing real integration gaps through reverse-engineering and implementation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

thejustinwalsh · 2026-06-03T07:33:25Z

Approved: A

thejustinwalsh · 2026-06-03T08:48:59Z

+   * arrives before the gate is parked, so prompt-first is race-safe. See
+   * `packages/dispatcher/src/workflows/implementation.ts` launch→drive.
+   */
+  readonly startsSessionOnFirstPrompt?: boolean;


Decision (Option A, #183): additive optional flag — the AgentAdapter signature only grows. Absent/false keeps today's boot-triggered order, so every existing adapter and call site is unchanged. Maintainer-authorized ("Approved: A" + approved label on #177). Race-safety rests on the hook server stashing a session.started that lands before the gate parks (hook-server.ts #deliver/#await). Rationale: planning/issues/177/decisions.md → RESOLUTION (2026-06-03).

thejustinwalsh · 2026-06-03T08:49:00Z

+      // order forks on whether the CLI fires SessionStart at boot or only once
+      // the first prompt is submitted (AgentAdapter.startsSessionOnFirstPrompt).
+      let startPayload: HookPayload;
+      if (adapter.startsSessionOnFirstPrompt) {


Why prompt-first here: codex creates no session — and fires no SessionStart — until the first prompt is submitted, so await-first deadlocks it (the prompt that unblocks the wait is only sent after). The flagged path dismisses dialogs, sends the prompt, then awaits; enterAutoMode is awaited (not fire-and-forget) so a needs-login throw fails the launch fast instead of feeding the prompt into a login screen. The unflagged (Claude) arm below is byte-for-byte the prior sequence — covered by the regression test in implementation-workflow.test.ts.

thejustinwalsh · 2026-06-03T08:49:51Z

Reviewer's brief — PR #182 (Epic #177, codex live-dispatch)

The codex adapter is now functionally dispatchable against real codex-cli 0.133.0. Gaps 1–4 landed earlier; this brief focuses on the final piece — criterion 5, the prompt-first launch ordering (Option A, resolving #183), which the maintainer authorized ("Approved: A" + approved label).

How to run it

bun install
bun run typecheck && bun run lint && bun test     # 1194 pass, all green
# Live e2e (needs the codex 0.133.0 binary, a signed-in ~/.codex, tmux, network):
bun run packages/adapters/codex/scripts/verify-live-hooks.ts   # exits 0 = PASS

What to verify (and what "correct" looks like)

The interface change is genuinely additive. packages/core/src/adapter.ts — startsSessionOnFirstPrompt?: boolean is optional; no existing adapter or call site changed signature. Claude must not set it (packages/adapters/claude/test/adapter.test.ts asserts this).
The Claude path is unchanged. packages/dispatcher/src/workflows/implementation.ts launch→drive: the else (unflagged) arm is the prior enterAutoMode (parallel) → awaitSessionStart → updateWorkflow(running) → sendPrompt sequence. The ordering test asserts the exact call order [enter-auto-mode, await-session-start, send-prompt] for an unflagged adapter, and [enter-auto-mode, send-prompt, await-session-start] for a flagged one.
Prompt-first is race-safe. The flagged arm sends the prompt before awaiting session.started; this is safe only because the hook server stashes a session.started that arrives before a waiter parks (packages/dispatcher/src/hook-server.ts #deliver/#await). Worth a look that this reasoning holds.
The live heartbeat. The e2e probe drives the real adapter in the dispatcher's prompt-first order and asserts: enterAutoMode returns promptly on composer-ready (not the 90 s deadline), session.started is not seen before the prompt (proving prompt-triggered), and the full session.started → turn.started → tool.pre → tool.post → agent.stopped heartbeat with a real transcript_path.

How to review it

The five commits are split by concern: core flag → codex (flag + enterAutoMode composer-ready return) → dispatcher branch → claude regression test → live-verify rewrite → decisions log. The acceptance-criteria evidence table is in the PR body; rationale and the full candidate space are in planning/issues/177/decisions.md → "RESOLUTION (2026-06-03)".

Fragile bits that want extra eyes

enterAutoMode now returns on a positive pane signal (detectReadyForInput, the OpenAI Codex (vX.Y.Z) banner) rather than idling to the boot deadline. The ready-check sits after the dialog probes (which continue), and a dialog screen never renders the banner — so a still-open dialog is always answered first. The regex /OpenAI Codex\s*\(v/i and that ordering are the load-bearing assumptions; both are grounded in live captures off 0.133.0.
enterAutoMode is now awaited on the flagged path (vs fire-and-forget for Claude) — a needs-login throw fails the launch fast; the same outer catch kills the tmux session, so no leak.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/adapters/codex/scripts/verify-live-hooks.ts`:
- Around line 46-157: The test creates runtime resources (tmux session and Bun
server) in main() but doesn't guarantee cleanup on error; wrap the critical
region after creating server and starting the tmux session in a try/finally so
we always run cleanup: move the logic from after server/tmux creation
(everything that drives enterAutoMode, prompt send, waits, and capturePane) into
the try block, and in the finally block ensure you call
sh(["tmux","kill-session","-t", SESSION]) only if the session was started and
await server.stop(true) only if server was created, handling any errors from
those teardown calls so cleanup always runs; reference functions/vars: main(),
server, SESSION, codexAdapter.enterAutoMode, sh(), capturePane().
- Around line 42-44: The helper sh() currently awaits Bun.spawn(...).exited but
ignores the subprocess exit result; update sh to capture the exited result from
Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }), check the exit code
(result.exitCode or exited.status depending on API) and throw an error or
otherwise fail fast when it is non-zero so tmux lifecycle failures
(new-session/send-keys/kill-session) are not silently ignored; reference the sh
function and the Bun.spawn invocation when locating where to add the exit-code
check and error handling.

In `@planning/issues/177/decisions.md`:
- Line 293: Several ATX headings in this document (for example the heading that
currently reads "`#177` labels are still just `phase:10`, `dogfood` (**no
`approved**`)" and the other headings noted) are missing a space after the
leading '#' which triggers markdownlint MD018; fix them by adding a single space
between the '#' characters and the heading text (e.g., change "`#Heading`" to "#
Heading") for each affected heading in decisions.md so they conform to MD018.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2c121e60-62da-4928-9de7-825d1085fbe2

📥 Commits

Reviewing files that changed from the base of the PR and between 421425d and 72e25e6.

📒 Files selected for processing (14)

packages/adapters/claude/test/adapter.test.ts
packages/adapters/codex/scripts/verify-live-hooks.ts
packages/adapters/codex/src/classify.ts
packages/adapters/codex/src/hooks.ts
packages/adapters/codex/src/index.ts
packages/adapters/codex/src/transcript.ts
packages/adapters/codex/test/adapter.test.ts
packages/cli/src/bootstrap-assets/hooks/hook.sh
packages/core/src/adapter.ts
packages/core/src/hook-script.ts
packages/dispatcher/src/workflows/implementation.ts
packages/dispatcher/test/implementation-workflow.test.ts
planning/issues/177/decisions.md
planning/issues/177/plan.md

thejustinwalsh · 2026-06-03T11:44:47Z

🔁 Reconciled with main (rebased) after 7279b25

#177)

…OME + auth The Codex adapter's hook config was a start-generous guess that fired zero hooks. Reverse-engineered the real codex 0.133.0 surface (binary schema + live runs): its hooks are modelled on Claude Code's. - hooks.ts: emit the real `[hooks]` block — PascalCase event names (SessionStart/UserPromptSubmit/PreToolUse/PostToolUse/Stop/SubagentStop) as Claude-style matcher groups (`type = "command"`), not the fictional startup/turn-start/command/turn-end/shutdown taxonomy. Use `sandbox_mode` (bare `sandbox` is rejected under --strict-config). Pre-trust the worktree (`[projects."<wt>"] trust_level = "trusted"`) to skip the directory dialog. Keep the PR-ready gate as a second PreToolUse group scoped to the Bash tool (Codex's shell tool is named "Bash"). Symlink the operator's auth.json into the worktree CODEX_HOME. - index.ts: buildLaunchCommand sets CODEX_HOME=<worktree>/.codex so the config is actually loaded; enterAutoMode answers the hooks-trust ("Trust all and continue") and directory-trust boot dialogs (the bypass flag does not suppress them interactively).

- classify.ts: detect rate limits from the structured `rate_limits` block in the rollout's `token_count` events (rate_limit_reached_type / used_percent >= 100), yielding a precise `resetAt` from the epoch `resets_at` instead of "unknown". A healthy structured block is authoritative; the textual regex is a fallback only when no structured block is present. - transcript.ts: doc-only — the existing parse already matches the confirmed 0.133.0 rollout schema (transcript_path, assistant-message turns, function_call tool names, info.total_token_usage context fill).

Replace the assertions encoding the fictional hook taxonomy with the real one: PascalCase events as matcher groups, sandbox_mode, worktree pre-trust, CODEX_HOME in the launch env, the auth symlink, the Bash-scoped PR-ready gate group, the trust-dialog detectors, and structured-plus-fallback rate-limit classification — all using payload/rollout fixtures captured from live codex 0.133.0 runs.

…s clean Codex parses each hook's stdout as the hook's structured JSON output, so the dispatcher's plain `ok` response body — echoed to stdout by curl — made codex log `hook returned invalid <event> JSON output` on every fire (observed live as "Stop hook (failed)"). `curl -o /dev/null` discards the body, leaving clean empty stdout that both codex and Claude treat as a no-op. The byte-identical bootstrap-assets mirror is updated in lockstep (drift test stays green).

A manual/CI-gated probe (not in `bun test` — needs the real codex binary, a signed-in CODEX_HOME, tmux, and network) that drives the REAL adapter (installHooks + buildLaunchCommand + enterAutoMode) against live codex and asserts the normalized heartbeat (session.started w/ transcript_path, turn.started, tool.pre, tool.post, agent.stopped) reaches a local dispatcher stand-in. Captures the evidence for gaps 1-4. Also records the decisions-log finding that codex's interactive SessionStart is prompt-triggered (the launch-ordering blocker for full live dispatch).

…erion-5 scope decision

…ope decision

…ESUME #4 (loop is poller bug #178)

…hurn

Additive optional capability flag for CLIs that create no session — and fire no SessionStart/ready hook — until the first prompt is submitted (codex 0.133.0). Absent/false keeps the boot-triggered launch order; the dispatcher honors it to send the prompt before awaiting the ready hook. Race-safe because the hook server stashes a ready hook that arrives before the gate is parked.

…oMode on composer-ready codex creates no session until the first prompt, so it sets the flag and the dispatcher must send the prompt before awaiting session.started. enterAutoMode now resolves the instant the composer-ready welcome banner appears (detectReadyForInput, captured live off 0.133.0) instead of idling to the 90s boot deadline — load-bearing now that the dispatcher awaits it before sending the prompt. Tests cover the ready probe (real banner vs dialogs) and the flag.

…dapters Branch launch->drive on AgentAdapter.startsSessionOnFirstPrompt: a flagged adapter (codex) dismisses the boot dialogs, sends the prompt, THEN awaits session.started; an unflagged adapter (claude) keeps await-first then send. The hook server stashes a session.started arriving before the gate parks, so prompt-first is race-safe. Fixes the codex live-dispatch deadlock (#177 criterion 5, #183). Tests assert both orderings, that the Claude path call sequence is unchanged, and that await-first deadlocks a prompt-triggered CLI.

… unset Locks the #183 regression at the adapter level: Claude fires SessionStart at boot, so it must not set the flag — the dispatcher keeps await-first order for it.

…rify The probe now mirrors implementation.ts: await enterAutoMode -> send prompt -> await session.started, and asserts session.started is prompt-triggered (absent before the prompt) and enterAutoMode returns promptly. The old await-first wait always fell through its 90s window then sent, masking the deadlock.

…s param Epic #190 renamed the adapter installHooks/buildPromptText parameter epicNumber → epicRef (string-keyed Epic refs). The recovered live-verify script still passed epicNumber; align it with the current signature.

thejustinwalsh · 2026-06-03T12:04:19Z

Reviewer's brief — PR #182 restored & re-verified (Epic #177, codex live-dispatch)

PR #182 was approved, ready-for-review, and MERGEABLE, then briefly closed when middle's open-PR divergence reconciler — firing as Epic #190 merged to main — reset its branch to main and closed it instead of rebasing the codex commits on top (the deep #190/#177 interleave in implementation.ts). No work was lost. The approved work has been recovered and re-established:

Recovered from dangling commits at the ready-for-review tip (72e25e6), rebased cleanly onto current main (7279b25, which now carries feat(epic-store): file-backed Epic store (opt-in hybrid) #190's epic-store/dispatcher refactor). The one real conflict — Option A's launch-ordering branch vs. feat(epic-store): file-backed Epic store (opt-in hybrid) #190's restructured implementation.ts — was resolved new-work-as-base (Option A's intent re-applied onto feat(epic-store): file-backed Epic store (opt-in hybrid) #190's shape); feat(epic-store): file-backed Epic store (opt-in hybrid) #190's epicNumber→epicRef rename was adapted in the adapter installHooks/buildPromptText call sites and the live-verify script.
Re-verified: bun run typecheck, bun run lint, bun run format, and bun test (1261 pass, 0 fail) all green. PR is MERGEABLE / CLEAN, not draft.
The codex launch behavior and the Claude path are behaviorally unchanged from the previously live-verified state (only the cross-cutting epicRef rename was applied), so the prior live e2e evidence for criterion 5 (verify-live-hooks.ts PASS against codex 0.133.0) still holds; the deterministic dispatcher ordering tests (implementation-workflow.test.ts) cover both launch orderings and the await-first deadlock.

How to review

The substantive review is unchanged from the original brief on this PR (Option A: additive AgentAdapter.startsSessionOnFirstPrompt? + dispatcher prompt-first branch; gaps 1–4 in the codex adapter). New since then is only the rebase onto #190's main — focus a second look on packages/dispatcher/src/workflows/implementation.ts (the launch→drive fork) to confirm the merge preserved Option A's ordering and #190's gateway/epicRef changes together.

Run it

bun install && bun run typecheck && bun run lint && bun test

Live (operator-only, needs codex 0.133.0): bun packages/adapters/codex/scripts/verify-live-hooks.ts.

Fragile / extra eyes

The reconciler that destroyed this PR is a live data-loss defect, filed as fix(dispatcher): open-PR divergence reconciler reset an approved branch to main and closed the PR (data-loss) #201 — worth prioritizing before it empties the next approved PR.
implementation.ts launch ordering is the load-bearing merge point; the prompt-first vs await-first fork must stay intact for codex to dispatch at all.

The human does the final review and merge — the workflow does not merge.

coderabbitai

♻️ Duplicate comments (2)

packages/adapters/codex/scripts/verify-live-hooks.ts (2)

46-157: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing try/finally for deterministic cleanup.

If any step throws before lines 156-157, the tmux session and Bun server remain open. The past review flagged this and indicated it was addressed, but the current code lacks the try/finally wrapper.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/adapters/codex/scripts/verify-live-hooks.ts` around lines 46 - 157,
Wrap the runtime sequence inside main() (everything after starting the Bun.serve
and before the final tmux kill/server.stop) in a try/finally so cleanup always
runs: ensure the finally block awaits sh(["tmux","kill-session","-t", SESSION])
(or ignores errors if no session) and awaits server.stop(true); reference the
Bun.serve instance named server and the SESSION/session-related tmux usage and
the sh() invocations (newSessionArgs/send-keys/killing) so the tmux session and
Bun server are deterministically terminated even if codexAdapter.enterAutoMode,
buildLaunchCommand, sh, or any later loop throws.

42-44: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

sh() still ignores subprocess exit codes.

The helper awaits Bun.spawn(...).exited but doesn't check the result. A failing tmux command (e.g., new-session) would be silently ignored, leading to misleading harness outcomes. The past review flagged this and indicated it was fixed, but the current code still shows the unfixed version.

Suggested fix (matching the prior resolution)

 async function sh(args: string[]): Promise<void> {
-  await Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }).exited;
+  const code = await Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }).exited;
+  if (code !== 0) {
+    throw new Error(`Command failed (exit ${code}): ${args.join(" ")}`);
+  }
 }
+
+/** Best-effort shell: swallows non-zero exits (for cleanup commands). */
+async function quiet(fn: () => Promise<void>): Promise<void> {
+  try { await fn(); } catch { /* swallow */ }
+}

Then use quiet(() => sh(["tmux", "kill-session", ...])) for the stale-session cleanup at line 93.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/adapters/codex/scripts/verify-live-hooks.ts` around lines 42 - 44,
The sh helper currently awaits Bun.spawn(...).exited but ignores the spawn
result, so failing subprocesses are silently ignored; change sh to await the
spawn result (const res = await Bun.spawn(args, { stdout: "ignore", stderr:
"ignore" }).exited) and throw an error if res.exitCode !== 0 (include
args/exitCode in the error message) so callers see failures; also update the
stale tmux cleanup call (the tmux kill-session invocation) to use quiet(() =>
sh(["tmux", "kill-session", ...])) so non-fatal cleanup failures are suppressed
while all other sh callers will now surface errors.

🧹 Nitpick comments (3)

packages/dispatcher/src/workflows/implementation.ts (1)
501-510: 💤 Low value

Redundant String() coercion.

epicRef is already typed as string (line 501), so String(epicRef) at line 504 is a no-op. This may be leftover from when epicRef was numeric.
🧹 Suggested cleanup
 function readPlanBody(worktreePath: string, epicRef: string): string {
   try {
     return readFileSync(
-      join(worktreePath, "planning", "issues", String(epicRef), "plan.md"),
+      join(worktreePath, "planning", "issues", epicRef, "plan.md"),
       "utf8",
     );
   } catch {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dispatcher/src/workflows/implementation.ts` around lines 501 - 510,
In readPlanBody(worktreePath: string, epicRef: string) remove the redundant
String(epicRef) coercion when building the path — use epicRef directly in
join(worktreePath, "planning", "issues", epicRef, "plan.md"); keep the try/catch
and return behavior unchanged and update any nearby comments if they mention
numeric coercion.
packages/adapters/codex/test/adapter.test.ts (2)
645-678: 💤 Low value

Test mutates process.env.CODEX_HOME without isolation from parallel test runs.

These tests directly mutate process.env.CODEX_HOME. If Bun runs tests in parallel (default behavior), another test reading process.env.CODEX_HOME could see the wrong value. The try/finally restore is good for sequential safety but not parallel safety.

Consider using test.serial or wrapping these in a dedicated describe block with beforeEach/afterEach that saves and restores the env var, ensuring the test framework serializes them.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/adapters/codex/test/adapter.test.ts` around lines 645 - 678, These
tests mutate process.env.CODEX_HOME (in the two tests around installInto and the
auth.json assertions) which can race with other tests; update the tests to avoid
global mutation by either converting them to run serially (use test.serial for
the two cases) or move them into a dedicated describe block and implement
beforeEach/afterEach that save the previous CODEX_HOME and restore it after each
test, then remove the inline try/finally blocks; reference the existing test
names and the installInto call to locate and change the tests.
328-341: 💤 Low value

Consider adding a test case for used_percent > 100.

The test covers exactly 100%, but the implementation logic (per PR objectives) triggers on used_percent >= 100. A test with used_percent: 105 would verify the > case explicitly, though the current test is likely sufficient since both use the same code path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/adapters/codex/test/adapter.test.ts` around lines 328 - 341, Add a
second unit test in packages/adapters/codex/test/adapter.test.ts that mirrors
the existing "structured rate_limits at/over 100% used → rate-limited even
without reached_type" test but sets primary.used_percent to 105 (or any value
>100) to explicitly validate the >100 branch; use the same helpers
(writeMiddleDir, writeRolloutWithRateLimits) and call codexAdapter.classifyStop
with the same args, asserting result.kind === "rate-limited".

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@packages/adapters/codex/scripts/verify-live-hooks.ts`:
- Around line 46-157: Wrap the runtime sequence inside main() (everything after
starting the Bun.serve and before the final tmux kill/server.stop) in a
try/finally so cleanup always runs: ensure the finally block awaits
sh(["tmux","kill-session","-t", SESSION]) (or ignores errors if no session) and
awaits server.stop(true); reference the Bun.serve instance named server and the
SESSION/session-related tmux usage and the sh() invocations
(newSessionArgs/send-keys/killing) so the tmux session and Bun server are
deterministically terminated even if codexAdapter.enterAutoMode,
buildLaunchCommand, sh, or any later loop throws.
- Around line 42-44: The sh helper currently awaits Bun.spawn(...).exited but
ignores the spawn result, so failing subprocesses are silently ignored; change
sh to await the spawn result (const res = await Bun.spawn(args, { stdout:
"ignore", stderr: "ignore" }).exited) and throw an error if res.exitCode !== 0
(include args/exitCode in the error message) so callers see failures; also
update the stale tmux cleanup call (the tmux kill-session invocation) to use
quiet(() => sh(["tmux", "kill-session", ...])) so non-fatal cleanup failures are
suppressed while all other sh callers will now surface errors.

---

Nitpick comments:
In `@packages/adapters/codex/test/adapter.test.ts`:
- Around line 645-678: These tests mutate process.env.CODEX_HOME (in the two
tests around installInto and the auth.json assertions) which can race with other
tests; update the tests to avoid global mutation by either converting them to
run serially (use test.serial for the two cases) or move them into a dedicated
describe block and implement beforeEach/afterEach that save the previous
CODEX_HOME and restore it after each test, then remove the inline try/finally
blocks; reference the existing test names and the installInto call to locate and
change the tests.
- Around line 328-341: Add a second unit test in
packages/adapters/codex/test/adapter.test.ts that mirrors the existing
"structured rate_limits at/over 100% used → rate-limited even without
reached_type" test but sets primary.used_percent to 105 (or any value >100) to
explicitly validate the >100 branch; use the same helpers (writeMiddleDir,
writeRolloutWithRateLimits) and call codexAdapter.classifyStop with the same
args, asserting result.kind === "rate-limited".

In `@packages/dispatcher/src/workflows/implementation.ts`:
- Around line 501-510: In readPlanBody(worktreePath: string, epicRef: string)
remove the redundant String(epicRef) coercion when building the path — use
epicRef directly in join(worktreePath, "planning", "issues", epicRef,
"plan.md"); keep the try/catch and return behavior unchanged and update any
nearby comments if they mention numeric coercion.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8c1e5797-db4d-4891-a0fb-ffbd43227a00

📥 Commits

Reviewing files that changed from the base of the PR and between 1060996 and 0aec625.

📒 Files selected for processing (14)

packages/adapters/claude/test/adapter.test.ts
packages/adapters/codex/scripts/verify-live-hooks.ts
packages/adapters/codex/src/classify.ts
packages/adapters/codex/src/hooks.ts
packages/adapters/codex/src/index.ts
packages/adapters/codex/src/transcript.ts
packages/adapters/codex/test/adapter.test.ts
packages/cli/src/bootstrap-assets/hooks/hook.sh
packages/core/src/adapter.ts
packages/core/src/hook-script.ts
packages/dispatcher/src/workflows/implementation.ts
packages/dispatcher/test/implementation-workflow.test.ts
planning/issues/177/decisions.md
planning/issues/177/plan.md

✅ Files skipped from review due to trivial changes (4)

planning/issues/177/plan.md
packages/cli/src/bootstrap-assets/hooks/hook.sh
packages/adapters/codex/src/transcript.ts
planning/issues/177/decisions.md

A clean rebase that drops all of a PR's commits (each empty against the new main) exits 0 with HEAD == origin/main; applySuccess then force-pushed that empty HEAD over the branch, silently emptying an approved PR (#182). - tryRebaseOntoMain: detect the dropped-all-commits case after a clean rebase, restore the worktree to its pre-rebase HEAD, and return droppedAllCommits so the orchestrator escalates instead of pushing. - applySuccess: keystone last-line guard refuses to push when the remote branch has commits ahead of main but the local HEAD has none. - reconcileOpenPRs: route droppedAllCommits straight to applyDemoteToWork (skip the merge fallback) with a specific escalation reason. - GitOps gains revListCount + restore-only resetHard. Regression tests cover the helper guard, the applySuccess refusal, and the end-to-end escalation (branch not reset to main, PR not closed). Closes #201

This was referenced May 29, 2026

agent-queue: dispatch state #84

Open

Make the CodexAdapter functionally dispatchable against live codex 0.133.0 #177

Closed

feat(dispatcher,core): launch-ordering accommodation for prompt-triggered-session adapters (codex live dispatch) #183

Closed

thejustinwalsh force-pushed the middle-issue-177 branch 2 times, most recently from c0d6d85 to 72e25e6 Compare June 3, 2026 08:45

thejustinwalsh commented Jun 3, 2026

View reviewed changes

thejustinwalsh marked this pull request as ready for review June 3, 2026 08:49

thejustinwalsh added the ready-for-review All phases done and verified — PR ready for final human review and merge label Jun 3, 2026

coderabbitai Bot requested changes Jun 3, 2026

View reviewed changes

Comment thread packages/adapters/codex/scripts/verify-live-hooks.ts

Comment thread packages/adapters/codex/scripts/verify-live-hooks.ts

Comment thread planning/issues/177/decisions.md

coderabbitai Bot approved these changes Jun 3, 2026

View reviewed changes

thejustinwalsh closed this Jun 3, 2026

thejustinwalsh force-pushed the middle-issue-177 branch from 1060996 to 7279b25 Compare June 3, 2026 11:44

thejustinwalsh added 13 commits June 3, 2026 07:56

docs(adapter-codex): plan + decisions for live-0.133.0 dispatchability (

deb34f6

#177)

docs(adapter-codex): log resume with no answer; re-parked on the crit…

e2fdc5b

…erion-5 scope decision

docs(adapter-codex): log 3rd no-answer resume; hold on criterion-5 sc…

e94ea27

…ope decision

docs(adapter-codex): track criterion-5 launch-ordering as #183; log R…

ca23264

…ESUME #4 (loop is poller bug #178)

docs(adapter-codex): log RESUME #5 (5th #178 misfire); held without c…

be30a5b

…hurn

docs(adapter-codex): log RESUME #6 (6th #178 misfire); held without c…

c33c6ba

…hurn

docs(adapter-codex): log RESUME #7 (7th #178 misfire); held without c…

c781117

…hurn

thejustinwalsh added 6 commits June 3, 2026 07:57

test(adapter-claude): assert claude leaves startsSessionOnFirstPrompt…

c23046f

… unset Locks the #183 regression at the adapter level: Claude fires SessionStart at boot, so it must not set the flag — the dispatcher keeps await-first order for it.

docs(issue-177): log Option A resolution + criterion-5 live evidence

f0eb79e

thejustinwalsh reopened this Jun 3, 2026

thejustinwalsh mentioned this pull request Jun 3, 2026

fix(dispatcher): open-PR divergence reconciler reset an approved branch to main and closed the PR (data-loss) #201

Closed

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

thejustinwalsh mentioned this pull request Jun 3, 2026

fix(dispatcher): non-destructive open-PR divergence reconciler (no force-reset to main) #202

Merged

5 tasks

thejustinwalsh merged commit 8fbd926 into main Jun 3, 2026
1 check passed

thejustinwalsh deleted the middle-issue-177 branch June 3, 2026 13:18

coderabbitai Bot mentioned this pull request Jun 4, 2026

feat(adapters): CopilotAdapter — the third agent adapter #207

Merged

7 tasks

Conversation

thejustinwalsh commented May 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Status

Resolution of the launch-ordering blocker (Option A, #183)

What changed (gaps 1–4)

Acceptance criteria

Verification

Why these changes

Stumbling points

Decisions

Branch

Follow-up issues

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

❌ Failed checks (1 warning)

Review ran into problems

Uh oh!

thejustinwalsh commented Jun 3, 2026

Uh oh!

thejustinwalsh Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

thejustinwalsh Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

thejustinwalsh commented Jun 3, 2026

Reviewer's brief — PR #182 (Epic #177, codex live-dispatch)

How to run it

What to verify (and what "correct" looks like)

How to review it

Fragile bits that want extra eyes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thejustinwalsh commented Jun 3, 2026

Uh oh!

thejustinwalsh commented Jun 3, 2026

Reviewer's brief — PR #182 restored & re-verified (Epic #177, codex live-dispatch)

How to review

Run it

Fragile / extra eyes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thejustinwalsh commented May 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 29, 2026 •

edited

Loading