fix(adapter-codex): make CodexAdapter dispatchable against live codex 0.133.0#182
📝 Walkthrough
Adds an optional AgentAdapter.startsSessionOnFirstPrompt flag and branches dispatcher launch ordering to support prompt-first adapters; updates Codex adapter readiness probing, hook TOML generation, auth symlinking, and structured transcript rate-limit detection; expands tests, adds a live verification script, and records planning/decision docs.
Changes
Prompt-first adapter capability and dispatcher launch ordering
Codex adapter boot-mode readiness and hook configuration
Codex transcript and rate-limit handling
Tests and live verification
Planning and decisions
Sequence Diagram (high-level prompt-first launch flow):
sequenceDiagram
participant Dispatcher
participant CodexAdapter
participant HookServer
Dispatcher->>CodexAdapter: enterAutoMode()
Dispatcher->>CodexAdapter: sendPrompt(promptText)
CodexAdapter->>HookServer: emit SessionStart hook
HookServer->>Dispatcher: POST /hooks/SessionStart
Dispatcher->>CodexAdapter: await running-state update
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Warning: Review ran into problems. Git: Failed to clone repository.
Approved: A
c0d6d85 to 72e25e6
 * arrives before the gate is parked, so prompt-first is race-safe. See
 * `packages/dispatcher/src/workflows/implementation.ts` launch→drive.
 */
readonly startsSessionOnFirstPrompt?: boolean;
Decision (Option A, #183): additive optional flag — the AgentAdapter signature only grows. Absent/false keeps today's boot-triggered order, so every existing adapter and call site is unchanged. Maintainer-authorized ("Approved: A" + approved label on #177). Race-safety rests on the hook server stashing a session.started that lands before the gate parks (hook-server.ts #deliver/#await). Rationale: planning/issues/177/decisions.md → RESOLUTION (2026-06-03).
// order forks on whether the CLI fires SessionStart at boot or only once
// the first prompt is submitted (AgentAdapter.startsSessionOnFirstPrompt).
let startPayload: HookPayload;
if (adapter.startsSessionOnFirstPrompt) {
Why prompt-first here: codex creates no session — and fires no SessionStart — until the first prompt is submitted, so await-first deadlocks it (the prompt that unblocks the wait is only sent after). The flagged path dismisses dialogs, sends the prompt, then awaits; enterAutoMode is awaited (not fire-and-forget) so a needs-login throw fails the launch fast instead of feeding the prompt into a login screen. The unflagged (Claude) arm below is byte-for-byte the prior sequence — covered by the regression test in implementation-workflow.test.ts.
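The ordering fork described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `implementation.ts`: `MiniAdapter`, `launch`, and the `awaitSessionStart` callback are hypothetical names, and the sketch awaits `enterAutoMode` in both arms for simplicity. Only `startsSessionOnFirstPrompt`, `enterAutoMode`, and `sendPrompt` are taken from the PR.

```typescript
// Hypothetical, simplified adapter shape. Only the member names
// startsSessionOnFirstPrompt, enterAutoMode, and sendPrompt come from the PR.
interface MiniAdapter {
  readonly startsSessionOnFirstPrompt?: boolean;
  enterAutoMode(): Promise<void>;
  sendPrompt(text: string): Promise<void>;
}

// Runs the launch sequence and returns the order in which steps were started.
async function launch(
  adapter: MiniAdapter,
  awaitSessionStart: () => Promise<void>, // stand-in for the hook-server gate
  prompt: string,
): Promise<string[]> {
  const steps: string[] = [];
  steps.push("enterAutoMode");
  // Awaited so that a rejection (for example a login screen) fails the launch
  // before any prompt text is sent.
  await adapter.enterAutoMode();

  if (adapter.startsSessionOnFirstPrompt) {
    // Prompt-first: the session only starts once a prompt has been submitted.
    steps.push("sendPrompt");
    await adapter.sendPrompt(prompt);
    steps.push("awaitSessionStart");
    await awaitSessionStart();
  } else {
    // Boot-first: the session starts at boot, so wait for it, then prompt.
    steps.push("awaitSessionStart");
    await awaitSessionStart();
    steps.push("sendPrompt");
    await adapter.sendPrompt(prompt);
  }
  return steps;
}
```

With a gate that only opens once the prompt is sent, the flagged arm completes, whereas running the unflagged arm against the same gate would never resolve, which is the deadlock this comment describes.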
Reviewer's brief: PR #182 (Epic #177, codex live-dispatch)
The codex adapter is now functionally dispatchable against real
How to run it
bun install
bun run typecheck && bun run lint && bun test # 1194 pass, all green
# Live e2e (needs the codex 0.133.0 binary, a signed-in ~/.codex, tmux, network):
bun run packages/adapters/codex/scripts/verify-live-hooks.ts # exits 0 = PASS
What to verify (and what "correct" looks like)
How to review it
The five commits are split by concern: core flag → codex (flag +
Fragile bits that want extra eyes
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/adapters/codex/scripts/verify-live-hooks.ts`:
- Around line 46-157: The test creates runtime resources (tmux session and Bun
server) in main() but doesn't guarantee cleanup on error; wrap the critical
region after creating server and starting the tmux session in a try/finally so
we always run cleanup: move the logic from after server/tmux creation
(everything that drives enterAutoMode, prompt send, waits, and capturePane) into
the try block, and in the finally block ensure you call
sh(["tmux","kill-session","-t", SESSION]) only if the session was started and
await server.stop(true) only if server was created, handling any errors from
those teardown calls so cleanup always runs; reference functions/vars: main(),
server, SESSION, codexAdapter.enterAutoMode, sh(), capturePane().
- Around line 42-44: The helper sh() currently awaits Bun.spawn(...).exited but
ignores the subprocess exit result; update sh to capture the exited result from
Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }), check the exit code
(result.exitCode or exited.status depending on API) and throw an error or
otherwise fail fast when it is non-zero so tmux lifecycle failures
(new-session/send-keys/kill-session) are not silently ignored; reference the sh
function and the Bun.spawn invocation when locating where to add the exit-code
check and error handling.
In `@planning/issues/177/decisions.md`:
- Line 293: Several ATX headings in this document (for example the heading that
currently reads "`#177` labels are still just `phase:10`, `dogfood` (**no
`approved**`)" and the other headings noted) are missing a space after the
leading '#' which triggers markdownlint MD018; fix them by adding a single space
between the '#' characters and the heading text (e.g., change "`#Heading`" to "#
Heading") for each affected heading in decisions.md so they conform to MD018.
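The `try/finally` cleanup requested above for `verify-live-hooks.ts` can be sketched generically. This is an illustration only: `runWithTeardown` and `Teardown` are hypothetical names, and the sketch registers plain async callbacks where the real script would register `tmux kill-session` and `server.stop(true)`.

```typescript
// Hypothetical helper: runs `body`, then always runs the registered teardowns.
type Teardown = () => Promise<void>;

async function runWithTeardown<T>(
  body: (register: (teardown: Teardown) => void) => Promise<T>,
): Promise<T> {
  const teardowns: Teardown[] = [];
  try {
    // A resource registers its teardown only after it was actually created,
    // so cleanup never targets a session or server that does not exist.
    return await body((teardown) => {
      teardowns.push(teardown);
    });
  } finally {
    // Tear down in reverse creation order. Each teardown is best-effort:
    // a failing cleanup must not hide the original error or skip the rest.
    for (const teardown of teardowns.reverse()) {
      try {
        await teardown();
      } catch {
        // Swallowed on purpose; cleanup failures are non-fatal.
      }
    }
  }
}
```

The design choice is that the error thrown by the body still propagates to the caller, so the probe keeps failing loudly while leaving no tmux session or server behind.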
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 2c121e60-62da-4928-9de7-825d1085fbe2
📒 Files selected for processing (14)
- packages/adapters/claude/test/adapter.test.ts
- packages/adapters/codex/scripts/verify-live-hooks.ts
- packages/adapters/codex/src/classify.ts
- packages/adapters/codex/src/hooks.ts
- packages/adapters/codex/src/index.ts
- packages/adapters/codex/src/transcript.ts
- packages/adapters/codex/test/adapter.test.ts
- packages/cli/src/bootstrap-assets/hooks/hook.sh
- packages/core/src/adapter.ts
- packages/core/src/hook-script.ts
- packages/dispatcher/src/workflows/implementation.ts
- packages/dispatcher/test/implementation-workflow.test.ts
- planning/issues/177/decisions.md
- planning/issues/177/plan.md
🔁 Reconciled with main (rebased) after 7279b25
1060996 to 7279b25
…OME + auth
The Codex adapter's hook config was a start-generous guess that fired zero
hooks. Reverse-engineered the real codex 0.133.0 surface (binary schema + live
runs): its hooks are modelled on Claude Code's.
- hooks.ts: emit the real `[hooks]` block — PascalCase event names
(SessionStart/UserPromptSubmit/PreToolUse/PostToolUse/Stop/SubagentStop) as
Claude-style matcher groups (`type = "command"`), not the fictional
startup/turn-start/command/turn-end/shutdown taxonomy. Use `sandbox_mode`
(bare `sandbox` is rejected under --strict-config). Pre-trust the worktree
(`[projects."<wt>"] trust_level = "trusted"`) to skip the directory dialog.
Keep the PR-ready gate as a second PreToolUse group scoped to the Bash tool
(Codex's shell tool is named "Bash"). Symlink the operator's auth.json into
the worktree CODEX_HOME.
- index.ts: buildLaunchCommand sets CODEX_HOME=<worktree>/.codex so the config
is actually loaded; enterAutoMode answers the hooks-trust ("Trust all and
continue") and directory-trust boot dialogs (the bypass flag does not suppress
them interactively).
- classify.ts: detect rate limits from the structured `rate_limits` block in the
  rollout's `token_count` events (rate_limit_reached_type / used_percent >= 100),
  yielding a precise `resetAt` from the epoch `resets_at` instead of "unknown". A
  healthy structured block is authoritative; the textual regex is a fallback only
  when no structured block is present.
- transcript.ts: doc-only; the existing parse already matches the confirmed
  0.133.0 rollout schema (transcript_path, assistant-message turns, function_call
  tool names, info.total_token_usage context fill).
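The structured-first, regex-fallback rule from the classify.ts item can be sketched like this. It is a hedged illustration, not the adapter's code: the field names `rate_limits`-style `rate_limit_reached_type`, `used_percent`, `resets_at`, and `primary` appear in this PR, but the exact nesting, the assumption that `resets_at` is in epoch seconds, the fallback pattern, and the names `classifyRateLimit` and `RateVerdict` are all assumptions.

```typescript
// Assumed shape of the structured block; nesting is hypothetical.
interface RateLimitWindow {
  used_percent?: number;
  resets_at?: number; // assumed to be epoch seconds
}
interface RateLimits {
  rate_limit_reached_type?: string | null;
  primary?: RateLimitWindow;
}

type RateVerdict =
  | { kind: "ok" }
  | { kind: "rate-limited"; resetAt: Date | "unknown" };

// Hypothetical fallback pattern; the real regex is not shown in the PR.
const FALLBACK_PATTERN = /rate limit/i;

function classifyRateLimit(
  structured: RateLimits | undefined,
  transcriptText: string,
): RateVerdict {
  if (structured !== undefined) {
    // A structured block is authoritative, healthy or not.
    const used = structured.primary?.used_percent ?? 0;
    const reached = Boolean(structured.rate_limit_reached_type) || used >= 100;
    if (!reached) return { kind: "ok" };
    const epoch = structured.primary?.resets_at;
    return {
      kind: "rate-limited",
      resetAt: epoch === undefined ? "unknown" : new Date(epoch * 1000),
    };
  }
  // No structured block at all: fall back to matching the transcript text.
  return FALLBACK_PATTERN.test(transcriptText)
    ? { kind: "rate-limited", resetAt: "unknown" }
    : { kind: "ok" };
}
```

Note that a healthy structured block returns `ok` even when the text happens to mention a rate limit, which is the "authoritative" behaviour the commit message describes.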
Replace the assertions encoding the fictional hook taxonomy with the real one: PascalCase events as matcher groups, sandbox_mode, worktree pre-trust, CODEX_HOME in the launch env, the auth symlink, the Bash-scoped PR-ready gate group, the trust-dialog detectors, and structured-plus-fallback rate-limit classification — all using payload/rollout fixtures captured from live codex 0.133.0 runs.
…s clean
Codex parses each hook's stdout as the hook's structured JSON output, so the dispatcher's plain `ok` response body (echoed to stdout by curl) made codex log `hook returned invalid <event> JSON output` on every fire (observed live as "Stop hook (failed)"). `curl -o /dev/null` discards the body, leaving clean empty stdout that both codex and Claude treat as a no-op. The byte-identical bootstrap-assets mirror is updated in lockstep (drift test stays green).
A manual/CI-gated probe (not in `bun test` — needs the real codex binary, a signed-in CODEX_HOME, tmux, and network) that drives the REAL adapter (installHooks + buildLaunchCommand + enterAutoMode) against live codex and asserts the normalized heartbeat (session.started w/ transcript_path, turn.started, tool.pre, tool.post, agent.stopped) reaches a local dispatcher stand-in. Captures the evidence for gaps 1-4. Also records the decisions-log finding that codex's interactive SessionStart is prompt-triggered (the launch-ordering blocker for full live dispatch).
…erion-5 scope decision
Additive optional capability flag for CLIs that create no session — and fire no SessionStart/ready hook — until the first prompt is submitted (codex 0.133.0). Absent/false keeps the boot-triggered launch order; the dispatcher honors it to send the prompt before awaiting the ready hook. Race-safe because the hook server stashes a ready hook that arrives before the gate is parked.
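The race-safety argument (a ready hook that arrives before the gate is parked is stashed, not lost) can be illustrated with a one-shot gate. This is a sketch under assumptions: `OneShotGate`, `deliver`, and `wait` are hypothetical names and do not claim to match the hook server's internals.

```typescript
// Hypothetical one-shot gate: an event delivered before anyone waits is
// stashed and handed to the next waiter instead of being dropped.
class OneShotGate<T> {
  private stashed: T | undefined;
  private hasStash = false;
  private waiter: ((value: T) => void) | undefined;

  // Called when the hook arrives.
  deliver(value: T): void {
    if (this.waiter) {
      // Someone is already parked on the gate: release them directly.
      const resolve = this.waiter;
      this.waiter = undefined;
      resolve(value);
    } else {
      // Nobody is waiting yet: keep the event for a later wait().
      this.stashed = value;
      this.hasStash = true;
    }
  }

  // Called when the dispatcher parks on the gate.
  wait(): Promise<T> {
    if (this.hasStash) {
      const value = this.stashed as T;
      this.stashed = undefined;
      this.hasStash = false;
      return Promise.resolve(value);
    }
    return new Promise<T>((resolve) => {
      this.waiter = resolve;
    });
  }
}
```

Either interleaving resolves the wait, so sending the prompt before parking on the gate cannot miss a fast `session.started`.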
…oMode on composer-ready
codex creates no session until the first prompt, so it sets the flag and the dispatcher must send the prompt before awaiting session.started. enterAutoMode now resolves the instant the composer-ready welcome banner appears (detectReadyForInput, captured live off 0.133.0) instead of idling to the 90s boot deadline, which is load-bearing now that the dispatcher awaits it before sending the prompt. Tests cover the ready probe (real banner vs dialogs) and the flag.
…dapters
Branch launch->drive on AgentAdapter.startsSessionOnFirstPrompt: a flagged adapter (codex) dismisses the boot dialogs, sends the prompt, THEN awaits session.started; an unflagged adapter (claude) keeps await-first then send. The hook server stashes a session.started arriving before the gate parks, so prompt-first is race-safe. Fixes the codex live-dispatch deadlock (#177 criterion 5, #183). Tests assert both orderings, that the Claude path call sequence is unchanged, and that await-first deadlocks a prompt-triggered CLI.
… unset
Locks the #183 regression at the adapter level: Claude fires SessionStart at boot, so it must not set the flag; the dispatcher keeps await-first order for it.
…rify
The probe now mirrors implementation.ts: await enterAutoMode -> send prompt -> await session.started, and asserts session.started is prompt-triggered (absent before the prompt) and enterAutoMode returns promptly. The old await-first wait always fell through its 90s window then sent, masking the deadlock.
…s param
Epic #190 renamed the adapter installHooks/buildPromptText parameter epicNumber → epicRef (string-keyed Epic refs). The recovered live-verify script still passed epicNumber; align it with the current signature.
Reviewer's brief: PR #182 restored & re-verified (Epic #177, codex live-dispatch)
PR #182 was approved, ready-for-review, and MERGEABLE, then briefly closed when middle's open-PR divergence reconciler, firing as Epic #190 merged to
How to review
The substantive review is unchanged from the original brief on this PR (Option A: additive
Run it
Live (operator-only, needs codex 0.133.0):
Fragile / extra eyes
The human does the final review and merge; the workflow does not merge.
♻️ Duplicate comments (2)
packages/adapters/codex/scripts/verify-live-hooks.ts (2)
46-157: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Missing `try/finally` for deterministic cleanup.
If any step throws before lines 156-157, the tmux session and Bun server remain open. The past review flagged this and indicated it was addressed, but the current code lacks the `try/finally` wrapper.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/adapters/codex/scripts/verify-live-hooks.ts` around lines 46 - 157, Wrap the runtime sequence inside main() (everything after starting the Bun.serve and before the final tmux kill/server.stop) in a try/finally so cleanup always runs: ensure the finally block awaits sh(["tmux","kill-session","-t", SESSION]) (or ignores errors if no session) and awaits server.stop(true); reference the Bun.serve instance named server and the SESSION/session-related tmux usage and the sh() invocations (newSessionArgs/send-keys/killing) so the tmux session and Bun server are deterministically terminated even if codexAdapter.enterAutoMode, buildLaunchCommand, sh, or any later loop throws.
42-44: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
`sh()` still ignores subprocess exit codes.
The helper awaits `Bun.spawn(...).exited` but doesn't check the result. A failing tmux command (e.g., `new-session`) would be silently ignored, leading to misleading harness outcomes. The past review flagged this and indicated it was fixed, but the current code still shows the unfixed version.
Suggested fix (matching the prior resolution)
 async function sh(args: string[]): Promise<void> {
-  await Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }).exited;
+  const code = await Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }).exited;
+  if (code !== 0) {
+    throw new Error(`Command failed (exit ${code}): ${args.join(" ")}`);
+  }
 }
+
+/** Best-effort shell: swallows non-zero exits (for cleanup commands). */
+async function quiet(fn: () => Promise<void>): Promise<void> {
+  try { await fn(); } catch { /* swallow */ }
+}
Then use `quiet(() => sh(["tmux", "kill-session", ...]))` for the stale-session cleanup at line 93.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/adapters/codex/scripts/verify-live-hooks.ts` around lines 42 - 44, The sh helper currently awaits Bun.spawn(...).exited but ignores the resolved exit code, so failing subprocesses are silently ignored; change sh to capture it (const code = await Bun.spawn(args, { stdout: "ignore", stderr: "ignore" }).exited) and throw an error if code !== 0 (include args and the exit code in the error message) so callers see failures; also update the stale tmux cleanup call (the tmux kill-session invocation) to use quiet(() => sh(["tmux", "kill-session", ...])) so non-fatal cleanup failures are suppressed while all other sh callers will now surface errors.
🧹 Nitpick comments (3)
packages/dispatcher/src/workflows/implementation.ts (1)
501-510: 💤 Low value
Redundant `String()` coercion.
`epicRef` is already typed as `string` (line 501), so `String(epicRef)` at line 504 is a no-op. This may be leftover from when `epicRef` was numeric.
🧹 Suggested cleanup
 function readPlanBody(worktreePath: string, epicRef: string): string {
   try {
     return readFileSync(
-      join(worktreePath, "planning", "issues", String(epicRef), "plan.md"),
+      join(worktreePath, "planning", "issues", epicRef, "plan.md"),
       "utf8",
     );
   } catch {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/dispatcher/src/workflows/implementation.ts` around lines 501 - 510, In readPlanBody(worktreePath: string, epicRef: string) remove the redundant String(epicRef) coercion when building the path: use epicRef directly in join(worktreePath, "planning", "issues", epicRef, "plan.md"); keep the try/catch and return behavior unchanged and update any nearby comments if they mention numeric coercion.
packages/adapters/codex/test/adapter.test.ts (2)
645-678: 💤 Low value
Test mutates `process.env.CODEX_HOME` without isolation from parallel test runs.
These tests directly mutate `process.env.CODEX_HOME`. If Bun runs tests in parallel (default behavior), another test reading `process.env.CODEX_HOME` could see the wrong value. The `try/finally` restore is good for sequential safety but not parallel safety.
Consider using `test.serial` or wrapping these in a dedicated `describe` block with `beforeEach`/`afterEach` that saves and restores the env var, ensuring the test framework serializes them.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/adapters/codex/test/adapter.test.ts` around lines 645 - 678, These tests mutate process.env.CODEX_HOME (in the two tests around installInto and the auth.json assertions) which can race with other tests; update the tests to avoid global mutation by either converting them to run serially (use test.serial for the two cases) or move them into a dedicated describe block and implement beforeEach/afterEach that save the previous CODEX_HOME and restore it after each test, then remove the inline try/finally blocks; reference the existing test names and the installInto call to locate and change the tests.
328-341: 💤 Low value
Consider adding a test case for `used_percent > 100`.
The test covers exactly 100%, but the implementation logic (per PR objectives) triggers on `used_percent >= 100`. A test with `used_percent: 105` would verify the `>` case explicitly, though the current test is likely sufficient since both use the same code path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/adapters/codex/test/adapter.test.ts` around lines 328 - 341, Add a second unit test in packages/adapters/codex/test/adapter.test.ts that mirrors the existing "structured rate_limits at/over 100% used → rate-limited even without reached_type" test but sets primary.used_percent to 105 (or any value >100) to explicitly validate the >100 branch; use the same helpers (writeMiddleDir, writeRolloutWithRateLimits) and call codexAdapter.classifyStop with the same args, asserting result.kind === "rate-limited".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 8c1e5797-db4d-4891-a0fb-ffbd43227a00
📒 Files selected for processing (14)
- packages/adapters/claude/test/adapter.test.ts
- packages/adapters/codex/scripts/verify-live-hooks.ts
- packages/adapters/codex/src/classify.ts
- packages/adapters/codex/src/hooks.ts
- packages/adapters/codex/src/index.ts
- packages/adapters/codex/src/transcript.ts
- packages/adapters/codex/test/adapter.test.ts
- packages/cli/src/bootstrap-assets/hooks/hook.sh
- packages/core/src/adapter.ts
- packages/core/src/hook-script.ts
- packages/dispatcher/src/workflows/implementation.ts
- packages/dispatcher/test/implementation-workflow.test.ts
- planning/issues/177/decisions.md
- planning/issues/177/plan.md
✅ Files skipped from review due to trivial changes (4)
- planning/issues/177/plan.md
- packages/cli/src/bootstrap-assets/hooks/hook.sh
- packages/adapters/codex/src/transcript.ts
- planning/issues/177/decisions.md
A clean rebase that drops all of a PR's commits (each empty against the new main) exits 0 with HEAD == origin/main; applySuccess then force-pushed that empty HEAD over the branch, silently emptying an approved PR (#182).
- tryRebaseOntoMain: detect the dropped-all-commits case after a clean rebase, restore the worktree to its pre-rebase HEAD, and return droppedAllCommits so the orchestrator escalates instead of pushing.
- applySuccess: keystone last-line guard refuses to push when the remote branch has commits ahead of main but the local HEAD has none.
- reconcileOpenPRs: route droppedAllCommits straight to applyDemoteToWork (skip the merge fallback) with a specific escalation reason.
- GitOps gains revListCount + restore-only resetHard.
Regression tests cover the helper guard, the applySuccess refusal, and the end-to-end escalation (branch not reset to main, PR not closed).
Closes #201
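The last-line guard in applySuccess can be sketched as a pure decision function. This is an assumption-laden illustration: `AheadCounts`, `PushDecision`, and `guardForcePush` are hypothetical names, and the two counts are assumed to be what `git rev-list --count origin/main..<ref>` reports for the remote branch and for the local HEAD.

```typescript
// Hypothetical inputs: commits ahead of main, e.g. as reported by
// `git rev-list --count origin/main..<ref>` for each ref.
interface AheadCounts {
  remoteBranchAhead: number;
  localHeadAhead: number;
}

type PushDecision =
  | { push: true }
  | { push: false; reason: string };

function guardForcePush(counts: AheadCounts): PushDecision {
  // The remote still carries the PR's commits but the rebased HEAD has none:
  // pushing would silently empty the PR, so refuse and let the caller escalate.
  if (counts.remoteBranchAhead > 0 && counts.localHeadAhead === 0) {
    return {
      push: false,
      reason: "rebase dropped all commits; refusing to push an empty HEAD",
    };
  }
  return { push: true };
}
```

Keeping the check separate from the git calls makes the refusal easy to unit-test without a repository.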
Summary
Closes #177. Resolves #183.
Makes the Codex adapter genuinely dispatchable against the real `codex-cli 0.133.0` binary. Reverse-engineered the real surface (binary schema + many live tmux runs), fixed the four gaps the issue listed, then closed the fifth (full live dispatch) via the maintainer-approved Option A: an additive optional `AgentAdapter.startsSessionOnFirstPrompt` flag plus a dispatcher prompt-first launch branch. All five verified live end-to-end.

The keystone finding: codex 0.133.0's hooks are modelled on Claude Code's, with the same PascalCase event names, same payload fields, same `Bash` shell-tool name. The adapter's invented `startup/turn-start/command/turn-end/shutdown` taxonomy fired zero hooks; the real schema fires end-to-end.

Status
- `[hooks]`, PascalCase events, matcher groups, Bash-scoped PR-ready gate; `enterAutoMode` answers the trust dialogs
- `CODEX_HOME` + auth reachability: `buildLaunchCommand` sets `CODEX_HOME=<worktree>/.codex`; `installHooks` symlinks the operator's `auth.json`
- `sandbox_mode` (verified accepted under `--strict-config`; bare `sandbox` is rejected)
- `rate_limits` in the rollout `token_count` events
- `agent.stopped` end-to-end through the real adapter

Resolution of the launch-ordering blocker (Option A, #183)
Live dispatch revealed interactive `codex` creates no session, and fires no `SessionStart`, until the first prompt is submitted (Claude fires it at boot). The dispatcher's fixed `enterAutoMode → awaitSessionStart → sendText(prompt)` order therefore deadlocked codex: the prompt that would unblock the wait was only sent after it. This is the launch→drive ordering, not the codex adapter's internals, so it needed a shared-code change the issue's "no interface change" headline reserved for the maintainer. The maintainer approved Option A ("Approved: A" + `approved` label on #177).

What shipped:
- `packages/core/src/adapter.ts`: additive, optional, documented `AgentAdapter.startsSessionOnFirstPrompt?: boolean`. Absent/`false` ⇒ today's boot-triggered order. The interface only grows an optional field; every existing adapter and call site compiles unchanged.
- `packages/dispatcher/src/workflows/implementation.ts`: the launch→drive step branches on the flag: a flagged adapter dismisses the boot dialogs, sends the prompt, then awaits `session.started`; an unflagged adapter keeps await-first, then send. The hook server already stashes a `session.started` that lands before the gate is parked, so prompt-first is race-safe. The Claude path's call sequence is byte-for-byte unchanged (regression test).
- `packages/adapters/codex/src/index.ts`: codex sets `startsSessionOnFirstPrompt: true`, and `enterAutoMode` now returns the instant the composer is ready (`detectReadyForInput`, the `OpenAI Codex (vX.Y.Z)` welcome banner, captured live) instead of idling to the 90s boot deadline. That promptness is load-bearing now that the dispatcher awaits `enterAutoMode` before sending the prompt.

What changed (gaps 1–4)
- `packages/adapters/codex/src/hooks.ts`: real `[hooks]` schema (PascalCase events as matcher groups, `type="command"`), `sandbox_mode`, worktree pre-trust (`[projects."<wt>"] trust_level="trusted"`), the `Bash`-scoped PR-ready gate, and an `auth.json` symlink into the worktree `CODEX_HOME`.
- `packages/adapters/codex/src/index.ts`: `buildLaunchCommand` sets `CODEX_HOME`; `enterAutoMode` answers the hooks-trust and directory-trust dialogs.
- `packages/adapters/codex/src/classify.ts`: rate-limit detection reads the structured `rate_limits` block (precise `resetAt` from the epoch `resets_at`); textual regex retained as a fallback.
- `packages/adapters/codex/src/transcript.ts`: docs only; the existing parse already matches the confirmed rollout schema.
- `packages/core/src/hook-script.ts` + `packages/cli/src/bootstrap-assets/hooks/hook.sh`: `curl -o /dev/null` so the dispatcher's `ok` body doesn't pollute hook stdout (codex parses hook stdout as JSON). Strictly better for both adapters; drift test stays green.
- `packages/adapters/codex/test/adapter.test.ts`: rewritten against the real schema and live-captured payloads; adds the ready-probe and flag tests.
- `packages/adapters/codex/scripts/verify-live-hooks.ts`: live end-to-end heartbeat probe, now driving the real prompt-first order (manual/CI-gated).

Acceptance criteria
- `[hooks]` schema is read off the binary and the adapter emits the events the watchdog/stop-classifier consume; the heartbeat is observed end-to-end. Evidence: `packages/adapters/codex/src/hooks.ts`, `packages/adapters/codex/test/adapter.test.ts`, and the live e2e probe `packages/adapters/codex/scripts/verify-live-hooks.ts` (PASS, all five normalized events).
- `buildLaunchCommand` sets `CODEX_HOME` so the adapter config loads, with auth reachable via the `auth.json` symlink. Evidence: `packages/adapters/codex/test/adapter.test.ts`.
- `sandbox_mode` key, verified under `--strict-config` (bare `sandbox` rejected). Evidence: `packages/adapters/codex/test/adapter.test.ts`.
- `rate_limits`). Evidence: `packages/adapters/codex/test/adapter.test.ts`.
- `enterAutoMode` → prompt → `session.started` → `agent.stopped`) against real codex 0.133.0; closes the deferred CodexAdapter #60 / Verify the adapter abstraction holds across both adapters #63 criterion and resolves feat(dispatcher,core): launch-ordering accommodation for prompt-triggered-session adapters (codex live dispatch) #183. Evidence: the live e2e probe `packages/adapters/codex/scripts/verify-live-hooks.ts` (PASS) and the dispatcher ordering test `packages/dispatcher/test/implementation-workflow.test.ts`.

Verification
- `verify-live-hooks.ts` drives the real adapter in the dispatcher's prompt-first order against codex 0.133.0. PASS: `enterAutoMode` returned in 1317 ms (composer-ready, not the 90 s deadline); `session.started` did not fire before the prompt (proving prompt-triggered); full heartbeat `session.started → turn.started → tool.pre → tool.post → agent.stopped`; `session.started` carried a real `transcript_path` rollout; the agent ran the shell command and stopped.
- `packages/dispatcher/test/implementation-workflow.test.ts` asserts the prompt-first adapter sends the prompt before awaiting `session.started` (no deadlock, against a gate that only resolves once the prompt is sent), the boot-first adapter keeps the await-first sequence unchanged, and that await-first ordering deadlocks a prompt-triggered CLI (why the flag exists). `packages/adapters/claude/test/adapter.test.ts` locks that Claude leaves the flag unset.
- `CODEX_HOME`/auth, `sandbox_mode` under `--strict-config`, structured `rate_limits`: all live-confirmed and unit-tested.
- `bun test` (1261 pass), `bun run typecheck`, `bun run lint`, `bun run format` all green.

Why these changes
codex's prompt-triggered session is a genuine cross-CLI difference, not a codex-internal quirk, so the accommodation belongs at the `AgentAdapter` seam, expressed as a capability the dispatcher honors. Option A keeps it additive: a single optional flag, a localized dispatcher branch, zero change to the Claude path. `enterAutoMode` resolving on composer-ready (rather than the boot deadline) is what lets the now-awaited dialog dismissal feed straight into the prompt without stalling every launch.

Stumbling points
- `session.started` before sending the prompt: it always fell through its 90 s wait and then sent, so it "passed" without ever exercising the real dispatcher order. That masked the deadlock. The rewrite is a faithful mirror of `implementation.ts` and asserts the prompt-triggered invariant explicitly.
- `{kind:"done"}` in the dispatcher test stub triggers the review-changes park (`waiting-human`), not `completed`; the ordering tests use `bare-stop` to isolate the launch order from the post-drive flow.

Decisions
See `planning/issues/177/decisions.md` → "RESOLUTION (2026-06-03)" for the Option A rationale and the full candidate space; the earlier entries cover gaps 1–4.

Branch
Rebased onto current `main` (clean, MERGEABLE). It now includes both the merged #178 poller fix and Epic #190's epic-store / dispatcher refactor (the `epicNumber` → `epicRef` rename in the adapter `installHooks`/`buildPromptText` params is adapted accordingly). Option A's launch-ordering branch was re-applied onto #190's restructured `implementation.ts` (new-work-as-base); the launch sequence and the Claude path are behaviorally unchanged from the live-verified state.

Recovery note: this PR was briefly closed when middle's open-PR divergence reconciler, firing as #190 merged, reset this branch to `main` instead of rebasing the codex commits on top (the deep #190/#177 interleave in `implementation.ts`). No code was lost: the approved work is recovered intact and re-verified (typecheck/lint/format/1261 tests green). The reconciler defect is filed as a follow-up.

Follow-up issues
- `main` and closed the PR while feat(epic-store): file-backed Epic store (opt-in hybrid) #190 merged (the incident in the Recovery note above). Standalone: it's a dispatcher-reconciler defect, outside the codex-adapter surface.

Summary by CodeRabbit
New Features
Tests
Bug Fixes
Documentation