Merged
49 changes: 32 additions & 17 deletions docs/research/codex-cli-first-class-2026-04-23.md
@@ -88,9 +88,17 @@ CLI in the 2026 coding-agent landscape.
## 2 · The big, non-obvious win — `AGENTS.md` is already universal

Claude Code reads `CLAUDE.md` first. Codex CLI reads `AGENTS.md`
following a precedence chain: the global layer in `~/.codex` is
read first (`AGENTS.override.md` if present, otherwise
`AGENTS.md`), then Codex walks from the project root down to the
CWD, with `AGENTS.override.md` taking precedence over `AGENTS.md`
in each directory (the combined instructions are also subject to
a byte cap; see [Codex AGENTS guide](https://developers.openai.com/codex/guides/agents-md)).
**Zeta's setup already has both `CLAUDE.md` and `AGENTS.md`, and
`CLAUDE.md` explicitly delegates to `AGENTS.md`** as the universal
onboarding handbook. Stage-2 readiness checks must account for
this precedence chain: an environment with a global
`~/.codex/AGENTS.override.md` can pass or fail ingestion checks
for reasons unrelated to the repo's `AGENTS.md` content. The
relevant lines of `CLAUDE.md`:
Comment on lines +100 to +101
Copilot AI Apr 25, 2026

P1: This section warns that global ~/.codex/AGENTS.override.md can change ingestion, but it doesn’t say how Stage-2 checks should control for / detect that (e.g., instruct running in a clean home/CODEX home, or explicitly verifying which AGENTS file was loaded). Adding a concrete test precondition would make the readiness guidance actionable and less error-prone.

Suggested change
reasons unrelated to the repo's `AGENTS.md` content. **Concrete
Stage-2 precondition:** run ingestion verification in a clean
Codex home (no `~/.codex/AGENTS.md` and no
`~/.codex/AGENTS.override.md`), or an equivalent isolated `HOME` /
Codex config directory, and record which AGENTS file(s) were in
effect during the run. If the harness cannot show that the repo's
`AGENTS.md` was the active source, the check is inconclusive rather
than a repo failure. The relevant lines of `CLAUDE.md`:


> 1. **[`AGENTS.md`](../../AGENTS.md)** — the universal
> onboarding handbook. Pre-v1 status, the three
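The precedence chain above can be sketched as a small resolver, together with the clean-global-layer precondition the review comment asks for. This is a minimal Python sketch under stated assumptions, not Codex's implementation: the Codex home is passed in explicitly (it defaults to `~/.codex` and is assumed to be relocatable via the `CODEX_HOME` environment variable, which Stage 2 should confirm), empty files are assumed to be skipped, the combined byte cap is not modelled, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Per-directory lookup order (assumed): an override file beats the plain AGENTS.md.
CANDIDATES = ("AGENTS.override.md", "AGENTS.md")


def pick_instruction_file(directory: Path) -> Path | None:
    """Return the winning instruction file in one directory, or None.

    Empty files are skipped (assumed behaviour; confirm in the Codex guide).
    """
    for name in CANDIDATES:
        candidate = directory / name
        if candidate.is_file() and candidate.stat().st_size > 0:
            return candidate
    return None


def resolve_agents_chain(codex_home: Path, project_root: Path, cwd: Path) -> list[Path]:
    """Return instruction files in load order: global layer, then root down to cwd.

    The combined byte cap is deliberately not modelled in this sketch.
    """
    chain: list[Path] = []
    global_file = pick_instruction_file(codex_home)
    if global_file is not None:
        chain.append(global_file)

    root = project_root.resolve()
    # Raises ValueError if cwd is not inside project_root.
    rel = cwd.resolve().relative_to(root)
    # Every directory from the project root down to (and including) cwd.
    directories = [root] + [root.joinpath(*rel.parts[: i + 1]) for i in range(len(rel.parts))]
    for directory in directories:
        picked = pick_instruction_file(directory)
        if picked is not None:
            chain.append(picked)
    return chain


def stage2_precondition(codex_home: Path) -> bool:
    """True when no global AGENTS file is active, so only repo files can load."""
    return pick_instruction_file(codex_home) is None
```

A Stage-2 harness could point `codex_home` at an isolated directory, assert `stage2_precondition(...)` before the run, and log the list returned by `resolve_agents_chain(...)`, so that a failing check can be attributed to the repo's files rather than to a stray global override.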
@@ -190,24 +198,29 @@ and Codex-specific.
| Plan Mode | `plan_mode_reasoning_effort` config | **Parity** | Named differently; same concept. |
| Output styles (e.g., explanatory) | Not documented; may go via system-prompt override | **Gap (minor)** | Factory-side impact is small; output styles are Claude-Code-session features, not substrate. |
| Hooks (`.claude/settings.json` PreToolUse, UserPromptSubmit) | `notify` hook + shell-only PreToolUse (per OpenAI release notes for `rust-v0.117.0`, March 26 2026, [openai/codex#15211](https://github.com/openai/codex/pull/15211)) | **Partial (narrowing)** | Codex now has shell-only PreToolUse alongside the existing `notify` hook for turn completion. UserPromptSubmit and other Claude-Code-specific hook types are still gaps. Zeta's ASCII-clean pre-commit + prompt-injection lints run via git-pre-commit (harness-neutral) so the gap-impact on Zeta substrate is small. SessionStart hooks (e.g., for output style) still have no Codex equivalent. |
| Slash commands (`/loop`, `/fast`, `/help`, `/status-line-setup`) | Built-in `/model`, `/compact`, etc. (per [`developers.openai.com/codex/cli/slash-commands`](https://developers.openai.com/codex/cli/slash-commands)) + `-m`/`--model` flags + `--profile` | **Parity (different roster)** | Codex CLI ships built-in slash commands including `/model` for model + reasoning-effort selection, `/compact` for context compaction, etc. Both harnesses expose slash commands; the rosters differ (Claude Code has Zeta-defined `/loop`, `/fast`; Codex has its own built-in roster). Project-specific commands (e.g., Zeta's `/loop`) need re-authoring or re-routing through `codex exec`. The capability surface is parity; the specific commands aren't 1-to-1. |
Copilot AI Apr 25, 2026

P1: The /model note claims “model + reasoning-effort selection”, which reads inconsistently with the earlier statement that Codex uses profiles (--profile) rather than a discrete effort-tier enumeration. Consider clarifying how /model interacts with profiles and/or plan_mode_reasoning_effort so the doc has a single, unambiguous story for “effort” selection.

Suggested change
| Slash commands (`/loop`, `/fast`, `/help`, `/status-line-setup`) | Built-in `/model`, `/compact`, etc. (per [`developers.openai.com/codex/cli/slash-commands`](https://developers.openai.com/codex/cli/slash-commands)) + `-m`/`--model` flags + `--profile` | **Parity (different roster)** | Codex CLI ships built-in slash commands including `/model` for model selection and `/compact` for context compaction. Any reasoning-effort change should be understood through the active profile/config surface (`--profile`, `plan_mode_reasoning_effort`), not as a separate standalone effort-tier picker implied by `/model`. Both harnesses expose slash commands; the rosters differ (Claude Code has Zeta-defined `/loop`, `/fast`; Codex has its own built-in roster). Project-specific commands (e.g., Zeta's `/loop`) need re-authoring or re-routing through `codex exec`. The capability surface is parity; the specific commands aren't 1-to-1. |

| `Task` with `isolation: "worktree"` | Built-in worktree support | **Parity** | Codex advertises worktrees as a first-class subagent feature. |
| Session compaction | Built-in `/compact` slash command (per [`developers.openai.com/codex/cli/slash-commands`](https://developers.openai.com/codex/cli/slash-commands)) | **Parity** | Codex CLI ships `/compact` specifically for summarizing conversation context to free tokens — same role as Claude Code's session compaction. Stage-2 should still test the trigger conditions and quality of the summary. |
| Code-review agent | Native "separate agent before commit" feature | **Parity (different shape)** | Codex integrates review into the CLI workflow directly; Zeta's equivalent is Codex-as-PR-reviewer on GitHub + the harsh-critic persona under `.claude/skills/code-review-zero-empathy/`. (Note: `/ultrareview` is a Claude Code platform feature surfaced in the harness's session prompt, not a Zeta-defined command; a repo-wide search finds no in-tree definition. Listed here for surface-mapping context only; it is not an in-repo entrypoint.) The two review paths compose. |
| Image input / image generation | Native | **Parity+** | Codex exposes image generation in-CLI; Claude Code accepts image input only. |
| Background macOS Computer Use | Native | **Codex-specific** | No Claude Code equivalent; relevant if Zeta ever wants agent-run GUI tests. Not urgent for Otto. |
| Cloud-backed runtime | Codex Cloud | **Codex-specific** | May subsume the cron-gap by running long-lived agents in cloud; Stage 2 needs to verify. |
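For the Hooks row, the `notify` surface is easiest to see as a small handler script. This is a minimal Python sketch, assuming Codex is configured with something like `notify = ["python3", "/path/to/notify.py"]` in its config file and passes a JSON payload as the last command-line argument. The `type` value `agent-turn-complete` and the `last-assistant-message` field follow OpenAI's published example as understood here; treat both as assumptions to re-check against the current Codex docs in Stage 2. The function name is hypothetical.

```python
from __future__ import annotations

import json
import sys


def handle_notification(raw: str) -> str | None:
    """Return a one-line summary for turn-completion events, else None.

    Assumed payload shape: {"type": "agent-turn-complete",
    "last-assistant-message": "..."}; verify field names in Stage 2.
    """
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None  # ignore malformed payloads instead of crashing the hook
    if not isinstance(event, dict) or event.get("type") != "agent-turn-complete":
        return None  # only turn completion is handled in this sketch
    message = event.get("last-assistant-message")
    if not isinstance(message, str) or not message.strip():
        first_line = "(no message)"
    else:
        first_line = message.strip().splitlines()[0]
    return f"Codex turn complete: {first_line}"


if __name__ == "__main__":
    # Codex is assumed to append the JSON payload as the final argv entry.
    summary = handle_notification(sys.argv[-1]) if len(sys.argv) > 1 else None
    if summary:
        print(summary)
```

The handler runs after a turn completes, so it cannot intercept a prompt the way a Claude Code UserPromptSubmit hook does; that is the remaining gap the Hooks row records.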

**Running gap score after first-pass + post-merge cascade
reclassifications:**

- Parity: 13 (TodoWrite Gap → Parity (different shape) per
OpenAI's Sept 15 2025 announcement; Slash commands Partial →
Parity (different roster) per Codex CLI built-in
`/model`/`/compact`/etc. roster; Session compaction Gap →
Parity per Codex CLI built-in `/compact` — both per
`developers.openai.com/codex/cli/slash-commands`)
- Partial: 4 (cron/autonomous-loop Likely-gap → Partial (different
surface) per `developers.openai.com/codex/app/automations`
thread-automation primitive; slash-commands removed from this
bucket)
- Gap: 1 (Output styles only; session compaction moved to Parity
and cron/autonomous-loop moved to Partial)
- Codex-specific: 2
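The cascade arithmetic can be checked mechanically. This is a minimal Python sketch that starts from the first-pass tally this PR replaced (Parity 11, Partial 5, Gap 2, Codex-specific 2) and applies the two moves listed above; the bucket labels come from the list, while the helper name is hypothetical.

```python
from __future__ import annotations

from collections import Counter

# First-pass tally, before the post-merge cascade.
FIRST_PASS = Counter({"Parity": 11, "Partial": 5, "Gap": 2, "Codex-specific": 2})

# (table row, from bucket, to bucket) for each cascade reclassification.
CASCADE = [
    ("Slash commands", "Partial", "Parity"),
    ("Session compaction", "Gap", "Parity"),
]


def apply_reclassifications(tally: Counter, moves: list[tuple[str, str, str]]) -> Counter:
    """Move one row per reclassification; refuse to drain an empty bucket."""
    result = Counter(tally)
    for row, src, dst in moves:
        if result[src] <= 0:
            raise ValueError(f"{row}: bucket {src!r} is already empty")
        result[src] -= 1
        result[dst] += 1
    return result


after = apply_reclassifications(FIRST_PASS, CASCADE)
assert sum(after.values()) == sum(FIRST_PASS.values())  # moves never add or drop rows
print(dict(after))  # → {'Parity': 13, 'Partial': 4, 'Gap': 1, 'Codex-specific': 2}
```

Because each move shifts exactly one row between buckets, the total row count is invariant, which gives a cheap sanity check for future reclassifications.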

(Score subject to Stage 2 verification — these are first-pass
@@ -286,8 +299,10 @@ sidesteps that problem for Phase 1 Codex research**.
**Nice-to-have (low friction, low impact):**

1. Output-style / explanatory-mode parity.
2. Slash-command roster parity (Zeta's project-specific commands
like `/loop` need re-authoring or routing through `codex exec`;
Codex CLI's built-in roster includes `/model`/`/compact` and
covers a different subset of session-management needs).
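Re-routing a project command through `codex exec` can be sketched as a thin wrapper. This is a hypothetical stand-in: Zeta's actual `/loop` semantics are not described in this section, so the sketch simply re-runs one prompt a fixed number of times and stops at the first non-zero exit. It assumes `codex exec` takes the prompt as a positional argument and accepts the `--model` and `--profile` flags this doc lists for the CLI; confirm with `codex exec --help` before relying on it. Both function names are illustrative.

```python
from __future__ import annotations

import subprocess


def build_exec_argv(prompt: str, model: str | None = None, profile: str | None = None) -> list[str]:
    """Build a non-interactive `codex exec` command line.

    Assumes exec accepts the --model / --profile flags named in this doc.
    """
    argv = ["codex", "exec"]
    if model:
        argv += ["--model", model]
    if profile:
        argv += ["--profile", profile]
    argv.append(prompt)  # the prompt goes last as the positional argument
    return argv


def run_loop(prompt: str, iterations: int, model: str | None = None, profile: str | None = None) -> list[int]:
    """Hypothetical /loop stand-in: re-run one prompt, stopping at the first failure."""
    exit_codes: list[int] = []
    for _ in range(iterations):
        completed = subprocess.run(build_exec_argv(prompt, model, profile), check=False)
        exit_codes.append(completed.returncode)
        if completed.returncode != 0:
            break  # do not keep looping after a failed turn
    return exit_codes
```

For example, `run_loop("Run the next backlog item", iterations=3, profile="zeta")` would invoke Codex up to three times; the prompt and the profile name `zeta` are placeholders.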

**Codex-specific we don't need today:**
