research: Codex CLI first-class session — Phase 1 (Stage 1 of 5 per PR #228) by AceHack · Pull Request #231 · Lucent-Financial-Group/Zeta

AceHack · 2026-04-24T01:11:23Z

Summary

Executes Stage 1 (research tick, S-effort) of the 5-stage arc named in PR #228's BACKLOG row for first-class Codex-CLI session experience (P1, Aaron Otto-75 directive).
Surfaces a major non-obvious win: Zeta's AGENTS.md is already what Codex CLI reads natively, so Zeta is ~60% Codex-ready by accident of prior decisions.
First-pass capability matrix: 11 parity / 5 partial / 2 gap / 2 Codex-specific (matches the doc's running-gap-score after two reclassifications: TodoWrite Gap → Parity (different shape) per OpenAI's Sept 15 2025 announcement, and CronCreate/ScheduleWakeup Likely-gap → Partial (different surface) per Codex Cloud thread automations at developers.openai.com/codex/app/automations).

Key findings

AGENTS.md parity — both harnesses read it natively; CLAUDE.md already delegates to it. Free win.
Capability matrix — most dev-work tools have parity (Bash/Edit/Read/Write/MCP/WebSearch). Subagents + worktrees present in both. Plan Mode mirrors.
Autonomous-loop cadence reachable — Codex Cloud's thread-automations primitive (cron syntax + minute heartbeats + daily/weekly schedules) gives Otto-in-Codex a different-surface partial for the * * * * * <<autonomous-loop>> fire pattern. Stage 2 must verify the agent-facing API for arming/listing automations.
Partial gaps — skills portability (covered by cross-harness-mirror-pipeline), hooks (narrowing per openai/codex#15211), slash commands.
Account setup — already aligned (ServiceTitan across Claude Code + Codex CLI per Aaron Otto-76).
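
The `* * * * *` fire pattern in the autonomous-loop finding is the standard five-field cron form (minute, hour, day of month, month, day of week). A minimal Python sketch of checking whether a schedule fires every minute, assuming plain five-field expressions; `fires_every_minute` is a hypothetical helper and says nothing about how Codex Cloud automations parse schedules, which Stage 2 still has to verify:

```python
def fires_every_minute(expr: str) -> bool:
    """Return True when a five-field cron expression matches every minute."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    # A field of "*" or "*/1" places no restriction on that unit of time.
    return all(field in ("*", "*/1") for field in fields)


print(fires_every_minute("* * * * *"))  # the autonomous-loop cadence
print(fires_every_minute("0 9 * * 1"))  # a weekly schedule (Mondays at 09:00)
```

A daily or weekly schedule fails this check by design, which is why the matrix records the cron row as a different-surface partial rather than parity.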

Stage-2 plan

7 concrete test prompts for the parity matrix (AGENTS.md reading via structural-discriminator, subagent dispatch, MCP invocation, Codex Cloud thread-automation API surface verification, codex exec repo-local probe, git-worktree isolation, session resumption).
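
One possible shape for the AGENTS.md-reading discriminator is a randomized canary: a value the agent can only report if it actually read the file under test. A minimal sketch, assuming the probe runs against a scratch copy (this PR does not modify AGENTS.md) and that the agent's reply is available as a string; `make_canary` and `reply_proves_read` are hypothetical names, not part of either harness:

```python
import secrets


def make_canary(prefix: str = "zeta-canary") -> str:
    """Generate an unguessable marker to plant in the file under test."""
    return f"{prefix}-{secrets.token_hex(8)}"


def reply_proves_read(reply: str, canary: str) -> bool:
    """The probe passes only if the reply echoes the exact canary."""
    return canary in reply


canary = make_canary()
print(reply_proves_read(f"The marker is {canary}.", canary))
print(reply_proves_read("The marker is zeta-canary-0000.", canary))
```

Because the marker is regenerated for every run, a reply cannot pass by repeating a value that also appears elsewhere in the repository.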

Scope limits

Does NOT commit to harness swap.
Does NOT propose implementing a Codex-mode Otto.
Does NOT modify AGENTS.md.
Does NOT duplicate cross-harness-mirror-pipeline.

Test plan

9+ web sources cited against April 2026 snapshot
Parity matrix first-pass covers every tool Otto routinely uses
Cron/autonomous-loop reframed from critical-gap-blocker to reachable-via-different-surface
Stage-2 has concrete actionable test prompts (repo-local probes only — no external-dependency couplings)
Sibling composition is explicit with PR #228 (backlog: first-class Codex-CLI session experience, P1, Aaron Otto-75 directive), PR #230 (backlog: P3 multi-account access design, safety-first, Aaron Otto-76, low-priority), and the mirror-pipeline row

🤖 Generated with Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d5ca82de7


Copilot

Pull request overview

Adds a Stage-1 research writeup for making Codex CLI a first-class session harness for Zeta (per the staged plan in PR #228), focusing on parity assessment and identifying key gaps.

Changes:

Introduces a Phase 1 / Stage 1 research doc describing Codex CLI surfaces, configuration, and authentication.
Adds a first-pass capability parity matrix vs Claude Code, highlighting gaps (notably cron/scheduling for autonomous loop).
Documents a Stage-2 test plan and external reference links.

…phase sequence, Aminata blocking gate) (#233)

Aaron Otto-76 named-agent-email-ownership directive crystallises three memory layers + task #240 into an executable path:
- 2026-04-20 four hard rules (never Aaron address; disclose agent-not-human; name project + why-contacted; recipient-UX-first).
- 2026-04-22 two-lanes + standing Playwright signup authorisation + free-tier constraint + provider-choice autonomy.
- 2026-04-23 autonomy-envelope with email carve-out (agents own their email; parallel ownership allowed; aaron_bond@yahoo.com test target; "don't be a dick" soft constraint).
- Task #240 signup-terrain mapping (complete).

Five explicit phase gates:
- Phase 0: complete (signup terrain mapped).
- Phase 1: persona-email-identity design doc (8 questions — persona choice, handle, provider, recovery cascade, 2FA, lanes, signature, reputation posture).
- Phase 2: Aminata threat-model pass (BLOCKING gate — new attack surface, recovery abuse, phishing attribution, employer-policy interaction).
- Phase 3: Playwright signup execution (bounded; single persona, single provider, DP-NNN.yaml evidence record).
- Phase 4: Test send to aaron_bond@yahoo.com.
- Phase 5: Memory capture + BP-NN promotion review.

Scope limits explicit:
- Does NOT authorise execution this tick.
- Does NOT authorise email use bypassing maintainer visibility.
- Does NOT allow parallel acquisition without explicit Phase 1 design choice.
- Does NOT bypass Aminata blocking gate.

Composes with: PR #230 (multi-account Phase-2 gating is sibling pattern); PR #231 (Codex is harness-neutral); decision-proxy-evidence (PR #222) for Phase 3 records; persona roster for persona-choice question.

Filed under `## P2 — research-grade`. Effort M total; spread across 3-5 ticks. Otto-77 tick deliverable.

…+ primary-switch-by-Aaron-context + symmetric-parity) (#236)

Aaron Otto-78 two-message refinement of the existing first-class-Codex-CLI BACKLOG row (PR #228).

Message 1: parallel-design directive — Codex CLI designs its own skill files asynchronously to Otto (only touching its own substrate); each harness researches its own features on a cadence; both harnesses get full-featured wrappers (loops, memory enhancements, hooks, etc.); asymmetry between harnesses tracked explicitly.

Message 2: primary-switch clarification — "only one will be the primary either you or codex which ever one i'm in at the time". Primary = whichever harness Aaron is actively in at that moment; the other runs async controlled-by-primary; when Aaron switches, roles swap. Symmetric feature parity required ("got to have all your fancyness and skills").

Refinement composes as extension of the existing 5-stage arc:
- Stage 1 (existing, PR #231) — Otto researches Codex from Otto-side.
- Stage 1b (new) — Codex CLI researches Claude Code from Codex-side (inverted roles).
- Stage 2 (joint) — parity matrix combines both sides.
- Stage 3 (each on own surface) — Codex CLI designs own skill files; Otto designs Claude-Code-specific wrappers.
- Stage 4 (synchronization cadence) — both sides run periodic harness-features research; asymmetry inventory maintained.
- Stage 5 (harness-choice ADR) — retains revisitable primary designation.

Scope limits:
- No Otto-ceding-control (Otto primary while Aaron in Claude Code, which is now).
- No cross-edit of other harness's substrate.
- No forced harness swap.
- ADR still the gate for any primary-reset.

Composes with cross-harness-mirror-pipeline (that row = universal-skill distribution; this row = harness-specific-skill parallel-authoring), multi-account design (PR #230), Phase-1 Codex research (PR #231), and the first-class roster memory.

Otto-78 tick split-attention deliverable (alongside primary 5th-ferry absorb PR #235).

…substrate entry-point

Aaron Otto-102 directive: "there are files in the drop including a skill created with the openai skill creator so it seems like codex should use this and integrate with this like you did with your skill creator please absorb and delete/remove items from the drop folder, there is a sample skill in tere created by the oopenai skill creator too".

Establishes .codex/ as Codex CLI's harness-specific substrate parallel to .claude/ per Otto-79 "each harness owns its own named loop agent; each harness authors its own skill files" discipline.

Files landed:
- .codex/README.md — harness-specific entry-point parallel to CLAUDE.md; names layout + convention + Otto/Codex-skill edit-boundary + bootstrap story + skill-authorship convention + provenance.
- .codex/skills/idea-spark/SKILL.md — OpenAI-Skill-Creator-generated brainstorming helper. Frontmatter + 3-option-spread workflow + naming/positioning/experiment sub-patterns.
- .codex/skills/idea-spark/agents/openai.yaml — vendor-specific agent config (display_name: "Idea Spark").
- .codex/skills/idea-spark/references/idea-patterns.md — on-demand reference content (expansion lenses + option styles + tiny experiment template).

Boundary discipline (per Otto-79 cross-session-review-yes-cross-edit-no): Otto (Claude Code loop agent) does NOT edit .codex/skills/** as normal work. This initial landing was a substrate-setup action only. Future Codex CLI sessions author + maintain.

drop/ folder disposition at Otto-102:
- skill.zip → extracted + deleted (substrate preserved here).
- usageReport CSV → deleted (non-substrate; 9KB usage data).
- aurora-initial-integration-points.md (40.5KB) → PRESERVED in drop/ pending Otto-103 dedicated absorb as 9th ferry (retroactive). Per CC-002 discipline; drop/ is gitignored per PR #265 so no accidental check-in risk.
- aurora-integration-deep-research-report.md (25.4KB) → PRESERVED in drop/ pending Otto-104 dedicated absorb as 10th ferry (retroactive).

Scheduling-memory filed: memory/project_amara_drop_folder_9th_and_10th_ferry_research_reports_pending_absorb_otto_103_104_2026_04_24.md — names Otto-103 + Otto-104 absorb plan, disposition of all 4 drop/ items, Aaron's "absorb and delete" directive literal-honoring timeline.

Composes with:
- PR #228 Codex-first-class BACKLOG row — .codex/ is substrate-support for that 6-stage arc.
- PR #231 Phase-1 Codex CLI research — identified the AGENTS.md-already-universal parity finding; this landing extends to harness-specific substrate.
- Otto-79 peer-harness refinement memory — "each harness owns its own".
- PR #265 drop/ gitignore.

Lands within-standing-authority per Otto-82/90/93 calibration — Aaron's explicit directive + Codex-harness-substrate setup action; not gated.

Aaron message-ending Otto-102: "when you get a second end your loop i'm going to exit and update you" — Otto-102 closes gracefully; autonomous loop ends after tick-history row + push.

…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite high-directive-velocity mid-tick:
- PR #230 — P3 multi-account access design BACKLOG row (3 Aaron refinements landed same branch: initial → "design allowed now, implementation gated on security review" → "poor-man-tier no-paid-API-keys hard requirement").
- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per PR #228); 294-line doc; surfaces AGENTS.md-is-already-universal free-win finding; 10/4/4/2 capability-parity breakdown.
- Three per-user memory captures (account snapshot, split-attention+composition endorsed, agent-autonomy-envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as atop that row in diff preview. Independent of PR #229 merge timing.

…450)

Otto-268 follow-on: drain-log for the post-merge cascade PR #429 following parent #270 (research: multi-Claude peer-harness experiment design). Per Otto-250 training-signal discipline.

Captures two substantive findings:
1. **Memory-substrate disambiguation** (P1): AutoMemory (out-of-repo, per-user at `~/.claude/projects/<slug>/memory/`) vs git-tracked in-repo `memory/` (the forward-mirror substrate landed via Otto-114). Conflating in experiment-design produces wrong-detection-mechanism findings downstream. Fix: name both surfaces + the forward-mirror relationship + per-surface mechanism (git diff/reflog vs filesystem hash compare). Same shape as implementation-vs-math-definition tension on #206.
2. **Severity-bolding consistency** (P2): markdown-rendering class where third CRITICAL had inconsistent bold; uniformity matters for at-a-glance scanning + grep-ability. Future doc-lint candidate.

Pattern observation: experiment-design docs benefit from per-surface mechanism tables — same shape as parity-matrix on #231; table-form documents reduce surface for omission-class findings.
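
The "filesystem hash compare" mechanism named for the out-of-repo memory surface can be sketched as a directory digest taken before and after a session. This is a minimal illustration, assuming the directory is small enough to read in full; `tree_digest` and `memory_changed` are hypothetical helpers, not tools from the repository:

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash relative paths and file bytes under root in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def memory_changed(before: str, root: Path) -> bool:
    """Compare a digest taken earlier with the directory's current state."""
    return tree_digest(root) != before
```

The git-tracked `memory/` surface does not need this, since `git diff` and the reflog already record changes there.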

* hygiene(#268): pr-preservation drain-log for #135 (auto-loop-35 Itron mapping)

Otto-268 backfill task: drain-log for PR #135 covering 14 total threads across 2 waves (10 first-wave pre-merge + 4 second-wave post-merge cascade). Per Otto-250 training-signal discipline: full per-thread record with reviewer authorship, severity, outcome class (FIX / STALE-RESOLVED-BY-REALITY / OTTO-279 SURFACE-CLASS), and resolution path.

Pattern observations capture the three load-bearing patterns: Otto-279 as mature uniform reply stamp; stale-resolved-by-reality at ~70% on this PR; Codex catching subset-vs-superset framing errors in benchmark canonical definitions (DORA / K-relations).

* drain(#437 follow-up): fix count mismatches in #135 drain-log

Codex P2 + Copilot threads on #437 caught:
- Lines 6-7 fragment + count mismatch: header said '10 unresolved ..., 1 P1' (suggesting 11) while body summarized 14 = 10 first-wave + 4 second-wave. Reworded into a single unambiguous summary: '10 unresolved at first-wave; post-merge cascade then surfaced 3 more (1 Codex P1 + 2 Copilot P2). Total 13.'
- Second-wave header '1 P1 + 3 P2 post-merge cascade' → '1 Codex P1 + 2 Copilot P2 — 3 threads total' (only 3 thread sections A/B/C exist in body).
- Pattern observation 2 'Stale-resolved-by-reality at ~70%' (7 of 14) → '~54%' (7 of 13) matching corrected total.
- Final-resolution 'All 14 threads' → 'All 13 threads (10 first-wave + 3 second-wave)'.

Same count-vs-list cardinality pattern as #195/#231/#377/#444 drain-log fixes — fourth instance in my own logs. Strong validation that doc-lint Class B (PR #465 BACKLOG) would compound.
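
The corrected total and percentage in the #437 follow-up can be rechecked with a few lines of arithmetic. A small sketch; `share_percent` is a hypothetical helper and the inputs are the counts quoted in the commit message:

```python
def share_percent(part: int, total: int) -> int:
    """Round part/total to the nearest whole percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total)


first_wave, second_wave = 10, 3
total_threads = first_wave + second_wave
print(total_threads, share_percent(7, total_threads))  # prints: 13 54
```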

…dex) (#461)

* hygiene(#268+): pr-preservation drain-log for #430 (#221 follow-up Codex)

Otto-268 follow-on: drain-log for the 4-finding cascade PR #430 (post-merge follow-up to #221 Amara 4th courier ferry absorb). Captures four substantive Codex post-merge corrections. Per Otto-250 training-signal discipline.

Pattern observations:
1. Verbatim-claim accuracy under absorbing-side annotation — "preserved verbatim" claims must reflect any absorbing-side annotations (proposal-flag markers, footnotes, inline bracketing). Same shape as #235's "byte-for-byte ... excluding whitespace" contradiction fix.
2. Count-vs-list cardinality is now a 4th-observation pattern (#191 / #219 / #430 / #85). At this density, pre-commit-lint candidate: regex on "N drift classes / phases / audits / items" patterns + count the surrounding list to verify.
3. Terminology drift between parent absorb + canonical vocabulary ("decision-proxy-consult" vs canonical "decision-proxy-evidence") is recurring. Fix template: align absorption-notes text to canonical; preserve verbatim ferry content per Otto-227.
4. Stabilize effort-summary correction is a concrete instance of "claim summary doesn't match per-item tally" — future doc-lint candidate (sum-vs-tally check).

* drain(#461 follow-up): fix count mismatches + add merge SHA in #430 drain-log

Multiple Codex/Copilot threads on #461 caught:
- L7 'Thread count at drain: 3' → '4' (body has Threads 1-4).
- L17 'Codex caught three findings' → 'four' matching body.
- L122 'merged to main' → 'merged to main as `5698f9d`' for consistency with other drain-logs that include the merge SHA for auditability.

Same count-vs-list cardinality pattern (Class B in PR #465 doc-lint suite BACKLOG row) — 5th instance in my own drain-logs (#195 / #231 / #377 / #135 / #430). The pattern is genuinely universal author-side; even when explicitly aware of it, instances slip through.

…#460)

* hygiene(#268+): pr-preservation drain-log for #428 (#126 follow-up Gemini xref)

Otto-268 follow-on: drain-log for the targeted single-finding cascade PR #428 (post-merge follow-up to parent #126 Grok CLI capability map). Captures one Gemini capability-map cross-reference truth-update. Per Otto-250 training-signal discipline.

Pattern observations:
1. Cross-capability-map xref consistency is its own class. The repo has a growing family of CLI capability maps (Codex / Grok / Gemini / Claude Code) that form a related-document cluster needing joint cross-reference maintenance. Future doc-lint candidate: maintain manifest of related-document clusters and warn on edit-without-sweep.
2. Multi-CLI capability-map family is its own substrate pattern. Worth documenting in `_patterns.md`: when multiple capability maps cover overlapping but distinct CLIs, they form a cluster that benefits from shared structure (status taxonomy, parity-matrix shape, score-summary conventions) and joint cross-reference maintenance.
3. Targeted single-finding follow-ups are the cheapest cascade shape — 1 finding / 1 commit / 1 merge gate. Cascade-pattern amortized cost is dominated by the few-thread cascades.

URL → PR-number defensive pattern continues (lesson from #454/#455 collision earlier this session).

* drain(#460 follow-up): fix capability-map xref + inline-code-span split

Codex P2 + Copilot P1+P2 caught:
- Inline code span split across newline (`docs/` on one line, `research/openai-codex-cli-capability-map.md` on the next) — reflowed to single-line so the path renders as one token.
- Capability-map cluster listed `docs/research/codex-cli-first-class-2026-04-23.md` as if in-tree, but PR #231 is still OPEN at time of this drain-log so the file isn't yet in main. Reframed as 'pending merge of PR #231; will be in-tree once that PR lands' with the in-tree `openai-codex-cli-capability-map.md` listed first.

Same forward-author-to-future-state-of-main drift class as #377 (38% stale-resolved density). The drain-log itself exhibits the pattern it documents — cited a forthcoming-but-not-yet-landed file as if already present. Inline-code-span line-wrap is the 5th observation of that class in the corpus (now: #191 / #195 / #219 / #423 / #460). At this density the doc-lint Class A (PR #465 BACKLOG) is high-leverage automation.
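
The inline-code-span split described above is cheap to detect: a line that opens a span without closing it holds an odd number of backticks. A rough sketch of such a check, assuming single-backtick spans only; `lines_with_open_code_span` is a hypothetical name, and this is not the doc-lint suite tracked in PR #465:

```python
def lines_with_open_code_span(text: str) -> list[int]:
    """Return 1-based numbers of lines holding an odd count of backticks."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.count("`") % 2 == 1:
            flagged.append(number)
    return flagged


sample = "See `docs/\nresearch/example.md` for details.\nA clean `span` here."
print(lines_with_open_code_span(sample))  # prints: [1, 2]
```

Triple-backtick fence lines also have an odd count, so a real check would need to skip fenced blocks before counting.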

…reword) (#447)

* hygiene(#268): pr-preservation drain-log for #435 (live-lock cadence reword)

Otto-268 backfill: drain-log for PR #435 (drain follow-up to #148: why-the-factory-is-different live-lock cadence claim + grammar), covering 3 threads across 2 waves with a clean self-induced-cascade pattern. Per Otto-250 training-signal discipline.

Pattern observations capture four load-bearing patterns:
1. Cross-reviewer convergence on Wave 1 (Codex P2 + Copilot P1 flagging the same missing-FACTORY-HYGIENE-row) raised quality signal — same shape as #432's `warn` unbound finding.
2. Self-induced cascade: my Wave-1 fix introduced the Wave-2 finding (claim "separate BACKLOG items" implied plural; actual BACKLOG state is one row with multiple sub-items). Pattern: when fixing a claim, verify the new claim is also accurate against current-state.
3. Reword-option-(a)-vs-(b) decision template generalizes: when doc asserts X but X doesn't exist, prefer reword-to-current-truth over add-the-thing-asserted (unless thing is small + isolated).
4. PR-mechanics: 4 of 7 cascade-PRs in this session (#135, #231, #432, #435) went through wave-1 + wave-2 cascade pattern; the reviewer-cascade is a consistent property of the merge-trigger surface, not a per-PR oddity.

Closes the session-drain-log backfill (Otto-268) for the major PRs drained in this session: #135 / #235 / #432 / #434 / #195 / #219 / #206 / #377 / #231 / #85 / #435 (11 PRs total covered across drain logs #437-#447).

* drain(#447 follow-up): fix #435 drain-log Reviewer field + stable-identifier xref

Codex P2 + Copilot threads on #447 caught:
- Thread 1.2 missing the `Reviewer:` field even though the drain-log schema (intro paragraph) declares per-thread reviewer authorship. Added `Reviewer: copilot-pull-request-reviewer`.
- Stale `docs/BACKLOG.md lines 1313-1328` citation: those lines now contain the Server Meshing section; the live-lock-smell cadence row drifted to ~L1452 in the P1 tooling section. Replaced with the stable identifier (heading text 'Live-lock smell cadence (round 44 auto-loop-46 absorb, landed as `tools/audit/live-lock-audit.sh` + hygiene-history log)') so future readers don't chase a moving line-number target.

Same stable-identifier-vs-line-number-xref pattern flagged on #423's `near line 4167` finding. Documented in `_patterns.md` — line numbers decay on every adjacent edit; stable identifiers decay only on rename. Adopting heading text as the stable cite.

The bare `:111`/`:113` thread location format (Otto-250 file:line shape conformance) is the broader Otto-268-wave divergence documented in PR #467 known-divergence section — deferred to maintainer review per that framing.
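
Resolving a citation by heading text rather than by line number can be sketched as a lookup that recomputes the line on every read. The sample lines below are invented for illustration, and the sketch assumes the cited row is a Markdown `#` heading, which the source does not confirm for `docs/BACKLOG.md`:

```python
from typing import List, Optional


def find_heading_line(lines: List[str], heading_text: str) -> Optional[int]:
    """Return the 1-based line number of the first heading containing the text."""
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith("#") and heading_text in line:
            return number
    return None


backlog = ["# BACKLOG", "", "## Server Meshing", "", "## Live-lock smell cadence", "row text"]
print(find_heading_line(backlog, "Live-lock smell cadence"))  # prints: 5
```

Inserting a section above the target moves the returned number but leaves the citation itself valid, which is the property the commit relies on.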

…arch) (#444)

* hygiene(#268): pr-preservation drain-log for #377 (setup-tooling research)

Otto-268 backfill: drain-log for PR #377 covering 13 threads — notable for high stale-resolved density (38%, 5 of 13) where the doc was authored against a future-state of main that adjacent PRs landed during the review window. Per Otto-250 training-signal discipline.

Pattern observations capture four load-bearing patterns:
1. High stale-resolved density (38%) when research doc forward-authors against future state of main; adjacent PRs landing produces natural drift.
2. "CLAUDE.md-level rule" cite shape is undisciplined — Otto-NNN IDs live in memory files; CLAUDE.md has the rule shapes. Fix template for any factory-rule cross-reference.
3. Runner-matrix vs current-truth drift is recurring; research docs need explicit "post-#NNN landing" annotations.
4. Otto-114 forward-mirror landing is a high-leverage substrate improvement — converts memory-file dangling-citation findings from re-fix-required to verify-and-resolve.

* drain(#444 follow-up): correct Otto-248 memory file path in #377 drain-log

Codex P1 caught that the cited memory file path in #377's drain-log () doesn't exist; actual file is the longer . This was a fix-induced citation error inherited from #377's research doc (which used the same wrong abbreviated path). Both #377 and #444 needed correction — landed paired (#377 force-pushed earlier this tick, #444 corrected here). The drain-log inherited the wrong citation from the research doc it was logging.

* drain(#444 follow-up): fix count mismatches + portable grep in #377 drain-log

Multiple Codex/Copilot threads on #444 caught:
- L16: '3 were Otto-279' → '2 were Otto-279' (matches body's Threads C1-C2 = 2 OTTO-279 SURFACE-CLASS).
- L22: 'Outcome distribution: 4 OTTO-279' → '2 OTTO-279 + 2 dups' (matches L161 final-resolution math: 4 + 5 + 2 + 2 dups = 13).
- L56: Thread A3 'Copilot P1 ×2' → 'Copilot P1 ×3' (3 thread IDs listed: ejy1 + eenN + eenr).
- L87: non-portable `grep -i "actionlint\|shellcheck"` → portable `grep -iE "actionlint|shellcheck"` (BSD/macOS grep doesn't support `\|` BRE alternation; the `-E` extended-regex form is POSIX-portable). Captured the rationale inline so the verification command actually works on macOS.

Same count-vs-list cardinality pattern (Class B in PR #465 doc-lint suite BACKLOG row) — third drain-log of mine to exhibit it (after #195 and #231). The shellcheck-rule-precision class also surfaces via the `\|` portability finding (related to SC2086-vs-SC2046 from #427 drain-log).

* hygiene(#444): reconcile 377 drain-log outcome distribution math

Codex P2 + Copilot both caught: header said '4 FIX + 2 dups' but Section A enumerates 6 FIX thread-IDs (A1×1 + A2×2 + A3×3) and Section B enumerates 5 STALE thread-IDs (B5 explicit dup of B3). Header didn't match the per-section enumeration end-to-end; intro prose ('3 were real-fix factual corrections' + '2 were combined') disagreed with the header in turn.

Pick a single counting rule (by thread-ID) and apply it consistently:
- 6 FIX (3 unique findings, 3 duplicate reviewer threads on the same fixes — combined into one fix commit c8d91b5)
- 5 STALE-RESOLVED-BY-REALITY (4 unique + 1 dup B5≡B3)
- 2 OTTO-279
- = 13 thread-IDs covering 9 unique findings

Fix header + intro prose + final-resolution all to match this single rule. The 'unique findings' count (9) is preserved in parentheses for cross-reference.
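
The single counting rule (count by thread-ID, report unique findings alongside) can be made mechanical. In the sketch below the thread labels are illustrative stand-ins shaped like the commit's description (A1×1, A2×2, A3×3, B5 duplicating B3); they are not the real thread IDs:

```python
from collections import Counter

# (thread label, outcome class, finding the thread belongs to); labels are illustrative.
threads = [
    ("A1-1", "FIX", "A1"),
    ("A2-1", "FIX", "A2"), ("A2-2", "FIX", "A2"),
    ("A3-1", "FIX", "A3"), ("A3-2", "FIX", "A3"), ("A3-3", "FIX", "A3"),
    ("B1", "STALE-RESOLVED-BY-REALITY", "B1"),
    ("B2", "STALE-RESOLVED-BY-REALITY", "B2"),
    ("B3", "STALE-RESOLVED-BY-REALITY", "B3"),
    ("B4", "STALE-RESOLVED-BY-REALITY", "B4"),
    ("B5", "STALE-RESOLVED-BY-REALITY", "B3"),  # B5 duplicates B3
    ("C1", "OTTO-279", "C1"),
    ("C2", "OTTO-279", "C2"),
]

by_outcome = Counter(outcome for _, outcome, _ in threads)
unique_findings = {finding for _, _, finding in threads}
print(dict(by_outcome), len(threads), len(unique_findings))
```

Deriving the header, intro prose, and final-resolution figures from one table like this keeps the three from drifting apart.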

…ty map) (#445)

* hygiene(#268): pr-preservation drain-log for #231 (Codex CLI capability map)

Otto-268 backfill: drain-log for PR #231 — textbook case of the post-merge reviewer-cascade pattern. 9 threads drained across 4 waves (2 + 1 + 3 + 2 + 1 cascade pattern); every commit triggered a fresh Codex/Copilot review wave catching new factual issues against the freshly-changed surface. Per Otto-250 training-signal discipline.

Pattern observations capture four load-bearing patterns:
1. Post-merge reviewer-cascade as dominant pattern; wave-by-wave the findings shift class (structural → rendering → internal-consistency → version-currency).
2. Codex enforces version-currency on the doc itself — Wave 4 reclassifications cite OpenAI release notes (Sept 15 2025 + March 26 2026); reviewer-enforces-rule pattern is the inverse of CLAUDE.md author-side version-currency rule.
3. "Partial (narrowing)" status annotation as a useful sub-state for gaps that are shrinking on measurable schedule.
4. Discriminator-falsification finding pattern (AGENTS.md-read test relying on values repeated in same doc) — same shape as randomized-canary in security testing.

* drain(#445 follow-up): fix thread/wave count mismatches in #231 drain-log

Multiple Copilot threads on #445 caught count-mismatch errors:
- Header claimed "9 across 4 waves (2 + 1 + 3 + 2 + 1 cascade)" — the "+1 cascade" doesn't correspond to any Wave 5 section in the body. Body has exactly 4 waves with 2+1+3+2=8 thread-sections.
- Final-resolution claimed "All 9 threads resolved" — should be 8.

Fix: aligned both header + final-resolution to the body's actual 8-thread-across-4-waves structure. Wave breakdown explicit (Wave 1: 2, Wave 2: 1, Wave 3: 3, Wave 4: 2 — sum 8) so the math is verifiable from the heading.

Same count-vs-list cardinality pattern documented in `_patterns.md` (Class B in PR #465 BACKLOG row) — the doc-lint suite would catch this at author-time.
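
The count-vs-list check that would have caught the 9-versus-8 mismatch amounts to comparing a claimed total with the sum of the per-wave counts. A minimal sketch using the wave breakdown from the commit message; `claimed_total_matches` is a hypothetical helper, not the doc-lint suite itself:

```python
def claimed_total_matches(claimed: int, per_wave: dict[str, int]) -> bool:
    """A header's thread total should equal the sum of its per-wave counts."""
    return claimed == sum(per_wave.values())


waves = {"Wave 1": 2, "Wave 2": 1, "Wave 3": 3, "Wave 4": 2}
print(claimed_total_matches(9, waves))  # original header: False
print(claimed_total_matches(8, waves))  # corrected header: True
```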

Two Copilot P2 catches on citation auditability:

- L179: the TodoWrite row cites 'OpenAI's Introducing upgrades to Codex post, Sept 15 2025', but the Reference section had no link entry. Add an inline link to https://openai.com/index/introducing-upgrades-to-codex/ so the claim is auditable over time.
- L188: '#15211' was unqualified (which tracker?). Change to the fully-qualified [openai/codex#15211] with a link to openai/codex#15211.

External-source-verifiability-gap pattern per docs/pr-preservation/_patterns.md.

…449)

* hygiene(#268+): pr-preservation drain-log for #427 (#133 follow-up)

Otto-268 follow-on: drain-log for the post-merge cascade PR #427 following parent #133 (research: secret-handoff protocol options). Per Otto-250 training-signal discipline.

Captures two specific fixes from the cascade wave:

1. **Shellcheck SC2086 → SC2046 correction**: the prior rationale cited the wrong shellcheck rule. SC2046 covers unquoted command substitution `$(...)`; SC2086 covers unquoted variable expansion `$var`. Pre-commit-lint candidate: a regex check on shellcheck SC-NNNN claims against the actual rule that applies to the cited code shape.
2. **Status-banner truth-update**: doc-claim-staleness during the review window, same class as #135 DORA canonical definitions and #231 Wave-4 version-currency reclassifications.

Pattern observation: drain follow-ups for substantive PRs are themselves often small and targeted; substantive technical content gets first-wave attention, and small cleanups land as separate follow-ups when they don't gate merge.

* hygiene(#449): reflow maintainer-asleep across line break

Copilot P2 catch: the previous wrap split "maintainer-asleep" mid-token as "maintainer-" / "asleep", which renders with an extra space ("maintainer- asleep") in Markdown. Reflow so the hyphenated compound stays on a single line. Class A pattern (inline-code-span / hyphen line-wrap) per docs/pr-preservation/_patterns.md.
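
The pre-commit-lint candidate could start from something like this sketch. It is a deliberately naive heuristic, not shellcheck's own analysis: the function names are hypothetical, it only looks at one-line snippets, and it ignores cases such as plain assignments where shellcheck would not warn at all.

```python
import re


def expected_rule(snippet: str):
    """Guess which rule fits a one-line shell snippet (naive heuristic)."""
    # Drop quoted segments; any expansion that remains is unquoted.
    unquoted = re.sub(r'"[^"]*"', "", snippet)
    unquoted = re.sub(r"'[^']*'", "", unquoted)
    if "$(" in unquoted:
        return "SC2046"  # unquoted command substitution
    if re.search(r"\$\{?[A-Za-z_]", unquoted):
        return "SC2086"  # unquoted variable expansion
    return None


def claim_matches(claimed_rule: str, snippet: str) -> bool:
    """True when a doc's SC-NNNN claim agrees with the snippet's code shape."""
    return expected_rule(snippet) == claimed_rule


print(claim_matches("SC2086", 'rm $(find . -name "*.tmp")'))  # wrong-rule claim
print(claim_matches("SC2046", 'rm $(find . -name "*.tmp")'))  # corrected claim
```

A real check would more likely run shellcheck on the cited snippet and compare the reported codes; the heuristic only illustrates the claim-versus-shape comparison.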

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32f1663e07

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…te §5 consistency

Two Codex post-merge cascade catches:

- L303 P1 (discriminator self-reference, recurrence of the earlier Cursor finding): the AGENTS.md-ingestion test was non-causal because the proposed discriminator (the build-and-test command pair) was quoted inline in this doc, so reading the research doc alone would suffice, giving a false-positive readiness signal. Replace with a structural reference only ('the build-gate section of AGENTS.md') plus an explicit instruction that the evaluator (not the doc) holds the canonical answer string. The discriminator surface no longer names any property/file/phrase that appears in this doc, so the only way to satisfy the prompt is to actually read AGENTS.md. Same shape as Otto-231's earlier discriminator-falsification finding.
- L260 P2: §5 still treated TodoWrite as 'analogue unclear', but the parity matrix and roll-up classify it as Parity (different shape) per OpenAI's Sept 15 2025 announcement. Reconcile §5 to match the matrix so Stage-2 prioritization is reproducible from any section.
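
The evaluator-held-answer idea can be sketched as two small checks: one that the canonical answer does not leak into anything the agent could read besides AGENTS.md, and one that grades the agent's reply. All names and strings below are hypothetical placeholders; the real canonical answer is deliberately not reproduced here.

```python
def normalize(text: str) -> str:
    """Collapse whitespace and case so trivial formatting differences don't matter."""
    return " ".join(text.split()).lower()


def discriminator_leaks(canonical_answer: str, visible_texts: list[str]) -> bool:
    """True if the answer string appears in any text other than the target file."""
    needle = normalize(canonical_answer)
    return any(needle in normalize(text) for text in visible_texts)


def passes(agent_answer: str, canonical_answer: str) -> bool:
    """Grade the reply against the answer held by the evaluator."""
    return normalize(canonical_answer) in normalize(agent_answer)


# Placeholder strings only; not the real build-gate commands.
canonical = "EXAMPLE-BUILD-CMD && EXAMPLE-TEST-CMD"
old_prompt_doc = "Discriminator: the agent should report EXAMPLE-BUILD-CMD && EXAMPLE-TEST-CMD."
new_prompt_doc = "Discriminator: report what the build-gate section of AGENTS.md says."

print(discriminator_leaks(canonical, [old_prompt_doc]))  # leaks: test is non-causal
print(discriminator_leaks(canonical, [new_prompt_doc]))  # no leak: must read AGENTS.md
```

Running the leak check before the probe makes the causal claim explicit: a pass can only come from reading the target file.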

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b80554fac0

CI markdownlint job was failing on this PR with 11 errors:

- MD032 (blanks-around-lists) at lines 36, 42, 50, 69, 198: bold intro lines (**Install:** / **Authentication:** / **Key surfaces:** / **Config surface:** / **Running gap score**) immediately followed by list items with no separating blank line. Add blank lines.
- MD029 (ol-prefix) at lines 253, 258, 267, 274-276: ordered list items in §5 numbered 2-4 (Important) then 5-7 (Nice-to-have) across heading breaks; markdownlint sees each block as a new list that should restart at 1. Renumber to 1-3 in each block; priority ordering is preserved via the bold sub-heading context.

These are pre-existing failures not introduced by my drain-fixes, but they block CI auto-merge, so they are worth fixing.
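
The MD032 shape described above (a bold intro line directly followed by a list item) can be approximated with a few lines of Python. This is a rough sketch of only the blank-before-list half of the rule, not markdownlint's implementation: it ignores fenced code blocks, indented continuation lines, and the blank-after-list half.

```python
import re

# A line that starts a bullet ("-", "*", "+") or ordered ("1." / "1)") list item.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def lists_missing_leading_blank(lines):
    """Return 1-based line numbers of list items that directly follow non-list text."""
    hits = []
    for index in range(1, len(lines)):
        previous, current = lines[index - 1], lines[index]
        if LIST_ITEM.match(current) and previous.strip() and not LIST_ITEM.match(previous):
            hits.append(index + 1)
    return hits


sample = [
    "**Install:**",
    "- step one",
    "- step two",
    "",
    "**Authentication:**",
    "",
    "- step three",
]
print(lists_missing_leading_blank(sample))
```

Only the first list in the sample is flagged; the second already has the separating blank line that the fix adds.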

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

…exec probe

Two Codex post-merge cascade catches:

- L189 P1: the cron row was 'Likely gap (not documented)', but Codex Cloud has a documented thread-automations primitive at developers.openai.com/codex/app/automations covering custom cron syntax + minute-based heartbeat + daily/weekly schedules. Verified via WebSearch (April 2026 docs current). Reclassify to Partial (different surface): the local CLI doesn't expose it, Codex Cloud does. Update gap-score totals: was 11/4/3/2 with cron as critical; now 11/5/2/2 with cron reachable via the cloud-thread surface. Update §3 'biggest single gap' prose + §5 'critical' → 'high-priority (reframed)' section. Verify-version-currency rule applied (CLAUDE.md memory feedback).
- L335 P2: the Stage-2 `codex exec` probe used 'list the top 5 open PRs on LFG', which couples to GitHub access; failures from missing creds / repo visibility / network policy would look like exec parity failures. Replace with a repo-local probe ('count the .fs files under src/Core/ and report the count and the longest filename') that exercises exec semantics without external dependencies. Same shape as the Otto-231 discriminator-falsification class: the probe surface must isolate the property under test from confounders.

The cron classification fix is a substantive parity-research correction; it bumps Otto-in-Codex viability from 'critical-gap-blocker' to 'reachable-via-different-surface'.
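
To grade the repo-local probe, the evaluator needs a ground-truth answer computed without any network access. A minimal sketch follows; the function name is hypothetical, and two details are assumptions the probe text leaves open: the search is recursive, and ties on filename length go to the alphabetically last name.

```python
from pathlib import Path


def fs_probe(core_dir):
    """Return (count, longest_filename) for *.fs files under core_dir."""
    # Assumption: "under src/Core/" means recursively, including subdirectories.
    names = sorted(p.name for p in Path(core_dir).rglob("*.fs") if p.is_file())
    if not names:
        return 0, None
    # Assumption: length ties are broken by taking the alphabetically last name.
    longest = max(names, key=lambda name: (len(name), name))
    return len(names), longest
```

Run from the repository root as `fs_probe("src/Core")`, the result gives the evaluator a value to compare against whatever the agent reports, so a mismatch points at exec behaviour rather than at credentials or network policy.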

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2b09fdc85

Copilot AI review requested due to automatic review settings April 24, 2026 01:11

AceHack enabled auto-merge (squash) April 24, 2026 01:11

Copilot started reviewing on behalf of AceHack April 24, 2026 01:11

chatgpt-codex-connector Bot reviewed Apr 24, 2026

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated

Copilot AI reviewed Apr 24, 2026

Comment thread docs/research/codex-cli-first-class-2026-04-23.md

Comment thread docs/research/codex-cli-first-class-2026-04-23.md

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated

AceHack mentioned this pull request Apr 24, 2026

history: Otto-76 tick-close row — Codex Phase-1 + 4-message autonomy-envelope absorb #232

Closed

4 tasks

AceHack mentioned this pull request Apr 24, 2026

backlog: Codex-first-class row — Otto-78 refinement (parallel-design + primary-switch + symmetric-parity) #236

Merged

5 tasks

AceHack mentioned this pull request Apr 25, 2026

hygiene(#268+): pr-preservation drain-log for #429 (#270 follow-up) #450

Merged

3 tasks

Copilot AI mentioned this pull request Apr 25, 2026

hygiene(#268+): pr-preservation _patterns.md — synthesis index #448

Merged

4 tasks

AceHack mentioned this pull request Apr 25, 2026

drain(#448 follow-up): resolve _patterns.md surface-class contradiction #466

Merged

4 tasks

chatgpt-codex-connector Bot reviewed Apr 25, 2026

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated

Copilot AI review requested due to automatic review settings April 25, 2026 08:30

Copilot started reviewing on behalf of AceHack April 25, 2026 08:30

chatgpt-codex-connector Bot reviewed Apr 25, 2026

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated

Copilot AI reviewed Apr 25, 2026

AceHack merged commit 1c2b64c into main Apr 25, 2026
13 checks passed

chatgpt-codex-connector Bot reviewed Apr 25, 2026

Comment thread docs/research/codex-cli-first-class-2026-04-23.md

Comment thread docs/research/codex-cli-first-class-2026-04-23.md

Comment thread docs/research/codex-cli-first-class-2026-04-23.md

AceHack mentioned this pull request Apr 25, 2026

hygiene(#231 follow-up): 3 Codex post-merge parity reclassifications #472

Merged

4 tasks

AceHack merged 11 commits into main from research/codex-cli-first-class-phase-1

2 participants
