research: Codex CLI first-class session — Phase 1 (Stage 1 of 5 per PR #228) #231

Merged
AceHack merged 11 commits into main from research/codex-cli-first-class-phase-1
Apr 25, 2026

AceHack commented Apr 24, 2026

Summary

  • Executes Stage 1 (research tick, S-effort) of the 5-stage arc named in the BACKLOG row added by PR #228 (backlog: first-class Codex-CLI session experience; P1, Aaron Otto-75 directive).
  • Surfaces a major non-obvious win: Zeta's AGENTS.md is already what Codex CLI reads natively, so Zeta is ~60% Codex-ready by accident of prior decisions.
  • First-pass capability matrix: 11 parity / 5 partial / 2 gap / 2 Codex-specific. This matches the doc's running gap score after two reclassifications: TodoWrite moved from Gap to Parity (different shape) per OpenAI's Sept 15 2025 announcement, and CronCreate/ScheduleWakeup moved from Likely-gap to Partial (different surface) per Codex Cloud thread automations at developers.openai.com/codex/app/automations.
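
The matrix arithmetic can be checked with a short sketch. It assumes the starting point is the 10/4/4/2 breakdown quoted in the Otto-76 tick commit message later in this thread, and that "Likely-gap" was counted in the gap bucket; both are assumptions made for illustration, not statements from the research doc. The bucket names and the `apply_reclassifications` helper are hypothetical.

```python
from collections import Counter

# Assumed starting tally: the 10/4/4/2 breakdown from the Otto-76 commit message.
before = Counter({"parity": 10, "partial": 4, "gap": 4, "codex_specific": 2})

# The two reclassifications named in the summary: (row, from_bucket, to_bucket).
# "Likely-gap" is treated as part of the gap bucket here (assumption).
reclassifications = [
    ("TodoWrite", "gap", "parity"),
    ("CronCreate/ScheduleWakeup", "gap", "partial"),
]


def apply_reclassifications(tally, moves):
    """Return a new tally with each row moved from one bucket to another."""
    result = Counter(tally)
    for row, src, dst in moves:
        if result[src] <= 0:
            raise ValueError(f"cannot move {row}: bucket {src!r} is empty")
        result[src] -= 1
        result[dst] += 1
    return result


after = apply_reclassifications(before, reclassifications)
print(dict(after))
# → {'parity': 11, 'partial': 5, 'gap': 2, 'codex_specific': 2}
```

Under those assumptions the total stays at 20 rows and the result lines up with the 11/5/2/2 figure in the summary.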

Key findings

  1. AGENTS.md parity — both harnesses read it natively; CLAUDE.md already delegates to it. Free win.
  2. Capability matrix — most dev-work tools have parity (Bash/Edit/Read/Write/MCP/WebSearch). Subagents + worktrees present in both. Plan Mode mirrors.
  3. Autonomous-loop cadence reachable — Codex Cloud's thread-automations primitive (cron syntax + minute heartbeats + daily/weekly schedules) gives Otto-in-Codex a partial, different-surface equivalent of the * * * * * <<autonomous-loop>> fire pattern. Stage 2 must verify the agent-facing API for arming/listing automations.
  4. Partial gaps — skills portability (covered by cross-harness-mirror-pipeline), hooks (narrowing per openai/codex#15211), slash commands.
  5. Account setup — already aligned (ServiceTitan across Claude Code + Codex CLI per Aaron Otto-76).
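
For readers unfamiliar with the `* * * * *` pattern in finding 3, the sketch below shows what a five-field cron expression means: every field is a wildcard, so the schedule fires every minute. This is a deliberately minimal matcher written for illustration only. It is not Codex Cloud's automation API (that surface is what Stage 2 is meant to verify), it supports only `*` and plain integers, and the function names are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; this sketch supports only '*' and a plain integer."""
    if field == "*":
        return True
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a five-field cron expression fires at the given minute.

    Field order: minute, hour, day-of-month, month, day-of-week (0 = Sunday).
    Simplification: all fields are ANDed together; real cron implementations
    treat day-of-month and day-of-week specially when both are restricted.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    values = [
        when.minute,
        when.hour,
        when.day,
        when.month,
        (when.weekday() + 1) % 7,  # Python: Monday = 0; cron: Sunday = 0
    ]
    return all(field_matches(f, v) for f, v in zip(fields, values))


# Every-minute pattern: matches any timestamp.
print(cron_matches("* * * * *", datetime(2026, 4, 24, 1, 11)))  # → True
# Daily at 09:00: matches only at that minute.
print(cron_matches("0 9 * * *", datetime(2026, 4, 24, 9, 1)))   # → False
```

A daily or weekly schedule differs from the every-minute heartbeat only in which fields are pinned to a value.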

Stage-2 plan

7 concrete test prompts for the parity matrix (AGENTS.md reading via structural-discriminator, subagent dispatch, MCP invocation, Codex Cloud thread-automation API surface verification, codex exec repo-local probe, git-worktree isolation, session resumption).

Scope limits

  • Does NOT commit to harness swap.
  • Does NOT propose implementing a Codex-mode Otto.
  • Does NOT modify AGENTS.md.
  • Does NOT duplicate cross-harness-mirror-pipeline.

Test plan

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 24, 2026 01:11
@AceHack AceHack enabled auto-merge (squash) April 24, 2026 01:11

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d5ca82de7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated

Copilot AI left a comment

Pull request overview

Adds a Stage-1 research writeup for making Codex CLI a first-class session harness for Zeta (per the staged plan in PR #228), focusing on parity assessment and identifying key gaps.

Changes:

  • Introduces a Phase 1 / Stage 1 research doc describing Codex CLI surfaces, configuration, and authentication.
  • Adds a first-pass capability parity matrix vs Claude Code, highlighting gaps (notably cron/scheduling for autonomous loop).
  • Documents a Stage-2 test plan and external reference links.

Comment thread docs/research/codex-cli-first-class-2026-04-23.md
Comment thread docs/research/codex-cli-first-class-2026-04-23.md
Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated
Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated
AceHack added a commit that referenced this pull request Apr 24, 2026
…phase sequence, Aminata blocking gate) (#233)

Aaron Otto-76 named-agent-email-ownership directive crystallises
three memory layers + task #240 into an executable path:

- 2026-04-20 four hard rules (never Aaron address; disclose
  agent-not-human; name project + why-contacted; recipient-UX-
  first).
- 2026-04-22 two-lanes + standing Playwright signup
  authorisation + free-tier constraint + provider-choice
  autonomy.
- 2026-04-23 autonomy-envelope with email carve-out (agents
  own their email; parallel ownership allowed;
  aaron_bond@yahoo.com test target; "don't be a dick" soft
  constraint).
- Task #240 signup-terrain mapping (complete).

Five explicit phase gates:

- Phase 0: complete (signup terrain mapped).
- Phase 1: persona-email-identity design doc (8 questions —
  persona choice, handle, provider, recovery cascade, 2FA,
  lanes, signature, reputation posture).
- Phase 2: Aminata threat-model pass (BLOCKING gate — new
  attack surface, recovery abuse, phishing attribution,
  employer-policy interaction).
- Phase 3: Playwright signup execution (bounded; single
  persona, single provider, DP-NNN.yaml evidence record).
- Phase 4: Test send to aaron_bond@yahoo.com.
- Phase 5: Memory capture + BP-NN promotion review.

Scope limits explicit:
- Does NOT authorise execution this tick.
- Does NOT authorise email use bypassing maintainer visibility.
- Does NOT allow parallel acquisition without explicit Phase 1
  design choice.
- Does NOT bypass Aminata blocking gate.

Composes with: PR #230 (multi-account Phase-2 gating is
sibling pattern); PR #231 (Codex is harness-neutral);
decision-proxy-evidence (PR #222) for Phase 3 records;
persona roster for persona-choice question.

Filed under `## P2 — research-grade`. Effort M total;
spread across 3-5 ticks.

Otto-77 tick deliverable.
AceHack added a commit that referenced this pull request Apr 24, 2026
…+ primary-switch-by-Aaron-context + symmetric-parity) (#236)

Aaron Otto-78 two-message refinement of the existing first-
class-Codex-CLI BACKLOG row (PR #228).

Message 1: parallel-design directive — Codex CLI designs its
own skill files asynchronously to Otto (only touching its own
substrate); each harness researches its own features on a
cadence; both harnesses get full-featured wrappers (loops,
memory enhancements, hooks, etc.); asymmetry between harnesses
tracked explicitly.

Message 2: primary-switch clarification — "only one will be
the primary either you or codex which ever one i'm in at the
time". Primary = whichever harness Aaron is actively in at
that moment; the other runs async controlled-by-primary; when
Aaron switches, roles swap. Symmetric feature parity required
("got to have all your fancyness and skills").

Refinement composes as extension of the existing 5-stage arc:

- Stage 1 (existing, PR #231) — Otto researches Codex from
  Otto-side.
- Stage 1b (new) — Codex CLI researches Claude Code from
  Codex-side (inverted roles).
- Stage 2 (joint) — parity matrix combines both sides.
- Stage 3 (each on own surface) — Codex CLI designs own skill
  files; Otto designs Claude-Code-specific wrappers.
- Stage 4 (synchronization cadence) — both sides run periodic
  harness-features research; asymmetry inventory maintained.
- Stage 5 (harness-choice ADR) — retains revisitable primary
  designation.

Scope limits:
- No Otto-ceding-control (Otto primary while Aaron in Claude
  Code, which is now).
- No cross-edit of other harness's substrate.
- No forced harness swap.
- ADR still the gate for any primary-reset.

Composes with cross-harness-mirror-pipeline (that row =
universal-skill distribution; this row = harness-specific-
skill parallel-authoring), multi-account design (PR #230),
Phase-1 Codex research (PR #231), and the first-class roster
memory.

Otto-78 tick split-attention deliverable (alongside primary
5th-ferry absorb PR #235).
AceHack added a commit that referenced this pull request Apr 24, 2026
…substrate entry-point

Aaron Otto-102 directive: "there are files in the drop
including a skill created with the openai skill creator so
it seems like codex should use this and integrate with this
like you did with your skill creator please absorb and
delete/remove items from the drop folder, there is a sample
skill in tere created by the oopenai skill creator too".

Establishes .codex/ as Codex CLI's harness-specific substrate
parallel to .claude/ per Otto-79 "each harness owns its own
named loop agent; each harness authors its own skill files"
discipline.

Files landed:

- .codex/README.md — harness-specific entry-point parallel
  to CLAUDE.md; names layout + convention + Otto/Codex-skill
  edit-boundary + bootstrap story + skill-authorship
  convention + provenance.

- .codex/skills/idea-spark/SKILL.md — OpenAI-Skill-Creator-
  generated brainstorming helper. Frontmatter + 3-option-
  spread workflow + naming/positioning/experiment
  sub-patterns.

- .codex/skills/idea-spark/agents/openai.yaml — vendor-
  specific agent config (display_name: "Idea Spark").

- .codex/skills/idea-spark/references/idea-patterns.md —
  on-demand reference content (expansion lenses + option
  styles + tiny experiment template).

Boundary discipline (per Otto-79 cross-session-review-yes-
cross-edit-no): Otto (Claude Code loop agent) does NOT edit
.codex/skills/** as normal work. This initial landing was a
substrate-setup action only. Future Codex CLI sessions
author + maintain.

drop/ folder disposition at Otto-102:
- skill.zip → extracted + deleted (substrate preserved here).
- usageReport CSV → deleted (non-substrate; 9KB usage data).
- aurora-initial-integration-points.md (40.5KB) → PRESERVED
  in drop/ pending Otto-103 dedicated absorb as 9th ferry
  (retroactive). Per CC-002 discipline; drop/ is gitignored
  per PR #265 so no accidental check-in risk.
- aurora-integration-deep-research-report.md (25.4KB) →
  PRESERVED in drop/ pending Otto-104 dedicated absorb as
  10th ferry (retroactive).

Scheduling-memory filed:
memory/project_amara_drop_folder_9th_and_10th_ferry_research_
reports_pending_absorb_otto_103_104_2026_04_24.md — names
Otto-103 + Otto-104 absorb plan, disposition of all 4 drop/
items, Aaron's "absorb and delete" directive literal-honoring
timeline.

Composes with:
- PR #228 Codex-first-class BACKLOG row — .codex/ is
  substrate-support for that 6-stage arc.
- PR #231 Phase-1 Codex CLI research — identified the
  AGENTS.md-already-universal parity finding; this landing
  extends to harness-specific substrate.
- Otto-79 peer-harness refinement memory — "each harness
  owns its own".
- PR #265 drop/ gitignore.

Lands within-standing-authority per Otto-82/90/93 calibration
— Aaron's explicit directive + Codex-harness-substrate setup
action; not gated.

Aaron message-ending Otto-102: "when you get a second end
your loop i'm going to exit and update you" — Otto-102 closes
gracefully; autonomous loop ends after tick-history row +
push.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 24, 2026
…autonomy-envelope absorb

Otto-76 tick closed with three substantive landings despite
high-directive-velocity mid-tick:

- PR #230 — P3 multi-account access design BACKLOG row
  (3 Aaron refinements landed same branch: initial → "design
  allowed now, implementation gated on security review" →
  "poor-man-tier no-paid-API-keys hard requirement").

- PR #231 — Codex CLI Phase-1 research (Stage 1 of 5 per
  PR #228); 294-line doc; surfaces AGENTS.md-is-already-
  universal free-win finding; 10/4/4/2 capability-parity
  breakdown.

- Three per-user memory captures (account snapshot,
  split-attention+composition endorsed, agent-autonomy-
  envelope with email carve-out).

Key observations (from the row's Observations column):
1. Directive-churn != tick-failure. Split-attention pattern
   held under 4x directive rate.
2. AGENTS.md parity de-risks first-class-Codex support
   (portability-by-design was retroactively validated).
3. Named-agent-email-ownership carve-out is substantive
   agent-autonomy expansion (email = reputation surface).
4. Poor-man-tier vs enterprise-API-tier distinction is
   load-bearing for multi-account design.

Stacked on top of Otto-75 tick-history branch so it shows as
atop that row in diff preview. Independent of PR #229 merge
timing.
AceHack added a commit that referenced this pull request Apr 25, 2026
…450)

Otto-268 follow-on: drain-log for the post-merge cascade PR #429
following parent #270 (research: multi-Claude peer-harness experiment
design).

Per Otto-250 training-signal discipline. Captures two substantive
findings:

1. **Memory-substrate disambiguation** (P1): AutoMemory (out-of-repo,
   per-user at `~/.claude/projects/<slug>/memory/`) vs git-tracked
   in-repo `memory/` (the forward-mirror substrate landed via
   Otto-114). Conflating in experiment-design produces wrong-
   detection-mechanism findings downstream. Fix: name both surfaces
   + the forward-mirror relationship + per-surface mechanism (git
   diff/reflog vs filesystem hash compare). Same shape as
   implementation-vs-math-definition tension on #206.

2. **Severity-bolding consistency** (P2): markdown-rendering class
   where third CRITICAL had inconsistent bold; uniformity matters
   for at-a-glance scanning + grep-ability. Future doc-lint
   candidate.

Pattern observation: experiment-design docs benefit from per-
surface mechanism tables — same shape as parity-matrix on #231;
table-form documents reduce surface for omission-class findings.
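
The filesystem-hash-compare mechanism named above for the out-of-repo AutoMemory surface could look roughly like the sketch below. It is an illustration only: the function names `snapshot_hashes` and `diff_snapshots` are hypothetical and do not come from the PR, and the sketch assumes that comparing SHA-256 digests of file contents is an acceptable stand-in for whatever comparison the experiment design actually uses.

```python
import hashlib
from pathlib import Path


def snapshot_hashes(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(
            p for p in before.keys() & after.keys() if before[p] != after[p]
        ),
    }
```

Taking one snapshot before a session and one after, then diffing them, gives the per-surface change signal for a directory that has no git history. The git-tracked `memory/` surface would not need this, since git diff/reflog already covers it.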
AceHack added a commit that referenced this pull request Apr 25, 2026
* hygiene(#268): pr-preservation drain-log for #135 (auto-loop-35 Itron mapping)

Otto-268 backfill task: drain-log for PR #135 covering 14 total threads
across 2 waves (10 first-wave pre-merge + 4 second-wave post-merge cascade).

Per Otto-250 training-signal discipline: full per-thread record with
reviewer authorship, severity, outcome class (FIX / STALE-RESOLVED-BY-
REALITY / OTTO-279 SURFACE-CLASS), and resolution path. Pattern
observations capture the three load-bearing patterns: Otto-279 as
mature uniform reply stamp; stale-resolved-by-reality at ~70% on this
PR; Codex catching subset-vs-superset framing errors in benchmark
canonical definitions (DORA / K-relations).

* drain(#437 follow-up): fix count mismatches in #135 drain-log

Codex P2 + Copilot threads on #437 caught:

- Lines 6-7 fragment + count mismatch: header said '10 unresolved
  ..., 1 P1' (suggesting 11) while body summarized 14 = 10 first-
  wave + 4 second-wave. Reworded into a single unambiguous summary:
  '10 unresolved at first-wave; post-merge cascade then surfaced
  3 more (1 Codex P1 + 2 Copilot P2). Total 13.'
- Second-wave header '1 P1 + 3 P2 post-merge cascade' → '1 Codex P1
  + 2 Copilot P2 — 3 threads total' (only 3 thread sections A/B/C
  exist in body).
- Pattern observation 2 'Stale-resolved-by-reality at ~70%' (7 of
  14) → '~54%' (7 of 13) matching corrected total.
- Final-resolution 'All 14 threads' → 'All 13 threads (10 first-
  wave + 3 second-wave)'.

Same count-vs-list cardinality pattern as #195/#231/#377/#444
drain-log fixes — fourth instance in my own logs. Strong validation
that doc-lint Class B (PR #465 BACKLOG) would compound.
AceHack added a commit that referenced this pull request Apr 25, 2026
…dex) (#461)

* hygiene(#268+): pr-preservation drain-log for #430 (#221 follow-up Codex)

Otto-268 follow-on: drain-log for the 4-finding cascade PR #430
(post-merge follow-up to #221 Amara 4th courier ferry absorb).
Captures four substantive Codex post-merge corrections.

Per Otto-250 training-signal discipline. Pattern observations:

1. Verbatim-claim accuracy under absorbing-side annotation —
   "preserved verbatim" claims must reflect any absorbing-side
   annotations (proposal-flag markers, footnotes, inline bracketing).
   Same shape as #235's "byte-for-byte ... excluding whitespace"
   contradiction fix.
2. Count-vs-list cardinality is now a 4th-observation pattern
   (#191 / #219 / #430 / #85). At this density, pre-commit-lint
   candidate: regex on "N drift classes / phases / audits / items"
   patterns + count the surrounding list to verify.
3. Terminology drift between parent absorb + canonical vocabulary
   ("decision-proxy-consult" vs canonical "decision-proxy-evidence")
   is recurring. Fix template: align absorption-notes text to
   canonical; preserve verbatim ferry content per Otto-227.
4. Stabilize effort-summary correction is a concrete instance of
   "claim summary doesn't match per-item tally" — future doc-lint
   candidate (sum-vs-tally check).

* drain(#461 follow-up): fix count mismatches + add merge SHA in #430 drain-log

Multiple Codex/Copilot threads on #461 caught:

- L7 'Thread count at drain: 3' → '4' (body has Threads 1-4).
- L17 'Codex caught three findings' → 'four' matching body.
- L122 'merged to main' → 'merged to main as `5698f9d`' for
  consistency with other drain-logs that include the merge SHA
  for auditability.

Same count-vs-list cardinality pattern (Class B in PR #465 doc-lint
suite BACKLOG row) — 5th instance in my own drain-logs (#195 / #231
/ #377 / #135 / #430). The pattern is genuinely universal author-
side; even when explicitly aware of it, instances slip through.
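
The pre-commit-lint candidate described above (match an "N items" claim, then count the list that follows) might be sketched as below. This is a minimal illustration under stated assumptions: the function name `cardinality_mismatches`, the noun list in `CLAIM`, and the rule that unindented prose ends the list are all hypothetical choices, not the doc-lint suite from PR #465.

```python
import re

# Hypothetical claim shape: a number followed by a countable noun.
CLAIM = re.compile(r"\b(\d+)\s+(?:threads|findings|items|phases|audits)\b")
# A bullet ("-" or "*") or a numbered item ("1.").
BULLET = re.compile(r"^\s*(?:[-*]|\d+\.)\s+")


def cardinality_mismatches(text: str) -> list[tuple[int, int, int]]:
    """Return (line_number, claimed, actual) where a claim disagrees with its list.

    A claim is a non-bullet line containing e.g. "4 findings". The list is the
    run of bullet or numbered items after it; blank lines and indented
    continuation lines are skipped, and unindented prose ends the list.
    """
    lines = text.splitlines()
    mismatches = []
    for i, line in enumerate(lines):
        if BULLET.match(line):
            continue  # claims inside list items are ignored in this sketch
        match = CLAIM.search(line)
        if not match:
            continue
        claimed = int(match.group(1))
        actual = 0
        for follow in lines[i + 1:]:
            if BULLET.match(follow):
                actual += 1
            elif follow.strip() and not follow.startswith((" ", "\t")):
                break
        if actual and actual != claimed:
            mismatches.append((i + 1, claimed, actual))
    return mismatches
```

Run against a header that says "3 findings" above a four-item list, the sketch reports one mismatch with claimed 3 and actual 4, which is the shape of the L7/L17 fixes listed in this commit.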
AceHack added a commit that referenced this pull request Apr 25, 2026
…#460)

* hygiene(#268+): pr-preservation drain-log for #428 (#126 follow-up Gemini xref)

Otto-268 follow-on: drain-log for the targeted single-finding cascade
PR #428 (post-merge follow-up to parent #126 Grok CLI capability map).
Captures one Gemini capability-map cross-reference truth-update.

Per Otto-250 training-signal discipline. Pattern observations:

1. Cross-capability-map xref consistency is its own class. The repo
   has a growing family of CLI capability maps (Codex / Grok /
   Gemini / Claude Code) that form a related-document cluster
   needing joint cross-reference maintenance. Future doc-lint
   candidate: maintain manifest of related-document clusters and
   warn on edit-without-sweep.

2. Multi-CLI capability-map family is its own substrate pattern.
   Worth documenting in `_patterns.md`: when multiple capability
   maps cover overlapping but distinct CLIs, they form a cluster
   that benefits from shared structure (status taxonomy, parity-
   matrix shape, score-summary conventions) and joint
   cross-reference maintenance.

3. Targeted single-finding follow-ups are the cheapest cascade
   shape — 1 finding / 1 commit / 1 merge gate. Cascade-pattern
   amortized cost is dominated by the few-thread cascades.

URL → PR-number defensive pattern continues (lesson from #454/#455
collision earlier this session).

* drain(#460 follow-up): fix capability-map xref + inline-code-span split

Codex P2 + Copilot P1+P2 caught:

- Inline code span split across newline (`docs/` on one line,
  `research/openai-codex-cli-capability-map.md` on the next) —
  reflowed to single-line so the path renders as one token.
- Capability-map cluster listed `docs/research/codex-cli-first-
  class-2026-04-23.md` as if in-tree, but PR #231 is still OPEN
  at time of this drain-log so the file isn't yet in main.
  Reframed as 'pending merge of PR #231; will be in-tree once
  that PR lands' with the in-tree
  `openai-codex-cli-capability-map.md` listed first.

Same forward-author-to-future-state-of-main drift class as #377
(38% stale-resolved density). The drain-log itself exhibits the
pattern it documents — cited a forthcoming-but-not-yet-landed
file as if already present.

Inline-code-span line-wrap is the 5th observation of that class
in the corpus (now: #191 / #195 / #219 / #423 / #460). At this
density the doc-lint Class A (PR #465 BACKLOG) is high-leverage
automation.
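
A rough heuristic for the inline-code-span line-wrap class could be as simple as flagging lines with an odd number of backticks outside fenced blocks. The sketch below assumes that heuristic is good enough for a first pass; the function name `split_code_spans` is hypothetical, and this is not the Class A lint from PR #465.

```python
def split_code_spans(text: str) -> list[int]:
    """Return 1-based numbers of lines whose backtick count is odd.

    An odd count means an inline code span opens or closes on that line
    without its partner, i.e. the span was wrapped across a line break.
    Lines inside triple-backtick fences are skipped.
    """
    flagged = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if line.count("`") % 2 == 1:
            flagged.append(number)
    return flagged
```

Both halves of a split span are reported (the opening line and the closing line), so a path wrapped the way this commit describes would show up as two adjacent line numbers.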
AceHack added a commit that referenced this pull request Apr 25, 2026
…reword) (#447)

* hygiene(#268): pr-preservation drain-log for #435 (live-lock cadence reword)

Otto-268 backfill: drain-log for PR #435 (drain follow-up to #148:
why-the-factory-is-different live-lock cadence claim + grammar),
covering 3 threads across 2 waves with a clean self-induced-cascade
pattern.

Per Otto-250 training-signal discipline. Pattern observations capture
four load-bearing patterns:
1. Cross-reviewer convergence on Wave 1 (Codex P2 + Copilot P1
   flagging the same missing-FACTORY-HYGIENE-row) raised quality
   signal — same shape as #432's `warn` unbound finding.
2. Self-induced cascade: my Wave-1 fix introduced the Wave-2
   finding (claim "separate BACKLOG items" implied plural; actual
   BACKLOG state is one row with multiple sub-items). Pattern: when
   fixing a claim, verify the new claim is also accurate against
   current-state.
3. Reword-option-(a)-vs-(b) decision template generalizes: when
   doc asserts X but X doesn't exist, prefer reword-to-current-truth
   over add-the-thing-asserted (unless thing is small + isolated).
4. PR-mechanics: 4 of 7 cascade-PRs in this session (#135, #231,
   #432, #435) went through wave-1 + wave-2 cascade pattern; the
   reviewer-cascade is a consistent property of the merge-trigger
   surface, not a per-PR oddity.

Closes the session-drain-log backfill (Otto-268) for the major PRs
drained in this session: #135 / #235 / #432 / #434 / #195 / #219 /
#206 / #377 / #231 / #85 / #435 (11 PRs total covered across drain
logs #437-#447).

* drain(#447 follow-up): fix #435 drain-log Reviewer field + stable-identifier xref

Codex P2 + Copilot threads on #447 caught:

- Thread 1.2 missing the `Reviewer:` field even though the drain-log
  schema (intro paragraph) declares per-thread reviewer authorship.
  Added `Reviewer: copilot-pull-request-reviewer`.
- Stale `docs/BACKLOG.md lines 1313-1328` citation: those lines now
  contain the Server Meshing section; the live-lock-smell cadence
  row drifted to ~L1452 in the P1 tooling section. Replaced with
  the stable identifier (heading text 'Live-lock smell cadence
  (round 44 auto-loop-46 absorb, landed as `tools/audit/
  live-lock-audit.sh` + hygiene-history log)') so future readers
  don't chase a moving line-number target.

Same stable-identifier-vs-line-number-xref pattern flagged on
#423's `near line 4167` finding. Documented in `_patterns.md` —
line numbers decay on every adjacent edit; stable identifiers
decay only on rename. Adopting heading text as the stable cite.

The bare `:111`/`:113` thread location format (Otto-250 file:line
shape conformance) is the broader Otto-268-wave divergence
documented in PR #467 known-divergence section — deferred to
maintainer review per that framing.
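
The stable-identifier argument can be illustrated with a small lookup that resolves heading text to its current line number on demand. This is a sketch under assumptions: the function name `resolve_heading` is hypothetical, and a plain substring match stands in for whatever matching a real cross-reference tool would use.

```python
from typing import Optional


def resolve_heading(text: str, heading: str) -> Optional[int]:
    """Return the 1-based line number of the first line containing heading, or None."""
    for number, line in enumerate(text.splitlines(), start=1):
        if heading in line:
            return number
    return None


# A citation by heading text survives edits that shift line numbers.
before = "# Backlog\n## Live-lock smell cadence\nrow text\n"
after = "# Backlog\n## Server Meshing\nnew row\n## Live-lock smell cadence\nrow text\n"
assert resolve_heading(before, "Live-lock smell cadence") == 2
assert resolve_heading(after, "Live-lock smell cadence") == 4
```

A citation of "line 2" would point at the wrong section after the edit, while the heading text still resolves. It fails only if the heading itself is renamed, in which case the lookup returns `None` rather than silently pointing elsewhere.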
AceHack added a commit that referenced this pull request Apr 25, 2026
…arch) (#444)

* hygiene(#268): pr-preservation drain-log for #377 (setup-tooling research)

Otto-268 backfill: drain-log for PR #377 covering 13 threads — notable
for high stale-resolved density (38%, 5 of 13) where the doc was
authored against a future-state of main that adjacent PRs landed
during the review window.

Per Otto-250 training-signal discipline. Pattern observations capture
four load-bearing patterns:
1. High stale-resolved density (38%) when research doc forward-
   authors against future state of main; adjacent PRs landing
   produces natural drift.
2. "CLAUDE.md-level rule" cite shape is undisciplined — Otto-NNN IDs
   live in memory files; CLAUDE.md has the rule shapes. Fix template
   for any factory-rule cross-reference.
3. Runner-matrix vs current-truth drift is recurring; research docs
   need explicit "post-#NNN landing" annotations.
4. Otto-114 forward-mirror landing is a high-leverage substrate
   improvement — converts memory-file dangling-citation findings from
   re-fix-required to verify-and-resolve.

* drain(#444 follow-up): correct Otto-248 memory file path in #377 drain-log

Codex P1 caught that the memory file path cited in #377's drain-log
doesn't exist; the actual file has a longer path.

This was a fix-induced citation error inherited from #377's research
doc (which used the same wrong abbreviated path). Both #377 and
#444 needed correction — landed paired (#377 force-pushed earlier
this tick, #444 corrected here). The drain-log inherited the wrong
citation from the research doc it was logging.

* drain(#444 follow-up): fix count mismatches + portable grep in #377 drain-log

Multiple Codex/Copilot threads on #444 caught:

- L16: '3 were Otto-279' → '2 were Otto-279' (matches body's
  Threads C1-C2 = 2 OTTO-279 SURFACE-CLASS).
- L22: 'Outcome distribution: 4 OTTO-279' → '2 OTTO-279 + 2 dups'
  (matches L161 final-resolution math: 4 + 5 + 2 + 2 dups = 13).
- L56: Thread A3 'Copilot P1 ×2' → 'Copilot P1 ×3' (3 thread IDs
  listed: ejy1 + eenN + eenr).
- L87: non-portable `grep -i "actionlint\|shellcheck"` → portable
  `grep -iE "actionlint|shellcheck"` (BSD/macOS grep doesn't
  support `\|` BRE alternation; the `-E` extended-regex form is
  POSIX-portable). Captured the rationale inline so the verification
  command actually works on macOS.

Same count-vs-list cardinality pattern (Class B in PR #465 doc-lint
suite BACKLOG row) — third drain-log of mine to exhibit it (after
#195 and #231). The shellcheck-rule-precision class also surfaces
via the `\|` portability finding (related to SC2086-vs-SC2046 from
#427 drain-log).

* hygiene(#444): reconcile 377 drain-log outcome distribution math

Codex P2 + Copilot both caught: header said '4 FIX + 2 dups' but
Section A enumerates 6 FIX thread-IDs (A1×1 + A2×2 + A3×3) and
Section B enumerates 5 STALE thread-IDs (B5 explicit dup of B3).
Header didn't match the per-section enumeration end-to-end; intro
prose ('3 were real-fix factual corrections' + '2 were combined')
disagreed with the header in turn.

Pick a single counting rule (by thread-ID) and apply it
consistently:
- 6 FIX (3 unique findings, 3 duplicate reviewer threads on the
  same fixes — combined into one fix commit c8d91b5)
- 5 STALE-RESOLVED-BY-REALITY (4 unique + 1 dup B5≡B3)
- 2 OTTO-279
- = 13 thread-IDs covering 9 unique findings

Fix header + intro prose + final-resolution all to match this
single rule. The 'unique findings' count (9) is preserved in
parentheses for cross-reference.
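
The reconciliation above is simple arithmetic, and a check like the following would confirm that a header total agrees with the per-class tallies under each counting rule. The counts are taken from the commit message; the function name `tally_total` and the dictionary layout are hypothetical.

```python
def tally_total(per_class: dict[str, int]) -> int:
    """Sum per-outcome-class counts under one counting rule."""
    return sum(per_class.values())


# Counts by thread-ID, as stated in the commit message above.
by_thread_id = {"FIX": 6, "STALE-RESOLVED-BY-REALITY": 5, "OTTO-279": 2}
# Counts by unique finding (duplicate reviewer threads collapsed).
by_unique_finding = {"FIX": 3, "STALE-RESOLVED-BY-REALITY": 4, "OTTO-279": 2}

assert tally_total(by_thread_id) == 13
assert tally_total(by_unique_finding) == 9
```

Keeping the two rules in separate dictionaries avoids mixing them in one header, which is how the earlier header, intro prose, and section enumerations came to disagree.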
AceHack added a commit that referenced this pull request Apr 25, 2026
…ty map) (#445)

* hygiene(#268): pr-preservation drain-log for #231 (Codex CLI capability map)

Otto-268 backfill: drain-log for PR #231 — textbook case of the
post-merge reviewer-cascade pattern. 9 threads drained across 4 waves
(2 + 1 + 3 + 2 + 1 cascade pattern); every commit triggered a fresh
Codex/Copilot review wave catching new factual issues against the
freshly-changed surface.

Per Otto-250 training-signal discipline. Pattern observations capture
four load-bearing patterns:
1. Post-merge reviewer-cascade as dominant pattern; wave-by-wave the
   findings shift class (structural → rendering → internal-consistency
   → version-currency).
2. Codex enforces version-currency on the doc itself — Wave 4
   reclassifications cite OpenAI release notes (Sept 15 2025 + March 26
   2026); reviewer-enforces-rule pattern is the inverse of CLAUDE.md
   author-side version-currency rule.
3. "Partial (narrowing)" status annotation as a useful sub-state for
   gaps that are shrinking on measurable schedule.
4. Discriminator-falsification finding pattern (AGENTS.md-read
   test relying on values repeated in same doc) — same shape as
   randomized-canary in security testing.

* drain(#445 follow-up): fix thread/wave count mismatches in #231 drain-log

Multiple Copilot threads on #445 caught count-mismatch errors:

- Header claimed "9 across 4 waves (2 + 1 + 3 + 2 + 1 cascade)" —
  the "+1 cascade" doesn't correspond to any Wave 5 section in the
  body. Body has exactly 4 waves with 2+1+3+2=8 thread-sections.
- Final-resolution claimed "All 9 threads resolved" — should be 8.

Fix: aligned both header + final-resolution to the body's actual
8-thread-across-4-waves structure. Wave breakdown explicit
(Wave 1: 2, Wave 2: 1, Wave 3: 3, Wave 4: 2 — sum 8) so the math
is verifiable from the heading.

Same count-vs-list cardinality pattern documented in `_patterns.md`
(Class B in PR #465 BACKLOG row) — the doc-lint suite would catch
this at author-time.
Two Copilot P2 catches on citation auditability:

- L179: TodoWrite row cites 'OpenAI's Introducing upgrades to Codex
  post, Sept 15 2025' but Reference section had no link entry. Add
  inline link to https://openai.com/index/introducing-upgrades-to-codex/
  so the claim is auditable over time.
- L188: '#15211' was unqualified (which tracker?). Change to the
  fully-qualified [openai/codex#15211] with link to
  openai/codex#15211.

External-source-verifiability-gap pattern per
docs/pr-preservation/_patterns.md.
AceHack added a commit that referenced this pull request Apr 25, 2026
…449)

* hygiene(#268+): pr-preservation drain-log for #427 (#133 follow-up)

Otto-268 follow-on: drain-log for the post-merge cascade PR #427
following parent #133 (research: secret-handoff protocol options).

Per Otto-250 training-signal discipline. Captures two specific
fixes from the cascade wave:

1. **Shellcheck SC2086 → SC2046 correction**: prior rationale cited
   the wrong shellcheck rule. SC2046 covers unquoted command
   substitution `$(...)`; SC2086 covers unquoted variable expansion
   `$var`. Pre-commit-lint candidate: regex check on shellcheck
   SC-NNNN claims against the actual rule applying to cited code
   shape.

2. **Status-banner truth-update**: doc-claim-staleness during review
   window, same class as #135 DORA canonical definitions and #231
   Wave-4 version-currency reclassifications.

Pattern observation: drain follow-ups for substantive PRs are
themselves often small + targeted; substantive technical content
gets first-wave attention, small cleanups land as separate
follow-ups when they don't gate merge.

* hygiene(#449): reflow maintainer-asleep across line break

Copilot P2 catch: previous wrap split "maintainer-asleep" mid-token
as "maintainer-" / "asleep" which renders with extra space ("maintainer-
asleep") in Markdown. Reflow so the hyphenated compound stays on a
single line.

Class A pattern (inline-code-span / hyphen line-wrap) per
docs/pr-preservation/_patterns.md.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32f1663e07

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated
Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated
…te §5 consistency

Two Codex post-merge cascade catches:

- L303 P1 (discriminator self-reference, recurrence of earlier
  Cursor finding): the AGENTS.md-ingestion test was non-causal
  because the proposed discriminator (the build-and-test command
  pair) was quoted inline in this doc, so reading the research
  doc would suffice — false-positive readiness signal. Replace
  with structural reference only ('the build-gate section of
  AGENTS.md') + explicit instruction that the evaluator (not the
  doc) holds the canonical answer string. The discriminator
  surface no longer names any property/file/phrase that appears
  in this doc, so the only way to satisfy the prompt is to
  actually read AGENTS.md. Same shape as Otto-231's earlier
  discriminator-falsification finding.

- L260 P2: §5 still treated TodoWrite as 'analogue unclear', but
  the parity matrix and roll-up classify it as Parity (different
  shape) per OpenAI's Sept 15 2025 announcement. Reconcile §5 to
  match the matrix so Stage-2 prioritization is reproducible from
  any section.
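
The self-reference problem in the first catch can be checked mechanically.
The sketch below assumes the evaluator holds the canonical answer string
privately; `discriminator_leaks` is a hypothetical helper, and the comparison
is a simple whitespace-normalised, case-insensitive substring test.

```python
def _normalise(text: str) -> str:
    """Collapse whitespace runs and lowercase, so line wraps do not hide a match."""
    return " ".join(text.split()).lower()


def discriminator_leaks(canonical_answer: str, doc_text: str) -> bool:
    """Return True if the evaluator's answer string appears in the doc text."""
    answer = _normalise(canonical_answer)
    if not answer:
        raise ValueError("canonical answer must be non-empty")
    return answer in _normalise(doc_text)
```

If this returns True for the research doc, reading the doc alone would satisfy
the prompt, and the test would no longer show that AGENTS.md was ingested.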
Copilot AI review requested due to automatic review settings April 25, 2026 08:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b80554fac0

Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated
Comment thread docs/research/codex-cli-first-class-2026-04-23.md Outdated
CI markdownlint job was failing on this PR with 11 errors:
- MD032 (blanks-around-lists) at lines 36, 42, 50, 69, 198 — bold
  intro lines (**Install:** / **Authentication:** / **Key
  surfaces:** / **Config surface:** / **Running gap score**)
  immediately followed by list items with no separating blank
  line. Add blank lines.
- MD029 (ol-prefix) at lines 253, 258, 267, 274-276 — ordered
  list items in §5 numbered 2-4 (Important) then 5-7
  (Nice-to-have) across heading breaks; markdownlint sees each
  block as a new list that should restart at 1. Renumber to 1-3
  in each block; priority ordering is preserved via the bold
  sub-heading context.

These are pre-existing failures, not introduced by my drain-fixes,
but they block CI auto-merge, so they are worth fixing here.
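
For illustration, the MD032 half of this fix (a blank line before a list that
directly follows a bold intro line) could be sketched as below.
`add_blank_before_lists` is a hypothetical helper, not how the fix was
actually applied; it ignores fenced code blocks and does not handle the
blank-line-after-list case.

```python
import re
from typing import List

# Bullet (-, *, +) or ordered (1. / 1)) list marker at the start of a line.
_LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def add_blank_before_lists(lines: List[str]) -> List[str]:
    """Insert a blank line before any list that directly follows text."""
    out: List[str] = []
    in_list = False
    for line in lines:
        is_item = bool(_LIST_ITEM.match(line))
        if is_item and not in_list and out and out[-1].strip():
            out.append("")  # separate the list from the preceding text
        if is_item:
            in_list = True
        elif not line.strip() or not line.startswith((" ", "\t")):
            in_list = False  # blank line or unindented text ends the list
        out.append(line)
    return out
```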

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

…exec probe

Two Codex post-merge cascade catches:

- L189 P1: cron-row was 'Likely gap (not documented)' but Codex
  Cloud has a documented thread-automations primitive at
  developers.openai.com/codex/app/automations covering custom
  cron syntax + minute-based heartbeat + daily/weekly schedules.
  Verified via WebSearch (April 2026 docs current). Reclassify
  to Partial (different surface) — local CLI doesn't expose it,
  Codex Cloud does. Update gap-score totals: was 11/4/3/2 with
  cron as critical; now 11/5/2/2 with cron reachable via
  cloud-thread surface. Update §3 'biggest single gap' prose +
  §5 'critical' → 'high-priority (reframed)' section.
  Verify-version-currency rule applied (CLAUDE.md memory feedback).

- L335 P2: Stage-2 `codex exec` probe used 'list the top 5 open
  PRs on LFG' which couples to GitHub access — failures from
  missing creds / repo visibility / network policy would look
  like exec parity failures. Replace with repo-local probe
  ('count the .fs files under src/Core/ and report the count
  and the longest filename') that exercises exec semantics
  without external dependencies. Same shape as the Otto-231
  discriminator-falsification class: probe surface must isolate
  the property under test from confounders.
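
As a reference for what the repo-local probe should report, here is a small
sketch of the same computation. `probe_fs_files` is a hypothetical name, and
the sketch assumes "under src/Core/" means a recursive search; ties on the
longest filename are broken alphabetically.

```python
from pathlib import Path
from typing import Optional, Tuple


def probe_fs_files(root: Path) -> Tuple[int, Optional[str]]:
    """Count .fs files under root and return the longest filename (or None)."""
    names = sorted(p.name for p in root.rglob("*.fs") if p.is_file())
    if not names:
        return 0, None
    # max() keeps the first maximal element, so sorted order breaks ties.
    return len(names), max(names, key=len)
```

Running `probe_fs_files(Path("src/Core"))` from the repo root gives the
evaluator an answer to compare against the `codex exec` output, with no
dependency on GitHub credentials or network access.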

Cron classification fix is a substantive parity-research
correction; it bumps Otto-in-Codex viability from
'critical-gap-blocker' to 'reachable-via-different-surface'.
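
The gap-score update can be sanity-checked with a few lines. This is only an
arithmetic sketch: the category keys follow the parity / partial / gap /
Codex-specific order used in the PR summary, and `reclassify` is a
hypothetical helper.

```python
from typing import Dict


def reclassify(score: Dict[str, int], src: str, dst: str, n: int = 1) -> Dict[str, int]:
    """Move n rows from one category to another, keeping the total fixed."""
    if score[src] < n:
        raise ValueError(f"cannot move {n} rows out of {src}")
    updated = dict(score)
    updated[src] -= n
    updated[dst] += n
    return updated


before = {"parity": 11, "partial": 4, "gap": 3, "codex_specific": 2}
# The cron row moves from gap to partial (different surface).
after = reclassify(before, "gap", "partial")
```

The row total is unchanged by the move, so a total that drifts between
revisions of the doc would point to a row that was added or dropped rather
than reclassified.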
@AceHack AceHack merged commit 1c2b64c into main Apr 25, 2026
13 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2b09fdc85

Comment thread docs/research/codex-cli-first-class-2026-04-23.md
Comment thread docs/research/codex-cli-first-class-2026-04-23.md
Comment thread docs/research/codex-cli-first-class-2026-04-23.md