
tools: mark Codex loop headless provenance #3051

Merged
AceHack merged 3 commits into main from claim/codex-loop-origin-marker-20260513
May 13, 2026

@AceHack AceHack commented May 13, 2026

Summary

  • stamp Codex launchd loop heartbeats/state with origin, surface, and run id
  • pass headless provenance env vars into spawned codex exec
  • require background PR bodies and commits to carry searchable provenance markers
  • document the foreground-vs-headless distinction and cover it with focused tests
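The first two summary bullets describe stamping loop state with provenance and threading the same markers into the environment of the spawned codex exec. A minimal TypeScript sketch of that pattern follows. It is not the PR's codexLoopEnv(): the LoopProvenance shape, the ZETA_CODEX_LOOP_ORIGIN / _SURFACE / _RUN_ID / _SESSION variable names, and both helper names are hypothetical, chosen only to echo the ZETA_CODEX_LOOP_* prefix used in the verification command.

```typescript
// Hypothetical provenance fields; the real constants in
// .codex/bin/codex-loop-tick.ts may be named and valued differently.
interface LoopProvenance {
  origin: string; // e.g. "launchd-loop" (assumed value)
  surface: string; // e.g. "codex-exec" (assumed value)
  runId: string;
  session: string;
}

type Env = Record<string, string | undefined>;

// Copy the parent environment and add provenance markers so a spawned
// `codex exec` inherits them. Variable names are assumptions.
function codexLoopEnvSketch(base: Env, p: LoopProvenance): Env {
  return {
    ...base,
    ZETA_CODEX_LOOP_ORIGIN: p.origin,
    ZETA_CODEX_LOOP_SURFACE: p.surface,
    ZETA_CODEX_LOOP_RUN_ID: p.runId,
    ZETA_CODEX_LOOP_SESSION: p.session,
  };
}

// One heartbeat line stamped with the same fields, so heartbeats, state,
// and the child environment all agree on which run produced them.
function heartbeatLineSketch(p: LoopProvenance, now: Date): string {
  return JSON.stringify({ ts: now.toISOString(), ...p });
}
```

The returned object could be handed to a child process, for example as the env option of Node's child_process.spawn or of Bun.spawn. Copying the parent environment first keeps PATH and credentials intact while adding the markers.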

Verification

  • bun test tools/codex-loop-tick.test.ts
  • bun run typecheck
  • bun run lint:markdown .codex/AGENTS.md docs/CODEX-HARNESS-NOTES.md docs/claims/codex-loop-origin-marker-20260513.md
  • bunx prettier --check .codex/bin/codex-loop-tick.ts tools/codex-loop-tick.test.ts .codex/AGENTS.md docs/CODEX-HARNESS-NOTES.md docs/claims/codex-loop-origin-marker-20260513.md
  • ZETA_CODEX_LOOP_WORKTREE="$PWD" ZETA_CODEX_LOOP_STATE_DIR=<tmp> ZETA_CODEX_LOOP_LOG_DIR=<tmp> ZETA_CODEX_LOOP_DRY_RUN=1 bun .codex/bin/codex-loop-tick.ts

Note: an accidental full-repo bun run lint:typescript ... invocation expanded to eslint . and failed on pre-existing lint debt unrelated to this change; the focused TypeScript test and typecheck passed.

AceHack and others added 3 commits May 13, 2026 18:45
Co-Authored-By: Codex <noreply@openai.com>
Co-Authored-By: Codex <noreply@openai.com>
Co-Authored-By: Codex <noreply@openai.com>
Copilot AI review requested due to automatic review settings May 13, 2026 22:52
@AceHack AceHack enabled auto-merge (squash) May 13, 2026 22:53

Copilot AI left a comment


Pull request overview

Adds machine-readable headless provenance markers (origin, surface, run id, session) to the macOS launchd Codex loop so background-loop work can be distinguished from foreground Codex chat work in heartbeats, prompts, PR bodies, and commit trailers.

Changes:

  • Export codexLoopEnv() and extend buildCodexPrompt() with run id / origin / surface / session; stamp heartbeat JSON and heartbeat log line with these fields; pass provenance env vars into spawned codex exec.
  • Document the headless-vs-foreground distinction and the Headless-* / Codex-* trailer conventions in .codex/AGENTS.md and docs/CODEX-HARNESS-NOTES.md.
  • Add focused unit-test coverage for the new codexLoopEnv helper and the new prompt provenance text.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| .codex/bin/codex-loop-tick.ts | Adds loop origin/surface/session/runId constants, codexLoopEnv helper, threads provenance into prompt, heartbeat JSON, heartbeat log, codex-state, and the spawned codex env. |
| tools/codex-loop-tick.test.ts | Covers the new helper export and asserts the headless prompt contains origin/surface/run-id markers and trailer strings. |
| docs/CODEX-HARNESS-NOTES.md | Documents the new env vars, PR-body footer, and commit-trailer convention; reflows the launchd field table. |
| .codex/AGENTS.md | Documents Codex-Origin / Codex-Surface / Codex-Loop-Run-Id trailer convention for headless commits. |
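The Codex-Origin / Codex-Surface / Codex-Loop-Run-Id trailer convention lends itself to being both produced and checked mechanically. The sketch below is a hypothetical illustration, not code from this PR: buildProvenanceTrailers and missingProvenanceTrailers are invented names, it assumes one `Key: value` line per trailer, and it searches the whole message, which is looser than Git's own rule that trailers sit in the final paragraph. The Headless-* trailers mentioned in the overview are not modeled.

```typescript
// Trailer keys as listed for .codex/AGENTS.md.
const PROVENANCE_TRAILERS = ["Codex-Origin", "Codex-Surface", "Codex-Loop-Run-Id"] as const;

interface TrailerValues {
  origin: string;
  surface: string;
  runId: string;
}

// Render the trailer block to append after a blank line in a commit message.
function buildProvenanceTrailers(v: TrailerValues): string {
  return [
    `Codex-Origin: ${v.origin}`,
    `Codex-Surface: ${v.surface}`,
    `Codex-Loop-Run-Id: ${v.runId}`,
  ].join("\n");
}

// Return the keys with no non-empty "Key: value" line anywhere in the message.
function missingProvenanceTrailers(message: string): string[] {
  return PROVENANCE_TRAILERS.filter(
    (key) => !new RegExp(`^${key}:[ \\t]*\\S`, "m").test(message),
  );
}
```

In a pre-push hook or CI job, a non-empty result from missingProvenanceTrailers would flag a headless commit that lacks its searchable markers.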

@AceHack AceHack merged commit e2c306e into main May 13, 2026
29 checks passed
@AceHack AceHack deleted the claim/codex-loop-origin-marker-20260513 branch May 13, 2026 22:55
AceHack added a commit that referenced this pull request May 13, 2026
…competing PR closed

Records: PR #3053 (B-0444 collision) merged; PR #3051 (Codex
provenance) merged; competing PR #3052 closed with substrate-honest
comment; new audit tool finds 12 additional ID collisions on main
(1 three-way: B-0409). PR #3056 ships the tool + B-0451 row tracking
the per-collision cleanup work.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 13, 2026
…p row (#3056)

* feat(bg/audit): duplicate-row-id audit tool + B-0451 substrate-cleanup row

While resolving the B-0444 ID collision (PR #3053), an inline audit
surfaced 12 ADDITIONAL duplicate-ID groups across the backlog
directory. Silently overwritten substrate state is a high-severity
hygiene risk: a consumer of `id: B-0409` gets one of THREE files
depending on load order, and every other substrate consumer's implicit
primary-key guarantee is broken.

Changes:

- `tools/bg/audit-duplicate-row-ids.ts` — new audit tool: walks
  `docs/backlog/**/*.md` via `git ls-files`, extracts each frontmatter
  `id:` value, reports any ID appearing in >1 file. Exit code 0 = clean;
  1 = duplicates found.
- `tools/bg/audit-duplicate-row-ids.test.ts` — 14 tests covering id
  extraction, group sorting, real-world patterns (clean substrate,
  pair collision, triple collision, missing-id row skip, sub-row IDs,
  unreadable-file resilience).
- `docs/backlog/P1/B-0451-duplicate-row-id-substrate-cleanup-2026-05-13.md`
  — tracks the cleanup work: lists all 12 collisions, classifies them
  into two patterns (cross-priority namespace bleed + within-priority
  concurrent decomposition), defines the per-collision resolution rule
  (keep the row with external references; renumber the other), and
  outlines CI-wiring as future work.
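The audit described above reduces to "parse one id per row, group by id, report groups larger than one". A minimal sketch, assuming each row opens with a `---`-delimited YAML frontmatter block holding a single-line `id:` field. It takes an in-memory Map of path to file text instead of shelling out to `git ls-files`, and `extractFrontmatterId` / `findDuplicateIds` are illustrative names, not the tool's exports.

```typescript
// Return the `id:` value from a row's leading `---` frontmatter block,
// or null when there is no frontmatter or no id line (assumed layout).
function extractFrontmatterId(text: string): string | null {
  const fm = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  if (!fm) return null;
  const id = /^id:[ \t]*["']?([^"'\s]+)["']?[ \t]*\r?$/m.exec(fm[1] ?? "");
  return id?.[1] ?? null;
}

interface DuplicateGroup {
  id: string;
  files: string[];
}

// Group paths by id and keep ids claimed by more than one file. Sorting both
// the groups and the paths inside them keeps the report stable.
function findDuplicateIds(rows: Map<string, string>): DuplicateGroup[] {
  const idToFiles = new Map<string, string[]>();
  for (const [path, text] of rows) {
    const id = extractFrontmatterId(text);
    if (id === null) continue; // rows without an id are skipped
    idToFiles.set(id, [...(idToFiles.get(id) ?? []), path]);
  }
  return [...idToFiles]
    .filter(([, paths]) => paths.length > 1)
    .map(([id, paths]) => ({ id, files: [...paths].sort() }))
    .sort((a, b) => a.id.localeCompare(b.id));
}
```

A CLI wrapper would then exit 0 when the returned array is empty and 1 otherwise, matching the exit-code contract stated for the tool.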

Empirical findings:
- 559 rows scanned
- 12 collision groups (1 three-way: B-0409; 11 pairs)
- Most pairs are P1-vs-P2 cross-priority bleed (Otto-Desktop vs
  parallel agents filing in overlapping ranges)
- The B-0090.x sub-rows show a within-priority decomposition race
  (Riven's atomic-children sweep vs earlier B-0090 decomposition,
  both 2026-05-10/11)

Tests: 14/14 pass on `tools/bg/audit-duplicate-row-ids.test.ts`.

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2317Z — duplicate-ID audit tool surfaces 12 collisions; competing PR closed

Records: PR #3053 (B-0444 collision) merged; PR #3051 (Codex
provenance) merged; competing PR #3052 closed with substrate-honest
comment; new audit tool finds 12 additional ID collisions on main
(1 three-way: B-0409). PR #3056 ships the tool + B-0451 row tracking
the per-collision cleanup work.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(bg/audit): address Copilot round-1 review on PR #3056

Three Copilot findings resolved:

1. Missing `// eslint-disable-next-line sonarjs/no-os-command-from-path`
   on the `spawnSync("git", ...)` invocation. Added the suffixed-
   rationale comment matching the form used in
   `tools/bg/backlog-ready-notifier.ts`.

2. `rowsScanned` was misleadingly named — it counted only rows with
   an extractable `id:` field, not total files inspected. The tests
   already asserted the smaller count; the CLI's "X rows scanned"
   message therefore under-reported. Renamed to `rowsWithId` and
   updated docstring + CLI message accordingly: "X rows with id
   field, no duplicate IDs".

3. Dead `idToFiles.size > 0` ternary: `reduce` with an initial value
   of 0 already returns 0 for an empty array. Simplified to a plain
   spread+reduce.
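Finding 3 rests on a detail of `Array.prototype.reduce`: with an explicit initial value it returns that value for an empty array, and only the no-initial-value form throws a `TypeError`, so a size guard in front of it is dead code. A hypothetical helper in the same spirit, using the renamed `rowsWithId` count from finding 2; the helper names and the map shape are assumptions, and the message wording follows the CLI text quoted above.

```typescript
// Hypothetical shape: frontmatter id -> files that declare it.
type IdToFiles = Map<string, string[]>;

// Total rows that carried an extractable `id:` field. The initial value 0
// is what makes an explicit `idToFiles.size > 0` check unnecessary.
function countRowsWithId(idToFiles: IdToFiles): number {
  return [...idToFiles.values()].reduce((n, files) => n + files.length, 0);
}

// Success line printed when no duplicates are found.
function successMessage(rowsWithId: number): string {
  return `${rowsWithId} rows with id field, no duplicate IDs`;
}
```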

Tests: 14/14 pass; the `rowsWithId` rename mechanically updates 3
test assertions. Audit tool still reports 12 collision groups on
origin/main (no behavior change beyond the cleaner output).

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2339Z — Copilot round-1 on PR #3056 addressed; 13 PRs merged this session

Records: PRs #3054 + #3055 (Otto-Desktop's shadow-log + archive)
merged. PR #3056 round-1 review surfaced 3 valid Copilot findings
(missing eslint-disable, misleading field name, dead ternary); all
fixed in 7444a05 + threads resolved. Both my PRs back to wait-ci,
threads-clear.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(bg/audit): surface readFileSync errors instead of silently skipping (Codex P2)

Codex P2 on PR #3056: `auditRowFiles` previously caught `readFileSync`
failures with a bare `continue;` — silently swallowing the error and
moving on. That created a false-negative path: if a backlog file was
unreadable (permission, IO error, race with concurrent fs ops), any
duplicate ID inside it never got checked, and the CLI could report
"no duplicate IDs" with the failure hidden.

Fix:

- New `ReadError = { file, reason }` type
- `AuditResult.readErrors: ReadError[]` accumulates per-file failures
  (preserves the original "continue scanning" behavior — see ALL
  problems, not abort on first)
- CLI surfaces read errors with a distinct heading + exits non-zero
  when ANY read error OR duplicate is present
- Success message only prints when both counts are 0
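The fix above follows an accumulate-then-decide pattern: record each failed read, keep scanning, and let the exit code reflect both failure kinds. A minimal sketch under stated assumptions: the file reader and id extractor are injected so it runs without a filesystem, and `auditRowFilesSketch` / `exitCodeFor` are hypothetical names. Only the `ReadError = { file, reason }` shape comes from the commit message.

```typescript
interface ReadError {
  file: string;
  reason: string;
}

interface AuditResultSketch {
  rowsWithId: number;
  duplicates: Array<{ id: string; files: string[] }>;
  readErrors: ReadError[];
}

// Scan every file. A failed read is recorded and the loop continues, so one
// unreadable row cannot mask duplicates elsewhere or abort the run.
function auditRowFilesSketch(
  files: string[],
  read: (file: string) => string,
  extractId: (text: string) => string | null,
): AuditResultSketch {
  const idToFiles = new Map<string, string[]>();
  const readErrors: ReadError[] = [];
  for (const file of files) {
    let text: string | undefined;
    try {
      text = read(file);
    } catch (err) {
      readErrors.push({ file, reason: err instanceof Error ? err.message : String(err) });
    }
    if (text === undefined) continue; // read failed and is already recorded
    const id = extractId(text);
    if (id === null) continue;
    idToFiles.set(id, [...(idToFiles.get(id) ?? []), file]);
  }
  const duplicates = [...idToFiles]
    .filter(([, paths]) => paths.length > 1)
    .map(([id, paths]) => ({ id, files: paths }));
  const rowsWithId = [...idToFiles.values()].reduce((n, paths) => n + paths.length, 0);
  return { rowsWithId, duplicates, readErrors };
}

// Non-zero when ANY read error or duplicate is present; 0 only if both are empty.
function exitCodeFor(result: AuditResultSketch): 0 | 1 {
  return result.readErrors.length > 0 || result.duplicates.length > 0 ? 1 : 0;
}
```

Keeping read failures in their own array, rather than folding them into the duplicate list, is what lets the CLI print them under a distinct heading.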

Tests updated (15/15 pass):

- Renamed "unreadable files are skipped without crashing" →
  "unreadable files surface as readErrors (Codex P2: don't silently
  skip)" + assertions on the readErrors[] shape
- Added "readErrors is empty when all files readable" to pin the
  zero-state contract

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2358Z — Codex P2 round-2 on PR #3056 (surface read errors); rate-limit-failed CI triaged

Records: PR #3056 CI failures triaged as GitHub API rate-limit exhaustion
during SARIF upload (not real bugs). Codex P2 round-2 finding addressed:
`auditRowFiles` now accumulates `readErrors[]` and CLI fails non-zero
on any read error or duplicate. 15/15 tests pass. Thread resolved.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>