tools: mark Codex loop headless provenance #3051
Merged
Co-Authored-By: Codex <noreply@openai.com>
Pull request overview
Adds machine-readable headless provenance markers (origin, surface, run id, session) to the macOS launchd Codex loop so background-loop work can be distinguished from foreground Codex chat work in heartbeats, prompts, PR bodies, and commit trailers.
Changes:
- Export `codexLoopEnv()` and extend `buildCodexPrompt()` with run id / origin / surface / session; stamp heartbeat JSON and heartbeat log line with these fields; pass provenance env vars into spawned `codex exec`.
- Document the headless-vs-foreground distinction and the `Headless-*` / `Codex-*` trailer conventions in `.codex/AGENTS.md` and `docs/CODEX-HARNESS-NOTES.md`.
- Add focused unit-test coverage for the new `codexLoopEnv` helper and the new prompt provenance text.
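The env-and-heartbeat stamping described in the first bullet could look roughly like the sketch below. This is not the code from `.codex/bin/codex-loop-tick.ts`, which is not shown on this page: the `LoopProvenance` shape, the `codexLoopEnvSketch` and `stampHeartbeat` names, the `ZETA_CODEX_LOOP_ORIGIN` / `_SURFACE` / `_RUN_ID` / `_SESSION` variable names, and all example values are assumptions made for illustration.

```typescript
// Sketch only: field names and env var names are hypothetical,
// not taken from .codex/bin/codex-loop-tick.ts.
interface LoopProvenance {
  origin: string; // which launcher started the run
  surface: string; // headless loop vs. foreground chat
  runId: string;
  session: string;
}

function codexLoopEnvSketch(
  base: Record<string, string | undefined>,
  provenance: LoopProvenance,
): Record<string, string> {
  const env: Record<string, string> = {};
  // Copy the parent environment, dropping unset entries.
  for (const key of Object.keys(base)) {
    const value = base[key];
    if (value !== undefined) env[key] = value;
  }
  // Provenance markers are written last so they override inherited values.
  env["ZETA_CODEX_LOOP_ORIGIN"] = provenance.origin;
  env["ZETA_CODEX_LOOP_SURFACE"] = provenance.surface;
  env["ZETA_CODEX_LOOP_RUN_ID"] = provenance.runId;
  env["ZETA_CODEX_LOOP_SESSION"] = provenance.session;
  return env;
}

// Stamp the same fields onto a heartbeat record so logs and env agree.
function stampHeartbeat(
  heartbeat: Record<string, unknown>,
  provenance: LoopProvenance,
): Record<string, unknown> {
  return { ...heartbeat, ...provenance };
}

// Example values are invented for this sketch.
const exampleProvenance: LoopProvenance = {
  origin: "launchd-loop",
  surface: "headless",
  runId: "run-0001",
  session: "session-a",
};
const exampleEnv = codexLoopEnvSketch(
  { PATH: "/usr/bin", ZETA_CODEX_LOOP_ORIGIN: "stale" },
  exampleProvenance,
);
console.log(exampleEnv["ZETA_CODEX_LOOP_ORIGIN"]); // prints "launchd-loop"
```

Taking the base environment as a parameter, rather than reading `process.env` inside the helper, keeps the function pure and easy to cover with a focused unit test.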
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| .codex/bin/codex-loop-tick.ts | Adds loop origin/surface/session/runId constants, codexLoopEnv helper, threads provenance into prompt, heartbeat JSON, heartbeat log, codex-state, and the spawned codex env. |
| tools/codex-loop-tick.test.ts | Covers the new helper export and asserts the headless prompt contains origin/surface/run-id markers and trailer strings. |
| docs/CODEX-HARNESS-NOTES.md | Documents the new env vars, PR-body footer, and commit-trailer convention; reflows the launchd field table. |
| .codex/AGENTS.md | Documents Codex-Origin / Codex-Surface / Codex-Loop-Run-Id trailer convention for headless commits. |
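As an illustration of the trailer convention in the last row, the sketch below formats the three trailer keys named above and appends them to a commit message. The helper names (`formatHeadlessTrailers`, `appendTrailers`) and the example values are hypothetical; the exact values the loop writes are not shown on this page. It also assumes the message has no existing trailer block, since trailers are conventionally read from the final paragraph of a commit message.

```typescript
// Trailer keys come from the PR description; helper names and values are made up.
function formatHeadlessTrailers(
  origin: string,
  surface: string,
  runId: string,
): string[] {
  const pairs: [string, string][] = [
    ["Codex-Origin", origin],
    ["Codex-Surface", surface],
    ["Codex-Loop-Run-Id", runId],
  ];
  return pairs.map(([key, value]) => {
    // A trailer value must stay on one line to remain machine-readable.
    if (/[\r\n]/.test(value)) throw new Error(`${key} must be a single line`);
    return `${key}: ${value}`;
  });
}

// Assumes `message` has no trailer block yet: the trailers become the
// final paragraph, separated from the body by one blank line.
function appendTrailers(message: string, trailers: string[]): string {
  const body = message.replace(/\s+$/, "");
  return `${body}\n\n${trailers.join("\n")}\n`;
}

const exampleTrailers = formatHeadlessTrailers(
  "launchd-loop",
  "headless",
  "run-0001",
);
const exampleMessage = appendTrailers(
  "tools: example headless commit\n",
  exampleTrailers,
);
console.log(exampleMessage);
```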
AceHack added a commit that referenced this pull request on May 13, 2026
…competing PR closed

Records: PR #3053 (B-0444 collision) merged; PR #3051 (Codex provenance) merged; competing PR #3052 closed with substrate-honest comment; new audit tool finds 12 additional ID collisions on main (1 three-way: B-0409). PR #3056 ships the tool + B-0451 row tracking the per-collision cleanup work.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on May 13, 2026
…p row (#3056)

* feat(bg/audit): duplicate-row-id audit tool + B-0451 substrate-cleanup row

While resolving the B-0444 ID collision (PR #3053), an inline audit surfaced 12 ADDITIONAL duplicate-ID groups across the backlog directory. Silently-overwriting substrate state is high-severity hygiene risk: a consumer of `id: B-0409` gets one of THREE files depending on load order; every other substrate consumer's implicit primary-key guarantee is broken.

Changes:
- `tools/bg/audit-duplicate-row-ids.ts` - new audit tool: walks `docs/backlog/**/*.md` via `git ls-files`, extracts each frontmatter `id:` value, reports any ID appearing in >1 file. Exit code 0 = clean; 1 = duplicates found.
- `tools/bg/audit-duplicate-row-ids.test.ts` - 14 tests covering id extraction, group sorting, real-world patterns (clean substrate, pair collision, triple collision, missing-id row skip, sub-row IDs, unreadable-file resilience).
- `docs/backlog/P1/B-0451-duplicate-row-id-substrate-cleanup-2026-05-13.md` - tracks the cleanup work: lists all 12 collisions, classifies them into two patterns (cross-priority namespace bleed + within-priority concurrent decomposition), defines the per-collision resolution rule (keep the row with external references; renumber the other), and outlines CI-wiring as future work.

Empirical findings:
- 559 rows scanned
- 12 collision groups (1 three-way: B-0409; 11 pairs)
- Most pairs are P1-vs-P2 cross-priority bleed (Otto-Desktop vs parallel agents filing in overlapping ranges)
- The B-0090.x sub-rows show a within-priority decomposition race (Riven's atomic-children sweep vs earlier B-0090 decomposition, both 2026-05-10/11)

Tests: 14/14 pass on `tools/bg/audit-duplicate-row-ids.test.ts`.

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2317Z - duplicate-ID audit tool surfaces 12 collisions; competing PR closed

Records: PR #3053 (B-0444 collision) merged; PR #3051 (Codex provenance) merged; competing PR #3052 closed with substrate-honest comment; new audit tool finds 12 additional ID collisions on main (1 three-way: B-0409). PR #3056 ships the tool + B-0451 row tracking the per-collision cleanup work.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(bg/audit): address Copilot round-1 review on PR #3056

Three Copilot findings resolved:
1. Missing `// eslint-disable-next-line sonarjs/no-os-command-from-path` on the `spawnSync("git", ...)` invocation. Added the suffixed-rationale comment matching the form used in `tools/bg/backlog-ready-notifier.ts`.
2. `rowsScanned` was misleadingly named - it counted only rows with an extractable `id:` field, not total files inspected. The tests already asserted the smaller count; the CLI's "X rows scanned" message therefore under-reported. Renamed to `rowsWithId` and updated docstring + CLI message accordingly: "X rows with id field, no duplicate IDs".
3. Dead `idToFiles.size > 0` ternary - `reduce` on an empty iterable already returns 0. Simplified to a plain spread+reduce.

Tests: 14/14 pass; the `rowsWithId` rename mechanically updates 3 test assertions. Audit tool still reports 12 collision groups on origin/main (no behavior change beyond the cleaner output).

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2339Z - Copilot round-1 on PR #3056 addressed; 13 PRs merged this session

Records: PRs #3054 + #3055 (Otto-Desktop's shadow-log + archive) merged. PR #3056 round-1 review surfaced 3 valid Copilot findings (missing eslint-disable, misleading field name, dead ternary); all fixed in 7444a05 + threads resolved. Both my PRs back to wait-ci, threads-clear.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(bg/audit): surface readFileSync errors instead of silently skipping (Codex P2)

Codex P2 on PR #3056: `auditRowFiles` previously caught `readFileSync` failures with a bare `continue;` - silently swallowing the error and moving on. That created a false-negative path: if a backlog file was unreadable (permission, IO error, race with concurrent fs ops), any duplicate ID inside it never got checked, and the CLI could report "no duplicate IDs" with the failure hidden.

Fix:
- New `ReadError = { file, reason }` type
- `AuditResult.readErrors: ReadError[]` accumulates per-file failures (preserves the original "continue scanning" behavior - see ALL problems, not abort on first)
- CLI surfaces read errors with a distinct heading + exits non-zero when ANY read error OR duplicate is present
- Success message only prints when both counts are 0

Tests updated (15/15 pass):
- Renamed "unreadable files are skipped without crashing" → "unreadable files surface as readErrors (Codex P2: don't silently skip)" + assertions on the readErrors[] shape
- Added "readErrors is empty when all files readable" to pin the zero-state contract

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2358Z - Codex P2 round-2 on PR #3056 (surface read errors); rate-limit-failed CI triaged

Records: PR #3056 CI failures triaged as GitHub-API-rate-limit exhaust during SARIF upload (not real bugs). Codex P2 round-2 finding addressed: `auditRowFiles` now accumulates `readErrors[]` and CLI fails non-zero on any read error or duplicate. 15/15 tests pass. Thread resolved.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
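The audit behavior these commit messages describe (extract each frontmatter `id:`, group files by ID, keep scanning past unreadable files while recording them, and exit non-zero on any duplicate or read error) can be sketched as below. This is an approximation under stated assumptions, not the code in `tools/bg/audit-duplicate-row-ids.ts`: the type and field names `ReadError`, `AuditResult`, `readErrors`, `rowsWithId`, and `auditRowFiles` come from the commit messages, while `extractRowId`, `exitCodeFor`, the injected `read` callback, the `duplicates` field, the regular expressions, and the in-memory example files are invented here. The real tool lists files via `git ls-files` and reads them with `readFileSync`.

```typescript
type ReadError = { file: string; reason: string };
type AuditResult = {
  duplicates: { id: string; files: string[] }[];
  rowsWithId: number;
  readErrors: ReadError[];
};

// Hypothetical extractor: returns the `id:` value from a leading
// `---` frontmatter block, or null when there is none.
function extractRowId(markdown: string): string | null {
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (frontmatter === null) return null;
  const idLine = /^id:\s*(\S+)\s*$/m.exec(frontmatter[1] ?? "");
  return idLine === null ? null : (idLine[1] ?? null);
}

// `read` is injected so the sketch runs without touching the filesystem.
function auditRowFiles(
  files: string[],
  read: (file: string) => string,
): AuditResult {
  const idToFiles = new Map<string, string[]>();
  const readErrors: ReadError[] = [];
  let rowsWithId = 0;
  for (const file of files) {
    let content: string;
    try {
      content = read(file);
    } catch (err) {
      // Record the failure and keep scanning instead of silently skipping.
      const reason = err instanceof Error ? err.message : String(err);
      readErrors.push({ file, reason });
      continue;
    }
    const id = extractRowId(content);
    if (id === null) continue; // rows without an id are not counted
    rowsWithId += 1;
    idToFiles.set(id, [...(idToFiles.get(id) ?? []), file]);
  }
  const duplicates = Array.from(idToFiles.entries())
    .filter(([, group]) => group.length > 1)
    .map(([id, group]) => ({ id, files: [...group].sort() }))
    .sort((a, b) => a.id.localeCompare(b.id));
  return { duplicates, rowsWithId, readErrors };
}

// Non-zero when any duplicate OR any read error is present.
function exitCodeFor(result: AuditResult): number {
  return result.duplicates.length > 0 || result.readErrors.length > 0 ? 1 : 0;
}

// In-memory stand-ins for backlog rows; paths and contents are invented.
const exampleFiles: Record<string, string> = {
  "docs/backlog/P1/a.md": "---\nid: B-0409\n---\nfirst",
  "docs/backlog/P2/b.md": "---\nid: B-0409\n---\nsecond",
  "docs/backlog/P2/c.md": "---\nid: B-0001\ntitle: ok\n---\nthird",
  "docs/backlog/P2/d.md": "no frontmatter here",
};
const exampleResult = auditRowFiles(
  [...Object.keys(exampleFiles), "docs/backlog/P3/missing.md"],
  (file) => {
    const content = exampleFiles[file];
    if (content === undefined) throw new Error("simulated read failure");
    return content;
  },
);
console.log(exampleResult.duplicates, exitCodeFor(exampleResult));
```

Returning `readErrors` alongside `duplicates`, rather than throwing on the first failure, matches the stated goal of seeing all problems in one run while still failing the check.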
Summary
`codex exec`
Verification
- `bun test tools/codex-loop-tick.test.ts`
- `bun run typecheck`
- `bun run lint:markdown .codex/AGENTS.md docs/CODEX-HARNESS-NOTES.md docs/claims/codex-loop-origin-marker-20260513.md`
- `bunx prettier --check .codex/bin/codex-loop-tick.ts tools/codex-loop-tick.test.ts .codex/AGENTS.md docs/CODEX-HARNESS-NOTES.md docs/claims/codex-loop-origin-marker-20260513.md`
- `ZETA_CODEX_LOOP_WORKTREE="$PWD" ZETA_CODEX_LOOP_STATE_DIR=<tmp> ZETA_CODEX_LOOP_LOG_DIR=<tmp> ZETA_CODEX_LOOP_DRY_RUN=1 bun .codex/bin/codex-loop-tick.ts`

Note: an accidental full-repo `bun run lint:typescript ...` invocation expanded to `eslint .` and failed on pre-existing unrelated repo lint debt; the focused TypeScript test and typecheck passed.