Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6bd7b2a707
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
Adds the next substrate-claim-checker check-type to catch existence drift (claims that repo paths exist when they don’t), extending the tooling described in B-0170’s roadmap.
Changes:
- Add `check-existence.ts` Bun script to detect non-existent path claims (backticks + markdown links) with future-state exemptions.
- Add initial unit tests for the path-claim detection heuristics.
- Document v0.5 existence-drift behavior and usage in the tool README.
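To make the detection step above concrete, here is a minimal sketch of how path claims might be pulled from backtick spans and markdown link targets while skipping fenced code blocks. It is not the code in `check-existence.ts`: the function name `findPathClaimsSketch`, the `PathClaim` shape, and every regex are assumptions made for illustration.

```typescript
// Hypothetical sketch; names and regexes are illustrative, not from check-existence.ts.
interface PathClaim {
  line: number;               // 1-indexed line in the markdown source
  path: string;               // the path exactly as written
  kind: "backtick" | "link";
}

function findPathClaimsSketch(markdown: string): PathClaim[] {
  const claims: PathClaim[] = [];
  const lines = markdown.split("\n");
  let inFence = false;
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i];
    // A line opening with three backticks or three tildes toggles fence state.
    if (/^\s*(?:`{3}|~{3})/.test(text)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;

    // Backtick spans that contain a slash or end in a file extension.
    const tick = /`([^`\s]+)`/g;
    let m: RegExpExecArray | null;
    while ((m = tick.exec(text)) !== null) {
      if (/\/|\.[A-Za-z0-9]+$/.test(m[1])) {
        claims.push({ line: i + 1, path: m[1], kind: "backtick" });
      }
    }

    // Markdown link targets; URLs, anchors, and absolute paths are not relative claims.
    const link = /\[[^\]]*\]\(([^)\s]+)\)/g;
    while ((m = link.exec(text)) !== null) {
      const target = m[1];
      if (/^[a-z][a-z0-9+.-]*:/i.test(target)) continue; // has a scheme, e.g. https:
      if (target.charAt(0) === "#" || target.charAt(0) === "/") continue;
      claims.push({ line: i + 1, path: target, kind: "link" });
    }
  }
  return claims;
}
```

Tracking fence state line by line keeps the sketch simple, at the cost of ignoring nested or indented code blocks.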
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tools/substrate-claim-checker/check-existence.ts | New existence-drift checker implementation (path-claim detection + resolution strategy + CLI). |
| tools/substrate-claim-checker/check-existence.test.ts | New bun:test suite for helper functions (path heuristics + future-state detection + fence skipping). |
| tools/substrate-claim-checker/README.md | Documents the new v0.5 checker (what it catches, limitations, usage). |
Force-pushed from 6bd7b2a to 5bd1bf9.
Force-pushed from 5bd1bf9 to 9fa4459.
All 8 findings + the markdownlint failure addressed in latest force-push (9fa4459).
Retroactive eval still shows 7/49 drift rate (down from 8 pre-marker-expansion). All 38 tests pass. Resolving.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9fa4459484
Force-pushed from 9fa4459 to bddde70.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bddde70c29
Force-pushed from bddde70 to b8127f5.
All 5 new findings addressed in latest force-push (b8127f5).
All 38 tests pass. Resolving.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b8127f5497
Force-pushed from b8127f5 to d4fbe2f.
Both fixes in d4fbe2f.
38 tests still pass. Resolving.
Force-pushed from d4fbe2f to 5067e9a.
Both findings addressed in 5067e9a.
Updated test: replaced […]. 39 tests pass (was 38; added 1 new). Resolving.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5067e9ab24
Both findings addressed in latest force-push.
42 tests pass (was 39; added 3). Resolving.
Force-pushed from 5067e9a to cc60367.
Force-pushed from cc60367 to e6f78e0.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e6f78e03fb
…-drift sub-class)

Second sub-class of B-0170's 7-class taxonomy. Catches claims that a file or directory exists when it doesn't on disk.

**What it catches**:
- Backtick-quoted paths in markdown
- Markdown link targets (relative paths only)
- Cases where the path doesn't resolve to anything on disk

**Resolution discipline**: tries 3 candidate roots in priority order:
1. File's own directory (intra-dir cross-references)
2. Parent directory (bare-filename refs for files in subdirs)
3. Repository root (repo-relative paths)

Stops on first hit; only emits finding if NO root resolves.

**Future-state context detection**: claims marked future-state are exempt (proposed/planned/will-be/would-be/tbd/deferred/i'm-guessing/concretely-something-like/will-probably/etc.).

**Skipped automatically**: globs (*, ?, [...]), URLs, anchors, absolute paths, placeholders, fenced code blocks.

**Tests**: 17 new tests across looksLikePath / isFutureStateContext / findPathClaims (33 total in tools/substrate-claim-checker/, all pass).

**Multiple findings this session would have been caught**:
- PR #1280 B-0173 ground-truth recovery claimed `tools/git/hooks/` exists; reviewer flagged that it doesn't (B-0173 row deliverable)
- PR #1289 + #1290 review threads flagged similar existence-drift patterns

**Sanity check on real substrate**:
- alignment-frontier memo: clean (0 findings)
- B-0173 guess file (post-#1285 fix): 2 false-positives in calibration-delta tables (acceptable v0.5 limitation; documented)
- B-0166 guess file: 1 finding (proposed `tools/chat-events/replay.ts`)

**v0.5 known limitations** (documented in README):
- Calibration-delta tables citing path-forms as discussion topics may false-positive (mitigated but imperfect)
- Section-level future-state markers don't propagate to claims further down; use inline markers per claim or paragraph

**Out of scope (v0.6+)**:
- Tool-existence (e.g., "running `bun X` returns Y"): separate empirical-output drift sub-class
- URL existence (web fetches; not file-system)
- Convention drift, path-form drift, self-recursive drift: separate sub-classes per the 7-class taxonomy

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
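The "skipped automatically" rules in the commit message can be pictured as a single predicate. The sketch below is an assumption-laden illustration, not the real `looksLikePath`: the name `looksLikePathSketch` is hypothetical, and the placeholder rule in particular (angle brackets, braces, or an `XXXX` run) is a guess, since the source does not say how placeholders are recognized.

```typescript
// Hypothetical sketch of the skip rules; not the looksLikePath in check-existence.ts.
function looksLikePathSketch(s: string): boolean {
  if (s.length === 0) return false;
  if (/[*?\[\]]/.test(s)) return false;                 // globs: *, ?, [...]
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(s)) return false; // URLs such as https://...
  if (s.charAt(0) === "#") return false;                // anchors
  if (s.charAt(0) === "/") return false;                // absolute paths
  if (/[<>{}]|XXXX/.test(s)) return false;              // placeholders (assumed forms)
  // Whatever is left counts as a path if it has a slash or a file extension.
  return /\/|\.[A-Za-z0-9]+$/.test(s);
}
```

Putting the rejections first means ambiguous strings are dropped rather than reported, so the sketch errs toward missed findings over false positives.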
Force-pushed from e6f78e0 to 89f3b5f.
Round-7 (6 findings) addressed in 89f3b5f.
48 tests pass. Resolving.
…o BACKLOG.md index + replace B-0XXXX placeholder (#1306 post-merge findings) (#1309)

Three real findings from #1306 review (post-merge):
1. **P3 → P2**: per docs/BACKLOG.md taxonomy, P2 IS "research-grade". B-0174 is research-grade frontier-ability measurement. Initial filing in P3 was a category error. Moved file from docs/backlog/P3/ → docs/backlog/P2/, updated frontmatter priority, rewrote "Why P3" section as "Why P2" with promotion-to-P1 trigger conditions
2. **B-0XXXX placeholder → real refs**: replaced the placeholder with explicit references to the existing in-the-moment guesses: B-0173 (hook-authoring) + B-0172 (plugin-packaging) + B-0166 (chat-as-DBSP-event) under memory/architectural-intent-guesses/
3. **BACKLOG.md not regenerated**: added B-0174 entry to the P2 section between B-0172 and the P3 section header

Out of scope:
- The "review-cycle stats conflict with tick history" finding (PR #1306 thread #4) is debatable: the tick-history numbers evolved as the PR went through more rounds; the row's "19+ across 5 rounds" was accurate at write-time. Cumulative count is now 21+ findings across 7 rounds; the row will be updated when #1298 actually merges with the final convergence-signature

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Second sub-class implementation for B-0170 (substrate-claim-checker). Adds `check-existence.ts`, covering the existence-drift sub-class: claims that a file or directory exists when it doesn't.
Multiple findings this session would have been caught automatically:
- PR #1280 (B-0173 ground-truth recovery) claimed `tools/git/hooks/` exists; reviewer caught it manually
- PR #1289 + #1290 review threads flagged similar existence-drift patterns
Approach
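For each path claim, try 3 candidate roots in priority order:
1. File's own directory (intra-dir cross-references)
2. Parent directory (bare-filename refs for files in subdirs)
3. Repository root (repo-relative paths)
Resolution stops on the first hit; a finding is emitted only if no root resolves.
Future-state markers exempt the claim: `(proposed)`, `(planned)`, "would be", "will probably", "lower confidence", etc.
Skipped: globs, URLs, anchors, absolute paths, placeholders, fenced code blocks.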
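The approach above (future-state exemption first, then up to three candidate roots, stopping at the first hit) can be sketched as follows. This is an illustration under stated assumptions rather than the logic in `check-existence.ts`: all names (`resolveClaimSketch`, `Verdict`, `Exists`, `FUTURE_MARKERS`) are hypothetical, paths are treated as repo-relative with `/` separators, the marker list holds only the examples quoted here, and the existence check is injected so the sketch runs without touching the disk.

```typescript
// Hypothetical sketch; not the implementation in check-existence.ts.
// `exists` is injected (an assumption) so the logic is testable without disk access.
type Exists = (repoRelPath: string) => boolean;

type Verdict =
  | { status: "exempt" }
  | { status: "resolved"; at: string }
  | { status: "missing"; tried: string[] };

// Only the markers quoted in the PR description; the real list is longer ("etc.").
const FUTURE_MARKERS = ["(proposed)", "(planned)", "would be", "will probably", "lower confidence"];

function isFutureStateSketch(context: string): boolean {
  const lower = context.toLowerCase();
  return FUTURE_MARKERS.some((marker) => lower.indexOf(marker) !== -1);
}

// Directory part of a repo-relative path; "" stands for the repository root.
function dirOf(p: string): string {
  const i = p.lastIndexOf("/");
  return i < 0 ? "" : p.slice(0, i);
}

// Join and normalize, dropping "." and resolving ".." segments.
function joinPosix(base: string, rel: string): string {
  const out: string[] = [];
  for (const seg of (base + "/" + rel).split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return out.join("/");
}

function resolveClaimSketch(file: string, claim: string, context: string, exists: Exists): Verdict {
  if (isFutureStateSketch(context)) return { status: "exempt" };
  const own = dirOf(file);
  const roots = [own, dirOf(own), ""]; // own directory, parent directory, repo root
  const tried: string[] = [];
  for (const root of roots) {
    const candidate = joinPosix(root, claim);
    if (exists(candidate)) return { status: "resolved", at: candidate }; // stop on first hit
    tried.push(candidate);
  }
  return { status: "missing", tried };
}
```

Returning the list of tried candidates on a miss is a design choice of this sketch; it would let a finding show the reader every location that was checked.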
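Tests
17 new tests; 33 total in tools/substrate-claim-checker/ (all pass):
- `looksLikePath`: 7 tests
- `isFutureStateContext`: 5 tests
- `findPathClaims`: 5 tests
Sanity check on real substrate
- `feedback_alignment_frontier_*.md`: clean (0 findings)
- B-0173 guess file (post-#1285 fix): 2 false-positives in calibration-delta tables (acceptable v0.5 limitation; documented)
- B-0166 guess file: 1 finding (proposed `tools/chat-events/replay.ts`)
Known limitations (v0.5)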
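Documented in README:
- Calibration-delta tables citing path-forms as discussion topics may false-positive (mitigated but imperfect)
- Section-level future-state markers don't propagate to claims further down; use inline markers per claim or paragraph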
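Out of scope (v0.6+)
- Tool-existence (e.g., "running `bun X` returns Y"): separate empirical-output drift sub-class
- URL existence (web fetches; not file-system)
- Convention drift, path-form drift, self-recursive drift: separate sub-classes per the 7-class taxonomy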
🤖 Generated with Claude Code