GROUND-TRUTH-RECOVERY: B-0173 calibration delta — Otto's first in-the-moment guess (mixed accuracy across layers)#1280

Merged
AceHack merged 1 commit into main from
free-memory/ground-truth-recovery-b-0173-hook-authoring-calibration-otto-2026-05-03
May 3, 2026

AceHack commented May 3, 2026

Summary

Per the guess-then-verify architectural-intent calibration protocol (PR #1278), this PR follows the prior in-the-moment guess (PR #1279) by recovering ground truth via direct read of B-0173's row body and recording the calibration delta.

This is the first complete calibration data point for the protocol — guess timestamped + committed BEFORE research, then ground truth recovered, then delta recorded.

Calibration result

| Layer | Score | Result |
|---|---|---|
| Architectural intent | 6/10 | PARTIAL-MATCH: got harness-native + separation-of-concerns; missed contract-based development / Design-by-Contract / OpenSpec primary frame |
| Substrate-content | 5/10 | MIXED: right path; right pre-commit hook; missed multi-hook architecture (commit-msg + CI on PR descriptions are separate surfaces) |
| Specific implementation | 3/10 | MOSTLY-OFF: confused git hooks with Claude Code's .claude/settings.json hook system (fundamentally different mechanisms) |
| Cross-row composition | 5/10 | Got B-0170 implicit; missed B-0171 (OpenSpec) as load-bearing contract source |

Pattern observed

Inference defaults to generalization-from-principle rather than specific-mechanism-recall.

  • Strong on principles (separation of concerns; harness-native; composition)
  • Weak on specifics (which hook system; which timing windows; which contract source)

For substrate-content + implementation specifics, principle-based inference is unreliable; specific-mechanism-research is needed.

Self-confidence calibration

Well-calibrated — high-confidence layer (architectural) scored highest; low-confidence layer (specific implementation) scored lowest. Confidence levels matched accuracy ordering. This is itself useful — Otto's confidence self-report is reliable.
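
The ordering check described above can be sketched in a few lines. This is only an illustration, not part of the calibration protocol's tooling: the scores are the four reported in this PR, but the numeric confidence ranks are assumptions, and the middle rank given to the two layers whose confidence is not stated here is hypothetical.

```typescript
// Hypothetical sketch: does self-reported confidence order agree with scores?
type Layer = { name: string; score: number; confidence: number };

// Scores come from this PR. Confidence ranks (3 = high, 1 = low) are assumed;
// the rank of 2 for the two middle layers is a placeholder, not reported data.
const layers: Layer[] = [
  { name: "architectural intent", score: 6, confidence: 3 },
  { name: "substrate-content", score: 5, confidence: 2 },
  { name: "specific implementation", score: 3, confidence: 1 },
  { name: "cross-row composition", score: 5, confidence: 2 },
];

// True when no layer with higher confidence has a lower score than
// a layer with lower confidence.
function confidenceMatchesAccuracy(ls: Layer[]): boolean {
  for (const a of ls) {
    for (const b of ls) {
      if (a.confidence > b.confidence && a.score < b.score) return false;
    }
  }
  return true;
}

console.log(confidenceMatchesAccuracy(layers));
```

With the values above the check passes; swapping the scores of the highest- and lowest-confidence layers would make it fail.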

What I missed (substantive)

  1. Contract-based development as primary frame — Aaron's verbatim "this feature is great for reminding yourself to do the right thing the pre conditions and post condtions in contract based development or spec based development like openspec" names DbC/OpenSpec as the load-bearing motivating frame, not just a benefit
  2. Multi-hook architecture — three hooks (pre-commit + commit-msg + CI workflow), each covering a different timing window for fact-claims (staged content / commit message / PR description)
  3. git hooks vs Claude Code hooks — fundamentally different mechanisms; I guessed the wrong one
  4. B-0171 (OpenSpec) as load-bearing dependency — without specs, hooks have no contracts to enforce
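
To make the missed multi-hook point concrete, here is a small sketch of the surface-to-hook mapping described in item 2. It models only what B-0173 proposes; the type and function names are invented for illustration and do not correspond to any file in the repository.

```typescript
// Hypothetical model of the three surfaces proposed in B-0173.
// Nothing here exists in the repo; names are illustrative only.
type Surface = "staged content" | "commit message" | "PR description";
type Hook = "pre-commit" | "commit-msg" | "CI workflow";

const hookForSurface: Record<Surface, Hook> = {
  "staged content": "pre-commit",
  "commit message": "commit-msg",
  "PR description": "CI workflow",
};

// Which fact-claim surfaces stay unchecked if only some hooks are present?
function uncoveredSurfaces(installed: Hook[]): Surface[] {
  return (Object.keys(hookForSurface) as Surface[]).filter(
    (surface) => !installed.includes(hookForSurface[surface]),
  );
}

console.log(uncoveredSurfaces(["pre-commit"]));
```

A guess that names only the pre-commit hook, as mine did, leaves the commit message and the PR description uncovered.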

Cross-model retroactive replay readiness

This calibration data point is now reproducible: give another model B-0173's row title only plus the same prior-substrate context, and see how its guess compares. Missing the contract-based-development frame is a genuine inference failure that other models can be tested against.

Test plan

  • Ground truth recorded with verbatim Aaron quote + 3-hook architecture + dependencies
  • Calibration delta computed across 4 layers (architectural / substrate-content / specific / cross-row)
  • Score per layer + analysis
  • Pattern observation captured for future-Otto

🤖 Generated with Claude Code

…-moment guess scored against actual row body (mixed accuracy across layers)

Per the guess-then-verify architectural-intent calibration protocol
(PR #1278; Aaron 2026-05-03), this commit follows the prior in-the-moment
guess (PR #1279, committed cf1dc7b 2026-05-03 ~02:42Z) by recovering
ground truth via direct read of B-0173's row body and recording the
calibration delta.

**Calibration result by layer:**

- Architectural intent: 6/10 PARTIAL-MATCH — got harness-native +
  separation-of-concerns; missed the contract-based development /
  Design-by-Contract / OpenSpec primary frame Aaron named verbatim
- Substrate-content: 5/10 MIXED — right path (tools/git/hooks/);
  right pre-commit hook; missed the multi-hook architecture
  (commit-msg + CI workflow on PR descriptions are separate surfaces)
- Specific implementation: 3/10 MOSTLY-OFF — confused git hooks with
  Claude Code's .claude/settings.json hook system (fundamentally
  different mechanisms); missed strict-vs-warn mode + per-check
  opt-out via comment markers
- Cross-row composition: 5/10 — got B-0170 (substrate-claim-checker)
  implicit; missed B-0171 (OpenSpec) as load-bearing contract source

**Pattern observed**: Inference defaults to generalization-from-principle
rather than specific-mechanism-recall. Strong on principles (separation
of concerns; harness-native; composition); weak on specifics (which
hook system; which timing windows; which contract source). For
substrate-content + implementation specifics, principle-based
inference is unreliable; specific-mechanism-research is needed.

**Self-confidence calibration**: well-calibrated — high-confidence layer
(architectural) scored highest; low-confidence layer (specific
implementation) scored lowest. Confidence levels matched accuracy
ordering.

**Cross-model retroactive replay readiness**: this calibration data
point is now reproducible — give another model B-0173's row title only
+ the same prior-substrate context, see how their guess compares.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 3, 2026 02:51
@AceHack AceHack enabled auto-merge (squash) May 3, 2026 02:51
@AceHack AceHack merged commit ea11617 into main May 3, 2026
24 of 25 checks passed
@AceHack AceHack deleted the free-memory/ground-truth-recovery-b-0173-hook-authoring-calibration-otto-2026-05-03 branch May 3, 2026 02:52

Copilot AI left a comment

Pull request overview

Records the recovered ground truth for the first “guess-then-verify” architectural-intent calibration data point (B-0173), and documents the resulting calibration delta across multiple inference layers.

Changes:

  • Populates the previously-empty “Ground truth” section by quoting and summarizing the B-0173 backlog row body.
  • Adds a structured “Calibration delta” section comparing the initial guess vs recovered ground truth.
  • Appends timestamps and recovery method details for reproducibility.

AceHack commented May 3, 2026

Both findings (P1 truth-drift) addressed in follow-up #1285. The recovery section conflated 'what B-0173 proposes' with 'what currently exists' — fix adds explicit '(proposed in B-0173 — does NOT yet exist)' qualifiers + '(not yet recognized by v0.4.4)' notes on env var + opt-out markers.

This was a substrate-claim-checker existence-drift class violation that should have been caught at write-time. v0.4.4 only covers count-drift; the same tool would catch this via the existence-drift sub-class check when v1+ adds it (per B-0170 follow-up).

Resolving — fix is in #1285 with auto-merge armed.

AceHack added a commit that referenced this pull request May 3, 2026
… section — clarify proposed-vs-current state (#1285)

#1280's review (post-merge) flagged P1 truth-drift: my recovery section
described B-0173's proposed hooks (pre-commit / commit-msg / CI workflow)
+ implementation details (env-var-mode-switch, opt-out comment markers)
in a way that read as if these files / features already existed.

They don't. B-0173 is an open backlog row; tools/git/hooks/ does not
exist on main; substrate-claim-checker v0.4.4 doesn't recognize the
env-var or opt-out markers — these are all B-0173 deliverables to be
implemented when the row is picked up.

Fix: explicit "(as PROPOSED in B-0173 — these files do NOT yet exist)"
qualifier on the substrate-content section header + "(proposed)" tags
on each of the three hook bullets + explicit note that env var + opt-out
markers are "not yet recognized by v0.4.4."

This is a substrate-claim-checker existence-drift class violation that
should have been caught at write-time. v0.4.4 covers only count-drift;
the same tool would catch this once the existence-drift sub-class check
is added (v1+, per B-0170 follow-up).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 3, 2026
…-cycle (6 findings, 2 substantive fixes) (#1286)

#1282 (guess #2) + #1280 (B-0173 recovery, post-merge) reviews
generated 6 findings. 2 P1 substantive fixes shipped (#1285
existence-drift on B-0173 recovery; MEMORY.md discoverability + grammar
on #1282). 4 clarified or resolved with reasoning.

Key insight: even calibration-recovery sections are subject to
substrate-claim-checker proposed-vs-current state discipline. This
existence-drift class violation is the kind B-0170 v1+ should catch at
write-time once the existence-drift sub-class is implemented.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 3, 2026
…-drift sub-class) (#1298)

Second sub-class of B-0170's 7-class taxonomy. Catches claims that a
file or directory exists when it doesn't on disk.

**What it catches**:

- Backtick-quoted paths in markdown
- Markdown link targets (relative paths only)
- Cases where the path doesn't resolve to anything on disk
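
As a rough illustration of the first two bullets, the sketch below pulls candidate path strings out of a single markdown line. It is not the checker's actual `findPathClaims`; the function name `extractCandidates` and both regular expressions are assumptions made for this example.

```typescript
// Hypothetical sketch: collect candidate path strings from one markdown line.
// The real tool's extraction rules may differ.
function extractCandidates(line: string): string[] {
  const found: string[] = [];
  const patterns = [
    /`([^`]+)`/g, // backtick-quoted span
    /\[[^\]]*\]\(([^)\s]+)\)/g, // markdown link target
  ];
  for (const re of patterns) {
    let m: RegExpExecArray | null;
    while ((m = re.exec(line)) !== null) found.push(m[1]);
  }
  return found;
}

console.log(extractCandidates("See `tools/git/hooks/` and [readme](docs/README.md)"));
```

Each candidate would then still need to pass the skip rules and be resolved against disk before it counts as a finding.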

**Resolution discipline**: tries 3 candidate roots in priority order:

1. File's own directory (intra-dir cross-references)
2. Parent directory (bare-filename refs for files in subdirs)
3. Repository root (repo-relative paths)

Stops on first hit; only emits finding if NO root resolves.
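
The three-root lookup can be sketched as follows. This is a minimal version under stated assumptions, not the shipped code: the function name `resolveClaim`, its parameters, and the injectable `exists` predicate are all invented here, and POSIX path handling is used so the example behaves the same on every platform.

```typescript
import { existsSync } from "node:fs";
import { posix as path } from "node:path";

// Hypothetical sketch of the 3-root resolution order described above.
// Returns the first candidate that exists, or null when no root resolves.
function resolveClaim(
  claim: string,
  sourceFile: string, // repo-relative path of the markdown file
  repoRoot: string,
  exists: (p: string) => boolean = existsSync,
): string | null {
  const ownDir = path.dirname(path.resolve(repoRoot, sourceFile));
  const roots = [ownDir, path.dirname(ownDir), repoRoot]; // priority order
  for (const root of roots) {
    const candidate = path.join(root, claim);
    if (exists(candidate)) return candidate; // stop on first hit
  }
  return null; // no root resolved: the caller would emit a finding
}

// Usage with an in-memory "disk"; the file names are made up.
const disk = new Set(["/repo/tools/substrate-claim-checker"]);
console.log(
  resolveClaim("tools/git/hooks", "memory/guess.md", "/repo", (p) => disk.has(p)),
);
```

Passing `exists` as a parameter keeps the lookup testable without touching the real file system.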

**Future-state context detection**: claims marked future-state are
exempt (proposed/planned/will-be/would-be/tbd/deferred/i'm-guessing/
concretely-something-like/will-probably/etc.).
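
A marker check along these lines might look like the sketch below. The marker list is taken from the text above; everything else is assumed, including the function name `mentionsFutureState`, the choice to treat hyphens as spaces, and plain substring matching. The real `isFutureStateContext` may work differently.

```typescript
// Marker phrases from the commit message above ("etc." entries omitted).
const FUTURE_MARKERS = [
  "proposed",
  "planned",
  "will be",
  "would be",
  "tbd",
  "deferred",
  "i'm guessing",
  "concretely something like",
  "will probably",
];

// Hypothetical sketch: true when the text around a claim carries a marker.
// Hyphens are normalized to spaces so "will-be" and "will be" both match.
function mentionsFutureState(context: string): boolean {
  const text = context.toLowerCase().replace(/-/g, " ");
  return FUTURE_MARKERS.some((marker) => text.includes(marker));
}

console.log(mentionsFutureState("`tools/git/hooks/` (proposed in B-0173)"));
console.log(mentionsFutureState("`tools/git/hooks/` contains the hooks"));
```

Substring matching is deliberately crude here; it is enough to show why a claim tagged "(proposed)" is exempt while an unqualified claim is not.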

**Skipped automatically**: globs (*, ?, [...]), URLs, anchors,
absolute paths, placeholders, fenced code blocks.
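
The skip rules could be approximated like this. It is a sketch under assumptions rather than the tool's `looksLikePath`: the function name `shouldSkip` is invented, and treating angle brackets or braces as the sign of a placeholder is a guess. Fenced code blocks need line-level state and are not covered.

```typescript
// Hypothetical sketch of the automatic skip rules listed above.
function shouldSkip(candidate: string): boolean {
  if (/[*?]|\[.*\]/.test(candidate)) return true; // glob characters
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(candidate)) return true; // URL with scheme
  if (candidate.startsWith("#")) return true; // in-page anchor
  if (candidate.startsWith("/")) return true; // absolute path
  if (/[<>{}]/.test(candidate)) return true; // placeholder (assumed form)
  return false;
}

for (const c of ["src/*.ts", "https://example.com/x", "#usage", "tools/git/hooks/"]) {
  console.log(c, shouldSkip(c));
}
```

Only candidates that survive these filters would go on to the on-disk resolution step.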

**Tests**: 17 new tests across looksLikePath / isFutureStateContext /
findPathClaims (33 total in tools/substrate-claim-checker/, all pass).

**Multiple findings this session would have been caught**:

- PR #1280 B-0173 ground-truth recovery claimed `tools/git/hooks/`
  exists; reviewer flagged that it doesn't (B-0173 row deliverable)
- PR #1289 + #1290 review threads flagged similar existence-drift
  patterns

**Sanity check on real substrate**:
- alignment-frontier memo: clean (0 findings)
- B-0173 guess file (post-#1285 fix): 2 false-positives in
  calibration-delta tables (acceptable v0.5 limitation; documented)
- B-0166 guess file: 1 finding (proposed `tools/chat-events/replay.ts`)

**v0.5 known limitations** (documented in README):

- Calibration-delta tables citing path-forms as discussion topics
  may false-positive (mitigated but imperfect)
- Section-level future-state markers don't propagate to claims
  further down; use inline markers per claim or paragraph

**Out of scope (v0.6+)**:

- Tool-existence (e.g., "running `bun X` returns Y") — separate
  empirical-output drift sub-class
- URL existence (web fetches; not file-system)
- Convention drift, path-form drift, self-recursive drift —
  separate sub-classes per the 7-class taxonomy

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 3, 2026
…it hooks needed) (Aaron 2026-05-03) (#1312)

Two architectural insights from Aaron 2026-05-03 chat exchange:

**Insight 1 — DST is the empirical TS-over-bash quality justification**:
Aaron 2026-05-03: *"to back up my bash is lower quality claim i offer
the difficlut of proper Deterministic Simulation in bash vs ts, this
is where my quality assesment comes from."*

TS supports proper DST: typed inputs, deterministic outputs, controlled
randomness, mockable I/O, and structured assertions. Bash supports DST
poorly. This grounds the quality claim in something empirical rather
than in preference. Composes with Otto-272 DST-everywhere + B-0156 TS
standardization. When justifying TS over bash, cite DST capability; it
is a stronger argument than "bash is just lower quality."
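
To make the DST point concrete, here is a minimal sketch of what "controlled randomness" and "mockable I/O" can look like in TS. Nothing here comes from the Zeta codebase: `SimEnv`, `makeSimEnv`, and `backoffSchedule` are hypothetical names, and mulberry32 is simply a small, widely used seeded PRNG chosen for brevity.

```typescript
/** Everything nondeterministic is injected, so a test controls it completely. */
interface SimEnv {
  random: () => number; // uniform in [0, 1)
  now: () => number;    // simulated clock, in milliseconds
}

/** mulberry32: a compact seeded PRNG; the same seed always yields the same stream. */
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function makeSimEnv(seed: number, startMs = 0): SimEnv {
  return { random: mulberry32(seed), now: () => startMs };
}

/** Code under test: exponential backoff with jitter, using only the injected env. */
function backoffSchedule(env: SimEnv, attempts: number, baseMs = 100): number[] {
  const fireTimes: number[] = [];
  let t = env.now();
  for (let i = 0; i < attempts; i++) {
    const jitter = env.random() * baseMs;
    t += baseMs * 2 ** i + jitter;
    fireTimes.push(Math.round(t));
  }
  return fireTimes;
}

// Same seed, same schedule: a failing run can be replayed exactly.
const runA = backoffSchedule(makeSimEnv(42), 4);
const runB = backoffSchedule(makeSimEnv(42), 4);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // prints true
```

The equivalent in bash would need to stub `$RANDOM`, `date`, and every external command by hand, with no type checker to confirm that the stubs match what the script expects.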

**Insight 2 — vibe-coders always have a harness; harness hooks suffice;
git hooks are an antipattern**:
Aaron 2026-05-03: *"vibe coders will never be without a harness of
some kind"* + *"i don't think we need git hooks harness hooks are
good"* + *"many consider git hooks an antipatter, i tend to love
antipattern when they are used in the non antipatter way lol, i dont
know if we have any non antipatter use cases that harness hook
counld not handle but git hooks could."*.

Analysis: the non-antipattern git-hook use cases (server-side hooks,
commit protection for contributors without a harness) don't apply to
Zeta, because its vibe-coded scope assumes every contributor works
through a harness.

**Conclusion**: B-0173 (hook authoring) scope simplifies from "git
hooks + harness hooks + CI" to "harness hooks + CI only". The
ground-truth-recovery on B-0173 (PR #1280) was wrong; correction lands
in a separate PR. This memo is the substrate that justifies it.

Future-Otto rules:
- TS is canonical; bash exists ONLY for pre-install scripts (no DST
  needed there anyway)
- Harness hooks are the distribution mechanism for skill-bundle users
- DST is the empirical quality justification for TS-over-bash
- Skill-bundle distribution flows through harnesses, not direct
  filesystem

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 3, 2026
…ooks memo (Otto 2026-05-03) (#1316)

The B-0173 ground-truth recovery (PR #1280) was wrong. It listed three
hook types, two of them git hooks. Aaron 2026-05-03 clarified:
vibe-coders always have a harness; harness hooks suffice; git hooks
are an antipattern in this scope. The memo capturing this is:
`memory/feedback_dst_justifies_ts_quality_over_bash_and_harness_hooks_suffice_no_git_hooks_aaron_2026_05_03.md`
(PR #1312 + #1313 + #1315 follow-ups).

This commit corrects the B-0173 guess file's recovery section:

- ~~tools/git/hooks/pre-commit~~ — REMOVED. Harness fires on
  pre-tool-use (Edit/Write) before content lands; covers same use case
- ~~tools/git/hooks/commit-msg~~ — REMOVED. Harness fires on
  pre-Bash-tool-use when command is `git commit`; covers same use case
- **Harness hooks** (.claude/settings.json hooks field; Codex/Cursor
  parallel mechanisms) — NEW, replaces git hooks
- **CI workflow on PR descriptions** — unchanged

Specific implementation also corrected: TS-canonical (no bash wrapper
needed; harness runs TS directly via bun).
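
As a rough illustration of how one TS hook could cover both former git-hook surfaces, the sketch below routes a pre-tool-use event to a check. The event shape (`tool_name`, `tool_input` with `content`, `new_string`, `command`) is an assumption that should be verified against the harness documentation, and `decide`, `textUnderReview`, and the toy check are hypothetical names, not part of B-0173.

```typescript
// Assumed event shape for a pre-tool-use harness hook; field names are
// illustrative and should be checked against the harness documentation.
interface ToolUseEvent {
  tool_name: string;
  tool_input: {
    content?: string;    // full file body for a write
    new_string?: string; // replacement text for an edit
    command?: string;    // shell command for a Bash call
  };
}

interface Decision {
  allow: boolean;
  reasons: string[];
}

// Hypothetical checker signature: returns one message per problem found.
type Check = (text: string) => string[];

/** Picks the text a check should see, or null when the event is out of scope. */
function textUnderReview(event: ToolUseEvent): string | null {
  const { tool_name, tool_input } = event;
  if (tool_name === "Write") return tool_input.content ?? "";
  if (tool_name === "Edit") return tool_input.new_string ?? "";
  if (tool_name === "Bash" && /\bgit\s+commit\b/.test(tool_input.command ?? "")) {
    return tool_input.command ?? "";
  }
  return null;
}

function decide(event: ToolUseEvent, check: Check): Decision {
  const text = textUnderReview(event);
  if (text === null) return { allow: true, reasons: [] };
  const reasons = check(text);
  return { allow: reasons.length === 0, reasons };
}

// Usage with a toy check that rejects a made-up marker string.
const noDraftMarker: Check = (text) =>
  text.includes("DRAFT-DO-NOT-LAND") ? ["draft marker present"] : [];

const commitEvent: ToolUseEvent = {
  tool_name: "Bash",
  tool_input: { command: 'git commit -m "DRAFT-DO-NOT-LAND wip"' },
};
console.log(decide(commitEvent, noDraftMarker));
```

A thin entry point would parse the event the harness supplies and translate the `Decision` into whatever allow/block signal that harness expects; that part is harness-specific and deliberately left out.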

The calibration delta on this guess (~48% accuracy at recovery time)
should NOT be retroactively re-scored — the original delta reflects
the recovery-as-it-happened. The correction here is about the substrate
moving forward, not rewriting calibration history.

Future-Otto: when a calibration recovery turns out to have used wrong
ground truth (because the ground truth itself shifted via clarification),
mark the correction explicitly + preserve the original calibration.
The calibration data is about Otto's inference quality at a moment in
time; subsequent ground-truth refinements are separate substrate.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>