review(pr-1263-postmerge): empirical rewrite of worked example #2 — Layer 4/6/7 corrections#1266

Merged
AceHack merged 2 commits into main from
research/worked-example-2-empirical-rewrite-aaron-2026-05-03
May 3, 2026

@AceHack AceHack commented May 3, 2026

Summary

10 Copilot post-merge findings on PR #1263 (worked example #2). ALL substantive — including a major load-bearing claim drift on Layer 7 that makes this PR a worked example of the verify-then-claim failure mode applied to a worked example demonstrating verify-then-claim.

What was wrong empirically

  1. Layer 4: claimed phrase unique to mathematics-expert. Reality: physics-expert has the same phrase (replication).
  2. Layer 6: claimed shards from 2026/04/19-20 confirm timeline. Reality: shards start 2026/04/28; substrate boundary.
  3. Layer 7 (load-bearing): claimed v2 router-coherence ADR cites the umbrella pattern. Reality: router-coherence ADRs are about claims-tester Stage-1-vs-Stage-2 routing, NOT umbrella deferral. No ADR mentions the pattern.
  4. grep \| alternation portability (4 occurrences): replaced with grep -E.

How the rewrite is more honest + better

The corrected synthesized answer follows a canonical-by-replication-and-notebook-recognition path, NOT canonical-by-ADR-decree. The skill body teaches contributors to recognize different elevation paths. Substantive negative at Layer 7 is itself instructive.

The worked example for decision-archaeology now also serves as a worked example of the verify-then-claim drift class — recursive substrate-quality teaching.

Test plan

  • Layer 4 reflects empirical 2-skill replication
  • Layer 6 reflects empirical shard-window-starts-04/28
  • Layer 7 reflects empirical no-ADR-canonicalization
  • Synthesized answer revised to replication-and-notebook canonicalization path
  • All 4 grep \| occurrences switched to grep -E
  • CI green

🤖 Generated with Claude Code

…ayer 4 + 6 + 7 corrections

10 Copilot post-merge findings on PR #1263 (worked example #2).
ALL substantive — including major load-bearing claim drift that
makes this PR a worked example of the verify-then-claim failure
mode applied to a worked example demonstrating verify-then-claim.

Substantive corrections:

1. **Layer 4 wrong**: claimed "umbrella exists to" verbatim is
   unique to mathematics-expert. Empirical reality:
   `.claude/skills/physics-expert/SKILL.md` ALSO has the phrase
   — pattern was REPLICATED to a sibling. Rewrote Layer 4 to
   reflect replication evidence; updated synthesized answer to
   add "replication to sibling umbrella" as load-bearing
   canonicalization signal.

2. **Layer 6 wrong**: claimed shards from 2026/04/19 + 2026/04/20
   confirm the timeline. Empirical reality:
   docs/hygiene-history/ticks/2026/04/ starts at 04/28; there are
   NO shards from the authoring window. Rewrote Layer 6 to reflect substantive
   negative + teach the skill-body lesson about substrate
   boundaries (tick-shard discipline started later than umbrella
   authoring).

3. **Layer 7 wrong** (most substantive — the load-bearing claim):
   claimed v2 router-coherence ADR cites the umbrella's "When to
   defer" pattern as canonical exemplar. Empirical reality:
   `grep -liE "When to defer|mathematics-expert|umbrella" docs/DECISIONS/*.md`
   returns nothing. The router-coherence ADR pair
   is about claims-tester Stage-1-vs-Stage-2 routing — a
   different routing concern entirely. NO ADR canonicalized the
   umbrella's defer pattern. Rewrote Layer 7 as substantive
   negative + acknowledged the worked example's earlier draft
   was itself drift.

4. **Synthesized answer revised**: "doctrine emerged across 3
   layers + 3 days (commit → notebook → ADR)" was wrong. Actual
   path: canonical-by-replication-and-notebook-recognition. No
   ADR canonicalization. The skill body now teaches contributors
   to recognize different elevation paths.

5. **Layer 4 + 7 + 11 grep portability** (4 occurrences): `\|`
   alternation without `-E` is GNU-leaning. Replaced with
   `grep -E ... "a|b"` form across all 4 instances.

6. **Updated meta-section** to reflect Layer 7 became substantive
   negative (matching #1's WONT-DO + #2's ADR-absent + #3's
   no-ADR pattern).
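
The portability point in item 5 can be sketched as follows. This is a minimal illustration, not the worked example's actual commands: the temp directory, file names, and file contents are hypothetical stand-ins, and it assumes a POSIX-style grep where `-E` selects extended regular expressions.

```shell
# Hypothetical fixture: two throwaway files standing in for SKILL.md files.
tmpdir=$(mktemp -d)
printf 'The umbrella exists to route questions.\n' > "$tmpdir/a.md"
printf 'Nothing relevant here.\n' > "$tmpdir/b.md"

# Under -E a bare | is alternation in POSIX extended regex syntax,
# so this form does not rely on the \| extension to basic regexes.
# -l lists only the names of files that contain a match.
matches=$(grep -lE "When to defer|umbrella exists to" "$tmpdir"/*.md)

echo "$matches"
rm -rf "$tmpdir"
```

Only the file containing one of the two phrases is listed; the `\|` spelling without `-E` is the form the commit replaces.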

Composes with the verify-then-claim discipline recursively: the
worked example demonstrating decision-archaeology drifted on
load-bearing fact-claims without empirical verification. Layer 4
+ Layer 6 + Layer 7 each had wrong claims that empirical
verification immediately falsified. The substrate-claim-checker
v1+ existence-check + content-check would catch this class
pre-publish.
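
A rough sketch of what an existence-check plus content-check pair could look like is given here. The helper name `check_claim`, its output words, and the fixture file are all hypothetical; this is not the substrate-claim-checker itself, only an illustration of the two checks the paragraph above names.

```shell
# check_claim PATH PATTERN: hypothetical helper, not the real checker.
# Prints "missing" if PATH does not exist (existence-check),
# "absent" if PATH exists but PATTERN does not match (content-check),
# and "ok" when both checks pass.
check_claim() {
  if [ ! -f "$1" ]; then
    echo "missing"
  elif ! grep -qE -- "$2" "$1"; then
    echo "absent"
  else
    echo "ok"
  fi
}

# Hypothetical fixture: an ADR-like file about a different routing concern.
tmpdir=$(mktemp -d)
printf 'Routing between Stage 1 and Stage 2.\n' > "$tmpdir/adr.md"

r_missing=$(check_claim "$tmpdir/no-such-file.md" "When to defer")
r_absent=$(check_claim "$tmpdir/adr.md" "When to defer|umbrella")
r_ok=$(check_claim "$tmpdir/adr.md" "Stage 1")

echo "$r_missing $r_absent $r_ok"
rm -rf "$tmpdir"
```

Under these assumptions, a claim such as "the ADR cites the umbrella pattern" would surface as `absent` before publication rather than in post-merge review.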

Honest acknowledgment: I authored worked example #2 without
running each command empirically, repeating the same failure mode
the discipline is designed to catch. The corrected version is
genuinely more interesting — canonical-by-replication-and-notebook
is a richer worked example than canonical-by-ADR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 3, 2026 01:54
@AceHack AceHack enabled auto-merge (squash) May 3, 2026 01:54
…demonstrating verify-then-claim; recursive substrate-quality teaching

The worked example for decision-archaeology drifted on its own
load-bearing fact-claims. 10 substantive findings on PR #1263
including a major Layer 7 wrong claim (v2 ADR doesn't cite
umbrella pattern; ADR is about claims-tester routing).

Manual discipline insufficient AT ALL LEVELS of recursion. The
corrected version is genuinely better substrate. The decision-
graph would have caught these via existence-check + content-check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AceHack AceHack merged commit 2f7d5aa into main May 3, 2026
21 checks passed
@AceHack AceHack deleted the research/worked-example-2-empirical-rewrite-aaron-2026-05-03 branch May 3, 2026 01:57

Copilot AI left a comment

Pull request overview

Updates the decision-archaeology worked example #2 (mathematics-expert “When to defer”) to correct several empirically-false claims about where/when the pattern became canonical, and to fix grep portability issues.

Changes:

  • Corrects Layer 4 to reflect that the “umbrella exists to …” phrase is replicated in physics-expert, not unique to mathematics-expert.
  • Corrects Layer 6 to reflect that tick-shard history starts at 2026-04-28 (so earlier authoring can’t be evidenced via tick shards).
  • Corrects Layer 7 to reflect that router-coherence ADRs do not mention/canonicalize the umbrella defer pattern, and switches grep \| to grep -E patterns elsewhere.
Comments suppressed due to low confidence (1)

docs/research/2026-05-03-decision-archaeology-worked-example-2-mathematics-expert-when-to-defer.md:192

  • P0 (documentation): Layer 8 still says “The doctrine lives in the ADR pair (Layer 7) + Aarav’s notebook”, which directly contradicts the updated Layer 7 finding that no ADR mentions/canonicalizes the pattern (and that router-coherence ADRs are about a different concern). This paragraph needs to be rewritten to remove the ADR-anchoring and align with the new canonicalization path (replication + notebook recognition).
grep -lE "When to defer|umbrella exists to" memory/feedback_*.md

Returns no specific feedback memo named for the pattern. The doctrine
lives in the ADR pair (Layer 7) + Aarav's notebook (Layer 9), not in
a named-rule memo.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b47f326d68

AceHack added a commit that referenced this pull request May 3, 2026
…s -1|sort; ask the maintainer rather than infer the why (#1267)

5 Copilot post-merge findings on PR #1266 (worked example #2
empirical rewrite). All real, all fixed:

1. **P2 attribution drift**: mixed "Aaron wrote it" (named) +
   "the maintainer authored it" (role-ref) in same file.
   Standardized to role-ref form throughout (per Otto-279
   carve-out for current-state surfaces; docs/research/ IS
   history-surface so names are allowed but consistency
   matters).

2. **P1 ls|head -3 ordering not portable**: locale/flags can
   change order. Replaced with `ls -1 ... | sort | head -3`
   for reliable lexicographic ordering.

3. **P0 Layer 6 inconsistency**: the conclusion said substrate
   must be traced via "commit + persona-notebook + ADR + memos"
   but Layer 7 establishes ADRs are unrelated. Removed "ADR"
   from the alternative-traceable-through-other-layers list;
   noted ADR-class did NOT canonicalize this pattern.

4. **P1 Layer 8 stale ADR-canonicalization claim**: said
   doctrine "lives in the ADR pair (Layer 7) + Aarav's notebook
   (Layer 9)" — drift from the corrected Layer 7. Updated to
   "recognition-as-canonical lives in Aarav's notebook (Layer 9)
   + the replication evidence (Layer 4)".

5. **P2 Layer 10 stale "SKILL.md + ADR + persona-notebook trio"**:
   same drift class. Updated to "SKILL.md (umbrella + replicated
   to physics-expert sibling) + Aarav's persona notebook duo —
   NO ADR is part of the canonical durable form".
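
The ordering fix in item 2 can be sketched like this. The shard file names below are hypothetical (the real layout under docs/hygiene-history/ticks/ may differ), and pinning `LC_ALL=C` on `sort` is an added assumption on top of the commit's `ls -1 ... | sort | head -3`, used here so the order does not depend on the caller's locale.

```shell
tmpdir=$(mktemp -d)
# Hypothetical shard names, created out of order on purpose.
touch "$tmpdir/2026-04-30.md" "$tmpdir/2026-04-28.md" \
      "$tmpdir/2026-04-29.md" "$tmpdir/2026-05-01.md"

# ls -1 prints one entry per line; sort under LC_ALL=C orders by byte value;
# head -3 keeps the three lexicographically earliest names.
first_three=$(ls -1 "$tmpdir" | LC_ALL=C sort | head -3)

echo "$first_three"
rm -rf "$tmpdir"
```

With ISO-style date names, byte order and chronological order coincide, which is what makes the earliest shard the first line of output.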

The Aaron 2026-05-03 mid-tick observation: *"wanna ask why now?"*
— yes. The worked example was inferring "why it became canonical"
from substrate alone (notebook entry + replication). The honest
answer is: archaeology recovers WHAT/WHEN/WHO; first-party intent
requires first-party query. The skill body's teaching should
include this distinction. Asked the maintainer directly in chat;
answer pending.

This is the THIRD round of corrections on worked example #2.
Pattern: each fix surfaces new drift in adjacent sections that
referenced the original wrong claim. The verify-then-claim
discipline composes recursively — fixing one drift point
requires scrubbing every section that depended on it. The
substrate-claim-checker v1+ existence-check + content-check
would catch this class via cross-section consistency-checking.
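
One manual approximation of that cross-section scrub is to grep the document for phrases that depended on the falsified claim. The excerpt and the search phrases below are hypothetical, chosen to mirror the stale Layer 8 and Layer 10 wording described above; this is a sketch, not the checker's implementation.

```shell
tmpdir=$(mktemp -d)
doc="$tmpdir/worked-example.md"
# Hypothetical excerpt: Layer 7 is corrected, Layers 8 and 10 are still stale.
cat > "$doc" <<'EOF'
Layer 7: no ADR mentions the pattern.
Layer 8: the doctrine lives in the ADR pair (Layer 7).
Layer 10: SKILL.md + ADR + persona-notebook trio.
EOF

# -n prefixes each hit with its line number so every dependent
# section can be revisited by hand; \+ matches a literal plus under -E.
stale=$(grep -nE "ADR pair|ADR \+ persona-notebook" "$doc")

echo "$stale"
rm -rf "$tmpdir"
```

A search like this only finds wording that was anticipated in advance, so it complements rather than replaces rereading the dependent sections.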

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>