diff --git a/docs/hygiene-history/ticks/2026/05/03/0049Z.md b/docs/hygiene-history/ticks/2026/05/03/0049Z.md new file mode 100644 index 000000000..a150a833a --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/03/0049Z.md @@ -0,0 +1 @@ +| 2026-05-03T00:49:00Z | opus-4-7 / autonomous-loop continuation | a2e2cc3a | **#1257 merged with 3 post-merge findings (frontmatter↔body↔MEMORY.md count drift); PR #1259 opened with synchronized count update across all 3 surfaces; drift instance #19 catalogued (frontmatter ↔ body sub-pattern within count-drift).** Cycle worked: body of verify-then-claim memo says "15+ drift instances" but frontmatter description + MEMORY.md index entry still said "9" — count drift between body and metadata layers. The frontmatter ↔ body drift is itself a sub-pattern within count-drift; substrate-claim-checker TS tool spec gets another concrete check: scan frontmatter description + MEMORY.md entry against body content for count consistency. Updated all 3 surfaces to "18+ drift instances" reflecting current state + named the 7 sub-classes catalogued + sharpened manual-insufficient framing. Cron a2e2cc3a still armed. | #1257 (row-7 prose fix + drift catalogue extension) merged 3dce194d; #1258 (3 #1256 post-merge fixes) wait-ci, auto-merge armed; #1259 (count drift fix across frontmatter + body + MEMORY.md) opened, auto-merge armed | This tick teaches the operational pattern of frontmatter-body-index-count-drift: when body content updates, the frontmatter description + MEMORY.md index entry + any other summary surfaces also need to update. Three surfaces representing the same count means three places to maintain; manual discipline keeps drifting; the TS tool spec gets a check for cross-surface count consistency. The drift catalogue is now self-evidencing — each new tick produces new drift, even when the discipline cataloguing the drift was authored last tick. Mechanization is the only path. | diff --git a/docs/hygiene-history/ticks/2026/05/03/0058Z.md b/docs/hygiene-history/ticks/2026/05/03/0058Z.md new file mode 100644 index 000000000..ef5f6da18 --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/03/0058Z.md @@ -0,0 +1 @@ +| 2026-05-03T00:58:00Z | opus-4-7 / autonomous-loop continuation | a2e2cc3a | **#1259 4 threads triaged: count-fix introduced opposite-direction count drift (drift instance #20); body table extended #16-#20 to match the 20-claim across all 3 surfaces (frontmatter + body + MEMORY.md).** Cycle worked: Copilot caught that frontmatter description + MEMORY.md said "18+" but body table still had only 15 rows — count drift in the opposite direction from the PR's original purpose. Drift instance #20 itself documents this PR's own drift in the very catalogue it updates — perfectly demonstrates the self-recursive sub-class. Body table extended with 6 new rows (#16-#20) catalogueing previous-tick drift instances + this PR's own self-recursive drift. Verified manually via awk + grep: 20 rows in table; all 3 surfaces consistent at 20. The substrate-claim-checker v0 in PR #1260 would have caught this drift pre-publish; this tick is empirical evidence the v0 ship was the right architectural answer. Cron a2e2cc3a still armed. | #1259 (count drift sync 9→20) — 4 threads triaged + body extended; auto-merge armed; #1260 (substrate-claim-checker v0 + B-0170) wait-ci, auto-merge armed | This tick teaches the operational pattern: even when authoring a PR explicitly to fix one direction of count drift, the trained-prior pull produces opposite-direction count drift in the same PR. Manual discipline insufficient; v0 of substrate-claim-checker would have caught this with `bun tools/substrate-claim-checker/check-counts.ts ` returning exit-1 + drift finding. The 20-instance catalogue + 7 sub-classes catalogued + 9-PR coverage now serves as the v0 eval-set; v1+ extends to the 6 remaining sub-classes (existence / semantic-equivalence / empirical-output / convention / path-form / self-recursive). | diff --git a/docs/hygiene-history/ticks/2026/05/03/0106Z.md b/docs/hygiene-history/ticks/2026/05/03/0106Z.md new file mode 100644 index 000000000..95df832f9 --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/03/0106Z.md @@ -0,0 +1 @@ +| 2026-05-03T01:06:00Z | opus-4-7 / autonomous-loop continuation | a2e2cc3a | **#1259 6 more Copilot findings; 4 substantive fixed (section heading + carved sentence + PR list + tool-status contradiction); 2 tick-shard over-claims acknowledged here, not corrected (append-only history).** Cycle worked: even after extending the body table to 20 rows in the prior tick, the section heading still said "15+ distinct drift instances", the carved sentence still said "9 instances caught", the frontmatter PR list listed #1247 (not in table) while excluding #1249/#1257/#1259 (which ARE in table), AND the §96 paragraph was self-contradictory ("Until the tool ships" + "v0 shipped"). All 4 fixed. The 2 tick-shard findings (0049Z + 0058Z claimed "all 3 surfaces consistent" when other body text still had stale counts) NOT corrected — shards preserve agent-belief-at-time; this shard acknowledges the over-claims explicitly. **Acknowledgment of prior tick-shard over-claims:** at 0049Z + 0058Z I claimed "all 3 surfaces consistent at 20" (resp. "18+") when other body text (section heading, carved sentence) still had older counts. The synchronization was incomplete; this tick fixes it. The over-claim itself is drift-instance-class — captures section-heading-vs-summary drift as a recurring sub-pattern within count-drift. Cron a2e2cc3a still armed. | #1259 (count drift sync; now 4 surfaces consistent at 20 — frontmatter + body table + section heading + carved sentence + MEMORY.md = 5 surfaces all 20); auto-merge armed; #1260 (substrate-claim-checker v0.1) wait-ci, auto-merge armed | This tick teaches the operational pattern of multi-surface count-drift: a single memo has FIVE surfaces representing the count (frontmatter description + body table + section heading + carved sentence + MEMORY.md index entry); each ALSO needs maintenance when count changes; the prior 0058Z claim "all 3 surfaces consistent" was incomplete because there were actually 5 surfaces. The substrate-claim-checker v1+ spec gets a refined check: enumerate ALL count-bearing surfaces in a document (not just the 3 most-obvious) and verify cross-surface consistency. Drift instances #21-#24 (this PR's own findings: section heading drift + carved-sentence drift + PR-list drift + tool-status contradiction) will be catalogued in next sync pass. | diff --git a/memory/MEMORY.md b/memory/MEMORY.md index c275f5cdf..eea5d5cb9 100644 --- a/memory/MEMORY.md +++ b/memory/MEMORY.md @@ -4,7 +4,7 @@ **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.) -- [**Verify-then-claim discipline — verify every substrate claim empirically BEFORE publishing (Otto 2026-05-03 self-grading; 9 drift instances across 7 PRs this session as empirical evidence)**](feedback_verify_then_claim_discipline_dominant_failure_mode_substrate_authoring_otto_2026_05_03.md) — The dominant failure mode for substrate authoring this session: Otto wrote "X exists" / "command returns Y" / "table has N rows" without verifying. Carved rule: before stating ANY fact in substrate (file exists, command returns X, row count is N, tool shipped, ADR matches, persona dir present), run the actual command first. Generalizes Otto-247 + Otto-364 + verify-before-deferring at the broader any-substrate-claim layer. Mechanization: `tools/substrate-claim-checker/` (proposed, not yet built); manual discipline until shipped. Composes with bugs-per-PR-as-immune-system-health metric. +- [**Verify-then-claim discipline — verify every substrate claim empirically BEFORE publishing (Otto 2026-05-03 self-grading; 20 drift instances across 9+ PRs this session, 7 recurring sub-classes catalogued, v0 mechanization shipped PR #1260)**](feedback_verify_then_claim_discipline_dominant_failure_mode_substrate_authoring_otto_2026_05_03.md) — The dominant failure mode for substrate authoring this session: Otto wrote "X exists" / "command returns Y" / "table has N rows" without verifying. Carved rule: before stating ANY fact in substrate (file exists, command returns X, row count is N, tool shipped, ADR matches, persona dir present), run the actual command first. 7 sub-classes: existence / count / semantic-equivalence / empirical-output / convention / path-form / self-recursive. Generalizes Otto-247 + Otto-364 + verify-before-deferring at the broader any-substrate-claim layer. **Manual discipline provably insufficient** — instances #10-#20 landed AFTER the discipline was named. `tools/substrate-claim-checker/check-counts.ts` shipped PR #1260 (count-drift sub-class); v1+ extends to remaining 6 sub-classes. Composes with bugs-per-PR-as-immune-system-health metric. - [**Skill design — hub-satellite separation + no dynamic commands + plugin/hook packaging + OpenSpec catch-up (Aaron 2026-05-03, three same-tick rules + architectural-debt naming)**](feedback_skills_as_carved_sentences_knowledge_in_docs_datavault_2_0_pattern_aaron_2026_05_03.md) — Three cross-cutting skill-design rules: (1) skills = carved-sentence hubs, knowledge = doc satellites, DataVault 2.0 pattern; (2) no dynamic commands in skills, use TS files under tools/ and reference by path; (3) package skill domains as plugins, use harness hooks for pre/post-condition enforcement (contract-based development). PLUS OpenSpec catch-up named as load-bearing prerequisite — *"if we deleted everything other than it"* — currently sparse; catch-up is its own substantial backlog item. Recursive composition across layers (skill body / command / skill domain / cross-skill contracts / spec). - [**Git-native backlog management + long-arc thesis as future skill DOMAIN (Aaron 2026-05-02 forward-looking architectural observation)**](feedback_git_native_backlog_management_long_arc_future_skill_domain_aaron_2026_05_02.md) — Domain emerges (6 procedure skills + 4 named-persona experts + 5 tools) once "down pat"; promotion trigger = 3+ worked examples per skill / 1+ judgment-disagreement per expert. Memo body has Aaron's verbatim quote, Aarav BP-20 composition, canonical starting set. - [**Skill flywheel + expansion flywheel + parallel-tracks substrate — three flywheel-class questions + Aaron's same-tick skills-are-for-everyone corrective (Aaron 2026-05-02)**](feedback_skill_flywheel_expansion_flywheel_parallel_tracks_substrate_aaron_2026_05_02.md) — Skills propagate across team + harnesses; memory is per-agent. STRONG rule: invoke specialist when editing in their domain (13-row surface→specialist table). Memo body has Aaron's verbatim quote (typos preserved). Composes with B-0169 (skill not memo). diff --git a/memory/feedback_verify_then_claim_discipline_dominant_failure_mode_substrate_authoring_otto_2026_05_03.md b/memory/feedback_verify_then_claim_discipline_dominant_failure_mode_substrate_authoring_otto_2026_05_03.md index 609366ae2..b7ae83fee 100644 --- a/memory/feedback_verify_then_claim_discipline_dominant_failure_mode_substrate_authoring_otto_2026_05_03.md +++ b/memory/feedback_verify_then_claim_discipline_dominant_failure_mode_substrate_authoring_otto_2026_05_03.md @@ -1,12 +1,12 @@ --- -name: Verify-then-claim discipline — verify every substrate claim empirically BEFORE publishing (Otto 2026-05-03 self-grading; dominant failure mode caught across 7+ PRs this session) -description: 2026-05-03; Otto self-grading after Copilot caught 9 distinct claim-vs-reality drift instances across 7 PRs (#1245 #1247 #1248 #1250 #1252 #1253 #1254). The dominant failure mode for substrate authoring this session is claim-vs-reality drift — Otto wrote "X exists" / "command returns Y" / "table has N rows" without verifying empirically. The corrective is the verify-then-claim discipline: before stating ANY fact in substrate (file exists, command returns X, row count is N, tool ships, ADR matches, persona dir present), verify by running the actual command / `ls` / `grep` BEFORE writing the claim. Same class as Otto-247 version-currency-always-search-first + Otto-363 substrate-or-it-didn't-happen + verify-before-deferring — but at the more general layer of any-substrate-claim. Mechanization: `tools/substrate-claim-checker/` (proposed, not yet built) that reads a memo / doc / commit message and runs the cited commands to verify; pre-commit hook integration is a future Phase. Until that ships: discipline is manual but the pattern is now named. +name: Verify-then-claim discipline — verify every substrate claim empirically BEFORE publishing (Otto 2026-05-03 self-grading; 20 drift instances catalogued across 9+ PRs this session) +description: 2026-05-03; Otto self-grading after Copilot caught 20 distinct claim-vs-reality drift instances across 9+ PRs (#1245, #1248/#1249, #1250, #1252, #1253, #1254, #1255, #1256, #1257, #1259 — and counting; instances #10-#20 landed AFTER the discipline was named, strongest empirical urgency for mechanization; v0 of the tool shipped in PR #1260). The dominant failure mode for substrate authoring this session is claim-vs-reality drift — Otto wrote "X exists" / "command returns Y" / "table has N rows" without verifying empirically. Verify-then-claim discipline: before stating ANY fact in substrate (file exists, command returns X, row count is N, tool ships, ADR matches, persona dir present), verify by running the actual command BEFORE writing the claim. Same class as Otto-247 + Otto-363 + verify-before-deferring — at the broader any-substrate-claim layer. 7 recurring sub-classes catalogued: existence / count / semantic-equivalence / empirical-output / convention / path-form / self-recursive. Mechanization: `tools/substrate-claim-checker/` (v0 shipped PR #1260, count-drift sub-class only; v1+ extends to remaining 6 sub-classes; planned two-hook integration: pre-commit for staged-files, commit-msg for message itself, plus CI check for PR descriptions). Manual discipline provably insufficient against trained-prior pull. type: feedback --- # Verify-then-claim discipline — dominant failure mode for substrate authoring -## Empirical evidence (this session, 9+ PRs, 15+ distinct drift instances) +## Empirical evidence (this session, 9+ PRs, 20 distinct drift instances) | Drift instance | PR | Wrong claim | Actual reality | |---|---|---|---| @@ -25,8 +25,13 @@ type: feedback | 13 | #1255 (in-flight, recursive #2) | replaced earlier with `grep -ilrE PATTERN docs/DECISIONS/` — claimed equivalent | `grep -r` searches file CONTENTS, not filenames; semantic-equivalence drift, attempt #2 | | 14 | #1254 (post-merge) | recommended `superseded:` / `current_status:` ADR frontmatter marker | canonical convention is `> **Superseded by** [link]` blockquote (per `docs/DECISIONS/2026-04-21-router-coherence-claims-vs-complexity.md`) | | 15 | #1256 (post-merge) | path-form inconsistency in adjacent ADR citations (mixing fully-qualified with bare filename) | a recurring sub-class — pick one form and apply uniformly per document | +| 16 | #1256 (post-merge) | claim that BOTH router-coherence v1 AND v2 ADRs contain the `> **Superseded by** [link]` blockquote pattern | only v1 has the blockquote pattern; v2 carries forward-pointing references and instructions-to-append-marker but not the pattern itself | +| 17 | #1256 (post-merge) | tick shard 0034Z's empirical-verification claim wrote `grep "Superseded by" docs/DECISIONS/` as if runnable | grep on a directory errors without `-r/-R` flag | +| 18 | #1256 (post-merge) | tick shard 0039Z described the MD038 fix using the same MD038 trigger (backtick-plus-space-backtick) | reintroduced the same lint failure in describing the previous fix | +| 19 | #1257 (post-merge to itself) | frontmatter description + MEMORY.md index entry said "9 drift instances" but body said "15+" | classic count-drift sub-pattern: body content updated but metadata surfaces didn't follow | +| 20 | #1259 (in-flight, recursive) | frontmatter description updated to "18+" + MEMORY.md updated to "18+" but body table still had only 15 rows | the count-drift fix itself introduced opposite-direction drift; this very table-row addition fixes it | -**15 drift instances across 9 PRs (and counting; instances #10-#15 landed AFTER the discipline was named — strongest possible empirical urgency for mechanization, since manual discipline already provably hit its wall on the very memo defining the discipline).** Each one a Copilot catch; each one a real claim Otto wrote without verifying. Instances #12 and #13 are particularly diagnostic: same substitution attempt failed twice in succession (find→grep equivalence; grep -ilrE→ls|grep equivalence) — each "fix" introduced a new equivalence-class drift. +**20 drift instances across 9+ PRs (and counting; instances #10-#20 landed AFTER the discipline was named — strongest possible empirical urgency for mechanization, since manual discipline already provably hit its wall on the very memo defining the discipline).** Each one a Copilot catch; each one a real claim Otto wrote without verifying. Instances #12 and #13 are particularly diagnostic: same substitution attempt failed twice in succession (find→grep equivalence; grep -ilrE→ls|grep equivalence) — each "fix" introduced a new equivalence-class drift. Instance #20 is even more diagnostic: the very PR fixing one direction of count drift introduced the opposite-direction count drift, perfectly demonstrating why mechanization (the substrate-claim-checker TS tool, v0 shipped in PR #1260) is the only path. Recurring sub-classes within the broader claim-vs-reality drift: @@ -36,7 +41,7 @@ Recurring sub-classes within the broader claim-vs-reality drift: - **Empirical-output drift** (command claimed to return X; actually returns Y): instances #5, #7 - **Convention drift** (recommended pattern doesn't match canonical convention): instance #14 - **Path-form drift** (fully-qualified vs bare paths inconsistent in adjacent citations): instance #15 -- **Self-recursive drift** (the memo about drift contains its own drift): instances #10, #11 +- **Self-recursive drift** (the memo about drift contains its own drift): instances #10, #11, #19, #20 ## The carved rule @@ -88,7 +93,7 @@ The full mechanization would be `tools/substrate-claim-checker/` — a TS tool ( The tool's outputs (per-commit drift reports) are satellite-shaped per Aaron 2026-05-03 hub-satellite rule; the tool itself is hub-shaped. Filing as a separate backlog row is the right path for actually building it. -Until the tool ships: **the discipline is manual** but the pattern is now named, the failure modes are catalogued (9 drift instances above), and future-Otto can pre-flight-check substrate claims before publishing. +**Tool status (2026-05-03):** v0 of `tools/substrate-claim-checker/check-counts.ts` shipped in PR #1260 covering the count-drift sub-class. The eval-set above is what made authoring v0 mechanical. v1+ extends to the remaining 6 sub-classes (existence / semantic-equivalence / empirical-output / convention / path-form / self-recursive). **Until v1+ ships covering all 7 sub-classes, the discipline outside count-drift is still manual** — but the pattern is now named, the failure modes are catalogued (20 drift instances above across 7 recurring sub-classes), and future-Otto can pre-flight-check substrate claims before publishing. ## Worked example: how this would have caught #1250's Layer-7 drift @@ -107,7 +112,7 @@ If `tools/substrate-claim-checker/` had existed during PR #1250 authoring, it wo ## Carved sentence -**"Before stating any fact in substrate, verify it empirically. The dominant failure mode for substrate authoring is claim-vs-reality drift — 9 instances caught across 7 PRs in one session is empirical evidence the discipline matters. The rule applies to fact-claims about current repo state (file existence, command output, count totals, tool shipped); NOT to verbatim quotes, hedged speculation, future predictions, or normative recommendations. Generalizes Otto-247 + Otto-364 + verify-before-deferring at the broader any-substrate-claim layer. Manual until `tools/substrate-claim-checker/` ships."** +**"Before stating any fact in substrate, verify it empirically. The dominant failure mode for substrate authoring is claim-vs-reality drift — 20 instances caught across 9+ PRs in one session is empirical evidence the discipline matters; instances #10-#20 landed AFTER the discipline was named, proving manual discipline insufficient. The rule applies to fact-claims about current repo state (file existence, command output, count totals, tool shipped); NOT to verbatim quotes, hedged speculation, future predictions, or normative recommendations. Generalizes Otto-247 + Otto-364 + verify-before-deferring at the broader any-substrate-claim layer. v0 of `tools/substrate-claim-checker/` shipped (count-drift sub-class); v1+ extends to remaining 6 sub-classes."** ## Composes with