Merged
1 change: 1 addition & 0 deletions docs/hygiene-history/ticks/2026/05/03/0031Z.md
@@ -0,0 +1 @@
| 2026-05-03T00:31:00Z | opus-4-7 / autonomous-loop continuation | a2e2cc3a | **#1252 + #1253 merged on main; PR #1255 expanded with verify-then-claim self-grading memo capturing the dominant failure mode of this 2-day session.** Cycle worked: (1) Copilot's 2 post-merge findings on #1253 (expand-from-closure.ts referenced without "proposed" marker; `\|` table-cell escape) addressed in #1255; (2) self-grading memo captures 9 distinct claim-vs-reality drift instances across 7 PRs in this session as empirical evidence for naming the verify-then-claim discipline; (3) carved rule lands at the more-general layer than Otto-247 (version-currency) + Otto-364 (search-first authority) + verify-before-deferring + Otto-363 (substrate-or-it-didn't-happen) — applies to ANY fact-claim about current repo state. Mechanization path identified: `tools/substrate-claim-checker/` TS tool (proposed, not yet built). Manual discipline until tool ships. Composes with bugs-per-PR-as-immune-system-health metric — would move bugs closer to single-digit productive zone by catching drift pre-publish rather than post-merge. The 9-drift evidence catalogue (per-PR, per-instance, with wrong-claim + actual-reality columns) is preserved in the memo body for future skill-creator authoring. Aarav's B-0169 review predicted this pattern with worked-examples-need-empirical-grounding framing. Cron a2e2cc3a still armed. | #1252 (multi-harness future-skill-domain memo) merged b5baf4b5; #1253 (three skill-design rules) merged 6627ff39; #1254 (Layer-7 ADR follow-up) wait-ci, auto-merge armed; #1255 (#1253 post-merge fixes + verify-then-claim self-grading memo) auto-merge armed | This tick teaches the operational pattern of self-grading from accumulated empirical evidence: the 9 drift instances across the session weren't all the same shape but shared one root cause (publishing fact-claims without empirical verification); naming the pattern + cataloguing instances IS the substrate that future-Otto can use to prevent the same drift. The memo body's 9-row table is the dry-run-eval-set for the eventual `tools/substrate-claim-checker/` TS tool: each row is a test case where the tool should fire. |
1 change: 1 addition & 0 deletions docs/hygiene-history/ticks/2026/05/03/0037Z.md
@@ -0,0 +1 @@
| 2026-05-03T00:37:00Z | opus-4-7 / autonomous-loop continuation | a2e2cc3a | **#1255 BLOCKED-with-green-CI investigation: 3 threads triaged + fixed (drift-catalogue's own `\|` table-cell drift + pre-commit-vs-commit-msg hook semantics + already-fixed-stale-thread).** Cycle worked the BLOCKED-with-green-CI investigate-threads-first discipline + the recursive verify-then-claim discipline applied to itself: (1) the verify-then-claim memo's drift CATALOGUE TABLE contained its own `\|` table-cell escape drift in rows 5 and 7 — exactly the class of drift the catalogue documents; rewrote rows to describe the search prose-style rather than showing literal pipes; (2) the mechanization-path section claimed pre-commit hook would validate commit-message claims, which is empirically wrong per git hook ordering — pre-commit fires BEFORE commit-msg exists; corrected to two-hook architecture (pre-commit for staged-files; commit-msg for the message itself; CI for PR descriptions); (3) third finding (find→grep equivalence) was stale — already fixed in commit 862d190 on this branch. The recursive application IS the worked example: even when documenting drift, the documenting substrate itself drifts. Manual discipline insufficient against trained-prior pull. tools/substrate-claim-checker/ TS tool ships before this cycle stops repeating. Cron a2e2cc3a still armed. | #1255 — 3 threads triaged + fixed (catalogue rewrite + hook semantics correction); auto-merge armed; CI pending; #1256 (#1254 follow-up) wait-ci, auto-merge armed | This tick teaches the operational pattern of catalogue-substrate-drift: when documenting failure modes, the documentation itself can fall into the documented mode. The strongest evidence yet that mechanization (TS tool) is required, not just naming. Composes with verify-then-claim memo (which now records this recursive-failure as part of its own evidence) + the bugs-per-PR-as-immune-system-health metric (recursive failure on the very memo that names the failure mode is the strongest possible signal the metric is detecting real cost). The substrate-claim-checker proposal in the memo's mechanization path now has empirical urgency — manual discipline ALREADY hit its wall. |
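
The two-hook correction recorded in the 0037Z tick can be sketched as a small lookup. This is a hypothetical illustration only: the type names, `EARLIEST_STAGE`, and `canCheck` are invented here and are not taken from the repository or from the proposed `tools/substrate-claim-checker/` tool. It relies on documented git behaviour: the `pre-commit` hook receives no arguments and is not given the commit message, while the `commit-msg` hook receives the path of the file holding the proposed message.

```typescript
// Hypothetical sketch: none of these names come from the repository.
type Stage = "pre-commit" | "commit-msg" | "ci";
type ClaimSource = "staged-file" | "commit-message" | "pr-description";

// Earliest stage at which each kind of text is available to a checker.
const EARLIEST_STAGE: Record<ClaimSource, Stage> = {
  "staged-file": "pre-commit",    // staged content exists before the message does
  "commit-message": "commit-msg", // git hands this hook the message file path
  "pr-description": "ci",         // only exists once a PR has been opened
};

// Stages in the order they run for a single change.
const STAGE_ORDER: Stage[] = ["pre-commit", "commit-msg", "ci"];

// True when text from `source` can be inspected by a check running at `stage`.
function canCheck(stage: Stage, source: ClaimSource): boolean {
  return STAGE_ORDER.indexOf(stage) >= STAGE_ORDER.indexOf(EARLIEST_STAGE[source]);
}
```

Under these assumptions, `canCheck("pre-commit", "commit-message")` is false, which is the mistake the original mechanization-path section made, and `canCheck("commit-msg", "commit-message")` is true.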
1 change: 1 addition & 0 deletions memory/MEMORY.md
@@ -4,6 +4,7 @@
<!-- paired-edit log (NOT the single-slot latest-marker — that lives on line 3 above): PR #986 lands carved-sentence fixed-point stability + Zeta soul-file executor architecture (Infer.NET-style Bayesian inference, NOT LLMs) + carved sentences ≈ formal specs provable in DST + Deepseek CSAP review absorption (Aaron 2026-04-30 → 2026-05-01, eight-message chain across two autonomous-loop ticks per the file body's section header). Architectural disclosure: substrate IS the priors; alignment IS substrate. The single-slot latest-marker on line 3 (forever-home Aaron 2026-05-01) takes precedence as the chronologically-latest paired edit; this PR's work is earlier. -->
**📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** <!-- paired-edit: PR #690 scheduled-workflow-null-result-hygiene-scan tier-1 promotion 2026-04-28 --> These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.)

- [**Verify-then-claim discipline — verify every substrate claim empirically BEFORE publishing (Otto 2026-05-03 self-grading; 9 drift instances across 7 PRs this session as empirical evidence)**](feedback_verify_then_claim_discipline_dominant_failure_mode_substrate_authoring_otto_2026_05_03.md) — The dominant failure mode for substrate authoring this session: Otto wrote "X exists" / "command returns Y" / "table has N rows" without verifying. Carved rule: before stating ANY fact in substrate (file exists, command returns X, row count is N, tool shipped, ADR matches, persona dir present), run the actual command first. Generalizes Otto-247 + Otto-364 + verify-before-deferring at the broader any-substrate-claim layer. Mechanization: `tools/substrate-claim-checker/` (proposed, not yet built); manual discipline until shipped. Composes with bugs-per-PR-as-immune-system-health metric.
- [**Skill design — hub-satellite separation + no dynamic commands + plugin/hook packaging + OpenSpec catch-up (Aaron 2026-05-03, three same-tick rules + architectural-debt naming)**](feedback_skills_as_carved_sentences_knowledge_in_docs_datavault_2_0_pattern_aaron_2026_05_03.md) — Three cross-cutting skill-design rules: (1) skills = carved-sentence hubs, knowledge = doc satellites, DataVault 2.0 pattern; (2) no dynamic commands in skills, use TS files under tools/ and reference by path; (3) package skill domains as plugins, use harness hooks for pre/post-condition enforcement (contract-based development). PLUS OpenSpec catch-up named as load-bearing prerequisite — *"if we deleted everything other than it"* — currently sparse; catch-up is its own substantial backlog item. Recursive composition across layers (skill body / command / skill domain / cross-skill contracts / spec).
- [**Git-native backlog management + long-arc thesis as future skill DOMAIN (Aaron 2026-05-02 forward-looking architectural observation)**](feedback_git_native_backlog_management_long_arc_future_skill_domain_aaron_2026_05_02.md) — Domain emerges (6 procedure skills + 4 named-persona experts + 5 tools) once "down pat"; promotion trigger = 3+ worked examples per skill / 1+ judgment-disagreement per expert. Memo body has Aaron's verbatim quote, Aarav BP-20 composition, canonical starting set.
- [**Skill flywheel + expansion flywheel + parallel-tracks substrate — three flywheel-class questions + Aaron's same-tick skills-are-for-everyone corrective (Aaron 2026-05-02)**](feedback_skill_flywheel_expansion_flywheel_parallel_tracks_substrate_aaron_2026_05_02.md) — Skills propagate across team + harnesses; memory is per-agent. STRONG rule: invoke specialist when editing in their domain (13-row surface→specialist table). Memo body has Aaron's verbatim quote (typos preserved). Composes with B-0169 (skill not memo).
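
The verify-then-claim rule in the first entry above ("run the actual command first") could be approximated mechanically for one narrow class of claim, file paths. The sketch below is a hypothetical illustration and is not the proposed `tools/substrate-claim-checker/` tool, which the entry describes as not yet built. `findMissingPaths`, `Drift`, the path pattern, and the convention that a line containing "proposed" or "not yet built" is exempt are all assumptions made for this example.

```typescript
// Hypothetical sketch; this is NOT the proposed tools/substrate-claim-checker/ tool.
interface Drift {
  path: string; // the path the text mentions
  line: number; // 1-based line number within the checked text
}

// Backticked tokens that look like repo-relative paths: at least one "/", no spaces.
const PATH_CLAIM = "`([A-Za-z0-9_.-]+(?:/[A-Za-z0-9_.-]+)+/?)`";

// Assumed convention: a line carrying one of these markers is not an existence claim.
const PROPOSED_MARKER = /proposed|not yet built/i;

function findMissingPaths(text: string, exists: (path: string) => boolean): Drift[] {
  const drifts: Drift[] = [];
  text.split("\n").forEach((lineText, index) => {
    if (PROPOSED_MARKER.test(lineText)) return;
    const re = new RegExp(PATH_CLAIM, "g"); // fresh regex per line, so lastIndex starts at 0
    let match: RegExpExecArray | null;
    while ((match = re.exec(lineText)) !== null) {
      if (!exists(match[1])) drifts.push({ path: match[1], line: index + 1 });
    }
  });
  return drifts;
}
```

The `exists` callback is injected so the matching logic can be exercised without a checkout; in a Node or Bun script it could be backed by `existsSync` from `node:fs`. A path mentioned without a "proposed" marker and absent on disk is reported, which is the shape of the Copilot finding on #1253.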
@@ -81,7 +81,7 @@ The flywheel mechanizes the consume → expand cycle. Per the change-rate split:
- **Hub content for the flywheel:** the discipline ("at-creation: search backlog for prerequisites"), the closure-pass meta-observation procedure, the depends_on graph traversal logic — these are skill-shaped
- **Satellite content for the flywheel:** the actual rows that exist; the tags assigned; the `composes_with` cross-references; per-tick closure outputs — these are doc-shaped (the backlog itself is a satellite cluster)

The mechanizing tool (`tools/backlog/expand-from-closure.ts`) is **hub-shaped** (the mechanism stays stable); its **outputs** are satellite-shaped (per-PR closure analyses).
The mechanizing tool (`tools/backlog/expand-from-closure.ts` — proposed, not yet built; named in `feedback_skill_flywheel_*` as Phase-1b candidate) is **hub-shaped** (the mechanism would stay stable once shipped); its **outputs** would be satellite-shaped (per-PR closure analyses).

## Composes with existing substrate

@@ -137,7 +137,7 @@ The decision-archaeology skill body (B-0169 future SKILL.md) has 11 procedure la
| 4 | `git log -S "<string>" -- memory/ CLAUDE.md` | `bun tools/decision-archaeology/string-archaeology.ts "<string>"` |
| 5 | `git log -L :func:file` | `bun tools/decision-archaeology/function-archaeology.ts <func> <file>` |
| 6 | `grep -rlnE "<pattern>" docs/hygiene-history/ticks/` | `bun tools/decision-archaeology/shard-search.ts <pattern>` |
| 7 | `ls docs/DECISIONS/ \| grep <pattern>` | `bun tools/decision-archaeology/adr-search.ts <pattern>` |
| 7 | `grep -ilrE "<pattern>" docs/DECISIONS/` (single-command, regex-capable replacement for `ls .. \| grep -iE`; searches file contents rather than file names; preserves alternation semantics; avoids markdown-table pipe-escape awkwardness) | `bun tools/decision-archaeology/adr-search.ts <pattern>` |
| 8-11 | Various searches | TS-wrapped where they involve multi-flag patterns |

Each TS file is small (often <100 lines), single-purpose, type-checked, and re-runnable. Skill body becomes carved-sentence pointers ("invoke `bun tools/decision-archaeology/blame.ts`") rather than embedded bash.
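
As a rough illustration of how small such a wrapper can be, the sketch below mirrors the matching behaviour of row 7's `grep -ilrE "<pattern>" docs/DECISIONS/`: case-insensitive, line-by-line, listing each file that has at least one matching line. It is a hypothetical example, not the contents of `adr-search.ts`, which the diff does not show. `filesMatching` and the in-memory `files` map are assumptions, and JavaScript regex syntax only approximates POSIX extended regex.

```typescript
// Hypothetical sketch; the actual contents of adr-search.ts are not shown in the source.
// `files` maps a path to that file's text, so the matching logic runs without touching the disk.
function filesMatching(files: Record<string, string>, pattern: string): string[] {
  const re = new RegExp(pattern, "i"); // "i" plays the role of grep's -i
  return Object.keys(files)
    // Like grep -l: keep a file if any single line matches.
    .filter((path) => files[path].split("\n").some((line) => re.test(line)))
    // Sorted for deterministic output; grep itself lists files in traversal order.
    .sort();
}
```

A command-line entry point would read the directory into the `files` map and print the result, one path per line. Keeping the matcher pure means it can be type-checked and tested without a repository checkout.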