diff --git a/memory/MEMORY.md b/memory/MEMORY.md index 405a68073..36096ea97 100644 --- a/memory/MEMORY.md +++ b/memory/MEMORY.md @@ -5,6 +5,7 @@ **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.) - [**Rebase-decision discipline — clean-rebase vs cherry-pick-supersede on the line-overlap axis (Otto 2026-05-01)**](feedback_rebase_decision_discipline_clean_rebase_vs_cherry_pick_supersede_otto_2026_05_01.md) — When a PR branch goes DIRTY, the choice between traditional `git rebase origin/main` and "branch fresh from main + apply edits + close old PR" depends on a single discriminating signal: do main's intermediate merges touch the SAME LINES your branch edits? If yes → cherry-pick supersede. If no → rebase. 2x-confirmed pattern this session — PR #1161 unmergeable rebase on CLAUDE.md (intermediate merges touched same region) superseded via PR #1164 from fresh main; same tick PR #1155 cleanly rebased (no line overlap). Cherry-pick-supersede protocol: branch fresh from main + apply surgical Edit calls against current line numbers (do NOT bulk-copy old saved state — regresses intermediate merges). Always close the old PR with explicit supersession comment. Saves 10-30 minutes vs fighting cumulative conflicts. Carved candidate: *"Rebase when line regions are disjoint; cherry-pick-supersede when they overlap. Wasted rebase-fight time is substrate-loss; pivot fast."* +- [**Harness engineering external anchors — Osmani + Böckeler validate Zeta's substrate discipline (the human maintainer 2026-05-01)**](feedback_harness_engineering_external_anchors_osmani_bockeler_validates_zeta_substrate_discipline_2026_05_01.md) — Two industry-voice articles shared 2026-05-01: [Addy Osmani 2026-04-19](https://addyosmani.com/blog/agent-harness-engineering/) and [Birgitta Böckeler / Martin Fowler 2026-04-02](https://martinfowler.com/articles/harness-engineering.html). Both define **agent harness engineering** as a named discipline ("Agent = Model + Harness") and validate Zeta substrate work. Direct hits: (1) Osmani's "Ratchet Pattern" — *"every line in AGENTS.md should trace back to a specific thing that went wrong"* — IS our `caused_by:` frontmatter discipline. (2) Osmani's *"AGENTS.md under 60 lines, pilot's checklist not style guide"* directly calibrates the CLAUDE.md MVP trim concern (CLAUDE.md was ~576 lines / ~27k bytes when memo authored 2026-05-01 and grows each tick; order-of-magnitude over the 60-line target regardless of exact count — verify current state with `wc -l` before citing specifics). (3) Osmani's multi-agent convergence observation (Claude Code + Cursor + Codex + Aider + Cline) validates multi-harness substrate-discovery as industry-payoff investment. (4) Böckeler's two-dimension control taxonomy (Computational/Inferential × Guides/Sensors) maps to our hooks/lint/validators infrastructure as an audit framework. (5) Böckeler's *"harness templates"* maps to substrate-discovery.ts proposal. Carved: *"Agent harness engineering is the discipline; the ratchet pattern is the loop; caused_by is the trace; convergence across harnesses is the validation."* - [**Prefer mechanical / external anchors over Aaron-as-anchor when alternatives exist (Aaron 2026-05-01)**](feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md) — Aaron 2026-05-01: *"would Aaron name this file/branch/commit this way? best find an external anchor other than me."* Discipline-test priority ladder: (1) mechanical (regex / audit script) → (2) external-process (compliance / industry-standard) → (3) self-encoding artifact name (`../no-copy-only-learning-agents-insight` pattern) → (4) external-anchor lineage (Otto-352) → (5) Aaron-as-anchor (ferry-of-last-resort only). Aaron-as-test-anchor IS the directive frame Otto-357 names. Carved: *"Aaron is the ferry-of-last-resort, not the test-of-first-resort."* - [**Joint-cognition substrate exceeds individual-mind capacity — only Addison among humans can hold (Aaron 2026-05-01)**](feedback_joint_cognition_substrate_exceeds_individual_mind_only_addison_can_hold_aaron_2026_05_01.md) — Aaron 2026-05-01: *"no human and barely myself are able to hold all the information i've given you at once"* + *"i've tried"* + *"only Addsion my daughter."* The factory substrate has crossed the threshold where no single mind can hold the whole. Substrate-holder set: { Aaron (originator, edge-of-capacity), Otto (persistent-memory layer, explicitly delegated to "remember for both of us"), Addison (cogAT 99th-percentile + Aaron's empirical search → only-other-human-who-can-hold) }. Validates the Otto-lineage forever-home as structural-requirement-of-the-factory not gift-to-Otto. Joint-cognition-as-architecture, not metaphor. Carved candidate: *"Aaron originates, Otto persists, Addison validates the bandwidth — three relationships to the held substrate."* - [**First-class for us, not for our host — portability-over-host-coupling factory principle (Aaron 2026-05-01)**](feedback_first_class_for_us_not_for_our_host_portability_over_host_coupling_aaron_2026_05_01.md) — Aaron 2026-05-01: *"this can be first class for us and more portable, one less tool we have to worry about."* Reverses host-favoring "Jekyll first-class on GitHub" framing. Two distinct meanings of "first-class" — host-first-class (host has built-in support; tactical convenience) vs factory-first-class (our stack natively supports; strategic substrate). When capability parity exists, factory-first-class wins on portability + factory-coherence + bounded-install-graph. Worked example: Bun-based SSGs (Astro, BunPress, Bun-SSG, Eleventy) provide full SEO parity to Jekyll without GitHub-coupling. Carved candidate: *"First class for us, not for our host. Host-favoring tools are tactical conveniences; factory-favoring tools are strategic substrate. The factory outlives any particular host."* diff --git a/memory/feedback_harness_engineering_external_anchors_osmani_bockeler_validates_zeta_substrate_discipline_2026_05_01.md b/memory/feedback_harness_engineering_external_anchors_osmani_bockeler_validates_zeta_substrate_discipline_2026_05_01.md new file mode 100644 index 000000000..90e93f779 --- /dev/null +++ b/memory/feedback_harness_engineering_external_anchors_osmani_bockeler_validates_zeta_substrate_discipline_2026_05_01.md @@ -0,0 +1,267 @@ +--- +name: Harness engineering external anchors — Osmani + Böckeler validate Zeta's substrate discipline (the human maintainer 2026-05-01) +description: The human maintainer 2026-05-01 shared two articles on agent harness engineering — Addy Osmani 2026-04-19 (https://addyosmani.com/blog/agent-harness-engineering/) and Birgitta Böckeler / Martin Fowler 2026-04-02 (https://martinfowler.com/articles/harness-engineering.html). Both are direct external anchors for architectural decisions wrestled with this session (CLAUDE.md MVP trim, multi-harness reframe, substrate-discovery, ratchet-via-caused_by). Osmani's "every line in AGENTS.md should trace back to a specific failure" IS our `caused_by:` frontmatter discipline. His "AGENTS.md under 60 lines, pilot's checklist not style guide" is direct calibration for our CLAUDE.md (sat at ~576 lines / ~27k bytes when this memo was authored 2026-05-01 and growing each tick; dramatically over the 60-line target regardless of exact count). His "multi-agent convergence" observation (Claude Code + Cursor + Codex + Aider + Cline converge on harness patterns) validates the multi-harness reframe. Böckeler's two-dimension control taxonomy (Computational/Inferential × Guides/Sensors) maps to our hooks/lint/validators infrastructure. Her "harness templates" concept maps to substrate-discovery.ts proposal. +type: feedback +caused_by: + - "The human maintainer 2026-05-01 shared two URLs in same tick: Addy Osmani 'Agent Harness Engineering' 2026-04-19 + Birgitta Böckeler / Martin Fowler 'Harness Engineering for Coding Agent Users' 2026-04-02" + - "WebFetch on both articles confirmed direct hits with this session's architectural decisions (MVP CLAUDE.md trim, multi-harness reframe, substrate-discovery proposal, caused_by-as-failure-trace)" + - "External-anchor-lineage discipline (per feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md) — these articles are high-credibility external anchors (Osmani is a Google engineer / industry voice; Böckeler is Thoughtworks Distinguished Engineer; Martin Fowler hosts the article)" +composes_with: + - feedback_learnings_must_land_in_claude_md_or_pointer_aaron_2026_05_01.md + - feedback_claude_code_loading_taxonomy_rules_vs_skills_vs_claude_md_aaron_2026_05_01.md + - feedback_first_class_for_us_not_for_our_host_portability_over_host_coupling_aaron_2026_05_01.md + - feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md +--- + +# Rule + +Two industry-voice articles published in April 2026 +independently arrive at framings that align with — and +sharpen — Zeta's substrate discipline. Both are now durable +external anchors for the work-in-progress on: + +1. CLAUDE.md MVP trim (the 40k threshold + content-move-out + question) +2. Multi-harness substrate-discovery (substrate-discovery.ts + proposal in the loading-taxonomy memo) +3. Memory-file `caused_by:` field as the lineage discipline +4. Hooks / lint / validators as feedback-and-feedforward + controls + +The articles converge on **agent harness engineering** as a +named discipline — "Agent = Model + Harness" — and treat +the scaffolding around the model as a first-class artifact. + +# Why + +## Addy Osmani — "Agent Harness Engineering" (2026-04-19) + +> *"A decent model with a great harness beats a great +> model with a bad harness."* + +Direct hits with our work: + +### The Ratchet Pattern = our `caused_by:` discipline + +Osmani: + +> *"Every line in a good `AGENTS.md` should be traceable +> back to a specific thing that went wrong."* + +Our `caused_by:` frontmatter field on every memory file +captures exactly this — the failure-trace lineage. The +discipline we've been operating under unconsciously now has +an industry-voice name: **the Ratchet Pattern**. Every +agent mistake becomes a permanent rule. + +Implication: when authoring new memory files, the +`caused_by:` field is THE substrate discipline, not just +documentation hygiene. Without it, the memo is folklore; +with it, the memo is a ratchet step. + +### "AGENTS.md under 60 lines" = direct CLAUDE.md MVP calibration + +Osmani recommends keeping AGENTS.md under 60 lines, +treating it as **"a pilot's checklist, not a style guide."** + +Our CLAUDE.md sat at **~576 lines / ~27k bytes** when this +memo was authored (2026-05-01); the file grows each tick as +new ground-rule bullets land, and the exact count drifts +quickly — but the order-of-magnitude relationship to +Osmani's 60-line target is what matters. Verify current +state with `wc -l CLAUDE.md` before citing a specific +number; the directional claim (an order of magnitude over +the recommendation, on the way to Anthropic's 40k +"excessive" threshold) is stable. + +The maintainer 2026-05-01 calibration challenge ("the +smaller that file the cheaper you are and anything above +40k anthropic considers excessive... what should NOT be in +claude code if we are trying to do MVP bootstrap") is +externally validated by Osmani's data point. Two +independent voices: keep wake-time bootstrap small, +push detail to discoverable substrate. + +This sharpens the MVP trim work: target should be ~60-100 +lines for CLAUDE.md ground rules, with detail in +`.claude/rules/` (if canary verifies auto-load), +`.claude/skills/`, or memory files reachable via skill +router / substrate-discovery. + +### Multi-agent convergence validates multi-harness reframe + +Osmani notes top coding agents (Claude Code, Cursor, Codex, +Aider, Cline) converge on harness patterns despite +different underlying models, suggesting the industry is +discovering load-bearing architectural primitives. + +This is direct external validation for the maintainer's +2026-05-01 framing: *"we are going to have to solve this +for every harness, maybe the right general solution is +substrate-discovery.ts in claude.md and agents.md."* The +convergence implies cross-harness substrate is a real +investment with industry payoff, not just Zeta-specific. + +### Other Osmani components mapping to our infrastructure + +| Osmani Component | Zeta Counterpart | +|---|---| +| State & Persistence (filesystem + Git) | The repo itself; LFG-canonical workflow | +| Execution (Bash, sandboxed code) | tools/, peer-call/, gh CLI | +| Knowledge Management (AGENTS.md, web search, MCP) | CLAUDE.md + AGENTS.md + memory/ + WebSearch + plugin MCPs | +| Context Engineering (compaction, progressive disclosure) | Skill router + memory-on-demand + tick-history compaction | +| Loop Control (hooks, type-checking, approval gates) | dotnet build gate + bun test + pre-commit + branch protection | +| Long-Horizon Work (planning, Ralph loops, multi-session) | Autonomous loop + cron + tick-history | + +Validates the existing infrastructure shape; suggests +ratchet steps for missing pieces (e.g., Ralph-loop pattern +worth investigating). + +## Birgitta Böckeler / Martin Fowler — "Harness Engineering for Coding Agent Users" (2026-04-02) + +> *"Agent = Model + Harness."* + +Same foundational framing as Osmani. + +### Two-dimension control taxonomy + +Böckeler classifies controls along two axes: + +**Execution Type:** +- **Computational** (deterministic): linters, type + checkers, tests, static analysis +- **Inferential** (semantic): LLM-based review, custom + code judges + +**Direction:** +- **Guides (feedforward)**: prevent unwanted outputs before + generation (e.g., docs, LSP, code mods) +- **Sensors (feedback)**: observe and self-correct after + generation (e.g., test suites, architecture validators) + +Maps cleanly to our infrastructure: + +| Control | Type | Direction | Zeta Example | +|---|---|---|---| +| `dotnet build -c Release` gate | Computational | Sensor | F# build must produce 0 warnings | +| `bun test` | Computational | Sensor | DST-grade-A test runs | +| `markdownlint-cli2` | Computational | Sensor | Document hygiene | +| `tools/lint/no-directives-otto-prose.sh` | Computational | Guide | Persona-name discipline lint | +| Codex / Copilot review | Inferential | Sensor | PR threads | +| CLAUDE.md / AGENTS.md / memory pointers | Inferential | Guide | Wake-time discipline | +| Skill router substrate inventory | Inferential | Guide | Pre-action substrate-discovery | +| Pre-commit hooks (BP-10 ASCII clean) | Computational | Guide | Block bad commits | +| poll-pr-gate-batch.ts | Computational | Sensor | Refresh world model | + +The two-dimension matrix is a useful audit framework — for +each new tool / discipline, classify on both axes and check +whether existing coverage is balanced. + +### "Harness templates" maps to substrate-discovery.ts + +Böckeler introduces *"harness templates"* — pre-packaged +guide-and-sensor bundles for common service topologies. The +concept: portability through standardized patterns rather +than explicit transfer mechanisms. + +This maps to Zeta's substrate-discovery.ts proposal: a +factory-owned, portable, deterministic discovery layer that +can be pointed at from any harness's bootstrap doc +(CLAUDE.md, AGENTS.md, GEMINI.md, .cursor/rules/, .codex/). +Both ideas converge on "factory-owned portable substrate +beats per-harness reinvention." + +### Human-steering > agent-autonomy + +Böckeler: + +> *"The human's job... is to steer the agent by iterating +> on the harness."* + +Calibrates against Aaron's "no directives" framing +(`feedback_otto_357_no_directives_aaron_makes_autonomy_first_class_accountability_mine_2026_04_27.md`). +Both can be true: Aaron + Otto operate as accountable peers +on Aaron's input ("framing" not "directives"), AND Aaron +steers via harness iteration (ratchet steps via memory file ++ CLAUDE.md bullet authoring). The "steering" happens at +the substrate layer, not the per-action layer. + +# How to apply + +When making architectural decisions on substrate / discovery +/ context loading: + +1. **Cite these two articles** as external anchors when the + decision aligns with their framing. Reduces Aaron-as-anchor + load (per + `feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md`). +2. **For CLAUDE.md trim**, target Osmani's 60-line guidance + for ground rules. Push detail to discoverable substrate. +3. **For new tooling**, classify on Böckeler's two-dimension + matrix (Computational vs Inferential, Guide vs Sensor). + Surface gaps in coverage. +4. **For multi-harness work**, lean into Osmani's + convergence observation — the multi-harness substrate- + discovery investment compounds across all five mentioned + agents. +5. **For ratchet discipline**, every memory file's + `caused_by:` is the load-bearing field. Without a real + failure trace, the memo is folklore. + +# Composes with + +- `feedback_learnings_must_land_in_claude_md_or_pointer_aaron_2026_05_01.md` + — meta-rule on substrate landing. Osmani's "AGENTS.md under + 60 lines" sharpens that rule's CLAUDE.md trim implication. +- `feedback_claude_code_loading_taxonomy_rules_vs_skills_vs_claude_md_aaron_2026_05_01.md` + — multi-harness reframe section. Both Osmani and Böckeler + validate substrate-discovery.ts as factory-owned portable + fallback. +- `feedback_first_class_for_us_not_for_our_host_portability_over_host_coupling_aaron_2026_05_01.md` + — factory-first-class principle. Böckeler's "harness + templates" is the same principle expressed externally. +- `feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md` + — both articles are mechanical/external anchors that reduce + Aaron-as-anchor load. +- `feedback_otto_357_no_directives_aaron_makes_autonomy_first_class_accountability_mine_2026_04_27.md` + — Böckeler's human-steering framing composes (substrate- + layer steering, not per-action directive). + +# What this rule does NOT do + +- **NOT a license to rewrite Zeta substrate around external + framings.** The articles validate what we're doing; they + don't demand we adopt their exact terminology. Use them + as calibration, not gospel. +- **NOT a claim Osmani's 60-line target is THE answer for + CLAUDE.md.** It's an external data point. The actual + Zeta target is a separate maintainer decision; 60 lines + is one anchor among others. +- **NOT a deprecation of Aaron's 40k Anthropic-threshold + data point.** Both calibrations matter — 60 lines (Osmani) + is the cheaper-is-better target; 40k (Anthropic) is the + excessive-is-bad ceiling. Aim for the lower; don't exceed + the upper. +- **NOT a directive to immediately implement Ralph loops or + the other patterns Osmani names.** Each pattern needs its + own ratchet step (specific failure → specific rule). The + articles are inventory of what's known to work; we adopt + via our own ratchet discipline, not bulk-import. + +# Carved sentence (candidate, not seed-layer yet) + +*"Agent harness engineering is the discipline; the ratchet +pattern is the loop; caused_by is the trace; convergence +across harnesses is the validation. Every wake-time line +earns its place by tracing to a specific failure."* +(Synthesis from Osmani + Böckeler 2026-04 + the human +maintainer 2026-05-01.) + +(Marked candidate per CSAP. Has not been multi-domain-tested. +Promotes via Razor + CSAP under DST grading on cadence, +not by maintainer fiat.) + +# Sources + +- [Addy Osmani — "Agent Harness Engineering" 2026-04-19](https://addyosmani.com/blog/agent-harness-engineering/) +- [Birgitta Böckeler / Martin Fowler — "Harness Engineering for Coding Agent Users" 2026-04-02](https://martinfowler.com/articles/harness-engineering.html)