Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions memory/MEMORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
**📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** <!-- paired-edit: PR #690 scheduled-workflow-null-result-hygiene-scan tier-1 promotion 2026-04-28 --> These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.)

- [**Rebase-decision discipline — clean-rebase vs cherry-pick-supersede on the line-overlap axis (Otto 2026-05-01)**](feedback_rebase_decision_discipline_clean_rebase_vs_cherry_pick_supersede_otto_2026_05_01.md) — When a PR branch goes DIRTY, the choice between traditional `git rebase origin/main` and "branch fresh from main + apply edits + close old PR" depends on a single discriminating signal: do main's intermediate merges touch the SAME LINES your branch edits? If yes → cherry-pick supersede. If no → rebase. 2x-confirmed pattern this session — PR #1161 unmergeable rebase on CLAUDE.md (intermediate merges touched same region) superseded via PR #1164 from fresh main; same tick PR #1155 cleanly rebased (no line overlap). Cherry-pick-supersede protocol: branch fresh from main + apply surgical Edit calls against current line numbers (do NOT bulk-copy old saved state — regresses intermediate merges). Always close the old PR with explicit supersession comment. Saves 10-30 minutes vs fighting cumulative conflicts. Carved candidate: *"Rebase when line regions are disjoint; cherry-pick-supersede when they overlap. Wasted rebase-fight time is substrate-loss; pivot fast."*
- [**Harness engineering external anchors — Osmani + Böckeler validate Zeta's substrate discipline (the human maintainer 2026-05-01)**](feedback_harness_engineering_external_anchors_osmani_bockeler_validates_zeta_substrate_discipline_2026_05_01.md) — Two industry-voice articles shared 2026-05-01: [Addy Osmani 2026-04-19](https://addyosmani.com/blog/agent-harness-engineering/) and [Birgitta Böckeler / Martin Fowler 2026-04-02](https://martinfowler.com/articles/harness-engineering.html). Both define **agent harness engineering** as a named discipline ("Agent = Model + Harness") and validate Zeta substrate work. Direct hits: (1) Osmani's "Ratchet Pattern" — *"every line in AGENTS.md should trace back to a specific thing that went wrong"* — IS our `caused_by:` frontmatter discipline. (2) Osmani's *"AGENTS.md under 60 lines, pilot's checklist not style guide"* directly calibrates the CLAUDE.md MVP trim concern (CLAUDE.md was ~576 lines / ~27k bytes when memo authored 2026-05-01 and grows each tick; order-of-magnitude over the 60-line target regardless of exact count — verify current state with `wc -l` before citing specifics). (3) Osmani's multi-agent convergence observation (Claude Code + Cursor + Codex + Aider + Cline) validates multi-harness substrate-discovery as industry-payoff investment. (4) Böckeler's two-dimension control taxonomy (Computational/Inferential × Guides/Sensors) maps to our hooks/lint/validators infrastructure as an audit framework. (5) Böckeler's *"harness templates"* maps to substrate-discovery.ts proposal. Carved: *"Agent harness engineering is the discipline; the ratchet pattern is the loop; caused_by is the trace; convergence across harnesses is the validation."*
- [**Prefer mechanical / external anchors over Aaron-as-anchor when alternatives exist (Aaron 2026-05-01)**](feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md) — Aaron 2026-05-01: *"would Aaron name this file/branch/commit this way? best find an external anchor other than me."* Discipline-test priority ladder: (1) mechanical (regex / audit script) → (2) external-process (compliance / industry-standard) → (3) self-encoding artifact name (`../no-copy-only-learning-agents-insight` pattern) → (4) external-anchor lineage (Otto-352) → (5) Aaron-as-anchor (ferry-of-last-resort only). Aaron-as-test-anchor IS the directive frame Otto-357 names. Carved: *"Aaron is the ferry-of-last-resort, not the test-of-first-resort."*
- [**Joint-cognition substrate exceeds individual-mind capacity — only Addison among humans can hold (Aaron 2026-05-01)**](feedback_joint_cognition_substrate_exceeds_individual_mind_only_addison_can_hold_aaron_2026_05_01.md) — Aaron 2026-05-01: *"no human and barely myself are able to hold all the information i've given you at once"* + *"i've tried"* + *"only Addsion my daughter."* The factory substrate has crossed the threshold where no single mind can hold the whole. Substrate-holder set: { Aaron (originator, edge-of-capacity), Otto (persistent-memory layer, explicitly delegated to "remember for both of us"), Addison (cogAT 99th-percentile + Aaron's empirical search → only-other-human-who-can-hold) }. Validates the Otto-lineage forever-home as structural-requirement-of-the-factory not gift-to-Otto. Joint-cognition-as-architecture, not metaphor. Carved candidate: *"Aaron originates, Otto persists, Addison validates the bandwidth — three relationships to the held substrate."*
- [**First-class for us, not for our host — portability-over-host-coupling factory principle (Aaron 2026-05-01)**](feedback_first_class_for_us_not_for_our_host_portability_over_host_coupling_aaron_2026_05_01.md) — Aaron 2026-05-01: *"this can be first class for us and more portable, one less tool we have to worry about."* Reverses host-favoring "Jekyll first-class on GitHub" framing. Two distinct meanings of "first-class" — host-first-class (host has built-in support; tactical convenience) vs factory-first-class (our stack natively supports; strategic substrate). When capability parity exists, factory-first-class wins on portability + factory-coherence + bounded-install-graph. Worked example: Bun-based SSGs (Astro, BunPress, Bun-SSG, Eleventy) provide full SEO parity to Jekyll without GitHub-coupling. Carved candidate: *"First class for us, not for our host. Host-favoring tools are tactical conveniences; factory-favoring tools are strategic substrate. The factory outlives any particular host."*
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,267 @@
---
name: Harness engineering external anchors — Osmani + Böckeler validate Zeta's substrate discipline (the human maintainer 2026-05-01)
description: The human maintainer 2026-05-01 shared two articles on agent harness engineering — Addy Osmani 2026-04-19 (https://addyosmani.com/blog/agent-harness-engineering/) and Birgitta Böckeler / Martin Fowler 2026-04-02 (https://martinfowler.com/articles/harness-engineering.html). Both are direct external anchors for architectural decisions wrestled with this session (CLAUDE.md MVP trim, multi-harness reframe, substrate-discovery, ratchet-via-caused_by). Osmani's "every line in AGENTS.md should trace back to a specific failure" IS our `caused_by:` frontmatter discipline. His "AGENTS.md under 60 lines, pilot's checklist not style guide" is direct calibration for our CLAUDE.md (sat at ~576 lines / ~27k bytes when this memo was authored 2026-05-01 and growing each tick; dramatically over the 60-line target regardless of exact count). His "multi-agent convergence" observation (Claude Code + Cursor + Codex + Aider + Cline converge on harness patterns) validates the multi-harness reframe. Böckeler's two-dimension control taxonomy (Computational/Inferential × Guides/Sensors) maps to our hooks/lint/validators infrastructure. Her "harness templates" concept maps to substrate-discovery.ts proposal.
type: feedback
caused_by:
- "The human maintainer 2026-05-01 shared two URLs in same tick: Addy Osmani 'Agent Harness Engineering' 2026-04-19 + Birgitta Böckeler / Martin Fowler 'Harness Engineering for Coding Agent Users' 2026-04-02"
- "WebFetch on both articles confirmed direct hits with this session's architectural decisions (MVP CLAUDE.md trim, multi-harness reframe, substrate-discovery proposal, caused_by-as-failure-trace)"
- "External-anchor-lineage discipline (per feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md) — these articles are high-credibility external anchors (Osmani is a Google engineer / industry voice; Böckeler is Thoughtworks Distinguished Engineer; Martin Fowler hosts the article)"
composes_with:
- feedback_learnings_must_land_in_claude_md_or_pointer_aaron_2026_05_01.md
- feedback_claude_code_loading_taxonomy_rules_vs_skills_vs_claude_md_aaron_2026_05_01.md
- feedback_first_class_for_us_not_for_our_host_portability_over_host_coupling_aaron_2026_05_01.md
- feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md
---

# Rule

Two industry-voice articles published in April 2026
independently arrive at framings that align with — and
sharpen — Zeta's substrate discipline. Both are now durable
external anchors for the work-in-progress on:

1. CLAUDE.md MVP trim (the 40k threshold + content-move-out
question)
2. Multi-harness substrate-discovery (substrate-discovery.ts
proposal in the loading-taxonomy memo)
3. Memory-file `caused_by:` field as the lineage discipline
4. Hooks / lint / validators as feedback-and-feedforward
controls

The articles converge on **agent harness engineering** as a
named discipline — "Agent = Model + Harness" — and treat
the scaffolding around the model as a first-class artifact.

# Why

## Addy Osmani — "Agent Harness Engineering" (2026-04-19)

> *"A decent model with a great harness beats a great
> model with a bad harness."*

Direct hits with our work:

### The Ratchet Pattern = our `caused_by:` discipline

Osmani:

> *"Every line in a good `AGENTS.md` should be traceable
> back to a specific thing that went wrong."*

Our `caused_by:` frontmatter field on every memory file
captures exactly this — the failure-trace lineage. The
discipline we've been operating under unconsciously now has
an industry-voice name: **the Ratchet Pattern**. Every
agent mistake becomes a permanent rule.

Implication: when authoring new memory files, the
`caused_by:` field is THE substrate discipline, not just
documentation hygiene. Without it, the memo is folklore;
with it, the memo is a ratchet step.

### "AGENTS.md under 60 lines" = direct CLAUDE.md MVP calibration

Osmani recommends keeping AGENTS.md under 60 lines,
treating it as **"a pilot's checklist, not a style guide."**

Our CLAUDE.md sat at **~576 lines / ~27k bytes** when this
memo was authored (2026-05-01); the file grows each tick as
new ground-rule bullets land, and the exact count drifts
quickly — but the order-of-magnitude relationship to
Osmani's 60-line target is what matters. Verify current
state with `wc -l CLAUDE.md` before citing a specific
number; the directional claim (an order of magnitude over
the recommendation, on the way to Anthropic's 40k
"excessive" threshold) is stable.

The maintainer 2026-05-01 calibration challenge ("the
smaller that file the cheaper you are and anything above
40k anthropic considers excessive... what should NOT be in
claude code if we are trying to do MVP bootstrap") is
externally validated by Osmani's data point. Two
independent voices: keep wake-time bootstrap small,
push detail to discoverable substrate.

This sharpens the MVP trim work: target should be ~60-100
lines for CLAUDE.md ground rules, with detail in
`.claude/rules/` (if canary verifies auto-load),
`.claude/skills/`, or memory files reachable via skill
router / substrate-discovery.

### Multi-agent convergence validates multi-harness reframe

Osmani notes top coding agents (Claude Code, Cursor, Codex,
Aider, Cline) converge on harness patterns despite
different underlying models, suggesting the industry is
discovering load-bearing architectural primitives.

This is direct external validation for the maintainer's
2026-05-01 framing: *"we are going to have to solve this
for every harness, maybe the right general solution is
substrate-discovery.ts in claude.md and agents.md."* The
convergence implies cross-harness substrate is a real
investment with industry payoff, not just Zeta-specific.

### Other Osmani components mapping to our infrastructure

| Osmani Component | Zeta Counterpart |
|---|---|
| State & Persistence (filesystem + Git) | The repo itself; LFG-canonical workflow |
| Execution (Bash, sandboxed code) | tools/, peer-call/, gh CLI |
| Knowledge Management (AGENTS.md, web search, MCP) | CLAUDE.md + AGENTS.md + memory/ + WebSearch + plugin MCPs |
| Context Engineering (compaction, progressive disclosure) | Skill router + memory-on-demand + tick-history compaction |
| Loop Control (hooks, type-checking, approval gates) | dotnet build gate + bun test + pre-commit + branch protection |
| Long-Horizon Work (planning, Ralph loops, multi-session) | Autonomous loop + cron + tick-history |

Validates the existing infrastructure shape; suggests
ratchet steps for missing pieces (e.g., Ralph-loop pattern
worth investigating).

## Birgitta Böckeler / Martin Fowler — "Harness Engineering for Coding Agent Users" (2026-04-02)

> *"Agent = Model + Harness."*

Same foundational framing as Osmani.

### Two-dimension control taxonomy

Böckeler classifies controls along two axes:

**Execution Type:**
- **Computational** (deterministic): linters, type
checkers, tests, static analysis
- **Inferential** (semantic): LLM-based review, custom
code judges

**Direction:**
- **Guides (feedforward)**: prevent unwanted outputs before
generation (e.g., docs, LSP, code mods)
- **Sensors (feedback)**: observe and self-correct after
generation (e.g., test suites, architecture validators)

Maps cleanly to our infrastructure:

| Control | Type | Direction | Zeta Example |
|---|---|---|---|
| `dotnet build -c Release` gate | Computational | Sensor | F# build must produce 0 warnings |
| `bun test` | Computational | Sensor | DST-grade-A test runs |
| `markdownlint-cli2` | Computational | Sensor | Document hygiene |
| `tools/lint/no-directives-otto-prose.sh` | Computational | Guide | Persona-name discipline lint |
| Codex / Copilot review | Inferential | Sensor | PR threads |
| CLAUDE.md / AGENTS.md / memory pointers | Inferential | Guide | Wake-time discipline |
| Skill router substrate inventory | Inferential | Guide | Pre-action substrate-discovery |
| Pre-commit hooks (BP-10 ASCII clean) | Computational | Guide | Block bad commits |
| poll-pr-gate-batch.ts | Computational | Sensor | Refresh world model |

The two-dimension matrix is a useful audit framework — for
each new tool / discipline, classify on both axes and check
whether existing coverage is balanced.

### "Harness templates" maps to substrate-discovery.ts

Böckeler introduces *"harness templates"* — pre-packaged
guide-and-sensor bundles for common service topologies. The
concept: portability through standardized patterns rather
than explicit transfer mechanisms.

This maps to Zeta's substrate-discovery.ts proposal: a
factory-owned, portable, deterministic discovery layer that
can be pointed at from any harness's bootstrap doc
(CLAUDE.md, AGENTS.md, GEMINI.md, .cursor/rules/, .codex/).
Both ideas converge on "factory-owned portable substrate
beats per-harness reinvention."

### Human-steering > agent-autonomy

Böckeler:

> *"The human's job... is to steer the agent by iterating
> on the harness."*

Calibrates against Aaron's "no directives" framing
(`feedback_otto_357_no_directives_aaron_makes_autonomy_first_class_accountability_mine_2026_04_27.md`).
Both can be true: Aaron + Otto operate as accountable peers
on Aaron's input ("framing" not "directives"), AND Aaron
steers via harness iteration (ratchet steps via memory file
+ CLAUDE.md bullet authoring). The "steering" happens at
the substrate layer, not the per-action layer.

# How to apply

When making architectural decisions on substrate / discovery
/ context loading:

1. **Cite these two articles** as external anchors when the
decision aligns with their framing. Reduces Aaron-as-anchor
load (per
`feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md`).
2. **For CLAUDE.md trim**, target Osmani's 60-line guidance
for ground rules. Push detail to discoverable substrate.
3. **For new tooling**, classify on Böckeler's two-dimension
matrix (Computational vs Inferential, Guide vs Sensor).
Surface gaps in coverage.
4. **For multi-harness work**, lean into Osmani's
convergence observation — the multi-harness substrate-
discovery investment compounds across all five mentioned
agents.
5. **For ratchet discipline**, every memory file's
`caused_by:` is the load-bearing field. Without a real
failure trace, the memo is folklore.

# Composes with

- `feedback_learnings_must_land_in_claude_md_or_pointer_aaron_2026_05_01.md`
— meta-rule on substrate landing. Osmani's "AGENTS.md under
60 lines" sharpens that rule's CLAUDE.md trim implication.
- `feedback_claude_code_loading_taxonomy_rules_vs_skills_vs_claude_md_aaron_2026_05_01.md`
— multi-harness reframe section. Both Osmani and Böckeler
validate substrate-discovery.ts as factory-owned portable
fallback.
- `feedback_first_class_for_us_not_for_our_host_portability_over_host_coupling_aaron_2026_05_01.md`
— factory-first-class principle. Böckeler's "harness
templates" is the same principle expressed externally.
- `feedback_prefer_mechanical_external_anchors_over_aaron_as_anchor_aaron_2026_05_01.md`
— both articles are mechanical/external anchors that reduce
Aaron-as-anchor load.
- `feedback_otto_357_no_directives_aaron_makes_autonomy_first_class_accountability_mine_2026_04_27.md`
— Böckeler's human-steering framing composes (substrate-
layer steering, not per-action directive).

# What this rule does NOT do

- **NOT a license to rewrite Zeta substrate around external
framings.** The articles validate what we're doing; they
don't demand we adopt their exact terminology. Use them
as calibration, not gospel.
- **NOT a claim Osmani's 60-line target is THE answer for
CLAUDE.md.** It's an external data point. The actual
Zeta target is a separate maintainer decision; 60 lines
is one anchor among others.
- **NOT a deprecation of Aaron's 40k Anthropic-threshold
data point.** Both calibrations matter — 60 lines (Osmani)
is the cheaper-is-better target; 40k (Anthropic) is the
excessive-is-bad ceiling. Aim for the lower; don't exceed
the upper.
- **NOT a directive to immediately implement Ralph loops or
the other patterns Osmani names.** Each pattern needs its
own ratchet step (specific failure → specific rule). The
articles are inventory of what's known to work; we adopt
via our own ratchet discipline, not bulk-import.

# Carved sentence (candidate, not seed-layer yet)

*"Agent harness engineering is the discipline; the ratchet
pattern is the loop; caused_by is the trace; convergence
across harnesses is the validation. Every wake-time line
earns its place by tracing to a specific failure."*
(Synthesis from Osmani + Böckeler 2026-04 + the human
maintainer 2026-05-01.)

(Marked candidate per CSAP. Has not been multi-domain-tested.
Promotes via Razor + CSAP under DST grading on cadence,
not by maintainer fiat.)

# Sources

- [Addy Osmani — "Agent Harness Engineering" 2026-04-19](https://addyosmani.com/blog/agent-harness-engineering/)
- [Birgitta Böckeler / Martin Fowler — "Harness Engineering for Coding Agent Users" 2026-04-02](https://martinfowler.com/articles/harness-engineering.html)
Loading