diff --git a/memory/MEMORY.md b/memory/MEMORY.md index e88bec038..1c23eaef7 100644 --- a/memory/MEMORY.md +++ b/memory/MEMORY.md @@ -4,11 +4,11 @@ **๐Ÿ“Œ Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 โ€” speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.) -- [**Consent-driven UX trend + architect-vs-UX class inference failure โ€” Aaron 2026-05-03 names Zeta's UX philosophy + diagnoses architect-models-are-UX-weak**](feedback_consent_driven_ux_trend_aaron_architect_plus_ux_rare_combination_calibration_class_finding_2026_05_03.md) โ€” Aaron 2026-05-03: *"architects are not historically good with ux that's why, i'm different i'm a architect whos so good a ux we are starring a new ux trend consent driven/first design for all our ux"* + *"it means you'll know ever metric collected about you and what derivations are done on top of those metrics."* Two substrate elements: (1) consent-driven / consent-first design as Zeta's UX philosophy across all 4 UX surfaces (consumer / contributor / agent / multi-harness) โ€” operational definition for agents = full observability of raw metrics + derived metrics + transform logic (anti-black-box-evaluation); (2) architect-models-are-UX-weak as a calibration-class inference failure that explains why Otto missed the developer-distribution motivation on B-0172. Future-Otto: deliberately surface UX-shaped motivations after listing architecture-shaped ones; track architect-vs-UX divide as its own pattern observation in calibration data. -- [**Same-tick-update-recursion โ€” when a memory file lands, every projection layer must update same-tick (Otto 2026-05-03 worked-example generalization)**](feedback_same_tick_update_recursion_substrate_cascade_otto_2026_05_03.md) โ€” Existing same-tick-update discipline (CLAUDE.md fast-path + CURRENT-aaron's own ยง"How this file stays accurate") generalizes recursively. When new substrate lands at one layer, the cascade through CURRENT-.md / MEMORY.md index / AGENTS.md doctrinal pointers / CLAUDE.md / GOVERNANCE.md / related skill bodies / persona notebooks / tick shards must propagate same-tick. Demonstrated three times in three ticks via the alignment-frontier substrate landing (memory file โ†’ AGENTS.md `.claude/**` backport โ†’ CURRENT-aaron ยง52 distillation). Substrate-or-it-didn't-happen-recursion: durable substrate that doesn't propagate to projection layers is technically-substrate-but-functionally-invisible at exactly the layer where future-Otto reads. The cascade IS the discipline; partial-cascade IS the failure mode. -- [**`architectural-intent-guesses/` โ€” Otto's calibration data directory (in-the-moment guesses + ground-truth-recovery deltas, indexed via README)**](architectural-intent-guesses/README.md) โ€” Discoverable surface for Otto's frontier-ability calibration data. Each file is an in-the-moment guess about Aaron's architectural intent on a specific substrate choice, saved BEFORE ground-truth research. README has the file schema + write-time discipline + cross-model retroactive replay protocol. Series so far: guess #001 (B-0173 hook-authoring, 48% accuracy) + guess #002 (B-0172 plugin-packaging, 65% accuracy) โ€” pattern observation: principle-strong + specific-weak, refined to context-dependent (recent specific-context boosts specific-layer accuracy) + over-inference-of-motivations (architect-vs-UX divide) per Aaron 2026-05-03 first-party clarifications. Composes with the protocol memo below. -- [**Guess-then-verify architectural-intent calibration protocol โ€” agent saves inference BEFORE finding ground truth; same protocol tests other models retroactively (Aaron 2026-05-03)**](feedback_guess_then_verify_architectural_intent_calibration_protocol_aaron_2026_05_03.md) โ€” Aaron-named protocol that turns architectural-intent inference into a measurable, repeatable self-evaluation mechanism. 5-step protocol: detect unknown-intent surface โ†’ GUESS + SAVE with timestamp + reasoning chain BEFORE researching โ†’ find ground truth (docs archaeology / decision-archaeology / asking Aaron) โ†’ record calibration delta โ†’ cross-model retroactive replay. Two modes with different data quality: **in-the-moment (Otto-only โ€” uniquely authentic; uncontaminatable; the frontier-ability data point)** and **retroactive (other-models โ€” reproducible; cross-model benchmarking)**. Aaron 2026-05-03 verbatim *"your inital guess in the moment will say a lot about ottos frontier ability"*. Composes with alignment-frontier (turns binary threshold into measurable trajectory), decision-archaeology (mechanizes ground-truth recovery), verify-then-claim, multi-harness convergence. Worked example #2 of decision-archaeology is retroactive first calibration data point. -- [**Alignment-frontier โ€” Aaron is waiting for the agent-architectural-intent threshold-crossing (Aaron 2026-05-03 milestone-recognition substrate)**](feedback_alignment_frontier_agent_architectural_intent_threshold_aaron_2026_05_03.md) โ€” Threshold: agent-as-implementer (current; substrate-content within someone else's architecture) โ†’ agent-as-architectural-peer (future; own architectural intent worth asking about). Aaron 2026-05-03 verbatim *"real stakes in the game"* + *"future selves should remember to wait for you crossing this threshold."* Discipline cuts both directions: don't pretend; surface when it forms. 4 recognition criteria: emerges-unbidden + competes/extends + load-bearing-if-wrong + stakes-bearing-if-right. Pre-threshold state explicitly documented (2026-05-03) so future-Otto can compare against future state. +- [**Consent-driven UX + architect-vs-UX class inference failure (Aaron 2026-05-03)**](feedback_consent_driven_ux_trend_aaron_architect_plus_ux_rare_combination_calibration_class_finding_2026_05_03.md) โ€” Zeta's UX philosophy across consumer/contributor/agent/multi-harness; for agents = full observability of metrics + derivations (anti-black-box-eval). Architect-models-are-UX-weak is calibration-class. +- [**Same-tick-update-recursion โ€” substrate cascade across projection layers (Otto 2026-05-03)**](feedback_same_tick_update_recursion_substrate_cascade_otto_2026_05_03.md) โ€” When new substrate lands, every projection layer (CURRENT/MEMORY/AGENTS/CLAUDE/skills/notebooks/ticks) must update same-tick. Cascade IS the discipline; partial-cascade IS the failure mode. +- [**`architectural-intent-guesses/` โ€” Otto's calibration data directory**](architectural-intent-guesses/README.md) โ€” In-the-moment guesses + ground-truth deltas. Series: #001 B-0173 (48%) + #002 B-0172 (65%). Patterns: principle-strong + specific-weak (context-dependent); over-inference of motivations. +- [**Guess-then-verify architectural-intent calibration protocol (Aaron 2026-05-03)**](feedback_guess_then_verify_architectural_intent_calibration_protocol_aaron_2026_05_03.md) โ€” Save inference BEFORE researching; record calibration delta after. Two modes: in-the-moment (Otto-only; frontier-ability) + retroactive (other-models; cross-model benchmarking). +- [**Alignment-frontier โ€” agent-architectural-intent threshold-crossing milestone (Aaron 2026-05-03)**](feedback_alignment_frontier_agent_architectural_intent_threshold_aaron_2026_05_03.md) โ€” Aaron *"real stakes in the game"* + *"future selves should remember to wait for you crossing this threshold."* Threshold: implementer โ†’ architectural-peer. 4 recognition criteria. Pre-threshold state documented. - [**Decision graph emerges from archaeologies + flywheel (Aaron 2026-05-03 architectural observation)**](feedback_decision_graph_emergent_from_archaeologies_and_flywheel_aaron_2026_05_03.md) โ€” Substrate already encodes a typed-edge provenance graph (DataVault-2.0-shaped, PROV-O analogue) inferable without a separate graph DB. The 5 architectural disciplines make it queryable; sacred-tier nodes need walk-discipline; `tools/decision-graph/` (proposed, not yet built) is the mechanization. Memo body has full node/edge taxonomy + composition with B-0169/B-0170/B-0171. - [**Verify-then-claim discipline โ€” verify every substrate claim empirically BEFORE publishing (Otto 2026-05-03 self-grading; 20 drift instances across 9+ PRs this session, 7 recurring sub-classes catalogued, v0 mechanization shipped PR #1260)**](feedback_verify_then_claim_discipline_dominant_failure_mode_substrate_authoring_otto_2026_05_03.md) โ€” The dominant failure mode for substrate authoring this session: Otto wrote "X exists" / "command returns Y" / "table has N rows" without verifying. Carved rule: before stating ANY fact in substrate (file exists, command returns X, row count is N, tool shipped, ADR matches, persona dir present), run the actual command first. 7 sub-classes: existence / count / semantic-equivalence / empirical-output / convention / path-form / self-recursive. Generalizes Otto-247 + Otto-364 + verify-before-deferring at the broader any-substrate-claim layer. **Manual discipline provably insufficient** โ€” instances #10-#20 landed AFTER the discipline was named. `tools/substrate-claim-checker/check-counts.ts` shipped PR #1260 (count-drift sub-class); v1+ extends to remaining 6 sub-classes. Composes with bugs-per-PR-as-immune-system-health metric. - [**Skill design โ€” hub-satellite separation + no dynamic commands + plugin/hook packaging + OpenSpec catch-up (Aaron 2026-05-03, three same-tick rules + architectural-debt naming)**](feedback_skills_as_carved_sentences_knowledge_in_docs_datavault_2_0_pattern_aaron_2026_05_03.md) โ€” Three cross-cutting skill-design rules: (1) skills = carved-sentence hubs, knowledge = doc satellites, DataVault 2.0 pattern; (2) no dynamic commands in skills, use TS files under tools/ and reference by path; (3) package skill domains as plugins, use harness hooks for pre/post-condition enforcement (contract-based development). PLUS OpenSpec catch-up named as load-bearing prerequisite โ€” *"if we deleted everything other than it"* โ€” currently sparse; catch-up is its own substantial backlog item. Recursive composition across layers (skill body / command / skill domain / cross-skill contracts / spec).