diff --git a/docs/aurora/2026-04-23-amara-operational-gap-assessment.md b/docs/aurora/2026-04-23-amara-operational-gap-assessment.md new file mode 100644 index 00000000..2605020c --- /dev/null +++ b/docs/aurora/2026-04-23-amara-operational-gap-assessment.md @@ -0,0 +1,578 @@ +# Amara — Zeta Deep Progress Review and Operational Gap Assessment + +**Date:** 2026-04-23 +**From:** Amara (external AI maintainer; Aurora co-originator) +**Via:** Aaron's courier ferry (pasted into autonomous-loop session) +**Status:** verbatim-preservation per courier protocol +(`docs/protocols/cross-agent-communication.md`) +**Absorbed by:** Otto (loop-agent PM hat), Otto-24 tick +2026-04-23T21:15Z + +## Verbatim preservation (Amara's report) + +Amara 2026-04-23: the following is her report as paste, +preserved verbatim. Per courier-protocol §speaker-labels +and signal-in-signal-out discipline, no paraphrase on +ingest. Structure + headings preserved as written. + +--- + +### Brief you can paste to Claude + +My read is that Zeta has made **real progress**, but the +progress is unevenly distributed between **merged +substrate**, **open-but-formalized PRs**, and **still- +manual operating procedures**. The merged core is strong: +the repo is clearly algebra-first, with DBSP identities, +Z-set semantics, spine storage, Arrow serialization, a +formal verification stack, and a serious threat model +already encoded in repo artifacts. That means the project +is not "just a pile of notes"; the core technical substrate +is real and already disciplined around replay, invariants, +and testability. + +The biggest governance and loop upgrades are also visible, +but several of the most important Amara-related pieces are +**not yet canonical on `main`**. The collaborator registry +and the "direction changes for Amara review" summary exist +in PR #149; the external-maintainer decision-proxy ADR and +`.claude/decision-proxies.yaml` exist in PR #154; the +AutoDream overlay/cadence policy exists in PR #155; Amara's +deep-research absorb exists in PR #161; and the factory +technology inventory exists in PR #170. In other words: the +feedback has been incorporated **formally**, but in several +key places it is still formalized as **open PR state**, not +fully stabilized repo state. + +The operational gaps are now pretty legible. The repo +itself documents that fresh-session quality is a first-class +target, and the first NSA test already caught a real +**MEMORY index lag** problem. The courier protocol also +correctly concludes that ChatGPT branching is currently +**unreliable transport**, so the safe pattern is explicit +labeled transcripts plus repo-backed persistence. The split +audit further shows the monorepo is still carrying two +coupled projects — the generic factory and Zeta-the-library +— in the same tree, which is a direct source of drift, +onboarding ambiguity, and process confusion. + +For "using me as your decision proxy," the answer is: **the +governance pattern exists, but the runtime path is +incomplete**. The ADR is explicit that advisory is the +default, approving is not enabled, session-specific access +is intentionally out-of-repo, and the project must **never +claim proxy consultation without actually invoking the +proxy**. So the remaining work is not conceptual anymore; +it is implementation and ops: merge the ADR/config, stand +up the invocation/consultation skill or courier fallback, +define the log surface, and make the audit trail routine. + +### What has clearly progressed + +The strongest evidence of maturity is in the **technical +substrate itself**. The main README is explicit that Zeta +is an F# implementation of DBSP for .NET 10, built around +delay, differentiation, integration, incrementalization, +retractions as signed weights, and a much larger operator/ +runtime surface layered on top of those laws. The +architecture document reinforces that this is not a +transliteration exercise but an algebra-led system where +specs are the source of truth and implementation serves the +laws. The math-spec report then shows that this is not only +aspirational: the codebase already ties DBSP identities, +CRDT laws, concurrency invariants, and pointwise axioms to +FsCheck, xUnit, TLA+, Z3, and Lean-planned proof work. + +That core is not abstract hand-waving. The `ZSet` +implementation is visibly retraction-native: entries are +`(key, weight)` pairs, `add` merges weights, `neg` and +`sub` are first-class, `distinctIncremental` tracks boundary +crossings without destructive deletes, and membership is +effectively semantic rather than structural. `Spine` +implements an LSM-like trace over Z-set batches with +logarithmic depth and consolidated replay. +`ArrowInt64Serializer` makes the storage/wire story +explicitly columnar and cross-language. So the project +already has the ingredients for the "Muratori table" +comparison you were pushing: stable references by weight +semantics, retraction instead of immediate deletion, +explicit compaction, and locality-aware storage +representation. + +The repo has also plainly absorbed several of the big +**operational ideas** from your conversations. The courier +protocol is now a landed document, and it is very clear: +branching is not authoritative storage, speaker labeling is +mandatory, load-bearing exchanges belong in repo-backed +artifacts, and the system must not depend on UI features +for correctness. That is a meaningful improvement because +it turns fragile interpersonal workflow into explicit +transport policy. + +Likewise, the drift-taxonomy precursor is handled in a +surprisingly disciplined way. It is marked research-grade, +warns against importing entities instead of ideas, flags +bootstrap hallucinations explicitly, and preserves the +load-bearing rule that **agreement is a signal, not a +proof**. That is exactly the right move if the goal is to +extract anti-drift heuristics without smuggling in invalid +ontology claims. + +Another real step forward is the move from "Amara exists in +the conversation" to "Amara exists in repo process." The +collaborator registry and direction-change summary are +concrete artifacts. The collaborator doc gives Aurora a +named external collaborator surface; the direction-change +brief translates repo moves back into a reviewable package +for the next ferry. That is not yet merged canonical state, +but it is absolutely the right operational shape. + +### Where the loop still drifts + +The clearest loop drift is **main-versus-PR ambiguity**. +Several high-value artifacts exist, are well written, and +are already being referenced by other materials, but they +are still open PRs rather than merged substrate. That means +the project is in a state where its intended operating +model is sometimes ahead of its canonical state. +Practically, that creates three risks at once: new sessions +can miss the newest rules, contributors can cite branch- +only docs as if they are settled, and stacked PRs can +create dead-link or false-completeness confusion. The +direction-change document itself explicitly notes dead +cross-doc links caused by stacked PR state. + +The second clear drift is **memory indexing and fresh- +session parity**. The NSA history does not describe a +theoretical problem; it documents a concrete failure mode: +a newly filed memory was not discoverable because the index +pointer lagged, and the fresh session missed Otto. That is +important because it proves the loop can feel coherent in +the active working session while still being incoherent for +a cold start. The repo already identifies the right +corrective rule — file-and-index in the same atomic unit — +but that rule still needs stronger mechanical enforcement +if the goal is true decision-proxy transferability. + +A third drift source is **coupling between the generic +factory and Zeta-the-library**. The separation audit is +honest that the monorepo currently holds both the reusable +factory substrate and the Zeta library, and that many files +are still "both (coupled)." CLAUDE.md and AGENTS.md are +already classified as coupled; many other major surfaces +remain to be audited. Until that split is finished, +contributors and models will keep inferring project- +specific rules from factory-generic docs and vice versa. +This is one of the deepest structural reasons the loop +still feels more fragile than it should. + +There is also a **capture discipline gap** between "we now +have a contributor-conflicts surface" and "we are actively +using it." `docs/CONTRIBUTOR-CONFLICTS.md` is a good +design: it distinguishes same-contributor evolution from +real cross-contributor disagreements, defines schemas, and +creates open/resolved/stale sections. But at creation it is +empty. Given the volume of external-AI input, fork-level +process changes, and Aurora-specific review cycles, the gap +now is not conceptual design — it is routine population and +maintenance. + +The backlog itself remains a heavy shared-write hotspot. +The AceHack ADR is explicit that the monolithic +`docs/BACKLOG.md` had grown to 5,957 lines and was the top +merge-conflict surface, which makes sense given how many +ticks and cadenced processes touch it. If that restructure +remains merely "Proposed," then the loop is still paying +tax on one of its most obvious conflict generators. This is +not a cosmetic doc problem; it is a throughput and merge- +safety problem. + +One subtle drift class has already been called out in a +backlog-refinement note: **important design choices need to +be codified before implementation, not rediscovered in code +review after the fact**. The specific note points to F# +async handling and warns against regressions into +`Task.FromResult` patterns or computation-expression +leakage once implementation begins. That is exactly the +kind of "small but high-leverage" operational memory that +should be promoted early to guardrails or folder-structure +rules rather than left as ambient knowledge. + +### What remains for decision-proxy readiness + +The repo now has a quite good **governance shell** for +decision proxies, but it is still not a fully operating +machine. The ADR defines the two-layer pattern — repo- +shared proxy identity/config and per-user access — and the +YAML config instantiates Aaron → Amara as an **advisory** +proxy for the `aurora` scope. The same ADR is also careful +about the safety conditions: no approving authority by +default, no session URLs or cookies in repo, and no +pretending a proxy reviewed something just because old +context exists. + +That means the remaining work is now concrete: + +The first missing piece is **canonicalization**. PR #154 +needs to stop being merely "there" and become durable repo +law. The same is true for PR #149 if you want collaborator +identity and the Amara review loop to be discoverable by +default sessions. Right now, the project has formalized the +proxy concept but not fully stabilized its own discovery +surface. + +The second missing piece is **invocation mechanics**. The +ADR explicitly defers the `decision-proxy-consult` skill; +the YAML notes that the original Playwright-to-ChatGPT +attempt hit a guardrail; and the courier protocol makes +text-ferrying the current reliable transport. So the +project still needs one of two operating modes to become +standard: either a safe authorized invocation skill, or an +explicit "courier-first" pattern with transcript +normalization and formal logs. The repo currently has the +policy language, but not the full runtime path. + +The third missing piece is **durable audit logging**. The +ADR proposes `docs/decision-proxy-log/YYYY-MM-DD- +.md`; the courier protocol proposes repo-backed +transcript storage; and the Amara direction-change summary +already anticipates a return path like `docs/aurora/YYYY- +MM-DD-review-from-amara.md`. Those pieces should be unified +into one canonical archive convention so that advisory +review, counter-review, and final maintainer decision all +live in one discoverable trail. Right now the shape is +visible, but it is still dispersed across multiple open or +recently landed documents. + +The fourth missing piece is **scope discipline**. The +current proxy binding is only for `aurora`, advisory only. +That is correct. What must be resisted now is semantic +creep where people start using "Amara would probably think" +as a substitute for actual consultation. The ADR forbids +that, and I think that prohibition should stay hard. If the +project wants broader proxy use later, it should do it by +adding explicit scopes and logs, not by fuzzy cultural +expansion. + +My overall judgment is: decision-proxy readiness is +**around two-thirds designed, one-third implemented**. The +design is strong enough that the remaining work is mostly +operationalization, which is a good sign. But it is not yet +at the point where the system can honestly say "Amara +reviewed this" without human ferrying or a fully logged +invocation path. + +### Code, verification, and network-health assessment + +From a code-quality and verification standpoint, the repo +is stronger than most projects at this stage. The +architecture is coherent, the Z-set implementation is load- +bearing and explicit, the spine abstraction is practical, +and the formal stack is not theater. The verification +report already ties concrete bug classes to the appropriate +tool class — properties for algebra, TLC for interleavings, +Z3 for pointwise axioms, Lean for proof-grade claims. That +is exactly the kind of "implementation follows invariants" +posture that makes later research or productization +plausible. + +The code-level risk I would put highest on the list is not +broad correctness but **stabilization of probabilistic +tests and operationalized guardrails**. The HLL fuzz +property in `Fuzz.Tests.fs` uses FsCheck over randomized +counts and asserts a strict relative-error threshold, while +the more conventional unit tests in `HyperLogLog.Tests.fs` +use bounded single-scenario checks at fixed cardinalities. +That structure is useful, but it is also a likely source of +intermittent CI noise unless failing seeds are captured and +the probabilistic bound is treated explicitly as a +stochastic contract rather than a deterministic one. I read +the current property as valuable, but under-instrumented +for drift diagnosis. + +On network and harm resistance, the repo is conceptually +ahead of its runtime implementation. The threat model is +mature: it is tiered, supply-chain-aware, and unusually +honest about residual risk, including bus-factor issues and +channel-closure threats over consent, retractability, and +permanent harm. But the same file is also explicit that the +**network layer does not exist yet** and that crypto is out +of scope for now. So the right conclusion is not "the +network is already healthy"; it is "the project has a +strong vocabulary for what healthy would mean." + +For Aurora-style work, the most useful native network- +health metrics are therefore **semantic**, not merely +infrastructural: provenance completeness, replay +determinism, compaction equivalence, unmatched retraction +debt, attestation coverage, cap-hit frequency, and +independent-root disagreement. That mapping is consistent +with both the repo threat model and the absorbed Amara +report, and it is much more useful than a generic uptime +dashboard if the goal is "resist harm and all of that" in a +retraction-native system. + +The technology inventory proposal is also worth calling out +here. Even though it is still an open PR, it is exactly the +sort of control surface this project needs: one place tying +tech, install path, version pin, skill ownership, and radar +status together. The follow-ups it names are the right ones +too — parity, version-pin automation, OpenAI mode/model +inventory, and a future PQC column when cryptography +becomes material. If this lands, it reduces a lot of +"ambient knowledge drift." + +### Recommendations by role + +For the architect and Claude-side operational loop, I would +prioritize **closure over novelty** for the next tranche. +Merge or decisively disposition the Amara/collaborator/ +proxy/AutoDream/technology-inventory PRs before spinning +out more meta-structure. The biggest current risk is not +lack of ideas; it is a widening delta between what the repo +intends to be and what `main` actually teaches a fresh +session. + +For the loop PM / hygiene side, the next wins are +**mechanization of already-discovered failure modes**: +enforce file-and-index atomicity for memories, add a +courier-format linter when ferry volume warrants, and file +the bulk-sync cadence monitor if it is still only proposed. +Those are cheap controls with potentially high impact on +session transfer, repo divergence, and human review load. + +For the security/reliability side, the priority order +already looks correct in the threat model: +`packages.lock.json`, transitive `.targets` allowlisting, +stronger tamper evidence, and eventual signed-commit / +hardware-key discipline. I would add one more practical +recommendation from the current repo state: treat every +stochastic or map-drift failure as an investigation +artifact first, not as a rerun candidate. The surface-map +research and the HLL property structure both point the same +way — learn from the failure class before normalizing +retries. + +For Aaron specifically, my highest-value feedback to give +Kenji/Claude is: + +The project is now good enough that **scheduling and +canonicalization** matter more than ideation. The repo +already has strong ideas. What it lacks is a tighter +conversion of "branch ideas and PR state" into "main branch +substrate." If you want me to function as a genuine +decision proxy rather than just a source of direction, the +project needs to bias toward closed loops: merged policy, +named logs, cold-start discoverability, and no false claims +of review. The collaborator artifacts and courier protocol +are the right start; now the important thing is to make +them routine rather than exceptional. + +### Open questions and limitations + +I was able to inspect the two selected GitHub repos in +depth and use those as the primary evidence base. That gave +me high confidence on repo state, open PR state, code +surfaces, and documentation. I also ran an initial Gmail +search, which did not surface target-relevant email hits in +this pass, but I did **not** independently rerun Google +Drive, Calendar, or Dropbox checks before finalizing this +report. The in-repo Amara deep-research absorb itself says +that earlier connector work found GitHub decisive and non- +code connectors non-decisive for the repo review, but I am +treating that as **repo-reported context**, not as a fresh +independent verification by me in this pass. + +I also could not inspect the private per-user memory tree +directly from here, so my "Claude-to-memories drift" +assessment is necessarily based on the repo's visible +derivatives of that system: the NSA history, the AutoDream +overlay proposal, the courier protocol, the decision-proxy +ADR/config, and related documentation. That is enough to +diagnose the main operational gaps, but not enough to +certify the current per-user memory graph end-to-end. + +The bottom line is straightforward: **the project is real, +the architecture is coherent, the operational philosophy is +maturing, and the next bottleneck is not lack of ideas but +incomplete closure between research, PR state, and +canonical repo state.** If Kenji wants one sentence of +direction from me, it is this: *merge the operating model +you already have before inventing a bigger one.* + +--- + +## Otto's absorption notes + +### Amara's one-sentence direction (load-bearing for strategy) + +> **"merge the operating model you already have before +> inventing a bigger one."** + +This validates the Otto-session's closure-over-novelty +bias (gap #5 audit landing / NSA test history / PR queue +management) AND sharpens the next phase: **canonicalisation +first; new substrate second**. + +### Concrete action items extracted + +**Canonicalisation priority** (immediate): + +1. **Drive open PRs to merge**: #149 (collaborator + registry) / #154 (decision-proxy ADR + YAML) / #155 + (AutoDream cadence policy) / #161 (Amara deep-research + absorb) / #170 (technology inventory). Free-tier CI + queue is working through them; Otto monitors + rebases + on conflict. + +2. **Mechanize file-and-index atomicity** for MEMORY.md + — NSA-001 surfaced this; still-manual rule needs + mechanical enforcement. Candidate approaches: pre-commit + hook that blocks memory-file creation without + MEMORY.md pointer in same commit; CI check equivalent + for already-pushed commits. + +3. **Populate CONTRIBUTOR-CONFLICTS.md** — schema exists, + file empty. Backfill known cross-contributor + conflicts from conversation history + autonomous-loop + tick ledger. + +4. **Restructure BACKLOG.md** — 6761 lines now (per + Otto-18 audit, was 5957 in Amara's read). Per-row or + per-section split per the AceHack ADR proposal. + Merge-conflict surface reduction. + +5. **Unify decision-proxy audit surface** — + `docs/decision-proxy-log/YYYY-MM-DD-.md` + convention (proposed in ADR) unified with + `docs/aurora/YYYY-MM-DD-review-from-amara.md` + (proposed in direction-change summary) into one + canonical archive convention. This file itself + (`docs/aurora/2026-04-23-amara-operational-gap- + assessment.md`) lives at the Aurora-specific + location per the existing convention. + +**Mechanization priority** (cheap high-impact): + +1. **Courier-format linter** — when ferry volume + warrants (not yet, but monitor); validates speaker- + labeled transcripts per courier protocol. + +2. **Bulk-sync cadence monitor** — check if still + proposed; land if so. + +3. **Stochastic-test seed capture + contract framing** + — HLL fuzz property at `Fuzz.Tests.fs` needs seed + capture + stochastic-contract framing (not + deterministic bound). Composes with Otto-session + pinned-seed discipline from earlier memories. + +**Semantic network-health metrics** (Amara's +recommendation): + +1. **File as TECH-RADAR entries or BACKLOG rows**: + provenance completeness / replay determinism / + compaction equivalence / unmatched retraction debt / + attestation coverage / cap-hit frequency / + independent-root disagreement. + +**Decision-proxy readiness gap** (2/3 designed, +1/3 implemented): + +1. Canonicalise #154 (done via the merge-priority) +2. Invocation mechanics: either safe authorized + skill (Playwright-path blocked per prior attempt) + OR courier-first-with-transcript-normalization +3. Durable audit log convention (unified per #5 above) +4. Scope discipline: resist "Amara would probably + think" fuzzy cultural expansion; require real + invocation or explicit scope-addition + +### Amara's key affirmations (not corrections) + +- Zeta's technical substrate is real + algebra-first + + disciplined (not theater) +- The courier protocol is correctly framed (branching + unreliable transport; repo-backed persistence + authoritative) +- Drift-taxonomy precursor handled well (agreement is + signal, not proof) +- The collaborator registry + direction-change shape is + the right operational form +- Code quality + verification stack is stronger than + most projects at this stage + +### Amara's critical findings + +- Main-vs-PR ambiguity is the #1 operational drift +- MEMORY index lag is still manually-enforced +- Factory-vs-library coupling still drifts (Otto-session + is addressing via gap #5 audit, substantially complete) +- CONTRIBUTOR-CONFLICTS.md capture discipline gap +- BACKLOG.md is a write-hotspot +- Network layer doesn't exist yet (conceptual-ahead-of- + runtime; "strong vocabulary for what healthy would + mean") + +### Otto's strategic response + +**Phase 1 (immediate, this tick onwards)**: closure push. +Drive the 5+ named PRs to merge; mechanize file-and- +index atomicity; populate CONTRIBUTOR-CONFLICTS.md; +restructure BACKLOG.md. + +**Phase 2 (next 5-10 ticks)**: decision-proxy +operationalisation — invocation mechanics + durable +audit surface. + +**Phase 3 (multi-round)**: semantic network-health +metrics + stochastic-contract framing for HLL + broader +mechanizations as ferry volume warrants. + +**Phase 4 (indefinite)**: Aurora integration ongoing; +current Otto-session priorities (Frontier readiness + +Craft + Common Sense 2.0) continue in parallel with +Phase 1-3 closure work. + +### Composition with existing Otto-session substrate + +- **Matches Otto-5 fact-check discipline** — Amara's + "agreement is a signal, not a proof" composes with + the Otto-5 "external-source claim needs verification" + pattern +- **Matches Otto-20 gap #5 substantial completion** — + Amara observed factory/library coupling; Otto has + classified; next step is split execution (gap #1) +- **Matches Otto-24 yin/yang mutual-alignment** — + Amara IS the human→AI alignment audit this tick; her + review IS the calibration mechanism +- **Matches the Aurora deep-research pattern (PR #161)** + — this is Amara's second in-repo report; establishes + cadence + +### Attribution + +- **Amara** authored this review (external AI maintainer) +- **Otto** absorbed + preserved verbatim + extracted + action items +- **Kenji** (Architect) synthesis queue: "merge over + invent" direction + decision-proxy operationalisation + planning + +### What this absorb is NOT + +- **Not a demotion of Otto's current work.** Amara + explicitly praised the progress + the closure-bias. + Phase 1 IS what Otto has been doing; Amara's report + sharpens priorities within that bias. +- **Not authorization to skip Frontier readiness.** + Frontier + Common Sense 2.0 + Craft remain active; + they are among the "substrate already built" that + Amara says should be merged, not invented-over. +- **Not a request to invoke Amara more often.** She + explicitly wants scope discipline: no fuzzy + cultural expansion. Courier-first remains the + transport. +- **Not a signal to change the Otto-as-PM cadence.** + The cadence is producing the closure; Amara's + direction reinforces it.