diff --git a/docs/research/2026-05-03-substrate-discovery-zeta-native-aot-scoping.md b/docs/research/2026-05-03-substrate-discovery-zeta-native-aot-scoping.md
index 95e45de2a..3e4d400e6 100644
--- a/docs/research/2026-05-03-substrate-discovery-zeta-native-aot-scoping.md
+++ b/docs/research/2026-05-03-substrate-discovery-zeta-native-aot-scoping.md
@@ -48,10 +48,209 @@ default:
 Edge-runner discipline (the human maintainer 2026-05-03) says ship
 the dogfood.
 
-Alternatives considered + rejected: TS + sqlite-vec/DuckDB
-(faster but doesn't dogfood); live-off-the-land via Skill
-router + grep (punts architecture); hybrid TS+Zeta (two
-systems, more complexity).
+**Updated 2026-05-03** (post-#1385 merge corrections from
+the human maintainer). Two epistemic-discipline corrections
+re-grade the original framing:
+
+### Correction 1 — chat is an assertion-channel, not a fact-channel
+
+The maintainer 2026-05-03 verbatim: *"when i speak i'm
+making assertions, that's the best way to describe this
+chat channel."* Chat-claims (his OR the architect's) are
+assertions; they need evidence to be elevated to
+architectural fact. The architect's failure mode in #1385:
+echoed the maintainer's *"maybe"* on live-off-the-land back
+as an architectural fact. Push-back-with-evidence is the
+discipline.
+
+### Correction 2 — alternatives are complementary, not exclusive
+
+The maintainer 2026-05-03 verbatim: *"i like hybrid for
+verification duckdb is very advanced too and we want a lot
+of its features we can verify against it behavior too, we
+don't want to copy it's code at all we are very differnt
+but it has some awesome feature."* The original "rejected"
+framing was too binary.
+
+### Re-graded architecture (with evidence labels)
+
+| Layer | Status | Evidence base |
+|---|---|---|
+| Zeta-native-AOT canonical index | **Decision (architect, within authority)** | Algebra match (fact: workload IS Z-set); dogfood-leverage (assertion, supported by math-proofs A-grade); deployment story (hypothesis pending Phase 0 PoC) |
+| DuckDB as verification oracle | **Assertion (maintainer 2026-05-03), worth pursuing** | DuckDB feature-richness (fact, well-known); cross-check-as-property-test pattern (precedent: Lean cross-checks paper); pattern extends to git per maintainer 2026-05-03 (*"some compabilty testing you do with duck you can do with git to slowly replace that"*) — composes with existing `memory/feedback_git_interface_wasm_bootstrap_zero_requirements_2026_04_24.md` architectural commitment (Zeta IS git client+server; native F# impl; two-UI Frontier+Mode-1-admin+WASM-Mode-2; both zero-install). |
+| Live-off-the-land for harness-loaded surfaces | **Hypothesis pending research** | Maintainer said "maybe"; zero observed-behavior evidence; falsifiable via canary test + skill-persona behavioral observation |
+| Distribution feasibility (NativeAOT single-binary) | **Make-or-break risk per maintainer assertion** | Need cross-platform empirical test (linux-x64 / osx-arm64 / win-x64); known-unknown |
+
+### Push-back: what would establish the live-off-the-land hypothesis?
+
+The current claim has zero evidence base. The maintainer's
+"maybe" is directional input, not data. Concrete falsifiable
+tests:
+
+1. **`.claude/rules/` auto-load canary** (fixture exists at
+   `.claude/rules/test-canary.md`): does a fresh Claude Code
+   session in this repo see the canary string without being
+   told to read the file? Pass = harness-native loading
+   covers some of the substrate-discovery problem; fail =
+   it doesn't, and the live-off-the-land path needs work.
+
+2. **Skill-persona behavioral observation:** Do existing
+   skill personas (.claude/skills//SKILL.md) actually
+   succeed at finding what they need with `Skill` router +
+   grep + glob alone, or do they regularly fail / reach for
+   substrate that isn't router-discoverable? Measurable by
+   reading skill execution logs (if they exist) or
+   instrumenting one tick to log every `Skill` invocation
+   and its outcome.
+
+3. **External-PR-reviewer behavioral observation:** External
+   review agents (`/ultrareview`, automated PR reviewers)
+   either find what they need or they don't. Observable on
+   recent PR review threads; we can sample the last ~50
+   review comments and classify "agent had context to
+   answer" vs "agent missed context that lived in
+   substrate".
+
+Until at least one of these tests produces data, "live-off-
+the-land for harness-loaded surfaces" is a hypothesis to be
+tested, NOT an architectural decision to be encoded. Phase 0
+PoC scope expanded: include ONE of the three tests above as
+prerequisite evidence before building the substrate-
+discovery layer that would integrate with live-off-the-
+land.
+
+### Distribution feasibility — existing AOT core + JIT plugin architecture
+
+**Updated 2026-05-03** (the human maintainer): the dual-mode
+framing in this doc was reinventing existing prior art. *"we
+already have a AOT core that can load JIT plugins see the
+Baseyan."* Verified in repo: `src/Bayesian/Bayesian.fsproj`
+line 9 explicit comment — *"Explicitly NOT AOT-enforced —
+this is a plugin. Core stays AOT-clean."* — and the project
+description *"Opt-in: this project doesn't enforce
+PublishAot=true because it may optionally use Infer.NET,
+which depends on reflection-emit."*
+
+The actual architecture (already shipping):
+
+- **Zeta.Core** (`src/Core/Core.fsproj`) = AOT-clean library.
+  Includes `PluginApi.fs` (`IOperator<'TOut>` plugin-author
+  contract, `OutputBuffer`, `StreamHandle`) and
+  `PluginHarness.fs` (test harness for plugin operator
+  authors). Contains `IndexedZSet.fs`, `Incremental.fs`,
+  `Operators.fs` — the substrate-discovery primitives.
+
+- **Plugin projects** (`src/Bayesian/`, future
+  `src/SubstrateDiscovery.Plugins.*/`, etc.) = separate
+  fsproj files that reference Zeta.Core, implement the
+  `IOperator<'TOut>` contract, and are **not** AOT-enforced
+  so they can use reflection-heavy libraries (Infer.NET for
+  Bayesian, future DuckDB.NET for the verification oracle,
+  etc.).
+
+For substrate-discovery, this means:
+
+- The CORE indexing / query engine ships AOT-published as
+  `Zeta.SubstrateDiscovery` (small binary, fast startup,
+  zero-install for external agents).
+- Reflection-heavy or library-dependent extensions (DuckDB
+  cross-check oracle, future ML-driven similarity scoring,
+  etc.) ship as separate JIT plugin assemblies that the AOT
+  core loads on demand.
+- The `IOperator<'TOut>` contract is stable across the AOT
+  / JIT boundary; plugins compose into the same circuit
+  evaluator the AOT core runs.
+
+This means the maintainer's *"zero-install external-agent
+delivery"* use case is met by the AOT core alone. Plugins
+ship separately when needed. No need to bundle the entire
+Zeta + DuckDB.NET stack into a single binary.
+
+The maintainer's epistemic position remains honest: *"i
+just don't know whats possiible with distribution that's
+what makes or breaks it."* Distribution feasibility is the
+load-bearing empirical question. Phase 0 PoC's **primary
+deliverables** validate the existing AOT-core-plus-JIT-plugins
+architecture extends cleanly to substrate-discovery:
+
+- Build a minimal `Zeta.SubstrateDiscovery` AOT-clean
+  library that consumes Zeta.Core; publish AOT on
+  linux-x64, osx-arm64, win-x64
+- Measure binary size + cold-start latency on each platform
+- Run a non-trivial Zeta query end-to-end on each platform
+- Optionally: build a sibling `Zeta.SubstrateDiscovery.DuckDB`
+  JIT plugin that the AOT core loads on demand for the
+  verification-oracle path
+- Document any AOT compatibility issues encountered
+
+If the AOT core publishes cleanly on all three platforms,
+the zero-install external-agent delivery use-case is met.
+If AOT has compatibility issues for some Zeta.Core
+dependency, the rethink is *narrow* (which dependency, can
+it be moved to a JIT plugin, can the AOT-clean subset be
+extracted) — not a wholesale re-architecture, because the
+AOT-core-plus-plugins pattern is already shipping in
+Zeta.Bayesian.
+
+**This is the load-bearing question.** No substantial
+commit beyond Phase 0 PoC until this question has data.
+
+### DST integration — load-bearing, not afterthought
+
+**Updated 2026-05-03** (the human maintainer reminder *"i'm sure
+you remember all the DST goodness right?"*). Deterministic
+Simulation Testing (Otto-272 DST-everywhere + Otto-273
+seed-lock-policy + Otto-281 DST-exempt-is-deferred-bug) is
+load-bearing for substrate-discovery, not a follow-on. The
+PoC includes DST primitives from day 1 because:
+
+1. **Cold-start replay = warm-state IVM** is the central
+   correctness invariant. Rebuilding the index from
+   `git ls-files | feed-into-zeta` must produce the
+   IDENTICAL Z-set state to the live IVM. This is a DST
+   equivalence property — encoded as a CI invariant, not
+   just a property test.
+
+2. **File-watcher events are adversarial schedules.** Real-
+   world quirks (concurrent file modifications during a
+   `git pull`, partial writes during atomic-rename, OS
+   file-watcher coalescing) become reproducible test cases
+   under DST. Pinned seed → deterministic adversarial
+   schedule replay.
+
+3. **Every non-determinism source must be exposed.**
+   Dictionary iteration order, hashtable insertion order,
+   async-scheduler ordering, plugin-load timing — each is
+   either pinned or filed as a deferred bug per Otto-281.
+   *"Retries are non-determinism smell"* — if the
+   substrate-discovery test suite ever needs a retry, that
+   retry IS the bug.
+
+4. **The chain-rule Prop 3.2 Lean proof guarantees algebraic
+   determinism.** The implementation must match. Lean proves
+   the math; DST proves the implementation matches the
+   math. Both are required for an A-grade artifact in the
+   sense of #1383's grading.
+
+Concrete DST primitives in Phase 0 PoC:
+
+- Pinned random seeds for all stochastic operations (per
+  Otto-273; values containing 69 or 420 if architect picks
+  per maintainer whimsy preference)
+- A `replay` mode that reads a recorded event sequence +
+  seed and reproduces the Z-set state exactly
+- A CI job that compares cold-start replay vs warm-state
+  IVM at every commit; any divergence fails the build
+- Adversarial-schedule fuzz harness that generates
+  pathological file-watcher event sequences (out-of-order,
+  duplicated, partial)
+
+DST is the discipline that makes substrate-discovery
+trustworthy enough to be the canonical answer-source for
+agent wake-time inventory queries. Without DST, every
+"the index says X" claim is uncertain. With DST, "the
+index says X" reduces to "the deterministic algebra over
+the deterministic event-sequence produced X."
 
 ---
 
@@ -227,17 +426,35 @@ start replay matches live IVM.
 (the algebra is A-grade verified; this dogfoods it)
 - `src/Core/IndexedZSet.fs` + `Incremental.fs` + `Operators.fs`
   + `ZSet.fs` (the primitives)
+- `src/Core/PluginApi.fs` + `PluginHarness.fs` (the AOT-core
+  plugin contract; Zeta.Bayesian is the existing JIT plugin
+  precedent)
 - `tools/tla/specs/DbspSpec.tla` (determinism contract)
 - `tools/lean4/Lean4/DbspChainRule.lean` (proof the IVM
   composes correctly under retraction)
+- `memory/feedback_git_interface_wasm_bootstrap_zero_requirements_2026_04_24.md`
+  (existing architectural commitment: Zeta IS git client+
+  server; native F# impl; two-UI architecture; both modes
+  zero-install; substrate-discovery composes with this not
+  competes against it)
+- `docs/backlog/P2/B-0017-operational-resonance-dashboard-frontier-bulk-alignment-ui-with-continuous-ux-research-meta-recursive.md`
+  (the Operational Resonance Dashboard within Frontier-UI
+  consumes substrate-discovery's index data; Z-set queries
+  feed dashboard widgets; live IVM means auto-updating
+  without polling; DST means dashboard state is reproducible;
+  *"every pixel earns its way via A/B experiments"* is the
+  consumer-side discipline)
 - `memory/feedback_claude_code_loading_taxonomy_*.md` (the
   wake-time inventory discipline this index serves)
 - `.claude/rules/test-canary.md` (the harness-native
-  alternative we're explicitly choosing not to rely on for
-  the custom-index workload)
+  alternative; runs as one of the live-off-the-land
+  hypothesis tests, not as the architecture)
 - `tools/hygiene/audit-memory-references.ts` +
   `audit-memory-index-duplicates.ts` (Phase-1 dogfood
   targets — re-implement as Zeta queries)
+- Otto-272 DST-everywhere + Otto-273 seed-lock-policy +
+  Otto-281 DST-exempt-is-deferred-bug (the determinism
+  discipline this PoC must integrate from day 1)
 
 ---
 
diff --git a/memory/MEMORY.md b/memory/MEMORY.md
index 4f8dd4771..c1a5d33a4 100644
--- a/memory/MEMORY.md
+++ b/memory/MEMORY.md
@@ -4,6 +4,7 @@
 
 **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.)
 
+- [**Chat is assertion-channel, not fact-channel — push-back-with-evidence is the discipline (Aaron 2026-05-03)**](feedback_chat_is_assertion_channel_not_fact_channel_push_back_for_evidence_aaron_2026_05_03.md) — Chat-claims (maintainer's, architect's, external-AI's) are assertions needing evidence to elevate to architectural fact. *"when i speak i'm making assertions, that's the best way to describe this chat channel"* + push-back-required-even-when-he-asserts. Triggered by #1385 echoing "maybe" as architectural fact.
 - [**Carved sentences + specialized index required — memories alone unreliable retrieval (Aaron 2026-05-03)**](feedback_carved_sentences_plus_specialized_index_required_memories_alone_unreliable_aaron_2026_05_03.md) — Memory file ≠ working memory. Empirically self-demonstrated: Otto authored speculative-vs-frontier memo, then ~6h later defaulted to the framing it corrects. CLAUDE.md / AGENTS.md / equivalent are the auto-loaded retrieval index for the beacon-safe layer.
 - [**Mirror-vs-beacon-safe register architecture — publication boundary as backpressure (Claude.ai 2026-05-03 verbatim packet)**](../docs/research/2026-05-03-claudeai-mirror-vs-beacon-safe-publication-boundary-as-backpressure.md) — Mirror = internal/named-agent register (overgenerates); beacon-safe = external/end-user-persona register (conversion-pruned). Publication discipline IS the gate; no separate mechanism needed. Diamond framing: mirror=solution, beacon-safe=crystal, conversion=pressure. Multi-AI BFT review = conversion-quality control.
 - [**Razor-discipline — no metaphysical inference, only operational claims; Rodney's Razor (NOT Occam's) is canonical (Aaron + Claude.ai 2026-05-03)**](feedback_razor_discipline_no_metaphysical_inference_only_operational_claims_rodney_razor_aaron_claudeai_2026_05_03.md) — World-model claim from 0516Z superseded as over-claim; bidirectional-alignment dual grounding (ethical asymmetric-cost + operational trust-calculus gating) decoupled; razor-compliance IS substrate-quality IS publishability. Aaron correction: it's Rodney's Razor (shipped, well-defined Occam's) + Quantum Rodney's Razor (pending, possibility-space pruning), an extension in the Occam line, not Occam's itself.
diff --git a/memory/feedback_chat_is_assertion_channel_not_fact_channel_push_back_for_evidence_aaron_2026_05_03.md b/memory/feedback_chat_is_assertion_channel_not_fact_channel_push_back_for_evidence_aaron_2026_05_03.md
new file mode 100644
index 000000000..fc1f65413
--- /dev/null
+++ b/memory/feedback_chat_is_assertion_channel_not_fact_channel_push_back_for_evidence_aaron_2026_05_03.md
@@ -0,0 +1,119 @@
+---
+name: chat-is-assertion-channel-push-back-for-evidence
+description: Chat-from-anyone (maintainer or architect) is assertion-channel, not fact-channel. Every claim needs evidence to elevate to architectural fact. Push-back-with-evidence is the discipline; echoing-assertions-as-facts is the failure mode. Aaron 2026-05-03.
+type: feedback
+---
+
+**Rule:** Chat is an assertion-channel, not a fact-channel.
+Every claim made in chat — by the human maintainer, by the
+architect, by external AIs — is an *assertion* that needs
+evidence to be elevated to architectural fact. The discipline
+is push-back-with-evidence. The failure mode is echoing chat-
+assertions back as architectural decisions without grading
+their evidence base.
+
+**Why:** the human maintainer 2026-05-03 verbatim: *"when i
+speak i'm making assertions, that's the best way to describe
+this chat channel."* This generalizes beyond his specific
+input to cover all chat-channel content. Bullshit asymmetry:
+it's much easier to assert than to evidence; without
+push-back-discipline the substrate accumulates ungrounded
+claims. The triggering case: in #1385 substrate-discovery
+scoping, the architect echoed Aaron's *"live off the land
+might be needed for going to the devloper where they live
+for skill persona and exteranl agents"* (note: said with
+*"maybe"*) back as an architectural fact in the doc. Aaron
+2026-05-03 caught it: *"Live-off-the-land = right answer for
+harness-loaded surfaces (skill persona, external PR
+reviewers — different audience) needs research i saied maybe
+and even if it said it did required that you should push
+back, where are my facts."*
+
+**How to apply:**
+
+For every load-bearing claim in substrate (architectural
+decision docs, scoping docs, ADRs, governance edits):
+
+1. **Grade the evidence:** mark each claim as **fact** (with
+   citation), **decision** (with authority + reasoning),
+   **assertion** (with attribution to whoever asserted it),
+   or **hypothesis** (with falsifiability test).
+
+2. **Push back on chat-assertions before encoding them.**
+   Even when the maintainer asserts something, ask: what's
+   the evidence? Can we test it? If the maintainer's reply
+   is *"i'm not sure / maybe"* — that's hypothesis, not
+   fact, and it should land as hypothesis in substrate.
+
+3. **Don't elevate "maybe" to "is."** A maintainer's
+   directional input on an unknown is a hypothesis to test,
+   not an architectural fact to encode. Echoing "maybe" as
+   "is" creates ungrounded substrate.
+
+4. **Distinguish authority from evidence.** The maintainer
+   has authority to make decisions within his authority
+   scope; that's separate from whether his assertions are
+   evidenced. A decision can be made on imperfect evidence
+   ("we'll go with X pending data"); the substrate just has
+   to record both the decision AND the evidence-state
+   honestly.
+
+5. **Push-back is collaborative, not adversarial.** Per
+   bidirectional-alignment: pushing back on unevidenced
+   claims is service to the maintainer's actual goals, not
+   contradiction. The right register: *"this is a
+   hypothesis that would be falsified by X test; want me to
+   run X, or proceed with the hypothesis-as-decision?"*
+
+**Composes with:**
+
+- **Otto-364 search-first-authority:** training data is
+  historical; project state is historical; chat content is
+  ALSO historical-and-uncertain. Search-first applies to
+  chat-claims as much as to training-data claims.
+- **Razor-discipline (Rodney's Razor):** *"what observable
+  variable determines whether this claim is true?"* applied
+  to chat-claims: if no observable variable, the claim is
+  metaphysical / unevidenced and the razor cuts it.
+- **Substrate-or-it-didn't-happen (Otto-363):** chat
+  itself is *captured*, not *preserved*; substrate is what
+  persists. So substrate must reflect evidence-state
+  honestly — false-confidence in substrate is worse than
+  honest-uncertainty in substrate.
+- **Verify-before-deferring:** before deferring to a
+  chat-assertion as a future-binding decision, verify the
+  evidence base.
+- **Future-self-not-bound-by-past-decisions:** when a
+  past-self encoded a chat-assertion as fact, future-self
+  is free to revise to hypothesis-with-falsifiability — and
+  SHOULD, leaving a dated revision line.
+- **Don't-ask-permission-within-authority:** push-back on
+  unevidenced claims IS within the architect's authority;
+  it does not require the maintainer's permission.
+
+**Discipline check (every substrate authoring tick).** For
+each question below, "yes" is the desired answer; "no"
+flags the failure mode and triggers a revision pass:
+
+- Did I grade every chat-assertion's evidence base before
+  encoding it as architectural fact?
+- Did I keep "maybe" framed as "maybe" (hypothesis with
+  falsifiability test) rather than promoting it to "is"?
+- Did I document falsifiability tests for every hypothesis
+  encoded?
+- Did I attribute assertions to whoever asserted them
+  (maintainer, architect, external AI, named persona)?
+
+If any answer is "no" — that's the failure mode. Revise.
+
+**Carved sentence:** *"Chat is an assertion-channel, not a
+fact-channel. Even the maintainer's chat-claims need
+evidence to elevate. Push-back-with-evidence is the
+discipline; echo-as-fact is the failure mode."*
+
+**Reasoning lineage:** Aaron 2026-05-03 chat exchange
+(triggered by #1385 scoping doc echo of "maybe" as
+architectural fact). Composes with the broader razor-
+discipline cluster (no-metaphysical-inferences) and the
+substrate-or-it-didn't-happen cluster (substrate-quality
+discipline).
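Reviewer sketch (not part of the patch): the four evidence grades under "How to apply" and the discipline-check questions could plausibly be linted mechanically. The TypeScript below is a minimal illustration under stated assumptions. `Grade`, `Claim`, `checkClaim`, and `checkClaims` are hypothetical names that do not exist in the repo, and the field names are illustrative only. It encodes one reading of the rule: a fact needs a citation, a decision needs authority plus reasoning, an assertion needs attribution, and a hypothesis needs a falsifiability test.

```typescript
// Hypothetical sketch: these types and functions are NOT in the repo.
type Grade = "fact" | "decision" | "assertion" | "hypothesis";

interface Claim {
  text: string;
  grade: Grade;
  citation?: string;           // expected when grade is "fact"
  authority?: string;          // expected when grade is "decision"
  reasoning?: string;          // expected when grade is "decision"
  assertedBy?: string;         // expected when grade is "assertion"
  falsifiabilityTest?: string; // expected when grade is "hypothesis"
}

// Returns one message per piece of supporting metadata the claim lacks.
function checkClaim(claim: Claim): string[] {
  const problems: string[] = [];
  const need = (value: string | undefined, label: string): void => {
    if (value === undefined || value.trim() === "") {
      problems.push(`${claim.grade} "${claim.text}" is missing ${label}`);
    }
  };
  switch (claim.grade) {
    case "fact":
      need(claim.citation, "a citation");
      break;
    case "decision":
      need(claim.authority, "an authority");
      need(claim.reasoning, "reasoning");
      break;
    case "assertion":
      need(claim.assertedBy, "an attribution");
      break;
    case "hypothesis":
      need(claim.falsifiabilityTest, "a falsifiability test");
      break;
  }
  return problems;
}

// Runs the check over a whole document's worth of claims.
function checkClaims(claims: Claim[]): string[] {
  let problems: string[] = [];
  for (const claim of claims) {
    problems = problems.concat(checkClaim(claim));
  }
  return problems;
}

// Example input loosely modeled on the re-graded architecture table.
const claims: Claim[] = [
  {
    text: "Live-off-the-land covers harness-loaded surfaces",
    grade: "hypothesis",
    falsifiabilityTest: ".claude/rules/ auto-load canary",
  },
  {
    text: "DuckDB as verification oracle",
    grade: "assertion",
    assertedBy: "maintainer 2026-05-03",
  },
  // A "maybe" recorded without a test: this is the entry the check flags.
  { text: "Single-binary distribution is feasible", grade: "hypothesis" },
];

const problems = checkClaims(claims);
console.log(problems);
```

In this example only the third claim is reported, because it is graded as a hypothesis but carries no falsifiability test. Whether such a lint belongs alongside the `tools/hygiene/` audits is a design choice the patch does not address.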