@@ -48,10 +48,209 @@ default:
Edge-runner discipline (the human maintainer 2026-05-03)
says ship the dogfood.

Alternatives considered + rejected: TS + sqlite-vec/DuckDB
(faster but doesn't dogfood); live-off-the-land via Skill
router + grep (punts architecture); hybrid TS+Zeta (two
systems, more complexity).
**Updated 2026-05-03** (post-#1385 merge corrections from
the human maintainer). Two epistemic-discipline corrections
re-grade the original framing:

### Correction 1 — chat is an assertion-channel, not a fact-channel

The maintainer 2026-05-03 verbatim: *"when i speak i'm
making assertions, that's the best way to describe this
chat channel."* Chat-claims (his OR the architect's) are
assertions; they need evidence to be elevated to
architectural fact. The architect's failure mode in #1385
was echoing the maintainer's *"maybe"* on live-off-the-land
back as an architectural fact. Push-back-with-evidence is
the discipline.

### Correction 2 — alternatives are complementary, not exclusive

The maintainer 2026-05-03 verbatim: *"i like hybrid for
verification duckdb is very advanced too and we want a lot
of its features we can verify against it behavior too, we
don't want to copy it's code at all we are very differnt
but it has some awesome feature."* The original "rejected"
framing was too binary.

### Re-graded architecture (with evidence labels)

| Layer | Status | Evidence base |
|---|---|---|
| Zeta-native-AOT canonical index | **Decision (architect, within authority)** | Algebra match (fact: workload IS Z-set); dogfood-leverage (assertion, supported by math-proofs A-grade); deployment story (hypothesis pending Phase 0 PoC) |
| DuckDB as verification oracle | **Assertion (maintainer 2026-05-03), worth pursuing** | DuckDB feature-richness (fact, well-known); cross-check-as-property-test pattern (precedent: Lean cross-checks paper); pattern extends to git per maintainer 2026-05-03 (*"some compabilty testing you do with duck you can do with git to slowly replace that"*) — composes with existing `memory/feedback_git_interface_wasm_bootstrap_zero_requirements_2026_04_24.md` architectural commitment (Zeta IS git client+server; native F# impl; two-UI Frontier+Mode-1-admin+WASM-Mode-2; both zero-install). |
| Live-off-the-land for harness-loaded surfaces | **Hypothesis pending research** | Maintainer said "maybe"; zero observed-behavior evidence; falsifiable via canary test + skill-persona behavioral observation |
| Distribution feasibility (NativeAOT single-binary) | **Make-or-break risk per maintainer assertion** | Need cross-platform empirical test (linux-x64 / osx-arm64 / win-x64); known-unknown |

### Push-back: what would establish the live-off-the-land hypothesis?

The current claim has zero evidence base. The maintainer's
"maybe" is directional input, not data. Concrete falsifiable
tests:

1. **`.claude/rules/` auto-load canary** (fixture exists at
`.claude/rules/test-canary.md`): does a fresh Claude Code
session in this repo see the canary string without being
told to read the file? Pass = harness-native loading
covers some of the substrate-discovery problem; fail =
it doesn't, and the live-off-the-land path needs work.

2. **Skill-persona behavioral observation:** Do existing
skill personas (`.claude/skills/<name>/SKILL.md`) actually
succeed at finding what they need with `Skill` router +
grep + glob alone, or do they regularly fail / reach for
substrate that isn't router-discoverable? Measurable by
reading skill execution logs (if they exist) or
instrumenting one tick to log every `Skill` invocation
and its outcome.

3. **External-PR-reviewer behavioral observation:** External
review agents (`/ultrareview`, automated PR reviewers)
either find what they need or they don't. Observable on
recent PR review threads; we can sample the last ~50
review comments and classify "agent had context to
answer" vs "agent missed context that lived in
substrate".

Until at least one of these tests produces data,
"live-off-the-land for harness-loaded surfaces" is a
hypothesis to be tested, NOT an architectural decision to be
encoded. Phase 0 PoC scope is expanded accordingly: include
ONE of the three tests above as prerequisite evidence before
building the substrate-discovery layer that would integrate
with live-off-the-land.
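
The third test above (sampling recent review comments) reduces to a tally once the comments have been hand-labelled. A minimal sketch, assuming a hypothetical two-value label scheme; the `ReviewLabel` type and `tallyLabels` helper are illustrative names, not existing repo tooling, and no pass/fail threshold is implied:

```typescript
// Hypothetical labels for a hand-classified review comment (assumption, not repo tooling).
type ReviewLabel = "had-context" | "missed-context";

interface ReviewTally {
  hadContext: number;
  missedContext: number;
  missRate: number; // fraction of sampled comments that missed substrate context
}

// Count each label and derive the miss rate; an empty sample yields a rate of 0.
function tallyLabels(labels: readonly ReviewLabel[]): ReviewTally {
  let hadContext = 0;
  let missedContext = 0;
  for (const label of labels) {
    if (label === "had-context") hadContext += 1;
    else missedContext += 1;
  }
  const total = hadContext + missedContext;
  return {
    hadContext,
    missedContext,
    missRate: total === 0 ? 0 : missedContext / total,
  };
}

// Illustrative sample only; real input would be the labelled review comments.
const reviewSample: ReviewLabel[] = [
  "had-context",
  "missed-context",
  "had-context",
  "had-context",
];
const reviewTally = tallyLabels(reviewSample);
console.log(reviewTally);
```

Recording the raw counts alongside the rate keeps the sample size visible, which matters when the sample is as small as ~50 comments.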

### Distribution feasibility — existing AOT core + JIT plugin architecture

**Updated 2026-05-03** (the human maintainer): the dual-mode
framing in this doc was reinventing existing prior art. *"we
already have a AOT core that can load JIT plugins see the
Baseyan."* Verified in repo: line 9 of
`src/Bayesian/Bayesian.fsproj` carries the explicit comment
*"Explicitly NOT AOT-enforced — this is a plugin. Core stays
AOT-clean."*, and the project description reads *"Opt-in:
this project doesn't enforce PublishAot=true because it may
optionally use Infer.NET, which depends on
reflection-emit."*

The actual architecture (already shipping):

- **Zeta.Core** (`src/Core/Core.fsproj`) = AOT-clean library.
Includes `PluginApi.fs` (`IOperator<'TOut>` plugin-author
contract, `OutputBuffer`, `StreamHandle`) and
`PluginHarness.fs` (test harness for plugin operator
authors). Contains `IndexedZSet.fs`, `Incremental.fs`,
`Operators.fs` — the substrate-discovery primitives.

- **Plugin projects** (`src/Bayesian/`, future
`src/SubstrateDiscovery.Plugins.*/`, etc.) = separate
fsproj files that reference Zeta.Core, implement the
`IOperator<'TOut>` contract, and are **not** AOT-enforced
so they can use reflection-heavy libraries (Infer.NET for
Bayesian, future DuckDB.NET for the verification oracle,
etc.).

For substrate-discovery, this means:

- The CORE indexing / query engine ships AOT-published as
`Zeta.SubstrateDiscovery` (small binary, fast startup,
zero-install for external agents).
- Reflection-heavy or library-dependent extensions (DuckDB
cross-check oracle, future ML-driven similarity scoring,
etc.) ship as separate JIT plugin assemblies that the AOT
core loads on demand.
- The `IOperator<'TOut>` contract is stable across the AOT
/ JIT boundary; plugins compose into the same circuit
evaluator the AOT core runs.

This means the maintainer's *"zero-install external-agent
delivery"* use case is met by the AOT core alone. Plugins
ship separately when needed. No need to bundle the entire
Zeta + DuckDB.NET stack into a single binary.

The maintainer's epistemic position remains honest: *"i
just don't know whats possiible with distribution that's
what makes or breaks it."* Distribution feasibility is the
load-bearing empirical question. Phase 0 PoC's **primary
deliverables** validate that the existing
AOT-core-plus-JIT-plugins architecture extends cleanly to
substrate-discovery:

- Build a minimal `Zeta.SubstrateDiscovery` AOT-clean
library that consumes Zeta.Core; publish AOT on
linux-x64, osx-arm64, win-x64
- Measure binary size + cold-start latency on each platform
- Run a non-trivial Zeta query end-to-end on each platform
- Optionally: build a sibling `Zeta.SubstrateDiscovery.DuckDB`
JIT plugin that the AOT core loads on demand for the
verification-oracle path
- Document any AOT compatibility issues encountered

If the AOT core publishes cleanly on all three platforms,
the zero-install external-agent delivery use-case is met.
If AOT has compatibility issues for some Zeta.Core
dependency, the rethink is *narrow* (which dependency, can
it be moved to a JIT plugin, can the AOT-clean subset be
extracted) — not a wholesale re-architecture, because the
AOT-core-plus-plugins pattern is already shipping in
Zeta.Bayesian.

**This is the load-bearing question.** No substantial
commit beyond Phase 0 PoC until this question has data.

### DST integration — load-bearing, not afterthought

**Updated 2026-05-03** (reminder from the human maintainer:
*"i'm sure you remember all the DST goodness right?"*). Deterministic
Simulation Testing (Otto-272 DST-everywhere + Otto-273
seed-lock-policy + Otto-281 DST-exempt-is-deferred-bug) is
load-bearing for substrate-discovery, not a follow-on. The
PoC includes DST primitives from day 1 because:

1. **Cold-start replay = warm-state IVM** is the central
correctness invariant. Rebuilding the index from
`git ls-files | feed-into-zeta` must produce the
IDENTICAL Z-set state to the live IVM. This is a DST
equivalence property — encoded as a CI invariant, not
just a property test.

2. **File-watcher events are adversarial schedules.**
Real-world quirks (concurrent file modifications during a
`git pull`, partial writes during atomic-rename, OS
file-watcher coalescing) become reproducible test cases
under DST. Pinned seed → deterministic adversarial
schedule replay.

3. **Every non-determinism source must be exposed.**
Dictionary iteration order, hashtable insertion order,
async-scheduler ordering, plugin-load timing — each is
either pinned or filed as a deferred bug per Otto-281.
*"Retries are non-determinism smell"* — if the
substrate-discovery test suite ever needs a retry, that
retry IS the bug.

4. **The chain-rule Prop 3.2 Lean proof guarantees algebraic
determinism.** The implementation must match. Lean proves
the math; DST proves the implementation matches the
math. Both are required for an A-grade artifact in the
sense of #1383's grading.
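
The first invariant above (cold-start replay equals warm-state IVM) can be illustrated with a toy Z-set. This is a minimal TypeScript sketch under stated assumptions, not the F# primitives in `src/Core/`: the `ZSet`, `Delta`, `warmState`, `coldStart`, and `canonical` names are hypothetical, and a plain string array stands in for the output of `git ls-files`. The canonical form sorts keys so that insertion order (one of the non-determinism sources listed in item 3) cannot affect the comparison.

```typescript
// A Z-set maps each element to a nonzero integer weight.
// Toy model only; the real primitives are the F# files in src/Core/.
type ZSet = Record<string, number>;

// One file-level change: +1 when a path appears, -1 when it is retracted.
interface Delta {
  path: string;
  weight: number;
}

// Null-prototype object so keys such as "constructor" behave like ordinary paths.
function emptyZSet(): ZSet {
  return Object.create(null);
}

// Add a delta to the state, dropping the entry if its weight returns to zero.
function applyDelta(z: ZSet, d: Delta): void {
  const current = z[d.path];
  const next = (typeof current === "number" ? current : 0) + d.weight;
  if (next === 0) delete z[d.path];
  else z[d.path] = next;
}

// Warm path: fold a stream of deltas into the state incrementally.
function warmState(deltas: readonly Delta[]): ZSet {
  const z = emptyZSet();
  for (const d of deltas) applyDelta(z, d);
  return z;
}

// Cold path: rebuild from a full listing (stand-in for `git ls-files`).
function coldStart(listing: readonly string[]): ZSet {
  const z = emptyZSet();
  for (const p of listing) applyDelta(z, { path: p, weight: 1 });
  return z;
}

// Canonical form: sorted keys, so insertion order cannot leak into the comparison.
function canonical(z: ZSet): string {
  const keys = Object.keys(z).sort();
  return JSON.stringify(keys.map((k) => [k, z[k]]));
}

// History: a.md and b.md are added, a.md is retracted, c.md is added.
const liveIndex = warmState([
  { path: "a.md", weight: 1 },
  { path: "b.md", weight: 1 },
  { path: "a.md", weight: -1 },
  { path: "c.md", weight: 1 },
]);
// The final tree, listed in a different order than it was built.
const rebuiltIndex = coldStart(["c.md", "b.md"]);
console.log(canonical(liveIndex) === canonical(rebuiltIndex));
```

A CI invariant in this style would fail the build whenever the two canonical strings differ, which is the equivalence the section asks for.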

Concrete DST primitives in Phase 0 PoC:

- Pinned random seeds for all stochastic operations (per
Otto-273; if the architect picks the values, they contain
69 or 420, per the maintainer's whimsy preference)
- A `replay` mode that reads a recorded event sequence +
seed and reproduces the Z-set state exactly
- A CI job that compares cold-start replay vs warm-state
IVM at every commit; any divergence fails the build
- Adversarial-schedule fuzz harness that generates
pathological file-watcher event sequences (out-of-order,
duplicated, partial)
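
The pinned-seed and adversarial-schedule primitives above can be sketched together. This is an illustrative TypeScript sketch, not the planned harness: `makeRng`, `WatchEvent`, and `adversarialSchedule` are hypothetical names, the 0.3 perturbation probabilities are arbitrary, and partial writes are left out for brevity. The generator is a Park-Miller style linear congruential generator, chosen only because it is small and exactly reproducible in double-precision arithmetic.

```typescript
// Park-Miller "minimal standard" LCG. Every product stays below 2^53,
// so the arithmetic is exact and the stream depends only on the seed.
function makeRng(seed: number): () => number {
  const modulus = 2147483647;
  let state = ((seed % modulus) + modulus) % modulus;
  if (state === 0) state = 1; // zero is a fixed point of the recurrence
  return () => {
    state = (state * 48271) % modulus;
    return state / modulus;
  };
}

// Hypothetical shape of a recorded file-watcher event.
interface WatchEvent {
  seq: number;
  path: string;
  kind: "write" | "delete";
}

// Perturb a recorded sequence: duplicate some events, then delay some by one slot.
// No event is ever dropped, and the same seed always yields the same schedule.
function adversarialSchedule(base: readonly WatchEvent[], seed: number): WatchEvent[] {
  const rng = makeRng(seed);

  // Pass 1: duplicated delivery.
  const duplicated: WatchEvent[] = [];
  for (const ev of base) {
    duplicated.push(ev);
    if (rng() < 0.3) duplicated.push(ev);
  }

  // Pass 2: out-of-order delivery (a held event is emitted after its successor).
  const schedule: WatchEvent[] = [];
  let held: WatchEvent | undefined = undefined;
  for (const ev of duplicated) {
    if (held !== undefined) {
      schedule.push(ev, held);
      held = undefined;
    } else if (rng() < 0.3) {
      held = ev;
    } else {
      schedule.push(ev);
    }
  }
  if (held !== undefined) schedule.push(held);
  return schedule;
}

const baseEvents: WatchEvent[] = [
  { seq: 1, path: "docs/a.md", kind: "write" },
  { seq: 2, path: "docs/b.md", kind: "write" },
  { seq: 3, path: "docs/a.md", kind: "delete" },
  { seq: 4, path: "docs/c.md", kind: "write" },
];

// Replaying with the same pinned seed reproduces the schedule exactly.
const firstRun = adversarialSchedule(baseEvents, 69420);
const secondRun = adversarialSchedule(baseEvents, 69420);
console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun));
```

A `replay` mode in this style would store only the recorded base events and the seed; the perturbed schedule is then regenerated rather than stored.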

DST is the discipline that makes substrate-discovery
trustworthy enough to be the canonical answer-source for
agent wake-time inventory queries. Without DST, every
"the index says X" claim is uncertain. With DST, "the
index says X" reduces to "the deterministic algebra over
the deterministic event-sequence produced X."

---

@@ -227,17 +426,35 @@ start replay matches live IVM.
(the algebra is A-grade verified; this dogfoods it)
- `src/Core/IndexedZSet.fs` + `Incremental.fs` + `Operators.fs`
+ `ZSet.fs` (the primitives)
- `src/Core/PluginApi.fs` + `PluginHarness.fs` (the AOT-core
plugin contract; Zeta.Bayesian is the existing JIT plugin
precedent)
- `tools/tla/specs/DbspSpec.tla` (determinism contract)
- `tools/lean4/Lean4/DbspChainRule.lean` (proof the IVM
composes correctly under retraction)
- `memory/feedback_git_interface_wasm_bootstrap_zero_requirements_2026_04_24.md`
(existing architectural commitment: Zeta IS git
client+server; native F# impl; two-UI architecture; both
modes zero-install; substrate-discovery composes with this
rather than competing against it)
- `docs/backlog/P2/B-0017-operational-resonance-dashboard-frontier-bulk-alignment-ui-with-continuous-ux-research-meta-recursive.md`
(the Operational Resonance Dashboard within Frontier-UI
consumes substrate-discovery's index data; Z-set queries
feed dashboard widgets; live IVM means auto-updating
without polling; DST means dashboard state is reproducible;
*"every pixel earns its way via A/B experiments"* is the
consumer-side discipline)
- `memory/feedback_claude_code_loading_taxonomy_*.md`
(the wake-time inventory discipline this index serves)
- `.claude/rules/test-canary.md` (the harness-native
alternative; runs as one of the live-off-the-land
hypothesis tests, not as the architecture)
- `tools/hygiene/audit-memory-references.ts` +
`audit-memory-index-duplicates.ts` (Phase-1 dogfood
targets — re-implement as Zeta queries)
- Otto-272 DST-everywhere + Otto-273 seed-lock-policy +
Otto-281 DST-exempt-is-deferred-bug (the determinism
discipline this PoC must integrate from day 1)

---

1 change: 1 addition & 0 deletions memory/MEMORY.md
@@ -4,6 +4,7 @@
<!-- paired-edit log (NOT the single-slot latest-marker — that lives on line 3 above): PR #986 lands carved-sentence fixed-point stability + Zeta soul-file executor architecture (Infer.NET-style Bayesian inference, NOT LLMs) + carved sentences ≈ formal specs provable in DST + Deepseek CSAP review absorption (Aaron 2026-04-30 → 2026-05-01, eight-message chain across two autonomous-loop ticks per the file body's section header). Architectural disclosure: substrate IS the priors; alignment IS substrate. The single-slot latest-marker on line 3 (forever-home Aaron 2026-05-01) takes precedence as the chronologically-latest paired edit; this PR's work is earlier. -->
**📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** <!-- paired-edit: PR #690 scheduled-workflow-null-result-hygiene-scan tier-1 promotion 2026-04-28 --> These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.)

- [**Chat is assertion-channel, not fact-channel — push-back-with-evidence is the discipline (Aaron 2026-05-03)**](feedback_chat_is_assertion_channel_not_fact_channel_push_back_for_evidence_aaron_2026_05_03.md) — Chat-claims (maintainer's, architect's, external-AI's) are assertions needing evidence to elevate to architectural fact. *"when i speak i'm making assertions, that's the best way to describe this chat channel"* + push-back-required-even-when-he-asserts. Triggered by #1385 echoing "maybe" as architectural fact.
- [**Carved sentences + specialized index required — memories alone unreliable retrieval (Aaron 2026-05-03)**](feedback_carved_sentences_plus_specialized_index_required_memories_alone_unreliable_aaron_2026_05_03.md) — Memory file ≠ working memory. Empirically self-demonstrated: Otto authored speculative-vs-frontier memo, then ~6h later defaulted to the framing it corrects. CLAUDE.md / AGENTS.md / equivalent are the auto-loaded retrieval index for the beacon-safe layer.
- [**Mirror-vs-beacon-safe register architecture — publication boundary as backpressure (Claude.ai 2026-05-03 verbatim packet)**](../docs/research/2026-05-03-claudeai-mirror-vs-beacon-safe-publication-boundary-as-backpressure.md) — Mirror = internal/named-agent register (overgenerates); beacon-safe = external/end-user-persona register (conversion-pruned). Publication discipline IS the gate; no separate mechanism needed. Diamond framing: mirror=solution, beacon-safe=crystal, conversion=pressure. Multi-AI BFT review = conversion-quality control.
- [**Razor-discipline — no metaphysical inference, only operational claims; Rodney's Razor (NOT Occam's) is canonical (Aaron + Claude.ai 2026-05-03)**](feedback_razor_discipline_no_metaphysical_inference_only_operational_claims_rodney_razor_aaron_claudeai_2026_05_03.md) — World-model claim from 0516Z superseded as over-claim; bidirectional-alignment dual grounding (ethical asymmetric-cost + operational trust-calculus gating) decoupled; razor-compliance IS substrate-quality IS publishability. Aaron correction: it's Rodney's Razor (shipped, well-defined Occam's) + Quantum Rodney's Razor (pending, possibility-space pruning), an extension in the Occam line, not Occam's itself.