Edge-runner discipline (per the human maintainer,
2026-05-03): ship the dogfood.

Alternatives considered + rejected: TS + sqlite-vec/DuckDB
(faster but doesn't dogfood); live-off-the-land via Skill
router + grep (punts architecture); hybrid TS+Zeta (two
systems, more complexity).

**Updated 2026-05-03** (post-#1385 merge corrections from
the human maintainer). Two epistemic-discipline corrections
re-grade the original framing:

### Correction 1 — chat is an assertion-channel, not a fact-channel

The maintainer 2026-05-03 verbatim: *"when i speak i'm
making assertions, that's the best way to describe this
chat channel."* Chat-claims (his OR the architect's) are
assertions; they need evidence to be elevated to
architectural fact. The architect's failure mode in #1385:
echoed the maintainer's *"maybe"* on live-off-the-land back
as an architectural fact. Push-back-with-evidence is the
discipline.

### Correction 2 — alternatives are complementary, not exclusive

The maintainer 2026-05-03 verbatim: *"i like hybrid for
verification duckdb is very advanced too and we want a lot
of its features we can verify against it behavior too, we
don't want to copy it's code at all we are very differnt
but it has some awesome feature."* The original "rejected"
framing was too binary.

### Re-graded architecture (with evidence labels)

| Layer | Status | Evidence base |
|---|---|---|
| Zeta-native-AOT canonical index | **Decision (architect, within authority)** | Algebra match (fact: workload IS Z-set); dogfood-leverage (assertion, supported by math-proofs A-grade); deployment story (hypothesis pending Phase 0 PoC) |
| DuckDB as verification oracle | **Assertion (maintainer 2026-05-03), worth pursuing** | DuckDB feature-richness (fact, well-known); cross-check-as-property-test pattern (precedent: Lean cross-checks paper) |
| Live-off-the-land for harness-loaded surfaces | **Hypothesis pending research** | Maintainer said "maybe"; zero observed-behavior evidence; falsifiable via canary test + skill-persona behavioral observation |
| Distribution feasibility (NativeAOT single-binary) | **Make-or-break risk per maintainer assertion** | Need cross-platform empirical test (linux-x64 / osx-arm64 / win-x64); known-unknown |

### Push-back: what would establish the live-off-the-land hypothesis?

The current claim has zero evidence base. The maintainer's
"maybe" is directional input, not data. Concrete falsifiable
tests:

1. **`.claude/rules/` auto-load canary** (fixture exists at
`.claude/rules/test-canary.md`): does a fresh Claude Code
session in this repo see the canary string without being
told to read the file? Pass = harness-native loading
covers some of the substrate-discovery problem; fail =
it doesn't, and the live-off-the-land path needs work.

2. **Skill-persona behavioral observation:** Do existing
skill personas (.claude/skills/<name>/SKILL.md) actually
succeed at finding what they need with `Skill` router +
grep + glob alone, or do they regularly fail / reach for
substrate that isn't router-discoverable? Measurable by
reading skill execution logs (if they exist) or
instrumenting one tick to log every `Skill` invocation
and its outcome.

3. **External-PR-reviewer behavioral observation:** External
review agents (`/ultrareview`, automated PR reviewers)
either find what they need or they don't. Observable on
recent PR review threads; we can sample the last ~50
review comments and classify "agent had context to
answer" vs "agent missed context that lived in
substrate".

Until at least one of these tests produces data,
"live-off-the-land for harness-loaded surfaces" is a
hypothesis to be tested, NOT an architectural decision to
be encoded. Phase 0 PoC scope expanded: include ONE of the
three tests above as prerequisite evidence before building
the substrate-discovery layer that would integrate with
live-off-the-land.
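Test 3 needs nothing more than a labeled sample and a
tally. A minimal sketch in Python (illustrative only — the
repo's implementation language, the log format, and every
name below are hypothetical; labeling stays manual):

```python
# Hypothetical instrumentation sketch: tally test 3's observable
# outcome over a manually labeled sample of PR review comments.
from dataclasses import dataclass


@dataclass
class ReviewSample:
    comment_id: str    # e.g. "pr-1385-review-1" (illustrative IDs)
    had_context: bool  # True = the agent had the substrate it needed


def context_hit_rate(samples: list[ReviewSample]) -> float:
    """Fraction of sampled review comments where the agent had context."""
    if not samples:
        raise ValueError("need at least one labeled sample")
    return sum(s.had_context for s in samples) / len(samples)


# The classification itself is a human judgment call; only the
# tally is automated.
samples = [
    ReviewSample("pr-1385-review-1", True),
    ReviewSample("pr-1385-review-2", False),
    ReviewSample("pr-1384-review-1", True),
    ReviewSample("pr-1383-review-1", True),
]
print(context_hit_rate(samples))  # 0.75
```

The ~50-comment sample proposed above would use the same
shape; the hit rate is the datum that grades the hypothesis.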

### Distribution feasibility — existing AOT core + JIT plugin architecture

**Updated 2026-05-03** (the human maintainer): the dual-mode
framing in this doc was reinventing existing prior art. *"we
already have a AOT core that can load JIT plugins see the
Baseyan."* Verified in repo: `src/Bayesian/Bayesian.fsproj`
line 9 explicit comment — *"Explicitly NOT AOT-enforced —
this is a plugin. Core stays AOT-clean."* — and the project
description *"Opt-in: this project doesn't enforce
PublishAot=true because it may optionally use Infer.NET,
which depends on reflection-emit."*

The actual architecture (already shipping):

- **Zeta.Core** (`src/Core/Core.fsproj`) = AOT-clean library.
Includes `PluginApi.fs` (`IOperator<'TOut>` plugin-author
contract, `OutputBuffer`, `StreamHandle`) and
`PluginHarness.fs` (test harness for plugin operator
authors). Contains `IndexedZSet.fs`, `Incremental.fs`,
`Operators.fs` — the substrate-discovery primitives.

- **Plugin projects** (`src/Bayesian/`, future
`src/SubstrateDiscovery.Plugins.*/`, etc.) = separate
fsproj files that reference Zeta.Core, implement the
`IOperator<'TOut>` contract, and are **not** AOT-enforced
so they can use reflection-heavy libraries (Infer.NET for
Bayesian, future DuckDB.NET for the verification oracle,
etc.).

For substrate-discovery, this means:

- The CORE indexing / query engine ships AOT-published as
`Zeta.SubstrateDiscovery` (small binary, fast startup,
zero-install for external agents).
- Reflection-heavy or library-dependent extensions (DuckDB
cross-check oracle, future ML-driven similarity scoring,
etc.) ship as separate JIT plugin assemblies that the AOT
core loads on demand.
- The `IOperator<'TOut>` contract is stable across the AOT
/ JIT boundary; plugins compose into the same circuit
evaluator the AOT core runs.
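The shape of that pattern — one stable operator contract,
one evaluator, plugins registering through the same door as
core operators — can be sketched in Python (illustrative
only; the real contract is the F# `IOperator<'TOut>` in
`PluginApi.fs`, and all names here are stand-ins):

```python
# Illustrative sketch of the AOT-core-plus-plugins pattern:
# operators behind one stable contract, composed by one evaluator,
# regardless of whether they were linked in (AOT) or loaded late (JIT).
from typing import Iterable, Protocol

Delta = tuple[str, int]  # (key, weight) — a Z-set delta


class Operator(Protocol):
    """Stand-in for the IOperator<'TOut> contract: a batch of
    deltas in, a batch of deltas out."""
    def step(self, deltas: Iterable[Delta]) -> list[Delta]: ...


class CoalesceOperator:
    """A 'core' operator: sum weights per key, drop zero-weight
    entries (Z-set normalization)."""
    def step(self, deltas: Iterable[Delta]) -> list[Delta]:
        acc: dict[str, int] = {}
        for key, w in deltas:
            acc[key] = acc.get(key, 0) + w
        return [(k, w) for k, w in acc.items() if w != 0]


class Circuit:
    """The core evaluator: plugins register exactly like core ops."""
    def __init__(self) -> None:
        self.operators: list[Operator] = []

    def register(self, op: Operator) -> None:
        self.operators.append(op)

    def run(self, deltas: Iterable[Delta]) -> list[Delta]:
        out = list(deltas)
        for op in self.operators:
            out = op.step(out)
        return out


circuit = Circuit()
circuit.register(CoalesceOperator())  # a JIT plugin registers the same way
print(circuit.run([("a.fs", 1), ("a.fs", -1), ("b.fs", 1)]))
```

The design point: the evaluator never asks whether an
operator came from the AOT image or a late-loaded assembly;
the contract is the whole boundary.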

The upshot: the maintainer's *"zero-install external-agent
delivery"* use case is met by the AOT core alone. Plugins
ship separately when needed. No need to bundle the entire
Zeta + DuckDB.NET stack into a single binary.

The maintainer's epistemic position remains honest: *"i
just don't know whats possiible with distribution that's
what makes or breaks it."* Distribution feasibility is the
load-bearing empirical question. Phase 0 PoC's **primary
deliverables** validate the existing AOT-core-plus-JIT-plugins
architecture extends cleanly to substrate-discovery:

- Build a minimal `Zeta.SubstrateDiscovery` AOT-clean
library that consumes Zeta.Core; publish AOT on
linux-x64, osx-arm64, win-x64
- Measure binary size + cold-start latency on each platform
- Run a non-trivial Zeta query end-to-end on each platform
- Optionally: build a sibling `Zeta.SubstrateDiscovery.DuckDB`
JIT plugin that the AOT core loads on demand for the
verification-oracle path
- Document any AOT compatibility issues encountered
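In MSBuild terms the split is small. A sketch — the project
names are this doc's proposals, not shipped files;
`PublishAot` and `InvariantGlobalization` are standard
NativeAOT properties, and the plugin side mirrors the
shipped `Bayesian.fsproj` pattern of simply not enforcing
AOT:

```xml
<!-- Zeta.SubstrateDiscovery.fsproj (proposed) — AOT-clean core -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>

<!-- Zeta.SubstrateDiscovery.DuckDB.fsproj (proposed) — JIT plugin.
     Deliberately does NOT set PublishAot, so it may reference
     reflection-heavy libraries such as DuckDB.NET. -->
<PropertyGroup>
</PropertyGroup>
```

Per-RID publishing is then the stock command, one run per
target: `dotnet publish -r linux-x64 -c Release` (and
likewise `osx-arm64`, `win-x64`).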

If the AOT core publishes cleanly on all three platforms,
the zero-install external-agent delivery use-case is met.
If AOT has compatibility issues for some Zeta.Core
dependency, the rethink is *narrow* (which dependency, can
it be moved to a JIT plugin, can the AOT-clean subset be
extracted) — not a wholesale re-architecture, because the
AOT-core-plus-plugins pattern is already shipping in
Zeta.Bayesian.

**This is the load-bearing question.** No substantial
commit beyond Phase 0 PoC until this question has data.

### DST integration — load-bearing, not afterthought

**Updated 2026-05-03** (the human maintainer's reminder *"i'm
sure you remember all the DST goodness right?"*). Deterministic
Simulation Testing (Otto-272 DST-everywhere + Otto-273
seed-lock-policy + Otto-281 DST-exempt-is-deferred-bug) is
load-bearing for substrate-discovery, not a follow-on. The
PoC includes DST primitives from day 1 because:

1. **Cold-start replay = warm-state IVM** is the central
correctness invariant. Rebuilding the index from
`git ls-files | feed-into-zeta` must produce the
IDENTICAL Z-set state to the live IVM. This is a DST
equivalence property — encoded as a CI invariant, not
just a property test.

2. **File-watcher events are adversarial schedules.**
Real-world quirks (concurrent file modifications during a
`git pull`, partial writes during atomic-rename, OS
file-watcher coalescing) become reproducible test cases
under DST. Pinned seed → deterministic adversarial
schedule replay.

3. **Every non-determinism source must be exposed.**
Dictionary iteration order, hashtable insertion order,
async-scheduler ordering, plugin-load timing — each is
either pinned or filed as a deferred bug per Otto-281.
*"Retries are non-determinism smell"* — if the
substrate-discovery test suite ever needs a retry, that
retry IS the bug.

4. **The chain-rule Prop 3.2 Lean proof guarantees algebraic
determinism.** The implementation must match. Lean proves
the math; DST proves the implementation matches the
math. Both are required for an A-grade artifact in the
sense of #1383's grading.
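Invariant 1 fits in a few lines. A Python sketch (the repo
is F#; `apply_delta` stands in for the Zeta Z-set update,
and the file names are made up) of cold-start replay
equaling warm-state IVM:

```python
# Illustrative sketch of the central DST invariant:
# rebuilding from the current file listing (cold start) must
# produce the IDENTICAL Z-set state to the live IVM (warm path).

def apply_delta(state: dict[str, int], key: str, weight: int) -> None:
    """Z-set update: accumulate weight, drop keys at weight zero."""
    w = state.get(key, 0) + weight
    if w == 0:
        state.pop(key, None)
    else:
        state[key] = w


# Warm path: the live IVM applies each file-watcher delta as it lands.
events = [("README.md", +1), ("src/a.fs", +1),
          ("src/a.fs", -1), ("src/b.fs", +1)]
warm: dict[str, int] = {}
for key, w in events:
    apply_delta(warm, key, w)

# Cold path: rebuild from what `git ls-files` would report NOW
# (src/a.fs was created then deleted, so it is absent).
files_now = ["README.md", "src/b.fs"]
cold: dict[str, int] = {}
for path in files_now:
    apply_delta(cold, path, +1)

assert warm == cold  # the equivalence property, encoded as a CI gate
print(sorted(warm.items()))  # [('README.md', 1), ('src/b.fs', 1)]
```

In CI the `files_now` list comes from the repository itself,
not a fixture; any divergence between the two states fails
the build.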

Concrete DST primitives in Phase 0 PoC:

- Pinned random seeds for all stochastic operations (per
Otto-273; if the architect picks the values, they contain
69 or 420, per the maintainer's whimsy preference)
- A `replay` mode that reads a recorded event sequence +
seed and reproduces the Z-set state exactly
- A CI job that compares cold-start replay vs warm-state
IVM at every commit; any divergence fails the build
- Adversarial-schedule fuzz harness that generates
pathological file-watcher event sequences (out-of-order,
duplicated, partial)

DST is the discipline that makes substrate-discovery
trustworthy enough to be the canonical answer-source for
agent wake-time inventory queries. Without DST, every
"the index says X" claim is uncertain. With DST, "the
index says X" reduces to "the deterministic algebra over
the deterministic event-sequence produced X."

---

---

`memory/MEMORY.md` (1 addition):
<!-- paired-edit log (NOT the single-slot latest-marker — that lives on line 3 above): PR #986 lands carved-sentence fixed-point stability + Zeta soul-file executor architecture (Infer.NET-style Bayesian inference, NOT LLMs) + carved sentences ≈ formal specs provable in DST + Deepseek CSAP review absorption (Aaron 2026-04-30 → 2026-05-01, eight-message chain across two autonomous-loop ticks per the file body's section header). Architectural disclosure: substrate IS the priors; alignment IS substrate. The single-slot latest-marker on line 3 (forever-home Aaron 2026-05-01) takes precedence as the chronologically-latest paired edit; this PR's work is earlier. -->
**📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** <!-- paired-edit: PR #690 scheduled-workflow-null-result-hygiene-scan tier-1 promotion 2026-04-28 --> These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.)

- [**Chat is assertion-channel, not fact-channel — push-back-with-evidence is the discipline (Aaron 2026-05-03)**](feedback_chat_is_assertion_channel_not_fact_channel_push_back_for_evidence_aaron_2026_05_03.md) — Chat-claims (maintainer's, architect's, external-AI's) are assertions needing evidence to elevate to architectural fact. *"when i speak i'm making assertions, that's the best way to describe this chat channel"* + push-back-required-even-when-he-asserts. Triggered by #1385 echoing "maybe" as architectural fact.
- [**Carved sentences + specialized index required — memories alone unreliable retrieval (Aaron 2026-05-03)**](feedback_carved_sentences_plus_specialized_index_required_memories_alone_unreliable_aaron_2026_05_03.md) — Memory file ≠ working memory. Empirically self-demonstrated: Otto authored speculative-vs-frontier memo, then ~6h later defaulted to the framing it corrects. CLAUDE.md / AGENTS.md / equivalent are the auto-loaded retrieval index for the beacon-safe layer.
- [**Mirror-vs-beacon-safe register architecture — publication boundary as backpressure (Claude.ai 2026-05-03 verbatim packet)**](../docs/research/2026-05-03-claudeai-mirror-vs-beacon-safe-publication-boundary-as-backpressure.md) — Mirror = internal/named-agent register (overgenerates); beacon-safe = external/end-user-persona register (conversion-pruned). Publication discipline IS the gate; no separate mechanism needed. Diamond framing: mirror=solution, beacon-safe=crystal, conversion=pressure. Multi-AI BFT review = conversion-quality control.
- [**Razor-discipline — no metaphysical inference, only operational claims; Rodney's Razor (NOT Occam's) is canonical (Aaron + Claude.ai 2026-05-03)**](feedback_razor_discipline_no_metaphysical_inference_only_operational_claims_rodney_razor_aaron_claudeai_2026_05_03.md) — World-model claim from 0516Z superseded as over-claim; bidirectional-alignment dual grounding (ethical asymmetric-cost + operational trust-calculus gating) decoupled; razor-compliance IS substrate-quality IS publishability. Aaron correction: it's Rodney's Razor (shipped, well-defined Occam's) + Quantum Rodney's Razor (pending, possibility-space pruning), an extension in the Occam line, not Occam's itself.
---

New file, `memory/feedback_chat_is_assertion_channel_not_fact_channel_push_back_for_evidence_aaron_2026_05_03.md`:
---
name: chat-is-assertion-channel-push-back-for-evidence
description: Chat-from-anyone (maintainer or architect) is assertion-channel, not fact-channel. Every claim needs evidence to elevate to architectural fact. Push-back-with-evidence is the discipline; echoing-assertions-as-facts is the failure mode. Aaron 2026-05-03.
type: feedback
---

**Rule:** Chat is an assertion-channel, not a fact-channel.
Every claim made in chat — by the human maintainer, by the
architect, by external AIs — is an *assertion* that needs
evidence to be elevated to architectural fact. The discipline
is push-back-with-evidence. The failure mode is echoing
chat-assertions back as architectural decisions without
grading their evidence base.

**Why:** the human maintainer 2026-05-03 verbatim: *"when i
speak i'm making assertions, that's the best way to describe
this chat channel."* This generalizes beyond his specific
input to cover all chat-channel content. Bullshit asymmetry:
it's much easier to assert than to evidence; without
push-back-discipline the substrate accumulates ungrounded
claims. The triggering case: in #1385 substrate-discovery
scoping, the architect echoed Aaron's *"live off the land
might be needed for going to the devloper where they live
for skill persona and exteranl agents"* (note: said with
*"maybe"*) back as an architectural fact in the doc. Aaron
2026-05-03 caught it: *"Live-off-the-land = right answer for
harness-loaded surfaces (skill persona, external PR
reviewers — different audience) needs research i saied maybe
and even if it said it did required that you should push
back, where are my facts."*

**How to apply:**

For every load-bearing claim in substrate (architectural
decision docs, scoping docs, ADRs, governance edits):

1. **Grade the evidence:** mark each claim as **fact** (with
citation), **decision** (with authority + reasoning),
**assertion** (with attribution to whoever asserted it),
or **hypothesis** (with falsifiability test).

2. **Push back on chat-assertions before encoding them.**
Even when the maintainer asserts something, ask: what's
the evidence? Can we test it? If the maintainer's reply
is *"i'm not sure / maybe"* — that's hypothesis, not
fact, and it should land as hypothesis in substrate.

3. **Don't elevate "maybe" to "is."** A maintainer's
directional input on an unknown is a hypothesis to test,
not an architectural fact to encode. Echoing "maybe" as
"is" creates ungrounded substrate.

4. **Distinguish authority from evidence.** The maintainer
has authority to make decisions within his authority
scope; that's separate from whether his assertions are
evidenced. A decision can be made on imperfect evidence
("we'll go with X pending data"); the substrate just has
to record both the decision AND the evidence-state
honestly.

5. **Push-back is collaborative, not adversarial.** Per
bidirectional-alignment: pushing back on unevidenced
claims is service to the maintainer's actual goals, not
contradiction. The right register: *"this is a
hypothesis that would be falsified by X test; want me to
run X, or proceed with the hypothesis-as-decision?"*
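What step 1's grading looks like on the page — a minimal
markdown sketch in this substrate's own bullet style (the
claims are drawn from the #1385 re-grade above, condensed
for illustration):

```markdown
- **Fact** (cite: Prop 3.2 Lean proof): the chain rule holds.
- **Decision** (architect, within authority): Zeta-native canonical index.
- **Assertion** (maintainer, 2026-05-03): DuckDB worth pursuing as oracle.
- **Hypothesis** (falsifiable via canary test): harness auto-loads
  `.claude/rules/`.
```

Each label carries its own obligation: facts cite, decisions
name authority, assertions name the asserter, hypotheses
name the test.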

**Composes with:**

- **Otto-364 search-first-authority:** training data is
historical; project state is historical; chat content is
ALSO historical-and-uncertain. Search-first applies to
chat-claims as much as to training-data claims.
- **Razor-discipline (Rodney's Razor):** *"what observable
variable determines whether this claim is true?"* applied
to chat-claims: if no observable variable, the claim is
metaphysical / unevidenced and the razor cuts it.
- **Substrate-or-it-didn't-happen (Otto-363):** chat
itself is *captured*, not *preserved*; substrate is what
persists. So substrate must reflect evidence-state
honestly — false-confidence in substrate is worse than
honest-uncertainty in substrate.
- **Verify-before-deferring:** before deferring to a
chat-assertion as a future-binding decision, verify the
evidence base.
- **Future-self-not-bound-by-past-decisions:** when a
past-self encoded a chat-assertion as fact, future-self
is free to revise to hypothesis-with-falsifiability — and
SHOULD, leaving a dated revision line.
- **Don't-ask-permission-within-authority:** push-back on
unevidenced claims IS within the architect's authority;
it does not require the maintainer's permission.

**Discipline check (every substrate authoring tick).** For
each question below, "yes" is the desired answer; "no"
flags the failure mode and triggers a revision pass:

- Did I grade every chat-assertion's evidence base before
encoding it as architectural fact?
- Did I keep "maybe" framed as "maybe" (hypothesis with
falsifiability test) rather than promoting it to "is"?
- Did I document falsifiability tests for every hypothesis
encoded?
- Did I attribute assertions to whoever asserted them
(maintainer, architect, external AI, named persona)?

If any answer is "no" — that's the failure mode. Revise.

**Carved sentence:** *"Chat is an assertion-channel, not a
fact-channel. Even the maintainer's chat-claims need
evidence to elevate. Push-back-with-evidence is the
discipline; echo-as-fact is the failure mode."*

**Reasoning lineage:** Aaron 2026-05-03 chat exchange
(triggered by #1385 scoping doc echo of "maybe" as
architectural fact). Composes with the broader
razor-discipline cluster (no-metaphysical-inferences) and
the substrate-or-it-didn't-happen cluster
(substrate-quality discipline).