From 49f5acb6ef1dea714a9ed45af70f403cd6ed15bc Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Fri, 24 Apr 2026 19:24:44 -0400
Subject: [PATCH] backlog+memory: blockchain ingest BTC/ETH/SOL first-class DB support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Maintainer 2026-04-24 directive captured verbatim. Two motivations:
Aurora preparation (understand real blocks before building) + DB
stress-test (real-world streaming workload at battle-tested scale).

BTC → ETH → SOL priority order. Additional chains (Cosmos, Polkadot,
Cardano, Avalanche, L2 rollups) evaluated later.

Architectural frame (maintainer-explicit): NOT a fork of bitcoind /
geth / solana-labs. Full node layered ON TOP of Zeta's distributed
DB — distributed-node support falls out of Zeta's multi-node
primitives for free.

Phased scope captured in the row:
  Phase 0 — research gate (read actual client source per chain to
    map freeloader detection: BTC net_processing.cpp DoS scoring;
    ETH devp2p/Snap sync reciprocity; SOL turbine shred forwarding)
  Phase 1 — post-install ingest script + Z-set retraction-native
    chain-reorg handling + queryable through all existing interfaces
  Phase 2 — full-node protocol participation (CONDITIONAL on Phase 0
    finding; upload-side interfaces first-class on par with SQL per
    maintainer)
  Phase 3 — cross-chain stream bridge (Z-set operator composition)
  Phase 4 — UI block-explorer + streaming dashboard

Otto-275 log-don't-implement. Does NOT authorize implementation
start; Phase 0 research is the gate. Does NOT authorize mainnet
exposure without Aminata threat-model sign-off.
---
 docs/BACKLOG.md                               | 202 ++++++++++++++++++
 memory/MEMORY.md                              |   1 +
 ...class_db_support_aurora_prep_2026_04_24.md | 195 +++++++++++++++++
 3 files changed, 398 insertions(+)
 create mode 100644 memory/feedback_blockchain_ingest_btc_eth_sol_first_class_db_support_aurora_prep_2026_04_24.md

diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index 405a7b1c..777e858b 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -5599,6 +5599,208 @@ systems. This track claims the space.
 
 ## P2 — research-grade
 
+- [ ] **Blockchain block ingestion — first-class BTC /
+  ETH / SOL streaming into Zeta's distributed database;
+  bi-directional protocol participation; cross-chain
+  stream bridge.** Maintainer 2026-04-24 directive
+  (verbatim):
+
+  > *"i would love to test our database by having first
+  > class support for bitcoin, eth, and solana blocks
+  > into our database in the order of priority unless you
+  > tell me there are other ones worth exploring for two
+  > reason, 1 to help us understand blockchain for Aurora
+  > we don't want to just jump in and we will be starting
+  > from scriatch so making sure we completely understand
+  > everysing thing about the blocks are important so we
+  > get ours right. can you make a post install script
+  > that will streaing ingest these block chains into our
+  > database and make them querable will all our entry
+  > points/intefaces backlog. this is not a full node
+  > implimentation or anyting yet that will come leter
+  > layed on top of our multinode database so we can have
+  > distributed node support from the start cause we are
+  > on top of our distributed db. we can stick a ui in
+  > front of that too lol. Also you need to do a lot of
+  > research here cause some nodes will try to call you a
+  > bad node if you don't hame some amount of the full
+  > protocol, they give extra tests exactly to try to
+  > stop this freeloader scenaro where you download but
+  > dont upload, you can look at their source code to
+  > figure it out.
+  > Also if you have to do full nodes of
+  > those types to be able to download we have to upload
+  > too go ahead and to that, i want those interfaces too
+  > just like our SQL interfaces and i also want deep
+  > integration into those networks so we can 'bridge'
+  > them in streams and maybe further. backlog"*
+
+  **Two load-bearing motivations:**
+  1. **Aurora preparation** — Zeta's own blockchain-ish
+     substrate (Aurora / Lucent-KSK lineage per the
+     existing memory cluster) wants concrete grounding
+     before we design the Aurora chain shape. Ingesting
+     real BTC / ETH / SOL blocks into our database gives
+     us deep understanding of the actual data model
+     before we specify ours.
+  2. **Database stress-test** — BTC / ETH / SOL are
+     three of the most battle-tested streaming workloads
+     on the planet (continuous append, chain
+     reorganizations, finality semantics, adversarial
+     environment). If Zeta's distributed DB can absorb
+     them live and serve queries through the existing
+     interfaces, that's a load-bearing proof of the
+     substrate.
+
+  **Priority order (maintainer-specified):** BTC → ETH →
+  SOL. Priority is authoritative; additional chains
+  (Cosmos Hub / Polkadot / Cardano / Avalanche / L2
+  rollups like Base / Optimism / Arbitrum) should be
+  evaluated in a later phase, not reordered.
+
+  **Phased plan (scope decomposition — each phase a
+  future dedicated PR or PR cluster; this row is the
+  umbrella):**
+
+  **Phase 0 — Research pass (no code; starts the work):**
+  - Read the actual client source for each chain:
+    `bitcoin/bitcoin` (C++), `ethereum/go-ethereum` +
+    `paradigmxyz/reth` (Go + Rust), `solana-labs/solana`
+    (Rust). Map the block shape verbatim; capture field
+    semantics (timestamps / merkle roots / witness data /
+    slot-vs-block distinctions / SOL's entries-within-slot
+    model).
+  - Identify **misbehavior / freeloader detection** per
+    chain — specifically what each client does to detect
+    a download-only peer and how it penalizes / bans.
+    Key sources: BTC's `net_processing.cpp` DoS scoring,
+    ETH's devp2p / Snap-sync reciprocity tests, SOL's
+    turbine-shred forwarding requirements. This
+    determines whether **Phase 2 full-node participation
+    is REQUIRED or OPTIONAL** per chain.
+  - Identify what's pullable WITHOUT running a full node:
+    BTC block explorer APIs + Electrum + public RPC;
+    ETH public RPC + Alchemy/Infura snapshot archives;
+    SOL public RPC + warehouse archive (Google BigQuery
+    has a public SOL blocks dataset). This bounds the
+    Phase 1 scope.
+  - Produce `docs/research/blockchain-ingestion-phase-0-bitcoin.md`,
+    `docs/research/blockchain-ingestion-phase-0-ethereum.md`,
+    `docs/research/blockchain-ingestion-phase-0-solana.md`.
+
+  **Phase 1 — Post-install block-ingestion script
+  (NOT a full node):**
+  - Post-install script under `tools/setup/blockchain-ingest/`
+    (composes with GOVERNANCE §24 three-way-parity
+    install script). Per-chain: `bitcoin.sh`,
+    `ethereum.sh`, `solana.sh`.
+  - Each script streams blocks via public RPC / explorer
+    APIs into Zeta's distributed DB as Z-set entries —
+    retraction-native (chain reorgs are first-class
+    retractions; our substrate was designed for this).
+  - Schema-design: use the paced-ontology-landing
+    discipline; each chain gets a dedicated ontology
+    (block / transaction / log / witness / slot /
+    entry-vs-block-vs-shred) and the cross-chain
+    umbrella ontology comes later (Phase 4).
+  - Queryable through all existing entry points: SQL
+    binder, operator algebra, LINQ, any future
+    GraphQL / REST surface. NO new interface class
+    unique to blockchain; re-use what Zeta already has.
+  - `dotnet run -- --chain bitcoin --from-height N
+    --to-height latest --follow` shape.
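The "chain reorgs are first-class retractions" idea in Phase 1 can be sketched as follows. This is a minimal illustration in Python, not Zeta's API: the `ZSet` class, the `Block` record, and `reorg_delta` are hypothetical names invented here, and the block hashes are placeholder strings. It assumes a Z-set is a collection with signed integer weights where a weight of zero means the row is absent.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    # Hypothetical minimal block record; real per-chain schemas are a Phase 0/1 output.
    height: int
    hash: str
    parent: str


class ZSet:
    """Collection with signed integer weights; rows whose weight reaches zero vanish."""

    def __init__(self):
        self.weights = Counter()

    def apply(self, delta):
        # delta is an iterable of (row, weight) pairs; +1 inserts, -1 retracts.
        for row, weight in delta:
            self.weights[row] += weight
            if self.weights[row] == 0:
                del self.weights[row]

    def items(self):
        return set(self.weights)


def reorg_delta(orphaned, replacement):
    """Express a reorg as one delta: retract the orphaned branch, insert the new one."""
    return [(b, -1) for b in orphaned] + [(b, +1) for b in replacement]


chain = ZSet()
genesis = Block(0, "g", "")
a1 = Block(1, "a1", "g")
a2 = Block(2, "a2", "a1")
chain.apply([(genesis, 1), (a1, 1), (a2, 1)])

# A competing branch replaces a2 with b2 -> b3.
b2 = Block(2, "b2", "a1")
b3 = Block(3, "b3", "b2")
chain.apply(reorg_delta([a2], [b2, b3]))

tip = max(chain.items(), key=lambda b: b.height)
```

The point of the sketch is that the ingest path never needs a separate "undo" code path: a reorg is the same kind of update as a new block, only with negative weights on the orphaned rows.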
+
+  **Phase 2 — Full-node protocol participation
+  (CONDITIONAL on Phase 0 finding):**
+  - If Phase 0 research shows that the target chain's
+    client BANS download-only peers after a window
+    (a working hypothesis for BTC's misbehavior scoring,
+    ETH's Snap sync, and SOL's turbine; Phase 0 must
+    confirm or refute it per chain),
+    implement the minimum UPLOAD side of the protocol
+    to stay a good network citizen.
+  - Maintainer directive is explicit: *"if you have to
+    do full nodes of those types to be able to download
+    we have to upload too go ahead and to that, i want
+    those interfaces too just like our SQL interfaces"*.
+    Upload-side interfaces expose as first-class Zeta
+    interfaces on par with SQL — not private internals.
+  - Architecturally this is **full-node-layered-on-top
+    of Zeta's distributed DB** (maintainer's explicit
+    frame), not a standalone fork of bitcoind / geth /
+    solana-labs. We use Zeta as the storage / consensus
+    / query substrate and implement the chain protocol
+    ON TOP of it. Distributed-node support falls out of
+    Zeta's multi-node primitives for free.
+  - This is where Zeta's distributed-consensus substrate
+    (`distributed-consensus-expert` / `raft-expert` /
+    `paxos-expert` / `calm-theorem-expert` /
+    `replication-expert`) becomes load-bearing.
+
+  **Phase 3 — Cross-chain stream bridge:**
+  - Deep integration per maintainer: *"deep integration
+    into those networks so we can 'bridge' them in
+    streams and maybe further"*.
+  - Bridge = Z-set operator composition across chain
+    streams. Each chain is a ZSet; cross-chain joins
+    produce derived ZSets (e.g. Bitcoin timestamp vs
+    Ethereum block for time alignment; SOL finality vs
+    ETH finality for comparative-consensus research).
+  - "Maybe further" = likely cross-chain atomic ops,
+    value-transfer bridges, or unified-view layers;
+    scope intentionally open at this phase.
+  - Composes with `distributed-coordination-expert` +
+    `crdt-expert` + `gossip-protocols-expert`.
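The Phase 3 "bridge = Z-set operator composition" bullet can be illustrated with a small join sketch. It is written in Python under stated assumptions and is not Zeta's operator algebra: `zset_join`, `zset_add`, the tuple row layout `(chain, height, timestamp)`, the 600-second `BUCKET`, and every height and timestamp below are made up for illustration. Z-sets are modelled as plain `{row: weight}` dicts, and join output weights are the product of the input weights.

```python
from collections import defaultdict


def zset_join(left, right, key_left, key_right):
    """Equi-join of two Z-sets ({row: weight}); output weight = product of input weights."""
    index = defaultdict(list)
    for r_row, r_w in right.items():
        index[key_right(r_row)].append((r_row, r_w))
    out = defaultdict(int)
    for l_row, l_w in left.items():
        for r_row, r_w in index.get(key_left(l_row), []):
            out[(l_row, r_row)] += l_w * r_w
    return {pair: w for pair, w in out.items() if w != 0}


def zset_add(a, b):
    """Pointwise sum of two Z-sets, dropping rows whose weights cancel to zero."""
    out = defaultdict(int)
    for zs in (a, b):
        for row, w in zs.items():
            out[row] += w
    return {row: w for row, w in out.items() if w != 0}


BUCKET = 600  # seconds; hypothetical time-alignment window


def bucket(row):
    # Rows are (chain, height, unix_timestamp); all values here are invented.
    return row[2] // BUCKET


btc_block = ("btc", 100, 1_700_000_100)
btc = {btc_block: 1}
eth = {
    ("eth", 5000, 1_700_000_112): 1,
    ("eth", 5001, 1_700_000_124): 1,
    ("eth", 5060, 1_700_000_900): 1,  # falls in the next bucket
}

# Derived Z-set: BTC blocks paired with ETH blocks from the same time bucket.
aligned = zset_join(btc, eth, bucket, bucket)

# A BTC reorg retracts the block; joining only the delta yields the derived retractions.
retraction = {btc_block: -1}
derived_delta = zset_join(retraction, eth, bucket, bucket)
after_reorg = zset_add(aligned, derived_delta)
```

Because the join multiplies weights, a retraction on one chain propagates to the derived cross-chain rows without recomputing the whole join; here `after_reorg` ends up empty once the only BTC block is retracted.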
+
+  **Phase 4 — UI:**
+  - Per maintainer: *"stick a ui in front of that too
+    lol"*. Frontier-UX / former-Starboard-now-rename-target
+    (kernel-A farm-related + kernel-B
+    carpentry-related per the 2026-04-24 rename
+    directive) — cross-chain block explorer + streaming
+    dashboard + cross-chain bridge visualizer as initial
+    surfaces.
+
+  **Additional chains worth evaluating in a later phase**
+  (do NOT reorder the primary BTC/ETH/SOL priority):
+  - **Cosmos Hub** — IBC is a canonical cross-chain
+    bridging primitive; directly relevant to Phase 3.
+  - **Polkadot** — substrate chain + parachain
+    composition = close architectural cousin to Zeta's
+    multi-node + cross-chain design.
+  - **Cardano** — Ouroboros PoS pedagogy (Ouroboros is
+    one of the most formally-verified consensus
+    protocols deployed at scale).
+  - **Avalanche** — sub-net architecture is a real
+    distributed-systems primitive worth studying.
+  - **L2 rollups** (Base / Optimism / Arbitrum / zkSync
+    Era / StarkNet) — bridge-to-ETH substrate; good
+    study material for Phase 3 bridging.
+
+  **Priority / effort:** P2 research-grade; umbrella
+  effort is L (phased across many rounds). Phase 0 is
+  M (three research docs, deep source reading). Phase 1
+  per-chain is M-L each (ingest script + schema +
+  retraction-native integration). Phase 2 per-chain is
+  L each (full-node protocol on top of Zeta). Phase 3
+  is L+ (cross-chain bridge). Phase 4 is S (UI on top
+  of existing query surface).
+
+  **Composes with:** Aurora substrate (all Lucent-KSK +
+  Aurora ferry absorbs), paced-ontology-landing (one
+  ontology per chain), `distributed-consensus-expert` +
+  sibling consensus hats (Phase 2), GOVERNANCE §24
+  install-script discipline (Phase 1 post-install),
+  Otto-175c rename directive (the Frontier-UI surface
+  for Phase 4), Otto-275 log-don't-implement (this row
+  is the capture, not the kickoff).
+
+  **Does NOT authorize:** starting implementation yet —
+  Phase 0 research is the gate.
+  Does NOT authorize
+  expanding scope to additional chains before BTC / ETH
+  / SOL are understood. Does NOT authorize running a
+  live Zeta instance on mainnet without Aminata
+  threat-model sign-off on the network-exposure surface
+  (Phase 2 only).
+
 - [ ] **Land per-maintainer CURRENT-memory ADR +
   companion feedback memory.** PR #153 landed the
   CLAUDE.md fast-path pointer at the per-user `CURRENT-.md`
diff --git a/memory/MEMORY.md b/memory/MEMORY.md
index 763d5fb1..ea7d34c9 100644
--- a/memory/MEMORY.md
+++ b/memory/MEMORY.md
@@ -2,6 +2,7 @@
 
 **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection.
 
+- [**BLOCKCHAIN INGEST — first-class BTC/ETH/SOL streaming into Zeta's distributed DB; two motivations (Aurora prep + DB stress test); BTC→ETH→SOL priority; NOT fork of bitcoind/geth/solana-labs — on top of Zeta distributed DB; freeloader-detection research required (BTC net_processing.cpp / ETH devp2p+Snap / SOL turbine-shred); upload-side interfaces first-class on par with SQL; Phase 0 research gate + Phase 1 post-install ingest + Phase 2 conditional full-node + Phase 3 cross-chain bridge + Phase 4 UI; additional chains (Cosmos/Polkadot/Cardano/Avalanche/L2s) evaluated later; Otto-275 log-don't-implement; Aaron 2026-04-24**](feedback_blockchain_ingest_btc_eth_sol_first_class_db_support_aurora_prep_2026_04_24.md) — Verbatim directive captured. Phase 0 research gate = read actual client source per chain to map freeloader detection (determines whether Phase 2 upload-side is required to stay in-network). Architecturally on top of Zeta's multi-node primitives (distributed-node support from start). Composes with Aurora substrate + paced-ontology-landing + distributed-consensus-expert + GOVERNANCE §24 + Otto-175c rename (Frontier-UI → kernel-A/B).
 - [**RENAME Starboard → two seed-extension kernels (farm + carpentry) shrink-over-time; KEEP all nautical/Elron research (Otto-237 mention vs adoption); "big bangs at every layer" metaphor liked; 2 Google AI slates received (batch 1 general farm, batch 2 Q/Z algebraic); Siliqua-Core + Zeta-ic Yield + Zanja flagged as notable resonances; naming-expert triage before any rename PR; Otto-275 log-don't-implement; reverses Otto-175c Starboard adoption; Aaron 2026-04-24**](feedback_rename_starboard_to_farm_carpentry_seed_extension_kernels_2026_04_24.md) — Directive verbatim: *"Instead of Starboard lets go with someting farm related and carperntry related since those will be our two seed extenion kernels we can shrink over time..."*. Two kernels, shrink-over-time property, substrate preserved, iterate don't auto-adopt. Carpentry-side slate not yet proposed; future work scope. Composes with Otto-168/170/175/237/244/275.
 - [**Otto-276 NEVER PRAY AUTO-MERGE COMPLETES — when polling a BLOCKED PR, ALWAYS inspect statusCheckRollup + reviewThreads + reviewDecision; "summary says BLOCKED, must be CI" is prayer not diagnosis; RECURRING class (#190 #385 #388); Aaron 2026-04-24**](feedback_never_pray_auto_merge_completes_inspect_actual_blockers_otto_276_2026_04_24.md) — DST "observable state" = check-level detail not summary. Inspect before concluding either success or failure.
 - [**Otto-275 RAPID-FIRE BACKLOG INPUT DRIFT — when handed many backlog items in rapid succession, LOG durably (memory) but DO NOT pivot to immediate per-item implementation; PATTERN RECURS across sessions; composes with Otto-257/259/262 balance-stack for recovery work; Aaron 2026-04-24**](feedback_rapid_backlog_input_context_switch_drift_counterweight_log_dont_implement_otto_275_2026_04_24.md) — Real learning lesson: I dropped #147 drain focus to capture Otto-270/272/273/274 as a "storm of PRs." Fix: log durable + draft BACKLOG row + continue primary drain; batch BACKLOG rows later.
diff --git a/memory/feedback_blockchain_ingest_btc_eth_sol_first_class_db_support_aurora_prep_2026_04_24.md b/memory/feedback_blockchain_ingest_btc_eth_sol_first_class_db_support_aurora_prep_2026_04_24.md
new file mode 100644
index 00000000..57f4742c
--- /dev/null
+++ b/memory/feedback_blockchain_ingest_btc_eth_sol_first_class_db_support_aurora_prep_2026_04_24.md
@@ -0,0 +1,195 @@
+---
+name: BLOCKCHAIN INGEST — first-class BTC/ETH/SOL streaming into Zeta's distributed DB; two motivations (Aurora prep + DB stress test); NOT full-node at first but upload-side required per chain freeloader-detection; on top of Zeta distributed DB (not fork of bitcoind/geth/solana-labs); cross-chain bridge + UI in later phases; Otto-275 log-don't-implement; 2026-04-24
+description: Maintainer 2026-04-24 directive absorbs blockchain ingest as first-class DB use-case. BTC→ETH→SOL priority. Phase 0 research gate (source-code reading for freeloader-detection). Phase 1 post-install ingest scripts. Phase 2 full-node-on-top-of-Zeta if reciprocity required. Phase 3 cross-chain stream bridge. Phase 4 UI. Deep integration with Aurora substrate. Does NOT authorize implementation start; Phase 0 research is the gate.
+type: feedback
+---
+
+## The directive (verbatim)
+
+Maintainer 2026-04-24:
+
+> *"i would love to test our database by having first
+> class support for bitcoin, eth, and solana blocks into
+> our database in the order of priority unless you tell
+> me there are other ones worth exploring for two reason,
+> 1 to help us understand blockchain for Aurora we don't
+> want to just jump in and we will be starting from
+> scriatch so making sure we completely understand
+> everysing thing about the blocks are important so we
+> get ours right. can you make a post install script that
+> will streaing ingest these block chains into our
+> database and make them querable will all our entry
+> points/intefaces backlog.
+> this is not a full node
+> implimentation or anyting yet that will come leter
+> layed on top of our multinode database so we can have
+> distributed node support from the start cause we are on
+> top of our distributed db. we can stick a ui in front
+> of that too lol. Also you need to do a lot of research
+> here cause some nodes will try to call you a bad node
+> if you don't hame some amount of the full protocol,
+> they give extra tests exactly to try to stop this
+> freeloader scenaro where you download but dont upload,
+> you can look at their source code to figure it out.
+> Also if you have to do full nodes of those types to be
+> able to download we have to upload too go ahead and to
+> that, i want those interfaces too just like our SQL
+> interfaces and i also want deep integration into those
+> networks so we can 'bridge' them in streams and maybe
+> further. backlog"*
+
+## Two load-bearing motivations
+
+1. **Aurora preparation.** Zeta's own blockchain-ish
+   substrate (Aurora / Lucent-KSK lineage) wants
+   concrete grounding before the Aurora chain shape is
+   specified. Ingesting real BTC / ETH / SOL blocks
+   gives us deep understanding of the actual block
+   shapes + adversarial environment before we build.
+2. **Database stress-test.** BTC / ETH / SOL are three
+   of the most battle-tested streaming workloads on
+   the planet (continuous append, reorgs as
+   first-class retractions, finality semantics,
+   adversarial context). If Zeta's distributed DB can
+   absorb them live and serve queries through the
+   existing interfaces, that's a load-bearing proof.
+
+## Priority order
+
+Maintainer-specified: **BTC → ETH → SOL**. Priority is
+authoritative. Additional chains (Cosmos Hub, Polkadot,
+Cardano, Avalanche, L2 rollups) evaluated later; do NOT
+reorder.
+
+## Architectural frame
+
+**NOT a fork of bitcoind / geth / solana-labs.** This
+is explicitly *full-node layered on top of Zeta's
+distributed DB*.
Maintainer verbatim: + +> *"this is not a full node implimentation or anyting +> yet that will come leter layed on top of our multinode +> database so we can have distributed node support from +> the start cause we are on top of our distributed db."* + +Zeta provides the storage / consensus / query +substrate; chain protocol runs on top. Distributed-node +support falls out of Zeta's multi-node primitives for +free. + +## Freeloader discipline + +Maintainer verbatim: + +> *"some nodes will try to call you a bad node if you +> don't hame some amount of the full protocol, they give +> extra tests exactly to try to stop this freeloader +> scenaro where you download but dont upload, you can +> look at their source code to figure it out. Also if +> you have to do full nodes of those types to be able to +> download we have to upload too go ahead and to that, i +> want those interfaces too just like our SQL interfaces"* + +Translation for Phase 0 research pass: + +- **Read the actual client source per chain** (Bitcoin + C++ reference client, go-ethereum + reth, solana-labs) + to identify misbehavior detection. +- **Identify what each client does to detect a + download-only peer** and how it penalizes / bans. +- **If the answer is "banned after N minutes without + reciprocity,"** Phase 2 must implement the upload + side of the protocol to stay a good network citizen. +- Upload-side protocol interfaces expose as **first-class + Zeta interfaces on par with SQL** — not private + internals. + +Key source locations (Phase 0 targets): + +- BTC: `net_processing.cpp` DoS scoring in + `bitcoin/bitcoin`. +- ETH: devp2p / Snap-sync reciprocity in + `ethereum/go-ethereum` + `paradigmxyz/reth`. +- SOL: turbine-shred forwarding requirements in + `solana-labs/solana`. + +## Phased scope decomposition + +- **Phase 0** — Research pass (no code). Three research + docs under `docs/research/`, one per chain. Gate for + Phase 1. 
+- **Phase 1** — Post-install block-ingestion script
+  under `tools/setup/blockchain-ingest/`. Streams blocks
+  via public RPC / explorer APIs into Zeta's
+  distributed DB as Z-set entries (retraction-native —
+  chain reorgs are first-class retractions). Queryable
+  through ALL existing entry points (SQL binder,
+  operator algebra, LINQ, future surfaces). NO new
+  interface class unique to blockchain.
+- **Phase 2** — Full-node protocol participation
+  (CONDITIONAL on Phase 0 finding; if reciprocity
+  required to stay in-network). Upload-side interfaces
+  as first-class Zeta interfaces on par with SQL.
+  Architecturally *on top of* Zeta's distributed DB,
+  not a fork.
+- **Phase 3** — Cross-chain stream bridge. Z-set
+  operator composition across chain streams. "Maybe
+  further" = cross-chain atomic ops, value-transfer,
+  unified-view layers — scope intentionally open.
+- **Phase 4** — UI. Maintainer quote: *"stick a ui in
+  front of that too lol"*. Block explorer + streaming
+  dashboard + bridge visualizer as initial surfaces.
+
+## Additional chains (future evaluation only)
+
+Do NOT reorder BTC → ETH → SOL. These are Phase 3+
+candidates:
+
+- **Cosmos Hub** — IBC is canonical cross-chain bridge
+  primitive; directly relevant to Phase 3.
+- **Polkadot** — substrate + parachain = close
+  architectural cousin to Zeta's multi-node design.
+- **Cardano** — Ouroboros PoS is one of the most
+  formally-verified deployed consensus protocols
+  (pedagogy).
+- **Avalanche** — sub-net architecture is worth
+  studying for distributed-systems design.
+- **L2 rollups** (Base / Optimism / Arbitrum / zkSync
+  Era / StarkNet) — bridge-to-ETH substrate; good
+  study material for Phase 3.
+
+## Composes with
+
+- **Aurora substrate** (all Lucent-KSK + Aurora ferry
+  absorbs; the "why we need deep understanding first")
+- **Paced-ontology-landing** (one ontology per chain;
+  cross-chain umbrella ontology later)
+- **Distributed-consensus-expert + sibling consensus
+  hats** (Phase 2: full-node-on-top-of-Zeta uses our
+  distributed-consensus substrate)
+- **GOVERNANCE §24 three-way-parity install script**
+  (Phase 1 post-install)
+- **Otto-175c rename directive** (the Frontier-UI
+  surface for Phase 4 = now kernel-A/kernel-B farm +
+  carpentry per 2026-04-24 rename directive)
+- **Otto-275 log-don't-implement** (this memory +
+  BACKLOG row are the capture, not the kickoff)
+
+## Does NOT authorize
+
+- Starting implementation yet. Phase 0 research is the
+  gate.
+- Expanding scope to additional chains before BTC /
+  ETH / SOL are understood.
+- Running a live Zeta instance on mainnet without
+  Aminata threat-model sign-off on the
+  network-exposure surface (Phase 2 only).
+- Forking bitcoind / geth / solana-labs — the
+  architecture is *on top of Zeta*, not a fork.
+
+## Future Otto reference
+
+When Phase 0 starts: read the three client codebases
+FIRST. The freeloader-detection mapping per chain is
+the architectural gate that determines Phase 2 scope.
+Do not skip that research pass even if tempted —
+maintainer is explicit that the upload-side interfaces
+must be first-class.