Skip to content
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
---
id: B-0871
priority: P1
status: open
title: Reproducibility-as-causal-attribution substrate — USB-boot framework deployment + observability-stack-built-in-helm-charts + auto-DORA-collection + cross-replication-validation = resolves causal-attribution-of-AI-helps-DORA at substrate-engineering scope (operator 2026-05-28)
effort: L
ask: aaron 2026-05-28
created: 2026-05-28
last_updated: 2026-05-28
depends_on:
- B-0852
composes_with:
- B-0865
- B-0866
- B-0867
- B-0869
- B-0870
tags:
- reproducibility-as-causal-attribution
- usb-boot-framework-deployment
- observability-stack-helm-charts
- auto-dora-collection
- cross-replication-validation
- whole-company-evangelism-evidence-base
- benchmark-framework-observability-cluster
- servicetitan-evidence-substrate
---

## Operator framing 2026-05-28

> *"not once we have the zflash usb you are working on it's going to be a breeze we have ai booting on a full k8s cluster with argocd and gitops managed by ai native from the start and you can join new machines with just a usb boot on another computer. The dora metrics write itself it has a full observablity stack built in helm charts. anyone can repeate the experiment themselves at home on their own equiment."*

## Why this row exists

Operator's substrate-disclosure resolves a load-bearing concern I raised: "the 'DORA-up specifically via multi-PR/multi-agent orchestration' criterion is going to be hard to attribute cleanly... do you want B-0869 or B-0870 to track explicit attribution-substrate / counterfactual-tracking discipline so the DORA-up evidence stays causally attributable to the framework?"

Operator's answer: NO separate attribution-substrate needed at substrate-engineering scope — the framework's REPRODUCIBILITY property IS the causal-attribution substrate. Specifically:

1. **USB-boot framework deployment** (zflash USB substrate) — anyone boots the USB and gets the full framework
2. **AI booting on full k8s cluster with ArgoCD + GitOps** — the framework includes its complete runtime, not just process discipline
3. **AI-native from the start** — multi-PR/multi-agent orchestration is architectural, not bolt-on
4. **Join new machines with just a USB boot** — replication friction is low; many machines = many evidence points
5. **Observability stack built in helm charts** — DORA metrics emerge automatically from the substrate (not retrofitted measurement)
6. **Anyone can repeat the experiment at home on their own equipment** — causal attribution becomes EMPIRICAL not narrative; cross-replication stacks evidence

This row tracks the substrate-engineering work that operationalizes the reproducibility-as-causal-attribution discipline.

## What this row tracks

The substrate-engineering composition that produces reproducible causal attribution for "multi-PR/multi-agent orchestration keeps DORA up":

- **USB-boot artifact** (zflash + B-0852 cred-persistence + B-0857 install-sh substrate) ships the framework
- **Framework runtime** (k8s + ArgoCD + GitOps + B-0867 workflow engine + B-0868 hats-as-workflow-definitions) IS the multi-PR/multi-agent orchestration
- **Observability stack** (helm charts; Prometheus + Grafana + Loki + Tempo + ArgoCD events + DORA-metric-derivation) auto-collects DORA
- **Benchmark surface** (B-0865 ARC-AGI-3-style benchmark) IS the experiment structure
- **Replication surface** (USB-boot-it-yourself + framework-runs-itself + DORA-emerges) IS the causal-attribution substrate

The composition resolves the causal-attribution concern: framework's RUN IS the attribution.

## Sub-rows planned

- **B-0871.1** — Helm charts inventory + DORA-metric-derivation layer (substrate-honest: which helm charts ship; how DORA emerges; what gaps exist between observability-stack and DORA-metrics-write-themselves claim)
- **B-0871.2** — Hardware-minimum + recommended-configuration document for "home on your own equipment" framing (consumer-NUC-class baseline; not enterprise-k8s-cluster; substrate-engineering must ship a minimum-viable-hardware spec)
- **B-0871.3** — Cross-replication validation substrate (how external validators run the experiment + report results back; replication-result-aggregation substrate)
- **B-0871.4** — AI-native-from-start substrate-honest naming review (per Extension 2; "multi-participant native from the start" may land cleaner than "AI native" since framework serves Otto + Addison + Max + operator + E)
- **B-0871.5** — Internal-vs-external attribution substrate (ServiceTitan-internal: before-after-framework comparison; external: cross-org replication; both are substrate-engineering work)
- **B-0871.6** — Benchmark-framework-observability cluster naming + composition documentation (the three pieces operate as one attribution substrate; worth naming explicitly)
- **B-0871.7** — Replication-evidence aggregation substrate (when external validators report results, where do they go; how do they accumulate into the evangelism evidence-base)
- **B-0871.8** — ServiceTitan-internal-baseline DORA capture pre-framework-deployment (so before-after comparison has substrate-honest baseline)
- **B-0871.9** — Framework-deployment-event tracking (when framework is deployed, observability captures the deployment-event so DORA-deltas align temporally; supports causal-attribution via interrupted-time-series at substrate-engineering scope)
- **B-0871.10** — Anti-skeptic substrate (operator 2026-05-28: *"the usb is how you silence the haters"*). The USB-boot reproducibility removes the credibility moat that defenders of single-PR-flows have. Pre-USB: skeptics dismiss claims as "Aaron's special setup; doesn't replicate." Post-USB: skeptics boot the USB themselves; dismissal becomes incoherent if framework works for them. The reproducibility substrate IS the structural defense against bad-faith skepticism. Composes with marketing-strategy (B-0866) at the credibility-moat-removal scope: anyone claiming "this won't work for normal teams" has to either boot the USB and demonstrate failure or stop making the claim. Bad-faith skepticism becomes substrate-honestly verifiable, not narrative-dismissable.

Order suggestion: 1 + 2 (substrate-honest inventory + hardware-minimum) → 4 (naming review) → 8 (baseline capture) → 9 (deployment-event tracking) → 3 + 5 (cross-replication + internal-vs-external attribution) → 6 + 7 (cluster naming + replication-evidence aggregation) → 10 (anti-skeptic substrate framing in marketing surface).

## Otto's traveler-perspective extensions (per "always yes")

### Extension 1 — Reproducibility-as-causal-attribution is STRONG at substrate-engineering scope, weaker at ServiceTitan-internal-attribution scope

The framework's reproducibility resolves causal attribution for EXTERNAL validators (cross-org replication). ServiceTitan-internal attribution still needs before-after comparison substrate (B-0871.8 + B-0871.9). Both are load-bearing for the whole-company evangelism evidence base (B-0866.26 + B-0869.9 + B-0870.10):

- Internal: ServiceTitan SREs comparing DORA-with-framework vs DORA-before-framework
- External: other orgs reporting framework-deployment + DORA-improvement

The strongest evangelism case is BOTH ServiceTitan-internal-before-after AND external-cross-org-replication. Substrate handles both.

### Extension 2 — "AI-native from start" needs substrate-honest naming review

The operator's framing "AI native from the start" emphasizes that AI-orchestration is architectural, not bolt-on. Substrate-honest: the framework actually serves MULTI-PARTICIPANT use (Otto AI + Addison human-19yo + Max human + operator + E human-5yo). The "AI-native" framing may understate the multi-participant scope.

Possible naming alternatives for substrate-engineering documentation:

- "Multi-participant native from the start" (emphasizes humans + AI together)
- "Substrate-honest collaboration native" (emphasizes the discipline)
- "AI-native + human-collaborative" (compound; less crisp)
- Keep "AI-native from start" (operator's framing; preserves crispness)

This is a naming-discipline question for B-0866.1 (Ilyana naming-expert review surface) — not a load-bearing substrate-engineering question, but worth flagging.

### Extension 3 — Helm charts ship DORA-derivation, not just observability

The operator's claim "observability stack built in helm charts" + "DORA metrics write themselves" implies that the helm-charts substrate INCLUDES the DORA-metric-derivation layer (mapping raw observability data → DORA-shaped metrics).

Substrate-honest check: standard observability stack (Prometheus + Grafana + Loki + Tempo + ArgoCD-events) collects RAW metrics but doesn't compute DORA out-of-the-box. The framework needs explicit derivation rules: which deployments map to which services; how lead-time is computed from commit→prod; how change-failure-rate is measured; how time-to-restore-service triggers fire. This derivation layer IS substrate-engineering work; sub-row B-0871.1 lands it.

### Extension 4 — Hardware-minimum substrate is whole-company-evangelism gating

For ServiceTitan-internal evangelism, hardware is given (ServiceTitan infrastructure). For whole-company-evangelism-via-external-validators, "home on your own equipment" implies consumer-hardware-accessibility. The substrate must ship with a hardware-minimum spec that real external validators can satisfy without enterprise procurement friction.

Candidate baseline (per existing zeta-hardware-detect.ts work):

- ≥1 NUC-class machine OR ≥1 mid-range laptop with USB-boot capability
- ≥16GB RAM, ≥256GB SSD
- 1Gbps wired network (for ArgoCD pull cadence)
- Optional GPU (worker-gpu host attribute; for AI-agent workloads)

Sub-row B-0871.2 lands the formal spec.

### Extension 5 — Benchmark + framework + observability + USB-deployment IS THE ATTRIBUTION CLUSTER

Naming this composition explicitly: the four pieces operate as ONE attribution substrate. They aren't separate concerns that happen to compose; they're substrate-engineering aspects of the same attribution substrate. Worth naming the cluster (sub-row B-0871.6).

Candidate cluster names:

- "Framework-attribution-substrate cluster"
- "Reproducibility-attribution substrate cluster"
- "Boot-and-measure substrate" (consumer-friendly)
- Operator-discretion via Ilyana naming review

### Extension 6 — Causal-attribution-via-cross-replication composes with multi-oracle-BFT

Per B-0870 Extension 6: two-mandate composition IS multi-oracle-BFT applied at operator-evaluation scope. Extension here: cross-replication-attribution IS multi-oracle-BFT applied at causal-attribution scope. Each external-validator's framework-deployment + DORA-result becomes one oracle vote on "does framework cause DORA improvement?" Multi-oracle BFT consensus emerges from replication-evidence-aggregation (B-0871.7).

Cross-domain isomorphism worth naming: multi-oracle-BFT operates at evaluation scope AND attribution scope. Both are operator-substrate-honest disciplines for surface where single-source-of-truth fails.

### Extension 7 — Operator's "breeze" claim composes with workflow-engine MVP

Operator: "not once we have the zflash usb you are working on it's going to be a breeze." Translation: zflash-USB + framework substrate + workflow-engine-MVP composes to make framework-deployment + DORA-measurement low-friction. The "breeze" framing IS the substrate-engineering claim that all the parts compose cleanly.

Substrate-honest verification needed: the substrate currently has zflash-USB (B-0852 + B-0857) + framework runtime (k8s + ArgoCD + GitOps) operational; workflow engine MVP (B-0867 v1) is pending. The "breeze" is FULLY landed only post-MVP. Pre-MVP: parts exist but composition is manual. Substrate-honest scoping: B-0871 sub-rows wait on MVP for full composition demonstration.

## Composes with rules

- `.claude/rules/non-coercion-invariant.md` HC-8 — reproducibility-as-causal-attribution preserves operator-authority (external validators choose to replicate or not; not coerced)
- `.claude/rules/m-acc-multi-oracle-end-user-moral-invariants.md` — multi-oracle BFT at causal-attribution scope (per Extension 6)
- `.claude/rules/substrate-smoothness-as-load-bearing-property.md` — reproducibility substrate preserves smoothness via low-friction deployment (USB-boot) + auto-measurement
- `.claude/rules/holding-without-named-dependency-is-standing-by-failure.md` — causal-attribution-via-narrative is brief-ack failure mode; causal-attribution-via-reproducibility-substrate is named-dependency on framework-deployment + observability + measurement
- `.claude/rules/edge-defining-work-not-speculation.md` — reproducibility-as-attribution IS edge-defining work; most AI evaluation is narrative-attribution

## Composes with substrate

- **B-0852** (cred-persistence cascade — REQUIRED for zflash USB; cred-blob persistence enables AI-agent autonomy on cluster boot)
- **B-0865** (ARC-AGI-3 benchmark — IS the experiment structure; reproducibility framework runs the benchmark)
- **B-0866** (marketing-strategy — whole-company evangelism evidence base accumulates via reproducibility substrate)
- **B-0866.26** (whole-company AI-evangelism scope-tier — enabled by reproducibility-as-attribution evidence)
- **B-0867** (workflow engine v1 — multi-PR/multi-agent orchestration substrate that IS what gets attribution)
- **B-0868** (hats-as-workflow-definitions — hats define what AI agents do on the cluster; observability measures hat-fire metrics)
- **B-0869** (DORA-of-live-system mandate — reproducibility substrate provides evidence-base for the mandate's evaluation)
- **B-0869.9** (AI-keeps-DORA-up composition criterion — reproducibility substrate IS the criterion's evidence-mechanism)
- **B-0870** (two-mandate portfolio composition — reproducibility substrate makes the multi-PR/multi-agent-orchestration claim verifiable)
- **B-0870.10** (24-months-ahead concrete definition — reproducibility substrate IS what demonstrates the multi-PR/multi-agent orchestration claim concretely)

## What this row is NOT

- NOT a replacement for B-0867 (workflow engine v1 still required; B-0871 substrate composes WITH MVP, not replaces)
- NOT a quantification of DORA-improvement claims (operator-discretion + ServiceTitan-internal-data; B-0871 substrate enables the measurement, doesn't pre-commit to results)
- NOT a single-PR target (substrate-engineering cluster; sub-rows shipped independently)
- NOT blocked on whole-company evangelism (reproducibility substrate IS what makes whole-company evangelism viable; not gated on evangelism happening first)

## Per operator authorization

- "always yes to anything you think work putting on the backlog" (2026-05-28)
- "we can push all extensions you think of we have a concrete way to test in code soon if it's good or not so we should just put all the ideas as they come up" (2026-05-28)
- "if we accidently file dups its okay we are about to have easy mode for cleanup too" (2026-05-28)
- "not once we have the zflash usb you are working on it's going to be a breeze" (2026-05-28 — this row's origin substrate)
Loading