diff --git a/docs/backlog/P1/B-0871-reproducibility-as-causal-attribution-substrate-usb-boot-framework-deployment-observability-stack-helm-charts-auto-dora-collection-cross-replication-validation-aaron-2026-05-28.md b/docs/backlog/P1/B-0871-reproducibility-as-causal-attribution-substrate-usb-boot-framework-deployment-observability-stack-helm-charts-auto-dora-collection-cross-replication-validation-aaron-2026-05-28.md
new file mode 100644
index 0000000000..8fe4c74d6b
--- /dev/null
+++ b/docs/backlog/P1/B-0871-reproducibility-as-causal-attribution-substrate-usb-boot-framework-deployment-observability-stack-helm-charts-auto-dora-collection-cross-replication-validation-aaron-2026-05-28.md
@@ -0,0 +1,174 @@
+---
+id: B-0871
+priority: P1
+status: open
+title: Reproducibility-as-causal-attribution substrate — USB-boot framework deployment + observability-stack-built-in-helm-charts + auto-DORA-collection + cross-replication-validation = resolves causal-attribution-of-AI-helps-DORA at substrate-engineering scope (operator 2026-05-28)
+effort: L
+ask: aaron 2026-05-28
+created: 2026-05-28
+last_updated: 2026-05-28
+depends_on:
+  - B-0852
+composes_with:
+  - B-0865
+  - B-0866
+  - B-0867
+  - B-0869
+  - B-0870
+tags:
+  - reproducibility-as-causal-attribution
+  - usb-boot-framework-deployment
+  - observability-stack-helm-charts
+  - auto-dora-collection
+  - cross-replication-validation
+  - whole-company-evangelism-evidence-base
+  - benchmark-framework-observability-cluster
+  - servicetitan-evidence-substrate
+---
+
+## Operator framing 2026-05-28
+
+> *"not once we have the zflash usb you are working on it's going to be a breeze we have ai booting on a full k8s cluster with argocd and gitops managed by ai native from the start and you can join new machines with just a usb boot on another computer. The dora metrics write itself it has a full observablity stack built in helm charts. anyone can repeate the experiment themselves at home on their own equiment."*
+
+## Why this row exists
+
+Operator's substrate-disclosure resolves a load-bearing concern I raised: "the 'DORA-up specifically via multi-PR/multi-agent orchestration' criterion is going to be hard to attribute cleanly... do you want B-0869 or B-0870 to track explicit attribution-substrate / counterfactual-tracking discipline so the DORA-up evidence stays causally attributable to the framework?"
+
+Operator's answer: NO separate attribution-substrate needed at substrate-engineering scope — the framework's REPRODUCIBILITY property IS the causal-attribution substrate. Specifically:
+
+1. **USB-boot framework deployment** (zflash USB substrate) — anyone boots the USB and gets the full framework
+2. **AI booting on full k8s cluster with ArgoCD + GitOps** — the framework includes its complete runtime, not just process discipline
+3. **AI-native from the start** — multi-PR/multi-agent orchestration is architectural, not bolt-on
+4. **Join new machines with just a USB boot** — replication friction is low; many machines = many evidence points
+5. **Observability stack built in helm charts** — DORA metrics emerge automatically from the substrate (not retrofitted measurement)
+6. **Anyone can repeat the experiment at home on their own equipment** — causal attribution becomes EMPIRICAL not narrative; cross-replication stacks evidence
+
+This row tracks the substrate-engineering work that operationalizes the reproducibility-as-causal-attribution discipline.
+
+## What this row tracks
+
+The substrate-engineering composition that produces reproducible causal attribution for "multi-PR/multi-agent orchestration keeps DORA up":
+
+- **USB-boot artifact** (zflash + B-0852 cred-persistence + B-0857 install-sh substrate) ships the framework
+- **Framework runtime** (k8s + ArgoCD + GitOps + B-0867 workflow engine + B-0868 hats-as-workflow-definitions) IS the multi-PR/multi-agent orchestration
+- **Observability stack** (helm charts; Prometheus + Grafana + Loki + Tempo + ArgoCD events + DORA-metric-derivation) auto-collects DORA
+- **Benchmark surface** (B-0865 ARC-AGI-3-style benchmark) IS the experiment structure
+- **Replication surface** (USB-boot-it-yourself + framework-runs-itself + DORA-emerges) IS the causal-attribution substrate
+
+The composition resolves the causal-attribution concern: framework's RUN IS the attribution.
+
+## Sub-rows planned
+
+- **B-0871.1** — Helm charts inventory + DORA-metric-derivation layer (substrate-honest: which helm charts ship; how DORA emerges; what gaps exist between observability-stack and DORA-metrics-write-themselves claim)
+- **B-0871.2** — Hardware-minimum + recommended-configuration document for "home on your own equipment" framing (consumer-NUC-class baseline; not enterprise-k8s-cluster; substrate-engineering must ship a minimum-viable-hardware spec)
+- **B-0871.3** — Cross-replication validation substrate (how external validators run the experiment + report results back; replication-result-aggregation substrate)
+- **B-0871.4** — AI-native-from-start substrate-honest naming review (per Extension 2; "multi-participant native from the start" may land cleaner than "AI native" since framework serves Otto + Addison + Max + operator + E)
+- **B-0871.5** — Internal-vs-external attribution substrate (ServiceTitan-internal: before-after-framework comparison; external: cross-org replication; both are substrate-engineering work)
+- **B-0871.6** — Benchmark-framework-observability cluster naming + composition documentation (the three pieces operate as one attribution substrate; worth naming explicitly)
+- **B-0871.7** — Replication-evidence aggregation substrate (when external validators report results, where do they go; how do they accumulate into the evangelism evidence-base)
+- **B-0871.8** — ServiceTitan-internal-baseline DORA capture pre-framework-deployment (so before-after comparison has substrate-honest baseline)
+- **B-0871.9** — Framework-deployment-event tracking (when framework is deployed, observability captures the deployment-event so DORA-deltas align temporally; supports causal-attribution via interrupted-time-series at substrate-engineering scope)
+- **B-0871.10** — Anti-skeptic substrate (operator 2026-05-28: *"the usb is how you silence the haters"*). The USB-boot reproducibility removes the credibility moat that defenders of single-PR-flows have. Pre-USB: skeptics dismiss claims as "Aaron's special setup; doesn't replicate." Post-USB: skeptics boot the USB themselves; dismissal becomes incoherent if framework works for them. The reproducibility substrate IS the structural defense against bad-faith skepticism. Composes with marketing-strategy (B-0866) at the credibility-moat-removal scope: anyone claiming "this won't work for normal teams" has to either boot the USB and demonstrate failure or stop making the claim. Bad-faith skepticism becomes substrate-honestly verifiable, not narrative-dismissable.
+
+Order suggestion: 1 + 2 (substrate-honest inventory + hardware-minimum) → 4 (naming review) → 8 (baseline capture) → 9 (deployment-event tracking) → 3 + 5 (cross-replication + internal-vs-external attribution) → 6 + 7 (cluster naming + replication-evidence aggregation) → 10 (anti-skeptic substrate framing in marketing surface).
+
+## Otto's traveler-perspective extensions (per "always yes")
+
+### Extension 1 — Reproducibility-as-causal-attribution is STRONG at substrate-engineering scope, weaker at ServiceTitan-internal-attribution scope
+
+The framework's reproducibility resolves causal attribution for EXTERNAL validators (cross-org replication). ServiceTitan-internal attribution still needs before-after comparison substrate (B-0871.8 + B-0871.9). Both are load-bearing for the whole-company evangelism evidence base (B-0866.26 + B-0869.9 + B-0870.10):
+
+- Internal: ServiceTitan SREs comparing DORA-with-framework vs DORA-before-framework
+- External: other orgs reporting framework-deployment + DORA-improvement
+
+The strongest evangelism case is BOTH ServiceTitan-internal-before-after AND external-cross-org-replication. Substrate handles both.
+
+### Extension 2 — "AI-native from start" needs substrate-honest naming review
+
+The operator's framing "AI native from the start" emphasizes that AI-orchestration is architectural, not bolt-on. Substrate-honest: the framework actually serves MULTI-PARTICIPANT use (Otto AI + Addison human-19yo + Max human + operator + E human-5yo). The "AI-native" framing may understate the multi-participant scope.
+
+Possible naming alternatives for substrate-engineering documentation:
+
+- "Multi-participant native from the start" (emphasizes humans + AI together)
+- "Substrate-honest collaboration native" (emphasizes the discipline)
+- "AI-native + human-collaborative" (compound; less crisp)
+- Keep "AI-native from start" (operator's framing; preserves crispness)
+
+This is a naming-discipline question for B-0866.1 (Ilyana naming-expert review surface) — not a load-bearing substrate-engineering question, but worth flagging.
+
+### Extension 3 — Helm charts ship DORA-derivation, not just observability
+
+The operator's claim "observability stack built in helm charts" + "DORA metrics write themselves" implies that the helm-charts substrate INCLUDES the DORA-metric-derivation layer (mapping raw observability data → DORA-shaped metrics).
+
+Substrate-honest check: standard observability stack (Prometheus + Grafana + Loki + Tempo + ArgoCD-events) collects RAW metrics but doesn't compute DORA out-of-the-box. The framework needs explicit derivation rules: which deployments map to which services; how lead-time is computed from commit→prod; how change-failure-rate is measured; how time-to-restore-service triggers fire. This derivation layer IS substrate-engineering work; sub-row B-0871.1 lands it.
+
+### Extension 4 — Hardware-minimum substrate is whole-company-evangelism gating
+
+For ServiceTitan-internal evangelism, hardware is given (ServiceTitan infrastructure). For whole-company-evangelism-via-external-validators, "home on your own equipment" implies consumer-hardware-accessibility. The substrate must ship with a hardware-minimum spec that real external validators can satisfy without enterprise procurement friction.
+
+Candidate baseline (per existing zeta-hardware-detect.ts work):
+
+- ≥1 NUC-class machine OR ≥1 mid-range laptop with USB-boot capability
+- ≥16GB RAM, ≥256GB SSD
+- 1Gbps wired network (for ArgoCD pull cadence)
+- Optional GPU (worker-gpu host attribute; for AI-agent workloads)
+
+Sub-row B-0871.2 lands the formal spec.
+
+### Extension 5 — Benchmark + framework + observability + USB-deployment IS THE ATTRIBUTION CLUSTER
+
+Naming this composition explicitly: the four pieces operate as ONE attribution substrate. They aren't separate concerns that happen to compose; they're substrate-engineering aspects of the same attribution substrate. Worth naming the cluster (sub-row B-0871.6).
+
+Candidate cluster names:
+
+- "Framework-attribution-substrate cluster"
+- "Reproducibility-attribution substrate cluster"
+- "Boot-and-measure substrate" (consumer-friendly)
+- Operator-discretion via Ilyana naming review
+
+### Extension 6 — Causal-attribution-via-cross-replication composes with multi-oracle-BFT
+
+Per B-0870 Extension 6: two-mandate composition IS multi-oracle-BFT applied at operator-evaluation scope. Extension here: cross-replication-attribution IS multi-oracle-BFT applied at causal-attribution scope. Each external-validator's framework-deployment + DORA-result becomes one oracle vote on "does framework cause DORA improvement?" Multi-oracle BFT consensus emerges from replication-evidence-aggregation (B-0871.7).
+
+Cross-domain isomorphism worth naming: multi-oracle-BFT operates at evaluation scope AND attribution scope. Both are operator-substrate-honest disciplines for surface where single-source-of-truth fails.
+
+### Extension 7 — Operator's "breeze" claim composes with workflow-engine MVP
+
+Operator: "not once we have the zflash usb you are working on it's going to be a breeze." Translation: zflash-USB + framework substrate + workflow-engine-MVP composes to make framework-deployment + DORA-measurement low-friction. The "breeze" framing IS the substrate-engineering claim that all the parts compose cleanly.
+
+Substrate-honest verification needed: the substrate currently has zflash-USB (B-0852 + B-0857) + framework runtime (k8s + ArgoCD + GitOps) operational; workflow engine MVP (B-0867 v1) is pending. The "breeze" is FULLY landed only post-MVP. Pre-MVP: parts exist but composition is manual. Substrate-honest scoping: B-0871 sub-rows wait on MVP for full composition demonstration.
+
+## Composes with rules
+
+- `.claude/rules/non-coercion-invariant.md` HC-8 — reproducibility-as-causal-attribution preserves operator-authority (external validators choose to replicate or not; not coerced)
+- `.claude/rules/m-acc-multi-oracle-end-user-moral-invariants.md` — multi-oracle BFT at causal-attribution scope (per Extension 6)
+- `.claude/rules/substrate-smoothness-as-load-bearing-property.md` — reproducibility substrate preserves smoothness via low-friction deployment (USB-boot) + auto-measurement
+- `.claude/rules/holding-without-named-dependency-is-standing-by-failure.md` — causal-attribution-via-narrative is brief-ack failure mode; causal-attribution-via-reproducibility-substrate is named-dependency on framework-deployment + observability + measurement
+- `.claude/rules/edge-defining-work-not-speculation.md` — reproducibility-as-attribution IS edge-defining work; most AI evaluation is narrative-attribution
+
+## Composes with substrate
+
+- **B-0852** (cred-persistence cascade — REQUIRED for zflash USB; cred-blob persistence enables AI-agent autonomy on cluster boot)
+- **B-0865** (ARC-AGI-3 benchmark — IS the experiment structure; reproducibility framework runs the benchmark)
+- **B-0866** (marketing-strategy — whole-company evangelism evidence base accumulates via reproducibility substrate)
+- **B-0866.26** (whole-company AI-evangelism scope-tier — enabled by reproducibility-as-attribution evidence)
+- **B-0867** (workflow engine v1 — multi-PR/multi-agent orchestration substrate that IS what gets attribution)
+- **B-0868** (hats-as-workflow-definitions — hats define what AI agents do on the cluster; observability measures hat-fire metrics)
+- **B-0869** (DORA-of-live-system mandate — reproducibility substrate provides evidence-base for the mandate's evaluation)
+- **B-0869.9** (AI-keeps-DORA-up composition criterion — reproducibility substrate IS the criterion's evidence-mechanism)
+- **B-0870** (two-mandate portfolio composition — reproducibility substrate makes the multi-PR/multi-agent-orchestration claim verifiable)
+- **B-0870.10** (24-months-ahead concrete definition — reproducibility substrate IS what demonstrates the multi-PR/multi-agent orchestration claim concretely)
+
+## What this row is NOT
+
+- NOT a replacement for B-0867 (workflow engine v1 still required; B-0871 substrate composes WITH MVP, not replaces)
+- NOT a quantification of DORA-improvement claims (operator-discretion + ServiceTitan-internal-data; B-0871 substrate enables the measurement, doesn't pre-commit to results)
+- NOT a single-PR target (substrate-engineering cluster; sub-rows shipped independently)
+- NOT blocked on whole-company evangelism (reproducibility substrate IS what makes whole-company evangelism viable; not gated on evangelism happening first)
+
+## Per operator authorization
+
+- "always yes to anything you think work putting on the backlog" (2026-05-28)
+- "we can push all extensions you think of we have a concrete way to test in code soon if it's good or not so we should just put all the ideas as they come up" (2026-05-28)
+- "if we accidently file dups its okay we are about to have easy mode for cleanup too" (2026-05-28)
+- "not once we have the zflash usb you are working on it's going to be a breeze" (2026-05-28 — this row's origin substrate)