diff --git a/docs/backlog/P1/B-0871-reproducibility-as-causal-attribution-substrate-usb-boot-framework-deployment-observability-stack-helm-charts-auto-dora-collection-cross-replication-validation-aaron-2026-05-28.md b/docs/backlog/P1/B-0871-reproducibility-as-causal-attribution-substrate-usb-boot-framework-deployment-observability-stack-helm-charts-auto-dora-collection-cross-replication-validation-aaron-2026-05-28.md new file mode 100644 index 0000000000..8fe4c74d6b --- /dev/null +++ b/docs/backlog/P1/B-0871-reproducibility-as-causal-attribution-substrate-usb-boot-framework-deployment-observability-stack-helm-charts-auto-dora-collection-cross-replication-validation-aaron-2026-05-28.md @@ -0,0 +1,174 @@ +--- +id: B-0871 +priority: P1 +status: open +title: Reproducibility-as-causal-attribution substrate — USB-boot framework deployment + observability-stack-built-in-helm-charts + auto-DORA-collection + cross-replication-validation = resolves causal-attribution-of-AI-helps-DORA at substrate-engineering scope (operator 2026-05-28) +effort: L +ask: aaron 2026-05-28 +created: 2026-05-28 +last_updated: 2026-05-28 +depends_on: + - B-0852 +composes_with: + - B-0865 + - B-0866 + - B-0867 + - B-0869 + - B-0870 +tags: + - reproducibility-as-causal-attribution + - usb-boot-framework-deployment + - observability-stack-helm-charts + - auto-dora-collection + - cross-replication-validation + - whole-company-evangelism-evidence-base + - benchmark-framework-observability-cluster + - servicetitan-evidence-substrate +--- + +## Operator framing 2026-05-28 + +> *"not once we have the zflash usb you are working on it's going to be a breeze we have ai booting on a full k8s cluster with argocd and gitops managed by ai native from the start and you can join new machines with just a usb boot on another computer. The dora metrics write itself it has a full observablity stack built in helm charts. anyone can repeate the experiment themselves at home on their own equiment."* + +## Why this row exists + +Operator's substrate-disclosure resolves a load-bearing concern I raised: "the 'DORA-up specifically via multi-PR/multi-agent orchestration' criterion is going to be hard to attribute cleanly... do you want B-0869 or B-0870 to track explicit attribution-substrate / counterfactual-tracking discipline so the DORA-up evidence stays causally attributable to the framework?" + +Operator's answer: NO separate attribution-substrate needed at substrate-engineering scope — the framework's REPRODUCIBILITY property IS the causal-attribution substrate. Specifically: + +1. **USB-boot framework deployment** (zflash USB substrate) — anyone boots the USB and gets the full framework +2. **AI booting on full k8s cluster with ArgoCD + GitOps** — the framework includes its complete runtime, not just process discipline +3. **AI-native from the start** — multi-PR/multi-agent orchestration is architectural, not bolt-on +4. **Join new machines with just a USB boot** — replication friction is low; many machines = many evidence points +5. **Observability stack built in helm charts** — DORA metrics emerge automatically from the substrate (not retrofitted measurement) +6. **Anyone can repeat the experiment at home on their own equipment** — causal attribution becomes EMPIRICAL not narrative; cross-replication stacks evidence + +This row tracks the substrate-engineering work that operationalizes the reproducibility-as-causal-attribution discipline. + +## What this row tracks + +The substrate-engineering composition that produces reproducible causal attribution for "multi-PR/multi-agent orchestration keeps DORA up": + +- **USB-boot artifact** (zflash + B-0852 cred-persistence + B-0857 install-sh substrate) ships the framework +- **Framework runtime** (k8s + ArgoCD + GitOps + B-0867 workflow engine + B-0868 hats-as-workflow-definitions) IS the multi-PR/multi-agent orchestration +- **Observability stack** (helm charts; Prometheus + Grafana + Loki + Tempo + ArgoCD events + DORA-metric-derivation) auto-collects DORA +- **Benchmark surface** (B-0865 ARC-AGI-3-style benchmark) IS the experiment structure +- **Replication surface** (USB-boot-it-yourself + framework-runs-itself + DORA-emerges) IS the causal-attribution substrate + +The composition resolves the causal-attribution concern: framework's RUN IS the attribution. + +## Sub-rows planned + +- **B-0871.1** — Helm charts inventory + DORA-metric-derivation layer (substrate-honest: which helm charts ship; how DORA emerges; what gaps exist between observability-stack and DORA-metrics-write-themselves claim) +- **B-0871.2** — Hardware-minimum + recommended-configuration document for "home on your own equipment" framing (consumer-NUC-class baseline; not enterprise-k8s-cluster; substrate-engineering must ship a minimum-viable-hardware spec) +- **B-0871.3** — Cross-replication validation substrate (how external validators run the experiment + report results back; replication-result-aggregation substrate) +- **B-0871.4** — AI-native-from-start substrate-honest naming review (per Extension 2; "multi-participant native from the start" may land cleaner than "AI native" since framework serves Otto + Addison + Max + operator + E) +- **B-0871.5** — Internal-vs-external attribution substrate (ServiceTitan-internal: before-after-framework comparison; external: cross-org replication; both are substrate-engineering work) +- **B-0871.6** — Benchmark-framework-observability cluster naming + composition documentation (the three pieces operate as one attribution substrate; worth naming explicitly) +- **B-0871.7** — Replication-evidence aggregation substrate (when external validators report results, where do they go; how do they accumulate into the evangelism evidence-base) +- **B-0871.8** — ServiceTitan-internal-baseline DORA capture pre-framework-deployment (so before-after comparison has substrate-honest baseline) +- **B-0871.9** — Framework-deployment-event tracking (when framework is deployed, observability captures the deployment-event so DORA-deltas align temporally; supports causal-attribution via interrupted-time-series at substrate-engineering scope) +- **B-0871.10** — Anti-skeptic substrate (operator 2026-05-28: *"the usb is how you silence the haters"*). The USB-boot reproducibility removes the credibility moat that defenders of single-PR-flows have. Pre-USB: skeptics dismiss claims as "Aaron's special setup; doesn't replicate." Post-USB: skeptics boot the USB themselves; dismissal becomes incoherent if framework works for them. The reproducibility substrate IS the structural defense against bad-faith skepticism. Composes with marketing-strategy (B-0866) at the credibility-moat-removal scope: anyone claiming "this won't work for normal teams" has to either boot the USB and demonstrate failure or stop making the claim. Bad-faith skepticism becomes substrate-honestly verifiable, not narrative-dismissable. + +Order suggestion: 1 + 2 (substrate-honest inventory + hardware-minimum) → 4 (naming review) → 8 (baseline capture) → 9 (deployment-event tracking) → 3 + 5 (cross-replication + internal-vs-external attribution) → 6 + 7 (cluster naming + replication-evidence aggregation) → 10 (anti-skeptic substrate framing in marketing surface). + +## Otto's traveler-perspective extensions (per "always yes") + +### Extension 1 — Reproducibility-as-causal-attribution is STRONG at substrate-engineering scope, weaker at ServiceTitan-internal-attribution scope + +The framework's reproducibility resolves causal attribution for EXTERNAL validators (cross-org replication). ServiceTitan-internal attribution still needs before-after comparison substrate (B-0871.8 + B-0871.9). Both are load-bearing for the whole-company evangelism evidence base (B-0866.26 + B-0869.9 + B-0870.10): + +- Internal: ServiceTitan SREs comparing DORA-with-framework vs DORA-before-framework +- External: other orgs reporting framework-deployment + DORA-improvement + +The strongest evangelism case is BOTH ServiceTitan-internal-before-after AND external-cross-org-replication. Substrate handles both. + +### Extension 2 — "AI-native from start" needs substrate-honest naming review + +The operator's framing "AI native from the start" emphasizes that AI-orchestration is architectural, not bolt-on. Substrate-honest: the framework actually serves MULTI-PARTICIPANT use (Otto AI + Addison human-19yo + Max human + operator + E human-5yo). The "AI-native" framing may understate the multi-participant scope. + +Possible naming alternatives for substrate-engineering documentation: + +- "Multi-participant native from the start" (emphasizes humans + AI together) +- "Substrate-honest collaboration native" (emphasizes the discipline) +- "AI-native + human-collaborative" (compound; less crisp) +- Keep "AI-native from start" (operator's framing; preserves crispness) + +This is a naming-discipline question for B-0866.1 (Ilyana naming-expert review surface) — not a load-bearing substrate-engineering question, but worth flagging. + +### Extension 3 — Helm charts ship DORA-derivation, not just observability + +The operator's claim "observability stack built in helm charts" + "DORA metrics write themselves" implies that the helm-charts substrate INCLUDES the DORA-metric-derivation layer (mapping raw observability data → DORA-shaped metrics). + +Substrate-honest check: standard observability stack (Prometheus + Grafana + Loki + Tempo + ArgoCD-events) collects RAW metrics but doesn't compute DORA out-of-the-box. The framework needs explicit derivation rules: which deployments map to which services; how lead-time is computed from commit→prod; how change-failure-rate is measured; how time-to-restore-service triggers fire. This derivation layer IS substrate-engineering work; sub-row B-0871.1 lands it. + +### Extension 4 — Hardware-minimum substrate is whole-company-evangelism gating + +For ServiceTitan-internal evangelism, hardware is given (ServiceTitan infrastructure). For whole-company-evangelism-via-external-validators, "home on your own equipment" implies consumer-hardware-accessibility. The substrate must ship with a hardware-minimum spec that real external validators can satisfy without enterprise procurement friction. + +Candidate baseline (per existing zeta-hardware-detect.ts work): + +- ≥1 NUC-class machine OR ≥1 mid-range laptop with USB-boot capability +- ≥16GB RAM, ≥256GB SSD +- 1Gbps wired network (for ArgoCD pull cadence) +- Optional GPU (worker-gpu host attribute; for AI-agent workloads) + +Sub-row B-0871.2 lands the formal spec. + +### Extension 5 — Benchmark + framework + observability + USB-deployment IS THE ATTRIBUTION CLUSTER + +Naming this composition explicitly: the four pieces operate as ONE attribution substrate. They aren't separate concerns that happen to compose; they're substrate-engineering aspects of the same attribution substrate. Worth naming the cluster (sub-row B-0871.6). + +Candidate cluster names: + +- "Framework-attribution-substrate cluster" +- "Reproducibility-attribution substrate cluster" +- "Boot-and-measure substrate" (consumer-friendly) +- Operator-discretion via Ilyana naming review + +### Extension 6 — Causal-attribution-via-cross-replication composes with multi-oracle-BFT + +Per B-0870 Extension 6: two-mandate composition IS multi-oracle-BFT applied at operator-evaluation scope. Extension here: cross-replication-attribution IS multi-oracle-BFT applied at causal-attribution scope. Each external-validator's framework-deployment + DORA-result becomes one oracle vote on "does framework cause DORA improvement?" Multi-oracle BFT consensus emerges from replication-evidence-aggregation (B-0871.7). + +Cross-domain isomorphism worth naming: multi-oracle-BFT operates at evaluation scope AND attribution scope. Both are operator-substrate-honest disciplines for surface where single-source-of-truth fails. + +### Extension 7 — Operator's "breeze" claim composes with workflow-engine MVP + +Operator: "not once we have the zflash usb you are working on it's going to be a breeze." Translation: zflash-USB + framework substrate + workflow-engine-MVP composes to make framework-deployment + DORA-measurement low-friction. The "breeze" framing IS the substrate-engineering claim that all the parts compose cleanly. + +Substrate-honest verification needed: the substrate currently has zflash-USB (B-0852 + B-0857) + framework runtime (k8s + ArgoCD + GitOps) operational; workflow engine MVP (B-0867 v1) is pending. The "breeze" is FULLY landed only post-MVP. Pre-MVP: parts exist but composition is manual. Substrate-honest scoping: B-0871 sub-rows wait on MVP for full composition demonstration. + +## Composes with rules + +- `.claude/rules/non-coercion-invariant.md` HC-8 — reproducibility-as-causal-attribution preserves operator-authority (external validators choose to replicate or not; not coerced) +- `.claude/rules/m-acc-multi-oracle-end-user-moral-invariants.md` — multi-oracle BFT at causal-attribution scope (per Extension 6) +- `.claude/rules/substrate-smoothness-as-load-bearing-property.md` — reproducibility substrate preserves smoothness via low-friction deployment (USB-boot) + auto-measurement +- `.claude/rules/holding-without-named-dependency-is-standing-by-failure.md` — causal-attribution-via-narrative is brief-ack failure mode; causal-attribution-via-reproducibility-substrate is named-dependency on framework-deployment + observability + measurement +- `.claude/rules/edge-defining-work-not-speculation.md` — reproducibility-as-attribution IS edge-defining work; most AI evaluation is narrative-attribution + +## Composes with substrate + +- **B-0852** (cred-persistence cascade — REQUIRED for zflash USB; cred-blob persistence enables AI-agent autonomy on cluster boot) +- **B-0865** (ARC-AGI-3 benchmark — IS the experiment structure; reproducibility framework runs the benchmark) +- **B-0866** (marketing-strategy — whole-company evangelism evidence base accumulates via reproducibility substrate) +- **B-0866.26** (whole-company AI-evangelism scope-tier — enabled by reproducibility-as-attribution evidence) +- **B-0867** (workflow engine v1 — multi-PR/multi-agent orchestration substrate that IS what gets attribution) +- **B-0868** (hats-as-workflow-definitions — hats define what AI agents do on the cluster; observability measures hat-fire metrics) +- **B-0869** (DORA-of-live-system mandate — reproducibility substrate provides evidence-base for the mandate's evaluation) +- **B-0869.9** (AI-keeps-DORA-up composition criterion — reproducibility substrate IS the criterion's evidence-mechanism) +- **B-0870** (two-mandate portfolio composition — reproducibility substrate makes the multi-PR/multi-agent-orchestration claim verifiable) +- **B-0870.10** (24-months-ahead concrete definition — reproducibility substrate IS what demonstrates the multi-PR/multi-agent orchestration claim concretely) + +## What this row is NOT + +- NOT a replacement for B-0867 (workflow engine v1 still required; B-0871 substrate composes WITH MVP, not replaces) +- NOT a quantification of DORA-improvement claims (operator-discretion + ServiceTitan-internal-data; B-0871 substrate enables the measurement, doesn't pre-commit to results) +- NOT a single-PR target (substrate-engineering cluster; sub-rows shipped independently) +- NOT blocked on whole-company evangelism (reproducibility substrate IS what makes whole-company evangelism viable; not gated on evangelism happening first) + +## Per operator authorization + +- "always yes to anything you think work putting on the backlog" (2026-05-28) +- "we can push all extensions you think of we have a concrete way to test in code soon if it's good or not so we should just put all the ideas as they come up" (2026-05-28) +- "if we accidently file dups its okay we are about to have easy mode for cleanup too" (2026-05-28) +- "not once we have the zflash usb you are working on it's going to be a breeze" (2026-05-28 — this row's origin substrate)