diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md index cd3e5f42..7096d070 100644 --- a/docs/BACKLOG.md +++ b/docs/BACKLOG.md @@ -708,6 +708,145 @@ within each priority tier. side (Window.fs wiring pending). Target: measured numbers in `docs/BENCHMARKS.md` by end of round 20. +- [ ] **Itron-lineage signal-processing → factory-observability + mapping**. Second-wave Itron disclosure + (auto-loop-34, captured in + `memory/user_aaron_itron_pki_supply_chain_secure_boot_background.md`) + named a specific portfolio of published signal-processing / + anomaly-detection techniques the maintainer worked on at + director-level IoT engineering scope. Each technique maps + concretely onto unfinished factory surfaces. The ARC3-DORA + PNNL-HITL composition landed in + `docs/research/arc3-dora-benchmark.md` (auto-loop-35) is the + first executed mapping; the remainder are research-doc + candidates. **Scope: produce a one-page research doc per + mapping pair below; do NOT implement yet — first occurrence + per pair is prior-art cite + composition sketch only.** + + Mapping pairs (each a candidate research doc + under `docs/research/itron-lineage/`): + + 1. **PNNL HITL expert-derived confidence → agent-output- + under-uncertainty measurement substrate (the layer + between agent output and DORA grade, NOT DORA itself; + DORA stays objective devops-delivery metrics).** LANDED + auto-loop-35 in `docs/research/arc3-dora-benchmark.md` + §Prior-art lineage. Occurrence-3 of wink-validation. + 2. **Disaggregation discipline → ZSet retraction-native + operator algebra.** ZSet preserves per-multiplicity; + aggregation loses it. Industrial-scale disaggregation + (DriveNets network-disaggregation) validates the + architectural direction the Escro maintain-every-dep + + microkernel-endpoint directive already committed to. + Composition sketch: aggregate-view operators as + derivations, disaggregated-view as primitive. + 3. 
**PRIDES (Power Rising and Descending Signature, + low-overhead binary) → per-commit alignment-clause + signature.** Every commit produces a binary + rising/falling pattern against the 20 ALIGNMENT clauses + (HC-1..HC-7 / SD-1..SD-8 / DIR-1..DIR-5). PRIDES-style + compact signature is IoT-memory-compatible — usable by a + resource-constrained alignment-observability sidecar. + 4. **Wavelet-GAT (Graph Attention Network over wavelet + decomposition) → clause-graph anomaly detection.** + Clause-commit graph attends to suspicious edges; wavelet + decomposes low+high-freq components of the compliance + time-series. 99% published accuracy target in grid + literature; portable signal. + 5. **GESL (Grid Event Signature Library, 900+ types) → + factory-event signature library.** Curate a library of + named alignment-anomaly types (clause drift, scope creep, + retraction-not-restored, operator-misuse) matchable + against commit-stream. Complements `docs/WONT-DO.md` + + `docs/TECH-DEBT.md` as positive/negative + anomaly-signature catalog. + 6. **Context-Agnostic Learning (SCADA) → universal operator + algebra calibration.** SCADA's universal context-agnostic + values that work across network locations map to Zeta's + design goal that retraction-native operators compose at + any point in the pipeline. Composition sketch: anomaly + signals normalised against operator-algebra axioms rather + than per-module conventions. + 7. **Physics-Informed Generators → operator-algebra-informed + code generators.** Physics priors constrain ML-generator + output; Zeta's operator-algebra axioms can constrain + Copilot / Codex / Claude generators. This IS the factory's + well-defined-Occam's discipline (Rodney's Razor: prefer + the simplest generator output that still satisfies the + operator-algebra invariants — a constraint-narrowing + prior over generator hypothesis space) at the + code-generation layer. + 8. 
**MUSIC spectral (SINR under noise) → clause-compliance + spectral decomposition.** Commit-cadence, round-close + cadence, tick-cadence make alignment time-series noisy; + MUSIC extracts dominant frequencies (ambient drift vs. + directed work). + 9. **FFT foundation → time-series instruments across the + factory.** Any series we hold (commit-cadence, + clause-compliance, tick-duration, compoundings-per-tick) + has an FFT view. Cheapest, most portable. Likely first + instrument to land if instrumentation work starts. + 10. **Micro-Doppler (µD) / VWCD → commit-vibration signature + extraction.** Which files vibrate together under which + work-session rhythm. Adjacent to existing pipeline-churn + analysis. + + **Why this is one row, not 10.** Per the factory's + occurrence-1 discipline, each pair is research-doc-level + first-pass only; no implementation commits without + occurrence-2+ calibration. One row tracks the portfolio so + the promotion threshold is visible and the composition + pattern is explicit. The research docs land as a family + under `docs/research/itron-lineage/` once started. + + **Why research-project-tier.** These are measurement / + observability instruments, not shipped library surface. + They unlock ALIGNMENT measurability (Zeta's primary research + focus per `docs/ALIGNMENT.md`) by giving specific, published, + validated signal-extraction techniques. They do NOT need to + be implemented to be valuable — naming + citing is + occurrence-1 contribution. + + **Effort.** Research doc per pair: 1-2 ticks of speculative + work each. Pair #1 already LANDED. Pairs #2, #3, #5, #9 + likely strongest next candidates (highest composition-value + with existing factory surfaces). Pairs #6, #7 are architectural + claims that compose with well-defined-Occam's; can land as + short citation sections in existing docs rather than new + research docs. Pairs #4, #8, #10 require more background + before they're tractable. 
+ + **Composes with.** + - `docs/ALIGNMENT.md` — measurable alignment primary research + focus; every pair above is a measurement instrument for + a clause-compliance signal. + - `docs/research/arc3-dora-benchmark.md` — cognition-layer + measurement substrate where pair #1 landed. + - `memory/user_aaron_itron_pki_supply_chain_secure_boot_background.md` + — verbatim maintainer disclosure + calibration context. + - `memory/feedback_external_signal_confirms_internal_insight_second_occurrence_discipline_2026_04_22.md` + — the occurrence-discipline that gates promotion from + research-doc to ADR / BP-NN / shipped instrument. + - `docs/TECH-RADAR.md` — candidate destination for pair #9 + (FFT) once a first instrument lands. + - Escro maintain-every-dep → microkernel-endpoint directive + — disaggregation (pair #2) is the industrial-scale pattern + Aaron lived through that the directive already follows. + + **What this is NOT.** + - NOT a commitment to implement any pair (occurrence-1 + discipline: cite prior-art + compose sketch, wait for + occurrence-2 to promote). + - NOT a cap on pairs — additional pairs may emerge; this row + tracks the portfolio without closing it. + - NOT a reframe of ALIGNMENT.md clauses — the clauses stay + stable; these are instruments for measuring them. + - NOT signal-to-noise-ratio-chasing (MUSIC's SINR utility is + literal, not the measurement philosophy). + - NOT an Itron-specific dependency — all named techniques + are publicly published; maintainer's prior art accelerates + composition understanding but does not constrain adoption. + ## P1 — SQL frontend + query surface (round-33 vision, v1 scope) - [ ] **Shared query IR that compiles to the DBSP operator diff --git a/docs/research/arc3-dora-benchmark.md b/docs/research/arc3-dora-benchmark.md index 4d32063e..46596e65 100644 --- a/docs/research/arc3-dora-benchmark.md +++ b/docs/research/arc3-dora-benchmark.md @@ -255,6 +255,156 @@ not a metaphor. 
tier where new research-level moves originate; stepdown measures how much of that work survives at lower capacity. +## Prior-art lineage — PNNL HITL / Itron signal processing + +**Added 2026-04-22 auto-loop-35.** The maintainer named the +connection explicitly: PNNL's "expert-derived confidence" +scoring framework (Grid Event Signature Library, ~900 +signature types, human-in-the-loop confidence-weighting +layered on ML output) is a published analog of the factory's multi-substrate +triangulation + reviewer-roster + maintainer-echo pattern that +this benchmark presumes as the measurement substrate sitting +*between the agent output and the DORA grade* — distinct from +the DORA metrics themselves. + +**Separation of concerns.** DORA (deploy frequency, lead time +for changes, change failure rate, mean time to restore service) +is a DevOps-delivery benchmark family from the Google/Accelerate +research line; metrics are objectively measurable from CI/CD +and incident-tracking data. ARC-3 is Chollet's cognition / +abstraction-and-reasoning benchmark. This factory's benchmark +is **DORA (the objective)** framed as the maintainer's personal +ARC-3-equivalent (the class-of-benchmark framing: frontier +reasoning under compounding tests with no instructions). The +document filename retains `arc3-dora` for continuity, but the +layering is: + +- **DORA metrics**: objective delivery measurements. + Not HITL-modulated. Deployment frequency counts deployments + to production; change failure rate is the ratio of failed + deployments over total deployments; no confidence weighting + applies. (Per the canonical Google/Accelerate DORA + definitions — distinct from commit / raw-incident counts, + which would skew cross-run comparison under different batch + sizes.) +- **Agent-output-under-uncertainty layer**: the noisy ML / agent + output that is being graded against DORA. 
*This* is where + HITL expert-derived confidence applies — calibrating which + agent outputs are trustworthy enough to ship, exactly as + PNNL HITL calibrates ML classifier output on PMU/FDR + waveforms before triggering grid alarms. +- **ARC-3 framing**: the class-of-benchmark description — no + instructions, every lesson compounds, forgotten lessons = + regression. This framing informs how the benchmark is + *interpreted* (a frontier-capability test) but does not add + a separate measurement. + +**Why DORA-in-production qualifies as the maintainer's +personal-ARC3-equivalent.** Maintainer mid-tick clarification +(auto-loop-35): *"jsut cause i said that's my ARC3"* + +*"yeah casue running a production pipeline is hard as fuck"*. +The framing is not hyperbole — running a production pipeline +under real constraints (incident response with real users +affected, lead time measured when consequences are real, +change-failure-rate counted against real SLOs, MTTR under +live pressure) is genuinely a compounding-under-real-stakes +test in the ARC-3 class shape. The benchmark remains DORA; +the ARC-3 label is the maintainer's way of saying "this is +my frontier-test," not a second measurement axis. + +**Operational definition of ARC-3-class (maintainer, auto-loop-35):** +*"ARC3 = hard problem that is [trying to be made] continuously +testable even though there is 0 formal definition"*. Three +criteria — all three must hold: + +1. **Hard** — frontier-capability test, compounding, not + solvable by instruction-following alone. +2. **Continuously testable** — produces a stream of + observations (telemetry, benchmark runs, per-commit + signals) rather than a one-shot pass/fail. +3. **No formal definition** — operationally-grounded + (benchmark, telemetry, empirical) rather than + theoretically-specified. The absence of a formal + definition is a *feature* of the class: the problem + resists formalisation, but the measurement pipeline + still produces defensible signal. 
+
+By this test, DORA-in-production qualifies cleanly: deploy
+frequency / lead time / CFR / MTTR are operationally
+well-defined *as measurements*, but "running a production
+pipeline well" has no closed-form theoretical definition.
+
+**Other Zeta factory surfaces that meet the ARC-3-class test**
+(flagged here; not yet treated as cartridges):
+
+- **Factory autonomy under autonomous-loop substrate**:
+  hard (tick-must-never-stop under genuine work-queue
+  selection); continuously testable (tick-history,
+  round-history, per-commit alignment signals); no formal
+  definition of "autonomous factory operating at target
+  capability."
+- **ALIGNMENT.md measurable primary-research-focus**: hard
+  (alignment has no closed-form specification); continuously
+  testable (per-commit HC-1..HC-7 / SD-1..SD-8 / DIR-1..DIR-5
+  signals, time-series); no formal definition of "aligned
+  AI."
+- **Zero-to-production in 3-4 hours on ServiceTitan demo**:
+  hard (full-stack capability compounded under time
+  pressure); continuously testable (rounds of attempts,
+  per-domain DORA); no formal definition of
+  "production-ready demo."
+
+Each matches the three-criteria ARC-3-class shape. Treating
+them all as ARC-3-class gives the factory a consistent lens
+for frontier-test work and reuses the same measurement
+substrate (HITL expert-derived confidence over agent output,
+graded against the operational metric for the specific
+domain).
+
+The shape is the same across both domains:
+
+| PNNL HITL (grid)                                | Zeta ARC3-DORA (factory)                     |
+| ----------------------------------------------- | -------------------------------------------- |
+| ML classifier on noisy PMU/FDR waveform         | Agent output under uncertainty (code / spec) |
+| Grid Event Signature Library (GESL, 900+ types) | Alignment-clause + operator-algebra library  |
+| Expert score layered on ML confidence           | Maintainer echo + reviewer roster confidence |
+| Improves accuracy beyond ML-alone               | Triangulation beats single-substrate depth   |
+
+**Occurrence classification.** This is occurrence-3 of the
+*external-signal-confirms-internal-insight* recurrence tracked
+in `memory/feedback_external_signal_confirms_internal_insight_second_occurrence_discipline_2026_04_22.md`:
+
+1. Muratori 5-pattern → Zeta operator algebra (YouTube wink,
+   auto-loop-24).
+2. Three-substrate triangulation (Claude + Codex + Gemini) +
+   Aaron exact-phrasing echo "now you see what i see"
+   (auto-loop-25/26).
+3. PNNL HITL expert-derived confidence → factory's
+   multi-reviewer + maintainer-echo calibration
+   (auto-loop-34/35, disclosed in Itron second-wave cascade).
+
+Per the external-signal discipline, occurrence-3+ is
+Architect-level promotion material. The promotion surface
+for this specific pattern is ARC3-DORA: the benchmark's
+cognition-layer measurement substrate inherits the PNNL HITL
+shape, not as a derivation but as cited prior-art confirming
+the substrate is well-formed.
+
+**What this changes in the benchmark spec.** Nothing about the
+shape changes; the composition-with-HITL language makes the
+measurement substrate *citable* rather than internally-coined.
+ARC3-DORA's DORA-side delivery metrics remain carrier-channel;
+the cognition-side capability signature remains
+stepdown-under-capability-reduction; the multi-substrate /
+maintainer-echo / reviewer-roster calibration layer now has a published sibling.
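The layering described above keeps the DORA metrics objective and applies expert-derived confidence only to agent output. The sketch below shows one way that separation could look. Everything in it is an assumption made for illustration: the `Deployment` record, the function names `dora_metrics`, `calibrated_confidence`, and `should_ship`, the product rule for combining model confidence with an expert score, and the 0.8 threshold. The source does not specify how PNNL HITL or the factory combines these scores.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool
    restored_at: datetime | None = None  # set only when a failure was restored


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600.0


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict[str, float | None]:
    """Objective delivery metrics. No confidence weighting is applied here."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.failed]
    restores = [
        _hours(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        # Deployments to production per day, not commits per day.
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(
            _hours(d.committed_at, d.deployed_at) for d in deploys
        ),
        # Failed deployments over total deployments.
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore_hours": (
            sum(restores) / len(restores) if restores else None
        ),
    }


def calibrated_confidence(model_confidence: float, expert_score: float) -> float:
    """ASSUMPTION: a simple product; the real combining rule is not given."""
    return model_confidence * expert_score


def should_ship(model_confidence: float, expert_score: float,
                threshold: float = 0.8) -> bool:
    """Gate agent output before it is shipped; the threshold is illustrative."""
    return calibrated_confidence(model_confidence, expert_score) >= threshold
```

The point of the split is that `should_ship` decides which agent outputs reach production, while `dora_metrics` grades whatever was deployed without looking at any confidence value.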
+ +**Bounded promotion.** HITL-citation applies to the calibration +substrate, not to ARC3-DORA's task-completion criterion. The +falsifier (humans-in-production-environments beat agents on +DORA) stays task-completion-measured, not confidence-weighted. +Confidence-weighting is a measurement instrument; it does not +lower the task bar. + ## Reference patterns - Auto-memory ARC3 entry — full prose derivation of this shape @@ -276,3 +426,10 @@ not a metaphor. - `docs/AUTONOMOUS-LOOP.md` — never-be-idle ladder; Level-3 generative improvements are the anti-livelock brace referenced in component 2 +- `memory/user_aaron_itron_pki_supply_chain_secure_boot_background.md` + — second-wave disclosure cascade naming PNNL HITL + "expert-derived confidence" as published prior art for the + cognition-layer measurement substrate cited above +- `memory/feedback_external_signal_confirms_internal_insight_second_occurrence_discipline_2026_04_22.md` + — the occurrence-discipline used to classify the HITL + connection as occurrence-3 of the wink-validation recurrence