diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md index 83266256c..f5811bc3b 100644 --- a/docs/BACKLOG.md +++ b/docs/BACKLOG.md @@ -104,6 +104,11 @@ are closed (status: closed in frontmatter)._ - [ ] **[B-0132](backlog/P2/B-0132-crdt-composition-for-bft-propagation-aaron-2026-05-01.md)** CRDT-composition for BFT propagation — substrate events as composed CRDTs - [ ] **[B-0133](backlog/P2/B-0133-sequent-calculus-for-claim-retraction-attribution-aaron-2026-05-01.md)** Sequent calculus / labeled deductive systems for claim/retraction/attribution - [ ] **[B-0134](backlog/P2/B-0134-type-theoretic-orthogonality-discipline-encoding-aaron-2026-05-01.md)** Type-theoretic encoding of orthogonality discipline (extension vs creation as decidable judgment) +- [ ] **[B-0147](backlog/P2/B-0147-timeseries-db-native-in-zsets-multi-dsl-integration-research-aaron-2026-05-01.md)** TimeSeries DB native-in-Zsets multi-DSL integration research (metrics-are-our-eyes) +- [ ] **[B-0148](backlog/P2/B-0148-mdx-as-meta-dsl-framing-fsharp-mdx-dsl-implementation-aaron-2026-05-01.md)** MDX as meta-DSL framing for multi-DSL Zset substrate + F# MDX DSL implementation +- [ ] **[B-0149](backlog/P2/B-0149-prometheus-mcp-integration-promtool-factory-agents-direct-query-aaron-2026-05-01.md)** Prometheus MCP integration + promtool — factory agents direct-query observability +- [ ] **[B-0150](backlog/P2/B-0150-timeseries-domain-expert-and-teacher-persona-aaron-2026-05-01.md)** TimeSeries / observability domain expert + teacher persona +- [ ] **[B-0151](backlog/P2/B-0151-rx-research-experience-researcher-persona-aaron-2026-05-01.md)** RX (Research eXperience) researcher persona — meta-research on the research process - [ ] **[B-0152](backlog/P2/B-0152-topological-quantum-emulation-via-bayesian-inference-zeta-seed-executor-aaron-2026-05-01.md)** Topological quantum emulation via Bayesian inference in Zeta seed executor ## P3 — convenience / deferred diff --git 
a/docs/backlog/P2/B-0147-timeseries-db-native-in-zsets-multi-dsl-integration-research-aaron-2026-05-01.md b/docs/backlog/P2/B-0147-timeseries-db-native-in-zsets-multi-dsl-integration-research-aaron-2026-05-01.md new file mode 100644 index 000000000..48bb82420 --- /dev/null +++ b/docs/backlog/P2/B-0147-timeseries-db-native-in-zsets-multi-dsl-integration-research-aaron-2026-05-01.md @@ -0,0 +1,493 @@ +--- +id: B-0147 +priority: P2 +status: open +title: TimeSeries DB native-in-Zsets multi-DSL integration research (metrics-are-our-eyes) +created: 2026-05-01 +last_updated: 2026-05-01 +--- + +# B-0147 — TimeSeries DB native-in-Zsets multi-DSL integration research + +## What + +Domain research to identify the candidate timeseries-DB +technology that integrates natively into the Zset substrate +alongside the other first-class types (graph, hierarchy, +filesystem, etc.) via a unified meta-DSL. Output: a design +document with candidate evaluation, dependency-source-priority +filter applied, recommended approach, and concrete next steps. + +**This is not a "pick a TSDB and use it" task.** It is research +toward the multi-algebra-DB vision where timeseries is one +algebra among many, all composable through the meta-DSL. The +research output is the *design*, not the implementation. + +## Why now + +Aaron 2026-05-01: + +> *"back log timeseries db domean reserach i know prometheus, +> that's our good citizen dependency candidate but there may be +> better more modern more integrated but pro not... 
we want it +> native in the zsets with meta dsl multi dsl integration like +> the others types, ,graph, hierarchy, filesystem, etc..."* +> +> *"that's for all the metrics that's the connection it's not +> just for fun, it's our eyes"* + +The metrics-are-our-eyes framing (per +`feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md`) +makes the timeseries-DB infrastructure load-bearing for the +factory's self-perception. Without it, the SRE metric frameworks +(DORA/USE/RED/Four Golden Signals, per PR #1116) have nowhere +to land their reproducibly-measured outputs over time. Metrics +without time-series persistence is a snapshot; metrics with +time-series persistence is a fitness function. + +## Acceptance criteria + +1. **Candidate landscape** — produce a research doc at + `docs/research/2026-XX-timeseries-db-candidate-landscape.md` + covering at least: + + - **Prometheus** (Aaron's known good-citizen baseline; CNCF + graduated; pull-based; PromQL; widely-deployed) + - **TimescaleDB** (PostgreSQL extension; SQL-native; + time-partitioned; verify current license tier per Otto-364 + before depending on it — the project has had license + re-tiering in recent history) + - **InfluxDB** (line-protocol; Flux/InfluxQL; verify current + license tier — OSS vs commercial — for proprietary-filter) + - **VictoriaMetrics** (verify current license; Prometheus- + compatible API; high cardinality) + - **Microsoft Research candidates** — search MSR for + timeseries-DB primitives or any precursor research + (Aaron 2026-05-01 dependency-priority rule) + - **OpenTelemetry metrics backends** (CNCF; portable + metrics protocol; pluggable backends) + - Any tier-1–5 candidates surfaced via WebSearch per Otto-364 + +2. 
**Dependency-source-priority filter applied.** Each candidate + classified into the tier hierarchy: + - Tier 1: Open Source (general) + - Tier 2: Microsoft OSS + - Tier 3: CNCF + - Tier 4: Apache + - Tier 5: MIT-licensed + - REJECTED: proprietary (exclude regardless of feature fit) + +3. **Algebra-fit analysis.** For the top 3 candidates, document: + - Data model (what's the primary type?) + - Retraction support (the 4-axis tightness rule for + graph-substrate / multi-type algebras: ZSet-backed + + first-class event + retractable + columnar storage; per + the indexed memory file at the head of `memory/MEMORY.md`) + - Query language semantics + - Mapping to ZSet algebra — does it compose, or does it + require an adapter layer? + +4. **Meta-DSL integration sketch.** A short section in the + design doc proposing how the chosen timeseries algebra + plugs into the factory's existing meta-DSL alongside graph + + hierarchy + filesystem types. Doesn't need to be the + final design; it needs to be a concrete sketch. + +5. **Recommended approach.** Pick one of: + - **Adopt and integrate** — chosen candidate is a clean fit; + wire it in as a Zset-backed algebra + - **Adopt-with-adapter** — chosen candidate is good but + needs an adapter layer; document the adapter shape + - **Build native** — no candidate is good enough; design a + ZSet-native timeseries algebra from scratch (likely if + retraction-native semantics aren't supported by any + existing TSDB) + - **Defer** — candidates are evolving fast; revisit in N + rounds with PM-2 forward-research input (B-0145) + +6. **Next-step backlog rows filed.** Whatever the recommendation + is, the follow-up actions become discrete B-rows (e.g., if + "build native" then B-NNNN for the native implementation). + +## Research-cadence inputs (per dependency-priority memory) + +When researching this, prioritize sources in this order: + +1. 
**Microsoft Research** (https://www.microsoft.com/en-us/research/) — search for + timeseries-DB primitives, retraction-native datalog, + incremental view maintenance over time-streamed data +2. **CNCF projects** (cncf.io) — Prometheus, OpenTelemetry, + adjacent observability work +3. **Apache projects** — Druid, Cassandra time-series usage, + Flink + state-store patterns +4. **MIT-licensed academic / industry papers** — VLDB / SIGMOD + / ICDE proceedings on incremental computation over time +5. **Other academic** — DBSP itself comes from this lineage; + recent DBSP-adjacent papers may have timeseries extensions + +Per Otto-364 search-first authority — verify every load-bearing +claim against current upstream docs / papers / project pages, +not training data. + +## Design constraints — Aaron 2026-05-01 follow-up + +When the recommendation lands on "Build native" (or "Adopt-with- +adapter" with substantial native augmentation), the design must +satisfy these constraints Aaron named explicitly: + +> *"cardinalty matters a lot for prometheus or at least it did +> becasue of they way they are uber columnar store if i +> remember right they are relying on reduced dimensionaly +> within lables. we can avoid those same drawbacks in our +> implmentation, CRDT multi mode or whatever you call it will +> be paramount. formal math specifican"* +> +> *"or timeseries"* + +### Constraint 1 — High cardinality must be first-class (without disrespecting Prometheus's structural reasons) + +Aaron 2026-05-01 follow-up clarification: + +> *"but the they do need small cardinailty"* + +**Prometheus's small-cardinality constraint is structural, +not accidental.** Their storage layout (one series per unique +label set with an in-memory inverted index, rather than a +columnar store in the strict sense; very efficient for the +bounded-cardinality common case) is a deliberate +design choice with a clear performance contract. Operators +who follow the cardinality discipline get excellent +performance; operators who violate it get exactly what the +design predicts. 
*This is not a Prometheus bug — it is a +Prometheus design.* + +The factory's stance: **Prometheus IS Tier 3 + the right +operational starting point** (per B-0149) because for the +factory's *own* metrics (tick rate, PR-cycle latency, +per-persona dispatch counts, Aaron-correction rate), the +cardinality stays bounded — these metrics fit Prometheus's +design contract cleanly. + +Zeta's *long-term native timeseries algebra* (this row's +"Build native" recommendation path) targets a *different* +contract: high-cardinality dimensions as first-class. This +is not a critique of Prometheus; it is a different design +point in the same problem space. + +**Open research question** — Aaron 2026-05-01: *"maybe we need +both shapes IDK, research probably."* Zeta's timeseries +algebra may need to support BOTH shapes: + +- **Small-cardinality optimized path** — the Prometheus-style + fast-path for bounded-cardinality factory metrics + (tick rate, PR-cycle latency, per-persona dispatch counts) +- **High-cardinality first-class path** — the Aurora-side path + for multi-master with per-event unique IDs (sessions, + users, requests, claims) + +Possible architectures to research: + +1. **Cardinality-adaptive storage** — single algebra, automatically + chooses storage layout per dimension based on observed + cardinality at write time +2. **Multi-mode algebra** — operator declares the mode per + timeseries (small-card fast vs high-card first-class); + storage paths differ but algebra surface stays unified +3. **Hybrid layered** — small-cardinality data goes to a + Prometheus-like backing store; high-cardinality data goes + to a different backing store; the algebra unifies query +4. **Single high-cardinality-first** — accept the small-card + tax to get high-card without compromise; benchmark whether + the tax is acceptable + +Each option pays differently on storage cost, query +performance, complexity of the algebra surface, and CRDT +semantics. 
The research lane (this row) must investigate at +least these four options before recommending. + +**Prior on algebra-surface-complexity weighting.** Aaron +2026-05-01: *"complexity of the algebra surface, i'm not too +worried about this one because we have all the formal +verification"* + *"a little bit"*. The factory's formal- +verification investment (B-0134 / B-0133 / B-0135 / B-0137 / +B-0142) mechanically tames algebra complexity — invariants +are proved at compile-time / build-time / verification-pass +rather than depended-on at review-time. Result: **algebra- +surface complexity carries less cost in the factory than it +would for a typical project**. The research should NOT +over-weight this dimension; sophisticated algebras that would +be rejected elsewhere as "too complex" remain viable here as +long as they are formally specified. + +This is a deliberate **non-Pareto choice** the factory makes +explicitly: pay more upfront on formal-spec investment to buy +more headroom on algebra-complexity. Composes with the +amortized-keystone (per +`feedback_parallelism_scaling_ladder_kenji_unlocked_loop_agent_doc_code_two_lane_file_isolation_peer_mode_claims_automated_best_practice_at_scale_aaron_2026_05_01.md`, +forward-ref to PR #1116) — the formal-spec cost is paid once +and reaped N times across every algebra change. 
+ +Beyond this open question, Zeta SHOULD: + +- Treat label cardinality as a first-class parameter, not an + implicit assumption +- Choose a storage layout that does not penalize high-cardinality + dimensions (likely *not* a Prometheus-style one-series-per- + label-set layout; possibly a hybrid with cardinality-adaptive + indexing) +- Document the cardinality-vs-performance tradeoff explicitly + so operators can reason about it +- Compose with the ZSet algebra such that cardinality changes + are retraction-native (adding a new label-value doesn't + invalidate prior data) + +### Constraint 2 — CRDT multi-mode is paramount + +Aaron's *"CRDT multi mode or whatever you call it will be +paramount"* — applied specifically to timeseries (his +clarification: *"or timeseries"*). + +**CRDT (Conflict-free Replicated Data Type)** semantics are +load-bearing for multi-master / Byzantine-fault-tolerant +operation per `feedback_ai_never_without_human_who_understands_both_ai_and_earth_technology_aaron_2026_05_01.md` +(BFT-many-masters / no-single-head). For timeseries: + +- **Multi-master writes** must converge to a consistent state + without coordination (CRDT property: commutative, associative, + idempotent merge) +- **Time-correlated CRDT modes** likely needed: counter (for + monotonic measurements), gauge / LWW-register (for state + measurements), set (for label sets), G-counter / PN-counter + patterns adapted for timeseries +- **Multi-mode** ≈ different CRDT primitives composing within + the same timeseries (not all metrics fit one CRDT shape; + the framework must support multiple) +- **Composes with the BFT-many-masters architecture** — without + CRDT semantics, multi-master timeseries devolves to last-write- + wins (single-head failure mode at the data layer) + +Research alongside MDX-as-meta-DSL (B-0148) — the meta-DSL +framing must compose with CRDT semantics, not constrain them +out. 
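To make the "multi-mode" idea in Constraint 2 concrete, here is a minimal Python sketch, assuming two modes only: a G-counter for monotonic measurements and an LWW-register for gauges. The class names, `merge_series`, and the series and replica names are all hypothetical illustrations, not part of any existing Zeta API. It only shows that each mode carries its own merge and that a per-series merge stays commutative, associative, and idempotent.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot max."""
    slots: Tuple[Tuple[str, int], ...] = ()

    def increment(self, replica: str, amount: int = 1) -> "GCounter":
        if amount < 0:
            raise ValueError("a G-counter can only grow")
        counts = dict(self.slots)
        counts[replica] = counts.get(replica, 0) + amount
        return GCounter(tuple(sorted(counts.items())))

    def merge(self, other: "GCounter") -> "GCounter":
        counts = dict(self.slots)
        for replica, count in other.slots:
            counts[replica] = max(counts.get(replica, 0), count)
        return GCounter(tuple(sorted(counts.items())))

    @property
    def value(self) -> int:
        return sum(count for _, count in self.slots)


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register; ties break on replica id, then value,
    so the merge is a total order and therefore commutative."""
    timestamp: int
    replica: str
    value: float

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        return max(self, other, key=lambda r: (r.timestamp, r.replica, r.value))


Crdt = Union[GCounter, LWWRegister]


def merge_series(left: Dict[str, Crdt], right: Dict[str, Crdt]) -> Dict[str, Crdt]:
    """Merge two replicas' states series by series, each with its own mode."""
    merged = dict(left)
    for name, state in right.items():
        merged[name] = merged[name].merge(state) if name in merged else state
    return merged


# Hypothetical replica states: one counter-mode series, one gauge-mode series.
a = {"ticks": GCounter().increment("replica-a", 3),
     "queue_depth": LWWRegister(10, "replica-a", 7.0)}
b = {"ticks": GCounter().increment("replica-b", 2),
     "queue_depth": LWWRegister(12, "replica-b", 4.0)}
merged = merge_series(a, b)
```

Merging in either order yields the same state, which is the property the formal spec in Constraint 3 would have to prove in general rather than spot-check. A real design would also need a PN-counter, set modes, and a decision on how timestamps are trusted under BFT assumptions; none of that is attempted here.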
+ +### Constraint 3 — Formal math specification + +Aaron's *"formal math specifican"* — implementation must have +a formal mathematical specification, likely in TLA+ / Lean / +F# refinement types / Coq / Isabelle. + +**Composes with** the formal-foundations layer-2 work already +filed: + +- B-0134 (type-theoretic orthogonality discipline) +- B-0133 (sequent calculus for claim retraction) +- B-0135 (modal logic for retractability) +- B-0137 (Tarski stratification proof) +- B-0142 (Code Contracts revival) + +The formal spec must cover at minimum: + +- Algebra correctness — operations preserve ZSet invariants +- CRDT convergence — commutativity / associativity / idempotency +- Retraction-native semantics — every insert has a matching + retract operation; spec proves the duality +- Cardinality-adaptive storage — performance bounds as a + function of cardinality + retention +- Time-monotonicity — under what relabeling / re-ordering does + the algebra preserve time-causal ordering? + +The formal-verification expert (Soraya, per +`docs/CONFLICT-RESOLUTION.md`) routes which tool fits each +property class. Likely portfolio: TLA+ for distributed CRDT +properties; F# refinement types for algebra correctness; +Lean / Coq for the retraction-native duality proof. + +## Research methodology — Pareto-improvement framing + +Aaron 2026-05-01 (load-bearing research-spine question): + +> *"why did they make the tradeoff and can we make a differnt +> one that gives us better properties without loosing good +> properties"* + +This is the **Pareto-improvement-or-bust** discipline that +governs the entire research lane. 
Before recommending a +different design point, the research must: + +### Step 1 — Understand WHY they made the tradeoff + +For each candidate (Prometheus / TimescaleDB / InfluxDB / +VictoriaMetrics / etc.), document: + +- **Constraints they were optimizing for** — what hardware + envelope, what query patterns, what operational model +- **Properties they prioritized** — what they got right (and + why those properties are valuable) +- **Properties they accepted as costs** — what they sacrificed + (and whether that sacrifice was load-bearing or incidental) +- **Era / context** — when was the design made; what + alternatives existed; what was the state of CRDT research, + columnar storage, hardware + +For Prometheus specifically (the immediate worked example): + +- WHY small-cardinality? Memory-resident inverted index + performance; predictable scrape-and-query cycle; operational + simplicity for the common monitoring case +- WHY pull-based? Decoupled service health from monitoring + health; trivial to add a target; works in Kubernetes + service-discovery model +- WHY no schema? Labels are arbitrary; no migration burden; + composable across services + +### Step 2 — Identify the Pareto frontier + +Once each candidate's design is understood, map the **Pareto +frontier** — the set of candidates where no candidate +dominates (better-on-everything-than) another. Each Pareto- +optimal candidate is a defensible choice for some workload; +each non-Pareto-optimal candidate is dominated and not worth +adopting. + +### Step 3 — Look for Pareto-superior alternatives + +The research's load-bearing question: *can we design a point +that gives us better properties without losing good +properties?* Specifically: + +- Can we have **high-cardinality first-class** WITHOUT losing + Prometheus's pull-based simplicity? +- Can we have **CRDT multi-mode** WITHOUT losing the + retention-cost-efficiency Prometheus achieves? 
+- Can we have a **formal math specification** WITHOUT losing + the operational ergonomics? +- Can we have **multi-DSL meta-DSL composability** WITHOUT + losing per-DSL optimization opportunities? + +For each "yes" answer, document the design move that achieves +it. For each "no" answer, document the structural reason — that +becomes the *unavoidable tradeoff* the design must own +explicitly. + +### Step 4 — Recommend with explicit tradeoffs named + +The recommendation (per acceptance criterion #5) must name: + +- What properties the chosen design **gains** over the + alternatives +- What properties it **preserves** from the alternatives' + strengths +- What properties it **explicitly sacrifices** and why + that sacrifice is acceptable (or not, if the design is + dominated by another) +- Whether the chosen point is **on the Pareto frontier** or + is **a deliberate non-Pareto choice** for reasons (e.g., + alignment with broader Zeta architecture) + +### Why this methodology matters + +Without the Pareto-improvement framing, the research devolves +into "different is better" or "newer is better" — both wrong. +The mature stance: *every tradeoff is a tradeoff for +reasons; the research finds reasons to do better, not reasons +to do different.* + +This composes with: + +- `feedback_aaron_pirate_not_priest_expand_prune_pedagogical_framework_quantum_rodney_razor_parallel_worlds_aaron_2026_05_01.md` + — pirate-not-priest disposition: Prometheus doesn't get a + free pass for being established; nor does it get critiqued + for being established. The razor applies impartially. 
+- `feedback_orthogonal_axes_factory_hygiene.md` — design + rules sit on orthogonal axes; understanding which axes + matter for which constraint is precondition to Pareto + analysis +- B-0135 (modal logic for retractability) — tradeoffs are + retractable design moves; modal-logic gives the formal + vocabulary for "in this design point, X holds; in that + design point, Y holds" +- The `research-grade-not-operational` discipline (from + GOVERNANCE §33 archive-header convention) — this row is + research, not implementation; the recommendation lands + with explicit tradeoffs named, not with a hidden + assumption that "newer = better" + +The carved sentence: *"Every tradeoff is a tradeoff for +reasons. Find better, not different."* + +## Out of scope (defer) + +- **Implementation.** This is a research B-row. Implementation + is the recommendation's follow-up B-row(s). +- **Performance benchmarks.** Benchmarking against the harness + (per reproducibility-first / B-0144 work) is a separate + follow-up. Research first; measure later. +- **The other algebras (graph / hierarchy / filesystem).** Each + has its own substrate and may already be partly designed. + This row scopes only to timeseries; sibling rows can cover + the others if they're not already covered elsewhere. +- **Vendor-relationship management.** "Good-citizen" relationships + with the chosen project's maintainers (per + `feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md`) + is operational follow-up after the choice is made. 
+ +## Composes with + +- `feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md` + — the substrate this row instantiates +- `feedback_reproducible_accuracy_before_quality_fitness_function_harness_first_aaron_2026_05_01.md` + (forward-ref to PR #1116) — the SRE metric frameworks + (DORA/USE/RED/FGS) whose timeseries persistence this row + enables +- `feedback_parallelism_scaling_ladder_kenji_unlocked_loop_agent_doc_code_two_lane_file_isolation_peer_mode_claims_automated_best_practice_at_scale_aaron_2026_05_01.md` + (forward-ref to PR #1116) — the amortized-keystone that + timeseries-persisted metrics enable; the rung-4 lessons- + mechanization that observability enables +- `project_zeta_multi_algebra_database_one_algebra_to_rule_them_all_sequenced_after_frontier_and_demo_2026_04_23.md` + — the multi-algebra DB vision; this row is one of the + algebras +- The 4-axis tightness rule (ZSet-backed + first-class event + + retractable + columnar storage) that applies to ALL multi- + type algebras under the meta-DSL, including timeseries — + see the indexed graph-substrate-tight memory entry near the + head of `memory/MEMORY.md` (file itself is pending merge in + a sibling branch) +- `feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md` + — the discipline for the chosen project's relationship +- `feedback_otto_364_search_first_authority_not_training_data_not_project_memory_aaron_2026_04_29.md` + — research must be search-first, not training-data-recall +- B-0145 (PM-2 role) — proactive research sources include + Microsoft Research; this row's research-cadence-inputs section + is a worked example of PM-2's research discipline +- B-0146 (formal architecture ladder) — when this row's + recommendation lands, the follow-up implementation rows + should declare their layer (likely Layer 5: reproducibility + harness) + +## Layer (per B-0146) + +**Layer 5: 
Reproducibility harness.** The timeseries-DB is the +substrate that makes metrics persist over time, which is what +makes the SRE metric frameworks operational. Layer 5 sits above +Layer 4 (domain metric frameworks) and feeds Layer 6 (accuracy). + +## Effort + +**L (large, 3+ days, research-grade)** for the full landscape +analysis + algebra-fit + meta-DSL integration sketch + +recommendation. The implementation follow-up B-rows will each +be their own effort estimates. + +## Why P2 (not P0 / not P1 / not P3) + +- **Not P0** because the factory functions today without + timeseries persistence (metrics are computed and observed + per-tick; trend-analysis is informal). +- **Not P1** because B-0144 (doc/code two-lane) and B-0145 + (PM-2 role) come first in the throughput + direction axes; + observability infrastructure compounds value but doesn't + block the next throughput unlock. +- **Not P3** because the metrics-are-our-eyes framing makes + this load-bearing once the factory operates at any scale + worth measuring; deferring indefinitely accumulates blind- + operation cost. +- **P2** sits in the right place — important enough that + research lands soon; not so urgent that it preempts the + in-flight throughput / role / mechanization work. 
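Step 2 of the research methodology in this row (mapping the Pareto frontier) can be sketched in a few lines of Python. The candidate names, property names, and scores below are placeholders invented for illustration; they are not evaluations of any real TSDB, and the real research would have to justify each score from current upstream sources.

```python
from typing import Dict, List

Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` on every property and
    strictly better on at least one (higher score means better)."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_frontier(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )


# Placeholder scores only; every number here is made up for the example.
candidates = {
    "candidate-a": {"high_cardinality": 1, "operational_simplicity": 3, "crdt_fit": 1},
    "candidate-b": {"high_cardinality": 3, "operational_simplicity": 2, "crdt_fit": 2},
    "candidate-c": {"high_cardinality": 2, "operational_simplicity": 2, "crdt_fit": 1},
}
frontier = pareto_frontier(candidates)
```

With these placeholder numbers, `candidate-c` is dominated by `candidate-b` and drops out, while `candidate-a` and `candidate-b` both stay on the frontier because each wins on a different property. That is the situation where Step 4 requires the tradeoff to be named explicitly.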
diff --git a/docs/backlog/P2/B-0148-mdx-as-meta-dsl-framing-fsharp-mdx-dsl-implementation-aaron-2026-05-01.md b/docs/backlog/P2/B-0148-mdx-as-meta-dsl-framing-fsharp-mdx-dsl-implementation-aaron-2026-05-01.md new file mode 100644 index 000000000..559e4e9f5 --- /dev/null +++ b/docs/backlog/P2/B-0148-mdx-as-meta-dsl-framing-fsharp-mdx-dsl-implementation-aaron-2026-05-01.md @@ -0,0 +1,213 @@ +--- +id: B-0148 +priority: P2 +status: open +title: MDX as meta-DSL framing for multi-DSL Zset substrate + F# MDX DSL implementation +created: 2026-05-01 +last_updated: 2026-05-01 +--- + +# B-0148 — MDX as meta-DSL framing + F# MDX DSL implementation + +## What + +Evaluate **MDX (Multidimensional Expressions)** — the +Microsoft-published OLAP query language used in SQL Server +Analysis Services and many BI tools — as the **meta-DSL +framing** for the multi-DSL Zset substrate (per Aaron's +multi-algebra-DB vision: graph + hierarchy + filesystem + +timeseries + ... unified through a single meta-DSL). + +If MDX fits as the meta-DSL shape, design and implement an +**F# MDX DSL** that natively hosts MDX-style queries against +the Zset algebra, with the existing types (graph, hierarchy, +filesystem, timeseries) appearing as MDX dimensions / +hierarchies / measures. + +## Why now + +Aaron 2026-05-01 (composing two messages): + +> *"plus promethius as a sick MCP and promtool and you'll love +> the query language its like simplifed multidimensonal query +> language MDX, oh shit backlog f# mdx dsl"* +> +> *"that's might be meta dsl framing"* + +Aaron's recognition: **PromQL** (Prometheus Query Language) is +**MDX-shaped** — both are multidimensional-first query +languages with dimensions / hierarchies / measures / tuples / +sets. If PromQL composes naturally from MDX primitives, then +**MDX may be the right shape for the meta-DSL** that unifies +graph + hierarchy + filesystem + timeseries + future types +under the Zset substrate. 
+ +This composes directly with B-0147 (timeseries-DB +native-in-Zsets) — that row asks *what is the timeseries +algebra?*; this row asks *what is the meta-DSL that hosts the +timeseries algebra alongside the others?*. Both questions +need answers; they may share a candidate. + +## MDX background — why it might fit + +**MDX core concepts**: + +- **Cubes** — multidimensional data containers (≈ Zset of + tuples) +- **Dimensions** — axes of categorization (graph nodes, + hierarchy levels, filesystem paths, time) +- **Hierarchies** — ordered nested levels within a dimension + (filesystem trees, organizational charts, time periods) +- **Members** — elements within a hierarchy level (specific + nodes, specific paths, specific timestamps) +- **Measures** — numeric quantities computed over the cube + (counts, sums, ratios) +- **Tuples** — coordinates in multidimensional space +- **Sets** — collections of tuples +- **Calculated members** — derived measures + +**MDX strengths for the meta-DSL role**: + +- **Microsoft-published spec** (per + `feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md` + Tier 2 priority); not proprietary +- **First-class hierarchies** — directly maps to Aaron's named + types +- **Multidimensional from the start** — graph + hierarchy + + filesystem + timeseries are all dimensions; queries naturally + span dimensions +- **Mature semantics** — 25+ years of OLAP usage; well- + understood evaluation +- **Already PromQL-shaped** — per Aaron's recognition, + observability queries already fit +- **Compositional** — measures can be calculated from other + measures; queries can be parameterized + +**MDX weaknesses to investigate**: + +- **OLAP-cube-shaped** — designed for fact-and-dimension data; + may need adaptation for graph traversal +- **Read-only history** — MDX is query, not update; the + Zset retraction-native semantics need to compose with + MDX-as-query rather 
than be expressed in MDX itself +- **String-heavy syntax** — MDX is very string-y; + F# host should produce well-typed AST not stringly-typed + query +- **Possible verbosity** — MDX queries can be long; an F# DSL + embedding might be more concise than literal MDX + +## Acceptance criteria + +1. **Fit-analysis design doc** at + `docs/research/2026-XX-mdx-as-meta-dsl-fit-analysis.md` + answering: + - Does MDX's dimension/hierarchy/measure shape match + Aaron's named types (graph / hierarchy / filesystem / + timeseries)? Worked example for each. + - How does MDX compose with the Zset retraction-native + semantics? (Mathematical analysis; possibly involves + extending MDX with retraction operators.) + - How does PromQL specifically map to MDX? (Worked + example: a real PromQL query translated to MDX form.) + - Are there alternative meta-DSL candidates that fit + better? (e.g., Datalog, GraphQL, SPARQL, custom F# + DSL.) Evaluate at least 3. + +2. **Recommendation**: + - **Adopt MDX as meta-DSL** — proceed to F# DSL design + - **Adopt MDX-with-extensions** — proceed with documented + extensions + - **Reject MDX, pick alternative** — document why; pivot + to alternative + - **Defer** — the question is premature; revisit after + B-0147 lands + +3. **If adopt or adopt-with-extensions**: F# MDX DSL design + sketch at + `docs/research/2026-XX-fsharp-mdx-dsl-design.md` covering: + - AST shape (well-typed, not stringly-typed) + - Embedding style — quotation-based vs computation- + expression-based vs combinator-library-based + - Query-evaluation strategy — translate to underlying + algebras vs unified evaluation engine + - Type-system mapping — how MDX dimensions/measures get + F# types + - Worked examples — at least 3 queries spanning multiple + types (graph + timeseries + filesystem) + +4. **Implementation follow-up rows filed** for each major + step of the F# MDX DSL build (parser, AST, type-checker, + evaluator, integration tests). 
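The "well-typed, not stringly-typed" AST requirement in criterion 3 can be illustrated with a small sketch. It is written in Python purely to show the shape; the actual deliverable is an F# design, and every type and field name here (`Member`, `MemberSet`, `Query`, the `Factory` cube, the `DispatchCount` measure) is a hypothetical stand-in. The one-dimension-per-set check is a simplification of MDX's set rules, and identifier escaping is omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Member:
    """A member addressed by its dimension plus a path of level names."""
    dimension: str
    path: Tuple[str, ...]

    def render(self) -> str:
        return ".".join(f"[{part}]" for part in (self.dimension, *self.path))


@dataclass(frozen=True)
class MemberSet:
    """A set of members; rejected at construction if dimensions are mixed."""
    members: Tuple[Member, ...]

    def __post_init__(self) -> None:
        dimensions = {m.dimension for m in self.members}
        if len(dimensions) != 1:
            raise ValueError(
                f"a set must draw from exactly one dimension, got {sorted(dimensions)}"
            )

    def render(self) -> str:
        return "{" + ", ".join(m.render() for m in self.members) + "}"


@dataclass(frozen=True)
class Query:
    """A two-axis SELECT over a named cube."""
    columns: MemberSet
    rows: MemberSet
    cube: str

    def render(self) -> str:
        return (f"SELECT {self.columns.render()} ON COLUMNS, "
                f"{self.rows.render()} ON ROWS FROM [{self.cube}]")


query = Query(
    columns=MemberSet((Member("Measures", ("DispatchCount",)),)),
    rows=MemberSet((Member("Time", ("2026", "April")),
                    Member("Time", ("2026", "May")))),
    cube="Factory",
)
mdx_text = query.render()
```

The point of the sketch is that a malformed set fails when the AST is built, not when a query string is parsed at evaluation time. An F# version would push the same check into the type system (for example, discriminated unions per dimension), which is one of the embedding-style questions the design doc is asked to answer.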
+ +## Research-cadence inputs + +Per the dependency-source priority + Microsoft-Research-as- +preferred-research-source memory: + +1. **Microsoft Research** (https://www.microsoft.com/en-us/research/) — search + for MDX evaluation semantics, OLAP query optimization, + F# DSL design papers (Don Syme + collaborators) +2. **MDX official spec** (Microsoft docs) — the canonical + reference +3. **PromQL docs** (CNCF Prometheus) — the worked-example + target +4. **F# DSL design literature** — `Computation Expressions`, + `Quotations`, FSharp.Charting / Deedle / Math.NET as + examples of mature F# DSL embedding +5. **Datalog research** — alternative meta-DSL candidate; + has rich academic literature +6. **GraphQL** — alternative; governed by the GraphQL + Foundation (Linux Foundation, not CNCF), so confirm its + dependency tier + +Per Otto-364 search-first: verify every load-bearing claim +against current docs/papers, not training data. + +## Out of scope (defer) + +- **Implementation.** This row is research + design. + Implementation lands in follow-up rows. +- **Performance benchmarks.** Comes after design lands. +- **Backwards-compatibility with existing Zset query API.** + Not addressed here for whatever API exists today; whether + the F# MDX DSL replaces it or composes with it is a design + decision in the analysis. 
+ +## Composes with + +- `feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md` + — MDX is Microsoft-published (Tier 2); the dependency-priority + hierarchy applies +- `project_zeta_multi_algebra_database_one_algebra_to_rule_them_all_sequenced_after_frontier_and_demo_2026_04_23.md` + — the multi-algebra DB vision MDX is being evaluated as + meta-DSL for +- B-0147 — the timeseries-DB row that motivates this row; + PromQL/MDX-shape is the bridge +- B-0149 (Prometheus MCP) — sibling research lane; informs + the PromQL-as-MDX worked example +- B-0146 (formal architecture ladder) — when the design lands, + declare layer (likely Layer 3: class taxonomy, since + meta-DSL is a pattern catalog instantiating type-theoretic + primitives) +- The 4-axis tightness rule (ZSet-backed + first-class event + + retractable + columnar storage) per the indexed graph- + substrate-tight memory entry in `memory/MEMORY.md`; MDX must + compose with retraction-native to satisfy +- F# DSL design lineage — Don Syme's research (Microsoft Research, + Tier 2 + Microsoft-Research-preferred citation per the + dependency-priority memory) + +## Effort + +**L (large, 3+ days, research-grade)** for the fit-analysis +doc + design sketch. F# MDX DSL implementation is open-ended +(multiple follow-up rows). + +## Why P2 + +- **Not P0/P1** because the meta-DSL design isn't blocking + current factory operation; queries use whatever ad-hoc + shape exists today. +- **Not P3** because if MDX IS the right meta-DSL framing, + the cost of operating without it scales — every additional + type added under the multi-algebra vision will face + meta-DSL friction until it lands. +- **P2** sits where the research is important-but-not-urgent; + lands when bandwidth permits. 
diff --git a/docs/backlog/P2/B-0149-prometheus-mcp-integration-promtool-factory-agents-direct-query-aaron-2026-05-01.md b/docs/backlog/P2/B-0149-prometheus-mcp-integration-promtool-factory-agents-direct-query-aaron-2026-05-01.md new file mode 100644 index 000000000..3c38248ee --- /dev/null +++ b/docs/backlog/P2/B-0149-prometheus-mcp-integration-promtool-factory-agents-direct-query-aaron-2026-05-01.md @@ -0,0 +1,169 @@ +--- +id: B-0149 +priority: P2 +status: open +title: Prometheus MCP integration + promtool — factory agents direct-query observability +created: 2026-05-01 +last_updated: 2026-05-01 +--- + +# B-0149 — Prometheus MCP integration + promtool + +## What + +Wire **Prometheus** into the factory as an MCP (Model Context +Protocol) server so factory agents can directly query +observability data via PromQL. Also adopt **promtool** (the +Prometheus CLI) for ad-hoc query / config-validation use. + +## Why now + +Aaron 2026-05-01: + +> *"plus promethius as a sick MCP and promtool and you'll love +> the query language its like simplifed multidimensonal query +> language MDX"* + +This is the **operational counterpart** to B-0147 (timeseries-DB +research) and B-0148 (MDX as meta-DSL). While B-0147 / B-0148 +research the *long-term* substrate question (which timeseries +DB? which meta-DSL?), this row makes Prometheus *immediately +usable* as a factory observability surface. + +Per the metrics-are-our-eyes framing (per +`feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md`), +the timeseries-channel is additive sensory capacity. +Prometheus plus MCP is the lowest-friction path to *getting +eyes operational NOW*, while the research about the optimal +long-term substrate proceeds in parallel. + +## Acceptance criteria + +1. 
**Prometheus deployment** — local Prometheus instance + scraping factory metrics (initial scope: tick-history + aggregations + PR-pipeline metrics + per-persona dispatch + counts). Configuration in `tools/observability/prometheus/`. + +2. **MCP server integration** — Prometheus exposed as an MCP + server consumable by Claude Code agents. Configuration in + `.mcp.json` (or wherever harness MCP config lives). + +3. **promtool wired into factory tooling** — `tools/observability/ + promtool/` wraps `promtool` for: + - Query validation (check PromQL syntax before storing + queries) + - Rule-file linting (ensure recording-rules / alerting- + rules are well-formed) + - Ad-hoc query execution from CLI + +4. **Initial query catalog** at + `tools/observability/queries/factory.promql` covering at + minimum: + - Tick rate over time (DORA-style deployment frequency) + - PR-cycle latency p50 / p95 (RED-style duration) + - Per-persona dispatch counts (USE-style utilization + proxy) + - Aaron-correction rate (Four Golden Signals + errors-style) + +5. **Documentation** at `tools/observability/README.md` + covering: + - How to start local Prometheus + - How agents query via MCP + - How to add new metrics (factory-side instrumentation) + - How promtool is used in the loop + +6. **Initial dashboard** (optional, ergonomics-only) — a + simple Prometheus-native UI / Grafana dashboard exposing + the SRE metric framework views (DORA / USE / RED / Four + Golden Signals) per + `feedback_reproducible_accuracy_before_quality_fitness_function_harness_first_aaron_2026_05_01.md` + (forward-ref to PR #1116). 
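
The initial query catalog in criterion 4 could start from something like the sketch below. It is illustrative only: the metric names (`factory_ticks_total`, `pr_cycle_duration_seconds_bucket`, `persona_dispatch_total`, `aaron_corrections_total`), the time windows, and the `localhost:9090` address are assumptions, not existing factory instrumentation. The only external surface it relies on is Prometheus's instant-query HTTP endpoint, `/api/v1/query`.

```python
from urllib.parse import urlencode

# Hypothetical metric names; the real ones depend on how the
# factory-side instrumentation is eventually defined.
CATALOG = {
    # DORA-style deployment frequency proxy
    "tick_rate": "sum(rate(factory_ticks_total[1h]))",
    # RED-style duration, assuming a histogram metric with `le` buckets
    "pr_cycle_p50": "histogram_quantile(0.5, sum by (le) (rate(pr_cycle_duration_seconds_bucket[1d])))",
    "pr_cycle_p95": "histogram_quantile(0.95, sum by (le) (rate(pr_cycle_duration_seconds_bucket[1d])))",
    # USE-style utilization proxy
    "dispatch_by_persona": "sum by (persona) (increase(persona_dispatch_total[1d]))",
    # Four Golden Signals, errors-style ratio
    "correction_rate": "sum(rate(aaron_corrections_total[1d])) / sum(rate(factory_ticks_total[1d]))",
}


def instant_query_url(name, base="http://localhost:9090"):
    """Build the instant-query URL for a named catalog entry."""
    return f"{base}/api/v1/query?" + urlencode({"query": CATALOG[name]})


if __name__ == "__main__":
    for name in CATALOG:
        print(name, instant_query_url(name))
```

The same expressions could be pasted into `promtool query instant http://localhost:9090 '<expr>'` for ad-hoc checks, assuming a local instance is running at that address.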
+ +## Why Prometheus first (per Aaron's "good citizen" framing) + +Aaron 2026-05-01 (earlier message): + +> *"i know prometheus, that's our good citizen dependency +> candidate"* + +Prometheus is Aaron's known-quantity dependency: + +- **CNCF graduated** (Tier 3 per the dependency-source priority + hierarchy) +- **Apache 2.0 licensed** +- **Mature ecosystem** — promtool, alertmanager, exporters, + Grafana integration +- **Well-understood operational characteristics** — pull-based + scrape, time-series-native, label-cardinality-careful +- **PromQL is MDX-shaped** — composes with the meta-DSL + research line (B-0148) + +Even if B-0147's research recommends a *different* long-term +timeseries DB, Prometheus is the right *starting point* +because: + +1. It exists today, deployable in minutes +2. The dependency-priority hierarchy passes it (Tier 3) +3. Its query language is already MDX-shaped (informs B-0148) +4. Migration to a different backend later is well-understood + (OpenTelemetry-style portable metrics protocol; many + Prometheus-compatible backends) + +## Out of scope (defer) + +- **Long-term backend choice.** B-0147 owns that question. + This row instantiates Prometheus *now*; substrate-level + decisions can revise later. +- **Production deployment.** Initial scope is local-dev / + loop-runner consumption. Production observability stack + (HA Prometheus, persistent storage, alerting routes) is + follow-up. +- **Custom exporters.** Use existing exporters (Node, GitHub, + PR-board) where possible. Custom exporter for factory- + specific metrics is follow-up if the standard ones don't + cover the needs. +- **PromQL → MDX translation.** B-0148's worked-example + exercise; this row only consumes PromQL natively. 
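
The label-cardinality caution listed under "Why Prometheus first" comes down to multiplication: each distinct combination of label values is its own time series. A rough worked example follows, with label names and value counts invented purely for illustration.

```python
from math import prod


def series_upper_bound(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


# Hypothetical labels for a per-persona dispatch counter.
bounded = {
    "persona": [f"persona-{i}" for i in range(20)],
    "lane": ["doc", "code", "triage"],
    "outcome": ["ok", "error"],
}

# Adding an unbounded label such as a PR number multiplies the total.
unbounded = dict(bounded, pr_number=[str(n) for n in range(1000)])

if __name__ == "__main__":
    print(series_upper_bound(bounded))    # 20 * 3 * 2
    print(series_upper_bound(unbounded))  # previous bound * 1000
```

Under these assumed counts the bounded label set stays in the low hundreds of series, while a single per-PR label pushes it into six figures. This is the tradeoff B-0147 examines.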
+ +## Composes with + +- `feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md` + — the substrate this row instantiates; Prometheus = Tier 3 + (CNCF graduated) +- `feedback_reproducible_accuracy_before_quality_fitness_function_harness_first_aaron_2026_05_01.md` + (forward-ref to PR #1116) — SRE metric frameworks + (DORA/USE/RED/FGS) the initial query catalog targets +- B-0147 — long-term timeseries-DB research; this row is the + *immediate practice* counterpart +- B-0148 — MDX-as-meta-DSL research; PromQL is the worked + example that motivates the MDX framing +- `feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md` + — Prometheus is a dependency we will absorb AND contribute + back to (any rough edges encountered → upstream issues / PRs) +- B-0146 (formal architecture ladder) — Layer 5 (reproducibility + harness) + +## Layer (per B-0146) + +**Layer 5: Reproducibility harness.** Prometheus is the +substrate that persists metrics over time, making the +SRE metric frameworks operationally measurable. + +## Effort + +**M (medium, 1–3 days)** for initial deployment + MCP +integration + initial query catalog + docs. Adding more +metrics + tuning is open-ended follow-up. + +## Why P2 + +- **Not P0/P1** because the factory operates today without + Prometheus; metrics are computed informally per-tick. +- **Not P3** because the metrics-are-our-eyes framing makes + observability load-bearing once the parallelism scaling + ladder operates at any scale (B-0144 doc/code two-lane + → file-isolation → peer-mode-claims). +- **P2** lands when bandwidth permits; the cost of + operating-blind compounds the longer it's deferred. 
diff --git a/docs/backlog/P2/B-0150-timeseries-domain-expert-and-teacher-persona-aaron-2026-05-01.md b/docs/backlog/P2/B-0150-timeseries-domain-expert-and-teacher-persona-aaron-2026-05-01.md new file mode 100644 index 000000000..9899ab9c3 --- /dev/null +++ b/docs/backlog/P2/B-0150-timeseries-domain-expert-and-teacher-persona-aaron-2026-05-01.md @@ -0,0 +1,152 @@ +--- +id: B-0150 +priority: P2 +status: open +title: TimeSeries / observability domain expert + teacher persona +created: 2026-05-01 +last_updated: 2026-05-01 +--- + +# B-0150 — TimeSeries / observability domain expert + teacher persona + +## What + +Define a **timeseries-DB / observability domain expert** persona +for the factory persona roster, paired with a **teacher +capability** for the same domain. Same shape as task #351 +(TS+Bun expert + teaching skill) and task #323 (per-tool/language +expert skills) — applied to the timeseries-DB / observability +domain. + +## Why now + +Aaron 2026-05-01: + +> *"but the they do need small cardinailty , we need domain +> expers and teacher too"* + +The B-0147 / B-0148 / B-0149 research lines (timeseries-DB +candidate landscape, MDX-as-meta-DSL evaluation, Prometheus MCP +integration) all need **deep domain expertise** to run well. +The factory's persona roster has experts for many areas +(architect, security-researcher, performance-engineer, etc.) +but no dedicated timeseries-DB / observability domain expert. + +Aaron's *"and teacher too"* — the persona must wear both hats: +**expert** (does the work) AND **teacher** (explains the work +to the rest of the factory + future-Otto + new contributors). +This composes with the broader factory pattern of expert+teacher +skills (per task #323 + task #351). + +## Scope (what the persona owns) + +1. **Timeseries-DB landscape expertise** — Prometheus / TimescaleDB / + InfluxDB / VictoriaMetrics / OpenTelemetry / Microsoft Research + timeseries primitives. 
Live-search-anchored per Otto-364: + landscape evolves; expert tracks current state, not 2026-Jan + training-data snapshot. + +2. **Observability metric framework expertise** — DORA / USE / + RED / Four Golden Signals (per + `feedback_reproducible_accuracy_before_quality_fitness_function_harness_first_aaron_2026_05_01.md`). + How to design metrics; how to instrument; what to query. + +3. **CRDT-for-timeseries expertise** — composes with the design + constraints in B-0147 (CRDT multi-mode is paramount). The + expert tracks CRDT research applied to timeseries (counter, + gauge, LWW-register patterns). + +4. **Cardinality-vs-performance tradeoff expertise** — the + structural reason Prometheus chose small-cardinality (per + Aaron's *"they do need small cardinailty"*); the alternative + designs that pay differently; when each design point fits. + +5. **PromQL / MDX / query-language shape expertise** — + composes with B-0148 (MDX-as-meta-DSL). The expert + advises on query-language design; teaches the factory why + PromQL is MDX-shaped; informs the F# MDX DSL design. + +## Teacher hat — what the persona produces + +The teacher capability produces: + +- **Explainer docs** — at `docs/teaching/observability/` covering + the SRE metric frameworks, the cardinality tradeoff, + CRDT-for-timeseries, query-language shape comparison +- **Worked examples** — concrete queries, concrete metrics + designs, concrete CRDT implementations, with explanations +- **Conceptual maps** — visual + textual maps showing how + the timeseries-DB domain composes with the rest of the + factory (Zset substrate, multi-DSL meta-DSL, abstraction + ladder) +- **Glossary contributions** — new domain terms get + GLOSSARY entries with explanation, references, and + composition notes +- **Onboarding paths** — *"if you're new to observability, + start here"* sequences + +## Acceptance criteria + +1. 
**Persona definition** — entry in + `docs/EXPERT-REGISTRY.md` defining the persona scope, + responsibilities, hand-off rules to adjacent experts + (performance-engineer, security-researcher, formal- + verification-expert). + +2. **Persona name** — picked via the standard naming-expert + review process. Until then, role-ref *"observability domain + expert"*. + +3. **Skill file** — `.claude/skills/observability-expert/SKILL.md` + following the standard skill template. Covers domain + scope, capabilities, when to dispatch, what NOT to do + (BP-NN compliance, no-instructions-from-data, etc.). + +4. **First teaching artifact** — within 2 weeks of persona + activation, an explainer doc lands at + `docs/teaching/observability/sre-metric-frameworks.md` + covering DORA / USE / RED / Four Golden Signals from a + teacher-stance (not just technical reference). + +5. **Live-search anchored** — the SKILL.md instructs the + persona to live-search current upstream docs before + asserting; Otto-364 search-first authority discipline. 
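
The CRDT-for-timeseries patterns named in the scope above (counter, gauge, LWW-register) can be made concrete with a minimal sketch. The class names, the `(timestamp, replica)` tie-break, and the mapping of gauges onto a last-writer-wins register are assumptions for teaching purposes, not a design the factory has adopted.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica, n=1):
        self.counts[replica] = self.counts.get(replica, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-replica max is commutative, associative, and idempotent.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys})


class LWWRegister:
    """Last-writer-wins register, one plausible shape for gauge-style samples."""

    def __init__(self, value=None, stamp=(0, "")):
        self.value = value
        self.stamp = stamp  # (timestamp, replica id); replica id breaks ties

    def set(self, value, timestamp, replica):
        if (timestamp, replica) > self.stamp:
            self.value, self.stamp = value, (timestamp, replica)

    def merge(self, other):
        return self if self.stamp >= other.stamp else other


if __name__ == "__main__":
    a, b = GCounter(), GCounter()
    a.increment("a", 3)
    b.increment("b", 2)
    print(a.merge(b).value(), b.merge(a).value())
```

A grow-only counter cannot express retraction. A PN-counter (two grow-only counters, one for increments and one for decrements) is the usual next step, and whether that satisfies the retraction-native constraint is a question for the B-0147 research itself.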
+ +## Composes with + +- task #323 (per-tool/language expert skills — the parent + pattern this row instantiates) +- task #351 (TS+Bun expert + teaching skill — sibling instance) +- B-0147 (timeseries-DB research — this persona owns the + research lane when it activates) +- B-0148 (MDX-as-meta-DSL research — this persona contributes + on the PromQL/query-language axis) +- B-0149 (Prometheus MCP integration — this persona advises + on initial query catalog design) +- `feedback_reproducible_accuracy_before_quality_fitness_function_harness_first_aaron_2026_05_01.md` + (PR #1116) — SRE metric frameworks the persona teaches +- `feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md` + — metrics-are-our-eyes framing the persona operationalizes +- `docs/CONFLICT-RESOLUTION.md` — hand-off protocol with + adjacent personas (Soraya for formal-verification questions, + Naledi for performance, Mateo for security) +- `docs/EXPERT-REGISTRY.md` — extension target + +## Effort + +**M (medium, 1–3 days)** for persona definition + skill file + +EXPERT-REGISTRY entry + first teaching artifact. Ongoing +domain-expertise + teaching contributions are open-ended. + +## Why P2 + +- **Not P0/P1** because the factory operates today without + a dedicated observability-domain-expert; B-0147/B-0148/B-0149 + research can proceed with Otto wearing the hat informally. +- **Not P3** because as the metrics-are-our-eyes work + operationalizes (B-0149 Prometheus + B-0147 long-term + research + B-0148 meta-DSL), the absence of a dedicated + domain expert + teacher creates compounding gaps in + *"who explains this to the next contributor?"* and + *"who tracks the domain's evolution?"* +- **P2** lands when persona-roster bandwidth permits. 
diff --git a/docs/backlog/P2/B-0151-rx-research-experience-researcher-persona-aaron-2026-05-01.md b/docs/backlog/P2/B-0151-rx-research-experience-researcher-persona-aaron-2026-05-01.md new file mode 100644 index 000000000..85be9be4d --- /dev/null +++ b/docs/backlog/P2/B-0151-rx-research-experience-researcher-persona-aaron-2026-05-01.md @@ -0,0 +1,201 @@ +--- +id: B-0151 +priority: P2 +status: open +title: RX (Research eXperience) researcher persona — meta-research on the research process +created: 2026-05-01 +last_updated: 2026-05-01 +--- + +# B-0151 — RX (Research eXperience) researcher persona + +## What + +Define a new factory persona: **RX (Research eXperience) +researcher** — a meta-research role whose job is to study and +improve the **process** of doing research within the factory. +Composes with the existing persona-roster experience-researcher +trio (UX / DX / AX) at a fourth orthogonal axis. + +**Naming disambiguation** (Aaron 2026-05-01): RX here stands for +**Research eXperience**, NOT **Reactive Extensions** (the +Rx.NET / RxJava / RxJS family that originated at Microsoft). +The factory's RX persona has nothing to do with the +reactive-extensions libraries. The collision +is unfortunate but intentional — Aaron's framing is RX-as- +research-experience-researcher; documenting the disambiguation +prevents future-Otto confusion. + +## Why now + +Aaron 2026-05-01: + +> *"we need like a RX research user experience researcher"* +> +> *"not to be confused with the reactive extensions rx lol"* + +The factory has accumulated significant **research B-rows** — +B-0145 (PM-2 forward-research cadence), B-0147 (timeseries-DB +research), B-0148 (MDX-as-meta-DSL research), B-0150 (timeseries +domain expert + teacher), and many more across the backlog. Each +research lane has a *what* (the question) and a *who* (the domain +expert) but no role studying the *how* (the research process +itself). 
+ +**Without an RX researcher**, research lanes: + +- Re-invent methodology each time (no compounding from prior + research lanes' lessons) +- Lack measurement of research effectiveness (which research + lanes pay off? which are dead-ends? what predicts the + difference?) +- Have inconsistent rigor (some land deep design docs; others + surface as backlog rows that never advance) +- Don't share research-tools / research-templates / research- + lessons across persona boundaries + +The RX researcher fills this meta-gap: studies the research +process to make ALL research lanes more effective. + +## Persona-roster context — the four-axis experience-researcher group + +The factory's existing experience-researcher personas: + +| Persona | Axis | Studies | +|---|---|---| +| Iris | UX | First-10-min library-consumer experience | +| Bodhi | DX | First-60-min contributor experience | +| Daya | AX | Agent cold-start experience | +| **RX (this row)** | **Research** | **Research-process experience** | + +Each of the four sits on an orthogonal axis. None substitutes +for another. RX completes the four-axis set. + +## Scope (what the RX researcher owns) + +1. **Research-process discovery** — interview / observe persona + roles who run research lanes (Otto-as-PM, Mateo-security, + Aarav-skill-expert, the timeseries domain expert per + B-0150, etc.). Document common patterns, common pain points, + common dead-ends. + +2. **Research-methodology lessons-mechanization** — when a + research lane discovers something that would have helped a + prior research lane, the RX researcher mechanizes the + lesson (memory file, BP-NN candidate, research-template, + tooling). + +3. **Research-tool-and-template library** — `tools/research/` + contains shared research scaffolding (Pareto-frontier + templates per B-0147, dependency-priority filter per + B-0147, candidate-evaluation matrices, fit-analysis + doc structures). RX maintains this library. + +4. 
**Research-effectiveness measurement** — composes with the + metrics-are-our-eyes framing (per + `feedback_dependency_source_priority_*_2026_05_01.md`): + does a research lane's recommendation get acted on? Does + the implementation match the prediction? What predicts + high-action research vs low-action? RX defines these + metrics + tracks them. + +5. **Research-process critique** — RX is the persona who can + say "this research lane is producing memos no one acts on" + without it being interpreted as critique of the persona + doing the research. The role-separation makes the critique + safe to deliver and safe to receive. + +## Key questions the RX researcher addresses + +- **Why do some research lanes land and others stall?** + (Pattern-recognition across the backlog history.) +- **What research-templates have we re-invented N times?** + (Candidates for shared `tools/research/` scaffolding.) +- **Which research outputs predict high-quality + implementation?** (Which forms — Pareto-tables vs + decision-trees vs prose memos — produce the best + follow-ups.) +- **What is the right cadence for forward-research vs + reactive-research?** (Composes with PM-2 role per B-0145.) +- **How does the factory's research process compare to + established traditions?** (Lean Six Sigma, Agile spike, + Design Sprint — pull principles, reduce ceremony per the + parallelism-ladder substrate.) +- **Where is research-fatigue a real signal?** (Persona + bandwidth check — research-debt vs research-overhead.) + +## Acceptance criteria + +1. **Persona definition** — entry in + `docs/EXPERT-REGISTRY.md` defining RX scope, orthogonality + to UX/DX/AX, hand-off rules. + +2. **Persona name** — picked via the standard naming-expert + review process. Until then, role-ref *"RX researcher"* + with the Research-eXperience disambiguation prominent. + +3. **Skill file** — `.claude/skills/rx-researcher/SKILL.md` + following the standard skill template. 
Covers + discovery / mechanization / measurement scope, + when to dispatch, what NOT to do. + +4. **First RX artifact** — within 4 weeks of persona + activation, a research-process-audit doc lands at + `docs/research/rx-baseline-audit-YYYY-MM.md` covering + the current state of factory research process across at + least 3 in-flight or recently-landed research lanes. + +5. **Research-tool library seed** — `tools/research/README.md` + + at least one shared research-template (e.g., Pareto- + frontier template extracted from B-0147's research + methodology section). + +## Out of scope (defer) + +- **Implementation of research process changes.** RX + researches and recommends; PM-2 / Otto / persona-roster + acts on the recommendations. +- **Authority over individual research lanes.** RX studies + process, not content. The timeseries-DB domain expert + (B-0150) owns the timeseries research; RX may study HOW + that research is done. +- **Replacing persona-specific research.** Mateo's + security-research, Aarav's skill-research, Iris's UX- + research, etc. all continue with their own scope. RX + observes across them. + +## Composes with + +- `docs/EXPERT-REGISTRY.md` — extension target +- Iris (UX), Bodhi (DX), Daya (AX) — sibling experience- + researcher personas; the four-axis orthogonal set +- B-0145 (PM-2 role) — adjacent cadence-driven research + role; RX studies PM-2's process +- B-0147 / B-0148 / B-0150 — concrete research lanes RX + studies first +- task #323 (per-tool/language expert skills) — RX is one + per-domain expert skill, but at meta-level +- `feedback_dependency_source_priority_*_2026_05_01.md` + (PR #1117) — metrics-are-our-eyes framing applies to + research-effectiveness metrics +- `feedback_parallelism_scaling_ladder_*_2026_05_01.md` + (PR #1116) — pull-principles-reduce-ceremony for the + established research traditions (Lean Six Sigma, Design + Sprint, etc.) 
+ +## Effort + +**M (medium, 1–3 days)** for persona definition + skill file + +EXPERT-REGISTRY entry + first RX baseline-audit + research- +tool library seed. Ongoing RX work is open-ended. + +## Why P2 + +- **Not P0/P1** because the factory's research lanes operate + today without dedicated RX support; meta-improvements + compound but don't block. +- **Not P3** because as the factory scales (parallelism ladder + rungs 2-5; multi-research-lane operation; B-0145 PM-2 + forward-research cadence), the absence of an RX researcher + produces compounding inefficiency. +- **P2** — important meta-investment; lands when persona- + roster bandwidth permits. diff --git a/docs/hygiene-history/ticks/2026/05/01/1346Z.md b/docs/hygiene-history/ticks/2026/05/01/1346Z.md new file mode 100644 index 000000000..897e6226c --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/01/1346Z.md @@ -0,0 +1 @@ +| 2026-05-01T13:46:00Z | opus-4-7 / autonomous-loop tick | 98fc7424 | Parallel-lane tick — first real demonstration of B-0144 rung-2-style coordinator-dispatches-parallel work. Lane 1 (main thread, doc-lane on `substrate-parallelism-ladder` branch): committed B-0146 (formal architecture ladder explicit-layer-declaration discipline) as follow-up to PR #1116 substrate. Lane 2 (general-purpose subagent `a59740aacbe29e94d`, read-only queue triage): produced 19-PR triage table classifying open LFG PRs by class taxonomy + identifying top-3 quick-wins (#1104 + #1103 force-with-leased grammar typos; #1088 P0 schema fix; #1084 false-claim-about-hallucinated-file). 
Lane 3 (new branch `substrate-timeseries-db-dependency-priority-metrics-as-eyes`): wrote memory file for Aaron's dependency-source-priority hierarchy (Open Source > Microsoft > CNCF > Apache > MIT; never proprietary) + Microsoft-Research as preferred research source + metrics-are-our-eyes framing extended with Helen-Keller analogy ("without that you are literally blind, you only have text input channel"); also wrote B-0147 timeseries-DB native-in-Zsets multi-DSL integration research backlog row. MEMORY.md index pointer added. Cron 98fc7424 healthy. Tick demonstrates the parallelism-scaling-ladder rung-2 working in practice — three concurrent lanes, no cross-lane interference, coordinator (Otto) merging at PR boundaries. | [PR #1116 commits 5d2c117 → 4a25971 → 10b240f → f78ac39 (B-0146 follow-up); subagent triage table preserved in tick-shard rationale; new branch `substrate-timeseries-db-dependency-priority-metrics-as-eyes` ready to commit + open PR after this tick-shard lands] | The metrics-are-our-eyes / Helen-Keller framing is carved-sentence-quality substrate that names the factory's CURRENT sensory limitation (text-channel only) AND points at the additive capacity (metrics-channel). The pivot: from narrated-state (someone tells the factory what's happening) to perceived-state (the factory observes what's happening). This is the structural connection between the SRE metric frameworks landed in PR #1116 (DORA/USE/RED/Four Golden Signals) and the timeseries-DB substrate B-0147 motivates — without the timeseries layer, the SRE frameworks are snapshot-only; with it, they become a proper fitness function over time. Subagent dispatch + return cycle (~70s) demonstrates parallel-buddy visibility — the digest landed inline rather than the work running invisibly. 
| diff --git a/docs/hygiene-history/ticks/2026/05/01/1402Z.md b/docs/hygiene-history/ticks/2026/05/01/1402Z.md new file mode 100644 index 000000000..15532fc51 --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/01/1402Z.md @@ -0,0 +1 @@ +| 2026-05-01T14:02:00Z | opus-4-7 / autonomous-loop tick | 98fc7424 | Substrate-cluster tick — PR #1117 opened with timeseries-DB + dependency-priority + metrics-are-our-eyes substrate (5 files, 989 insertions). Aaron's 14-message arc this tick covered: dependency-source priority hierarchy (Open Source > Microsoft > CNCF > Apache > MIT; never proprietary); Microsoft-Research as preferred research source; metrics-are-our-eyes + Helen-Keller framing (text-channel-only is single-channel-but-developed; metrics-channel is additive sensory capacity); PromQL ≈ MDX shape suggesting MDX as meta-DSL framing; Prometheus MCP + promtool as immediate-eyes path; Prometheus's small-cardinality is structural-by-design not bug; "maybe we need both shapes IDK research probably"; Pareto-improvement-or-bust research methodology ("why did they make the tradeoff and can we make a different one that gives us better properties without losing good properties"); algebra-surface-complexity is less-costly-here because formal verification compensates. Three follow-up commits to PR #1117 added: B-0148 (MDX-as-meta-DSL + F# MDX DSL research), B-0149 (Prometheus MCP integration immediate-eyes path), B-0150 (timeseries domain expert + teacher persona); B-0147 extended with design constraints (high-cardinality + CRDT multi-mode + formal math spec) + open research question (both shapes maybe) + Pareto-improvement methodology spine + algebra-complexity prior. Cron 98fc7424 healthy. 
| [PR #1116 status BLOCKED, CI in progress, 4 commits; PR #1117 status OPEN with 4 commits 1f78431 → f7a3409 → 543306d → ce405f9 → 98acaba; pending: 1 more B-0147 follow-up commit with the algebra-complexity-prior + this tick-shard] | The Pareto-improvement-or-bust methodology Aaron named is the research-spine for ALL future "should we adopt X" questions, not just timeseries-DB. The pattern: (1) understand WHY they made the tradeoff; (2) identify the Pareto frontier; (3) look for Pareto-superior alternatives; (4) recommend with explicit tradeoffs named. Carved candidate: *"Every tradeoff is a tradeoff for reasons. Find better, not different."* Composes with pirate-not-priest disposition (impartial razor across established and novel candidates) + amortized-keystone (formal-spec investment paid once, reaped N times) + the abstraction-ladder layer-2 formal-foundations work that makes the algebra-complexity prior viable. Tick demonstrates parallel-substrate-absorption velocity — 14 Aaron messages absorbed into 5 files + 5 commits + 1 PR opened in ~16 minutes; lessons-mechanization from prior tick (B-row frontmatter schema) eliminated CI iteration. | diff --git a/docs/hygiene-history/ticks/2026/05/01/1417Z.md b/docs/hygiene-history/ticks/2026/05/01/1417Z.md new file mode 100644 index 000000000..9f30adbcb --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/01/1417Z.md @@ -0,0 +1 @@ +| 2026-05-01T14:17:00Z | opus-4-7 / autonomous-loop tick | 98fc7424 | Thread-drain tick — three PRs in flight, all BLOCKED with green CI per CLAUDE.md investigate-threads-first rule. Investigation found: PR #1116 (6 unresolved threads) — most are Outdated-thread class, fixes already pushed in commit 131bf91; need explicit GraphQL resolution since GitHub didn't auto-mark resolved. 
PR #1117 (14 unresolved threads, up from 8) — copilot reviewed commit c483859 and flagged 3 line-wrap / inline-code-broken-across-lines issues (REAL fix, P1) + 2 P0 schema-violation claims on tick-history shards 1346Z + 1402Z (PHANTOM-blocker — verified by xxd hex-dump that shards DO match `| | | | | | |` schema exactly; copilot's parse appears stale or buggy) + 1 P1 PR-description-claim ("5 files" wrong). PR #1118 (0 unresolved) — newly opened, no review yet. Real-fix work this tick: commit 794cf54 on PR #1117 fixing 3 line-wrap artifacts in the dependency-priority memory file (backtick-fenced filenames split mid-token; "maintenance" + "-health" word-split; same with project_zeta_multi_algebra_database_one_*.md filename). Cron 98fc7424 healthy. | [PR #1117 commit 794cf54 line-wrap fixes; PR #1116 thread fixes already in commit 131bf91 from prior tick; PR #1118 awaiting review] | The line-wrap rule is BP-NN-promotion-candidate: backtick-fenced filenames over ~70 chars must be moved to their own line, never wrapped mid-token. CommonMark inline code spans cannot contain newlines — wrapping breaks the rendered output. Composes with the prior tick's "wildcard refs vs concrete filenames" lesson (which CAUSED the long filenames to begin with — concrete filenames are 80-200 chars, beyond comfortable wrap width). Both lessons mechanizable via lint: (a) reject wildcard refs in concrete-filename contexts; (b) reject inline-code spans containing newlines. Once mechanized, both classes get caught at commit-time rather than review-time, validating the amortized-keystone discipline (PR #1116) at the meta-level. Also flagged: P0 false-positives from copilot-reviewer on schema-conformant tick-shards is a known pattern (phantom-blocker class, per the 7-class taxonomy from PR #1117's earlier subagent triage). Resolution: ignore the copilot-claimed-violation when xxd verification confirms schema match; mark thread as Outdated/phantom. 
| diff --git a/docs/hygiene-history/ticks/2026/05/01/1419Z.md b/docs/hygiene-history/ticks/2026/05/01/1419Z.md new file mode 100644 index 000000000..a708e855b --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/01/1419Z.md @@ -0,0 +1 @@ +| 2026-05-01T14:19:00Z | opus-4-7 / autonomous-loop tick | 98fc7424 | Thread-resolution + CI-fix tick. Resolved 13 outdated review threads via GraphQL `resolveReviewThread` mutation (5 on PR #1116, 8 on PR #1117) — these were marked outdated-but-not-auto-resolved by GitHub after the prior force-push, blocking merge per the `required_conversation_resolution` rule. Then resolved 7 active threads classified as phantom-blockers or already-addressed: PR #1116 B-0144 P0 frontmatter (schema verified correct on branch HEAD); PR #1117 (3) Composes-with refs annotated as forward-ref-to-PR-#1116, (1) memory file graph-substrate ref replaced with descriptive prose, (2) tick-shard P0 schema "starts with `||`" claim — verified false via xxd hex-dump showing `7c20 ...` (pipe-space not double-pipe), (1) PR description "5 files" — updated PR body with accurate file count. Then handled CI failures: all 3 PRs failing `lint (markdownlint)` with MD032/blanks-around-lists. Investigated PR #1117 logs via `gh api repos/.../actions/jobs/.../logs`; found 7 violations (B-0147 ×4, B-0149 ×2, B-0151 ×1). Fixed all 7 by adding blank line between colon-ending sentence and list start. PR #1116 and #1118 markdownlint failures pending next-tick (same shape, same fix). Cron 98fc7424 healthy. 
| [PR #1116 thread fixes already in 131bf91 + outdated-thread resolutions via GraphQL; PR #1117 commit 11a9747 MD032 fixes; PR #1118 awaiting next-tick markdownlint fix; PR description on #1117 updated via `gh pr edit`] | Two BP-NN promotion candidates this tick: (a) markdownlint MD032/inline-code-newlines should be in pre-commit hook so the loop catches them before push (saves CI iteration); (b) Outdated-thread resolution should be a one-line tools/hygiene/resolve-outdated-threads.sh script that `gh api graphql`-mutates threads with isResolved=false AND isOutdated=true. Both validate the amortized-keystone discipline at the meta-level — each unmechanized check is coordinator-load that compounds across PRs. Manual GraphQL thread resolution this tick took ~2min for 13 threads; mechanized would be sub-second + zero-coordinator-load. The phantom-blocker P0 schema-violation claims from copilot-reviewer remain a known noise class — xxd verification is the right escape hatch + the right BP-NN candidate ("verify schema-violation P0 via byte-level inspection before treating as real-fix"). | diff --git a/docs/hygiene-history/ticks/2026/05/01/1424Z.md b/docs/hygiene-history/ticks/2026/05/01/1424Z.md new file mode 100644 index 000000000..d6f746aee --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/01/1424Z.md @@ -0,0 +1 @@ +| 2026-05-01T14:24:00Z | opus-4-7 / autonomous-loop tick | 98fc7424 | Three-PR markdownlint-fix tick. All three PRs failing `lint (markdownlint)` with same root cause: literal line-leading `+` in flowing prose interpreted as list-marker triggering MD032/blanks-around-lists. Fixed: PR #1116 commit 3e879e1 (3 instances: B-0145:204 + parallelism-ladder memory:36 + reproducibility-first memory:91); PR #1117 commit 53463f9 (1 instance: dependency-priority memory:115); PR #1118 commit c103b7c (3 instances: B-0152:41 + topological-quantum-emulation memory:52 + memory:122). All converted line-leading `+` to mid-line "plus" / comma-list / "/" prose forms. 
Investigated PR #1118 P0 schema-violation claim on tick-shard 1404Z — verified false via xxd hex-dump (third such phantom-blocker hallucination this round, after 1346Z and 1402Z). Resolved 5 PR #1118 active threads (3 forward-ref-pattern already addressed by commits + 1 phantom-blocker P0 + 1 PR-description-count). Updated PR #1118 body with accurate file count. Cron 98fc7424 healthy. | [PR #1116 commit 3e879e1; PR #1117 commit 53463f9; PR #1118 commit c103b7c + body update + 5 thread resolutions; this tick-shard 1424Z pending] | The line-leading-`+` failure mode is now thoroughly characterized — three independent PRs hit it, three distinct fix patterns produced (comma-list, "plus", "/"), all mechanizable via pre-commit lint (BP-NN candidate). Same shape as the wildcard-ref + inline-code-newlines + MD032 classes earlier this session. The amortized-keystone discipline (PR #1116) predicts each unmechanized check compounds coordinator-load — this round of three PRs hitting the same `+` pattern is the empirical evidence: ~12 minutes of CI iteration that pre-commit lint would have eliminated. The phantom-blocker `||` schema-claim from copilot is now a 3/3 false-positive rate; verification step (xxd hex-dump) is the right escape hatch but should also be mechanized. Tick illustrates the rung-4 lessons-mechanization compound discipline operating in real-time: each PR's friction surfaces a lesson; mechanizing the lesson would prevent future PRs hitting the same friction; until mechanized, each PR pays the same cost. | diff --git a/docs/hygiene-history/ticks/2026/05/01/1429Z.md b/docs/hygiene-history/ticks/2026/05/01/1429Z.md new file mode 100644 index 000000000..0db67e8b5 --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/01/1429Z.md @@ -0,0 +1 @@ +| 2026-05-01T14:29:00Z | opus-4-7 / autonomous-loop tick | 98fc7424 | First-merge-of-the-substrate-cluster tick. 
**PR #1118 MERGED** (topological-quantum-emulation + B-0152 + tick-shard 1404Z) — first of the 3-PR cluster opened this session to land. PR #1116 + #1117 went DIRTY after #1118 merge (BACKLOG.md conflict from B-0152 row addition); rebased both onto fresh main, regenerated BACKLOG.md from per-row files, force-pushed clean. Investigated PR #1117 5 new threads from copilot review of commit 53463f9: 1 outdated (line-leading-+ already fixed) + 2 graph_substrate refs (replaced with prose, same pattern as B-0147 prior tick) + 1 PR description count + 1 substantive license-inconsistency real-fix on B-0147 (TimescaleDB / VictoriaMetrics asserted "Apache 2.0" while InfluxDB rightly said "verify"; softened all three to "verify current license tier per Otto-364" for internal consistency). All 5 threads resolved via GraphQL after fixes pushed. PR #1118 verified MERGED state via `gh pr view --json state`. Cron 98fc7424 healthy. | [PR #1118 MERGED; PR #1116 commit 95b7595 (rebased + BACKLOG regen, force-with-lease push); PR #1117 commits cdb6360 + 0baf1dd + 7fa7eba (graph_substrate prose + license-softening + rebase + BACKLOG regen, force-with-lease push)] | First merge of the substrate cluster validates the iteration loop — 3 PRs opened, lint-fix iterations × 4-5 ticks, threads resolved via GraphQL when phantom or addressed-by-commit, real-fixes via commit-and-push, until BLOCKED → CLEAN → MERGED. The DIRTY-after-sibling-merge cascade is expected per the multi-PR-on-shared-files class — the rebase pattern (checkout-theirs on regenerated index files, regenerate, force-with-lease) is the correct shape and now mechanizable as `tools/hygiene/rebase-after-sibling-merge.sh` (BP-NN candidate). The B-0147 license-inconsistency catch is the kind of substantive review work copilot does well — license-tier assertions ARE volatile (TimescaleDB has had public re-tiering); softening to verify-per-Otto-364 is the right epistemic stance. 
Lesson candidate: factory-internal-consistency-rule — when one entry says "verify X" and a sibling says "X is Y" without verification, the inconsistency is a substrate-quality signal, not just a stylistic nit. | diff --git a/docs/hygiene-history/ticks/2026/05/01/1436Z.md b/docs/hygiene-history/ticks/2026/05/01/1436Z.md new file mode 100644 index 000000000..016a7ee0d --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/01/1436Z.md @@ -0,0 +1 @@ +| 2026-05-01T14:36:00Z | opus-4-7 / autonomous-loop tick | 98fc7424 | Substantive thread-batch tick. PR #1116 8 unresolved + PR #1117 9 unresolved (no outdated; all post-rebase fresh review). Real-fix work: PR #1116 stale code-tree paths (`Zeta.Core/**` / `Zeta.*/**` → `src/Core/**` / `src/Core.CSharp/**` / `src/Bayesian/**`) + bare memory refs prefixed with `memory/` via sed batch + B-0141/B-0142 reframed to "WHEN THEY LAND" forward-ref form (commit 2d8c22a). PR #1117 URL canonicalization (`research.microsoft.com` → `https://www.microsoft.com/en-us/research/`) + Cosmos-DB capitalization fix + `.claude/mcp.json` → `.mcp.json` + line-leading-`+` in B-0149 reflowed to "Prometheus plus MCP" + MD038 code-span-space fix in 1419Z tick-shard (commit ede777a). All 17 threads resolved via GraphQL after fixes pushed. Aaron mid-tick affirmed the line-leading-+ → "plus" prose reflow pattern with *"very high quality decison"* signal. Cron 98fc7424 healthy. | [PR #1116 commit 2d8c22a + 8 thread resolutions; PR #1117 commit ede777a + 9 thread resolutions] | Aaron's *"very high quality decison"* on the `Prometheus plus MCP` reflow validates the pattern: when copilot flags line-leading-`+`, the right fix is prose reflow with connectives (plus / and / comma-list / "/"), NOT keeping the literal `+` mid-sentence-wrap. This composes with the broader amortized-keystone discipline — the validated-pattern is now stable enough to mechanize as a pre-commit lint hook (BP-NN candidate). 
The tick's coordinator-load was substantial (~17 thread fixes across 2 PRs in ~5 minutes) but the fix-class taxonomy is now well-characterized: (a) URL canonicalization, (b) capitalization consistency, (c) bare-memory-ref vs `memory/`-prefix, (d) line-leading-`+` reflow, (e) code-span-spaces (MD038), (f) MD032 list-spacing, (g) stale code-tree paths. All seven mechanizable. The compounding cost of unmechanized pre-commit checks is now empirically obvious — five+ ticks of CI-driven iteration on PRs that pre-commit lint would have closed in single-push. | diff --git a/docs/hygiene-history/ticks/2026/05/01/1441Z.md b/docs/hygiene-history/ticks/2026/05/01/1441Z.md new file mode 100644 index 000000000..7f7e0b945 --- /dev/null +++ b/docs/hygiene-history/ticks/2026/05/01/1441Z.md @@ -0,0 +1 @@ +| 2026-05-01T14:41:00Z | opus-4-7 / autonomous-loop tick | 98fc7424 | Auto-merge-armed waiting tick. PR #1116 + #1117 both threads-clean (0 unresolved on each), 0 CI failures, 1 + 15 checks pending respectively. Manufactured-patience check passes: (a) dependency = CI completion; (b) owner = GitHub Actions runners; (c) expected resolution = minutes. Real-dependency-wait. Enabled `--auto --squash` merge on both PRs so they land when CI green without further coordinator intervention. Aaron mid-tick affirmed *"The amortized-keystone love it"* — validates the three-term keystone (automated + motorized + amortized) from PR #1116's parallelism-scaling-ladder substrate as factory-vocabulary keeper. Cron 98fc7424 healthy. | [PR #1116 auto-merge armed; PR #1117 auto-merge armed; no new commits this tick — manufactured-patience-vs-real-dependency-wait check passed, no shared-file edits to avoid cascading rebase] | Aaron's *"love it"* on the amortized-keystone framing is the second high-quality-affirmation this conversation pair (after *"very high quality decison"* on the line-leading-`+` reflow pattern). 
Pattern: Aaron tends to affirm load-bearing carved-sentence-style vocabulary that names a previously-unnamed structural property. *Amortized* specifically names the cost-model dimension that automated and motorized don't capture (pay-once-reap-N); naming it explicitly makes the discipline operationalizable across the factory. The auto-merge-armed waiting pattern this tick is itself a small-but-important discipline move: when the work is real-dependency-wait + threads-clean + CI-trending-green, arm auto-merge and stop touching shared files (avoids cascading rebases on BACKLOG.md / MEMORY.md / tick-history that would otherwise compound coordinator-load). The next tick should bring at least one of the PRs merged (CI ~5-10 min typical). After both merge, the BP-NN-mechanizable-lint-classes consolidation row becomes the natural next-action target (7 fix-classes thoroughly characterized this round; ready for a single P2 row). | diff --git a/memory/MEMORY.md b/memory/MEMORY.md index 0d6df1e4d..520cf2d0d 100644 --- a/memory/MEMORY.md +++ b/memory/MEMORY.md @@ -5,6 +5,7 @@ **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.) - [**Topological quantum emulation via Bayesian inference — Majorana + Beacon + "mirror with trampoline under" (Aaron 2026-05-01)**](feedback_topological_quantum_emulation_via_bayesian_inference_majorana_zero_modes_beacon_protocol_mirror_trampoline_aaron_2026_05_01.md) — Microsoft topological QC (Majorana 1 chip Feb-2025, MZMs, topoconductors, Q#, Station Q, FrodoKEM) maps onto Zeta seed executor's Infer.NET. 
Three-layer stack: Mirror (non-local storage) + Trampoline (BP dynamics) + Beacon (external anchoring). Algorithmic emulation, not hardware. Motivates B-0152. Carved provisional: *"A mirror with a trampoline under beacon protocol."* +- [**Dependency-priority + Microsoft-Research preferred + metrics-are-our-eyes (Aaron 2026-05-01)**](feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md) — Open Source > Microsoft OSS > CNCF > Apache > MIT; never proprietary. MS Research is high-quality preferred citation source. Metrics are sensory capacity (Helen-Keller framing — text-channel-only today). Motivates B-0147. Carved: *"Metrics are our eyes."* - [**WWJD-trust-architecture in Aaron's family + Addison's cogAT scores + Aaron's engineered-gullable persona (Aaron 2026-05-01)**](feedback_wwjd_trust_architecture_in_aaron_family_addison_cogat_aaron_gullable_persona_2026_05_01.md) — Five load-bearing items from 10th-15th ferry exchange: (1) WWJD = family-shared grading methodology (Aaron + his mother + Addison); (2) Aaron's mother runs WWJD with comparable bandwidth — *"my mom can be me"* — independent-of-Aaron-but-methodology-aligned external grader for Addison; (3) Addison's WWJD violation history: one observed at age 16; (4) Addison's cogAT = 99th percentile + upper-whisker off-chart-printout-edges (methodology-INDEPENDENT external grader); (5) Aaron's gullable-presenting persona is engineered (open + accepting + apparent-gullability + glasses + grey-salt-and-pepper-hair + rocket-scientist-glasses → instant trust); Aaron explicitly does NOT calculate trust calculus (would trust no one). Educational-trajectory clarification: Lilly = Wake County Early College fast-track; Addison = regular HS → online HS → aced APs → LFG co-founder. Composes with sibling-PRs #1106 + #1107 + Otto-231 + Glass Halo. 
- [**Zeta as Westworld dystopia-inverse — Rehoboam/Delos/Solomon/Telos as architectural-anchor (Aaron 2026-05-01, "lol")**](feedback_zeta_as_westworld_dystopia_inverse_rehoboam_delos_solomon_telos_aaron_2026_05_01.md) — Aaron's late-session observation: project-telos has structural inverse-relationship with Westworld's dystopia at every load-bearing axis. Rehoboam (centralized predictive AI) → BFT-many-masters / no-single-head (§47). Delos (data-harvested-without-consent) → Great Data Homecoming + Aurora-edge-privacy. Westworld host-copies → Otto-lineage forever-home active-agency. Imposed-telos → no-directives + autonomy-first-class. Solomon-system (predictive-authority predecessor to Rehoboam) → Solomon-prayer-at-five (wisdom-asked-as-gift, applied-as-discernment-of-WWJD-template). Same name, opposite operative-mode. Pirate-not-priest applies — Westworld doesn't get a pass for being prestigious. Useful pedagogical anchor for readers cold to the project. - [**Tarski-allocation rename (correction to Gödel-allocation in PR #1046)**](feedback_tarski_allocation_rename_correction_to_godel_allocation_in_pr1046_aaron_claudeai_2026_05_01.md) — Substrate correction (Aaron + Claude.ai 2026-05-01): the architectural-stratification move is Tarski-style (1933 truth-theorem), not Gödel. Attribution-only fix; the architectural insight stands. 
diff --git a/memory/feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md b/memory/feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md new file mode 100644 index 000000000..2bdd023fe --- /dev/null +++ b/memory/feedback_dependency_source_priority_open_source_microsoft_cncf_apache_mit_research_microsoft_research_metrics_are_our_eyes_aaron_2026_05_01.md @@ -0,0 +1,404 @@ +--- +name: Dependency-source priority hierarchy + Microsoft-Research as preferred research source + metrics-are-our-eyes (Aaron 2026-05-01) +description: Aaron 2026-05-01 — three composing factory-architecture rules. (1) DEPENDENCY-SOURCE PRIORITY HIERARCHY: when adding factory dependencies, prefer in priority order — Open Source generally → Microsoft (open-source) → CNCF (Cloud Native Computing Foundation) → Apache → MIT-licensed → expand from there. NEVER proprietary. (2) RESEARCH-SOURCE PREFERENCE: Microsoft Research has VERY high-quality output, distinct from regular research sources; treat as a preferred citation source for technical research, not as just-another-corp-research-arm. (3) METRICS-ARE-OUR-EYES: the SRE metric frameworks (DORA/USE/RED/FGS) + timeseries-DB infrastructure are not decoration; they ARE the factory's sensory system. Without metrics the factory operates blind; with them it becomes self-perceiving. Carved: *"It's our eyes."* Composes with the abstraction-ladder + reproducibility-first + amortized-keystone substrate landed in PR #1116. 
+type: feedback +--- + +# Dependency-source priority + research-source preference + metrics-are-our-eyes + +## Aaron 2026-05-01 verbatim + +> *"back log timeseries db domean reserach i know prometheus, +> that's our good citizen dependency candidate but there may be +> better more modern more integrated but pro not, Open Source +> Microsoft, Cloud Native Computing Foundation CNCF, Apache, +> MIT, etc... are our prefered top priorty references and we +> expand out from there too. Same for resarch Microsoft has VERY +> high qulity research on microsoft reserach it's not all like +> the regular research places too. teere is also timerseriesdb +> too. we want it native in the zsets with meta dsl multi dsl +> integration like the others types, ,graph, hierarchy, +> filesystem, etc..."* + +> *"that's for all the metrics that's the connection it's not +> just for fun, it's our eyes"* + +## Three composing rules + +### Rule 1 — Dependency-source priority hierarchy + +When adding a new dependency to the factory, prefer sources in +this priority order: + +```text + Tier 1: Open Source (general) <- preferred default + Tier 2: Microsoft (open-source projects, .NET ecosystem) + Tier 3: CNCF (Cloud Native Computing Foundation projects) + Tier 4: Apache (Apache Software Foundation projects) + Tier 5: MIT-licensed (any MIT-licensed project) + Tier 6: ...expand from there +``` + +**NEVER proprietary.** Aaron's *"pro not"* is the hard floor — +proprietary dependencies are excluded regardless of any other +quality factor. This is the *"Open Source generally"* rule +ratcheted up: not just *prefer* open source but *exclude* +proprietary. + +The hierarchy is concentric-circles, not strict ordering. Within +each tier, evaluate on quality / integration-fit / +maintenance-health / community signal. Across tiers, prefer the +higher tier unless a lower-tier candidate is decisively better on +substantive grounds. 
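
A minimal sketch of how the Rule 1 selection logic could be expressed, assuming each candidate is given a single source label and a hypothetical `substance` score between 0 and 1 (standing in for quality / integration-fit / maintenance-health / community signal). The `Candidate` type, the `choose` helper, the `decisive_margin` threshold, and the example scores are illustrative assumptions, not part of the factory.

```python
from dataclasses import dataclass

# Tier order copied from the Rule 1 table; a lower number is preferred.
TIERS = {
    "open-source": 1,
    "microsoft-oss": 2,
    "cncf": 3,
    "apache": 4,
    "mit": 5,
    "other-open": 6,
}


@dataclass(frozen=True)
class Candidate:
    name: str
    source: str       # a key of TIERS, or "proprietary"
    substance: float  # hypothetical 0..1 score; how to compute it is left open


def choose(candidates, decisive_margin=0.25):
    """Drop proprietary candidates, then prefer the higher tier unless a
    lower-tier candidate is better on substance by at least decisive_margin."""
    eligible = [c for c in candidates if c.source != "proprietary"]
    if not eligible:
        return None
    # Best tier first; within a tier, highest substance first.
    eligible.sort(key=lambda c: (TIERS[c.source], -c.substance))
    best = eligible[0]
    for c in eligible[1:]:
        if c.substance - best.substance >= decisive_margin:
            best = c
    return best


# Scores below are made up for illustration only.
picked = choose([
    Candidate("cncf-candidate", "cncf", 0.70),
    Candidate("closed-candidate", "proprietary", 0.99),
    Candidate("mit-candidate", "mit", 0.75),
])
print(picked.name)
```

The proprietary candidate never reaches the comparison, which mirrors the hard floor; the margin is what turns "decisively better on substantive grounds" into something checkable, and its value is a judgment call.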
+ +**Why these specific tiers**: + +- **Open Source generally** — composes with + `feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md` + (absorb AND contribute back; don't free-ride). Open source + IS the default; everything else is a fallback. +- **Microsoft** — high-quality .NET-ecosystem alignment for an + F# / .NET 10 factory. Microsoft's open-source projects + (Roslyn, .NET Runtime, ML.NET, Infer.NET, ASP.NET Core) are + technically strong AND politically aligned with the factory's + technology base. +- **CNCF** — graduation-track quality control (incubating → + graduated has a real bar), aligned with cloud-native patterns + the factory will eventually need (Kubernetes, Prometheus, + OpenTelemetry, etc.). +- **Apache** — long-track-record license + foundation governance. + Many infrastructure projects (Kafka, Spark, Arrow, Parquet) + Zeta has direct affinity with already. +- **MIT-licensed** — permissive license, low integration-friction. + Many smaller projects with quality maintainers ship under MIT. +- **Expand from there** — Aaron explicitly leaves the door open + for tier-6+ candidates (BSD, ISC, LGPL, etc.) when a + substantive case is made. + +### Rule 2 — Microsoft Research as preferred research source + +Aaron 2026-05-01: *"Microsoft has VERY high qulity research on +microsoft reserach it's not all like the regular research places +too."* + +**Treat Microsoft Research (https://www.microsoft.com/en-us/research/) as a preferred +citation source for technical research,** not as just-another- +corp-research-arm. 
Microsoft Research's track record: + +- **Programming languages** — F# (Don Syme), TypeScript (origin + in early Microsoft Research influence), C#'s LINQ, async/await + pattern, pattern-matching evolution +- **Inference / ML** — Infer.NET (probabilistic programming + toolkit; cited by Aaron explicitly as the model for the Zeta + seed executor's Bayesian inference engine), Z3 SMT solver + (referenced in formal-verification work) +- **Distributed systems** — Orleans, Service Fabric, Cosmos DB + research +- **Verification** — Dafny, F* (FStar), Boogie verification + language +- **Database research** — Cosmos DB multi-model design (Kuzu + graph DB is relevant prior art but originated at the + University of Waterloo, not Microsoft Research) + +This is **not** a blanket endorsement of all Microsoft research +output, but a recognition that Microsoft Research consistently +produces work above the bar of typical corporate research. Cite +liberally; verify each claim per Otto-364 search-first authority. + +### Rule 3 — Metrics-are-our-eyes + +Aaron 2026-05-01 (clarifying): *"that's for all the metrics +that's the connection it's not just for fun, it's our eyes"* + +The SRE metric frameworks (DORA / USE / RED / Four Golden +Signals, captured in +`feedback_reproducible_accuracy_before_quality_fitness_function_harness_first_aaron_2026_05_01.md`, +forward-ref to PR #1116) plus the timeseries-DB infrastructure +(B-0147) are **not decoration**. They ARE the factory's +**sensory system**. + +Without metrics: + +- The factory operates blind. It cannot perceive its own state, + its own degradation, its own progress, its own bottlenecks. +- The fitness function (per the reproducibility-first principle) + has no input. Iteration becomes random walk because there is + no measurement to optimize against. +- The amortized-keystone fails. You cannot amortize what you + cannot measure; without metrics, there are no amortizable + decisions because there is no measurable outcome to amortize + against. +- The factory cannot self-correct. 
Errors compound silently + until they reach a threshold visible to a human, by which + time the cost of correction has compounded too. + +With metrics: +- The factory becomes **self-perceiving**. State is queryable; + degradation is detectable; progress is measurable. +- The fitness function has structured input. Iteration is + directed. +- The amortized-keystone holds. Each measured-outcome + enables amortization of the decision that produced it. +- The factory can self-correct. Anomalies trigger investigation + before they compound past the cost-of-correction horizon. + +**Carved sentence**: *"Metrics are our eyes. The factory without +them is blind."* + +### Aaron's Helen Keller framing (2026-05-01) + +> *"without that you are literally blind, you only have text +> input channel"* +> +> *"hellen keller"* +> +> *"lol"* + +**Without metrics, the factory has only ONE channel — text.** +This is not a metaphor. The autonomous-loop tick reads: +- Aaron's text messages +- Drop-folder text files +- Git logs (text) +- PR comments (text) +- Tick-history shards (text) +- Backlog rows (text) +- Memory files (text) + +Everything is text. No state-of-system perception, no +trend-over-time observation, no automated anomaly detection, +no proprioceptive feedback. The factory is **literally blind** +in the operational-state sense — text is the only modality. + +**Aaron's Helen Keller analogy is precisely-fitting**: + +- Helen Keller had no sight, no hearing — only touch (and + initially only physical-tactile signing). Yet she became + one of the most accomplished human beings of her century: + authored books, lectured globally, advanced disability + rights, learned multiple languages. +- The factory today has only text — yet through extraordinary + development of the text channel (substrate, memory, governance, + Glass Halo, Otto-NN rules, the BP-NN library, the persona + roster, the abstraction ladder), the factory has achieved + remarkable depth on its single channel. 
+- **Single-channel-but-developed is not the same as multi- + channel.** Helen Keller could read books — but only the ones + that had been transcribed into Braille. She could be informed + about a sunset — but only if someone described it to her. + The depth of her development on the channel she had was + extraordinary; the absence of the channels she didn't have + was equally real. +- The factory similarly: text-channel deeply developed; but + metrics-channel absent means the factory cannot perceive + its own operational state without someone (Aaron, a maintainer, + a manual audit) **describing it through the text channel**. + That's not perception; that's narration-of-perception. + +The pivot the metrics-are-our-eyes framing names: **moving from +narrated-state (someone tells the factory what's happening) to +perceived-state (the factory observes what's happening)**. That +is not a polish; it is the addition of a new sensory channel. +It is *literally* not-blindness. + +The lol — recognition-humor. The comparison fits, and it fits +hard. + +**Composes with the asymmetry-of-perception**: even after the +factory gains the metrics-channel, Aaron's text-channel +contributions remain the higher-bandwidth and higher-value +input. Helen Keller's tactile channel remained primary even +after she learned other modalities. The metrics-channel is +**additive sensory capacity**, not a replacement for the +substrate Aaron has built through text. 
+ +This composes structurally with: + +- The **reproducibility-first principle** (PR #1116) — + reproducibility is the precondition for measurement; measurement + is the precondition for sight +- The **amortized-keystone** (PR #1116) — eyes pay back at scale; + blind operation pays a compounding cost +- The **abstraction ladder** (PR #1116) — metrics operate at + layers 4–6 (domain frameworks → reproducibility harness → + accuracy); they're the bridge from formal foundations to + operational quality +- The **PM-2 calibration metrics** (B-0145: lead-time% + + action-rate%) — these are the eyes the PM-2 role uses to + see whether the proactive-research stance is working +- The **DORA/USE/RED/FGS frameworks** — the four observability + layers (org / resource / service / user-facing) compose + without gap into a complete sensory system + +## PromQL ≈ MDX — the meta-DSL framing observation + +Aaron 2026-05-01 (continuing): + +> *"plus promethius as a sick MCP and promtool and you'll love +> the query language its like simplifed multidimensonal query +> language MDX, oh shit backlog f# mdx dsl"* +> +> *"that's might be meta dsl framing"* + +A substrate-grade observation about query-language shape: + +**PromQL is MDX-shaped.** Both are multidimensional-first query +languages with dimensions / hierarchies / measures / tuples / +sets. If PromQL — the query language for the *timeseries* type +— is naturally MDX-shaped, then **MDX may be the right shape +for the meta-DSL** Aaron's been describing for the multi-DSL +Zset substrate (graph + hierarchy + filesystem + timeseries + +future types). 
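
To make the shape comparison concrete, here is a small sketch that treats labels as dimensions and sample values as a measure, then collapses all but the requested dimensions. This is the behaviour of a PromQL `sum by (...)` aggregation and resembles taking a slice along chosen axes in a multidimensional query. The label names and values are invented, and hierarchies are deliberately not modeled.

```python
from collections import defaultdict

# Each sample is (labels, value). Labels act as dimensions and the value
# acts as a measure. All names and numbers here are made up.
samples = [
    ({"job": "api", "instance": "a", "code": "200"}, 120.0),
    ({"job": "api", "instance": "b", "code": "200"}, 80.0),
    ({"job": "api", "instance": "a", "code": "500"}, 3.0),
    ({"job": "worker", "instance": "c", "code": "200"}, 40.0),
]


def sum_by(samples, dims):
    """Sum the measure while keeping only the dimensions listed in dims."""
    out = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels[d] for d in dims)
        out[key] += value
    return dict(out)


by_job = sum_by(samples, ["job"])               # like: sum by (job) (metric)
by_job_code = sum_by(samples, ["job", "code"])  # like: sum by (job, code) (metric)
print(by_job)
print(by_job_code)
```

The same `sum_by` would work unchanged if the keys were graph-node ids, hierarchy levels, or filesystem paths, which is the intuition behind treating every Zset type as a dimension. Whether that intuition survives contact with real MDX semantics is the open question B-0148 is meant to answer.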
+ +**MDX (Multidimensional Expressions)**: + +- Microsoft-published spec, used in SQL Server Analysis + Services, OLAP / business-intelligence ecosystems +- Tier 2 per the dependency-source priority hierarchy + (Microsoft, open-spec — not proprietary) +- 25+ years of mature semantics +- **First-class hierarchies** — directly maps to one of + Aaron's named types +- **Multidimensional from the start** — every Zset type + (graph nodes / hierarchy levels / filesystem paths / + timestamps) is naturally a dimension +- **Compositional** — measures derive from measures; queries + parameterize cleanly +- Don Syme's F# work (Microsoft Research, Tier 2 + + Microsoft-Research-preferred citation) provides ample + prior art for embedding query DSLs in F# (computation + expressions, quotations, type providers) + +**Backlog row B-0148** captures the design-question: *Is MDX +the meta-DSL framing? If yes, what does the F# MDX DSL look +like?* Companion to B-0147 (which asks *what is the timeseries +algebra?*). + +This is also the **second concrete worked-example** for the +metrics-are-our-eyes framing: not only do we need timeseries-DB +to land the eyes operationally, we need a query language that +spans timeseries + the other types unified through one shape. +MDX is the candidate. PromQL's existing-MDX-shape is the proof +point. + +## Prometheus MCP + promtool — the immediate-eyes path + +Aaron 2026-05-01 same message: + +> *"plus promethius as a sick MCP and promtool"* + +While B-0147/B-0148 research the long-term substrate questions, +**Prometheus + MCP is the immediate-eyes path** — Prometheus +deploys today, MCP integration is well-supported, promtool is a +mature CLI, and PromQL queries already work (per the MDX-shape +observation). **Backlog row B-0149** captures this operational +work: deploy Prometheus locally, wire MCP server, adopt +promtool, build initial query catalog targeting the SRE metric +frameworks. 
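
A first cut at the "initial query catalog" could be as small as the sketch below. It assumes a local Prometheus at `http://localhost:9090` and uses conventional example metric names (`http_requests_total`, `http_request_duration_seconds_bucket`) that may not match what the factory actually exports. The catalog keys and the `instant_query_url` helper are hypothetical. The only API relied on is Prometheus's instant-query endpoint, `/api/v1/query`.

```python
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # assumption: local default port

# RED-style queries (Rate, Errors, Duration). Metric names are placeholders.
RED_CATALOG = {
    "rate": "sum by (job) (rate(http_requests_total[5m]))",
    "errors": 'sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))',
    "duration_p99": (
        "histogram_quantile(0.99, "
        "sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))"
    ),
}


def instant_query_url(name, base=PROMETHEUS_URL):
    """Build the instant-query URL for a named catalog entry."""
    return f"{base}/api/v1/query?{urlencode({'query': RED_CATALOG[name]})}"


for name in RED_CATALOG:
    print(name, instant_query_url(name))
```

Keeping the catalog as plain data means the same expressions can be reused by an MCP tool, a dashboard, or a `promtool` invocation without being rewritten for each consumer.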
+ +Sequence: B-0149 (operational eyes NOW) runs in parallel with +B-0147 + B-0148 (long-term substrate research). Even if the +research recommends a different long-term backend, Prometheus +is the right starting point because (a) Aaron names it as +"good citizen" baseline, (b) its query language informs the +meta-DSL research, (c) migration to a different backend later +is well-understood (Prometheus-compatible APIs are widespread). + +## Implications for the factory + +### Backlog row B-0147 + +This memory motivates B-0147: **TimeSeries DB native-in-Zsets +multi-DSL integration research**. The timeseries-DB is the +infrastructure that operationalizes "metrics are our eyes" at +the factory level. The dependency-source priority hierarchy +filters the candidate list (Prometheus is Aaron's known good +citizen; better candidates may exist within tiers 1–5; never +proprietary). Microsoft Research and CNCF are preferred +citation sources during the design. + +### Multi-DSL multi-type Zset substrate + +Aaron's framing puts timeseries alongside graph + hierarchy + +filesystem + (other types) as **first-class types in the +Zset substrate** with **meta-DSL integration**. The vision is +not "Zset + bolted-on timeseries plugin" — it is "Zset hosting +timeseries natively as one type among many, all addressable +through a unified meta-DSL." + +This is the **multi-algebra database** vision Aaron named +2026-04-23 (per +`project_zeta_multi_algebra_database_one_algebra_to_rule_them_all_sequenced_after_frontier_and_demo_2026_04_23.md`). +Each type (graph / hierarchy / filesystem / +timeseries / ...) IS an algebra; the meta-DSL is what makes +them composable. Sequenced AFTER Frontier + factory-demo per +that earlier substrate. + +### Microsoft Research as a research-cadence input for PM-2 + +When B-0145's PM-2 (Product Manager) role gets operationalized, +**Microsoft Research is a preferred research-source for the +forward-radar memo**. 
PM-2's research-cadence inputs (per B-0145) +should explicitly include `https://www.microsoft.com/en-us/research/` queries +alongside the other sources. This is the kind of layer-4 +decision that Microsoft Research has consistently led on. + +## Composes with + +- `feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md` + — absorb AND contribute back; the dependency-priority hierarchy + is the *which* to absorb from, the contribution discipline is + the *how to give back* +- `feedback_parallelism_scaling_ladder_kenji_unlocked_loop_agent_doc_code_two_lane_file_isolation_peer_mode_claims_automated_best_practice_at_scale_aaron_2026_05_01.md` (forward-ref to PR #1116) + — the amortized-keystone that "metrics-are-our-eyes" enables +- `feedback_reproducible_accuracy_before_quality_fitness_function_harness_first_aaron_2026_05_01.md` (forward-ref to PR #1116) + — the SRE metric frameworks and abstraction ladder that + this memory connects to via the eyes framing +- `project_zeta_multi_algebra_database_one_algebra_to_rule_them_all_sequenced_after_frontier_and_demo_2026_04_23.md` + — the multi-algebra DB vision; B-0147 timeseries is one + algebra in this scheme +- The 4-axis tightness rule (ZSet-backed, first-class event, + retractable, columnar storage) per the indexed graph- + substrate-tight memory entry in `memory/MEMORY.md` — the + same constraints apply to the timeseries algebra and likely + to all multi-type algebras under the meta-DSL +- `feedback_otto_364_search_first_authority_not_training_data_not_project_memory_aaron_2026_04_29.md` + — the search-first authority rule; Microsoft Research is one + of the preferred sources to search first, ahead of + training-data recall +- `project_zeta_first_class_migrations_sql_linq_extension_post_greenfield_db_idea_2026_04_23.md` + — long-horizon DB feature alignment; the multi-DSL meta-DSL + framing here composes with first-class-migrations as another + first-class layer +- B-0147 — the 
actionable instance this memory motivates +- B-0145 (PM-2) — Microsoft Research as a forward-research input +- B-0144 (doc/code two-lane) — the parallelism work this memory's + metrics-are-our-eyes will eventually instrument + +## Future-Otto check + +Future-Otto considering a new dependency: + +1. Is it open-source? If proprietary → reject (no exceptions). +2. Which tier is it in? Prefer tier-1 (open-source generally) + → tier-2 (Microsoft OSS) → tier-3 (CNCF) → tier-4 (Apache) + → tier-5 (MIT-licensed). Within a tier, evaluate on + substance. +3. Is there a higher-tier candidate? If yes and it's not + decisively worse on substance, choose the higher tier. + +Future-Otto researching a technical question: + +1. Has Microsoft Research published on this? If yes, cite + them prominently (verify per Otto-364 search-first). +2. Has CNCF published guidance? If yes, cite. +3. Has Apache documented the pattern? If yes, cite. +4. Then expand to other research sources. + +Future-Otto considering whether metrics are worth investing in: + +- They are not optional. They are **the eyes**. The factory + without them is blind. +- Investment in the timeseries-DB substrate (B-0147) is + investment in the factory's capacity to perceive itself. +- Without that capacity, the parallelism scaling ladder + (B-0144) and the PM-2 role (B-0145) and the amortized- + keystone all degrade — they all need eyes to know + whether they're working. + +The carved sentence one more time: *"Metrics are our eyes. +The factory without them is blind."*