From 6ce21825fc6f6b1d6e60bb12b4312c6659fe01e4 Mon Sep 17 00:00:00 2001
From: Lior
Date: Tue, 26 May 2026 19:27:37 -0400
Subject: [PATCH] =?UTF-8?q?docs(backlog):=20B-0836=20=E2=80=94=20hardware-?=
 =?UTF-8?q?inventory-vs-cluster=20reconciliation=20+=20buying-decisions=20?=
 =?UTF-8?q?substrate=20(no=20more=20buying=20willy=20nilly)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per operator 2026-05-26 composed from two messages:
1. "git for source of truth and coackroach can be repopulated from"
2. "we will also have an inventory for every machine and know if some
   are missing registration when she is done with her hardware inventory
   work. and know what and how we need to expand so we are not buying
   willy nilly anymore."

Files B-0836 as P1 substrate-engineering target. 4-phase decomposition:
- Phase 1: Addison's CSV → DuckDB ingestion (immediate; doesn't need cluster)
- Phase 2: tools/cluster/reconcile-inventory-vs-cluster.ts (this row core;
  surfaces 3 gap types — missing-registration / phantom-node /
  expansion-buying-decision)
- Phase 3: CockroachDB ingestion when cluster operational (materialized view
  from git source of truth; repopulates anytime)
- Phase 4: tools/cluster/buying-recommendations.ts (closes the loop;
  data-driven purchase decisions)

Architecture: git (B-0812 cluster-nodes + Addison inventory) is source of
truth; CockroachDB is materialized view; reconciliation diffs both sides;
buying decisions informed by capacity-gap analysis.

Composes_with:
- B-0812 iter-5.4.1 (cluster-side data source; PR #5352 in flight)
- B-0794 parent (full GitOps cluster bring-up)
- B-0782 cluster-IS-DIO (git source of truth)
- B-0789 cluster-as-PR-author
- Addison's hardware-inventory paper-audit work (Phase 1 ingestion)
- 2026-05-26 physical hardware-support test session

Highest-value operator outcome: shifts hardware-purchase decisions from
"guess what we need" to "data says we need N more of make/model X for
workload Y."
Materially affects operator cost-management.

Co-Authored-By: Claude Opus 4.7
---
 docs/BACKLOG.md                               |   1 +
 ...lysis-buying-decisions-aaron-2026-05-26.md | 156 ++++++++++++++++++
 2 files changed, 157 insertions(+)
 create mode 100644 docs/backlog/P1/B-0836-hardware-inventory-vs-cluster-reconciliation-gap-analysis-buying-decisions-aaron-2026-05-26.md

diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index 9b623aaece..58c726e71e 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -395,6 +395,7 @@ are closed (status: closed in frontmatter)._
 - [ ] **[B-0831](backlog/P1/B-0831-ci-cascade-6-full-install-plus-cluster-auto-join-eliminate-routine-human-physical-usb-test-aaron-2026-05-26.md)** CI cascade #6 — full-install-and-cluster-auto-join (post-boot install completes; node self-registers; eliminates routine human physical USB test) (Aaron 2026-05-26)
 - [ ] **[B-0833](backlog/P1/B-0833-installer-interactive-login-vs-baked-in-keys-ci-test-tension-resolve-without-shipping-credentials-aaron-2026-05-26.md)** installer interactive-login vs baked-in-keys CI-test tension — resolve without shipping credentials on ISO (operator 2026-05-26 from physical hardware-support test)
 - [ ] **[B-0835](backlog/P1/B-0835-installer-config-bugs-cluster-hostname-not-unique-gh-auth-not-respected-banner-password-disclosure-empirical-aaron-2026-05-26.md)** installer config-bugs cluster — hostname not unique (shows control-plane); gh login not respected; login banner shows password text (default OR custom) (empirical from 2026-05-26 physical hardware-support test) (Aaron 2026-05-26)
+- [ ] **[B-0836](backlog/P1/B-0836-hardware-inventory-vs-cluster-reconciliation-gap-analysis-buying-decisions-aaron-2026-05-26.md)** hardware-inventory-vs-cluster reconciliation + gap-analysis → buying decisions (no more buying willy nilly) (Aaron 2026-05-26)
 
 ## P2 — research-grade
 
diff --git a/docs/backlog/P1/B-0836-hardware-inventory-vs-cluster-reconciliation-gap-analysis-buying-decisions-aaron-2026-05-26.md b/docs/backlog/P1/B-0836-hardware-inventory-vs-cluster-reconciliation-gap-analysis-buying-decisions-aaron-2026-05-26.md
new file mode 100644
index 0000000000..c3564e6463
--- /dev/null
+++ b/docs/backlog/P1/B-0836-hardware-inventory-vs-cluster-reconciliation-gap-analysis-buying-decisions-aaron-2026-05-26.md
@@ -0,0 +1,156 @@
+---
+id: B-0836
+priority: P1
+status: open
+title: hardware-inventory-vs-cluster reconciliation + gap-analysis → buying decisions (no more buying willy nilly) (Aaron 2026-05-26)
+effort: M
+ask: aaron 2026-05-26
+created: 2026-05-26
+last_updated: 2026-05-26
+depends_on:
+  - B-0812
+composes_with:
+  - B-0794
+  - B-0782
+tags: [hardware-inventory, cluster-state, gap-analysis, reconciliation, cockroachdb, git-source-of-truth, addison-substrate, buying-decisions, operational]
+---
+
+## Problem
+
+Per operator 2026-05-26 (composed from two messages during the
+2026-05-26 physical hardware-support test session):
+
+> "git for source of truth and coackroach can be repopulated from"
+
+> "we will also have an inventory for every machine and know if some
+> are missing registration when she is done with her hardware inventory
+> work. and know what and how we need to expand so we are not buying
+> willy nilly anymore."
+
+Two substrate-engineering targets composed:
+
+1. **Hardware-inventory substrate** (Addison's work): authoritative
+   list of every physical machine the operator owns; populated from
+   her paper-audit + scan → DuckDB/SQLite → eventually CockroachDB
+   when the cluster is operational
+2. **Inventory-vs-cluster reconciliation substrate** (this row):
+   diff the inventory against the actually-self-registered cluster
+   nodes (from B-0812 iter-5.4.1 git substrate) + surface gaps in
+   both directions
+
+## Three operational questions the reconciliation answers
+
+| Question | Inventory side | Cluster side | Action |
+|---|---|---|---|
+| **Missing registration?** | Machine X exists in inventory | No `maintainers//cluster-nodes/X/node.yaml` on git | Either X isn't deployed yet OR self-registration failed; investigate |
+| **Phantom node?** | Machine X not in inventory | `maintainers//cluster-nodes/X/node.yaml` on git | Either inventory is stale OR an unknown machine registered; investigate |
+| **Expansion-buying-decision?** | Inventory + cluster utilization metrics | Workload demand + planned features | What hardware to buy NEXT — answer informed by data instead of guesswork |
+
+## Architecture (composed substrate)
+
+```
+┌──────────────────┐      ┌───────────────────┐      ┌────────────────────┐
+│ Addison's        │      │ B-0812 iter-5.4.1 │      │ Reconciliation     │
+│ hardware-        │      │ self-registration │      │ (this row B-0836)  │
+│ inventory        │      │                   │      │                    │
+│ paper-audit →    │      │ Node boots →      │      │ Diff inventory     │
+│ scan → CSV →     │      │ install → opens   │      │ vs cluster-nodes/  │
+│ DuckDB/SQLite →  │      │ PR to maintainers/│      │ Surface gaps in    │
+│ CockroachDB      │      │ /cluster-         │      │ both directions    │
+│ (when up)        │      │ nodes//           │      │                    │
+└──────────────────┘      └───────────────────┘      └────────────────────┘
+         │                          │                          │
+         └──────────┬───────────────┴──────────────────────────┘
+                    ▼
+         Git is source of truth
+  (CockroachDB repopulates from git when needed)
+  (Inventory + cluster-state both live in queryable form)
+```
+
+## Proposed phases
+
+### Phase 1 — inventory schema + ingestion (Addison's path)
+
+Once Addison's paper-audit is scanned to CSV:
+
+- Schema: machine_id (operator-assigned) + make/model/SN + CPU/RAM/storage/NIC/GPU specs + location + status (in-service / spare / dead / planned-purchase)
+- Ingestion: small TS script `tools/cluster/import-inventory.ts` reads CSV → DuckDB (`tools/cluster/inventory.duckdb`; gitignored), OR commits as `inventory//hardware-inventory.csv` for git-source-of-truth
+- Operator can query via `duckdb -c "SELECT * FROM machines WHERE status='spare'"` immediately; CockroachDB ingestion deferred until cluster operational
+
+### Phase 2 — reconciliation tool (this row's core)
+
+`tools/cluster/reconcile-inventory-vs-cluster.ts`:
+
+1. Read inventory CSV from git source-of-truth (OR DuckDB)
+2. Read all `maintainers/*/cluster-nodes/*/node.yaml` from git
+3. Compute set-diffs in both directions:
+   - missing-from-cluster (in inventory; not registered)
+   - phantom-in-cluster (registered; not in inventory)
+4. Emit a status report (markdown table OR JSON for tool composition)
+5. Optional: open PR with the report on each run (audit trail)
+
+### Phase 3 — CockroachDB ingestion when cluster operational
+
+After cluster is up + CockroachDB deployed (post-B-0812 PRs merged + ArgoCD reconciled + storage backend ready):
+
+- Materialize git source-of-truth into CockroachDB via ingestion job (`tools/cluster/sync-git-to-crdb.ts`)
+- Run on a schedule (per-hour OR per-PR-merge-webhook)
+- Operator queries via SQL against CockroachDB (PostgreSQL wire-protocol; standard tooling works)
+- Addison's queries shift from DuckDB → CockroachDB transparently (same SQL)
+
+### Phase 4 — buying-decision substrate (closes the loop)
+
+`tools/cluster/buying-recommendations.ts`:
+
+1. Read cluster utilization metrics (CPU / RAM / storage / GPU saturation per node)
+2. Read planned-workload list (manually maintained OR from k8s manifests in git)
+3. Compute capacity-gap = workload-demand minus inventory-capacity
+4. Emit recommended-purchases list (specific make/model/qty informed by what the workloads need)
+5. Operator reviews + approves; no more "buying willy nilly"
+
+## Acceptance
+
+Phased acceptance:
+
+- **Phase 1 acceptance**: Addison's CSV imports into DuckDB; basic queries work
+- **Phase 2 acceptance**: reconcile-inventory-vs-cluster.ts emits accurate gap reports in both directions; tested with synthetic inventory + cluster state
+- **Phase 3 acceptance**: CockroachDB ingestion runs on schedule; same queries return same results as DuckDB; rebuild-from-git tested + works
+- **Phase 4 acceptance**: buying-recommendations.ts emits actionable purchase list informed by real data; operator's next purchase decision is data-driven not guesswork
+
+## Composes with
+
+- **B-0812** iter-5.4.1 (this row's cluster-side data source; ships Step 6.9 of zeta-install.sh)
+- **B-0794** parent (full GitOps cluster bring-up; inventory-reconciliation is a downstream value-add)
+- **B-0782** cluster-IS-DIO (git is source of truth; CockroachDB is materialized view)
+- **B-0789** cluster-as-PR-author (reconciliation tool could also open PRs for inventory updates)
+- Addison's hardware-inventory paper-audit work (Phase 1 ingestion target)
+- The 2026-05-26 substrate-engineering session (operator's git-source-of-truth + CockroachDB-repopulates-from-git architecture)
+
+## Substrate-honest framing
+
+This row depends on B-0812 iter-5.4.1 LANDING + the cluster being
+operational (post-installs with self-registration). Phase 1 (Addison's
+inventory ingestion) can start IMMEDIATELY once her scan completes;
+Phase 2 (reconciliation) needs at least one B-0812 self-registration
+PR merged so there's cluster-side state to diff against; Phases 3+4
+need the cluster operational (CockroachDB deployed).
+
+The buying-decision payoff is the highest-value operator outcome —
+shifts hardware-purchase decisions from "guess what we need" to
+"data says we need N more of make/model X for workload Y." This
+composes with the broader homelab-first + cost-conscious operator
+substrate.
+
+## Origin
+
+Two operator messages 2026-05-26 (during physical hardware-support
+test session that also produced B-0832/B-0833/B-0834/B-0835):
+
+1. "git for source of truth and coackroach can be repopulated from"
+2. "we will also have an inventory for every machine and know if some
+   are missing registration when she is done with her hardware
+   inventory work. and know what and how we need to expand so we are
+   not buying willy nilly anymore."
+
+Files this as P1 substrate target — directly enables data-driven
+buying decisions which materially affects operator cost-management.
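
Reviewer note (not part of the patch): the Phase 2 set-diff this row describes could be sketched roughly as below. This is a minimal sketch, not the contents of `tools/cluster/reconcile-inventory-vs-cluster.ts`. The record shapes, the `reconcile` function name, and the assumption that each registered node can be keyed by the same `machine_id` the inventory uses are all hypothetical; the row itself only specifies the two gap directions.

```typescript
// Hypothetical record shapes: the row names a machine_id column but does not
// fix how a node.yaml maps back to it, so this keying is an assumption.
interface InventoryMachine {
  machineId: string; // operator-assigned machine_id from the inventory CSV
}

interface RegisteredNode {
  machineId: string; // assumed recoverable from the cluster-nodes path or node.yaml
}

interface GapReport {
  missingFromCluster: string[]; // in inventory; not registered
  phantomInCluster: string[]; // registered; not in inventory
}

// Compute the set difference in both directions and return sorted id lists
// so the report is stable from run to run.
function reconcile(
  inventory: InventoryMachine[],
  nodes: RegisteredNode[],
): GapReport {
  const inventoryIds = new Set(inventory.map((m) => m.machineId));
  const nodeIds = new Set(nodes.map((n) => n.machineId));

  const missingFromCluster = Array.from(inventoryIds)
    .filter((id) => !nodeIds.has(id))
    .sort();
  const phantomInCluster = Array.from(nodeIds)
    .filter((id) => !inventoryIds.has(id))
    .sort();

  return { missingFromCluster, phantomInCluster };
}

// Synthetic example data, in the spirit of the Phase 2 acceptance criterion.
const report = reconcile(
  [{ machineId: "m-001" }, { machineId: "m-002" }],
  [{ machineId: "m-002" }, { machineId: "m-099" }],
);
console.log(JSON.stringify(report, null, 2));
```

With the synthetic data above, `m-001` lands in `missingFromCluster` and `m-099` in `phantomInCluster`. Emitting JSON keeps the output composable with other tools; a markdown-table renderer could be layered on top of the same `GapReport` without changing the diff logic.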