Skip to content

feat(ai-cluster): cookie-cutter node provisioning via disko + Longhorn multi-disk#4950

Merged
AceHack merged 1 commit into
mainfrom
feat/disko-cookie-cutter-2026-05-25-c2
May 25, 2026
Merged

feat(ai-cluster): cookie-cutter node provisioning via disko + Longhorn multi-disk#4950
AceHack merged 1 commit into
mainfrom
feat/disko-cookie-cutter-2026-05-25-c2

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 25, 2026

Summary

Declarative end-to-end node provisioning: copy template → edit six lines → boot USB → disko + nixos-install → cluster member. No hand-partitioning, no per-host shell scripts. Adding a new identical box is ~10 minutes wall-clock.

What lands

  • nixos/modules/disko-shapes/2nvme.nix — cookie-cutter shape for boxes with 2 equal NVMes. Uses size = "100%" for Longhorn partitions so the shape works at any disk size (handles "1TB is never really 1TB"). Layout: nvme0 = 1G ESP + 256G root + rest Longhorn; nvme1 = whole-disk Longhorn. OS + bootloader live on nvme0 only — the bootloader-on-untouched-disk failure mode from manual install is structurally impossible since disko --mode disko wipes both drives first and only one disk has an ESP.
  • nixos/modules/longhorn-disks.nix — wires the shape's mountpoints to Longhorn's node-level data-path catalog. Takes zeta.longhorn.dataDisks list, ensures mount dirs exist, emits /etc/longhorn/node-disks.yaml for the cluster-side patch Job, adds the zeta.io/longhorn-disks=N K3S node label.
  • nixos/hosts/worker-template/default.nix — composes shape + longhorn module + k3s-agent/gpu/docker. Six clearly-marked PLACEHOLDER blocks (hostName, hostId, two by-id disk symlinks, network, SSH key).
  • flake.nix — disko input + worker-template nixosConfiguration + new modules exposed under nixosModules.{disko-shape-2nvme, longhorn-disks}.
  • usb-nixos-installer/.../configuration.nix — bakes disko into the ISO so the installer doesn't need network access to fetch it.
  • PROVISIONING.md — end-to-end cookie-cutter workflow + disk-failure recovery + multi-shape extension guide.

Why this is the right shape for symmetric 2×NVMe

Discussed in conversation:

  • Longhorn already replicates cross-node (default replica count 3), so intra-node RAID is wasted capacity at cluster scope
  • K3S + GPU workloads → swap is a footgun (OOM-kill > thrashing), so no swap partition
  • Container image cache (CUDA + vLLM + Ollama + ArgoCD + Cilium + SPIRE + Vault + Hindsight layers...) easily hits 50+ GB → 256 GB root not 80 GB
  • Using both NVMes for Longhorn data paths (vs mirroring) gives \~1.7 TB usable per node and lets a single-disk failure isolate to one Longhorn path while OS keeps running

Future-shape extension

disko-shapes/2nvme.nix is one shape. Future hardware classes get their own file (4nvme.nix, nvme-plus-sata.nix, etc.) matching the same zeta.disko options pattern. The Longhorn module is shape-agnostic — takes a list of mount paths, doesn't care how many disks contributed them.

Test plan

  • nix flake check from full-ai-cluster/ passes after nix flake update adds the disko input
  • nix build .#installer-iso succeeds (disko gets baked in)
  • On a target box: USB boot → disko --mode disko --flake .#worker-template (with placeholder disk IDs replaced) wipes + partitions + formats + mounts both drives without errors
  • nixos-install --flake .#worker-template completes
  • After reboot: lsblk shows the expected layout; mount | grep longhorn shows both data paths
  • After node joins cluster: kubectl get nodes shows the new node; kubectl -n longhorn-system get nodes.longhorn.io <hostname> -o yaml shows both disks entries

Open follow-ups (not blockers for this PR)

  • Cluster-side Job/DaemonSet to consume /etc/longhorn/node-disks.yaml and patch the Longhorn Node CR automatically (documented in longhorn-disks.nix TODO)
  • Real per-host configs (worker-gpu-01.nix, etc.) as physical boxes come online — those are cookie-cutter copies of the template, one PR per box
  • Additional shapes as new hardware classes show up

🤖 Generated with Claude Code

…n multi-disk

Adds declarative disk partitioning + multi-disk Longhorn wiring + a
worker-template host so adding a new identical box to the cluster is
six per-host edits and `disko + nixos-install` — no hand-partitioning,
no per-host shell scripts.

What lands:

- `nixos/modules/disko-shapes/2nvme.nix` — cookie-cutter disko shape
  for boxes with 2 equal NVMes. Uses `size = "100%"` for the Longhorn
  partitions so the shape works on any disk size (handles the "1TB is
  never really 1TB" reality). Layout: nvme0 = 1G ESP + 256G root + rest
  Longhorn; nvme1 = whole disk Longhorn. OS + bootloader live on nvme0
  only — bootloader-on-wrong-disk failure mode from prior manual install
  is structurally impossible.
- `nixos/modules/longhorn-disks.nix` — declarative wiring between the
  shape's mountpoints and Longhorn's node-level data-path catalog.
  Takes `zeta.longhorn.dataDisks` list, ensures mount dirs exist with
  the perms Longhorn expects, emits a NodeDiskCatalog YAML at
  /etc/longhorn/node-disks.yaml for the cluster-side patch Job, and
  adds the `zeta.io/longhorn-disks=N` K3S node label so the scheduler
  can target high-capacity nodes.
- `nixos/hosts/worker-template/default.nix` — composes the shape +
  longhorn module + existing k3s-agent/gpu/docker modules. Six clearly-
  marked PLACEHOLDER blocks (hostName, hostId, nvme0 by-id, nvme1
  by-id, network, SSH key). Cookie-cutter from here per new box.
- `flake.nix` — adds disko input + `worker-template` nixosConfiguration
  + exposes the new modules via `nixosModules.{disko-shape-2nvme,
  longhorn-disks}` for external reuse.
- `usb-nixos-installer/.../configuration.nix` — bakes disko into the
  installer ISO so installs don't need network access just to fetch
  disko itself.
- `PROVISIONING.md` — end-to-end cookie-cutter workflow (copy template
  → edit 6 lines → flake.nix entry → boot USB → `disko --mode disko`
  + `nixos-install` → done). Includes disk-failure recovery paths and
  multi-shape extension guide for future hardware classes.

Provisioning a new identical box: ~10 minutes wall-clock, six lines
of human edits, declarative end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 25, 2026 16:29
@AceHack AceHack enabled auto-merge (squash) May 25, 2026 16:29
@AceHack AceHack merged commit bbabef4 into main May 25, 2026
27 of 28 checks passed
@AceHack AceHack deleted the feat/disko-cookie-cutter-2026-05-25-c2 branch May 25, 2026 16:32
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1f1d1e3699

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +84 to +85
4. Longhorn DaemonSet pod schedules → reads `/etc/longhorn/node-disks.yaml`
→ patches the Longhorn Node CR to add both data paths
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove unsupported auto-patch step from provisioning workflow

This step states that a Longhorn DaemonSet will read /etc/longhorn/node-disks.yaml and patch Node CRs automatically, but this commit does not add any manifest or script under full-ai-cluster/k8s/ that consumes that file (and nixos/modules/longhorn-disks.nix still documents the patch job as TODO/manual). As written, operators can complete the runbook believing both data disks are active while Longhorn still uses default disk config, which can misreport usable capacity and scheduling behavior immediately after node bring-up.

Useful? React with 👍 / 👎.

Comment on lines +142 to +146
# disko-shapes/ modules. Pre-staged on the ISO so installs
# don't need network access just to fetch disko itself.
# Invocation:
# disko --mode disko --flake /mnt/etc/zeta/full-ai-cluster#<host>
disko
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Synchronize disko package change with root USB installer copy

Adding disko only to full-ai-cluster/usb-nixos-installer breaks the documented "byte-identical" relationship with /workspace/Zeta/usb-nixos-installer, whose nixos/installer/configuration.nix still lacks this package. That creates divergent installer behavior: users building from the root usb-nixos-installer/ path (explicitly supported in its README) will not have offline disko, so the documented provisioning flow can fail on air-gapped installs.

Useful? React with 👍 / 👎.

AceHack added a commit that referenced this pull request May 25, 2026
…ory (#4951)

Adds precise hardware mapping, a guided install script, and an
inventory-capture tool. Four composing pieces:

1. Node Feature Discovery (NFD) — labels every node with detailed
   hardware features (CPU model, AVX-512, PCI vendors, kernel modules,
   storage class). Lands as an ArgoCD Application using the upstream
   kubernetes-sigs Helm chart, with `worker` tolerating everything so
   it discovers features on control-plane + tainted GPU nodes alike.
   PCI source plugin tuned to whitelist network/display/NVMe/
   accelerator classes for compact labels.

2. hwloc / lstopo — added to common.nix so every cluster node has
   the topology tool installed; also added to the USB installer ISO
   so the operator can inspect NUMA / PCI / GPU layout BEFORE picking
   disk-by-id paths for disko. Composes with NFD: NFD labels are
   kubectl-queryable; lstopo XML is diff-stable for catching silent
   hardware drift.

3. tools/cluster-inventory/capture.sh — pulls NFD labels and
   lstopo XML from every node (via kubectl debug, with ssh fallback),
   renders SVG topology diagrams, and lands artifacts under
   full-ai-cluster/docs/cluster-hardware/<node>/. README documents
   query patterns, the cross-platform rescue substrate via Paragon
   FS drivers (extFS / NTFS for Mac, ExtFS for Windows — any cluster
   disk mounts read+write on any maintainer machine), and recommended
   cadence.

4. zeta-install — guided installer script baked into the USB ISO via
   writeShellScriptBin. Walks through: enumerate internal NVMes,
   confirm boot disk, type WIPE to confirm, wipe + partition + format
   + mount per the 2-NVMe shape, clone Zeta, run nixos-install for
   the chosen host attribute. /etc/zeta-install.md runbook on the USB
   updated to point at zeta-install as the default path.

Composes with PR #4950 disko cookie-cutter: zeta-install does the
manual-shape path the script encodes; once #4950 lands, the disko
flake invocation becomes the alternative path documented in the
runbook (for hosts that prefer the disko module over the imperative
sgdisk sequence). Both end on the same disk layout.

Note: zeta-install.sh lives at usb-nixos-installer/zeta-install.sh
(not bin/) because root .gitignore blocks bin/ globally for .NET
build outputs. configuration.nix readFile path matches.

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AceHack AceHack review requested due to automatic review settings May 25, 2026 16:50
AceHack added a commit that referenced this pull request May 25, 2026
…4956)

* ci(ai-cluster): workflow that builds full-ai-cluster installer ISO

Sibling to .github/workflows/build-installer-iso.yml (which targets
the root flake's infra/nixos/hosts/installer/). This new workflow
targets full-ai-cluster/flake.nix's installer-iso package, which
includes the recent USB additions:

  - disko (declarative partitioning; PR #4950)
  - hwloc / lstopo (hardware topology; PR #4951)
  - zeta-install helper script (PR #4951)
  - Paragon cross-platform rescue docs (PR #4951)

Triggers on PR + push to main when full-ai-cluster/flake.{nix,lock},
full-ai-cluster/usb-nixos-installer/**, or
full-ai-cluster/nixos/modules/disko-shapes/** changes. Plus
workflow_dispatch for manual runs.

Uploads the ISO as a workflow artifact (90 day retention) so
anyone reviewing the PR or running the workflow can download it
and `dd` to a USB stick without needing Nix installed locally.

Discipline mirrors build-installer-iso.yml:
  - ubuntu-24.04 (pinned)
  - third-party actions SHA-pinned
  - permissions: contents: read
  - concurrency cancel-in-progress only on PRs
  - no github.event.* values interpolated into run: lines

The repo currently carries two parallel installer substrates
(infra/ vs full-ai-cluster/). Consolidating them into a single
canonical path is a separate larger PR — this workflow just makes
the AI-cluster ISO downloadable from CI in the meantime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): align workflow header comment with actual paths filter

Copilot flagged that the header comment says triggers ONLY on the
3 paths but the actual paths: list also includes
`full-ai-cluster/nixos/modules/disko-shapes/**`. Rewriting the
comment as a bullet list that matches the paths verbatim — easier
to keep in sync as the list grows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 25, 2026
… — hats become negotiated fork structure ON TOP of reference stack — deterministic + declarative + GitOps + AI-native + human-native (#5004)

Aaron 2026-05-25, continuing the ACE+fork-negotiation arc after B-0741:

'hats become our negoated fork structure on top of a referece k8s local
stack in zeta so anyone can use the reference stack and negoate back hats
and new cluster primitives / charts ontology negoation, ace can distribute
the reference stack itself as PoC that it has reliable AI control over
all the package managers deterministicly and declarative / desired state
way for easy git ops ai native human native understanding.'

Operational anchor for B-0741. B-0741 = WHAT the primitive is;
B-0742 = HOW it's empirically demonstrated via reference-cluster-as-Ace-
package.

Three substantive claims:

1. full-ai-cluster/ IS the reference k8s local stack. Inventory of
   existing PR-landed substrate (PR #4930 hat-system + #4950 disko +
   #4951 NFD/lstopo/zeta-install + #4953 dev-cluster + ArgoCD + Cilium
   + cert-manager + Vault + SPIRE + Trust Manager + ESO + B-0737 zflash
   + Determinate Systems Nix installer).

2. Hats become the negotiated fork structure ON TOP of the reference
   stack. Forks declare delta via hat-ontology; cross-fork negotiation
   maps capabilities (B-0741 surface 2). Worked example: LFG
   trading-bot-driver hat + Healthcare-fork hipaa-data-handler hat.

3. Ace distributes the reference stack as PoC of reliable AI control
   over all PMs. Single 'ace install zeta/reference-cluster' dispatches
   across Nix + ArgoCD + helm + kustomize + native k8s + brew + apt +
   mise + DeterminateSystems Nix installer. Deterministic (Nix flake.lock
   + ArgoCD pins). Declarative + desired-state (GitOps). AI-native
   (markdown + JSON-LD). Human-native (readable, reviewable).

Six independently-shippable scope items: reference-stack inventory doc;
hat-as-fork-structure spec; Ace cluster-distribution scope extension to
B-0288; determinism PoC (N=3+ machines); cross-PM dispatch PoC; desired-
state-enforcement drift-recovery PoC.

Composes with: B-0741 (abstract primitive) + B-0731 (hat ontology) +
B-0247/B-0287/B-0288 (Ace PM existing substrate) + B-0727 (4-tier
federation) + B-0726 (Reticulum) + B-0628/B-0638/B-0634/B-0703
(governance + negotiation + signature + consensus) + B-0732 (leverage-
class safety) + B-0737 (zflash bring-up).

P2 — high-value PoC anchoring B-0741 abstract primitive; not P1 because
full-ai-cluster substrate just landed this round + needs to stabilize
before distribution layer ships.

Closing arc of today's 2026-05-25 substrate cascade (B-0728 → ... →
B-0742): destructive-tool authoring contract through reference-stack PoC.

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant