Lucent-Financial-Group · AceHack · May 25, 2026 · May 25, 2026 · chatgpt-codex-connector · May 25, 2026
diff --git a/full-ai-cluster/PROVISIONING.md b/full-ai-cluster/PROVISIONING.md
@@ -0,0 +1,134 @@
+# Provisioning a new node — cookie-cutter workflow
+
+End-to-end: physical box arrives → boots into running cluster
+member with replicated Longhorn capacity. Six values to change
+per box, no hand-partitioning, no shell scripts.
+
+## What you need
+
+- A NixOS installer USB built from this repo (`nix build .#installer-iso`)
+- The new box wired to the cluster network with internet access
+- The maintainer's public SSH key
+- A few minutes to read off two disk serial numbers
+
+## Step 1: copy the template
+
+```bash
+HOST=worker-gpu-03    # pick the next free number
+cp -r full-ai-cluster/nixos/hosts/worker-template \
+      full-ai-cluster/nixos/hosts/$HOST
+```
+
+## Step 2: change the six placeholder values
+
+Open `full-ai-cluster/nixos/hosts/$HOST/default.nix` and edit the
+six values in the clearly-marked PLACEHOLDER blocks:
+
+| What | Where to get it |
+|------|-----------------|
+| `networking.hostName` | the name you chose above (`worker-gpu-03`) |
+| `networking.hostId` | `head -c4 /dev/urandom \| od -A n -t x4 \| tr -d ' '` (8 hex digits, unique per box) |
+| `zeta.disko.nvme0` | On the box booted from the installer USB: `ls -l /dev/disk/by-id/ \| grep nvme \| awk '{print $9, $11}'`. Use the whole-disk `nvme-<model>_<serial>` entry, not a `-partN` one. Pick the disk you want to BE the boot disk (gets OS + first Longhorn data path) |
+| `zeta.disko.nvme1` | Same listing, the other NVMe (becomes pure Longhorn data) |
+| `networking.interfaces` | Uncomment the static IP block if you don't use DHCP |
+| `users.users.zeta.openssh.authorizedKeys.keys` | Maintainer public SSH key |
+
+## Step 3: wire into the flake
+
+Open `full-ai-cluster/flake.nix`, add an entry mirroring
+`worker-template`:
+
+```nix
+"worker-gpu-03" = mkSystem {
+  modules = [
+    ./nixos/hosts/worker-gpu-03/default.nix
+  ];
+};
+```
+
+Commit + push to main: Step 4 clones from GitHub, so the new host must be on the remote.
+
+## Step 4: boot the box on the USB
+
+UEFI boot order → USB first. Network up via `nmtui` if not DHCP.
+
+```bash
+# Clone Zeta to scratch outside /mnt (disko mounts the new root there)
+sudo git clone https://github.com/Lucent-Financial-Group/Zeta /tmp/zeta
+cd /tmp/zeta/full-ai-cluster
+```
+
+## Step 5: disko + nixos-install (the actual cookie-cutter install)
+
+```bash
+# Step 5a — disko wipes + partitions + formats + mounts both disks
+sudo disko --mode disko --flake .#worker-gpu-03
+
+# Step 5b — install NixOS onto the mounted layout
+sudo nixos-install --flake .#worker-gpu-03 --no-root-password
+
+# Step 5c — reboot. Box joins cluster on first boot.
+sudo reboot
+```
+
+That's it. Subsequent boxes: repeat steps 1-5 with new placeholder
+values. Each provision is ~10 minutes wall-clock, ~6 lines of
+human edits, zero hand-partitioning.
+
+## What happens after first boot
+
+1. systemd-boot → kernel → NixOS userland (~30s)
+2. K3S agent service starts → contacts `control-plane.zeta.local:6443`
+3. Cluster admits the node → kubelet reports both `/var/lib/longhorn-disk1`
+   and `/var/lib/longhorn-disk2` as filesystem entries
+4. Longhorn DaemonSet pod schedules → reads `/etc/longhorn/node-disks.yaml`
+   → patches the Longhorn Node CR to add both data paths
+5. Longhorn can schedule new replicas onto the added capacity; existing
+   volumes rebalance here only if `replica-auto-balance` is enabled
+6. Pods whose node affinity matches this node's labels (manifests
+   reconciled by ArgoCD) can now be scheduled here
+
+Check it landed:
+
+```bash
+kubectl get nodes -o wide
+kubectl -n longhorn-system get nodes.longhorn.io worker-gpu-03 -o yaml | grep -A20 disks:
+```
+
+## Disk failure recovery
+
+NVMe dies → Longhorn marks the data path Unavailable → the cluster's
+other replicas (default replica count 3 means 2 healthy copies
+remain) keep serving the volumes → no app-visible interruption.
+
+Replace the dead drive. Disko never runs at boot (its NixOS module only
+emits the `fileSystems` entries), and the by-id symlink embeds the drive
+serial, so even an identical model in the same slot gets a new path:
+
+- Update `zeta.disko.nvme0` or `nvme1` in `nixos/hosts/<host>/default.nix`
+  and partition the new drive per `disko-shapes/2nvme.nix` (Step 5a as
+  written wipes BOTH disks, so don't rerun it here).
+- `nixos-rebuild switch --flake .#<host> --target-host <host>` from any
+  admin machine; Longhorn re-registers the path, replicas rebuild from peers.
+
+OS itself: the `/` partition lives on `nvme0` only, so an `nvme1`
+failure leaves the node fully bootable, with Longhorn capacity reduced
+to what `nvme0` contributes until repair. An `nvme0` failure takes the OS
+down: reinstall via Step 5 onto the replacement disk. Step 5a also wipes
+`nvme1`, so its replicas are rebuilt from peers when the node rejoins.
+
+## Multi-shape support
+
+`disko-shapes/2nvme.nix` is the shape for the current hardware.
+Adding a new hardware class (e.g. 4 NVMes, or NVMe + SATA SSD mix)
+means:
+
+1. Author `disko-shapes/<new-shape>.nix` matching the
+   `zeta.disko` options pattern
+2. Author a new host template under `hosts/<new-class>-template/`
+   that imports it
+3. Cookie-cutter from THAT template for boxes of the new class
+
+The Longhorn module (`modules/longhorn-disks.nix`) is shape-
+agnostic — it takes a list of mount paths and wires them, no
+matter how many disks contributed those mounts.
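Step 2 of PROVISIONING.md needs two per-box lookups: a fresh `networking.hostId` and the whole-disk NVMe by-id paths. A small helper can print both. This is a hypothetical sketch, not a file in this PR. Its filtering rules are assumptions about typical udev naming: it drops `-partN` links and the `nvme-eui.*` / `nvme-nvme.*` aliases.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo) for PROVISIONING.md Step 2.
set -euo pipefail

# networking.hostId: 8 hex digits from 4 random bytes, same pipeline as
# the Step 2 table (trailing newline stripped as well).
gen_host_id() {
  head -c4 /dev/urandom | od -A n -t x4 | tr -d ' \n'
}

# Reads /dev/disk/by-id entry names on stdin and keeps whole-disk NVMe
# names: drops non-NVMe entries, -partN partition links, and the
# nvme-eui.* / nvme-nvme.* aliases. Prints nothing if no NVMe matches.
whole_disk_nvme_ids() {
  { grep -E '^nvme-' | grep -vE -- '-part[0-9]+$|^nvme-(eui|nvme)\.' | sort -u; } || true
}

# When executed directly on the installer, print candidate values.
if [[ "${BASH_SOURCE[0]:-}" == "$0" && -d /dev/disk/by-id ]]; then
  echo "networking.hostId = \"$(gen_host_id)\";"
  ls -1 /dev/disk/by-id/ | whole_disk_nvme_ids | sed 's|^|/dev/disk/by-id/|'
fi
```

Run it with `bash` on the box booted from the installer USB, then copy the output into the PLACEHOLDER blocks. The script does not choose which NVMe becomes `nvme0`; that stays a human decision.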
diff --git a/full-ai-cluster/flake.nix b/full-ai-cluster/flake.nix
@@ -30,9 +30,19 @@
       url = "github:nix-darwin/nix-darwin/nix-darwin-24.11";
       inputs.nixpkgs.follows = "nixpkgs";
     };
+
+    # disko — declarative disk partitioning + formatting + mounting.
+    # Together with the disko-shapes/ modules under ./nixos/modules,
+    # adding a new node is: copy a host template, change its six
+    # placeholder values, commit, run `disko` then `nixos-install --flake .#<host>`.
+    # No interactive partitioning, no per-host shell scripts.
+    disko = {
+      url = "github:nix-community/disko";
+      inputs.nixpkgs.follows = "nixpkgs";
+    };
   };
 
-  outputs = { self, nixpkgs, nixos-hardware, flake-utils, nix-darwin, ... }@inputs:
+  outputs = { self, nixpkgs, nixos-hardware, flake-utils, nix-darwin, disko, ... }@inputs:
     let
       stateVersion = "24.11";
 
@@ -79,6 +89,18 @@
             ./nixos/hosts/worker-gpu/configuration.nix
           ];
         };
+
+        # Cookie-cutter worker template — uses disko for declarative
+        # disk partitioning + Longhorn multi-disk wiring. Copy
+        # ./nixos/hosts/worker-template/ to ./nixos/hosts/worker-gpu-NN/,
+        # change the six placeholder values documented in the file,
+        # then add a `worker-gpu-NN = mkSystem { ... };` entry here
+        # mirroring this one. See full-ai-cluster/PROVISIONING.md.
+        worker-template = mkSystem {
+          modules = [
+            ./nixos/hosts/worker-template/default.nix
+          ];
+        };
       };
 
       # Shared NixOS modules — per-host configs import these via
@@ -93,6 +115,8 @@
         gpu-device-plugin = ./nixos/modules/gpu-device-plugin.nix;
         docker = ./nixos/modules/docker.nix;
         local-storage = ./nixos/modules/local-storage.nix;
+        longhorn-disks = ./nixos/modules/longhorn-disks.nix;
+        disko-shape-2nvme = ./nixos/modules/disko-shapes/2nvme.nix;
       };
 
       # nix-darwin config for maintainer Macs (Apple Silicon). Enables
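Step 3 is easy to half-finish: a host directory gets copied with no flake entry, or the reverse. A hypothetical pre-commit check, not part of this PR, can catch that. It assumes flake entries keep the `<host> = mkSystem { modules = [ ./nixos/hosts/<host>/default.nix ]; }` shape used for `worker-template`:

```bash
#!/usr/bin/env bash
# Hypothetical check (not in the repo) for PROVISIONING.md Steps 1 and 3.
set -euo pipefail

# check_host_wired ROOT HOST: succeeds when ROOT/nixos/hosts/HOST/default.nix
# exists and ROOT/flake.nix has a `HOST = mkSystem` entry (HOST optionally
# quoted) that references that file. Reports the first problem on stderr.
check_host_wired() {
  local root="$1" host="$2"
  local cfg="$root/nixos/hosts/$host/default.nix"
  if [[ ! -f "$cfg" ]]; then
    echo "missing $cfg (Step 1)" >&2
    return 1
  fi
  if ! grep -Eq "^[[:space:]]*\"?${host}\"?[[:space:]]*=[[:space:]]*mkSystem" "$root/flake.nix"; then
    echo "no $host = mkSystem entry in $root/flake.nix (Step 3)" >&2
    return 1
  fi
  if ! grep -Fq "./nixos/hosts/$host/default.nix" "$root/flake.nix"; then
    echo "flake.nix does not reference ./nixos/hosts/$host/default.nix" >&2
    return 1
  fi
}
```

For example, run `check_host_wired full-ai-cluster worker-gpu-03` from the repo root. The check greps text rather than evaluating Nix, so building the flake output is still the real test.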

diff --git a/full-ai-cluster/nixos/hosts/worker-template/default.nix b/full-ai-cluster/nixos/hosts/worker-template/default.nix
@@ -0,0 +1,101 @@
+# full-ai-cluster/nixos/hosts/worker-template/default.nix
+#
+# Cookie-cutter worker node config. Adding a new identical box:
+#
+#   1. cp -r nixos/hosts/worker-template nixos/hosts/worker-gpu-NN
+#   2. Edit the new file and change the SIX PLACEHOLDER values:
+#        - networking.hostName
+#        - networking.hostId          (new random 8-hex)
+#        - networking.interfaces      (static IP, only if not using DHCP)
+#        - zeta.disko.nvme0           (per-host /dev/disk/by-id)
+#        - zeta.disko.nvme1           (per-host /dev/disk/by-id)
+#        - users.users.zeta.openssh.authorizedKeys.keys  (maintainer key)
+#   3. Add `worker-gpu-NN` to flake.nix nixosConfigurations; commit + push
+#   4. Boot the box on the installer USB, then (clone outside /mnt):
+#        sudo git clone https://github.com/Lucent-Financial-Group/Zeta /tmp/zeta
+#        sudo nix run github:nix-community/disko -- --mode disko \
+#          --flake /tmp/zeta/full-ai-cluster#worker-gpu-NN
+#        sudo nixos-install --flake /tmp/zeta/full-ai-cluster#worker-gpu-NN
+#   5. Reboot. Node joins cluster, Longhorn picks up both disks,
+#      ArgoCD reconciles workloads.
+#
+# Hardware shape: x86_64, UEFI, 2 NVMes (any size, same shape),
+# 1+ NVIDIA GPU. For AMD-only or Intel-only GPU nodes change the
+# `zeta.gpu-device-plugin.vendors` setting; for non-GPU workers
+# drop the GPU imports entirely.
+
+{ config, pkgs, lib, inputs, ... }:
+
+{
+  imports = [
+    # Declarative disk layout — disko shapes the partitions,
+    # longhorn-disks wires the mounts to Longhorn data paths.
+    inputs.disko.nixosModules.disko
+    ../../modules/disko-shapes/2nvme.nix
+    ../../modules/longhorn-disks.nix
+
+    # Cluster role + hardware-class modules.
+    ../../modules/common.nix
+    ../../modules/k3s-agent.nix
+    ../../modules/gpu.nix
+    ../../modules/gpu-device-plugin.nix
+    ../../modules/gpu-passthrough.nix
+    ../../modules/docker.nix
+    ../../modules/local-storage.nix
+  ];
+
+  # ── PLACEHOLDER: change per-host ─────────────────────────────
+  networking.hostName = "worker-template";
+  networking.hostId = "00000000";   # `head -c4 /dev/urandom | od -A n -t x4 | tr -d ' '`
+  # ─────────────────────────────────────────────────────────────
+
+  # ── PLACEHOLDER: change per-host (disk IDs) ──────────────────
+  # On the live installer: ls -l /dev/disk/by-id/ | grep nvme (whole disks, not -partN)
+  zeta.disko = {
+    nvme0 = "/dev/disk/by-id/nvme-REPLACE_ME_BOOT_DISK";
+    nvme1 = "/dev/disk/by-id/nvme-REPLACE_ME_LONGHORN_DISK";
+    # rootSize = "256G";  # default; override if needed
+  };
+  # ─────────────────────────────────────────────────────────────
+
+  # ── PLACEHOLDER: per-host static IP if not using DHCP ────────
+  # networking.useDHCP = false;
+  # networking.interfaces.eno1.ipv4.addresses = [{
+  #   address = "10.0.0.21";
+  #   prefixLength = 24;
+  # }];
+  # networking.defaultGateway = "10.0.0.1";
+  # networking.nameservers = [ "10.0.0.1" "1.1.1.1" ];
+  # ─────────────────────────────────────────────────────────────
+
+  # K3S join target — same for every worker in the cluster.
+  services.k3s.serverAddr = "https://control-plane.zeta.local:6443";
+
+  # GPU device plugin vendor mix. Override per-host if AMD or Intel.
+  zeta.gpu-device-plugin = {
+    enable = true;
+    vendors = [ "nvidia" ];
+  };
+
+  # VFIO passthrough off by default; enable per-host with PCI IDs.
+  zeta.gpu-passthrough = {
+    enable = false;
+    pciIds = [ ];
+  };
+
+  # Node labels — uncomment + customize per hardware spec so the
+  # scheduler can target nodes by GPU model / count.
+  services.k3s.extraFlags = lib.mkAfter [
+    # "--node-label=zeta.io/gpu-model=rtx-4090"
+    # "--node-label=zeta.io/gpu-count=2"
+    # "--node-label=zeta.io/dram-gb=128"
+  ];
+
+  # ── PLACEHOLDER: maintainer SSH keys ─────────────────────────
+  users.users.zeta.openssh.authorizedKeys.keys = [
+    # "ssh-ed25519 AAAAC3Nz... aaron@zeta"
+  ];
+  # ─────────────────────────────────────────────────────────────
+
+  system.stateVersion = "24.11";
+}
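The template's placeholders are recognizable literals:

- `REPLACE_ME` disk IDs
- hostName `"worker-template"`
- hostId `"00000000"`
- a commented-out SSH key

A leftover placeholder can therefore be caught before Step 5 wipes any disks. The sketch below is a hypothetical lint, not part of this PR, that greps for exactly those literals:

```bash
#!/usr/bin/env bash
# Hypothetical lint (not in the repo): fails if a host's default.nix still
# carries the worker-template placeholder values or has no SSH key.
set -euo pipefail

check_placeholders() {
  local f="$1" p bad=0
  # Literal strings copied from worker-template/default.nix.
  local -a patterns=(
    'REPLACE_ME'
    'hostName = "worker-template"'
    'hostId = "00000000"'
  )
  for p in "${patterns[@]}"; do
    if grep -Fq -- "$p" "$f"; then
      echo "$f: placeholder still present: $p" >&2
      bad=1
    fi
  done
  # The template ships with its only key commented out; require at least
  # one uncommented quoted key string (ssh-*, ecdsa-* or sk-* key types).
  if ! grep -Eq '^[[:space:]]*"(ssh|ecdsa|sk)-' "$f"; then
    echo "$f: no uncommented authorizedKeys entry" >&2
    bad=1
  fi
  return "$bad"
}

# Usage: bash check-placeholders.sh nixos/hosts/worker-gpu-03/default.nix
if [[ "${BASH_SOURCE[0]:-}" == "$0" && $# -gt 0 ]]; then
  check_placeholders "$1"
fi
```

The check is line-based. A key written inline on the same line as `keys = [` will not be detected, which gives a false failure rather than a silent pass.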