From ff4117ff2e49d9277c54d455f92184c1785b316b Mon Sep 17 00:00:00 2001 From: Lior Date: Tue, 26 May 2026 22:30:30 -0400 Subject: [PATCH] =?UTF-8?q?fix(B-0835=20Bug=204=20+=205=20=E2=80=94=20Aaro?= =?UTF-8?q?n=202026-05-27=20control-plane=20install=20empirical=20anchors)?= =?UTF-8?q?:=20zeta-install.sh=20storage=20probe=20filters=200B=20devices?= =?UTF-8?q?=20+=20common.nix=20adds=20gh=20CLI=20to=20systemPackages?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two empirical anchors from Aaron's iter-5.4 install of `node-e5a176` (PR #5380 self-registered) where install completed but operator hit two distinct gaps on first login: Bug 4 — `/dev/sda 0B` zero-size storage device in node.yaml ================================================================ The storage probe in zeta-install.sh (line 781) emitted EVERY block device from lsblk, including 0-byte placeholder devices (empty SD card readers, empty optical bays, removable-media readers without media). Aaron's Intel Core Ultra 9 185H node has /dev/sda 0B (likely the laptop's empty SD card reader) which got registered as "storage" — Copilot P1 finding on PR #5380. Fix: add `$2 != "0B"` filter to the awk pipeline so zero-size placeholders are excluded from the spec.hardware.storage list. - STORAGE_LINES=$(lsblk -ndo NAME,SIZE,TYPE -e7 2>/dev/null | - awk '$3=="disk"{print "..."}' || echo "") + STORAGE_LINES=$(lsblk -ndo NAME,SIZE,TYPE -e7 2>/dev/null | + awk '$3=="disk" && $2!="0B"{print "..."}' || echo "") This prevents reconcilers reading spec.hardware.storage from treating 0-byte devices as usable storage targets. Bug 5 — gh CLI not in installed system's PATH after reboot ================================================================ Operator framing: "when i log in gh command is not found" The installer ISO had gh in PATH (used by iter-5.4.0 for `gh auth login` during Step 6.8) but common.nix systemPackages did not include gh, so post-reboot the auth tokens stored in ~/.config/gh are useless without the binary. The gap surfaced empirically on Aaron's first login to the freshly-installed node-e5a176. Fix: add `gh` to common.nix environment.systemPackages so the installed system has it for ongoing operator workflows (re-auth, ssh-key sync, future register/deregister-node tooling, kubectl helpers that wrap gh, etc.). Composes with: B-0813 (cluster-node schema), B-0817 (register-node tool), iter-5.4 install cascade. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- full-ai-cluster/nixos/modules/common.nix | 10 ++++++++++ full-ai-cluster/usb-nixos-installer/zeta-install.sh | 7 ++++++- 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/full-ai-cluster/nixos/modules/common.nix b/full-ai-cluster/nixos/modules/common.nix index 94feeb8c7a..32f54691ba 100644 --- a/full-ai-cluster/nixos/modules/common.nix +++ b/full-ai-cluster/nixos/modules/common.nix @@ -90,6 +90,16 @@ skopeo kubectl kubernetes-helm k9s argocd cilium-cli hubble + + # B-0835 fix (Aaron 2026-05-27 control-plane install): gh CLI was + # available in the installer ISO's PATH (iter-5.4.0 used it for + # `gh auth login` during install) but NOT in the installed system's + # PATH after reboot. Operator empirically hit "gh: command not found" + # on first login. The gh-auth tokens stored in ~/.config/gh during + # install are useless without the binary. gh stays in systemPackages + # for ongoing operator workflows (re-auth, ssh-key sync, future + # node-register tooling). + gh ]; boot.loader = { diff --git a/full-ai-cluster/usb-nixos-installer/zeta-install.sh b/full-ai-cluster/usb-nixos-installer/zeta-install.sh index ff84460d69..caab8bfbbb 100755 --- a/full-ai-cluster/usb-nixos-installer/zeta-install.sh +++ b/full-ai-cluster/usb-nixos-installer/zeta-install.sh @@ -778,7 +778,12 @@ if [ "$GH_AUTH_OK" = 1 ]; then # Storage lines: indented 6 spaces to nest under spec.hardware.storage # (Copilot finding on #5352 — was a sibling of `hardware:` at 4 spaces; the # B-0813 schema places storage under hardware block). - STORAGE_LINES=$(lsblk -ndo NAME,SIZE,TYPE -e7 2>/dev/null | awk '$3=="disk"{print " - \"/dev/" $1 " " $2 "\""}' || echo "") + # Filter zero-size devices ($2 != "0B"): empty SD card readers, optical + # bays, and other placeholder block devices show up in `lsblk -ndo` + # output with SIZE=0B and confuse any reconciler that interprets the + # storage list as usable. Copilot finding on PR #5380 (Aaron's + # 2026-05-27 control-plane registration surfaced `/dev/sda 0B`). + STORAGE_LINES=$(lsblk -ndo NAME,SIZE,TYPE -e7 2>/dev/null | awk '$3=="disk" && $2!="0B"{print " - \"/dev/" $1 " " $2 "\""}' || echo "") REG_TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ") FLAKE_COMMIT=$(git -C /mnt/etc/zeta rev-parse HEAD 2>/dev/null | head -c 12 || echo "unknown")