From 9cde0919b487ba5e7c3ae3c6d4ae808379fd313e Mon Sep 17 00:00:00 2001
From: Lior
Date: Tue, 26 May 2026 00:01:55 -0400
Subject: [PATCH] =?UTF-8?q?backlog(B-0789):=20iter-4=20v1=20cluster=20cred?=
 =?UTF-8?q?ential=20substrate=20=E2=80=94=20hashedPassword=20+=20operator-?=
 =?UTF-8?q?ssh-keys=20scaffold=20(iter-4.2=20ships=20the=20zero-typing=20a?=
 =?UTF-8?q?uto-inject)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The maintainer 2026-05-26 surfaced two adjacent signals across recent
ticks:

1. "we can do what's going to make cluster setup eaiser for me and not
   users if that's ssh lets do that first cause we want to get ai
   running the cluster asap" — authorized iter-4 SSH+password work
2. "i can wait for 4.2 or whatever version before we try again" —
   downgraded v1 from "test via re-flash" to "substrate scaffolding
   for 4.2 to build on"

iter-4 v1 ships the Nix-module + per-host-import scaffolding so
iter-4.2 (zflash auto-inject + zeta-install.sh USB probe) is a
tightly-scoped tooling PR rather than a substrate-shape PR.

Files:

- full-ai-cluster/nixos/modules/initial-password.nix (new):
  sha512crypt hash for "zeta-change-me" via `openssl passwd -6`;
  operator rotates on first tty1 login via `passwd zeta`.
  Simplest-first per the maintainer-Mika 2026-05-25 feedback memory;
  promote to yescrypt / agenix / sops-nix when repo goes public OR
  multi-operator isolation becomes load-bearing
- full-ai-cluster/nixos/modules/operator-ssh-keys.nix (new): empty
  stub with edit-and-rebuild workflow documented in the module's
  comment header. iter-4.2 OVERWRITES this file at install time from
  the boot USB; v1 ships the stub so per-host imports compose cleanly
  with the iter-4.2 tooling change without re-architecting
- full-ai-cluster/nixos/hosts/control-plane/configuration.nix:
  imports the two new modules; removes the prior inline empty
  authorizedKeys.keys declaration (now handled by the imported
  module)
- full-ai-cluster/usb-nixos-installer/zeta-install.sh: prints initial
  credentials + post-install operator workflow block before exit.
  Text covers both the v1 manual-edit fallback path and the iter-4.2
  zero-typing path so the same echo block works for either iteration
- docs/backlog/P1/B-0789-*.md (new): captures iter-4 v1 acceptance
  (scaffolding-only) + iter-4.2 acceptance (zflash auto-inject; the
  maintainer's actually-usable target) + iter-4.3 multi-key extension
  + iter-5 per-node deploy-key + iter-5+ secret-management substrate
  promotion paths

Composes with B-0754 (iter-3 zero-typing USB install — iter-4 is the
credential-substrate follow-on); B-0759 (first-time-CLI-user persona);
B-0770 (Comet Pro IP-KVM — local-console-with-password becomes
load-bearing for the IP-KVM substrate); B-0776 (simplest-first plugin
sequence); B-0780 (Local Loop tier-3 substrate needs reachable
clusters); B-0786 (simplest-first discipline applied at every choice);
B-0778 (commodity hardware reference);
.claude/rules/human-audit-and-legal-risk-acceptance-pattern-in-settings.md
Shape A (hashedPassword-in-per-host-module).

Co-Authored-By: Claude
---
 docs/BACKLOG.md                               |   1 +
 ...ubstrate-for-cluster-bringup-2026-05-26.md | 195 ++++++++++++++++++
 .../hosts/control-plane/configuration.nix     |   8 +-
 .../nixos/modules/initial-password.nix        |  35 ++++
 .../nixos/modules/operator-ssh-keys.nix       |  34 +++
 .../usb-nixos-installer/zeta-install.sh       |  25 ++-
 6 files changed, 292 insertions(+), 6 deletions(-)
 create mode 100644 docs/backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md
 create mode 100644 full-ai-cluster/nixos/modules/initial-password.nix
 create mode 100644 full-ai-cluster/nixos/modules/operator-ssh-keys.nix

diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index d38870f7f9..c84421f8c8 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -365,6 +365,7 @@ are closed (status: closed in frontmatter)._
 - [ ] **[B-0784](backlog/P1/B-0784-distributed-fsharp-type-negotiation-as-consensus-and-governance-namespace-scoped-strictness-aaron-mika-2026-05-25.md)** Distributed F# type negotiation as consensus + governance — every traveler's compiler agrees before compile; namespace-scoped strictness (personal mirror = free; common = strict consensus)
 - [ ] **[B-0785](backlog/P1/B-0785-unified-namespace-across-fsharp-kubernetes-ontology-plus-experiment-id-routing-via-argo-rollouts-cilium-service-mesh-aaron-mika-2026-05-25.md)** Unified namespace across F# / Kubernetes / Ontology + experiment-ID routing via Argo Rollouts + Cilium service mesh (existing standards)
 - [ ] **[B-0787](backlog/P1/B-0787-multi-ai-experiment-parallelism-without-stepping-on-each-others-feet-namespace-plus-experiment-id-plus-event-store-as-projections-not-separate-dbs-aaron-2026-05-25.md)** Multi-AI experiment parallelism without stepping on each other's feet — per-AI namespace + experiment-ID routing + event-store-native twin (experiments are projections, not separate DBs)
+- [ ] **[B-0789](backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md)** Iter-4 cluster credential substrate — hashedPassword (zeta-change-me default) + operator-ssh-keys.nix module + manual edit workflow (v1) with zflash auto-inject as iter-4.2 follow-up
 
 ## P2 — research-grade
 
diff --git a/docs/backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md b/docs/backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md
new file mode 100644
index 0000000000..01ca4f7df6
--- /dev/null
+++ b/docs/backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md
@@ -0,0 +1,195 @@
+---
+id: B-0789
+priority: P1
+status: open
+title: Iter-4 cluster credential substrate — hashedPassword (zeta-change-me default) + operator-ssh-keys.nix module + manual edit workflow (v1) with zflash auto-inject as iter-4.2 follow-up
+effort: M
+ask: aaron 2026-05-26
+created: 2026-05-26
+last_updated: 2026-05-26
+depends_on:
+  - B-0754
+composes_with:
+  - B-0759
+  - B-0770
+  - B-0776
+  - B-0786
+  - B-0780
+  - B-0778
+tags: [cluster-install, ssh-key, password, iter-4, nixos, credentials, b-0754-follow-on]
+---
+
+## Problem
+
+Iter-3 (the maintainer's PC 1 test, 2026-05-25) shipped end-to-end zero-typing NixOS install via the iter-3 USB. Result: PC 1 booted to `control-plane login:` tty1 prompt — but **inaccessible**. Root cause surfaced by the maintainer asking *"what's the password?"*:
+
+- `nixos-install --no-root-password` was used → root account locked
+- `users.users.zeta` defined in `common.nix` with no `initialPassword` / `hashedPassword` → zeta account also locked for tty1 login
+- `users.users.zeta.openssh.authorizedKeys.keys = [ ]` empty in per-host `configuration.nix` (the example key was commented out)
+- `services.openssh.PasswordAuthentication = false` → no SSH-by-password fallback
+
+PC 1 was unreachable both via local console AND via SSH. The IP-KVM substrate (B-0770 Comet Pro) + "remote fingers" substrate (B-0778 commodity hardware reference) becomes theatrical without local-console reachability.
+
+## Target
+
+Cluster nodes installed via iter-4 USB are reachable via BOTH paths after first boot:
+
+1. **Local tty1 console** with the initial password `zeta-change-me` (operator MUST rotate via `passwd zeta` on first login)
+2. **SSH from the operator's workstation** as the `zeta` user after the operator adds their public key to `operator-ssh-keys.nix` + `nixos-rebuild switch`
+
+Per the maintainer 2026-05-26 *"we can do what's going to make cluster setup eaiser for me and not users if that's ssh lets do that first cause we want to get ai running the cluster asap"* — ship the simplest substrate that unblocks cluster-side AI workloads NOW. Account-login + credential-survey skill substrate (for end-user onboarding) deferred per the same message.
+
+## Substrate shape (iter-4 v1)
+
+### Password substrate
+
+`full-ai-cluster/nixos/modules/initial-password.nix`:
+
+- Sets `users.users.zeta.hashedPassword = "$6$..."` via sha512crypt hash for `zeta-change-me` (generated via `openssl passwd -6 'zeta-change-me'`)
+- sha512crypt picked per simplest-first (per B-0786 memory): universally portable; promote to yescrypt or agenix / sops-nix when (a) repo goes public OR (b) multi-operator key isolation becomes load-bearing
+- Imported by per-host `configuration.nix`
+- Operator rotates immediately on first tty1 login
+
+### SSH-key substrate
+
+`full-ai-cluster/nixos/modules/operator-ssh-keys.nix`:
+
+- Empty stub in the repo: `users.users.zeta.openssh.authorizedKeys.keys = [ ]`
+- Imported by per-host `configuration.nix`
+- Operator manually edits this file post-install + `nixos-rebuild switch` to add their pubkey (iter-4 v1 path)
+- iter-4.2 follow-up: `zflash.ts` writes operator's pubkey to a writable area of the boot USB; `zeta-install.sh` probes + injects into this module at install time (full zero-typing); see "iter-4.2 / iter-4.3 / iter-5 paths" below
+
+### Install-script substrate
+
+`full-ai-cluster/usb-nixos-installer/zeta-install.sh`:
+
+- Post-install (before reboot countdown), prints initial credentials in big letters:
+
+  ```
+  user:     zeta
+  password: zeta-change-me
+
+  AFTER FIRST LOGIN:
+    1. passwd zeta
+    2. Edit /etc/zeta/full-ai-cluster/nixos/modules/operator-ssh-keys.nix
+    3. sudo nixos-rebuild switch --flake /etc/zeta/full-ai-cluster#
+    4. ssh zeta@ from workstation
+  ```
+
+- iter-4 v1 doesn't read pubkey from USB; stub stays empty until operator edit. Iter-4.2 adds USB read.
+
+## Acceptance — iter-4 v1 is SCAFFOLDING (not maintainer-usable); iter-4.2 is the actually-usable end-to-end target
+
+The maintainer 2026-05-26: *"i can wait for 4.2 or whatever version before we try again."* This downgrades iter-4 v1 from a "usable + tested via re-flash" goal to a "substrate lands so iter-4.2 has scaffolding to build on" goal. The maintainer will NOT re-flash PC 1 for v1; the actually-usable test target is iter-4.2 (or whichever iteration first ships zero-typing end-to-end SSH).
+
+### iter-4 v1 acceptance (substrate-scaffolding-only)
+
+- [x] `nixos/modules/initial-password.nix` ships with sha512crypt hash for `zeta-change-me`
+- [x] `nixos/modules/operator-ssh-keys.nix` ships as empty stub — scaffolding that iter-4.2 overwrites at install time
+- [x] `nixos/hosts/control-plane/configuration.nix` imports both new modules + removes the prior inline empty `authorizedKeys` declaration
+- [x] `usb-nixos-installer/zeta-install.sh` prints initial credentials + post-install workflow before exit (workflow text useful in iter-4.2 too; v1 ships it because the cost is one echo block)
+- [ ] Worker-template + worker-gpu configurations also import the two new modules — v1.1 follow-on within this row when followed up
+
+### iter-4.2 acceptance (target the maintainer will actually test against)
+
+- [ ] `full-ai-cluster/tools/zflash.ts` extended (or new sibling `zflash-creds.ts`) with post-flash macOS-side ESP-mount-and-write step:
+  - Default reads `~/.ssh/id_ed25519.pub`
+  - `--ssh-key ` overrides
+  - `--no-creds` opts out (preserves current zflash behavior)
+  - Mounts the FAT / ESP partition of the flashed USB via `diskutil mount`
+  - Writes `/Volumes/