diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index d38870f7f9..c84421f8c8 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -365,6 +365,7 @@ are closed (status: closed in frontmatter)._
 - [ ] **[B-0784](backlog/P1/B-0784-distributed-fsharp-type-negotiation-as-consensus-and-governance-namespace-scoped-strictness-aaron-mika-2026-05-25.md)** Distributed F# type negotiation as consensus + governance — every traveler's compiler agrees before compile; namespace-scoped strictness (personal mirror = free; common = strict consensus)
 - [ ] **[B-0785](backlog/P1/B-0785-unified-namespace-across-fsharp-kubernetes-ontology-plus-experiment-id-routing-via-argo-rollouts-cilium-service-mesh-aaron-mika-2026-05-25.md)** Unified namespace across F# / Kubernetes / Ontology + experiment-ID routing via Argo Rollouts + Cilium service mesh (existing standards)
 - [ ] **[B-0787](backlog/P1/B-0787-multi-ai-experiment-parallelism-without-stepping-on-each-others-feet-namespace-plus-experiment-id-plus-event-store-as-projections-not-separate-dbs-aaron-2026-05-25.md)** Multi-AI experiment parallelism without stepping on each other's feet — per-AI namespace + experiment-ID routing + event-store-native twin (experiments are projections, not separate DBs)
+- [ ] **[B-0789](backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md)** Iter-4 cluster credential substrate — hashedPassword (zeta-change-me default) + operator-ssh-keys.nix module + manual edit workflow (v1) with zflash auto-inject as iter-4.2 follow-up
 
 ## P2 — research-grade
 
diff --git a/docs/backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md b/docs/backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md
new file mode 100644
index 0000000000..01ca4f7df6
--- /dev/null
+++ b/docs/backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md
@@ -0,0 +1,195 @@
+---
+id: B-0789
+priority: P1
+status: open
+title: Iter-4 cluster credential substrate — hashedPassword (zeta-change-me default) + operator-ssh-keys.nix module + manual edit workflow (v1) with zflash auto-inject as iter-4.2 follow-up
+effort: M
+ask: aaron 2026-05-26
+created: 2026-05-26
+last_updated: 2026-05-26
+depends_on:
+  - B-0754
+composes_with:
+  - B-0759
+  - B-0770
+  - B-0776
+  - B-0786
+  - B-0780
+  - B-0778
+tags: [cluster-install, ssh-key, password, iter-4, nixos, credentials, b-0754-follow-on]
+---
+
+## Problem
+
+Iter-3 (the maintainer's PC 1 test, 2026-05-25) shipped end-to-end zero-typing NixOS install via the iter-3 USB. Result: PC 1 booted to `control-plane login:` tty1 prompt — but **inaccessible**. Root cause surfaced by the maintainer asking *"what's the password?"*:
+
+- `nixos-install --no-root-password` was used → root account locked
+- `users.users.zeta` defined in `common.nix` with no `initialPassword` / `hashedPassword` → zeta account also locked for tty1 login
+- `users.users.zeta.openssh.authorizedKeys.keys = [ ]` empty in per-host `configuration.nix` (the example key was commented out)
+- `services.openssh.PasswordAuthentication = false` → no SSH-by-password fallback
+
+PC 1 was unreachable both via local console AND via SSH. The IP-KVM substrate (B-0770 Comet Pro) + "remote fingers" substrate (B-0778 commodity hardware reference) become theatrical without local-console reachability.
+
+## Target
+
+Cluster nodes installed via iter-4 USB are reachable via BOTH paths after first boot:
+
+1. **Local tty1 console** with the initial password `zeta-change-me` (operator MUST rotate via `passwd zeta` on first login)
+2. **SSH from the operator's workstation** as the `zeta` user after the operator adds their public key to `operator-ssh-keys.nix` + `nixos-rebuild switch`
+
+Per the maintainer 2026-05-26 *"we can do what's going to make cluster setup eaiser for me and not users if that's ssh lets do that first cause we want to get ai running the cluster asap"* — ship the simplest substrate that unblocks cluster-side AI workloads NOW. Account-login + credential-survey skill substrate (for end-user onboarding) deferred per the same message.
+
+## Substrate shape (iter-4 v1)
+
+### Password substrate
+
+`full-ai-cluster/nixos/modules/initial-password.nix`:
+
+- Sets `users.users.zeta.hashedPassword = "$6$..."` via sha512crypt hash for `zeta-change-me` (generated via `openssl passwd -6 'zeta-change-me'`)
+- sha512crypt picked per simplest-first (per B-0786 memory): universally portable; promote to yescrypt or agenix / sops-nix when (a) repo goes public OR (b) multi-operator key isolation becomes load-bearing
+- Imported by per-host `configuration.nix`
+- Operator rotates immediately on first tty1 login
+
+### SSH-key substrate
+
+`full-ai-cluster/nixos/modules/operator-ssh-keys.nix`:
+
+- Empty stub in the repo: `users.users.zeta.openssh.authorizedKeys.keys = [ ]`
+- Imported by per-host `configuration.nix`
+- Operator manually edits this file post-install + `nixos-rebuild switch` to add their pubkey (iter-4 v1 path)
+- iter-4.2 follow-up: `zflash.ts` writes operator's pubkey to a writable area of the boot USB; `zeta-install.sh` probes + injects into this module at install time (full zero-typing); see "iter-4.2 / iter-4.3 / iter-5 paths" below
+
+### Install-script substrate
+
+`full-ai-cluster/usb-nixos-installer/zeta-install.sh`:
+
+- Post-install (before reboot countdown), prints initial credentials in big letters:
+
+  ```
+  user: zeta
+  password: zeta-change-me
+
+  AFTER FIRST LOGIN:
+  1. passwd zeta
+  2. Edit /etc/zeta/full-ai-cluster/nixos/modules/operator-ssh-keys.nix
+  3. sudo nixos-rebuild switch --flake /etc/zeta/full-ai-cluster#
+  4. ssh zeta@ from workstation
+  ```
+
+- iter-4 v1 doesn't read pubkey from USB; stub stays empty until operator edit. Iter-4.2 adds USB read.
+
+## Acceptance — iter-4 v1 is SCAFFOLDING (not maintainer-usable); iter-4.2 is the actually-usable end-to-end target
+
+The maintainer 2026-05-26: *"i can wait for 4.2 or whatever version before we try again."* This downgrades iter-4 v1 from a "usable + tested via re-flash" goal to a "substrate lands so iter-4.2 has scaffolding to build on" goal. The maintainer will NOT re-flash PC 1 for v1; the actually-usable test target is iter-4.2 (or whichever iteration first ships zero-typing end-to-end SSH).
+
+### iter-4 v1 acceptance (substrate-scaffolding-only)
+
+- [x] `nixos/modules/initial-password.nix` ships with sha512crypt hash for `zeta-change-me`
+- [x] `nixos/modules/operator-ssh-keys.nix` ships as empty stub — scaffolding that iter-4.2 overwrites at install time
+- [x] `nixos/hosts/control-plane/configuration.nix` imports both new modules + removes the prior inline empty `authorizedKeys` declaration
+- [x] `usb-nixos-installer/zeta-install.sh` prints initial credentials + post-install workflow before exit (workflow text useful in iter-4.2 too; v1 ships it because the cost is one echo block)
+- [ ] Worker-template + worker-gpu configurations also import the two new modules — v1.1 follow-on within this row when followed up
+
+### iter-4.2 acceptance (target the maintainer will actually test against)
+
+- [ ] `full-ai-cluster/tools/zflash.ts` extended (or new sibling `zflash-creds.ts`) with post-flash macOS-side ESP-mount-and-write step:
+  - Default reads `~/.ssh/id_ed25519.pub`
+  - `--ssh-key ` overrides
+  - `--no-creds` opts out (preserves current zflash behavior)
+  - Mounts the FAT / ESP partition of the flashed USB via `diskutil mount`
+  - Writes `/Volumes/