1 change: 1 addition & 0 deletions docs/BACKLOG.md
@@ -365,6 +365,7 @@ are closed (status: closed in frontmatter)._
- [ ] **[B-0784](backlog/P1/B-0784-distributed-fsharp-type-negotiation-as-consensus-and-governance-namespace-scoped-strictness-aaron-mika-2026-05-25.md)** Distributed F# type negotiation as consensus + governance — every traveler's compiler agrees before compile; namespace-scoped strictness (personal mirror = free; common = strict consensus)
- [ ] **[B-0785](backlog/P1/B-0785-unified-namespace-across-fsharp-kubernetes-ontology-plus-experiment-id-routing-via-argo-rollouts-cilium-service-mesh-aaron-mika-2026-05-25.md)** Unified namespace across F# / Kubernetes / Ontology + experiment-ID routing via Argo Rollouts + Cilium service mesh (existing standards)
- [ ] **[B-0787](backlog/P1/B-0787-multi-ai-experiment-parallelism-without-stepping-on-each-others-feet-namespace-plus-experiment-id-plus-event-store-as-projections-not-separate-dbs-aaron-2026-05-25.md)** Multi-AI experiment parallelism without stepping on each other's feet — per-AI namespace + experiment-ID routing + event-store-native twin (experiments are projections, not separate DBs)
- [ ] **[B-0789](backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md)** Iter-4 cluster credential substrate — hashedPassword (zeta-change-me default) + operator-ssh-keys.nix module + manual edit workflow (v1) with zflash auto-inject as iter-4.2 follow-up

## P2 — research-grade

@@ -0,0 +1,195 @@
---
id: B-0789
priority: P1
status: open
title: Iter-4 cluster credential substrate — hashedPassword (zeta-change-me default) + operator-ssh-keys.nix module + manual edit workflow (v1) with zflash auto-inject as iter-4.2 follow-up
effort: M
ask: aaron 2026-05-26
created: 2026-05-26
last_updated: 2026-05-26
depends_on:
- B-0754
composes_with:
- B-0759
- B-0770
- B-0776
- B-0786
- B-0780
- B-0778
tags: [cluster-install, ssh-key, password, iter-4, nixos, credentials, b-0754-follow-on]
---

## Problem

Iter-3 (the maintainer's PC 1 test, 2026-05-25) shipped end-to-end zero-typing NixOS install via the iter-3 USB. Result: PC 1 booted to `control-plane login:` tty1 prompt — but **inaccessible**. Root cause surfaced by the maintainer asking *"what's the password?"*:

- `nixos-install --no-root-password` was used → root account locked
- `users.users.zeta` defined in `common.nix` with no `initialPassword` / `hashedPassword` → zeta account also locked for tty1 login
- `users.users.zeta.openssh.authorizedKeys.keys = [ ]` empty in per-host `configuration.nix` (the example key was commented out)
- `services.openssh.PasswordAuthentication = false` → no SSH-by-password fallback

PC 1 was unreachable both via local console AND via SSH. The IP-KVM substrate (B-0770 Comet Pro) + "remote fingers" substrate (B-0778 commodity hardware reference) becomes theatrical without local-console reachability.

## Target

Cluster nodes installed via iter-4 USB are reachable via BOTH paths after first boot:

1. **Local tty1 console** with the initial password `zeta-change-me` (operator MUST rotate via `passwd zeta` on first login)
2. **SSH from the operator's workstation** as the `zeta` user after the operator adds their public key to `operator-ssh-keys.nix` + `nixos-rebuild switch`

Per the maintainer 2026-05-26 *"we can do what's going to make cluster setup eaiser for me and not users if that's ssh lets do that first cause we want to get ai running the cluster asap"* — ship the simplest substrate that unblocks cluster-side AI workloads NOW. Account-login + credential-survey skill substrate (for end-user onboarding) deferred per the same message.

## Substrate shape (iter-4 v1)

### Password substrate

`full-ai-cluster/nixos/modules/initial-password.nix`:

- Sets `users.users.zeta.hashedPassword = "$6$..."` via sha512crypt hash for `zeta-change-me` (generated via `openssl passwd -6 'zeta-change-me'`)
- sha512crypt picked per simplest-first (per B-0786 memory): universally portable; promote to yescrypt or agenix / sops-nix when (a) repo goes public OR (b) multi-operator key isolation becomes load-bearing
- Imported by per-host `configuration.nix`
- Operator rotates immediately on first tty1 login
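
One way to confirm that a committed hash really corresponds to the documented default is to recompute it with the same salt and compare. The sketch below is illustrative only: `check_sha512crypt` is a hypothetical helper (not part of the repo), and it assumes OpenSSL 1.1.1 or newer (for `passwd -6`) and a plain `$6$<salt>$<digest>` hash with no `rounds=` field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): recompute a sha512crypt hash using the
# salt embedded in HASH and compare the result against HASH itself.
# Assumes the plain "$6$<salt>$<digest>" layout, i.e. no "rounds=" field.
check_sha512crypt() {
  local password="$1" hash="$2"
  local rest salt candidate

  # Return 2 for anything that does not look like "$6$...$...".
  case "$hash" in
    '$6$'*'$'*) ;;
    *) return 2 ;;
  esac

  rest=${hash#'$6$'}      # strip the "$6$" scheme prefix
  salt=${rest%%'$'*}      # everything up to the next "$" is the salt
  [ -n "$salt" ] || return 2

  candidate="$(openssl passwd -6 -salt "$salt" "$password")" || return 2
  [ "$candidate" = "$hash" ]   # 0 on match, 1 on mismatch
}
```

Called as `check_sha512crypt 'zeta-change-me' '<hash from the module>'` (single quotes keep the shell from expanding the `$` characters), it returns 0 when the password matches the hash, 1 when it does not, and 2 when the hash is not in the assumed format.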

### SSH-key substrate

`full-ai-cluster/nixos/modules/operator-ssh-keys.nix`:

- Empty stub in the repo: `users.users.zeta.openssh.authorizedKeys.keys = [ ]`
- Imported by per-host `configuration.nix`
- Operator manually edits this file post-install + `nixos-rebuild switch` to add their pubkey (iter-4 v1 path)
- iter-4.2 follow-up: `zflash.ts` writes operator's pubkey to a writable area of the boot USB; `zeta-install.sh` probes + injects into this module at install time (full zero-typing); see "iter-4.2 / iter-4.3 / iter-5 paths" below
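
Since the v1 path relies on a hand-pasted line, a quick shape check before editing can catch a truncated paste or an accidentally pasted private key. This is a minimal sketch under stated assumptions: `looks_like_pubkey` is a hypothetical name, the accepted key types are an assumed subset, and the check only inspects the line's shape, not its cryptographic validity.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not in the repo): accept a line only if it
# looks like "<key-type> <base64-blob> [comment]" for a few common key types.
# This is a shape check; it does not prove the key is usable.
looks_like_pubkey() {
  local line="$1"
  local type rest blob

  type=${line%% *}        # first space-separated field
  case "$type" in
    ssh-ed25519|ssh-rsa|ecdsa-sha2-nistp256|ecdsa-sha2-nistp384|ecdsa-sha2-nistp521) ;;
    *) return 1 ;;
  esac

  rest=${line#* }         # everything after the first space
  [ "$rest" != "$line" ] || return 1   # no space at all: blob is missing

  blob=${rest%% *}        # second field
  case "$blob" in
    ''|*[!A-Za-z0-9+/=]*) return 1 ;;  # empty or contains non-base64 characters
  esac
  return 0
}
```

For example, `looks_like_pubkey "$(cat ~/.ssh/id_ed25519.pub)"` succeeds for a typical single-line public key file, while a line starting with `-----BEGIN` is rejected because its first field is not a known key type.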

### Install-script substrate

`full-ai-cluster/usb-nixos-installer/zeta-install.sh`:

- Post-install (before the operator reboots), prints initial credentials in big letters:

```
user: zeta
password: zeta-change-me

AFTER FIRST LOGIN:
1. passwd zeta
2. Edit /etc/zeta/full-ai-cluster/nixos/modules/operator-ssh-keys.nix
3. sudo nixos-rebuild switch --flake /etc/zeta/full-ai-cluster#<host>
4. ssh zeta@<hostname> from workstation
```

- iter-4 v1 doesn't read pubkey from USB; stub stays empty until operator edit. Iter-4.2 adds USB read.

## Acceptance — iter-4 v1 is SCAFFOLDING (not maintainer-usable); iter-4.2 is the actually-usable end-to-end target

The maintainer 2026-05-26: *"i can wait for 4.2 or whatever version before we try again."* This downgrades iter-4 v1 from a "usable + tested via re-flash" goal to a "substrate lands so iter-4.2 has scaffolding to build on" goal. The maintainer will NOT re-flash PC 1 for v1; the actually-usable test target is iter-4.2 (or whichever iteration first ships zero-typing end-to-end SSH).

### iter-4 v1 acceptance (substrate-scaffolding-only)

- [x] `nixos/modules/initial-password.nix` ships with sha512crypt hash for `zeta-change-me`
- [x] `nixos/modules/operator-ssh-keys.nix` ships as empty stub — scaffolding that iter-4.2 overwrites at install time
- [x] `nixos/hosts/control-plane/configuration.nix` imports both new modules + removes the prior inline empty `authorizedKeys` declaration
- [x] `usb-nixos-installer/zeta-install.sh` prints initial credentials + post-install workflow before exit (workflow text useful in iter-4.2 too; v1 ships it because the cost is one echo block)
- [ ] Worker-template + worker-gpu configurations also import the two new modules; v1.1 follow-on within this row when followed up

### iter-4.2 acceptance (target the maintainer will actually test against)

- [ ] `full-ai-cluster/tools/zflash.ts` extended (or new sibling `zflash-creds.ts`) with post-flash macOS-side ESP-mount-and-write step:
  - Default reads `~/.ssh/id_ed25519.pub`
  - `--ssh-key <path>` overrides
  - `--no-creds` opts out (preserves current zflash behavior)
  - Mounts the FAT / ESP partition of the flashed USB via `diskutil mount`
  - Writes `/Volumes/<label>/zeta-authorized-keys.pub` with the pubkey content
  - Unmounts via `diskutil unmount`
  - Substrate-honest about the macOS ESP-mount substrate (diskutil idempotency, label discovery, eject timing; verify before shipping)
- [ ] `usb-nixos-installer/zeta-install.sh` extended with pre-install pubkey-inject step:
  - Probes the boot USB's ESP for `zeta-authorized-keys.pub` after step 6 (clone)
  - If found: rewrites `/mnt/etc/zeta/full-ai-cluster/nixos/modules/operator-ssh-keys.nix` with the pubkey injection BEFORE `nixos-install`
  - If not found: leaves the stub unchanged (v1 fallback path; manual edit + rebuild after login)
  - Post-install credentials echo updated to reflect "SSH works immediately" when pubkey was injected vs "edit-and-rebuild required" when stub stayed empty
- [ ] Maintainer flashes iter-4.2 USB once (single `zflash` invocation; no extra flags needed for default-key case)
  - Plugs into PC 1 (or PC 2 / PC 3)
  - Install runs zero-typing
  - PC X reboots; tty1 login as `zeta` / `zeta-change-me` works (initial-password substrate from v1)
  - `ssh zeta@<hostname>` from the maintainer's Mac works immediately; this is the iter-4.2 end-to-end success criterion
- [ ] Documentation in the post-install echo block reflects the iter-4.2 zero-typing flow; v1's manual-edit fallback paragraph stays as the explicit-opt-out path

### Why ship v1 separately if 4.2 is the maintainer-usable target

The v1 PR is small (5 files), substrate-only, no operator-facing behavioral change beyond the initial-password substrate. iter-4.2 PR builds on v1's modules + adds the tooling around them. Shipping v1 first:

- Lets the Nix modules + per-host imports land + get reviewed independently of the macOS-side ESP-mount complexity
- Surfaces any issues with the hashedPassword choice / module-import structure before they're entangled with the tooling change
- Makes iter-4.2 a tightly-scoped tooling PR (zflash.ts changes + zeta-install.sh probe), not a substrate-shape PR
- Composes with the simplest-first discipline (per the maintainer-Mika 2026-05-25 feedback memory) at PR-decomposition scope

The maintainer's "wait for 4.2" signal is exactly the right shape for this decomposition: iter-4 v1 is substrate-engineering housekeeping; iter-4.2 is the workflow-affecting change worth waiting for.

## iter-4.2 / iter-4.3 / iter-5 paths (NOT in v1 — future substrate landings)

### iter-4.2 — zflash auto-inject SSH key from boot USB

- `zflash.ts` extended to mount the boot USB's writable partition (EFI ESP) post-flash + write `/zeta-authorized-keys.pub` containing the operator's `~/.ssh/id_ed25519.pub` (default) or `--ssh-key <path>` (override)
- `zeta-install.sh` probes the boot USB for `/zeta-authorized-keys.pub` + rewrites `operator-ssh-keys.nix` with the keys before `nixos-install`
- Result: full zero-typing flow restored. Operator's pubkey on cluster nodes without manual edit step.
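
The probe-and-inject step described above could look roughly like the following. This is a sketch under assumptions, not the planned implementation: `inject_operator_key` and its arguments are hypothetical, the ESP is assumed to be mounted already, and only the first line of the key file is used (multi-key handling is the iter-4.3 concern).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the iter-4.2 probe-and-inject step. The function
# name, arguments, and single-key behavior are assumptions, not repo code.
#
# inject_operator_key ESP_DIR MODULE_PATH
#   ESP_DIR      directory where the boot USB's ESP is already mounted
#   MODULE_PATH  operator-ssh-keys.nix inside the cloned repo under /mnt
# Returns 0 if a key was injected, 1 if the stub was left unchanged.
inject_operator_key() {
  local esp_dir="$1" module_path="$2"
  local key_file="$esp_dir/zeta-authorized-keys.pub"
  local key

  # No (or empty) key file on the USB: keep the v1 stub.
  [ -s "$key_file" ] || return 1

  # Use the first line only; drop CRs in case the file has CRLF endings.
  key="$(head -n 1 "$key_file" | tr -d '\r')"
  [ -n "$key" ] || return 1

  # Refuse characters that would need escaping inside a Nix string.
  case "$key" in
    *'"'*|*'\'*|*'$'*) return 1 ;;
  esac

  cat >"$module_path" <<EOF
{ config, pkgs, lib, ... }:

{
  users.users.zeta.openssh.authorizedKeys.keys = [
    "$key"
  ];
}
EOF
}
```

The return status gives the installer a simple way to pick the post-install message: status 0 corresponds to the "SSH works immediately" wording, and status 1 to the manual edit-and-rebuild wording.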

### iter-4.3 — multi-key support (per-context attribution)

- `zflash.ts --ssh-key <path>` accepts the flag REPEATEDLY for multiple keys
- USB file becomes multi-line (one pubkey per line)
- `zeta-install.sh` injects each line as a separate `authorizedKeys.keys` entry
- Composes with `maintainers/aaron/legal-entities/inventory.md` for the per-context attribution chain (ServiceTitan-scoped key vs personal-LFG-only key)
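
For the multi-line file, the per-line injection might be sketched as below. `render_key_entries` is a hypothetical helper; it only emits the quoted list entries that would sit between `[` and `];` in `operator-ssh-keys.nix`, and it assumes that blank lines and `#` comment lines in the USB file should be skipped (the source does not specify this).

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not in the repo): turn a multi-line pubkey file into
# Nix list entries, one quoted string per key. Blank lines and "#" comment
# lines are skipped; CR characters from CRLF files are dropped.
render_key_entries() {
  local key_file="$1"
  local line

  # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
  tr -d '\r' <"$key_file" | while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;
    esac
    printf '    "%s"\n' "$line"
  done
}
```

Keeping the rendering separate from the file write makes it easy to inspect the generated entries before they replace the stub. Keys whose comment field contains `"`, `\`, or `${` would still need escaping or rejection before being embedded in a Nix string.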

### iter-5 — per-node SSH keypair generated on install + GitHub deploy-key registration

- Each cluster node generates its own SSH keypair during install
- Auto-registered with GitHub as a per-repo deploy key (read-only by default; promote when push needed)
- Tighter blast-radius than reusing maintainer's key for node-to-GitHub access
- Composes with the credential-survey skill substrate (deferred per the maintainer 2026-05-26)

### iter-5+ — secret-management substrate promotion

- Promote from sha512crypt-in-repo to `agenix` or `sops-nix` when:
  - Repo goes public OR
  - Multi-operator key isolation becomes load-bearing OR
  - Audit-trail requirements demand per-secret attribution
- Self-contained later swap; doesn't require rearchitecting v1

### iter-5+ — credential-setup skill substrate (end-user-side)

- Account-login (`gh auth login`, `claude /login`, etc.) as default path for first-time CLI users
- `.claude/skills/credential-setup/SKILL.md` documents the full lifecycle per surface
- `tools/setup/credentials/{survey,setup}.ts` + `tools/setup/manifests/oauth-flows/` declarative manifests
- Bannable-patterns matrix at `docs/credentials/bannable-patterns.md`
- Composes with Max's tier-2 onboarding work (PR #5076's onboarding-doc deliverable)

## Composes with

- **B-0754** — iter-3 zero-typing USB install (iter-4 is the credential-substrate follow-on)
- **B-0759** — first-time-CLI-user persona (iter-4 v1 is operator-friction-cost; iter-4.2+ closes the zero-typing gap)
- **B-0770** — IP-KVM Comet Pro substrate (iter-4 makes local-console access load-bearing — KVM-via-IP becomes operationally valuable when tty1 has a password to type into)
- **B-0776** — simplest-first plugin sequence (sha512crypt = simplest first; promote to stronger later)
- **B-0780** — Local Loop deterministic simulation testing (iter-4 cluster nodes are tier-3 substrate; need to be reachable for AI workloads to land)
- **B-0786** — "simplest first; add complexity only when simple shape demonstrably doesn't fit" discipline (every choice in iter-4 design applied this)
- **B-0778** — commodity hardware reference (iter-4 v1 closes the "you have a screen so you can log in locally" hardware-substrate gap)
- `.claude/rules/human-audit-and-legal-risk-acceptance-pattern-in-settings.md` — Shape A `hashedPassword` in per-host Nix module (per the discipline named in the maintainer-as-top-level liability framing in PR #5076)
- `maintainers/aaron/legal-entities/inventory.md` (PR #5077) — iter-4.3 multi-key extension composes with the per-context attribution chain (Lucent / Freeborn / ServiceTitan contexts)
- `memory/persona/max/PERSONA.md` — Max's tier-2 dev-experience work needs reachable clusters; iter-4 v1 unblocks this even with the manual SSH-key edit step

## Out of scope (deferred — see iter-4.2 / iter-5 paths above)

- zflash auto-inject of SSH key to boot USB
- Multi-key per-context support
- Per-node SSH keypair + GitHub deploy-key registration
- agenix / sops-nix secret-management substrate
- Account-login credential-setup skill (end-user onboarding side)
- Worker-template + worker-gpu module imports (v1 ships control-plane; v1.1 brings others)

## Origin

The maintainer 2026-05-26, after iter-3 USB install on PC 1 succeeded but left the node unreachable. Sequence:

1. *"what's the password?"* — surfaced the iter-3 gap (no password, no SSH keys, locked accounts)
2. Substrate-design conversation across multiple ticks reached the iter-4 shape (Shape A `hashedPassword` + SSH-key-from-USB + B-0789 row)
3. *"okay i'll wait for that to get into main then send it just let me know"* — gated Max-side text-message rollout on iter-4 substrate landing (decoupled when Max's persona substrate landed first via PR #5078)
4. *"we can do what's going to make cluster setup eaiser for me and not users if that's ssh lets do that first cause we want to get ai running the cluster asap"* — explicit authorization to ship iter-4 v1 with simplest-first design; credential-setup skill deferred to iter-5+

iter-4 v1 ships the manual SSH-key edit workflow because (a) Nix modules + per-host imports are the smallest substrate that unblocks SSH access end-to-end; (b) zflash-auto-inject requires post-flash partition-mount logic that's bounded but adds iteration cycles; (c) the maintainer's stated priority is "get AI running the cluster ASAP" which the v1 path serves immediately. iter-4.2 USB auto-inject ships as a follow-on when the manual-edit friction becomes operational.
8 changes: 3 additions & 5 deletions full-ai-cluster/nixos/hosts/control-plane/configuration.nix
@@ -12,6 +12,9 @@
../../modules/k3s-server.nix
../../modules/docker.nix
../../modules/local-storage.nix
# Iter-4 credential substrate (per B-0789):
../../modules/initial-password.nix # zeta user has known initial password (rotate on first login)
../../modules/operator-ssh-keys.nix # operator pubkey(s): manual edit in v1; USB auto-inject planned for iter-4.2
];

networking.hostName = "control-plane";
@@ -24,9 +27,4 @@
# }];
# networking.defaultGateway = "192.168.1.1";
# networking.nameservers = [ "1.1.1.1" "9.9.9.9" ];

# Add maintainer SSH keys for the `zeta` admin user:
users.users.zeta.openssh.authorizedKeys.keys = [
# "ssh-ed25519 AAAAC3Nz... aaron@zeta"
];
}
35 changes: 35 additions & 0 deletions full-ai-cluster/nixos/modules/initial-password.nix
@@ -0,0 +1,35 @@
# full-ai-cluster/nixos/modules/initial-password.nix
#
# Initial password substrate for the `zeta` user on fresh installs.
# Per `.claude/rules/human-audit-and-legal-risk-acceptance-pattern-in-settings.md`
# Shape A: hashedPassword baked into per-host Nix module + operator
# rotates on first login. Composes with the Touch ID + biometric
# substrate (full-ai-cluster/tools/zflash-setup.ts) for the operator's
# Mac side; this is the cluster-node side.
#
# THE INITIAL PASSWORD IS: zeta-change-me
#
# zeta-install.sh prints this in big letters to the console after
# nixos-install completes, so the operator can read it before
# rebooting the node.
#
# Operator MUST rotate on first login:
#
# passwd zeta
#
# Hash format: sha512crypt ($6$...). Generated via:
# openssl passwd -6 'zeta-change-me'
#
# Per simplest-first (per B-0786 memory): sha512crypt is the
# universally-portable shape; promote to yescrypt or agenix/sops-nix
# when the simple shape demonstrably can't meet a real requirement.
# Iter-4 v1 ships sha512crypt; iter-5+ may promote to a stronger
# secret-management substrate when (a) repo goes public OR
# (b) multi-operator key isolation becomes load-bearing.

{ config, pkgs, lib, ... }:

{
  users.users.zeta.hashedPassword =
    "$6$wMTsqITU4II043Y8$DBR58Hhh.d975YkA40kwYNxQAunevJ9Cu9rYYigi9YjBYVEjlNrs.rk4hu.332sh6GkQuCb7yyLYr7lPTxySD1";
}
34 changes: 34 additions & 0 deletions full-ai-cluster/nixos/modules/operator-ssh-keys.nix
@@ -0,0 +1,34 @@
# full-ai-cluster/nixos/modules/operator-ssh-keys.nix
#
# Operator SSH public keys for the `zeta` user.
#
# This file ships as an EMPTY STUB in the repo. In iter-4 v1 it stays
# empty until the operator edits it by hand (see "Manual edit path"
# below). Planned for iter-4.2 (per B-0789): zeta-install.sh overwrites
# it during install with the contents of the boot USB's
# `/zeta-authorized-keys.pub` file, which zflash.ts copies from the
# operator's `~/.ssh/id_ed25519.pub` by default, or from whichever key
# was passed via `--ssh-key <path>`.
#
# Once a key is present, the operator can SSH in to the cluster node as
# the `zeta` user using the matching private key on their workstation.
#
# Multi-key support (planned for iter-4.3): zflash.ts accepts
# `--ssh-key <path>` repeatedly; the resulting
# `/zeta-authorized-keys.pub` is a multi-line file with one pubkey per
# line; zeta-install.sh injects each line as a separate entry below.
# Per B-0789 the per-context (ServiceTitan vs personal vs LFG-only)
# attribution-chain framing lives in
# maintainers/aaron/legal-entities/inventory.md; this module just
# carries the public-key material the operator chose.
#
# Manual edit path: operators can edit this file directly + re-run
# `sudo nixos-rebuild switch` to add/remove keys without reflashing
# the USB.

{ config, pkgs, lib, ... }:

{
  users.users.zeta.openssh.authorizedKeys.keys = [
    # Empty in iter-4 v1. Paste ssh-ed25519 / ssh-rsa lines here manually
    # + nixos-rebuild. Planned for iter-4.2: zeta-install.sh populates
    # this list from /zeta-authorized-keys.pub on the boot USB.
  ];
}
25 changes: 24 additions & 1 deletion full-ai-cluster/usb-nixos-installer/zeta-install.sh
@@ -229,5 +229,28 @@ sudo nixos-generate-config --root /mnt --force
echo "Running nixos-install --flake /mnt/etc/zeta/full-ai-cluster#$HOST ..."
sudo nixos-install --flake "/mnt/etc/zeta/full-ai-cluster#$HOST" --no-root-password

# ── Step 7: print initial credentials (iter-4 — per B-0789) ──────
echo
echo "================================================================"
echo " ZETA CLUSTER NODE INSTALL COMPLETE"
echo "================================================================"
echo
echo " Initial login credentials (rotate immediately after first login):"
echo
echo " user: zeta"
echo " password: zeta-change-me"
echo
echo " AFTER FIRST LOGIN:"
echo " 1. passwd zeta # rotate the initial password"
echo " 2. Edit /etc/zeta/full-ai-cluster/nixos/modules/operator-ssh-keys.nix"
echo " and add your ssh-ed25519 pubkey, then:"
echo " 3. sudo nixos-rebuild switch --flake /etc/zeta/full-ai-cluster#$HOST"
echo " 4. Verify SSH from your workstation:"
echo " ssh zeta@$HOST"
echo
echo " (Per docs/backlog/P1/B-0789 iter-4: SSH-key auto-inject from"
echo " the boot USB is a follow-up — for v1, the SSH key flow is"
echo " manual edit + nixos-rebuild as above.)"
echo
echo "================================================================"
echo
echo "Install complete. \`reboot\` when ready."