diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index 7366e58adc..62b373e0ff 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -372,6 +372,7 @@ are closed (status: closed in frontmatter)._
 - [ ] **[B-0787](backlog/P1/B-0787-multi-ai-experiment-parallelism-without-stepping-on-each-others-feet-namespace-plus-experiment-id-plus-event-store-as-projections-not-separate-dbs-aaron-2026-05-25.md)** Multi-AI experiment parallelism without stepping on each other's feet — per-AI namespace + experiment-ID routing + event-store-native twin (experiments are projections, not separate DBs)
 - [ ] **[B-0789](backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md)** Iter-4 cluster credential substrate — hashedPassword (zeta-change-me default) + operator-ssh-keys.nix module + manual edit workflow (v1) with zflash auto-inject as iter-4.2 follow-up
 - [ ] **[B-0790](backlog/P1/B-0790-zero-dev-machines-cluster-native-architecture-voice-as-primary-operator-surface-aaron-2026-05-26.md)** Zero-dev-machines cluster-native architecture — all PRs from cluster; voice (Alexa + future microphones) as primary operator interface; dev machines and Alexa surfaces are conversational entry points into the cluster, not work substrate
+- [ ] **[B-0792](backlog/P1/B-0792-iter5-wifi-credentials-injection-via-usb-esp-for-zero-typing-cluster-bringup-without-ethernet-load-bearing-for-homelab-persona-aaron-2026-05-26.md)** iter-5 wifi-credentials injection via USB ESP — homelab persona MOSTLY HAS NO ETHERNET; cluster must "remember the wifi on setup"; analogous to iter-4.x pubkey injection but for NetworkManager profile (Aaron 2026-05-26)
 
 ## P2 — research-grade
 
diff --git a/docs/backlog/P1/B-0792-iter5-wifi-credentials-injection-via-usb-esp-for-zero-typing-cluster-bringup-without-ethernet-load-bearing-for-homelab-persona-aaron-2026-05-26.md b/docs/backlog/P1/B-0792-iter5-wifi-credentials-injection-via-usb-esp-for-zero-typing-cluster-bringup-without-ethernet-load-bearing-for-homelab-persona-aaron-2026-05-26.md
new file mode 100644
index 0000000000..a3149600d8
--- /dev/null
+++ b/docs/backlog/P1/B-0792-iter5-wifi-credentials-injection-via-usb-esp-for-zero-typing-cluster-bringup-without-ethernet-load-bearing-for-homelab-persona-aaron-2026-05-26.md
@@ -0,0 +1,183 @@
+---
+id: B-0792
+priority: P1
+status: open
+title: iter-5 wifi-credentials injection via USB ESP — homelab persona MOSTLY HAS NO ETHERNET; cluster must "remember the wifi on setup"; analogous to iter-4.x pubkey injection but for NetworkManager profile (Aaron 2026-05-26)
+effort: M
+ask: aaron 2026-05-26
+created: 2026-05-26
+last_updated: 2026-05-26
+depends_on:
+  - B-0789
+composes_with:
+  - B-0754
+  - B-0759
+  - B-0770
+  - B-0778
+  - B-0790
+tags: [iter-5, wifi, networkmanager, zero-typing, homelab-persona, esp-injection, usb-installer, cluster-bringup, b0789-extension]
+---
+
+## Problem
+
+The maintainer 2026-05-26 surfaced a load-bearing substrate gap during the iter-4.2 empirical test (PC1 first cluster bring-up):
+
+> *"we won't have ethernet for most machines it needs to remember the wifi on setup"*
+
+Today's substrate ([full-ai-cluster/nixos/modules/common.nix:31](full-ai-cluster/nixos/modules/common.nix#L31)) enables NetworkManager but bakes in **zero wifi credentials**. Result:
+
+- **Ethernet works automatically** (DHCP, no config) — fine for one-off bench testing where operator has an ethernet cable
+- **Wifi does NOT work automatically** — requires console-side `nmtui` / `nmcli` to set up first connection; defeats zero-typing discipline
+
+For the homelab persona (per B-0759 broadening + B-0790 end-state), MOST cluster nodes are wifi-only mini-PCs with no ethernet jack populated. Without wifi-injection, iter-4.x doesn't bootstrap the homelab persona at all.
+
+## Target
+
+Extend the iter-4.x ESP-injection pattern to ALSO carry wifi credentials so first cluster boot connects to home wifi automatically — analogous to how iter-4.2 carries `zeta-authorized-keys.pub`.
+
+Operator flow becomes (end-state):
+
+```bash
+# One-time setup (per operator Mac): create credentials file once
+$ echo '{"ssid":"MyHomeWifi","password":"secret123"}' > ~/.zeta/wifi-credentials.json
+
+# Per-USB flash (zero-typing same as iter-4.4 today):
+$ zflash
+[Touch ID]
+# ... dd + inject pubkey + inject wifi-creds + eject
+
+# Cluster boot: NetworkManager comes up with credentials persisted;
+# DHCP via wifi; sshd accessible from operator Mac immediately
+```
+
+## Sub-targets (composing iters)
+
+### Sub-target 1 — zflash extension: write zeta-wifi-credentials.json to ESP
+
+Parallel to existing `zeta-authorized-keys.pub` injection. Additions to `full-ai-cluster/tools/zflash.ts`:
+
+- Resolve credentials from (priority order):
+  1. CLI flags `--wifi-ssid --wifi-password ` (one-off override)
+  2. Env vars `ZETA_WIFI_SSID` / `ZETA_WIFI_PASSWORD`
+  3. JSON file `~/.zeta/wifi-credentials.json` (`{"ssid": "...", "password": "..."}`)
+  4. None → skip wifi injection (operator may have ethernet; not fatal)
+- Write `zeta-wifi-credentials.json` to ESP via existing mountEsp path (single Touch ID covers all sudo calls per sudo timestamp window)
+- Print substrate-honest disclosure: "iter-5: wrote wifi credentials (SSID=, password=) to ESP" — never print the password to stdout
+
+### Sub-target 2 — zeta-install.sh extension: read ESP creds + write NetworkManager profile
+
+Parallel to existing pubkey read. Additions to `full-ai-cluster/usb-nixos-installer/zeta-install.sh`:
+
+- During install, before `nixos-install` completes, check if `zeta-wifi-credentials.json` exists on the boot USB's ESP
+- If present, write a NetworkManager connection file to `/mnt/etc/NetworkManager/system-connections/zeta-wifi.nmconnection` with:
+  ```ini
+  [connection]
+  id=zeta-wifi
+  type=wifi
+  autoconnect=true
+  permissions=
+  [wifi]
+  ssid=
+  mode=infrastructure
+  [wifi-security]
+  key-mgmt=wpa-psk
+  psk=
+  [ipv4]
+  method=auto
+  [ipv6]
+  method=auto
+  ```
+- chmod 0600 the file (NetworkManager requires)
+- Photo-friendly diagnostic on success: "iter-5: wifi credentials injected for SSID= at /etc/NetworkManager/system-connections/zeta-wifi.nmconnection"
+- On failure: dumpDiagnostics + fallback discipline (cluster still bootable; operator can use `nmtui` console-side as escape hatch)
+
+### Sub-target 3 — NixOS config: NetworkManager `wireless` enable + nss-mdns publishing
+
+Two related gaps surfaced by same test:
+
+a) **NetworkManager wireless plugin enable** — verify `programs.nm-applet.enable` and wireless backend are correct for headless NetworkManager wifi connection (may need `networking.wireless.enable = false` to defer to NM, plus NetworkManager's wpa_supplicant module). Test on actual cluster hardware.
+
+b) **mDNS publishing** — empirical 2026-05-26: `ssh zeta@control-plane.local` failed to resolve from operator Mac because NixOS install has NO Avahi configured. Add to `full-ai-cluster/nixos/modules/common.nix`:
+
+```nix
+services.avahi = {
+  enable = true;
+  publish = {
+    enable = true;
+    addresses = true;
+    workstation = true;
+    domain = true;
+  };
+  nssmdns4 = true;
+};
+```
+
+After this lands, `ssh zeta@control-plane.local` from any LAN device (Mac, Linux, etc.) resolves via mDNS without IP discovery step.
+
+### Sub-target 4 — multi-node hostname selection
+
+The iter-5 "what happens when there's a 2nd node?" question Aaron asked. Three options previously surfaced:
+
+1. **Pre-bake per-USB** (RECOMMENDED): `bun tools/zflash.ts --host worker-gpu-1` → zflash writes `zeta-hostname.txt` to ESP; `zeta-install.sh` reads it + passes to `nixos-install --flake .../#$HOST`
+2. Prompt on first boot via console (defeats zero-typing)
+3. Auto-detect by MAC/serial pattern (risky)
+
+Option 1 composes with sub-targets 1+2 cleanly — adds a 3rd ESP file (`zeta-hostname.txt`) to the inject set.
+
+### Sub-target 5 — cluster join token / control-plane address injection
+
+For worker nodes (`worker-gpu-1` joining `control-plane`), iter-5 also needs:
+
+- Bootstrap join token (k3s / kubeadm / Talos / whatever cluster substrate the workers join)
+- Control-plane address (probably auto-discoverable via mDNS once sub-target 3 is in)
+
+This sub-target is downstream of cluster-orchestration-substrate selection (B-0776 simplest-first plugin sequence likely informs the choice). Track separately when that lands.
+
+## Acceptance
+
+- [ ] **Sub-target 1**: zflash writes `zeta-wifi-credentials.json` to ESP when credentials are resolvable; logs SSID + redacted-password disclosure
+- [ ] **Sub-target 2**: `zeta-install.sh` reads ESP creds + writes NetworkManager profile to `/mnt/etc/NetworkManager/system-connections/zeta-wifi.nmconnection` with chmod 0600
+- [ ] **Sub-target 3a**: NixOS config verified to bring up wifi via NetworkManager on cluster hardware boot
+- [ ] **Sub-target 3b**: Avahi enabled so `.local` resolves from LAN
+- [ ] **Sub-target 4**: `bun tools/zflash.ts --host ` writes `zeta-hostname.txt` to ESP; install selects right per-host config
+- [ ] **Empirical validation**: wifi-only mini-PC boots, joins wifi via injected credentials, accessible via `ssh zeta@.local` from operator Mac with NO console intervention
+- [ ] **Sub-target 5** (deferred): cluster join substrate for workers (downstream of B-0776)
+
+## Composes with substrate
+
+- **B-0789** (iter-4 SSH+password substrate; depends_on; iter-5 extends the ESP-injection pattern this row builds on)
+- **B-0754** (iter-3 USB install; depends_on through B-0789)
+- **B-0759** (first-time-CLI-user persona broadened to homelab; this row is load-bearing for homelab specifically)
+- **B-0770** (Comet Pro IP-KVM; composes; remote-first install still needs network reachability after install)
+- **B-0778** (commodity hardware reference; wifi-only mini-PCs are common in the curated list)
+- **B-0790** (zero-dev-machines cluster-native architecture end-state; iter-5 wifi-injection is load-bearing for the homelab persona target)
+- `.claude/rules/human-audit-and-legal-risk-acceptance-pattern-in-settings.md` (composes; wifi credentials on USB ESP = plaintext credential class; may want `_wifi_credentials_acceptance` block if cluster goes beyond personal homelab into shared-substrate scope)
+
+## Security framing
+
+Wifi password on USB ESP is **plaintext** to anyone who can read the partition. Acceptance:
+
+- **Homelab persona scope**: physical-USB-control assumption (same as iter-4.2 pubkey — the USB carries operator authority temporarily; physically secured during transit)
+- **Maintainer persona scope**: same assumption (the Mac that runs zflash also has the wifi credentials in keychain; no additional exposure)
+- **NOT acceptable for**: shared infrastructure, multi-tenant deployments, anywhere the USB transits hostile territory
+
+Future hardening (out-of-scope this row): encrypted credentials with Touch ID gate at boot; per-cluster ephemeral credentials; etc. For now, plaintext + physical-control + first-boot-consumption (the cred file can optionally be wiped from ESP after consume).
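Sub-target 1's priority order and its never-print-the-password rule can be sketched in TypeScript, the language of `zflash.ts`. This is a minimal illustration under assumptions, not the repository's code: `resolveWifiCredentials`, `CredentialSources`, and `disclosureLine` are hypothetical names, the inputs are passed in explicitly rather than read from the real CLI parser or filesystem, and the `<redacted>` wording is an assumed placeholder.

```typescript
// Hypothetical sketch: none of these names come from zflash.ts.
interface WifiCredentials {
  ssid: string;
  password: string;
}

interface CredentialSources {
  cliSsid?: string;      // value of --wifi-ssid, if given
  cliPassword?: string;  // value of --wifi-password, if given
  env: Record<string, string | undefined>; // e.g. process.env
  fileContents?: string; // raw text of ~/.zeta/wifi-credentials.json, if it exists
}

// Returns the first complete credential pair in priority order, or null
// when nothing is configured (the "skip wifi injection" case).
// Malformed JSON in fileContents makes JSON.parse throw; callers decide how to report that.
function resolveWifiCredentials(src: CredentialSources): WifiCredentials | null {
  // 1. CLI flags (one-off override)
  if (src.cliSsid && src.cliPassword) {
    return { ssid: src.cliSsid, password: src.cliPassword };
  }
  // 2. Environment variables
  const envSsid = src.env["ZETA_WIFI_SSID"];
  const envPassword = src.env["ZETA_WIFI_PASSWORD"];
  if (envSsid && envPassword) {
    return { ssid: envSsid, password: envPassword };
  }
  // 3. JSON credentials file
  if (src.fileContents !== undefined) {
    const parsed: unknown = JSON.parse(src.fileContents);
    if (typeof parsed === "object" && parsed !== null) {
      const { ssid, password } = parsed as Record<string, unknown>;
      if (typeof ssid === "string" && typeof password === "string" && ssid !== "" && password !== "") {
        return { ssid, password };
      }
    }
  }
  // 4. Nothing found: not fatal, the operator may have ethernet
  return null;
}

// Builds the disclosure line without ever including the password itself.
function disclosureLine(creds: WifiCredentials): string {
  return `iter-5: wrote wifi credentials (SSID=${creds.ssid}, password=<redacted>) to ESP`;
}

const example = resolveWifiCredentials({
  env: {},
  fileContents: '{"ssid":"MyHomeWifi","password":"secret123"}',
});
if (example !== null) {
  console.log(disclosureLine(example));
}
```

Passing the sources in as plain values keeps the precedence logic testable without touching the real environment or the operator's home directory.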
+
+## Out of scope (for this row; tracked elsewhere)
+
+- Cluster orchestration substrate (k3s vs Talos vs whatever) — tracked under B-0776
+- Worker join token / control-plane discovery — sub-target 5; deferred
+- Encrypted credentials / Touch ID gate — future hardening
+- WPA-Enterprise / 802.1X / corporate wifi — not homelab scope
+
+## Origin
+
+The maintainer 2026-05-26 during the iter-4.2 PC1 empirical test surfaced the substrate gap when his first cluster node booted but couldn't be reached:
+
+1. *"we need to move this forward also is this node up and running and working?"* — asks for node health verification (SSH fails to resolve `control-plane.local`)
+2. *"does it reconnect to wifi after reboot?"* — sharp question; surfaces the missing piece
+3. *"we won't have ethernet for most machines it needs to remember the wifi on setup"* — names the load-bearing requirement explicitly
+
+This row captures + scopes the iter-5 substrate work. Composes directly with iter-4.x (#5080 → #5083 → #5086 → #5088 → #5091 → #5093 → #5099) — same ESP-injection pattern, different payload (wifi credentials + hostname).
+
+Per maintainer's broader 2026-05-26 *"going for right not fast"* discipline + the *"ferry commands by reading and typing avoid like the plague"* discipline — iter-5 wifi-injection is load-bearing for keeping zero-typing as the homelab persona's default operator experience.
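The NetworkManager profile described in Sub-target 2 could be rendered from the injected credentials roughly as follows. This is an illustrative sketch only: the row assigns this step to `zeta-install.sh` (a shell script), so the TypeScript here shows the shape of the output rather than the actual implementation, and `renderNmConnection` is a hypothetical name. It assumes a WPA-PSK network and does not handle SSIDs that need keyfile escaping.

```typescript
// Hypothetical helper: renders the zeta-wifi.nmconnection body from Sub-target 2.
// Assumes plain WPA-PSK; values containing newlines are rejected, not escaped.
function renderNmConnection(ssid: string, psk: string): string {
  if (ssid === "" || /[\r\n]/.test(ssid) || /[\r\n]/.test(psk)) {
    throw new Error("ssid must be non-empty and neither value may contain newlines");
  }
  // WPA-PSK accepts an 8-63 character passphrase or a 64-digit hex key.
  const isPassphrase = psk.length >= 8 && psk.length <= 63;
  const isHexKey = /^[0-9a-fA-F]{64}$/.test(psk);
  if (!isPassphrase && !isHexKey) {
    throw new Error("psk must be 8-63 characters or 64 hex digits");
  }
  return [
    "[connection]",
    "id=zeta-wifi",
    "type=wifi",
    "autoconnect=true",
    "permissions=",
    "[wifi]",
    `ssid=${ssid}`,
    "mode=infrastructure",
    "[wifi-security]",
    "key-mgmt=wpa-psk",
    `psk=${psk}`,
    "[ipv4]",
    "method=auto",
    "[ipv6]",
    "method=auto",
    "", // trailing newline
  ].join("\n");
}

// The caller would write this text to
// /mnt/etc/NetworkManager/system-connections/zeta-wifi.nmconnection and chmod it to 0600.
const profile = renderNmConnection("MyHomeWifi", "secret123");
console.log(profile.split("\n").length - 1, "lines rendered");
```

Validating the passphrase length before writing the file surfaces an unusable credential at install time instead of as a silent failure to join wifi on first boot.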