1 change: 1 addition & 0 deletions docs/BACKLOG.md
@@ -372,6 +372,7 @@ are closed (status: closed in frontmatter)._
- [ ] **[B-0787](backlog/P1/B-0787-multi-ai-experiment-parallelism-without-stepping-on-each-others-feet-namespace-plus-experiment-id-plus-event-store-as-projections-not-separate-dbs-aaron-2026-05-25.md)** Multi-AI experiment parallelism without stepping on each other's feet — per-AI namespace + experiment-ID routing + event-store-native twin (experiments are projections, not separate DBs)
- [ ] **[B-0789](backlog/P1/B-0789-iter4-ssh-key-and-hashedpassword-substrate-for-cluster-bringup-2026-05-26.md)** Iter-4 cluster credential substrate — hashedPassword (zeta-change-me default) + operator-ssh-keys.nix module + manual edit workflow (v1) with zflash auto-inject as iter-4.2 follow-up
- [ ] **[B-0790](backlog/P1/B-0790-zero-dev-machines-cluster-native-architecture-voice-as-primary-operator-surface-aaron-2026-05-26.md)** Zero-dev-machines cluster-native architecture — all PRs from cluster; voice (Alexa + future microphones) as primary operator interface; dev machines and Alexa surfaces are conversational entry points into the cluster, not work substrate
- [ ] **[B-0792](backlog/P1/B-0792-iter5-wifi-credentials-injection-via-usb-esp-for-zero-typing-cluster-bringup-without-ethernet-load-bearing-for-homelab-persona-aaron-2026-05-26.md)** iter-5 wifi-credentials injection via USB ESP — homelab persona MOSTLY HAS NO ETHERNET; cluster must "remember the wifi on setup"; analogous to iter-4.x pubkey injection but for NetworkManager profile (Aaron 2026-05-26)

## P2 — research-grade

@@ -0,0 +1,183 @@
---
id: B-0792
priority: P1
status: open
title: iter-5 wifi-credentials injection via USB ESP — homelab persona MOSTLY HAS NO ETHERNET; cluster must "remember the wifi on setup"; analogous to iter-4.x pubkey injection but for NetworkManager profile (Aaron 2026-05-26)
effort: M
ask: aaron 2026-05-26
created: 2026-05-26
last_updated: 2026-05-26
depends_on:
- B-0789
composes_with:
- B-0754
- B-0759
- B-0770
- B-0778
- B-0790
tags: [iter-5, wifi, networkmanager, zero-typing, homelab-persona, esp-injection, usb-installer, cluster-bringup, b0789-extension]
---

## Problem

On 2026-05-26, during the iter-4.2 empirical test (PC1 first cluster bring-up), the maintainer surfaced a load-bearing substrate gap:

> *"we won't have ethernet for most machines it needs to remember the wifi on setup"*

Today's substrate ([full-ai-cluster/nixos/modules/common.nix:31](full-ai-cluster/nixos/modules/common.nix#L31)) enables NetworkManager but bakes in **zero wifi credentials**. Result:

- **Ethernet works automatically** (DHCP, no config) — fine for one-off bench testing where operator has an ethernet cable
- **Wifi does NOT work automatically** — requires console-side `nmtui` / `nmcli` to set up first connection; defeats zero-typing discipline

For the homelab persona (per B-0759 broadening + B-0790 end-state), MOST cluster nodes are wifi-only mini-PCs with no ethernet jack populated. Without wifi-injection, iter-4.x doesn't bootstrap the homelab persona at all.

## Target

Extend the iter-4.x ESP-injection pattern to ALSO carry wifi credentials so first cluster boot connects to home wifi automatically — analogous to how iter-4.2 carries `zeta-authorized-keys.pub`.

Operator flow becomes (end-state):

```bash
# One-time setup (per operator Mac): create the credentials file once, readable only by the operator
$ mkdir -p ~/.zeta
$ echo '{"ssid":"MyHomeWifi","password":"secret123"}' > ~/.zeta/wifi-credentials.json
$ chmod 600 ~/.zeta/wifi-credentials.json

# Per-USB flash (zero-typing same as iter-4.4 today):
$ zflash
[Touch ID]
# ... dd + inject pubkey + inject wifi-creds + eject

# Cluster boot: NetworkManager comes up with credentials persisted;
# DHCP via wifi; sshd accessible from operator Mac immediately
```

## Sub-targets (composing iters)

### Sub-target 1 — zflash extension: write zeta-wifi-credentials.json to ESP

Parallel to existing `zeta-authorized-keys.pub` injection. Additions to `full-ai-cluster/tools/zflash.ts`:

- Resolve credentials from (priority order):
  1. CLI flags `--wifi-ssid <ssid> --wifi-password <pw>` (one-off override)
  2. Env vars `ZETA_WIFI_SSID` / `ZETA_WIFI_PASSWORD`
  3. JSON file `~/.zeta/wifi-credentials.json` (`{"ssid": "...", "password": "..."}`)
  4. None → skip wifi injection (operator may have ethernet; not fatal)
- Write `zeta-wifi-credentials.json` to ESP via existing mountEsp path (single Touch ID covers all sudo calls per sudo timestamp window)
- Print substrate-honest disclosure: "iter-5: wrote wifi credentials (SSID=<ssid>, password=<redacted>) to ESP" — never print the password to stdout
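
The resolution order above can be sketched as a pure function. This is a minimal illustration, not the actual `zflash.ts` code: the names `CredentialSources`, `flagValue`, `resolveWifiCredentials`, and `disclosure` are hypothetical, and the caller is assumed to have already read the JSON file (passing `null` when it is absent).

```typescript
// Hypothetical shapes for illustration; none of these names come from zflash.ts.
interface WifiCredentials {
  ssid: string;
  password: string;
}

interface CredentialSources {
  argv: string[];                          // e.g. process.argv.slice(2)
  env: Record<string, string | undefined>; // e.g. process.env
  fileContents: string | null;             // contents of ~/.zeta/wifi-credentials.json, or null if absent
}

// Return the value that follows `flag` in argv, or undefined if the flag is missing.
function flagValue(argv: string[], flag: string): string | undefined {
  const i = argv.indexOf(flag);
  return i >= 0 && i + 1 < argv.length ? argv[i + 1] : undefined;
}

// Resolve credentials in priority order: CLI flags, env vars, JSON file, none.
function resolveWifiCredentials(src: CredentialSources): WifiCredentials | null {
  const cliSsid = flagValue(src.argv, "--wifi-ssid");
  const cliPw = flagValue(src.argv, "--wifi-password");
  if (cliSsid && cliPw) return { ssid: cliSsid, password: cliPw };

  const envSsid = src.env["ZETA_WIFI_SSID"];
  const envPw = src.env["ZETA_WIFI_PASSWORD"];
  if (envSsid && envPw) return { ssid: envSsid, password: envPw };

  if (src.fileContents !== null) {
    const parsed: unknown = JSON.parse(src.fileContents);
    if (typeof parsed === "object" && parsed !== null) {
      const { ssid, password } = parsed as Record<string, unknown>;
      if (typeof ssid === "string" && typeof password === "string" && ssid && password) {
        return { ssid, password };
      }
    }
    throw new Error("wifi-credentials.json must contain non-empty string fields ssid and password");
  }

  return null; // No source matched: skip wifi injection (not fatal).
}

// Disclosure line that names the SSID but never the password.
function disclosure(creds: WifiCredentials): string {
  return `iter-5: wrote wifi credentials (SSID=${creds.ssid}, password=<redacted>) to ESP`;
}
```

Keeping the resolver free of file and process access is a design choice for the sketch: it makes the priority order testable without touching the operator's real environment. One assumption worth checking: a half-specified source (SSID without password) falls through to the next source here rather than failing.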

### Sub-target 2 — zeta-install.sh extension: read ESP creds + write NetworkManager profile

Parallel to existing pubkey read. Additions to `full-ai-cluster/usb-nixos-installer/zeta-install.sh`:

- During install, before `nixos-install` completes, check if `zeta-wifi-credentials.json` exists on the boot USB's ESP
- If present, write a NetworkManager connection file to `/mnt/etc/NetworkManager/system-connections/zeta-wifi.nmconnection` with:
```ini
[connection]
id=zeta-wifi
type=wifi
autoconnect=true
permissions=
[wifi]
ssid=<from json>
mode=infrastructure
[wifi-security]
key-mgmt=wpa-psk
psk=<from json>
[ipv4]
method=auto
[ipv6]
method=auto
```
- chmod 0600 the file and leave it owned by root (NetworkManager ignores keyfiles that other users can read)
- Photo-friendly diagnostic on success: "iter-5: wifi credentials injected for SSID=<ssid> at /etc/NetworkManager/system-connections/zeta-wifi.nmconnection"
- On failure: dumpDiagnostics + fallback discipline (cluster still bootable; operator can use `nmtui` console-side as escape hatch)
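
The profile-rendering step can be sketched as follows. The real installer is a shell script, so this TypeScript version only illustrates the mapping from the JSON fields to the keyfile shown above. `validateWifiCredentials`, `escapeKeyfileValue`, and `renderNmConnection` are hypothetical names, and the checks (no line breaks, 8 to 63 character WPA passphrase, backslash escaping) are assumptions about what a careful implementation would want rather than behavior of the existing code.

```typescript
// Hypothetical helpers for illustration; zeta-install.sh itself is a shell script.
interface WifiCredentials {
  ssid: string;
  password: string;
}

// Reject values that would corrupt the keyfile or that WPA-PSK cannot accept.
function validateWifiCredentials(creds: WifiCredentials): void {
  if (creds.ssid.length === 0) {
    throw new Error("ssid must not be empty");
  }
  // A line break inside a value could smuggle extra keys into the keyfile.
  if (/[\r\n]/.test(creds.ssid) || /[\r\n]/.test(creds.password)) {
    throw new Error("ssid and password must not contain line breaks");
  }
  // WPA-PSK passphrases are 8 to 63 characters long.
  if (creds.password.length < 8 || creds.password.length > 63) {
    throw new Error("password must be 8 to 63 characters for wpa-psk");
  }
}

// Keyfile values treat backslash as an escape character, so double it.
// Assumption: other characters (such as ';' in an SSID) may need extra
// handling that this sketch does not attempt.
function escapeKeyfileValue(value: string): string {
  return value.replace(/\\/g, "\\\\");
}

// Render the zeta-wifi.nmconnection contents for the given credentials.
function renderNmConnection(creds: WifiCredentials): string {
  validateWifiCredentials(creds);
  const lines = [
    "[connection]",
    "id=zeta-wifi",
    "type=wifi",
    "autoconnect=true",
    "permissions=",
    "",
    "[wifi]",
    `ssid=${escapeKeyfileValue(creds.ssid)}`,
    "mode=infrastructure",
    "",
    "[wifi-security]",
    "key-mgmt=wpa-psk",
    `psk=${escapeKeyfileValue(creds.password)}`,
    "",
    "[ipv4]",
    "method=auto",
    "",
    "[ipv6]",
    "method=auto",
  ];
  return lines.join("\n") + "\n";
}
```

Validating before rendering means a malformed credentials file fails loudly during install, where the fallback discipline above still leaves `nmtui` as the escape hatch, rather than producing a profile that NetworkManager silently rejects on first boot.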

### Sub-target 3 — NixOS config: NetworkManager `wireless` enable + nss-mdns publishing

Two related gaps surfaced by the same test:

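a) **NetworkManager wireless plugin enable**: verify that the wireless backend is correct for a headless NetworkManager wifi connection. `programs.nm-applet.enable` only adds a desktop tray applet, so it is likely irrelevant on a headless node; the options to check are `networking.networkmanager.enable`, `networking.networkmanager.wifi.backend` (wpa_supplicant by default), and possibly `networking.wireless.enable = false` so the standalone wpa_supplicant service defers to NM. Test on actual cluster hardware.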

b) **mDNS publishing** — empirical 2026-05-26: `ssh zeta@control-plane.local` failed to resolve from operator Mac because NixOS install has NO Avahi configured. Add to `full-ai-cluster/nixos/modules/common.nix`:

```nix
services.avahi = {
  enable = true;
  publish = {
    enable = true;
    addresses = true;
    workstation = true;
    domain = true;
  };
  nssmdns4 = true;
};
```

After this lands, `ssh zeta@control-plane.local` from any LAN device (Mac, Linux, etc.) resolves via mDNS without IP discovery step.

### Sub-target 4 — multi-node hostname selection

This sub-target answers the iter-5 question Aaron asked: "what happens when there's a 2nd node?" Three options previously surfaced:

1. **Pre-bake per-USB** (RECOMMENDED): `bun tools/zflash.ts --host worker-gpu-1` → zflash writes `zeta-hostname.txt` to ESP; `zeta-install.sh` reads it + passes to `nixos-install --flake .../#$HOST`
2. Prompt on first boot via console (defeats zero-typing)
3. Auto-detect by MAC/serial pattern (risky)

Option 1 composes with sub-targets 1+2 cleanly — adds a 3rd ESP file (`zeta-hostname.txt`) to the inject set.
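
Option 1 can be sketched as below. Because the hostname ends up interpolated into a flake attribute on the install side, the sketch validates it before anything is written. All names here (`isValidHostname`, `EspFile`, `buildInjectSet`, `flakeRef`) are hypothetical, the lowercase-only rule is a conservative assumption, and the flake directory is taken as a parameter because the row does not spell it out.

```typescript
// Hypothetical helpers for illustration; not taken from zflash.ts.

// One DNS label: 1 to 63 chars, lowercase letters, digits, hyphens,
// with no leading or trailing hyphen (lowercase-only is an assumption).
const HOSTNAME_LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/;

function isValidHostname(host: string): boolean {
  return HOSTNAME_LABEL.test(host);
}

interface EspFile {
  name: string;     // file name at the root of the ESP
  contents: string; // exact bytes to write
}

interface InjectOptions {
  pubkey: string;          // operator public key (iter-4.2 payload)
  wifiJson: string | null; // serialized wifi credentials, or null to skip
  host: string | null;     // value of --host, or null to skip
}

// Build the list of files to write to the ESP: pubkey always,
// wifi credentials and hostname only when provided.
function buildInjectSet(opts: InjectOptions): EspFile[] {
  const files: EspFile[] = [
    { name: "zeta-authorized-keys.pub", contents: opts.pubkey },
  ];
  if (opts.wifiJson !== null) {
    files.push({ name: "zeta-wifi-credentials.json", contents: opts.wifiJson });
  }
  if (opts.host !== null) {
    if (!isValidHostname(opts.host)) {
      throw new Error(`invalid hostname: ${opts.host}`);
    }
    files.push({ name: "zeta-hostname.txt", contents: opts.host + "\n" });
  }
  return files;
}

// Compose the flake reference the installer would pass to nixos-install.
// flakeDir is a placeholder parameter; the real path is not given in this row.
function flakeRef(flakeDir: string, host: string): string {
  if (!isValidHostname(host)) {
    throw new Error(`invalid hostname: ${host}`);
  }
  return `${flakeDir}#${host}`;
}
```

For example, `buildInjectSet({ pubkey, wifiJson: null, host: "worker-gpu-1" })` yields the pubkey file plus `zeta-hostname.txt`, which matches the case where the operator has ethernet but still wants a per-host config.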

### Sub-target 5 — cluster join token / control-plane address injection

For worker nodes (`worker-gpu-1` joining `control-plane`), iter-5 also needs:

- Bootstrap join token (k3s / kubeadm / Talos / whatever cluster substrate the workers join)
- Control-plane address (probably auto-discoverable via mDNS once sub-target 3 is in)

This sub-target is downstream of cluster-orchestration-substrate selection (B-0776 simplest-first plugin sequence likely informs the choice). Track separately when that lands.

## Acceptance

- [ ] **Sub-target 1**: zflash writes `zeta-wifi-credentials.json` to ESP when credentials are resolvable; logs SSID + redacted-password disclosure
- [ ] **Sub-target 2**: `zeta-install.sh` reads ESP creds + writes NetworkManager profile to `/mnt/etc/NetworkManager/system-connections/zeta-wifi.nmconnection` with chmod 0600
- [ ] **Sub-target 3a**: NixOS config verified to bring up wifi via NetworkManager on cluster hardware boot
- [ ] **Sub-target 3b**: Avahi enabled so `<hostname>.local` resolves from LAN
- [ ] **Sub-target 4**: `bun tools/zflash.ts --host <hostname>` writes `zeta-hostname.txt` to ESP; install selects right per-host config
- [ ] **Empirical validation**: wifi-only mini-PC boots, joins wifi via injected credentials, accessible via `ssh zeta@<hostname>.local` from operator Mac with NO console intervention
- [ ] **Sub-target 5** (deferred): cluster join substrate for workers (downstream of B-0776)

## Composes with substrate

- **B-0789** (iter-4 SSH+password substrate; depends_on; iter-5 extends the ESP-injection pattern this row builds on)
- **B-0754** (iter-3 USB install; depends_on through B-0789)
- **B-0759** (first-time-CLI-user persona broadened to homelab; this row is load-bearing for homelab specifically)
- **B-0770** (Comet Pro IP-KVM; composes; remote-first install still needs network reachability after install)
- **B-0778** (commodity hardware reference; wifi-only mini-PCs are common in the curated list)
- **B-0790** (zero-dev-machines cluster-native architecture end-state; iter-5 wifi-injection is load-bearing for the homelab persona target)
- `.claude/rules/human-audit-and-legal-risk-acceptance-pattern-in-settings.md` (composes; wifi credentials on USB ESP = plaintext credential class; may want `_wifi_credentials_acceptance` block if cluster goes beyond personal homelab into shared-substrate scope)

## Security framing

Wifi password on USB ESP is **plaintext** to anyone who can read the partition. Acceptance:

- **Homelab persona scope**: physical-USB-control assumption (same as iter-4.2 pubkey — the USB carries operator authority temporarily; physically secured during transit)
- **Maintainer persona scope**: same assumption (the Mac that runs zflash also has the wifi credentials in keychain; no additional exposure)
- **NOT acceptable for**: shared infrastructure, multi-tenant deployments, anywhere the USB transits hostile territory

Future hardening (out of scope for this row): encrypted credentials with a Touch ID gate at boot, per-cluster ephemeral credentials, etc. For now, the accepted model is plaintext + physical-control + first-boot-consumption (the cred file can optionally be wiped from the ESP once consumed).

## Out of scope (for this row; tracked elsewhere)

- Cluster orchestration substrate (k3s vs Talos vs whatever) — tracked under B-0776
- Worker join token / control-plane discovery — sub-target 5; deferred
- Encrypted credentials / Touch ID gate — future hardening
- WPA-Enterprise / 802.1X / corporate wifi — not homelab scope

## Origin

On 2026-05-26, during the iter-4.2 PC1 empirical test, the maintainer surfaced the substrate gap when his first cluster node booted but couldn't be reached:

1. *"we need to move this forward also is this node up and running and working?"* — asks for node health verification (SSH fails to resolve `control-plane.local`)
2. *"does it reconnect to wifi after reboot?"* — sharp question; surfaces the missing piece
3. *"we won't have ethernet for most machines it needs to remember the wifi on setup"* — names the load-bearing requirement explicitly

This row captures + scopes the iter-5 substrate work. Composes directly with iter-4.x (#5080 → #5083 → #5086 → #5088 → #5091 → #5093 → #5099) — same ESP-injection pattern, different payload (wifi credentials + hostname).

Per the maintainer's broader 2026-05-26 *"going for right not fast"* discipline and the *"ferry commands by reading and typing avoid like the plague"* discipline, iter-5 wifi-injection is load-bearing for keeping zero-typing as the homelab persona's default operator experience.