
docs(B-0852.4): NixOS module boot-time cred-restore from ESP — gates end-to-end USB test (Aaron 2026-05-27 USB push)#5454

Merged
AceHack merged 1 commit into main from backlog/b-0852-4-nixos-module-boot-restore-row-2026-05-27
May 27, 2026


@AceHack AceHack commented May 27, 2026

Summary

Files B-0852.4 row capturing the boot-time companion to B-0852.3a picker (PR #5450).

Why this gates the USB test: picker writes blob → reboot → without B-0852.4 the blob is ignored. With B-0852.4: full persist → restore → use chain on real USB hardware.

Sub-rows

  • 4a NixOS module + systemd unit
  • 4b interactive systemd-ask-password mode
  • 4c file-based env-injected passphrase (simpler; first to ship)
  • 4d wire into common.nix
  • 4e empirical USB end-to-end test

Order: 4a → 4c → 4d → 4e → 4b.

Test plan

🤖 Generated with Claude Code

…end-to-end USB test (Aaron 2026-05-27 USB push; consumes B-0852.3a picker blob)

Files B-0852.4 row capturing the boot-time companion to the just-armed
B-0852.3a picker (PR #5450). Without B-0852.4, picker writes blob to ESP
but no boot-time consumer = blob ignored on reboot. With B-0852.4: full
persist → restore → use chain on real USB hardware.

Five sub-rows planned:
- 4a NixOS module file + systemd unit
- 4b interactive systemd-ask-password mode
- 4c file-based env-injected passphrase mode (simpler; first to ship)
- 4d wire into cluster-node common.nix
- 4e empirical USB end-to-end test

Implementation order: 4a → 4c (simpler passphrase path) → 4d → 4e
(USB validation) → 4b (interactive nicer-UX last).

Substrate-inventory pass per .claude/rules/verify-existing-substrate-before-authoring.md
cited inline. All upstream sub-rows merged (B-0852.1/.2a/.2b/.5/.10) +
picker armed (PR #5450). P1 because it gates the operator's empirical USB test.

Composes with:
- B-0852.3a picker (PR #5450 in flight) — produces the blob this consumes
- B-0855 self-register architectural fix — fires AFTER cred-restore via systemd ordering
- B-0857 install.sh universal entry — install-time companion to this row's boot-time scope
- B-0833 installer interactive-login — declined-creds branch at user login
- .claude/rules/non-coercion-invariant.md HC-8 — required-fail surfaces; optional-fail warn-and-continue

Heartbeat-via-commit per the just-landed CLAUDE.md discipline (PR #5451):
filing this row IS counter-reset work per .claude/rules/holding-without-named-dependency-is-standing-by-failure.md
condition #3 while #5450 build-iso runs as the named bounded-wait.

Per .claude/rules/agent-worktree-hygiene-never-hold-main-...: isolated
worktree at /private/tmp/zeta-b0852-4-row-1245z; operator primary
checkout untouched.

Per .claude/rules/non-coercion-invariant.md HC-8: operator authority over
own creds preserved; passphrase NEVER logged; modes A/B both preserve
operator choice; required-cred-fail surfaces honestly.

Agency-Signature-Version: 1
Agent: Otto
Agent-Runtime: Claude Code (auto mode)
Agent-Model: claude-opus-4-7
Credential-Identity: aaron-otto-vscode
Credential-Mode: operator-authorized
Human-Review: pre-merge-pending
Human-Review-Evidence: operator-direction-2026-05-27-usb-push-keep-pushing-forward
Action-Mode: substrate-row-filing
Task: B-0852.4

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 27, 2026 13:42
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@AceHack AceHack enabled auto-merge (squash) May 27, 2026 13:42
@AceHack AceHack merged commit badc0b6 into main May 27, 2026
29 of 30 checks passed
@AceHack AceHack deleted the backlog/b-0852-4-nixos-module-boot-restore-row-2026-05-27 branch May 27, 2026 13:44
@AceHack AceHack review requested due to automatic review settings May 27, 2026 14:04
AceHack added a commit that referenced this pull request May 27, 2026
…luster common.nix imports — last gate for end-to-end USB cred-persistence test (Aaron 2026-05-27 USB priority) (#5476)

* feat(B-0852.4a): NixOS module zeta-creds-restore.nix — boot-time decrypt from ESP via systemd service (Aaron 2026-05-27 USB push; sibling to zeta-self-register.nix per B-0855.1)

Implements the boot-time consumer for the install-time picker (B-0852.3a
PR #5450). Composes with zeta-self-register.service which already
declares `after = "zeta-creds-restore.service"` per B-0855.1 module —
the dependency was wired upstream; this row makes the target service
actually exist.

**Module: full-ai-cluster/nixos/modules/zeta-creds-restore.nix**

NixOS module providing systemd service `zeta-creds-restore.service`:

- Disabled by default (`zeta.credsRestore.enable = false`); opt-in
  per host config (matches zeta-self-register sibling pattern)
- Ordering: `wantedBy=multi-user.target`, `after=local-fs.target` +
  `wants=local-fs.target` (ESP mounted before fire); B-0855.1 enforces
  `after=zeta-creds-restore.service` from its side
- ConditionPathExists guard: blob + USB UUID + restore CLI + bun shim
  must all exist (clean skip when picker wasn't run at install)
- Two passphrase modes (operator-configurable):
  - **file** (default): read from /run/zeta-creds-passphrase
    (operator pre-stages); deleted by ExecStopPost
  - **interactive**: systemd-ask-password on tty1 (300s timeout);
    writes zeta-readable temp file; deleted by ExecStopPost
- Invokes B-0852.2b restore CLI as zeta user via sudo with proper
  HOME + PATH + --target-root=/
- Optional --persona passthrough for per-persona-scoped creds
- Restart=on-failure with 30s backoff (per .claude/rules/non-coercion-invariant.md
  HC-8: required-cred failure surfaces honestly)
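
The clean-skip behaviour of the ConditionPathExists guard can be illustrated in plain shell. This is only a sketch of the semantics (several ConditionPathExists= lines are ANDed, and an unmet condition skips the unit rather than failing it); the helper name `all_paths_exist` and the scratch file names are hypothetical and not taken from the module.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: return 0 only when every listed path exists,
# mirroring how multiple ConditionPathExists= lines are ANDed together.
all_paths_exist() {
  for ap_path in "$@"; do
    if [ ! -e "$ap_path" ]; then
      echo "skip: missing $ap_path"
      return 1
    fi
  done
  return 0
}

# Scratch stand-ins for the four guarded paths (names are illustrative).
demo_dir="$(mktemp -d)"
blob="$demo_dir/zeta-creds.enc"
uuid="$demo_dir/usb-uuid"
cli="$demo_dir/restore-cli"
bun="$demo_dir/bun"
touch "$uuid" "$cli" "$bun"   # blob deliberately absent: picker was not run

if all_paths_exist "$blob" "$uuid" "$cli" "$bun"; then first=run; else first=skip; fi

touch "$blob"                 # picker has now written the blob
if all_paths_exist "$blob" "$uuid" "$cli" "$bun"; then second=run; else second=skip; fi

echo "without blob: $first; with blob: $second"
rm -rf "$demo_dir"
```

In the real unit systemd performs this check itself before ExecStart, so a host where the picker never ran sees a skipped unit, not a failed one.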

**Verification**: `nix-instantiate --parse` returns PARSE OK.

**What this unblocks for operator's USB test**:

End-to-end persist → restore → use chain now possible on real USB:
1. Operator reflashes USB
2. Boots, runs installer with ZETA_CREDS_PICKER=1 + ZETA_CREDS_PASSPHRASE=...
3. Picker writes /esp/zeta-creds.enc (B-0852.3a / PR #5450)
4. Operator enables zeta.credsRestore.enable=true + passphraseMode in
   host common.nix (B-0852.4d wiring; next sub-row)
5. Reboot → systemd fires zeta-creds-restore.service → blob decrypts →
   per-cred files populated in /home/zeta
6. zeta-self-register.service fires next per B-0855.1 ordering

Composes:
- B-0852.1 crypto (PR #5413; decrypt envelope)
- B-0852.2a envelope (PR #5421; parse blob format)
- B-0852.2b restore CLI (PR #5425; the binary this module wraps)
- B-0852.3a picker (PR #5450; produces the blob)
- B-0852.4 row (PR #5454; this is sub-row 4a)
- B-0852.5 manifest (PR #5414; drives per-cred path resolution)
- B-0855.1 zeta-self-register.nix (the sibling module that already
  expects this service to exist)
- B-0857 install.sh universal entry (install-time companion)

Sub-row status (per B-0852.4 row):
- 4c: file mode implemented in this PR (the default mode)
- 4b: interactive mode also implemented in this PR (both modes ship together)
- 4d: wire into common.nix (next PR; simple imports list add)
- 4e: empirical USB end-to-end test (validates full chain on hardware)

Per .claude/rules/agent-worktree-hygiene-never-hold-main-...: isolated
worktree at /private/tmp/zeta-b0852-4a-module-1250z; operator primary
checkout untouched.

Per .claude/rules/non-coercion-invariant.md HC-8: operator authority
over creds preserved; passphrase NEVER logged; interactive prompt
operator-driven; file-mode operator-staged; failure surfaces via
journalctl + restart policy.

Per .claude/rules/methodology-hard-limits.md: clinical/security floor
operative; cred-restore is purely defensive operator-data-recovery
substrate; no offensive use.

Heartbeat-via-commit per CLAUDE.md (PR #5451): this commit IS the
externalized counter tick; AgencySignature v1 trailer below; named
bounded-wait is #5450 build-iso completion.

Agency-Signature-Version: 1
Agent: Otto
Agent-Runtime: Claude Code (auto mode)
Agent-Model: claude-opus-4-7
Credential-Identity: aaron-otto-vscode
Credential-Mode: operator-authorized
Human-Review: pre-merge-pending
Human-Review-Evidence: operator-direction-2026-05-27-usb-push-keep-pushing-forward
Action-Mode: substrate-implementation
Task: B-0852.4a

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(B-0852.4d): wire zeta-creds-restore.nix into cluster common.nix imports — last gate before end-to-end USB test (Aaron 2026-05-27 USB priority)

Adds `./zeta-creds-restore.nix` to the `full-ai-cluster/nixos/modules/common.nix`
imports list right after `./zeta-self-register.nix`, so the two sibling modules
sit next to each other in the list. Boot ordering itself comes from the unit
declaration B-0855.1 documents (zeta-self-register declares
`after = "zeta-creds-restore.service"`).

Disabled-by-default (per the module's mkEnableOption); host configs
opt in via `zeta.credsRestore.enable = true;` AND operator pre-stages
a passphrase source. Imported here so every cluster-node type
(control-plane / worker-gpu) inherits the same module surface; the
opt-in flip lives at host-config level not common.nix level.

Composes:
- B-0852.4a (this PR's earlier commit ef45b4f) — the module file itself
- B-0852.3a picker (PR #5450) — install-time blob writer
- B-0852.4 row (PR #5454 merged) — substrate-engineering parent
- B-0855.1 zeta-self-register.nix — already declares `after = "zeta-creds-restore.service"`
- iter-5.5.0 install flow — picker writes blob during install; module restores at boot

**Empirical USB test path now complete end-to-end**:
1. Reflash USB with ISO carrying these changes
2. Boot, run installer with ZETA_CREDS_PICKER=1 + ZETA_CREDS_PASSPHRASE=...
3. Step 6.95-picker writes /esp/zeta-creds.enc (B-0852.3a)
4. Operator enables `zeta.credsRestore.enable = true;` in host config
   + pre-stages /run/zeta-creds-passphrase
5. Reboot → zeta-creds-restore.service fires → blob decrypted →
   per-cred files populated in /home/zeta
6. zeta-self-register.service fires next per B-0855.1 ordering
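
Step 4's pre-staging of the passphrase file might look like the sketch below. It is an illustration under assumptions, not the operator procedure from the source: `stage_passphrase` is a hypothetical helper, the destination is passed as a parameter (the demo writes to a scratch directory, not /run), and since the source does not say whether the restore CLI tolerates a trailing newline, the sketch writes none.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: read one line from stdin and write it to "$1" with
# owner-only permissions. Reading stdin keeps the secret out of argv.
stage_passphrase() {
  sp_dest="$1"
  sp_old_umask="$(umask)"
  umask 077                     # files created from here on are mode 0600
  IFS= read -r sp_line          # -r and an empty IFS keep the line verbatim
  printf '%s' "$sp_line" > "$sp_dest"
  umask "$sp_old_umask"
}

# Demo in a scratch directory; a real run would target the configured path.
demo_dir="$(mktemp -d)"
demo_file="$demo_dir/zeta-creds-passphrase"
printf 'dummy-passphrase\n' | stage_passphrase "$demo_file"

staged_content="$(cat "$demo_file")"
staged_private="$(find "$demo_file" -perm 600)"
echo "staged ${#staged_content} bytes; owner-only: ${staged_private:+yes}"
rm -rf "$demo_dir"
```

The umask only affects newly created files, which is why the demo writes to a path that does not exist yet.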

Verification:
- `nix-instantiate --parse full-ai-cluster/nixos/modules/common.nix` → PARSE OK
- `nix-instantiate --parse full-ai-cluster/nixos/modules/zeta-creds-restore.nix` → PARSE OK

Per .claude/rules/non-coercion-invariant.md HC-8: opt-in default
preserves operator authority over per-host enablement; importing
the module surface doesn't activate it.

Per .claude/rules/agent-worktree-hygiene-never-hold-main-...: isolated
worktree at /private/tmp/zeta-b0852-4a-module-1250z; operator primary
checkout untouched.

Agency-Signature-Version: 1
Agent: Otto
Agent-Runtime: Claude Code (auto mode)
Agent-Model: claude-opus-4-7
Credential-Identity: aaron-otto-vscode
Credential-Mode: operator-authorized
Human-Review: pre-merge-pending
Human-Review-Evidence: operator-direction-2026-05-27-back-to-usb-after-heartbeat-iteration
Action-Mode: substrate-implementation-final-usb-gate
Task: B-0852.4d

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(B-0852.4a): 3 Copilot findings — P0 root-write for /etc paths + P0 ExecStopPost-never-fires + P1 USB UUID newline trim

3 Copilot threads on PR #5476:

**P0 (@180): sudo -u ${cfg.user} can't write to /etc paths.**
The default cred manifest includes /etc/zeta/operator-authorized-keys
+ /etc/ssh/ssh_host_* (root-owned paths zeta user can't write).
Fix: run the restore CLI as root directly (remove the `sudo -u zeta` privilege drop).
After the restore, `find ${cfg.home} -user root -exec chown zeta:users` fixes
ownership on user-facing creds (~/.config/gh, ~/.config/claude,
~/.gemini, ~/.codex). Operator's pre-existing configs (already
zeta-owned) are left untouched by the `-user root` filter.
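
A dry-run sketch of the owner-filtered chown described above. `fix_ownership` and `DRY_RUN` are hypothetical names introduced for illustration. The demo matches on the current UID so it can run unprivileged, while the fix as described matches `-user root` and runs as root.

```sh
#!/bin/sh
set -eu

# Hypothetical helper. $1 = directory, $2 = owner to match, $3 = new owner:group.
# With DRY_RUN=1 the chown command line is printed instead of executed.
fix_ownership() {
  fo_dir="$1"; fo_from="$2"; fo_to="$3"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    find "$fo_dir" -user "$fo_from" -exec echo chown "$fo_to" {} +
  else
    find "$fo_dir" -user "$fo_from" -exec chown "$fo_to" {} +
  fi
}

# Scratch home directory with one restored-looking file.
demo_home="$(mktemp -d)"
mkdir -p "$demo_home/.config/gh"
touch "$demo_home/.config/gh/hosts.yml"

DRY_RUN=1
planned="$(fix_ownership "$demo_home" "$(id -u)" "zeta:users")"
echo "$planned"
rm -rf "$demo_home"
```

Because `find` selects only entries owned by the matched user, files that already belong to another owner never reach `chown`, which is the property the fix relies on.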

**P0 (@189): RemainAfterExit=true + Type=oneshot means
ExecStopPost never fires on successful boot.**
The unit stays "active" after ExecStart returns; systemd doesn't
treat that as a "stop" event so ExecStopPost is skipped. Passphrase
cleanup never runs. Fix: move cleanup to bash EXIT trap inside
ExecStart — fires on ANY exit path (success or failure), unaffected
by RemainAfterExit semantics. Removed standalone ExecStopPost.
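
A minimal sketch of the EXIT-trap cleanup pattern, assuming a POSIX shell. `run_restore` is a hypothetical stand-in for the ExecStart script body, and the subshell exists only so the demo can observe both exit paths from one script.

```sh
#!/bin/sh
set -eu

# Hypothetical stand-in for the ExecStart script body.
# $1 = passphrase file, $2 = exit status to simulate.
run_restore() {
  rr_file="$1"
  rr_rc="$2"
  (
    # Register cleanup before doing any work so every exit path reaches it.
    trap 'rm -f "$rr_file"' EXIT
    # ... the real script would call the restore CLI here ...
    exit "$rr_rc"
  )
}

leftover=0
for rc in 0 1; do
  pass_file="$(mktemp)"
  printf 'dummy-passphrase' > "$pass_file"
  run_restore "$pass_file" "$rc" || true
  if [ -e "$pass_file" ]; then
    leftover=$((leftover + 1))
    echo "exit $rc: passphrase file left behind"
  else
    echo "exit $rc: passphrase file removed"
  fi
done
```

The trap runs when the shell process exits, so it does not depend on systemd ever moving the unit to a stopped state.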

**P1 (@140): USB_UUID trailing newline from cat.**
`cat /etc/zeta/usb-uuid` includes trailing \n if file ends with one.
Fix: `tr -d '[:space:]' < ${cfg.usbUuidPath}` strips all whitespace
(safer than just newlines; covers \r\n + leading whitespace too).
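
The whitespace strip can be checked in isolation. In this sketch `read_uuid` is a hypothetical helper and the UUID value is made up; the input deliberately carries leading spaces and a CRLF line ending.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: read a UUID file and strip every whitespace character.
read_uuid() {
  tr -d '[:space:]' < "$1"
}

# Temp file with leading spaces and a CRLF line ending.
uuid_file="$(mktemp)"
printf '  ABCD-1234\r\n' > "$uuid_file"

USB_UUID="$(read_uuid "$uuid_file")"
printf 'uuid=[%s] length=%d\n' "$USB_UUID" "${#USB_UUID}"   # uuid=[ABCD-1234] length=9
rm -f "$uuid_file"
```

Stripping interior whitespace as well is harmless here because a UUID contains none.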

Per .claude/rules/blocked-green-ci-investigate-threads.md verify-then-fix:
each Copilot finding read against actual file content; all 3 real
findings; bundled fix with rationale per finding.

Verification: `nix-instantiate --parse full-ai-cluster/nixos/modules/zeta-creds-restore.nix`
returns PARSE OK.

Per .claude/rules/non-coercion-invariant.md HC-8: operator authority
preserved (chown only touches root-owned files; pre-existing
zeta-owned files untouched).

Agency-Signature-Version: 1
Agent: Otto
Agent-Runtime: Claude Code (auto mode)
Agent-Model: claude-opus-4-7
Credential-Identity: aaron-otto-vscode
Credential-Mode: operator-authorized
Human-Review: pre-merge-pending
Human-Review-Evidence: copilot-3-findings-on-pr-5476-2-p0-1-p1
Action-Mode: substrate-fix-fwd-security-plus-correctness
Task: B-0852.4a

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>