fix(B-0754 iter-2): empty systemd PATH broke clear+nmtui+ping+systemctl on real hardware #5047
Merged
…tl on real hardware
Iteration-1 cluster-node test (Aaron 2026-05-25, photo evidence on first real-hardware boot of v1):
- 'clear: command not found' at line 40 (role prompt) + line 77 (banner)
- 'nmtui: command not found' when ethernet-DHCP wait expired and the wifi-fallback path fired
- Drop-to-shell worked correctly (substrate-honest failure path); operator got a recovery shell
Root cause: NixOS systemd services get a minimal PATH by default; the first-boot script's bare commands (clear, nmtui, ping, systemctl) need either explicit absolute paths OR a configured Environment block. The shell PATH that 'just works' for interactive login isn't inherited by Type=idle systemd services.
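A minimal shell sketch of the lookup behaviour described above: a bare command name only resolves if some directory on PATH contains it, while an absolute path never consults PATH. The helper name `resolves_under` and the `/nonexistent-dir` stand-in for the service's PATH are illustrative assumptions, not part of zeta-first-boot.sh.

```shell
# Hypothetical helper (not from the PR): does bare command $2 resolve when PATH is $1?
resolves_under() {
  (
    PATH="$1"        # applies only inside this subshell
    hash -r          # drop any cached command locations
    if command -v "$2" >/dev/null 2>&1; then
      echo found
    else
      echo missing
    fi
  )
}

resolves_under "/nonexistent-dir" sh   # bare name, no directory on PATH provides it
resolves_under "/bin:/usr/bin" sh      # bare name, PATH includes a directory holding sh
```

Running the same check from inside the unit (rather than from a login shell) is what distinguishes a service-environment problem from a missing package.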
Two-layer fix (defense in depth):
1. **systemd unit Environment** (configuration.nix): explicit
PATH=/run/current-system/sw/bin:/run/current-system/sw/sbin:/run/wrappers/bin
TERM=linux
This is the load-bearing fix — covers every current AND future bare command in zeta-first-boot.sh + zeta-install.sh (ping, systemctl, lsblk, sgdisk, mkfs.fat, mkfs.ext4, mount, partprobe, etc.; many of which would have ALSO failed silently on the same iteration but didn't reach because nmtui blocked first).
2. **Script-level absolute paths + ANSI escape** (zeta-first-boot.sh): replace 'clear || true' with 'printf \033c || true' (no external command); change 'nmtui' invocation to '/run/current-system/sw/bin/nmtui'. Belt-and-suspenders: even if the systemd Environment is overridden by some future change, these two failure modes stay fixed.
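The two script-level changes can be sketched as below. This is an illustration only: the function names `reset_terminal` and `run_absolute` are hypothetical, and the real zeta-first-boot.sh may structure this differently.

```shell
# Reset the terminal with an escape sequence instead of the external `clear` binary.
# printf is a shell builtin, so this does not depend on PATH.
reset_terminal() {
  printf '\033c'
}

# Run a tool by absolute path; return 127 (the shell's "command not found" status)
# with a readable message if the binary is absent or not executable.
run_absolute() {
  if [ -x "$1" ]; then
    "$@"
  else
    echo "missing executable: $1" >&2
    return 127
  fi
}

# Usage, mirroring the two call sites named in this commit:
#   reset_terminal || true
#   run_absolute /run/current-system/sw/bin/nmtui
```

The explicit `-x` test is an addition for readability of the failure message; invoking the absolute path directly, as the commit does, fails with the same exit status.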
Composes with B-0759 first-time-CLI-user persona (substrate-honest error path proved the discipline — drop-to-shell with recovery hints worked exactly as designed); B-0760 USB-as-repair-tool (the same systemd-PATH discipline applies to every command the repair flow will invoke); B-0761 reference architecture (this is iteration N of N for the AI-native cluster-bootstrap reference, payoff is bandwidth-engineered across every future install).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Fixes the installer ISO’s first-boot automation failing under NixOS systemd’s minimal default PATH, so the unattended “zero-typing” flow can complete on real hardware.
Changes:
- Replaced `clear` with an ANSI terminal reset escape to remove reliance on `clear` being in `PATH`.
- Invoked `nmtui` via an absolute path to avoid `PATH`-inheritance issues.
- Set explicit `PATH` and `TERM` in the `zeta-first-boot` systemd unit to cover current and future bare command usage in the first-boot flow.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| full-ai-cluster/usb-nixos-installer/zeta-first-boot.sh | Removes dependency on clear being available and pins nmtui to an absolute path for robustness under systemd. |
| full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix | Adds explicit PATH/TERM to the first-boot systemd unit to ensure all invoked tools are discoverable in the unit environment. |
…OS sets a default; my new value collided
CI iteration-2 build failed on:
error: The option 'systemd.services.zeta-first-boot.environment.PATH' has conflicting definition values:
- In nixos/modules/system/boot/systemd.nix: <NixOS default — coreutils + findutils + grep + sed + systemd>
- In configuration.nix: '/run/current-system/sw/bin:/run/current-system/sw/sbin:/run/wrappers/bin'
The NixOS systemd module already defines PATH for the service, and my new value was set at the same priority, hence the conflict. lib.mkForce raises the priority of my definition and resolves it.
Note: the NixOS default DOES include coreutils + systemd (so 'clear', 'systemctl', 'ping' WOULD have worked under that default) — meaning iter-1's failures were NOT actually from a missing PATH but from this service running with an EMPTY PATH. Different bug than I initially diagnosed. lib.mkForce still gets us the desired PATH including /run/current-system/sw/bin which is the load-bearing addition (zeta-install + zeta-first-boot + nmtui all live there + not in NixOS default).
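Since the diagnosis shifted between iterations, one hedged way to surface this class of failure early is a preflight that logs the effective PATH and lists any required commands that do not resolve. The sketch below is not part of the PR; `missing_commands` is a hypothetical helper, and the command names in the usage comment are taken from the commit messages above.

```shell
# Hypothetical preflight: print each required command that does not resolve
# on the current PATH. Returns 0 if all resolve, 1 otherwise.
missing_commands() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Usage at the top of a first-boot style script:
#   echo "effective PATH: ${PATH:-<empty>}" >&2
#   missing_commands ping systemctl lsblk sgdisk mount partprobe >&2 \
#     || echo "preflight: the commands listed above are not on PATH" >&2
```

Logging the PATH the service actually sees would make it possible to tell an empty PATH from a minimal one on the next hardware run.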
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t script env
Copilot review on PR #5047 noted that 'the env-var below' was misleading because no env-var is set in the script: the PATH override is on the systemd unit (configuration.nix, systemd.services.zeta-first-boot.environment.PATH = lib.mkForce ...). Reword the comment to reference the systemd unit's environment block so future readers see the actual source of truth.
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed in |
Iteration 1 result (real-hardware test, Aaron 2026-05-25)

Photo evidence on the cluster node screen after booting the v1 ISO:

- `clear: command not found` (line 40 + line 77): the role-prompt and banner sections
- `nmtui: command not found`: when the ethernet-DHCP wait expired and the wifi-fallback fired

The substrate-honest failure path was validated: the script degraded gracefully and the operator could still complete the install manually (`nmtui` + `zeta-install control-plane` from the recovery shell). But the load-bearing zero-typing-automation flow didn't reach the end. This PR is the fix so iteration 2 completes unattended.

Root cause

NixOS systemd services get a minimal PATH by default. The first-boot script's bare commands (`clear`, `nmtui`, `ping`, `systemctl`, plus every command zeta-install.sh would reach for: `lsblk`, `sgdisk`, `mkfs.fat`, `mkfs.ext4`, `mount`, `partprobe`, etc.) all need either explicit absolute paths OR a configured Environment block on the systemd unit. The interactive-shell PATH that 'just works' for SSH or tty2 login is NOT inherited by Type=idle systemd services.

The reason only `clear` and `nmtui` were observed: `nmtui` blocked first; the rest never executed in the failed path.

Fix (defense in depth)

1. systemd unit Environment block (load-bearing)

configuration.nix: explicit `PATH` + `TERM` on the zeta-first-boot service. Covers every current AND future bare command.

2. Script-level belt-and-suspenders

zeta-first-boot.sh:

- Replace `clear || true` (×2) with `printf '\033c' || true`: ANSI 'reset terminal' escape; no external command dependency
- Change the `nmtui` invocation to `/run/current-system/sw/bin/nmtui` (absolute path)

Even if the systemd Environment is overridden by some future change, these two failure modes stay fixed.

Composes with

Test plan

- `bash -n` syntax check on edited zeta-first-boot.sh
- … `full-ai-cluster/usb-nixos-installer/**` path)
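The `bash -n` item in the test plan can be run as sketched below. `bash -n` parses a script without executing it, so it catches syntax errors but would not catch a missing command or a wrong PATH. The wrapper name `syntax_ok` is a hypothetical convenience, not something defined in the repository.

```shell
# Parse-only check: bash -n reads the script and reports syntax errors
# without running any of its commands.
syntax_ok() {
  bash -n "$1" 2>/dev/null
}

# Usage (path taken from the PR's file list):
#   syntax_ok full-ai-cluster/usb-nixos-installer/zeta-first-boot.sh && echo "syntax OK"
```

Because the check is static, it complements rather than replaces a boot on real hardware, which is where the PATH failure in iteration 1 was found.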