fix(B-0754 iter-2): empty systemd PATH broke clear+nmtui+ping+systemctl on real hardware #5047

Merged
AceHack merged 3 commits into main from otto-cli/b0754-fix-systemd-path-iter2-2026-05-25
May 26, 2026

@AceHack AceHack commented May 26, 2026

Iteration 1 result (real-hardware test, Aaron 2026-05-25)

Photo evidence on the cluster node screen after booting the v1 ISO:

  • clear: command not found (line 40 + line 77) — the role-prompt and banner sections
  • nmtui: command not found — when ethernet-DHCP wait expired and the wifi-fallback fired
  • Drop-to-shell worked correctly — operator landed at a working root prompt with the recovery hints intact

The substrate-honest failure path validated: the script degraded gracefully and the operator could still complete the install manually (nmtui + zeta-install control-plane from the recovery shell). But the load-bearing zero-typing-automation flow didn't reach the end. This PR is the fix so iteration 2 completes unattended.

Root cause

NixOS systemd services get a minimal PATH by default. The first-boot script's bare commands (clear, nmtui, ping, systemctl, plus every command zeta-install.sh would reach for: lsblk, sgdisk, mkfs.fat, mkfs.ext4, mount, partprobe, etc.) all need either explicit absolute paths OR a configured Environment block on the systemd unit. The interactive-shell PATH that 'just works' for SSH or tty2 login is NOT inherited by Type=idle systemd services.

The reason only clear and nmtui were observed: nmtui blocked first; the rest never executed in the failed path.
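
The lookup failure described above can be reproduced without systemd by resolving a command under a deliberately stripped PATH. The sketch below is illustrative only: `probe` and the `faketool` stub are hypothetical names, and `/nonexistent-dir` merely stands in for whatever PATH the unit actually received.

```shell
#!/bin/sh
# probe PATH_VALUE COMMAND
# Prints "found" if COMMAND resolves under PATH_VALUE, else "missing".
# The subshell keeps the PATH change from leaking into the caller.
probe() {
  if ( PATH="$1"; command -v "$2" >/dev/null 2>&1 ); then
    echo "found"
  else
    echo "missing"
  fi
}

# Build a throwaway directory holding one executable stub (hypothetical name).
demo_dir=$(mktemp -d)
printf '#!/bin/sh\nexit 0\n' > "$demo_dir/faketool"
chmod +x "$demo_dir/faketool"

probe "$demo_dir" faketool          # bare name resolves when its directory is on PATH
probe "/nonexistent-dir" faketool   # the same bare name fails under a stripped PATH

# An absolute path does not consult PATH at all, so it still runs.
( PATH="/nonexistent-dir"; "$demo_dir/faketool" ) && echo "absolute path ran"

rm -rf "$demo_dir"
```

This is why both remedies in the fix work: either the directory is put back on the unit's PATH, or the script stops relying on PATH for that command.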

Fix (defense in depth)

1. systemd unit Environment block (load-bearing)

configuration.nix: explicit PATH + TERM on the zeta-first-boot service. Covers every current AND future bare command:

environment = {
  PATH = "/run/current-system/sw/bin:/run/current-system/sw/sbin:/run/wrappers/bin";
  TERM = "linux";
};

2. Script-level belt-and-suspenders

zeta-first-boot.sh:

  • Replace clear || true (×2) with printf '\033c' || true (the ESC c 'reset terminal' escape; no external command dependency)
  • Change nmtui invocation to /run/current-system/sw/bin/nmtui (absolute path)

Even if the systemd Environment is overridden by some future change, these two failure modes stay fixed.
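
The two script-level changes can be sketched as small helpers. This is a hedged illustration, not the contents of zeta-first-boot.sh: `reset_screen` and `resolve_tool` are hypothetical names, and only the `printf '\033c'` sequence and the `/run/current-system/sw/bin/nmtui` path come from this PR.

```shell
#!/bin/sh
# Clear the screen without calling any external binary.
# ESC c asks the terminal to reset; printf is a builtin in bash,
# so this does not depend on PATH.
reset_screen() {
  printf '\033c' || true
}

# resolve_tool ABSOLUTE_PATH NAME
# Prefer the pinned absolute path, fall back to a PATH lookup,
# and return non-zero if neither works.
resolve_tool() {
  if [ -x "$1" ]; then
    printf '%s\n' "$1"
  elif command -v "$2" >/dev/null 2>&1; then
    command -v "$2"
  else
    return 1
  fi
}

# Usage sketch: decide what to launch before committing to the wifi fallback.
if nmtui_bin=$(resolve_tool /run/current-system/sw/bin/nmtui nmtui); then
  echo "would launch: $nmtui_bin"
else
  echo "nmtui not available; dropping to recovery shell" >&2
fi
```

The fallback branch is an assumption about how the script might degrade; the PR itself only pins the absolute path.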

Composes with

  • B-0759 first-time-CLI-user persona — drop-to-shell with recovery hints worked exactly as designed; the persona-aligned error path was substrate-honest
  • B-0760 USB-as-repair-tool — same systemd-PATH discipline applies to every command the repair flow will invoke
  • B-0761 reference architecture — this is iteration N of N for the AI-native cluster-bootstrap reference; bandwidth payoff across every future install

Test plan

  • bash -n syntax check on edited zeta-first-boot.sh
  • CI rebuilds ISO via build-ai-cluster-iso.yml (auto-triggers on full-ai-cluster/usb-nixos-installer/** path)
  • Aaron reflashes via zflash + boots cluster node + observes unattended install completes end-to-end
  • CI green
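
Since the unit-level PATH is meant to cover every bare command, one optional addition to the plan above would be a preflight that reports which required commands fail to resolve. The sketch below is not part of this PR: `missing_cmds` is a hypothetical helper, and the command list is simply the set of names mentioned in the description.

```shell
#!/bin/sh
# missing_cmds NAME...
# Prints each NAME that does not resolve under the current PATH, one per line.
# Returns non-zero if anything is missing.
missing_cmds() {
  status=0
  for name in "$@"; do
    if ! command -v "$name" >/dev/null 2>&1; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}

# Command names taken from the PR description.
required="clear nmtui ping systemctl lsblk sgdisk mkfs.fat mkfs.ext4 mount partprobe"

# $required and $missing are intentionally unquoted so they split into words.
if missing=$(missing_cmds $required); then
  echo "preflight: all commands resolve"
else
  echo "preflight: missing under PATH=$PATH:" >&2
  printf '  %s\n' $missing >&2
fi
```

Run from inside the service (or under the same PATH), a check like this would surface every unresolved command at once rather than only the first one that blocks.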

…tl on real hardware

Iteration-1 cluster-node test (Aaron 2026-05-25, photo evidence on first real-hardware boot of v1):

- 'clear: command not found' at line 40 (role prompt) + line 77 (banner)
- 'nmtui: command not found' when ethernet-DHCP wait expired and the wifi-fallback path fired
- Drop-to-shell worked correctly (substrate-honest failure path); operator got a recovery shell

Root cause: NixOS systemd services get a minimal PATH by default; the first-boot script's bare commands (clear, nmtui, ping, systemctl) need either explicit absolute paths OR a configured Environment block. The shell PATH that 'just works' for interactive login isn't inherited by Type=idle systemd services.

Two-layer fix (defense in depth):

1. **systemd unit Environment** (configuration.nix): explicit
     PATH=/run/current-system/sw/bin:/run/current-system/sw/sbin:/run/wrappers/bin
     TERM=linux
   This is the load-bearing fix: it covers every current AND future bare command in zeta-first-boot.sh + zeta-install.sh (ping, systemctl, lsblk, sgdisk, mkfs.fat, mkfs.ext4, mount, partprobe, etc.; many of which would have ALSO failed on the same iteration but were never reached because nmtui blocked first).

2. **Script-level absolute paths + ANSI escape** (zeta-first-boot.sh): replace 'clear || true' with "printf '\033c' || true" (no external command); change 'nmtui' invocation to '/run/current-system/sw/bin/nmtui'. Belt-and-suspenders: even if the systemd Environment is overridden by some future change, these two failure modes stay fixed.

Composes with B-0759 first-time-CLI-user persona (substrate-honest error path proved the discipline — drop-to-shell with recovery hints worked exactly as designed); B-0760 USB-as-repair-tool (the same systemd-PATH discipline applies to every command the repair flow will invoke); B-0761 reference architecture (this is iteration N of N for the AI-native cluster-bootstrap reference, payoff is bandwidth-engineered across every future install).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 00:10
@AceHack AceHack enabled auto-merge (squash) May 26, 2026 00:10

Copilot AI left a comment

Pull request overview

Fixes the installer ISO’s first-boot automation failing under NixOS systemd’s minimal default PATH, so the unattended “zero-typing” flow can complete on real hardware.

Changes:

  • Replaced clear with an ANSI terminal reset escape to remove reliance on clear being in PATH.
  • Invoked nmtui via an absolute path to avoid PATH-inheritance issues.
  • Set explicit PATH and TERM in the zeta-first-boot systemd unit to cover current and future bare command usage in the first-boot flow.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| full-ai-cluster/usb-nixos-installer/zeta-first-boot.sh | Removes dependency on clear being available and pins nmtui to an absolute path for robustness under systemd. |
| full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix | Adds explicit PATH/TERM to the first-boot systemd unit to ensure all invoked tools are discoverable in the unit environment. |

Comment thread full-ai-cluster/usb-nixos-installer/zeta-first-boot.sh Outdated
Lior and others added 2 commits May 25, 2026 20:17
…OS sets a default; my new value collided

CI iteration-2 build failed on:
  error: The option 'systemd.services.zeta-first-boot.environment.PATH' has conflicting definition values:
    - In nixos/modules/system/boot/systemd.nix: <NixOS default — coreutils + findutils + grep + sed + systemd>
    - In configuration.nix: '/run/current-system/sw/bin:/run/current-system/sw/sbin:/run/wrappers/bin'

The NixOS systemd module already defines PATH for every service as an ordinary definition (normal priority, not mkOptionDefault); my new value was at the same priority → conflict. lib.mkForce raises the priority of my definition + resolves.

Note: the NixOS default DOES include coreutils + systemd (so 'clear', 'systemctl', 'ping' WOULD have worked under that default) — meaning iter-1's failures were NOT actually from a missing PATH but from this service running with an EMPTY PATH. Different bug than I initially diagnosed. lib.mkForce still gets us the desired PATH including /run/current-system/sw/bin which is the load-bearing addition (zeta-install + zeta-first-boot + nmtui all live there + not in NixOS default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t script env

Copilot review on PR #5047 noted that 'the env-var below' was misleading
because no env-var is set in the script — the PATH override is on the
systemd unit (configuration.nix systemd.services.zeta-first-boot
.environment.PATH = lib.mkForce ...). Reword the comment to reference
the systemd unit's environment block so future readers see the actual
source of truth.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 00:19

AceHack commented May 26, 2026

Fixed in fd160ff1: reworded the comment to reference the systemd unit's environment.PATH override (set in configuration.nix on systemd.services.zeta-first-boot.environment.PATH via lib.mkForce) rather than implying an env-var is set in the shell script. Thanks for the catch.

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@AceHack AceHack merged commit aed3014 into main May 26, 2026
29 checks passed
@AceHack AceHack deleted the otto-cli/b0754-fix-systemd-path-iter2-2026-05-25 branch May 26, 2026 00:22