fix(P0-iter-5.4): nixos-install --fallback is NOT a valid flag → use --option fallback true (empirical USB install failure Aaron 2026-05-27)#5410

Merged
AceHack merged 1 commit into main from fix/p0-nixos-install-fallback-flag-not-supported-2026-05-27 on May 27, 2026

@AceHack AceHack commented May 27, 2026

Summary

P0 install blocker fix-fwd. Aaron's 2026-05-27 USB boot test (ISO ci26490417201 / commit 282648d) hit:

Running nixos-install --flake /mnt/etc/zeta/full-ai-cluster#control-plane --fallback ...
/run/current-system/sw/bin/nixos-install: unknown option `--fallback`
[zeta-first-boot] Install failed. See output above.

Install dropped to interactive shell; cluster bring-up completely blocked.

Root cause

PR #5383 added --fallback as a top-level flag to nixos-install. The flag exists in nix-build/nix-store but NOT in nixos-install. The Nix-option pass-through convention --option fallback true IS supported (same shape as the existing --option connect-timeout 10 / --option stalled-download-timeout 60 / --option download-attempts 3 already in the same invocation).
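
A minimal sketch of the corrected argument shape. `nixos_install_args` is a hypothetical helper (not part of zeta-install) that only prints the command line, one argument per line, and never runs the installer. The timeout and retry values are the ones quoted above.

```shell
# Hypothetical helper: prints the nixos-install command line, one argument per line.
# $1 is the flake reference, e.g. /mnt/etc/zeta/full-ai-cluster#control-plane
nixos_install_args() {
  printf '%s\n' \
    nixos-install \
    --flake "$1" \
    --option connect-timeout 10 \
    --option stalled-download-timeout 60 \
    --option download-attempts 3 \
    --option fallback true
}
```

Every Nix setting, fallback included, travels as a `--option NAME VALUE` triple, so no bare `--fallback` token appears anywhere in the argument list.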

Fix

1-line change (--fallback \ → --option fallback true \) plus a comment update with the empirical anchor.

Operator unblock (live-USB shell right now)

sed -i 's|^  --fallback \\|  --option fallback true \\|' /run/current-system/sw/bin/zeta-install
zeta-install control-plane
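
If the installed script cannot be edited in place, the same substitution can be written to a copy instead. That is an assumption, not something this PR reports: on many NixOS systems, paths under /run/current-system resolve into a read-only store. A minimal sketch; `patch_fallback_flag` and the destination path are hypothetical names.

```shell
# Hypothetical helper: write a patched copy of a script with the bare --fallback
# line rewritten, then fail if a bare flag is still present in the copy.
#   $1 = source script, $2 = writable destination
patch_fallback_flag() {
  src="$1"
  dst="$2"
  # Same substitution as the in-place command above, streamed into the copy.
  sed 's|^  --fallback \\|  --option fallback true \\|' "$src" > "$dst" || return 1
  chmod +x "$dst" || return 1
  # Succeeds only when no line still begins with the bare flag.
  ! grep -q '^ *--fallback' "$dst"
}
```

A possible use from the live shell would be `patch_fallback_flag /run/current-system/sw/bin/zeta-install /tmp/zeta-install && /tmp/zeta-install control-plane`, assuming /tmp is writable and allows execution.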

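Composes with

  • B-0835 (installer-config-bugs canonical bag): adds Bug 9 to the catalog
  • B-0846 (WiFi-reproducibility substrate): this fix preserves the intent (build-from-source fallback) using the correct API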

Test plan

  • Next ISO build from this fix → fresh USB flash → first-boot install completes (no unknown option error)
  • cache.nixos.org timeout fallback still operates (verify via --option fallback true semantics: Nix builds from source when substituter fails)
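
The first bullet could also be guarded before an ISO is built, with a static check on the installer script. A minimal sketch, assuming the script passes one flag per line as in the invocation quoted above; `has_bare_fallback` is a hypothetical name.

```shell
# Hypothetical lint: exit status 0 when the file contains a bare --fallback token.
# "--option fallback true" does not match, because its "fallback" has no "--" prefix.
has_bare_fallback() {
  grep -Eq -- '(^|[[:space:]])--fallback([[:space:]]|$)' "$1"
}
```

The check is deliberately strict: a comment that mentions the flag as a standalone word would also match, so a CI step built on it might need to skip comment lines first.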

🤖 Generated with Claude Code

…--option fallback true (Nix-option pass-through)

Empirical failure 2026-05-27 (Aaron USB boot, ISO ci26490417201 / commit 282648d):

  Running nixos-install --flake /mnt/etc/zeta/full-ai-cluster#control-plane --fallback ...
  /run/current-system/sw/bin/nixos-install: unknown option `--fallback`
  [zeta-first-boot] Install failed. See output above.

Dropped to interactive shell; install completely blocked.

Root cause: PR #5383 (`fix(B-0835)+feat(B-0846): zeta-install --fallback`)
added `--fallback` as a TOP-LEVEL flag to nixos-install, assuming it was
passed through to underlying nix. nixos-install does NOT accept --fallback;
the supported form is `--option fallback true` (the Nix-option pass-through
convention nixos-install already uses for connect-timeout / stalled-download-
timeout / download-attempts in the same invocation block).

Fix: 1-line change `--fallback \` → `--option fallback true \`
plus comment update noting the empirical anchor + correct pass-through form.

Operator unblock for the broken interactive shell on Aaron's USB:
  sed -i 's|^  --fallback \\|  --option fallback true \\|' \
    /run/current-system/sw/bin/zeta-install
  zeta-install control-plane

Composes with:
- B-0835 (installer-config-bugs canonical bag) — adds Bug 9 to the catalog
- B-0846 (WiFi-reproducibility substrate) — this fix preserves the intent
  (build-from-source fallback) using the correct API

Successful empirical anchor for next ISO build will validate the fix.
Copilot AI review requested due to automatic review settings May 27, 2026 06:50
@AceHack AceHack enabled auto-merge (squash) May 27, 2026 06:50

@AceHack AceHack merged commit 9603ee8 into main May 27, 2026
29 of 30 checks passed
@AceHack AceHack deleted the fix/p0-nixos-install-fallback-flag-not-supported-2026-05-27 branch May 27, 2026 06:52
AceHack added a commit that referenced this pull request May 27, 2026
…aron 2026-05-27 architectural fix to B-0812) (#5412)

Empirical anchor: PR #5408 auto-opened mid-install for node-0fe6eb; install
then failed downstream at nixos-install --fallback bug (PR #5410 fix-fwd);
registration PR orphaned for a node-id that never came up.

Operator framing: "how did it register before it even rebooted? it should
not register until the last step when everything comes up and if it
reboots it should not register over and over... cluster should realize
it's register or has a pr in flight for register and not duplicate."

4 architectural changes:
1. Move self-registration OUT of zeta-install.sh Step 6.9 INTO systemd
   oneshot service that fires on first boot of installed OS, AFTER
   network-online + creds-restore + cluster reachable
2. Idempotency: marker file + upstream check + in-flight PR check before
   composing new PR
3. Cluster-agent coordination via Path B (Otto-pushes-PR-across-finish-line;
   per Aaron's simpler-form preference); Path A (/tmp folder standard)
   deferred to future row
4. De-dup: idempotent branch naming + in-flight detection + comment-on-existing
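
The idempotency checks in item 2 might look roughly like the following. This is a sketch only: the function name, the marker path, and the two probe commands passed in as arguments are hypothetical stand-ins, since the commit message does not say how the upstream or in-flight PR checks are implemented.

```shell
# Hypothetical guard: decide whether a node should compose a registration PR.
#   $1 = marker file written after a successful registration
#   $2 = command that exits 0 when the node is already registered upstream
#   $3 = command that exits 0 when a registration PR is already in flight
# Returns 0 when registration should proceed, 1 when it should be skipped.
should_register() {
  marker="$1"
  [ -e "$marker" ] && return 1   # already registered from this machine
  "$2" && return 1               # upstream already knows this node
  "$3" && return 1               # a PR is open; do not duplicate it
  return 0
}
```

Ordering the checks from cheapest (local file) to most expensive (remote queries) means a rebooted node that already registered never touches the network for this decision.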

7 sub-rows B-0855.1-7 enumerated. Refines B-0812 (does not replace);
keeps B-0813 ArgoCD reconciliation unchanged.

Composes with B-0812 + B-0813 + B-0835 (Bug 10) + B-0850 (systemd
substrate) + B-0851 (persona-first scheduler) + B-0852 (cred persistence
as pre-condition).

PR #5408 closed substrate-honestly with cross-link to this row.

Co-authored-by: Lior <lior@zeta.dev>
@AceHack AceHack review requested due to automatic review settings May 27, 2026 07:13