Skip to content

cleanup(USB PR 2): retire legacy installer substrate — delete infra/nixos/hosts/installer/ + build-installer-iso.yml + update root flake; add B-0830 follow-up#5320

Merged
AceHack merged 2 commits into
mainfrom
otto-cli/usb-cleanup-pr2-investigation-infra-installer-workflow-consolidation-2026-05-26
May 26, 2026
Merged

cleanup(USB PR 2): retire legacy installer substrate — delete infra/nixos/hosts/installer/ + build-installer-iso.yml + update root flake; add B-0830 follow-up#5320
AceHack merged 2 commits into
mainfrom
otto-cli/usb-cleanup-pr2-investigation-infra-installer-workflow-consolidation-2026-05-26

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 26, 2026

Summary

USB cleanup PR 2 of 3. Retires the legacy installer substrate; canonical AI-cluster installer (`full-ai-cluster/usb-nixos-installer/`) becomes the only installer-iso build path.

Aaron direction: "lets try to cleanup what we have in a few prs and combine get rid of the old" + "yeah if we need a delete thats fine" + "you don't have to ask me direction every time you can just assume all with the simplest first".

What lands

Deleted (572 lines removed)

  • `infra/nixos/hosts/installer/configuration.nix` (296 lines; legacy installer NixOS config)
  • `.github/workflows/build-installer-iso.yml` (230 lines; legacy ISO build workflow)

Modified — flake.nix

  • Removed `nixosConfigurations.installer` (referenced deleted file)
  • Removed `packages.installer-iso` + `packages.default` (depended on `nixosConfigurations.installer`)
  • Removed `isoBuildSystems` variable (no longer needed)
  • Updated bootstrap-flow comments to point at `full-ai-cluster/usb-nixos-installer/` + `zflash`
  • Updated devShell shellHook with canonical build command
  • Updated nixpkgs version-pin comment

Added — B-0830 follow-up backlog row

  • `docs/backlog/P3/B-0830-add-iso-release-attach-to-build-ai-cluster-iso-workflow-...`
  • Captures the release-attach feature the legacy workflow had so it can be re-implemented in the canonical workflow when Zeta starts tagging releases (currently zero releases per `gh release list` — feature UNUSED at deletion time)

Why this deletion is safe (substrate-check pre-cleanup audit)

Per the substrate-check-before-worry-deployment discipline + Kestrel's pre-cleanup-audit recommendation:

  1. infra/nixos/hosts/installer references:

    • `flake.nix` — imports as `nixosConfigurations.installer` → REMOVED in this PR
    • `build-installer-iso.yml` — builds via root flake → DELETED in this PR
    • `build-ai-cluster-iso.yml` — NO REFERENCE (targets canonical)
  2. build-installer-iso.yml references:

    • No other workflow depends on it
    • No tools/ci/ script depends on it
    • Release-attach feature currently UNUSED (zero releases exist)
  3. Non-historical references after deletion: 0 (verified via grep)

Decision-archaeology pointer (per Kestrel discipline)

WHY THIS PATH EXISTED: `infra/nixos/hosts/installer/` was the root-flake-imported installer config — first installer substrate after the root `usb-nixos-installer/` was minimized. Pre-dated the `full-ai-cluster/` consolidation.

WHY THIS PATH IS RETIRED: the canonical `full-ai-cluster/usb-nixos-installer/` has zero-typing install substrate (zeta-install.sh + zeta-first-boot.sh + zflash macOS Touch-ID flasher + flake.lock + hardware-firmware + SSH-key/hashed-password + WiFi credential injection). The legacy version lacks all of this. Maintaining two installer substrates was unnecessary parallel-substrate cost (per PR #5310 cost-of-velocity discussion).

Next in cleanup sequence

  • USB cleanup PR 3 — CI ISO testing via QEMU/KVM boot test (Kestrel's prior-art pointer: `nixos/tests/installer.nix`); substantive engineering

Composes with

Test plan

  • Pre-cleanup grep audit: 0 non-historical references after deletion
  • Post-commit canary green (HEAD 60 = HEAD~1 60; 3 files deleted from existing trees + 1 new file under existing tree)
  • Branch follows `otto-cli/*` surface-prefix convention
  • Authored from fresh independent clone (per B-0828)
  • CI green (flake.nix changes evaluate; build-ai-cluster-iso.yml still works)
  • Copilot review pass

…ixos/hosts/installer/ + build-installer-iso.yml workflow + update root flake.nix; add B-0830 follow-up for release-attach (Aaron 2026-05-26)

Aaron direction: "lets try to cleanup what we have in a few prs and combine
get rid of the old" + "yeah if we need a delete thats fine".

USB cleanup PR 2 (of 3): consolidates parallel installer substrates.
After PR #5311 (deleted root usb-nixos-installer/), TWO installer
substrates remained on main:

1. infra/nixos/hosts/installer/ + .github/workflows/build-installer-iso.yml
   — LEGACY (root flake; simpler; lacks zero-typing install machinery)
2. full-ai-cluster/usb-nixos-installer/ + build-ai-cluster-iso.yml
   — CANONICAL (zeta-install.sh + zeta-first-boot.sh + zflash + B-0754
   iter-3 firmware + B-0789 iter-4 SSH-key/hashed-password + B-0792
   iter-5 WiFi)

This PR retires the legacy (#1) and keeps the canonical (#2):

DELETED:
- infra/nixos/hosts/installer/configuration.nix (296 lines; legacy
  installer config)
- .github/workflows/build-installer-iso.yml (230 lines; legacy ISO
  build workflow)

MODIFIED:
- flake.nix:
  - Removed nixosConfigurations.installer (referenced deleted file)
  - Removed packages.installer-iso + packages.default (depended on
    nixosConfigurations.installer)
  - Removed isoBuildSystems variable (no longer needed; was used only
    for legacy installer-iso output)
  - Updated bootstrap-flow comments to point at full-ai-cluster/
    usb-nixos-installer/ + zflash
  - Updated devShell shellHook to show canonical build command
  - Updated nixpkgs version-pin comment (canonical uses 25.11
    independently)

ADDED:
- docs/backlog/P3/B-0830-add-iso-release-attach-to-build-ai-cluster-
  iso-workflow-when-zeta-starts-tagging-releases-aaron-2026-05-26.md
  Follow-up: legacy workflow had release-attach (`release: types:
  [published]` trigger + attach-to-release job). Canonical doesn't.
  Capability currently UNUSED (zero releases per `gh release list`).
  When Zeta starts tagging releases, re-implement in canonical
  workflow per reference pattern preserved in B-0830 body.

Substrate-check pre-cleanup audit (per substrate-check-before-worry-
deployment discipline + Kestrel's pre-cleanup-audit recommendation):

1. infra/nixos/hosts/installer references:
   - flake.nix: imports as nixosConfigurations.installer → REMOVED
   - .github/workflows/build-installer-iso.yml: builds via root flake
     → DELETED entirely
   - .github/workflows/build-ai-cluster-iso.yml: NO REFERENCE
     (targets full-ai-cluster/usb-nixos-installer/)

2. build-installer-iso.yml references:
   - No other workflow depends on it
   - No tools/ci/ script depends on it
   - Release-attach feature currently UNUSED (zero releases exist)

3. Non-historical references after deletion: 0 (verified via grep)

Decision-archaeology pointer (Kestrel's "preserve why each path
existed and why it was retired" discipline):

WHY THIS PATH EXISTED: infra/nixos/hosts/installer/ was the root-
flake-imported installer config — first installer substrate after
the root usb-nixos-installer/ was minimized. Pre-dated the
full-ai-cluster/ consolidation.

WHY THIS PATH IS RETIRED: the canonical full-ai-cluster/
usb-nixos-installer/ has zero-typing install substrate (zeta-install.sh
+ zeta-first-boot.sh + zflash macOS Touch-ID flasher + flake.lock +
hardware-firmware + SSH-key/hashed-password + WiFi credential
injection). The legacy version lacks all of this. Maintaining two
installer substrates was unnecessary parallel-substrate cost (per
PR #5310 cost-of-velocity discussion).

NEXT IN CLEANUP SEQUENCE:
- USB cleanup PR 3: CI ISO testing via QEMU/KVM boot test (Kestrel's
  prior-art pointer: nixos/tests/installer.nix) — substantive
  engineering; substrate ISO build pipeline matures

Composes with: PR #5310 (cost-of-velocity discipline + Kestrel
sequencing recommendation: PR 1 before PR 2); PR #5311 (USB
cleanup PR 1 — deleted root usb-nixos-installer/); refresh-world-
model-poll-pr-gate dotgit-saturation discipline (authored from
fresh independent clone per B-0828 multi-AI shared-checkout
convention); methodology-hard-limits (irreversible deletion
authorized by operator explicitly).

Authored from fresh independent clone at /private/tmp/zeta-clone-
2026-05-26.
Copilot AI review requested due to automatic review settings May 26, 2026 21:00
@AceHack AceHack enabled auto-merge (squash) May 26, 2026 21:00
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR retires the legacy root-flake installer ISO path and leaves the full-ai-cluster/usb-nixos-installer/ substrate as the canonical ISO build path.

Changes:

  • Deleted the legacy NixOS installer host config and build-installer-iso.yml workflow.
  • Removed root-flake installer / installer-iso outputs and updated visible build guidance.
  • Added B-0830 to track re-adding release-asset upload support to the canonical workflow.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
infra/nixos/hosts/installer/configuration.nix Deletes the retired legacy installer host configuration.
.github/workflows/build-installer-iso.yml Deletes the retired legacy ISO build/release workflow.
flake.nix Removes root-flake installer outputs and updates comments/devShell guidance to canonical path.
docs/backlog/P3/B-0830-add-iso-release-attach-to-build-ai-cluster-iso-workflow-when-zeta-starts-tagging-releases-aaron-2026-05-26.md Adds follow-up backlog row for canonical workflow release attachment.
docs/BACKLOG.md Adds B-0830 to the generated backlog index.

Comment thread flake.nix
Comment thread flake.nix Outdated
…+ flake.nix maintainer-name + B-0830 release-attach safeguards

All 3 Copilot findings verified + addressed:

1. infra/README.md + infra/nix-darwin/README.md + infra/nix-darwin/
   configuration.nix all referenced the retired root-flake installer-iso
   command + deleted build-installer-iso.yml workflow link. Updated all
   4 references to point at:
   - cd full-ai-cluster/usb-nixos-installer && nix build .#installer-iso
     (canonical AI-cluster substrate)
   - build-ai-cluster-iso.yml (canonical CI workflow)
   - bun full-ai-cluster/tools/zflash.ts (macOS zflash recommended)
   With explicit "retired 2026-05-26 in USB cleanup PR 2" pointers for
   decision-archaeology.

2. flake.nix code-comment used direct maintainer-name attribution
   ("Per Aaron's...") on a current-state code surface. Per the
   convention (names on history/backlog/research only; role references
   on code surfaces), changed to "Per the human maintainer's...".

3. B-0830 acceptance criteria expanded to include security/reliability
   safeguards from the deleted legacy workflow that Copilot flagged:
   - Reject release tags starting with `-` (tag-name injection
     prevention; gh CLI argument-list ambiguity)
   - Use `--` separator for gh release upload (disambiguates positional
     args from flags)
   - Write SHA256 sidecar OUTSIDE read-only Nix store (the ISO at
     result/iso/ is a /nix/store symlink; sidecar must be in
     $RUNNER_TEMP or $GITHUB_WORKSPACE)
   - Plus discipline section (runner pinning, SHA-pin actions,
     concurrency groups, no event.* in run: lines, permissions
     scoped per-job)
   - Plus negative-test acceptance criterion (tag a `-malicious` name +
     verify abort)

Composes with substrate-check-before-worry-deployment discipline (per
PR #5291) + Kestrel's pre-cleanup-audit + preserve-rationale-in-deletion
disciplines + razor-discipline (operationally verifiable findings).
@AceHack AceHack merged commit afbac2c into main May 26, 2026
34 checks passed
@AceHack AceHack deleted the otto-cli/usb-cleanup-pr2-investigation-infra-installer-workflow-consolidation-2026-05-26 branch May 26, 2026 21:10
AceHack added a commit that referenced this pull request May 26, 2026
…scade #5 dynamic boot floor (Kestrel ferry pointer; Aaron 2026-05-26) (#5322)

USB cleanup PR 3 of 3. Adds dynamic boot-time verification to the
canonical AI-cluster ISO build pipeline. Catches the bug class
where the ISO builds + audits pass but the kernel/initrd
combination fails to actually boot (firmware mismatch; missing
module; broken init; etc.).

Aaron direction: "lets try to cleanup what we have in a few prs
and combine get rid of the old and try to push iso testing closer
into the ci instead of neading human to physically test usb but
also after a few rounds i will physically test teh usb" +
"you don't have to ask me direction every time you can just
assume all with the simplest first".

Prior art: nixos/tests/installer.nix (Kestrel 2026-05-26 ferry
pointer; preserved at docs/research/2026-05-26-kestrel-runme-
jit-runbook-bcl-extension-cost-of-velocity-decision-archaeology-
aaron-forwarded.md via PR #5310).

What lands (2 files):

1. tools/ci/qemu-boot-test.ts (new; ~150 lines)
   TS helper that spawns qemu-system-x86_64 with KVM acceleration
   (TCG fallback when KVM unavailable), captures serial console to
   log file, waits up to 5min for the installer's expected login
   prompt ("zeta-installer login:" — matches networking.hostName
   = "zeta-installer" in full-ai-cluster/usb-nixos-installer/nixos/
   installer/configuration.nix), then kills QEMU + returns exit
   code.
   - Per Rule 0: TS-over-bash for cross-platform DST
   - 2GB RAM + 2 SMP cores (installer needs >= 1GB; 2GB headroom)
   - q35 machine type (modern PCIe; matches Beelink hardware
     profile better than legacy i440fx)
   - BIOS boot (simpler than UEFI; ISO supports both)
   - Exit codes: 0 success / 1 boot failure / 2 usage error

2. .github/workflows/build-ai-cluster-iso.yml extension
   Adds 2 new steps AFTER the existing "Audit installer ISO
   content" step + BEFORE "Locate ISO + capture metadata":
   - "Install QEMU (apt)" — apt-get install qemu-system-x86 on
     ubuntu-24.04 runner (~30s)
   - "QEMU boot smoke-test (cascade #5 — dynamic boot floor)" —
     invokes the TS helper against the built ISO
   No github.event.* interpolation in run: lines; all inputs are
   filesystem paths from prior steps of THIS workflow per the
   GitHub Actions script-injection security guide.

Verification cascade now reads (post-PR-3):
- Cascade #1: source-substrate audit (preflight; ~1s)
- Cascade #4: ISO content audit (post-build; ~10s; verifies expected
  top-level files via 7z list)
- Cascade #5: QEMU boot smoke-test (post-build; ~3-5min; verifies
  ISO actually boots to login prompt)
- Locate ISO + metadata + workflow artifact upload (existing)

Estimated CI time impact: +3-5min per build (QEMU boot is the slow
step; KVM keeps it fast vs TCG emulation).

What this is NOT (substrate-honest defer list):
- NOT a full integration test (doesn't login + run commands +
  verify zeta-install works) — future B-NNNN follow-up
- NOT a multi-arch test (x86_64 only; aarch64 ISO is a separate
  build path if/when needed)
- NOT a hardware-specific test (UEFI variant; specific GPU
  configurations; etc.) — physical USB test on real Beelink fills
  that gap (Aaron 2026-05-26: "after a few rounds i will physically
  test the usb")
- NOT a release-attach step (B-0830 follow-up filed in USB PR 2)

This is the SIMPLEST viable boot test. Once it lands + runs across
a few cycles + catches at least one real boot regression (or
demonstrates none happen for N runs), Aaron's physical USB test
gate fires + the test surface matures incrementally.

Composes with: PR #5311 (USB cleanup PR 1); PR #5320 (USB cleanup
PR 2); B-0830 (release-attach follow-up); .claude/rules/rule-0-no-
sh-files (TS-over-bash discipline); .claude/rules/refresh-world-
model-poll-pr-gate (authored from fresh independent clone per
B-0828); substrate-check-before-worry-deployment (audit-then-act
discipline applied to the new test surface).

Authored from fresh independent clone at /private/tmp/zeta-clone-
2026-05-26 per Aaron's destructive-git-on-isolated-copies
authorization + B-0828 multi-AI shared-checkout convention.

Co-authored-by: Lior <lior@zeta.dev>
AceHack added a commit that referenced this pull request May 26, 2026
…turn terminology distinction + split 3-PR-cleanup + follow-up-fix-PR correctly (#5329)

Both Copilot findings verified + addressed: (1) multi-turn (overall conversation length) vs zero-turn (pathogen-decryption-protocol cost) are distinct scopes; clarified terminology in title + table-intro + empirical-generalization paragraph so readers don't read the table's Zero-turn entries as contradicting the multi-turn claim. (2) USB cleanup arc had 3-PR cleanup sequence (#5311 + #5320 + #5322) + follow-up fix (#5324) — split for narrative consistency. No semantic change; clarification only.

Co-authored-by: Lior <lior@zeta.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants