Skip to content

ci(test-cascade-4): post-build ISO content audit via 7z list (would have caught Aaron's stale-ISO issue) (Aaron 2026-05-26)#5119

Merged
AceHack merged 1 commit into
mainfrom
otto-cli/ci-test-cascade-4-iso-content-audit-2026-05-26
May 26, 2026
Merged

ci(test-cascade-4): post-build ISO content audit via 7z list (would have caught Aaron's stale-ISO issue) (Aaron 2026-05-26)#5119
AceHack merged 1 commit into
mainfrom
otto-cli/ci-test-cascade-4-iso-content-audit-2026-05-26

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 26, 2026

Cascade #4 of 5 (per Aaron 2026-05-26 'start working on the CI stuff while we iterate'). Complements #1 (source-substrate preflight; merged via #5116) by catching the bug class where ISO build silently drops files present in source. New tools/ci/audit-installer-iso-content.ts (7z list of built ISO; asserts nix-store.squashfs + boot/{bzImage,initrd,grub/grub.cfg} present). Workflow step inserted between 'Build installer ISO' + 'Locate ISO' — failure skips upload, broken-ISO artifact never reaches operators. Composes with #5116 audit.

…026-05-26 ongoing test cascade)

Ships #4 of the CI test-substrate cascade. Complements cascade #1
(source-substrate preflight audit; #5116 merged at 67ab888) by
catching the bug class where the ISO build silently drops a file
present in the source tree.

Cascade overview (this PR = #4):

  #1 Source-substrate preflight audit (merged via #5116)
  #2 Bun unit tests for zflash pure-logic (merged via #5117;
     caught the DOS_FAT regex defect-lock-in)
  #3 Docker zeta-install.sh test (deferred follow-on)
  #4 ISO content audit (THIS PR)
  #5 NixOS test framework (deferred follow-on)

New tool tools/ci/audit-installer-iso-content.ts:

  - Takes --iso <path>
  - Uses 7z list (-slt format; default on ubuntu-24.04)
  - Asserts REQUIRED_ISO_PATHS present in ISO root:
      nix-store.squashfs   (containing the install scripts + modules)
      boot/bzImage         (Linux kernel)
      boot/initrd          (initramfs)
      boot/grub/grub.cfg   (UEFI + BIOS boot config)
  - Exit codes: 0 pass / 1 invocation error / 2 7z list failed /
    3 missing expected path
  - Adding a new expected top-level file: append to REQUIRED_ISO_PATHS

build-ai-cluster-iso.yml workflow extended with a new step BETWEEN
'Build installer ISO' and 'Locate ISO + capture metadata': runs
the audit against the freshly-built ISO; fails the build if any
required path is missing → upload step is skipped → broken-ISO
artifact never reaches operators.

What this DOES NOT yet audit (out of scope; cascade #3 + #5
territory):

  - Contents WITHIN the nix-store squashfs (unsquashfs is heavier;
    source-substrate audit already catches 'module missing from
    repo' at cheaper cost)
  - Live boot behavior (nixosTest framework; cascade #5)

Composes with #5116 source audit + tools/dashboard/generate-metrics.ts
per-agent decompose-to-action ratio (Aaron's discipline pull —
each cascade ship demonstrates filing→action cadence).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 07:07
@AceHack AceHack enabled auto-merge (squash) May 26, 2026 07:07
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@AceHack AceHack merged commit 35fd3ae into main May 26, 2026
34 of 35 checks passed
@AceHack AceHack deleted the otto-cli/ci-test-cascade-4-iso-content-audit-2026-05-26 branch May 26, 2026 07:10
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a CI “post-build floor” that audits the built AI-cluster installer ISO contents (via 7z listing) to catch cases where the ISO build silently omits expected files, complementing the existing source-substrate preflight audit.

Changes:

  • Introduces tools/ci/audit-installer-iso-content.ts to list ISO contents with 7z and assert required top-level paths are present.
  • Inserts a new workflow step in build-ai-cluster-iso.yml between ISO build and ISO metadata capture to fail the job before artifact upload if the ISO content audit fails.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
tools/ci/audit-installer-iso-content.ts New Bun/TS ISO content audit tool that shells out to 7z and checks required paths.
.github/workflows/build-ai-cluster-iso.yml Adds a post-build ISO content audit step prior to locating/uploading the ISO artifact.

Comment thread tools/ci/audit-installer-iso-content.ts
Comment thread tools/ci/audit-installer-iso-content.ts
Comment thread tools/ci/audit-installer-iso-content.ts
Comment thread tools/ci/audit-installer-iso-content.ts
Comment thread .github/workflows/build-ai-cluster-iso.yml
AceHack pushed a commit that referenced this pull request May 26, 2026
…p workflow nix-shell wrap

the maintainer 2026-05-26: "i would think nix needs to run our install.sh too for
setup". Reframes nix devShell as the 4th install.sh consumer alongside dev
laptops, CI runners, devcontainer images per GOVERNANCE.md §24. Single source of
truth for host-level tooling = the install.sh manifests, NOT a parallel package
list in flake.nix.

Changes:
- full-ai-cluster/flake.nix devShell: remove bun/p7zip/mkpasswd (now covered
  by tools/setup/manifests/{brew,apt}); shellHook runs install.sh idempotently
  on shell entry so host tools refresh without operator action. Keep nix-managed
  k8s + age/sops/observability tooling because those are nix-reproducible and
  not in install.sh manifests.
- .github/workflows/build-ai-cluster-iso.yml: drop nix-shell-wrap from ISO
  audit step. ubuntu-24.04 ships 7z; p7zip-full is declared in apt manifest;
  call bun directly. Aligns with "declarative not exact-version" framing.
- tools/setup/manifests/brew: p7zip (idempotent brew install skips if present)
- tools/setup/manifests/apt: p7zip-full (linux maintainer parity)

Composes with cascade #4 substrate landed in PR #5119 + the 5 Copilot fixes
landed earlier in this branch.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 26, 2026
…ntract + squashfs size + sonarjs + spawn-error + workflow set-u (#5120)

* fix(cascade-4): 5 Copilot findings on #5119 — exit-code contract reconciled, squashfs size assertion, sonarjs suppression, 7z spawn-error message, workflow set-u guard

Post-merge Copilot review on #5119 surfaced 5 real findings.
All addressed:

(1) Exit-code contract reconciled: auditIsoContent now returns
    {kind, detail} AuditError union variant so main() can map
    missing-file → exit 1 (matches header contract) vs
    list-failed → exit 2. Was returning bare string + main
    treated all as exit 2.

(2) nix-store.squashfs non-empty assertion: header comment
    promised this; implementation only checked presence.
    lsIso now parses __TEXT	__DATA	__OBJC	others	dec	hex from  output;
    auditIsoContent adds empty-required-path failure when
    nix-store.squashfs size <= 0.

(3) sonarjs/no-os-command-from-path suppression on 7z spawn
    with rationale (intentional PATH-resolved binary; input
    already validated).

(4) 7z missing detection: r.error.message + r.signal now
    included in returned stderr when spawn itself fails
    (e.g., 7z not on PATH); local runs see clear WHY.

(5) Workflow set -u guard: audit step now mirrors the
    explicit 0/1+ candidate count check from the later
    'Locate ISO' step instead of unbound iso_candidates[0]
    which gave opaque 'unbound variable' under set -u.

Verified via:
- TS strict compile clean
- Exit-code matrix: missing-file=1; missing-arg=1; (other
  exit codes covered by code paths not testable without an
  actual ISO file or broken 7z)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(cascade-4): declaratively pin 7z + bun + mkpasswd (Aaron 2026-05-26 substrate-engineering pull)

Aaron 2026-05-26: '7z is another tools we need to declarative
manage somewhere right?'

Two changes:

(1) full-ai-cluster/flake.nix devShells.default packages: adds
    bun (test cascade runner), p7zip (cascade-4 7z list), mkpasswd
    (iter-5.3 password-prompt hash). Maintainers running 'nix
    develop' get all test/build tools declaratively pinned.

(2) build-ai-cluster-iso.yml audit step: wraps the bun audit call
    in 'nix shell nixpkgs#p7zip --command' so 7z is Nix-pinned at
    CI time, not implicit-from-ubuntu-runner-image. Composes
    cleanly with the existing 'Install Nix' step earlier in the
    job.

Substrate-honest framing: this is the cleaner pattern for ALL
CI tool deps going forward. The ubuntu-runner-image PATH is
brittle; declarative Nix pin is reproducible across runner
versions + matches Aaron's '0 dev machines' homelab persona
substrate (per B-0790) where everything is Nix-defined.

Composes with the prior 5-finding fix-fwd in this same PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(cascade-4): nix devShell is 4th consumer of install.sh + drop workflow nix-shell wrap

the maintainer 2026-05-26: "i would think nix needs to run our install.sh too for
setup". Reframes nix devShell as the 4th install.sh consumer alongside dev
laptops, CI runners, devcontainer images per GOVERNANCE.md §24. Single source of
truth for host-level tooling = the install.sh manifests, NOT a parallel package
list in flake.nix.

Changes:
- full-ai-cluster/flake.nix devShell: remove bun/p7zip/mkpasswd (now covered
  by tools/setup/manifests/{brew,apt}); shellHook runs install.sh idempotently
  on shell entry so host tools refresh without operator action. Keep nix-managed
  k8s + age/sops/observability tooling because those are nix-reproducible and
  not in install.sh manifests.
- .github/workflows/build-ai-cluster-iso.yml: drop nix-shell-wrap from ISO
  audit step. ubuntu-24.04 ships 7z; p7zip-full is declared in apt manifest;
  call bun directly. Aligns with "declarative not exact-version" framing.
- tools/setup/manifests/brew: p7zip (idempotent brew install skips if present)
- tools/setup/manifests/apt: p7zip-full (linux maintainer parity)

Composes with cascade #4 substrate landed in PR #5119 + the 5 Copilot fixes
landed earlier in this branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(setup): strip inline # comments before passing manifest lines to brew/apt

the maintainer 2026-05-26 surfaced the bug while running install.sh on macOS
after PR #5120 added p7zip to the brew manifest with an inline `# cascade #4
...` comment. macos.sh's awk parser only stripped full-line comments (lines
starting with `#`), so the entire line including the trailing inline comment
was passed to `brew install` as the formula name, producing:

  Warning: No available formula with the name "p7zip       # cascade #4 iso
  content audit (7z list); also useful".
  Error: No formulae or casks found for "p7zip       # cascade #4...".

Fix: extend the awk filter to (a) sub `#.*$` → "" to strip inline comments,
and (b) gsub trim leading/trailing whitespace before the NF > 0 emit check.
Same bug class existed in linux.sh's apt manifest parser; fixed identically.

Reaffirms the maintainer 2026-05-26 contract: "it's supposed to be idempotent
and update to latest automaticly for floating versions". The brew upgrade
branch + apt-get install fall-through on already-installed packages already
satisfy this; the fix preserves that contract once the package name parses
correctly.

Empirically verified: bash tools/setup/install.sh now installs p7zip 17.05
cleanly from the new manifest entry.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(cascade-4): 3 Copilot findings on #5120 — non-regular-file exit-1 + contract docs + typeguard

P1 (exit-code contract drift): auditIsoContent's existsSync check let
non-regular-file paths (directory, device, broken symlink target) fall
through to the 7z spawn, producing exit 2 "list-failed" instead of the
header-contracted exit 1 "invocation error". Add statSync().isFile()
guard with statSync-failure also mapping to missing-file exit 1.
Verified empirically: `bun ... --iso /tmp` → exit 1 with clear "exists
but is not a regular file: /tmp (mode=41777)" message.

P1 (doc/contract drift): header exit-code table now covers both
"missing required path" AND "present-but-empty" under exit 3 to match
the empty-required-path failure kind landed earlier in this branch.
Exit 1 header also widened to "not-a-regular-file" per the fix above.

P2 (maintainability): introduce `isAuditError` typeguard so main()
narrows the union without `"kind" in result` plus `as AuditError` cast.
Note: Copilot's suggested `!Array.isArray(result)` direct narrow fails
TS2339 because TS's lib.es5 `Array.isArray` signature doesn't
discriminate `readonly T[]` unions; the typeguard is the canonical
TS workaround.

Empirically verified both exit-1 branches (missing path + non-file
path); typecheck clean.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 26, 2026
…otloader any-of (#5125)

* fix(ci P0): cascade #4 audit blocked ALL ISO builds since #5119 — REQUIRED list asserted boot/grub/grub.cfg but NixOS uses isolinux+refind; replaced with bootloader any-of check

EMPIRICAL ANCHOR (the maintainer 2026-05-26 caught this): `gh run list
--workflow=build-ai-cluster-iso.yml` showed the last 4 builds all failed
on the cascade #4 audit step. Last successful build was 17523e4 (PR
#5117 iter-5.2.1 era) — BEFORE iter-5.2.2 (login-banner) and iter-5.3
(password prompt) shipped. The maintainer was about to flash a USB
expecting the latest CI ISO; the latest CI artifact is actually stale
because the audit blocks artifact upload on assertion failure.

Root cause: REQUIRED_ISO_PATHS asserted `boot/grub/grub.cfg` (legacy
GRUB layout). NixOS installer ISOs as of nixos-24.11 use:
- **isolinux** for BIOS boot: `isolinux/isolinux.cfg`
- **refind** for UEFI boot: `EFI/BOOT/refind_x64.efi`
NOT legacy grub at the asserted path. Build log confirms: `efi-image_eltorito
> Copying grub.cfg` lands in EFI/, not boot/grub/. My cascade #4 draft
list was version-skewed (training-data default leaked through; ironically
exactly what B-0805 capstone names as the systemic agent-discipline gap).

Fix:
- Drop `boot/grub/grub.cfg` from REQUIRED_ISO_PATHS (the 3 remaining —
  nix-store.squashfs + boot/bzImage + boot/initrd — ARE sufficient to
  assert "bootable NixOS installer ISO": without those nothing boots).
- Add REQUIRED_BOOTLOADER_ANY: any-of family check for bootloader
  configs across the known NixOS-version layouts (isolinux/refind/grub).
  Forward-compatible: if NixOS switches bootloaders in a future channel,
  add the new path to the any-of list rather than re-breaking.
- Header comment documents the empirical anchor so future-Otto doesn't
  re-introduce the same legacy-path assumption.

Confirms B-0805 (capstone, P1) was the right substrate-engineering call:
this exact failure mode is what dep-pin-search-first-authority discipline
prevents.

Once this lands, the next ISO build on main will pick up iter-5.2.2 +
iter-5.3 substrate and the artifact will reflect the current substrate
state. The maintainer can then re-flash + install with confidence.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(p0-iso-audit): use .some() instead of .find()+unused-const (Copilot P0 on #5125)

bootloaderHit const was assigned but never used; would fail tsc under
noUnusedLocals. Switched to boolean .some() check which avoids the
unused-variable shape entirely. No behavior change.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 26, 2026
…-search-first-authority (3-anchor empirical evidence 2026-05-26) (#5131)

* rule: verify-existing-substrate-before-authoring (sibling to dep-pin-search-first-authority) — 3-anchor empirical evidence from session 2026-05-26

Single 2026-05-26 session produced 3 same-root-cause failures
("Otto-defaults-to-plausible-but-unverified" at substrate-authoring
scope):

ANCHOR 1: cascade #4 ISO audit (PR #5119) asserted boot/grub/grub.cfg
without verifying NixOS-actual layout (isolinux + refind). Blocked 4
ISO builds. Fixed via PR #5125. Covered by dep-pin-search-first-
authority rule landed PR #5126.

ANCHOR 2: B-0806 backlog row (PR #5129) authored Ace section as if Ace
were just "a package manager CLI" without reading docs/agendas/ace-
package-manager/AGENDA.md + project memory + 7+ related backlog rows.
The maintainer 2026-05-26: "that is what ace has been since we first
talked about it you just keep forgetting we have substantial backlog
around this". Fixed via PR #5130.

ANCHOR 3: B-0806 hat/fork-negotiation NOT integrated into architecture
even after Anchor-2 correction. The maintainer 2026-05-26: "i'm
assuming you have the hat / fork negoation for ace too". Fixed via
PR #5130 follow-on commit.

Same root cause class as the dep-pin rule, but at a DIFFERENT surface:
this is substrate-authoring scope (backlog rows, rules, skills,
architectural framings), not version-pin scope. dep-pin-search-first-
authority + this rule + fighting-past-self-vs-peer-agent compose to
cover the surfaces today's empirical evidence showed are vulnerable.

The rule auto-loads at cold-boot per wake-time-substrate.

Provides:
- Operational discipline: 4-step grep + read top hits + decide + cite
  inline
- Checklist template for inline substrate-inventory pass annotation
- All 3 empirical anchors preserved so future-Otto sees the cost of
  skipping
- Cross-references to dep-pin + fighting-past-self for full coverage

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(rule-ext): MD032 false-positive — "+ refind" parsed as list start; reword to "plus refind"

markdownlint MD032 fired on line 100 because the wrap-continuation
"+ refind, NOT legacy GRUB..." starts with `+ ` which is a valid
markdown list marker. Linter doesn't know this is a wrapped paragraph
continuation from line 99.

Reword "isolinux + refind" → "isolinux plus refind" to disambiguate.
No content change.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(rule-ext): 2 real Copilot findings on #5131 — content-grep + fixed-string discipline; 3rd (table double-pipe) is FP

(1) Earlier inventory snippet used filename/directory-name filtering
    (`find docs/agendas -type d | grep -i "$topic"`) which misses
    substrate that mentions the topic in CONTENT without the keyword in
    the filename. Should be content-search via grep -rl. Same gap for
    docs/trajectories/.

(2) Earlier snippet used `grep -E "$topic"` (regex) + unquoted shell
    globs (`memory/*${topic}*`). Both break when topic contains regex
    metacharacters (`+`, `.`, `B-NNNN`) or spaces. Use `grep -F`
    (fixed-string) for safety + content-search (no globs).

(3) Bonus fix: `.claude/skills/` was missing from the inventory surfaces
    even though skills are explicitly in-scope for the rule. Added.

3rd Copilot thread (table double-pipe at line 158/149) is the
documented known-FP class per `.claude/rules/blocked-green-ci-investigate-threads.md`
("Table double-pipe (`||`) ... 4 confirmed FPs in one session"). Direct
inspection of line 158 (`| Surface | Rule that catches it |`) confirms
single pipes; resolving that thread no-op per the suspect-by-default
discipline.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants