ci(test-cascade-4): post-build ISO content audit via 7z list (would have caught Aaron's stale-ISO issue) (Aaron 2026-05-26)#5119
Merged
AceHack merged 1 commit intoMay 26, 2026
Conversation
…026-05-26 ongoing test cascade) Ships #4 of the CI test-substrate cascade. Complements cascade #1 (source-substrate preflight audit; #5116 merged at 67ab888) by catching the bug class where the ISO build silently drops a file present in the source tree. Cascade overview (this PR = #4): #1 Source-substrate preflight audit (merged via #5116) #2 Bun unit tests for zflash pure-logic (merged via #5117; caught the DOS_FAT regex defect-lock-in) #3 Docker zeta-install.sh test (deferred follow-on) #4 ISO content audit (THIS PR) #5 NixOS test framework (deferred follow-on) New tool tools/ci/audit-installer-iso-content.ts: - Takes --iso <path> - Uses 7z list (-slt format; default on ubuntu-24.04) - Asserts REQUIRED_ISO_PATHS present in ISO root: nix-store.squashfs (containing the install scripts + modules) boot/bzImage (Linux kernel) boot/initrd (initramfs) boot/grub/grub.cfg (UEFI + BIOS boot config) - Exit codes: 0 pass / 1 invocation error / 2 7z list failed / 3 missing expected path - Adding a new expected top-level file: append to REQUIRED_ISO_PATHS build-ai-cluster-iso.yml workflow extended with a new step BETWEEN 'Build installer ISO' and 'Locate ISO + capture metadata': runs the audit against the freshly-built ISO; fails the build if any required path is missing → upload step is skipped → broken-ISO artifact never reaches operators. What this DOES NOT yet audit (out of scope; cascade #3 + #5 territory): - Contents WITHIN the nix-store squashfs (unsquashfs is heavier; source-substrate audit already catches 'module missing from repo' at cheaper cost) - Live boot behavior (nixosTest framework; cascade #5) Composes with #5116 source audit + tools/dashboard/generate-metrics.ts per-agent decompose-to-action ratio (Aaron's discipline pull — each cascade ship demonstrates filing→action cadence). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Pull request overview
Adds a CI “post-build floor” that audits the built AI-cluster installer ISO contents (via 7z listing) to catch cases where the ISO build silently omits expected files, complementing the existing source-substrate preflight audit.
Changes:
- Introduces
tools/ci/audit-installer-iso-content.tsto list ISO contents with7zand assert required top-level paths are present. - Inserts a new workflow step in
build-ai-cluster-iso.ymlbetween ISO build and ISO metadata capture to fail the job before artifact upload if the ISO content audit fails.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/ci/audit-installer-iso-content.ts | New Bun/TS ISO content audit tool that shells out to 7z and checks required paths. |
| .github/workflows/build-ai-cluster-iso.yml | Adds a post-build ISO content audit step prior to locating/uploading the ISO artifact. |
AceHack
pushed a commit
that referenced
this pull request
May 26, 2026
…p workflow nix-shell wrap
the maintainer 2026-05-26: "i would think nix needs to run our install.sh too for
setup". Reframes nix devShell as the 4th install.sh consumer alongside dev
laptops, CI runners, devcontainer images per GOVERNANCE.md §24. Single source of
truth for host-level tooling = the install.sh manifests, NOT a parallel package
list in flake.nix.
Changes:
- full-ai-cluster/flake.nix devShell: remove bun/p7zip/mkpasswd (now covered
by tools/setup/manifests/{brew,apt}); shellHook runs install.sh idempotently
on shell entry so host tools refresh without operator action. Keep nix-managed
k8s + age/sops/observability tooling because those are nix-reproducible and
not in install.sh manifests.
- .github/workflows/build-ai-cluster-iso.yml: drop nix-shell-wrap from ISO
audit step. ubuntu-24.04 ships 7z; p7zip-full is declared in apt manifest;
call bun directly. Aligns with "declarative not exact-version" framing.
- tools/setup/manifests/brew: p7zip (idempotent brew install skips if present)
- tools/setup/manifests/apt: p7zip-full (linux maintainer parity)
Composes with cascade #4 substrate landed in PR #5119 + the 5 Copilot fixes
landed earlier in this branch.
Co-Authored-By: Claude <noreply@anthropic.com>
AceHack
added a commit
that referenced
this pull request
May 26, 2026
…ntract + squashfs size + sonarjs + spawn-error + workflow set-u (#5120) * fix(cascade-4): 5 Copilot findings on #5119 — exit-code contract reconciled, squashfs size assertion, sonarjs suppression, 7z spawn-error message, workflow set-u guard Post-merge Copilot review on #5119 surfaced 5 real findings. All addressed: (1) Exit-code contract reconciled: auditIsoContent now returns {kind, detail} AuditError union variant so main() can map missing-file → exit 1 (matches header contract) vs list-failed → exit 2. Was returning bare string + main treated all as exit 2. (2) nix-store.squashfs non-empty assertion: header comment promised this; implementation only checked presence. lsIso now parses __TEXT __DATA __OBJC others dec hex from output; auditIsoContent adds empty-required-path failure when nix-store.squashfs size <= 0. (3) sonarjs/no-os-command-from-path suppression on 7z spawn with rationale (intentional PATH-resolved binary; input already validated). (4) 7z missing detection: r.error.message + r.signal now included in returned stderr when spawn itself fails (e.g., 7z not on PATH); local runs see clear WHY. (5) Workflow set -u guard: audit step now mirrors the explicit 0/1+ candidate count check from the later 'Locate ISO' step instead of unbound iso_candidates[0] which gave opaque 'unbound variable' under set -u. Verified via: - TS strict compile clean - Exit-code matrix: missing-file=1; missing-arg=1; (other exit codes covered by code paths not testable without an actual ISO file or broken 7z) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(cascade-4): declaratively pin 7z + bun + mkpasswd (Aaron 2026-05-26 substrate-engineering pull) Aaron 2026-05-26: '7z is another tools we need to declarative manage somewhere right?' Two changes: (1) full-ai-cluster/flake.nix devShells.default packages: adds bun (test cascade runner), p7zip (cascade-4 7z list), mkpasswd (iter-5.3 password-prompt hash). Maintainers running 'nix develop' get all test/build tools declaratively pinned. (2) build-ai-cluster-iso.yml audit step: wraps the bun audit call in 'nix shell nixpkgs#p7zip --command' so 7z is Nix-pinned at CI time, not implicit-from-ubuntu-runner-image. Composes cleanly with the existing 'Install Nix' step earlier in the job. Substrate-honest framing: this is the cleaner pattern for ALL CI tool deps going forward. The ubuntu-runner-image PATH is brittle; declarative Nix pin is reproducible across runner versions + matches Aaron's '0 dev machines' homelab persona substrate (per B-0790) where everything is Nix-defined. Composes with the prior 5-finding fix-fwd in this same PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(cascade-4): nix devShell is 4th consumer of install.sh + drop workflow nix-shell wrap the maintainer 2026-05-26: "i would think nix needs to run our install.sh too for setup". Reframes nix devShell as the 4th install.sh consumer alongside dev laptops, CI runners, devcontainer images per GOVERNANCE.md §24. Single source of truth for host-level tooling = the install.sh manifests, NOT a parallel package list in flake.nix. Changes: - full-ai-cluster/flake.nix devShell: remove bun/p7zip/mkpasswd (now covered by tools/setup/manifests/{brew,apt}); shellHook runs install.sh idempotently on shell entry so host tools refresh without operator action. Keep nix-managed k8s + age/sops/observability tooling because those are nix-reproducible and not in install.sh manifests. - .github/workflows/build-ai-cluster-iso.yml: drop nix-shell-wrap from ISO audit step. ubuntu-24.04 ships 7z; p7zip-full is declared in apt manifest; call bun directly. Aligns with "declarative not exact-version" framing. - tools/setup/manifests/brew: p7zip (idempotent brew install skips if present) - tools/setup/manifests/apt: p7zip-full (linux maintainer parity) Composes with cascade #4 substrate landed in PR #5119 + the 5 Copilot fixes landed earlier in this branch. Co-Authored-By: Claude <noreply@anthropic.com> * fix(setup): strip inline # comments before passing manifest lines to brew/apt the maintainer 2026-05-26 surfaced the bug while running install.sh on macOS after PR #5120 added p7zip to the brew manifest with an inline `# cascade #4 ...` comment. macos.sh's awk parser only stripped full-line comments (lines starting with `#`), so the entire line including the trailing inline comment was passed to `brew install` as the formula name, producing: Warning: No available formula with the name "p7zip # cascade #4 iso content audit (7z list); also useful". Error: No formulae or casks found for "p7zip # cascade #4...". Fix: extend the awk filter to (a) sub `#.*$` → "" to strip inline comments, and (b) gsub trim leading/trailing whitespace before the NF > 0 emit check. Same bug class existed in linux.sh's apt manifest parser; fixed identically. Reaffirms the maintainer 2026-05-26 contract: "it's supposed to be idempotent and update to latest automaticly for floating versions". The brew upgrade branch + apt-get install fall-through on already-installed packages already satisfy this; the fix preserves that contract once the package name parses correctly. Empirically verified: bash tools/setup/install.sh now installs p7zip 17.05 cleanly from the new manifest entry. Co-Authored-By: Claude <noreply@anthropic.com> * fix(cascade-4): 3 Copilot findings on #5120 — non-regular-file exit-1 + contract docs + typeguard P1 (exit-code contract drift): auditIsoContent's existsSync check let non-regular-file paths (directory, device, broken symlink target) fall through to the 7z spawn, producing exit 2 "list-failed" instead of the header-contracted exit 1 "invocation error". Add statSync().isFile() guard with statSync-failure also mapping to missing-file exit 1. Verified empirically: `bun ... --iso /tmp` → exit 1 with clear "exists but is not a regular file: /tmp (mode=41777)" message. P1 (doc/contract drift): header exit-code table now covers both "missing required path" AND "present-but-empty" under exit 3 to match the empty-required-path failure kind landed earlier in this branch. Exit 1 header also widened to "not-a-regular-file" per the fix above. P2 (maintainability): introduce `isAuditError` typeguard so main() narrows the union without `"kind" in result` plus `as AuditError` cast. Note: Copilot's suggested `!Array.isArray(result)` direct narrow fails TS2339 because TS's lib.es5 `Array.isArray` signature doesn't discriminate `readonly T[]` unions; the typeguard is the canonical TS workaround. Empirically verified both exit-1 branches (missing path + non-file path); typecheck clean. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Lior <lior@zeta.dev> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
3 tasks
AceHack
added a commit
that referenced
this pull request
May 26, 2026
…otloader any-of (#5125) * fix(ci P0): cascade #4 audit blocked ALL ISO builds since #5119 — REQUIRED list asserted boot/grub/grub.cfg but NixOS uses isolinux+refind; replaced with bootloader any-of check EMPIRICAL ANCHOR (the maintainer 2026-05-26 caught this): `gh run list --workflow=build-ai-cluster-iso.yml` showed the last 4 builds all failed on the cascade #4 audit step. Last successful build was 17523e4 (PR #5117 iter-5.2.1 era) — BEFORE iter-5.2.2 (login-banner) and iter-5.3 (password prompt) shipped. The maintainer was about to flash a USB expecting the latest CI ISO; the latest CI artifact is actually stale because the audit blocks artifact upload on assertion failure. Root cause: REQUIRED_ISO_PATHS asserted `boot/grub/grub.cfg` (legacy GRUB layout). NixOS installer ISOs as of nixos-24.11 use: - **isolinux** for BIOS boot: `isolinux/isolinux.cfg` - **refind** for UEFI boot: `EFI/BOOT/refind_x64.efi` NOT legacy grub at the asserted path. Build log confirms: `efi-image_eltorito > Copying grub.cfg` lands in EFI/, not boot/grub/. My cascade #4 draft list was version-skewed (training-data default leaked through; ironically exactly what B-0805 capstone names as the systemic agent-discipline gap). Fix: - Drop `boot/grub/grub.cfg` from REQUIRED_ISO_PATHS (the 3 remaining — nix-store.squashfs + boot/bzImage + boot/initrd — ARE sufficient to assert "bootable NixOS installer ISO": without those nothing boots). - Add REQUIRED_BOOTLOADER_ANY: any-of family check for bootloader configs across the known NixOS-version layouts (isolinux/refind/grub). Forward-compatible: if NixOS switches bootloaders in a future channel, add the new path to the any-of list rather than re-breaking. - Header comment documents the empirical anchor so future-Otto doesn't re-introduce the same legacy-path assumption. Confirms B-0805 (capstone, P1) was the right substrate-engineering call: this exact failure mode is what dep-pin-search-first-authority discipline prevents. Once this lands, the next ISO build on main will pick up iter-5.2.2 + iter-5.3 substrate and the artifact will reflect the current substrate state. The maintainer can then re-flash + install with confidence. Co-Authored-By: Claude <noreply@anthropic.com> * fix(p0-iso-audit): use .some() instead of .find()+unused-const (Copilot P0 on #5125) bootloaderHit const was assigned but never used; would fail tsc under noUnusedLocals. Switched to boolean .some() check which avoids the unused-variable shape entirely. No behavior change. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Lior <lior@zeta.dev> Co-authored-by: Claude <noreply@anthropic.com>
AceHack
added a commit
that referenced
this pull request
May 26, 2026
…-search-first-authority (3-anchor empirical evidence 2026-05-26) (#5131) * rule: verify-existing-substrate-before-authoring (sibling to dep-pin-search-first-authority) — 3-anchor empirical evidence from session 2026-05-26 Single 2026-05-26 session produced 3 same-root-cause failures ("Otto-defaults-to-plausible-but-unverified" at substrate-authoring scope): ANCHOR 1: cascade #4 ISO audit (PR #5119) asserted boot/grub/grub.cfg without verifying NixOS-actual layout (isolinux + refind). Blocked 4 ISO builds. Fixed via PR #5125. Covered by dep-pin-search-first- authority rule landed PR #5126. ANCHOR 2: B-0806 backlog row (PR #5129) authored Ace section as if Ace were just "a package manager CLI" without reading docs/agendas/ace- package-manager/AGENDA.md + project memory + 7+ related backlog rows. The maintainer 2026-05-26: "that is what ace has been since we first talked about it you just keep forgetting we have substantial backlog around this". Fixed via PR #5130. ANCHOR 3: B-0806 hat/fork-negotiation NOT integrated into architecture even after Anchor-2 correction. The maintainer 2026-05-26: "i'm assuming you have the hat / fork negoation for ace too". Fixed via PR #5130 follow-on commit. Same root cause class as the dep-pin rule, but at a DIFFERENT surface: this is substrate-authoring scope (backlog rows, rules, skills, architectural framings), not version-pin scope. dep-pin-search-first- authority + this rule + fighting-past-self-vs-peer-agent compose to cover the surfaces today's empirical evidence showed are vulnerable. The rule auto-loads at cold-boot per wake-time-substrate. Provides: - Operational discipline: 4-step grep + read top hits + decide + cite inline - Checklist template for inline substrate-inventory pass annotation - All 3 empirical anchors preserved so future-Otto sees the cost of skipping - Cross-references to dep-pin + fighting-past-self for full coverage Co-Authored-By: Claude <noreply@anthropic.com> * fix(rule-ext): MD032 false-positive — "+ refind" parsed as list start; reword to "plus refind" markdownlint MD032 fired on line 100 because the wrap-continuation "+ refind, NOT legacy GRUB..." starts with `+ ` which is a valid markdown list marker. Linter doesn't know this is a wrapped paragraph continuation from line 99. Reword "isolinux + refind" → "isolinux plus refind" to disambiguate. No content change. Co-Authored-By: Claude <noreply@anthropic.com> * fix(rule-ext): 2 real Copilot findings on #5131 — content-grep + fixed-string discipline; 3rd (table double-pipe) is FP (1) Earlier inventory snippet used filename/directory-name filtering (`find docs/agendas -type d | grep -i "$topic"`) which misses substrate that mentions the topic in CONTENT without the keyword in the filename. Should be content-search via grep -rl. Same gap for docs/trajectories/. (2) Earlier snippet used `grep -E "$topic"` (regex) + unquoted shell globs (`memory/*${topic}*`). Both break when topic contains regex metacharacters (`+`, `.`, `B-NNNN`) or spaces. Use `grep -F` (fixed-string) for safety + content-search (no globs). (3) Bonus fix: `.claude/skills/` was missing from the inventory surfaces even though skills are explicitly in-scope for the rule. Added. 3rd Copilot thread (table double-pipe at line 158/149) is the documented known-FP class per `.claude/rules/blocked-green-ci-investigate-threads.md` ("Table double-pipe (`||`) ... 4 confirmed FPs in one session"). Direct inspection of line 158 (`| Surface | Rule that catches it |`) confirms single pipes; resolving that thread no-op per the suspect-by-default discipline. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Lior <lior@zeta.dev> Co-authored-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Cascade #4 of 5 (per Aaron 2026-05-26 'start working on the CI stuff while we iterate'). Complements #1 (source-substrate preflight; merged via #5116) by catching the bug class where ISO build silently drops files present in source. New
tools/ci/audit-installer-iso-content.ts(7z list of built ISO; asserts nix-store.squashfs + boot/{bzImage,initrd,grub/grub.cfg} present). Workflow step inserted between 'Build installer ISO' + 'Locate ISO' — failure skips upload, broken-ISO artifact never reaches operators. Composes with #5116 audit.