Skip to content

ci(B-0831 layer-1): extend audit-installer-substrate with iter-5.4 sentinels#5365

Merged
AceHack merged 2 commits into
mainfrom
ci-layer1-iter54-sentinels-audit-installer-substrate-otto-cli-2026-05-26
May 27, 2026
Merged

ci(B-0831 layer-1): extend audit-installer-substrate with iter-5.4 sentinels#5365
AceHack merged 2 commits into
mainfrom
ci-layer1-iter54-sentinels-audit-installer-substrate-otto-cli-2026-05-26

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 27, 2026

Layer 1 of 4-layer CI testing approach for iter-5.4 substrate

Aaron asked: "yeah push forward a bit maybe create some more ci tests how do you want to test the gh login flow?"

The 4-layer plan:

Layer Approach Cost Catches
Layer 1 (THIS PR) Source-level sentinel audit Seconds Substrate regression (text-level)
Layer 2 (next PR) Behavioral test with mock `gh` shim on PATH ~1s Conditional-logic regression
Layer 3 (B-0833 Approach A) Mock GH device-code endpoint ~10s Real interactive-login flow without humans
Layer 4 (B-0831 cascade #6) QEMU full-install + cluster auto-join Minutes End-to-end including reboot + ArgoCD

What this PR adds

Extends `REQUIRED_SENTINELS` for `full-ai-cluster/usb-nixos-installer/zeta-install.sh` with 14 new substrings:

(a) iter-5.4 flow anchors

  • `Step 6.8: iter-5.4.0 homelab gh-auth + operator pubkey copy`
  • `Step 6.9: iter-5.4.1 self-registration commit+push`
  • `gh auth login`
  • `gh ssh-key list`
  • `gh repo clone Lucent-Financial-Group/Zeta`

(b) Bug 2a + 2b fix-regression catches (PR #5364)

  • `gh auth setup-git` — Bug 2a fix
  • `SSH_KEY_ERR_FILE` — Bug 2b stderr capture
  • `admin:public_key` — Bug 2b scope-recovery guidance

(c) ClusterNode YAML schema sentinels (PR #5352 Copilot findings)

  • `apiVersion: zeta.lucent-financial-group.com/v1`
  • `kind: ClusterNode`
  • ` roles:` — spec.roles is ARRAY (was scalar spec.role)
  • ` registration:` — spec.registration block (was spec.maintainer flat)
  • ` hardware:` — spec.hardware block (storage was sibling)

(d) Hardware-probe sentinels (MAC parsing regression catch)

  • `/proc/cpuinfo` — CPU_MODEL extraction
  • `link/ether` — MAC parses field AFTER link/ether

(e) Self-reg branch shape

  • `register-${NODE_HOSTNAME}-` — iter-5.4.1 branch name pattern

Verified

```
$ bun tools/ci/audit-installer-substrate.ts
audit-installer-substrate: PASS — 10 required files + 5 sentinel-file assertions OK
```

Runs in the existing `build-ai-cluster-iso.yml` workflow on every PR touching the installer surface.

Composes with

Substrate-honest framing

Layer 1 doesn't test BEHAVIOR — only that the substrate is PRESENT. A future Aaron-edit that accidentally removes `gh auth setup-git` would be caught by this layer; an edit that changes `gh auth setup-git` to `gh auth setup-git --hostname github.com` would still pass (substring match). Layer 2 (mock-gh shim) catches behavioral regressions; this layer is the cheapest first line of defense.

🤖 Generated with Claude Code

…ntinels (gh auth setup-git, ssh-key stderr-capture, self-reg flow, ClusterNode YAML schema, MAC parsing)

Layer 1 of a 4-layer CI testing approach for the iter-5.4 substrate
(B-0812 self-registration + B-0813 cluster reconciliation + B-0835 bug
fixes Bug 2a + 2b on #5364):

  Layer 1 (THIS PR) — source-level sentinel audit (cheap; catches regression)
  Layer 2 (next PR) — behavioral test with mock gh shim on PATH
  Layer 3 (B-0833 Approach A) — mock GH device-code endpoint
  Layer 4 (B-0831 cascade #6) — QEMU full-install + cluster auto-join

This layer extends the existing REQUIRED_SENTINELS for
full-ai-cluster/usb-nixos-installer/zeta-install.sh with 14 new
substrings, organized into 3 groups:

(a) iter-5.4 flow anchors (5 sentinels):
  - "Step 6.8: iter-5.4.0 homelab gh-auth + operator pubkey copy"
  - "Step 6.9: iter-5.4.1 self-registration commit+push"
  - "gh auth login"
  - "gh ssh-key list"
  - "gh repo clone Lucent-Financial-Group/Zeta"

(b) Bug 2a + 2b fix-regression catches (3 sentinels):
  - "gh auth setup-git"     — Bug 2a fix; presence catches removal
  - "SSH_KEY_ERR_FILE"      — Bug 2b fix; presence catches stderr-capture removal
  - "admin:public_key"      — Bug 2b fix; presence catches scope-recovery message removal

(c) ClusterNode YAML schema sentinels (5 sentinels — catches the Copilot
findings on #5352 where spec.role was scalar, spec.maintainer was at
wrong path, spec.storage was a sibling instead of under hardware block):
  - "apiVersion: zeta.lucent-financial-group.com/v1"
  - "kind: ClusterNode"
  - "  roles:"             — spec.roles is ARRAY per B-0813
  - "  registration:"      — spec.registration block per B-0813
  - "  hardware:"          — spec.hardware block per B-0813

(d) Hardware-probe sentinels (catches MAC parsing regression from #5352):
  - "/proc/cpuinfo"   — CPU_MODEL extraction
  - "link/ether"      — MAC parses field after link/ether (not before)

(e) Self-reg branch-shape sentinel:
  - "register-${NODE_HOSTNAME}-" — iter-5.4.1 branch name pattern

Composes with:
  - PR #5364 (Bug 2a + 2b fixes that this audit will catch if regressed)
  - PR #5352 (iter-5.4.1 Copilot findings that this audit will catch)
  - PR #5354 (Bug 1 hostname symlink fix — already covered by existing sentinels)
  - B-0831 (cascade #6 full-install QEMU test; this is layer 1 of that work)
  - B-0833 (interactive-login vs baked-in-keys tension; layer 3 of the cascade)

Why source-level + cheap-first:
  - Workflow build-ai-cluster-iso.yml runs `bun tools/ci/audit-installer-substrate.ts`
    on every PR touching the installer surface
  - Source-level catches substrate-regression at PR-author-time (seconds)
  - vs Layer 4 QEMU full-install (~minutes; expensive; flaky)
  - Layer 1 is the inner loop; Layers 2-4 are the outer loops

Per `.claude/rules/verify-existing-substrate-before-authoring.md`:
substrate-inventory pass found `tools/ci/audit-installer-substrate.ts`
already has the REQUIRED_SENTINELS pattern for iter-4.2 + iter-5.1 +
iter-5.2 + iter-5.2.2; this PR extends with iter-5.4 sentinels rather
than minting parallel substrate.

Verified: `bun tools/ci/audit-installer-substrate.ts` exits 0
("PASS — 10 required files + 5 sentinel-file assertions OK") with the
extended sentinel list against the current installer script at
origin/main HEAD (commit 19d9617 from #5364).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Copilot AI review requested due to automatic review settings May 27, 2026 00:41
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@AceHack AceHack enabled auto-merge (squash) May 27, 2026 00:41
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Extends the source-level CI sentinel audit for the AI-cluster installer substrate to cover the iter-5.4.0/5.4.1 GitHub auth + self-registration flows, so text-level regressions (dropped commands / dropped YAML schema anchors) are caught quickly in CI.

Changes:

  • Added iter-5.4 sentinel substrings for zeta-install.sh covering gh auth, ssh-key retrieval, repo clone, and registration-branch shape.
  • Added schema/hardware-probe sentinels to catch regressions in ClusterNode YAML composition and MAC parsing.
  • Updated the sentinel rationale string to reflect the newly-audited substrate.

Comment thread tools/ci/audit-installer-substrate.ts Outdated
…o parenthesis closes on first line

Copilot finding on the audit-installer-substrate.ts iter-5.4 sentinel
addition: the comment 'iter-5.4.1 YAML schema sentinels (catches the
Copilot findings from #5352' opened a parenthesis on line 98 that
didn't close until line 100 ('block)'). To a code-reader scanning
line 98, the sentence reads as unfinished.

Fix: restructure as 'sentinels. Each catches a specific Copilot
finding on PR #5352: ...' — no multi-line parenthesis; each
schema-correction is a complete clause.
auto-merge was automatically disabled May 27, 2026 00:45

Pull Request is not mergeable

AceHack added a commit that referenced this pull request May 27, 2026
…low (asserts logical relationships between Bug 2a + 2b fix elements, ClusterNode YAML schema, iter-5.4.1 cascade gating) (#5367)

Layer 2a of the 4-layer CI testing approach for iter-5.4 substrate:

  Layer 1 (#5365)        — source-level sentinel audit (substring presence)
  Layer 2a (THIS PR)     — structural-behavioral test (logical relationships)
  Layer 2b (future PR)   — true mock-gh shim execution (refactor iter-5.4
                            into sourceable bash function; test against
                            mock gh on PATH with success/scope-error/empty
                            modes)
  Layer 3 (B-0833 App A) — mock GH device-code endpoint
  Layer 4 (B-0831)       — QEMU full-install + cluster auto-join

What this layer catches that Layer 1 doesn't:

1. `gh auth setup-git` is INSIDE the SUCCESS branch of
   `if gh auth login; then` (not just present somewhere in the script
   — placement matters; if setup-git ended up outside the success
   branch, it'd run on auth failure too).

2. setup-git is called BEFORE the ssh-key fetch (ordering matters —
   the git credential helper must be wired before any git push attempt).

3. SSH_KEY_ERR_FILE is wired AS the stderr redirect to `gh ssh-key list`
   (Bug 2b: if the file is created but not used as stderr, scope-error
   discrimination silently fails).

4. 3 distinct WARN paths exist (scope-error, empty-no-keys, pipe-broke)
   with their substrate-honest recovery messages (recovery commands for
   scope-error; settings/keys URL for empty-no-keys).

5. GH_AUTH_OK=1 is set EXACTLY ONCE — in the success branch of
   gh auth login (not in any failure or skip path).

6. iter-5.4.1 self-reg is gated on `GH_AUTH_OK = 1` (cascade-skip
   discipline; runs only if iter-5.4.0 succeeded).

7. iter-5.4.1 subshell uses `set +e` + the subshell wrapper closes
   with `|| true` (Copilot finding on #5352 — outer set -euo pipefail
   would propagate subshell failure out of the install).

8. ClusterNode YAML schema sentinels (catches the 3 Copilot findings
   on #5352 — spec.role was scalar instead of array; spec.maintainer
   was at flat path instead of nested under spec.registration;
   spec.storage was sibling of hardware instead of nested under it).

9. MAC parsing extracts the field AFTER `link/ether` (prior bug was
   `$(NF-2)` extracting `brd` instead of the MAC).

10. Self-reg branch name shape matches `register-<HOSTNAME>-<UTCTS>`
    (catches accidental rename that would break the cluster-side
    ArgoCD pattern watching register-* branches).

Test approach: parse zeta-install.sh as text; extract iter-5.4.0 and
iter-5.4.1 blocks by step-header boundaries; assert regex relationships
within each block. 23 tests, 35 expect() calls, ~150ms runtime.

Layer 2b deferred: requires refactoring iter-5.4.0 + iter-5.4.1 into a
sourceable bash function so we can mock `gh` on PATH and assert behavior
across the 4 modes (success/scope-error/empty/pipe-broke). That's a
bigger refactor — separate PR. Structural-behavioral catches the same
failure modes at much lower cost as the inner-loop test.

Composes with:

- PR #5364 (Bug 2a + 2b fixes — this layer asserts the fixes' STRUCTURE
  not just their presence)
- PR #5352 (Copilot YAML schema findings — this layer asserts the
  schema corrections held)
- PR #5365 (Layer 1 sentinels — composes; same workflow runs both)
- B-0831 (cascade #6 full-install QEMU — this is layer 2a)
- B-0833 (interactive-login vs baked-in-keys tension — layer 3 of cascade)

Wired into .github/workflows/build-ai-cluster-iso.yml as a fast
preflight (runs BEFORE the ~15-min Nix build; fails fast if iter-5.4
substrate has regressed structurally).

Verified locally:

  $ bun test tools/ci/test-iter-54-install-flow.test.ts
  bun test v1.3.13
   23 pass
   0 fail
   35 expect() calls

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Lior <lior@zeta.dev>
@AceHack AceHack merged commit 7fdf4db into main May 27, 2026
33 checks passed
@AceHack AceHack deleted the ci-layer1-iter54-sentinels-audit-installer-substrate-otto-cli-2026-05-26 branch May 27, 2026 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants