feat(B-0812 / B-0835 Bug 4): iter-5.4.1 self-registration commit+push (Step 6.9) — opens registration PR per node-install; fixes CORE REQUIREMENT failure by AceHack · Pull Request #5352 · Lucent-Financial-Group/Zeta

AceHack · 2026-05-26T23:26:40Z

Summary

Implements B-0812 (iter-5.4.1; B-0794 sub-target 3 full) per the operator's CORE REQUIREMENT from 2026-05-26 physical hardware-support test: "post-boot fully-operational chain without operator login."

Adds Step 6.9 to zeta-install.sh — conditional on GH_AUTH_OK=1 (composes additively with Step 6.8 iter-5.4.0; cascade-skips if gh-auth was skipped).

10-step Step 6.9 substrate

Resolve operator GH user (`gh api /user --jq .login`)
Resolve node hostname (`$HOSTNAME_DST` iter-5.2 substrate; flake-host fallback)
Hardware probe (CPU/memory/cores/GPU/storage/IP/MAC)
Compose ClusterNode YAML matching B-0794 sub-target 2 schema
Clone Zeta repo via `gh repo clone --depth 1`
Write to `maintainers//cluster-nodes//node.yaml`
Configure git user.{name,email} from gh-auth'd operator (commit-author = operator)
Commit + push to fresh branch `register--`
Open PR via `gh pr create`
Surface PR URL in install-complete banner

All 4 B-0812 sub-targets satisfied

Sub-target 1: hardware-probe shell function emits valid YAML
Sub-target 2: node.yaml conforms to provisional ClusterNode schema
Sub-target 3: commit+push opens a PR
Sub-target 4: install banner shows registration PR URL
Sub-target 5 (empirical end-to-end): deferred to operator's re-test cycle

Git-as-source-of-truth + CockroachDB architecture

Per operator 2026-05-26: "git for source of truth and coackroach can be repopulated from". This row writes the source-of-truth node.yaml to git; CockroachDB ingests from git when operational; Addison's hardware-inventory SQL queries run against CockroachDB which can be rebuilt from git anytime.

HARD LIMITS preserved

NO credentials baked on ISO (uses operator's gh-auth from iter-5.4.0)
NO secrets in commit (only hardware specs + operator identity)
Commit author = operator (clean attribution)
Branch is per-node (no main collision; mergeable independently)
Cleanup: tempdir removed at end of Step 6.9

Test plan

Bash syntax OK (`bash -n` passes)
Conditional on GH_AUTH_OK=1 (cascade-skip if gh-auth failed)
Graceful fallbacks at every probe + name-resolution step
Install-complete banner surfaces PR URL on success OR fallback path on skip
Empirical: requires operator's re-flash + re-test cycle

🤖 Generated with Claude Code

…ommit+push (Step 6.9) — opens registration PR per node-install; fixes CORE REQUIREMENT failure Implements B-0812 (iter-5.4.1; B-0794 sub-target 3 full) per the operator's CORE REQUIREMENT from 2026-05-26 physical hardware-support test: "post-boot fully-operational chain without operator login". Adds Step 6.9 to zeta-install.sh (after Step 6.8 iter-5.4.0 gh-auth, before nixos-install). Conditional on GH_AUTH_OK=1 (composes additively with iter-5.4.0; cascade-skips if gh-auth was skipped). Step 6.9 substrate: 1. Resolve operator GH user via `gh api /user --jq .login` 2. Resolve node hostname from $HOSTNAME_DST (iter-5.2 substrate) OR fallback to flake-host $HOST with WARN 3. Hardware probe (CPU model + memory + cores + GPU + storage devices + IP + MAC) — emits ClusterNode-schema-conformant YAML fragment per B-0812 Sub-target 1 4. Compose ClusterNode resource (apiVersion/kind/metadata/spec/hardware) matching B-0794 sub-target 2 provisional CRD schema 5. Clone Zeta repo to tempdir via `gh repo clone --depth 1` 6. Write to maintainers/<operator-gh-user>/cluster-nodes/<hostname>/node.yaml 7. Configure git user.{name,email} from gh-auth'd operator identity (commit-author = operator; clean attribution chain; no shipped credentials) 8. Commit + push to fresh branch `register-<hostname>-<UTC-timestamp>` 9. Open PR via `gh pr create` per B-0812 Sub-target 3 (default; safer than direct-to-main; operator reviews node-config before ArgoCD reconciles) 10. Surface PR URL in install-complete banner per Sub-target 4 ("phone-merge OK — no laptop kubectl required") Sub-targets satisfied: - [x] Sub-target 1: hardware-probe shell function emits valid YAML - [x] Sub-target 2: node.yaml conforms to provisional ClusterNode schema - [x] Sub-target 3: commit+push opens a PR - [x] Sub-target 4: install banner shows registration PR URL - [ ] Sub-target 5 (empirical end-to-end): requires next physical test with this code; deferred to operator's re-test cycle Composes_with substrate: - B-0794 parent (sub-target 3 minimum-viable → full implementation) - B-0789 cluster-as-PR-author (homelab-first instance using operator's gh-auth from iter-5.4.0; no per-node deploy key) - B-0790 zero-dev-machines (operator can review+merge from phone) - B-0782 cluster-IS-DIO (bridges "node booted" → "node IS git-native cluster substrate") - B-0813 iter-5.4.2 (ArgoCD reconciles maintainers/*/cluster-nodes/**; separate row; this row's commit+push triggers that reconciliation) - B-0835 Bug 4 (CRITICAL self-registration failure; this row IS the fix for that bug) Git-as-source-of-truth + CockroachDB-repopulates-from-git architecture (operator 2026-05-26): this row writes the source-of-truth node.yaml to git; CockroachDB ingests from git when operational; Addison's hardware-inventory SQL queries run against CockroachDB which can be rebuilt from git anytime. No HARD LIMITS violated: - NO credentials baked on ISO (uses operator's gh-auth from iter-5.4.0) - NO secrets in commit (only hardware specs + operator identity) - Commit author = operator (clean attribution) - Branch is per-node (no main collision; mergeable independently) - Cleanup: tempdir removed at end of Step 6.9 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-05-26T23:26:44Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copilot

Pull request overview

Adds installer-time node self-registration so a freshly installed cluster node can create a Git-backed registration PR after successful GitHub authentication.

Changes:

Adds Step 6.9 to probe hardware, compose node.yaml, commit/push a branch, and open a registration PR.
Adds install-complete banner output for the self-registration PR URL or fallback instructions.
Integrates the flow with the existing Step 6.8 gh auth login success path.

…+ buying-decisions substrate (no more buying willy nilly) (#5353) Per operator 2026-05-26 composed from two messages: 1. "git for source of truth and coackroach can be repopulated from" 2. "we will also have an inventory for every machine and know if some are missing registration when she is done with her hardware inventory work. and know what and how we need to expand so we are not buying willy nilly anymore." Files B-0836 as P1 substrate-engineering target. 4-phase decomposition: - Phase 1: Addison's CSV → DuckDB ingestion (immediate; doesn't need cluster) - Phase 2: tools/cluster/reconcile-inventory-vs-cluster.ts (this row core; surfaces 3 gap types — missing-registration / phantom-node / expansion-buying-decision) - Phase 3: CockroachDB ingestion when cluster operational (materialized view from git source of truth; repopulates anytime) - Phase 4: tools/cluster/buying-recommendations.ts (closes the loop; data-driven purchase decisions) Architecture: git (B-0812 cluster-nodes + Addison inventory) is source of truth; CockroachDB is materialized view; reconciliation diffs both sides; buying decisions informed by capacity-gap analysis. Composes_with: - B-0812 iter-5.4.1 (cluster-side data source; PR #5352 in flight) - B-0794 parent (full GitOps cluster bring-up) - B-0782 cluster-IS-DIO (git source of truth) - B-0789 cluster-as-PR-author - Addison's hardware-inventory paper-audit work (Phase 1 ingestion) - 2026-05-26 physical hardware-support test session Highest-value operator outcome: shifts hardware-purchase decisions from "guess what we need" to "data says we need N more of make/model X for workload Y." Materially affects operator cost-management. Co-authored-by: Lior <lior@zeta.dev> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

AceHack · 2026-05-26T23:32:37Z

All 5 Copilot findings addressed in follow-up PR #5355 (bundle-fix): subshell error handling (CRITICAL — would kill installer), MAC parsing, schema alignment (roles[]/registration.maintainer/hardware.storage per B-0813 + B-0817), comment-name redaction.

…ing + subshell error handling + comment-name redaction (#5355) 5 legitimate findings on PR #5352 (iter-5.4.1 self-registration), all real bugs that would block end-to-end self-registration: 1. **CRITICAL — subshell could kill installer** (line 806 of #5352): subshell inherited `set -euo pipefail`; ANY failure inside (git push permission denied, gh pr create scope missing, network drop) would propagate out + abort the installer BEFORE nixos-install runs. Step 6.9 is documented warning-only/skippable so it MUST never abort. FIX: subshell-local `set +e` + outer `|| true` defense-in-depth + explicit success/fail handling around git push + gh pr create with WARN-to-stderr on failure. 2. **MAC parsing wrong** (line 730 of #5352): `awk ... $(NF-2)` extracted `brd` not the MAC. `ip -o link` outputs `link/ether <MAC> brd <broadcast>`. FIX: `awk '{for(i=1;i<=NF;i++) if($i=="link/ether"){print $(i+1); exit}}'` parses the field AFTER `link/ether` correctly. 3. **Schema mismatch — spec.roles array** (line 749 of #5352): had `spec.role: $HOST` (scalar) but B-0813 ClusterNode CRD defines `spec.roles` as ARRAY. FIX: `spec.roles:\n - $HOST`. 4. **Schema mismatch — spec.registration.maintainer** (line 749 of #5352): had `spec.maintainer: $MAINTAINER` (top-level) but B-0817 places maintainer under `spec.registration.maintainer` (B-0813 CRD doesn't allow arbitrary spec fields; the reconciler reads `spec.registration.*` for maintenance metadata). FIX: nested under `spec.registration:` with timestamp + flake-commit + flake-host + registered-via siblings. Also added `metadata.labels` for the standard `zeta.lucent-financial-group.com/maintainer` label to support kubectl grouping. 5. **Schema mismatch — spec.hardware.storage** (line 761 of #5352): had `storage:` as sibling of `hardware:` but B-0813 places storage UNDER hardware block. FIX: indent storage 6 spaces (under hardware:) instead of 4 (sibling). Storage lines indented to 8 spaces accordingly. Same for network block (moved under hardware). 6. **Name attribution in comment** (line 691 of #5352): comment had "maintainers/aaron/cluster-nodes/" — direct maintainer name in current-state script. Per AGENT-BEST-PRACTICES no-name-attribution in .claude/rules/** + code/docs. FIX: replaced with placeholder "maintainers/<operator>/cluster-nodes/". No new substrate; all 5 fixes preserve existing structure + intent. Bash syntax OK. Co-authored-by: Lior <lior@zeta.dev> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…o parenthesis closes on first line Copilot finding on the audit-installer-substrate.ts iter-5.4 sentinel addition: the comment 'iter-5.4.1 YAML schema sentinels (catches the Copilot findings from #5352' opened a parenthesis on line 98 that didn't close until line 100 ('block)'). To a code-reader scanning line 98, the sentence reads as unfinished. Fix: restructure as 'sentinels. Each catches a specific Copilot finding on PR #5352: ...' — no multi-line parenthesis; each schema-correction is a complete clause.

…low (asserts logical relationships between Bug 2a + 2b fix elements, ClusterNode YAML schema, iter-5.4.1 cascade gating) (#5367) Layer 2a of the 4-layer CI testing approach for iter-5.4 substrate: Layer 1 (#5365) — source-level sentinel audit (substring presence) Layer 2a (THIS PR) — structural-behavioral test (logical relationships) Layer 2b (future PR) — true mock-gh shim execution (refactor iter-5.4 into sourceable bash function; test against mock gh on PATH with success/scope-error/empty modes) Layer 3 (B-0833 App A) — mock GH device-code endpoint Layer 4 (B-0831) — QEMU full-install + cluster auto-join What this layer catches that Layer 1 doesn't: 1. `gh auth setup-git` is INSIDE the SUCCESS branch of `if gh auth login; then` (not just present somewhere in the script — placement matters; if setup-git ended up outside the success branch, it'd run on auth failure too). 2. setup-git is called BEFORE the ssh-key fetch (ordering matters — the git credential helper must be wired before any git push attempt). 3. SSH_KEY_ERR_FILE is wired AS the stderr redirect to `gh ssh-key list` (Bug 2b: if the file is created but not used as stderr, scope-error discrimination silently fails). 4. 3 distinct WARN paths exist (scope-error, empty-no-keys, pipe-broke) with their substrate-honest recovery messages (recovery commands for scope-error; settings/keys URL for empty-no-keys). 5. GH_AUTH_OK=1 is set EXACTLY ONCE — in the success branch of gh auth login (not in any failure or skip path). 6. iter-5.4.1 self-reg is gated on `GH_AUTH_OK = 1` (cascade-skip discipline; runs only if iter-5.4.0 succeeded). 7. iter-5.4.1 subshell uses `set +e` + the subshell wrapper closes with `|| true` (Copilot finding on #5352 — outer set -euo pipefail would propagate subshell failure out of the install). 8. ClusterNode YAML schema sentinels (catches the 3 Copilot findings on #5352 — spec.role was scalar instead of array; spec.maintainer was at flat path instead of nested under spec.registration; spec.storage was sibling of hardware instead of nested under it). 9. MAC parsing extracts the field AFTER `link/ether` (prior bug was `$(NF-2)` extracting `brd` instead of the MAC). 10. Self-reg branch name shape matches `register-<HOSTNAME>-<UTCTS>` (catches accidental rename that would break the cluster-side ArgoCD pattern watching register-* branches). Test approach: parse zeta-install.sh as text; extract iter-5.4.0 and iter-5.4.1 blocks by step-header boundaries; assert regex relationships within each block. 23 tests, 35 expect() calls, ~150ms runtime. Layer 2b deferred: requires refactoring iter-5.4.0 + iter-5.4.1 into a sourceable bash function so we can mock `gh` on PATH and assert behavior across the 4 modes (success/scope-error/empty/pipe-broke). That's a bigger refactor — separate PR. Structural-behavioral catches the same failure modes at much lower cost as the inner-loop test. Composes with: - PR #5364 (Bug 2a + 2b fixes — this layer asserts the fixes' STRUCTURE not just their presence) - PR #5352 (Copilot YAML schema findings — this layer asserts the schema corrections held) - PR #5365 (Layer 1 sentinels — composes; same workflow runs both) - B-0831 (cascade #6 full-install QEMU — this is layer 2a) - B-0833 (interactive-login vs baked-in-keys tension — layer 3 of cascade) Wired into .github/workflows/build-ai-cluster-iso.yml as a fast preflight (runs BEFORE the ~15-min Nix build; fails fast if iter-5.4 substrate has regressed structurally). Verified locally: $ bun test tools/ci/test-iter-54-install-flow.test.ts bun test v1.3.13 23 pass 0 fail 35 expect() calls 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Lior <lior@zeta.dev>

…ntinels (#5365) * ci(B-0831 layer-1): extend audit-installer-substrate with iter-5.4 sentinels (gh auth setup-git, ssh-key stderr-capture, self-reg flow, ClusterNode YAML schema, MAC parsing) Layer 1 of a 4-layer CI testing approach for the iter-5.4 substrate (B-0812 self-registration + B-0813 cluster reconciliation + B-0835 bug fixes Bug 2a + 2b on #5364): Layer 1 (THIS PR) — source-level sentinel audit (cheap; catches regression) Layer 2 (next PR) — behavioral test with mock gh shim on PATH Layer 3 (B-0833 Approach A) — mock GH device-code endpoint Layer 4 (B-0831 cascade #6) — QEMU full-install + cluster auto-join This layer extends the existing REQUIRED_SENTINELS for full-ai-cluster/usb-nixos-installer/zeta-install.sh with 14 new substrings, organized into 3 groups: (a) iter-5.4 flow anchors (5 sentinels): - "Step 6.8: iter-5.4.0 homelab gh-auth + operator pubkey copy" - "Step 6.9: iter-5.4.1 self-registration commit+push" - "gh auth login" - "gh ssh-key list" - "gh repo clone Lucent-Financial-Group/Zeta" (b) Bug 2a + 2b fix-regression catches (3 sentinels): - "gh auth setup-git" — Bug 2a fix; presence catches removal - "SSH_KEY_ERR_FILE" — Bug 2b fix; presence catches stderr-capture removal - "admin:public_key" — Bug 2b fix; presence catches scope-recovery message removal (c) ClusterNode YAML schema sentinels (5 sentinels — catches the Copilot findings on #5352 where spec.role was scalar, spec.maintainer was at wrong path, spec.storage was a sibling instead of under hardware block): - "apiVersion: zeta.lucent-financial-group.com/v1" - "kind: ClusterNode" - " roles:" — spec.roles is ARRAY per B-0813 - " registration:" — spec.registration block per B-0813 - " hardware:" — spec.hardware block per B-0813 (d) Hardware-probe sentinels (catches MAC parsing regression from #5352): - "/proc/cpuinfo" — CPU_MODEL extraction - "link/ether" — MAC parses field after link/ether (not before) (e) Self-reg branch-shape sentinel: - "register-${NODE_HOSTNAME}-" — iter-5.4.1 branch name pattern Composes with: - PR #5364 (Bug 2a + 2b fixes that this audit will catch if regressed) - PR #5352 (iter-5.4.1 Copilot findings that this audit will catch) - PR #5354 (Bug 1 hostname symlink fix — already covered by existing sentinels) - B-0831 (cascade #6 full-install QEMU test; this is layer 1 of that work) - B-0833 (interactive-login vs baked-in-keys tension; layer 3 of the cascade) Why source-level + cheap-first: - Workflow build-ai-cluster-iso.yml runs `bun tools/ci/audit-installer-substrate.ts` on every PR touching the installer surface - Source-level catches substrate-regression at PR-author-time (seconds) - vs Layer 4 QEMU full-install (~minutes; expensive; flaky) - Layer 1 is the inner loop; Layers 2-4 are the outer loops Per `.claude/rules/verify-existing-substrate-before-authoring.md`: substrate-inventory pass found `tools/ci/audit-installer-substrate.ts` already has the REQUIRED_SENTINELS pattern for iter-4.2 + iter-5.1 + iter-5.2 + iter-5.2.2; this PR extends with iter-5.4 sentinels rather than minting parallel substrate. Verified: `bun tools/ci/audit-installer-substrate.ts` exits 0 ("PASS — 10 required files + 5 sentinel-file assertions OK") with the extended sentinel list against the current installer script at origin/main HEAD (commit 19d9617 from #5364). 🤖 Generated with [Claude Code](https://claude.com/claude-code) * fix(#5365 Copilot): reflow iter-5.4.1 YAML schema sentinels comment so parenthesis closes on first line Copilot finding on the audit-installer-substrate.ts iter-5.4 sentinel addition: the comment 'iter-5.4.1 YAML schema sentinels (catches the Copilot findings from #5352' opened a parenthesis on line 98 that didn't close until line 100 ('block)'). To a code-reader scanning line 98, the sentence reads as unfinished. Fix: restructure as 'sentinels. Each catches a specific Copilot finding on PR #5352: ...' — no multi-line parenthesis; each schema-correction is a complete clause. --------- Co-authored-by: Lior <lior@zeta.dev>

Copilot AI review requested due to automatic review settings May 26, 2026 23:26

AceHack enabled auto-merge (squash) May 26, 2026 23:26

Copilot started reviewing on behalf of AceHack May 26, 2026 23:26 View session

AceHack mentioned this pull request May 26, 2026

docs(backlog): B-0836 — hardware-inventory-vs-cluster reconciliation + buying-decisions substrate (no more buying willy nilly) #5353

Merged

3 tasks

AceHack merged commit 2a0e5c0 into main May 26, 2026
30 checks passed

AceHack deleted the otto/b-0812-iter-5-4-1-self-registration-commit-push-step-6-9-2026-05-26 branch May 26, 2026 23:29

Copilot AI reviewed May 26, 2026

View reviewed changes

AceHack mentioned this pull request May 26, 2026

fix(postmerge-5352): Copilot 5 findings — schema (roles/registration.maintainer/hardware.storage) + MAC parsing + subshell error handling + comment-name redaction #5355

Merged

6 tasks

This was referenced May 27, 2026

ci(B-0831 layer-1): extend audit-installer-substrate with iter-5.4 sentinels #5365

Merged

ci(B-0831 layer-2a): structural-behavioral test of iter-5.4 install flow #5367

Merged

Copilot AI mentioned this pull request May 27, 2026

docs(archive): Batch archive of 20 PRs #5610

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(B-0812 / B-0835 Bug 4): iter-5.4.1 self-registration commit+push (Step 6.9) — opens registration PR per node-install; fixes CORE REQUIREMENT failure#5352

feat(B-0812 / B-0835 Bug 4): iter-5.4.1 self-registration commit+push (Step 6.9) — opens registration PR per node-install; fixes CORE REQUIREMENT failure#5352
AceHack merged 1 commit into
mainfrom
otto/b-0812-iter-5-4-1-self-registration-commit-push-step-6-9-2026-05-26

AceHack commented May 26, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AceHack commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AceHack commented May 26, 2026

Summary

10-step Step 6.9 substrate

All 4 B-0812 sub-targets satisfied

Git-as-source-of-truth + CockroachDB architecture

HARD LIMITS preserved

Test plan

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AceHack commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants