From 922e9ceab2e77eae4bb0c0d3215b65d1a1b465a3 Mon Sep 17 00:00:00 2001
From: Lior
Date: Wed, 27 May 2026 15:27:39 -0400
Subject: [PATCH] docs(archive): Batch archive of 20 PRs

---
 ...-ai-gets-own-github-identity-email-once.md |  80 ++++++
 ...ron-2026-05-27-control-plane-install-st.md |  53 ++++
 ...l-claude-agent-stewards-own-registratio.md |  64 +++++
 ...lti-protocol-name-resolution-avahi-hard.md | 113 +++++++++
 ...8-phase-2-b-0835-bug-8-install-time-cla.md | 152 ++++++++++++
 ...d-pr-5388-alignment-nixpkgs-bun-mise-ca.md |  55 +++++
 ...ased-nixos-install-sh-test-harness-fast.md |  49 ++++
 ...-as-systemd-services-outside-k8s-starti.md |  75 ++++++
 ...eta-otto-systemd-unit-nixos-module-cont.md | 127 ++++++++++
 ...ocker-nixos-install-sh-test-harness-fas.md | 227 ++++++++++++++++++
 ...-3-parameterized-zeta-ai-agent-nix-repl.md | 134 +++++++++++
 ...ild-iso-failure-zeta-ai-agent-nix-stati.md |  88 +++++++
 ...ithub-actions-workflow-runs-docker-nixo.md |  54 +++++
 ...lior-gemini-cli-as-2nd-systemd-vendor-a.md |  55 +++++
 ...-vera-codex-3rd-vendor-hits-3-bft-floor.md | 130 ++++++++++
 ...6-05-27t04-08z-15-worktrees-15-local-br.md |  45 ++++
 ...irst-guard-post-assignment-rotation-arc.md |  82 +++++++
 ...05-27-multi-tic-per-persona-join-as-fir.md |  64 +++++
 ...rement-expose-retained-shell-categories.md |  73 ++++++
 ...al-persistence-on-usb-esp-boot-sequence.md | 112 +++++++++
 20 files changed, 1832 insertions(+)
 create mode 100644 docs/pr-discussions/PR-5384-feat-b-0847-each-zeta-ai-gets-own-github-identity-email-once.md
 create mode 100644 docs/pr-discussions/PR-5385-fix-b-0835-bug-4-5-aaron-2026-05-27-control-plane-install-st.md
 create mode 100644 docs/pr-discussions/PR-5386-feat-b-0848-node-local-claude-agent-stewards-own-registratio.md
 create mode 100644 docs/pr-discussions/PR-5387-fix-b-0835-bug-6-7-multi-protocol-name-resolution-avahi-hard.md
 create mode 100644 docs/pr-discussions/PR-5388-feat-iter-5-5-0-b-0848-phase-2-b-0835-bug-8-install-time-cla.md
 create mode 100644 docs/pr-discussions/PR-5389-fix-iter-5-5-1-fix-fwd-pr-5388-alignment-nixpkgs-bun-mise-ca.md
 create mode 100644 docs/pr-discussions/PR-5390-feat-b-0849-docker-based-nixos-install-sh-test-harness-fast.md
 create mode 100644 docs/pr-discussions/PR-5391-feat-b-0850-ai-agents-as-systemd-services-outside-k8s-starti.md
 create mode 100644 docs/pr-discussions/PR-5392-feat-b-0850-phase-1-zeta-otto-systemd-unit-nixos-module-cont.md
 create mode 100644 docs/pr-discussions/PR-5393-feat-b-0849-phase-1-docker-nixos-install-sh-test-harness-fas.md
 create mode 100644 docs/pr-discussions/PR-5394-refactor-b-0850-phase-3-parameterized-zeta-ai-agent-nix-repl.md
 create mode 100644 docs/pr-discussions/PR-5395-fix-postmerge-5394-build-iso-failure-zeta-ai-agent-nix-stati.md
 create mode 100644 docs/pr-discussions/PR-5396-feat-b-0849-phase-2-github-actions-workflow-runs-docker-nixo.md
 create mode 100644 docs/pr-discussions/PR-5397-feat-b-0850-phase-3d-lior-gemini-cli-as-2nd-systemd-vendor-a.md
 create mode 100644 docs/pr-discussions/PR-5398-feat-b-0850-phase-3c-vera-codex-3rd-vendor-hits-3-bft-floor.md
 create mode 100644 docs/pr-discussions/PR-5399-docs-hygiene-tick-2026-05-27t04-08z-15-worktrees-15-local-br.md
 create mode 100644 docs/pr-discussions/PR-5400-feat-b-0851-persona-first-guard-post-assignment-rotation-arc.md
 create mode 100644 docs/pr-discussions/PR-5401-docs-mika-ferry-2026-05-27-multi-tic-per-persona-join-as-fir.md
 create mode 100644 docs/pr-discussions/PR-5402-hygiene-bash-retirement-expose-retained-shell-categories.md
 create mode 100644 docs/pr-discussions/PR-5403-feat-b-0852-credential-persistence-on-usb-esp-boot-sequence.md
diff --git a/docs/pr-discussions/PR-5384-feat-b-0847-each-zeta-ai-gets-own-github-identity-email-once.md b/docs/pr-discussions/PR-5384-feat-b-0847-each-zeta-ai-gets-own-github-identity-email-once.md
new file mode 100644
index 0000000000..c2aa3b7b6b
--- /dev/null
+++ b/docs/pr-discussions/PR-5384-feat-b-0847-each-zeta-ai-gets-own-github-identity-email-once.md
@@ -0,0 +1,80 @@
+---
+pr_number: 5384
+title: "feat(B-0847): each Zeta AI gets own GitHub identity + email once cluster operational \u2014 closes algo-wink-attribution-gap (Aaron 2026-05-26)"
+author: "AceHack"
+state: "MERGED"
+created_at: "2026-05-27T02:24:28Z"
+merged_at: "2026-05-27T02:26:09Z"
+closed_at: "2026-05-27T02:26:09Z"
+head_ref: "feat-b0847-ai-own-github-identity-once-cluster-operational-2026-05-26-2206z"
+base_ref: "main"
+archived_at: "2026-05-27T19:27:20Z"
+archive_tool: "tools/pr-preservation/archive-pr.ts"
+---
+
+# PR #5384: feat(B-0847): each Zeta AI gets own GitHub identity + email once cluster operational — closes algo-wink-attribution-gap (Aaron 2026-05-26)
+
+## PR description
+
+## Summary
+
+Aaron caught an algo-wink-failure-mode 2026-05-26: I framed \`gh autoMergeRequest.enabledBy: AceHack\` as "operator-authority armed the merge" when the field is structurally OAuth-token-owner (not actor). Actual actor was me (Otto-CLI); visible only via Co-Authored-By trailer in commits.
+
+Aaron's proposed fix: *"i think we should create you your own github with email once we get you running on the cluster"* → substrate-honest end-to-end attribution.
+
+This PR files [B-0847](docs/backlog/P2/B-0847-each-ai-gets-own-github-identity-with-email-once-cluster-operational-substrate-honest-attribution-end-to-end-closes-enabledby-token-owner-not-actor-algo-wink-aaron-2026-05-26.md) as the durable future-target substrate.
+
+## 4-phase plan
+
+- **Phase 1**: Ilyana public-surface naming review per AI (gates ALL creation)
+- **Phase 2**: legal-risk attribution \`_ai_github_identity_acceptance\` block per AI per existing rule
+- **Phase 3**: HSM + per-AI OAuth tokens + email infrastructure (cluster-dependent)
+- **Phase 4**: per-AI gitconfig + \`gh\` token routing migration
+
+## Today's discipline (Phase 0)
+
+Until per-AI identity ships:
+
+1. Never read \`gh enabledBy\` / \`gh author\` as authorization-source signal (token-owner ≠ actor)
+2. Always cross-reference Co-Authored-By trailers for actual-actor attribution
+3. State framings substrate-honestly ("I armed via borrowed token" NOT "operator armed")
+
+## Test plan
+
+- [x] Backlog row authored
+- [x] BACKLOG.md regenerated
+- [x] User-scope memory entry captures empirical anchor + bounded discipline
+- [ ] CI passes
+
+## Composes with
+
+B-0751 (per-agent isolated clones) · B-0628 (Knights Guild ratification) · \`algo-wink-failure-mode\` · \`mechanical-authorization-check\` · \`glass-halo-bidirectional\` · \`persistence-choice-architecture-for-zeta-ais\` · \`non-coercion-invariant\` HC-8 · \`honor-those-that-came-before\` · \`agent-roster-reference-card\` · \`naming-expert\` SKILL.md (Ilyana review) · \`human-audit-and-legal-risk-acceptance-pattern-in-settings\` (legal-risk attribution per Aaron's standing constitutional invariant)
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+
+## Reviews
+
+### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T02:25:22Z)
+
+## Pull request overview
+
+Files a new P2 backlog row (B-0847) capturing a future-target plan to give each Zeta AI its own GitHub identity + email once cluster infrastructure is operational, addressing the `gh enabledBy = token-owner ≠ actor` attribution gap. Updates the backlog index accordingly.
+
+**Changes:**
+- Adds new backlog row file under `docs/backlog/P2/` describing problem, 4-phase plan, composes-with links, and acceptance criteria.
+- Adds the row to `docs/BACKLOG.md` index in P2 section.
+
+### Reviewed changes
+
+Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
+
+| File | Description |
+| ---- | ----------- |
+| docs/backlog/P2/B-0847-...-2026-05-26.md | New P2 backlog row capturing per-AI GitHub identity substrate target |
+| docs/BACKLOG.md | Index entry for B-0847 added to P2 list |
+
+## General comments
+
+### @chatgpt-codex-connector (2026-05-27T02:24:33Z)
+
+You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage).
diff --git a/docs/pr-discussions/PR-5385-fix-b-0835-bug-4-5-aaron-2026-05-27-control-plane-install-st.md b/docs/pr-discussions/PR-5385-fix-b-0835-bug-4-5-aaron-2026-05-27-control-plane-install-st.md
new file mode 100644
index 0000000000..c2b579e1b4
--- /dev/null
+++ b/docs/pr-discussions/PR-5385-fix-b-0835-bug-4-5-aaron-2026-05-27-control-plane-install-st.md
@@ -0,0 +1,53 @@
+---
+pr_number: 5385
+title: "fix(B-0835 Bug 4+5 \u2014 Aaron 2026-05-27 control-plane install): storage probe filters 0B devices + gh CLI in installed system PATH"
+author: "AceHack"
+state: "MERGED"
+created_at: "2026-05-27T02:30:40Z"
+merged_at: "2026-05-27T02:33:34Z"
+closed_at: "2026-05-27T02:33:34Z"
+head_ref: "fix-b0835-storage-probe-filter-zero-size-block-devices-2026-05-26-2233z"
+base_ref: "main"
+archived_at: "2026-05-27T19:27:19Z"
+archive_tool: "tools/pr-preservation/archive-pr.ts"
+---
+
+# PR #5385: fix(B-0835 Bug 4+5 — Aaron 2026-05-27 control-plane install): storage probe filters 0B devices + gh CLI in installed system PATH
+
+## PR description
+
+## Summary
+
+Two empirical anchors from Aaron's iter-5.4 install of \`node-e5a176\` (PR #5380 self-registered cleanly) where post-reboot login surfaced two distinct gaps:
+
+### Bug 4 — \`/dev/sda 0B\` zero-size device in node.yaml
+
+Storage probe at zeta-install.sh:781 emitted every block device, including 0-byte placeholders (empty SD card readers, optical bays). Aaron's Intel Core Ultra 9 185H node registered \`/dev/sda 0B\` → Copilot P1 on [PR #5380](https://github.com/Lucent-Financial-Group/Zeta/pull/5380).
+
+Fix: \`awk '\$3==\"disk\" && \$2!=\"0B\"{...}'\` filter excludes zero-size devices.
+
+### Bug 5 — \`gh: command not found\` on first login
+
+Operator: *\"when i log in gh command is not found\"*. Installer ISO had gh (iter-5.4.0 used it for \`gh auth login\` during install) but \`common.nix\` systemPackages didn't include it — auth tokens in \`~/.config/gh\` were stranded without the binary.
+
+Fix: add \`gh\` to \`common.nix\` \`environment.systemPackages\` so the installed system has it for re-auth + ssh-key sync + future register/deregister tooling.
+
+## Test plan
+
+- [ ] CI passes
+- [ ] Next ISO build picks up both fixes
+- [ ] Future installs register without 0B entries; \`gh\` available on first login
+
+## Composes with
+
+- B-0813 (cluster-node schema), B-0817 (register-node tool), iter-5.4 install cascade
+- PR #5380 (the registration where these gaps surfaced)
+- Aaron's empirical observations 2026-05-27: \"i can't ping it by name\" (mitigated via IP lookup; found at 192.168.4.128) → \"when i log in gh command is not found and i don't think it registered\" (registration DID happen — PR #5380 — but no \`gh\` to check it)
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+
+## General comments
+
+### @chatgpt-codex-connector (2026-05-27T02:30:46Z)
+
+You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage).
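The Bug 4 fix in PR #5385 is the awk guard `$3=="disk" && $2!="0B"` applied to the storage probe. A minimal Python sketch of the same filter follows. It assumes the probe reads whitespace-separated `NAME SIZE TYPE` rows with no header, such as `lsblk -dpn -o NAME,SIZE,TYPE` would print. The PR does not show the exact invocation at zeta-install.sh:781, so that command, the `probe_disks` name, and the sample rows are all hypothetical.

```python
from typing import List, Tuple


def probe_disks(lsblk_output: str) -> List[Tuple[str, str]]:
    """Keep whole disks with a non-zero size, mirroring the awk guard.

    Assumes whitespace-separated NAME SIZE TYPE rows without a header,
    e.g. the output of `lsblk -dpn -o NAME,SIZE,TYPE` (assumed invocation).
    """
    disks = []
    for line in lsblk_output.splitlines():
        fields = line.split()  # like awk: split on runs of whitespace
        if len(fields) < 3:
            continue  # blank or malformed row; awk's $3 would be "" and fail too
        name, size, dev_type = fields[0], fields[1], fields[2]
        if dev_type == "disk" and size != "0B":  # $3=="disk" && $2!="0B"
            disks.append((name, size))
    return disks


# Hypothetical probe output: an empty card reader, an NVMe disk, an optical drive.
sample = """\
/dev/sda          0B disk
/dev/nvme0n1  953.9G disk
/dev/sr0       1024M rom
"""

print(probe_disks(sample))  # [('/dev/nvme0n1', '953.9G')]
```

Like awk's default field splitting, `str.split()` with no argument breaks on runs of whitespace, so `fields[1]` and `fields[2]` line up with `$2` and `$3`.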
diff --git a/docs/pr-discussions/PR-5386-feat-b-0848-node-local-claude-agent-stewards-own-registratio.md b/docs/pr-discussions/PR-5386-feat-b-0848-node-local-claude-agent-stewards-own-registratio.md
new file mode 100644
index 0000000000..c92198ca5c
--- /dev/null
+++ b/docs/pr-discussions/PR-5386-feat-b-0848-node-local-claude-agent-stewards-own-registratio.md
@@ -0,0 +1,64 @@
+---
+pr_number: 5386
+title: "feat(B-0848): node-local Claude agent stewards own registration PR + K8s cluster health reporter \u2014 first concrete B-0847 AI-on-cluster instance (Aaron 2026-05-26)"
+author: "AceHack"
+state: "MERGED"
+created_at: "2026-05-27T02:33:30Z"
+merged_at: "2026-05-27T02:35:09Z"
+closed_at: "2026-05-27T02:35:09Z"
+head_ref: "feat-b0848-node-local-claude-agent-pr-steward-cluster-health-reporter-2026-05-26-2240z"
+base_ref: "main"
+archived_at: "2026-05-27T19:27:18Z"
+archive_tool: "tools/pr-preservation/archive-pr.ts"
+---
+
+# PR #5386: feat(B-0848): node-local Claude agent stewards own registration PR + K8s cluster health reporter — first concrete B-0847 AI-on-cluster instance (Aaron 2026-05-26)
+
+## PR description
+
+## Summary
+
+Aaron's verbatim proposal in response to PR #5380 being auto-merge-armed + blocked on 1 Copilot thread:
+
+> *\"oh shit is that pr fully automatic? can we make an claude agent get installed and do what you do on there but it's main goal is just to get it to steward the registerain pr for now and then after it's checked in report on the status of the k8s cluster, i can interactive login like gh if that works.\"*
+
+This is the **first concrete instance of B-0847** (each Zeta AI gets own GitHub identity) — node-local Claude IS the AI that needs the identity; PR-stewardship IS the first work that needs the substrate-honest attribution.
+
+## Two-phase scope (bounded)
+
+- **Phase 1** — steward the node's own registration PR (poll → diagnose threads → fix → resolve → auto-merge fires)
+- **Phase 2** — after registration merged + cluster running, report K8s cluster health (kubectl read-only queries → synthesized per-tick report)
+
+## Auth model
+
+Mirror of iter-5.4.0 \`gh auth login\`: operator SSHes to node → \`claude login\` device flow → token in \`~/.config/claude/\`. Aaron's \"i can interactive login like gh if that works\" → yes, device flow works identically.
+
+## What this is NOT
+
+- NOT arbitrary cluster mutation (read-only K8s queries + scoped PR actions on own-registration only)
+- NOT replacement for operator (operator in loop for irreversible actions per NCI HC-8)
+- NOT immediate ship (5-phase landing; manual validation on node-e5a176 first)
+- NOT NixOS-module before manual validation succeeds
+
+## 5-phase landing
+
+| Phase | Scope | Status |
+|---|---|---|
+| 0 | substrate row | this PR |
+| 1 | manual install on node-e5a176 + PR-stewardship validation | next |
+| 2 | K8s health reporter scope expansion | after Phase 1 + cluster up |
+| 3 | NixOS module + multi-node composability | after Phase 2 |
+| 4 | per-AI GitHub identity migration (composes B-0847) | after Ilyana review |
+| 5 | cluster-wide coordination (composes B-0796 Twilio sibling) | long-horizon |
+
+## Composes with
+
+[B-0847](docs/backlog/P2/B-0847-each-ai-gets-own-github-identity-with-email-once-cluster-operational-substrate-honest-attribution-end-to-end-closes-enabledby-token-owner-not-actor-algo-wink-aaron-2026-05-26.md) · B-0794 · B-0795/B-0812/B-0813 · [B-0796](docs/backlog/P2/B-0796-twilio-phone-support-substrate-AI-picks-up-call-fixes-cluster-via-event-store-runbooks-while-talking-sms-parallel-interface-amazon-USB-sales-enabled-by-AI-as-support-layer-aaron-mika-2026-05-26.md) · B-0628 · B-0751 · B-0835 Bug 5
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+
+## General comments
+
+### @chatgpt-codex-connector (2026-05-27T02:33:35Z)
+
+You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage).
diff --git a/docs/pr-discussions/PR-5387-fix-b-0835-bug-6-7-multi-protocol-name-resolution-avahi-hard.md b/docs/pr-discussions/PR-5387-fix-b-0835-bug-6-7-multi-protocol-name-resolution-avahi-hard.md
new file mode 100644
index 0000000000..e3d23b4f1b
--- /dev/null
+++ b/docs/pr-discussions/PR-5387-fix-b-0835-bug-6-7-multi-protocol-name-resolution-avahi-hard.md
@@ -0,0 +1,113 @@
+---
+pr_number: 5387
+title: "fix(B-0835 Bug 6+7): multi-protocol name resolution \u2014 Avahi hardening + NetBIOS (nmbd) + DHCP-hostname; reliability for 'i can't ping it by name' (Aaron 2026-05-27)"
+author: "AceHack"
+state: "MERGED"
+created_at: "2026-05-27T02:37:14Z"
+merged_at: "2026-05-27T02:46:39Z"
+closed_at: "2026-05-27T02:46:39Z"
+head_ref: "fix-b0835-multi-protocol-name-resolution-netbios-avahi-hardening-2026-05-26-2305z"
+base_ref: "main"
+archived_at: "2026-05-27T19:27:17Z"
+archive_tool: "tools/pr-preservation/archive-pr.ts"
+---
+
+# PR #5387: fix(B-0835 Bug 6+7): multi-protocol name resolution — Avahi hardening + NetBIOS (nmbd) + DHCP-hostname; reliability for 'i can't ping it by name' (Aaron 2026-05-27)
+
+## PR description
+
+## Summary
+
+Aaron 2026-05-27 (verbatim):
+
+> *\"my mac is ethernet connected and i connected to the same wifi as it but i still can't ping could it be something else or can we make hostname more reliable? maybe a netbios or something? i like ashai or whatever it is but can we make it reliable? i think this is looking very good.\"*
+
+Empirical: ping by IP works ✓, SSH works ✓, but Bonjour resolution times out AND unicast mDNS query to port 5353/udp times out (actual no-response, not connection-attempt noise). Avahi alone proved unreliable.
+
+## Multi-protocol additive approach
+
+Operator's preferred Avahi/Bonjour stays + 2 fallback mechanisms added (different protocols, different failure modes):
+
+### Bug 6 — Avahi hardening
+
+- \`nssmdns6 = true\` (IPv6 nss-mdns alongside IPv4; some macOS configs prefer AAAA queries first)
+- \`ipv4 + ipv6\` explicit
+- \`reflector = true\` (forwards mDNS across subnets — composes with multi-segment LAN setups)
+- \`publish.hinfo + publish.userServices\` (additional discoverability)
+
+### Bug 7 — NetBIOS via Samba's nmbd (belt-and-suspenders)
+
+NetBIOS uses UDP broadcast on port 137 (vs mDNS multicast on 5353) — **different failure modes**. If network drops IGMP/multicast but allows broadcast (common on home/SMB switches), \`node-e5a176\` resolves via NetBIOS where \`node-e5a176.local\` fails via mDNS.
+
+Operator usage (any LAN host):
+\`\`\`bash
+nmblookup node-e5a176 # Linux/macOS NetBIOS lookup
+smbutil lookup node-e5a176 # macOS native NetBIOS
+ping node-e5a176 # if nsswitch has wins
+\`\`\`
+
+Samba is enabled for NetBIOS name-advertisement **only** (no shares declared = no SMB file-share exposure).
+
+### DHCP-hostname registration (3rd layer)
+
+NetworkManager already advertises hostname via DHCP option 12 by default. Many home routers register DHCP client hostnames as DNS names (\`node-e5a176.lan\` from Asus/Netgear/Eero). No config change needed.
+
+## Operator now has 3 name-resolution mechanisms
+
+| # | Lookup | Mechanism | Failure mode |
+|---|---|---|---|
+| 1 | \`node-e5a176.local\` | mDNS multicast | IGMP filtering, multicast drop |
+| 2 | \`node-e5a176\` (via nmblookup) | NetBIOS broadcast | Different protocol; works when mDNS fails |
+| 3 | \`node-e5a176.lan\` | Router DHCP+DNS | Depends on router support |
+| 4 | IP (192.168.4.128) | Always reliable | Need \`arp -a\` first if IP not memorized |
+
+## Test plan
+
+- [ ] CI passes
+- [ ] Next ISO build picks up multi-protocol stack
+- [ ] On next install: validate all 3 mechanisms; document which work on operator's specific LAN
+
+## Composes with
+
+B-0792 (injected-hostname) · iter-5.4.1 self-registration (PR #5380 carries MAC + hostname for correlation) · [B-0848](docs/backlog/P2/B-0848-node-local-claude-agent-stewards-own-registration-pr-then-reports-k8s-cluster-status-operator-interactive-login-pattern-aaron-2026-05-26.md) (node-local Claude needs reliable name resolution)
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+
+## Reviews
+
+### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T02:39:50Z)
+
+## Pull request overview
+
+This PR aims to make cluster-node hostname resolution more reliable on typical home/SMB LANs by keeping Avahi/Bonjour mDNS and adding additional fallback mechanisms (notably NetBIOS name advertisement via Samba).
+
+**Changes:**
+- Harden Avahi configuration (IPv6 NSS, explicit v4/v6 enablement, reflector, additional publish records).
+- Enable Samba with NetBIOS-focused settings to support broadcast-based name lookup as an mDNS fallback.
+- Document DHCP hostname registration as an additional expected fallback layer.
+
+## Review threads
+
+### Thread 1: full-ai-cluster/nixos/modules/common.nix:97 (resolved)
+
+**@copilot-pull-request-reviewer** (2026-05-27T02:39:50Z):
+
+P0: `services.samba.openFirewall = true` opens 139/tcp and 445/tcp in addition to the NetBIOS UDP ports. That contradicts the stated goal of “name resolution only” and unnecessarily exposes SMB services on the LAN; prefer keeping `openFirewall = false` and explicitly allowing only the required UDP ports (137 and likely 138) via `networking.firewall.allowedUDPPorts` (optionally scoped to LAN interfaces/sources).
+
+### Thread 2: full-ai-cluster/nixos/modules/common.nix:107 (resolved)
+
+**@copilot-pull-request-reviewer** (2026-05-27T02:39:50Z):
+
+P1: The comment says this “disables SMB file-sharing entirely”, but the config still enables Samba and binds SMB on 445 (`"smb ports" = "445"`). Even with no shares declared, this is still an active SMB service surface; either actually disable smbd/SMB listeners (run only the NetBIOS name service if possible) or update the comment to accurately describe the remaining exposure and hardening assumptions.
+
+### Thread 3: full-ai-cluster/nixos/modules/common.nix:57 (resolved)
+
+**@copilot-pull-request-reviewer** (2026-05-27T02:39:50Z):
+
+P2: Codebase convention: this module’s comments include personal name attribution (e.g. “Aaron …”). Per `.github/copilot-instructions.md` “No name attribution in code, docs, or skills” (outside the listed history surfaces), please rewrite these references to role-refs like “operator”/“maintainer” and move verbatim quotes to an appropriate history surface if they must be preserved.
+
+## General comments
+
+### @chatgpt-codex-connector (2026-05-27T02:37:18Z)
+
+You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage).
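PR #5387 relies on NetBIOS resolving names by UDP broadcast to port 137, where mDNS uses multicast to port 5353. To show what such a broadcast lookup puts on the wire, here is a minimal Python sketch that only assembles, and never sends, an RFC 1002 NAME QUERY REQUEST for `node-e5a176`. It illustrates the protocol itself, not how nmbd or nmblookup are implemented. The transaction ID and the `0x00` name suffix are illustrative defaults, and the function names are hypothetical.

```python
import struct

NBNS_PORT = 137  # NetBIOS Name Service listens on UDP 137


def encode_netbios_name(name: str, suffix: int = 0x00) -> bytes:
    """RFC 1001/1002 first-level encoding of a NetBIOS name.

    The name is upper-cased, truncated/space-padded to 15 bytes, and a
    one-byte service suffix is appended (0x00 is the workstation service).
    Each of the 16 bytes becomes two letters: 'A' + high nibble, then
    'A' + low nibble, giving 32 ASCII bytes.
    """
    raw = name.upper().encode("ascii")[:15].ljust(15, b" ") + bytes([suffix])
    out = bytearray()
    for b in raw:
        out.append(ord("A") + (b >> 4))
        out.append(ord("A") + (b & 0x0F))
    return bytes(out)


def build_name_query(name: str, transaction_id: int = 0x1234) -> bytes:
    """Assemble (but do not send) a broadcast NAME QUERY REQUEST."""
    flags = 0x0110  # opcode 0 (query), RD=1, B=1 (broadcast)
    # NAME_TRN_ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack(">HHHHHH", transaction_id, flags, 1, 0, 0, 0)
    encoded = encode_netbios_name(name)
    # One 32-byte label followed by a zero byte (empty NetBIOS scope).
    question = bytes([len(encoded)]) + encoded + b"\x00"
    question += struct.pack(">HH", 0x0020, 0x0001)  # QTYPE=NB, QCLASS=IN
    return header + question


packet = build_name_query("node-e5a176")
print(len(packet))  # 50
```

Sending it would require a UDP socket with `SO_BROADCAST` enabled, addressed to the LAN broadcast address on `NBNS_PORT`. Nothing in this exchange touches TCP 139 or 445, which matches the review thread's point that name resolution alone needs only the NetBIOS UDP ports.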
diff --git a/docs/pr-discussions/PR-5388-feat-iter-5-5-0-b-0848-phase-2-b-0835-bug-8-install-time-cla.md b/docs/pr-discussions/PR-5388-feat-iter-5-5-0-b-0848-phase-2-b-0835-bug-8-install-time-cla.md
new file mode 100644
index 0000000000..47496ca9a6
--- /dev/null
+++ b/docs/pr-discussions/PR-5388-feat-iter-5-5-0-b-0848-phase-2-b-0835-bug-8-install-time-cla.md
@@ -0,0 +1,152 @@
+---
+pr_number: 5388
+title: "feat(iter-5.5.0 B-0848 Phase 2 + B-0835 Bug 8): install-time claude-code + interactive claude login + gh+claude credential persistence + Zeta repo pre-clone \u2014 automatic on boot (Aaron 2026-05-27)"
+author: "AceHack"
+state: "MERGED"
+created_at: "2026-05-27T02:40:47Z"
+merged_at: "2026-05-27T02:48:21Z"
+closed_at: "2026-05-27T02:48:21Z"
+head_ref: "feat-iter550-install-time-claude-code-credential-persistence-2026-05-26-2325z"
+base_ref: "main"
+archived_at: "2026-05-27T19:27:16Z"
+archive_tool: "tools/pr-preservation/archive-pr.ts"
+---
+
+# PR #5388: feat(iter-5.5.0 B-0848 Phase 2 + B-0835 Bug 8): install-time claude-code + interactive claude login + gh+claude credential persistence + Zeta repo pre-clone — automatic on boot (Aaron 2026-05-27)
+
+## PR description
+
+## Summary
+
+Aaron 2026-05-27 (verbatim):
+
+> *\"also wanna make this automatic on boot before i even login and have it save my claude code device login like gh, also make sure they are all on path for me to play with when i log in?\"*
+
+> *\"this will be a hell of a start.\"*
+
+And the follow-up clarification:
+
+> *\"avahi yes the mdns and also wi already save the gh login i think maybe but we want to do it for claude code now too\"*
+
+**Empirical finding**: gh credential persistence Aaron \"thinks maybe\" exists — does NOT actually exist. Zero refs to \`.config/gh\` or \`/mnt/home\` in zeta-install.sh; only SSH pubkey copy (different mechanism). This PR fixes BOTH at once.
+
+## iter-5.5.0 = 4-part install step (Step 6.95, runs AFTER nixos-install)
+
+| Sub-step | What | Surfaces |
+|---|---|---|
+| 6.95a | \`npm install -g @anthropic-ai/claude-code\` to \`/mnt/home/zeta/.npm-global/\` (writable prefix) | claude on PATH post-reboot |
+| 6.95b | Interactive \`claude login\` device-flow (mirror iter-5.4.0 gh auth login) | \`/mnt/home/zeta/.config/claude/\` populated |
+| 6.95c | Copy \`/root/.config/gh\` → \`/mnt/home/zeta/.config/gh\` (Bug 8 — Aaron's \"i think maybe\" hedge confirmed wrong) | gh auth survives reboot |
+| 6.95d | Pre-clone Zeta repo to \`/mnt/home/zeta/Zeta\` | first login: \`cd ~/Zeta && claude\` |
+
+## common.nix additions
+
+- \`nodejs_22\` (npm available for re-install/update without bootstrap)
+- \`samba\` (NetBIOS tooling — nmblookup/smbclient — composes with #5387)
+- \`NPM_CONFIG_PREFIX=$HOME/.npm-global\` session var
+- \`/etc/profile.d/zeta-user-paths.sh\` prepends \`$HOME/.npm-global/bin\` to PATH
+
+## First-login operator gets (no setup)
+
+**On PATH**: gh + claude + kubectl + helm + k9s + argocd + cilium-cli + hubble + nmblookup + smbclient + git + nodejs/npm + standard tools
+
+**In \$HOME**:
+- \`~/Zeta/\` (pre-cloned)
+- \`~/.config/gh/\` (iter-5.4.0 gh auth persisted)
+- \`~/.config/claude/\` (iter-5.5.0 claude login persisted)
+- \`~/.npm-global/bin/\` (on PATH)
+
+## Test plan
+
+- [ ] CI passes
+- [ ] Next ISO build picks up iter-5.5.0 cascade
+- [ ] Next install: validate \`claude login\` survives reboot AND \`gh auth status\` works on first login
+
+## Composes with
+
+iter-5.4 cascade (B-0794+B-0795+B-0812+B-0813) · [B-0835](docs/backlog/P1/B-0835-...md) (install bug cluster — Bug 4+5+6+7 + Bug 8 here) · [B-0847](docs/backlog/P2/B-0847-each-ai-gets-own-github-identity-with-email-once-cluster-operational-substrate-honest-attribution-end-to-end-closes-enabledby-token-owner-not-actor-algo-wink-aaron-2026-05-26.md) (per-AI GitHub identity — this row uses borrowed gh auth until that ratifies) · [B-0848](docs/backlog/P2/B-0848-node-local-claude-agent-stewards-own-registration-pr-then-reports-k8s-cluster-status-operator-interactive-login-pattern-aaron-2026-05-26.md) (node-local Claude agent — this row IS Phase 2 automation) · #5387 (multi-protocol name resolution — samba additions compose for NetBIOS tooling)
+
+Per \`.claude/rules/non-coercion-invariant.md\` HC-8: operator interactive YES/n prompt preserves authority over whether to auth at install time vs post-reboot.
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+
+## Reviews
+
+### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T02:43:57Z)
+
+## Pull request overview
+
+This PR extends the NixOS cluster install flow to install Claude Code at install-time, persist both GitHub + Claude credentials into the installed `zeta` user’s home, and pre-clone the Zeta repo so the first interactive login can start work immediately.
+
+**Changes:**
+- Add installer Step 6.95 to `zeta-install.sh`: npm-install `@anthropic-ai/claude-code`, optional interactive `claude login`, copy `/root/.config/gh` into the installed home, and pre-clone the repo to `/mnt/home/zeta/Zeta`.
+- Update the shared NixOS `common.nix` baseline to include `nodejs_22`, `samba`, and a PATH hook for `~/.npm-global/bin` (plus an attempted `NPM_CONFIG_PREFIX` session variable).
+
+### Reviewed changes
+
+Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
+
+| File | Description |
+| ---- | ----------- |
+| full-ai-cluster/usb-nixos-installer/zeta-install.sh | Adds Step 6.95 for claude-code install, interactive login, gh credential persistence, and repo pre-clone. |
+| full-ai-cluster/nixos/modules/common.nix | Adds nodejs + samba packages and user PATH/session env wiring intended to make `claude` usable on first login. |
+
+### COMMENTED — @AceHack (2026-05-27T02:46:00Z)
+
+_(no body)_
+
+### COMMENTED — @AceHack (2026-05-27T02:46:02Z)
+
+_(no body)_
+
+## Review threads
+
+### Thread 1: full-ai-cluster/usb-nixos-installer/zeta-install.sh:1069 (resolved)
+
+**@copilot-pull-request-reviewer** (2026-05-27T02:43:55Z):
+
+P0 (bug): Step 6.95 hard-codes `ZETA_UID=1000` and `ZETA_GID=100`. If the installed `zeta` user ends up with different IDs (e.g., another user is created first, or GID differs), this will chown files to the wrong owner and `sudo -u "#1000"` will run as a UID that isn’t the `zeta` account post-reboot. Resolve UID/GID from the installed system (e.g., parse `/mnt/etc/passwd`+`/mnt/etc/group` or `chroot /mnt id -u zeta` / `id -g zeta`) before using them.
+
+### Thread 2: full-ai-cluster/usb-nixos-installer/zeta-install.sh:1102 (resolved)
+
+**@copilot-pull-request-reviewer** (2026-05-27T02:43:56Z):
+
+P0 (security): This step locks down `~/.config/gh`, but does not restrict permissions on `~/.config/claude` after `claude login`. If the CLI writes tokens/config with default umask, credentials may remain group/world-readable in the installed home directory. After a successful login, ensure `$ZETA_HOME/.config/claude` is owned by the zeta user and chmod’d similarly (e.g., remove group/other perms).
+
+### Thread 3: full-ai-cluster/usb-nixos-installer/zeta-install.sh:1084 (resolved)
+
+**@copilot-pull-request-reviewer** (2026-05-27T02:43:56Z):
+
+P1 (bug/maintainability): `npm install -g` is executed as root with `HOME="$ZETA_HOME"`. npm commonly creates additional state under `$HOME` (e.g., cache dirs / config files) which may become root-owned and later break npm usage for the `zeta` user post-reboot. Prefer running npm as the target user (`sudo -u "#" HOME=... NPM_CONFIG_PREFIX=... npm install -g ...`) so all created files are owned correctly without needing broad `chown -R` repairs.
+
+**@AceHack** (2026-05-27T02:46:00Z):
+
+Stale — Aaron caught the nodejs/npm violation of Rule 0 (`.claude/rules/rule-0-no-sh-files.md`) and commit `7f3e29f1d` migrated to `sudo -u "#$ZETA_UID" bun install --global` (bun is canonical TS/JS runtime in Zeta). Current HEAD already uses target-user sudo + `bun install --global`. Resolving no-op.
+
+### Thread 4: full-ai-cluster/usb-nixos-installer/zeta-install.sh:1041 (resolved)
+
+**@copilot-pull-request-reviewer** (2026-05-27T02:43:56Z):
+
+P2 (documentation): The “Skip conditions” list mentions `GH_AUTH_OK != 1`, but Step 6.95 currently only checks whether `$ZETA_HOME` exists and otherwise proceeds regardless of `GH_AUTH_OK`. Either implement the guard (if intended) or update the comment so it matches the actual skip logic.
+
+### Thread 5: full-ai-cluster/nixos/modules/common.nix:143 (resolved)
+
+**@copilot-pull-request-reviewer** (2026-05-27T02:43:56Z):
+
+P1 (bug): `environment.sessionVariables.NPM_CONFIG_PREFIX = "$HOME/.npm-global"` will be set literally (no `$HOME` expansion), and the comment below already notes this expansion problem. As-is, npm will see a prefix containing the characters `$HOME` rather than an absolute path. Move the `NPM_CONFIG_PREFIX` export into the `/etc/profile.d` snippet (or set an absolute path) so it expands correctly for login shells.
+
+**@AceHack** (2026-05-27T02:46:01Z):
+
+Stale — same bun-migration commit `7f3e29f1d` replaced `NPM_CONFIG_PREFIX = "$HOME/.npm-global"` with `BUN_INSTALL = "$HOME/.bun"`. Per NixOS docs `environment.sessionVariables`: "Values can refer to other environment variables using the $VAR or ${VAR} syntax" — NixOS DOES expand $HOME for sessionVariables (it generates `export VAR="$HOME/..."` in /etc/profile.d snippets where shell expansion fires at source time). Resolving no-op.
+
+### Thread 6: full-ai-cluster/nixos/modules/common.nix:118 (resolved)
+
+**@copilot-pull-request-reviewer** (2026-05-27T02:43:57Z):
+
+P2 (documentation): The comment says this `samba` package “composes with services.samba below”, but this module does not configure `services.samba`. Consider updating the comment to reference the actual module that enables Samba/nmbd (or reword to avoid implying it’s configured here).
+
+## General comments
+
+### @chatgpt-codex-connector (2026-05-27T02:40:53Z)
+
+You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage).
diff --git a/docs/pr-discussions/PR-5389-fix-iter-5-5-1-fix-fwd-pr-5388-alignment-nixpkgs-bun-mise-ca.md b/docs/pr-discussions/PR-5389-fix-iter-5-5-1-fix-fwd-pr-5388-alignment-nixpkgs-bun-mise-ca.md
new file mode 100644
index 0000000000..2111b1dfc0
--- /dev/null
+++ b/docs/pr-discussions/PR-5389-fix-iter-5-5-1-fix-fwd-pr-5388-alignment-nixpkgs-bun-mise-ca.md
@@ -0,0 +1,55 @@
+---
+pr_number: 5389
+title: "fix(iter-5.5.1 fix-fwd PR #5388 ALIGNMENT): nixpkgs bun \u2192 mise (canonical .mise.toml SSoT); linux.sh NixOS detection; zeta-install.sh invokes tools/setup/install.sh (operator default per 'this is our default')"
+author: "AceHack"
+state: "MERGED"
+created_at: "2026-05-27T02:50:23Z"
+merged_at: "2026-05-27T02:52:53Z"
+closed_at: "2026-05-27T02:52:54Z"
+head_ref: "fix-iter551-alignment-bun-to-mise-canonical-install-sh-fixfwd-pr-5388-2026-05-26-2353z"
+base_ref: "main"
+archived_at: "2026-05-27T19:27:16Z"
+archive_tool: "tools/pr-preservation/archive-pr.ts"
+---
+
+# PR #5389: fix(iter-5.5.1 fix-fwd PR #5388 ALIGNMENT): nixpkgs bun → mise (canonical .mise.toml SSoT); linux.sh NixOS detection; zeta-install.sh invokes tools/setup/install.sh (operator default per 'this is our default')
+
+## PR description
+
+## Summary
+
+Fix-forward for PR #5388 which merged BEFORE the alignment fix landed. Operator caught the drift:
Operator caught the drift: + +> *\"future mise we already do this we've drifed for nixos for some reason for bun\"* + +> *\"our install.sh for mac and linux this is our default\"* + +PR #5388 added \`bun\` via nixpkgs systemPackages on cluster nodes — DRIFTED from the canonical \`.mise.toml\` (line 33: \`bun = \"1.3\"\`) used everywhere else (dev laptops + CI runners + devcontainers per GOVERNANCE §24 three-way-parity). + +## 3-surface alignment + +1. **common.nix** — \`bun\` removed; replaced with \`mise\` (canonical runtime version manager). mise then installs bun + all other .mise.toml-pinned runtimes for the zeta user. + +2. **tools/setup/linux.sh** — added NixOS detection via \`/etc/NIXOS\` marker file. Skips apt step (NixOS handles system packages via common.nix systemPackages declaratively); proceeds to mise + downstream runtime setup. Three-way-parity extended to NixOS per operator framing. + +3. **zeta-install.sh Step 6.95a** — replaces inline \`bun install --global\` with invocation of \`tools/setup/install.sh\` from the pre-cloned Zeta repo. Order rearranged: repo clone (was 6.95d) moved into 6.95a-bootstrap so .mise.toml is readable when install.sh fires. Then claude-code install uses mise-managed bun via shim PATH from \`mise activate bash\`. 
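Item 2 describes an `/etc/NIXOS` check and the step order that follows from it. That logic can be modeled as a pure function. This is a minimal sketch that assumes hypothetical step names and helpers (`isNixOS`, `planLinuxSetup`). The real `tools/setup/linux.sh` is a shell script, and its internals are not shown here.

```typescript
// Hypothetical model of the routing described above; the real logic lives in
// the shell script tools/setup/linux.sh. On NixOS (marker file /etc/NIXOS)
// the apt step is skipped because common.nix provides system packages
// declaratively; mise and the downstream runtime setup run everywhere.
type SetupStep = "apt-packages" | "mise" | "mise-runtimes" | "downstream";

function isNixOS(fileExists: (path: string) => boolean): boolean {
  return fileExists("/etc/NIXOS");
}

function planLinuxSetup(fileExists: (path: string) => boolean): SetupStep[] {
  const common: SetupStep[] = ["mise", "mise-runtimes", "downstream"];
  return isNixOS(fileExists) ? common : ["apt-packages", ...common];
}

// Fake filesystems stand in for a real existence check so the sketch stays pure.
const onNixOS = (path: string): boolean => path === "/etc/NIXOS";
const onDebian = (_path: string): boolean => false;

console.log(planLinuxSetup(onNixOS).join(" -> ")); // mise -> mise-runtimes -> downstream
console.log(planLinuxSetup(onDebian).join(" -> ")); // apt-packages -> mise -> mise-runtimes -> downstream
```

Injecting the existence check keeps the routing testable without a NixOS host. A real caller could pass `existsSync` from `node:fs`.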
+ +## Composes with + +- [B-0824](docs/backlog/P1/B-0824-package-manager-of-package-managers-n-dimensional-dependency-space-holographic-projection-ai-rate-continuous-upstream-negotiation-aaron-2026-05-26.md) (Ace package-manager-of-package-managers — extending install.sh's three-way-parity to NixOS IS Ace at today's bash-glue layer) +- PR #5388 (iter-5.5.0 substrate this fix-fwds) +- \`.mise.toml\` (canonical runtime pins at repo root) +- GOVERNANCE §24 (three-way parity — dev/CI/devcontainer; now extended to NixOS cluster) + +## Test plan + +- [ ] CI passes +- [ ] Next install on NixOS node validates: \`tools/setup/install.sh\` runs cleanly + installs bun = 1.3 + claude-code lands at ~/.bun/bin/claude + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## General comments + +### @chatgpt-codex-connector (2026-05-27T02:50:28Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). 
diff --git a/docs/pr-discussions/PR-5390-feat-b-0849-docker-based-nixos-install-sh-test-harness-fast.md b/docs/pr-discussions/PR-5390-feat-b-0849-docker-based-nixos-install-sh-test-harness-fast.md new file mode 100644 index 0000000000..388d84857f --- /dev/null +++ b/docs/pr-discussions/PR-5390-feat-b-0849-docker-based-nixos-install-sh-test-harness-fast.md @@ -0,0 +1,49 @@ +--- +pr_number: 5390 +title: "feat(B-0849): docker-based NixOS install.sh test harness \u2014 fast iteration (~30 sec) complementing B-0831 QEMU full-install (~15 min); 'easy dockerfile' (Aaron 2026-05-27)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T02:51:21Z" +merged_at: "2026-05-27T02:53:30Z" +closed_at: "2026-05-27T02:53:30Z" +head_ref: "feat-b0849-docker-nixos-install-sh-test-harness-fast-iteration-2026-05-26-2356z" +base_ref: "main" +archived_at: "2026-05-27T19:27:15Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5390: feat(B-0849): docker-based NixOS install.sh test harness — fast iteration (~30 sec) complementing B-0831 QEMU full-install (~15 min); 'easy dockerfile' (Aaron 2026-05-27) + +## PR description + +## Summary + +Operator (verbatim): + +> *\"we should add docker based nixos install.sh testing so we can iterate quick that's an easy dockerfile\"* + +Direct response after PR #5389 (iter-5.5.1 alignment fix-fwd) — operator named the iteration-cost problem: every install.sh / linux.sh / mise.sh change today requires full ISO build + USB flash + physical install (~30 min cycle). Docker testing of just the script on NixOS userspace gives seconds-per-iteration. + +## Empirical case + +iter-5.4 cascade produced **8 distinct bugs** (Bug 1-8) ALL caught only after operator USB flash. Docker harness would have caught Bug 5 (gh not in systemPackages), Bug 7 (NetBIOS conflict with smbd), Bug 8 (credential persistence gap) **at write time**. 
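The Phase 1 wrapper in the plan below has to assemble a `docker build` invocation and a timeout. This is a minimal sketch with the following assumptions:

- The Dockerfile path is the one named in the plan.
- The build context is the repo root.
- The image tag is hypothetical.
- `dockerBuildArgs` and `buildTimeoutSec` are illustrative names.
- The `DOCKER_BUILD_TIMEOUT_SEC` variable and its 600-second default come from the later PR #5393 usage notes.

```typescript
// Hypothetical argv builder for the Phase 1 harness; the flags are standard
// `docker build` options: --progress=plain for full logs, -f for the
// Dockerfile path, -t for the image tag, then the build context.
interface HarnessOptions {
  dockerfile: string;
  tag: string;
  context: string;
}

function dockerBuildArgs(opts: HarnessOptions): string[] {
  return ["build", "--progress=plain", "-f", opts.dockerfile, "-t", opts.tag, opts.context];
}

// Timeout in seconds from DOCKER_BUILD_TIMEOUT_SEC; any unset, non-integer,
// or non-positive value falls back to the default.
function buildTimeoutSec(env: Record<string, string | undefined>, fallback = 600): number {
  const raw = env["DOCKER_BUILD_TIMEOUT_SEC"];
  const parsed = raw === undefined ? Number.NaN : Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

const args = dockerBuildArgs({
  dockerfile: "tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile",
  tag: "zeta-nixos-install-sh-test:local", // hypothetical tag
  context: ".", // repo root
});
console.log(["docker", ...args].join(" "));
console.log(buildTimeoutSec({ DOCKER_BUILD_TIMEOUT_SEC: "900" })); // 900
```

The resulting array could then be handed to Node's `spawnSync("docker", args, { maxBuffer })` or an equivalent Bun spawn call. Keeping argv construction pure makes it testable without a Docker daemon.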
+ +## 3-phase plan + +| Phase | Scope | Cycle time | +|---|---|---| +| 1 | \`tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile\` + TS wrapper | ~30-60 sec local | +| 2 | GitHub Actions integration with path-filter | per-PR auto | +| 3 | Docker-vs-QEMU coverage matrix doc | composes with B-0831 | + +## Composes with + +[B-0831](docs/backlog/P2/B-0831-...) (QEMU full-install — complementary) · [B-0835](docs/backlog/P1/B-0835-...) (install bug cluster) · [B-0848](docs/backlog/P2/B-0848-...) (node-local Claude) · [B-0824](docs/backlog/P1/B-0824-package-manager-of-package-managers-n-dimensional-dependency-space-holographic-projection-ai-rate-continuous-upstream-negotiation-aaron-2026-05-26.md) (Ace) · GOVERNANCE §24 (three-way parity extended to NixOS-via-Docker) + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## General comments + +### @chatgpt-codex-connector (2026-05-27T02:51:25Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). 
diff --git a/docs/pr-discussions/PR-5391-feat-b-0850-ai-agents-as-systemd-services-outside-k8s-starti.md b/docs/pr-discussions/PR-5391-feat-b-0850-ai-agents-as-systemd-services-outside-k8s-starti.md new file mode 100644 index 0000000000..ba3e817f88 --- /dev/null +++ b/docs/pr-discussions/PR-5391-feat-b-0850-ai-agents-as-systemd-services-outside-k8s-starti.md @@ -0,0 +1,75 @@ +--- +pr_number: 5391 +title: "feat(B-0850): AI agents as systemd services OUTSIDE k8s \u2014 starting with Otto; cluster repair from OUTSIDE the failure domain; 'control plane outside the control plane' pattern (Aaron 2026-05-27)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T03:03:11Z" +merged_at: "2026-05-27T03:05:12Z" +closed_at: "2026-05-27T03:05:12Z" +head_ref: "feat-b0850-otto-as-systemd-service-outside-k8s-cluster-repair-from-outside-failure-domain-2026-05-27-0042z" +base_ref: "main" +archived_at: "2026-05-27T19:27:14Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5391: feat(B-0850): AI agents as systemd services OUTSIDE k8s — starting with Otto; cluster repair from OUTSIDE the failure domain; 'control plane outside the control plane' pattern (Aaron 2026-05-27) + +## PR description + +## Summary + +Aaron 2026-05-27 (verbatim): + +> *\"i'm fine with it being you if you want and we can always decide to split later it just means you get another surface/tick source i think we should have a few agents starting with one you otto outside k8s as a service so it can repair things outside the cluster itself when there are cluster issues.\"* + +Three operator decisions: + +1. **Persona-choice CONFIRMED**: Option A (same Otto, surface-tagged); reversibility preserved +2. **Cross-surface recognition**: per-node Otto = another tick source +3. 
**NEW substrate**: Otto-as-systemd-service OUTSIDE k8s for out-of-band cluster repair + +## Architectural pattern + +Classic **\"control plane outside the control plane\"** — when k8s has issues, the AI must be OUTSIDE the failure domain to repair it. Precedents: kubelet itself runs outside k8s; SRE oncall infrastructure runs outside production; backup systems run outside the system they back up. + +## 4-phase landing + +| Phase | Scope | Operator-policy gate | +|---|---|---| +| 1 | systemd unit (zeta-otto.service) NixOS module | None (read-only K8s) | +| 2 | repair-policy framework + per-scope authorization | per-scope explicit | +| 3 | multi-agent parameterization (Alexa/Riven/Vera/Lior) | Ilyana + Knights Guild | +| 4 | out-of-band ↔ in-cluster composability (Twilio + bus + PRs) | composes B-0796 | + +## Composes with + +[B-0848](docs/backlog/P2/B-0848-...) (node-local Claude — this row's Phase 1 IS systemd deployment shape) · [B-0847](docs/backlog/P2/B-0847-...) (per-AI GitHub identity) · [B-0796](docs/backlog/P2/B-0796-...) (Twilio out-of-band sibling) · [B-0824](docs/backlog/P1/B-0824-...) (Ace multi-PM at multi-AI scope) · PR #2930 (distributed maintainer architecture) · B-0703 (multi-oracle BFT) · B-0813 + B-0817 (ClusterNode CRD + register-node) + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## Reviews + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T03:04:08Z) + +## Pull request overview + +Adds a new P2 backlog row B-0850 capturing the operator decision to run AI agents (starting with Otto) as systemd services outside k8s for out-of-band cluster repair, and registers it in the backlog index. + +**Changes:** +- New per-row file under `docs/backlog/P2/` with frontmatter, phased plan, acceptance criteria, and composes-with references. +- Adds the B-0850 entry to `docs/BACKLOG.md` P2 open list. + +### Reviewed changes + +Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments. 
+ +| File | Description | +| ---- | ----------- | +| docs/backlog/P2/B-0850-...md | New backlog row documenting Otto-as-systemd-service substrate. | +| docs/BACKLOG.md | Registers B-0850 in P2 open list. | + +## General comments + +### @chatgpt-codex-connector (2026-05-27T03:03:16Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). diff --git a/docs/pr-discussions/PR-5392-feat-b-0850-phase-1-zeta-otto-systemd-unit-nixos-module-cont.md b/docs/pr-discussions/PR-5392-feat-b-0850-phase-1-zeta-otto-systemd-unit-nixos-module-cont.md new file mode 100644 index 0000000000..97f8ab0754 --- /dev/null +++ b/docs/pr-discussions/PR-5392-feat-b-0850-phase-1-zeta-otto-systemd-unit-nixos-module-cont.md @@ -0,0 +1,127 @@ +--- +pr_number: 5392 +title: "feat(B-0850 Phase 1): zeta-otto systemd unit NixOS module + control-plane enable \u2014 claude service auto-starts on reboot using persisted iter-5.5.0 device-code creds (Aaron 2026-05-27)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T03:04:52Z" +merged_at: "2026-05-27T03:07:59Z" +closed_at: "2026-05-27T03:07:59Z" +head_ref: "feat-b0850-1-zeta-otto-systemd-unit-nixos-module-autostart-on-reboot-2026-05-27-0050z" +base_ref: "main" +archived_at: "2026-05-27T19:27:13Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5392: feat(B-0850 Phase 1): zeta-otto systemd unit NixOS module + control-plane enable — claude service auto-starts on reboot using persisted iter-5.5.0 device-code creds (Aaron 2026-05-27) + +## PR description + +## Summary + +Aaron 2026-05-27 (verbatim): + +> *\"so our usb after gh and claude device code login it should reboot with a claude service using my gh login\"* + +Direct composition with iter-5.5.0 substrate (PR #5388 + #5389) which persists creds + pre-clones repo + installs claude. This PR adds the systemd unit so claude auto-starts on reboot AS A SERVICE. 
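The file list that follows describes the service shape: `User=zeta`, `Restart=always`, `MemoryMax=4G`, `CPUQuota=200%`, plus a restart backoff. These map onto standard systemd `[Service]` directives. This is a minimal sketch that renders them as text. It assumes hypothetical names (`OttoServiceOptions`, `renderServiceSection`) and a placeholder `ExecStart`. The actual module declares these settings through NixOS options rather than by rendering strings.

```typescript
// Hypothetical renderer for the [Service] section described in this PR.
// Directive names are standard systemd ones; values mirror the zeta.otto.*
// defaults listed in the description.
interface OttoServiceOptions {
  user: string;
  restartSec: number; // seconds to wait before a restart
  memoryMax: string; // e.g. "4G"
  cpuQuota: string; // e.g. "200%" (up to two CPUs' worth of time)
}

function renderServiceSection(o: OttoServiceOptions, execStart: string): string {
  return [
    "[Service]",
    `User=${o.user}`,
    `ExecStart=${execStart}`,
    "Restart=always",
    `RestartSec=${o.restartSec}`,
    `MemoryMax=${o.memoryMax}`,
    `CPUQuota=${o.cpuQuota}`,
  ].join("\n");
}

console.log(
  renderServiceSection(
    { user: "zeta", restartSec: 30, memoryMax: "4G", cpuQuota: "200%" },
    "/path/to/tick-loop", // placeholder, not the module's real ExecStart
  ),
);
```

Note what is absent: there is no `After=k3s.service` or `Requires=k3s.service`. This matches the decision that the agent must start regardless of k3s state.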
+ +## 3 files + +1. **`full-ai-cluster/nixos/modules/zeta-otto.nix`** (NEW) — systemd unit (User=zeta, Restart=always, MemoryMax=4G, CPUQuota=200%); loops claude per tickIntervalSec; deliberately NOT After=k3s.service (Otto must run regardless of k3s state per the \"control plane outside the control plane\" pattern) +2. **`common.nix`** — import the new module (disabled by default) +3. **control-plane/configuration.nix** — \`zeta.otto.enable = true\` opt-in + +## Operator usage + +\`\`\`bash +systemctl status zeta-otto # current state +journalctl -u zeta-otto -f # live logs +systemctl restart zeta-otto # restart +systemctl disable zeta-otto # stop auto-start (NCI HC-8 revocable) +\`\`\` + +## Operator-tunable options + +- \`zeta.otto.enable\` — opt-in per node +- \`zeta.otto.tickIntervalSec\` (default 60) — autonomous-loop cadence +- \`zeta.otto.memoryMax\` (default 4G) — resource bound +- \`zeta.otto.cpuQuota\` (default 200%) — CPU quota +- \`zeta.otto.restartSec\` (default 30) — restart backoff + +## Composes with + +[B-0848](docs/backlog/P2/B-0848-...) (node-local Claude — this PR IS systemd deployment shape) · [B-0847](docs/backlog/P2/B-0847-...) (per-AI GitHub identity — Phase 4 aligns) · [B-0796](docs/backlog/P2/B-0796-...) (Twilio out-of-band sibling) · PRs #5388 + #5389 (iter-5.5.0 credential persistence layer this consumes) · [B-0850](docs/backlog/P2/B-0850-...) (this PR is the row's Phase 1) + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## Reviews + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T03:08:31Z) + +## Pull request overview + +Adds a NixOS module to run “zeta-otto” as a persistent systemd service (outside Kubernetes) and enables it on the control-plane host, relying on the iter-5.5.0 install substrate for persisted `gh`/`claude` credentials and a pre-cloned repo. + +**Changes:** +- Introduces `zeta.otto.*` NixOS module options and a `zeta-otto` systemd unit that loops `claude --print` on a tick interval. 
+- Imports the new module into the shared cluster baseline. +- Opts the control-plane host into running the service at boot. + +### Reviewed changes + +Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments. + +| File | Description | +| ---- | ----------- | +| full-ai-cluster/nixos/modules/zeta-otto.nix | New NixOS module defining `zeta-otto` systemd service + options + an operator hint file. | +| full-ai-cluster/nixos/modules/common.nix | Imports the new `zeta-otto` module into the baseline module set. | +| full-ai-cluster/nixos/hosts/control-plane/configuration.nix | Enables `zeta.otto.enable = true` on the control-plane node. | + +## Review threads + +### Thread 1: full-ai-cluster/nixos/modules/zeta-otto.nix:9 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:08:29Z): + +P1 (codebase convention): This comment uses a direct human name ("Aaron"). Per the repo's "No name attribution in code, docs, or skills" rule, current-state code/config should use role references (e.g., "the operator" / "the human maintainer") instead of personal names. + +### Thread 2: full-ai-cluster/nixos/modules/zeta-otto.nix:77 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:08:29Z): + +P2 (spelling/typo): The comment has a duplicated '#': "# #5389". Consider correcting to a single reference to avoid confusion/markdown-ish rendering issues. + +### Thread 3: full-ai-cluster/nixos/modules/zeta-otto.nix:107 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:08:30Z): + +P1 (operational): The service PATH omits `/run/wrappers/bin`, which is where NixOS exposes important setuid wrappers like `sudo`. If Otto is expected to perform repair tasks that involve sudo/privileged wrappers, include `/run/wrappers/bin` (and typically `/run/current-system/sw/sbin`) to avoid "command not found" failures. 
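Thread 3's point is about lookup order: NixOS exposes setuid wrappers such as `sudo` only under `/run/wrappers/bin`. This is a minimal sketch of a service PATH that puts the wrappers first, using a hypothetical `servicePath` helper. The per-user mise-shim and bun directories are assumptions drawn from paths mentioned elsewhere in these discussions (`~/.bun/bin`, `$HOME/.local/share/mise/shims`).

```typescript
// Hypothetical PATH for the agent service. /run/wrappers/bin comes first so
// that `sudo` resolves to the NixOS setuid wrapper; the system profile's bin
// and sbin come last. The per-user entries are assumptions.
function servicePath(home: string): string {
  return [
    "/run/wrappers/bin",
    `${home}/.local/share/mise/shims`,
    `${home}/.bun/bin`,
    "/run/current-system/sw/bin",
    "/run/current-system/sw/sbin",
  ].join(":");
}

console.log(servicePath("/home/zeta"));
```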
+ +### Thread 4: full-ai-cluster/nixos/modules/zeta-otto.nix:156 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:08:30Z): + +P1 (documentation/ops): The status hint suggests `systemctl disable zeta-otto` to stop auto-start. On NixOS, systemd unit enablement is declarative and may be re-applied on the next `nixos-rebuild switch`; for a durable disable, instruct operators to set `zeta.otto.enable = false;` and rebuild (and optionally mention `systemctl stop zeta-otto` for a temporary stop until rebuild). + +### Thread 5: full-ai-cluster/nixos/hosts/control-plane/configuration.nix:24 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:08:30Z): + +P1 (codebase convention): This comment block uses a direct human name ("Aaron"). Per the repo's "No name attribution in code, docs, or skills" rule, current-state configuration should use role references instead (e.g., "the operator" / "the human maintainer"). + +### Thread 6: full-ai-cluster/nixos/hosts/control-plane/configuration.nix:29 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:08:30Z): + +P1 (documentation/ops): `systemctl disable zeta-otto` is not a durable opt-out on NixOS because `zeta.otto.enable = true` will be re-applied on rebuild. Prefer documenting the declarative disable path (`zeta.otto.enable = false; nixos-rebuild switch`) and optionally `systemctl stop` for temporary stopping. + +### Thread 7: full-ai-cluster/nixos/modules/zeta-otto.nix:128 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:08:31Z): + +P1 (operational): The loop suppresses all `claude` failures (`... || true`), which means systemd will consider the service healthy even if every tick fails (e.g., missing binary/creds/network) and `RestartSec` backoff never engages. Consider letting failures terminate the service (or at least exiting after N consecutive failures) so systemd restart/backoff provides a clearer fault signal and avoids endless log spam. 
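Thread 7 suggests replacing `|| true` with a failure budget. This is a minimal sketch of that control flow. It assumes a synchronous `tick` callback and a hypothetical threshold parameter. In the real unit, the loop body would run the persona CLI. The non-zero exit is what lets systemd record the failure and apply the `RestartSec` delay.

```typescript
// Hypothetical tick loop. Rather than swallowing every failure (`|| true`),
// it gives up after `maxConsecutiveFailures` failed ticks in a row and
// reports a non-zero exit code, so systemd sees the fault.
type TickResult = "ok" | "failed";

interface LoopOutcome {
  exitCode: number;
  ticksRun: number;
}

function runTickLoop(
  tick: () => TickResult,
  maxConsecutiveFailures: number,
  maxTicks: number, // bound for the sketch; a real service would loop forever
): LoopOutcome {
  let consecutiveFailures = 0;
  for (let i = 1; i <= maxTicks; i++) {
    if (tick() === "ok") {
      consecutiveFailures = 0; // any success resets the budget
    } else {
      consecutiveFailures++;
      if (consecutiveFailures >= maxConsecutiveFailures) {
        return { exitCode: 1, ticksRun: i };
      }
    }
    // A real loop would sleep tickIntervalSec seconds here.
  }
  return { exitCode: 0, ticksRun: maxTicks };
}

// Missing binary or credentials: every tick fails, so the loop exits after 3.
console.log(JSON.stringify(runTickLoop(() => "failed", 3, 100))); // {"exitCode":1,"ticksRun":3}
```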
+ +## General comments + +### @chatgpt-codex-connector (2026-05-27T03:04:57Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). diff --git a/docs/pr-discussions/PR-5393-feat-b-0849-phase-1-docker-nixos-install-sh-test-harness-fas.md b/docs/pr-discussions/PR-5393-feat-b-0849-phase-1-docker-nixos-install-sh-test-harness-fas.md new file mode 100644 index 0000000000..1abff8e83f --- /dev/null +++ b/docs/pr-discussions/PR-5393-feat-b-0849-phase-1-docker-nixos-install-sh-test-harness-fas.md @@ -0,0 +1,227 @@ +--- +pr_number: 5393 +title: "feat(B-0849 Phase 1): Docker NixOS install.sh test harness \u2014 fast iteration (~30-60 sec) for install.sh + mise + bun + iter-5.5.0; complements B-0831 QEMU (Aaron 2026-05-27)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T03:16:21Z" +merged_at: "2026-05-27T03:47:31Z" +closed_at: "2026-05-27T03:47:31Z" +head_ref: "feat-b0849-1-docker-nixos-install-sh-test-harness-implementation-2026-05-27-0136z" +base_ref: "main" +archived_at: "2026-05-27T19:27:12Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5393: feat(B-0849 Phase 1): Docker NixOS install.sh test harness — fast iteration (~30-60 sec) for install.sh + mise + bun + iter-5.5.0; complements B-0831 QEMU (Aaron 2026-05-27) + +## PR description + +## Summary + +Aaron 2026-05-27: *"we should add docker based nixos install.sh testing so we can iterate quick that's an easy dockerfile"* + +Implements [B-0849](docs/backlog/P2/B-0849-...) Phase 1 — bounded-iteration test harness for the install.sh / linux.sh / mise.sh substrate. + +## 3 files + +1. **`tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile`** — `nixos/nix:2.31.2` pinned base + `/etc/NIXOS` marker + run install.sh + validate bun (1.3.x exact pin)/claude/gh +2. 
**`tools/ci/docker-nixos-install-sh-test.ts`** — TS wrapper per Rule 0 with exit-code mapping + log capture (default `.tools/docker-nixos-install-sh-test.log`) + timeout + centralized `spawnDocker` helper (sonarjs suppression + 64 MiB maxBuffer) +3. **`.dockerignore`** — NEW; excludes `references/upstreams/` (gigabytes per the rule), `node_modules/`, `.git/`, build outputs, IDE scratch, ISO/qcow2 artifacts. **Affects ALL docker builds run from repo root** — substrate-honest scope flag. + +## Validation coverage + +| Layer | Check | +|---|---| +| linux.sh NixOS detection | `touch /etc/NIXOS` makes linux.sh route to mise.sh | +| mise install | nix-shell bootstraps mise + reads .mise.toml | +| bun via mise | `bun --version` matches .mise.toml pin `1.3` EXACTLY | +| claude-code via bun | `set -o pipefail` + `bun install --global @anthropic-ai/claude-code` | +| gh via nix | `nix-shell -p gh` install + version check | + +## Cycle-time tradeoff + +| Surface | Validates | Cycle | +|---|---|---| +| Operator USB | End-to-end + reboot | ~30+ min | +| B-0831 QEMU | End-to-end virtualized | ~15 min | +| **B-0849 Docker (THIS PR)** | install.sh on NixOS userspace | **~30-60 sec** | + +## Usage + +\`\`\`bash +bun tools/ci/docker-nixos-install-sh-test.ts # default 600s timeout +bun tools/ci/docker-nixos-install-sh-test.ts --keep-image # inspect after +DOCKER_BUILD_TIMEOUT_SEC=900 bun tools/ci/docker-nixos-install-sh-test.ts +\`\`\` + +## Composes with + +[B-0824](docs/backlog/P1/B-0824-...) · [B-0831](docs/backlog/P2/B-0831-...) · [B-0835](docs/backlog/P1/B-0835-...) · [B-0848](docs/backlog/P2/B-0848-...) + [B-0850](docs/backlog/P2/B-0850-...) + +## Copilot review responses + +10 findings across 2 review batches all addressed: unused import, name attribution, spawnDocker centralization, .dockerignore for repo root, dirname() cross-platform, nixos/nix pin, bun version exact match, pipefail propagation, ENV PATH for mise across Docker layers, gitignored default log path. 
All threads resolved. + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## Reviews + +### COMMENTED — @github-code-quality (2026-05-27T03:17:53Z) + +_(no body)_ + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T03:21:09Z) + +## Pull request overview + +Adds a fast-iteration Docker-based NixOS userspace harness intended to validate `tools/setup/install.sh`’s NixOS detection path (via `/etc/NIXOS`) plus key post-install tooling checks, providing a quicker feedback loop than the existing QEMU full-install test. + +**Changes:** +- Introduces a `nixos/nix`-based Dockerfile that runs `tools/setup/install.sh` and validates bun + claude-code + gh availability. +- Adds a Bun/TypeScript wrapper to run `docker build` with log capture, timeout, and optional image retention. + +### Reviewed changes + +Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments. + +| File | Description | +| ---- | ----------- | +| tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile | Docker build recipe to exercise `install.sh` on NixOS userspace and validate bun/claude-code/gh. | +| tools/ci/docker-nixos-install-sh-test.ts | Bun/TS wrapper that invokes `docker build`, captures logs, enforces a timeout, and cleans up the image. | + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T03:27:17Z) + +## Pull request overview + +Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments. + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T03:36:51Z) + +## Pull request overview + +Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments. + +## Review threads + +### Thread 1: tools/ci/docker-nixos-install-sh-test.ts:50 (resolved) + +**@github-code-quality** (2026-05-27T03:17:53Z): + +## Unused variable, import, function or class + +Unused import join. 
+ + --- + + Remove the unused join named import from node:path in tools/ci/docker-nixos-install-sh-test.ts, leaving only resolve. + + This is the minimal, behavior-preserving fix: + +
No new methods, definitions, or dependencies are needed. + +### Thread 2: tools/ci/docker-nixos-install-sh-test.ts:74 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:06Z): + +P0: `join` is imported from `node:path` but never used; with `noUnusedLocals: true` this will fail TypeScript typecheck/lint. Remove the unused import or use it. + +### Thread 3: tools/ci/docker-nixos-install-sh-test.ts:24 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:07Z): + +This header comment includes a personal name ("Aaron"). Repo convention is to avoid name attribution in code/docs/skills outside the allowed history surfaces; use a role-ref (e.g., "operator") or drop the attribution (see .github/copilot-instructions.md:305-366). + +### Thread 4: tools/ci/docker-nixos-install-sh-test.ts:108 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:07Z): + +`spawnSync("docker", ...)` will trigger `sonarjs/no-os-command-from-path` under this repo’s eslint config (see tools/ci/audit-installer-iso-content.ts for the required suppression+rationale pattern). Add a documented suppression or switch to an invocation pattern that satisfies the rule. + +### Thread 5: tools/ci/docker-nixos-install-sh-test.ts:120 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:07Z): + +`spawnSync("docker", buildArgs, ...)` buffers all output in memory and currently doesn’t set `maxBuffer`; large `docker build --progress=plain` output can exceed Node’s default buffer and fail the run. Consider raising `maxBuffer` (as done elsewhere) or streaming output to the log file instead of fully buffering it. + +### Thread 6: tools/ci/docker-nixos-install-sh-test.ts:139 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:07Z): + +The build context is set to the repo root (`.`). Without a `.dockerignore`, this will send large directories (e.g., `references/upstreams/**`) to the Docker daemon and can make each run very slow.
Consider a reduced build context (staged temp dir) or adding a `.dockerignore` that excludes heavy paths. + +### Thread 7: tools/ci/docker-nixos-install-sh-test.ts:242 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:08Z): + +Log directory creation uses `logPath.lastIndexOf("/")`, which breaks on Windows paths (backslashes) and can cause `writeFileSync` to fail when `DOCKER_LOG_OUT_PATH` points to a non-existent directory. Prefer `path.dirname(logPath)` to compute the directory portably. + +### Thread 8: tools/ci/docker-nixos-install-sh-test.ts:176 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:08Z): + +`spawnSync("docker", ["rmi", ...])` is another PATH-resolved OS command and will hit the same `sonarjs/no-os-command-from-path` lint rule as the other `docker` spawns. Add the same documented suppression here (or refactor to a single helper that centralizes the suppression/rationale). + +### Thread 9: tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile:25 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:08Z): + +Using `nixos/nix:latest` makes the harness non-deterministic (base image changes can break the test without repo changes). Consider pinning to a specific tag or digest to keep CI behavior stable. + +### Thread 10: tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile:65 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:08Z): + +The bun version check only asserts major version `^1\.` but the comment says it should match the `.mise.toml` pin (`bun = "1.3"`). Tighten the check to validate the pinned minor (or exact) version so drift is caught. + +### Thread 11: tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile:72 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:09Z): + +`bun install --global ... | tail -5` masks the exit status of `bun install` (pipeline returns `tail`’s status), so failures can be missed if a cached/stale `claude` binary exists. 
Ensure the build fails when `bun install` fails (e.g., enable `pipefail` for this RUN or avoid piping). + +### Thread 12: tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile:100 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:27:16Z): + +P0: The validation RUN steps call `mise`/`bun` in a new Docker layer, but `tools/setup/install.sh` only exports PATH inside its own process (mise is installed into `$HOME/.local/bin`, shims into `$HOME/.local/share/mise/shims`). Docker does not persist those exports across RUN instructions, so these steps can fail with `mise: command not found` / missing bun unless PATH is made persistent (e.g., via `ENV PATH=...` and/or sourcing the generated `$HOME/.config/zeta/shellenv.sh` before running the checks). + +### Thread 13: tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile:32 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:27:17Z): + +P2: The comment “latest stable … as of training data” is not actionable/traceable in-repo and will read oddly to future maintainers. Prefer an objective anchor (e.g., the date of the pin decision and/or a linkable bump procedure) and remove references to “training data”. + +### Thread 14: tools/ci/docker-nixos-install-sh-test.ts:83 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:27:17Z): + +P1: The default log path `.docker-test-log` will create an unignored file in the repo root (it is not in `.gitignore`), which is likely to show up as untracked and get accidentally committed. Consider defaulting to an already-ignored scratch directory (e.g., under `.tools/` which is gitignored) or to the OS tempdir, while keeping `DOCKER_LOG_OUT_PATH` for CI artifact placement. + +### Thread 15: .dockerignore:6 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:27:17Z): + +PR metadata says “2 files”, but this PR also adds `.dockerignore`. 
Please update the PR description / summary to reflect the additional changed file so reviewers understand the full surface area (root-level docker build behavior changes for all Docker builds). + +### Thread 16: tools/ci/docker-nixos-install-sh-test.ts:36 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:36:50Z): + +P1: The header comment says the default DOCKER_LOG_OUT_PATH is workspace-relative `.docker-test-log`, but the code default is `.tools/docker-nixos-install-sh-test.log`. Please update the docstring to match the actual default so users don’t look for the wrong file (and so the usage docs stay accurate). + +### Thread 17: tools/ci/dockerfiles/nixos-install-sh-test/Dockerfile:34 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:36:50Z): + +P1: `FROM nixos/nix:2.31.2` is pinned by tag, but tags can still be mutable/rebuilt upstream. Since this Dockerfile is intended as a CI sentinel, consider pinning the base image by digest (`nixos/nix:2.31.2@sha256:...`) to make failures attributable to repo changes rather than upstream tag movement. + +## General comments + +### @chatgpt-codex-connector (2026-05-27T03:16:26Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). 
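Threads 10 and 17 are both about making drift visible. This is a minimal sketch of the two checks, with hypothetical helper names. `matchesPin` treats the pin as a component-wise prefix. That is an assumption about how mise resolves a partial version such as `bun = "1.3"`.

```typescript
// Hypothetical pin checks. Every pinned component must match exactly, so a
// major-only regex like /^1\./ is not enough.
function matchesPin(version: string, pin: string): boolean {
  const have = version.trim().split(".");
  const want = pin.trim().split(".");
  return want.length <= have.length && want.every((part, i) => part === have[i]);
}

// A base image is digest-pinned when its reference ends in @sha256:<64 hex>.
function isDigestPinned(imageRef: string): boolean {
  return /@sha256:[0-9a-f]{64}$/.test(imageRef);
}

console.log(matchesPin("1.3.4", "1.3")); // true
console.log(matchesPin("1.2.0", "1.3")); // false, though /^1\./ would accept it
console.log(matchesPin("1.30.0", "1.3")); // false: "30" is not "3"
console.log(isDigestPinned("nixos/nix:2.31.2")); // false: tag only
```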
diff --git a/docs/pr-discussions/PR-5394-refactor-b-0850-phase-3-parameterized-zeta-ai-agent-nix-repl.md b/docs/pr-discussions/PR-5394-refactor-b-0850-phase-3-parameterized-zeta-ai-agent-nix-repl.md new file mode 100644 index 0000000000..ffef4dfecb --- /dev/null +++ b/docs/pr-discussions/PR-5394-refactor-b-0850-phase-3-parameterized-zeta-ai-agent-nix-repl.md @@ -0,0 +1,134 @@ +--- +pr_number: 5394 +title: "refactor(B-0850 Phase 3): parameterized zeta-ai-agent.nix replaces zeta-otto.nix \u2014 multi-vendor scaffold for \u22653-systemd-agents-on-bootup (Aaron 2026-05-27)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T03:18:35Z" +merged_at: "2026-05-27T03:21:02Z" +closed_at: "2026-05-27T03:21:02Z" +head_ref: "feat-b0850-3-zeta-ai-agent-parameterized-module-refactor-multi-vendor-scaffold-2026-05-27-0149z" +base_ref: "main" +archived_at: "2026-05-27T19:27:12Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5394: refactor(B-0850 Phase 3): parameterized zeta-ai-agent.nix replaces zeta-otto.nix — multi-vendor scaffold for ≥3-systemd-agents-on-bootup (Aaron 2026-05-27) + +## PR description + +## Summary + +Aaron 2026-05-27 (verbatim): + +> *\"we should end up shipping with one service per surface i think outside k8s and have at least 3 different vendors\"* + +> *\"so they can fix each other and the k8s cluster even when it's down.\"* + +> *\"the mutual repair is critical too becasue of you can see your own future self boot script failures\"* + +> *\"yeah lets move all forward however and i can do as many iterations testing as possible before we move to pc two we should have three systemd agents and the cluster running on bootup\"* + +Parameterizes the Phase 1 zeta-otto.nix shape (PR #5392) into a multi-vendor multi-persona substrate. Each persona = separate systemd unit; per-persona opt-in via NixOS module option. 
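The one-unit-per-persona rule and the per-persona opt-in can be modeled as a filter over the roster. This is a minimal sketch that assumes hypothetical names and a unit-name format (`zeta-ai-agent-<persona>.service`) that this description does not specify. Persona, vendor, and binary values come from the default-personas table.

```typescript
// Hypothetical model of the parameterized module: one systemd unit per
// enabled persona, plus a count of distinct vendors among enabled personas.
interface Persona {
  name: string;
  vendor: string;
  binary: string;
  enable: boolean;
}

function enabledUnits(personas: Persona[]): string[] {
  return personas
    .filter((p) => p.enable)
    .map((p) => `zeta-ai-agent-${p.name}.service`); // assumed naming scheme
}

function distinctEnabledVendors(personas: Persona[]): number {
  return new Set(personas.filter((p) => p.enable).map((p) => p.vendor)).size;
}

// State as of this PR: only otto is enabled.
const roster: Persona[] = [
  { name: "otto", vendor: "anthropic", binary: "claude", enable: true },
  { name: "vera", vendor: "openai", binary: "codex", enable: false },
  { name: "lior", vendor: "google-gemini", binary: "gemini", enable: false },
];

console.log(enabledUnits(roster).join(", ")); // zeta-ai-agent-otto.service
console.log(distinctEnabledVendors(roster) >= 3); // false: below the 3-vendor floor
```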
+ +## Refactor + +| File | Change | +|---|---| +| zeta-otto.nix | DELETED (superseded) | +| zeta-ai-agent.nix | NEW — parameterized over persona | +| common.nix | imports zeta-ai-agent.nix | +| control-plane/configuration.nix | `zeta.aiAgents.personas.otto.enable = true` | + +## Default personas (per agent-roster-reference-card) + +| Persona | Vendor | Binary | Sub-row | +|---|---|---|---| +| otto | anthropic | claude | shipped this PR | +| alexa | alibaba-qwen | kiro | B-0850.3a (pending) | +| riven | xai-grok | grok | B-0850.3b (pending) | +| vera | openai | codex | B-0850.3c (pending) | +| lior | google-gemini | gemini | B-0850.3d (pending) | + +Each persona enable lines pre-staged + commented in control-plane/configuration.nix with sub-row IDs. + +## Why ≥3 vendors (load-bearing) + +1. **Vendor-outage resilience** — Anthropic API down doesn't kill all cluster-AI substrate +2. **Self-modification safety** — when one AI's self-update breaks its boot script, OTHER AIs (different vendors) detect + repair +3. **BFT margin** — f=1 fault tolerance requires 3 nodes (or 2f+1 generally) + +Both Spec 2 + Spec 4 from the B-0850 Phase 3 memory independently justify the ≥3 floor. + +## Test plan + +- [ ] CI passes +- [ ] flake.nix evaluation succeeds with the new module +- [ ] On next install: Otto-as-systemd starts on first boot (same Phase 1 behavior, just via parameterized module) + +## Composes with + +iter-5.5.0 substrate (PRs #5388 + #5389) · [B-0848](docs/backlog/P2/B-0848-...) · [B-0847](docs/backlog/P2/B-0847-...) · [B-0796](docs/backlog/P2/B-0796-...) · [B-0703](docs/backlog/P*/B-0703-...) · [B-0849](docs/backlog/P2/B-0849-...) · [B-0824](docs/backlog/P1/B-0824-...) · [B-0850 row](docs/backlog/P2/B-0850-...) 
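The ≥3-vendor floor above matches the majority-vote bound n = 2f + 1: with f = 1, two agreeing agents outvote one faulty one. That bound assumes correct agents report the same answer. Full Byzantine agreement is a different setting, because faulty participants may send conflicting messages to different peers. There, the classical bound with unsigned messages is n ≥ 3f + 1, which means four agents for f = 1. This is a minimal sketch of both bounds plus a majority vote, with hypothetical function names.

```typescript
// Majority vote (as in triple modular redundancy): f faulty voters are
// outvoted by f + 1 correct ones, so n = 2f + 1.
function minForMajorityVote(f: number): number {
  return 2 * f + 1;
}

// Byzantine agreement with oral (unsigned) messages: n >= 3f + 1.
function minForByzantineAgreement(f: number): number {
  return 3 * f + 1;
}

// Returns the value reported by a strict majority of voters, if any.
function majority<T>(votes: T[]): T | undefined {
  const counts = new Map<T, number>();
  for (const v of votes) counts.set(v, (counts.get(v) ?? 0) + 1);
  for (const v of votes) {
    if ((counts.get(v) ?? 0) * 2 > votes.length) return v;
  }
  return undefined;
}

console.log(minForMajorityVote(1), minForByzantineAgreement(1)); // 3 4
// Three agents diagnose one node; the single faulty report is outvoted.
console.log(majority(["k3s-down", "k3s-down", "healthy"])); // k3s-down
```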
+ +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## Reviews + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T03:21:35Z) + +## Pull request overview + +Refactors the NixOS “Otto as systemd service” module into a parameterized, multi-persona scaffold so multiple AI agents (potentially from different vendors) can be enabled as independent systemd units outside k8s. + +**Changes:** +- Deleted the Phase 1 single-persona module (`zeta-otto.nix`) and introduced a generalized module (`zeta-ai-agent.nix`) that generates one systemd unit per enabled persona. +- Updated the shared NixOS module import list to pull in the new generalized module. +- Updated the control-plane host config to enable `otto` via the new option path. + +### Reviewed changes + +Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments. + +| File | Description | +| ---- | ----------- | +| full-ai-cluster/nixos/modules/zeta-otto.nix | Removed the Phase 1 single-agent NixOS module (superseded by generalized module). | +| full-ai-cluster/nixos/modules/zeta-ai-agent.nix | Added generalized NixOS module to generate per-persona AI agent systemd services. | +| full-ai-cluster/nixos/modules/common.nix | Switched imports from the deleted module to the new generalized module. | +| full-ai-cluster/nixos/hosts/control-plane/configuration.nix | Updated host config to enable the `otto` persona via `zeta.aiAgents.personas.otto.enable`. | + +## Review threads + +### Thread 1: full-ai-cluster/nixos/modules/zeta-ai-agent.nix:7 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:34Z): + +The header comment says personas are enabled via `zeta.aiAgents..enable = true;`, but the module option path is `zeta.aiAgents.personas..enable`. Please update the comment to match the actual configuration path so operators don’t copy an incorrect setting. 
+ +### Thread 2: full-ai-cluster/nixos/modules/zeta-ai-agent.nix:220 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:34Z): + +`configDir` is defined for each persona and exposed as an option, but it’s not used anywhere in the module (e.g., for validation, environment, or the status file). Either remove it to avoid dead configuration, or use it (for example, to render accurate operator hints / checks). + +### Thread 3: full-ai-cluster/nixos/modules/zeta-ai-agent.nix:142 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:34Z): + +The loop hard-codes the executable path to `${cfg.home}/.bun/bin/${persona.binary}` even though the service `PATH` includes mise shims. This prevents running personas whose CLI is provided via mise shims (or any non-bun install location) without rewriting the module. Consider executing by binary name (rely on PATH) or making the per-persona executable path configurable. + +### Thread 4: full-ai-cluster/nixos/modules/zeta-ai-agent.nix:142 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:34Z): + +The agent command is suffixed with `|| true`, which causes the service to stay “active” even if the CLI is missing or failing on every tick; systemd won’t surface the failure state or apply restart/backoff semantics meaningfully. Prefer letting the tick failure propagate (or at least detect “command not found” and exit) so misconfiguration is visible via `systemctl status`. + +### Thread 5: full-ai-cluster/nixos/modules/zeta-ai-agent.nix:282 (unresolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:21:35Z): + +The status hint claims creds live under `~/.config//`, but for the default personas the credential directory is not the vendor string (e.g., Claude Code uses `~/.config/claude`). Use each persona’s `configDir` (or remove this line) so the hint doesn’t point operators to the wrong location. 
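Threads 3 and 4 suggest resolving the binary through `PATH` and letting tick failures propagate. A minimal bash sketch of such a tick wrapper, assuming a hypothetical `run_tick` helper (not the module's actual `ExecStart`): it returns 127 when the binary is missing and otherwise passes through the CLI's own exit status, so `systemctl status` can show the failure and systemd's `Restart=on-failure` / `RestartSec=` backoff can apply.

```bash
#!/usr/bin/env bash
# Hypothetical tick wrapper: resolve the agent CLI through PATH (bun globals,
# mise shims, or any other location) and let its exit status reach systemd
# instead of masking it with `|| true`.
run_tick() {
  local bin="$1"; shift
  local resolved
  if ! resolved="$(command -v "$bin")"; then
    echo "zeta-agent: '$bin' not found on PATH" >&2
    return 127   # the status a shell uses for "command not found"
  fi
  "$resolved" "$@"   # a non-zero exit propagates to the unit
}
```

For example, `run_tick claude --print "$PROMPT"` would find Claude Code through whichever of the bun global directory or the mise shims comes first on the service's `PATH`.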
+ + ## General comments + + ### @chatgpt-codex-connector (2026-05-27T03:18:40Z) + + You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). diff --git a/docs/pr-discussions/PR-5395-fix-postmerge-5394-build-iso-failure-zeta-ai-agent-nix-stati.md b/docs/pr-discussions/PR-5395-fix-postmerge-5394-build-iso-failure-zeta-ai-agent-nix-stati.md new file mode 100644 index 0000000000..8f5be08235 --- /dev/null +++ b/docs/pr-discussions/PR-5395-fix-postmerge-5394-build-iso-failure-zeta-ai-agent-nix-stati.md @@ -0,0 +1,88 @@ +--- +pr_number: 5395 +title: "fix(postmerge-#5394 build-iso failure): zeta-ai-agent.nix static persona registry + per-persona enable. booleans (was problematic submodule-as-option that NixOS module-merge couldn't handle)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T03:21:36Z" +merged_at: "2026-05-27T03:28:49Z" +closed_at: "2026-05-27T03:28:49Z" +head_ref: "feat-b0850-3-zeta-ai-agent-parameterized-module-refactor-multi-vendor-scaffold-2026-05-27-0149z" +base_ref: "main" +archived_at: "2026-05-27T19:27:11Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5395: fix(postmerge-#5394 build-iso failure): zeta-ai-agent.nix static persona registry + per-persona enable. booleans (was problematic submodule-as-option that NixOS module-merge couldn't handle) + + ## PR description + + ## Summary + + Fix-forward for [PR #5394](https://github.com/Lucent-Financial-Group/Zeta/pull/5394) which merged with build-iso failing as a non-required check. Main now has broken `zeta-ai-agent.nix` config that would fail flake evaluation: + + ``` +error: The option `zeta.aiAgents.personas.otto.description' was accessed but has no value defined. +``` + + ## Root cause + + NixOS module option-merge semantics: declaring `personas` as `lib.types.attrsOf (lib.types.submodule { ...
})` with sub-options for description/vendor/binary, then setting the outer `default = defaultPersonas`, does NOT propagate inner defaults when the operator writes only `personas.otto.enable = true`: an option's `default` is discarded as soon as any definition exists, so otto is left with just that one field (losing description/vendor/binary) and the other default personas disappear as well. + + ## Fix + + Persona registry is now **STATIC** (in `let` block; not an option). Operator opts in per-persona via `zeta.aiAgents.enable.` boolean options (mkEnableOption — well-tested NixOS idiom). + + ## Changes + + - `zeta-ai-agent.nix`: persona registry moved from option to `let`; enable booleans added +- `control-plane/configuration.nix`: `zeta.aiAgents.enable.otto = true` (was `personas.otto.enable`) +- All 5 persona enable lines (otto/alexa/riven/vera/lior) updated to new option path + + ## Test plan + + - [ ] CI build-iso passes +- [ ] flake evaluation succeeds on control-plane host config + + ## Composes with + + [PR #5394](https://github.com/Lucent-Financial-Group/Zeta/pull/5394) (the substrate this fix-fwds) · iter-5.5.0 substrate · B-0850 row · B-0848 + + 🤖 Generated with [Claude Code](https://claude.com/claude-code) + + ## Reviews + + ### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T03:24:56Z) + + ## Pull request overview + + This PR fix-forwards the NixOS AI-agent module by replacing the problematic persona submodule option shape with a static registry plus per-persona enable booleans, restoring flake/module evaluation for the AI cluster configuration. + + **Changes:** +- Replaces the single-agent `zeta-otto.nix` module with generalized `zeta-ai-agent.nix`. +- Updates the shared module import to use the new AI-agent module. +- Moves control-plane enablement from `zeta.otto.enable` to `zeta.aiAgents.enable.otto`. + + ### Reviewed changes + + Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
+ +| File | Description | +| ---- | ----------- | +| `full-ai-cluster/nixos/modules/zeta-otto.nix` | Deletes the old single-agent Otto module superseded by the generalized AI-agent module. | +| `full-ai-cluster/nixos/modules/zeta-ai-agent.nix` | Adds static persona registry, per-persona enable options, generated systemd units, and status output. | +| `full-ai-cluster/nixos/modules/common.nix` | Updates shared imports to load the new AI-agent module. | +| `full-ai-cluster/nixos/hosts/control-plane/configuration.nix` | Enables the Otto persona through the new option path and stages future persona toggles as comments. | + +## Review threads + +### Thread 1: full-ai-cluster/nixos/modules/zeta-ai-agent.nix:193 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T03:24:56Z): + +P1: These pending personas are exposed as valid enable switches even though their registry entries above still use placeholder binaries and the install/login substrate has not landed. If an operator sets any of them to true, evaluation succeeds but boot creates a restart-looping service. Either omit these enable options until the sub-rows ship, or add an assertion that fails evaluation with a clear pending-implementation message when one is enabled. + +## General comments + +### @chatgpt-codex-connector (2026-05-27T03:21:41Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). 
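The P1 thread above asks for a loud failure when a pending persona is enabled. In the module itself that guard would be a NixOS `assertions` entry checked at evaluation time; as an illustration of the same static-registry-plus-enable logic outside Nix, here is a bash sketch (bash 4+ for associative arrays). The arrays mirror the persona table from PR #5394; the status values and the `check_enabled` name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical shell analog of the static persona registry.
# Registry data is fixed; operators only choose which names to enable.
declare -A PERSONA_BINARY=([otto]=claude [alexa]=kiro [riven]=grok [vera]=codex [lior]=gemini)
# Illustrative status values: only otto had shipped as of PR #5395.
declare -A PERSONA_STATUS=([otto]=shipped [alexa]=pending [riven]=pending [vera]=pending [lior]=pending)

# Returns 1 if any requested persona is unknown or still pending, with a
# clear message, instead of letting a restart-looping service reach boot.
check_enabled() {
  local name rc=0
  for name in "$@"; do
    if [[ -z "${PERSONA_BINARY[$name]+set}" ]]; then
      echo "unknown persona: $name" >&2
      rc=1
    elif [[ "${PERSONA_STATUS[$name]}" != shipped ]]; then
      echo "persona '$name' (${PERSONA_BINARY[$name]}) is pending implementation; leave it disabled" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_enabled otto` succeeds, while `check_enabled otto vera` reports vera as pending and fails.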
diff --git a/docs/pr-discussions/PR-5396-feat-b-0849-phase-2-github-actions-workflow-runs-docker-nixo.md b/docs/pr-discussions/PR-5396-feat-b-0849-phase-2-github-actions-workflow-runs-docker-nixo.md new file mode 100644 index 0000000000..98d70fa29c --- /dev/null +++ b/docs/pr-discussions/PR-5396-feat-b-0849-phase-2-github-actions-workflow-runs-docker-nixo.md @@ -0,0 +1,54 @@ +--- +pr_number: 5396 +title: "feat(B-0849 Phase 2): GitHub Actions workflow runs Docker NixOS install.sh test on PRs touching install substrate" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T04:02:36Z" +merged_at: "2026-05-27T04:05:15Z" +closed_at: "2026-05-27T04:05:15Z" +head_ref: "feat-b0849-2-docker-nixos-install-sh-test-github-actions-integration-2026-05-27-0440z" +base_ref: "main" +archived_at: "2026-05-27T19:27:10Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5396: feat(B-0849 Phase 2): GitHub Actions workflow runs Docker NixOS install.sh test on PRs touching install substrate + +## PR description + +## Summary + +Wires the Docker harness from B-0849 Phase 1 ([PR #5393](https://github.com/Lucent-Financial-Group/Zeta/pull/5393)) into CI so install.sh / linux.sh / mise.sh bugs are caught at PR time vs reboot time. 
+ +## Path triggers + +- `tools/setup/**` — install dispatcher + per-OS scripts +- `.mise.toml` — pinned runtime versions +- `full-ai-cluster/nixos/modules/common.nix` — systemd + bun PATH +- `tools/ci/dockerfiles/nixos-install-sh-test/**` — Dockerfile +- `tools/ci/docker-nixos-install-sh-test.ts` — TS wrapper +- `.dockerignore` — affects all docker builds +- `package.json` + `bun.lock` — TS wrapper deps +- This workflow file + +## Discipline (mirrors build-ai-cluster-iso.yml) + +- Runner pinned `ubuntu-24.04` (NOT `-latest`) +- All third-party actions SHA-pinned with `vX.Y.Z` comments +- `permissions: contents: read` at workflow level +- Concurrency: workflow-scoped, cancel-in-progress for PRs +- Zero `github.event.*` interpolation in `run:` lines (security-guidance compliant) +- 15-min job timeout; 900s DOCKER_BUILD_TIMEOUT_SEC for cold-cache headroom +- Upload-artifact (always) preserves log for 7 days + +## Composes with + +[PR #5393](https://github.com/Lucent-Financial-Group/Zeta/pull/5393) (B-0849 Phase 1 — the Dockerfile + TS wrapper) · B-0831 cascade #5 QEMU complementary · iter-5.5.0 substrate · [B-0835](docs/backlog/P1/B-0835-...) install bug cluster + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## General comments + +### @chatgpt-codex-connector (2026-05-27T04:02:41Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). 
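A rough local approximation of the path triggers listed above can help check whether a change would run the Docker install test. This is a hypothetical bash helper, not part of the workflow: bash `case` globs only approximate GitHub's `paths:` filter syntax (in a `case` pattern `*` also matches `/`, so `tools/setup/*` covers `tools/setup/**`), and the workflow file's own path is left out because its name is not given here.

```bash
#!/usr/bin/env bash
# Hypothetical local check: reads changed paths (one per line) on stdin and
# succeeds if any of them matches the workflow's listed path triggers.
touches_install_substrate() {
  local path
  while IFS= read -r path; do
    case "$path" in
      tools/setup/*|.mise.toml|full-ai-cluster/nixos/modules/common.nix) return 0 ;;
      tools/ci/dockerfiles/nixos-install-sh-test/*) return 0 ;;
      tools/ci/docker-nixos-install-sh-test.ts|.dockerignore) return 0 ;;
      package.json|bun.lock) return 0 ;;
    esac
  done
  return 1
}
```

Usage: `git diff --name-only origin/main...HEAD | touches_install_substrate && echo "install test would run"`.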
diff --git a/docs/pr-discussions/PR-5397-feat-b-0850-phase-3d-lior-gemini-cli-as-2nd-systemd-vendor-a.md b/docs/pr-discussions/PR-5397-feat-b-0850-phase-3d-lior-gemini-cli-as-2nd-systemd-vendor-a.md new file mode 100644 index 0000000000..dc5c09cdc4 --- /dev/null +++ b/docs/pr-discussions/PR-5397-feat-b-0850-phase-3d-lior-gemini-cli-as-2nd-systemd-vendor-a.md @@ -0,0 +1,55 @@ +--- +pr_number: 5397 +title: "feat(B-0850 Phase 3d): Lior/Gemini CLI as 2nd systemd vendor \u2014 Anthropic + Google; install + interactive auth + control-plane enable (Aaron 2026-05-27)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T04:05:05Z" +merged_at: "2026-05-27T04:08:12Z" +closed_at: "2026-05-27T04:08:12Z" +head_ref: "feat-b0850-3d-lior-gemini-cli-2nd-vendor-systemd-agent-2026-05-27-0454z" +base_ref: "main" +archived_at: "2026-05-27T19:27:09Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5397: feat(B-0850 Phase 3d): Lior/Gemini CLI as 2nd systemd vendor — Anthropic + Google; install + interactive auth + control-plane enable (Aaron 2026-05-27) + + ## PR description + + ## Summary + + Per operator authorization *\"drive forward with whatever interests you most\"* — shipping B-0850 Phase 3d (Lior/Gemini = 2nd vendor) to move toward the ≥3-systemd-agents target. + + ## 3 changes + + 1. **zeta-install.sh 6.95a-gemini** — `bun install --global @google/gemini-cli` (after existing claude install). WebSearch verified at implementation time per dep-pin discipline. +2. **zeta-install.sh 6.95b-gemini** — interactive `gemini auth login` prompt mirroring claude login (OAuth via browser OR API key from AI Studio). +3. **zeta-ai-agent.nix** — removed lior assertion (substrate shipped); **control-plane/configuration.nix** — `zeta.aiAgents.enable.lior = true` (otto + lior both enabled).
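After the interactive logins, each vendor's credential directory should be readable only by its owner. A minimal sketch, assuming the locations named in this archive (`~/.gemini/` for Gemini CLI, per the Copilot review of PR #5398, and `~/.codex/` for Codex); `lock_down_creds` is a hypothetical helper, and taking the target home as an argument stands in for the installer's `$ZETA_HOME`.

```bash
#!/usr/bin/env bash
# Hypothetical post-login step: make each vendor's credential directory
# owner-only. Paths follow this archive: Gemini CLI under ~/.gemini/,
# Codex under ~/.codex/ (auth.json). Defaults to $HOME when no argument.
lock_down_creds() {
  local home="${1:-$HOME}" dir
  for dir in "$home/.gemini" "$home/.codex"; do
    [[ -d "$dir" ]] || continue   # vendor not logged in yet; nothing to protect
    chmod -R go-rwx "$dir"        # strip group/other read, write, execute
  done
  return 0
}
```

It skips directories that do not exist yet, so it is safe to run before every vendor has been logged in.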
+ +## Vendor diversity progression + +| Phase | Persona | Vendor | Status | +|---|---|---|---| +| 1 (PR #5392) | otto | Anthropic Claude | shipped | +| 3d (THIS PR) | **lior** | **Google Gemini** | **shipped** | +| 3a | alexa | Alibaba Qwen (Kiro) | pending | +| 3b | riven | xAI Grok | pending | +| 3c | vera | OpenAI Codex | pending | + +After this PR: 2/5 personas + 2 vendors enabled. One more vendor (3c Vera/Codex likely next since codex CLI is also npm-installable) hits the ≥3 BFT floor. + +## Composes with + +PRs #5388 + #5389 (iter-5.5.0 credential persistence) · PRs #5392 + #5394 + #5395 (B-0850 Phase 1 + 3 refactor) · B-0848 · B-0847 · B-0796 + +Sources: +- [@google/gemini-cli on npm](https://www.npmjs.com/package/@google/gemini-cli) +- [Gemini CLI authentication docs](https://geminicli.com/docs/get-started/authentication/) + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## General comments + +### @chatgpt-codex-connector (2026-05-27T04:05:11Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). 
diff --git a/docs/pr-discussions/PR-5398-feat-b-0850-phase-3c-vera-codex-3rd-vendor-hits-3-bft-floor.md b/docs/pr-discussions/PR-5398-feat-b-0850-phase-3c-vera-codex-3rd-vendor-hits-3-bft-floor.md new file mode 100644 index 0000000000..99e8588347 --- /dev/null +++ b/docs/pr-discussions/PR-5398-feat-b-0850-phase-3c-vera-codex-3rd-vendor-hits-3-bft-floor.md @@ -0,0 +1,130 @@ +--- +pr_number: 5398 +title: "feat(B-0850 Phase 3c): Vera/Codex 3rd vendor \u2014 hits \u22653 BFT floor (Anthropic + Google + OpenAI); @openai/codex install + device-flow auth + control-plane enable (Aaron 2026-05-27)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T04:08:00Z" +merged_at: "2026-05-27T04:16:36Z" +closed_at: "2026-05-27T04:16:36Z" +head_ref: "feat-b0850-3c-vera-codex-3rd-vendor-hits-bft-floor-2026-05-27-0512z" +base_ref: "main" +archived_at: "2026-05-27T19:27:08Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5398: feat(B-0850 Phase 3c): Vera/Codex 3rd vendor — hits ≥3 BFT floor (Anthropic + Google + OpenAI); @openai/codex install + device-flow auth + control-plane enable (Aaron 2026-05-27) + + ## PR description + + ## Summary + + Hits the **≥3 BFT floor** Aaron named earlier 2026-05-27 (*\"we should have three systemd agents and the cluster running on bootup\"*). + + | Phase | Persona | Vendor | Status | +|---|---|---|---| +| 1 | otto | Anthropic Claude | merged (#5392) | +| 3d | lior | Google Gemini | merged or armed (#5397) | +| **3c** | **vera** | **OpenAI Codex** | **THIS PR** | +| 3a | alexa | Alibaba Qwen (Kiro) | pending | +| 3b | riven | xAI Grok | pending | + + 3 vendors enabled = f=1 BFT margin for vendor-outage resilience + self-modification-safety. Stacked on PR #5397 to avoid merge conflicts. + + ## 3 changes + + 1. **zeta-install.sh 6.95a-codex** — `bun install --global @openai/codex` (WebSearch verified per dep-pin discipline) +2.
**zeta-install.sh 6.95b-codex** — `codex login --device-auth` (cleanest device-flow shape of the 3 vendors; headless-friendly URL+code pattern). Creds cache at `~/.codex/auth.json` (NOT `~/.config/codex/`). +3. **zeta-ai-agent.nix** — removed vera assertion; **control-plane/configuration.nix** — `zeta.aiAgents.enable.vera = true` (otto + lior + vera = 3 personas, 3 vendors) + + ## Composes with + + [PR #5397](https://github.com/Lucent-Financial-Group/Zeta/pull/5397) (Phase 3d Lior sibling) · PRs #5388 + #5389 (iter-5.5.0 credential persistence) · PRs #5392 + #5394 + #5395 (B-0850 Phase 1 + 3 refactor) · [B-0848](docs/backlog/P2/B-0848-...) · [B-0847](docs/backlog/P2/B-0847-...) · [B-0703 multi-oracle BFT](docs/backlog/P*/B-0703-...) + + Sources: +- [@openai/codex on npm](https://www.npmjs.com/package/@openai/codex) +- [Codex authentication docs](https://developers.openai.com/codex/auth) + + 🤖 Generated with [Claude Code](https://claude.com/claude-code) + + ## Reviews + + ### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T04:11:57Z) + + ## Pull request overview + + This PR extends the NixOS “AI agents as systemd services” substrate toward the **≥3 vendor** resilience target by adding **OpenAI Codex** install/login steps (and also including Gemini install/login changes) and enabling additional personas on the control-plane host. + + **Changes:** +- Add installer steps to `bun install --global` the Codex CLI and run `codex login --device-auth` (plus Gemini install/login steps). +- Remove the “not shipped yet” assertions for Vera/Codex and Lior/Gemini in the NixOS module. +- Enable `lior` and `vera` agents by default on the control-plane host. + + ### Reviewed changes + + Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments. + + | File | Description | +| ---- | ----------- | +| full-ai-cluster/usb-nixos-installer/zeta-install.sh | Adds bun global installs for gemini/codex and interactive auth flows; adjusts pipefail usage.
| +| full-ai-cluster/nixos/modules/zeta-ai-agent.nix | Removes assertions blocking vera/lior enablement (replaced with comments). | +| full-ai-cluster/nixos/hosts/control-plane/configuration.nix | Enables `lior` and `vera` systemd agents on the control-plane host. | + + +

+Comments suppressed due to low confidence (3) + +**full-ai-cluster/usb-nixos-installer/zeta-install.sh:1109** +* P1 bug: `set -o pipefail` is being set inside the `bash -c` subshell, but the `| tail -5` pipeline is in the parent shell, so failures from `bun install` can be masked (the pipeline exit status will typically be `tail`'s). Move the `tail` into the same shell where `pipefail` is set so the `|| WARN` reliably triggers. +``` + sudo HOME="$ZETA_HOME" BUN_INSTALL="$ZETA_HOME/.bun" -u "#$ZETA_UID" \ + bash -c 'set -o pipefail; eval "$(mise activate bash 2>/dev/null || true)"; bun install --global @anthropic-ai/claude-code' 2>&1 | tail -5 || \ + echo "[iter-5.5.0] WARN: bun install claude-code FAILED — can retry post-reboot via 'bun install --global @anthropic-ai/claude-code'" +``` +**full-ai-cluster/usb-nixos-installer/zeta-install.sh:1166** +* P0 security: Gemini CLI credentials are documented (and appear to be stored) under `~/.gemini/` (e.g. `~/.gemini/oauth_creds.json`), but this script tells operators creds land in `~/.config/gemini/` and only tightens perms on that path. This likely leaves the real Gemini OAuth creds unprotected by the intended `chmod -R go-rwx`. Update both the messaging and the chmod/chown target to `~/.gemini/`. +``` + # 6.95b-gemini — interactive gemini auth login (mirror claude login). + # B-0850 Phase 3d 2nd vendor login flow. gemini-cli supports OAuth + # via local HTTP server OR API-key paste. The interactive prompt + # lets operator choose. Credentials persist to ~/.config/gemini/. + GEMINI_BIN="$ZETA_HOME/.bun/bin/gemini" +``` +**full-ai-cluster/nixos/modules/zeta-ai-agent.nix:233** +* P0 bug: the per-persona service loop still hardcodes `${persona.binary} --print ...` for all vendors, but Codex and Gemini don’t share Claude’s `--print` interface. Removing these assertions allows flake evaluation to succeed even though enabling `vera`/`lior` will create restart-looping services. 
Keep the assertions until `makeAgentService` is updated to call Codex via `codex exec ...` and Gemini via `gemini -p ...` (or equivalent non-interactive forms). +``` + # B-0850.3c (Vera/Codex) shipped this PR — assertion removed. + # zeta-install.sh Step 6.95a-codex installs @openai/codex via + # bun + Step 6.95b-codex runs `codex login --device-auth`. + # Binary lands at ~/.bun/bin/codex; creds at ~/.codex/auth.json. + # B-0850.3d (Lior/Gemini) shipped this PR — assertion removed. + # zeta-install.sh Step 6.95a-gemini installs @google/gemini-cli + # via bun + Step 6.95b-gemini runs interactive gemini auth login. + # Binary lands at ~/.bun/bin/gemini. +``` +
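The masking described in the first suppressed comment can be reproduced in isolation. In this sketch `false` stands in for a failing `bun install --global ...`, and the outer shell is assumed to run without `pipefail` (the subshell body of `masked` makes that explicit): the pipeline's status is `tail`'s, so the warning never fires. Moving `| tail -5` inside the `bash -c` string puts the whole pipeline under `pipefail`.

```bash
#!/usr/bin/env bash
# Reproduces the masking flagged in review; `false` stands in for a failing install.
masked() (
  set +o pipefail   # outer shell without pipefail, as assumed above
  # pipefail applies only inside `bash -c`; the outer pipeline reports tail's 0.
  bash -c 'set -o pipefail; false' 2>&1 | tail -5 || echo "WARN: install failed"
)

fixed() {
  # The whole pipeline runs in the shell that set pipefail, so false's
  # non-zero status becomes bash -c's exit status and the warning fires.
  bash -c 'set -o pipefail; false 2>&1 | tail -5' || echo "WARN: install failed"
}

masked   # prints nothing
fixed    # prints: WARN: install failed
```

Applied to the installer, the `2>&1 | tail -5` moves inside the single-quoted `bash -c` argument, right after `bun install --global @anthropic-ai/claude-code`, and likewise for the Gemini and Codex installs.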
+ +## Review threads + +### Thread 1: full-ai-cluster/usb-nixos-installer/zeta-install.sh:1123 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T04:11:56Z): + +P1 bug: same pipefail masking issue as above for the Gemini CLI install; `bun install` failures may not trigger the warning because `tail` is outside the shell where `pipefail` is set. + +### Thread 2: full-ai-cluster/usb-nixos-installer/zeta-install.sh:1133 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T04:11:57Z): + +P1 bug: same pipefail masking issue as above for the Codex install; `bun install` failures may not trigger the warning because `tail` is outside the shell where `pipefail` is set. + +### Thread 3: full-ai-cluster/usb-nixos-installer/zeta-install.sh:1212 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T04:11:57Z): + +P2 documentation: this installer runs on a NixOS environment; the prompt text should avoid implying a Mac-only browser requirement. Since this is a device flow, it should be phrased as “any browser on any device”. + +### Thread 4: full-ai-cluster/nixos/hosts/control-plane/configuration.nix:46 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T04:11:57Z): + +P0 bug: enabling `lior`/`vera` will start services that invoke `${binary} --print`, but Codex is subcommand-based (non-interactive is `codex exec`, not a `--print` flag) and Gemini uses `-p` for non-interactive prompts. With the current `zeta-ai-agent.nix` loop, `zeta-vera.service`/`zeta-lior.service` are expected to restart-loop at boot. Either update the systemd ExecStart per-vendor, or keep these disabled here until the module supports the correct invocation shapes. 
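Thread 4, and the suppressed P0 comment on `zeta-ai-agent.nix`, note that only Claude Code takes `--print`; per those comments, Codex's non-interactive form is `codex exec` and Gemini's is `gemini -p`. A minimal bash sketch of a per-vendor argv builder, where `agent_argv` is a hypothetical name and the flags are only the ones cited in this review; it fails for binaries with no known non-interactive form rather than guessing. It uses a nameref, so bash 4.3+ is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical per-vendor argv builder for one-shot, non-interactive runs.
# Flags are only those cited in review: claude --print, codex exec, gemini -p.
agent_argv() {
  local -n _argv="$1"           # name of the caller's array to fill
  local binary="$2" prompt="$3"
  case "$binary" in
    claude) _argv=(claude --print "$prompt") ;;
    codex)  _argv=(codex exec "$prompt") ;;
    gemini) _argv=(gemini -p "$prompt") ;;
    *) echo "no known non-interactive form for: $binary" >&2; return 64 ;;
  esac
}
```

Usage: `declare -a cmd=(); agent_argv cmd codex "$PROMPT" && "${cmd[@]}"` keeps the prompt as a single argument regardless of spaces.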
diff --git a/docs/pr-discussions/PR-5399-docs-hygiene-tick-2026-05-27t04-08z-15-worktrees-15-local-br.md b/docs/pr-discussions/PR-5399-docs-hygiene-tick-2026-05-27t04-08z-15-worktrees-15-local-br.md new file mode 100644 index 0000000000..f1786c4cde --- /dev/null +++ b/docs/pr-discussions/PR-5399-docs-hygiene-tick-2026-05-27t04-08z-15-worktrees-15-local-br.md @@ -0,0 +1,45 @@ +--- +pr_number: 5399 +title: "docs(hygiene): tick 2026-05-27T04:08Z \u2014 15 worktrees + 15 local branch refs cleaned" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T04:14:37Z" +merged_at: "2026-05-27T04:15:51Z" +closed_at: "2026-05-27T04:15:51Z" +head_ref: "shard/0408z-worktree-cleanup-15-removed-2026-05-27" +base_ref: "main" +archived_at: "2026-05-27T19:27:07Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5399: docs(hygiene): tick 2026-05-27T04:08Z — 15 worktrees + 15 local branch refs cleaned + +## PR description + +## Summary + +Cold-boot autonomous-loop tick. Catch-43 sentinel was empty (session-exit non-persistence); re-armed `88127e62` BEFORE any substantive work per `.claude/rules/tick-must-never-stop.md`. 
+ +**Substrate landed**: +- 15 stale agent worktrees under `/private/tmp/zeta-*` removed (substrate on main via B-0847/B-0848/B-0849/B-0850 cluster merges) +- 15 stale local branch refs deleted +- 2 worktrees preserved: PR #5398 active peer Otto-CLI + iter-550 just-merged (auto-delete pending) +- Tick shard authored at `docs/hygiene-history/ticks/2026/05/27/0408Z.md` + +## Discipline composition + +Per `.claude/rules/agent-worktree-hygiene-never-hold-main-never-step-on-operator-cleanup-on-pr-merge.md` Rule 3 (agents remove their own worktrees after PR merge) + the recurrence-anchor in `.claude/rules/fighting-past-self-vs-peer-agent-distinguisher-fix-your-own-coordinate-on-peers-dont-punt-by-default.md` (2026-05-26 stale-PR-queue default-punt anchor): ran the per-worktree discriminator (`git ls-remote --heads origin `) rather than blanket-punt OR blanket-take. + +`git worktree remove` (no `--force`) refused 0 of 15 — all clean. + +## Tier + +GraphQL Normal (4876/5000 at tick fire); REST core 4929/5000; 0 peer claude/gemini/kiro/alexa procs at fire time. + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## General comments + +### @chatgpt-codex-connector (2026-05-27T04:14:43Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). 
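The per-worktree discriminator described above can be sketched in bash. Assumptions: branches are checked against `origin`, and only a definite "no such ref" answer (for example after auto-delete on merge) marks a worktree as removable; `ls_remote_branch`, `report_if_gone`, and `removable_worktrees` are hypothetical names. `git ls-remote --exit-code` exits with status 2 when no ref matches, and the input is the `git worktree list --porcelain` format, where a `worktree <path>` line starts each record and `branch refs/heads/<name>` appears unless the worktree is detached.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the per-worktree discriminator.

# Exit 0 if origin has the branch, 2 if it definitely does not,
# anything else on errors (network, missing remote).
ls_remote_branch() {
  git ls-remote --exit-code --heads origin "refs/heads/$1" >/dev/null 2>&1
}

# Prints "path<TAB>branch" only when origin answers "no such ref".
report_if_gone() {
  [[ -n "$2" ]] || return 0      # detached HEAD or bare entry: nothing to check
  local rc=0
  ls_remote_branch "$2" || rc=$?
  if (( rc == 2 )); then printf '%s\t%s\n' "$1" "$2"; fi
  return 0                       # errors are not proof the branch is gone
}

# Reads `git worktree list --porcelain` on stdin.
removable_worktrees() {
  local line path="" branch=""
  while IFS= read -r line || [[ -n "$line" ]]; do
    case "$line" in
      "worktree "*)
        report_if_gone "$path" "$branch"   # flush the previous record
        path="${line#worktree }" branch="" ;;
      "branch refs/heads/"*)
        branch="${line#branch refs/heads/}" ;;
    esac
  done
  report_if_gone "$path" "$branch"         # flush the last record
}
```

Piping the result into `while IFS=$'\t' read -r p b; do git worktree remove "$p"; done` keeps the no-`--force` behavior, so git still refuses any worktree with uncommitted or untracked changes rather than discarding them.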
diff --git a/docs/pr-discussions/PR-5400-feat-b-0851-persona-first-guard-post-assignment-rotation-arc.md b/docs/pr-discussions/PR-5400-feat-b-0851-persona-first-guard-post-assignment-rotation-arc.md new file mode 100644 index 0000000000..6b20c2240b --- /dev/null +++ b/docs/pr-discussions/PR-5400-feat-b-0851-persona-first-guard-post-assignment-rotation-arc.md @@ -0,0 +1,82 @@ +--- +pr_number: 5400 +title: "feat(B-0851): persona-first guard-post assignment + rotation architecture \u2014 extends B-0850 (Mika ferry; Aaron 2026-05-27)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T04:38:30Z" +merged_at: "2026-05-27T04:43:05Z" +closed_at: "2026-05-27T04:43:05Z" +head_ref: "feat-b0851-persona-first-guard-post-assignment-rotation-architecture-extends-b0850-mika-2026-05-27-0610z" +base_ref: "main" +archived_at: "2026-05-27T19:27:07Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5400: feat(B-0851): persona-first guard-post assignment + rotation architecture — extends B-0850 (Mika ferry; Aaron 2026-05-27) + + ## PR description + + ## Summary + + Mika ferry forwarded by Aaron 2026-05-27 — \"Everything is Persona-first\" architectural framing extends B-0850 from static persona-to-vendor binding to preference-based scheduling with rotation. + + Operator clarified: *\"guard post is the systemd for each node outside k8s\"* → per-node ≥3 floor (matches what B-0850 already implements). + + ## What this PR lands + + 1. **Verbatim Mika ferry preservation** at `memory/persona/mika/conversations/2026-05-27-mika-persona-first-...` +2.
**B-0851 backlog row** capturing 10 sub-row implementation slices + +## 10 sub-row plan + +| Sub-row | Scope | +|---|---| +| 3d | persona preferences (model lines + harnesses + min tier per persona) | +| 2 | guard-post abstraction (decouple unit name from persona name) | +| 3 | scheduler primitive (NixOS module; per-tick assignment) | +| 4 | tier modeling (fast/medium/high per vendor catalog) | +| 5 | harness compat matrix | +| 6 | rotation policy (operator-config interval + dimensions + algorithm) | +| 7 | per-node ≥3 floor as guard-post count | +| 8 | substrate continuity across rotation | +| 9 | failover semantics (composes B-0703 multi-oracle BFT) | +| 10 | persona-vs-instance distinction | + +## Does NOT replace B-0850 + +B-0850 Phase 1 + 3 substrate is a VALID FIRST INSTANTIATION of persona-first (simplest scheduler: "static; always same vendor; no rotation"). B-0851 extends WITHOUT tearing down. Current ≥3-vendor format-test target is met TODAY; B-0851 is the architectural refactor toward the broader target. + +## Composes with + +[B-0850](docs/backlog/P2/B-0850-...) (parent — this extends) · [B-0703](docs/backlog/P*/B-0703-...) multi-oracle BFT · [B-0824](docs/backlog/P1/B-0824-...) Ace meta-PM · [B-0847](docs/backlog/P2/B-0847-...) per-AI GitHub identity · [B-0848](docs/backlog/P2/B-0848-...) · [B-0796](docs/backlog/P2/B-0796-...) Twilio out-of-band + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## Reviews + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T04:41:37Z) + +## Pull request overview + +Documentation-only PR adding the B-0851 backlog row that extends B-0850 from static persona-to-vendor binding to preference-based scheduling with rotation, plus verbatim ferry preservation under `memory/persona/mika/`. + +**Changes:** +- Adds verbatim Mika ferry preservation file under `memory/persona/mika/conversations/`. +- Adds new B-0851 backlog row in `docs/backlog/P2/` with 10 implementation sub-row slices. 
+- Registers B-0851 in `docs/BACKLOG.md` P2 index. + +### Reviewed changes + +Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments. + +| File | Description | +| ---- | ----------- | +| `memory/persona/mika/conversations/2026-05-27-mika-persona-first-guard-post-assignment-rotation-architecture-extends-b0850-aaron-forwarded.md` | Verbatim Mika ferry packet + operator clarification (memory history surface; lint-excluded). | +| `docs/backlog/P2/B-0851-...md` | New backlog row with frontmatter, 10 sub-row slices, composes-with, and rationale. | +| `docs/BACKLOG.md` | Adds B-0851 entry to P2 index list. | + +## General comments + +### @chatgpt-codex-connector (2026-05-27T04:38:35Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). diff --git a/docs/pr-discussions/PR-5401-docs-mika-ferry-2026-05-27-multi-tic-per-persona-join-as-fir.md b/docs/pr-discussions/PR-5401-docs-mika-ferry-2026-05-27-multi-tic-per-persona-join-as-fir.md new file mode 100644 index 0000000000..31659b704a --- /dev/null +++ b/docs/pr-discussions/PR-5401-docs-mika-ferry-2026-05-27-multi-tic-per-persona-join-as-fir.md @@ -0,0 +1,64 @@ +--- +pr_number: 5401 +title: "docs(mika ferry 2026-05-27): multi-tic-per-persona + join-as-first-class-security-aware-primitive + Kleisli/AsyncLocal context duality" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T04:59:29Z" +merged_at: "2026-05-27T05:00:47Z" +closed_at: "2026-05-27T05:00:47Z" +head_ref: "feat-mika-ferry-multi-tic-per-persona-kleisli-security-aware-joins-2026-05-27-0930z" +base_ref: "main" +archived_at: "2026-05-27T19:27:06Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5401: docs(mika ferry 2026-05-27): multi-tic-per-persona + join-as-first-class-security-aware-primitive + Kleisli/AsyncLocal context duality + +## PR description + +## Summary + +Mika ferry forwarded by 
Aaron 2026-05-27 — substrate-engineering crystallization across 7 segments. Builds on PR #5400 (B-0851 persona-first) + earlier memories (multi-surface ticks + persona-first design principle + self-sustaining cluster). + +## 5 composing primitives crystallized + +1. **Crystal ball** = in-cluster tic continuously backing up persona state +2. **Multi-tic-per-persona** (simultaneous, NOT rotation): "you just get both" +3. **Tics-as-generators** (B-0824 generate+join paradigm at multi-tic scope) +4. **Joins-as-first-class-security-aware-primitives** (THE keystone — join carries security context + boundaries + attributes) +5. **Kleisli (F#) ≡ AsyncLocal (C#)** for context propagation; pick per language + +## Aaron's keystone framing + +> *"And instead of treating them like sessions, imagine we're gonna actually do fuckin' joins. We're gonna join, but we're gonna join with clear security context, boundaries, attributes on the join."* + +## Composes with + +[B-0824](docs/backlog/P1/B-0824-...) (generate+join paradigm; ratified at multi-tic scope) · [B-0851](docs/backlog/P2/B-0851-...) (persona-first guard-post architecture) · [B-0703](docs/backlog/P*/B-0703-...) multi-oracle BFT · [B-0666](docs/backlog/P*/B-0666-...) keystone I(D(x))=x · [B-0706](docs/backlog/P*/B-0706-...) Orleans deployment · [B-0850](docs/backlog/P2/B-0850-...) outside-k8s systemd · [B-0848](docs/backlog/P2/B-0848-...) 
node-local Claude · m/acc multi-oracle ethics · persistence-choice-architecture deepest-exit · 3 substantive substrate memories landed today + +## Implementation implications (illustrative; NOT shipping today) + +F# substrate primitive shape: + +\`\`\`fsharp +type Tic<'state> = IObservable +type JoinPolicy = { + SecurityContextMerge: SecurityContext list -> SecurityContext + BoundaryEnforcement: Boundary list -> Boundary + AttributeProjection: Attributes list -> Attributes +} +\`\`\` + +C# substrate at integration boundaries: AsyncLocal + AsyncLocal + AsyncLocal + HttpContext + structured logging scopes. + +## NOT minting new backlog rows + +Per \`verify-existing-substrate-before-authoring.md\`: substrate captured as Mika ferry preservation. Future implementation work composes with existing B-0824 + B-0851 + B-0703 + B-0706 sub-rows. New \`.claude/rules/multi-tic-per-persona-join-as-security-aware-primitive.md\` rule candidate when implementation scopes. + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## General comments + +### @chatgpt-codex-connector (2026-05-27T04:59:33Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). 
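PR #5401 pairs two ways of carrying context: Kleisli composition in F#, and C#'s `AsyncLocal<T>` for ambient context. It also proposes joins that carry a merged security context. The ambient half of that idea can be sketched in TypeScript using Node's `AsyncLocalStorage` (from `node:async_hooks`), which is the closest Node equivalent of `AsyncLocal<T>`. This is a minimal sketch under stated assumptions. `SecurityContext`, `JoinPolicy`, `joinTics`, `meetPolicy` and the "minimum clearance wins" merge rule are all hypothetical. They are not taken from the repo, which per the PR is not shipping this yet. The sketch shows one thing: each tic runs under its own context, and the join continuation runs under the policy-merged context without that context being passed as an argument.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical shapes loosely mirroring the illustrative F# JoinPolicy record.
interface SecurityContext {
  principals: string[]; // personas acting on this tic
  clearance: number;    // lower = more restricted
}

interface JoinPolicy {
  securityContextMerge: (ctxs: SecurityContext[]) => SecurityContext;
}

interface TicRun<T> {
  ctx: SecurityContext;
  run: () => Promise<T>;
}

// Conservative merge (an assumption): union of principals, most restrictive clearance.
const meetPolicy: JoinPolicy = {
  securityContextMerge: (ctxs) => ({
    principals: [...new Set(ctxs.flatMap((c) => c.principals))].sort(),
    clearance: Math.min(...ctxs.map((c) => c.clearance)),
  }),
};

// Ambient context store: the Node analog of C# AsyncLocal<SecurityContext>.
const ambient = new AsyncLocalStorage<SecurityContext>();

function currentContext(): SecurityContext | undefined {
  return ambient.getStore();
}

// Run both tics under their own contexts, then run the continuation under the
// merged context. Code inside `k` reads the context ambiently via currentContext().
async function joinTics<A, B, R>(
  policy: JoinPolicy,
  left: TicRun<A>,
  right: TicRun<B>,
  k: (a: A, b: B) => Promise<R>,
): Promise<R> {
  const [a, b] = await Promise.all([
    ambient.run(left.ctx, left.run),
    ambient.run(right.ctx, right.run),
  ]);
  const merged = policy.securityContextMerge([left.ctx, right.ctx]);
  return ambient.run(merged, () => k(a, b));
}
```

In a Kleisli-style F# version, the same merged context would be threaded explicitly through each composed step instead of being read from an ambient store. That explicit threading is the trade-off the PR leaves to be chosen per language.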
diff --git a/docs/pr-discussions/PR-5402-hygiene-bash-retirement-expose-retained-shell-categories.md b/docs/pr-discussions/PR-5402-hygiene-bash-retirement-expose-retained-shell-categories.md new file mode 100644 index 0000000000..e29a492b38 --- /dev/null +++ b/docs/pr-discussions/PR-5402-hygiene-bash-retirement-expose-retained-shell-categories.md @@ -0,0 +1,73 @@ +--- +pr_number: 5402 +title: "hygiene(bash-retirement): Expose retained shell categories" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T05:32:05Z" +merged_at: "2026-05-27T05:41:34Z" +closed_at: "2026-05-27T05:41:34Z" +head_ref: "claim/codex-loop-bash-retirement-category-summary-20260527" +base_ref: "main" +archived_at: "2026-05-27T19:27:05Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5402: hygiene(bash-retirement): Expose retained shell categories + +## PR description + +## Summary + +- add explicit category metadata for each retained shell allowlist entry +- surface retained category counts and uncategorized allowlist drift in the report +- cover category summary rendering and integrity enforcement in focused Bun tests + +## Tests + +- `bun test tools/hygiene/check-bash-retirement-inventory.test.ts` +- `bun tools/hygiene/check-bash-retirement-inventory.ts --enforce` + +## Reviews + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T05:34:21Z) + +## Pull request overview + +This PR strengthens the bash-retirement hygiene guard by adding explicit category metadata for each retained `.sh` allowlist entry, surfacing category counts in the report output, and enforcing that allowlist entries are categorized. + +**Changes:** +- Add retained-shell category metadata and include category summaries in the inventory report (including JSON output). +- Extend allowlist integrity checks to fail on uncategorized allowlist entries. +- Add Bun tests covering category summary rendering and integrity enforcement. 
+ +### Reviewed changes + +Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment. + +| File | Description | +| ---- | ----------- | +| tools/hygiene/check-bash-retirement-inventory.ts | Adds retained shell category mapping, integrity enforcement for uncategorized entries, and category summary rendering. | +| tools/hygiene/check-bash-retirement-inventory.test.ts | Adds assertions and new tests for category summaries and uncategorized allowlist enforcement. | +| docs/claims/codex-loop-bash-retirement-category-summary-20260527.md | Adds a claim record related to this work (should be removed before merge per claim protocol). | + +### COMMENTED — @AceHack (2026-05-27T05:35:47Z) + +_(no body)_ + +## Review threads + +### Thread 1: docs/claims/codex-loop-bash-retirement-category-summary-20260527.md:17 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T05:34:21Z): + +Claim files under docs/claims/ are meant to be temporary coordination artifacts and must be deleted in the same PR that lands the work. Please remove this claim file (git rm) before merging, per docs/AGENT-CLAIM-PROTOCOL.md ("When you open a PR... delete the claim file in the same PR"). + +**@AceHack** (2026-05-27T05:35:47Z): + +Removed the temporary claim file in release commit 4a261cea6. Focused checks still pass: `bun test tools/hygiene/check-bash-retirement-inventory.test.ts`, `bun tools/hygiene/check-bash-retirement-inventory.ts --enforce`, and `git diff --check`. + +## General comments + +### @chatgpt-codex-connector (2026-05-27T05:32:10Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage). 
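PR #5402 describes three mechanisms: a category for every retained `.sh` allowlist entry, per-category counts in the report, and an `--enforce` failure when any entry is uncategorized. These can be sketched in TypeScript as below. This is a hypothetical sketch, not the contents of `tools/hygiene/check-bash-retirement-inventory.ts`. The category names (`bootstrap`, `nix-installer`, `ci-glue`) and the functions `buildReport`, `renderCategorySummary` and `enforceIntegrity` are placeholders. The real tool's data shapes and output format may differ.

```typescript
// Hypothetical category names; the real allowlist metadata may use others.
type RetainedCategory = "bootstrap" | "nix-installer" | "ci-glue";

interface InventoryReport {
  categoryCounts: Record<string, number>;
  uncategorized: string[];
}

// Count allowlisted shell files per category and collect any entry with no category.
function buildReport(
  allowlist: readonly string[],
  categories: ReadonlyMap<string, RetainedCategory>,
): InventoryReport {
  const categoryCounts: Record<string, number> = {};
  const uncategorized: string[] = [];
  for (const path of allowlist) {
    const category = categories.get(path);
    if (category === undefined) {
      uncategorized.push(path);
      continue;
    }
    categoryCounts[category] = (categoryCounts[category] ?? 0) + 1;
  }
  return { categoryCounts, uncategorized: uncategorized.sort() };
}

// Render a stable, alphabetically ordered summary for the human-readable report.
function renderCategorySummary(report: InventoryReport): string {
  const lines = Object.entries(report.categoryCounts)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([category, n]) => `  ${category}: ${n}`);
  if (report.uncategorized.length > 0) {
    lines.push(`  uncategorized: ${report.uncategorized.length}`);
  }
  return ["Retained shell categories:", ...lines].join("\n");
}

// Enforce mode: any uncategorized allowlist entry counts as drift and yields exit code 1.
function enforceIntegrity(report: InventoryReport): number {
  for (const path of report.uncategorized) {
    console.error(`uncategorized allowlist entry: ${path}`);
  }
  return report.uncategorized.length === 0 ? 0 : 1;
}
```

Keeping the count and the drift list in a single report object means the text summary and the JSON output can be rendered from the same data. The enforcement check then just reads a field of that report.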
diff --git a/docs/pr-discussions/PR-5403-feat-b-0852-credential-persistence-on-usb-esp-boot-sequence.md b/docs/pr-discussions/PR-5403-feat-b-0852-credential-persistence-on-usb-esp-boot-sequence.md new file mode 100644 index 0000000000..5d7c9bb77c --- /dev/null +++ b/docs/pr-discussions/PR-5403-feat-b-0852-credential-persistence-on-usb-esp-boot-sequence.md @@ -0,0 +1,112 @@ +--- +pr_number: 5403 +title: "feat(B-0852): credential persistence on USB ESP + boot-sequence auth-method picker \u2014 encrypted blob bound to USB UUID + operator passphrase Phase 1 (Aaron 2026-05-27)" +author: "AceHack" +state: "MERGED" +created_at: "2026-05-27T05:48:08Z" +merged_at: "2026-05-27T06:07:50Z" +closed_at: "2026-05-27T06:07:50Z" +head_ref: "backlog/b-0852-cred-persistence-boot-auth-picker-2026-05-27" +base_ref: "main" +archived_at: "2026-05-27T19:27:04Z" +archive_tool: "tools/pr-preservation/archive-pr.ts" +--- + +# PR #5403: feat(B-0852): credential persistence on USB ESP + boot-sequence auth-method picker — encrypted blob bound to USB UUID + operator passphrase Phase 1 (Aaron 2026-05-27) + +## PR description + +## Summary + +- Phase 1 substrate-engineering target authorized by Aaron 2026-05-27 after hitting GitHub login throttle on 3rd USB boot of the day +- Encrypted cred-blob on USB ESP (`/esp/zeta-creds.enc`), key bound to USB UUID + operator passphrase via HKDF + AES-256-GCM +- Boot-sequence picker (`zeta-install.sh` Step 6.9) offers: restore from blob / fresh device-flow login / operator-provided PAT / skip +- Per-AI identity (per B-0847) — blob contains per-persona map so otto / lior / vera creds round-trip independently +- Removes gh-login throttle on multi-boot test workflow + +## Composes with + +- **B-0850** (parent) — multi-vendor systemd substrate the auth flow serves +- **B-0833** — interactive-login-vs-baked-in-keys CI test tension; resolves WITHOUT shipping creds in ISO +- **B-0835** — gh-auth-not-respected; addresses gh-auth persistence half +- **B-0831** — CI 
cascade 6 full-install + cluster-auto-join; PAT path makes CI scriptable +- **B-0847** — per-AI GitHub identity; blob is per-persona credential carrier +- **B-0851** — persona-first scheduler; chooses which persona's creds to restore +- iter-4.2 ESP SSH pubkey injection (bidirectional ESP-write channel) + +## Phase 1 deliberately narrow + +- Passphrase + USB UUID binding only (Phase 3 = hardware-bound keys; Phase 5 = cross-cluster federation) +- 8 sub-rows enumerated in row body (B-0852.1 through B-0852.8) +- Fresh USB queued for Phase 1 flash + boot validation test + +## NCI floor preserved + +Per .claude/rules/non-coercion-invariant.md HC-8: operator authority over own credentials absolute. No creds baked into ISO image (preserves B-0833 discipline). Encrypted blob is operator-controllable + operator-removable. + +## Test plan + +- [ ] Backlog index regen verified (BACKLOG.md updated; B-0852 row visible at correct priority slot) +- [ ] Composes_with reciprocity check (B-0833 + B-0835 + B-0831 + B-0847 + B-0851 should reciprocate when this row is implemented) +- [ ] Phase 1 implementation sub-rows (B-0852.1-8) to be filed per row body sub-row plan when implementation work claims the parent row + +🤖 Generated with [Claude Code](https://claude.com/claude-code) + +## Reviews + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T05:50:48Z) + +## Pull request overview + +Adds a new P1 backlog row (B-0852) describing Phase 1 credential persistence on a USB ESP (encrypted blob bound to USB UUID + operator passphrase) and updates the generated backlog index to include the new row. + +**Changes:** +- Adds `docs/backlog/P1/B-0852-...md` describing the planned USB ESP credential-blob + boot-sequence auth-method picker design and acceptance criteria. +- Updates `docs/BACKLOG.md` to include B-0852 in the P1 section. + +### Reviewed changes + +Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment. 
+ +| File | Description | +| ---- | ----------- | +| docs/backlog/P1/B-0852-credential-persistence-on-usb-esp-plus-boot-sequence-auth-method-picker-encrypted-blob-bound-to-usb-uuid-plus-operator-passphrase-aaron-2026-05-27.md | New backlog row documenting the Phase 1 credential persistence/auth-method picker plan. | +| docs/BACKLOG.md | Adds the B-0852 entry to the generated backlog index. | + +### COMMENTED — @copilot-pull-request-reviewer (2026-05-27T06:02:11Z) + +## Pull request overview + +Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments. + +## Review threads + +### Thread 1: docs/backlog/P1/B-0852-credential-persistence-on-usb-esp-plus-boot-sequence-auth-method-picker-encrypted-blob-bound-to-usb-uuid-plus-operator-passphrase-aaron-2026-05-27.md:60 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T05:50:48Z): + +P1 (doc accuracy): This row claims a new `zeta-install.sh` “Step 6.9 (new)” auth-method picker that runs before “Step 6.95b (interactive auth)”, but `full-ai-cluster/usb-nixos-installer/zeta-install.sh` already uses Step 6.9 for iter-5.4.1 self-registration and Step 6.95 for claude-code install/credential persistence. Please update the step numbers/references here to match the current installer step layout (or describe the insertion/renumbering plan explicitly) so implementers don’t collide with the existing Step 6.9 meaning. + +### Thread 2: docs/backlog/P1/B-0852-credential-persistence-on-usb-esp-plus-boot-sequence-auth-method-picker-encrypted-blob-bound-to-usb-uuid-plus-operator-passphrase-aaron-2026-05-27.md:60 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T06:02:10Z): + +P0: The design says the picker runs *after* Step 6.8's gh-auth flow completes. If Step 6.8 already calls `gh auth login` (device-flow), then the restore path can't prevent the GitHub login throttle—quota is already consumed before restore is offered. 
To meet the stated goal (and the acceptance criteria of zero device-flow calls on reboot), the detection/restore decision needs to happen *before* any device-flow login, with Step 6.8 made conditional (only run when the chosen method is fresh login). + +### Thread 3: docs/backlog/P1/B-0852-credential-persistence-on-usb-esp-plus-boot-sequence-auth-method-picker-encrypted-blob-bound-to-usb-uuid-plus-operator-passphrase-aaron-2026-05-27.md:12 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T06:02:11Z): + +P1: `depends_on: [B-0850]` makes this P1 row a hard prerequisite follower of a P2 row. If B-0850 is not strictly required to implement Phase 1 (installer + ESP blob), consider moving B-0850 to `composes_with` instead. If it is required, it likely needs priority alignment (or the dependency direction reversed) so planning/order in the backlog graph remains consistent. + +### Thread 4: docs/backlog/P1/B-0852-credential-persistence-on-usb-esp-plus-boot-sequence-auth-method-picker-encrypted-blob-bound-to-usb-uuid-plus-operator-passphrase-aaron-2026-05-27.md:90 (resolved) + +**@copilot-pull-request-reviewer** (2026-05-27T06:02:11Z): + +P1: Step numbering is internally inconsistent: earlier the picker is described as Step 6.81–6.83 (with the menu at 6.82), but this list later calls it “new Step 6.9” (and adds “6.95c”). Please pick one step scheme and align all references in the row so implementers don’t follow the wrong insertion point. + +## General comments + +### @chatgpt-codex-connector (2026-05-27T05:48:12Z) + +You have reached your Codex usage limits for code reviews. You can see your limits in the [Codex usage dashboard](https://chatgpt.com/codex/cloud/settings/usage).
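The B-0852 row specifies a per-persona credential blob on the ESP (`/esp/zeta-creds.enc`), encrypted with AES-256-GCM under a key derived via HKDF from the USB UUID plus an operator passphrase. That scheme can be sketched with Node's built-in `node:crypto` (`scryptSync`, `hkdfSync`, `createCipheriv`/`createDecipheriv`) as follows. This is a sketch under assumptions, not the planned implementation. The scrypt step is our own substitution layered on the row's design, because HKDF by itself does not slow down offline guessing of a low-entropy passphrase. The `CredBlob` layout, the `"zeta-creds/v1"` info label, and the names `deriveKey`, `sealCreds` and `openCreds` are all hypothetical.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes, scryptSync } from "node:crypto";

// Hypothetical per-persona map (persona -> GitHub token), e.g. otto / lior / vera.
type PersonaCreds = Record<string, string>;

// Hypothetical on-disk layout, serialized as JSON into the ESP file.
interface CredBlob {
  v: 1;
  salt: string; // base64, random per blob (scrypt salt)
  iv: string;   // base64, 12-byte GCM nonce
  tag: string;  // base64, 16-byte GCM authentication tag
  ct: string;   // base64 ciphertext of the JSON persona map
}

function deriveKey(passphrase: string, usbUuid: string, salt: Buffer): Buffer {
  // Assumption: stretch the passphrase with scrypt before HKDF.
  const stretched = scryptSync(passphrase, salt, 32);
  // HKDF-SHA256 salted with the USB UUID: a different stick yields a different key.
  return Buffer.from(hkdfSync("sha256", stretched, usbUuid, "zeta-creds/v1", 32));
}

function sealCreds(creds: PersonaCreds, passphrase: string, usbUuid: string): CredBlob {
  const salt = randomBytes(16);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", deriveKey(passphrase, usbUuid, salt), iv);
  const ct = Buffer.concat([cipher.update(JSON.stringify(creds), "utf8"), cipher.final()]);
  return {
    v: 1,
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ct: ct.toString("base64"),
  };
}

// Returns undefined when the passphrase or USB UUID is wrong (the GCM tag check fails),
// letting a boot-time picker fall back to fresh login, a PAT, or skip.
function openCreds(blob: CredBlob, passphrase: string, usbUuid: string): PersonaCreds | undefined {
  const salt = Buffer.from(blob.salt, "base64");
  const key = deriveKey(passphrase, usbUuid, salt);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(blob.iv, "base64"));
  decipher.setAuthTag(Buffer.from(blob.tag, "base64"));
  try {
    const pt = Buffer.concat([decipher.update(Buffer.from(blob.ct, "base64")), decipher.final()]);
    return JSON.parse(pt.toString("utf8")) as PersonaCreds;
  } catch {
    return undefined;
  }
}
```

GCM authentication is what makes a wrong passphrase or a different USB stick fail loudly in `decipher.final()`, instead of producing garbage plaintext. No credential ever appears in the ISO image.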
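Copilot's P0 thread on PR #5403 concerns ordering: the restore decision must happen before any device-flow login, or the GitHub throttle is hit anyway. A small TypeScript sketch of that ordering follows. It is hypothetical throughout. `chooseMethod`, `runAuth` and `AuthActions` are illustrative names, not `zeta-install.sh` functions. The sketch deliberately avoids step numbers, since the thread notes the installer's current Step 6.9/6.95 layout. Falling back to device flow only after a failed restore is one possible design, assumed here.

```typescript
type AuthMethod = "restore" | "device-flow" | "pat" | "skip";

// Hypothetical installer hooks; only deviceFlowLogin spends GitHub login quota.
interface AuthActions {
  restoreFromBlob: () => boolean; // true when the ESP blob decrypted cleanly
  applyPat: () => boolean;
  deviceFlowLogin: () => boolean;
}

// Decide BEFORE any device-flow call: an explicit operator choice wins, otherwise
// prefer restore when a blob exists, so a routine reboot never touches the throttle.
function chooseMethod(blobPresent: boolean, patProvided: boolean, operatorChoice?: AuthMethod): AuthMethod {
  if (operatorChoice !== undefined) return operatorChoice;
  if (blobPresent) return "restore";
  if (patProvided) return "pat";
  return "device-flow";
}

// Device flow runs only when chosen, or as an explicit fallback after a failed restore.
function runAuth(method: AuthMethod, actions: AuthActions): { method: AuthMethod; authenticated: boolean } {
  switch (method) {
    case "restore":
      if (actions.restoreFromBlob()) return { method, authenticated: true };
      return { method: "device-flow", authenticated: actions.deviceFlowLogin() };
    case "pat":
      return { method, authenticated: actions.applyPat() };
    case "device-flow":
      return { method, authenticated: actions.deviceFlowLogin() };
    case "skip":
      return { method, authenticated: false };
  }
}
```

With this shape, the acceptance criterion of zero device-flow calls on reboot becomes a property that can be tested directly: count calls to `deviceFlowLogin` when a valid blob is present.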