backlog(iter-6): 6-row cluster-update substrate (nixpkgs 24.11→25.11 + autoUpgrade + kured + deploy-rs + runbook + ALL-deps capstone) #5123
…2/0803/0804/0805) — nixpkgs 24.11→25.11 bump + autoUpgrade + kured + deploy-rs + distro-runbook + ALL-deps-current-sweep capstone
On 2026-05-26 the maintainer directed three substrate-engineering pulls in
this session that compose into the iter-6 cluster-update arc:
(1) "is there a 25 we should go ahead and distro upgrade we don't want to be
behind search for latest we like to be on latest deps and don't start
behind from the beginning"
(2) "lets backlog all that we need to be able to upgrade without having to
reformat every time or if we reformat everytime it's handled by the
cluster not a manual operator"
(3) "we need to do that same thing to all our nix installed deps and argocd
deps cause you are not good at getting current version"
WebSearch surfaced: NixOS 25.11 "Xantusia", released 2025-11-30, is the
current stable release (EOL 2026-06-30). Our pin `nixos-24.11` has been past
EOL since 2025-06-30: substantively behind, with supply-chain-security
exposure.
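The EOL catch above is a date comparison that tooling could make instead of memory. A minimal sketch, assuming a hand-maintained table of channel EOL dates (the two dates are the ones cited in this PR; `CHANNEL_EOL` and `daysPastEol` are hypothetical names, not existing repo tooling):

```typescript
// Hypothetical EOL table; both dates are the ones cited in this PR.
const CHANNEL_EOL: Record<string, string> = {
  "nixos-24.11": "2025-06-30",
  "nixos-25.11": "2026-06-30",
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days `channel` has been past EOL as of `today` (0 while still supported).
function daysPastEol(channel: string, today: string): number {
  const eol: string | undefined = CHANNEL_EOL[channel];
  if (eol === undefined) {
    // Unknown channel: refuse to guess; look the date up first.
    throw new Error(`no EOL date recorded for ${channel}`);
  }
  // Parse both dates as UTC midnight so the difference is a whole number of days.
  const diff = Date.parse(`${today}T00:00:00Z`) - Date.parse(`${eol}T00:00:00Z`);
  return Math.max(0, Math.floor(diff / MS_PER_DAY));
}

for (const channel of Object.keys(CHANNEL_EOL)) {
  const late = daysPastEol(channel, "2026-05-26");
  console.log(late === 0 ? `${channel}: supported` : `${channel}: ${late} days past EOL`);
}
```

For 2026-05-26 this reports `nixos-24.11` at 330 days past EOL, roughly the "~11 months out-of-support" figure in the follow-up bump commit, and `nixos-25.11` as supported.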
Rows filed:
- **B-0800** (P1) — iter-6.0 — bump nixpkgs 24.11→25.11 (urgent: EOL recovery)
- **B-0801** (P2) — iter-6.1 — system.autoUpgrade in nixos/modules/common.nix
- **B-0802** (P2) — iter-6.2 — kured ArgoCD app (K8s-aware drain+reboot)
- **B-0803** (P2) — iter-6.3 — deploy-rs from CI (GitOps alt to autoUpgrade)
- **B-0804** (P2) — iter-6.4 — distro-upgrade runbook + orchestrator script
- **B-0805** (P1) — iter-6.5 (CAPSTONE) — ALL nix + ArgoCD + helm + image deps
current-version sweep + .claude/rules/dep-pin-search-first-authority.md to
encode the agent-side discipline so the gap doesn't re-open (the maintainer's
substrate-honest catch that Otto's training-data defaults skew stale)
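The capstone's currency sweep boils down to comparing each pin against a freshly looked-up latest version. A minimal sketch, assuming the `latest` values come from an authoritative lookup (per the search-first rule) rather than model memory; the `Pin` shape and the `compareVersions` / `stalePins` helpers are hypothetical, and the comparison only handles dotted numeric versions such as `24.11` or `v1.8.2` (no pre-release tags):

```typescript
// Hypothetical record for one dependency pin in the sweep.
interface Pin {
  name: string;   // e.g. "nixpkgs", a Helm chart, or a container image
  pinned: string; // version currently committed in the repo
  latest: string; // version from an authoritative lookup, never from memory
}

// Compare dotted numeric versions such as "24.11" or "v1.8.2".
// Returns <0 if a is older, 0 if equal, >0 if a is newer.
function compareVersions(a: string, b: string): number {
  const parse = (v: string): number[] => v.replace(/^v/, "").split(".").map(Number);
  const pa = parse(a);
  const pb = parse(b);
  if (pa.some(Number.isNaN) || pb.some(Number.isNaN)) {
    throw new Error(`non-numeric version: ${a} vs ${b}`);
  }
  // Missing trailing segments count as 0, so "25.11" equals "25.11.0".
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Every pin that is behind its latest release: the sweep's "not current" list.
function stalePins(pins: Pin[]): Pin[] {
  return pins.filter((p) => compareVersions(p.pinned, p.latest) < 0);
}

console.log(stalePins([{ name: "nixpkgs", pinned: "24.11", latest: "25.11" }]));
```

Comparing segments numerically matters here: a plain string compare would rank `1.10` below `1.9`.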
All 6 rows cross-reference via composes_with; index regenerated. P1 rows are
the urgent ones (EOL recovery + agent-discipline encoding); P2 rows are the
substrate-engineering for "cluster handles updates, not manual operator".
Pick-one decision (autoUpgrade XOR deploy-rs) documented in both B-0801 +
B-0803; B-0802 kured composes with either shape.
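The pick-one decision is mechanical enough to enforce in a check. A minimal sketch, assuming a hypothetical per-cluster `UpdateConfig` summary (not an existing file in the repo); kured is left out of the rule because B-0802 composes with either shape:

```typescript
// Hypothetical summary of which update paths a cluster config turns on.
interface UpdateConfig {
  autoUpgrade: boolean; // B-0801: system.autoUpgrade on each node
  deployRs: boolean;    // B-0803: deploy-rs pushed from CI
  kured: boolean;       // B-0802: drain-aware reboots; allowed with either
}

type Verdict =
  | { ok: true; shape: "autoUpgrade" | "deploy-rs" }
  | { ok: false; reason: string };

// Enforce autoUpgrade XOR deploy-rs; kured is deliberately not checked.
function checkUpdateShape(cfg: UpdateConfig): Verdict {
  if (cfg.autoUpgrade && cfg.deployRs) {
    return { ok: false, reason: "autoUpgrade and deploy-rs both enabled; pick one" };
  }
  if (!cfg.autoUpgrade && !cfg.deployRs) {
    return { ok: false, reason: "neither autoUpgrade nor deploy-rs enabled" };
  }
  return { ok: true, shape: cfg.autoUpgrade ? "autoUpgrade" : "deploy-rs" };
}

console.log(checkUpdateShape({ autoUpgrade: true, deployRs: false, kured: true }));
```

Treating "neither" as a failure follows from the goal that the cluster, not a manual operator, handles updates.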
Co-Authored-By: Claude <noreply@anthropic.com>
Markdownlint surfaced 5 MD032 errors ("Lists should be surrounded by blank lines") across B-0801 / B-0803 / B-0804. Inserted a blank line between each "intro sentence:" and the following bullet list. No content change; lint-only fix.
Co-Authored-By: Claude <noreply@anthropic.com>
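The MD032 fix above is mechanical. A simplified sketch of it (a hypothetical `padListsAbove` helper, not markdownlint's own fixer): insert a blank line wherever a list item sits directly under an unindented paragraph line. It ignores fenced code blocks and covers only the "blank line above" half of MD032, which is the half this commit fixed:

```typescript
// A bullet or ordered list item, e.g. "- foo", "* foo", "1. foo".
const LIST_ITEM = /^\s*([-*+]|\d+[.)])\s+/;

// Insert a blank line wherever a list starts directly under an unindented
// paragraph line (the "intro sentence:" case). Indented continuation lines
// inside a list are left alone so tight lists stay tight.
function padListsAbove(markdown: string): string {
  const out: string[] = [];
  for (const line of markdown.split("\n")) {
    const prev = out.length > 0 ? out[out.length - 1] : "";
    const prevIsParagraph =
      prev.trim() !== "" && !LIST_ITEM.test(prev) && !/^\s/.test(prev);
    if (LIST_ITEM.test(line) && prevIsParagraph) {
      out.push("");
    }
    out.push(line);
  }
  return out.join("\n");
}

console.log(padListsAbove("Sub-targets:\n- enable autoUpgrade\n- document rollback"));
```

The helper is idempotent: running it on already-padded text changes nothing.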
Pull request overview
Files a six-row iter-6 backlog cluster (B-0800–B-0805) for full-ai-cluster OS lifecycle: an urgent nixpkgs/nix-darwin 24.11→25.11 EOL-recovery bump, system.autoUpgrade enablement, kured ArgoCD app for drain-aware reboots, deploy-rs-from-CI as an alternative push-shape, a cross-channel distro-upgrade runbook + orchestrator, and a capstone all-deps currency audit + agent search-first-authority rule. Pure docs change — six new per-row markdown files plus the auto-regenerated docs/BACKLOG.md index entries.
Changes:
- Add six new backlog rows under `docs/backlog/P1/` and `docs/backlog/P2/` covering the iter-6 cluster-update substrate, with `depends_on`/`composes_with` edges wired across the cluster.
- Regenerate `docs/BACKLOG.md` to surface the new entries under P1 and P2.
- Encode an autoUpgrade-XOR-deploy-rs design constraint and a capstone agent-discipline rule (`.claude/rules/dep-pin-search-first-authority.md`) to land later.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docs/backlog/P1/B-0800-…md | P1 row: bump nixpkgs/nix-darwin pin 24.11→25.11 (EOL recovery) |
| docs/backlog/P1/B-0805-…md | P1 capstone row: all-deps currency audit tool + agent search-first-authority rule |
| docs/backlog/P2/B-0801-…md | P2 row: enable system.autoUpgrade in common.nix |
| docs/backlog/P2/B-0802-…md | P2 row: kured ArgoCD app for K8s-aware drain+reboot |
| docs/backlog/P2/B-0803-…md | P2 row: deploy-rs from CI as alternative push-shape |
| docs/backlog/P2/B-0804-…md | P2 row: distro-upgrade runbook + Bun orchestrator |
| docs/BACKLOG.md | Regenerated index entries for the six new rows |
…dup key + contradictory flags + sentinel path + unclosed quote
All 5 caught by Copilot post-arming review, all real:
(1) B-0803 broken link: linked `../P2/B-0794-iter-5-4-...` but B-0794
actually lives at `docs/backlog/P1/B-0794-node-self-registers-in-git-...`.
Fixed link target.
(2) B-0802 duplicate `configuration:` YAML key: two top-level
`configuration:` mappings in the example Helm values; YAML silently
keeps only the second, so rebootDays/startTime/endTime/timeZone/
rebootSentinel would be dropped if copy-pasted. Merged into a single
`configuration:` block.
(3) B-0802 reboot sentinel path: example used `/var/run/reboot-required`
(Debian-ism) while sub-target note said `/run/reboot-required`
(NixOS-actual). Aligned to `/run/reboot-required`; added inline
comment naming the Debian-ism trap.
(4) B-0801 contradictory autoUpgrade flags: `--commit-lock-file` +
`--no-write-lock-file` are mutually exclusive (no-write means
no lock to commit). Dropped `--commit-lock-file`; rewrote the
explanatory note to match (the cluster has no repo write creds
anyway; lock updates ship from CI per B-0803).
(5) B-0804 unclosed quote in title: opened `"if we reformat every time
...` but never closed it. Fixed by appending the closing quote plus
attribution (`" per the maintainer 2026-05-26`); the fix propagates to
BACKLOG.md correctly.
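Catches (2) to (4) are mechanical and could be re-checked before review. A rough sketch, assuming example snippets are checked as plain text; the helper names and heuristics (column-0 YAML keys only, a fixed list of exclusive flag pairs) are hypothetical simplifications, not a YAML parser or an existing repo tool:

```typescript
// (2) Duplicate column-0 YAML keys: common YAML loaders keep only the last
// mapping, so the earlier one's settings vanish. Heuristic only; a real check
// would use a YAML parser configured to reject duplicate keys.
function duplicateTopLevelKeys(yaml: string): string[] {
  const seen = new Set<string>();
  const dups = new Set<string>();
  for (const line of yaml.split("\n")) {
    const key = /^([A-Za-z_][\w.-]*)\s*:/.exec(line)?.[1];
    if (key === undefined) continue; // indented, comment, or non-key line
    if (seen.has(key)) dups.add(key);
    seen.add(key);
  }
  return Array.from(dups);
}

// (3) The row settles on /run/reboot-required; flag the Debian-style spelling.
function usesDebianSentinel(text: string): boolean {
  return text.includes("/var/run/reboot-required");
}

// (4) Flag pairs that contradict each other within a single invocation.
const EXCLUSIVE_FLAGS: ReadonlyArray<readonly [string, string]> = [
  ["--commit-lock-file", "--no-write-lock-file"],
];

function conflictingFlags(flags: string[]): Array<readonly [string, string]> {
  return EXCLUSIVE_FLAGS.filter(([a, b]) => flags.includes(a) && flags.includes(b));
}

console.log(duplicateTopLevelKeys("configuration:\n  rebootDays: [sa]\nconfiguration:\n  timeZone: UTC\n"));
```

Catches (1) and (5) (link targets, quote balance) need different checks and are out of scope for this sketch.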
Index regenerated.
Co-Authored-By: Claude <noreply@anthropic.com>
…n 2025) to 25.11 'Xantusia' (current stable), per the maintainer's 2026-05-26 EOL recovery catch (#5218)

The maintainer 2026-05-26: "24.11 is a 2 year old version you found a 25.11 when you searched latest we need to make sure we are on latest too".

Per WebSearch (per `.claude/rules/dep-pin-search-first-authority.md`):
- NixOS 25.11 "Xantusia": current stable; released 2025-11-30; EOL 2026-06-30 per https://nixos.org/blog/announcements/2025/nixos-2511/
- Our pin `nixos-24.11` had been EOL since 2025-06-30 (~11 months out of support): a substantive supply-chain-security gap.

Changes (all 5 24.11 references in source bumped to 25.11; no behavioral change beyond the channel bump):

(1) full-ai-cluster/flake.nix:
- nixpkgs.url: nixos-24.11 → nixos-25.11 (with inline WebSearch citation comment for future-Otto reference)
- nix-darwin.url: nix-darwin-24.11 → nix-darwin-25.11 (matching release branch)
- stateVersion default: "24.11" → "25.11" (PC1 + future cluster nodes are fresh installs per the maintainer, with no persistent K8s workloads yet → safe to bump; already-installed hosts should NOT bump per-host stateVersion without an explicit migration)

(2) full-ai-cluster/usb-nixos-installer/flake.nix:
- nixpkgs.url + stateVersion: matching bumps

(3) full-ai-cluster/nixos/modules/common.nix:
- stateVersion ? "24.11" → "25.11" (default fallback for new hosts)

(4) full-ai-cluster/nixos/hosts/worker-template/default.nix:
- system.stateVersion: "24.11" → "25.11"

(5) full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix:
- system.stateVersion: "24.11" → "25.11"

(6) full-ai-cluster/README.md + tools/zflash.ts:
- nix-darwin-24.11 → nix-darwin-25.11
- zeta-installer-24.11.iso → zeta-installer-25.11.iso (cosmetic; the ISO output file name follows the stateVersion convention)

(7) Both flake.lock files regenerated via `nix flake update`:
- full-ai-cluster/flake.lock: nixpkgs pinned to b77b3de (2026-05-22), nix-darwin to ebec37a (2026-02-26), nixos-hardware to c97bc4d (2026-05-20)
- full-ai-cluster/usb-nixos-installer/flake.lock: nixpkgs at the same commit b77b3de

(8) Validated locally: `nix flake check --no-build --show-trace` ✅ clean (all attributes evaluate; builds skipped via --no-build).

Composes with the B-0801–B-0805 iter-6 cluster-update arc landed earlier this session; this is sub-target 0 (the urgent EOL recovery). Once this lands, the next CI ISO build triggers automatically (full-ai-cluster/** is in the push paths) → the operator gets a `zeta-installer-25.11.iso` artifact.

Substrate-inventory pass per #5131 rule:
- grep -rn "24\.11" full-ai-cluster/ → 5 source locations + bump-citation comments (intentional)
- grep -rn "nixos-25" full-ai-cluster/ → none pre-bump; safe to introduce
- B-0800 row (already on main via #5123) names this as the canonical bump target

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
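Item (7)'s "same commit b77b3de" claim can be checked from the lock files themselves. A minimal sketch, assuming the standard flake.lock JSON layout (top-level `root` names the root node, `nodes.<root>.inputs` maps an input name to a node key, and that node's `locked.rev` is the pinned commit); `lockedRev` and `sameRev` are hypothetical helpers, and in practice the two files would be read with `fs.readFileSync` before calling them:

```typescript
// Subset of the flake.lock JSON layout this check relies on.
interface FlakeLock {
  root: string;
  nodes: Record<string, {
    inputs?: Record<string, string | string[]>;
    locked?: { rev?: string };
  }>;
}

// Commit a direct flake input is pinned to, resolved through the root node.
function lockedRev(lockJson: string, input: string): string {
  const lock = JSON.parse(lockJson) as FlakeLock;
  const ref = lock.nodes[lock.root]?.inputs?.[input];
  if (typeof ref !== "string") {
    // Arrays are `follows` paths; a fuller tool would walk them.
    throw new Error(`input ${input} is missing or uses follows`);
  }
  const rev = lock.nodes[ref]?.locked?.rev;
  if (!rev) throw new Error(`node ${ref} has no locked rev`);
  return rev;
}

// True when every given lock file pins `input` to one and the same commit.
function sameRev(lockJsons: string[], input: string): boolean {
  return new Set(lockJsons.map((j) => lockedRev(j, input))).size === 1;
}
```

Resolving through `root.inputs` rather than reading `nodes.nixpkgs` directly matters because the node key can differ between lock files (e.g. `nixpkgs_2`) even when the pinned commit is the same.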
Summary
Backlog cluster for iter-6 cluster-update substrate, per maintainer directives:
WebSearch confirmed: NixOS 25.11 "Xantusia" is current stable (released 2025-11-30; EOL 2026-06-30). Our current pin `nixos-24.11` is past EOL as of 2025-06-30: substantively behind, with supply-chain-security exposure.
Rows filed
`system.autoUpgrade` in `nixos/modules/common.nix`
`.claude/rules/dep-pin-search-first-authority.md`
Key design decisions captured
Test plan
`BACKLOG.md` regenerated via `bun tools/backlog/generate-index.ts`
🤖 Generated with Claude Code