backlog(iter-6): 6-row cluster-update substrate — nixpkgs 24.11→25.11 + autoUpgrade + kured + deploy-rs + runbook + ALL-deps capstone #5123

Merged: AceHack merged 3 commits into main from otto-cli/iter6-cluster-update-backlog-cluster-2026-05-26 on May 26, 2026

AceHack commented May 26, 2026

Summary

Backlog cluster for iter-6 cluster-update substrate, per maintainer directives:

  • "is there a 25 we should go ahead and distro upgrade ... don't start behind from the beginning"
  • "lets backlog all that we need to be able to upgrade without ... manual operator"
  • "we need to do that same thing to all our nix installed deps and argocd deps casue you are not good at getting current version"

WebSearch confirmed that NixOS 25.11 "Xantusia" is the current stable release (released 2025-11-30; EOL 2026-06-30). Our current pin, nixos-24.11, reached EOL on 2025-06-30, so the cluster is well behind and no longer receives security fixes, which is a supply-chain-security exposure.

Rows filed

| ID | Tier | Title |
|---|---|---|
| B-0800 | P1 | iter-6.0 — bump nixpkgs 24.11→25.11 (urgent EOL recovery) |
| B-0801 | P2 | iter-6.1 — system.autoUpgrade in nixos/modules/common.nix |
| B-0802 | P2 | iter-6.2 — kured ArgoCD app (K8s-aware drain+reboot) |
| B-0803 | P2 | iter-6.3 — deploy-rs from CI (GitOps alt to autoUpgrade) |
| B-0804 | P2 | iter-6.4 — distro-upgrade runbook + orchestrator |
| B-0805 | P1 | iter-6.5 (CAPSTONE) — ALL deps current-version sweep + .claude/rules/dep-pin-search-first-authority.md |

Key design decisions captured

  • autoUpgrade XOR deploy-rs (B-0801 + B-0803 both note: pick one, not both — they race)
  • kured composes with either shape (B-0802 handles K8s-aware reboot orchestration regardless)
  • B-0805 is the substrate-honest catch: Otto's training-data defaults skew stale; without agent-discipline encoding the gap re-opens every PR

Test plan

  • All 6 row files validated against backlog frontmatter shape
  • BACKLOG.md regenerated via bun tools/backlog/generate-index.ts
  • composes_with cross-references back-filled across all 6 rows
  • Sources cited (NixOS 25.11 release + kured + deploy-rs)
  • No code changes; backlog rows + index only

🤖 Generated with Claude Code

…2/0803/0804/0805) — nixpkgs 24.11→25.11 bump + autoUpgrade + kured + deploy-rs + distro-runbook + ALL-deps-current-sweep capstone

The maintainer 2026-05-26 directed three substrate-engineering pulls in this
session that compose into the iter-6 cluster-update arc:

(1) "is there a 25 we should go ahead and distro upgrade we don't want to be
    behind search for latest we like to be on latest deps and don't start
    behind from the beginning"

(2) "lets backlog all that we need to be able to upgrade without having to
    reformat every time or if we reformat everytime it's handled by the
    cluster not a manual operator"

(3) "we need to do that same thing to all our nix installed deps and argocd
    deps casue you are not good at getting current version"

WebSearch surfaced: NixOS 25.11 "Xantusia" released 2025-11-30; current
stable; EOL 2026-06-30. Our pin `nixos-24.11` is past EOL (2025-06-30) —
substantively behind, supply-chain-security exposure.

Rows filed:

- **B-0800** (P1) — iter-6.0 — bump nixpkgs 24.11→25.11 (urgent: EOL recovery)
- **B-0801** (P2) — iter-6.1 — system.autoUpgrade in nixos/modules/common.nix
- **B-0802** (P2) — iter-6.2 — kured ArgoCD app (K8s-aware drain+reboot)
- **B-0803** (P2) — iter-6.3 — deploy-rs from CI (GitOps alt to autoUpgrade)
- **B-0804** (P2) — iter-6.4 — distro-upgrade runbook + orchestrator script
- **B-0805** (P1) — iter-6.5 (CAPSTONE) — ALL nix + ArgoCD + helm + image deps
  current-version sweep + .claude/rules/dep-pin-search-first-authority.md to
  encode the agent-side discipline so the gap doesn't re-open (the maintainer's
  substrate-honest catch that Otto's training-data defaults skew stale)

All 6 rows cross-reference via composes_with; index regenerated. P1 rows are
the urgent ones (EOL recovery + agent-discipline encoding); P2 rows are the
substrate-engineering for "cluster handles updates, not manual operator".

Pick-one decision (autoUpgrade XOR deploy-rs) documented in both B-0801 +
B-0803; B-0802 kured composes with either shape.
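
The pick-one constraint can be sketched as a small check. This is only an illustration: the `UpdateStrategy` shape and its field names are hypothetical and do not come from the repo, and treating "neither enabled" as a problem is one reading of XOR rather than something the rows state.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from the repo.
interface UpdateStrategy {
  autoUpgradeEnabled: boolean; // pull shape: each node runs system.autoUpgrade
  deployRsEnabled: boolean; // push shape: CI pushes closures via deploy-rs
  kuredEnabled: boolean; // reboot orchestration; valid with either shape
}

// Returns a list of human-readable problems; an empty list means the
// strategy satisfies "exactly one of autoUpgrade / deploy-rs".
function validateUpdateStrategy(s: UpdateStrategy): string[] {
  const problems: string[] = [];
  if (s.autoUpgradeEnabled && s.deployRsEnabled) {
    problems.push("autoUpgrade and deploy-rs are both enabled; pick one, they race");
  }
  if (!s.autoUpgradeEnabled && !s.deployRsEnabled) {
    // Assumption: XOR is read strictly, so "neither" is also flagged.
    problems.push("neither autoUpgrade nor deploy-rs is enabled; nodes will not update");
  }
  // kuredEnabled is intentionally not checked: kured composes with either shape.
  return problems;
}

const bothEnabled = validateUpdateStrategy({
  autoUpgradeEnabled: true,
  deployRsEnabled: true,
  kuredEnabled: true,
});
console.log(bothEnabled);
```

Returning a list of problems rather than throwing keeps the check usable from a lint step that wants to report every violation at once.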

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 07:37
@AceHack AceHack enabled auto-merge (squash) May 26, 2026 07:37
Markdownlint surfaced 5 MD032 errors across B-0801 / B-0803 / B-0804 —
"Lists should be surrounded by blank lines". Inserted blank line
between each "intro sentence:" and the following bullet list.

No content change; lint-only fix.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

Files a six-row iter-6 backlog cluster (B-0800–B-0805) for full-ai-cluster OS lifecycle: an urgent nixpkgs/nix-darwin 24.11→25.11 EOL-recovery bump, system.autoUpgrade enablement, kured ArgoCD app for drain-aware reboots, deploy-rs-from-CI as an alternative push-shape, a cross-channel distro-upgrade runbook + orchestrator, and a capstone all-deps currency audit + agent search-first-authority rule. Pure docs change — six new per-row markdown files plus the auto-regenerated docs/BACKLOG.md index entries.

Changes:

  • Add six new backlog rows under docs/backlog/P1/ and docs/backlog/P2/ covering the iter-6 cluster-update substrate with depends_on / composes_with edges wired across the cluster.
  • Regenerate docs/BACKLOG.md to surface the new entries under P1 and P2.
  • Encode an autoUpgrade-XOR-deploy-rs design constraint and a capstone agent-discipline rule (.claude/rules/dep-pin-search-first-authority.md) to land later.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| docs/backlog/P1/B-0800-…md | P1 row: bump nixpkgs/nix-darwin pin 24.11→25.11 (EOL recovery) |
| docs/backlog/P1/B-0805-…md | P1 capstone row: all-deps currency audit tool + agent search-first-authority rule |
| docs/backlog/P2/B-0801-…md | P2 row: enable system.autoUpgrade in common.nix |
| docs/backlog/P2/B-0802-…md | P2 row: kured ArgoCD app for K8s-aware drain+reboot |
| docs/backlog/P2/B-0803-…md | P2 row: deploy-rs from CI as alternative push-shape |
| docs/backlog/P2/B-0804-…md | P2 row: distro-upgrade runbook + Bun orchestrator |
| docs/BACKLOG.md | Regenerated index entries for the six new rows |

…dup key + contradictory flags + sentinel path + unclosed quote

All 5 caught by Copilot post-arming review, all real:

(1) B-0803 broken link: linked `../P2/B-0794-iter-5-4-...` but B-0794
    actually lives at `docs/backlog/P1/B-0794-node-self-registers-in-git-...`.
    Fixed link target.

(2) B-0802 duplicate `configuration:` YAML key: two top-level
    `configuration:` mappings in the example Helm values; YAML silently
    keeps only the second, so rebootDays/startTime/endTime/timeZone/
    rebootSentinel would be dropped if copy-pasted. Merged into a single
    `configuration:` block.

(3) B-0802 reboot sentinel path: example used `/var/run/reboot-required`
    (Debian-ism) while sub-target note said `/run/reboot-required`
    (NixOS-actual). Aligned to `/run/reboot-required`; added inline
    comment naming the Debian-ism trap.

(4) B-0801 contradictory autoUpgrade flags: `--commit-lock-file` +
    `--no-write-lock-file` are mutually exclusive (no-write means
    no lock to commit). Dropped `--commit-lock-file`; rewrote the
    explanatory note to match (the cluster has no repo write creds
    anyway; lock updates ship from CI per B-0803).

(5) B-0804 unclosed quote in title: opened `"if we reformat every time
    ..."` but never closed. Fixed by adding `" per the maintainer
    2026-05-26` close + attribution; propagates to BACKLOG.md correctly.

Index regenerated.
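
Catches (2) and (4) above are mechanical enough to lint for. The following is a minimal sketch of such checks, not tooling that exists in the repo: the function names are hypothetical, the YAML scan is deliberately naive (unindented `key:` lines only, no real YAML parsing), and the flag pair is the one this review reports as contradictory.

```typescript
// Returns top-level mapping keys that appear more than once.
// Naive on purpose: only unindented `key:` lines are considered; nesting,
// quoted keys, and multi-document files are ignored.
function duplicateTopLevelKeys(yamlText: string): string[] {
  const counts = new Map<string, number>();
  for (const line of yamlText.split(/\r?\n/)) {
    const key = /^([A-Za-z_][\w-]*):(\s|$)/.exec(line)?.[1];
    if (key !== undefined) counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([key]) => key);
}

// Flag pairs that should not appear together, per catch (4) above.
const MUTUALLY_EXCLUSIVE: ReadonlyArray<readonly [string, string]> = [
  ["--commit-lock-file", "--no-write-lock-file"],
];

// Returns every mutually exclusive pair that is fully present in `flags`.
function conflictingFlags(flags: readonly string[]): Array<readonly [string, string]> {
  const present = new Set(flags);
  return MUTUALLY_EXCLUSIVE.filter(([a, b]) => present.has(a) && present.has(b));
}

// Illustrative values only; the real Helm values live in B-0802.
const sampleValues = [
  "configuration:",
  "  rebootSentinel: /run/reboot-required",
  "configuration:",
  '  timeZone: "UTC"',
].join("\n");

console.log(duplicateTopLevelKeys(sampleValues)); // ["configuration"]
console.log(conflictingFlags(["--commit-lock-file", "--no-write-lock-file"]).length); // 1
```

A real YAML parser in strict mode would catch duplicate keys more robustly; the line scan is shown only because it needs no dependencies.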

Co-Authored-By: Claude <noreply@anthropic.com>
@AceHack AceHack merged commit c1fd188 into main May 26, 2026
29 checks passed
@AceHack AceHack deleted the otto-cli/iter6-cluster-update-backlog-cluster-2026-05-26 branch May 26, 2026 07:46
AceHack added a commit that referenced this pull request May 26, 2026
…n 2025) to 25.11 'Xantusia' (current stable) — the maintainer 2026-05-26 EOL recovery catch (#5218)

The maintainer 2026-05-26: "24.11 is a 2 year old version you found a
25.11 when you searched latest we need to make sure we are on latest
too".

Per WebSearch (per `.claude/rules/dep-pin-search-first-authority.md`):
- NixOS 25.11 "Xantusia" — current stable; released 2025-11-30; EOL
  2026-06-30 per https://nixos.org/blog/announcements/2025/nixos-2511/
- Our pin `nixos-24.11` had been EOL since 2025-06-30 (~11 months
  out-of-support) — substantive supply-chain-security gap.
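
The "~11 months" figure can be reproduced from the two dates cited above. This is a small arithmetic sketch; the helper names are hypothetical and 30.44 is just the average month length used for rounding.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between two ISO calendar dates (YYYY-MM-DD), computed in UTC
// so daylight-saving shifts cannot skew the result.
function daysBetween(fromIso: string, toIso: string): number {
  const from = Date.parse(`${fromIso}T00:00:00Z`);
  const to = Date.parse(`${toIso}T00:00:00Z`);
  return Math.round((to - from) / MS_PER_DAY);
}

// Rough month count, using the average Gregorian month length of 30.44 days.
function approxMonths(days: number): number {
  return Math.round(days / 30.44);
}

// 24.11 EOL date to the date of this commit.
const daysPastEol = daysBetween("2025-06-30", "2026-05-26");
console.log(daysPastEol, approxMonths(daysPastEol)); // 330 11

// Same helper applied to the 25.11 EOL date quoted above.
console.log(daysBetween("2026-05-26", "2026-06-30")); // 35
```

By the same dates, 25.11 itself is about five weeks from its stated EOL at the time of the bump, which is the kind of gap the B-0801 to B-0805 rows are meant to keep from re-opening.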

Changes (all 5 24.11 references in source bumped to 25.11; no behavioral
change beyond the channel bump):

(1) full-ai-cluster/flake.nix:
    - nixpkgs.url: nixos-24.11 → nixos-25.11 (with inline WebSearch
      citation comment for future-Otto reference)
    - nix-darwin.url: nix-darwin-24.11 → nix-darwin-25.11 (matching
      release branch)
    - stateVersion default: "24.11" → "25.11" (PC1 + future cluster
      nodes are fresh-install per maintainer — no persistent K8s
      workloads yet → safe to bump; already-installed hosts should
      NOT bump per-host stateVersion without explicit migration)

(2) full-ai-cluster/usb-nixos-installer/flake.nix:
    - nixpkgs.url + stateVersion: matching bumps

(3) full-ai-cluster/nixos/modules/common.nix:
    - stateVersion ? "24.11" → "25.11" (default fallback for new hosts)

(4) full-ai-cluster/nixos/hosts/worker-template/default.nix:
    - system.stateVersion: "24.11" → "25.11"

(5) full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix:
    - system.stateVersion: "24.11" → "25.11"

(6) full-ai-cluster/README.md + tools/zflash.ts:
    - nix-darwin-24.11 → nix-darwin-25.11 + zeta-installer-24.11.iso →
      zeta-installer-25.11.iso (cosmetic; ISO output file name follows
      stateVersion convention)

(7) Both flake.lock files regenerated via `nix flake update`:
    - full-ai-cluster/flake.lock: nixpkgs pinned to b77b3de (2026-05-22)
      + nix-darwin to ebec37a (2026-02-26) + nixos-hardware to c97bc4d
      (2026-05-20)
    - full-ai-cluster/usb-nixos-installer/flake.lock: nixpkgs same
      commit b77b3de

(8) Validated locally: `nix flake check --no-build --show-trace` ✅
    clean (all attributes evaluate; build skipped per check semantics).

Composes with B-0801–B-0805 iter-6 cluster-update arc landed earlier
this session — this is sub-target 0 (the urgent EOL recovery). Once
this lands, next CI ISO build triggers automatically (full-ai-cluster/**
in push paths) → operator gets `zeta-installer-25.11.iso` artifact.

Substrate-inventory pass per #5131 rule:
- grep -rn "24\.11" full-ai-cluster/ → 5 source locations + bump-
  citation comments (intentional)
- grep -rn "nixos-25" full-ai-cluster/ → none pre-bump; safe to
  introduce
- B-0800 row (already on main via #5123) names this as the canonical
  bump target
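
The grep-based inventory above could also be expressed as a pure function, which makes it easy to unit-test. This is a sketch under assumptions: `findReferences` and the `Hit` type are hypothetical names, and the in-memory file map stands in for reading `full-ai-cluster/` from disk.

```typescript
interface Hit {
  path: string;
  line: number; // 1-based, matching grep -n output
  text: string;
}

// Literal substring search across a map of path -> file contents.
// Equivalent in spirit to `grep -rn "24\.11"`, where the escaped dot makes
// the pattern a literal match.
function findReferences(files: Record<string, string>, needle: string): Hit[] {
  const hits: Hit[] = [];
  for (const [path, content] of Object.entries(files)) {
    content.split(/\r?\n/).forEach((text, index) => {
      if (text.includes(needle)) {
        hits.push({ path, line: index + 1, text: text.trim() });
      }
    });
  }
  return hits;
}

// Illustrative contents only; not copied from the repo.
const sampleFiles: Record<string, string> = {
  "flake.nix": [
    'inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";',
    'system.stateVersion = "24.11";',
  ].join("\n"),
  "README.md": "No channel is mentioned here.",
};

const staleHits = findReferences(sampleFiles, "24.11");
console.log(staleHits.map((h) => `${h.path}:${h.line}`)); // ["flake.nix:1", "flake.nix:2"]
```

Running the same function with the new channel string before the bump gives the "none pre-bump" check from the second bullet.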

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>