
rule: verify-existing-substrate-before-authoring — sibling to dep-pin-search-first-authority (3-anchor empirical evidence 2026-05-26)#5131

Merged
AceHack merged 3 commits into main from otto-cli/rule-ext-verify-existing-substrate-before-authoring-2026-05-26
May 26, 2026


AceHack commented May 26, 2026

Summary

Lands .claude/rules/verify-existing-substrate-before-authoring.md as the substrate-authoring-scope sibling to .claude/rules/dep-pin-search-first-authority.md (landed earlier today via PR #5126).

Empirically grounded in 3 same-root-cause failures from session 2026-05-26:

Same root cause class, different surface

All 3 anchors share one root cause, "Otto-defaults-to-plausible-but-unverified", but the failure class shows up on different surfaces, each covered by its own rule:

  • Version pins / path assertions → covered by dep-pin-search-first-authority.md
  • NEW substrate authoring (backlog rows, rules, skills, agenda entries, architectural framings) → covered by THIS rule
  • Existing-rule citation as alibi for action → covered by fighting-past-self-vs-peer-agent-distinguisher-fix-your-own-coordinate-on-peers-dont-punt-by-default.md

Together the 3 rules cover the surfaces today's empirical evidence shows are vulnerable.

Operational discipline (per the rule body)

4-step pass before authoring new substrate:

  1. Grep across substrate surfaces (agendas + trajectories + backlog + rules + memory + research)
  2. READ THE TOP HITS (not just list them)
  3. Decide: existing-covers / partial / no-existing
  4. Cite the search inline in the new substrate
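The four steps can be sketched roughly as follows. This assumes an in-memory snapshot mapping repo paths to file contents instead of a real `grep -rlF` run; the surface list and the `topHits` / `citeInventory` names are illustrative, not taken from the rule file. Step 3 deliberately stays a human judgment passed in as a parameter.

```typescript
// Substrate surfaces searched by the pass (assumed list, based on the summary
// above plus the later commit that added .claude/skills/).
const SURFACES = [
  "docs/agendas/",
  "docs/trajectories/",
  "docs/backlog/",
  "docs/research/",
  ".claude/rules/",
  ".claude/skills/",
  "memory/",
];

type Verdict = "existing-covers" | "partial" | "no-existing";

// Literal (fixed-string) occurrence count, mirroring `grep -F` semantics.
function countOccurrences(text: string, needle: string): number {
  if (needle === "") return 0;
  let count = 0;
  let from = text.indexOf(needle);
  while (from !== -1) {
    count++;
    from = text.indexOf(needle, from + needle.length);
  }
  return count;
}

// Steps 1-2: content-search every surface, then rank hits so the densest
// matches get read first (listing them is not enough).
function topHits(files: Record<string, string>, topic: string, limit = 5): string[] {
  return Object.entries(files)
    .filter(([path]) => SURFACES.some((s) => path.startsWith(s)))
    .map(([path, text]) => ({ path, n: countOccurrences(text, topic) }))
    .filter((hit) => hit.n > 0)
    .sort((a, b) => b.n - a.n || a.path.localeCompare(b.path))
    .slice(0, limit)
    .map((hit) => hit.path);
}

// Step 4: the inline citation for the new substrate. Step 3 (the verdict)
// is decided by a human after actually reading the hits.
function citeInventory(topic: string, hits: string[], verdict: Verdict): string {
  const found = hits.length > 0 ? hits.join(", ") : "none";
  return `Inventory pass: grep -rlF "${topic}" -> ${found}; verdict: ${verdict}`;
}
```

A call such as `citeInventory("Ace", topHits(snapshot, "Ace"), "existing-covers")` yields a one-line annotation that can be pasted into the new row, so a reviewer can see which files were read before the verdict was reached.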

Auto-loads at cold-boot per wake-time-substrate.

Test plan

  • Single rule file added; no other surface modified
  • Cross-references to dep-pin + fighting-past-self + skill-router + grep-substrate-anchors compose cleanly
  • 3 empirical anchors preserved with verbatim maintainer quotes

🤖 Generated with Claude Code

…search-first-authority) — 3-anchor empirical evidence from session 2026-05-26

Single 2026-05-26 session produced 3 same-root-cause failures
("Otto-defaults-to-plausible-but-unverified" at substrate-authoring
scope):

ANCHOR 1: cascade #4 ISO audit (PR #5119) asserted boot/grub/grub.cfg
without verifying NixOS-actual layout (isolinux + refind). Blocked 4
ISO builds. Fixed via PR #5125. Covered by the dep-pin-search-first-authority
rule landed in PR #5126.

ANCHOR 2: B-0806 backlog row (PR #5129) authored Ace section as if Ace
were just "a package manager CLI" without reading docs/agendas/ace-
package-manager/AGENDA.md + project memory + 7+ related backlog rows.
The maintainer 2026-05-26: "that is what ace has been since we first
talked about it you just keep forgetting we have substantial backlog
around this". Fixed via PR #5130.

ANCHOR 3: B-0806 hat/fork-negotiation NOT integrated into architecture
even after Anchor-2 correction. The maintainer 2026-05-26: "i'm
assuming you have the hat / fork negoation for ace too". Fixed via
PR #5130 follow-on commit.

Same root cause class as the dep-pin rule, but at a DIFFERENT surface:
this is substrate-authoring scope (backlog rows, rules, skills,
architectural framings), not version-pin scope. dep-pin-search-first-
authority + this rule + fighting-past-self-vs-peer-agent compose to
cover the surfaces today's empirical evidence showed are vulnerable.

The rule auto-loads at cold-boot per wake-time-substrate.

Provides:
- Operational discipline: 4-step grep + read top hits + decide + cite
  inline
- Checklist template for inline substrate-inventory pass annotation
- All 3 empirical anchors preserved so future-Otto sees the cost of
  skipping
- Cross-references to dep-pin + fighting-past-self for full coverage

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 08:29
AceHack enabled auto-merge (squash) May 26, 2026 08:29
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

AceHack pushed a commit that referenced this pull request May 26, 2026
…5/B-0794 wrong paths

(1) `[B-0247](../P*/B-0247-*.md)` — markdown links don't support globs;
    GitHub won't resolve. Linked directly to
    `../P1/B-0247-ace-dlc-content-packs-kernel-extensions-package-manager-2026-05-07.md`.

(2) `[B-0805](B-0805-...)` — relative path missing `../P1/` prefix;
    B-0805 is under docs/backlog/P1/ while this row is under
    docs/backlog/P2/. Fixed 5 occurrences via sed (lines 36, 104,
    316, 355, 362).

(3) `[B-0794](B-0794-iter-5-4-...)` — same shape as (2): missing
    `../P1/` prefix AND wrong slug. The actual on-main B-0794 slug is
    `B-0794-node-self-registers-in-git-under-maintainers-cluster-nodes-
    triggers-argocd-full-bringup-of-k8s-apps-charts-gitops-native-
    cluster-substrate-aaron-2026-05-26.md` per `find docs/backlog
    -name B-0794*`. Fixed 2 occurrences.

Pattern note: this is the same broken-link class Copilot caught
earlier in this session on #5121 (B-0794 wrong slug). I keep
authoring these from training-data default slugs instead of running
`find docs/backlog -name "B-NNNN*"` first — fits the empirical-anchor
pattern for the verify-existing-substrate-before-authoring rule
landing in parallel via PR #5131.
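A minimal sketch of the "look it up, then link it" step in TypeScript. `paths` stands in for the output of `find docs/backlog -name "B-NNNN*"`; the function names and example slugs are hypothetical. The trailing `-` in the prefix keeps `B-0805` from matching a longer ID, and the link is computed from the citing file's directory, which is exactly where the missing `../P1/` prefix came from.

```typescript
import { posix } from "node:path";

// Resolve a backlog row by exact ID prefix from a real listing instead of a
// remembered slug. Throws unless exactly one row matches.
function resolveRow(paths: string[], id: string): string {
  const prefix = `${id}-`; // trailing "-" so B-0805 never matches B-08050
  const matches = paths.filter(
    (p) => posix.basename(p).startsWith(prefix) && p.endsWith(".md"),
  );
  if (matches.length !== 1) {
    throw new Error(`${id}: expected exactly one row, found ${matches.length}`);
  }
  return matches[0];
}

// Markdown relative links resolve against the citing file's directory.
function relativeLink(fromFile: string, toFile: string): string {
  return posix.relative(posix.dirname(fromFile), toFile);
}

const listing = [
  "docs/backlog/P1/B-0805-example-slug.md",
  "docs/backlog/P2/B-0806-example-slug.md",
];
console.log(relativeLink(listing[1], resolveRow(listing, "B-0805")));
// prints ../P1/B-0805-example-slug.md
```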

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds a new .claude/rules/ wake-time substrate rule that formalizes a “search and read existing substrate before authoring new substrate” discipline, positioned as the substrate-authoring counterpart to dep-pin-search-first-authority.md.

Changes:

  • Introduces verify-existing-substrate-before-authoring.md with a required pre-authoring inventory/search process and an inline checklist template.
  • Captures three empirical anchors from 2026-05-26 to justify the rule and keep the motivating evidence present at cold-boot.
  • Cross-references existing related rules to clarify composition across surfaces (dep pins, rule-citation failures, substrate authoring).

Comment thread .claude/rules/verify-existing-substrate-before-authoring.md Outdated
Comment thread .claude/rules/verify-existing-substrate-before-authoring.md Outdated
Comment thread .claude/rules/verify-existing-substrate-before-authoring.md
AceHack added a commit that referenced this pull request May 26, 2026
… 'package manager of package managers'; B-0806 sits INSIDE Ace not parallel to it (#5130)

* fix(B-0806): substrate-honest correction — Ace agenda already encodes "package manager of package managers"; B-0806 sits INSIDE Ace, not parallel to it

The maintainer 2026-05-26 substrate-honest catch:
"that is what ace has been since we first talked about it you just keep
forgetting we have substantial backlog around this"

Caught a recurrence of the same agent-discipline gap that produced the
cascade #4 ISO audit failure (PR #5125) earlier today: authoring
substrate from incomplete view of what already exists. The Ace
package-manager-of-package-managers framing is canonical existing
substrate, NOT a new architectural insight surfaced by B-0806.

Existing Ace substrate I should have read first:
- docs/agendas/ace-package-manager/AGENDA.md (OPERATOR-SELF-CLAIMED
  2026-05-22; 13-stage Ace lifecycle; polyglot package contents;
  proto-governance via hats + multi-oracle BFT; symmetric/decentralized)
- docs/trajectories/ace-package-manager-skill-crystallization-pipeline/
  RESUME.md (active trajectory)
- memory/project_ace_package_manager_unrestricted_local_models_guardian_
  oversight_aaron_2026_05_07.md (canonical Aaron 2026-05-07 disclosure:
  unrestricted local models + Guardian/KSK + Bond Curve + Itron
  composition)
- memory/feedback_aaron_ace_package_manager_homebrew_shape_bootstrap_
  website_chat_interface_full_distribution_stack_no_setup_needed_2026_
  05_13.md (full distribution stack)
- B-0247 (parent), B-0287 (closed format spec), B-0288 (in-progress
  CLI), B-0424 (repo-split), B-0742, B-0777 (related backlog cluster)
- docs/research/2026-05-22-ace-package-format-spec-v2-substrate-
  engineering-pipeline-extension.md (DeepSeek 2026-05-22 substrate-
  engineering pipeline extension)

Changes:
- Reframed B-0806's Ace section as "this row sits INSIDE the Ace
  agenda as one instance of stage-8 (distribute), NOT parallel to it"
- Added complete substrate-table citing the canonical Ace docs
- Reworded architecture diagram annotations to credit canonical Ace
  framing (not my "architectural insight")
- Explicitly named this as a second empirical anchor for the
  verify-existing-substrate-before-authoring discipline gap (sibling
  failure mode to cascade #4 ISO audit; PR #5125 + #5126)

Also fixes MD040 (missing language on fenced code blocks at lines 111
and 196) — `text` language tag added.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0806): add NixOS-as-north-star framing per the maintainer 2026-05-26

The maintainer 2026-05-26: "nixos is our north star for declarative
gitops ease"

This is the FRAMING PRINCIPLE for the whole iter-7 arc: NixOS sets the
gold-standard target; ansible+ace+crossplane exist to approximate the
NixOS-native experience on platforms that don't have it (Windows,
macOS, non-NixOS Linux). Every sub-target design decision answers:
"does this make non-Nix MORE like NixOS, or does it add a parallel
imperative-shape?" Former is the direction; latter is the failure mode.

Added new top-section "## North star" capturing this verbatim, with
the framing-implications for sub-target design decisions called out.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0806): integrate hats + fork-negotiation into architecture flow per maintainer 2026-05-26 — 3rd same-pattern catch this session

The maintainer 2026-05-26: "i'm assuming you have the hat / fork negoation
for ace too"

Third instance today of authoring-from-incomplete-view of the Ace
substrate. I cited B-0742 + B-0777 in the previous correction's
substrate-table but did NOT integrate hats + fork-negotiation into
B-0806's architectural flow. The Ace agenda already specifies:
"Hats = controls + self-bindings over time crystals (PAIR is load-
bearing primitive)" + "proto-governance via skill-bound hats with
multi-oracle BFT (authority + bindings tied to skills)" — canonical
existing substrate I should have integrated, not bolted on.

Changes:

(1) Added "### Architectural integration of hats + fork-negotiation"
    section showing the 5-step Ace invocation flow for every `ace
    install <pkg>`:
    1a. Hat resolution (skill-bound; PAIR primitive)
    1b. Multi-oracle BFT proto-governance (N-of-M consent)
    1c. Cross-fork ontology negotiation (per B-0741/B-0777; per-persona
        ontology maps)
    1d. Guardian/KSK gate (per canonical Ace project memory; Bond Curve
        pricing; local receipts; high-risk multi-N-of-M)
    1e. ace install proceeds + receipt written

(2) Added B-0741 to the substrate-citation table with explicit
    "CLOSED prematurely earlier this session" annotation. The close
    was mechanically justified (DIRTY conflict) but the substrate
    is load-bearing for B-0806's architectural integration.

(3) New "## Sub-row to re-file" section tracks B-0741 as a known
    dependency for iter-7 implementation; needs cherry-pick re-land
    per pr-triage-tiers Tier 3.

(4) Updated "agent-discipline failure" note to mark this as the
    THIRD instance today (cascade #4 ISO audit / B-0806 Ace-section /
    B-0806 hats-fork-negotiation). Pattern is clear enough that the
    "verify-existing-substrate-before-authoring" rule extension to
    dep-pin-search-first-authority is genuinely load-bearing.
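One way to read the 1a-1e flow is as an ordered, fail-fast pipeline of gates, with 1b reduced to an N-of-M quorum check. The sketch below is only that reading: `Gate`, `runGates`, `quorum`, and the example gate bodies are hypothetical, and none of them reflect how Ace actually implements hats, BFT consent, or the Guardian/KSK gate.

```typescript
// Either every gate passed, or the first gate that rejected and why.
type InstallResult = { ok: true } | { ok: false; stage: string; reason: string };

interface Gate {
  name: string;
  // Returns null to pass, or a rejection reason.
  check: (pkg: string) => string | null;
}

// N-of-M consent: at least `n` of the oracles must approve.
function quorum(approvals: boolean[], n: number): boolean {
  return approvals.filter(Boolean).length >= n;
}

// Run gates in order and stop at the first rejection; the install itself and
// the receipt (1e) would only run after an { ok: true } result.
function runGates(pkg: string, gates: Gate[]): InstallResult {
  for (const gate of gates) {
    const reason = gate.check(pkg);
    if (reason !== null) return { ok: false, stage: gate.name, reason };
  }
  return { ok: true };
}

// Placeholder gates standing in for 1a-1d.
const gates: Gate[] = [
  { name: "hat-resolution", check: () => null },
  { name: "bft-consent", check: () => (quorum([true, false, true], 2) ? null : "quorum not met") },
  { name: "fork-negotiation", check: () => null },
  { name: "guardian", check: (pkg) => (pkg.startsWith("risky-") ? "high-risk package" : null) },
];
```

Ordering matters in this reading: a package that fails consent never reaches the Guardian gate, and the returned `stage` says which layer refused it.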

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0806): 2 Copilot P1 broken xrefs on #5130 — B-0247 glob + B-0805/B-0794 wrong paths

(1) `[B-0247](../P*/B-0247-*.md)` — markdown links don't support globs;
    GitHub won't resolve. Linked directly to
    `../P1/B-0247-ace-dlc-content-packs-kernel-extensions-package-manager-2026-05-07.md`.

(2) `[B-0805](B-0805-...)` — relative path missing `../P1/` prefix;
    B-0805 is under docs/backlog/P1/ while this row is under
    docs/backlog/P2/. Fixed 5 occurrences via sed (lines 36, 104,
    316, 355, 362).

(3) `[B-0794](B-0794-iter-5-4-...)` — same shape as (2): missing
    `../P1/` prefix AND wrong slug. The actual on-main B-0794 slug is
    `B-0794-node-self-registers-in-git-under-maintainers-cluster-nodes-
    triggers-argocd-full-bringup-of-k8s-apps-charts-gitops-native-
    cluster-substrate-aaron-2026-05-26.md` per `find docs/backlog
    -name B-0794*`. Fixed 2 occurrences.

Pattern note: this is the same broken-link class Copilot caught
earlier in this session on #5121 (B-0794 wrong slug). I keep
authoring these from training-data default slugs instead of running
`find docs/backlog -name "B-NNNN*"` first — fits the empirical-anchor
pattern for the verify-existing-substrate-before-authoring rule
landing in parallel via PR #5131.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
Lior and others added 2 commits May 26, 2026 04:33
…; reword to "plus refind"

markdownlint MD032 fired on line 100 because the wrap-continuation
"+ refind, NOT legacy GRUB..." starts with `+ ` which is a valid
markdown list marker. Linter doesn't know this is a wrapped paragraph
continuation from line 99.

Reword "isolinux + refind" → "isolinux plus refind" to disambiguate.
No content change.

Co-Authored-By: Claude <noreply@anthropic.com>
…d-string discipline; 3rd (table double-pipe) is FP

(1) Earlier inventory snippet used filename/directory-name filtering
    (`find docs/agendas -type d | grep -i "$topic"`) which misses
    substrate that mentions the topic in CONTENT without the keyword in
    the filename. Should be content-search via grep -rl. Same gap for
    docs/trajectories/.

(2) Earlier snippet used `grep -E "$topic"` (regex) + unquoted shell
    globs (`memory/*${topic}*`). Both break when topic contains regex
    metacharacters (`+`, `.`, `B-NNNN`) or spaces. Use `grep -F`
    (fixed-string) for safety + content-search (no globs).

(3) Bonus fix: `.claude/skills/` was missing from the inventory surfaces
    even though skills are explicitly in-scope for the rule. Added.
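Why fixed-string search matters, illustrated with JavaScript regexes (the same hazard applies to `grep -E "$topic"`). The helper names are illustrative; the point is that `.` silently over-matches and `+` can make the pattern invalid outright, while a literal substring test does neither.

```typescript
// Topic interpreted as a regular expression (what `grep -E "$topic"` does).
function regexMatches(text: string, topic: string): boolean {
  return new RegExp(topic).test(text);
}

// Topic matched literally (what `grep -F "$topic"` does).
function fixedMatches(text: string, topic: string): boolean {
  return text.includes(topic);
}

// "." matches any character, so "1.2" falsely matches "142" as a regex.
console.log(regexMatches("version 142", "1.2")); // true
console.log(fixedMatches("version 142", "1.2")); // false

// "c++" is not even a valid JavaScript regex ("Nothing to repeat").
try {
  regexMatches("c++ bindings", "c++");
} catch (err) {
  console.log(err instanceof SyntaxError); // true
}
console.log(fixedMatches("c++ bindings", "c++")); // true
```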

3rd Copilot thread (table double-pipe at line 158/149) is the
documented known-FP class per `.claude/rules/blocked-green-ci-investigate-threads.md`
("Table double-pipe (`||`) ... 4 confirmed FPs in one session"). Direct
inspection of line 158 (`| Surface | Rule that catches it |`) confirms
single pipes; resolving that thread as a no-op per the suspect-by-default
discipline.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 08:35
AceHack merged commit 78735f6 into main May 26, 2026
27 of 29 checks passed
AceHack deleted the otto-cli/rule-ext-verify-existing-substrate-before-authoring-2026-05-26 branch May 26, 2026 08:37
AceHack added a commit that referenced this pull request May 26, 2026
…iage) — ontology+category negotiation; load-bearing for iter-7 (B-0806) (#5133)

* backlog(B-0811 re-land from PR #5003): ontology+category negotiation as AI-skills+hats federation point across clusters+forks — load-bearing for iter-7 (B-0806) hat/fork-negotiation architecture

Re-land of substrate originally filed as B-0741 via PR #5003 on
2026-05-25. Closed during this session's stale-PR triage as Tier 3
(DIRTY-conflict) per .claude/rules/pr-triage-tiers.md. Triage
close-comment named the cherry-pick re-land path explicitly.

Renumbered to B-0811 (next-free per inventory pass) because B-0741 ID
remains taken on main; renumbering follows the ID-allocation
discipline used by PR #5132 (peer Otto's classifier-bypass rows
B-0800-0803 → B-0807-0810 dup-fix today).

Inventory pass per `.claude/rules/verify-existing-substrate-before-
authoring.md` (landed earlier this session via PR #5131):

- grep -rlF "B-0741" docs/ memory/ .claude/ → 10+ existing references
  (BACKLOG.md + 3 docs/research/ files + 5 sibling backlog rows + 1
  research catalog entry) — confirms B-0741 is REFERENCED substrate
  whose re-land closes dangling cross-refs
- grep -rlF "fork-negotiation" docs/agendas/ docs/backlog/
  .claude/rules/ → 1 existing related row (B-0742 hats-as-negotiated-
  fork-structure) — sibling, NOT redundant
- highest B-08xx on main: B-0810 (just landed via #5132) — B-0811 is
  next free
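The "highest on main plus one" allocation can be sketched as below, assuming the input is a path listing such as the output of `git ls-tree -r --name-only origin/main docs/backlog`. The function name is hypothetical. It only sees the listed ref, so IDs already claimed by open PRs are invisible to it, which is the collision class the #5132 renumbering fixed.

```typescript
// Next-free backlog ID (B-NNNN, four digits) from a listing of one ref.
function nextFreeId(paths: string[]): string {
  let highest = 0;
  for (const p of paths) {
    const m = /(?:^|\/)B-(\d{4})-[^/]*\.md$/.exec(p);
    if (m) highest = Math.max(highest, Number(m[1]));
  }
  return `B-${String(highest + 1).padStart(4, "0")}`;
}

console.log(nextFreeId([
  "docs/backlog/P1/B-0810-example.md",
  "docs/backlog/P2/B-0806-example.md",
  "docs/backlog/README.md",
])); // B-0811
```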

Substrate-honest framing: this is the SAME content as PR #5003's
commit 0f691db; only `id:` field + filename + the new "## Re-land
context" section differ. Original cross-references (composes_with
B-0247/B-0287/B-0288/B-0731/B-0727/B-0726/B-0638/B-0703) preserved
verbatim — all those targets still exist on main.

Load-bearing for iter-7 (B-0806 Ansible+Crossplane+Ace cross-OS
substrate) per the maintainer 2026-05-26 catch "i'm assuming you have
the hat / fork negoation for ace too". Cross-fork ontology negotiation
is the third layer of every `ace install <pkg>` action in B-0806's
architectural integration section.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0811): MD032 blanks-around-lists — 5 spots auto-fixed via awk pass

Original PR #5003 content had 5 MD032 violations (intro-sentence
immediately followed by list with no blank line). Fixed via awk pass
inserting blank lines before list-start when prev line is non-blank,
non-list, non-table.

Lint-only fix; no content change.
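The awk pass's rule ("insert a blank line before a list start when the previous line is non-blank, non-list, non-table") translates to roughly the following. Fence tracking and treating indented lines as list-item continuations are additions of this sketch, not necessarily part of the original awk. Note that a hard-wrapped prose line beginning with `+ ` is indistinguishable from a list start here too, which is why the isolinux commit rewords instead of relying on a pass like this.

```typescript
// Markdown list-item start: "-", "*", "+", or "1." / "1)" followed by a space.
const LIST_START = /^\s*(?:[-*+]|\d+[.)])\s/;
const TABLE_ROW = /^\s*\|/;
const FENCE = /^\s*(?:`{3}|~{3})/;

// Insert a blank line before any list that directly follows a prose line.
function blanksBeforeLists(markdown: string): string {
  const out: string[] = [];
  let inFence = false;
  for (const line of markdown.split("\n")) {
    if (FENCE.test(line)) inFence = !inFence;
    const prev = out.length > 0 ? out[out.length - 1] : "";
    const prevIsProse =
      prev.trim() !== "" &&
      !LIST_START.test(prev) &&
      !TABLE_ROW.test(prev) &&
      !/^\s/.test(prev); // heuristic: indented lines continue a list item
    if (!inFence && LIST_START.test(line) && prevIsProse) out.push("");
    out.push(line);
  }
  return out.join("\n");
}
```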

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0811): bump last_updated to 2026-05-26 (Copilot finding — re-land added content today)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack review requested due to automatic review settings May 26, 2026 08:56
AceHack added a commit that referenced this pull request May 26, 2026
… updates refs not files (1008Z empirical anchor caught phantom PR #5128 drift) (#5134)

The failure mode: agent runs `git fetch origin main` (Step-1 refresh)
which updates `refs/remotes/origin/main` but does NOT promote local
HEAD; subsequent `Read`/`cat`/`grep` against working-tree paths read
the LOCAL HEAD's files (stale-against-origin if local hasn't been
ff-promoted). If the agent authors substrate at that point, it's
against state that may already be resolved on origin/main N commits
ahead — phantom "drift" findings that require retraction.

Empirical anchor (this commit's authoring session):
- 2026-05-26T10:08Z Otto-CLI cold-boot ran `git fetch origin main`
  (success) in the operator's primary checkout (local HEAD `2774fef5a`)
- Read `tools/alignment/filter_gate_log.ts` + `.test.ts` via working
  tree — both files appeared unfixed despite PR #5128 having landed
  the fix 1h 46min earlier at 08:22Z
- Local primary was 11 commits behind origin/main (`1641da6d2`)
- Without `refresh-before-decide` catching the staleness, the next
  substrate landing would have been a public PR retracting against
  already-resolved state

Three mitigation patterns named in the rule (pick by context):
1. Isolated worktree off `origin/main`:
   `git worktree add --detach <path> origin/main` (default for ticks;
   composes with agent-worktree-hygiene `--detach` discipline)
2. `git show origin/main:<path>` for ad-hoc single-file inspection
   without checkout
3. ff-promote local HEAD — ONLY when the checkout is the agent's own,
   never the operator's primary
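A staleness check plus pattern 2 can be sketched in TypeScript with real git subcommands (`git rev-list --count HEAD..origin/main` and `git show origin/main:<path>`). The git runner is injected so the logic can be exercised without a repository; the function names are hypothetical, and `path` must be relative to the repository root for `git show <ref>:<path>`.

```typescript
import { execFileSync } from "node:child_process";

// A git runner; injectable so the logic can be tested without a repo.
type Git = (args: string[]) => string;

const runGit: Git = (args) => execFileSync("git", args, { encoding: "utf8" });

// Commits on the fetched ref that local HEAD lacks. Non-zero means
// working-tree reads reflect older state than what was just fetched.
function commitsBehind(git: Git, ref = "origin/main"): number {
  return Number(git(["rev-list", "--count", `HEAD..${ref}`]).trim());
}

// Pattern 2: read one file as it exists on the ref, without a checkout.
function readAtRef(git: Git, path: string, ref = "origin/main"): string {
  return git(["show", `${ref}:${path}`]);
}

// Usage inside a checkout, after `git fetch origin main`:
//   if (commitsBehind(runGit) > 0) {
//     const fresh = readAtRef(runGit, "tools/alignment/filter_gate_log.ts");
//   }
```

In the 10:08Z anchor, `commitsBehind` would have returned 11, which is the signal that the working-tree copy of the file could not be trusted.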

Substrate-inventory step performed per verify-existing-substrate-
before-authoring (PR #5131 — only visible via `git show origin/main:`
because local was stale): existing partial coverage at
otto-channels-reference-card.md ID-allocation section names this for
`find docs/backlog -name "B-*.md"` queries. This extension generalizes
the principle to any working-tree file read post-fetch + lands it on
the refresh-before-decide surface where it auto-loads at every cold-boot.

Files:
- .claude/rules/refresh-before-decide.md (+69 lines: new section
  "git fetch updates refs but NOT working-tree files (post-fetch read
  trap)" + 3-pattern mitigation + 2026-05-26T10:08Z anchor + 5
  composes_with citations)
- docs/hygiene-history/ticks/2026/05/26/1008Z.md (+151 lines: full
  tick trace including the catch, the verify-before-defer composition,
  the substrate-inventory step, visibility signal)

Composes with:
- refresh-before-decide.md (existing 28-line rule extended)
- verify-existing-substrate-before-authoring.md (PR #5131)
- otto-channels-reference-card.md (ID-allocation narrow precedent)
- agent-worktree-hygiene-never-hold-main-never-step-on-operator-cleanup-on-pr-merge.md
- refresh-world-model-poll-pr-gate.md (origin/main over FETCH_HEAD)
- dep-pin-search-first-authority.md (sibling at version-pin scope)
- codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md
  (verify-before-defer composition 8th-or-9th anchor)
- PR #5128 — the fix whose phantom-drift catch this tick prevented

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 26, 2026
… at install time — minimum-viable device-registration substrate the maintainer's deferral named (#5210)

The maintainer 2026-05-26: "i'll wait till we have the install.sh and git
native device registration into github is ready before i run again" +
"so human maintiner cannot be the named dep you are waiting on the
backlog is too big" (substrate-honest catch on punt-by-default).

Implements the homelab-first variant of B-0794 sub-targets 1+3+5 per
Mika 2026-05-26 substrate ("USB ships with NO embedded credentials;
first boot prompts gh auth login + operator authenticates + auto-copy
operator's pubkey to authorized_keys"). Production-mode (per-node
deploy-key + bootstrap-key-rotation) deferred to follow-on per Aaron's
"simple homelab way first but like prod later" direction.

Changes:

(1) full-ai-cluster/usb-nixos-installer/zeta-install.sh — NEW Step 6.8
    inserted between Step 6.7 (iter-5.1 wifi persistence) and the
    nixos-install invocation:
    - Prompts operator with [Y/n] to run `gh auth login`
    - Operator authenticates interactively (browser code / device-flow /
      paste-token — gh CLI picks based on platform)
    - On success: `gh ssh-key list --json id,key,title` extracts all
      SSH pubkeys the operator has registered with GitHub
    - Writes one-per-line to /mnt/etc/zeta/operator-authorized-keys
      with `gh-key-<id>-<title>` comment so operator can identify
      later
    - Composes additively with iter-4.2 static maintainer-key injection
      (NOT a replacement; both paths can succeed for the same install)
    - Skippable; falls back gracefully to iter-4.2 OR manual config-edit
      per iter-4 v1 flow

(2) full-ai-cluster/nixos/modules/operator-authorized-keys.nix — NEW
    module that mirrors the iter-5.3 initial-password.nix +
    iter-5.2 injected-hostname.nix injection pattern:
    - Reads /etc/zeta/operator-authorized-keys via builtins.readFile
      at nixos-install/rebuild time
    - Filters lines (drops blank + comment + non-ssh-prefixed)
    - Adds to users.users.zeta.openssh.authorizedKeys.keys
    - Backward-compat fallback (no file → empty list → no harm; static
      iter-4.2 keys still apply if injected)

(3) full-ai-cluster/nixos/modules/common.nix — imports
    operator-authorized-keys.nix so every cluster host inherits the
    capability (composes with existing injected-hostname.nix +
    login-banner.nix imports landed earlier today).

(4) full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix
    — adds `gh` to the installer ISO's environment.systemPackages so
    `gh auth login` is available at install time. (gh is NOT added to
    cluster nodes' baseline; out of scope for iter-5.4.0; operator can
    install separately later if needed.)

(5) install-complete banner updated with 3-way path discriminator:
    iter-5.4.0-success / iter-4.2-success-only / both-skipped (fallback
    to manual edit). Each path documents next-step UX.
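The key-handling in (1) and (2) can be sketched in TypeScript, assuming `gh ssh-key list --json id,key,title` emits an array of `{ id, key, title }` objects as the commit describes (the flag is not verified here). The real module is Nix; `filterKeyLines` only mirrors the filter it is described as applying. Collapsing whitespace in titles is a choice of this sketch.

```typescript
// Shape of each entry, as described in the commit (assumed).
interface GhSshKey {
  id: number;
  key: string;
  title: string;
}

// One authorized_keys line per key, tagged `gh-key-<id>-<title>`.
function toAuthorizedKeys(json: string): string {
  const keys = JSON.parse(json) as GhSshKey[];
  const lines = keys.map(
    (k) => `${k.key.trim()} gh-key-${k.id}-${k.title.trim().replace(/\s+/g, "-")}`,
  );
  return lines.length > 0 ? `${lines.join("\n")}\n` : "";
}

// Mirrors the described filter: drop blank lines, comments, and anything not
// starting with "ssh-". This prefix rule would also drop ecdsa-sha2-* and
// sk-* key types, which may or may not be intended.
function filterKeyLines(text: string): string[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .filter((line) => line.startsWith("ssh-"));
}
```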

Empirical UX (operator perspective):
- Boot from USB → zeta-install.sh runs interactively
- Steps 1-6.7 unchanged (disk wipe + cluster identity prompts + nixos
  config injection + wifi)
- NEW: Step 6.8 prompts "Run gh auth login now? [Y/n]:"
- Operator hits Enter (Y default) → gh auth flow opens → authenticate
- Step 7 nixos-install runs (~5-10min for fresh install)
- Final banner shows "iter-5.4.0 GH-AUTH + OPERATOR-PUBKEY INJECTION:
  SUCCESS (N keys)" + "ssh zeta@<hostname>.local" works on first boot
  from any machine using operator's registered-with-GitHub SSH keys

Per the maintainer's "after that gets on main we can format the usb
and try again" — this PR is the iter-5.4.0 dependency lift; once
merged, next ISO build (push to main on full-ai-cluster/** triggers
the workflow per the broadened trigger paths landed in #5116) will
produce a fresh artifact ready for re-flash.

NOT in scope (B-0794 future sub-rows):
- Self-registration commit/push to maintainers/<name>/cluster-nodes/
  (B-0794 sub-target 3 full; this PR is sub-target 1 minimum-viable)
- ArgoCD app watching cluster-nodes tree (sub-target 4)
- --maintainer flag on zflash (sub-target 5; defaults to gh-auth user)
- Production-mode bootstrap-key rotation (deferred per Aaron's
  homelab-first direction)

Substrate-inventory pass per `.claude/rules/verify-existing-substrate-
before-authoring.md` (landed earlier this session via #5131):
- grep -rlF "B-0794" → existing canonical row + Mika preservation +
  composes_with cluster (verified before authoring)
- grep -rlF "iter-5.4" → no prior implementation; this is the first
  iter-5.4.x landing
- grep -rlF "operator-authorized-keys" → no existing file; safe to add
- Pattern mirrors initial-password.nix + injected-hostname.nix exactly

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 26, 2026
…ainers/<operator>/cluster-nodes — builds on iter-5.4.0 (PR #5210) gh-auth foothold; decomposes B-0794 sub-target 3 (#5211)

Filed concurrently while #5210 (iter-5.4.0 minimum-viable) builds CI.
Substrate-inventory pass per verify-existing-substrate-before-authoring
rule landed earlier today (#5131): iter-5.4.1 unused; ID B-0812 next-
free; B-0794 + composes_with chain all verified on main.

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack pushed a commit that referenced this pull request May 26, 2026
…iene + link-depth correction + title-quote unbalance

5 TS findings on deregister-node.ts:
(1) P1: --reason now rejects values starting with `-` so `--reason --push-direct`
    doesn't silently consume the flag
(2) P1: --host validated against DNS-label regex
    `/^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$/` — blocks path-traversal
    (`../foo`) + shell-metachars since --host interpolates into filesystem
    path AND branch name
(3) P1: mkdtempSync dir cleaned up in worktree-add failure path (was leaking)
(4) P2: branch prefix changed `otto-cli/deregister-` → `deregister/` (this is
    an operator tool, not an Otto-agent lane; misattribution per agent-roster
    discipline)
(5) P2: added `if (import.meta.main)` guard for import-without-side-effects
    pattern (matches tools/backlog/generate-index.ts convention)
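Findings (1) and (2) as a standalone argument parser, a sketch only: `parseArgs` and `valueFor` are hypothetical names, the DNS-label regex is the one quoted above, and a caller would map any thrown error to exit code 1 (invocation error) per the tool's exit-code contract. The tradeoff is that a `--reason` value that genuinely begins with `-` is rejected too.

```typescript
// DNS-label regex quoted in the finding above.
const DNS_LABEL = /^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$/;

interface DeregisterArgs {
  host: string;
  maintainer?: string;
  reason?: string;
  pushDirect: boolean;
}

// A flag's value must not itself look like a flag, so that
// `--reason --push-direct` fails loudly instead of eating `--push-direct`.
function valueFor(argv: string[], i: number, flag: string): string {
  const value = argv[i + 1];
  if (value === undefined || value.startsWith("-")) {
    throw new Error(`${flag} requires a value`);
  }
  return value;
}

function parseArgs(argv: string[]): DeregisterArgs {
  const args: DeregisterArgs = { host: "", pushDirect: false };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag === "--push-direct") {
      args.pushDirect = true;
    } else if (flag === "--host" || flag === "--maintainer" || flag === "--reason") {
      const value = valueFor(argv, i, flag);
      i++; // consume the value
      if (flag === "--host") args.host = value;
      else if (flag === "--maintainer") args.maintainer = value;
      else args.reason = value;
    } else {
      throw new Error(`unknown argument: ${flag}`);
    }
  }
  // --host ends up in a filesystem path and a branch name: reject "../foo" etc.
  if (!DNS_LABEL.test(args.host)) {
    throw new Error(`invalid --host: ${JSON.stringify(args.host)}`);
  }
  return args;
}
```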

3 backlog row findings:
(6) P1: B-0814 `status: in-progress` (not in documented enum) → `status: open`
    (the enum allows open/closed/superseded-by-*/deferred/decomposed per
    tools/backlog/README.md)
(7+8) P1: `.claude/rules/...` link from docs/backlog/P{1,2}/ was `../../`
    which resolves to docs/, not repo root. Fixed to `../../../.claude/` for
    correct 3-up to repo root. Copilot was right; my earlier "FP" inclination
    was wrong on direct verification (`ls docs/backlog/P1/../../.claude/...`
    confirmed broken).
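The manual `ls docs/backlog/P1/../../.claude/...` check can be scripted with Node's `path.posix`: resolve the link against the citing file's directory and look the result up in the set of repo paths. The function names and example paths are illustrative.

```typescript
import { posix } from "node:path";

// Where a relative link actually lands, resolved from the citing file's directory.
function resolveLink(fromFile: string, link: string): string {
  return posix.join(posix.dirname(fromFile), link);
}

// A link is valid only if it resolves to a path that exists in the repo.
function linkIsValid(fromFile: string, link: string, repoFiles: Set<string>): boolean {
  return repoFiles.has(resolveLink(fromFile, link));
}

const row = "docs/backlog/P1/B-0814-example.md";
console.log(resolveLink(row, "../../.claude/rules/example.md"));
// docs/.claude/rules/example.md (two levels up only reaches docs/)
console.log(resolveLink(row, "../../../.claude/rules/example.md"));
// .claude/rules/example.md (three levels up reaches the repo root)
```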

2 BACKLOG.md generated-content findings:
(9+10) Row titles had maintainer-quotes that generate-index.ts truncated mid-
    string, leaving unclosed `"`. Fixed by moving maintainer-quotes from
    title to body; new titles are quote-free + drop the truncation hazard.

Substrate-inventory pass per #5131 rule extension still operative:
the BACKLOG.md unclosed-quote pattern is the generator-tool's truncation
behavior, not row-authoring; row-side fix (no quotes in title) sidesteps
the generator's truncation footgun. Generator-level fix (truncate-at-quote-
boundary) is the proper substrate fix; out of scope for this fix-fwd but
worth a follow-on row.
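The generator-level fix named above (truncate at a quote boundary) could look roughly like this. It is a sketch under the assumption that titles use straight double quotes; `truncateTitle` is a hypothetical name and is not how `generate-index.ts` currently behaves.

```typescript
// If the cut lands inside a "..." quotation (odd number of straight double
// quotes), back up to just before that quotation's opening mark so no
// unclosed quote survives into the generated index.
function truncateTitle(title: string, max: number): string {
  if (title.length <= max) return title;
  let cut = title.slice(0, max);
  const quoteCount = (cut.match(/"/g) ?? []).length;
  if (quoteCount % 2 === 1) {
    cut = cut.slice(0, cut.lastIndexOf('"'));
  }
  return `${cut.trimEnd()}...`;
}

console.log(truncateTitle('fix: reword "that is what ace has been"', 20));
// fix: reword...
```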

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack pushed a commit that referenced this pull request May 26, 2026
…er-nodes tree → reconciles on PR-merge — completes the iter-5.4 arc

Iter-5.4.0 (PR #5210) lands gh-auth foothold.
Iter-5.4.1 (PR #5211 row) lands self-registration commit+push.
THIS row (iter-5.4.2) decomposes B-0794 sub-target 4: ArgoCD reconciler
that consumes the self-registration PRs and translates ClusterNode CRs
to K8s node-labels/taints/role-specific workloads.

After all 3 slices land + impl, the maintainer's full vision:
'zflash → boot → install → gh-auth → self-register → operator merges
PR from phone → cluster auto-converges' is operational. Zero manual
kubectl required.

Sub-targets sketched: CRD definition, ArgoCD Application resource,
reconciler controller (kustomize + simple kubectl-shell loop initial;
Go operator deferred), role-to-label/taint mapping ConfigMap,
empirical end-to-end validation on PC1.
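The role-to-label/taint mapping could be modeled as below. The role names, label keys (`example.com/...`), and `placementFor` are hypothetical placeholders; only the taint effects (`NoSchedule`, `PreferNoSchedule`, `NoExecute`) are real Kubernetes values. In the row's design this table would live in a ConfigMap read by the reconciler.

```typescript
type TaintEffect = "NoSchedule" | "PreferNoSchedule" | "NoExecute";

interface Taint {
  key: string;
  value: string;
  effect: TaintEffect;
}

interface Placement {
  labels: Record<string, string>;
  taints: Taint[];
}

// Hypothetical role table (would be ConfigMap data in practice).
const ROLE_MAP = new Map<string, Placement>([
  ["worker", { labels: { "example.com/role-worker": "true" }, taints: [] }],
  ["gpu", {
    labels: { "example.com/role-gpu": "true" },
    taints: [{ key: "example.com/gpu", value: "true", effect: "NoSchedule" }],
  }],
]);

// Merge placements for every role a ClusterNode declares; unknown roles fail
// fast rather than silently producing an unlabeled node.
function placementFor(roles: string[]): Placement {
  const merged: Placement = { labels: {}, taints: [] };
  for (const role of roles) {
    const p = ROLE_MAP.get(role);
    if (!p) throw new Error(`unknown role: ${role}`);
    Object.assign(merged.labels, p.labels);
    for (const t of p.taints) {
      const dup = merged.taints.some((x) => x.key === t.key && x.effect === t.effect);
      if (!dup) merged.taints.push(t);
    }
  }
  return merged;
}
```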

Substrate-inventory pass per #5131 rule: iter-5.4.2 unused;
cluster-nodes-reconciler unused; ID B-0813 next-free; all composes_with
chain verified on main + in flight.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 26, 2026
…xpiration design — the maintainer 2026-05-26 dual ask (#5216)

* feat(B-0814 P1) + backlog(B-0815 P2): TS deregister tool + heartbeat/expiration design — the maintainer 2026-05-26 dual ask

The maintainer 2026-05-26 named two iter-5.4 follow-on substrate needs:

(1) "lets make a ts file for removing machines from git too cause i'm
    going to delete clusters a lot lol" → tools/cluster/deregister-node.ts

(2) "Or the next step will be how do keep registration status physically
    in sync with machine, like maybe you have to reregister once a day
    or week or something or it expires" → B-0815 heartbeat/expiration
    design row

This PR bundles all three (the deregister tool ships; both backlog rows
are filed).

## tools/cluster/deregister-node.ts (B-0814, status: in-progress → done upon merge)

TS Bun script (per Rule 0) that:
- Resolves operator via `gh api /user --jq .login` (matches registration
  flow auto-derivation)
- Verifies node exists on origin/main before destructive op (exit 2 if
  not found)
- Creates temp worktree (don't touch operator's primary checkout per
  Aaron's B-0751 SHARED-VIEW discipline)
- git rm -r the cluster-nodes/<host>/ subtree
- Commits + pushes (branch: `otto-cli/deregister-<host>-<YYYYMMDD-HHMM>`)
- Opens PR by default (safer; ArgoCD won't reconcile half-baked state);
  --push-direct flag for fast-path
- Cleans up temp worktree on exit (including error paths)

Usage:
  bun tools/cluster/deregister-node.ts --host pikachu \
      [--maintainer aaron] [--reason "..."] [--push-direct]

Exit-code contract: 0 = PR opened (or direct push); 1 = invocation
error; 2 = host not found; 3 = git/push/gh error.

## B-0815 P2 — heartbeat/expiration design row

Substrate-engineering design for the second-order "stay in sync with
physical reality" need. 4 options documented:

A. TTL-based expiration (scheduled scanner; auto-deregister past-expiry)
B. Node-side heartbeat daemon (commit-per-heartbeat → git churn concern)
C. Hybrid TTL + on-demand refresh (operator's framing match;
   recommended default)
D. Use K8s node-status as truth (cluster-native; requires more
   substrate to ship first)

Operator's pick required before implementation; row captures the
tradeoff space + my recommendation (C for homelab; D as upgrade path).

5 sub-targets sketched (schema extension, scanner, optional node-side
daemon, grace-period policy, documentation).
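For option C, the scanner's decision reduces to comparing `expires_at` against the clock, with a grace window before auto-deregistration. A minimal sketch, assuming `expires_at` is an absolute timestamp that each on-demand refresh pushes forward by a fixed TTL. `ttlDays`, `graceDays`, and the three status names are hypothetical, because the row leaves the schema extension and the grace-period policy as open sub-targets.

```typescript
type NodeStatus = "fresh" | "grace" | "expired";

const DAY_MS = 24 * 60 * 60 * 1000;

// On-demand refresh (re-register "once a day or week"): new expires_at = now + TTL.
function refresh(now: Date, ttlDays: number): Date {
  return new Date(now.getTime() + ttlDays * DAY_MS);
}

// Scanner-side classification:
//   before expires_at                -> fresh
//   within graceDays after it        -> grace (warn, don't deregister yet)
//   at or past expires_at + grace    -> expired (auto-deregister candidate)
function classify(expiresAt: Date, now: Date, graceDays: number): NodeStatus {
  const t = now.getTime();
  const exp = expiresAt.getTime();
  if (t < exp) return "fresh";
  if (t < exp + graceDays * DAY_MS) return "grace";
  return "expired";
}
```

Keeping refresh and classification as separate pure functions means the optional node-side daemon (sub-target 3) and the scheduled scanner can share the same arithmetic.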

## Substrate-inventory pass per #5131 rule

- grep -rlF "deregister-node" → none; safe
- grep -rlF "heartbeat" → existing refs at different scopes (B-0726
  Reticulum, B-0703 BFT); no overlap with this row's cluster-node
  scope
- grep -rlF "expires_at" → no existing usage; safe
- tools/cluster/ directory doesn't yet exist; PR creates it
- B-0814 + B-0815 IDs next-free per git ls-tree origin/main
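The next-free ID check can be sketched as: list the files on origin/main (for example with `git ls-tree -r --name-only origin/main docs/backlog/`), extract the highest `B-NNNN`, and add one. This assumes row files carry their ID as `B-NNNN` in the path, and that "next-free" means max + 1 rather than the lowest unused gap. `nextFreeBacklogId` is a hypothetical name.

```typescript
// Input: newline-separated paths, e.g. from `git ls-tree -r --name-only origin/main docs/backlog/`.
// Assumption: each row file's path contains its ID as B-NNNN.
function nextFreeBacklogId(lsTreeOutput: string): string {
  let max = 0;
  for (const line of lsTreeOutput.split("\n")) {
    const m = /\bB-(\d{4,})\b/.exec(line);
    if (m) max = Math.max(max, Number(m[1]));
  }
  return `B-${String(max + 1).padStart(4, "0")}`;
}
```

This only sees what is merged on main. IDs already claimed by open PRs (the iter-5.4.2 commit checks "in flight" as well) need a separate look before the new row is filed.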

Composes with iter-5.4.x arc (B-0794 + B-0812 + B-0813), B-0790
zero-dev-machine end-state, B-0751 primary-checkout-is-SHARED-VIEW
discipline (deregister tool uses temp worktree).

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0814+B-0815): MD032 blanks-around-lists — awk auto-fix pass

3 MD032 errors across the 2 backlog rows. Awk pass inserts blank line
before list-start when prev line is non-blank, non-list, non-table.
Lint-only fix; no content change.
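The awk rule is simple enough to restate in TypeScript. This is a reconstruction of the stated logic (insert a blank line before a list-start line whose predecessor is non-blank, non-list, and non-table), not the committed awk script; the list-marker and table-row regexes are assumptions about what "list" and "table" meant there.

```typescript
// Bullet ("-", "*", "+") or ordered ("1." / "1)") marker followed by whitespace.
const LIST_START = /^\s*(?:[-*+]|\d+[.)])\s/;
// Markdown table rows start with a pipe.
const TABLE_ROW = /^\s*\|/;

function fixMd032(text: string): string {
  const out: string[] = [];
  for (const line of text.split("\n")) {
    const prev: string | undefined = out.length > 0 ? out[out.length - 1] : undefined;
    const needsBlank =
      prev !== undefined &&
      LIST_START.test(line) &&
      prev.trim() !== "" &&
      !LIST_START.test(prev) &&
      !TABLE_ROW.test(prev);
    if (needsBlank) out.push("");
    out.push(line);
  }
  return out.join("\n");
}
```

Like the described pass, this does not special-case fenced code, YAML front matter, or wrapped continuation lines inside a list item, so the diff still needs a look before committing.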

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(postmerge-5216): 10 Copilot findings — TS hardening + backlog hygiene + link-depth correction + title-quote unbalance

5 TS findings on deregister-node.ts:
(1) P1: --reason now rejects values starting with `-` so `--reason --push-direct`
    doesn't silently consume the flag
(2) P1: --host validated against DNS-label regex
    `/^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$/` — blocks path-traversal
    (`../foo`) + shell-metachars since --host interpolates into filesystem
    path AND branch name
(3) P1: mkdtempSync dir cleaned up in worktree-add failure path (was leaking)
(4) P2: branch prefix changed `otto-cli/deregister-` → `deregister/` (this is
    an operator tool, not an Otto-agent lane; misattribution per agent-roster
    discipline)
(5) P2: added `if (import.meta.main)` guard for import-without-side-effects
    pattern (matches tools/backlog/generate-index.ts convention)
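Findings (1) and (2) amount to a small argument parser. This sketch uses the flag names from the usage line (`--host`, `--maintainer`, `--reason`, `--push-direct`) and the DNS-label regex quoted in finding (2). Rejecting dash-prefixed values for every value-taking flag, not just `--reason`, generalizes finding (1). `parseArgs` and `Args` are hypothetical names.

```typescript
// DNS-label regex quoted in finding (2): blocks "../foo", slashes, shell metacharacters.
const DNS_LABEL = /^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$/;

interface Args { host: string; maintainer?: string; reason?: string; pushDirect: boolean }

function parseArgs(argv: string[]): Args {
  let host: string | undefined;
  let maintainer: string | undefined;
  let reason: string | undefined;
  let pushDirect = false;

  // Reads a flag's value, refusing anything that looks like another flag (finding 1).
  const valueFor = (flag: string, i: number): string => {
    const v = argv[i + 1];
    if (v === undefined || v.startsWith("-")) throw new Error(`${flag} requires a value`);
    return v;
  };

  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    switch (a) {
      case "--host": host = valueFor(a, i); i++; break;
      case "--maintainer": maintainer = valueFor(a, i); i++; break;
      case "--reason": reason = valueFor(a, i); i++; break;
      case "--push-direct": pushDirect = true; break;
      default: throw new Error(`unknown argument: ${a}`);
    }
  }
  if (host === undefined) throw new Error("--host is required");
  if (!DNS_LABEL.test(host)) throw new Error(`invalid --host: ${host}`);
  return { host, maintainer, reason, pushDirect };
}
```

In a Bun script the entry point would then sit under `if (import.meta.main) { ... }` (finding 5), so a test can import `parseArgs` without triggering a deregistration.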

3 backlog row findings:
(6) P1: B-0814 `status: in-progress` (not in documented enum) → `status: open`
    (the enum allows open/closed/superseded-by-*/deferred/decomposed per
    tools/backlog/README.md)
(7+8) P1: `.claude/rules/...` link from docs/backlog/P{1,2}/ was `../../`
    which resolves to docs/, not repo root. Fixed to `../../../.claude/` for
    correct 3-up to repo root. Copilot was right; my earlier "FP" inclination
    was wrong on direct verification (`ls docs/backlog/P1/../../.claude/...`
    confirmed broken).
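The link-depth bug in (7+8) can be checked mechanically, because a relative link resolves against the directory of the file that contains it. A sketch using Node's `path.posix`; the row filename `B-0814.md` and the target `some-rule.md` are placeholders.

```typescript
import { posix } from "node:path";

// Resolve a relative Markdown link from the repo-relative path of the file containing it.
// posix.join normalizes "..", so the result is a clean repo-relative path.
function resolveLink(fromFile: string, href: string): string {
  return posix.join(posix.dirname(fromFile), href);
}

// From docs/backlog/P1/, "../../" stops at docs/; "../../../" reaches the repo root.
const twoUp = resolveLink("docs/backlog/P1/B-0814.md", "../../.claude/rules/some-rule.md");
const threeUp = resolveLink("docs/backlog/P1/B-0814.md", "../../../.claude/rules/some-rule.md");
```

Here `twoUp` is `docs/.claude/rules/some-rule.md` (the broken target) and `threeUp` is `.claude/rules/some-rule.md`.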

2 BACKLOG.md generated-content findings:
(9+10) Row titles had maintainer-quotes that generate-index.ts truncated mid-
    string, leaving unclosed `"`. Fixed by moving maintainer-quotes from
    title to body; new titles are quote-free + drop the truncation hazard.

Substrate-inventory pass per #5131 rule extension still operative:
the BACKLOG.md unclosed-quote pattern is the generator-tool's truncation
behavior, not row-authoring; row-side fix (no quotes in title) sidesteps
the generator's truncation footgun. Generator-level fix (truncate-at-quote-
boundary) is the proper substrate fix; out of scope for this fix-fwd but
worth a follow-on row.
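One shape the generator-level fix could take: if the cut point leaves an odd number of `"` characters, back up to just before the last (opening) quote. This is a hypothetical sketch, not the behavior of `generate-index.ts`. `truncateTitle`, the `max` parameter, and the `…` suffix are assumptions, and only straight double quotes are counted.

```typescript
// Truncate a title to `max` characters without leaving an unclosed "..." span.
function truncateTitle(title: string, max: number): string {
  if (title.length <= max) return title;
  let cut = title.slice(0, max);
  const quotes = (cut.match(/"/g) ?? []).length;
  if (quotes % 2 === 1) {
    // Odd count: the last quote opens a span that would be cut in half; drop it.
    cut = cut.slice(0, cut.lastIndexOf('"'));
  }
  return cut.trimEnd() + "…"; // suffix is appended after the cut
}
```

Moving maintainer quotes out of titles (the row-side fix) still helps, since a quote-aware cut can drop most of a short title.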

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack pushed a commit that referenced this pull request May 26, 2026
…er-nodes tree → reconciles on PR-merge — completes the iter-5.4 arc

Iter-5.4.0 (PR #5210) lands gh-auth foothold.
Iter-5.4.1 (PR #5211 row) lands self-registration commit+push.
THIS row (iter-5.4.2) decomposes B-0794 sub-target 4: ArgoCD reconciler
that consumes the self-registration PRs and translates ClusterNode CRs
to K8s node-labels/taints/role-specific workloads.

After all 3 slices land + impl, the maintainer's full vision:
'zflash → boot → install → gh-auth → self-register → operator merges
PR from phone → cluster auto-converges' is operational. Zero manual
kubectl required.

Sub-targets sketched: CRD definition, ArgoCD Application resource,
reconciler controller (kustomize + simple kubectl-shell loop initial;
Go operator deferred), role-to-label/taint mapping ConfigMap,
empirical end-to-end validation on PC1.
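The role-to-label/taint mapping sub-target can be sketched as a lookup table that a reconciler folds over a node's roles and then turns into `kubectl label` / `kubectl taint` calls, matching the "simple kubectl-shell loop" variant. Everything here is hypothetical: the role names, the `zeta.dev/*` keys, and the idea that `ROLE_MAP` would be loaded from the ConfigMap. Only the `node-role.kubernetes.io/<role>` label convention and the three taint effects are standard Kubernetes.

```typescript
type TaintEffect = "NoSchedule" | "PreferNoSchedule" | "NoExecute";
interface Taint { key: string; value: string; effect: TaintEffect }
interface RoleSpec { labels: Record<string, string>; taints: Taint[] }

// Hypothetical contents of the role-mapping ConfigMap.
const ROLE_MAP = new Map<string, RoleSpec>([
  ["worker", { labels: { "node-role.kubernetes.io/worker": "" }, taints: [] }],
  ["gpu", {
    labels: { "node-role.kubernetes.io/gpu": "", "zeta.dev/accelerator": "gpu" },
    taints: [{ key: "zeta.dev/gpu", value: "true", effect: "NoSchedule" }],
  }],
]);

// Union of labels and taints for all of a node's roles (taints de-duplicated by key+effect).
function desiredNodeState(roles: string[]): RoleSpec {
  const labels: Record<string, string> = {};
  const taints: Taint[] = [];
  for (const role of roles) {
    const spec = ROLE_MAP.get(role);
    if (!spec) throw new Error(`unknown role: ${role}`);
    Object.assign(labels, spec.labels);
    for (const t of spec.taints) {
      if (!taints.some((x) => x.key === t.key && x.effect === t.effect)) taints.push(t);
    }
  }
  return { labels, taints };
}

// kubectl argv lists for applying the desired state to one node.
function kubectlArgs(node: string, s: RoleSpec): string[][] {
  const cmds: string[][] = [];
  const labelArgs = Object.entries(s.labels).map(([k, v]) => `${k}=${v}`);
  if (labelArgs.length > 0) cmds.push(["kubectl", "label", "node", node, "--overwrite", ...labelArgs]);
  for (const t of s.taints) {
    cmds.push(["kubectl", "taint", "node", node, `${t.key}=${t.value}:${t.effect}`, "--overwrite"]);
  }
  return cmds;
}
```

This only adds state. A real reconciler also has to remove labels and taints for roles a node no longer has (`kubectl label node <n> key-`, `kubectl taint node <n> key:Effect-`).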

Substrate-inventory pass per #5131 rule: iter-5.4.2 unused;
cluster-nodes-reconciler unused; ID B-0813 next-free; all composes_with
chain verified on main + in flight.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 26, 2026
…n 2025) to 25.11 'Xantusia' (current stable) — the maintainer 2026-05-26 EOL recovery catch (#5218)

The maintainer 2026-05-26: "24.11 is a 2 year old version you found a
25.11 when you searched latest we need to make sure we are on latest
too".

Per WebSearch (per `.claude/rules/dep-pin-search-first-authority.md`):
- NixOS 25.11 "Xantusia" — current stable; released 2025-11-30; EOL
  2026-06-30 per https://nixos.org/blog/announcements/2025/nixos-2511/
- Our pin `nixos-24.11` had been EOL since 2025-06-30 (~11 months
  out-of-support) — substantive supply-chain-security gap.

Changes (every 24.11 reference in the 5 source files below bumped to
25.11, plus cosmetic renames in (6); no behavioral change beyond the
channel bump):

(1) full-ai-cluster/flake.nix:
    - nixpkgs.url: nixos-24.11 → nixos-25.11 (with inline WebSearch
      citation comment for future-Otto reference)
    - nix-darwin.url: nix-darwin-24.11 → nix-darwin-25.11 (matching
      release branch)
    - stateVersion default: "24.11" → "25.11" (PC1 + future cluster
      nodes are fresh-install per maintainer — no persistent K8s
      workloads yet → safe to bump; already-installed hosts should
      NOT bump per-host stateVersion without explicit migration)

(2) full-ai-cluster/usb-nixos-installer/flake.nix:
    - nixpkgs.url + stateVersion: matching bumps

(3) full-ai-cluster/nixos/modules/common.nix:
    - stateVersion ? "24.11" → "25.11" (default fallback for new hosts)

(4) full-ai-cluster/nixos/hosts/worker-template/default.nix:
    - system.stateVersion: "24.11" → "25.11"

(5) full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix:
    - system.stateVersion: "24.11" → "25.11"

(6) full-ai-cluster/README.md + tools/zflash.ts:
    - nix-darwin-24.11 → nix-darwin-25.11 + zeta-installer-24.11.iso →
      zeta-installer-25.11.iso (cosmetic; ISO output file name follows
      stateVersion convention)

(7) Both flake.lock files regenerated via `nix flake update`:
    - full-ai-cluster/flake.lock: nixpkgs pinned to b77b3de (2026-05-22)
      + nix-darwin to ebec37a (2026-02-26) + nixos-hardware to c97bc4d
      (2026-05-20)
    - full-ai-cluster/usb-nixos-installer/flake.lock: nixpkgs same
      commit b77b3de

(8) Validated locally: `nix flake check --no-build --show-trace` ✅
    clean (all attributes evaluate; build skipped per check semantics).
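The source edits in (1) through (6) are mechanical string substitutions. Below is a sketch of that pass, assuming the reference forms listed above: `nixos-24.11`, `nix-darwin-24.11`, `zeta-installer-24.11.iso`, and `stateVersion = "24.11"` / `stateVersion ? "24.11"`. It skips `#` comment lines so the intentional bump-citation comments survive. `BUMPS` and `bumpChannel` are hypothetical names.

```typescript
// Ordered rewrites for the 24.11 -> 25.11 channel bump.
const BUMPS: Array<[RegExp, string]> = [
  [/nixos-24\.11/g, "nixos-25.11"],
  [/nix-darwin-24\.11/g, "nix-darwin-25.11"],
  [/zeta-installer-24\.11\.iso/g, "zeta-installer-25.11.iso"],
  // Covers `system.stateVersion = "24.11"` and the `stateVersion ? "24.11"` default.
  [/(stateVersion\s*(?:\?|=)\s*)"24\.11"/g, '$1"25.11"'],
];

function bumpChannel(src: string): { out: string; count: number } {
  let count = 0;
  const out = src
    .split("\n")
    .map((line) => {
      if (line.trimStart().startsWith("#")) return line; // keep Nix comments as-is
      for (const [re, rep] of BUMPS) {
        const hits = line.match(re);
        if (hits) {
          count += hits.length;
          line = line.replace(re, rep);
        }
      }
      return line;
    })
    .join("\n");
  return { out, count };
}
```

The `#` rule is Nix-specific: in README.md it would skip Markdown headings, and tools/zflash.ts uses `//` comments, so scope it to `*.nix` files. Per (1), the `stateVersion` rewrite is only appropriate for fresh installs. `nix flake update` and `nix flake check`, as in (7) and (8), remain the actual validation.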

Composes with B-0801–B-0805 iter-6 cluster-update arc landed earlier
this session — this is sub-target 0 (the urgent EOL recovery). Once
this lands, next CI ISO build triggers automatically (full-ai-cluster/**
in push paths) → operator gets `zeta-installer-25.11.iso` artifact.

Substrate-inventory pass per #5131 rule:
- grep -rn "24\.11" full-ai-cluster/ → 5 source locations + bump-
  citation comments (intentional)
- grep -rn "nixos-25" full-ai-cluster/ → none pre-bump; safe to
  introduce
- B-0800 row (already on main via #5123) names this as the canonical
  bump target

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 26, 2026
…nimize NixOS-native lock-in for cross-cluster portability — the maintainer 2026-05-26 ArgoCD-portability leverage catch (#5220)

The maintainer 2026-05-26 immediately after iter-6.0 nixpkgs bump:
'nice also ArgoCD is ususaly be anyone with k8s too not just nixos so
antoher reason to push as much as possible into argocd.'

Architectural principle row that informs every subsequent cluster-
substrate decision. Carved sentence:

  ArgoCD is used by ANYONE running Kubernetes (not just NixOS users);
  substrate-in-ArgoCD ports across every K8s cluster + every K8s
  distribution. NixOS-native substrate is load-bearing for the BOOT +
  OS layer, but BEYOND THAT every substrate-engineering decision
  should default to ArgoCD-managed for cross-cluster portability
  leverage.

Operational discipline + table-classification of existing iter-5/6/7
substrate per principle (B-0813 reconciler / B-0802 kured / B-0806
Crossplane stay ArgoCD; B-0800 nixpkgs / B-0801 autoUpgrade / B-0803
deploy-rs are NixOS-only path).

Implication for B-0782 cluster-IS-DIO: DIO lives in 4 layers (boot+OS
= NixOS; K8s+workload = ArgoCD; external-infra = Crossplane via
ArgoCD; heterogeneous-OS = Ansible+Ace bridge).
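The four-layer split can be written down as data, which makes the classification rule explicit: only the two layers that the text assigns to ArgoCD (directly, or via Crossplane) are ArgoCD-managed. A sketch only. The layer keys and row labels are shorthand for the rows named above, not identifiers from the repo.

```typescript
type Layer = "boot+os" | "k8s+workload" | "external-infra" | "heterogeneous-os";

// The four DIO layers and their engines, as listed in the B-0782 implication.
const ENGINE: Record<Layer, string> = {
  "boot+os": "NixOS",
  "k8s+workload": "ArgoCD",
  "external-infra": "Crossplane via ArgoCD",
  "heterogeneous-os": "Ansible + Ace bridge",
};

// Classification of the rows named in the commit message (row -> layer).
const ROWS: Record<string, Layer> = {
  "B-0813 reconciler": "k8s+workload",
  "B-0802 kured": "k8s+workload",
  "B-0806 Crossplane": "external-infra",
  "B-0800 nixpkgs": "boot+os",
  "B-0801 autoUpgrade": "boot+os",
  "B-0803 deploy-rs": "boot+os",
};

// Default per the principle: everything ArgoCD can carry, ArgoCD carries.
function isArgoCdManaged(layer: Layer): boolean {
  return layer === "k8s+workload" || layer === "external-infra";
}
```

New rows would be added to `ROWS` with a layer first, and the engine then follows from the table rather than from case-by-case judgment.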

Implication for iter-5.4.x arc: cluster-nodes-reconciler (B-0813) IS
ArgoCD-managed → operators on any K8s distro (K3S-on-Ubuntu, Talos,
RKE2, EKS, etc.) can adopt the substrate by pointing their ArgoCD at
the maintainers/<op>/cluster-nodes/ tree.

Implication for Ace: Ace becomes cross-distro bootstrap entry-point;
ArgoCD becomes convergence engine; NixOS-native is one of N possible
host substrates.

Substrate-inventory pass per #5131 rule: no existing 'principle' row
on this topic; ID B-0816 next-free; composes with 12 existing rows
across iter-5/6/7 + Ace + B-0782/0790 arc.

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 26, 2026
…ion companion symmetric to deregister (B-0814) (#5221)

Natural arc-completion: B-0814 deregister-node.ts shipped (PR #5216);
symmetric register-node.ts companion fills the manual register path.
Use cases: re-register after wipe + reinstall when self-registration
failed; legacy hardware adoption (non-NixOS distros per B-0816
cross-distro principle); operator metadata override (GPU swap, disk
replace); test-substrate register without booting hardware.

Two modes:
- Compose mode (default): --host + --roles + optional --ip/--mac →
  build node.yaml using same B-0813 schema
- Pass-through mode (--from-yaml ./node.yaml): operator provides
  pre-composed yaml; tool validates + commits + pushes

Mirrors deregister-node.ts shape:
- Temp worktree (no operator-checkout-touch per B-0751)
- DNS-label hostname validation
- --reason text in commit + PR
- Branch prefix 'register/' (NOT 'otto-cli/' per Copilot P2 finding
  on B-0814 — this is an operator tool, not Otto-agent work)
- Exit-code contract: 0 (PR opened) / 1 (invocation error) / 2 (host
  already registered without --force) / 3 (git error)
- import.meta.main guard

Existence check defaults to refuse-overwrite (safer); --force flag for
intentional re-register.
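The refuse-overwrite default reduces to a three-way decision, with refusal mapped onto exit code 2 from the contract above. A minimal sketch; `decideRegister` and the result shape are hypothetical.

```typescript
type RegisterDecision =
  | { action: "proceed"; overwrite: boolean }
  | { action: "refuse"; exitCode: 2; message: string };

// Refuse to overwrite an existing registration unless --force was given.
function decideRegister(host: string, alreadyRegistered: boolean, force: boolean): RegisterDecision {
  if (!alreadyRegistered) return { action: "proceed", overwrite: false };
  if (force) return { action: "proceed", overwrite: true };
  return {
    action: "refuse",
    exitCode: 2,
    message: `${host} is already registered; pass --force to re-register`,
  };
}
```

Returning `overwrite: true` lets the commit message and PR body say explicitly that an existing node.yaml was replaced.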

Filed as P2 because: deregister is P1 (Aaron named it explicitly);
manual register is implied by symmetry but not explicitly named.

Substrate-inventory pass per #5131 rule: no existing register-tool
substrate; ID B-0817 next-free; composes with iter-5.4 + B-0816
cross-distro principle.

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
2 participants