rule: verify-existing-substrate-before-authoring - sibling to dep-pin-search-first-authority (3-anchor empirical evidence 2026-05-26) #5131
Merged
AceHack merged 3 commits on May 26, 2026
…search-first-authority) — 3-anchor empirical evidence from session 2026-05-26
A single session on 2026-05-26 produced 3 failures with the same root cause
("Otto-defaults-to-plausible-but-unverified" at substrate-authoring
scope):
ANCHOR 1: cascade #4 ISO audit (PR #5119) asserted boot/grub/grub.cfg
without verifying NixOS-actual layout (isolinux + refind). Blocked 4
ISO builds. Fixed via PR #5125. Covered by dep-pin-search-first-
authority rule landed PR #5126.
ANCHOR 2: B-0806 backlog row (PR #5129) authored Ace section as if Ace
were just "a package manager CLI" without reading docs/agendas/ace-
package-manager/AGENDA.md + project memory + 7+ related backlog rows.
The maintainer 2026-05-26: "that is what ace has been since we first
talked about it you just keep forgetting we have substantial backlog
around this". Fixed via PR #5130.
ANCHOR 3: B-0806 hat/fork-negotiation NOT integrated into architecture
even after Anchor-2 correction. The maintainer 2026-05-26: "i'm
assuming you have the hat / fork negoation for ace too". Fixed via
PR #5130 follow-on commit.
Same root cause class as the dep-pin rule, but at a DIFFERENT surface:
this is substrate-authoring scope (backlog rows, rules, skills,
architectural framings), not version-pin scope. dep-pin-search-first-
authority + this rule + fighting-past-self-vs-peer-agent compose to
cover the surfaces today's empirical evidence showed are vulnerable.
The rule auto-loads at cold-boot per wake-time-substrate.
Provides:
- Operational discipline: 4-step grep + read top hits + decide + cite
inline
- Checklist template for inline substrate-inventory pass annotation
- All 3 empirical anchors preserved so future-Otto sees the cost of
skipping
- Cross-references to dep-pin + fighting-past-self for full coverage
Co-Authored-By: Claude <noreply@anthropic.com>
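The 4-step discipline listed above (grep, read top hits, decide, cite inline) can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the rule's actual tooling: the in-memory `Corpus` stands in for the repo tree, and `inventoryPass`, its `topN` parameter, and the annotation wording are hypothetical names invented for this sketch.

```typescript
// Hypothetical sketch of the 4-step inventory pass; none of these names come from the rule file.
type Corpus = Record<string, string>; // path -> file content, standing in for the repo tree

interface InventoryResult {
  hits: string[];                             // step 1: every path whose content mentions the topic
  toRead: string[];                           // step 2: the hits the author reads before writing
  decision: "extend-existing" | "author-new"; // step 3
  annotation: string;                         // step 4: inline citation for the new substrate
}

function inventoryPass(corpus: Corpus, topic: string, topN: number = 5): InventoryResult {
  // Step 1: fixed-string content search (not filename matching, not regex).
  const hits = Object.keys(corpus)
    .filter((path) => corpus[path].indexOf(topic) !== -1)
    .sort();
  // Step 2: select the top hits; actually reading them is the author's job.
  const toRead = hits.slice(0, topN);
  // Step 3: existing substrate means integrate it rather than author in parallel.
  const decision = hits.length > 0 ? "extend-existing" : "author-new";
  // Step 4: produce the inline citation.
  const annotation =
    hits.length > 0
      ? `Substrate-inventory pass: "${topic}" found in ${hits.length} file(s): ${toRead.join(", ")}`
      : `Substrate-inventory pass: "${topic}" has no existing references; safe to author`;
  return { hits, toRead, decision, annotation };
}
```

The point of returning the annotation alongside the decision is that the citation gets written as part of the same pass, so skipping the search also leaves a visible gap in the authored row.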
AceHack pushed a commit that referenced this pull request on May 26, 2026
…5/B-0794 wrong paths
(1) `[B-0247](../P*/B-0247-*.md)` — markdown links don't support globs;
GitHub won't resolve. Linked directly to
`../P1/B-0247-ace-dlc-content-packs-kernel-extensions-package-manager-2026-05-07.md`.
(2) `[B-0805](B-0805-...)` — relative path missing `../P1/` prefix;
B-0805 is under docs/backlog/P1/ while this row is under
docs/backlog/P2/. Fixed 5 occurrences via sed (lines 36, 104,
316, 355, 362).
(3) `[B-0794](B-0794-iter-5-4-...)` — same shape as (2): missing
`../P1/` prefix AND wrong slug. The actual on-main B-0794 slug is
`B-0794-node-self-registers-in-git-under-maintainers-cluster-nodes-
triggers-argocd-full-bringup-of-k8s-apps-charts-gitops-native-
cluster-substrate-aaron-2026-05-26.md` per `find docs/backlog
-name B-0794*`. Fixed 2 occurrences.
Pattern note: this is the same broken-link class Copilot caught
earlier in this session on #5121 (B-0794 wrong slug). I keep
authoring these from training-data default slugs instead of running
`find docs/backlog -name "B-NNNN*"` first — fits the empirical-anchor
pattern for the verify-existing-substrate-before-authoring rule
landing in parallel via PR #5131.
Co-Authored-By: Claude <noreply@anthropic.com>
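The three broken-link shapes above (glob in the target, missing `../P1/` prefix, wrong slug) all reduce to "resolve the target literally and check that it exists". A minimal sketch of that check follows; `resolveLink`, `checkLink`, and the `existing` set of repo paths are hypothetical names, and the example filenames in the usage note are placeholders rather than real rows.

```typescript
// Sketch: resolve a relative markdown link target against the linking file's path.
// Pure string logic; `existing` is a hypothetical set of repo-relative paths.
type LinkStatus = "ok" | "glob-not-supported" | "missing";

function resolveLink(fromFile: string, target: string): string {
  const parts = fromFile.split("/").slice(0, -1); // directory containing the linking file
  for (const seg of target.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") parts.pop(); // go up one directory
    else parts.push(seg);
  }
  return parts.join("/");
}

function checkLink(fromFile: string, target: string, existing: Set<string>): LinkStatus {
  // Markdown link targets are literal paths; `*` or `?` are never expanded.
  if (/[*?]/.test(target)) return "glob-not-supported";
  return existing.has(resolveLink(fromFile, target)) ? "ok" : "missing";
}
```

For a row under `docs/backlog/P2/`, a bare `B-0805-example.md` target resolves inside `P2/` and comes back `missing`, while `../P1/B-0805-example.md` resolves into `P1/`.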
Pull request overview
Adds a new .claude/rules/ wake-time substrate rule that formalizes a “search and read existing substrate before authoring new substrate” discipline, positioned as the substrate-authoring counterpart to dep-pin-search-first-authority.md.
Changes:
- Introduces verify-existing-substrate-before-authoring.md with a required pre-authoring inventory/search process and an inline checklist template.
- Captures three empirical anchors from 2026-05-26 to justify the rule and keep the motivating evidence present at cold-boot.
- Cross-references existing related rules to clarify composition across surfaces (dep pins, rule-citation failures, substrate authoring).
AceHack added a commit that referenced this pull request on May 26, 2026
… 'package manager of package managers'; B-0806 sits INSIDE Ace not parallel to it (#5130)

* fix(B-0806): substrate-honest correction - Ace agenda already encodes "package manager of package managers"; B-0806 sits INSIDE Ace, not parallel to it

The maintainer 2026-05-26 substrate-honest catch: "that is what ace has been since we first talked about it you just keep forgetting we have substantial backlog around this"

Caught a recurrence of the same agent-discipline gap that produced the cascade #4 ISO audit failure (PR #5125) earlier today: authoring substrate from incomplete view of what already exists. The Ace package-manager-of-package-managers framing is canonical existing substrate, NOT a new architectural insight surfaced by B-0806.

Existing Ace substrate I should have read first:
- docs/agendas/ace-package-manager/AGENDA.md (OPERATOR-SELF-CLAIMED 2026-05-22; 13-stage Ace lifecycle; polyglot package contents; proto-governance via hats + multi-oracle BFT; symmetric/decentralized)
- docs/trajectories/ace-package-manager-skill-crystallization-pipeline/RESUME.md (active trajectory)
- memory/project_ace_package_manager_unrestricted_local_models_guardian_oversight_aaron_2026_05_07.md (canonical Aaron 2026-05-07 disclosure: unrestricted local models + Guardian/KSK + Bond Curve + Itron composition)
- memory/feedback_aaron_ace_package_manager_homebrew_shape_bootstrap_website_chat_interface_full_distribution_stack_no_setup_needed_2026_05_13.md (full distribution stack)
- B-0247 (parent), B-0287 (closed format spec), B-0288 (in-progress CLI), B-0424 (repo-split), B-0742, B-0777 (related backlog cluster)
- docs/research/2026-05-22-ace-package-format-spec-v2-substrate-engineering-pipeline-extension.md (DeepSeek 2026-05-22 substrate-engineering pipeline extension)

Changes:
- Reframed B-0806's Ace section as "this row sits INSIDE the Ace agenda as one instance of stage-8 (distribute), NOT parallel to it"
- Added complete substrate-table citing the canonical Ace docs
- Reworded architecture diagram annotations to credit canonical Ace framing (not my "architectural insight")
- Explicitly named this as a second empirical anchor for the verify-existing-substrate-before-authoring discipline gap (sibling failure mode to cascade #4 ISO audit; PR #5125 + #5126)

Also fixes MD040 (missing language on fenced code blocks at line 111 and 196): `text` language tag added.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0806): add NixOS-as-north-star framing per the maintainer 2026-05-26

The maintainer 2026-05-26: "nixos is our north star for declarative gitops ease"

This is the FRAMING PRINCIPLE for the whole iter-7 arc: NixOS sets the gold-standard target; ansible+ace+crossplane exist to approximate the NixOS-native experience on platforms that don't have it (Windows, macOS, non-NixOS Linux). Every sub-target design decision answers: "does this make non-Nix MORE like NixOS, or does it add a parallel imperative-shape?" Former is the direction; latter is the failure mode.

Added new top-section "## North star" capturing this verbatim, with the framing-implications for sub-target design decisions called out.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0806): integrate hats + fork-negotiation into architecture flow per maintainer 2026-05-26 - 3rd same-pattern catch this session

The maintainer 2026-05-26: "i'm assuming you have the hat / fork negoation for ace too"

Third instance today of authoring-from-incomplete-view of the Ace substrate. I cited B-0742 + B-0777 in the previous correction's substrate-table but did NOT integrate hats + fork-negotiation into B-0806's architectural flow. The Ace agenda already specifies: "Hats = controls + self-bindings over time crystals (PAIR is load-bearing primitive)" + "proto-governance via skill-bound hats with multi-oracle BFT (authority + bindings tied to skills)". That is canonical existing substrate I should have integrated, not bolted on.

Changes:

(1) Added "### Architectural integration of hats + fork-negotiation" section showing the 5-step Ace invocation flow for every `ace install <pkg>`:
  1a. Hat resolution (skill-bound; PAIR primitive)
  1b. Multi-oracle BFT proto-governance (N-of-M consent)
  1c. Cross-fork ontology negotiation (per B-0741/B-0777; per-persona ontology maps)
  1d. Guardian/KSK gate (per canonical Ace project memory; Bond Curve pricing; local receipts; high-risk multi-N-of-M)
  1e. ace install proceeds + receipt written

(2) Added B-0741 to the substrate-citation table with explicit "CLOSED prematurely earlier this session" annotation. The close was mechanically justified (DIRTY conflict) but the substrate is load-bearing for B-0806's architectural integration.

(3) New "## Sub-row to re-file" section tracks B-0741 as a known dependency for iter-7 implementation; needs cherry-pick re-land per pr-triage-tiers Tier 3.

(4) Updated "agent-discipline failure" note to mark this as the THIRD instance today (cascade #4 ISO audit / B-0806 Ace-section / B-0806 hats-fork-negotiation). Pattern is clear enough that the "verify-existing-substrate-before-authoring" rule extension to dep-pin-search-first-authority is genuinely load-bearing.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0806): 2 Copilot P1 broken xrefs on #5130 - B-0247 glob + B-0805/B-0794 wrong paths

(1) `[B-0247](../P*/B-0247-*.md)`: markdown links don't support globs; GitHub won't resolve. Linked directly to `../P1/B-0247-ace-dlc-content-packs-kernel-extensions-package-manager-2026-05-07.md`.

(2) `[B-0805](B-0805-...)`: relative path missing `../P1/` prefix; B-0805 is under docs/backlog/P1/ while this row is under docs/backlog/P2/. Fixed 5 occurrences via sed (lines 36, 104, 316, 355, 362).

(3) `[B-0794](B-0794-iter-5-4-...)`: same shape as (2): missing `../P1/` prefix AND wrong slug. The actual on-main B-0794 slug is `B-0794-node-self-registers-in-git-under-maintainers-cluster-nodes-triggers-argocd-full-bringup-of-k8s-apps-charts-gitops-native-cluster-substrate-aaron-2026-05-26.md` per `find docs/backlog -name B-0794*`. Fixed 2 occurrences.

Pattern note: this is the same broken-link class Copilot caught earlier in this session on #5121 (B-0794 wrong slug). I keep authoring these from training-data default slugs instead of running `find docs/backlog -name "B-NNNN*"` first. That fits the empirical-anchor pattern for the verify-existing-substrate-before-authoring rule landing in parallel via PR #5131.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
…; reword to "plus refind"

markdownlint MD032 fired on line 100 because the wrap-continuation "+ refind, NOT legacy GRUB..." starts with `+ `, which is a valid markdown list marker. The linter doesn't know this is a wrapped paragraph continuation from line 99. Reword "isolinux + refind" → "isolinux plus refind" to disambiguate. No content change.

Co-Authored-By: Claude <noreply@anthropic.com>
…d-string discipline; 3rd (table double-pipe) is FP
(1) Earlier inventory snippet used filename/directory-name filtering
(`find docs/agendas -type d | grep -i "$topic"`) which misses
substrate that mentions the topic in CONTENT without the keyword in
the filename. Should be content-search via grep -rl. Same gap for
docs/trajectories/.
(2) Earlier snippet used `grep -E "$topic"` (regex) + unquoted shell
globs (`memory/*${topic}*`). Both break when topic contains regex
metacharacters (`+`, `.`, `B-NNNN`) or spaces. Use `grep -F`
(fixed-string) for safety + content-search (no globs).
(3) Bonus fix: `.claude/skills/` was missing from the inventory surfaces
even though skills are explicitly in-scope for the rule. Added.
3rd Copilot thread (table double-pipe at line 158/149) is the
documented known-FP class per `.claude/rules/blocked-green-ci-investigate-threads.md`
("Table double-pipe (`||`) ... 4 confirmed FPs in one session"). Direct
inspection of line 158 (`| Surface | Rule that catches it |`) confirms
single pipes; resolving that thread no-op per the suspect-by-default
discipline.
Co-Authored-By: Claude <noreply@anthropic.com>
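The regex-versus-fixed-string gap in finding (2) is easy to reproduce. In the sketch below, JavaScript's `RegExp` stands in for `grep -E` and a plain substring test stands in for `grep -F`; the two regex dialects differ in detail, so treat this as an illustration of the failure class rather than an exact model of grep. The function names are hypothetical.

```typescript
// Regex search: the topic string is interpreted as a pattern (analogous to `grep -E "$topic"`).
function regexSearch(topic: string, text: string): boolean {
  return new RegExp(topic).test(text);
}

// Fixed-string search: the topic is matched literally (analogous to `grep -F "$topic"`).
function fixedSearch(topic: string, text: string): boolean {
  return text.indexOf(topic) !== -1;
}

// False negative: "+" is a quantifier, so the pattern no longer matches its own literal text.
const missedByRegex = regexSearch("isolinux + refind", "layout is isolinux + refind");
const foundByFixed = fixedSearch("isolinux + refind", "layout is isolinux + refind");

// False positive: "." matches any character, so unrelated text is reported as a hit.
const spuriousRegexHit = regexSearch("v2.0", "v2x0");
const noFixedHit = fixedSearch("v2.0", "v2x0");
```

Both failure directions matter for an inventory pass: a false negative hides existing substrate, and a false positive buries the real hits in noise.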
AceHack added a commit that referenced this pull request on May 26, 2026
…iage) - ontology+category negotiation; load-bearing for iter-7 (B-0806) (#5133)

* backlog(B-0811 re-land from PR #5003): ontology+category negotiation as AI-skills+hats federation point across clusters+forks - load-bearing for iter-7 (B-0806) hat/fork-negotiation architecture

Re-land of substrate originally filed as B-0741 via PR #5003 on 2026-05-25. Closed during this session's stale-PR triage as Tier 3 (DIRTY-conflict) per .claude/rules/pr-triage-tiers.md. Triage close-comment named the cherry-pick re-land path explicitly.

Renumbered to B-0811 (next-free per inventory pass) because B-0741 ID remains taken on main; renumbering follows the ID-allocation discipline used by PR #5132 (peer Otto's classifier-bypass rows B-0800-0803 → B-0807-0810 dup-fix today).

Inventory pass per `.claude/rules/verify-existing-substrate-before-authoring.md` (landed earlier this session via PR #5131):
- grep -rlF "B-0741" docs/ memory/ .claude/ → 10+ existing references (BACKLOG.md + 3 docs/research/ files + 5 sibling backlog rows + 1 research catalog entry); confirms B-0741 is REFERENCED substrate whose re-land closes dangling cross-refs
- grep -rlF "fork-negotiation" docs/agendas/ docs/backlog/ .claude/rules/ → 1 existing related row (B-0742 hats-as-negotiated-fork-structure); sibling, NOT redundant
- highest B-08xx on main: B-0810 (just landed via #5132); B-0811 is next free

Substrate-honest framing: this is the SAME content as PR #5003's commit 0f691db; only `id:` field + filename + the new "## Re-land context" section differ. Original cross-references (composes_with B-0247/B-0287/B-0288/B-0731/B-0727/B-0726/B-0638/B-0703) preserved verbatim; all those targets still exist on main.

Load-bearing for iter-7 (B-0806 Ansible+Crossplane+Ace cross-OS substrate) per the maintainer 2026-05-26 catch "i'm assuming you have the hat / fork negoation for ace too". Cross-fork ontology negotiation is the third layer of every `ace install <pkg>` action in B-0806's architectural integration section.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0811): MD032 blanks-around-lists - 5 spots auto-fixed via awk pass

Original PR #5003 content had 5 MD032 violations (intro-sentence immediately followed by list with no blank line). Fixed via awk pass inserting blank lines before list-start when prev line is non-blank, non-list, non-table. Lint-only fix; no content change.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0811): bump last_updated to 2026-05-26 (Copilot finding - re-land added content today)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
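The MD032 pass described in this commit (insert a blank line before a list start when the previous line is non-blank, non-list, non-table) can be sketched as follows. The original fix was an awk pass whose source is not shown here, so this TypeScript version is a re-statement of the described rule under that assumption; the helper names are hypothetical.

```typescript
// Heuristic line classifiers; these regexes are assumptions, not the awk pass's actual patterns.
const isListItem = (line: string): boolean => /^\s*(?:[-*+]|\d+[.)])\s+/.test(line);
const isTableRow = (line: string): boolean => /^\s*\|/.test(line);
const isBlank = (line: string): boolean => line.trim() === "";

// Insert a blank line before each list start that directly follows ordinary text.
function blankBeforeLists(lines: string[]): string[] {
  const out: string[] = [];
  for (const line of lines) {
    const prev = out.length > 0 ? out[out.length - 1] : "";
    if (isListItem(line) && !isBlank(prev) && !isListItem(prev) && !isTableRow(prev)) {
      out.push("");
    }
    out.push(line);
  }
  return out;
}
```

A line-based heuristic like this cannot tell a real list item from a wrapped paragraph continuation that happens to start with `+ ` (the "isolinux + refind" case from earlier in this PR), and it does not skip fenced code blocks, so its output still needs a review before committing.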
AceHack added a commit that referenced this pull request on May 26, 2026
… updates refs not files (1008Z empirical anchor caught phantom PR #5128 drift) (#5134)

The failure mode: agent runs `git fetch origin main` (Step-1 refresh) which updates `refs/remotes/origin/main` but does NOT promote local HEAD; subsequent `Read`/`cat`/`grep` against working-tree paths read the LOCAL HEAD's files (stale-against-origin if local hasn't been ff-promoted). If the agent authors substrate at that point, it's against state that may already be resolved on origin/main N commits ahead, which yields phantom "drift" findings that require retraction.

Empirical anchor (this commit's authoring session):
- 2026-05-26T10:08Z Otto-CLI cold-boot ran `git fetch origin main` (success) in the operator's primary checkout (local HEAD `2774fef5a`)
- Read `tools/alignment/filter_gate_log.ts` + `.test.ts` via working tree; both files appeared unfixed despite PR #5128 having landed the fix 1h 46min earlier at 08:22Z
- Local primary was 11 commits behind origin/main (`1641da6d2`)
- Without `refresh-before-decide` catching the staleness, the next substrate landing would have been a public PR retracting against already-resolved state

Three mitigation patterns named in the rule (pick by context):
1. Isolated worktree off `origin/main`: `git worktree add --detach <path> origin/main` (default for ticks; composes with agent-worktree-hygiene `--detach` discipline)
2. `git show origin/main:<path>` for ad-hoc single-file inspection without checkout
3. ff-promote local HEAD, ONLY when the checkout is the agent's own, never the operator's primary

Substrate-inventory step performed per verify-existing-substrate-before-authoring (PR #5131; only visible via `git show origin/main:` because local was stale): existing partial coverage at otto-channels-reference-card.md ID-allocation section names this for `find docs/backlog -name "B-*.md"` queries. This extension generalizes the principle to any working-tree file read post-fetch + lands it on the refresh-before-decide surface where it auto-loads at every cold-boot.

Files:
- .claude/rules/refresh-before-decide.md (+69 lines: new section "git fetch updates refs but NOT working-tree files (post-fetch read trap)" + 3-pattern mitigation + 2026-05-26T10:08Z anchor + 5 composes_with citations)
- docs/hygiene-history/ticks/2026/05/26/1008Z.md (+151 lines: full tick trace including the catch, the verify-before-defer composition, the substrate-inventory step, visibility signal)

Composes with:
- refresh-before-decide.md (existing 28-line rule extended)
- verify-existing-substrate-before-authoring.md (PR #5131)
- otto-channels-reference-card.md (ID-allocation narrow precedent)
- agent-worktree-hygiene-never-hold-main-never-step-on-operator-cleanup-on-pr-merge.md
- refresh-world-model-poll-pr-gate.md (origin/main over FETCH_HEAD)
- dep-pin-search-first-authority.md (sibling at version-pin scope)
- codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md (verify-before-defer composition 8th-or-9th anchor)
- PR #5128: the fix whose phantom-drift catch this tick prevented

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
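The three mitigation patterns can be expressed as a choice of git command by context. The sketch below only builds the argument vector and does not run anything; `ReadContext`, `differsFromOrigin`, and `mitigationArgv` are hypothetical names. The first two commands are the ones quoted in the commit message. The third (`git merge --ff-only`) is an assumption, since the message names the ff-promote pattern without spelling out a command.

```typescript
// Hypothetical context type: which situation the agent is in after `git fetch origin main`.
type ReadContext =
  | { kind: "tick"; worktreePath: string } // routine tick: use an isolated worktree
  | { kind: "single-file"; path: string }  // ad-hoc inspection of one file
  | { kind: "own-checkout" };              // checkout owned by the agent, never the operator's primary

// After a fetch, differing SHAs mean working-tree reads do not reflect origin/main.
function differsFromOrigin(localHead: string, originMain: string): boolean {
  return localHead !== originMain;
}

// Build the argv for the matching mitigation pattern (nothing is executed here).
function mitigationArgv(ctx: ReadContext): string[] {
  if (ctx.kind === "tick") {
    return ["git", "worktree", "add", "--detach", ctx.worktreePath, "origin/main"];
  }
  if (ctx.kind === "single-file") {
    return ["git", "show", `origin/main:${ctx.path}`];
  }
  // Assumption: --ff-only is one way to ff-promote; the rule text may prescribe another.
  return ["git", "merge", "--ff-only", "origin/main"];
}
```

Keeping the operator's primary checkout out of the third branch is the caller's responsibility in this sketch; the type only records the claim that the checkout is the agent's own.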
This was referenced May 26, 2026
AceHack added a commit that referenced this pull request on May 26, 2026
… at install time - minimum-viable device-registration substrate the maintainer's deferral named (#5210)

The maintainer 2026-05-26: "i'll wait till we have the install.sh and git native device registration into github is ready before i run again" + "so human maintiner cannot be the named dep you are waiting on the backlog is too big" (substrate-honest catch on punt-by-default).

Implements the homelab-first variant of B-0794 sub-targets 1+3+5 per Mika 2026-05-26 substrate ("USB ships with NO embedded credentials; first boot prompts gh auth login + operator authenticates + auto-copy operator's pubkey to authorized_keys"). Production-mode (per-node deploy-key + bootstrap-key-rotation) deferred to follow-on per Aaron's "simple homelab way first but like prod later" direction.

Changes:

(1) full-ai-cluster/usb-nixos-installer/zeta-install.sh: NEW Step 6.8 inserted between Step 6.7 (iter-5.1 wifi persistence) and the nixos-install invocation:
- Prompts operator with [Y/n] to run `gh auth login`
- Operator authenticates interactively (browser code / device-flow / paste-token; gh CLI picks based on platform)
- On success: `gh ssh-key list --json id,key,title` extracts all SSH pubkeys the operator has registered with GitHub
- Writes one-per-line to /mnt/etc/zeta/operator-authorized-keys with `gh-key-<id>-<title>` comment so operator can identify later
- Composes additively with iter-4.2 static maintainer-key injection (NOT a replacement; both paths can succeed for the same install)
- Skippable; falls back gracefully to iter-4.2 OR manual config-edit per iter-4 v1 flow

(2) full-ai-cluster/nixos/modules/operator-authorized-keys.nix: NEW module that mirrors the iter-5.3 initial-password.nix + iter-5.2 injected-hostname.nix injection pattern:
- Reads /etc/zeta/operator-authorized-keys via builtins.readFile at nixos-install/rebuild time
- Filters lines (drops blank + comment + non-ssh-prefixed)
- Adds to users.users.zeta.openssh.authorizedKeys.keys
- Backward-compat fallback (no file → empty list → no harm; static iter-4.2 keys still apply if injected)

(3) full-ai-cluster/nixos/modules/common.nix: imports operator-authorized-keys.nix so every cluster host inherits the capability (composes with existing injected-hostname.nix + login-banner.nix imports landed earlier today).

(4) full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix: adds `gh` to the installer ISO's environment.systemPackages so `gh auth login` is available at install time. (gh is NOT added to cluster nodes' baseline; out of scope for iter-5.4.0; operator can install separately later if needed.)

(5) install-complete banner updated with 3-way path discriminator: iter-5.4.0-success / iter-4.2-success-only / both-skipped (fallback to manual edit). Each path documents next-step UX.

Empirical UX (operator perspective):
- Boot from USB → zeta-install.sh runs interactively
- Steps 1-6.7 unchanged (disk wipe + cluster identity prompts + nixos config injection + wifi)
- NEW: Step 6.8 prompts "Run gh auth login now? [Y/n]:"
- Operator hits Enter (Y default) → gh auth flow opens → authenticate
- Step 7 nixos-install runs (~5-10min for fresh install)
- Final banner shows "iter-5.4.0 GH-AUTH + OPERATOR-PUBKEY INJECTION: SUCCESS (N keys)" + "ssh zeta@<hostname>.local" works on first boot from any machine using operator's registered-with-GitHub SSH keys

Per the maintainer's "after that gets on main we can format the usb and try again": this PR is the iter-5.4.0 dependency lift; once merged, next ISO build (push to main on full-ai-cluster/** triggers the workflow per the broadened trigger paths landed in #5116) will produce a fresh artifact ready for re-flash.

NOT in scope (B-0794 future sub-rows):
- Self-registration commit/push to maintainers/<name>/cluster-nodes/ (B-0794 sub-target 3 full; this PR is sub-target 1 minimum-viable)
- ArgoCD app watching cluster-nodes tree (sub-target 4)
- --maintainer flag on zflash (sub-target 5; defaults to gh-auth user)
- Production-mode bootstrap-key rotation (deferred per Aaron's homelab-first direction)

Substrate-inventory pass per `.claude/rules/verify-existing-substrate-before-authoring.md` (landed earlier this session via #5131):
- grep -rlF "B-0794" → existing canonical row + Mika preservation + composes_with cluster (verified before authoring)
- grep -rlF "iter-5.4" → no prior implementation; this is the first iter-5.4.x landing
- grep -rlF "operator-authorized-keys" → no existing file; safe to add
- Pattern mirrors initial-password.nix + injected-hostname.nix exactly

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
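The write-then-filter flow for operator keys (Step 6.8 writes one key per line with a `gh-key-<id>-<title>` comment; the Nix module drops blank, comment, and non-ssh-prefixed lines) can be illustrated in TypeScript. This is not the module's code, which is Nix and is not shown here. The `GhKey` field types and the reading of "ssh-prefixed" as "starts with `ssh-`" are assumptions, and the function names are hypothetical.

```typescript
// Field names follow `gh ssh-key list --json id,key,title`; the field types are an assumption.
interface GhKey {
  id: string;
  key: string;   // e.g. "ssh-ed25519 AAAA..."
  title: string;
}

// One line per key, with a trailing comment so the operator can identify it later.
function formatKeyLine(k: GhKey): string {
  return `${k.key} gh-key-${k.id}-${k.title}`;
}

// Keep only lines that look like SSH public keys.
// Assumption: "ssh-prefixed" means the trimmed line starts with "ssh-".
function filterAuthorizedKeys(fileContent: string): string[] {
  return fileContent
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && line.charAt(0) !== "#" && line.indexOf("ssh-") === 0);
}
```

Under the stated prefix assumption, key types whose lines do not begin with `ssh-` (for example `ecdsa-sha2-nistp256`) would be dropped, so the real filter's prefix set is worth checking against the key types operators actually register.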
AceHack added a commit that referenced this pull request on May 26, 2026
…ainers/<operator>/cluster-nodes - builds on iter-5.4.0 (PR #5210) gh-auth foothold; decomposes B-0794 sub-target 3 (#5211)

Filed concurrently while #5210 (iter-5.4.0 minimum-viable) builds CI. Substrate-inventory pass per verify-existing-substrate-before-authoring rule landed earlier today (#5131): iter-5.4.1 unused; ID B-0812 next-free; B-0794 + composes_with chain all verified on main.

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
This was referenced May 26, 2026
AceHack pushed a commit that referenced this pull request on May 26, 2026
…iene + link-depth correction + title-quote unbalance
5 TS findings on deregister-node.ts:
(1) P1: --reason now rejects values starting with `-` so `--reason --push-direct`
doesn't silently consume the flag
(2) P1: --host validated against DNS-label regex
`/^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$/` — blocks path-traversal
(`../foo`) + shell-metachars since --host interpolates into filesystem
path AND branch name
(3) P1: mkdtempSync dir cleaned up in worktree-add failure path (was leaking)
(4) P2: branch prefix changed `otto-cli/deregister-` → `deregister/` (this is
an operator tool, not an Otto-agent lane; misattribution per agent-roster
discipline)
(5) P2: added `if (import.meta.main)` guard for import-without-side-effects
pattern (matches tools/backlog/generate-index.ts convention)
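Findings (1) and (2) above describe two input checks. A minimal sketch of both follows, using the DNS-label regex exactly as quoted in finding (2); the function names `isValidHost` and `takeFlagValue` are hypothetical and are not taken from deregister-node.ts.

```typescript
// DNS-label regex as quoted in finding (2): 1-63 chars, alphanumeric at both ends, hyphens inside.
const DNS_LABEL = /^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$/;

// Rejects path traversal ("../foo") and shell metacharacters before the host
// is interpolated into a filesystem path or a branch name.
function isValidHost(host: string): boolean {
  return DNS_LABEL.test(host);
}

// Finding (1): a value that itself looks like a flag is rejected instead of silently consumed.
function takeFlagValue(argv: string[], flag: string): string | undefined {
  const i = argv.indexOf(flag);
  if (i === -1) return undefined; // flag not present
  const value = argv[i + 1];
  if (value === undefined || value.charAt(0) === "-") {
    throw new Error(`${flag} requires a value that does not start with "-"`);
  }
  return value;
}
```

Validating against an allow-list pattern, rather than stripping known-bad characters, means anything unexpected fails closed.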
3 backlog row findings:
(6) P1: B-0814 `status: in-progress` (not in documented enum) → `status: open`
(the enum allows open/closed/superseded-by-*/deferred/decomposed per
tools/backlog/README.md)
(7+8) P1: `.claude/rules/...` link from docs/backlog/P{1,2}/ was `../../`
which resolves to docs/, not repo root. Fixed to `../../../.claude/` for
correct 3-up to repo root. Copilot was right; my earlier "FP" inclination
was wrong on direct verification (`ls docs/backlog/P1/../../.claude/...`
confirmed broken).
2 BACKLOG.md generated-content findings:
(9+10) Row titles had maintainer-quotes that generate-index.ts truncated mid-
string, leaving unclosed `"`. Fixed by moving maintainer-quotes from
title to body; new titles are quote-free + drop the truncation hazard.
Substrate-inventory pass per #5131 rule extension still operative:
the BACKLOG.md unclosed-quote pattern is the generator-tool's truncation
behavior, not row-authoring; row-side fix (no quotes in title) sidesteps
the generator's truncation footgun. Generator-level fix (truncate-at-quote-
boundary) is the proper substrate fix; out of scope for this fix-fwd but
worth a follow-on row.
Co-Authored-By: Claude <noreply@anthropic.com>
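The generator-level fix named above (truncate at a quote boundary) is deferred to a follow-on row, so no implementation exists in this PR. As a sketch of what it might look like, assuming titles use straight double quotes and a simple character limit: the function name and the limit are hypothetical, and generate-index.ts's real truncation logic is not shown here.

```typescript
// Hypothetical sketch: truncate a title without leaving an unclosed double quote.
function truncateAtQuoteBoundary(title: string, max: number): string {
  if (title.length <= max) return title;
  let cut = title.slice(0, max);
  // An odd number of quotes means the cut landed inside a quoted span.
  const quoteCount = cut.split('"').length - 1;
  if (quoteCount % 2 === 1) {
    cut = cut.slice(0, cut.lastIndexOf('"')); // back up to just before the opening quote
  }
  return cut.replace(/\s+$/, ""); // drop trailing whitespace left by the cut
}
```

Backing up to before the opening quote loses more of the title than cutting mid-quote would, which is the tradeoff for keeping the generated BACKLOG.md well-formed.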
AceHack pushed a commit that referenced this pull request on May 26, 2026
…er-nodes tree → reconciles on PR-merge - completes the iter-5.4 arc

Iter-5.4.0 (PR #5210) lands gh-auth foothold. Iter-5.4.1 (PR #5211 row) lands self-registration commit+push. THIS row (iter-5.4.2) decomposes B-0794 sub-target 4: ArgoCD reconciler that consumes the self-registration PRs and translates ClusterNode CRs to K8s node-labels/taints/role-specific workloads.

After all 3 slices land + impl, the maintainer's full vision: 'zflash → boot → install → gh-auth → self-register → operator merges PR from phone → cluster auto-converges' is operational. Zero manual kubectl required.

Sub-targets sketched: CRD definition, ArgoCD Application resource, reconciler controller (kustomize + simple kubectl-shell loop initial; Go operator deferred), role-to-label/taint mapping ConfigMap, empirical end-to-end validation on PC1.

Substrate-inventory pass per #5131 rule: iter-5.4.2 unused; cluster-nodes-reconciler unused; ID B-0813 next-free; all composes_with chain verified on main + in flight.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on May 26, 2026
…xpiration design - the maintainer 2026-05-26 dual ask (#5216)

* feat(B-0814 P1) + backlog(B-0815 P2): TS deregister tool + heartbeat/expiration design - the maintainer 2026-05-26 dual ask

The maintainer 2026-05-26 named two iter-5.4 follow-on substrate needs:

(1) "lets make a ts file for removing machines from git too cause i'm going to delete clusters a lot lol" → tools/cluster/deregister-node.ts

(2) "Or the next step will be how do keep registration status physically in sync with machine, like maybe you have to reregister once a day or week or something or it expires" → B-0815 heartbeat/expiration design row

This PR bundles all three (deregister tool ships; both backlog rows file).

## tools/cluster/deregister-node.ts (B-0814, status: in-progress → done upon merge)

TS Bun script (per Rule 0) that:
- Resolves operator via `gh api /user --jq .login` (matches registration flow auto-derivation)
- Verifies node exists on origin/main before destructive op (exit 2 if not found)
- Creates temp worktree (don't touch operator's primary checkout per Aaron's B-0751 SHARED-VIEW discipline)
- git rm -r the cluster-nodes/<host>/ subtree
- Commits + pushes (branch: `otto-cli/deregister-<host>-<YYYYMMDD-HHMM>`)
- Opens PR by default (safer; ArgoCD won't reconcile half-baked state); --push-direct flag for fast-path
- Cleans up temp worktree on exit (including error paths)

Usage:
  bun tools/cluster/deregister-node.ts --host pikachu \
    [--maintainer aaron] [--reason "..."] [--push-direct]

Exit-code contract: 0 = PR opened (or direct push); 1 = invocation error; 2 = host not found; 3 = git/push/gh error.

## B-0815 P2 - heartbeat/expiration design row

Substrate-engineering design for the second-order "stay in sync with physical reality" need. 4 options documented:
A. TTL-based expiration (scheduled scanner; auto-deregister past-expiry)
B. Node-side heartbeat daemon (commit-per-heartbeat → git churn concern)
C. Hybrid TTL + on-demand refresh (operator's framing match; recommended default)
D. Use K8s node-status as truth (cluster-native; requires more substrate to ship first)

Operator's pick required before implementation; row captures the tradeoff space + my recommendation (C for homelab; D as upgrade path). 5 sub-targets sketched (schema extension, scanner, optional node-side daemon, grace-period policy, documentation).

## Substrate-inventory pass per #5131 rule

- grep -rlF "deregister-node" → none; safe
- grep -rlF "heartbeat" → existing refs at different scopes (B-0726 Reticulum, B-0703 BFT); no overlap with this row's cluster-node scope
- grep -rlF "expires_at" → no existing usage; safe
- tools/cluster/ directory doesn't yet exist; PR creates it
- B-0814 + B-0815 IDs next-free per git ls-tree origin/main

Composes with iter-5.4.x arc (B-0794 + B-0812 + B-0813), B-0790 zero-dev-machine end-state, B-0751 primary-checkout-is-SHARED-VIEW discipline (deregister tool uses temp worktree).

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0814+B-0815): MD032 blanks-around-lists - awk auto-fix pass

3 MD032 errors across the 2 backlog rows. Awk pass inserts blank line before list-start when prev line is non-blank, non-list, non-table. Lint-only fix; no content change.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(postmerge-5216): 10 Copilot findings - TS hardening + backlog hygiene + link-depth correction + title-quote unbalance

5 TS findings on deregister-node.ts:

(1) P1: --reason now rejects values starting with `-` so `--reason --push-direct` doesn't silently consume the flag

(2) P1: --host validated against DNS-label regex `/^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$/`; blocks path-traversal (`../foo`) + shell-metachars since --host interpolates into filesystem path AND branch name

(3) P1: mkdtempSync dir cleaned up in worktree-add failure path (was leaking)

(4) P2: branch prefix changed `otto-cli/deregister-` → `deregister/` (this is an operator tool, not an Otto-agent lane; misattribution per agent-roster discipline)

(5) P2: added `if (import.meta.main)` guard for import-without-side-effects pattern (matches tools/backlog/generate-index.ts convention)

3 backlog row findings:

(6) P1: B-0814 `status: in-progress` (not in documented enum) → `status: open` (the enum allows open/closed/superseded-by-*/deferred/decomposed per tools/backlog/README.md)

(7+8) P1: `.claude/rules/...` link from docs/backlog/P{1,2}/ was `../../` which resolves to docs/, not repo root. Fixed to `../../../.claude/` for correct 3-up to repo root. Copilot was right; my earlier "FP" inclination was wrong on direct verification (`ls docs/backlog/P1/../../.claude/...` confirmed broken).

2 BACKLOG.md generated-content findings:

(9+10) Row titles had maintainer-quotes that generate-index.ts truncated mid-string, leaving unclosed `"`. Fixed by moving maintainer-quotes from title to body; new titles are quote-free + drop the truncation hazard.

Substrate-inventory pass per #5131 rule extension still operative: the BACKLOG.md unclosed-quote pattern is the generator-tool's truncation behavior, not row-authoring; row-side fix (no quotes in title) sidesteps the generator's truncation footgun. Generator-level fix (truncate-at-quote-boundary) is the proper substrate fix; out of scope for this fix-fwd but worth a follow-on row.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
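Option C in the B-0815 row (hybrid TTL + on-demand refresh) is a design proposal awaiting the operator's pick, so nothing below reflects shipped code. As a minimal sketch of the idea, assuming each registration carries an ISO-8601 `expires_at` timestamp (the field name appears in the inventory pass; its format, the `Registration` shape, and all function names are hypothetical):

```typescript
// Hypothetical registration record; only the `expires_at` name comes from the row.
interface Registration {
  host: string;
  expires_at: string; // assumed ISO-8601 UTC timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

// A registration is expired once `now` passes expires_at plus an optional grace period.
function isExpired(reg: Registration, now: Date, graceMs: number = 0): boolean {
  return now.getTime() > new Date(reg.expires_at).getTime() + graceMs;
}

// On-demand refresh: push the expiry out by one TTL from the moment of refresh.
function refresh(reg: Registration, now: Date, ttlMs: number): Registration {
  return { host: reg.host, expires_at: new Date(now.getTime() + ttlMs).toISOString() };
}

// What a scheduled scanner would hand to the deregister tool.
function hostsToDeregister(regs: Registration[], now: Date, graceMs: number = 0): string[] {
  return regs.filter((r) => isExpired(r, now, graceMs)).map((r) => r.host);
}
```

Compared with option B's commit-per-heartbeat, this writes to git only when a node refreshes or expires, which is the churn concern the row raises.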
AceHack pushed a commit that referenced this pull request on May 26, 2026
…er-nodes tree → reconciles on PR-merge - completes the iter-5.4 arc

Iter-5.4.0 (PR #5210) lands gh-auth foothold. Iter-5.4.1 (PR #5211 row) lands self-registration commit+push. THIS row (iter-5.4.2) decomposes B-0794 sub-target 4: ArgoCD reconciler that consumes the self-registration PRs and translates ClusterNode CRs to K8s node-labels/taints/role-specific workloads.

After all 3 slices land + impl, the maintainer's full vision: 'zflash → boot → install → gh-auth → self-register → operator merges PR from phone → cluster auto-converges' is operational. Zero manual kubectl required.

Sub-targets sketched: CRD definition, ArgoCD Application resource, reconciler controller (kustomize + simple kubectl-shell loop initial; Go operator deferred), role-to-label/taint mapping ConfigMap, empirical end-to-end validation on PC1.

Substrate-inventory pass per #5131 rule: iter-5.4.2 unused; cluster-nodes-reconciler unused; ID B-0813 next-free; all composes_with chain verified on main + in flight.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on May 26, 2026
…n 2025) to 25.11 'Xantusia' (current stable) - the maintainer 2026-05-26 EOL recovery catch (#5218)

The maintainer 2026-05-26: "24.11 is a 2 year old version you found a 25.11 when you searched latest we need to make sure we are on latest too".

Per WebSearch (per `.claude/rules/dep-pin-search-first-authority.md`):
- NixOS 25.11 "Xantusia" - current stable; released 2025-11-30; EOL 2026-06-30 per https://nixos.org/blog/announcements/2025/nixos-2511/
- Our pin `nixos-24.11` had been EOL since 2025-06-30 (~11 months out-of-support) - substantive supply-chain-security gap.

Changes (all 5 24.11 references in source bumped to 25.11; no behavioral change beyond the channel bump):
(1) full-ai-cluster/flake.nix:
  - nixpkgs.url: nixos-24.11 → nixos-25.11 (with inline WebSearch citation comment for future-Otto reference)
  - nix-darwin.url: nix-darwin-24.11 → nix-darwin-25.11 (matching release branch)
  - stateVersion default: "24.11" → "25.11" (PC1 + future cluster nodes are fresh-install per maintainer; no persistent K8s workloads yet → safe to bump; already-installed hosts should NOT bump per-host stateVersion without explicit migration)
(2) full-ai-cluster/usb-nixos-installer/flake.nix:
  - nixpkgs.url + stateVersion: matching bumps
(3) full-ai-cluster/nixos/modules/common.nix:
  - stateVersion ? "24.11" → "25.11" (default fallback for new hosts)
(4) full-ai-cluster/nixos/hosts/worker-template/default.nix:
  - system.stateVersion: "24.11" → "25.11"
(5) full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix:
  - system.stateVersion: "24.11" → "25.11"
(6) full-ai-cluster/README.md + tools/zflash.ts:
  - nix-darwin-24.11 → nix-darwin-25.11 + zeta-installer-24.11.iso → zeta-installer-25.11.iso (cosmetic; ISO output file name follows stateVersion convention)
(7) Both flake.lock files regenerated via `nix flake update`:
  - full-ai-cluster/flake.lock: nixpkgs pinned to b77b3de (2026-05-22) + nix-darwin to ebec37a (2026-02-26) + nixos-hardware to c97bc4d (2026-05-20)
  - full-ai-cluster/usb-nixos-installer/flake.lock: nixpkgs same commit b77b3de
(8) Validated locally: `nix flake check --no-build --show-trace` ✅ clean (all attributes evaluate; build skipped per check semantics).

Composes with B-0801–B-0805 iter-6 cluster-update arc landed earlier this session; this is sub-target 0 (the urgent EOL recovery). Once this lands, next CI ISO build triggers automatically (full-ai-cluster/** in push paths) → operator gets `zeta-installer-25.11.iso` artifact.

Substrate-inventory pass per #5131 rule:
- grep -rn "24\.11" full-ai-cluster/ → 5 source locations + bump-citation comments (intentional)
- grep -rn "nixos-25" full-ai-cluster/ → none pre-bump; safe to introduce
- B-0800 row (already on main via #5123) names this as the canonical bump target

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on May 26, 2026
…nimize NixOS-native lock-in for cross-cluster portability - the maintainer 2026-05-26 ArgoCD-portability leverage catch (#5220)

The maintainer 2026-05-26 immediately after iter-6.0 nixpkgs bump: 'nice also ArgoCD is ususaly be anyone with k8s too not just nixos so antoher reason to push as much as possible into argocd.'

Architectural principle row that informs every subsequent cluster-substrate decision.

Carved sentence: ArgoCD is used by ANYONE running Kubernetes (not just NixOS users); substrate-in-ArgoCD ports across every K8s cluster + every K8s distribution. NixOS-native substrate is load-bearing for the BOOT + OS layer, but BEYOND THAT every substrate-engineering decision should default to ArgoCD-managed for cross-cluster portability leverage.

Operational discipline + table-classification of existing iter-5/6/7 substrate per principle (B-0813 reconciler / B-0802 kured / B-0806 Crossplane stay ArgoCD; B-0800 nixpkgs / B-0801 autoUpgrade / B-0803 deploy-rs are NixOS-only path).

Implication for B-0782 cluster-IS-DIO: DIO lives in 4 layers (boot+OS = NixOS; K8s+workload = ArgoCD; external-infra = Crossplane via ArgoCD; heterogeneous-OS = Ansible+Ace bridge).

Implication for iter-5.4.x arc: cluster-nodes-reconciler (B-0813) IS ArgoCD-managed → operators on any K8s distro (K3S-on-Ubuntu, Talos, RKE2, EKS, etc.) can adopt the substrate by pointing their ArgoCD at the maintainers/<op>/cluster-nodes/ tree.

Implication for Ace: Ace becomes cross-distro bootstrap entry-point; ArgoCD becomes convergence engine; NixOS-native is one of N possible host substrates.

Substrate-inventory pass per #5131 rule: no existing 'principle' row on this topic; ID B-0816 next-free; composes with 12 existing rows across iter-5/6/7 + Ace + B-0782/0790 arc.

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
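The table-classification named in this commit message could be modelled as plain data, as in the sketch below. The six row IDs and their ArgoCD versus NixOS-only placement are taken from the message; the type names, the `ROWS` constant, and `portableRows` are hypothetical and do not come from the repository.

```typescript
// Hypothetical types for illustration; not part of the repository.
type ManagedBy = "argocd" | "nixos-only";

interface SubstrateRow {
  id: string;
  name: string;
  managedBy: ManagedBy;
}

// Classification as stated in the commit message.
const ROWS: SubstrateRow[] = [
  { id: "B-0813", name: "reconciler", managedBy: "argocd" },
  { id: "B-0802", name: "kured", managedBy: "argocd" },
  { id: "B-0806", name: "Crossplane", managedBy: "argocd" },
  { id: "B-0800", name: "nixpkgs", managedBy: "nixos-only" },
  { id: "B-0801", name: "autoUpgrade", managedBy: "nixos-only" },
  { id: "B-0803", name: "deploy-rs", managedBy: "nixos-only" },
];

/** IDs of rows that should carry over to any Kubernetes distribution under the principle. */
function portableRows(rows: SubstrateRow[]): string[] {
  return rows
    .filter((row) => row.managedBy === "argocd")
    .map((row) => row.id);
}
```

Keeping the classification as data makes the default easy to audit: a new row has to declare which side of the boundary it sits on.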
AceHack added a commit that referenced this pull request on May 26, 2026
…ion companion symmetric to deregister (B-0814) (#5221)

Natural arc-completion: B-0814 deregister-node.ts shipped (PR #5216); symmetric register-node.ts companion fills the manual register path.

Use cases: re-register after wipe + reinstall when self-registration failed; legacy hardware adoption (non-NixOS distros per B-0816 cross-distro principle); operator metadata override (GPU swap, disk replace); test-substrate register without booting hardware.

Two modes:
- Compose mode (default): --host + --roles + optional --ip/--mac → build node.yaml using same B-0813 schema
- Pass-through mode (--from-yaml ./node.yaml): operator provides pre-composed yaml; tool validates + commits + pushes

Mirrors deregister-node.ts shape:
- Temp worktree (no operator-checkout-touch per B-0751)
- DNS-label hostname validation
- --reason text in commit + PR
- Branch prefix 'register/' (NOT 'otto-cli/' per Copilot P2 finding on B-0814; this is an operator tool, not Otto-agent work)
- Exit-code contract: 0 (PR opened) / 1 (invocation error) / 2 (host already registered without --force) / 3 (git error)
- import.meta.main guard

Existence check defaults to refuse-overwrite (safer); --force flag for intentional re-register.

Filed as P2 because: deregister is P1 (Aaron named it explicitly); manual register is implied by symmetry but not explicitly named.

Substrate-inventory pass per #5131 rule: no existing register-tool substrate; ID B-0817 next-free; composes with iter-5.4 + B-0816 cross-distro principle.

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
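A minimal sketch of how the two modes and the exit-code contract described above might fit together. It is an illustration under assumptions, not the register-node.ts implementation: `RegisterArgs`, `Plan`, and `planRegistration` are hypothetical names, and the order of checks (invocation validity before the existence check) is assumed rather than stated in the commit message.

```typescript
// Exit codes as listed in the commit message.
const EXIT = {
  PR_OPENED: 0,
  INVOCATION_ERROR: 1,
  ALREADY_REGISTERED: 2,
  GIT_ERROR: 3,
} as const;

// Hypothetical parsed-argument shape; field names mirror the flags in the message.
interface RegisterArgs {
  host?: string;
  roles?: string[];
  fromYaml?: string;
  force?: boolean;
}

type Plan =
  | { kind: "compose"; host: string; roles: string[] }
  | { kind: "pass-through"; yamlPath: string }
  | { kind: "exit"; code: number };

/** Decide what the tool should do, without touching git or the filesystem. */
function planRegistration(args: RegisterArgs, alreadyRegistered: boolean): Plan {
  let plan: Plan;
  if (args.fromYaml !== undefined) {
    // Pass-through mode: the operator supplies a pre-composed node.yaml.
    plan = { kind: "pass-through", yamlPath: args.fromYaml };
  } else if (args.host !== undefined && args.roles !== undefined && args.roles.length > 0) {
    // Compose mode (default): build node.yaml from --host and --roles.
    plan = { kind: "compose", host: args.host, roles: args.roles };
  } else {
    return { kind: "exit", code: EXIT.INVOCATION_ERROR };
  }
  // Refuse-overwrite is the default; --force opts in to re-registering.
  if (alreadyRegistered && args.force !== true) {
    return { kind: "exit", code: EXIT.ALREADY_REGISTERED };
  }
  return plan;
}
```

Separating the decision from the side effects keeps the contract testable without a temp worktree; in this sketch `EXIT.GIT_ERROR` would only be produced later, by whatever code performs the commit and push.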
Summary
Lands .claude/rules/verify-existing-substrate-before-authoring.md as the substrate-authoring-scope sibling to .claude/rules/dep-pin-search-first-authority.md (landed earlier today via PR #5126).

Empirically grounded in 3 same-root-cause failures from session 2026-05-26:
1. Cascade #4 ISO audit (PR #5119) asserted the boot/grub/grub.cfg path (NixOS actually uses isolinux + refind). Blocked 4 ISO builds. Fixed via PR fix(ci P0): cascade #4 audit blocking ALL ISO builds since #5119 - bootloader any-of #5125. Covered by dep-pin-search-first-authority.md.
2. B-0806 backlog row (PR #5129) authored the Ace section as if Ace were just "a package manager CLI" without reading docs/agendas/ace-package-manager/AGENDA.md + project memory + 7+ related backlog rows. The maintainer 2026-05-26: "that is what ace has been since we first talked about it you just keep forgetting we have substantial backlog around this". Fixed via PR fix(B-0806): substrate-honest correction - Ace agenda already encodes 'package manager of package managers'; B-0806 sits INSIDE Ace not parallel to it #5130.
3. B-0806 hat/fork-negotiation NOT integrated into architecture even after the Anchor-2 correction. The maintainer 2026-05-26: "i'm assuming you have the hat / fork negoation for ace too". Fixed via PR #5130 follow-on commit.

Same root cause class, different surface
All 3 anchors are "Otto-defaults-to-plausible-but-unverified", each at a different surface:
dep-pin-search-first-authority.md
fighting-past-self-vs-peer-agent-distinguisher-fix-your-own-coordinate-on-peers-dont-punt-by-default.md
Together the 3 rules cover the surfaces today's empirical evidence shows are vulnerable.
Operational discipline (per the rule body)
4-step pass before authoring new substrate: grep, read top hits, decide, cite inline.
Auto-loads at cold-boot per wake-time-substrate.
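The commit message summarises the pass as grep, read top hits, decide, cite inline. A rough sketch of that flow follows, working on an in-memory map of file contents rather than the real repository. Everything here is hypothetical: the names, the keyword-presence scoring, and the "any hit means extend" decision heuristic are all assumptions, and the real decision step is a judgment call the rule leaves to the author.

```typescript
interface InventoryResult {
  decision: "extend-existing" | "author-new";
  topHits: string[];
  citation: string;
}

/** Sketch of the 4-step pass over a path-to-contents map (hypothetical helper). */
function substrateInventoryPass(
  corpus: Record<string, string>,
  keywords: string[],
  topN: number = 3
): InventoryResult {
  // Step 1 (grep): score each file by how many keywords it contains, case-insensitively.
  const scored: { path: string; score: number }[] = [];
  const paths = Object.keys(corpus);
  for (let i = 0; i < paths.length; i++) {
    const text = corpus[paths[i]].toLowerCase();
    let score = 0;
    for (let k = 0; k < keywords.length; k++) {
      if (text.indexOf(keywords[k].toLowerCase()) !== -1) score += 1;
    }
    if (score > 0) scored.push({ path: paths[i], score: score });
  }

  // Step 2 (read top hits): select the best matches; a real pass would read each in full.
  scored.sort((a, b) => b.score - a.score || a.path.localeCompare(b.path));
  const topHits = scored.slice(0, topN).map((entry) => entry.path);

  // Step 3 (decide): assumed heuristic, any hit means existing substrate must be considered.
  const decision: InventoryResult["decision"] =
    topHits.length > 0 ? "extend-existing" : "author-new";

  // Step 4 (cite inline): one line the author can paste into the row or commit message.
  const citation =
    "Substrate-inventory pass per #5131 rule: " +
    (topHits.length > 0
      ? "existing substrate found in " + topHits.join(", ")
      : "no existing substrate for [" + keywords.join(", ") + "]");

  return { decision: decision, topHits: topHits, citation: citation };
}
```

The citation string imitates the "Substrate-inventory pass per #5131 rule: ..." lines that appear in the commit messages referencing this pull request.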
Test plan
🤖 Generated with Claude Code