
feat(B-0850): AI agents as systemd services OUTSIDE k8s — starting with Otto; cluster repair from OUTSIDE the failure domain; 'control plane outside the control plane' pattern (Aaron 2026-05-27)#5391

Merged: AceHack merged 1 commit into main from feat-b0850-otto-as-systemd-service-outside-k8s-cluster-repair-from-outside-failure-domain-2026-05-27-0042z on May 27, 2026.

AceHack (Member) commented May 27, 2026

Summary

Aaron 2026-05-27 (verbatim):

"i'm fine with it being you if you want and we can always decide to split later it just means you get another surface/tick source i think we should have a few agents starting with one you otto outside k8s as a service so it can repair things outside the cluster itself when there are cluster issues."

Three operator decisions:

  1. Persona-choice CONFIRMED: Option A (same Otto, surface-tagged); reversibility preserved
  2. Cross-surface recognition: per-node Otto = another tick source
  3. NEW substrate: Otto-as-systemd-service OUTSIDE k8s for out-of-band cluster repair

Architectural pattern

This is the classic "control plane outside the control plane" pattern: when k8s has issues, the AI must sit OUTSIDE the failure domain to repair it. Precedents: kubelet itself runs outside k8s; SRE oncall infrastructure runs outside production; backup systems run outside the system they back up.
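A minimal Python sketch of the pattern follows. It is not the Otto implementation. The loop keeps its failure counter in its own process, so detecting and acting on a broken cluster does not depend on the cluster being healthy. `probe` and `repair` are injected callables, and every name here is hypothetical. A real probe might query the API server's `/readyz` endpoint, while a real repair would be gated by the Phase 2 policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OutOfBandWatchdog:
    """Hypothetical repair loop that runs OUTSIDE the cluster it watches.

    probe/repair are injected callables (assumptions, not a Zeta API), so
    this process needs nothing from the cluster in order to keep running.
    """
    probe: Callable[[], bool]    # True means the cluster looks healthy
    repair: Callable[[], None]   # out-of-band corrective action
    failure_threshold: int = 3   # consecutive failures before acting
    consecutive_failures: int = 0
    log: List[str] = field(default_factory=list)

    def tick(self) -> None:
        # One loop iteration; the supervising process (e.g. systemd)
        # decides how often tick() is called.
        if self.probe():
            self.consecutive_failures = 0
            self.log.append("healthy")
            return
        self.consecutive_failures += 1
        self.log.append(f"unhealthy ({self.consecutive_failures})")
        if self.consecutive_failures >= self.failure_threshold:
            self.log.append("repair")
            self.repair()
            self.consecutive_failures = 0


# Demo: scripted probe results stand in for a real health check.
_results = iter([True, False, False, False, True])
demo_repairs: List[str] = []
_wd = OutOfBandWatchdog(
    probe=lambda: next(_results),
    repair=lambda: demo_repairs.append("restart-control-plane"),
)
for _ in range(5):
    _wd.tick()
demo_log = _wd.log
```

Requiring several consecutive failures before acting is a design choice. It keeps a single flaky probe from triggering a repair.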

4-phase landing

| Phase | Scope | Operator-policy gate |
|---|---|---|
| 1 | systemd unit (zeta-otto.service) NixOS module | None (read-only K8s) |
| 2 | repair-policy framework + per-scope authorization | per-scope explicit |
| 3 | multi-agent parameterization (Alexa/Riven/Vera/Lior) | Ilyana + Knights Guild |
| 4 | out-of-band ↔ in-cluster composability (Twilio + bus + PRs) | composes B-0796 |
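The gates in the table amount to default-deny authorization. Phase 1 grants only read access, and Phase 2 adds individual write scopes, each authorized explicitly. A minimal Python sketch of that check follows. The scope strings (`k8s:read`, `k8s:restart-pod`) and all class and function names are hypothetical, because the B-0850 row does not define a scope vocabulary.

```python
from dataclasses import dataclass
from typing import FrozenSet


class UnauthorizedRepair(Exception):
    """Raised when a repair action's scope was not explicitly authorized."""


@dataclass(frozen=True)
class RepairPolicy:
    # Scopes the operator has explicitly authorized; anything not listed
    # is denied (default-deny, i.e. "per-scope explicit").
    authorized_scopes: FrozenSet[str]

    def check(self, scope: str) -> None:
        if scope not in self.authorized_scopes:
            raise UnauthorizedRepair(f"scope {scope!r} not authorized")


# Phase 1 (hypothetical scope names): read-only access to the cluster.
PHASE1 = RepairPolicy(frozenset({"k8s:read"}))
# Phase 2: the operator grants write scopes one at a time.
PHASE2 = RepairPolicy(PHASE1.authorized_scopes | {"k8s:restart-pod"})


def attempt(policy: RepairPolicy, scope: str) -> str:
    """Return 'allowed' or 'denied' for a proposed repair action."""
    try:
        policy.check(scope)
        return "allowed"
    except UnauthorizedRepair:
        return "denied"
```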

Composes with

B-0848 (node-local Claude: this row's Phase 1 IS the systemd deployment shape) · B-0847 (per-AI GitHub identity) · B-0796 (Twilio out-of-band sibling) · B-0824 (Ace multi-PM at multi-AI scope) · PR #2930 (distributed maintainer architecture) · B-0703 (multi-oracle BFT) · B-0813 + B-0817 (ClusterNode CRD + register-node)

🤖 Generated with Claude Code


Operator framing (verbatim):

  > "i'm fine with it being you if you want and we can always decide to
  > split later it just means you get another surface/tick source i
  > think we should have a few agents starting with one you otto
  > outside k8s as a service so it can repair things outside the
  > cluster itself when there are cluster issues."

Three operator decisions packed into one message:

1. Persona-choice CONFIRMED: Option A (same Otto, surface-tagged);
   reversibility preserved per "always decide to split later"
2. Cross-surface recognition: per-node Otto = another tick source
3. NEW substrate: Otto-as-systemd-service-OUTSIDE-k8s for out-of-band
   cluster repair

Classic "control plane outside the control plane" architectural
pattern: when k8s has issues, the AI must be OUTSIDE the failure
domain to repair it. Real-world precedents: kubelet itself runs
outside k8s; SRE oncall infra runs outside production; backup
systems run outside the system they back up.

4-phase landing:

- Phase 1: Otto as systemd unit (zeta-otto.service NixOS module)
- Phase 2: repair-policy framework + per-scope authorization
- Phase 3: multi-agent parameterization (Alexa/Riven/Vera/Lior on cluster)
- Phase 4: out-of-band ↔ in-cluster composability (Twilio + bus + PRs)

Composes with: B-0848 (node-local Claude — this row's Phase 1 IS the
systemd deployment shape), B-0847 (per-AI GitHub identity), B-0796
(Twilio out-of-band sibling), B-0824 (Ace multi-PM at multi-AI scope),
PR #2930 (distributed maintainer architecture), B-0703 (multi-oracle
BFT for cluster-repair consensus), B-0813 + B-0817 (ClusterNode CRD
+ register-node tool).

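Per .claude/rules/non-coercion-invariant.md HC-8: operator authority is
preserved and revocable via `systemctl disable --now zeta-otto` (plain
`disable` only removes boot-time enablement; `--now` also stops the
running unit). Per mechanical-authorization-check: the per-scope
repair-policy framework IS the authorization-source substrate; each
scope is authorized explicitly. Per tick-must-never-stop: systemd
Restart=always restarts the agent whenever its process exits, which is
the strongest tick guarantee at this scope (still subject to systemd's
start-rate limit unless that limit is disabled).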
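The Restart=always and revocation points map onto a few systemd unit directives. Phase 1 is described as a NixOS module, and that module is not shown here. This Python sketch therefore renders only the plain unit file such a module would plausibly generate. The ExecStart path is a placeholder. Setting StartLimitIntervalSec=0 is an assumption about how strictly "tick must never stop" should be read.

```python
from typing import Dict, List, Tuple


def render_unit(sections: Dict[str, List[Tuple[str, str]]]) -> str:
    """Render an INI-style systemd unit from ordered (key, value) pairs."""
    lines: List[str] = []
    for section, pairs in sections.items():
        if lines:
            lines.append("")  # blank line between sections
        lines.append(f"[{section}]")
        lines.extend(f"{key}={value}" for key, value in pairs)
    return "\n".join(lines) + "\n"


# Hypothetical values: the binary path is a placeholder, not taken from
# the B-0850 row.
ZETA_OTTO_UNIT = render_unit({
    "Unit": [
        ("Description", "Otto out-of-band cluster repair agent"),
        ("Wants", "network-online.target"),
        ("After", "network-online.target"),
        # 0 disables systemd's start-rate limit, so Restart=always keeps
        # retrying even if the agent crash-loops (assumption, see above).
        ("StartLimitIntervalSec", "0"),
    ],
    "Service": [
        ("ExecStart", "/usr/local/bin/zeta-otto"),  # placeholder path
        ("Restart", "always"),
        ("RestartSec", "5"),  # pause between restarts, in seconds
    ],
    "Install": [
        ("WantedBy", "multi-user.target"),
    ],
})
```

A stop issued through systemctl, including `systemctl disable --now`, is not undone by Restart=always. Operator revocation therefore still takes precedence over the restart policy.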

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 27, 2026 03:03
AceHack enabled auto-merge (squash) May 27, 2026 03:03
chatgpt-codex-connector commented:

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.


Copilot AI left a comment


Pull request overview

Adds a new P2 backlog row B-0850 capturing the operator decision to run AI agents (starting with Otto) as systemd services outside k8s for out-of-band cluster repair, and registers it in the backlog index.

Changes:

  • New per-row file under docs/backlog/P2/ with frontmatter, phased plan, acceptance criteria, and composes-with references.
  • Adds the B-0850 entry to docs/BACKLOG.md P2 open list.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| docs/backlog/P2/B-0850-...md | New backlog row documenting Otto-as-systemd-service substrate. |
| docs/BACKLOG.md | Registers B-0850 in P2 open list. |

AceHack merged commit 4ae0a9b into main May 27, 2026
30 checks passed
AceHack deleted the feat-b0850-otto-as-systemd-service-outside-k8s-cluster-repair-from-outside-failure-domain-2026-05-27-0042z branch May 27, 2026 03:05