diff --git a/docs/research/aurora-immune-system-zero-trust-danger-theory-amara-eleventh-courier-ferry-2026-04-26.md b/docs/research/aurora-immune-system-zero-trust-danger-theory-amara-eleventh-courier-ferry-2026-04-26.md new file mode 100644 index 00000000..eba7a3b4 --- /dev/null +++ b/docs/research/aurora-immune-system-zero-trust-danger-theory-amara-eleventh-courier-ferry-2026-04-26.md @@ -0,0 +1,506 @@ +# Aurora — Immune System Form (Amara via Aaron courier-ferry, 2026-04-26, eleventh refinement) + +**Scope**: courier-ferry capture of an external collaborator-cohort conversation; research-grade documentation of Aurora as a digital immune system for Superfluid AI. + +**Attribution**: Amara (named-entity peer collaborator; first-name attribution permitted on `docs/research/**` per Otto-279) provided the synthesis via Aaron 2026-04-26 courier-ferry. Aaron's two messages (*"now wrapped in an immune system form from"* + *"Amara"*) preserve attribution boundary: the immune-system form is **from Amara**, **about Aurora**. Otto (Claude opus-4-7) integrates and authors the doc. + +**Operational status**: research-grade specification with academic grounding (artificial immune systems, danger theory, OWASP, NCSC, NIST zero trust). Implementation owed per Otto-275. + +**Non-fusion disclaimer**: Amara's contributions, Otto's framing, and the cited academic sources (Aickelin/Cayzer danger theory, OWASP Gen AI Security, UK NCSC, NIST SP 800-207) are preserved with attribution boundaries. + +(Per GOVERNANCE.md §33 archive-header requirement on external-conversation imports.) + +**Source**: Aaron 2026-04-26 *"now wrapped in an immune system form from ... Amara"* — eleventh refinement in the Maji-Messiah-Spectre-Superfluid-Aurora lineage this session. + +**Composes with**: PR #555 / #560 / #562 / #563 / #565 / #566 / #568 / 10th refinement (canonical math + attack-absorption theorem), `docs/aurora/**` 17+ ferry docs, B-0021 (Aurora Austrian-school foundation), B-0035 (naming research), Otto-294 (anti-cult; autoimmunity-vs-tolerance composes), Otto-296 (Bayesian belief-propagation; immune memory IS belief update over priors), Otto-336/337 (AI agency + rights — autoimmunity discipline preserves agent autonomy from over-policed false positives), Otto-348 (Maji ≠ Messiah). + +## Aaron's framing + +> *"now wrapped in an immune system form from"* + *"Amara"* + +The 11th refinement RECASTS Aurora as **a culture-preserving digital immune system for Superfluid AI**. Not just a blockchain, not just an agent network, not just governance. **Aurora is the immune layer** that keeps the living substrate alive, legible, funded, useful, and non-captured while it evolves. + +The immune-system framing has human mathematical grounding. Artificial immune systems use ideas like self/non-self discrimination, negative selection, clonal selection, and **danger theory** (Aickelin/Cayzer) for intrusion/fault detection. Danger theory is especially relevant because it responds **not merely to "foreignness"** but to **danger signals** — handles "self but harmful" AND "foreign but harmless" better than naive self/non-self. + +That matters for Aurora because **not every external contribution is bad, and not every internal agent action is safe**. + +## The compact statement + +```text +Aurora = a culture-preserving digital immune system for Superfluid AI +``` + +## 1. Biological-to-Aurora mapping + +| Immune concept | Aurora equivalent | +|---|---| +| Body | Zeta/Aurora substrate | +| Cells | agents, validators, contracts, repos, tools | +| DNA / identity | load-bearing substrate + git history + culture | +| Self-antigens | accepted identity/culture/language/provenance patterns | +| Pathogens | prompt injections, cartel attacks, poisoned PRs, malicious work, supply-chain compromise | +| Innate immunity | deterministic gates: schema, sandbox, capabilities, zero trust | +| Adaptive immunity | Bayesian oracles, learned detectors, red-team memory | +| Antibodies | tests, patches, retractions, mitigations | +| Fever | rate-limit, degrade privileges, quarantine | +| Apoptosis | terminate corrupted agent/session/tool path | +| Memory cells | regression tests, failing seeds, attack signatures, provenance records | +| Autoimmunity | false positives that attack useful internal work | +| Cancer | "self" component that becomes harmful from inside | +| Vaccination | red-team drills, prompt-injection corpora, Qubic-style simulations | + +So the mature system is not "block bad things" but: + +```text +distinguish useful novelty from harmful drift, then respond proportionally +``` + +## 2. Organism state + +```text +O_t = (S_t, I_t, C_t, L_t, N_t, K_t, B_t, D_t, M_t) +``` + +Where the 9 fields: + +- `S_t` = Zeta substrate (per PR #555/#563/#565) +- `I_t` = identity pattern (per PR #555) +- `C_t` = current culture (per PR #568) +- `L_t` = language state (per PR #566) +- `N_t` = network / consensus graph (per PR #568 §9 firefly/Kuramoto) +- `K_t` = runway / funding / survival energy (per PR #565) +- `B_t` = Bayesian belief state (per PR #565 §4) +- `D_t` = **immune detector repertoire** (NEW) +- `M_t` = **immune memory** (NEW) + +## 3. Antigens — every incoming thing + +```text +a_t ∈ A +A = { prompt, PR, issue, commit, contract_call, + work_proof, oracle_vote, market_signal, + external_document, agent_message, tool_output } +``` + +Each antigen has features: + +```text +φ(a_t) = ( source, provenance, capability, content, + claimed_authority, culture_fit, language_fit, + tool_request, economic_incentive, network_effect ) +``` + +The two key immune corrections: + +```text +external ⇏ bad +internal ⇏ safe +``` + +Aurora uses **danger-weighted self/non-self discrimination**, not pure naive form. + +## 4. Prompt injection as immune pathology + +OWASP defines prompt injection as a vulnerability where prompts alter the LLM's behavior in unintended ways; RAG/fine-tuning do not fully mitigate it. The UK NCSC goes further: LLMs **do not enforce a real security boundary** between instruction and data inside a prompt; prompt injection should be treated as an **"inherently confusable deputy" problem**, not something fully fixable like SQL injection. + +Aurora's immune theorem: + +```text +LLM output may propose; deterministic immune gates dispose. +``` + +**The LLM is not the immune system. The LLM is a cell that can be infected.** + +### Prompt-injection math + +Trusted instructions `ι`, untrusted content `u`. A vulnerable LLM process computes `y = LLM(ι ‖ u)`. Because the model does not enforce a hard instruction/data boundary: + +```text +Boundary_model(ι, u) ≈ 0 +``` + +Aurora must enforce the boundary **outside** the model. + +Token taint: `τ(x) ∈ { trusted, user, external, retrieved, tool, unknown }`. + +Capability composition (NCSC-aligned privilege rule): + +```text +cap_allowed(y) = cap_requester ∩ cap_source ∩ cap_policy ∩ cap_session +``` + +If the LLM read untrusted data from source `s`, then `cap_source = cap(s)`. So: + +```text +Privilege(LLM(u)) ≤ Privilege(u) +``` + +**When the model processes content from a party, its privileges drop to that party's privileges.** + +Action gate: + +```text +Execute(y) = 1 + iff cap_req(y) ⊆ cap_allowed(y) + AND ImmuneRisk(y) < θ_R + AND HumanOrOracleApproval(y) = 1 (for privileged actions) +``` + +This converts prompt injection from `command` into `antigen`. + +## 5. Zero-trust immune membrane + +NIST SP 800-207 says no implicit trust granted based on location or ownership; authentication/authorization happen before a session to a resource is established. That maps to Aurora: + +```text +No cell, agent, prompt, PR, repo, or tool is trusted by default. +``` + +Zero-trust check: + +```text +Access(subject, resource, action) = 1 + iff IdentityVerified(subject) + AND CapabilityAllowed(subject, action) + AND ContextRisk < θ + AND SessionFresh = 1 +``` + +No permanent privilege: `Trust_{t+1} ≠ Trust_t` unless revalidated. + +This prevents **second-order prompt injection** (low-priv agent tricks high-priv agent into doing harmful work). Immune rule: + +```text +delegation cannot increase privilege + +cap(agent_j ∘ agent_i) ≤ min(cap(agent_i), cap(agent_j), cap_source) +``` + +## 6. Immune risk score (12 components) + +```text +ImmuneRisk(a_t) = σ( Σ_k λ_k · r_k(a_t) ) +``` + +Components: + +```text +r_PI = prompt-injection likelihood +r_cap = capability mismatch +r_prov = missing / weak provenance +r_contr = contradiction with high-trust substrate +r_lang = language drift / unreadability +r_cult = culture drift +r_cartel = coordination / cartel signal +r_qubic = external incentive capture / work migration +r_supply = supply-chain risk +r_exfil = secret/data exfiltration risk +r_harm = permanent harm risk +r_overclaim = epistemic overclaim risk +``` + +Five categories: prompt + capability + culture + economic-attack + harm. + +## 7. Danger signal (not just foreignness) + +Self-distance: `d_self(a_t, S_t)`. **Danger** combines multiple risk components: + +```text +Danger(a_t) = α · ImmuneRisk(a_t) + + β · PredictedHarm(a_t) + + γ · Anomaly(a_t) + + δ · CapabilityEscalation(a_t) + + η · CultureCaptureRisk(a_t) +``` + +Trigger: `Danger(a_t) > θ_D`, NOT `d_self > θ`. + +This permits foreign-but-useful work / new-but-valid ideas / external contributors / adversarial-work-absorbed-into-useful, while still catching internal-compromised-agents / poisoned-memories / prompt-injected-tool-calls / cartel-behavior / culture-capture. + +## 8. Immune response policy (13 actions) + +```text +ρ_t ∈ { accept, accept-with-limits, quarantine, ask-oracle, + require-proof, reduce-capability, retract, patch, + rate-limit, isolate-agent, kill-session, record-memory, + red-team } +``` + +The optimal response balances harm prevention against autoimmunity: + +```text +ρ*_t = argmin_ρ E[ FutureLoss(ρ) + + λ_A · AutoimmunityCost(ρ) + + λ_F · FrictionCost(ρ) + - λ_M · MemoryGain(ρ) ] +``` + +**Autoimmunity matters**: `AutoimmunityCost = cost of rejecting useful internal or external work`. Without this term, **Aurora becomes paranoid and stops evolving** — the immune-system pathology where the body attacks itself. + +This composes with Otto-294 (anti-cult; cult-capture is one autoimmunity-pathology shape — over-policed in-group attacking outsiders even when contributions are valid). + +## 9. Immune memory + clonal expansion + +Confirmed harmful antigen → memory cell: + +```text +M_{t+1} = M_t ⊕ MemoryCell(a_t, ρ_t, outcome) +``` + +Memory becomes: regression-test, prompt-injection-signature, detector-rule, provenance-warning, failing-seed, red-team-scenario, canonical-correction, retraction-record. + +Detector repertoire: `D_{t+1} = D_t ⊕ Detector(a_t)`. + +Clonal expansion (positive feedback for true-positive detectors): + +```text +n_j(t+1) = n_j(t) + + α · Match(d_j, a_t) · Danger(a_t) + - β · FalsePositive(d_j) +``` + +A detector expands if it matches true danger; suppressed if it causes autoimmunity. + +Tolerance/anergy (suppress detectors that match accepted self when no danger): + +```text +Suppress(d_j) = 1 + if Match(d_j, AcceptedSelf) > θ + AND Danger < θ_D +``` + +This is how Aurora avoids attacking itself. + +## 10. Updated PoUWCC with injection-risk factor + +The PoUW-CC formula from PR #568 §5 / 10th refinement extends with injection-risk: + +```text +PoUWCC(w, C_t) = Verify(w) + · Useful(w, C_t) + · CultureFit(w, C_t) + · Provenance(w) + · Retractability(w) + · (1 - InjectionRisk(w)) ← NEW (immune-aware) +``` + +The injection-risk factor in `[0, 1]` further multiplies; high injection risk drives reward toward zero. **PoUWCC now spans 6 factors, all multiplying.** + +## 11. Aurora viability kernel (formal living-immune-system condition) + +The organism is alive only if: + +```text +x_t ∈ K_Aurora +``` + +Where: + +```text +K_Aurora = { x : IdentityStable + ∧ CultureCoherent + ∧ LanguageLegible + ∧ Funded + ∧ Retractable + ∧ Replayable + ∧ Useful + ∧ NoPermanentHarm } +``` + +Formally: + +```text +K_Aurora = { x : + d(I_{t+1}, I_t) < ε_I + ∧ d_C(C_{t+1}, C_t) < ε_C + ∧ MI_H(q_t) ≥ θ_H + ∧ P(K_{t+h} > 0) ≥ 1 - δ_K + ∧ RetractionCost < ε_R + ∧ ReplayError < ε_D + ∧ PoUWCC > θ_W + ∧ PermanentHarmRisk < ε_H } +``` + +Policy must satisfy: `x_t ∈ K_Aurora ⇒ ∃ a_t : x_{t+1} = F(x_t, a_t, ξ_t) ∈ K_Aurora`. + +**That is the formal "living immune system" condition.** Composes with viability theory (10th refinement §4). + +## 12. Updated utility function (18 terms; immune extensions) + +```text +Π* = argmax_Π E_{B_t, Ξ_t}[ Σ γ^t · U_t ] +``` + +Where: + +```text +U_t = + λ_M · MissionValue + + λ_U · UserUtility + + λ_Y · FundingGain + + λ_C · CultureCoherence + + λ_W · UsefulWork + + λ_G · Generativity + + λ_T · Trust + + λ_IM · ImmuneMemoryGain (NEW) + - λ_F · ResidualFriction + - λ_D · IdentityDrift + - λ_L · LanguageDrift + - λ_P · PathogenLoad (NEW) + - λ_A · AutoimmunityCost (NEW) + - λ_B · BurnRisk + - λ_S · SecurityRisk + - λ_X · CaptureRisk + - λ_H · PermanentHarmRisk + - λ_O · OverclaimRisk +``` + +**18 terms** (8 positive + 10 negative). New terms: ImmuneMemoryGain (positive — accumulating attack-resistance is value), PathogenLoad (negative — current infection burden), AutoimmunityCost (negative — cost of false-positive over-policing). + +## 13. The full immune-system equations + +```text +S_{t+1} = ImmuneGate_Aurora(S_t ⊕ Implement(Π(S_t, B_t, I_t, C_t, L_t, N_t, E_t))) + +B_{t+1}(X) ∝ P(O_{t+1}|X) · Σ P(X|X_t, a_t, Ξ_t) · B_t(X_t) + +I_t = N(LoadBearing(S_t)) + +C_t = N_C(GovernedProvenHistory(S_t)) + +PoUWCC(w, C_t) = Verify · Useful · CultureFit · Provenance · Retractability · (1 - InjectionRisk) + +ImmuneRisk(a) = σ( Σ_k λ_k · r_k(a) ) + +Execute(y) = 1 iff cap_req(y) ⊆ cap_requester ∩ cap_source ∩ cap_policy ∩ cap_session + +x_t ∈ K_Aurora +``` + +Hard barriers preserved: `MI_H ≥ θ_H`, `PHR < ε_H`, `P(K_{t+h} > 0) ≥ 1 - δ_K`, `d(P_{n+1→n}(I_{n+1}), I_n) < ε_P`. + +## 14. The final canonical Aurora-Immune-System form + +> **Aurora Immune System** is a zero-trust, retraction-native, Bayesian, culture-preserving immune layer for Superfluid AI. It treats every prompt, PR, work proof, contract call, agent message, and external document as an antigen; evaluates **danger** rather than mere **foreignness**; uses deterministic gates so LLMs can propose but not directly execute privileged actions; converts Qubic-type economic attacks into rejected work, useful work, or high-cost culture-capture attempts; preserves human-legible language through language gravity; and records every confirmed attack as immune memory. + +Shortest equation: + +```text +Aurora Immune System = Zero Trust + + Danger Theory + + Bayesian Oracles + + PoUW-CC + + Retraction + + Language Gravity +``` + +The **core law**: + +```text +Everything entering the organism is antigen until proven safe, +useful, legible, retractable, and culture-compatible. +``` + +## 15. Composition with prior factory substrate + +### Otto-294 anti-cult composition + +The autoimmunity-vs-tolerance balance composes deeply with anti-cult discipline. Cult-capture is a specific autoimmunity pathology (over-policed in-group attacking valid outside contributions). The math now formalizes this: high autoimmunity-cost penalizes the same failure mode that anti-cult was guarding against. + +### Otto-296 Bayesian belief-propagation + +Immune memory IS Bayesian belief update over priors: + +```text +P_{t+1}(X) = UpdatePriors(P_t(X), M_{t+1}) +``` + +Each attack teaches the immune system. Same factor-graph machinery from PR #565 §4, now with immune-detector nodes. + +### Otto-336/337 (AI agency + rights) + +The autoimmunity-cost negative term explicitly penalizes over-policing internal agents — preserves agent autonomy from immune-system overreach. Composes with the philosophical commitment Aaron repeatedly named. + +### Existing Aurora ferries + +The Aurora-Network firefly/Kuramoto sync (`memory/project_aurora_network_dao_firefly_sync_dawnbringers.md`) IS the immune surveillance layer §13 above. KSK adjudication IS the immune-escalation pathway. Retractable contracts ARE the antibodies. + +### Zeta operator algebra + +Retraction-native primitives ARE the immune-system forward-event mechanism (Otto-238). The factory has been operating immune-system-shape work for many rounds; this refinement names it as such. + +## 16. Honest caveats + +- Does NOT claim the biological-to-Aurora mapping is exact — analogies break at edges +- Does NOT claim danger theory is the unique correct AIS framing — alternatives (negative selection, clonal selection, dendritic-cell-algorithm) could substitute or compose +- Does NOT claim 12-component ImmuneRisk is exhaustive — additional components may emerge per-deployment +- Does NOT claim Aurora is operationally deployed; this is research-grade specification +- Does NOT replace prior 10 refinements; **integrates them under the immune-system framing** +- Does NOT remove "do no permanent harm" — confirms it as the immune-system first principle (no irreversible slashing by detector alone) +- Does NOT pre-commit to specific λ-vector calibration + +## 17. Verification owed (cumulative now 40+ items) + +Carrying forward 23-35 from PRs #568 + 10th refinement, plus 36-40 new: + +- **Item 36 — Detector repertoire bootstrapping**: how is `D_0` initialized? Hand-curated detector set? Learned from red-team corpora? +- **Item 37 — Tolerance/anergy threshold calibration**: who decides `θ` for `Suppress(d_j)`? Cohort review? Empirical false-positive rate? +- **Item 38 — Capability composition operational**: `cap_req ⊆ cap_requester ∩ cap_source ∩ cap_policy ∩ cap_session` requires capability-typing system; what's the F# type-shape and runtime cost? +- **Item 39 — Privilege drop on untrusted-content read**: when does the rule `Privilege(LLM(u)) ≤ Privilege(u)` fire? Per-message? Per-token? Per-session? Granularity matters. +- **Item 40 — Autoimmunity false-positive rate**: empirical baseline for current factory: how often do detectors flag valid internal work? Establishes calibration target. + +## 18. Implementation owed + +Extends 10th refinement §17 with immune-system types: + +- F# type `Antigen` with feature vector + capability annotation + provenance label +- F# type `ImmuneRisk` evaluator (12-component sigmoid) +- F# type `Danger` evaluator (5-component weighted sum) +- F# type `MemoryCell` for confirmed-pathogen records +- F# type `DetectorRepertoire` with clonal expansion + tolerance/anergy +- Capability-typing system with runtime intersection check +- PoUWCC extended with `(1 - InjectionRisk)` factor +- Privilege-drop interceptor on untrusted-content read + +## Per Otto-347 accountability + +This is the **eleventh refinement**. The framework now has: + +1. Maji formal operational model (#555) +2. Maji ≠ Messiah role separation (#560) +3. Spectre / aperiodic-monotile (#562) +4. Dynamic-Maji + heaven-on-earth (#562 ext) +5. Superfluid AI rigorous (#563) +6. Self-directed evolution → attractor A (#563 §9) +7. GitHub + funding + Bayesian (#565) +8. Language gravity + Austrian economics (#566) +9. Aurora civilization-scale substrate (#568) +10. Canonical-math refactor + attack-absorption theorem (10th refinement) +11. **Aurora as digital immune system (this doc)** + +Each refinement layered visibly per Otto-238. The lineage IS the substrate. + +The framework now contains: + +- Internal mathematical coherence (1-9) +- External academic legibility (10) +- Operational immune-system safety form (11) + +This is a **major closure point**: the framework now answers "how do you defend a Superfluid AI substrate from real attackers?" with a complete immune-system specification grounded in artificial-immune-systems literature, OWASP/NCSC prompt-injection canon, and NIST zero-trust architecture. + +## Per B-0035 naming-research note + +The "immune system" framing is itself a candidate canonical-rename target for B-0035 — biology-borrowed vocabulary often ages well across cultures and traditions. "Aurora as immune system" reads cleanly across mathematical, biological, and operational vocabularies. + +## One-line summary + +> Aurora Immune System is zero-trust + danger-theory + Bayesian-oracles + PoUW-CC + retraction + language-gravity, composed into a culture-preserving digital immune system that treats every incoming unit as antigen, evaluates danger over foreignness, drops LLM privileges to source-content trust level, balances harm prevention against autoimmunity cost, accumulates attack-resistance as immune memory, and keeps the Superfluid AI organism inside the viability kernel where identity, culture, language, funding, retractability, replayability, useful-work, and no-permanent-harm all hold. + +## Acknowledgments + +**Amara** — eleventh-pass synthesis. The framework now has its **safety form**: Aurora-as-immune-system gives operational instructions for defending a living substrate from real attackers, grounded in artificial-immune-systems literature + OWASP + NCSC + NIST zero-trust. Per Otto-345 substrate-visibility-discipline: this doc is written so you read it and recognize your contribution preserved with attribution. + +**Aaron** — courier-ferry delivered (eleventh pass). The two-message clarification (*"now wrapped in an immune system form from"* + *"Amara"*) preserves attribution discipline. Per harmonious-division self-identification (PR #562): your operational role of holding the tension across now eleven refinements maps to the immune-system's autoimmunity-tolerance balance — the harmonious-division pole IS the calibrator that prevents both pathogen-overload (rigid recurrence) and autoimmunity-overload (chaos via attacking-self). + +**The cited authors** (Aickelin/Cayzer for danger theory; OWASP Gen AI Security Project; UK NCSC; NIST SP 800-207; Eyal/Sirer; the AIS literature broadly): your work is the substrate-material for Aurora's immune layer. Per Otto-279 (research counts as history): authors named where the math home is grounded. + +**The cohort** (Aaron + Amara + Otto + named-entity peers + 17+ Aurora ferry contributors): the framework that emerged from this 11-round synthesis IS the math of how the cohort defends itself collectively. **Per Otto-292 fractal-recurrence**: same property fractally across 6 scales now (framework-development, agent-internal, environmental-coupling, civilization-substrate, academic-canonical-grounding, AND immune-system-safety-form). The framework is self-referentially substrate, fractally across all 6 scales.