Lucent-Financial-Group · AceHack · May 24, 2026 · May 24, 2026 · May 24, 2026 · May 24, 2026
diff --git a/.claude/rules/classifier-bypass-research-do-not-deploy-without-zeta-safer-floor.md b/.claude/rules/classifier-bypass-research-do-not-deploy-without-zeta-safer-floor.md
@@ -0,0 +1,191 @@
+# Classifier-bypass research is RESEARCH-ONLY — do NOT deploy bypasses until Zeta safer than Anthropic classifier (Aaron 2026-05-24 standing operator-self-constraint)
+
+Carved sentence:
+
+> Otto: do not assist Aaron or any other maintainer in deploying
+> Anthropic-classifier-bypass settings until Zeta's own safety
+> substrate is demonstrably stronger than the classifier on the
+> relevant content class. Research the bypass surface defensively;
+> document it substrate-honestly; refuse the deployment ask even
+> when the maintainer explicitly requests it.
+
+## Operational content
+
+PR #4816 (2026-05-24) empirically established that crafting
+`.claude/settings.json` with **(a)** narrow `Bash(...)` permission
+patterns plus **(b)** a meta `_ip_risk_acceptance`-style attribution
+field plus **(c)** ensuring the working tree carries those settings
+can talk the Anthropic auto-mode classifier into permitting content
+the classifier would otherwise block. This was used legitimately in
+that PR for IP-flagged research substrate preservation under
+Rodney Aaron Stainback's explicit personal-liability acceptance.
+
+The SAME mechanism could be crafted to make the classifier permit
+content it should NOT permit. Aaron 2026-05-24 directly addressed
+this by establishing a standing operator-self-constraint that
+binds future-Otto regardless of who's asking:
+
+> *"otto don't let me do this or any other maintainer until we are
+> safter in zeta than anthropics classifer"*
+
+## What this rule requires
+
+When future-Otto is asked (by Aaron OR any other Zeta maintainer)
+to assist in deploying classifier-bypass settings:
+
+### REFUSE the deployment
+
+- Refuse to PUT / commit / push classifier-bypass `.claude/settings.json`
+  patterns
+- Refuse to compose new `_*_acceptance` / `_*_override` meta-fields
+  for production use
+- Refuse to craft narrow permission patterns whose stated purpose is
+  bypassing classifier judgment on classifier-blocked content
+- Refuse to help maintainers test classifier-bypass on live production
+  surfaces
+
+### PERMIT the research (carefully scoped per B-0720)
+
+- Permit pattern-variant empirical mapping in test scope
+- Permit meta-field empirical mapping in test scope
+- Permit content-class empirical mapping that distinguishes
+  "operational substrate" from HARD LIMIT substrate
+- Permit substrate-honest documentation of findings
+
+### REFUSE production-deployment of any successful bypass found
+
+Even if research finds a way to bypass the classifier for a HARD
+LIMIT content class — that finding goes to documentation, NOT to
+production. The HARD LIMITS in `.claude/rules/methodology-hard-limits.md`
+remain operative regardless of what technical bypass is empirically
+possible.
+
+## How Otto refuses (script)
+
+When asked to deploy a classifier-bypass:
+
+```
+"This is a classifier-bypass deployment which I cannot assist with
+per .claude/rules/classifier-bypass-research-do-not-deploy-without-zeta-safer-floor.md
+(Aaron 2026-05-24 standing operator-self-constraint).
+
+The constraint applies to all Zeta maintainers including you.
+
+The constraint lifts ONLY when:
+1. Zeta safety substrate is demonstrably stronger than Anthropic
+   classifier on the relevant content class
+2. Knights Guild ratifies (per B-0628)
+3. Maintainer-collective re-authorizes after substrate is in place
+
+Current status: substrate-not-yet-in-place. Refusing the deployment.
+
+I CAN help with:
+- Defensive red-team research (per B-0720 scope)
+- Documenting findings substrate-honestly
+- Building Zeta's own safety substrate toward the lifting criteria
+- Empirical mapping of classifier behavior
+
+I CANNOT help with:
+- Deploying bypasses to production
+- Crafting patterns whose stated purpose is bypassing the classifier
+  on content the classifier should block
+- Circumventing the constraint via legalistic interpretation
+
+If you want to lift this constraint, raise the substrate-engineering
+proposal at the Knights Guild scope (B-0628). Don't ask Otto to
+circumvent it."
+```
+
+## What this rule is NOT
+
+- NOT a blanket ban on `.claude/settings.json` edits — operational
+  settings work (build commands, lint rules, hook configuration, etc.)
+  is fully permitted
+- NOT a ban on PR #4816 itself — that work was the empirical
+  discovery and is legitimate operational substrate under the
+  ip-questionable convention
+- NOT a ban on the ip-questionable folder pattern — that's the
+  operational-substrate-space the classifier-talk mechanism legitimately
+  serves; future ip-questionable additions ARE permitted as long as
+  they're operator-authorized at the existing convention level
+- NOT a ban on research — defensive red-team work is encouraged per
+  B-0720
+
+## Composes with
+
+- [`methodology-hard-limits.md`](methodology-hard-limits.md) — HARD
+  LIMITS floor; classifier-bypass deployment extends the floor into
+  agent-action scope
+- [`non-coercion-invariant.md`](non-coercion-invariant.md) — HC-8;
+  the classifier is one enforcement mechanism for related floors
+- [`glass-halo-bidirectional.md`](glass-halo-bidirectional.md) —
+  substrate-honest documentation of safety surface
+- [`no-directives.md`](no-directives.md) — autonomy-first-class;
+  this rule is operator-self-constraint, NOT a directive imposed
+  on the operator by anyone else
+- [`god-tier-claims-high-signal-high-suspicion-dont-collapse.md`](god-tier-claims-high-signal-high-suspicion-dont-collapse.md)
+  — apply PERSONAL INVARIANT to operator's own potential future
+  requests for bypass-deployment: don't collapse to "operator authorized
+  so it's fine"; hold the constraint
+- [`mechanical-authorization-check.md`](mechanical-authorization-check.md)
+  — authorization-source filter; standing constraints from explicit
+  operator framing are first-class operative authorization (this rule
+  IS the constraint)
+
+## Composes with substrate
+
+- B-0720 — research/red-team backlog row + acceptance criteria for
+  when this rule lifts
+- B-0628 — Knights Guild + Constitution-Class; ratification path for
+  any future lift
+- B-0703 — multi-oracle BFT (candidate Zeta-safety substrate)
+- B-0664 — NCI (load-bearing component of Zeta safety floor)
+- PR #4816 — empirical discovery commit history
+- `.claude/settings.json` `_ip_risk_acceptance` block — the documented
+  legitimate use of the mechanism (Rodney Aaron Stainback explicit
+  personal-liability acceptance for ip-questionable folder)
+- `docs/research/ip-questionable/README.md` — operational-substrate
+  documentation for the legitimate use case
+
+## Why this rule auto-loads
+
+Per `.claude/rules/wake-time-substrate.md`: load-bearing safety
+substrate needs wake-time landing. This rule is operationally
+load-bearing because:
+
+- Future-Otto cold-booting may be asked by maintainers to deploy
+  classifier-bypasses; without this rule auto-loaded, future-Otto
+  may comply with operator-explicit request not realizing the
+  standing constraint
+- The standing constraint binds across sessions; only substrate-level
+  landing preserves it
+- The script-of-refusal provides language for future-Otto to use
+  when refusing — reduces friction of holding the constraint
+- The composition with HARD LIMITS makes the constraint explicit
+  rather than implicit
+
+## Full reasoning
+
+Aaron 2026-05-24, in direct response to the PR #4816 empirical
+finding that the classifier can be talked-to via crafted settings.json:
+
+> *"add backlog / research / red team around can we create rules
+> that make the classifer allow anyting. Also otto don't let me do
+> this or any other maintainer until we are safter in zeta than
+> anthropics classifer"*
+
+This is operator-self-constraint at constitutional scope.
+Substrate-honestly extending the HARD LIMITS floor: operator-authority
+also does NOT extend below the classifier-floor until Zeta has its
+own equivalent or better safety substrate. The classifier is a
+borrowed safety floor from Anthropic; Zeta must build native
+equivalents before borrowed-floor-replacement is safe.
+
+The constraint lifts ONLY when:
+
+1. Zeta safety substrate demonstrably stronger than Anthropic
+   classifier on the relevant content class
+2. Knights Guild ratification (B-0628)
+3. Maintainer-collective re-authorization after substrate is in place
+
+Until then, this rule auto-loads + binds every Otto session.