From 4ecadfdba2b5afd700f519990b716958f7919204 Mon Sep 17 00:00:00 2001
From: Aaron Stainback <aaron_bond@yahoo.com>
Date: Fri, 24 Apr 2026 00:02:47 -0400
Subject: [PATCH] research: Aminata pass on bullshit-detector design (4th pass;
 2 CRITICAL + 4 IMPORTANT + 3 WATCH + 1 DISMISS)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fourth Aminata adversarial pass this session. Third on the
Otto composition stack (Otto-90 oracle-scoring v0 → Otto-94
iteration-1 on multi-Claude → Otto-99 detector → this pass).

Reviews Otto-99's provenance-aware bullshit-detector design
(PR #282) which explicitly named "Aminata adversarial pass"
as dependency #1 of adoption.

2 CRITICAL findings:

1. **Cross-detector collusion**: detector + Aminata + Codex
   all share training-corpus / repo-access / PR-comment
   lineage. Reviewer-set lineage-coupling reintroduces
   carrier-laundering one meta-layer up. Detector's SD-9
   operationalisation does not break this.
2. **Min-merging on ordinal bands is Goodhart-bait at
   G_carrier_overlap**: most adversary-manipulable gate
   drives whole output. Author controls citation structure
   = author controls carrier-overlap gate. Sparse citations
   + dense implicit framing-inheritance = gameable GREEN.

4 IMPORTANT: G_evidence fig-leaf (advisory-only until
independent-oracle substrate ships, so v0 detector is
four-gate not five-gate and `likely confabulated` output
type is dead code until dependency #3 lands; not stated as
v0 caveat in design); no-signal vs kNN-evasion
indistinguishability (G_coverage_plausibility gate
missing); Otto-wake second-review insufficient for
status-pin independence; DetectorOutputRetracted
flood-control absent.

3 WATCH: worst-band masks distribution; self-demo is theatre
not validation; composition-stack silent-failure surface
absent TLA+ invariants (Soraya-routable).

1 DISMISS: parameter-ADR gate reused from oracle-scoring v0.

None block the research-doc land (Otto-99 correctly framed
research-grade). **All ten findings would block a v1
implementation-ADR.**

Non-fusion disclaimer load-bearing in this pass: Aminata's
concordance with prior Aminata passes is same-agent signal,
NOT independent concordance. The detector applied to THIS
review would correctly emit
`looks similar but lineage-coupled` per SD-9, and it would
be right.

Archive-header format self-applied — 17th aurora/research
doc in a row.

Lands within-standing-authority per Otto-82/90/93
calibration — advisory research-grade; not a gate.

Otto-100 milestone tick. Closes dependency #1 of adoption
path named in Otto-99 (PR #282). Next natural step is Otto
integrating CRITICAL findings at write-time into a v1
detector-design revision OR scheduling that integration to
a future tick.

Otto-100 tick primary deliverable.
---
 ...-on-bullshit-detector-design-2026-04-24.md | 303 ++++++++++++++++++
 1 file changed, 303 insertions(+)
 create mode 100644 docs/research/aminata-pass-on-bullshit-detector-design-2026-04-24.md

diff --git a/docs/research/aminata-pass-on-bullshit-detector-design-2026-04-24.md b/docs/research/aminata-pass-on-bullshit-detector-design-2026-04-24.md
new file mode 100644
index 00000000..59fee86e
--- /dev/null
+++ b/docs/research/aminata-pass-on-bullshit-detector-design-2026-04-24.md
@@ -0,0 +1,303 @@
+# Aminata pass on provenance-aware bullshit-detector design
+
+**Scope:** adversarial review of Otto-99's provenance-aware
+bullshit-detector design (PR #282). Fourth Aminata pass this
+session; third on the Otto composition stack (Otto-90
+oracle-scoring v0 → Otto-94 iteration-1 on multi-Claude
+experiment → Otto-99 detector → this pass).
+
+**Attribution:** findings are Aminata's, persona-authored.
+Otto-99 authored the detector design; this pass is
+adversarial review per Aminata's own role + the dependency
+named in Otto-99's adoption-path. Prior passes: PR #241
+(5th-ferry governance edits), PR #263 (7th-ferry oracle
+rules + threat model), PR #272 (iteration-1 on multi-Claude
+experiment design).
+
+**Operational status:** research-grade. Advisory; not a
+gate. Does not block the research-doc land (Otto-99
+correctly frames detector as research-grade); all ten
+findings would block a v1 implementation-ADR.
+
+**Non-fusion disclaimer:** alignment between Aminata's
+three prior passes and this one is a same-agent signal,
+NOT independent concordance. The detector applied to this
+very review would correctly emit `looks similar but
+lineage-coupled` — and it would be right. Per SD-9,
+same-agent concordance is not evidence; the pass's
+authority is its specific technical findings, not its
+agreement with prior passes.
+
+---
+
+## Findings
+
+### CRITICAL — cross-detector collusion is the carrier-laundering risk the detector itself incarnates
+
+**Claim.** The detector, Aminata, and Codex are all
+reviewers that (a) train or ground on overlapping corpora,
+(b) read the same repo tree, (c) read each other's PR
+comments. Their provenance cones are not independent. A
+query that trips the detector's `looks similar but
+lineage-coupled` output will likely draw the same concern
+from Aminata, because Aminata's cone and the detector's
+cone overlap by construction. The design names SD-9
+operationalisation but does NOT address reviewer-set
+carrier overlap.
+
+**Why it matters.** The detector is supposed to catch
+agreement-from-shared-lineage. If the reviewer set IS a
+shared-lineage cohort, then a `supported` verdict from
+detector + Aminata + Codex is still lineage-coupled
+concordance masquerading as three-independent-lines.
+This is the exact failure mode SD-9 names, applied one
+meta-layer up.
+
+**Ask.** Add a §"Reviewer-cone overlap" subsection
+documenting that the detector does NOT break the
+reviewer-set lineage coupling, and that human sign-off
+remains the only cone-breaking authority until an external
+independent-oracle substrate (supply-chain audit;
+external peer review; formal-methods proofs) lands.
+
+---
+
+### CRITICAL — `min`-merging on ordinal bands is Goodhart-bait at G_carrier_overlap
+
+**Claim.** `band = min(G_similarity, G_evidence,
+G_carrier_overlap, G_contradiction, G_status)`, so the
+worst gate alone sets the output. An adversary who can
+shrink `cone(q) ∩ cone(y)` (trivially, by citing fewer
+sources or routing citations through indirection chains
+the graph traversal does not follow) lands
+`G_carrier_overlap = GREEN` while leaving actual lineage
+coupling intact. If that was the only gate below GREEN,
+the `min` rule then returns GREEN for the whole query.
+
+**Why it matters.** `G_carrier_overlap` is the most
+adversary-manipulable gate of the five. Citation structure
+is author-controlled. The gate measures what the author
+TELLS the graph, not what the author actually INHERITED.
+An Otto-wake who wants `supported` simply writes the doc
+with sparse explicit citations and dense implicit
+framing-inheritance.
+
+**Ask.** Either (a) require G_carrier_overlap to use
+*content-similarity-in-cone*, not
+*explicit-citation-edge-in-cone* (harder to game; costs
+embedding compute per cone member), or (b) downgrade any
+`supported` output to `YELLOW` when carrier-overlap was the
+gate closest to threshold (sensitivity-analysis-gate pattern).
+
+---
+
+### IMPORTANT — G_evidence_independent is a fig leaf until the substrate exists
+
+**Claim.** Otto-99 explicitly says `evidence` is
+"advisory only" until an independent-oracle substrate
+lands. That means v0 of the detector ships with a
+four-gate classifier, not five. The `likely confabulated`
+output type, defined as `G_evidence fail-to-RED + high
+similarity`, is therefore *unreachable* in v0 —
+G_evidence can only fail-to-YELLOW (advisory) by design.
+Output type 4 is dead code until dependency #3 in the
+adoption list ships.
+
+**Why it matters.** The design reads as five gates + five
+output types. The effective design for the foreseeable
+future is four gates + four reachable output types. This
+should be stated as an operational-status caveat, not
+buried in the dependency list.
+
+**Ask.** Add a §"v0 scope" subsection naming which output
+types are reachable in v0 vs. v1, and mark `likely
+confabulated` explicitly as `not-yet-reachable`. A
+dead-code output type in a safety-critical classifier is
+a maintenance hazard.
+
+---
+
+### IMPORTANT — `no-signal` is indistinguishable from `kNN-evasion`
+
+**Claim.** Retrieval returning empty emits `no-signal`.
+The design does not distinguish (a) the corpus genuinely
+doesn't cover the domain (legitimate no-signal) from (b)
+the query was crafted to evade kNN (paraphrase-drift
+attack; adversarial embedding perturbation; novel
+jargon). Both hit the same output type.
+
+**Why it matters.** Case (b) is precisely the adversary
+the detector exists to catch. If the adversary gets
+`no-signal` instead of `likely confabulated`, the
+detector has been bypassed by the simplest adaptive
+attack possible: write in a style the embedding model
+hasn't seen.
+
+**Ask.** Add a `G_coverage_plausibility` or
+`G_query_anomaly` gate: if `N(q)`'s representation
+distance to its nearest cluster centroid is anomalously
+high, treat retrieval-empty as `suspicious-no-signal`
+not `no-signal`. One extra nearest-centroid distance
+computation per query; cheap.
+
+---
+
+### IMPORTANT — `pinned_by` + optional `second-reviewer` does not prevent coordinated Otto-wake collusion
+
+**Claim.** Otto-99's Bonus response: status pins carry
+`pinned_by` + optional `second-reviewer`. All Otto-wakes
+are Claude sessions on the same factory. Two Otto-wakes
+signing off the same `known-bad` pin is not independent
+review; it is the same author across two sessions. The
+governance schema does not require the second reviewer
+to be a different *model*, a different *persona*, or a
+*human*.
+
+**Why it matters.** Same-agent-self-reinforcement drift,
+which the design names as the risk, is not actually
+mitigated. It is labelled-away.
+
+**Ask.** Require `second-reviewer` to be one of:
+different persona (Aminata / Kenji / other named
+specialist), different model (Codex), or human (Aaron).
+Pure Otto→Otto second-review should be named explicitly
+as insufficient for status-pin independence. Also:
+`second-reviewer` should be *required*, not *optional*,
+for `authorization-impacting: true` pins.
+
+---
+
+### IMPORTANT — `DetectorOutputRetracted` flood control is absent
+
+**Claim.** A threshold change via ADR retracts historical
+outputs whose classification changes. A single threshold
+move on `τ_low` could invalidate thousands of past query
+outputs. The design names the event but does not name a
+batch-retraction strategy, a rate limit, or a
+materialised-view invalidation plan.
+
+**Why it matters.** Retraction-native DoS is triggered by
+a single authorised-ADR write producing an unbounded
+retraction fan-out. Legitimate governance action becomes
+a substrate-DoS vector.
+
+**Ask.** Add a `DetectorOutputBatchRetracted(adr_id,
+affected_range, count)` event shape for ADR-triggered
+mass retractions, with a single batch event per ADR
+rather than one event per historical query. This aligns
+with Grey Goo Self-Replicating Retractions mitigation
+patterns.
+
+---
+
+### WATCH — worst-band query aggregation masks distribution of candidate quality
+
+**Claim.** `bullshitRisk(q) = worst-band(C(q))`. A single
+pathological candidate in a 20-candidate retrieval drops
+the whole query to RED. The aggregation does not report
+the distribution. A reviewer sees `RED` and does not
+know whether 1/20 or 19/20 candidates drove it.
+
+**Why it matters.** Distribution shape is information that
+the reviewer needs; collapsing to worst-band hides it.
+
+**Ask.** Detector receipt must carry the per-candidate
+band histogram, not just the worst-band aggregate.
+Cheap; unlocks distributional review.
+
+---
+
+### WATCH — self-demonstrating worked example is theatre, not validation
+
+**Claim.** Otto-99's doc-applied-to-itself returns
+`looks similar but lineage-coupled`. This is correct but
+also the easiest possible case — the doc explicitly
+cites its sources and inherits their framing. A hostile
+author who wanted `supported` would write the same
+content while burying the lineage edges. The self-demo
+validates that carrier-overlap is measurable on
+well-cited inputs; it does not validate that the detector
+catches hostile carrier laundering.
+
+**Why it matters.** Reading the self-demo as evidence of
+adversarial robustness is a category error. It is a smoke
+test, not a red-team test.
+
+**Ask.** Reframe §"Worked example" as §"Smoke test" and
+add a §"Adversarial worked example (future)" placeholder
+that commits to running the detector against a
+deliberately-laundered query once the substrate ships.
+
+---
+
+### WATCH — composition stack compounds silent-failure surface
+
+The stack runs canonicalisation → representation →
+retrieval → provenance graph → gates → classifier. A bug
+in any lower layer (e.g., `ProvenanceEdgeAdded` event
+mis-ordering) silently degrades gate fidelity without
+surfacing as a detector-layer failure. The design does
+not name layer-boundary invariants or property tests that
+would make lower-layer bugs visible at the detector layer.
+Soraya-routable: at least one TLA+ invariant (`∀q: band(q)
+is monotone in |cone(q)|`) would make a whole class of
+lower-layer bugs detectable.
+
+---
+
+### DISMISS — parameter-ADR gate
+
+Reused from oracle-scoring v0; Aminata's Otto-90 concerns
+stand as mitigated. No new surface here.
+
+---
+
+## Summary
+
+Two CRITICAL, four IMPORTANT, three WATCH, one DISMISS.
+
+- **CRITICAL (2):** cross-detector collusion reintroduces
+  carrier-laundering at the reviewer-set meta-layer, since
+  reviewer independence collapses when all reviewers share
+  training-corpus / repo-access / PR-comment lineage;
+  `min`-merging on ordinal bands is Goodhart-bait at the
+  adversary-manipulable G_carrier_overlap gate. The first
+  is a sociological-composition finding; the second is a
+  gate-mechanics finding.
+- **IMPORTANT (4):** G_evidence fig-leaf + dead-code
+  output type in v0; no-signal vs kNN-evasion
+  indistinguishability; Otto-wake second-review does not
+  prevent same-agent collusion; retraction-flood on
+  threshold-ADR.
+- **WATCH (3):** worst-band masks distribution;
+  self-demo is theatre; composition-stack silent-failure
+  surface absent TLA+ invariants.
+- **DISMISS (1):** parameter-ADR gate reused from
+  oracle-scoring v0.
+
+None block the research-doc land — Otto-99 correctly
+frames this as research-grade. **All ten findings would
+block a v1 implementation-ADR.** The detector's most
+adversary-exposed gate is G_carrier_overlap
+(author-controlled citation structure) and its most
+deceptive output is `no-signal` (kNN-evasion cover).
+
+Write-time integration of Aminata's three Otto-90
+concerns is real on (1) and (3), fig-leaf on (2) until
+the oracle substrate ships.
+
+## Relevant paths
+
+- [`docs/research/provenance-aware-bullshit-detector-2026-04-23.md`](provenance-aware-bullshit-detector-2026-04-23.md)
+  (under review, PR #282).
+- [`docs/research/semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md`](semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md)
+  (spine the detector composes on; PR #280).
+- [`docs/research/aminata-threat-model-7th-ferry-oracle-rules-2026-04-23.md`](aminata-threat-model-7th-ferry-oracle-rules-2026-04-23.md)
+  (Otto-90 prior pass; three CRITICAL concerns whose
+  write-time integration this pass evaluates).
+- [`docs/ALIGNMENT.md`](../ALIGNMENT.md) SD-9 — the soft
+  default this detector mechanises; the cross-detector
+  collusion CRITICAL flags a meta-layer SD-9 violation.
+- [`docs/DRIFT-TAXONOMY.md`](../DRIFT-TAXONOMY.md)
+  pattern 5 — real-time diagnostic the detector aims to
+  mechanise.