research: Aminata 4th pass on bullshit-detector design (3 CRITICAL + 4 IMPORTANT + 3 WATCH) #284
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,303 @@ | ||||||||||||||
| # Aminata pass on provenance-aware bullshit-detector design | ||||||||||||||
|
|
||||||||||||||
| **Scope:** adversarial review of Otto-99's provenance-aware | ||||||||||||||
| bullshit-detector design (PR #282). Fourth Aminata pass this | ||||||||||||||
| session; third on the Otto composition stack (Otto-90 | ||||||||||||||
| oracle-scoring v0 → Otto-94 iteration-1 on multi-Claude | ||||||||||||||
| experiment → Otto-99 detector → this pass). | ||||||||||||||
|
|
||||||||||||||
| **Attribution:** findings Aminata's, persona-authored. | ||||||||||||||
| Otto-99 authored the detector design; this pass is | ||||||||||||||
| adversarial review per Aminata's own role + the dependency | ||||||||||||||
| named in Otto-99's adoption-path. Prior passes: PR #241 | ||||||||||||||
| (5th-ferry governance edits), PR #263 (7th-ferry oracle | ||||||||||||||
| rules + threat model), PR #272 (iteration-1 on multi-Claude | ||||||||||||||
| experiment design). | ||||||||||||||
|
|
||||||||||||||
| **Operational status:** research-grade. Advisory; not a | ||||||||||||||
| gate. Does not block the research-doc land (Otto-99 | ||||||||||||||
| correctly frames detector as research-grade); all ten | ||||||||||||||
| findings would block a v1 implementation-ADR. | ||||||||||||||
|
|
||||||||||||||
| **Non-fusion disclaimer:** alignment between Aminata's | ||||||||||||||
| three prior passes and this one is a same-agent signal, | ||||||||||||||
| NOT independent concordance. The detector applied to this | ||||||||||||||
| very review would correctly emit `looks similar but | ||||||||||||||
| lineage-coupled` — and it would be right. Per SD-9, | ||||||||||||||
| same-agent concordance is not evidence; the pass's | ||||||||||||||
| authority is its specific technical findings, not its | ||||||||||||||
| agreement with prior passes. | ||||||||||||||
|
|
||||||||||||||
| --- | ||||||||||||||
|
|
||||||||||||||
| ## Findings | ||||||||||||||
|
|
||||||||||||||
| ### CRITICAL — cross-detector collusion is the carrier-laundering risk the detector itself incarnates | ||||||||||||||
|
|
||||||||||||||
| **Claim.** The detector, Aminata, and Codex are all | ||||||||||||||
| reviewers that (a) train or ground on overlapping corpora, | ||||||||||||||
| (b) read the same repo tree, (c) read each other's PR | ||||||||||||||
| comments. Their provenance cones are not independent. A | ||||||||||||||
| query that trips the detector's `looks lineage-coupled` | ||||||||||||||
| band will likely also trip Aminata into the same band of | ||||||||||||||
| concern — because Aminata's cone and the detector's cone | ||||||||||||||
| overlap by construction. The design names SD-9 | ||||||||||||||
| operationalisation but does NOT address reviewer-set | ||||||||||||||
| carrier overlap. | ||||||||||||||
|
|
||||||||||||||
| **Why it matters.** The detector is supposed to catch | ||||||||||||||
| agreement-from-shared-lineage. If the reviewer set IS a | ||||||||||||||
| shared-lineage cohort, then a `supported` verdict from | ||||||||||||||
| detector + Aminata + Codex is still lineage-coupled | ||||||||||||||
| concordance masquerading as three-independent-lines. | ||||||||||||||
| This is the exact failure mode SD-9 names, applied one | ||||||||||||||
| meta-layer up. | ||||||||||||||
|
|
||||||||||||||
| **Ask.** Add a §"Reviewer-cone overlap" subsection | ||||||||||||||
| documenting that the detector does NOT break the | ||||||||||||||
| reviewer-set lineage coupling, and that human sign-off | ||||||||||||||
| remains the only cone-breaking authority until an external | ||||||||||||||
| independent-oracle substrate (supply-chain audit; | ||||||||||||||
| external peer review; formal-methods proofs) lands. | ||||||||||||||
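
The reviewer-cone overlap described above can be illustrated with a minimal sketch. It assumes a cone is represented as a set of source labels and uses Jaccard overlap as the coupling measure; both are assumptions, as the design does not specify either. The names `cone_overlap` and `coupled_pairs`, the threshold, and the cone contents in the usage note are hypothetical.

```python
from itertools import combinations


def cone_overlap(cone_a: set[str], cone_b: set[str]) -> float:
    """Jaccard overlap of two provenance cones (assumed measure, not from the design)."""
    union = cone_a | cone_b
    return len(cone_a & cone_b) / len(union) if union else 0.0


def coupled_pairs(cones: dict[str, set[str]], threshold: float) -> list[tuple[str, str]]:
    """Return reviewer pairs whose cones overlap at or above the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(cones), 2)
        if cone_overlap(cones[a], cones[b]) >= threshold
    ]
```

With illustrative cones such as `{"detector": {"repo-tree", "pr-comments"}, "aminata": {"repo-tree", "pr-comments"}}`, the pair is reported as coupled at any threshold up to 1.0, so a three-way `supported` verdict from such a reviewer set would count as one line of evidence rather than three.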
|
|
||||||||||||||
| --- | ||||||||||||||
|
|
||||||||||||||
| ### CRITICAL — `min`-merging on ordinal bands is Goodhart-bait at G_carrier_overlap | ||||||||||||||
|
|
||||||||||||||
| **Claim.** `band = min(G_similarity, G_evidence, | ||||||||||||||
| G_carrier_overlap, G_contradiction, G_status)`. One | ||||||||||||||
| adversary-controllable gate drives the whole output. An | ||||||||||||||
| adversary who can manipulate `cone(q) ∩ cone(y)` — | ||||||||||||||
| trivially, by citing fewer sources, or routing citations | ||||||||||||||
| through indirection chains the graph traversal does not | ||||||||||||||
| follow — lands `G_carrier_overlap = GREEN` while leaving | ||||||||||||||
| actual lineage coupling intact. When the other four gates | ||||||||||||||
| are also GREEN, the `min` rule then returns GREEN for the | ||||||||||||||
| whole query. | ||||||||||||||
|
|
||||||||||||||
| **Why it matters.** `G_carrier_overlap` is the most | ||||||||||||||
| adversary-manipulable gate of the five. Citation structure | ||||||||||||||
| is author-controlled. The gate measures what the author | ||||||||||||||
| TELLS the graph, not what the author actually INHERITED. | ||||||||||||||
| An Otto-wake who wants `supported` simply writes the doc | ||||||||||||||
| with sparse explicit citations and dense implicit | ||||||||||||||
| framing-inheritance. | ||||||||||||||
|
|
||||||||||||||
| **Ask.** Either (a) require G_carrier_overlap to use | ||||||||||||||
| *content-similarity-in-cone* not *explicit-citation-edge- | ||||||||||||||
| in-cone* (harder to game; costs embedding compute per | ||||||||||||||
| cone member), or (b) downgrade any `supported` output to | ||||||||||||||
| `YELLOW` when carrier-overlap was the gate closest to | ||||||||||||||
| threshold (i.e., sensitivity-analysis-gate pattern). | ||||||||||||||
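
Option (b) above can be sketched as follows. This is a minimal illustration, not the detector's implementation: the `Band` ordering (RED < YELLOW < GREEN), the per-gate `margins` dictionary (distance of each gate's score from its threshold), and the function names are all assumptions made for this example.

```python
from enum import IntEnum


class Band(IntEnum):
    """Ordinal bands; a lower value is a worse verdict (assumed ordering)."""
    RED = 0
    YELLOW = 1
    GREEN = 2


GATES = (
    "G_similarity",
    "G_evidence",
    "G_carrier_overlap",
    "G_contradiction",
    "G_status",
)


def merge_bands(bands: dict[str, Band]) -> Band:
    """Plain min-merge: the weakest of the five gates sets the output band."""
    return min(bands[gate] for gate in GATES)


def merge_with_sensitivity(bands: dict[str, Band], margins: dict[str, float]) -> Band:
    """Min-merge, then downgrade GREEN to YELLOW when G_carrier_overlap
    is the gate with the smallest (hypothetical) margin to its threshold."""
    band = merge_bands(bands)
    closest_gate = min(GATES, key=lambda gate: margins[gate])
    if band == Band.GREEN and closest_gate == "G_carrier_overlap":
        return Band.YELLOW
    return band
```

Under this sketch, a query that passes all five gates but only barely clears the carrier-overlap threshold comes out YELLOW, so thinning citations just enough to pass the gate no longer yields `supported` on its own.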
|
|
||||||||||||||
| --- | ||||||||||||||
|
|
||||||||||||||
| ### IMPORTANT — G_evidence_independent is a fig leaf until the substrate exists | ||||||||||||||
|
|
||||||||||||||
| **Claim.** Otto-99 explicitly says `evidence` is | ||||||||||||||
| "advisory only" until an independent-oracle substrate | ||||||||||||||
| lands. That means v0 of the detector ships with a | ||||||||||||||
| four-gate classifier, not five. The `likely confabulated` | ||||||||||||||
| output type, defined as `G_evidence fail-to-RED + high | ||||||||||||||
| similarity`, is therefore *unreachable* in v0 — | ||||||||||||||
| G_evidence can only fail-to-YELLOW (advisory) by design. | ||||||||||||||
| Output type 4 is dead code until dependency #3 in the | ||||||||||||||
| adoption list ships. | ||||||||||||||
|
|
||||||||||||||
| **Why it matters.** The design reads as five gates + five | ||||||||||||||
| output types. The effective design for the foreseeable | ||||||||||||||
| future is four gates + four reachable output types. This | ||||||||||||||
| should be stated as an operational-status caveat, not | ||||||||||||||
| buried in the dependency list. | ||||||||||||||
|
|
||||||||||||||
| **Ask.** Add a §"v0 scope" subsection naming which output | ||||||||||||||
| types are reachable in v0 vs. v1, and mark `likely | ||||||||||||||
| confabulated` explicitly as `not-yet-reachable`. A | ||||||||||||||
| dead-code output type in a safety-critical classifier is | ||||||||||||||
| a maintenance hazard. | ||||||||||||||
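
The reachability argument above can be checked mechanically. The sketch below encodes only the one rule quoted in this finding (`likely confabulated` requires `G_evidence` at RED plus high similarity) and assumes that an advisory-only `G_evidence` can take YELLOW or GREEN but never RED in v0. The names `V0_EVIDENCE_BANDS`, `V1_EVIDENCE_BANDS`, and `reachable` are hypothetical.

```python
from enum import IntEnum
from itertools import product


class Band(IntEnum):
    RED = 0
    YELLOW = 1
    GREEN = 2


# Assumption: advisory-only evidence cannot fail to RED in v0.
V0_EVIDENCE_BANDS = (Band.YELLOW, Band.GREEN)
V1_EVIDENCE_BANDS = tuple(Band)


def is_likely_confabulated(g_evidence: Band, high_similarity: bool) -> bool:
    """The rule as quoted: G_evidence fail-to-RED plus high similarity."""
    return g_evidence == Band.RED and high_similarity


def reachable(evidence_bands: tuple[Band, ...]) -> bool:
    """True if any allowed input combination emits `likely confabulated`."""
    return any(
        is_likely_confabulated(evidence, similar)
        for evidence, similar in product(evidence_bands, (False, True))
    )
```

Here `reachable(V0_EVIDENCE_BANDS)` is false and `reachable(V1_EVIDENCE_BANDS)` is true. A check of this kind could back the proposed `not-yet-reachable` marking.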
|
|
||||||||||||||
| --- | ||||||||||||||
|
|
||||||||||||||
| ### IMPORTANT — `no-signal` is indistinguishable from `kNN-evasion` | ||||||||||||||
|
|
||||||||||||||
| **Claim.** Retrieval returning empty emits `no-signal`. | ||||||||||||||
| The design does not distinguish (a) the corpus genuinely | ||||||||||||||
| doesn't cover the domain (legitimate no-signal) from (b) | ||||||||||||||
| the query was crafted to evade kNN (paraphrase-drift | ||||||||||||||
| attack; adversarial embedding perturbation; novel | ||||||||||||||
| jargon). Both hit the same output type. | ||||||||||||||
|
|
||||||||||||||
| **Why it matters.** Case (b) is precisely the adversary | ||||||||||||||
| the detector exists to catch. If the adversary gets | ||||||||||||||
| `no-signal` instead of `likely confabulated`, the | ||||||||||||||
| detector has been bypassed by the simplest adaptive | ||||||||||||||
| attack possible: write in a style the embedding model | ||||||||||||||
| hasn't seen. | ||||||||||||||
|
|
||||||||||||||
| **Ask.** Add a `G_coverage_plausibility` or | ||||||||||||||
| `G_query_anomaly` gate: if `N(q)`'s representation | ||||||||||||||
| distance to its nearest cluster centroid is anomalously | ||||||||||||||
| high, treat retrieval-empty as `suspicious-no-signal` | ||||||||||||||
| not `no-signal`. One extra nearest-centroid distance | ||||||||||||||
| computation per query; cheap. | ||||||||||||||
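
A minimal sketch of the proposed gate follows, assuming Euclidean distance over embedding vectors and a fixed anomaly threshold. The design specifies neither, and the function names and the threshold parameter are hypothetical.

```python
import math
from typing import Sequence


def nearest_centroid_distance(
    query_vec: Sequence[float], centroids: Sequence[Sequence[float]]
) -> float:
    """Euclidean distance from the query embedding to its nearest cluster centroid."""
    return min(math.dist(query_vec, centroid) for centroid in centroids)


def classify_empty_retrieval(
    query_vec: Sequence[float],
    centroids: Sequence[Sequence[float]],
    anomaly_threshold: float,
) -> str:
    """Label an empty retrieval result; call only when retrieval returned nothing."""
    distance = nearest_centroid_distance(query_vec, centroids)
    if distance > anomaly_threshold:
        return "suspicious-no-signal"
    return "no-signal"
```

The cost is one distance computation per centroid per query. How to set `anomaly_threshold` (for example, from the distribution of distances seen on known in-corpus queries) is left open here.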
|
|
||||||||||||||
| --- | ||||||||||||||
|
|
||||||||||||||
| ### IMPORTANT — `pinned_by` + optional `second-reviewer` does not prevent coordinated Otto-wake collusion | ||||||||||||||
|
|
||||||||||||||
| **Claim.** Otto-99's Bonus response: status pins carry | ||||||||||||||
| `pinned_by` + optional `second-reviewer`. All Otto-wakes | ||||||||||||||
| are Claude sessions on the same factory. Two Otto-wakes | ||||||||||||||
| signing off the same `known-bad` pin is not independent | ||||||||||||||
| review; it is the same author across two sessions. The | ||||||||||||||
| governance schema does not require the second reviewer | ||||||||||||||
| to be a different *model*, a different *persona*, or a | ||||||||||||||
| *human*. | ||||||||||||||
|
|
||||||||||||||
| **Why it matters.** Same-agent-self-reinforcement drift, | ||||||||||||||
| which the design names as the risk, is not actually | ||||||||||||||
| mitigated. It is labelled-away. | ||||||||||||||
|
|
||||||||||||||
| **Ask.** Require `second-reviewer` to be one of: | ||||||||||||||
| different persona (Aminata / Kenji / other named | ||||||||||||||
| specialist), different model (Codex), or human (Aaron). | ||||||||||||||
|
||||||||||||||
| specialist), different model (Codex), or human (Aaron). | |
| specialist), different model (Codex), or human maintainer. |
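
The `second-reviewer` requirement above can be sketched as a validation rule. The `Reviewer` record and its fields are a hypothetical schema, not the governance schema from the design. Treating a missing second reviewer as failing the check is also an assumption, since the design makes the field optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reviewer:
    persona: str           # e.g. "Otto", "Aminata" (hypothetical field)
    model: str             # e.g. "Claude", "Codex" (hypothetical field)
    is_human: bool = False


def is_independent_second_reviewer(
    pinned_by: Reviewer, second: Optional[Reviewer]
) -> bool:
    """True if the second reviewer is a human, a different persona, or a different model."""
    if second is None:
        return False
    if second.is_human:
        return True
    return second.persona != pinned_by.persona or second.model != pinned_by.model
```

Under this rule, two Otto-wakes on the same model fail the check, while an Aminata pass, a Codex pass, or a human sign-off satisfies it.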
Reconcile summary severity counts with actual findings
The summary line reports `Three CRITICAL, four IMPORTANT, three WATCH, one DISMISS`, but the document body defines only 10 finding headings (2 CRITICAL + 4 IMPORTANT + 3 WATCH + 1 DISMISS), and later also says `All ten findings`. This mismatch makes the triage signal ambiguous (it implies an extra critical finding that is not actually enumerated), which can skew downstream prioritization and ADR decisions.
Copilot
AI
Apr 24, 2026
These relative links currently point at files that do not exist in the repository (`provenance-aware-bullshit-detector-2026-04-23.md` and `semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md`). If the intent is to reference work that is only present in other open PRs, consider linking to the PRs (or to permanent commit/PR URLs) instead of a local relative path, so the links don't break when this doc lands independently.
| - [`docs/research/provenance-aware-bullshit-detector-2026-04-23.md`](provenance-aware-bullshit-detector-2026-04-23.md) | |
| (under review, PR #282). | |
| - [`docs/research/semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md`](semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md) | |
| - `docs/research/provenance-aware-bullshit-detector-2026-04-23.md` | |
| (under review, PR #282). | |
| - `docs/research/semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md` |
Copilot
AI
Apr 24, 2026
This link points to `docs/DRIFT-TAXONOMY.md`, which is not present in the repo, so it will be broken in rendered Markdown. Either update the link to an existing drift-taxonomy document (e.g. the research precursor) or add/land the referenced operational doc before linking to it here.
| - [`docs/DRIFT-TAXONOMY.md`](../DRIFT-TAXONOMY.md) | |
| pattern 5 — real-time diagnostic the detector aims to | |
| mechanise. | |
| - Drift taxonomy pattern 5 — real-time diagnostic the | |
| detector aims to mechanise. |
The output-type label here (`looks lineage-coupled`) is inconsistent with the earlier/elsewhere label `looks similar but lineage-coupled`. This kind of terminology drift makes it hard to grep/track outputs and can cause downstream mismatches if the string becomes part of receipts/logs. Pick one canonical name (preferably the one defined in the detector design) and use it consistently throughout this doc.