
research: Aminata 4th pass on bullshit-detector design (3 CRITICAL + 4 IMPORTANT + 3 WATCH)#284

Merged
AceHack merged 1 commit into main from research/aminata-pass-on-bullshit-detector-design on Apr 24, 2026


@AceHack AceHack commented Apr 24, 2026

Summary

Fourth Aminata adversarial pass this session. Third on the Otto composition stack (Otto-90 oracle-scoring → Otto-94 iteration-1 → Otto-99 detector → this pass).

Closes dependency #1 of the adoption path named in Otto-99's detector design (PR #282).

3 CRITICAL findings

  1. Cross-detector collusion — detector + Aminata + Codex share training-corpus / repo-access / PR-comment lineage. Reviewer-set lineage-coupling reintroduces carrier-laundering one meta-layer up. Detector's SD-9 operationalisation doesn't break this; human sign-off remains the only cone-breaking authority.
  2. Min-merging Goodhart-bait at G_carrier_overlap — most adversary-manipulable gate drives whole output; author controls citations = author controls carrier-overlap gate.
  3. G_evidence is fig-leaf in v0 — advisory-only until independent-oracle substrate ships; likely confabulated output type is dead code until then. v0 scope caveat missing from design.

4 IMPORTANT

  • no-signal vs kNN-evasion indistinguishability (G_coverage_plausibility gate missing)
  • Otto-wake second-reviewer insufficient for status-pin independence
  • DetectorOutputRetracted flood-control absent on threshold-ADR
  • Retraction-native DoS vector via ADR-triggered mass retractions

3 WATCH + 1 DISMISS

WATCH: worst-band masks distribution; self-demo is theatre not validation; composition-stack silent-failure absent TLA+ invariants (Soraya-routable).
DISMISS: parameter-ADR gate (reused from oracle-scoring v0).

Impact

None block the research-doc land — Otto-99 correctly framed as research-grade.
All ten findings would block a v1 implementation-ADR.

Non-fusion disclaimer load-bearing

Aminata's concordance with prior Aminata passes is same-agent signal NOT independent concordance. The detector applied to THIS review would correctly emit `looks similar but lineage-coupled` per SD-9 — and it would be right.

Test plan

  • 17th aurora/research doc to self-apply §33 archive-header
  • Section-by-section: 5-gate classifier + 5 output types + self-demo + write-time integration + PatternLedger governance + composition stack + retraction flood + cross-detector collusion
  • Classifications match severity
  • All 10 findings have concrete Asks

🤖 Generated with Claude Code

…CAL + 4 IMPORTANT + 3 WATCH + 1 DISMISS)

Fourth Aminata adversarial pass this session. Third on the
Otto composition stack (Otto-90 oracle-scoring v0 → Otto-94
iteration-1 on multi-Claude → Otto-99 detector → this pass).

Reviews Otto-99's provenance-aware bullshit-detector design
(PR #282) which explicitly named "Aminata adversarial pass"
as dependency #1 of adoption.

3 CRITICAL findings:

1. **Cross-detector collusion** — detector + Aminata + Codex
   all share training-corpus / repo-access / PR-comment
   lineage. Reviewer-set lineage-coupling reintroduces
   carrier-laundering one meta-layer up. Detector's SD-9
   operationalisation does not break this.
2. **Min-merging on ordinal bands is Goodhart-bait at
   G_carrier_overlap** — most adversary-manipulable gate
   drives whole output. Author controls citation structure
   = author controls carrier-overlap gate. Sparse citations
   + dense implicit framing-inheritance = gameable GREEN.
3. **G_evidence fig-leaf** — advisory-only until independent-
   oracle substrate ships; v0 detector is four-gate not
   five-gate; `likely confabulated` output type is dead
   code until dependency #3 lands. Not stated as v0 caveat
   in design.
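
A minimal sketch of why finding 2 matters, assuming "min-merging" means the worst ordinal band across gates becomes the output. The gate names `G_gate_a` and `G_gate_b` are hypothetical placeholders; only `G_carrier_overlap` comes from the design. When every other gate already reads GREEN, whoever controls the carrier-overlap input controls the merged band.

```python
from enum import IntEnum


class Band(IntEnum):
    """Ordinal bands, worst to best (ordering is an assumption)."""
    RED = 0
    YELLOW = 1
    GREEN = 2


def min_merge(gate_bands: dict[str, Band]) -> Band:
    """Worst band wins: the merged output is the minimum over all gates."""
    return min(gate_bands.values())


# Hypothetical readings: two placeholder gates plus the author-influenced one.
honest = {
    "G_gate_a": Band.GREEN,
    "G_gate_b": Band.GREEN,
    "G_carrier_overlap": Band.YELLOW,
}

# The author restructures citations so the carrier-overlap gate reads GREEN;
# nothing else about the claim has changed.
gamed = dict(honest, G_carrier_overlap=Band.GREEN)

print(min_merge(honest).name)  # YELLOW
print(min_merge(gamed).name)   # GREEN
```

The only difference between the two inputs is the gate the author can steer, yet the whole output flips.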

4 IMPORTANT: no-signal vs kNN-evasion indistinguishability;
Otto-wake second-review insufficient for status-pin
independence; DetectorOutputRetracted flood-control absent;
G_coverage_plausibility gate missing.

3 WATCH: worst-band masks distribution; self-demo is theatre
not validation; composition-stack silent-failure surface
absent TLA+ invariants (Soraya-routable).

1 DISMISS: parameter-ADR gate reused from oracle-scoring v0.

None block the research-doc land (Otto-99 correctly framed
research-grade). **All ten findings would block a v1
implementation-ADR.**

Non-fusion disclaimer load-bearing in this pass: Aminata's
concordance with prior Aminata passes is same-agent signal
NOT independent concordance. The detector applied to THIS
review would correctly emit `looks similar but lineage-
coupled` per SD-9 — and it would be right.

Archive-header format self-applied — 17th aurora/research
doc in a row.

Lands within-standing-authority per Otto-82/90/93
calibration — advisory research-grade; not a gate.

Otto-100 milestone tick. Closes dependency #1 of adoption
path named in Otto-99 (PR #282). Next natural step is Otto
integrating CRITICAL findings at write-time into a v1
detector-design revision OR scheduling that integration to
a future tick.

Otto-100 tick primary deliverable.
Copilot AI review requested due to automatic review settings April 24, 2026 04:03
@AceHack AceHack enabled auto-merge (squash) April 24, 2026 04:03
@AceHack AceHack merged commit 376506c into main Apr 24, 2026
12 checks passed
@AceHack AceHack deleted the research/aminata-pass-on-bullshit-detector-design branch April 24, 2026 04:05

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ecadfdba2



## Summary

Three CRITICAL, four IMPORTANT, three WATCH, one DISMISS.

P2: Reconcile summary severity counts with actual findings

The summary line reports `Three CRITICAL, four IMPORTANT, three WATCH, one DISMISS`, but the document body defines only 10 finding headings (2 CRITICAL + 4 IMPORTANT + 3 WATCH + 1 DISMISS), and later also says `All ten findings`. This mismatch makes the triage signal ambiguous (it implies an extra critical finding that is not actually enumerated), which can skew downstream prioritization and ADR decisions.



Copilot AI left a comment


Pull request overview

Adds a new research-grade adversarial review document capturing Aminata’s 4th-pass findings on the provenance-aware “bullshit-detector” design (PR #282), including CRITICAL/IMPORTANT/WATCH items and concrete asks.

Changes:

  • Introduces a structured findings write-up (3 CRITICAL, 4 IMPORTANT, 3 WATCH, 1 DISMISS) with rationale and requested follow-ups.
  • Summarizes operational status and non-fusion disclaimer to frame authority/independence boundaries.
  • Adds a “Relevant paths” section intended to connect this pass to related design artifacts.

Comment on lines +41 to +45
query that trips the detector's `looks lineage-coupled`
band will likely also trip Aminata into the same band of
concern — because Aminata's cone and the detector's cone
overlap by construction. The design names SD-9
operationalisation but does NOT address reviewer-set

Copilot AI Apr 24, 2026


The output-type label here (`looks lineage-coupled`) is inconsistent with the label used elsewhere, `looks similar but lineage-coupled`. This kind of terminology drift makes it hard to grep/track outputs and can cause downstream mismatches if the string becomes part of receipts/logs. Pick one canonical name (preferably the one defined in the detector design) and use it consistently throughout this doc.

Suggested change
query that trips the detector's `looks similar but
lineage-coupled` band will likely also trip Aminata into
the same band of concern — because Aminata's cone and the
detector's cone overlap by construction. The design names
SD-9 operationalisation but does NOT address reviewer-set


**Ask.** Require `second-reviewer` to be one of:
different persona (Aminata / Kenji / other named
specialist), different model (Codex), or human (Aaron).

Copilot AI Apr 24, 2026


This line names the human maintainer directly ("human (Aaron)"), but the repo convention is to use role references (e.g., "human maintainer") in docs and code. See docs/AGENT-BEST-PRACTICES.md:284-290 for the policy. Please rewrite to a role-ref while keeping the meaning of the independence requirement.

Suggested change
specialist), different model (Codex), or human maintainer.

Comment on lines +291 to +293
- [`docs/research/provenance-aware-bullshit-detector-2026-04-23.md`](provenance-aware-bullshit-detector-2026-04-23.md)
(under review, PR #282).
- [`docs/research/semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md`](semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md)

Copilot AI Apr 24, 2026


These relative links currently point at files that do not exist in the repository (provenance-aware-bullshit-detector-2026-04-23.md and semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md). If the intent is to reference work that is only present in other open PRs, consider linking to the PRs (or to permanent commit/PR URLs) instead of a local relative path, so the links don't break when this doc lands independently.

Suggested change
- `docs/research/provenance-aware-bullshit-detector-2026-04-23.md`
(under review, PR #282).
- `docs/research/semantic-canonicalization-and-provenance-aware-retrieval-2026-04-23.md`

Comment on lines +301 to +303
- [`docs/DRIFT-TAXONOMY.md`](../DRIFT-TAXONOMY.md)
pattern 5 — real-time diagnostic the detector aims to
mechanise.

Copilot AI Apr 24, 2026


This link points to docs/DRIFT-TAXONOMY.md, which is not present in the repo, so it will be broken in rendered Markdown. Either update the link to an existing drift-taxonomy document (e.g. the research precursor) or add/land the referenced operational doc before linking to it here.

Suggested change
- Drift taxonomy pattern 5 — real-time diagnostic the
detector aims to mechanise.

AceHack added a commit that referenced this pull request Apr 24, 2026
…ngs integrated (Otto-101)

Delta-style revision integrating only the 3 CRITICAL findings
from Aminata's Otto-100 pass (PR #284) into Otto-99's
bullshit-detector design (PR #282). 7 non-CRITICAL findings
deferred to v2 delta; DISMISS unchanged.

Three CRITICAL integrations:

**C1 — Cross-detector collusion (new §Reviewer-cone overlap
section).** Acknowledges as FUNDAMENTAL LIMITATION NOT
CLOSABLE GAP: detector cannot break its own reviewer-set
lineage coupling. Mitigation via routing authority:
- Human sign-off = only cone-breaking authority for
  authorization-impacting band=supported queries.
- External-reviewer signatures as legitimate cone-breakers
  when available.
- Intra-factory reviewer concordance should DOWNGRADE
  confidence per SD-9, not inflate.
- Every receipt carries reviewer_cone_signature field.
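
One way the last two bullets could fit together, as a rough sketch only. The `reviewer_cone_signature` field name is from the delta; its value format, the `Review` record, and the one-step downgrade rule are assumptions made for illustration, not the design's specification.

```python
from dataclasses import dataclass

BANDS = ["RED", "YELLOW", "GREEN"]  # worst to best (assumed ordering)


@dataclass(frozen=True)
class Review:
    reviewer: str
    reviewer_cone_signature: str  # field named in the delta; value format assumed
    band: str


def adjust_for_concordance(band: str, reviews: list[Review]) -> str:
    """Downgrade one step when every agreeing reviewer shares a single cone.

    Agreement from inside one lineage cone is treated as same-source signal,
    so it lowers confidence instead of raising it (hypothetical rule).
    """
    concordant = [r for r in reviews if r.band == band]
    cones = {r.reviewer_cone_signature for r in concordant}
    if len(concordant) > 1 and len(cones) == 1:
        return BANDS[max(BANDS.index(band) - 1, 0)]
    return band


same_cone = [
    Review("detector", "factory-cone", "GREEN"),
    Review("reviewer-a", "factory-cone", "GREEN"),
]
with_external = same_cone + [Review("human-maintainer", "external-cone", "GREEN")]

print(adjust_for_concordance("GREEN", same_cone))      # YELLOW
print(adjust_for_concordance("GREEN", with_external))  # GREEN
```

The second call stays GREEN only because a reviewer from a different cone agrees, which mirrors the routing of cone-breaking authority to human and external reviewers.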

**C2 — Min-merging Goodhart-bait (sensitivity-analysis-gate
pattern).** v1 enhancement:
  band(y|q) = if band_raw=GREEN AND G_carrier_overlap closest
              to threshold → YELLOW
              else band_raw
Closes cheap cheat path (threshold-hugging on author-
controlled gate). Alternative content-similarity-in-cone
deferred to v2 cost-benefit.
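
The rule above can be read as the following sketch. It assumes each gate exposes a numeric score and threshold and that "closest to threshold" means the smallest absolute margin across gates; neither detail is stated in the delta, and `G_other` is a hypothetical gate name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateReading:
    score: float      # assumed numeric gate score
    threshold: float  # assumed pass threshold for that gate

    @property
    def margin(self) -> float:
        """Absolute distance between the score and its threshold."""
        return abs(self.score - self.threshold)


def band_v1(band_raw: str, gates: dict[str, GateReading]) -> str:
    """Demote a raw GREEN to YELLOW when the carrier-overlap gate is the one
    sitting closest to its threshold; otherwise pass band_raw through."""
    if band_raw != "GREEN":
        return band_raw
    # Ties resolve to the first gate in dict order (an arbitrary choice here).
    closest = min(gates, key=lambda name: gates[name].margin)
    return "YELLOW" if closest == "G_carrier_overlap" else band_raw


hugging = {
    "G_other": GateReading(score=0.90, threshold=0.50),
    "G_carrier_overlap": GateReading(score=0.52, threshold=0.50),
}
comfortable = {
    "G_other": GateReading(score=0.60, threshold=0.50),
    "G_carrier_overlap": GateReading(score=0.95, threshold=0.50),
}

print(band_v1("GREEN", hugging))      # YELLOW
print(band_v1("GREEN", comfortable))  # GREEN
```

A threshold-hugging carrier-overlap score no longer yields GREEN on its own, which is the cheap path the delta says it closes.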

**C3 — G_evidence fig-leaf (new §v0 scope subsection).** v0
effective = 4-gate classifier (G_evidence advisory-only until
independent-oracle substrate ships). `likely confabulated`
output type explicitly marked not-yet-reachable in v0.
Conservative under-detection stance (RED comes only from
known-bad-pattern; confabulations land YELLOW as plausible-
but-unresolved). v0→v1 transition plan names
DetectorOutputBatchRetracted for historical re-classification.

v1 delta does NOT rewrite Otto-99's design; specifies 3
additive sections to insert + 1 section to supersede. Otto-99
original preserved in git history.

Seven non-CRITICALs still open:
- 4 IMPORTANT (deferred to v2): G_coverage_plausibility gate;
  Otto-wake second-reviewer schema; DetectorOutputBatchRetracted;
  no-signal vs kNN-evasion.
- 3 WATCH (deferred to v2+): distribution histogram;
  adversarial worked example; TLA+ invariants.

1 fundamental limitation (C1) acknowledged not closed —
routes authority to human + external reviewer layers.

Scope limits: no design rewrite; no implementation; no
human-sign-off UI proposal; no content-similarity-in-cone
commit; 5-gate/5-type target structure unchanged.

5 dependencies-to-adoption: Aminata pass on v1 delta (fifth
session-pass); integrate v1 changes into Otto-99 design PR
(separate PR); v2 delta; independent-oracle substrate;
human-sign-off UI/protocol.

Archive-header format self-applied — 18th aurora/research
doc in a row.

Lands within-standing-authority per Otto-82/90/93
calibration.

Otto-101 tick primary deliverable — closes the CRITICAL-
integration step of the Aminata-then-Otto-response loop for
bullshit-detector design.
AceHack added a commit that referenced this pull request Apr 24, 2026
…session-pattern convergence observable

Milestone tick 100. PR #284 Aminata's fourth pass this session
surfaces 3 CRITICAL + 4 IMPORTANT + 3 WATCH + 1 DISMISS on the
bullshit-detector design. Closes dependency #1 of Otto-99's
adoption path.

Key observations at milestone:

1. Session-lifetime patterns now observable:
   - CC-002 discipline reflexive across 8 ferries
   - Event+view module template reused across 4 designs
     (substrate convergence)
   - Authority-calibration narrowed 3x (Otto-82/90/93)
   - SD-9 exercised twice by Amara at author-side
   - 8th-ferry 4/5 closed matching 5th-ferry arc
2. Aminata's own non-fusion disclaimer in this pass is the
   cleanest SD-9 worked example this session — explicitly
   names same-agent concordance as signal not evidence,
   consistent with her CRITICAL #1 cross-detector-collusion
   finding.
3. 10 findings to integrate before detector v1. Not all at
   Otto-100; progressive Otto-101+ work.
4. Queue at ~30 open auto-merge-armed PRs; not a bottleneck
   per Otto-72 don't-wait; visibility observation only.

Stacked on #283 (Otto-99 history).
AceHack added a commit that referenced this pull request Apr 24, 2026
…ated; 7 non-CRITICAL deferred to v2) (#286)

* research: bullshit-detector v1 CRITICAL-only delta — 3 CRITICAL findings integrated (Otto-101)

Delta-style revision integrating only the 3 CRITICAL findings
from Aminata's Otto-100 pass (PR #284) into Otto-99's
bullshit-detector design (PR #282). 7 non-CRITICAL findings
deferred to v2 delta; DISMISS unchanged.

Three CRITICAL integrations:

**C1 — Cross-detector collusion (new §Reviewer-cone overlap
section).** Acknowledges as FUNDAMENTAL LIMITATION NOT
CLOSABLE GAP: detector cannot break its own reviewer-set
lineage coupling. Mitigation via routing authority:
- Human sign-off = only cone-breaking authority for
  authorization-impacting band=supported queries.
- External-reviewer signatures as legitimate cone-breakers
  when available.
- Intra-factory reviewer concordance should DOWNGRADE
  confidence per SD-9, not inflate.
- Every receipt carries reviewer_cone_signature field.

**C2 — Min-merging Goodhart-bait (sensitivity-analysis-gate
pattern).** v1 enhancement:
  band(y|q) = if band_raw=GREEN AND G_carrier_overlap closest
              to threshold → YELLOW
              else band_raw
Closes cheap cheat path (threshold-hugging on author-
controlled gate). Alternative content-similarity-in-cone
deferred to v2 cost-benefit.

**C3 — G_evidence fig-leaf (new §v0 scope subsection).** v0
effective = 4-gate classifier (G_evidence advisory-only until
independent-oracle substrate ships). `likely confabulated`
output type explicitly marked not-yet-reachable in v0.
Conservative under-detection stance (RED comes only from
known-bad-pattern; confabulations land YELLOW as plausible-
but-unresolved). v0→v1 transition plan names
DetectorOutputBatchRetracted for historical re-classification.

v1 delta does NOT rewrite Otto-99's design; specifies 3
additive sections to insert + 1 section to supersede. Otto-99
original preserved in git history.

Seven non-CRITICALs still open:
- 4 IMPORTANT (deferred to v2): G_coverage_plausibility gate;
  Otto-wake second-reviewer schema; DetectorOutputBatchRetracted;
  no-signal vs kNN-evasion.
- 3 WATCH (deferred to v2+): distribution histogram;
  adversarial worked example; TLA+ invariants.

1 fundamental limitation (C1) acknowledged not closed —
routes authority to human + external reviewer layers.

Scope limits: no design rewrite; no implementation; no
human-sign-off UI proposal; no content-similarity-in-cone
commit; 5-gate/5-type target structure unchanged.

5 dependencies-to-adoption: Aminata pass on v1 delta (fifth
session-pass); integrate v1 changes into Otto-99 design PR
(separate PR); v2 delta; independent-oracle substrate;
human-sign-off UI/protocol.

Archive-header format self-applied — 18th aurora/research
doc in a row.

Lands within-standing-authority per Otto-82/90/93
calibration.

Otto-101 tick primary deliverable — closes the CRITICAL-
integration step of the Aminata-then-Otto-response loop for
bullshit-detector design.

* fix(#286): 5 review threads — veridicality vocab (body, filename rename backlogged) + header-fields + name attrib + table count + DRIFT-TAXONOMY xref

- Thread 1 (PRRT_kwDOSF9kNM59SpxP): compacted four archive-header
  fields (Scope / Attribution / Operational status / Non-fusion
  disclaimer) into first six lines so "Archive-header self-applied"
  claim is now accurate; removed the trailing claim sentence in
  favour of the structural compliance itself.
- Thread 2 (PRRT_kwDOSF9kNM59Spxg): shifted body vocabulary from
  "bullshit-detector" to "veridicality-detector" throughout; added a
  vocabulary-note paragraph explaining the filename-slug retention;
  appended BACKLOG P2 research-grade row for the cross-repo filename
  rename sweep (three doc files + link-update across PRs / round-
  history / memory-index). Otto-229 append-only discipline observed.
- Thread 3 (PRRT_kwDOSF9kNM59Spxl): converted persona-specific names
  ("Aminata", "Otto-99/100/101", "Aaron", "Max", "Codex", "Soraya")
  to role references (Aminata-persona / main-agent persona /
  maintainer / external-peer-agent / formal-methods-persona).
- Thread 4 (PRRT_kwDOSF9kNM59Spx1): corrected §"v0 scope" header
  from "four gates + four reachable output types" to "five gates
  (four active + one advisory) + five reachable output types (one
  dead-code)", matching the tables that list 5 gates and 6 output
  types (5 reachable + 1 not-yet-reachable).
- Thread 5 (PRRT_kwDOSF9kNM59Spx-): "DRIFT-TAXONOMY pattern 5"
  softened to point at actual precursor file
  docs/research/drift-taxonomy-bootstrap-precursor-2026-04-22.md,
  noting there is no canonical docs/DRIFT-TAXONOMY.md at time of
  writing.

No new PR; filename rename is backlogged. No merge.

* fix(#286): 5 new review threads — inline-code single-line (BACKLOG) + sweep-refs skill path (BACKLOG) + Output-types wording (§5) + fail-to-RED typo + PR-#282 annotation on dead-path

Resolves the 5 new copilot findings on commit 40a5de9 plus
rebases onto origin/main (append-only per Otto-229; both
rows retained in BACKLOG.md conflict resolution).

- BACKLOG row rewritten so `docs/research/provenance-aware-
  bullshit-detector-*` inline-code fits on a single line
  (markdown inline-code cannot span newlines).
- BACKLOG row points at the `sweep-refs` skill at
  `.claude/skills/sweep-refs/SKILL.md` (verified to exist)
  instead of the non-existent `tools/sweep-refs/*` path.
- Delta-doc "§5 output types" wording changed to "the Output
  types section" to avoid the §5-as-count misread.
- Delta-doc `fail-to-RED` joined on a single line (was
  split across a line-break, read as a malformed token).
- Delta-doc references to the base-design file
  (`provenance-aware-bullshit-detector-2026-04-23.md`) now
  annotated "(in PR #282 — not yet on main at time of this
  delta's writing)" so readers know the path is a forward
  reference, not a dead one.

Otto-229 append-only; Otto-230 reply-and-resolve-after-push;
Otto-236 every-thread-resolved.

* fix: markdownlint auto-fixes on research doc

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>