research: bullshit-detector v1 CRITICAL-only delta (3 CRITICAL integrated; 7 non-CRITICAL deferred to v2) by AceHack · Pull Request #286 · Lucent-Financial-Group/Zeta

AceHack · 2026-04-24T04:08:58Z

Summary

Delta-style revision integrating only the 3 CRITICAL findings from Aminata's Otto-100 pass (PR #284) into Otto-99's bullshit-detector design (PR #282).

7 non-CRITICAL findings (4 IMPORTANT + 3 WATCH) deferred to v2 delta.

Closes CRITICAL-integration step of Aminata-then-Otto-response loop.

3 CRITICAL integrations

| # | Finding | Response |
|---|---------|----------|
| C1 | Cross-detector collusion | New §Reviewer-cone overlap section acknowledging fundamental limitation; human sign-off as cone-breaking authority |
| C2 | Min-merging Goodhart-bait | Sensitivity-analysis-gate pattern: `band = YELLOW` when G_carrier_overlap was gate closest to threshold |
| C3 | G_evidence fig-leaf + dead `likely confabulated` in v0 | New §v0 scope subsection: 4-gate effective; `likely confabulated` not-yet-reachable |

Fundamental limitation acknowledged

C1 (reviewer-cone overlap) is NOT closable by detector design alone. The detector can't break its own reviewer-set lineage coupling. Mitigation via routing: human sign-off + external-reviewer signatures are the cone-breaking authorities; intra-factory reviewer concordance downgrades confidence rather than inflating.
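
The routing rule above can be sketched in a few lines. This is a hedged illustration, not the design's implementation: the `Receipt` shape, `cone_broken`, and `may_authorize` are hypothetical names, and only the `reviewer_cone_signature` field and the `supported` band come from this PR. The sketch shows one property only: intra-factory concordance never counts as a cone-breaking signal. The confidence downgrade itself is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Hypothetical receipt shape; only reviewer_cone_signature is named in the PR."""
    band: str                       # e.g. "supported"
    authorization_impacting: bool
    reviewer_cone_signature: str    # format assumed for illustration
    human_signoff: bool = False
    external_signatures: int = 0    # signatures from outside the factory cone
    internal_concordant: int = 0    # agreeing intra-factory reviewers


def cone_broken(r: Receipt) -> bool:
    """Only human sign-off or an external signature breaks the cone.

    Intra-factory concordance is deliberately ignored: it shares lineage
    with the detector, so it is not treated as independent evidence.
    """
    return r.human_signoff or r.external_signatures > 0


def may_authorize(r: Receipt) -> bool:
    """Detector output stays advisory for authorization-impacting queries."""
    if r.band != "supported":
        return False
    if not r.authorization_impacting:
        return True
    return cone_broken(r)
```

Under these assumptions, a `supported` receipt with five concordant internal reviewers and no sign-off is still not authorized, while the same receipt with a human sign-off or one external signature is.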

Deferred to v2

4 IMPORTANT: G_coverage_plausibility gate · Otto-wake second-reviewer schema tightening · DetectorOutputBatchRetracted · no-signal vs kNN-evasion distinction.

3 WATCH: distribution histogram · adversarial worked example · TLA+ invariants.

DISMISS unchanged: parameter-ADR gate (reused from oracle-scoring v0).

Scope limits

- Does NOT rewrite Otto-99's design — specifies delta only
- Does NOT address IMPORTANT/WATCH findings
- Does NOT implement
- Does NOT propose human-sign-off UI
- Does NOT commit content-similarity-in-cone (alternative to C2)
- Does NOT change 5-gate target structure (v0 effective is 4-gate; v1-post-substrate is 5-gate)

Authority

Within standing authority per Otto-82/90/93 calibration — research-grade design revision.

Dependencies

- Aminata adversarial pass on v1 delta (5th session-pass if lands)
- Integrate v1 changes into Otto-99 design PR (separate PR)
- v2 delta for non-CRITICALs
- Independent-oracle substrate for full G_evidence
- Human-sign-off UI/protocol

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds a research-grade delta document that integrates the 3 CRITICAL findings from Aminata’s Otto-100 pass into the provenance-aware detector design narrative, while explicitly deferring the 7 non-CRITICAL findings to a later v2 delta.

Changes:

- Introduces a new research doc describing the CRITICAL-only delta and its intended insert/replace points in the Otto-99 design.
- Specifies responses for C1 (reviewer-cone overlap limitation), C2 (carrier-overlap sensitivity downgrade), and C3 (v0 scope + reachable output types).
- Captures deferred IMPORTANT/WATCH items and adoption dependencies as explicit scope boundaries.

…me backlogged) + header-fields + name attrib + table count + DRIFT-TAXONOMY xref

- Thread 1 (PRRT_kwDOSF9kNM59SpxP): compacted four archive-header fields (Scope / Attribution / Operational status / Non-fusion disclaimer) into first six lines so "Archive-header self-applied" claim is now accurate; removed the trailing claim sentence in favour of the structural compliance itself.
- Thread 2 (PRRT_kwDOSF9kNM59Spxg): shifted body vocabulary from "bullshit-detector" to "veridicality-detector" throughout; added a vocabulary-note paragraph explaining the filename-slug retention; appended BACKLOG P2 research-grade row for the cross-repo filename rename sweep (three doc files + link-update across PRs / round-history / memory-index). Otto-229 append-only discipline observed.
- Thread 3 (PRRT_kwDOSF9kNM59Spxl): converted persona-specific names ("Aminata", "Otto-99/100/101", "Aaron", "Max", "Codex", "Soraya") to role references (Aminata-persona / main-agent persona / maintainer / external-peer-agent / formal-methods-persona).
- Thread 4 (PRRT_kwDOSF9kNM59Spx1): corrected §"v0 scope" header from "four gates + four reachable output types" to "five gates (four active + one advisory) + five reachable output types (one dead-code)", matching the tables that list 5 gates and 6 output types (5 reachable + 1 not-yet-reachable).
- Thread 5 (PRRT_kwDOSF9kNM59Spx-): "DRIFT-TAXONOMY pattern 5" softened to point at actual precursor file docs/research/drift-taxonomy-bootstrap-precursor-2026-04-22.md, noting there is no canonical docs/DRIFT-TAXONOMY.md at time of writing.

No new PR; filename rename is backlogged. No merge.

…ta integrating Aminata Otto-100 findings

Post-milestone tick. First integration pass on Aminata's Otto-100 10 findings. PR #286 delta-style revision addresses only the 3 CRITICAL findings; 7 non-CRITICAL deferred to v2.

Key observations:

1. **Delta-style revision** is a new useful pattern — Otto-99 design stays intact; v1 specifies diff-style additions; v2 composes on v1. Reviewer-clean; revertable; extends to v3/v4/vN as more findings arrive.
2. **C1 acknowledged as fundamental limitation not closable gap** — honest move. Detector can't break own reviewer-set lineage coupling; authority-routing to human + external is the mitigation. Architectural cleanness of "acknowledge + route" beats "invent closure mechanism that doesn't exist."
3. **Sensitivity-analysis-gate** is cheap mechanical defense against Goodhart-bait on adversary-manipulable gates. Candidate BP-NN promotion if exercised on second classifier.
4. **v0-scope explicit subsection** closes dead-code-in-safety-critical-classifier hazard. Applicable to any design-on-paper vs operational-in-practice gap in the factory.

Stacked on #285 (Otto-100 history).

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

…ngs integrated (Otto-101)

Delta-style revision integrating only the 3 CRITICAL findings from Aminata's Otto-100 pass (PR #284) into Otto-99's bullshit-detector design (PR #282). 7 non-CRITICAL findings deferred to v2 delta; DISMISS unchanged.

Three CRITICAL integrations:

**C1 — Cross-detector collusion (new §Reviewer-cone overlap section).** Acknowledges as FUNDAMENTAL LIMITATION NOT CLOSABLE GAP: detector cannot break its own reviewer-set lineage coupling. Mitigation via routing authority:

- Human sign-off = only cone-breaking authority for authorization-impacting band=supported queries.
- External-reviewer signatures as legitimate cone-breakers when available.
- Intra-factory reviewer concordance should DOWNGRADE confidence per SD-9, not inflate.
- Every receipt carries reviewer_cone_signature field.

**C2 — Min-merging Goodhart-bait (sensitivity-analysis-gate pattern).** v1 enhancement:

    band(y|q) = if band_raw=GREEN AND G_carrier_overlap closest to threshold → YELLOW else band_raw

Closes cheap cheat path (threshold-hugging on author-controlled gate). Alternative content-similarity-in-cone deferred to v2 cost-benefit.

**C3 — G_evidence fig-leaf (new §v0 scope subsection).** v0 effective = 4-gate classifier (G_evidence advisory-only until independent-oracle substrate ships). `likely confabulated` output type explicitly marked not-yet-reachable in v0. Conservative under-detection stance (RED comes only from known-bad-pattern; confabulations land YELLOW as plausible-but-unresolved). v0→v1 transition plan names DetectorOutputBatchRetracted for historical re-classification.

v1 delta does NOT rewrite Otto-99's design; specifies 3 additive sections to insert + 1 section to supersede. Otto-99 original preserved in git history.

Seven non-CRITICALs still open:

- 4 IMPORTANT (deferred to v2): G_coverage_plausibility gate; Otto-wake second-reviewer schema; DetectorOutputBatchRetracted; no-signal vs kNN-evasion.
- 3 WATCH (deferred to v2+): distribution histogram; adversarial worked example; TLA+ invariants.

1 fundamental limitation (C1) acknowledged not closed — routes authority to human + external reviewer layers.

Scope limits: no design rewrite; no implementation; no human-sign-off UI proposal; no content-similarity-in-cone commit; 5-gate/5-type target structure unchanged.

5 dependencies-to-adoption: Aminata pass on v1 delta (fifth session-pass); integrate v1 changes into Otto-99 design PR (separate PR); v2 delta; independent-oracle substrate; human-sign-off UI/protocol.

Archive-header format self-applied — 18th aurora/research doc in a row. Lands within-standing-authority per Otto-82/90/93 calibration.

Otto-101 tick primary deliverable — closes the CRITICAL-integration step of the Aminata-then-Otto-response loop for bullshit-detector design.

… sweep-refs skill path (BACKLOG) + Output-types wording (§5) + fail-to-RED typo + PR-#282 annotation on dead-path

Resolves the 5 new copilot findings on commit 40a5de9 plus rebases onto origin/main (append-only per Otto-229; both rows retained in BACKLOG.md conflict resolution).

- BACKLOG row rewritten so `docs/research/provenance-aware-bullshit-detector-*` inline-code fits on a single line (markdown inline-code cannot span newlines).
- BACKLOG row points at the `sweep-refs` skill at `.claude/skills/sweep-refs/SKILL.md` (verified to exist) instead of the non-existent `tools/sweep-refs/*` path.
- Delta-doc "§5 output types" wording changed to "the Output types section" to avoid the §5-as-count misread.
- Delta-doc `fail-to-RED` joined on a single line (was split across a line-break, read as a malformed token).
- Delta-doc references to the base-design file (`provenance-aware-bullshit-detector-2026-04-23.md`) now annotated "(in PR #282 — not yet on main at time of this delta's writing)" so readers know the path is a forward reference, not a dead one.

Otto-229 append-only; Otto-230 reply-and-resolve-after-push; Otto-236 every-thread-resolved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Copilot · 2026-04-24T19:36:39Z

+> **re-introduces one meta-layer up**: the detector itself
+>
+> + the adversarial-reviewer persona + the external-peer-
+> agent reviewer + any other factory-internal reviewer
+> share training-corpus / repo-access / PR-comment lineage.


P1: The blockquote line starting with `+` (`> + the adversarial-reviewer persona ...`) will be parsed as an unordered-list item. Repo guidance explicitly avoids line-start `+` in prose/lists because it’s easy to accidentally create nested lists and confuse readers/linters; please rewrite this as normal prose (e.g., “the detector itself, the adversarial-reviewer persona, … share …”) or use `-` bullets consistently.

Suggested change

-> **re-introduces one meta-layer up**: the detector itself
->
-> + the adversarial-reviewer persona + the external-peer-
-> agent reviewer + any other factory-internal reviewer
-> share training-corpus / repo-access / PR-comment lineage.
+> **re-introduces one meta-layer up**: the detector itself,
+> the adversarial-reviewer persona, the external-peer-agent
+> reviewer, and any other factory-internal reviewer share
+> training-corpus / repo-access / PR-comment lineage.
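
A check for this class of problem can be sketched as a small script. This is a minimal illustration under stated assumptions, not an existing repo tool or linter rule: `find_plus_line_starts` and the regex are hypothetical, and the check only flags lines whose content (after any `>` blockquote markers) begins with `+` followed by whitespace.

```python
import re

# '+' followed by whitespace at the start of a line's content,
# optionally behind one or more '>' blockquote markers.
PLUS_AT_LINE_START = re.compile(r"^\s*(?:>\s*)*\+\s")


def find_plus_line_starts(text: str) -> list[int]:
    """Return 1-based numbers of lines that would start a '+' list item."""
    return [
        n
        for n, line in enumerate(text.splitlines(), start=1)
        if PLUS_AT_LINE_START.match(line)
    ]


before = "\n".join([
    "> **re-introduces one meta-layer up**: the detector itself",
    ">",
    "> + the adversarial-reviewer persona + the external-peer-",
    "> agent reviewer + any other factory-internal reviewer",
])
after = "\n".join([
    "> **re-introduces one meta-layer up**: the detector itself,",
    "> the adversarial-reviewer persona, the external-peer-agent",
])

print(find_plus_line_starts(before))  # flags the third line
print(find_plus_line_starts(after))   # nothing flagged
```

A `+` in the middle of a line is left alone, so the check fires only on the wrapped line that the review comment points at.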

Copilot · 2026-04-24T19:36:40Z

+> - **Maintainer sign-off is the only cone-breaking
+>   authority.** When a query's band is `supported` and
+>   the stakes are authorization-impacting, the detector's
+>   output is advisory; maintainer sign-off is required as
+>   the independent-cone signal.
+> - **External-reviewer signatures are legitimate cone-
+>   breaking** when they exist (external peer review;
+>   formal-methods proofs landed at Lean/Mathlib scale;


P1: This section says “Maintainer sign-off is the only cone-breaking authority”, but the next bullet says external-reviewer signatures are also cone-breaking. Please clarify the scope (e.g., “only internal cone-breaking authority” / “default required authority for authorization-impacting decisions”) so the policy doesn’t contradict itself.

Copilot · 2026-04-24T19:36:40Z

+  "G_carrier_overlap was the gate closest to threshold"
+  ↔ overlap_ratio(cone(q), cone(y)) ∈ [θ_med - ε, θ_med]
+    OR ∈ [θ_high - ε, θ_high]
+
+band(y | q) = if band_raw = GREEN AND sensitivity → YELLOW
+              else band_raw


P1: The sensitivity pseudo-code mixes a predicate definition with a rewrite arrow (`sensitivity → YELLOW`) and then uses `sensitivity` without applying it (it’s defined as `sensitivity(y|q) = ...`). This is ambiguous for implementers; please rewrite as a boolean predicate (no quoted string) and use a standard conditional like `if band_raw = GREEN && sensitivity(y|q) then YELLOW else band_raw`.

Suggested change

-  "G_carrier_overlap was the gate closest to threshold"
-  ↔ overlap_ratio(cone(q), cone(y)) ∈ [θ_med - ε, θ_med]
-    OR ∈ [θ_high - ε, θ_high]
-
-band(y | q) = if band_raw = GREEN AND sensitivity → YELLOW
-              else band_raw
+  overlap_ratio(cone(q), cone(y)) ∈ [θ_med - ε, θ_med]
+    OR overlap_ratio(cone(q), cone(y)) ∈ [θ_high - ε, θ_high]
+
+band(y | q) =
+  if band_raw = GREEN && sensitivity(y | q)
+  then YELLOW
+  else band_raw
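
The predicate-plus-conditional form requested in this comment could look like the following Python sketch. It is illustrative only: the `Band` enum, the function signatures, and the threshold values in the usage lines are assumptions, and `overlap_ratio` is passed in as a precomputed number because the cone computation is not specified in this PR.

```python
from enum import Enum


class Band(Enum):
    GREEN = "GREEN"
    YELLOW = "YELLOW"
    RED = "RED"


def sensitivity(overlap_ratio: float, theta_med: float,
                theta_high: float, eps: float) -> bool:
    """True when the overlap ratio sits within eps below either threshold."""
    near_med = theta_med - eps <= overlap_ratio <= theta_med
    near_high = theta_high - eps <= overlap_ratio <= theta_high
    return near_med or near_high


def band(band_raw: Band, overlap_ratio: float, theta_med: float,
         theta_high: float, eps: float) -> Band:
    """Downgrade GREEN to YELLOW when the carrier-overlap gate hugs a threshold."""
    if band_raw is Band.GREEN and sensitivity(overlap_ratio, theta_med,
                                              theta_high, eps):
        return Band.YELLOW
    return band_raw


# Hypothetical thresholds, chosen only to exercise the three cases.
THETA_MED, THETA_HIGH, EPS = 0.3, 0.6, 0.05
print(band(Band.GREEN, 0.28, THETA_MED, THETA_HIGH, EPS))  # Band.YELLOW
print(band(Band.GREEN, 0.10, THETA_MED, THETA_HIGH, EPS))  # Band.GREEN
print(band(Band.RED, 0.28, THETA_MED, THETA_HIGH, EPS))    # Band.RED
```

Only a raw GREEN is ever changed; YELLOW and RED pass through untouched, which matches the `else band_raw` branch of the suggested pseudo-code.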

Copilot AI review requested due to automatic review settings April 24, 2026 04:08

AceHack enabled auto-merge (squash) April 24, 2026 04:09

Copilot started reviewing on behalf of AceHack April 24, 2026 04:09

AceHack mentioned this pull request Apr 24, 2026

history: Otto-101 tick-close — bullshit-detector v1 CRITICAL-only delta #287

Closed

Copilot AI reviewed Apr 24, 2026

AceHack force-pushed the research/bullshit-detector-v1-critical-only-delta branch from ead499e to 39e7681 Compare April 24, 2026 13:35

AceHack force-pushed the research/bullshit-detector-v1-critical-only-delta branch from 39e7681 to 40a5de9 Compare April 24, 2026 14:42

Copilot AI review requested due to automatic review settings April 24, 2026 14:42

Copilot started reviewing on behalf of AceHack April 24, 2026 14:44

Copilot AI reviewed Apr 24, 2026

AceHack added 3 commits April 24, 2026 11:33

AceHack force-pushed the research/bullshit-detector-v1-critical-only-delta branch from 40a5de9 to fc50828 Compare April 24, 2026 15:35

fix: markdownlint auto-fixes on research doc

385f12d

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 24, 2026 19:33

Copilot started reviewing on behalf of AceHack April 24, 2026 19:33

AceHack merged commit 19bc7a1 into main Apr 24, 2026
12 checks passed

AceHack deleted the research/bullshit-detector-v1-critical-only-delta branch April 24, 2026 19:35

Copilot AI reviewed Apr 24, 2026

AceHack merged 4 commits into main from research/bullshit-detector-v1-critical-only-delta

2 participants
