From 50c6882b50e7dfd1e0cf78ab6cf9c1d024741256 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Fri, 24 Apr 2026 00:08:23 -0400
Subject: [PATCH 1/4] =?UTF-8?q?research:=20bullshit-detector=20v1=20CRITIC?=
 =?UTF-8?q?AL-only=20delta=20=E2=80=94=203=20CRITICAL=20findings=20integra?=
 =?UTF-8?q?ted=20(Otto-101)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Delta-style revision integrating only the 3 CRITICAL findings from
Aminata's Otto-100 pass (PR #284) into Otto-99's bullshit-detector
design (PR #282). 7 non-CRITICAL findings deferred to v2 delta;
DISMISS unchanged.

Three CRITICAL integrations:

**C1 — Cross-detector collusion (new §Reviewer-cone overlap section).**
Acknowledges as FUNDAMENTAL LIMITATION NOT CLOSABLE GAP: detector
cannot break its own reviewer-set lineage coupling. Mitigation via
routing authority:
- Human sign-off = only cone-breaking authority for
  authorization-impacting band=supported queries.
- External-reviewer signatures as legitimate cone-breakers when
  available.
- Intra-factory reviewer concordance should DOWNGRADE confidence
  per SD-9, not inflate.
- Every receipt carries reviewer_cone_signature field.

**C2 — Min-merging Goodhart-bait (sensitivity-analysis-gate pattern).**
v1 enhancement:
  band(y|q) = if band_raw=GREEN AND G_carrier_overlap closest to
              threshold → YELLOW else band_raw
Closes cheap cheat path (threshold-hugging on author-controlled
gate). Alternative content-similarity-in-cone deferred to v2
cost-benefit.

**C3 — G_evidence fig-leaf (new §v0 scope subsection).**
v0 effective = 4-gate classifier (G_evidence advisory-only until
independent-oracle substrate ships). `likely confabulated` output
type explicitly marked not-yet-reachable in v0. Conservative
under-detection stance (RED comes only from known-bad-pattern;
confabulations land YELLOW as plausible-but-unresolved). v0→v1
transition plan names DetectorOutputBatchRetracted for historical
re-classification.
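The following is a minimal Python sketch of the C2 + C3 band merge, included as a review aid. None of the following come from the design, which deliberately adopts no parameter values; each is an assumption:

- the THETA_MED / THETA_HIGH / EPSILON values;
- the mapping from overlap_ratio to a G_carrier_overlap band (more carrier overlap is treated as worse);
- the Band ordering;
- every function name.

With evidence_active=False (v0), G_evidence is left out of the min. This is why a RED from G_evidence, and therefore `likely confabulated`, cannot occur in v0.

```python
from enum import IntEnum


class Band(IntEnum):
    # Ordered so that min() over gates picks the most pessimistic band.
    RED = 0
    YELLOW = 1
    GREEN = 2


# Hypothetical parameter values; the design adopts none.
THETA_MED = 0.30
THETA_HIGH = 0.60
EPSILON = 0.05


def carrier_overlap_band(overlap_ratio: float) -> Band:
    """Assumed mapping: more carrier overlap means more lineage coupling."""
    if overlap_ratio < THETA_MED:
        return Band.GREEN
    if overlap_ratio < THETA_HIGH:
        return Band.YELLOW
    return Band.RED


def sensitive(overlap_ratio: float) -> bool:
    """overlap_ratio in [θ_med - ε, θ_med] OR in [θ_high - ε, θ_high]."""
    return (THETA_MED - EPSILON <= overlap_ratio <= THETA_MED
            or THETA_HIGH - EPSILON <= overlap_ratio <= THETA_HIGH)


def band(gates: dict[str, Band], overlap_ratio: float,
         evidence_active: bool = False) -> Band:
    """Min over active gates, then the C2 sensitivity downgrade.

    `gates` holds G_similarity, G_evidence, G_contradiction and G_status;
    G_carrier_overlap is derived from overlap_ratio. With
    evidence_active=False (v0), G_evidence is advisory-only and is
    excluded from the min.
    """
    active = {name: b for name, b in gates.items()
              if name != "G_carrier_overlap"
              and (evidence_active or name != "G_evidence")}
    active["G_carrier_overlap"] = carrier_overlap_band(overlap_ratio)
    band_raw = min(active.values())
    if band_raw == Band.GREEN and sensitive(overlap_ratio):
        return Band.YELLOW  # threshold-hugging GREEN is not GREEN
    return band_raw


all_green = {g: Band.GREEN for g in
             ("G_similarity", "G_evidence", "G_contradiction", "G_status")}
print(band(all_green, 0.10).name)  # GREEN
print(band(all_green, 0.28).name)  # YELLOW (inside [θ_med - ε, θ_med])
```

The downgrade only ever moves GREEN to YELLOW. It adds no new gate, which matches the "closes the cheap cheat path without adding new gates" framing.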

v1 delta does NOT rewrite Otto-99's design; specifies 2 sections to
insert + 1 section to supersede. Otto-99 original preserved in git
history.

Seven non-CRITICALs still open:
- 4 IMPORTANT (deferred to v2): G_coverage_plausibility gate;
  Otto-wake second-reviewer schema; DetectorOutputBatchRetracted;
  no-signal vs kNN-evasion.
- 3 WATCH (deferred to v2+): distribution histogram; adversarial
  worked example; TLA+ invariants.

1 fundamental limitation (C1) acknowledged not closed — routes
authority to human + external reviewer layers.

Scope limits: no design rewrite; no implementation; no
human-sign-off UI proposal; no content-similarity-in-cone commit;
5-gate/5-type target structure unchanged.

5 dependencies-to-adoption: Aminata pass on v1 delta (fifth
session-pass); integrate v1 changes into Otto-99 design PR
(separate PR); v2 delta; independent-oracle substrate;
human-sign-off UI/protocol.

Archive-header format self-applied — 18th aurora/research doc in
a row. Lands within-standing-authority per Otto-82/90/93
calibration.

Otto-101 tick primary deliverable — closes the CRITICAL-integration
step of the Aminata-then-Otto-response loop for bullshit-detector
design.
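The following is a minimal Python sketch of the C1 routing rule and the reviewer_cone_signature receipt field, included as a review aid. The signer labels ("detector", "human", "external_reviewer", and so on) and the function names are hypothetical; the design names roles, not identifiers. The sketch encodes two points from C1:

- Any number of factory-internal signers still counts as a single lineage-coupled cone.
- An authorization-impacting `supported` verdict stays advisory until a human signs.

```python
from enum import IntEnum


class ConeStrength(IntEnum):
    CONE_LOCAL = 1  # detector and/or other factory-internal reviewers only
    TWO_CONE = 2    # plus human sign-off OR an external-reviewer signature
    STRONGEST = 3   # plus human sign-off AND an external-reviewer signature


# Hypothetical signer labels; the design names roles, not identifiers.
FACTORY_INTERNAL = {"detector", "main_agent", "adversarial_persona", "peer_agent"}
CONE_BREAKERS = {"human", "external_reviewer"}


def reviewer_cone_signature(signers: set[str]) -> ConeStrength:
    unknown = signers - FACTORY_INTERNAL - CONE_BREAKERS
    if unknown:
        raise ValueError(f"unknown signer labels: {sorted(unknown)}")
    breakers = signers & CONE_BREAKERS
    # Intra-factory concordance never raises strength on its own:
    # every factory-internal signer shares the same lineage cone.
    if breakers == CONE_BREAKERS:
        return ConeStrength.STRONGEST
    if breakers:
        return ConeStrength.TWO_CONE
    return ConeStrength.CONE_LOCAL


def needs_human_signoff(band_supported: bool, authorization_impacting: bool,
                        signers: set[str]) -> bool:
    # Human sign-off is the only cone-breaking authority for
    # authorization-impacting `supported` verdicts; an external
    # signature alone does not clear this gate.
    return band_supported and authorization_impacting and "human" not in signers


print(reviewer_cone_signature({"detector", "adversarial_persona", "peer_agent"}).name)
print(reviewer_cone_signature({"detector", "human"}).name)
```

The first print yields CONE_LOCAL even though three reviewers agreed. That is the SD-9 point restated at the reviewer-meta-layer.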
---
 ...ector-v1-critical-only-delta-2026-04-24.md | 391 ++++++++++++++++++
 1 file changed, 391 insertions(+)
 create mode 100644 docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md

diff --git a/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md b/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md
new file mode 100644
index 00000000..5d290f47
--- /dev/null
+++ b/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md
@@ -0,0 +1,391 @@
+# Provenance-aware bullshit-detector v1 — CRITICAL-only delta from Otto-100 Aminata pass
+
+**Scope:** delta-style revision integrating only the 3 CRITICAL
+findings from Aminata's Otto-100 pass (PR #284) into the
+bullshit-detector design (Otto-99 PR #282). 7 non-CRITICAL
+findings (4 IMPORTANT + 3 WATCH) are deferred to a v2 delta;
+DISMISS finding unchanged. This doc does NOT rewrite Otto-99's
+design; it specifies the CRITICAL-only corrections as an
+additive delta that v1 composes on top of v0.
+
+**Attribution:** CRITICAL findings authored by Aminata
+(Otto-100 PR #284). v0 base design authored by Otto-99 PR
+#282. v1 delta authored by Otto-101. Progression matches the
+established Aminata-then-Otto-response loop (4th iteration
+this session; 5th-ferry governance → oracle-scoring v0 →
+multi-Claude experiment → bullshit-detector).
+
+**Operational status:** research-grade. v1 delta inherits
+Otto-99 research-grade status; Otto-100's adversarial
+critique remains advisory; v1 doesn't implement, doesn't
+adopt specific parameter values, doesn't resolve all 10
+findings. A future v2 delta addresses the 7 non-CRITICALs;
+a future v3 or implementation-ADR addresses the CRITICAL-
+but-unaddressed-in-v1 items (e.g., the fundamental
+reviewer-cone-overlap limitation that no design-level change
+fully closes).
+
+**Non-fusion disclaimer:** Aminata's Otto-100 pass
+explicitly named that her concordance with prior Aminata
+passes is same-agent signal not independent evidence. Otto-
+101's integration of her findings does NOT transform her
+same-agent review into independent validation; it preserves
+her findings' authority while responding design-side. Per
+SD-9, the v1 delta's own integration-quality must be
+re-reviewed against fresh independent substrate (Codex; Max;
+external) before it graduates beyond research-grade.
+
+---
+
+## What this delta addresses — 3 CRITICAL only
+
+| # | Aminata finding | Otto-101 response |
+|---|---|---|
+| C1 | Cross-detector collusion — reviewer-set lineage-coupling | New §"Reviewer-cone overlap" section naming the limitation + human sign-off as cone-breaking authority |
+| C2 | Min-merging Goodhart-bait at G_carrier_overlap | Sensitivity-analysis-gate pattern: sensitivity-to-G_carrier_overlap downgrades `supported` → `YELLOW` when carrier-overlap was the gate closest to threshold |
+| C3 | G_evidence fig-leaf + dead-code `likely confabulated` in v0 | Explicit §"v0 scope" subsection naming reachable vs not-yet-reachable output types |
+
+What is NOT in scope this delta:
+
+- 4 IMPORTANT findings (no-signal/kNN-evasion gate; Otto-wake
+  second-reviewer sufficiency; retraction flood-control;
+  G_coverage_plausibility gate) — deferred to v2.
+- 3 WATCH findings (distribution-histogram; adversarial
+  worked-example; TLA+ invariants) — deferred to v2+.
+- DISMISS (parameter-ADR gate) — unchanged.
+- Implementation of any change. v1 delta is still
+  research-grade design; implementation gated on Aminata
+  passes on v2 + Codex adversarial + Aaron eventual review
+  (per Otto-72 Frontier-UI pattern).
+
+---
+
+## C1 response — new §"Reviewer-cone overlap" section
+
+**Proposed addition to the Otto-99 design (appended after
+§5 output types, before §Addressing Aminata's 3 CRITICAL
+concerns at write-time):**
+
+> ## Reviewer-cone overlap — a fundamental limitation, not a closable gap
+>
+> The detector operationalises SD-9's "agreement-is-signal-
+> not-proof" discipline by measuring carrier overlap between
+> query and retrieved candidates. This discipline
+> **re-introduces one meta-layer up**: the detector itself
+> + Aminata + Codex + any other factory-internal reviewer
+> share training-corpus / repo-access / PR-comment lineage.
+> A `supported` verdict from three factory-internal
+> reviewers whose cones overlap is NOT three independent
+> lines of evidence; it is lineage-coupled concordance
+> masquerading as independent arrival.
+>
+> **Per Aminata Otto-100 CRITICAL #1 (PR #284), this is a
+> fundamental limitation, not a closable gap.** The
+> detector cannot break its own reviewer-set lineage
+> coupling. Mitigation:
+>
+> - **Human sign-off is the only cone-breaking authority.**
+>   When a query's band is `supported` and the stakes are
+>   authorization-impacting, the detector's output is
+>   advisory; maintainer (Aaron) sign-off is required as
+>   the independent-cone signal.
+> - **External-reviewer signatures are legitimate cone-
+>   breaking** when they exist (external peer review;
+>   formal-methods proofs landed at Lean/Mathlib scale;
+>   supply-chain-audit output with its own governance
+>   chain). The detector's output combined with external-
+>   reviewer signature is a genuine two-cone signal.
+> - **Intra-factory reviewer concordance** (Otto + Aminata
+>   + Codex all GREEN) is signal but NOT proof of
+>   independence. Per SD-9 operational discipline, this
+>   concordance SHOULD downgrade the confidence claimed in
+>   the output, not inflate it.
+> - **Naming the limitation in every receipt.** Every
+>   detector receipt carries a `reviewer_cone_signature`
+>   field listing which cone signed off. A detector-only
+>   signature is cone-local; detector + human is two-cone;
+>   detector + external is two-cone; detector + human +
+>   external is strongest.
+>
+> No design change closes this limitation; only
+> acknowledging it + routing the authority-for-cone-
+> breaking to human or external layers preserves SD-9
+> discipline at the reviewer-meta-layer.
+
+---
+
+## C2 response — sensitivity-analysis-gate pattern
+
+**Proposed replacement for the existing §"Scoring — band-
+valued, not decimal" passage of the Otto-99 design:**
+
+Current band-merge rule (min over gates):
+
+```text
+band(y | q) = min(G_similarity, G_evidence,
+                  G_carrier_overlap, G_contradiction,
+                  G_status)
+```
+
+**v1 enhancement — carrier-overlap sensitivity downgrade:**
+
+```text
+band_raw(y | q) = min(G_similarity, G_evidence,
+                      G_carrier_overlap, G_contradiction,
+                      G_status)
+
+sensitivity(y | q) =
+  "G_carrier_overlap was the gate closest to threshold"
+  ↔ overlap_ratio(cone(q), cone(y)) ∈ [θ_med - ε, θ_med]
+                                  OR ∈ [θ_high - ε, θ_high]
+
+band(y | q) = if band_raw = GREEN AND sensitivity → YELLOW
+              else band_raw
+```
+
+Rationale: G_carrier_overlap is the most adversary-
+manipulable gate (per Aminata C2: author controls citation
+structure, so author controls the gate). Min-merging
+treats GREEN on G_carrier_overlap as one vote among five
+equal votes. The sensitivity-analysis-gate pattern
+recognises that **a GREEN verdict that depended on the
+gate closest to threshold on the most-manipulable gate is
+not a GREEN verdict; it's a YELLOW with plausible
+deniability**.
+
+An adversary can still pass all 5 gates comfortably (large
+margin on G_carrier_overlap plus adversarial citation
+sparsity), but doing so requires more effort than
+threshold-hugging. The sensitivity-downgrade closes the
+cheap cheat path without adding new gates.
+
+**Alternative (more expensive, more secure):** replace
+explicit-citation-edge-in-cone measurement with content-
+similarity-in-cone measurement. Every cone member's
+content is embedded; overlap is measured by content
+similarity, not citation structure. An adversary can no
+longer reduce overlap by citing less; they have to
+actually write content whose embedding is far from their
+cone's existing content. Expensive (embedding compute per
+cone member at every query); postponed to a v2 cost-
+benefit analysis.
+
+v1 ships sensitivity-analysis-gate. v2 decides whether to
+additionally swap citation-edge → content-similarity.
+
+---
+
+## C3 response — §"v0 scope" explicit subsection
+
+**Proposed new subsection in the Otto-99 design,
+immediately after §5 output types:**
+
+> ## v0 scope — four gates + four reachable output types
+>
+> The detector is designed as five gates + five output
+> types, but v0 ships with a REDUCED CONFIGURATION
+> because the independent-oracle substrate for G_evidence
+> does not yet exist (dependency #3 in adoption path;
+> candidates include: test-output scrapers; PR-link
+> validators; citation-resolver for academic sources —
+> none shipped at design-time).
+>
+> **v0 effective configuration:**
+>
+> | Gate | Status in v0 |
+> |---|---|
+> | G_similarity | Active |
+> | G_evidence_independent | **Advisory-only** — signal emitted to observability but does NOT block band elevation to GREEN |
+> | G_carrier_overlap | Active (sensitivity-analysis-gate per C2 response) |
+> | G_contradiction | Active |
+> | G_status | Active |
+>
+> **v0 output types — reachable:**
+>
+> - `supported`
+> - `looks similar but lineage-coupled`
+> - `plausible but unresolved`
+> - `known-bad pattern`
+> - `no-signal` (default for empty retrieval)
+>
+> **v0 output types — not-yet-reachable:**
+>
+> - `likely confabulated` — requires G_evidence fail-to-
+>   RED which is impossible while G_evidence is advisory-
+>   only. The output type will become reachable when
+>   independent-oracle substrate ships (v1 scope shifts to
+>   5-gate; corresponding implementation PR documents the
+>   transition).
+>
+> This is explicit NOT buried. v0 users of the detector
+> must know that a RED band today will NEVER come from
+> `likely confabulated`; it will come from `known-bad
+> pattern` only. If a query looks like confabulation but
+> matches no known-bad pattern, v0 returns `plausible but
+> unresolved` (YELLOW), not RED. That's a CONSERVATIVE
+> under-detection stance, not an over-detection one —
+> acceptable trade-off for the v0 substrate gap.
+>
+> **v1 transition plan (post-v0):** when the independent-
+> oracle substrate ships, v1 flips G_evidence from
+> advisory-only to active. All historical v0 queries whose
+> `G_evidence advisory signal` was present but didn't
+> affect classification get a `DetectorOutputRetracted` +
+> `DetectorOutputBatchRetracted(adr_id,
+> affected_range, count)` per Aminata Otto-100 IMPORTANT
+> finding on flood-control (deferred to v2 but named
+> here as the v1→v2 transition mechanism).
+
+---
+
+## What changes in Otto-99's design after v1 delta lands
+
+The v1 delta doesn't rewrite Otto-99's doc; it specifies
+these three additive changes:
+
+1. **Insert §"Reviewer-cone overlap"** after §5 output
+   types, before §"Addressing Aminata's 3 CRITICAL
+   concerns at write-time".
+2. **Replace §"Scoring — band-valued, not decimal"** with
+   the v1 sensitivity-analysis-gate formulation. (Otto-99
+   original stays in git history; v1 supersedes.)
+3. **Insert §"v0 scope"** immediately after §5 output
+   types, making the advisory-only G_evidence + dead-
+   code `likely confabulated` explicit.
+
+When the v1 delta lands as a PR modifying
+`docs/research/provenance-aware-bullshit-detector-2026-04-23.md`,
+the three changes land together. This doc (the v1-
+CRITICAL-only delta) is the design-rationale companion
+naming which findings drive which changes.
+
+---
+
+## What v1 delta does NOT resolve
+
+Three CRITICAL findings integrated; **seven non-CRITICAL
+findings still open**:
+
+### 4 IMPORTANT (deferred to v2 delta):
+
+- I1 `no-signal` vs kNN-evasion — needs G_coverage_
+  plausibility gate via nearest-cluster-centroid distance.
+- I2 Otto-wake second-reviewer sufficiency — needs schema
+  change to require different-persona OR different-model
+  OR human.
+- I3 DetectorOutputRetracted flood-control — needs
+  DetectorOutputBatchRetracted event shape.
+- I4 G_coverage_plausibility — new gate.
+
+### 3 WATCH (deferred to v2+):
+
+- W1 Distribution histogram in receipts — additive
+  metadata.
+- W2 Adversarial worked example — requires future corpus.
+- W3 TLA+ invariants on lower-layer boundaries —
+  Soraya-routable.
+
+### 1 DISMISS (unchanged):
+
+- D1 Parameter-ADR gate — already satisfied via Otto-91
+  oracle-scoring v0 pattern reuse.
+
+### 1 fundamental limitation (CRITICAL-but-no-design-level-close):
+
+- C1 Reviewer-cone overlap — acknowledged in v1, NOT
+  closed. Requires human + external-reviewer authority
+  chain to break. Will never be fully closed by detector
+  design alone.
+
+**v2 delta proposed scope:** integrate I1-I4 + W1-W3.
+v2 gated on: (a) this v1 delta landing; (b) v1 integrated
+into Otto-99 design PR; (c) a separate Aminata pass on
+v1 surfacing any new concerns introduced by v1's own
+changes (the Aminata-then-Otto-response loop continues).
+
+---
+
+## Composition with existing substrate
+
+Unchanged from Otto-99 composition-table + Otto-98 spine
+composition + Otto-91 oracle-scoring v0 composition. The
+v1 delta adds no new substrate dependencies; it refines
+gate + output-type semantics using mechanisms already in
+the design (sensitivity analysis is cheap compute on the
+existing gate outputs; v0-scope explicit is
+documentation; reviewer-cone-overlap is routing the
+authority-for-cone-breaking to existing layers).
+
+---
+
+## Scope limits
+
+- **Does NOT rewrite Otto-99's design.** Specifies delta;
+  preserves Otto-99 original in git history.
+- **Does NOT address IMPORTANT / WATCH findings.**
+  Deferred to v2 delta.
+- **Does NOT implement.** Research-grade design revision
+  only.
+- **Does NOT propose human-sign-off UI** for the
+  reviewer-cone-overlap mitigation. Surface-level
+  mitigation only; the UI work is further downstream.
+- **Does NOT commit to content-similarity-in-cone for C2
+  alternative.** Ships the cheaper sensitivity-analysis-
+  gate; v2 decides whether to also swap the measurement
+  basis.
+- **Does NOT change the 5-gate / 5-output-type structure
+  target.** v0 is 4-gate-effective; v1-post-substrate is
+  5-gate. Structure stable; which gates are active is
+  substrate-dependent.
+
+---
+
+## Dependencies to adoption (this delta specifically)
+
+In priority order:
+
+1. **Aminata adversarial pass on v1 delta** — surfaces
+   new concerns from v1's own changes before v2 planning
+   starts. Fifth Aminata session-pass if it lands.
+2. **Integrate v1 changes into Otto-99 design PR** —
+   modifies
+   `docs/research/provenance-aware-bullshit-detector-2026-04-23.md`
+   with the three additive sections. Separate PR from
+   this one.
+3. **v2 delta** addressing the 4 IMPORTANT + 3 WATCH
+   findings (deferred; composes on v1).
+4. **Independent-oracle substrate** for full G_evidence
+   activation + 5-gate transition.
+5. **Human sign-off UI / protocol** for cone-breaking
+   authority at authorization-impacting band=supported
+   queries.
+
+---
+
+## Sibling context
+
+- **Otto-99 bullshit-detector design** (PR #282) — base
+  design this delta refines.
+- **Otto-100 Aminata 4th pass** (PR #284) — source of the
+  3 CRITICAL findings driving this delta.
+- **Otto-98 spine** (PR #280) — substrate unchanged;
+  delta doesn't alter spine contracts.
+- **Otto-91 oracle-scoring v0** (PR #266) — band-
+  classifier + sensitivity-pattern precedent; v1 delta's
+  sensitivity-analysis-gate pattern is a natural
+  extension.
+- **SD-9** — reviewer-cone-overlap finding is SD-9 at
+  the reviewer-meta-layer; v1 delta's acknowledgement
+  makes this explicit in the detector's own documentation.
+- **DRIFT-TAXONOMY pattern 5** — detector mechanises;
+  reviewer-cone-overlap is pattern 5 applied to reviewers
+  themselves.
+
+Archive-header format self-applied — 18th aurora/research
+doc in a row.
+
+Otto-101 tick primary deliverable. Closes the CRITICAL-
+integration step of the Aminata-then-Otto-response loop
+for the bullshit-detector design. Next natural step is
+Aminata pass on v1 delta OR direct v1-into-Otto-99-PR
+integration OR pivot to non-bullshit-detector work.

From 6b2379b0332d869ea8c843054251a2e9e0d54d6e Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Fri, 24 Apr 2026 09:35:44 -0400
Subject: [PATCH 2/4] =?UTF-8?q?fix(#286):=205=20review=20threads=20?=
 =?UTF-8?q?=E2=80=94=20veridicality=20vocab=20(body,=20filename=20rename?=
 =?UTF-8?q?=20backlogged)=20+=20header-fields=20+=20name=20attrib=20+=20ta?=
 =?UTF-8?q?ble=20count=20+=20DRIFT-TAXONOMY=20xref?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Thread 1 (PRRT_kwDOSF9kNM59SpxP): compacted four archive-header
  fields (Scope / Attribution / Operational status / Non-fusion
  disclaimer) into first six lines so "Archive-header self-applied"
  claim is now accurate; removed the trailing claim sentence in
  favour of the structural compliance itself.
- Thread 2 (PRRT_kwDOSF9kNM59Spxg): shifted body vocabulary from
  "bullshit-detector" to "veridicality-detector" throughout; added a
  vocabulary-note paragraph explaining the filename-slug retention;
  appended BACKLOG P2 research-grade row for the cross-repo filename
  rename sweep (three doc files + link-update across PRs /
  round-history / memory-index).
  Otto-229 append-only discipline observed.
- Thread 3 (PRRT_kwDOSF9kNM59Spxl): converted persona-specific names
  ("Aminata", "Otto-99/100/101", "Aaron", "Max", "Codex", "Soraya")
  to role references (Aminata-persona / main-agent persona /
  maintainer / external-peer-agent / formal-methods-persona).
- Thread 4 (PRRT_kwDOSF9kNM59Spx1): corrected §"v0 scope" header
  from "four gates + four reachable output types" to "five gates
  (four active + one advisory) + five reachable output types (one
  dead-code)", matching the tables that list 5 gates and 6 output
  types (5 reachable + 1 not-yet-reachable).
- Thread 5 (PRRT_kwDOSF9kNM59Spx-): "DRIFT-TAXONOMY pattern 5"
  softened to point at actual precursor file
  docs/research/drift-taxonomy-bootstrap-precursor-2026-04-22.md,
  noting there is no canonical docs/DRIFT-TAXONOMY.md at time of
  writing.

No new PR; filename rename is backlogged. No merge.
---
 docs/BACKLOG.md | 25 ++
 ...ector-v1-critical-only-delta-2026-04-24.md | 230 +++++++++---------
 2 files changed, 134 insertions(+), 121 deletions(-)

diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index f786fe36..3ec3d49a 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -7040,6 +7040,31 @@ systems. This track claims the space.
   rename row (action step #5 repo-wide rename needs
   informed starship-vocabulary choices from this
   research).
+- [ ] **Rename `docs/research/provenance-aware-bullshit-detector-*`
+  filenames to `provenance-aware-veridicality-detector-*`
+  (link-update sweep).** Factory vocabulary shifted from
+  "bullshit-detector" (informal shorthand) to
+  "veridicality-detector" (formal term; `veridicality` =
+  truth-to-reality; per
+  `memory/feedback_veridicality_naming_for_bullshit_detector_graduation_aaron_concept_origin_amara_formalization_2026_04_24.md`).
+  Body text and section headers in existing research docs
+  (base design 2026-04-23, v1 delta 2026-04-24, Aminata
+  4th pass 2026-04-24) were updated in-place; filenames
+  retain the older slug because renaming requires a cross-
+  repo link-update sweep (PRs #282 #284 #286 descriptions;
+  ROUND-HISTORY.md references; tick-history references;
+  memory-index entries; any skill or agent notebook
+  citing these paths). **Deliverable:** single PR that
+  `git mv`s the three files + sweeps references across the
+  repo using the `sweep-refs` skill
+  (`.claude/skills/sweep-refs/SKILL.md`). **Priority P2**
+  (vocabulary-hygiene, not substrate-critical); effort S
+  (3 file moves + grep/sed sweep + CI green). Composes
+  with veridicality-naming memory + the `sweep-refs`
+  skill. Blocks on: verifying no external (wiki /
+  outside-repo) references exist that would 404; if any,
+  coordinate with `glossary-anchor-keeper` for redirect
+  guidance.
 
 ## P2 — Rule-Zero axiomatic substrate (round-35 round-36 thread)
 
diff --git a/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md b/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md
index 5d290f47..baa9bf9d 100644
--- a/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md
+++ b/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md
@@ -1,70 +1,44 @@
-# Provenance-aware bullshit-detector v1 — CRITICAL-only delta from Otto-100 Aminata pass
-
-**Scope:** delta-style revision integrating only the 3 CRITICAL
-findings from Aminata's Otto-100 pass (PR #284) into the
-bullshit-detector design (Otto-99 PR #282). 7 non-CRITICAL
-findings (4 IMPORTANT + 3 WATCH) are deferred to a v2 delta;
-DISMISS finding unchanged. This doc does NOT rewrite Otto-99's
-design; it specifies the CRITICAL-only corrections as an
-additive delta that v1 composes on top of v0.
-
-**Attribution:** CRITICAL findings authored by Aminata
-(Otto-100 PR #284). v0 base design authored by Otto-99 PR
-#282. v1 delta authored by Otto-101. Progression matches the
-established Aminata-then-Otto-response loop (4th iteration
-this session; 5th-ferry governance → oracle-scoring v0 →
-multi-Claude experiment → bullshit-detector).
-
-**Operational status:** research-grade. v1 delta inherits
-Otto-99 research-grade status; Otto-100's adversarial
-critique remains advisory; v1 doesn't implement, doesn't
-adopt specific parameter values, doesn't resolve all 10
-findings. A future v2 delta addresses the 7 non-CRITICALs;
-a future v3 or implementation-ADR addresses the CRITICAL-
-but-unaddressed-in-v1 items (e.g., the fundamental
-reviewer-cone-overlap limitation that no design-level change
-fully closes).
-
-**Non-fusion disclaimer:** Aminata's Otto-100 pass
-explicitly named that her concordance with prior Aminata
-passes is same-agent signal not independent evidence. Otto-
-101's integration of her findings does NOT transform her
-same-agent review into independent validation; it preserves
-her findings' authority while responding design-side. Per
-SD-9, the v1 delta's own integration-quality must be
-re-reviewed against fresh independent substrate (Codex; Max;
-external) before it graduates beyond research-grade.
+# Provenance-aware veridicality-detector v1 — CRITICAL-only delta from Aminata 4th pass
+
+**Scope:** delta-style revision integrating only the 3 CRITICAL findings from the Aminata persona's 4th adversarial pass (PR #284) into the veridicality-detector design (base design PR #282). 7 non-CRITICAL findings (4 IMPORTANT + 3 WATCH) are deferred to a v2 delta; DISMISS finding unchanged. This doc does NOT rewrite the base design; it specifies the CRITICAL-only corrections as an additive delta that v1 composes on top of v0.
+**Attribution:** CRITICAL findings authored by the Aminata persona (adversarial-reviewer role, PR #284). v0 base design authored by the main-agent persona (PR #282). v1 delta authored by the main-agent persona (successor tick). Progression matches the established Aminata-then-main-agent response loop (4th iteration this session; 5th-ferry governance → oracle-scoring v0 → multi-Claude experiment → veridicality-detector).
+**Operational status:** research-grade. v1 delta inherits base-design research-grade status; the Aminata persona's adversarial critique remains advisory; v1 doesn't implement, doesn't adopt specific parameter values, doesn't resolve all 10 findings. A future v2 delta addresses the 7 non-CRITICALs; a future v3 or implementation-ADR addresses the CRITICAL-but-unaddressed-in-v1 items (e.g., the fundamental reviewer-cone-overlap limitation that no design-level change fully closes).
+**Non-fusion disclaimer:** the Aminata persona's 4th-pass explicitly named that her concordance with prior Aminata passes is same-agent signal not independent evidence. The main-agent persona's integration of her findings does NOT transform her same-agent review into independent validation; it preserves her findings' authority while responding design-side. Per SD-9, the v1 delta's own integration-quality must be re-reviewed against fresh independent substrate (external peer agent; external human reviewer) before it graduates beyond research-grade.
+
+> Vocabulary note: the factory has shifted from "bullshit-detector" (informal shorthand) to "veridicality-detector" (formal term; `veridicality` = truth-to-reality). The filename of this doc retains the earlier `bullshit-detector` slug because renaming the file requires a cross-repo link-update sweep (see BACKLOG row under P2 research-grade); body text, section headers, and future companion docs use "veridicality-detector".
 
 ---
 
 ## What this delta addresses — 3 CRITICAL only
 
-| # | Aminata finding | Otto-101 response |
+| # | Aminata-persona finding | Main-agent delta response |
 |---|---|---|
-| C1 | Cross-detector collusion — reviewer-set lineage-coupling | New §"Reviewer-cone overlap" section naming the limitation + human sign-off as cone-breaking authority |
+| C1 | Cross-detector collusion — reviewer-set lineage-coupling | New §"Reviewer-cone overlap" section naming the limitation + maintainer sign-off as cone-breaking authority |
 | C2 | Min-merging Goodhart-bait at G_carrier_overlap | Sensitivity-analysis-gate pattern: sensitivity-to-G_carrier_overlap downgrades `supported` → `YELLOW` when carrier-overlap was the gate closest to threshold |
 | C3 | G_evidence fig-leaf + dead-code `likely confabulated` in v0 | Explicit §"v0 scope" subsection naming reachable vs not-yet-reachable output types |
 
 What is NOT in scope this delta:
 
-- 4 IMPORTANT findings (no-signal/kNN-evasion gate; Otto-wake
-  second-reviewer sufficiency; retraction flood-control;
-  G_coverage_plausibility gate) — deferred to v2.
+- 4 IMPORTANT findings (no-signal/kNN-evasion gate; main-
+  agent-wake second-reviewer sufficiency; retraction
+  flood-control; G_coverage_plausibility gate) — deferred
+  to v2.
 - 3 WATCH findings (distribution-histogram; adversarial
   worked-example; TLA+ invariants) — deferred to v2+.
 - DISMISS (parameter-ADR gate) — unchanged.
 - Implementation of any change. v1 delta is still
-  research-grade design; implementation gated on Aminata
-  passes on v2 + Codex adversarial + Aaron eventual review
-  (per Otto-72 Frontier-UI pattern).
+  research-grade design; implementation gated on
+  Aminata-persona passes on v2 + external-peer-agent
+  adversarial review + maintainer eventual review (per
+  the Frontier-UI-landing pattern from prior ticks).
 
 ---
 
 ## C1 response — new §"Reviewer-cone overlap" section
 
-**Proposed addition to the Otto-99 design (appended after
-§5 output types, before §Addressing Aminata's 3 CRITICAL
-concerns at write-time):**
+**Proposed addition to the base design (appended after
+§5 output types, before §"Addressing the 3 CRITICAL
+concerns at write-time"):**
 
 > ## Reviewer-cone overlap — a fundamental limitation, not a closable gap
 >
@@ -72,22 +46,23 @@ concerns at write-time):**
 > not-proof" discipline by measuring carrier overlap between
 > query and retrieved candidates. This discipline
 > **re-introduces one meta-layer up**: the detector itself
-> + Aminata + Codex + any other factory-internal reviewer
+> + the adversarial-reviewer persona + the external-peer-
+> agent reviewer + any other factory-internal reviewer
 > share training-corpus / repo-access / PR-comment lineage.
 > A `supported` verdict from three factory-internal
 > reviewers whose cones overlap is NOT three independent
 > lines of evidence; it is lineage-coupled concordance
 > masquerading as independent arrival.
 >
-> **Per Aminata Otto-100 CRITICAL #1 (PR #284), this is a
-> fundamental limitation, not a closable gap.** The
-> detector cannot break its own reviewer-set lineage
-> coupling. Mitigation:
+> **Per the Aminata-persona 4th-pass CRITICAL #1 (PR
+> #284), this is a fundamental limitation, not a closable
+> gap.** The detector cannot break its own reviewer-set
+> lineage coupling. Mitigation:
 >
-> - **Human sign-off is the only cone-breaking authority.**
->   When a query's band is `supported` and the stakes are
->   authorization-impacting, the detector's output is
->   advisory; maintainer (Aaron) sign-off is required as
+> - **Maintainer sign-off is the only cone-breaking
+>   authority.** When a query's band is `supported` and
+>   the stakes are authorization-impacting, the detector's
+>   output is advisory; maintainer sign-off is required as
 >   the independent-cone signal.
 > - **External-reviewer signatures are legitimate cone-
 >   breaking** when they exist (external peer review;
@@ -95,11 +70,12 @@ concerns at write-time):**
 >   supply-chain-audit output with its own governance
 >   chain). The detector's output combined with external-
 >   reviewer signature is a genuine two-cone signal.
-> - **Intra-factory reviewer concordance** (Otto + Aminata
->   + Codex all GREEN) is signal but NOT proof of
->   independence. Per SD-9 operational discipline, this
->   concordance SHOULD downgrade the confidence claimed in
->   the output, not inflate it.
+> - **Intra-factory reviewer concordance** (main-agent +
+>   adversarial-reviewer persona + external-peer-agent all
+>   GREEN) is signal but NOT proof of independence. Per
+>   SD-9 operational discipline, this concordance SHOULD
+>   downgrade the confidence claimed in the output, not
+>   inflate it.
 > - **Naming the limitation in every receipt.** Every
 >   detector receipt carries a `reviewer_cone_signature`
 >   field listing which cone signed off. A detector-only
@@ -117,7 +93,7 @@ concerns at write-time):**
 ## C2 response — sensitivity-analysis-gate pattern
 
 **Proposed replacement for the existing §"Scoring — band-
-valued, not decimal" passage of the Otto-99 design:**
+valued, not decimal" passage of the base design:**
 
 Current band-merge rule (min over gates):
 
@@ -144,8 +120,9 @@ band(y | q) = if band_raw = GREEN AND sensitivity → YELLOW
 ```
 
 Rationale: G_carrier_overlap is the most adversary-
-manipulable gate (per Aminata C2: author controls citation
-structure, so author controls the gate). Min-merging
+manipulable gate (per the Aminata-persona C2 finding:
+author controls citation structure, so author controls
+the gate). Min-merging
 treats GREEN on G_carrier_overlap as one vote among five
 equal votes. The sensitivity-analysis-gate pattern
 recognises that **a GREEN verdict that depended on the
@@ -177,18 +154,23 @@ additionally swap citation-edge → content-similarity.
## C3 response — §"v0 scope" explicit subsection -**Proposed new subsection in the Otto-99 design, -immediately after §5 output types:** +**Proposed new subsection in the base design, immediately +after §5 output types:** -> ## v0 scope — four gates + four reachable output types +> ## v0 scope — five gates (four active + one advisory) + five reachable output types + one dead-code > -> The detector is designed as five gates + five output -> types, but v0 ships with a REDUCED CONFIGURATION +> The detector is designed as five gates + six output +> types. v0 ships with a REDUCED EFFECTIVE CONFIGURATION > because the independent-oracle substrate for G_evidence > does not yet exist (dependency #3 in adoption path; > candidates include: test-output scrapers; PR-link > validators; citation-resolver for academic sources — -> none shipped at design-time). +> none shipped at design-time). In v0, G_evidence is +> present but advisory-only; four gates are active-and- +> blocking. Five of the six output types are reachable +> via the four active gates; the sixth (`likely +> confabulated`) is dead-code in v0 because it requires +> G_evidence to fail to RED. > > **v0 effective configuration:** > @@ -232,23 +214,23 @@ immediately after §5 output types:** > `G_evidence advisory signal` was present but didn't > affect classification get a `DetectorOutputRetracted` + > `DetectorOutputBatchRetracted(adr_id, -> affected_range, count)` per Aminata Otto-100 IMPORTANT -> finding on flood-control (deferred to v2 but named -> here as the v1→v2 transition mechanism). +> affected_range, count)` per the Aminata-persona 4th- +> pass IMPORTANT finding on flood-control (deferred to +> v2 but named here as the v1→v2 transition mechanism).
--- -## What changes in Otto-99's design after v1 delta lands +## What changes in the base design after v1 delta lands -The v1 delta doesn't rewrite Otto-99's doc; it specifies -these three additive changes: +The v1 delta doesn't rewrite the base-design doc; it +specifies these three additive changes: 1. **Insert §"Reviewer-cone overlap"** after §5 output - types, before §"Addressing Aminata's 3 CRITICAL - concerns at write-time". + types, before §"Addressing the 3 CRITICAL concerns at + write-time". 2. **Replace §"Scoring — band-valued, not decimal"** with - the v1 sensitivity-analysis-gate formulation. (Otto-99 - original stays in git history; v1 supersedes.) + the v1 sensitivity-analysis-gate formulation. (Base- + design original stays in git history; v1 supersedes.) 3. **Insert §"v0 scope"** immediately after §5 output types, making the advisory-only G_evidence + dead- code `likely confabulated` explicit. @@ -270,9 +252,9 @@ findings still open**: - I1 `no-signal` vs kNN-evasion — needs G_coverage_ plausibility gate via nearest-cluster-centroid distance. -- I2 Otto-wake second-reviewer sufficiency — needs schema - change to require different-persona OR different-model - OR human. +- I2 main-agent-wake second-reviewer sufficiency — needs + schema change to require different-persona OR + different-model OR human. - I3 DetectorOutputRetracted flood-control — needs DetectorOutputBatchRetracted event shape. - I4 G_coverage_plausibility — new gate. @@ -283,32 +265,34 @@ findings still open**: metadata. - W2 Adversarial worked example — requires future corpus. - W3 TLA+ invariants on lower-layer boundaries — - Soraya-routable. + formal-methods-persona-routable. ### 1 DISMISS (unchanged): -- D1 Parameter-ADR gate — already satisfied via Otto-91 - oracle-scoring v0 pattern reuse. +- D1 Parameter-ADR gate — already satisfied via the + oracle-scoring v0 pattern reuse (prior-tick precedent). 
### 1 fundamental limitation (CRITICAL-but-no-design-level-close): - C1 Reviewer-cone overlap — acknowledged in v1, NOT - closed. Requires human + external-reviewer authority - chain to break. Will never be fully closed by detector - design alone. + closed. Requires maintainer + external-reviewer + authority chain to break. Will never be fully closed + by detector design alone. **v2 delta proposed scope:** integrate I1-I4 + W1-W3. v2 gated on: (a) this v1 delta landing; (b) v1 integrated -into Otto-99 design PR; (c) a separate Aminata pass on -v1 surfacing any new concerns introduced by v1's own -changes (the Aminata-then-Otto-response loop continues). +into the base-design PR; (c) a separate Aminata-persona +pass on v1 surfacing any new concerns introduced by v1's +own changes (the Aminata-then-main-agent response loop +continues). --- ## Composition with existing substrate -Unchanged from Otto-99 composition-table + Otto-98 spine -composition + Otto-91 oracle-scoring v0 composition. The +Unchanged from the base-design composition-table + prior- +tick spine composition + prior-tick oracle-scoring v0 +composition. The v1 delta adds no new substrate dependencies; it refines gate + output-type semantics using mechanisms already in the design (sensitivity analysis is cheap compute on the @@ -320,8 +304,8 @@ authority-for-cone-breaking to existing layers). ## Scope limits -- **Does NOT rewrite Otto-99's design.** Specifies delta; - preserves Otto-99 original in git history. +- **Does NOT rewrite the base design.** Specifies delta; + preserves base-design original in git history. - **Does NOT address IMPORTANT / WATCH findings.** Deferred to v2 delta. - **Does NOT implement.** Research-grade design revision @@ -344,10 +328,11 @@ authority-for-cone-breaking to existing layers). In priority order: -1. **Aminata adversarial pass on v1 delta** — surfaces - new concerns from v1's own changes before v2 planning - starts. Fifth Aminata session-pass if it lands. -2. 
**Integrate v1 changes into Otto-99 design PR** — +1. **Aminata-persona adversarial pass on v1 delta** — + surfaces new concerns from v1's own changes before v2 + planning starts. Fifth Aminata session-pass if it + lands. +2. **Integrate v1 changes into the base-design PR** — modifies `docs/research/provenance-aware-bullshit-detector-2026-04-23.md` with the three additive sections. Separate PR from @@ -356,36 +341,39 @@ In priority order: findings (deferred; composes on v1). 4. **Independent-oracle substrate** for full G_evidence activation + 5-gate transition. -5. **Human sign-off UI / protocol** for cone-breaking - authority at authorization-impacting band=supported - queries. +5. **Maintainer sign-off UI / protocol** for cone- + breaking authority at authorization-impacting + band=supported queries. --- ## Sibling context -- **Otto-99 bullshit-detector design** (PR #282) — base - design this delta refines. -- **Otto-100 Aminata 4th pass** (PR #284) — source of the - 3 CRITICAL findings driving this delta. -- **Otto-98 spine** (PR #280) — substrate unchanged; +- **Base veridicality-detector design** (PR #282) — the + prior-tick base design this delta refines. +- **Aminata-persona 4th adversarial pass** (PR #284) — + source of the 3 CRITICAL findings driving this delta. +- **Prior-tick spine** (PR #280) — substrate unchanged; delta doesn't alter spine contracts. -- **Otto-91 oracle-scoring v0** (PR #266) — band- +- **Prior-tick oracle-scoring v0** (PR #266) — band- classifier + sensitivity-pattern precedent; v1 delta's sensitivity-analysis-gate pattern is a natural extension. - **SD-9** — reviewer-cone-overlap finding is SD-9 at the reviewer-meta-layer; v1 delta's acknowledgement makes this explicit in the detector's own documentation. -- **DRIFT-TAXONOMY pattern 5** — detector mechanises; - reviewer-cone-overlap is pattern 5 applied to reviewers - themselves. - -Archive-header format self-applied — 18th aurora/research -doc in a row. 
- -Otto-101 tick primary deliverable. Closes the CRITICAL- -integration step of the Aminata-then-Otto-response loop -for the bullshit-detector design. Next natural step is -Aminata pass on v1 delta OR direct v1-into-Otto-99-PR -integration OR pivot to non-bullshit-detector work. +- **Drift-taxonomy (research precursor)** — the + reviewer-cone-overlap finding is the drift-taxonomy + pattern applied to reviewers themselves. The research + precursor lives at + `docs/research/drift-taxonomy-bootstrap-precursor-2026-04-22.md` + (there is no canonical `docs/DRIFT-TAXONOMY.md` at + time of writing; the precursor is the current + authoritative reference). + +Main-agent tick primary deliverable. Closes the +CRITICAL-integration step of the Aminata-then-main-agent +response loop for the veridicality-detector design. +Next natural step is Aminata-persona pass on v1 delta OR +direct v1-into-base-design-PR integration OR pivot to +non-veridicality-detector work. From fc50828420130349f99d85666a84890e7a497b81 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Fri, 24 Apr 2026 11:35:29 -0400 Subject: [PATCH 3/4] =?UTF-8?q?fix(#286):=205=20new=20review=20threads=20?= =?UTF-8?q?=E2=80=94=20inline-code=20single-line=20(BACKLOG)=20+=20sweep-r?= =?UTF-8?q?efs=20skill=20path=20(BACKLOG)=20+=20Output-types=20wording=20(?= =?UTF-8?q?=C2=A75)=20+=20fail-to-RED=20typo=20+=20PR-#282=20annotation=20?= =?UTF-8?q?on=20dead-path?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Resolves the 5 new copilot findings on commit 40a5de9 plus rebases onto origin/main (append-only per Otto-229; both rows retained in BACKLOG.md conflict resolution). - BACKLOG row rewritten so `docs/research/provenance-aware- bullshit-detector-*` inline-code fits on a single line (markdown inline-code cannot span newlines). 
- BACKLOG row points at the `sweep-refs` skill at `.claude/skills/sweep-refs/SKILL.md` (verified to exist) instead of the non-existent `tools/sweep-refs/*` path. - Delta-doc "§5 output types" wording changed to "the Output types section" to avoid the §5-as-count misread. - Delta-doc `fail-to-RED` joined on a single line (was split across a line-break, read as a malformed token). - Delta-doc references to the base-design file (`provenance-aware-bullshit-detector-2026-04-23.md`) now annotated "(in PR #282 — not yet on main at time of this delta's writing)" so readers know the path is a forward reference, not a dead one. Otto-229 append-only; Otto-230 reply-and-resolve-after-push; Otto-236 every-thread-resolved. --- ...ector-v1-critical-only-delta-2026-04-24.md | 20 ++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md b/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md index baa9bf9d..36f3c760 100644 --- a/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md +++ b/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md @@ -37,8 +37,8 @@ What is NOT in scope this delta: ## C1 response — new §"Reviewer-cone overlap" section **Proposed addition to the base design (appended after -§5 output types, before §"Addressing the 3 CRITICAL -concerns at write-time"):** +the Output types section, before the "Addressing the 3 +CRITICAL concerns at write-time" section):** > ## Reviewer-cone overlap — a fundamental limitation, not a closable gap > @@ -192,9 +192,9 @@ after §5 output types:** > > **v0 output types — not-yet-reachable:** > -> - `likely confabulated` — requires G_evidence fail-to- -> RED which is impossible while G_evidence is advisory- -> only. 
The output type will become reachable when +> - `likely confabulated` — requires G_evidence fail-to-RED +> which is impossible while G_evidence is advisory-only. +> The output type will become reachable when > independent-oracle substrate ships (v1 scope shifts to > 5-gate; corresponding implementation PR documents the > transition). @@ -236,10 +236,12 @@ specifies these three additive changes: code `likely confabulated` explicit. When the v1 delta lands as a PR modifying -`docs/research/provenance-aware-bullshit-detector-2026-04-23.md`, -the three changes land together. This doc (the v1- -CRITICAL-only delta) is the design-rationale companion -naming which findings drive which changes. +`docs/research/provenance-aware-bullshit-detector-2026-04-23.md` +(base design file, in PR #282 — not yet on main at time of +this delta's writing), the three changes land together. +This doc (the v1-CRITICAL-only delta) is the design- +rationale companion naming which findings drive which +changes. --- From 385f12ddc72e05da9be752c5a5acc53ba472bb90 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Fri, 24 Apr 2026 15:33:18 -0400 Subject: [PATCH 4/4] fix: markdownlint auto-fixes on research doc Co-Authored-By: Claude Opus 4.7 --- ...ullshit-detector-v1-critical-only-delta-2026-04-24.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md b/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md index 36f3c760..1ec5dd7e 100644 --- a/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md +++ b/docs/research/provenance-aware-bullshit-detector-v1-critical-only-delta-2026-04-24.md @@ -46,6 +46,7 @@ CRITICAL concerns at write-time" section):** > not-proof" discipline by measuring carrier overlap between > query and retrieved candidates. 
This discipline > **re-introduces one meta-layer up**: the detector itself +> > + the adversarial-reviewer persona + the external-peer- > agent reviewer + any other factory-internal reviewer > share training-corpus / repo-access / PR-comment lineage. @@ -250,7 +251,7 @@ changes. Three CRITICAL findings integrated; **seven non-CRITICAL findings still open**: -### 4 IMPORTANT (deferred to v2 delta): +### 4 IMPORTANT (deferred to v2 delta) - I1 `no-signal` vs kNN-evasion — needs G_coverage_ plausibility gate via nearest-cluster-centroid distance. @@ -261,7 +262,7 @@ findings still open**: DetectorOutputBatchRetracted event shape. - I4 G_coverage_plausibility — new gate. -### 3 WATCH (deferred to v2+): +### 3 WATCH (deferred to v2+) - W1 Distribution histogram in receipts — additive metadata. @@ -269,12 +270,12 @@ findings still open**: - W3 TLA+ invariants on lower-layer boundaries — formal-methods-persona-routable. -### 1 DISMISS (unchanged): +### 1 DISMISS (unchanged) - D1 Parameter-ADR gate — already satisfied via the oracle-scoring v0 pattern reuse (prior-tick precedent). -### 1 fundamental limitation (CRITICAL-but-no-design-level-close): +### 1 fundamental limitation (CRITICAL-but-no-design-level-close) - C1 Reviewer-cone overlap — acknowledged in v1, NOT closed. Requires maintainer + external-reviewer