Pull request overview
Adds a research-grade delta document that integrates the 3 CRITICAL findings from Aminata’s Otto-100 pass into the provenance-aware detector design narrative, while explicitly deferring the 7 non-CRITICAL findings to a later v2 delta.
Changes:
- Introduces a new research doc describing the CRITICAL-only delta and its intended insert/replace points in the Otto-99 design.
- Specifies responses for C1 (reviewer-cone overlap limitation), C2 (carrier-overlap sensitivity downgrade), and C3 (v0 scope + reachable output types).
- Captures deferred IMPORTANT/WATCH items and adoption dependencies as explicit scope boundaries.
…me backlogged) + header-fields + name attrib + table count + DRIFT-TAXONOMY xref
- Thread 1 (PRRT_kwDOSF9kNM59SpxP): compacted four archive-header
fields (Scope / Attribution / Operational status / Non-fusion
disclaimer) into first six lines so "Archive-header self-applied"
claim is now accurate; removed the trailing claim sentence in
favour of the structural compliance itself.
- Thread 2 (PRRT_kwDOSF9kNM59Spxg): shifted body vocabulary from
"bullshit-detector" to "veridicality-detector" throughout; added a
vocabulary-note paragraph explaining the filename-slug retention;
appended BACKLOG P2 research-grade row for the cross-repo filename
rename sweep (three doc files + link-update across PRs / round-
history / memory-index). Otto-229 append-only discipline observed.
- Thread 3 (PRRT_kwDOSF9kNM59Spxl): converted persona-specific names
("Aminata", "Otto-99/100/101", "Aaron", "Max", "Codex", "Soraya")
to role references (Aminata-persona / main-agent persona /
maintainer / external-peer-agent / formal-methods-persona).
- Thread 4 (PRRT_kwDOSF9kNM59Spx1): corrected §"v0 scope" header
from "four gates + four reachable output types" to "five gates
(four active + one advisory) + five reachable output types (one
dead-code)", matching the tables that list 5 gates and 6 output
types (5 reachable + 1 not-yet-reachable).
- Thread 5 (PRRT_kwDOSF9kNM59Spx-): "DRIFT-TAXONOMY pattern 5"
softened to point at actual precursor file
docs/research/drift-taxonomy-bootstrap-precursor-2026-04-22.md,
noting there is no canonical docs/DRIFT-TAXONOMY.md at time of
writing.
No new PR; filename rename is backlogged. No merge.
ead499e to 39e7681
…ta integrating Aminata Otto-100 findings

Post-milestone tick. First integration pass on Aminata's 10 Otto-100 findings. PR #286 delta-style revision addresses only the 3 CRITICAL findings; the 7 non-CRITICAL findings are deferred to v2.

Key observations:
1. **Delta-style revision** is a new, useful pattern: the Otto-99 design stays intact; v1 specifies diff-style additions; v2 composes on v1. Reviewer-clean; revertable; extends to v3/v4/vN as more findings arrive.
2. **C1 acknowledged as a fundamental limitation, not a closable gap**: an honest move. The detector can't break its own reviewer-set lineage coupling; routing authority to human + external reviewers is the mitigation. The architectural cleanness of "acknowledge + route" beats "invent a closure mechanism that doesn't exist."
3. **Sensitivity-analysis-gate** is a cheap mechanical defense against Goodhart-bait on adversary-manipulable gates. Candidate for BP-NN promotion if exercised on a second classifier.
4. **v0-scope explicit subsection** closes the dead-code-in-safety-critical-classifier hazard. Applicable to any design-on-paper vs operational-in-practice gap in the factory.

Stacked on #285 (Otto-100 history).
39e7681 to 40a5de9
…ngs integrated (Otto-101)

Delta-style revision integrating only the 3 CRITICAL findings from Aminata's Otto-100 pass (PR #284) into Otto-99's bullshit-detector design (PR #282). The 7 non-CRITICAL findings are deferred to a v2 delta; DISMISS unchanged.

Three CRITICAL integrations:

**C1: Cross-detector collusion (new §Reviewer-cone overlap section).** Acknowledged as a FUNDAMENTAL LIMITATION, NOT A CLOSABLE GAP: the detector cannot break its own reviewer-set lineage coupling. Mitigation via routing authority:
- Human sign-off = only cone-breaking authority for authorization-impacting band=supported queries.
- External-reviewer signatures as legitimate cone-breakers when available.
- Intra-factory reviewer concordance should DOWNGRADE confidence per SD-9, not inflate it.
- Every receipt carries a reviewer_cone_signature field.

**C2: Min-merging Goodhart-bait (sensitivity-analysis-gate pattern).** v1 enhancement: `band(y|q) = if band_raw = GREEN AND G_carrier_overlap closest to threshold then YELLOW else band_raw`. Closes a cheap cheat path (threshold-hugging on an author-controlled gate). The alternative, content-similarity-in-cone, is deferred to a v2 cost-benefit.

**C3: G_evidence fig-leaf (new §v0 scope subsection).** v0 effective = 4-gate classifier (G_evidence advisory-only until the independent-oracle substrate ships). The `likely confabulated` output type is explicitly marked not-yet-reachable in v0. Conservative under-detection stance (RED comes only from known-bad-pattern; confabulations land YELLOW as plausible-but-unresolved). The v0→v1 transition plan names DetectorOutputBatchRetracted for historical re-classification.

The v1 delta does NOT rewrite Otto-99's design; it specifies 3 additive sections to insert + 1 section to supersede. The Otto-99 original is preserved in git history.

Seven non-CRITICALs still open:
- 4 IMPORTANT (deferred to v2): G_coverage_plausibility gate; Otto-wake second-reviewer schema; DetectorOutputBatchRetracted; no-signal vs kNN-evasion.
- 3 WATCH (deferred to v2+): distribution histogram; adversarial worked example; TLA+ invariants.

1 fundamental limitation (C1) acknowledged, not closed: routes authority to the human + external reviewer layers.

Scope limits: no design rewrite; no implementation; no human-sign-off UI proposal; no content-similarity-in-cone commit; 5-gate/5-type target structure unchanged.

5 dependencies-to-adoption: Aminata pass on the v1 delta (fifth session-pass); integrate v1 changes into the Otto-99 design PR (separate PR); v2 delta; independent-oracle substrate; human-sign-off UI/protocol.

Archive-header format self-applied: 18th aurora/research doc in a row. Lands within standing authority per Otto-82/90/93 calibration.

Otto-101 tick primary deliverable: closes the CRITICAL-integration step of the Aminata-then-Otto-response loop for the bullshit-detector design.
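The C3 v0 scope can be pictured as a min-merge over the four active gates that keeps G_evidence out of the band entirely and refuses to emit the not-yet-reachable output type. The sketch below is a hypothetical Python illustration, not the design's implementation: only G_carrier_overlap and G_evidence are named in the text, so the other three gate names are placeholders, and the RED < YELLOW < GREEN ordering used for min-merging is an assumption read off the "min-merging" wording.

```python
from enum import IntEnum


class Band(IntEnum):
    # Assumed ordering so that min() selects the most conservative band.
    RED = 0
    YELLOW = 1
    GREEN = 2


# Hypothetical gate list: G_gate_2..G_gate_4 are placeholders for the
# active gates the design text does not name here.
ACTIVE_GATES_V0 = ("G_carrier_overlap", "G_gate_2", "G_gate_3", "G_gate_4")
ADVISORY_GATES_V0 = ("G_evidence",)  # recorded on the receipt, never merged, in v0

# Output type that v0 must never emit (no independent-oracle substrate yet).
UNREACHABLE_IN_V0 = frozenset({"likely confabulated"})


def band_raw_v0(gate_bands: dict) -> Band:
    """Min-merge the active gates; advisory gates do not affect the band."""
    missing = [g for g in ACTIVE_GATES_V0 if g not in gate_bands]
    if missing:
        raise ValueError(f"missing active gate results: {missing}")
    return min(gate_bands[g] for g in ACTIVE_GATES_V0)


def check_reachable_v0(output_type: str) -> str:
    """Fail loudly instead of silently emitting a dead-code output type."""
    if output_type in UNREACHABLE_IN_V0:
        raise ValueError(f"{output_type!r} is not reachable in v0")
    return output_type
```

With G_evidence reporting RED but every active gate GREEN, `band_raw_v0` still returns GREEN: that is the advisory-only stance, and it is exactly why the sketch also makes the unreachable output type an explicit error rather than a silent branch.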
… sweep-refs skill path (BACKLOG) + Output-types wording (§5) + fail-to-RED typo + PR-#282 annotation on dead-path

Resolves the 5 new Copilot findings on commit 40a5de9 plus rebases onto origin/main (append-only per Otto-229; both rows retained in BACKLOG.md conflict resolution).
- BACKLOG row rewritten so the `docs/research/provenance-aware-bullshit-detector-*` inline code fits on a single line (a line break inside inline code renders as a space, which broke the path).
- BACKLOG row points at the `sweep-refs` skill at `.claude/skills/sweep-refs/SKILL.md` (verified to exist) instead of the non-existent `tools/sweep-refs/*` path.
- Delta-doc "§5 output types" wording changed to "the Output types section" to avoid the §5-as-count misread.
- Delta-doc `fail-to-RED` joined on a single line (it was split across a line break and read as a malformed token).
- Delta-doc references to the base-design file (`provenance-aware-bullshit-detector-2026-04-23.md`) now annotated "(in PR #282 — not yet on main at time of this delta's writing)" so readers know the path is a forward reference, not a dead one.

Otto-229 append-only; Otto-230 reply-and-resolve-after-push; Otto-236 every-thread-resolved.
40a5de9 to fc50828
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
> **re-introduces one meta-layer up**: the detector itself
>
> + the adversarial-reviewer persona + the external-peer-agent
> reviewer + any other factory-internal reviewer
> share training-corpus / repo-access / PR-comment lineage.
P1: The blockquote line starting with `+` (`> + the adversarial-reviewer persona ...`) will be parsed as an unordered-list item. Repo guidance explicitly avoids line-start `+` in prose/lists because it's easy to accidentally create nested lists and confuse readers/linters; please rewrite this as normal prose (e.g., “the detector itself, the adversarial-reviewer persona, … share …”) or use `-` bullets consistently.
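A mechanical check for the pattern this comment flags might look like the following. It is a hypothetical lint sketch, not an existing repo tool: it approximates the CommonMark rule that `+` followed by a space, tab, or end of line, after at most three spaces of indentation and any number of `>` blockquote markers, opens a bullet list item, and it skips fenced code blocks only when the fence line itself starts with three backticks.

```python
import re

# Approximation of a CommonMark "+" bullet marker: up to 3 spaces, optional
# blockquote markers, up to 3 more spaces, then "+" followed by whitespace or EOL.
PLUS_BULLET = re.compile(r"^ {0,3}(?:> ?)* {0,3}\+(?:[ \t]|$)")
FENCE = "`" * 3


def find_plus_bullets(markdown: str) -> list:
    """Return 1-based line numbers of lines that start with a '+' list marker."""
    hits = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on opening/closing fence lines
            continue
        if not in_fence and PLUS_BULLET.match(line):
            hits.append(lineno)
    return hits
```

Run against the flagged snippet, this reports the `> + the adversarial-reviewer persona` line while ignoring mid-line `+` separators such as `reviewer + any other`, which is the distinction the comment draws.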
Before:

> **re-introduces one meta-layer up**: the detector itself
>
> + the adversarial-reviewer persona + the external-peer-agent
> reviewer + any other factory-internal reviewer
> share training-corpus / repo-access / PR-comment lineage.

After:

> **re-introduces one meta-layer up**: the detector itself,
> the adversarial-reviewer persona, the external-peer-agent
> reviewer, and any other factory-internal reviewer share
> training-corpus / repo-access / PR-comment lineage.
> - **Maintainer sign-off is the only cone-breaking
> authority.** When a query's band is `supported` and
> the stakes are authorization-impacting, the detector's
> output is advisory; maintainer sign-off is required as
> the independent-cone signal.
> - **External-reviewer signatures are legitimate cone-breaking**
> when they exist (external peer review;
> formal-methods proofs landed at Lean/Mathlib scale;
P1: This section says “Maintainer sign-off is the only cone-breaking authority”, but the next bullet says external-reviewer signatures are also cone-breaking. Please clarify the scope (e.g., “only internal cone-breaking authority” / “default required authority for authorization-impacting decisions”) so the policy doesn’t contradict itself.
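The two readings this comment contrasts can be made concrete as a routing check. The sketch below is hypothetical Python: `SignerKind`, `Signoff`, and `cone_break_satisfied` are illustrative names, and the `require_maintainer` flag encodes the open question. `True` models "maintainer sign-off is required"; `False` models "maintainer or external reviewer may break the cone". Intra-factory reviewers never satisfy either reading because they share the detector's lineage.

```python
from dataclasses import dataclass
from enum import Enum


class SignerKind(Enum):
    MAINTAINER = "maintainer"
    EXTERNAL_REVIEWER = "external_reviewer"  # e.g. external peer review, landed proofs
    INTRA_FACTORY = "intra_factory"  # detector, adversarial-reviewer persona, peer agents


# Signers outside the factory's reviewer cone (assumed classification).
CONE_BREAKING = {SignerKind.MAINTAINER, SignerKind.EXTERNAL_REVIEWER}


@dataclass(frozen=True)
class Signoff:
    signer: str
    kind: SignerKind


def cone_break_satisfied(band: str, authorization_impacting: bool,
                         signoffs: list, require_maintainer: bool = True) -> bool:
    """Return True if the detector output may be acted on without further sign-off."""
    advisory = band == "supported" and authorization_impacting
    if not advisory:
        return True  # detector output is not gated on an independent-cone signal
    if require_maintainer:
        return any(s.kind is SignerKind.MAINTAINER for s in signoffs)
    return any(s.kind in CONE_BREAKING for s in signoffs)
```

Whichever wording the section settles on, the flag makes the policy difference testable: an external signature alone passes only under the second reading.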
"G_carrier_overlap was the gate closest to threshold"
↔ overlap_ratio(cone(q), cone(y)) ∈ [θ_med - ε, θ_med]
OR ∈ [θ_high - ε, θ_high]

band(y | q) = if band_raw = GREEN AND sensitivity → YELLOW
else band_raw
P1: The sensitivity pseudo-code mixes a predicate definition with a rewrite arrow (sensitivity → YELLOW) and then uses sensitivity without applying it (it’s defined as sensitivity(y|q) = ...). This is ambiguous for implementers; please rewrite as a boolean predicate (no quoted string) and use a standard conditional like if band_raw = GREEN && sensitivity(y|q) then YELLOW else band_raw.
Before:

"G_carrier_overlap was the gate closest to threshold"
↔ overlap_ratio(cone(q), cone(y)) ∈ [θ_med - ε, θ_med]
OR ∈ [θ_high - ε, θ_high]
band(y | q) = if band_raw = GREEN AND sensitivity → YELLOW
else band_raw

After:

overlap_ratio(cone(q), cone(y)) ∈ [θ_med - ε, θ_med]
OR overlap_ratio(cone(q), cone(y)) ∈ [θ_high - ε, θ_high]
band(y | q) =
if band_raw = GREEN && sensitivity(y | q)
then YELLOW
else band_raw
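The corrected pseudocode above can be exercised as a small, runnable sketch. This is hypothetical Python, not the design's code: the document does not define `overlap_ratio`, so Jaccard overlap of the two carrier cones is an assumed stand-in, and the threshold values in the usage are illustrative only. The windows are closed intervals `[θ - ε, θ]`, matching the pseudocode.

```python
from dataclasses import dataclass
from enum import IntEnum


class Band(IntEnum):
    RED = 0
    YELLOW = 1
    GREEN = 2


@dataclass(frozen=True)
class Thresholds:
    theta_med: float
    theta_high: float
    epsilon: float  # width of the threshold-hugging window below each threshold


def overlap_ratio(cone_q: frozenset, cone_y: frozenset) -> float:
    """Assumed definition: Jaccard overlap of the two carrier cones."""
    union = cone_q | cone_y
    return len(cone_q & cone_y) / len(union) if union else 0.0


def sensitivity(cone_q: frozenset, cone_y: frozenset, t: Thresholds) -> bool:
    """Boolean predicate: overlap sits within epsilon below either threshold."""
    r = overlap_ratio(cone_q, cone_y)
    return (t.theta_med - t.epsilon <= r <= t.theta_med) or (
        t.theta_high - t.epsilon <= r <= t.theta_high
    )


def band(band_raw: Band, cone_q: frozenset, cone_y: frozenset, t: Thresholds) -> Band:
    """if band_raw = GREEN && sensitivity(y | q) then YELLOW else band_raw."""
    if band_raw == Band.GREEN and sensitivity(cone_q, cone_y, t):
        return Band.YELLOW
    return band_raw
```

With `Thresholds(0.3, 0.6, 0.05)`, cones `range(20)` and `range(13, 25)` overlap at 7/25 = 0.28, inside the medium window, so a raw GREEN is downgraded to YELLOW; an overlap of 0.5 sits in neither window and GREEN stands. Non-GREEN bands pass through unchanged, so the gate can only make the result more conservative.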
Summary
Delta-style revision integrating only the 3 CRITICAL findings from Aminata's Otto-100 pass (PR #284) into Otto-99's bullshit-detector design (PR #282).
7 non-CRITICAL findings (4 IMPORTANT + 3 WATCH) deferred to v2 delta.
Closes the CRITICAL-integration step of the Aminata-then-Otto-response loop.
3 CRITICAL integrations
band = YELLOW when G_carrier_overlap was gate closest to threshold · likely confabulated in v0 · likely confabulated not-yet-reachable
Fundamental limitation acknowledged
C1 (reviewer-cone overlap) is NOT closable by detector design alone. The detector can't break its own reviewer-set lineage coupling. Mitigation via routing: human sign-off + external-reviewer signatures are the cone-breaking authorities; intra-factory reviewer concordance downgrades confidence rather than inflating.
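The "downgrade rather than inflate" rule, together with the reviewer_cone_signature receipt field named in the Otto-101 commit message, can be sketched as follows. This is hypothetical Python under stated assumptions: a reviewer's lineage is modelled as a set of string tags, "shares the cone" is taken to mean any lineage overlap with the detector, the signature is a truncated SHA-256 of the sorted lineage union, and the fixed `step` penalty stands in for whatever SD-9 actually prescribes.

```python
import hashlib
from dataclasses import dataclass


def cone_signature(lineage: frozenset) -> str:
    """Hypothetical reviewer_cone_signature: short, order-independent hash of lineage tags."""
    canonical = "\n".join(sorted(lineage)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:16]


@dataclass(frozen=True)
class Receipt:
    query_id: str
    band: str
    reviewer_cone_signature: str


def adjusted_confidence(base: float, detector_lineage: frozenset,
                        concurring_lineages: list, step: float = 0.1) -> float:
    """Downgrade, never inflate, for each concurring reviewer sharing lineage."""
    shared = [lin for lin in concurring_lineages if lin & detector_lineage]
    return max(0.0, base - step * len(shared))


def make_receipt(query_id: str, band: str, reviewer_lineages: list) -> Receipt:
    """Stamp the union of all participating reviewers' lineage onto the receipt."""
    union = frozenset().union(*reviewer_lineages)
    return Receipt(query_id, band, cone_signature(union))
```

Under these assumptions, two concurring reviewers who share corpus or repo lineage with the detector lower confidence, while a reviewer with disjoint lineage leaves it untouched. Cone-breaking still has to come from the human and external layers.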
Deferred to v2
4 IMPORTANT: G_coverage_plausibility gate · Otto-wake second-reviewer schema tightening · DetectorOutputBatchRetracted · no-signal vs kNN-evasion distinction.
3 WATCH: distribution histogram · adversarial worked example · TLA+ invariants.
DISMISS unchanged: parameter-ADR gate (reused from oracle-scoring v0).
Scope limits
Authority
Within standing authority per Otto-82/90/93 calibration — research-grade design revision.
Dependencies
🤖 Generated with Claude Code