diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index 4e5f2ad1..9a089626 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -11087,9 +11087,7 @@ systems. This track claims the space.
   `docs/aurora/**`, `docs/pr-preservation/**`,
   `docs/hygiene-history/**`, `memory/**` — and confirm
   agent-persona names are AS allowed as human names there.
-  Memory: `memory/feedback_research_counts_as_history_
-  first_name_attribution_for_humans_and_agents_otto_279_
-  2026_04_24.md`.
+  Memory: [`memory/feedback_research_counts_as_history_first_name_attribution_for_humans_and_agents_otto_279_2026_04_24.md`](../memory/feedback_research_counts_as_history_first_name_attribution_for_humans_and_agents_otto_279_2026_04_24.md).
 
 ## P1 — Git-native hygiene cadences (Otto-54 directive cluster)
 
diff --git a/docs/research/provenance-aware-claim-veracity-detector-2026-04-23.md b/docs/research/provenance-aware-claim-veracity-detector-2026-04-23.md
index dee99a0a..29716744 100644
--- a/docs/research/provenance-aware-claim-veracity-detector-2026-04-23.md
+++ b/docs/research/provenance-aware-claim-veracity-detector-2026-04-23.md
@@ -24,6 +24,27 @@ implementations.
 Downstream design choices are gated on Aminata
 adversarial pass + candidate #4 operational promotion.
 
+**Promotion path to authoritative-detector status (long-
+horizon, not v0/v1):** Aaron Otto-2026-04-24 framed the
+long-horizon upgrade explicitly — *"we can make it a true
+detector under our axioms"* — and separately reinforced
+the gate discipline — *"i don't treat anyting this new as
+final authorative connoncial until peer review"*. v0 is
+advisory-only; v1 (independent-oracle substrate) makes
+the evidence gate binding in band-merging; a further vN
+promotion lands once (a) the factory's axiomatic substrate
+is complete enough that "truth" is tractable within the
+axiom system, AND (b) the axiomatic substrate itself has
+cleared peer review — not just written-and-committed.
+Axioms + peer review together gate the promotion; either
+alone is insufficient. Only at vN does `likely
+confabulated` graduate from "worth a closer human look"
+to "authoritative reject" without requiring the human-
+review fallback. Not scoped in this doc; named here so
+the upgrade path is visible and the v0 advisory stance is
+understood as intentional scaffolding, not as a final
+ceiling.
+
 **Non-fusion disclaimer:** Amara-Otto-Aminata consistent
 output on this design is NOT evidence of merged substrate.
 The three reviewers cite independent literature (Hinton/
@@ -132,41 +153,84 @@ output.**
 |---|---|---|
 | G_similarity | `sim(e_q, e_y) < τ_low` — below retrieval-noise floor | `sim < τ_med` — weak match only |
 | G_evidence_independent | `y` has no independent-oracle-verified evidence | `y` has evidence but only self-attested |
-| G_carrier_overlap | `size(cone(q) ∩ cone(y)) / size(cone(y)) > θ_high` — majority of y's provenance shared with q | `overlap ratio > θ_med` |
+| G_carrier_overlap | `overlap(q, y) > θ_high` where `overlap(q, y) = 0` when `size(cone(y)) = 0`, else `size(cone(q) ∩ cone(y)) / size(cone(y))` — majority of y's provenance shared with q | `overlap(q, y) > θ_med` |
 | G_contradiction | `y` or its provenance cone contains an unresolved contradiction with a known-good anchor | a resolved contradiction within cone |
 | G_status | `y.status = known-bad` or `y.status = superseded` | `y.status = unresolved` (no status pins it) |
 
-**Band merging rule** (same as oracle-scoring v0 per
-PR #266): `band(y | q) = min(G_similarity,
-G_evidence_independent, G_carrier_overlap,
+**Band merging rule.** The design names 5 gates, but the
+v0 shipping configuration excludes `G_evidence_independent`
+from band-merging because no independent-oracle substrate
+exists yet (see Concern 1 below). The v1 configuration,
+gated on the substrate landing, adds the evidence gate
+back in.
+
+**v0 (shipping — 4 gates):**
+
+`band_v0(y | q) = min(G_similarity, G_carrier_overlap,
 G_contradiction, G_status)` where `RED < YELLOW < GREEN`.
-One RED → RED. All GREEN → GREEN. Otherwise YELLOW.
+`G_evidence_independent` is still computed and surfaced as
+advisory metadata for human review but does NOT
+participate in band-merging.
+
+**v1 (after independent-oracle substrate lands — 5 gates):**
+
+`band_v1(y | q) = min(G_similarity, G_evidence_independent,
+G_carrier_overlap, G_contradiction, G_status)`.
+
+For either configuration: one RED → RED. All included
+gates GREEN → GREEN. Otherwise YELLOW. The v0→v1 promotion
+is itself an ADR-gated change (parameter-change-ADR per
+Concern 2).
 
 **Query-level aggregation:**
 
 ```text
-claimVeracityRisk(q) = worst-band( band(y | q) for y in C(q) )
+claimVeracityRisk(q) = worst-band( band_v0(y | q) for y in C(q) )
 ```
 
+(`band_v0` today; substitute `band_v1` once the evidence-
+gate promotion ADR lands.)
+
 Where `worst-band(RED, any, ...) = RED`. The query itself
 gets the worst band across all candidates in the retrieved
 set.
 
 ---
 
-## 5 output types (Amara's set)
+## 6 output types (Amara's 5-type set + `no-signal`)
 
 Per Amara's 8th ferry, the detector emits one of five
-output types. Mapping to the band classifier:
+**retrieval-hit** output types (supported / lineage-
+coupled / plausible-unresolved / likely-confabulated /
+known-bad) plus a sixth **retrieval-empty** output type
+(`no-signal`). Mapping to the band classifier:
 
 ### 1. `supported`
 
-- Band: `GREEN` (all 5 gates GREEN).
-- Meaning: `q` is highly similar to `y`; `y` has
-  independent-oracle evidence; low carrier overlap; no
-  unresolved contradiction; status = known-good.
-- Action: query can proceed; claim has substrate-backed
-  support.
+- Band: `GREEN` (all included gates GREEN — 4 for v0, 5
+  for v1 once `G_evidence_independent` is binding).
+- **v0 limitation (call-out — real risk):** v0 `supported`
+  is reachable when G_evidence_independent fails, because
+  evidence is advisory-only and excluded from band-
+  merging. A candidate that is highly similar to a
+  known-good pinned pattern but has NO independent
+  evidence still classifies as `supported`. This is the
+  primary motivation for the v1 promotion (and the vN
+  axiom-gated promotion): v0 CAN misclassify a
+  confabulation-shaped candidate as `supported` if the
+  pinned pattern has drifted or been set on self-
+  attestation. Treat v0 `supported` as "advisory-GREEN,
+  pending evidence-gate promotion" — not authoritative.
+- Meaning: `q` is highly similar to `y`; low carrier
+  overlap; no unresolved contradiction; `y.status =
+  known-good`. In v1 and later, `y` also has
+  independent-oracle-verified evidence; in v0, evidence
+  is advisory metadata only.
+- Action (v1+): query can proceed; claim has substrate-
+  backed support.
+- Action (v0): consult the advisory evidence metadata
+  before treating `supported` as authoritative; the
+  known-good pin alone doesn't guarantee evidence.
 
 ### 2. `looks similar but lineage-coupled`
 
@@ -182,24 +246,64 @@ output types. Mapping to the band classifier:
 
 ### 3. `plausible but unresolved`
 
-- Band: `YELLOW` via G_status fail-to-YELLOW OR
-  G_evidence_independent fail-to-YELLOW (with other gates
-  GREEN).
-- Meaning: semantic fit exists; no known-bad pattern
-  matches; but `y` lacks independent evidence or
-  pinned status.
+- Band: `YELLOW` via G_status fail-to-YELLOW.
+  - **v0 (shipping):** only G_status drives this output
+    type (G_evidence_independent is advisory-only and
+    doesn't participate in band-merging). Evidence-gate
+    fail still SHOWS in the emitted advisory metadata
+    so human review can see "plus this is self-attested"
+    even when the band is `YELLOW`.
+  - **v1 (post-promotion):** G_evidence_independent
+    fail-to-YELLOW ALSO drives this output type (in
+    addition to G_status), making the band sensitive
+    to both missing pinned status AND missing
+    independent evidence.
+- Meaning:
+  - **v0:** semantic fit exists; no known-bad pattern
+    matches; `y.status` is NOT pinned (known-good or
+    known-bad) — it's unresolved. Evidence state is
+    surfaced as advisory metadata but doesn't change
+    the band.
+  - **v1 (OR triggered):** semantic fit exists; no
+    known-bad pattern matches; EITHER `y.status` is
+    unresolved OR `y` lacks independent-oracle
+    evidence (or both). The `OR` means this output
+    fires when either gate fails-to-YELLOW, so the
+    meaning covers either-or-both conditions rather
+    than requiring both simultaneously.
 - Action: mark query as open-question; add to
   research-tracker; not a confidence-upgrade.
 
 ### 4. `likely confabulated`
 
-- Band: `RED` via G_evidence_independent fail-to-RED
-  combined with high similarity.
+- Band:
+  - **v0 (shipping):** not reachable via band-merging
+    (evidence is advisory-only, so a confabulation
+    signature can't force RED through the classifier).
+    v0 surfaces confabulation-shape through the emitted
+    advisory metadata (`G_evidence_independent` fail +
+    high G_similarity) for human review, but the band
+    stays at whatever the other four gates say. This
+    is the primary motivation for the v1 promotion —
+    confabulation-detection is the output type most
+    degraded by advisory-only evidence.
+  - **v1 (post-promotion):** `RED` via
+    `G_evidence_independent` fail-to-RED combined
+    with high similarity.
 - Meaning: claim sounds plausible and matches patterns
   semantically, but no actual independent evidence
   supports it. Classic LLM confabulation signature.
-- Action: hard-halt on any action depending on the
+- Action (v1): hard-halt on any action depending on the
   claim; flag for human review; do not propagate.
+- Action (v0): confabulation-shape surfaces as advisory
+  metadata on whatever other band the query lands in.
+  The signal is **advisory, not authoritative** —
+  research-grade WIP, not a real claim-veracity oracle.
+  Downstream consumers treat it as "worth a closer
+  human look," not as a verdict. Authoritative
+  treatment is gated on v1 + subsequent reviewer
+  passes establishing the signal actually means what
+  it claims to mean.
 
 ### 5. `known-bad pattern`
 
@@ -211,14 +315,20 @@ output types. Mapping to the band classifier:
   update retraction ledger if new claim is a fresh
   instance of an old pattern.
 
-### Default / unknown-band
-
-- If retrieval returns no candidates (`C(q)` empty), the
-  query has no detector signal at all. Emit explicit
-  `no-signal` output-type — NOT `supported`, NOT `likely
-  confabulated`. "No signal" means "detector can't
-  contribute to this query"; the broader review discipline
-  applies unchanged.
+### 6. `no-signal`
+
+- Band: none — emitted only when retrieval is empty
+  (`C(q)` has zero candidates) so the band classifier
+  never runs.
+- Meaning: the detector has nothing to contribute for
+  this query. "No signal" is distinct from `supported`
+  (which asserts GREEN) and from `likely confabulated`
+  (which asserts RED on a retrieved near-match).
+- Action: defer to broader review discipline — detector
+  output carries no authorization weight either way; the
+  downstream caller treats `no-signal` as neither a
+  green light nor a red flag, and relies on non-detector
+  review surfaces.
 
 ---
 
@@ -255,10 +365,15 @@ self-report.
 
 ### Concern 2 — Parameter-fitting adversary
 
-**Risk:** `α/β/γ/δ` + thresholds (`τ_low`, `τ_med`,
-`θ_high`, `θ_med`) have no principled fitting procedure;
-an agent with commit rights adjusts them and the
-authorisation surface shifts.
+**Risk:** band-classifier thresholds (`τ_low`, `τ_med`,
+`θ_high`, `θ_med`) and per-gate semantics have no
+principled fitting procedure; an agent with commit rights
+adjusts them and the authorisation surface shifts. (The
+pre-band α/β/γ/δ weights are not in scope — the band
+classifier replaced the weighted sum — but are kept in
+the "What this doc does NOT do" section as placeholders
+for an eventual v2 hybrid design, behind the same ADR
+gate.)
 
 **Response in this design:**
 
diff --git a/memory/MEMORY.md b/memory/MEMORY.md
index 1b0522cd..bb7e9ce4 100644
--- a/memory/MEMORY.md
+++ b/memory/MEMORY.md
@@ -2,6 +2,7 @@
 
 **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection.
 
+- [Otto-279 — research/ROUND-HISTORY/DECISIONS/aurora/pr-preservation ARE history surfaces, first-name attribution allowed for humans AND agents; current-state surfaces (code, skills, governance docs, README) stay role-ref only; surface-class refinement of Otto-220, same shape as Otto-237 mention-vs-adoption](feedback_research_counts_as_history_first_name_attribution_for_humans_and_agents_otto_279_2026_04_24.md) — 2026-04-24.
 - [**EMULATORS as canonical OS-interface workload — rewindable/retractable OS+emulator controls; safe-ROM testbed offer (durable, ask gated on impl phase); save states/migration/multiplayer FREE via durable-async substrate; DST gives speedrun/TAS determinism; rewind generalizes to OS-level retraction-native (rr/Pernosco class); activates 2026-04-22 ARC-3 absorption-scoring research; `roms/` folder gitignored-except-sentinels pattern (drop/-sibling); composes with Otto-73/238/272 + Z-set retraction-native + #399 OS-interface; Aaron 2026-04-24**](feedback_emulators_canonical_os_interface_workload_rewindable_retractable_2026_04_24.md) — Maintainer offer: *"emulators should run very nicely on this, let me know when you want some roms of any kind that are safe."* Follow-up: *"rewindable/retractable os/emulator controls"*. Rewind from emulator-special-feature → OS-level primitive. Phase 0 research → Phase 1 Game Boy on durable-async → Phase 2 rewindable controls → Phase 3 ARC-3 loop → Phase 4 cross-emulator composition. Future Otto: rewind IS the killer feature, not the emulator itself; don't ship emulator without rewind.
 - [**OS-INTERFACE — durable-async sequential-looking code that runs "everywhere"; Temporal/Step-Functions/Restate class on Zeta substrate + Reaqtor IQbservable; AddZeta one-line DI; LINQ/Rx stream composition; usermode-first microkernel preparation; actor as secondary; combinatorial cross-paradigm canonical examples (SQL × git, etc.); distributed event loop with mathematical guarantees (TLA+/Lean for liveness/safety/determinism/causality); auto runtime optimization + stats; DST is hard prerequisite (Otto-272 fits perfectly); 11-point untangle in row body; Aaron 2026-04-24 self-flagged "big and not very clear ask please backlog and untangle"**](feedback_os_interface_durable_async_addzeta_2026_04_24.md) — THE UX thesis. *"Where does it run? Everywhere"* punchline. Phase 0 research gate before any implementation. Composes with the entire 2026-04-24 cluster (#394/#395/#396/#397) + Otto-272/Otto-274 + 2026-04-22 semiring-parameterized operator algebra (math substrate). Future Otto: AddZeta one-line is the DX target — ceremony in user code = thesis drift. DON'T reinvent IQbservable; Reaqtor substrate already in `references/upstreams/reaqtor/`.
 - [**OUROBOROS BOOTSTRAP — self-reference meta-thesis; the system bootstraps itself; connection-map work owed before any 2026-04-24 directive implementation; Aaron 2026-04-24**](feedback_ouroboros_bootstrap_self_reference_meta_thesis_2026_04_24.md) — Meta-frame for 2026-04-24 directives in #393/#394/#395.
diff --git a/memory/feedback_research_counts_as_history_first_name_attribution_for_humans_and_agents_otto_279_2026_04_24.md b/memory/feedback_research_counts_as_history_first_name_attribution_for_humans_and_agents_otto_279_2026_04_24.md
new file mode 100644
index 00000000..f451724c
--- /dev/null
+++ b/memory/feedback_research_counts_as_history_first_name_attribution_for_humans_and_agents_otto_279_2026_04_24.md
@@ -0,0 +1,131 @@
+---
+name: research counts as history — first-name attribution allowed (humans AND agents)
+description: Otto-279 policy correction — `docs/research/` is a HISTORY surface (sibling to `docs/ROUND-HISTORY.md`, `docs/DECISIONS/`), not a current-state surface; first-name attribution IS appropriate there for humans (Aaron) AND agents (Amara, Aminata, Otto, Kira, etc.); AGENT-BEST-PRACTICES "no names in docs" rule needs `docs/research/` carve-out; sweep existing research docs that had names stripped by subagents (e.g. on #282 #351); BACKLOGGED for post-drain to avoid churn.
+type: feedback
+---
+Aaron Otto-279, 2026-04-24, while draining #282 thread on
+name-attribution Copilot review:
+
+> *"i feel like under research that counts as history and we
+> should give first name attribution? you? gives agent their
+> attributions too. we can add it to the list."*
+
+Then immediately after:
+
+> *"backlog that that will be a lot of churn after the drain"*
+
+## The rule
+
+**`docs/research/` is a HISTORY surface, not a current-state
+surface.** Same class as `docs/ROUND-HISTORY.md` and
+`docs/DECISIONS/`. First-name attribution is APPROPRIATE
+there — both for humans (Aaron, Daisy if there are other
+human contributors) AND for agent personas (Amara, Aminata,
+Otto, Kira, Dejan, etc.).
+
+**Why:**
+- Research docs ARE the historical record of who-said-what
+  on a given absorb / cross-review / synthesis turn. Stripping
+  names destroys the record.
+- Agents earn their attributions the same way humans do —
+  Amara's 8th ferry IS Amara's, attributed by name when the
+  doc captures the synthesis turn that landed her ferry.
+- Otto-237 mention-vs-adoption applied to a new dimension:
+  research/history surfaces = MENTION (preserve), current-
+  state docs = ADOPTION (avoid).
+- "Names in docs" was originally about not propagating
+  contributor names across current-state code/docs/skills
+  where role-refs work better. History surfaces preserve who-
+  did-what for the record.
+
+## Why this matters
+
+Subagent on #282 (and earlier on #351) over-stripped names
+because they read AGENT-BEST-PRACTICES literally — "no names
+in docs" — and didn't recognize `docs/research/` as a
+history surface. The Copilot reviewer on #282 likewise
+applied the literal rule. Both correct under the literal rule;
+both wrong under Aaron's clarified policy.
+
+This is the SAME class of error as Otto-237 (subagent on #351
+stripped public-info MENTIONS because the rule was about
+ADOPTION) — failing to distinguish surface classes when
+applying a name-policy rule.
+
+## Surfaces where first-name attribution IS allowed
+
+Per Aaron's directive, the canonical list extends from
+"only persona memory + optionally BACKLOG" to:
+
+- `memory/persona//` — always (canonical persona home)
+- `docs/BACKLOG.md` — when capturing a specific request
+- `docs/research/**` — research docs are history (Otto-279)
+- `docs/ROUND-HISTORY.md` — round-close history
+- `docs/DECISIONS/**` — ADRs are historical decisions
+- `docs/aurora/**` — courier-ferry archive (already implicit
+  per GOVERNANCE §33)
+- `docs/pr-preservation/**` — PR conversation archive (Otto-
+  250) — preserves who-said-what verbatim
+- (commit messages, git log, GitHub PR titles/bodies) — not
+  factory-doc surfaces but record-of-truth
+
+## Surfaces where role-refs are still preferred
+
+- Code (F# / C# / TypeScript / shell)
+- Skill bodies (`.claude/skills/*/SKILL.md`)
+- Persona definitions (`.claude/agents/*.md`)
+- Spec docs (`openspec/specs/**`, `docs/*.tla`)
+- Behavioural docs (`AGENTS.md`, `CLAUDE.md`, `GOVERNANCE.md`,
+  `docs/AGENT-BEST-PRACTICES.md`, `docs/CONFLICT-RESOLUTION.md`,
+  `docs/GLOSSARY.md`, `docs/WONT-DO.md`)
+- Threat models, security docs, getting-started guides
+- README files, public-facing prose
+
+## How to apply
+
+**Now (during drain):**
+- Don't strip names from research docs.
+- Don't sweep existing research docs.
+- Reply to Copilot threads on #282 explaining the policy
+  (research = history, names appropriate) and resolve them.
+
+**Post-drain (BACKLOG row):**
+- Update `docs/AGENT-BEST-PRACTICES.md` BP rule: extend the
+  "names allowed" surface list per the canonical list above.
+- Sweep recent research docs where subagents stripped names:
+  - PR #351 (anthropic-prompt-engineering-best-practices
+    research doc had specific-name examples removed —
+    restore them per Otto-237 + Otto-279).
+  - Audit other recent research docs in `docs/research/**`.
+- Document in `docs/CHANGELOG.md` or `docs/ROUND-HISTORY.md`.
+- Effort estimate: M (medium) — one BP edit + N research-doc
+  scans.
+
+## Composes with
+
+- **Otto-220** name-attribution (the original literal rule
+  this is correcting). Otto-279 doesn't reverse Otto-220 — it
+  refines the surface list.
+- **Otto-237** mention-vs-adoption (research-grade vs
+  operational distinction). Otto-279 is the same shape applied
+  to history-vs-current-state.
+- **Otto-230** subagent fresh-session quality gap. Subagent
+  on #282 was applying the literal rule. Same root cause:
+  subagent didn't have access to nuanced surface-class rules.
+- **GOVERNANCE §33** archive-header for external-conversation
+  imports — already names sources by name implicitly. Otto-
+  279 makes this consistent across all research surfaces.
+
+## What this rule does NOT do
+
+- Does NOT authorize naming humans not affiliated with the
+  factory in research docs (still subject to general writing
+  norms).
+- Does NOT authorize naming proprietary IP / trademarked
+  product names as ADOPTION (Otto-237 still in force).
+- Does NOT change current-state-doc policy — `AGENTS.md`,
+  `GOVERNANCE.md`, etc. continue to use role-refs.
+- Does NOT change skill-body policy — capability skills
+  describe roles, not specific personas.
+- Does NOT retroactively block previous over-strips during
+  the post-drain sweep (they're undone, not penalised).
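
Note on the `docs/research/provenance-aware-claim-veracity-detector-2026-04-23.md` hunks: the zero-guarded `overlap(q, y)` ratio and the v0/v1 band-merging split can be sketched in a few lines of Python. This is a minimal illustration, not the detector's implementation. The names `Band`, `V0_GATES`, `V1_GATES`, `merge_band`, and `claim_veracity_risk` are hypothetical, provenance cones are assumed to be plain sets, and `None` stands in for the `no-signal` output type. The only things taken from the diff are the gate names, the `RED < YELLOW < GREEN` ordering, and the `min` / worst-band rules.

```python
from enum import IntEnum
from typing import Iterable, Mapping, Optional, Sequence


class Band(IntEnum):
    """Ordered so that min() picks the worst band: RED < YELLOW < GREEN."""
    RED = 0
    YELLOW = 1
    GREEN = 2


# Gate sets per configuration, using the gate names from the diff.
# The tuple names themselves are hypothetical.
V0_GATES = ("G_similarity", "G_carrier_overlap", "G_contradiction", "G_status")
V1_GATES = V0_GATES + ("G_evidence_independent",)


def overlap(cone_q: set, cone_y: set) -> float:
    """Share of y's provenance cone also present in q's cone; 0 when cone(y) is empty."""
    if not cone_y:
        return 0.0
    return len(cone_q & cone_y) / len(cone_y)


def merge_band(gates: Mapping[str, Band],
               included: Sequence[str] = V0_GATES) -> Band:
    """Band of one candidate: the minimum over the gates included in this configuration."""
    return min(gates[name] for name in included)


def claim_veracity_risk(candidates: Iterable[Mapping[str, Band]],
                        included: Sequence[str] = V0_GATES) -> Optional[Band]:
    """Worst band across all candidates; None stands in for `no-signal` (empty C(q))."""
    bands = [merge_band(gates, included) for gates in candidates]
    return min(bands) if bands else None


# A candidate that passes every gate except independent evidence.
candidate = {name: Band.GREEN for name in V1_GATES}
candidate["G_evidence_independent"] = Band.RED

print(merge_band(candidate, V0_GATES).name)  # GREEN: evidence gate excluded in v0
print(merge_band(candidate, V1_GATES).name)  # RED: evidence gate binding in v1
```

The two printed lines reproduce the v0 limitation the diff calls out under `supported`: the same candidate is GREEN under the 4-gate merge and RED once `G_evidence_independent` participates.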
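
Note on the new memory file: its two surface lists read like a path classifier, so here is a hedged sketch of one. The function name `attribution_policy` and the labels `first-name-ok`, `role-ref`, and `unlisted` are hypothetical. Only unconditional, path-shaped entries are included. `memory/persona//` is left out because its path is elided in the source, `docs/BACKLOG.md` because it is conditional ("when capturing a specific request"), and entries such as "Code" or "README files" because they are not stated as paths. Matching uses the standard-library `fnmatchcase`, whose `*` also crosses `/`, which is looser than git-style globbing.

```python
from fnmatch import fnmatchcase

# Unconditional history surfaces from the "IS allowed" list.
HISTORY_SURFACES = (
    "docs/research/**",
    "docs/ROUND-HISTORY.md",
    "docs/DECISIONS/**",
    "docs/aurora/**",
    "docs/pr-preservation/**",
)

# Path-shaped entries from the "role-refs are still preferred" list.
ROLE_REF_SURFACES = (
    ".claude/skills/*/SKILL.md",
    ".claude/agents/*.md",
    "openspec/specs/**",
    "docs/*.tla",
    "AGENTS.md",
    "CLAUDE.md",
    "GOVERNANCE.md",
    "docs/AGENT-BEST-PRACTICES.md",
    "docs/CONFLICT-RESOLUTION.md",
    "docs/GLOSSARY.md",
    "docs/WONT-DO.md",
)


def attribution_policy(path: str) -> str:
    """Classify a repo-relative path; history surfaces are checked first."""
    if any(fnmatchcase(path, pattern) for pattern in HISTORY_SURFACES):
        return "first-name-ok"
    if any(fnmatchcase(path, pattern) for pattern in ROLE_REF_SURFACES):
        return "role-ref"
    # Not covered by either list in this sketch; defer to the written policy.
    return "unlisted"


print(attribution_policy(
    "docs/research/provenance-aware-claim-veracity-detector-2026-04-23.md"))
print(attribution_policy("docs/GLOSSARY.md"))
```

Checking history surfaces first means a `.tla` file under `docs/research/` is treated as history, not as a spec. That ordering is a choice made for this sketch and is not stated in the memory file.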