From 47ec0005b2d1ac54b8bc35579e982f4851bdf228 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Thu, 23 Apr 2026 20:48:39 -0400 Subject: [PATCH 1/3] =?UTF-8?q?research:=20memory=20reconciliation=20algor?= =?UTF-8?q?ithm=20=E2=80=94=20v0=20design=20(Amara=20Determinize=20L-effor?= =?UTF-8?q?t=20item)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Amara's 4th ferry (PR #221 absorb) centerpiece proposal: replace hand-maintained CURRENT-*.md distillations with generated views over typed memory facts. Her sketch was ~40 lines of Python; this is the design that downstream implementation follows. ~380 lines covering: - MemoryFact record schema (id / subject / predicate / object / source_kind / source_path / source_anchor / timestamp_utc / supersedes / priority / status / confidence / tags) - 6 schema invariants (at-most-one-active-per-canonical-key + monotone-timestamps-on-chain + retraction-leaves-trail + ...) - Canonical-key normalization rules (7 apply; 3 deliberately NOT applied to preserve distinctions) - Reconciliation pseudocode (group by canonical key, detect conflicts, follow supersession chains) - Conflict output format → CONTRIBUTOR-CONFLICTS.md rows - Rendering rules for CURRENT-.md + MEMORY.md - 5-phase incremental migration (schema adoption → generator prototype → mechanical backfill → cutover → LLM extraction) - CI integration hooks composing with rows #58, #59, #12 - Worked examples (MF-2026-04-23-001 "Aaron endorses deterministic reconciliation"; MF-2026-04-23-004 "Aaron grants full GitHub access") - 5 open questions for Phase 1 PR design decisions Composes with: - Otto-73 retractability-by-design foundation — MemoryFact status (active / superseded / retracted) is the retraction- native primitive at the memory substrate - PR #222 decision-proxy-evidence — consulted_memory_ids can now reference MemoryFact.id directly - PR #225 memory-reference-existence CI (row #59) — generated output preserves the invariant 
by construction - Zeta's ZSet algebra — MemoryFact records ARE Z-set entries at the memory layer; same primitive, different surface Addresses MEMORY.md cap-drift (Otto-70 snapshot-tool surfaced 58842 bytes vs. 24976-byte cap): a generated index can be bounded by construction (top-N most-recent, archive the rest). Not implementation. Research doc only. Downstream arc: schema adoption (S) → generator prototype off-CI (S-M) → mechanical backfill (M) → cutover with retractability (M) → LLM-assisted extraction (L research). Amara Determinize-stage: 3/5 (with this PR). ✓ Live-state-before-policy (PR #224) ✓ Memory reference-existence lint (PR #225) ✓ Memory reconciliation algorithm design (this PR) Remaining: - Generated CURRENT-*.md views (L; this doc's Phase 2) - Memory duplicate-title lint enforcement (partial via AceHack PR #12; graduates via batch-sync) Per Aaron Otto-73 retractability foundation: the design itself embodies the thesis — supersession + status + retraction make the memory layer's reconciliation deterministic, same primitive as Zeta's data layer. 
Co-Authored-By: Claude Opus 4.7 --- ...onciliation-algorithm-design-2026-04-24.md | 476 ++++++++++++++++++ 1 file changed, 476 insertions(+) create mode 100644 docs/research/memory-reconciliation-algorithm-design-2026-04-24.md diff --git a/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md b/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md new file mode 100644 index 00000000..83c53ab0 --- /dev/null +++ b/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md @@ -0,0 +1,476 @@ +# Memory reconciliation algorithm — design v0 + +**Date:** 2026-04-24 +**Status:** research proposal; v0 design ready for review + incremental implementation +**Stage:** Amara Determinize (L-effort item per PR #221 absorb) +**Companion:** Otto-73 retractability-by-design foundation memory +**Implementation arc:** this doc is design-only; implementation lands as separate PRs (schema adoption → migration tooling → generation tool → CI integration) across multiple rounds + +--- + +## Why this exists + +Amara's 4th courier ferry (PR #221 absorb) proposed replacing +hand-maintained prose-based `CURRENT-aaron.md` / `CURRENT-amara.md` +distillations with **generated views over typed memory facts**. + +Her sketch was a ~40-line Python prototype. This doc is the +design that downstream implementation follows: schema +semantics, normalization rules, conflict detection, rendering, +migration path from the existing prose corpus. + +The design also addresses the MEMORY.md cap-drift surfaced by +Otto-70's snapshot-pinning tool (58842 bytes vs. 24976-byte +cap per FACTORY-HYGIENE row #11). A generated index can be +bounded by construction (emit top-N most-relevant, archive +the rest). + +Composes with "deterministic reconciliation" naming (Otto-67 +endorsement): this IS the concrete reconciliation mechanism +for the memory layer. 
Also composes with Zeta's retraction- +native algebra — `MemoryFact` records with explicit +supersession + retraction status mirror Z-set algebraic +semantics at the memory substrate. + +--- + +## Scope + +### In scope + +- Typed `MemoryFact` record schema (fields + invariants) +- Canonical-key normalization rules (what makes two facts + "about the same thing") +- Priority / supersession / status semantics +- Conflict detection + surfacing +- Generated rendering rules for `CURRENT-.md` + and `MEMORY.md` index +- Migration path from existing prose memories +- CI integration hooks + +### Out of scope (future work) + +- Actual implementation language + tool (Python, F#, shell — + later decision; design is language-agnostic) +- Full backfill of the 391 existing per-user memories + + 44 in-repo memories into typed records +- LLM-based fact extraction (if needed for prose-to-fact + migration — separate research arc) +- Multi-maintainer consensus protocols (today: one + human maintainer + AI maintainers. Cross-human + consensus can be added when roster grows) + +### Guardrail principles + +- **Don't rewrite prior prose memories.** They're source- + of-truth for the facts they encode; typed records + extract facts FROM them, don't replace them. +- **Retractions leave trails.** Supersession is explicit + + dated; no silent rewrite. Honors Otto-73 retractability- + by-design discipline. +- **Generated views are DERIVED, not authoritative.** + `CURRENT-*.md` and `MEMORY.md` become generated; the + typed fact corpus is the source of truth. +- **Migration is incremental.** Land the schema first; + backfill mechanically where possible; retain prose for + facts too rich to compress. + +--- + +## Schema — `MemoryFact` record + +### Fields + +| Field | Type | Required | Semantics | +|---|---|---|---| +| `id` | string | yes | Globally unique fact ID (e.g., `MF-2026-04-23-001`) | +| `subject` | string | yes | Who the fact is about: `aaron` / `amara` / `otto` / `kenji` / ... 
/ `any` (factory-generic) | +| `predicate` | string | yes | Normalized verb: `prefers` / `delegates` / `forbids` / `endorses` / `retracted` / `supersedes` / ... | +| `object` | string | yes | Normalized claim text | +| `source_kind` | enum | yes | `memory` / `current` / `decision` / `backlog` / `conflict` / `verbatim-quote` | +| `source_path` | string | yes | File path the fact was extracted from | +| `source_anchor` | string | optional | Line number, section header, or hash for citation | +| `timestamp_utc` | ISO8601 | yes | When the fact was authored (not when extracted) | +| `supersedes` | string | optional | ID of fact this one supersedes (one-to-one) | +| `priority` | int | yes | Explicit override > current view > memory > archive (4 > 3 > 2 > 1) | +| `status` | enum | yes | `active` / `retracted` / `superseded` | +| `confidence` | enum | optional | `verbatim` / `paraphrase` / `inference` — how tight the extraction is | +| `tags` | list[string] | optional | Cross-cutting tags: `principle`, `authorization`, `register`, `ops`, `naming`, etc. | + +### Invariants + +1. `(subject, predicate, canonical_key(object))` is the + canonical key. Multiple facts with the same canonical + key form a version chain. +2. At most one fact per canonical key has `status: active` + at any given time. Others are `superseded` or `retracted`. +3. `supersedes` is a single-step back-pointer. Chain + traversal: follow `supersedes` until null. +4. `timestamp_utc` is monotone along a supersession chain + (newer supersedes older). +5. `retracted` status implies `supersedes` is set to the + previously-active fact (retraction creates a new + record, not an in-place edit). +6. `priority` breaks ties only among simultaneously- + active facts (shouldn't happen under invariant 2 but + provides a deterministic fallback). + +### Canonical-key normalization + +`canonical_key(object)` collapses minor variations so +facts-about-the-same-thing chain cleanly. + +Rules (applied in order): + +1. 
Lowercase all characters +2. Replace whitespace sequences with single space +3. Strip leading/trailing whitespace +4. Remove markdown emphasis markers (`**`, `*`, `_`, backticks) +5. Normalize smart quotes (`"` / `"` / `'` / `'`) to plain + ASCII (`"` / `'`) +6. Collapse repeated punctuation (`!!!` → `!`) +7. Strip trailing punctuation (`.`, `!`, `?`, `;`, `,`) + +Rules NOT applied (preserve these distinctions): + +- Word order — "Aaron prefers X" ≠ "X is Aaron's preference" + (different canonical keys; handle via separate fact + extraction, not normalization) +- Synonyms — "like" vs. "prefer" (lexically distinct; + collapsing requires LLM-assisted normalization, + out of scope for v0) +- Tense — "Aaron prefers X" vs. "Aaron preferred X" + (different tense = different time; preserve) + +### Example records + +```yaml +- id: MF-2026-04-23-001 + subject: aaron + predicate: endorses + object: deterministic reconciliation as canonical phrasing for operational closure + source_kind: memory + source_path: memory/feedback_deterministic_reconciliation_endorsed_naming_for_closure_gap_not_philosophy_gap_2026_04_23.md + timestamp_utc: 2026-04-23T20:45:00Z + supersedes: null + priority: 3 + status: active + confidence: verbatim + tags: [naming, principle, vocabulary] + +- id: MF-2026-04-23-004 + subject: aaron + predicate: grants + object: full GitHub access for AceHack + LFG, only restriction is don't increase spending without asking + source_kind: memory + source_path: memory/feedback_aaron_full_github_access_authorization_all_acehack_lfg_only_restriction_no_spending_increase_2026_04_23.md + timestamp_utc: 2026-04-23T21:30:00Z + supersedes: MF-2026-04-23-002 # superseding the prior Otto-23 partial grant + priority: 3 + status: active + confidence: verbatim + tags: [authorization, standing, github] +``` + +--- + +## Reconciliation algorithm + +Pseudocode (language-agnostic): + +``` +function reconcile(facts): + # Group by canonical key + by_key = {} + for f in facts: + k = 
(f.subject, f.predicate, canonical_key(f.object)) + by_key[k].append(f) + + # Per-key: pick the winner, detect conflicts + accepted = {} + conflicts = [] + for key, group in by_key.items(): + active = [f for f in group if f.status == "active"] + if len(active) == 0: + continue # all retracted/superseded + if len(active) > 1: + # multiple active with same key = invariant-2 violation + winner = max(active, key=lambda f: (f.priority, f.timestamp_utc)) + conflicts.append(ConflictRow(key, active, winner=winner)) + accepted[key] = winner + else: + accepted[key] = active[0] + + # Check version-chain consistency + for key, f in accepted.items(): + chain = follow_supersession(f, by_key[key]) + if chain_broken(chain): + conflicts.append(ConflictRow(key, chain, reason="broken chain")) + + return accepted, conflicts +``` + +### Conflict outputs + +Each conflict becomes a row in `docs/CONTRIBUTOR-CONFLICTS.md` +(the file Amara's 4th ferry noted is empty but should be used). +Row format: + +```markdown +### CONF--: / +- **Canonical key:** `::::` +- **Conflicting facts:** [MF-..., MF-...] +- **Winner (priority tiebreak):** MF-... +- **Reason:** invariant-2 violation | broken chain | explicit disagreement +- **Resolution:** pending | explicit-preference-recorded | escalated +- **Resolution evidence:** +``` + +Conflicts block the `CURRENT-*.md` generation if unresolved +— this is the "explicit-not-silent" discipline Amara +emphasized. A CI run that discovers unresolved conflicts +fails the generation job. + +--- + +## Rendering rules + +### `CURRENT-.md` generation + +Filter accepted facts by subject (`` or `any`), +sort by `(priority DESC, timestamp DESC)`, group by +`predicate`, render as markdown: + +```markdown +# CURRENT-.md — generated + +**Last generated:** +**Source corpus:** facts from docs/> +**Conflicts pending:** + +--- + +## + +- **** — source: [](), +- ... +``` + +Header states generation-time + source-corpus-size + +pending-conflict-count. 
The generator may refuse to emit +if `conflicts_pending > 0` and `--allow-conflicts` is not +set. + +### `MEMORY.md` index generation + +Accept facts where `source_kind == "memory"`; emit +newest-first list of `(source_path, first-sentence-of-object, tags)` +tuples. Cap at configurable size (default: 250 entries or 30KB, +whichever smaller — matches the FACTORY-HYGIENE row #11 cap with +headroom). + +Older entries move to dated archive files +`memory/MEMORY-ARCHIVE-YYYY-MM.md`. Ordering + link integrity +preserved across the archive boundary. + +--- + +## Migration path from existing prose corpus + +### Phase 1 — Schema adoption + worked example (S) + +- Land this research doc (current PR) +- Create `memory/facts/` directory seeded with 5-10 + manually-authored `MemoryFact` records as worked + examples (e.g., the "Aaron endorses deterministic + reconciliation" record shown above) +- Keep existing prose memories unchanged + +### Phase 2 — Generator prototype, off-CI (S-M) + +- Implement `tools/memory/reconcile.py` (or equivalent) + reading `memory/facts/*.yaml` + emitting + `memory/CURRENT-.md.generated` + + `memory/MEMORY.md.generated` (parallel output, not + replacing existing files yet) +- Land the tool + a research doc comparing generated + output against current hand-maintained files +- Do NOT overwrite existing files in this phase + +### Phase 3 — Mechanical backfill (M) + +- For each existing prose memory, extract 1-5 + `MemoryFact` records mechanically (parse frontmatter + `description` + `verbatim` quotes) +- Human-maintainer spot-check of backfill quality +- Cross-link: typed records cite their source prose + memory via `source_path` + +### Phase 4 — Cutover with retractability (M) + +- Move existing hand-maintained `CURRENT-*.md` to + archive (`CURRENT-aaron-archive-2026-04.md`); + retractability preserves the old versions +- Cutover the root `CURRENT-aaron.md` / `CURRENT-amara.md` + to generated output +- Same for `MEMORY.md` +- CI integration: fail if 
generated output drifts from + expected; conflict rows block generation + +### Phase 5 — Richer LLM-assisted extraction (L, research) + +- Use an LLM pass to extract additional facts from + prose that the mechanical parser missed +- Careful review discipline — not auto-merge; human + + peer review for each LLM extraction pass +- Establishes a richer fact-count; may surface additional + conflicts + +--- + +## CI integration hooks + +### Existing surfaces this composes with + +- FACTORY-HYGIENE row #58 (memory-index-integrity CI) — + same-commit pairing of memory changes + MEMORY.md + updates. Generated MEMORY.md preserves this invariant + by construction; CI stays green. +- FACTORY-HYGIENE row #59 (memory-reference-existence) — + link targets must resolve. Generated output can be + validated by the same tool; CI stays green. +- AceHack PR #12 (memory-index-duplicates) — no duplicate + link targets. Generated output deduplicates by + construction; CI stays green. +- PR #222 decision-proxy-evidence — `consulted_memory_ids` + can now reference `MemoryFact.id` directly for + tighter audit. + +### New CI hook for this work + +- `memory-reconcile-generation.yml` — on PR touching + `memory/facts/*.yaml` or the generator, re-run + generation; fail if generated output ≠ committed + output (similar to OpenAPI-spec-diff style check). + +### Ordering of hooks + +1. memory-index-integrity (row #58) — same-commit +2. memory-reference-existence (row #59) — refs resolve +3. memory-index-duplicates (AceHack #12) — no dups +4. memory-reconcile-generation (new) — generated output + matches committed +5. memory-reconcile-conflict-check (new) — no unresolved + conflicts + +Steps 4 + 5 are future work; 1-3 already cover the +prose-layer invariants. + +--- + +## Relationship to existing substrate + +### With Otto-73 retractability-by-design + +The `MemoryFact.status` field (active / superseded / +retracted) is exactly the retraction-native primitive at +the memory substrate. 
Each record is a signed delta; +supersession chains encode history; the reconciliation +algorithm is a deterministic fold over the deltas. +Zeta's ZSet algebra applied to memory. + +### With Amara's 4 ferries + +Amara's 4th ferry explicitly proposed this algorithm; +earlier ferries established the drift classes it +addresses: + +- Otto-24 (PR #196) operational gap — memory-index lag + (NSA-001) now captured as canonical-key conflict + in the fact corpus +- Otto-54 (PR #211) ZSet semantics — the algebraic + framework (Z-sets + retraction) that this memory + schema inherits +- Otto-59 (PR #219) decision-proxy technical review — + `consulted_memory_ids` field needs stable memory IDs; + MemoryFact.id provides them +- Otto-67 (PR #221) memory drift alignment — this is + the concrete algorithm her report proposed + +### With Zeta's core algebra + +`MemoryFact` records ARE Z-set entries at the memory +layer: + +- `(subject, predicate, canonical_key(object))` = the Z-set + key +- Priority + status + timestamp = the "weight" dimension + (non-integer; resembles signed-delta semantics) +- Reconciliation = the `distinct` operator at the + memory level, clamping to at-most-one-active per key +- Conflict detection = invariant violation surfacing + (the same discipline Zeta's algebra-owner enforces + for the code layer) + +This is not coincidence. Aaron's Otto-73 thesis: +retractability is design at every layer of the factory. +This doc operationalizes it at the memory layer. + +--- + +## What this design is NOT + +- **Not a commitment to one implementation language.** + Python, F#, shell — later decision. Design is + language-agnostic. +- **Not a requirement to migrate all 391 existing + per-user memories at once.** Incremental backfill, + prose retained as source-of-truth. +- **Not authorization to overwrite existing + CURRENT-*.md files.** Cutover is Phase 4; earlier + phases generate `.generated` companions. 
+- **Not a commitment to LLM-assisted extraction.** + Phase 5 is research-grade; manual + mechanical + parsing covers the main backfill. +- **Not a replacement for decision-proxy-evidence + records.** Evidence records capture per-decision + context; MemoryFacts capture long-lived claims. + Different surfaces; they compose via ID references. +- **Not a retraction of prose memory discipline.** + Prose stays; it's the source material from which + typed records extract. The factory's thought-layer + continues in prose. + +--- + +## Open questions for follow-up rounds + +1. **Language choice** — Python (Amara's prototype), + F# (consistent with Zeta), shell (matches existing + tools/hygiene/ pattern)? +2. **Facts directory location** — `memory/facts/` under + the existing memory tree, or separate surface? +3. **Conflict-row automation boundary** — CI-generated + rows, or human-required fields for resolution? +4. **Archive boundary policy** — date-based (>90 days), + count-based (keep 250 most-recent), relevance-scored + (keep most-cited), or hybrid? +5. **Extraction granularity for mechanical backfill** — + one fact per memory frontmatter, or mine the body + for multi-fact patterns? + +These are Phase 1 PR design decisions, not blockers for +the research-doc approval. + +--- + +## Attribution + +Amara (external AI maintainer) proposed the algorithm +in Otto-67 (PR #221 ferry). Otto (loop-agent PM hat, +Otto-74) authored this design doc. Aaron's Otto-73 +retractability-by-design insight grounds the schema's +supersession semantics. Kenji (Architect) is queued for +synthesis on Phase 1 scope. Downstream implementation +follows this design across multiple PRs on the Amara +Determinize + Govern + Assure roadmap. 
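Reviewer sketch (not part of the patch): the seven canonical-key normalization rules in the design doc above could look roughly like this in Python. This is a minimal illustration under stated assumptions, not the eventual implementation. The punctuation set used for rule 6 and the `_SMART_QUOTES` table name are assumptions; the design only gives `!!!` → `!` as an example, and the curly-quote codepoints (U+201C, U+201D, U+2018, U+2019) are the standard Unicode ones.

```python
import re

# Assumption: only these four curly-quote codepoints are normalized (rule 5).
_SMART_QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # left/right double quotation marks
    "\u2018": "'", "\u2019": "'",   # left/right single quotation marks
})


def canonical_key(obj: str) -> str:
    """Apply the seven normalization rules in the order the design lists them."""
    s = obj.lower()                          # 1. lowercase
    s = re.sub(r"\s+", " ", s)               # 2. collapse whitespace runs
    s = s.strip()                            # 3. trim both ends
    s = re.sub(r"\*\*|[*_`]", "", s)         # 4. drop markdown emphasis markers
    s = s.translate(_SMART_QUOTES)           # 5. curly quotes -> ASCII quotes
    s = re.sub(r"([.!?;,])\1+", r"\1", s)    # 6. collapse repeats (assumed set)
    s = s.rstrip(".!?;,")                    # 7. strip trailing punctuation
    return s


# Word order is deliberately NOT normalized, so these keys stay distinct.
same_fact = canonical_key("  **Deterministic**   Reconciliation!!! ")
reordered = canonical_key("Reconciliation, deterministic")
```

Two caveats follow from applying the rules strictly in the listed order: rule 4 removes every underscore, including ones inside identifiers such as `source_path`, and rules 4 and 7 can leave a stray space behind (for example `a ** b` or `foo .`), so an implementation may want a second whitespace pass. Both are open design points rather than settled behavior.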
From 6f895f1042da09e4ff9513312076d6800244dcb6 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Sat, 25 Apr 2026 01:08:41 -0400 Subject: [PATCH 2/3] =?UTF-8?q?drain(#226=20P0+P1=C3=972+P2=C3=973=20Codex?= =?UTF-8?q?):=20retraction=20semantics=20+=20cap=20consistency=20+=20smart?= =?UTF-8?q?-quote=20+=20pseudocode=20init=20+=20present-with-schema?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Six substantive Codex findings on memory-reconciliation algorithm doc: P0 (line 202) — retraction semantics inconsistency: reconcile() filtered by status == 'active' which masked the intent. Added explicit retraction-semantics docstring: - Facts transition via explicit FactRetracted / FactSuperseded events; never deleted, only marked. - reconcile() ignores retracted/superseded for liveness but STILL considers them when checking version-chain integrity. - Updated chain check to operate over ALL facts in the group (including retracted/superseded), not just active ones — chain integrity needs the full history. P1 (line 187) — stable fact identity vs grouping key: Distinguished fact ID (stable identity, unique) from (subject, predicate, canonical_key) grouping tuple (which multiple facts can share under invariant-2's collision case). Comment makes the distinction explicit. P1 (line 270) — MEMORY.md cap inconsistency: Default 30KB exceeded FACTORY-HYGIENE row #11 cap (24,976 bytes). Updated to 24,000 bytes — strictly under the hard cap with ~1KB headroom for header/annotation overhead. P2 (line 130) — smart-quote example ambiguous: Both sides showed plain ASCII ('"' / "'"). Replaced with explicit Unicode codepoint references (U+201C/D for double, U+2018/9 for single) so the rule is unambiguous in plain-ASCII source. P2 (line 186) — pseudocode by_key[k] used before init: Switched to defaultdict(list); added a comment noting the equivalence to 'if k not in by_key: by_key[k] = []' for non-Python implementers. 
P2 (line 216) — CONTRIBUTOR-CONFLICTS.md 'empty' wording: File is present and contains a schema; just unpopulated. Updated text to 'present-with-schema-but-unpopulated; this design starts populating it via the generator'. --- ...onciliation-algorithm-design-2026-04-24.md | 45 ++++++++++++++----- 1 file changed, 33 insertions(+), 12 deletions(-) diff --git a/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md b/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md index 83c53ab0..01120a83 100644 --- a/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md +++ b/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md @@ -126,8 +126,9 @@ Rules (applied in order): 2. Replace whitespace sequences with single space 3. Strip leading/trailing whitespace 4. Remove markdown emphasis markers (`**`, `*`, `_`, backticks) -5. Normalize smart quotes (`"` / `"` / `'` / `'`) to plain - ASCII (`"` / `'`) +5. Normalize smart/curly quotes (left-double U+201C, right- + double U+201D, left-single U+2018, right-single U+2019) + to plain ASCII straight quotes (`"` and `'`) 6. Collapse repeated punctuation (`!!!` → `!`) 7. Strip trailing punctuation (`.`, `!`, `?`, `;`, `,`) @@ -180,19 +181,34 @@ Pseudocode (language-agnostic): ``` function reconcile(facts): - # Group by canonical key - by_key = {} + # Group by canonical key. Use defaultdict(list) so the + # first append() initialises the bucket; equivalent to + # `if k not in by_key: by_key[k] = []` then append. + by_key = defaultdict(list) for f in facts: + # Stable fact identity is (id) — fact-IDs are unique. + # The (subject, predicate, canonical_key(object)) tuple + # is the *grouping* key (multiple distinct facts may + # share it under invariant #2's collision case below); + # do NOT confuse the two. k = (f.subject, f.predicate, canonical_key(f.object)) by_key[k].append(f) - # Per-key: pick the winner, detect conflicts + # Per-key: pick the winner, detect conflicts. 
accepted = {} conflicts = [] for key, group in by_key.items(): + # Retraction semantics: a fact is "live" if its + # latest version (by supersession chain + timestamp) + # has status == "active". Status transitions to + # "retracted" or "superseded" via explicit + # FactRetracted / FactSuperseded events; we never + # delete records, only mark them. The reconcile() + # filter below ignores retracted/superseded forms but + # still considers them when checking chain integrity. active = [f for f in group if f.status == "active"] if len(active) == 0: - continue # all retracted/superseded + continue # all retracted/superseded — key not live if len(active) > 1: # multiple active with same key = invariant-2 violation winner = max(active, key=lambda f: (f.priority, f.timestamp_utc)) @@ -201,7 +217,9 @@ function reconcile(facts): else: accepted[key] = active[0] - # Check version-chain consistency + # Check version-chain consistency over ALL facts in the + # group (including retracted/superseded), not just active + # ones — chain integrity needs the full history. for key, f in accepted.items(): chain = follow_supersession(f, by_key[key]) if chain_broken(chain): @@ -213,7 +231,9 @@ function reconcile(facts): ### Conflict outputs Each conflict becomes a row in `docs/CONTRIBUTOR-CONFLICTS.md` -(the file Amara's 4th ferry noted is empty but should be used). +(the file Amara's 4th ferry noted is present-with-schema-but- +unpopulated; this design starts populating it via the +generator). Row format: ```markdown @@ -264,10 +284,11 @@ set. ### `MEMORY.md` index generation Accept facts where `source_kind == "memory"`; emit -newest-first list of `(source_path, first-sentence-of-object, tags)` -tuples. Cap at configurable size (default: 250 entries or 30KB, -whichever smaller — matches the FACTORY-HYGIENE row #11 cap with -headroom). +newest-first list of `(source_path, first-sentence-of-object, +tags)` tuples. 
Cap at configurable size (default: 250 entries +or 24,000 bytes — strictly under the FACTORY-HYGIENE row #11 +24,976-byte hard cap, with ~1KB headroom for any header / +index annotations the generator writes around the entry list). Older entries move to dated archive files `memory/MEMORY-ARCHIVE-YYYY-MM.md`. Ordering + link integrity From 32721c01b8c69be7dda056275bed5485b4c4bce0 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Sat, 25 Apr 2026 01:18:41 -0400 Subject: [PATCH 3/3] drain(#226 P1+P2 Codex): chain-head liveness + chain-integrity for retired groups MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit P1 (line 210) — chain-HEAD liveness, not 'any active in group': The reconcile filter marked a key live whenever any record in the group had status==active. That's wrong — a key with active(t=1) → retracted(t=2) has an earlier active record but the HEAD of the supersession chain is retracted, so the key is not live. Fix: `follow_supersession_to_head(group)` walks supersedes-pointers to find the most-recent record; liveness keyed on its status == active. P2 (line 224) — chain integrity for fully retired groups: The chain-integrity check looped over `accepted.items()`, which only included keys with at least one active record. Retired groups (all members retracted/superseded) could have broken chains and we'd silently miss them. Fix: loop over `by_key.items()` (all groups, including fully retired ones). Chain integrity is independent of liveness. 
--- ...onciliation-algorithm-design-2026-04-24.md | 52 +++++++++++-------- 1 file changed, 30 insertions(+), 22 deletions(-) diff --git a/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md b/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md index 01120a83..092bd8f5 100644 --- a/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md +++ b/docs/research/memory-reconciliation-algorithm-design-2026-04-24.md @@ -198,30 +198,38 @@ function reconcile(facts): accepted = {} conflicts = [] for key, group in by_key.items(): - # Retraction semantics: a fact is "live" if its - # latest version (by supersession chain + timestamp) - # has status == "active". Status transitions to + # Retraction semantics: a key is "live" if the HEAD + # of its supersession chain has status == "active". + # The chain head — not "any active record in the + # group" — determines liveness, because a key with + # active(t=1) → retracted(t=2) is NOT live (head is + # retracted) even though an earlier active record + # exists in the group. Status transitions to # "retracted" or "superseded" via explicit # FactRetracted / FactSuperseded events; we never - # delete records, only mark them. The reconcile() - # filter below ignores retracted/superseded forms but - # still considers them when checking chain integrity. - active = [f for f in group if f.status == "active"] - if len(active) == 0: - continue # all retracted/superseded — key not live - if len(active) > 1: - # multiple active with same key = invariant-2 violation - winner = max(active, key=lambda f: (f.priority, f.timestamp_utc)) - conflicts.append(ConflictRow(key, active, winner=winner)) - accepted[key] = winner - else: - accepted[key] = active[0] - - # Check version-chain consistency over ALL facts in the - # group (including retracted/superseded), not just active - # ones — chain integrity needs the full history. 
- for key, f in accepted.items(): - chain = follow_supersession(f, by_key[key]) + # delete records, only mark them. + chain_head = follow_supersession_to_head(group) + if chain_head is not None and chain_head.status == "active": + # Multiple active records that all map to the same + # canonical key (invariant-2 violation) surface as a + # ConflictRow; chain head is the winner. + siblings_active = [f for f in group + if f.status == "active" + and f.id != chain_head.id] + if siblings_active: + conflicts.append(ConflictRow( + key, [chain_head, *siblings_active], winner=chain_head)) + accepted[key] = chain_head + # else: key is fully retired (chain head retracted or + # superseded with no successor). Don't mark live; + # chain integrity is still validated below. + + # Check version-chain consistency over ALL grouped keys + # — including those whose chain head is retracted or + # superseded — not just `accepted`. Chain integrity is + # a property of the history, independent of liveness. + for key, group in by_key.items(): + chain = follow_supersession_full(group) if chain_broken(chain): conflicts.append(ConflictRow(key, chain, reason="broken chain"))