💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 14f99545e9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
Adds a research/design document proposing a typed MemoryFact schema plus a deterministic reconciliation + rendering pipeline intended to replace hand-maintained CURRENT-*.md / MEMORY.md views with generated outputs.
Changes:
- Defines a v0 `MemoryFact` schema (fields, invariants, canonical-key normalization).
- Specifies reconciliation pseudocode, conflict surfacing, and rendering rules for generated views.
- Outlines a phased migration plan and CI hook ordering for eventual implementation.
…rows (Amara Govern 1/2) (#227)
* govern: CONTRIBUTOR-CONFLICTS backfill, 3 resolved session-observed rows (Amara Govern-stage 1/2)
Amara's 4th ferry (PR #221 absorb) named populating docs/CONTRIBUTOR-CONFLICTS.md as the Govern-stage action: the schema has existed since PR #166, but the Resolved table was empty despite multiple session-observed contributor-level disagreements that closed with evidence.
Backfills three genuine contributor-level conflicts observed this session (narrow scope: not maintainer-directives, which are out-of-scope per the schema's contributor-level disagreement definition):
- CC-001: Copilot (PR reviewer) vs Aaron on no-name-attribution rule scope (history-file exemption). Resolved in Aaron's favor via Otto-52 clarification; policy BACKLOG row filed in PR #210.
- CC-002: Amara (4th ferry) vs Otto (pre-Otto-67 pattern) on Stabilize-vs-keep-opening-new-frames. Resolved in Amara's favor; Otto pivoted at Otto-68 to execute her roadmap; 3/3 Stabilize + 3/5 Determinize landed via PRs #222/#223/#224/#225/#226.
- CC-003: Codex (PR reviewer) vs Otto (initial framing) on citing-absent-artifacts. Resolved in Codex's favor via fix commits 29872af/1c7f97d on PRs #207/#208; the pattern is now discipline (distinguish merged-on-main from proposed-in-PR-open).
All three rows follow the schema's 8-column layout and include the full Resolution-so-far / Scope / Source cells the schema requires. No retroactive Aaron→human-maintainer sweep of prior rows; schema's rule 1 (resolutions are additive) honored.
This is 1/2 of Amara's Govern-stage work. 2/2 is the authority-envelope + escalation-path ADR (deferred, M-effort). Part of Amara's 4-stage remediation roadmap (Stabilize → Determinize → Govern → Assure). Otto-75 tick.
* govern: annotate CC-002/CC-003 Source cells (PR #221/#219 open, not yet on main)
Applies CC-003's own discipline (cite-as-open-not-landed) to CC-002 and CC-003 themselves. Both rows cited `docs/aurora/2026-04-23-amara-memory-drift-*` and `docs/aurora/2026-04-23-amara-decision-proxy-*` without the "not yet on main" marker; the files are added by PRs #221 / #219, which are still open. Drain for PR #227 review threads PRRT_kwDOSF9kNM59RFIx and PRRT_kwDOSF9kNM59RFJE (dangling file refs at lines 132, 133).
* fix: markdownlint auto-fixes on governance doc
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…y + smart-quote + pseudocode init + present-with-schema
Six substantive Codex findings on memory-reconciliation algorithm doc:
- P0 (line 202): retraction semantics inconsistency. reconcile() filtered by status == 'active', which masked the intent. Added explicit retraction-semantics docstring:
  - Facts transition via explicit FactRetracted / FactSuperseded events; never deleted, only marked.
  - reconcile() ignores retracted/superseded for liveness but STILL considers them when checking version-chain integrity.
  - Updated chain check to operate over ALL facts in the group (including retracted/superseded), not just active ones, because chain integrity needs the full history.
- P1 (line 187): stable fact identity vs grouping key. Distinguished fact ID (stable identity, unique) from the (subject, predicate, canonical_key) grouping tuple (which multiple facts can share under invariant-2's collision case). Comment makes the distinction explicit.
- P1 (line 270): MEMORY.md cap inconsistency. Default 30KB exceeded FACTORY-HYGIENE row #11 cap (24,976 bytes). Updated to 24,000 bytes, strictly under the hard cap with ~1KB headroom for header/annotation overhead.
- P2 (line 130): smart-quote example ambiguous. Both sides showed plain ASCII ('"' / "'"). Replaced with explicit Unicode codepoint references (U+201C/D for double, U+2018/9 for single) so the rule is unambiguous in plain-ASCII source.
- P2 (line 186): pseudocode by_key[k] used before init. Switched to defaultdict(list); added a comment noting the equivalence to 'if k not in by_key: by_key[k] = []' for non-Python implementers.
- P2 (line 216): CONTRIBUTOR-CONFLICTS.md 'empty' wording. File is present and contains a schema; just unpopulated. Updated text to 'present-with-schema-but-unpopulated; this design starts populating it via the generator'.
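The grouping step behind the P1 (line 187) and P2 (line 186) findings can be sketched as follows. This is a minimal illustration, not the pseudocode from the design doc: the plain-dict record shape and the `group_by_key` name are assumptions made for the example.

```python
from collections import defaultdict


def group_by_key(facts):
    """Group facts by the (subject, predicate, canonical_key) tuple.

    Each fact keeps its own unique `id`; the grouping tuple may be shared
    by several facts (the invariant-2 collision case).
    """
    # defaultdict(list) is equivalent to:
    #   if k not in by_key: by_key[k] = []
    by_key = defaultdict(list)
    for fact in facts:
        k = (fact["subject"], fact["predicate"], fact["canonical_key"])
        by_key[k].append(fact)
    return by_key


# Hypothetical records: dicts standing in for MemoryFact entries.
example_facts = [
    {"id": "f1", "subject": "s", "predicate": "p", "canonical_key": "k"},
    {"id": "f2", "subject": "s", "predicate": "p", "canonical_key": "k"},
    {"id": "f3", "subject": "s", "predicate": "q", "canonical_key": "k"},
]
groups = group_by_key(example_facts)
```

Here `f1` and `f2` land in the same group while keeping distinct IDs, which is the identity-versus-grouping distinction the review asked for.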
14f9954 to 16b6ccf
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16b6ccf67d
…ize L-effort item)
Amara's 4th ferry (PR #221 absorb) centerpiece proposal: replace hand-maintained CURRENT-*.md distillations with generated views over typed memory facts. Her sketch was ~40 lines of Python; this is the design that downstream implementation follows.
~380 lines covering:
- MemoryFact record schema (id / subject / predicate / object / source_kind / source_path / source_anchor / timestamp_utc / supersedes / priority / status / confidence / tags)
- 6 schema invariants (at-most-one-active-per-canonical-key + monotone-timestamps-on-chain + retraction-leaves-trail + ...)
- Canonical-key normalization rules (7 apply; 3 deliberately NOT applied to preserve distinctions)
- Reconciliation pseudocode (group by canonical key, detect conflicts, follow supersession chains)
- Conflict output format → CONTRIBUTOR-CONFLICTS.md rows
- Rendering rules for CURRENT-<maintainer>.md + MEMORY.md
- 5-phase incremental migration (schema adoption → generator prototype → mechanical backfill → cutover → LLM extraction)
- CI integration hooks composing with rows #58, #59, #12
- Worked examples (MF-2026-04-23-001 "Aaron endorses deterministic reconciliation"; MF-2026-04-23-004 "Aaron grants full GitHub access")
- 5 open questions for Phase 1 PR design decisions
Composes with:
- Otto-73 retractability-by-design foundation: MemoryFact status (active / superseded / retracted) is the retraction-native primitive at the memory substrate
- PR #222 decision-proxy-evidence: consulted_memory_ids can now reference MemoryFact.id directly
- PR #225 memory-reference-existence CI (row #59): generated output preserves the invariant by construction
- Zeta's ZSet algebra: MemoryFact records ARE Z-set entries at the memory layer; same primitive, different surface
Addresses MEMORY.md cap-drift (Otto-70 snapshot-tool surfaced 58842 bytes vs. 24976-byte cap): a generated index can be bounded by construction (top-N most-recent, archive the rest).
Not implementation. Research doc only.
Downstream arc: schema adoption (S) → generator prototype off-CI (S-M) → mechanical backfill (M) → cutover with retractability (M) → LLM-assisted extraction (L research).
Amara Determinize-stage: 3/5 (with this PR).
✓ Live-state-before-policy (PR #224)
✓ Memory reference-existence lint (PR #225)
✓ Memory reconciliation algorithm design (this PR)
Remaining:
- Generated CURRENT-*.md views (L; this doc's Phase 2)
- Memory duplicate-title lint enforcement (partial via AceHack PR #12; graduates via batch-sync)
Per Aaron Otto-73 retractability foundation: the design itself embodies the thesis. Supersession + status + retraction make the memory layer's reconciliation deterministic, same primitive as Zeta's data layer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
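For orientation, the 13 fields named in the commit message could be carried by a record type along these lines. The field names and the three status values come from the message above; the types, defaults, validation, and the example `source_*` values are assumptions for illustration, not the schema from the design doc.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple

# Status values named in the commit message.
VALID_STATUSES = {"active", "superseded", "retracted"}


@dataclass
class MemoryFact:
    # Field names follow the commit message; types and defaults are assumed.
    id: str
    subject: str
    predicate: str
    object: str
    source_kind: str
    source_path: str
    source_anchor: str
    timestamp_utc: str                 # assumed ISO-8601 text
    supersedes: Optional[str] = None   # assumed: id of the fact this one replaces
    priority: int = 0
    status: str = "active"
    confidence: float = 1.0
    tags: Tuple[str, ...] = ()

    def __post_init__(self):
        # Reject statuses outside the three named in the source.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


# Loosely modeled on the MF-2026-04-23-001 worked example; the source_*
# values and timestamp below are placeholders, not real paths or data.
example = MemoryFact(
    id="MF-2026-04-23-001",
    subject="Aaron",
    predicate="endorses",
    object="deterministic reconciliation",
    source_kind="example-kind",
    source_path="example/path.md",
    source_anchor="example-anchor",
    timestamp_utc="2026-04-23T00:00:00Z",
)
```

A plain (non-frozen) dataclass is used here only to keep the sketch short; whether records should be immutable is a design decision the source does not state.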
…tired groups
- P1 (line 210): chain-HEAD liveness, not 'any active in group'. The reconcile filter marked a key live whenever any record in the group had status==active. That's wrong: a key with active(t=1) → retracted(t=2) has an earlier active record, but the HEAD of the supersession chain is retracted, so the key is not live. Fix: `follow_supersession_to_head(group)` walks supersedes-pointers to find the most-recent record; liveness is keyed on its status == active.
- P2 (line 224): chain integrity for fully retired groups. The chain-integrity check looped over `accepted.items()`, which only included keys with at least one active record. Retired groups (all members retracted/superseded) could have broken chains and we'd silently miss them. Fix: loop over `by_key.items()` (all groups, including fully retired ones). Chain integrity is independent of liveness.
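The two fixes can be sketched together. Only the name `follow_supersession_to_head` comes from the commit message; the dict record shape, the single-linear-chain assumption, and the `is_live` / `broken_links` helpers are hypothetical, and the dangling-pointer test is just one plausible integrity check.

```python
def follow_supersession_to_head(group):
    """Return the record at the head of a group's supersession chain.

    Assumes one linear chain per group: each record supersedes at most one
    other record, and at most one record supersedes any given record.
    """
    ids = {f["id"] for f in group}
    # Map "superseded id" -> "record that supersedes it".
    successor = {f["supersedes"]: f for f in group if f.get("supersedes") in ids}
    # A root supersedes nothing inside this group.
    roots = [f for f in group if f.get("supersedes") not in ids]
    if not roots:
        raise ValueError("no chain root found (cycle in supersedes pointers?)")
    current = roots[0]
    for _ in range(len(group)):
        nxt = successor.get(current["id"])
        if nxt is None:
            return current
        current = nxt
    raise ValueError("supersession chain does not terminate")


def is_live(group):
    """A key is live only if the chain HEAD is active."""
    return follow_supersession_to_head(group)["status"] == "active"


def broken_links(by_key):
    """Check every group, live or fully retired, for dangling pointers."""
    problems = []
    for key, group in by_key.items():
        ids = {f["id"] for f in group}
        for f in group:
            if f.get("supersedes") and f["supersedes"] not in ids:
                problems.append((key, f["id"]))
    return problems
```

With an active record at t=1 superseded by a retracted record at t=2, `is_live` returns `False` even though an active record exists in the group, which is the case the P1 finding describes.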
16b6ccf to 32721c0
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32721c01b8
- P1 (line 222): invariant-2 winner via priority tie-break. The reconcile pseudocode picked chain_head as the winner in the invariant-2 violation path. Per invariant 6, simultaneous actives must use priority tie-break (max priority, then max timestamp). chain_head determines liveness but NOT winner-among-actives. Updated: actives includes the chain_head + any sibling active records; winner = max by (priority, timestamp_utc). chain_head wins only if it has the highest tuple.
- P1 (line 128): markdown formatting delimiters, not raw chars. The canonicalization rule removed * _ ` as raw characters, which corrupted identifiers like _internal_var or __private. Reworded to 'strip formatting delimiters': unwrap text from PAIRED markdown spans, not remove every occurrence. Specifically: **text** / *text* / _text_ / `text` → text, where _text_ stripping requires the contents to match [A-Za-z0-9-]+ so identifiers survive. Single occurrences and unpaired delimiters are preserved.
- P2 (line 296): MEMORY.md dedup by source_path. Phase 3 backfills multiple typed facts from a single prose memory file. Without dedup, each MEMORY.md index row would emit per-fact rather than per-file, multiplying the index. Updated: dedup by source_path, picking the highest-priority fact per file as the row's representative; row description = first sentence of that fact; tags = union of all facts from that source_path.
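The tie-break in the first finding reduces to a `max` over a tuple key. A minimal sketch, assuming dict records and `timestamp_utc` values that share one ISO-8601 format so that string comparison matches chronological order; `pick_winner` is a hypothetical name.

```python
def pick_winner(actives):
    """Among simultaneously active records, pick max priority, then max timestamp.

    Assumes every timestamp_utc uses the same ISO-8601 layout, so comparing
    the strings lexicographically orders them chronologically.
    """
    return max(actives, key=lambda f: (f["priority"], f["timestamp_utc"]))


# Hypothetical siblings: the chain head is newest but has lower priority.
chain_head = {"id": "head", "priority": 1, "timestamp_utc": "2026-04-23T12:00:00Z"}
sibling = {"id": "sib", "priority": 2, "timestamp_utc": "2026-04-23T09:00:00Z"}
winner = pick_winner([chain_head, sibling])
```

In this example the sibling wins on priority even though the chain head is newer, matching the statement that chain_head wins only when it holds the highest tuple.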
Codex post-merge findings on PR #226 (after the first follow-up #433 merged):
- P1 (line 244): CONTRIBUTOR-CONFLICTS.md is already populated. The doc said the file is 'present-with-schema-but-unpopulated'. Wrong: CC-001..CC-003 are filled in (no-name-attribution-rule scope, Stabilize-vs-keep-opening frames, absent-artifact-citation discipline). Updated text to reflect the populated state.
- P1 (line 270): conflict-row format mismatch. Proposed format used '### CONF-<date>-<nnn>' + bullet fields. Actual file uses CC-### IDs in a markdown table with columns (Conflict ID | Date | Question | Parties | Positions | Resolution | Scope | Source). Replaced the proposed format with a template that matches the existing table schema. CC-### counter continues from the highest existing ID (next auto-detected = CC-004). Generator preserves manually-curated rows and only appends auto-detected ones.
- P1 (L269): clarify Open-table targeting. Replaced "appending machine-generated rows" (ambiguous given the file has separate Open / Resolved / Stale tables) with "inserting machine-generated rows into the Open table (or a delimited autogenerated subsection)". Without this, a literal append-to-EOF implementation wouldn't land in Open.
- P1 (L290): specify idempotent generator strategy. Added explicit canonical-key → CC-NNN mapping. Generator updates in-place when the key already matches; allocates a new CC-NNN only when unmapped. Avoids unbounded growth / duplicate entries on repeated CI runs. Documents both strategies (in-place update + delimited autogenerated subsection).
- P2 (L275): placeholder labels match column headers. Renamed schema placeholders from "Parties:" / "Resolution:" to "Between:" / "Resolution-so-far:", matching the actual headers in docs/CONTRIBUTOR-CONFLICTS.md (verified via grep). Reduces drift between the design doc and the live schema.
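The idempotent allocation described in P1 (L290) might look like the sketch below. The function name, its arguments, and the way existing IDs are passed in are assumptions; the source only states that a key already mapped keeps its CC-NNN and that new IDs continue from the highest existing one.

```python
def assign_conflict_ids(existing_ids, key_to_id, detected_keys):
    """Return an updated canonical-key -> CC-NNN mapping.

    existing_ids: every CC-NNN already in the file, manual rows included.
    key_to_id:    mapping recorded by earlier generator runs.
    Keys that are already mapped keep their ID, so repeated runs are stable.
    """
    mapping = dict(key_to_id)
    used = set(existing_ids) | set(mapping.values())
    highest = max((int(cid.split("-", 1)[1]) for cid in used), default=0)
    for key in detected_keys:
        if key not in mapping:
            highest += 1
            mapping[key] = f"CC-{highest:03d}"
    return mapping


# Hypothetical run: three manually curated rows exist, one conflict is detected.
manual_rows = {"CC-001", "CC-002", "CC-003"}
first_run = assign_conflict_ids(manual_rows, {}, ["key-a"])
second_run = assign_conflict_ids(manual_rows, first_run, ["key-a"])
```

Running the generator a second time with the same detected key leaves the mapping unchanged, which is the property that prevents duplicate rows on repeated CI runs.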
…e) (#463)
Otto-268 follow-on: drain-log for the 3-finding first cascade PR #433 (post-merge follow-up to #226 memory reconciliation algorithm design v0). Captures three orthogonal algorithm-correctness improvements. Per Otto-250 training-signal discipline.
Pattern observations:
1. chain_head-liveness vs priority-tie-break-for-winner distinction: two invariants conflated at a high level (Invariant 2 vs Invariant 6) are now distinguished. Spec-correctness findings benefit from per-invariant orthogonal-checks reasoning.
2. Paired-delimiter-vs-raw-character is a normalization-rule precision class. Same shape as #206's K-relations subset-vs-superset precision error. `_internal_var` should be preserved (single delimiters); `_text_` should be unwrapped (paired).
3. Index-rendering dedup is its own correctness class: multiple typed facts from the same source_path → ONE index row.
4. Memory-reconciliation algorithm spec benefits from multiple cascade waves: #226 → #433 → #434 walked through schema + invariants + normalization + dedup + CC alignment + idempotent generator across three iterations. Each wave catches a different class of gap.
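Observation 2 can be made concrete with a small regex sketch. The paired spans and the [A-Za-z0-9-]+ constraint on `_text_` come from the review text; the function name, the word-boundary lookarounds, and the non-whitespace edge rule for asterisks are assumptions added so that identifiers and stray delimiters survive. It does not attempt full Markdown parsing.

```python
import re

# Paired spans only; order matters so that ** is handled before *.
_PAIRED_SPANS = [
    re.compile(r"\*\*(\S(?:.*?\S)?)\*\*"),   # **text**
    re.compile(r"\*(\S(?:.*?\S)?)\*"),       # *text*
    re.compile(r"`(.+?)`"),                  # `text`
    # _text_: contents limited to [A-Za-z0-9-]+, and (assumed) the span must
    # not touch identifier characters, so _internal_var is left alone.
    re.compile(r"(?<![A-Za-z0-9_])_([A-Za-z0-9-]+)_(?![A-Za-z0-9_])"),
]


def strip_formatting_delimiters(text):
    """Unwrap text from paired markdown spans; leave unpaired delimiters."""
    for pattern in _PAIRED_SPANS:
        text = pattern.sub(r"\1", text)
    return text
```

For example, `strip_formatting_delimiters("_internal_var")` leaves the identifier untouched, while `"_text_"` becomes `"text"`.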
Summary
Amara's 4th-ferry centerpiece: replace hand-maintained `CURRENT-*.md` distillations with generated views over typed memory facts. ~380-line v0 design for downstream incremental implementation.
What landed
`docs/research/memory-reconciliation-algorithm-design-2026-04-24.md`:
- `MemoryFact` schema (13 fields, 6 invariants, canonical-key normalization rules)
- `CURRENT-*.md` + `MEMORY.md` generated output
Composes with Otto-73 retractability-by-design
`MemoryFact.status` (active / superseded / retracted) IS the retraction-native primitive at the memory substrate: same primitive as Zeta's ZSet algebra, different surface. Aaron's Otto-73 foundation and Otto-74 broader-human-trust addendum both flow through.
Addresses known drift
Otto-70 snapshot tool surfaced MEMORY.md at 58,842 bytes vs. 24,976-byte cap. Generated index can be bounded by construction (Phase 4 cutover archives older entries).
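One way to read "bounded by construction" is a generator that keeps the most recent rows until a byte budget is spent and archives the rest. The sketch below assumes each row is a dict with a rendered `line` and an ISO-8601 `timestamp_utc`; the function name and row shape are hypothetical, and the 24,000-byte default mirrors the figure from the review follow-up.

```python
def bound_index(rows, cap_bytes=24_000):
    """Split rows into (kept, archived) so the kept lines fit under cap_bytes.

    Rows are taken newest-first; once one row does not fit, it and every
    older row are archived, so the kept set is always a most-recent prefix.
    """
    kept, archived, used = [], [], 0
    for row in sorted(rows, key=lambda r: r["timestamp_utc"], reverse=True):
        size = len((row["line"] + "\n").encode("utf-8"))
        if not archived and used + size <= cap_bytes:
            kept.append(row)
            used += size
        else:
            archived.append(row)
    return kept, archived
```

Because the size is measured on the UTF-8 encoding, the budget is counted in bytes rather than characters, which is the unit the cap is expressed in.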
Amara Determinize progress
What this PR is NOT
Test plan
🤖 Generated with Claude Code