
backlog: closure-table hardening for fast-git (pluggable hierarchical index) #396

Merged
AceHack merged 1 commit into main from backlog/closure-table-fast-git on Apr 25, 2026


@AceHack AceHack commented Apr 24, 2026

Summary

Maintainer directive (2026-04-24): the closure-table needs hardening for filesystem-class workloads to support the native F# git implementation (#395 cluster). Make it pluggable so a faster substrate can swap in if profiling shows it is the bottleneck.

Phase 0 research scope captured

  • State-of-the-art survey of hierarchical indexes (closure-table / nested-set / materialized-path / Postgres ltree / B-tree / radix-trie / Verkle / Merkle Patricia).
  • Alternative substrate candidates for plug-in: B-trees (ZFS/btrfs/NTFS scale), Patricia/HAMT/CRDT-tree, Dolt + TerminusDB precedents (already speak git over versioned-DB).
  • IHierarchicalIndex contract definition.
  • Empirical baseline benchmark on representative repo (Linux kernel? Chromium? Zeta itself?).
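To make the "pluggable" goal concrete, here is a minimal Python sketch of what a swappable index contract could look like. The real `IHierarchicalIndex` contract is not defined yet (defining it is part of the Phase 0 scope), so the method names `insert`, `ancestors`, and `descendants`, and both implementations below, are hypothetical illustrations rather than the Zeta design.

```python
from typing import Dict, List, Optional, Protocol, Set, Tuple


class HierarchicalIndex(Protocol):
    """Hypothetical contract; the real IHierarchicalIndex is still to be defined."""

    def insert(self, node: str, parent: Optional[str]) -> None: ...
    def ancestors(self, node: str) -> List[str]: ...
    def descendants(self, node: str) -> Set[str]: ...


class ClosureTableIndex:
    """Closure table: one (ancestor, descendant, depth) row per ancestor pair."""

    def __init__(self) -> None:
        self._up: Dict[str, Dict[str, int]] = {}   # node -> {ancestor: depth}
        self._down: Dict[str, Set[str]] = {}       # node -> descendants (incl. self)

    def insert(self, node: str, parent: Optional[str] = None) -> None:
        if node in self._up:
            raise ValueError(f"duplicate node: {node}")
        if parent is not None and parent not in self._up:
            raise KeyError(f"unknown parent: {parent}")
        rows = {node: 0}  # self-row at depth 0
        if parent is not None:
            for anc, depth in self._up[parent].items():
                rows[anc] = depth + 1  # copy the parent's rows, one level deeper
        self._up[node] = rows
        self._down[node] = set()
        for anc in rows:
            self._down[anc].add(node)

    def ancestors(self, node: str) -> List[str]:
        rows = sorted(self._up[node].items(), key=lambda kv: kv[1])
        return [anc for anc, depth in rows if depth > 0]  # nearest first

    def descendants(self, node: str) -> Set[str]:
        return self._down[node] - {node}


class MaterializedPathIndex:
    """Alternative substrate: each node stores its full root-to-node path."""

    def __init__(self) -> None:
        self._path: Dict[str, Tuple[str, ...]] = {}

    def insert(self, node: str, parent: Optional[str] = None) -> None:
        if node in self._path:
            raise ValueError(f"duplicate node: {node}")
        prefix = () if parent is None else self._path[parent]
        self._path[node] = prefix + (node,)

    def ancestors(self, node: str) -> List[str]:
        return list(reversed(self._path[node][:-1]))  # nearest first

    def descendants(self, node: str) -> Set[str]:
        prefix = self._path[node]
        n = len(prefix)
        # Linear scan; a real implementation would use a sorted or prefix index.
        return {o for o, p in self._path.items() if len(p) > n and p[:n] == prefix}


def build(index: HierarchicalIndex) -> HierarchicalIndex:
    """Populate any conforming index with the same small tree."""
    index.insert("/", None)
    index.insert("src", "/")
    index.insert("Core", "src")
    index.insert("Hierarchy.fs", "Core")
    index.insert("docs", "/")
    return index
```

Because `build` depends only on the protocol, either substrate can be passed in unchanged, which is the property a benchmark-driven swap would rely on.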

Composes with

Test plan

  • Single BACKLOG row in P2 — research-grade.
  • Maintainer directive verbatim preserved.
  • Otto-275 log-don't-implement: research scope captured; does NOT authorize implementation start.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 24, 2026 23:58

@AceHack AceHack enabled auto-merge (squash) April 24, 2026 23:58
@AceHack AceHack force-pushed the backlog/closure-table-fast-git branch from 8638985 to 9c868d0 Compare April 24, 2026 23:59

Copilot AI left a comment


Pull request overview

Adds a new P2 research-grade BACKLOG entry to scope and track Phase 0 research for hardening Zeta’s hierarchy/closure-table approach into a pluggable hierarchical index suitable for filesystem-scale Git workloads (as part of the broader native F# git effort).

Changes:

  • Adds a P2 BACKLOG row capturing the 2026-04-24 maintainer directive verbatim.
  • Documents Phase 0 research scope (survey + interface sketch + baseline benchmark) and how it composes with related initiatives (#395, #394).
Comments suppressed due to low confidence (2)

docs/BACKLOG.md:5625

  • cgit is a web UI for Git repositories, not a Git implementation/library to benchmark native Git performance against. Consider replacing this comparison with something like libgit2 / JGit / core git CLI (or clarify what “compete” means here).
  > on top of our distributed db. we can stick a ui in
  > front of that too lol. Also you need to do a lot of
  > research here cause some nodes will try to call you a

docs/BACKLOG.md:5638

  • The claim that filesystem trees can be “100k+ files deep” is likely incorrect/misleading (depth is typically constrained by path length / OS limits). Suggest rewording to something like “100k+ nodes (files+dirs) total, very wide; depth varies but is usually far smaller” so the research scope stays accurate.
  > them in streams and maybe further. backlog"*

  **Two load-bearing motivations:**
  1. **Aurora preparation** — Zeta's own blockchain-ish

… index)

Maintainer 2026-04-24 directive — closure-table substrate
needs hardening to support filesystem-class workloads
(deep + wide trees, 100k+ files) for the native F# git
implementation. Make the index pluggable so a faster
substrate can swap in if profiling shows it's the
bottleneck. Maintainer hasn't looked at space/time
tradeoffs; backlog research.
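
As a rough illustration of the space side of that tradeoff: a closure table that stores a self-row per node holds one row per (ancestor, descendant) pair, so its size is the sum of `depth + 1` over all nodes. The Python sketch below counts rows for a wide tree and a deep tree. The helper names (`closure_row_count`, `star`, `chain`) are hypothetical and nothing here is measured against Zeta itself.

```python
from typing import Dict, Optional


def closure_row_count(parent_of: Dict[str, Optional[str]]) -> int:
    """Rows in a closure table with self-rows: sum over nodes of (depth + 1)."""
    depth: Dict[str, int] = {}
    for node in parent_of:
        # Walk up until we reach the root or a node whose depth is known.
        chain = []
        cur: Optional[str] = node
        while cur is not None and cur not in depth:
            chain.append(cur)
            cur = parent_of[cur]
        base = -1 if cur is None else depth[cur]
        for n in reversed(chain):
            base += 1
            depth[n] = base
    return sum(d + 1 for d in depth.values())


def star(n_leaves: int) -> Dict[str, Optional[str]]:
    """Very wide tree: one root directory with n_leaves files."""
    tree: Dict[str, Optional[str]] = {"root": None}
    for i in range(n_leaves):
        tree[f"f{i}"] = "root"
    return tree


def chain(n_nodes: int) -> Dict[str, Optional[str]]:
    """Very deep tree: each node is the only child of the previous one."""
    tree: Dict[str, Optional[str]] = {"n0": None}
    for i in range(1, n_nodes):
        tree[f"n{i}"] = f"n{i - 1}"
    return tree
```

Row count grows linearly with node count for the wide tree and quadratically with depth for the chain, which is why depth distribution matters more than raw file count when sizing this substrate.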

Phase 0 research scope captured in the row:
  - State-of-the-art survey: nested-set, materialized-path,
    closure-table, Postgres ltree, B-tree-prefix-index,
    radix-trie, Verkle/Merkle Patricia.
  - Substrates worth interface-compatibility: B-trees
    (ZFS/btrfs scale), Patricia/HAMT/CRDT-tree, Dolt /
    TerminusDB existing precedents.
  - Define IHierarchicalIndex contract.
  - Empirical baseline benchmark on representative repo.

Composes with native F# git impl (#395 cluster as primary
consumer), Mode 2 protocol upgrade, Ouroboros bootstrap
meta-thesis (index correctness IS part of the closure
proof), blockchain-ingest (#394 — block hierarchy may
share the same abstraction).

Otto-275 log-don't-implement: row captures research scope,
does NOT authorize implementation start.
@AceHack AceHack force-pushed the backlog/closure-table-fast-git branch from 9c868d0 to a4d7c32 Compare April 25, 2026 00:21
@AceHack AceHack merged commit a71986a into main Apr 25, 2026
13 checks passed
@AceHack AceHack deleted the backlog/closure-table-fast-git branch April 25, 2026 00:24
AceHack added a commit that referenced this pull request Apr 25, 2026
…indexes

Maintainer 2026-04-24 directive — every first-class interface on
Zeta's substrate (git, SQL, operator algebra, LINQ, future
GraphQL / blockchain query / WASM-RPC) must compose with every
other interface. Mixed-DSL queries must:
  (1) parse + bind through unified type system
  (2) plan through cost-based optimizer (full mixed AST)
  (3) hit indexes for each constituent DSL
  (4) preserve retraction semantics end-to-end

Architectural primitive captured: this is a direct application
of the 2026-04-22 semiring-parameterized Zeta substrate research
("one algebra to map the others"). With operator algebra
parameterized by a semiring, every other DSL's semantics maps
into the same one algebra by semiring-swap, and cross-DSL
composability falls out for free.
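
The semiring-swap idea can be sketched in a few lines of Python. This is a generic illustration of semiring-parameterized relational composition, not Zeta's operator algebra: the `Semiring` type, the three instances, and `compose` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generic, Tuple, TypeVar

W = TypeVar("W")
Relation = Dict[Tuple[str, str], Any]  # (source, target) -> weight


@dataclass(frozen=True)
class Semiring(Generic[W]):
    zero: W                      # identity for add; absent tuples carry this weight
    one: W                       # identity for mul
    add: Callable[[W, W], W]     # combines alternative derivations
    mul: Callable[[W, W], W]     # combines the two sides of a join


# Integer weights: multiplicities, where negative weights act as retractions.
INTEGERS = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
# Boolean weights: plain set semantics.
BOOLEANS = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Min-plus weights: cheapest path cost.
MIN_PLUS = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)


def compose(s: Semiring, left: Relation, right: Relation) -> Relation:
    """Join left(a, b) with right(b, c) and project to (a, c)."""
    out: Relation = {}
    for (a, b), w1 in left.items():
        for (b2, c), w2 in right.items():
            if b == b2:
                prev = out.get((a, c), s.zero)
                out[(a, c)] = s.add(prev, s.mul(w1, w2))
    # Tuples whose weight collapses to zero are absent from the result.
    return {k: w for k, w in out.items() if w != s.zero}
```

The same `compose` counts derivations under `INTEGERS`, answers reachability under `BOOLEANS`, and finds the cheapest two-hop cost under `MIN_PLUS`. Under `INTEGERS`, a `-1` weight on one input cancels a matching `+1` derivation, which is the retraction behavior requirement (4) asks to preserve.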

Phased: Phase 0 design proposal → pairwise adapters → unified
planner/binder → index-utilization audit → retraction-preservation
proof.

Composes with closure-table hardening (#396 — the hierarchical
index this layer hits), native F# git impl (#395), Ouroboros
bootstrap meta-thesis (cross-DSL composability IS an Ouroboros
closure), semiring-parameterized substrate, blockchain ingest
(#394 — chain queries compose via same substrate).

Otto-275 log-don't-implement: research scope captured; does NOT
authorize implementation start.
AceHack added a commit that referenced this pull request Apr 25, 2026
…indexes (#397)

* backlog: cross-DSL composability — git/SQL/operator-algebra/LINQ hit indexes

* drain(#397): fix 5 Copilot threads on cross-DSL composability row

P0/P1/P1/P1/P2 from late Copilot re-review on the freshly-opened
PR. All five fixes land as in-place edits to the new BACKLOG row
(the row itself was added by this PR, so this is not an
append-only-file violation).

- title: rewrap so `operator-algebra` stays contiguous (P1).
- body: rewrap `closure-table-hardening` contiguous (P1).
- body: rewrap inline-code `query-optimizer-expert` contiguous
  (P0 — inline-code split breaks rendering and grep).
- composes-with: closure-table dependency pointer made concrete
  — names `src/Core/Hierarchy.fs` and the "Closure-table over
  DBSP" research row under `## Research projects` instead of a
  non-existent "same section" hardening row (P2).
- semiring memory pointer: add `memory/` prefix to match the
  convention used at the existing semiring rows (P1).

Drain log at `docs/pr-preservation/397-drain-log.md` per
Otto-250.
AceHack added a commit that referenced this pull request Apr 25, 2026
…le + safe-ROM substrate (#400)

* artifact-c: tools/alignment/audit_archive_headers.sh — archive-header lint v0 (detect-only)

Amara's 5th-ferry Artifact C landing (PR #235 absorb).

Detect-only lint for the four archive-header fields proposed
in §33 (PR #235 exemplar; not yet governance-landed):

- Scope:
- Attribution:
- Operational status:
- Non-fusion disclaimer:

Defaults to checking docs/aurora/*.md; --path DIR overrides.
--enforce flips exit 2 on any gap; CI does not currently call
it (Aminata Otto-80 pass classified §33 as IMPORTANT-pending-
Aaron-signoff + lint-required-to-prevent-3-5-round-decay).

First-run baseline: 2/2 existing aurora absorbs missing all
four headers (predate the proposal). Detect-only first
prevents CI block on baseline; enforcement flips when Aaron
signs off on §33 + baseline is green (either backfill the
2 absorbs or explicit grandfather clause in §33).

v0 limitations documented in script:
- Partial-header adversary (label anywhere in first 20 lines
  passes; no syntactic check).
- Fake-header adversary (values not content-audited).
- In-memory-import adversary (memory/ not covered; different
  surface).
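
For readers who want the v0 check in concrete form, here is a minimal Python sketch of the detect-only rule as described above: a label counts as present if it appears anywhere in the first 20 lines. The actual tool is a Bash script whose matching details are not shown here, so the substring match and the names `REQUIRED_LABELS`, `HEADER_WINDOW`, and `missing_headers` are assumptions.

```python
import itertools
from typing import Iterable, List

# The four archive-header fields proposed in §33.
REQUIRED_LABELS = (
    "Scope:",
    "Attribution:",
    "Operational status:",
    "Non-fusion disclaimer:",
)
HEADER_WINDOW = 20  # only the first 20 lines are inspected


def missing_headers(lines: Iterable[str]) -> List[str]:
    """Return the required labels not found in the header window.

    Assumption: plain substring match, with no syntactic or content check,
    mirroring the v0 limitations listed above.
    """
    head = list(itertools.islice(lines, HEADER_WINDOW))
    return [
        label
        for label in REQUIRED_LABELS
        if not any(label in line for line in head)
    ]
```

The partial-header limitation is visible directly: a sentence that merely mentions `Scope:` in passing satisfies the check for that label.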

Harden in follow-up after §33 lands.

Bash 3.2 compatible (while-read loop, not mapfile) for macOS
default shell.

Same --json / --out DIR / exit code shape as existing
audit_commit.sh / audit_personas.sh / audit_skills.sh.

FACTORY-HYGIENE row #60 added:
- Detect-only cadence landed.
- Enforcement deferred until Aaron §33 signoff + baseline
  green.
- Same detect-only → triage → enforce pattern as rows #51
  (cross-platform parity) and #55 (machine-specific scrubber).

tools/alignment/README.md table updated with new row.

Composes with:
- Aminata threat-model pass (PR #241; names the decay risk
  this lint prevents).
- Amara's 5th-ferry absorb (PR #235; exemplar self-applies
  the format).
- Memory-index hygiene trio (rows #58 / #59 + this row's
  archive-header hygiene trio).

Otto-81 tick deliverable.

* drain(#243): seven Copilot/Codex threads — recursive scan + name-attribution + exit-code alignment

- Switch audit_archive_headers.sh from -maxdepth 1 to recursive find
  matching documented `docs/aurora/**/*.md` scope; exclude
  `references/` as bibliographic substrate.
- Encode subdirectory in --out per-file JSON basename to avoid
  collisions under recursive scan.
- Replace 'Aaron' with 'human-maintainer' role ref in script and
  FACTORY-HYGIENE row 60 (FACTORY-DISCIPLINE name-attribution rule).
- Drop persona names (Aminata, Amara) from script comments and
  row 60 in favour of role references (threat-model reviewer,
  absorbing agent), per Otto-220 code-comments-explain-code rule.
- Realign exit codes to sibling audit_*.sh convention: 1 =
  content-level signal under --enforce; 2 = script error /
  missing dependency / bad arg. Update header doc-block + row 60
  wording to match.
- Remove dead cross-reference to non-existent
  `docs/aurora/2026-04-23-amara-zeta-ksk-aurora-validation-5th-ferry.md`
  in row 60. Verified the aminata-threat-model-5th-ferry citation
  does exist on origin/main; kept that one.
- Append docs/pr-preservation/243-drain-log.md per Otto-250.

Smoke-tested: clean run exit 0 (16 files scanned), --enforce exit 1,
bad --path exit 2, --json exit 0, --out has no basename collisions.
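
The realigned exit-code convention can be summarized as a small decision function. This Python sketch only restates the mapping described in this commit and its smoke test; the function and parameter names are hypothetical and the real logic lives in the Bash script.

```python
def audit_exit_code(path_is_valid: bool, gap_count: int, enforce: bool) -> int:
    """Map an audit outcome to the sibling audit_*.sh exit-code convention."""
    if not path_is_valid:
        return 2  # script error, missing dependency, or bad argument
    if enforce and gap_count > 0:
        return 1  # content-level signal, reported only under --enforce
    return 0      # detect-only runs report gaps but still exit 0
```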

* drain(#243): quote target_path inside parameter expansion (SC2295)

Local shellcheck install only flagged this on the lint runner with
--severity=style. Quote $target_path inside the ${file#...}
prefix-strip so the prefix is not interpreted as a glob pattern.

* drain: PR #243 round 2 — address 6 late-review threads

Round 2 drain after round 1 closed all 7 threads. Copilot
re-reviewed and opened 6 new P2 suggestion-shape threads;
all 6 are FIX outcomes:

- r2-1 (line 128): normalise --path to strip trailing slash
  so `docs/aurora/` matches the references/ exclusion.
- r2-2 (line 172): make --out filename encoding injective
  by percent-encoding literal `_` to `_5F` before the
  `/` -> `__` swap. Was non-injective: `a/b__c.md` and
  `a__b/c.md` both became `a__b__c.json`.
- r2-3 (line 26): fix stale Usage wording — `--enforce`
  exits 1 on gap (matches the dedicated Exit-codes section
  and round-1 Thread-7 realignment).
- r2-4 (line 61): correct factual error about memory
  surface — in-repo `memory/` is canonical per
  GOVERNANCE.md §18 and `memory/README.md`; per-user path
  is staging.
- r2-5 (line 128): force C-locale sort with `LC_ALL=C`
  for deterministic byte-order output regardless of caller
  env.
- r2-6 (line 7): drop persona name "Amara" from header
  banner in favour of role/artifact references
  ("5th-ferry Artifact C" / "the 5th-ferry external-
  research absorb"). Round 1 caught "Aaron" but missed
  "Amara".

Append-only drain-log update per Otto-229: prior round-1
sections untouched; new "Drain pass: 2026-04-24 (round 2 —
6 threads)" section appended.
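
The r2-2 collision and its fix are easy to reproduce. The Python sketch below mirrors the encoding described above (escape literal `_` as `_5F`, then map `/` to `__`). The real implementation is in the Bash script, so the function names and the `.md` to `.json` suffix handling are assumptions made for illustration.

```python
def _stem(rel_path: str) -> str:
    """Drop a trailing .md extension, if present."""
    return rel_path[:-3] if rel_path.endswith(".md") else rel_path


def naive_name(rel_path: str) -> str:
    """Old scheme: only '/' -> '__'. Non-injective."""
    return _stem(rel_path).replace("/", "__") + ".json"


def injective_name(rel_path: str) -> str:
    """Fixed scheme: escape '_' first, then map '/' -> '__'."""
    escaped = _stem(rel_path).replace("_", "_5F")
    return escaped.replace("/", "__") + ".json"


def decode_name(name: str) -> str:
    """Inverse of injective_name, showing the encoding is reversible."""
    body = name[:-5] if name.endswith(".json") else name
    out = []
    i = 0
    while i < len(body):
        if body.startswith("__", i):     # encoded path separator
            out.append("/")
            i += 2
        elif body.startswith("_5F", i):  # escaped literal underscore
            out.append("_")
            i += 3
        else:
            out.append(body[i])
            i += 1
    return "".join(out) + ".md"
```

After escaping, every surviving underscore is followed either by `5F` or by a second underscore, so a left-to-right decoder never faces an ambiguous token. That is what makes the mapping injective.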

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* backlog+memory+roms: emulators on OS-interface + rewindable/retractable controls + safe-ROM substrate

Maintainer 2026-04-24 directive — emulators are the canonical
proof-out workload for the OS-interface (#399). Two related
directives captured:

(1) "emulators should run very nicely on this, let me know
    when you want some roms of any kind that are safe."

(2) "rewindable/retractable os/emulator controls"

Plus: maintainer requested a `roms/` folder with a
gitignored-except-sentinels pattern (same as `drop/`) so
binaries never enter git history but the directory exists
on every clone.

Why emulators compose perfectly with the OS-interface:
  - Emulator event loop = durable-async runtime workload
  - Save states FREE (every yield-point = checkpoint)
  - Cross-node migration FREE (state follows the function)
  - Multiplayer FREE (shared durable substrate)
  - DST guarantees speedrun/TAS bit-equal replay

Rewindable/retractable controls — the killer generalization:
  - Z-set retraction-native semantics extend UP to OS surface
  - "Rewind 5 seconds" is a first-class OS primitive
  - rr / Pernosco architectural class, generalized
  - Otto-238 trust-vector: rewindable controls grant agency
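
A minimal sketch of how retraction-native semantics could give a rewind primitive: state is a Z-set (elements with integer weights), each step applies a delta, and rewinding applies the negated deltas in reverse order. This is a Python toy under stated assumptions, not Zeta's implementation. `RewindableState`, `add`, and `negate` are hypothetical names.

```python
from typing import Dict, Hashable, List

ZSet = Dict[Hashable, int]  # element -> integer weight (negative = retraction)


def add(a: ZSet, b: ZSet) -> ZSet:
    """Pointwise sum of weights; elements whose weight reaches 0 disappear."""
    out = dict(a)
    for key, weight in b.items():
        total = out.get(key, 0) + weight
        if total == 0:
            out.pop(key, None)
        else:
            out[key] = total
    return out


def negate(delta: ZSet) -> ZSet:
    """The retraction of a delta is the same delta with flipped weights."""
    return {key: -weight for key, weight in delta.items()}


class RewindableState:
    """State evolves by deltas; rewinding retracts the most recent ones."""

    def __init__(self) -> None:
        self.state: ZSet = {}
        self._log: List[ZSet] = []

    def step(self, delta: ZSet) -> None:
        self.state = add(self.state, delta)
        self._log.append(delta)

    def rewind(self, steps: int) -> None:
        if steps < 0 or steps > len(self._log):
            raise ValueError("cannot rewind past the start of the log")
        for _ in range(steps):
            self.state = add(self.state, negate(self._log.pop()))
```

For example, three steps that insert `("x", 0)`, move it to `("x", 1)`, then to `("x", 2)` can be undone with `rewind(2)`, which leaves only `("x", 0)` in the state. No separate snapshot is needed because each delta is its own undo record.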

Activates 2026-04-22 ARC-3 adversarial-self-play
absorption-scoring research (level-creator / adversary /
player loop on durable-async + rewindable substrate).

Phased: Phase 0 research (Game Boy / NES / SNES / Genesis;
libretro; rr/Pernosco) → Phase 1 single emulator on
durable-async → Phase 2 rewindable controls promoted to
OS primitive → Phase 3 ARC-3 loop → Phase 4 cross-emulator
composition.

Safe-ROM offer captured durably; ask gated on Phase 1
landing first. Allowed classes enumerated in roms/README.md
(public-domain / homebrew / official test suites /
commercially-released-as-free / explicit-license).

Otto-275 log-don't-implement applies. Composes with #399
OS-interface, Otto-73/238/272, Z-set retraction-native,
#396/#397 closure-table+cross-DSL, request-play skill.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>