substrate: AceHack pre-reset SHA-loss acceptable + multi-tenant fork-storage on LFG for collective training (Aaron 2026-04-27) #53
---
name: AceHack pre-reset SHA-history loss is acceptable — LFG is the preservation layer; fork-storage locations in LFG capture fork-specific high-signal data (Aaron 2026-04-27)
description: Aaron 2026-04-27 confirmation that AceHack's pre-reset parallel-SHA history dropping during the topology-collapse hard-reset is acceptable — AceHack is the dev-mirror by design; LFG is what we preserve. Fork-specific high-signal data (PR review threads, drain logs, decision records) gets captured on LFG via dedicated fork-storage locations like `docs/pr-preservation/`. The substrate-value loss from dropping AceHack's pre-reset SHA layer is zero — content survives via prior squash-merges, conversation archives survive via fork-storage. Going forward (post-0/0/0), both forks share identical SHAs and the question becomes moot.
type: feedback
---
# AceHack pre-reset SHA-history loss is acceptable — LFG is the preservation layer

## Verbatim quote (Aaron 2026-04-27)

After Otto laid out the nuance ("AceHack's pre-reset parallel-SHA history disappears from the live branch during hard-reset; content preserved on LFG via prior squash-merges"):

> "that's fine this is our dev setup anyways, LFG history is what we are preserving, it will all be the same anyways going forward. And we have the fork storage locations in lfg for any fork specific stuff that ends up in lfg for data collection purposes, nice clean high singnal data ffom the sources like the PR reviews threads"
## Three-layer preservation accounting

When AceHack hard-resets to LFG main, the question "what's lost?" needs answering at three layers:
### Layer 1: Content (what the code/docs say)

**Lost?** No.

Every AceHack-unique line gets forward-synced to LFG before the hard-reset (via paired-sync rounds). After the hard-reset, AceHack absorbs LFG's complete state — content is the union of both forks. The dev-mirror topology, the double-hop workflow, and today's path-to-start work made this true.
### Layer 2: Commit SHAs and commit messages (the audit trail of when-which-line-changed)

**Lost?** Yes, but irrelevant: AceHack's pre-reset SHAs disappear from AceHack's live branch history.

The SHAs of AceHack's roughly 80 pre-reset unique commits are dropped from the live tree during the force-push. Their CONTENT is preserved on LFG (via prior squash-merges, under different SHAs), but the specific commit-message text and SHA identities disappear.
This is acceptable because:

- **AceHack is the dev-mirror, by design transient** — *"this is our dev setup anyways"* (Aaron). Force-pushes to AceHack main are part of the protocol.
- **LFG is what we preserve** — *"LFG history is what we are preserving"* (Aaron). LFG main's commit history is append-only via PR squash-merge; that history IS the canonical record.
- **Going forward both forks share SHAs** — *"it will all be the same anyways going forward"* (Aaron). After the 0/0/0 starting point, every paired-sync round produces identical SHAs on both forks. The pre-reset asymmetry is a one-time topology collapse, not an ongoing pattern.
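The Layer 1 / Layer 2 accounting above can be sketched end-to-end with plain git. This is a minimal illustration, not the repos' actual sync tooling: the branch names `lfg`/`acehack`, the file names, and the single squash-merge stand in for the real paired-sync rounds. It shows the two claims together — after squash-merge plus hard-reset, the content survives while the pre-reset SHA vanishes from the live history. (Assumes git ≥ 2.28 for `init -b`.)

```python
# Sketch of the one-time topology collapse: forward-sync via squash-merge,
# then hard-reset the dev-mirror branch onto the canonical branch.
import os
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command and return its trimmed stdout."""
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()

repo = tempfile.mkdtemp()
git("init", "-q", "-b", "lfg", cwd=repo)          # canonical branch
git("config", "user.email", "x@example.com", cwd=repo)
git("config", "user.name", "x", cwd=repo)

# Shared base commit on lfg.
open(os.path.join(repo, "base.txt"), "w").write("base\n")
git("add", ".", cwd=repo)
git("commit", "-qm", "base", cwd=repo)

# AceHack (dev-mirror) diverges with a unique commit.
git("checkout", "-qb", "acehack", cwd=repo)
open(os.path.join(repo, "dev.txt"), "w").write("acehack-only content\n")
git("add", ".", cwd=repo)
git("commit", "-qm", "dev work", cwd=repo)
pre_reset_sha = git("rev-parse", "HEAD", cwd=repo)

# Forward-sync: squash-merge AceHack's content into lfg (new SHA, same content).
git("checkout", "-q", "lfg", cwd=repo)
git("merge", "--squash", "-q", "acehack", cwd=repo)
git("commit", "-qm", "squash: dev work", cwd=repo)

# Topology collapse: hard-reset acehack onto lfg (the force-push equivalent).
git("checkout", "-q", "acehack", cwd=repo)
git("reset", "--hard", "-q", "lfg", cwd=repo)

# Layer 1: the content survived.  Layer 2: the pre-reset SHA is gone.
assert open(os.path.join(repo, "dev.txt")).read() == "acehack-only content\n"
log = git("log", "--format=%H", cwd=repo)
assert pre_reset_sha not in log
```

The final two assertions are the whole argument in miniature: `dev.txt` (content) is intact on the reset branch, while `pre_reset_sha` (SHA identity) no longer appears anywhere in its log.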
### Layer 3: High-signal artifact data (PR review threads, drain logs, decisions)

**Lost?** No.

Aaron's framing (combined across two messages):

> "we have the fork storage locations in lfg for any fork specific stuff that ends up in lfg for data collection purposes, nice clean high singnal data ffom the sources like the PR reviews threads"

> "PR review threads + conversation archives: LFG has a location for all forks that want to send back PR threads/ cost data, whatever fork specific stuff that LFG collects but in a way where all fork specific can keep it's data on LFG too so everyone can train from it and learn form it."
This is **multi-tenant fork-storage-on-LFG** — not just AceHack's location. Any fork (current or future) that wants to send back fork-specific data has a place on LFG to keep it, in a way that lets all contributors train from / learn from the collective dataset.
### The architecture

LFG has dedicated **fork-storage locations** that preserve fork-specific high-signal artifacts. Today's set:
- **`docs/pr-preservation/`** — drain logs of PR conversation archives (Otto-250 discipline). Captures review threads as high-signal labeled training data per the "PR reviews are training signals" memory.
- **`docs/hygiene-history/`** — tick-history + drain-logs from autonomous-loop ticks.
- **`docs/DECISIONS/`** — ADR records of architectural decisions.
- **`docs/research/`** — research history.
- **`docs/aurora/`** — courier-ferry archive (cross-AI research).
- **`docs/budget-history/`** — cost-data snapshots (the "cost data" Aaron flags explicitly).
- **`memory/`** (factory-wide memory + persona notebooks) — substrate that survives compaction.
- Commit messages and PR titles/bodies on the LFG side — git-native record.
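A drain into `docs/pr-preservation/` might look like the sketch below. Only the `docs/pr-preservation/` location comes from this document; the function name `drain_pr_thread`, the per-PR file naming, and the record fields are illustrative assumptions, not the repo's actual drain tooling.

```python
# Hypothetical sketch: archive one PR's review thread into the
# docs/pr-preservation/ fork-storage location as a labeled JSON record.
import datetime
import json
import pathlib

def drain_pr_thread(root, fork, pr_number, comments):
    """Write one PR's review thread to the fork's preservation log.

    `comments` is a list of dicts (author, body, ...) — the reviewer
    judgments that make the thread high-signal training data.
    """
    out_dir = pathlib.Path(root) / "docs" / "pr-preservation"
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "fork": fork,                # which fork sent this back
        "pr": pr_number,
        "drained_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "comments": comments,
    }
    out_file = out_dir / f"{fork}-pr-{pr_number}.json"   # assumed naming scheme
    out_file.write_text(json.dumps(record, indent=2) + "\n")
    return out_file
```

Stamping each record with its source fork is what makes the archive usable once more than one fork drains into the same location.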
### Multi-tenant by design — collective training/learning purpose

Aaron's load-bearing framing: *"all forks that want to send back ... whatever fork specific stuff ... in a way where all fork specific can keep it's data on LFG too so everyone can train from it and learn form it."*
The fork-storage architecture is NOT just for AceHack's review threads — it's **multi-tenant**:

- **Any fork** (AceHack today; possible future forks under different maintainer-agent pairs) can write its fork-specific artifacts to LFG's fork-storage paths.
- **Each fork keeps its own data** — the storage is partitioned/labeled per fork, not merged into a single anonymous heap.
- **All contributors can read all forks' data** — the storage is collective-readable, even if write-partitioned.
- **The purpose is training + learning** — high-signal labeled data (PR review threads with reviewer judgments, cost-data snapshots showing real budget patterns, drain logs showing real-world failure-recovery sequences) becomes a training corpus for both AI agents and human contributors.
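The write-partitioned / collective-readable split can be made concrete with a path convention. The `docs/fork-storage/<fork>/<category>/` layout below is an assumption for illustration — this document names individual locations like `docs/pr-preservation/`, not a unified partition root — but the two functions capture the design point: writes are scoped to one fork's partition, reads enumerate all of them.

```python
# Hypothetical sketch of a per-fork partition layout on LFG.
# The docs/fork-storage/ root is assumed, not a documented path.
import pathlib

def fork_partition(root: str, fork: str, category: str) -> pathlib.Path:
    """Each fork writes only under its own partition (write-partitioned)."""
    return pathlib.Path(root) / "docs" / "fork-storage" / fork / category

def readable_partitions(root: str) -> list:
    """Any contributor can enumerate every fork's data (collective-readable)."""
    base = pathlib.Path(root) / "docs" / "fork-storage"
    return sorted(base.glob("*/*")) if base.exists() else []
```

Partitioning by fork at the directory level keeps provenance labels free (the path IS the label), which is what lets everyone train from the combined corpus without the forks' data blurring into one heap.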
### Data types beyond review threads

Aaron's list is open-ended (*"whatever fork specific stuff"*) but explicitly names two categories:

- **PR review threads** — captured via `docs/pr-preservation/` drain logs (Otto-250).
- **Cost data** — captured via `docs/budget-history/snapshots.jsonl` and the budget-cadence weekly workflow (task #297).
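An append to `docs/budget-history/snapshots.jsonl` might look like this sketch. The file path comes from this document; the snapshot field names (`fork`, `date`, `spend_usd`, `budget_usd`) and the helper name are assumptions about the schema, not the actual workflow's format.

```python
# Hedged sketch: append one cost-data snapshot to the JSONL log.
# Field names are illustrative; only the path is from the document.
import datetime
import json
import pathlib

def append_snapshot(root, fork, spend_usd, budget_usd):
    path = pathlib.Path(root) / "docs" / "budget-history" / "snapshots.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    snapshot = {
        "fork": fork,
        "date": datetime.date.today().isoformat(),
        "spend_usd": spend_usd,
        "budget_usd": budget_usd,
    }
    # JSONL: one self-contained record per line, append-only — history is
    # never rewritten, matching the append-only preservation discipline.
    with path.open("a") as f:
        f.write(json.dumps(snapshot) + "\n")
    return path
```

Append-only JSONL suits this use well: each weekly run adds one line, and the full budget history stays readable with a line-by-line `json.loads`.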

---

**Review suggestion** on the "Cost data" bullet — replace "the budget-cadence weekly workflow (task #297)" with "the budget-cadence workflow under `tools/budget/`".
**Review comment:** `memory/README.md` says `MEMORY.md` entries should be "kept terse" (one line per file, newest-first). This new index entry is extremely long and duplicates detail that's already in the memory file itself; consider shortening it to a brief clause (enough to disambiguate the topic) and relying on the linked file's frontmatter/body for the full explanation.