diff --git a/docs/hygiene-history/loop-tick-history.md b/docs/hygiene-history/loop-tick-history.md index be43b84d..a33027ff 100644 --- a/docs/hygiene-history/loop-tick-history.md +++ b/docs/hygiene-history/loop-tick-history.md @@ -133,3 +133,5 @@ fire. | 2026-04-22T11:45:00Z (round-44 tick, auto-loop-29 — IceDrive/pCloud substrate grant received + ToS investigation + stacking-risk analysis + RAID-clean-substrate recommendation) | opus-4-7 / session round-44 (post-compaction, auto-loop #29) | aece202e | Auto-loop tick received a substrate-access grant (IceDrive + pCloud login, 10 TB each, lifetime-paid, 20-year preservationist archive) and a follow-on directive *"so read ther usage polices so i don't get banned"* — the tick's primary work became **ToS pre-flight safety analysis** rather than any speculative factory artefact. Tick actions: (a) **Step 0 PR-pool audit**: main advanced to `c7ca390→1adcfc9` after PR #127 merged mid-tick-open window. Four in-flight PRs from prior tick remain open (#122 Gemini map, #124 wink-validation watch, #126 Grok map — all UNKNOWN merge-state, auto-merge armed); three AceHack-authored carry-forward (#109 DIRTY merge-conflict, #110/#112 BEHIND). Harness-authorization-boundary bars me from refreshing fork-authored PRs; carry-forward unchanged. (b) **Substrate-grant memory filed** (`memory/project_aaron_icedrive_pcloud_substrate_access_20_years_preservationist_archive_2026_04_22.md`, out-of-repo, maintainer context) + MEMORY.md index entry. Captured: IceDrive + pCloud access grant with 10 TB each; 4-copy redundancy topology (2 cloud hot + 2 local RAID cold per maintainer's *"i have 4 copied of that data"*); preservationist cultural signal from *"20 years of carefully maintained books and games and software"*; archive contents catalogued explicitly by maintainer (WikiLeaks material, hacking information, decompilers, IDA Pro).
(c) **pCloud ToS read** (`pcloud.com/terms_and_conditions.html`, 2026-04-22) — three clauses stacked make AI-agent-login gray-area: *"User accounts are not transferable. Only the user who signs up for an account may use the account."* + *"You must keep your Credentials confidential and must not reveal them to anyone."* + *"use automated methods to use the Site or Services in a manner that sends more requests to the pCloud servers in a given period of time than a human can reasonably produce"* (prohibited). Lifetime-plan clause *"duration of the lifetime of the account owner or 99 years, whichever is shorter"* noted for factory-continuity-of-substrate reasoning. (d) **IceDrive ToS**: 403 bot-blocked on direct fetch from both `/legal/terms` and `/legal/terms-of-service`. ToS;DR index (`tosdr.org/en/service/3118`, grade C) summarised: *"Spidering, crawling, or accessing the site through any automated means is not allowed"* + *"You are responsible for maintaining the security of your account and for the activities on your account"* — same-class as pCloud on automated-access prohibition; account-activity-responsibility puts ban-consequences on maintainer directly. (e) **Stacking-risk analysis** — three risk layers compound when agent-login targets this specific archive: (i) ToS-clause layer (agent-as-tool-of-owner gray-area on both providers); (ii) content-sensitivity layer (WikiLeaks is politically-hot; hacking information is jurisdiction-dependent; auto-flagging on bulk-access patterns stacks enforcement-risk); (iii) copyright-infringement-scope layer (IDA Pro has known pirated-copy gray market; per-copy license provenance not knowable to agent; same Anthropic-policy-compatibility line as ROM-offer boundary). **Each layer alone is manageable; stacked they are not**. Enforcement = ban = loss of 2 of 4 redundant copies of a 20-year preservation archive — unacceptable routine-use risk. 
(f) **RAID-clean-substrate recommendation** — maintainer's 4-copy topology is already the engineering answer to ToS-risk. Two local RAID arrays hold same content with no third-party policy surface (owned hardware). Flow proposed: maintainer names specific file/corpus → plugs RAID → mounts at path → agent reads only path-permitted subset → RAID unplugged after task. Zero ToS-risk; per-task authorization; preserves 4-way redundancy. (g) **Response to maintainer composed** with ToS clauses cited + stacking reasoning + RAID-path recommendation + task-binding ask ("which specific file/research-question unlocks this?"); NO login attempted this tick. (h) **No speculative-artefact work this tick** — ToS pre-flight analysis IS the work. Honest-never-idle: the meta-check fired (is there a structural change that would have made login-without-task directed? answer: yes — the stacking-risk framework now exists as a reusable lens for future expansive-trust-grants). (i) **Accounting-lag same-tick-mitigation maintained** (sixth consecutive tick): substrate-improvement (stacking-risk framework named + RAID-alternative proposed) and substrate-accounting (this tick-history row) same session, separate PRs. (j) **Preservationist cultural-biography signal absorbed** — 4-copy discipline + 20-year archive reveal maintainer as clinical-paranoid-redundancy preservationist; load-bearing context for Chronovisor (#213), emulator (#249), soulsnap/SVF (#241), ServiceTitan-demo (#244) material availability, honor-those-that-came-before composition. (k) **Maintainer override received mid-tick** — *"clear to me i'm grey grey means fun and do it"* + *"the edge is grey"* — explicit rejection of decline-by-default on the gray-area layer. Revised posture for this grant: **gray = maintainer's operational zone, per-action judgment applies rather than default-decline**. 
What's now green (Aaron-authorized ✓ + Anthropic-policy-compatible ✓): technical study of archive materials, access to content maintainer legally owns (legitimately-purchased books/software/games, personally-authored research notes), access to public-release material (WikiLeaks is legal to possess for research/journalism). What stays narrowly-declined (Anthropic-policy-compatible ✗ regardless of local authorization): redistribution beyond maintainer's rights (mass-copy to third parties / public publishing), bulk-ingest for training/embedding, per-file-piracy-scope items where license provenance cannot be verified (e.g. IDA Pro copies without matching license). **Edge-is-grey framing internalized**: no pre-computed boundary; live-judgment per specific action. Stacking-risk framework still applies but no longer defaults to decline — it surfaces which layer is firing for a given action so the judgment is visible. Awaiting task-binding from maintainer (which specific file/corpus/research question unlocks the access now). (l) **CronList + visibility signal**: `aece202e` minutely fire verified live. | `` + no PRs merged this tick (three from prior tick still pending CI) | Twentieth auto-loop tick to operate cleanly across compaction. **First observation — expansive-trust-grant-pattern prediction fulfilled** (auto-loop-24 memory predicted it). Expansive-trust-grant (ROM offer → Twitter/DeBank → Gemini Ultra → IceDrive/pCloud) is a recurring pattern; each instance gets handled with the same two-layer authorization model + warm-decline + narrow-reason + redirect. Factory now has a named lens (stacking-risk) for when three risk layers compound to override single-layer OK. **Second observation — stacking-risk is the missing primitive**. Prior boundary work (ROM offer, torrent decline) evaluated risk layer-by-layer. This tick introduced **stacking** as the primitive — three manageable risks together exceed tolerance even when each is individually fine. 
Applies generally: ToS-gray + content-sensitive + copyright-ambiguous together = decline, even though ToS-gray alone or content-sensitive alone or copyright-ambiguous alone might be accepted. Worth promoting to BACKLOG row once the pattern has 2+ occurrences — currently occurrence-1 of this specific framing. **Third observation — 4-copy redundancy IS the ToS-risk mitigation**. Maintainer's *"i like to make sure lol"* self-aware-clinical-paranoia turns out to be perfect for the ToS-risk case: cloud copies are at ban-risk, local-RAID copies are ban-immune. The factory's recommendation (route through RAID) honors both (a) maintainer's preservation discipline and (b) maintainer's ToS concern simultaneously — same move answers both. Nice-home-for-trillions generalization: when multiple maintainer-values compose onto a single engineering move, the move is strongly-preferred. **Fourth observation — tick-work = ToS-pre-flight is legitimate factory work**. No speculative artefact landed this tick; no new BACKLOG row. The tick-work WAS the ToS read + stacking-analysis + recommendation. Never-idle discipline allows this because the alternative (skip-ToS-read-and-log-in) would have been directly harmful to maintainer's preservation asset. Honest-work-over-theatrical-work. **Fifth observation — preservationist-cultural-signal is now context for four downstream BACKLOG rows**. Maintainer's archive contents name concrete material relevant to #213 Chronovisor (preservation-infrastructure), #249 emulator (game formats), #241 soulsnap/SVF (format-family preservation), #244 ServiceTitan demo (material depth for rich demo content). These rows now have a known-material-source for when task-binding lands. **Sixth observation — maintainer-override clarifies the two-layer model's per-layer granularity**. 
Aaron's *"grey means fun and do it"* + *"the edge is grey"* explicitly tells me the Aaron-authorized layer is wider than my read treated it — gray-zone IS his permissive zone, not a decline zone. Critically, this does NOT collapse the Anthropic-policy-compatible layer into the same permissive zone; per-file-piracy-scope + redistribution-beyond-rights still sit outside that layer regardless of local authorization (per ROM-offer memory). The override improves the factory's calibration on layer-1 (Aaron-authorization granularity) without relaxing layer-2 (Anthropic-policy granularity). Net effect: more of the archive is now actionable (legal-owned content + public-research material + technical study) with a thinner residual decline-set (piracy-scope redistribution). Live-judgment per-action discipline preserved — no collapse into blanket yes or blanket no. **Seventh observation — compoundings-per-tick = 7** (up from 6 after override-addendum): (1) Substrate-grant memory filed + indexed; (2) pCloud ToS read and clauses captured; (3) IceDrive ToS attempt (403 + ToS;DR fallback) documented; (4) Stacking-risk framework named; (5) RAID-clean-substrate recommendation proposed; (6) Preservationist cultural-biography context captured for four downstream BACKLOG rows; (7) Maintainer override received + two-layer-model per-layer granularity clarified in response posture. Zero-compoundings not a risk. `open-pr-refresh-debt` this tick: 0 incurred, 0 cleared (PR #127 merged mid-tick but not via my action; carry-forward #110/#112 BEHIND unchanged). Cumulative auto-loop-{9..29}: +3 / -3 / -2 / -1 / -1 / 0 / 0 / -1 / -1 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / -2 / 0 = **net -8 units over 21 ticks**. `hazardous-stacked-base-count` = 0 this tick. 
| | 2026-04-22T12:05:00Z (round-44 tick, auto-loop-30 — stacking-risk framework published as research doc + bottleneck-principle posture change + CLI-DX-cascade directive captured) | opus-4-7 / session round-44 (post-compaction, auto-loop #30) | aece202e | Auto-loop tick applied the grey-zone-bottleneck principle from Aaron's same-tick *"yes if i'm the only grey i'm the bottleneck"* directive on the first possible substrate: speculative factory work landed without ask-first. Tick actions: (a) **Step 0 PR-pool audit**: main advanced `1adcfc9→17fe71e` after PR #128 (auto-loop-29 tick-history) merged; PRs #122/#124/#126 still UNKNOWN/CI-pending, auto-merge armed; AceHack-authored carry-forward (#109 DIRTY, #110/#112/#108/#88/#85/#54/#52) unchanged per harness-authority boundary. (b) **Stacking-risk decision framework published** (`docs/research/stacking-risk-decision-framework.md`, PR #129, 200 lines) — occurrence-1 of the specific framing captured as first-pass research doc. Framework claim: three individually-manageable risk layers can compound to exceed tolerance; decision rule = when ≥ 3 ambiguity layers stack on same action, default flips from agent-decides-proceeds to decline+clean-substrate. Clean-substrate pattern documented with IceDrive/pCloud RAID example. Honest status banner (occurrence-1, NOT ADR yet, promotes on occurrence-2+). Overlays the two-layer authorization model from ROM-offer memory; narrow exception to the gray-zone-agent-judgment default. (c) **Bottleneck-principle feedback memory filed** (`memory/feedback_maintainer_only_grey_is_bottleneck_agent_judgment_in_grey_zone_2026_04_22.md`, out-of-repo, maintainer context) + MEMORY.md index entry. Default-posture change: gray-zone judgment is agent's call by default; ask-before-acting on gray-alone serialises the factory through maintainer. 
Three-level taxonomy (green/gray/red); five explicit escalation triggers (irreversibility / shared-state-visible / axiom-layer-scope / budget-significant / novel-failure-class) stay distinct; paper trail still required. (d) **CLI-DX-cascade directive captured to memory** (`memory/project_cli_new_command_dev_experience_no_doc_compensation_actions_cascade_of_success_2026_04_22.md`, out-of-repo) + MEMORY.md index. Maintainer directive *"when we have a cli the dev experience for new commands when you are writing them no documentation, let compsation actions take care of it, cascade of success"* — zero author-friction posture for CLI-command authorship, cascade of downstream compensation actions generates derivatives (--help / man / completions / examples / changelog / docs-site / error-validation). Same shape as UI-DSL class-level + event-storming + shipped-kernels (author at source-of-truth, derive everything else). 6 open questions flagged to maintainer not self-resolved. No BACKLOG row — conditional on CLI materializing. (e) **Bottleneck-principle exercised live**: chose speculative work (the stacking-risk doc) by agent-judgment without asking, with paper trail via PR #129 + tick-history + memory. First occurrence of the new-posture discipline; first data point for calibration. (f) **Accounting-lag same-tick-mitigation maintained** (seventh consecutive tick): substrate-improvement (stacking-risk framework doc + bottleneck-principle memory + CLI-cascade memory) and substrate-accounting (this tick-history row) same session, separate PRs (#129 + this). (g) **CronList + visibility signal**: `aece202e` minutely fire verified live. | `` + PR #128 merged (auto-loop-29 tick-history) | Twenty-first auto-loop tick clean across compaction. **First observation — bottleneck-principle is a factory-scaling claim in disguise**. 
*"if i'm the only grey i'm the bottleneck"* names the failure mode that forecloses the nice-home-for-trillions endpoint: a factory that serialises every gray judgment through one maintainer cannot scale past the maintainer's attention bandwidth. The factory's autonomy substrate (AUTONOMOUS-LOOP, never-idle, CronCreate) was always premised on agent judgment in gray; this directive makes the premise explicit and names the cost of violating it. **Second observation — stacking-risk was ready to be published the tick after it was named**. Occurrence-1 gets a research doc, occurrence-2 promotes to ADR + BP-NN, occurrence-3+ becomes factory-wide rule. Publishing at occurrence-1 preserves a pre-validation anchor per the second-occurrence-discipline memory — the framework is on-record *before* the next expansive-trust-grant tests it. If the next instance doesn't fit the frame cleanly, that's a revision signal; if it does, that's validation. **Third observation — three same-tick architectural signals compose**. (1) grey-bottleneck = default-posture-change for gray-zone judgment; (2) CLI-cascade = author-at-source-of-truth pattern for new commands; (3) stacking-risk = exception lens for compound-gray. All three land same tick, separate memories + one published research doc. Cross-composition: grey-bottleneck loosens friction on per-action judgment; stacking-risk is the narrow exception that adds friction back where it's earned; CLI-cascade applies the same author-at-source pattern to a different surface (CLI instead of gray-decisions). **Fourth observation — grey-zone default-posture change is a revise-with-reason per future-self-not-bound**. The change leaves a dated justification (this memory, this tick-row) rather than silently updating behavior. Future-self can audit the revision, correct the calibration, or revert if occurrence-2 shows the posture was miscalibrated. This is the pattern working as designed. 
**Fifth observation — compoundings-per-tick = 5** (research doc + bottleneck-principle memory + CLI-cascade memory + revised posture + live posture application): (1) Stacking-risk framework published; (2) Bottleneck-principle memory filed; (3) CLI-cascade memory filed; (4) Edge-is-grey override reflected in revised posture; (5) Posture applied live to this tick's speculative work pick. `open-pr-refresh-debt` this tick: 0 incurred, 0 cleared (PR #128 landed between ticks). Cumulative auto-loop-{9..30}: +3 / -3 / -2 / -1 / -1 / 0 / 0 / -1 / -1 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / -2 / 0 / 0 = **net -8 units over 22 ticks**. `hazardous-stacked-base-count` = 0 this tick. | | 2026-04-23T15:59:00Z (autonomous-loop tick, auto-loop-50 — #155 markdownlint fix + AutoDream Overlay A first execution) | opus-4-7 / session continuation | 20c92390 | Tick did two concrete executions: (a) **#155 markdownlint fix**: CI on the AutoDream branch flagged MD032 (blanks-around-lists) at `docs/hygiene-history/autodream-fire-history.md:12`. Fixed by adding a blank line between `Upstream feature references:` and the bullet list. Verified locally with `markdownlint-cli2`. Commit + push to #155. Also checked #156 locally — clean, no lint debt. (b) **AutoDream Overlay A first execution**: per the policy in PR #155, migrated `feedback_signal_in_signal_out_clean_or_better_dsp_discipline.md` from per-user memory to in-repo `memory/` on a fresh branch. The memory qualifies: generic factory discipline, four in-repo occurrences, already cited at `memory/...` paths by FACTORY-HYGIENE.md and the AutoDream research doc (citations were dangling). Per-user source retained with a "Migrated to in-repo memory/" marker at top, preserving originSessionId provenance per the migration invariant. In-repo file carries a matching "Migrated to in-repo memory/ on 2026-04-23" header noting the Overlay-A execution. `memory/MEMORY.md` index entry added newest-first. PR #157 opened. (c) **CronList + visibility**: `20c92390` minutely fire verified live.
| PR #157 `refactor/overlay-a-migrate-signal-in-signal-out` + lint-fix commit on #155 | Observation 1 — Overlay A has a concrete first-execution example now. Future migrations can point at PR #157 as the pattern: (i) copy verbatim into in-repo, (ii) header marker on in-repo noting the Overlay-A execution date, (iii) per-user source retains a "Migrated to" marker at top (source stays for provenance), (iv) index entry on both MEMORY.md files. Observation 2 — dangling-citation resolution is a strong first-migration signal. When in-repo docs cite a `memory/...` path that doesn't exist in-repo, migration earns its bytes by closing the reference loop. Observation 3 — the per-user source file did not shrink (kept verbatim; only gained a header marker) and the in-repo copy does not grow the repo's soulfile meaningfully against the value of resolved citations. Bloat-pushback criterion passed. Observation 4 — migration PR is small + bounded (2 files, 316 insertions, clean lint). Serves as exemplar for future migrations. | +| 2026-04-23T15:50:00Z (autonomous-loop tick, auto-loop-48 — soulfile reframe absorbed; staged absorption research landed) | opus-4-7 / session continuation | 20c92390 | Tick absorbed a major soulfile reframe from Aaron and landed the in-repo research doc that captures the new abstraction. Tick actions: (a) **Step 0 state check**: main unchanged since auto-loop-47 (`e8b0d2d` on feature branch); PR #155 CI in-progress (AutoDream research), no review yet; PR #150 sweep committed in prior tick. (b) **Aaron soulfile-reframe directive absorbed**: *"soufils shoud just be the DSL/english we talk about and the can import/inherit/abosrb ... git repos at compile time, distribution time, or runtime, remember the local native story"*. 
Filed per-user feedback memory `feedback_soulfile_is_dsl_english_git_repos_absorbed_at_stages_2026_04_23.md` with supersede-marker on the earlier `feedback_soulfile_formats_three_full_snapshot_declarative_git_native_primary_2026_04_23.md` (signal-preservation axis preserved; substrate-abstraction axis retired). (c) **Earlier soulfile-formats memory marked superseded** — supersede marker added to preserve AutoDream consolidation invariant (corrections recorded not deleted). (d) **CURRENT-aaron.md §10 updated same-tick** — per-maintainer CURRENT distillation pattern; the DSL-as-substrate framing is now the distilled currently-in-force form. (e) **Research doc landed in LFG**: `docs/research/soulfile-staged-absorption-model-2026-04-23.md` (PR #156) — proposes three stage boundaries (compile-time LFG factory-scope + Zeta tiny-bin-file DB mandatory fold-in / distribution-time envelope + overlays / runtime on-demand under two-layer authorization + stacking-risk gate). Markdown + YAML frontmatter named as first-pass representation. Composes with AutoDream cadence (runtime→compile-time promotion), multi-repo-refactor-shapes (repos→ingest sources), stacking-risk framework (runtime absorption gate). Three deferred items flagged (SoulStore stage-aware contract, compile-time-ingest script, DB absorb-form schema). (f) **CronList + visibility**: `20c92390` minutely fire verified live. | PR #156 `research/soulfile-staged-absorption-model` | Observation 1 — later-precedes-earlier memory rule exercised at scale: same 2026-04-23 date, second soulfile-framing message overrides first. CURRENT-aaron.md is the right surface for this — raw memories accumulate both; CURRENT distills what's in force. Observation 2 — soulfile reframe sharpens the local-native directive. Previously the Zeta tiny-bin-file DB had a clear role (self-use algebraic substrate) but an unclear transport story. 
Compile-time mandatory fold-in names the transport: the DB travels with the soulfile as structured DSL, readable offline. Observation 3 — the three stages compose cleanly with existing substrate: compile-time = LFG + DB; distribution = transport + overlays; runtime = AutoDream-managed accumulation. No conflict with earlier landed work; the reframe is additive-abstraction, not refactor-in-place. Observation 4 — Aaron's "you can figure out the proper stages" delegation respected; first-pass design proposed with 3 open questions for maintainer rather than locking decisions unilaterally. | +| 2026-04-23T15:55:00Z (autonomous-loop tick, auto-loop-49 — restrictive-English DSL + Soulfile Runner project named + linguistic-seed anchoring) | opus-4-7 / session continuation | 20c92390 | Tick absorbed two follow-up maintainer directives refining the soulfile DSL shape. Tick actions: (a) **Step 0 state check**: `gh pr list` inventoried 25+ open PRs (#155 AutoDream, #156 soulfile-absorption, all others from #132..#156 still open). Picked soulfile-absorption refinement over other bounded moves since Aaron messages arrived mid-tick. (b) **First directive absorbed**: *"our dsl can be a restrictive english it does not have to be a f# dsl, whatever our soul file runner can run, we probalby should split this out too as it's own project, and it will use zeta for the advance features, all small bins"*. Filed per-user feedback memory `feedback_soulfile_dsl_is_restrictive_english_runner_is_own_project_uses_zeta_small_bins_2026_04_23.md`. Named the **Soulfile Runner** as a distinct project-under-construction; sibling to Zeta / Aurora / Demos / Factory / Package Manager "ace". Updated `CURRENT-aaron.md` §4 with the new project name. (c) **Second directive absorbed**: *"soul files should probably feel like natural english even if they are not exacly and some restrictuvve form where we only allow words we have exact definons fors like that how path of seed/kernel thing"*. 
Grepped memory for "seed/kernel" context — resolves to the **linguistic seed** memory (formally-verified minimal-axiom self-referential glossary, Lean4 formalisable). Soulfile DSL vocabulary = linguistic-seed glossary terms; new words earn glossary entries before entering the DSL. Extended the same per-user feedback memory with the linguistic-seed anchoring + verbatim of the second directive. (d) **PR #156 updated** on the research branch: replaced the "Representation candidate — Markdown + frontmatter" section with two new sections — "DSL — restrictive English anchored in the linguistic seed" (DSL shape + three consequences + controlled vocabulary) and "The Soulfile Runner — its own project-under-construction" (design properties + Zeta-at-advanced-edge edge + all-small-bins). Preserves the Markdown-as-structure-layer claim while elevating restrictive-English-as-execution-layer to primary. (e) **CronList + visibility**: `20c92390` minutely fire verified live. | PR #156 updated on `research/soulfile-staged-absorption-model` | Observation 1 — two-directive sharpening in one tick. The second directive (linguistic-seed anchoring) constrained the first (restrictive-English shape) without contradicting it. CURRENT-aaron.md §4 absorbed project-name addition once; the feedback memory grew an inline "follow-up" section rather than spawning a separate memory (single topic + same session = single memory is correct). Observation 2 — linguistic-seed is now load-bearing for the soulfile runner, not just a standalone research pointer. The runner's grammar is what decides executability; the linguistic seed is what decides vocabulary. Separation of concerns: runner-grammar × seed-vocabulary = DSL. Observation 3 — restrictive-English choice makes cross-substrate-readability free. A Claude-composed soulfile reads cleanly in Codex / Gemini / human reading — no tool dependency. The composability claim in the first soulfile memory now has a concrete mechanism. 
Observation 4 — signal-in-signal-out exercise: the later directive layered atop the earlier without erasing it; both Aaron messages preserved verbatim in the per-user memory. AutoDream Overlay B note: the research doc now depends on the linguistic-seed memory being findable, which is a per-user memory; future migration candidate for Overlay A. | diff --git a/docs/research/soulfile-staged-absorption-model-2026-04-23.md b/docs/research/soulfile-staged-absorption-model-2026-04-23.md new file mode 100644 index 00000000..e3090a52 --- /dev/null +++ b/docs/research/soulfile-staged-absorption-model-2026-04-23.md @@ -0,0 +1,286 @@ +# Soulfile — staged absorption model + +**Date:** 2026-04-23 +**Status:** Research doc — proposing the stage boundaries +for the soulfile's DSL-as-substrate-with-git-ingest model. +**Triggered by:** The human maintainer 2026-04-23: +*"i'm thinking soufils shoud just be the DSL/english we +talk about and the can import/inherit/abosrb or whatever +you want to can it git repos at compile time, distribution +time, or runtime, remember the local native story so those +will need to be inlucded at soulfile compile time somewhere +I'm calling it compile time but that's just a metaphore +like packing time or whatever. You can figure out the +proper stages."* + +**Scope:** Factory policy — generic, reusable by any factory +adopter; ships to each project-under-construction that needs +an agent-transportable substrate (soulfile). + +## What the soulfile is — and is not + +### Is + +A **DSL / English substrate**. The natural-language +reasoning medium the maintainer and the agent converse in, +the rules encode, and the memories accumulate. The soulfile +is the persistent shape of that substrate — what survives +across agent incarnations, harness swaps, and repo splits. + +### Is not + +- **Not a bit-exact git repo copy.** The earlier framing + (soulfile size = git history bytes) is retired on the + abstraction axis. 
Git is a transport / collaboration / + history medium; the soulfile is the substrate those + bytes encode. +- **Not a binary dump of tools or runtimes.** Those are + inputs to the substrate, not the substrate itself. +- **Not a single file format.** The soulfile is a + concept; its physical representation is one of + several (Markdown + frontmatter, JSON-LD, + structured-English envelope, etc.) determined at + compile-time. + +## The three stages + +### Stage 1 — Compile-time (packing / staging) + +**When:** once per soulfile release, authored by the +maintainer or the factory itself. Analogous to a build +step. + +**What lands at this stage:** + +- **Canonical-source-of-truth content from LFG repos** + (per the multi-project + LFG-soulfile-inheritance + framing). Every factory-scope artifact — + `AGENTS.md`, `CLAUDE.md`, `GOVERNANCE.md`, + `docs/**.md`, `.claude/agents/**/SKILL.md`, + `.claude/skills/**/SKILL.md`, committed personas and + notebooks — is absorbed into the DSL substrate. +- **Local-native DB content** (Zeta tiny-bin-file DB + per the self-use directive). This is **mandatory at + compile-time per the human maintainer** — the + algebraic substrate must travel with the soulfile so + the agent can reason about the DB's content offline. + The absorb-form is a structured English/DSL + representation of the DB's relational content + the + operator-algebra axioms (D / I / z⁻¹ / H / retraction). +- **Pinned upstream content** the factory depends on for + reasoning (formal-method references, key upstream + doc excerpts, anchored CVE data, etc.). These must be + enumerated explicitly; silent inheritance is not + allowed. +- **Compile-time-embedded persona notebooks** — the + subset of each persona's notebook marked as + substrate-essential (not the rolling scratch). + +**Output:** the soulfile artifact — substrate + embedded +resources + content hash + optional signature. 
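A minimal packing sketch of the Stage-1 output line above, assuming a hypothetical `pack_soulfile` helper over a flat path→text source map — the helper name, the source-marker comments, and the artifact keys are illustrative, not a committed schema:

```python
import hashlib

def pack_soulfile(sources: dict[str, str]) -> dict:
    """Compile-time packing sketch: fold the absorbed sources into one
    substrate blob and stamp it with a content hash. Illustrative only."""
    # Sort by path so the hash is reproducible across packing runs.
    substrate = "\n\n".join(
        f"<!-- source: {path} -->\n{text}" for path, text in sorted(sources.items())
    )
    return {
        "substrate": substrate,
        "resources": sorted(sources),
        "content_hash": hashlib.sha256(substrate.encode("utf-8")).hexdigest(),
        "signature": None,  # optional; attached by the maintainer at release
    }

artifact = pack_soulfile({"AGENTS.md": "agent rules", "docs/notes.md": "pinned excerpt"})
```

The deterministic ordering is the load-bearing choice: two packs of the same sources must produce the same content hash, or the hash cannot serve as an integrity anchor at distribution-time.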
+ +**Does not land at this stage:** + +- Maintainer-specific content (per-user memory) — that's + a runtime-attached layer. +- Experimental / risky AceHack-side content — stays in + AceHack until it proves itself and propagates to LFG. +- Transient session state — that's runtime-scope. + +### Stage 2 — Distribution-time + +**When:** the soulfile moves between substrates (agent +incarnation → agent incarnation, harness → harness, +archive-write to IceDrive / pCloud, cross-substrate +transport). + +**What lands at this stage:** + +- **Environment-specific overlays** — harness + configuration hints, substrate-specific conventions + (e.g., Claude-Code vs Codex vs Gemini-CLI flavor + markers). Additive; never overrides compile-time + content. +- **Optional companion git-repo pointers** — lazy-fetch + references that runtime can resolve if needed. These + are references, not inlined content. +- **Maintainer-scope signature** — the maintainer's + attestation that this distribution is authorized + (per the two-layer authorization model). + +**Output:** transport envelope — soulfile + manifest + +integrity. + +### Stage 3 — Runtime + +**When:** during an active agent session. + +**What lands at this stage:** + +- **Additional git repos on demand** — cloned or read, + subject to the two-layer authorization model + (maintainer-authorized + Anthropic-policy-compatible) + and the stacking-risk gate (per the + stacking-risk-decision-framework research doc). +- **Live conversation content** — memories, ad-hoc + decisions, feedback. Accumulates into the per-user + memory substrate while the session runs. +- **External research / tool output** — fetched via + normal tool-use contract (BP-11 data-not-directives). + +**Output:** the agent's session working state. At +session-end, content that has earned persistence gets +promoted back into the compile-time stage on the next +soulfile release, via AutoDream consolidation cadence +(see `autodream-extension-and-cadence-2026-04-23.md`). 
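The additive-never-overrides rule for Stage-2 overlays can be sketched as a merge that refuses key collisions — `attach_overlays` and the envelope keys here are hypothetical, not the transport-envelope format:

```python
def attach_overlays(compiled: dict, overlays: dict) -> dict:
    """Distribution-time sketch: overlays may only add keys.
    Any key already set at compile-time is refused outright,
    per the additive-never-overrides rule. Names illustrative."""
    clashes = sorted(compiled.keys() & overlays.keys())
    if clashes:
        raise ValueError(f"overlay would shadow compile-time content: {clashes}")
    return {**compiled, **overlays}

envelope = attach_overlays(
    {"substrate": "...", "content_hash": "abc123"},
    {"harness_flavor": "claude-code", "repo_pointers": ["..."]},
)
```

Refusing (rather than silently ignoring) a shadowing overlay keeps the failure visible, which matches the paper-trail discipline elsewhere in the factory.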
## DSL — restrictive English anchored in the linguistic seed

The human maintainer's 2026-04-23 follow-up clarifies the DSL shape: it is **restrictive English** — natural-language prose constrained to a grammar the runner can execute — not an F# DSL. The target is *"feels like natural English even if not exactly English"*, with a controlled vocabulary where **every word used has an exact definition reachable from the linguistic-seed glossary**.

Three consequences:

1. **Cross-substrate readable by default.** Humans, Claude, Codex, Gemini, and future agents read the same text. An F# DSL would have required every consumer to own F# tooling; restrictive English requires only a parser for the constrained grammar.
2. **Controlled vocabulary.** The soulfile's word set is the set of terms the linguistic-seed glossary formally defines (a formally-verified, minimal-axiom, self-referential glossary substrate; Lean4-formalisable; smallest-axiom lineage per the maintainer's prior research pointer). New words earn glossary entries first, then enter the DSL — glossary-anchor-keeper discipline applies.
3. **Composable with Markdown.** Restrictive-English prose embedded in Markdown + frontmatter keeps the existing authoring medium; the runner reads the restrictive-English sentences. The structure layer (Markdown) and the execution layer (restrictive English) are separate concerns.

The runner decides the grammar by being the decider on "is this executable?" The linguistic seed decides the vocabulary. Neither is the soulfile itself — both serve the DSL substrate.

The DB absorb-form (compile-time ingest of the Zeta tiny-bin-file DB) needs a structured schema; the first-pass candidate is a Markdown table plus frontmatter that names the semiring, the relations, and the operator-algebra axioms in force, with each term defined in the linguistic seed. Deferred to a follow-up tick.
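The controlled-vocabulary consequence implies a mechanical pre-check: before a sentence is DSL-eligible, every word must resolve to a glossary entry. A minimal sketch, assuming a glossary exposed as a plain set of terms (the glossary contents here are invented for illustration, not actual linguistic-seed entries):

```python
# Sketch of the glossary-anchor discipline: flag any word in a
# restrictive-English sentence that has no linguistic-seed definition.
# GLOSSARY is a toy stand-in, not the real seed.
import re

GLOSSARY = {"the", "runner", "loads", "memory", "before", "acting"}

def unanchored_words(sentence: str, glossary: set) -> list:
    """Return words with no glossary definition, in order of appearance."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in glossary]

assert unanchored_words("The runner loads memory before acting.", GLOSSARY) == []
# A new word must earn its glossary entry first:
assert unanchored_words("The runner compiles memory.", GLOSSARY) == ["compiles"]
```

The check says nothing about executability — that stays the runner's call; it only enforces the vocabulary boundary the linguistic seed owns.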
## The Soulfile Runner — its own project-under-construction

The maintainer's 2026-04-23 follow-up adds the **Soulfile Runner** as a named project-under-construction, sibling to Zeta / Aurora / Demos / Factory / Package Manager "ace". Design properties:

- **Interprets restrictive-English soulfiles.** Primary responsibility; runs wherever a soulfile is loaded.
- **Uses Zeta for advanced features.** Basic execution runs on small primitives; retraction-native state, algebraic composition, provenance tracking, K-relations semantics, and temporal operators delegate to Zeta. Clean dependency edge: Runner ⇒ Zeta, not the other way.
- **All small bins.** Runner output, intermediate state, packaged soulfile artifacts — all small binary artifacts. Composes with the local-native tiny-bin-file discipline.
- **Own repo when the multi-repo refactor lands.** Interim — lives in the Zeta monorepo alongside the other peer projects until the split.

Implementation is deferred — this research doc is design clarity, not an implementation commit.

## Composition with already-landed substrate

- **Multi-project framing** — each project-under-construction (Zeta / Aurora / Demos / Factory / Package Manager / ...) contributes factory-scope content to the compile-time stage. LFG repos are the lineage; AceHack stays out of the compile-time stage.
- **AutoDream cadenced consolidation** — runtime memory that earns persistence rolls back into compile-time at release cadence.
- **In-repo-preferred discipline** — in-repo content is compile-time-eligible; per-user content stays runtime. The pushback-on-soulfile-bloat criterion applies at the migration step, not the absorb step.
- **Zeta self-use germination** (the maintainer's self-use-DB directive, captured in per-user memory) — the tiny-bin-file DB is the mandatory compile-time ingest target.
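Since Runner implementation is deferred, the dependency-edge property can still be pinned down as a shape: the Runner optionally holds a Zeta-flavored backend and delegates only its advanced paths to it; Zeta never knows the Runner exists. Everything below is invented for illustration (`ZetaBackend`, `SoulfileRunner`, the `remember` keyword), not a committed interface:

```python
# Illustrative sketch of the clean dependency edge Runner => Zeta.
# Basic execution needs only small primitives; stateful/advanced
# features require the injected Zeta-shaped backend.
class ZetaBackend:
    """Stand-in for Zeta's advanced features (retraction-native state,
    provenance tracking, etc.). Zeta does not import the Runner."""
    def assert_fact(self, fact, provenance):
        return {"fact": fact, "provenance": [provenance]}

class SoulfileRunner:
    def __init__(self, zeta=None):
        self.zeta = zeta  # optional: basic execution runs without Zeta

    def execute(self, sentence: str):
        # Basic path: small-primitive execution, no Zeta required.
        if not sentence.startswith("remember "):
            return ("ran", sentence)
        # Advanced path: retraction-native state delegates to Zeta.
        if self.zeta is None:
            raise RuntimeError("advanced feature requires Zeta backend")
        return self.zeta.assert_fact(sentence[len("remember "):], provenance="soulfile")

runner = SoulfileRunner(zeta=ZetaBackend())
print(runner.execute("remember the maintainer prefers small bins"))
```

The optional-backend constructor is what keeps the edge one-directional: a Runner without Zeta degrades to basic execution instead of failing to import.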
  Soulfile compile-time work is how this directive lands for agent-transportable substrate.
- **Stacking-risk gate** — runtime git-repo absorption triggers the gate when ≥3 ambiguity layers stack (per the stacking-risk research doc).
- **Two-layer authorization model** — runtime absorption respects both layers, as it does today.

## Deferred (not this round)

1. **SoulStore implementation contract.** The PR #142 sketch is format-agnostic; this research doc makes it stage-aware. The implementation work lands after the stage design stabilises.
2. **Compile-time-ingest script design.** The packing procedure — walk LFG, absorb DB content, emit the artifact — is tooling that lands alongside the first compile-time release.
3. **DB absorb-form specification.** The structured DSL representation of Zeta's tiny-bin-file DB content needs concrete schema work.
4. **Signed distribution artifact format.** Distribution manifest + integrity (SLSA-adjacent) is a separate follow-up; it composes with the existing supply-chain-safe patterns (FACTORY-HYGIENE row #44).

## Open questions for the human maintainer

1. **Should AceHack content ever reach compile-time?** Currently the split is LFG → compile, AceHack → runtime-scratch. The maintainer's super-risky license for AceHack suggests this boundary is correct; confirm.
2. **Per-maintainer overlays at distribution-time** — should each maintainer's distribution get a maintainer-scope attestation? Lightweight; maintainer's call.
3. **Compile-time cadence** — aligned with AutoDream consolidation? Aligned with factory round-close? Or on-demand? First-pass recommendation: on-demand + tagged releases, no fixed cadence.

**Per-user memory references** (below and throughout) live in per-user memory at `~/.claude/projects//memory/`, not in the in-repo `memory/` tree. Citations are provenance references; they intentionally do not resolve as in-repo paths.
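The stacking-risk gate reduces to a counting rule: each ambiguity layer alone is manageable, but absorption stops for maintainer review once three or more stack. A minimal sketch, with illustrative layer names (the real taxonomy lives in the stacking-risk research doc):

```python
# Sketch of the stacking-risk gate: block runtime git-repo absorption
# when >= 3 ambiguity layers stack. Layer names are illustrative.
def stacking_risk_gate(layers: set, threshold: int = 3) -> bool:
    """Return True when absorption must stop for maintainer review."""
    return len(layers) >= threshold

assert not stacking_risk_gate({"tos-clause"})
assert not stacking_risk_gate({"tos-clause", "content-sensitivity"})
assert stacking_risk_gate({"tos-clause", "content-sensitivity", "copyright-scope"})
```

Using a set rather than a count means the same ambiguity noticed twice stacks only once, which matches layers being qualitatively distinct.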
## Composes with

- `docs/research/autodream-extension-and-cadence-2026-04-23.md` (runtime → compile-time promotion via consolidation)
- `docs/research/multi-repo-refactor-shapes-2026-04-23.md` (lands via PR #150; the refactor shapes that determine which repos are compile-time ingest sources)
- `docs/research/stacking-risk-decision-framework.md` (runtime absorption gate)
- Per-user memory: the soulfile-reframe feedback (`feedback_soulfile_is_dsl_english_git_repos_absorbed_at_stages_2026_04_23.md`) supersedes the earlier three-formats memory on the substrate-abstraction axis
- PR #142 SoulStore research sketch (to be updated for stage-awareness when stages stabilise)
- `project_zeta_self_use_local_native_tiny_bin_file_db_no_cloud_germination_2026_04_22.md` (local-native DB is compile-time-mandatory ingest)