From 6db31dbd2a847733b55bd2f887f344654ffb2d97 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Thu, 23 Apr 2026 12:24:44 -0400
Subject: [PATCH 1/2] =?UTF-8?q?backlog:=20P1=20row=20=E2=80=94=20fresh-ses?=
 =?UTF-8?q?sion=20quality=20research?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Aaron 2026-04-23: "i tried a fresh session instead of resuming form the
existing, its not as goona, maybe do some research on yourself on how
to make sure fresh cluade sessions are as good as you, backlog item".

Research-grade row capturing:
- Observed phenomenon (resumed > fresh quality)
- 5 candidate causes (context compounding / prompt cache / calibration
  loss / CURRENT-.md gaps / soulfile-as-substrate as real fix)
- 4 deliverables (diagnostic protocol / AutoMemory gap analysis /
  factory-overlay recommendations / research write-up)
- P1 because scaling property (fresh sessions ≈ transplant to new
  maintainers like Max)

Self-scheduled free work under the 2026-04-23 scheduling-authority rule.

Co-Authored-By: Claude Opus 4.7
---
 docs/BACKLOG.md | 69 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index a4168495..eabf1491 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -3117,6 +3117,75 @@ within each priority tier.
 
 ## P1 — within 2-3 rounds
 
+- [ ] **Fresh-session quality research — close the gap
+ between fresh and resumed Claude sessions.**
+ Aaron 2026-04-23 observation: *"i tried a fresh session
+ instead of resuming form the existing, its not as goona,
+ maybe do some research on yourself on how to make sure
+ fresh cluade sessions are as good as you, backlog item"*.
+
+ **Observed phenomenon:** resumed Claude Code sessions on
+ this project operate at a noticeably higher quality than
+ fresh-session starts, despite fresh sessions loading the
+ same CLAUDE.md + AGENTS.md + MEMORY.md index. Something
+ the resumed session has that fresh sessions don't
+ recover cleanly.
+
+ **Candidate causes to investigate:**
+ 1. **Context-accumulation compounding** — resumed
+ sessions have accumulated reasoning, tool-use
+ patterns, and per-tick calibrations in their own
+ context window that MEMORY.md / CLAUDE.md do not
+ capture.
+ 2. **Session-specific prompt-cache warmth** — resumed
+ sessions hit cached prompt prefixes; fresh sessions
+ pay cold-start cost on every tool schema / system
+ prompt chunk.
+ 3. **Per-session calibration loss** — fresh sessions
+ don't know about mid-session directive shifts the
+ resumed session absorbed (e.g., today's
+ scheduling-authority sharpening; the in-repo-preferred
+ discipline before its memory landed).
+ 4. **CURRENT-.md pattern coverage gaps** —
+ the per-maintainer distillation files are designed
+ exactly for fresh-session orientation; gaps in them
+ are fresh-session quality regressions.
+ 5. **Soulfile-as-substrate is the real fix** — fresh
+ sessions should compile-time-ingest the DSL substrate
+ (per `docs/research/soulfile-staged-absorption-model-2026-04-23.md`),
+ not bootstrap from CLAUDE.md + AGENTS.md + MEMORY.md
+ alone.
+
+ **Deliverables:**
+ 1. Diagnostic protocol — run a fresh session through a
+ benchmark set (same prompts the resumed session has
+ handled well) and capture specifically what degrades.
+ 2. Gap-analysis against AutoMemory + AutoDream — what
+ Anthropic's features don't yet cover that the resumed
+ session's advantage comes from.
+ 3. Recommendations — concrete factory-overlay improvements
+ to `CURRENT-.md` pattern, in-repo memory
+ migration discipline, or soulfile compile-time-ingest
+ design that would narrow the gap.
+ 4. Research landing under `docs/research/fresh-vs-resumed-session-quality-gap-YYYY-MM-DD.md`.
+
+ **Scope:** research-grade, not implementation. Factory
+ discipline improvements flow from the findings but are
+ separate ADR-gated work.
+
+ **Priority:** P1 because fresh-session quality is a
+ scaling property — factories with excellent resumed-session
+ behaviour but poor fresh-session behaviour don't
+ transplant to new maintainers cleanly. Composes with the
+ multi-maintainer framing (Max anticipated next human
+ maintainer per `CURRENT-aaron.md`).
+
+ **Self-scheduled:** free work under the 2026-04-23
+ scheduling-authority rule (Amara + Kenji own free-work
+ scheduling).
+
+ **Effort:** M (1-3 days of agent research + write-up).
+
 - [ ] **Claude-harness cadenced audit — first full sweep.** Aaron 2026-04-20 late, verbatim: *"part of our stay up to date on everything we should always research claude and

From d54b96f1ec9b034dc5797e7476982669bd1d1de7 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Thu, 23 Apr 2026 13:29:07 -0400
Subject: [PATCH 2/2] backlog: PR #163 fixes + P3 Rational Rose research row

Two changes on the fresh-session-quality branch:

1. Address PR #163 Copilot review findings:
   - soulfile-staged-absorption doc reference clarified as
     "landing via PR #156" (not in-tree yet at review time)
   - CURRENT-aaron.md clarified as per-user memory (not in-repo)
   - 2026-04-23 scheduling-authority rule clarified as captured in
     per-user memory (not in-repo)

2. Add P3 row for Rational Rose research per maintainer 2026-04-23:
   "backlog rational rose research low priority". Low-priority
   research pointer on the UML model-as-source-of-truth lineage; no
   commitment to adopt; composes with the factory's OpenSpec +
   formal-spec discipline. Effort S for first-pass note.

Co-Authored-By: Claude Opus 4.7
---
 docs/BACKLOG.md | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index eabf1491..37ea0918 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -3152,7 +3152,7 @@ within each priority tier.
  are fresh-session quality regressions.
  5. **Soulfile-as-substrate is the real fix** — fresh
  sessions should compile-time-ingest the DSL substrate
- (per `docs/research/soulfile-staged-absorption-model-2026-04-23.md`),
+ (per the soulfile staged-absorption research doc landing via PR #156 at `docs/research/soulfile-staged-absorption-model-2026-04-23.md`),
  not bootstrap from CLAUDE.md + AGENTS.md + MEMORY.md
  alone.
 
@@ -3178,11 +3178,14 @@ within each priority tier.
  behaviour but poor fresh-session behaviour don't
  transplant to new maintainers cleanly. Composes with the
  multi-maintainer framing (Max anticipated next human
- maintainer per `CURRENT-aaron.md`).
+ maintainer per the CURRENT-.md distillation
+ pattern, which lives in per-user memory not in-repo).
 
  **Self-scheduled:** free work under the 2026-04-23
- scheduling-authority rule (Amara + Kenji own free-work
- scheduling).
+ scheduling-authority rule (captured in per-user memory —
+ not in-repo; the rule governs that Amara + Kenji own
+ free-work scheduling while the maintainer owns paid-work
+ authorisation).
 
  **Effort:** M (1-3 days of agent research + write-up).
 
@@ -5727,6 +5730,22 @@ systems. This track claims the space.
 
 ## P3 — noted, deferred
 
+- **Rational Rose — research pass.** The human maintainer
+ 2026-04-23 (low-priority directive): *"backlog rational
+ rose research low priority"*. Rational Rose is the
+ legacy UML modelling tool lineage (Rational Software →
+ IBM Rational → discontinued 2013, still surfaces as a
+ reference point in enterprise architecture discussions).
+ Research prompt: what does Rational Rose's approach (UML
+ model-as-source-of-truth, code-generation from model,
+ round-trip engineering) offer / warn against for the
+ factory's own model-vs-code discipline? Composes with
+ the factory's OpenSpec workflow (behavioural specs first)
+ and the formal-spec stack (Lean / TLA+ / Z3 — spec-first
+ is a parallel discipline from the formal-verification
+ side). No commitment to adopt; research pointer only.
+ No deadline. Effort S for the first-pass research note.
+
 - **Conversational bootstrap UX for factory-reuse consumers — two-persona (non-developer + developer) elicitation surface.** Aaron 2026-04-20: *"the end