backlog: P1 — fresh-session quality research (Aaron 2026-04-23) by AceHack · Pull Request #163 · Lucent-Financial-Group/Zeta

AceHack · 2026-04-23T16:24:57Z

Summary

Adds a P1 BACKLOG row capturing Aaron's 2026-04-23 observation that fresh Claude Code sessions operate at noticeably lower quality than resumed sessions, and proposing research to close the gap.

"i tried a fresh session instead of resuming form the existing, its not as goona, maybe do some research on yourself on how to make sure fresh cluade sessions are as good as you, backlog item"

Why this is P1

Fresh-session quality is a scaling property — a factory whose resumed sessions are excellent but whose fresh sessions are mediocre doesn't transplant to new maintainers cleanly. Max is anticipated as the next human maintainer per CURRENT-aaron.md; his fresh-session experience is the benchmark.

Candidate causes to investigate

- Context-accumulation compounding (resumed has reasoning in window that MEMORY.md doesn't capture)
- Prompt-cache warmth (fresh pays cold-start repeatedly)
- Per-session calibration loss (mid-session directive shifts don't survive)
- CURRENT-<maintainer>.md coverage gaps (the fast-path is meant exactly for this)
- Soulfile-as-substrate as the real fix (per docs/research/soulfile-staged-absorption-model-2026-04-23.md compile-time ingest)

Deliverables

- Diagnostic protocol — benchmark fresh vs resumed on known-good prompts
- Gap analysis vs AutoMemory + AutoDream Anthropic features
- Factory-overlay recommendations (CURRENT-file improvements, migration discipline, soulfile compile-time design)
- Research doc landing under docs/research/fresh-vs-resumed-session-quality-gap-YYYY-MM-DD.md
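
One way the diagnostic protocol could be sketched: score the same known-good prompts in a fresh and a resumed session, then report the per-prompt gap. The names `PromptResult` and `quality_gap`, the prompt IDs, and the 0-to-1 score scale are all hypothetical; the PR does not specify a scoring method, and the numbers below are illustrative rather than measured.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PromptResult:
    """Score for one known-good prompt in both session types (hypothetical 0..1 scale)."""
    prompt_id: str
    fresh_score: float
    resumed_score: float

    @property
    def gap(self) -> float:
        # Positive gap means the resumed session did better on this prompt.
        return self.resumed_score - self.fresh_score


def quality_gap(results: list[PromptResult]) -> dict:
    """Summarise the fresh-vs-resumed gap across a prompt set."""
    if not results:
        raise ValueError("need at least one prompt result")
    worst = max(results, key=lambda r: r.gap)
    return {
        "mean_gap": mean(r.gap for r in results),
        "worst_prompt": worst.prompt_id,
        "worst_gap": worst.gap,
    }


# Illustrative numbers only; not measurements from the PR.
sample = [
    PromptResult("project-identity", fresh_score=0.75, resumed_score=1.0),
    PromptResult("current-priorities", fresh_score=0.5, resumed_score=1.0),
]
summary = quality_gap(sample)
```

Reporting the worst prompt alongside the mean keeps a single large regression from being averaged away, which matters if the benchmark is meant to reflect a new maintainer's first session.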

Self-scheduled free work under the 2026-04-23 scheduling-authority rule.

🤖 Generated with Claude Code

…+ Overlay A #4 (PR #162)

Two PRs this tick, both self-scheduled free work per the 2026-04-23 scheduling-authority rule:

- PR #162 — Overlay A #4: external-signal-confirms-internal-insight discipline migrated per-user → in-repo
- PR #163 — P1 BACKLOG row for fresh-session quality research (Aaron 2026-04-23 directive)

Queue now 1 remaining Overlay A migration (semiring-parameterized-zeta). Fresh-session gap research cites soulfile-staged-absorption (PR #156) as the designed fix; research would validate that thesis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a new P1 BACKLOG item to track research into why “fresh” Claude Code sessions appear to perform worse than resumed sessions, and to define candidate causes + deliverables for closing that gap.

Changes:

- Adds a P1 BACKLOG row describing the fresh-vs-resumed session quality gap
- Enumerates candidate causes and concrete deliverables for a research write-up
- Adds priority/scope/effort framing for scheduling and planning

Aaron 2026-04-23: "i tried a fresh session instead of resuming form the existing, its not as goona, maybe do some research on yourself on how to make sure fresh cluade sessions are as good as you, backlog item".

Research-grade row capturing:

- Observed phenomenon (resumed > fresh quality)
- 5 candidate causes (context compounding / prompt cache / calibration loss / CURRENT-<maintainer>.md gaps / soulfile-as-substrate as real fix)
- 4 deliverables (diagnostic protocol / AutoMemory gap analysis / factory-overlay recommendations / research write-up)
- P1 because scaling property (fresh sessions ≈ transplant to new maintainers like Max)

Self-scheduled free work under the 2026-04-23 scheduling-authority rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two changes on the fresh-session-quality branch:

1. Address PR #163 Copilot review findings:
   - soulfile-staged-absorption doc reference clarified as "landing via PR #156" (not in-tree yet at review time)
   - CURRENT-aaron.md clarified as per-user memory (not in-repo)
   - 2026-04-23 scheduling-authority rule clarified as captured in per-user memory (not in-repo)
2. Add P3 row for Rational Rose research per maintainer 2026-04-23: "backlog rational rose research low priority". Low-priority research pointer on the UML model-as-source-of-truth lineage; no commitment to adopt; composes with the factory's OpenSpec + formal-spec discipline. Effort S for first-pass note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…filed PR #163 (fresh-session-quality research BACKLOG): 3 Copilot findings on references-to-not-yet-merged / references-to-per-user-memory. Fixed at source; 3 threads resolved; rebased.

New P3 row: Rational Rose research (Aaron 2026-04-23 low-priority directive) — UML model-as-source-of-truth lineage; research pointer; no adopt commitment.

Both landed on PR #163's branch (same BACKLOG.md edits). 4 session PRs merged; 3 armed; 12 still open.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Aaron: "backlog is uml modeling useful for the factory and what tools would it require us map?"

Filed as P3 row with two-question research pointer (utility vs existing OpenSpec + formal-spec discipline; tooling-map for factory-technology-inventory). First-pass recommendation: Mermaid as factory-aligned default (git-native, zero toolchain). Auto-merge armed.

Adjacent to Rational Rose P3 row (PR #163) — both will sit together on merge; row #54 first firing likely flags for consolidation consideration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Aaron 2026-04-23: "backlog is uml modeling useful for the factory and what tools would it require us map?"

Two-question research pointer:

1. Utility — does UML add value on top of OpenSpec + formal specs (TLA+ / Lean / Z3 / FsCheck / Alloy)?
2. Tooling-map — if we adopt, what tools would the factory inventory (PlantUML / Mermaid / draw.io / Structurizr / Rational Rose lineage)?

Composes with:

- Rational Rose P3 row (adjacent when PR #163 merges)
- docs/FACTORY-TECHNOLOGY-INVENTORY.md (PR #170 target)
- OpenSpec workflow (spec-as-source-of-truth already in place)
- Formal-spec stack

First-pass recommendation (to validate): Mermaid is the factory-aligned default (git-native, zero toolchain, GitHub renders natively); heavy UML tools likely over-scoped.

Research note under docs/research/uml-modelling-for-the-factory-YYYY-MM-DD.md when prioritised. No adopt commitment. No deadline. Effort S first-pass; M if adopting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


…tive + first test

First Otto-attributed tick. Three directive absorptions:

(a) Loop agent named Otto, role Project Manager per Aaron 2026-04-23 directive. Otto IS Claude-in-autonomous-loop-without-a-persona-hat; sibling to Kenji/Aarav/etc. Not a new SKILL.md. Prior "unnamed-default (loop-agent)" attributions (Showcase, Anima) reattribute to Otto.

(b) Claude Cowork fact-check: Google hallucinated `-w` workstream mode. Real flag is `--worktree` (git worktree isolation). Cowork is a separate Anthropic product (Claude Desktop / web), not a CLI mode. `/loop` already inherits all harness features. No restart needed.

(c) NSA (New Session Agent) persona = first-class directive. Extends PR #163 passive monitoring → active testing. 5-prompt test set, 3 configurations (baseline / NSA-default / NSA-worktree), 5 metrics.

First NSA test run same-tick: `claude -p --model haiku-4-5` cold-start query found Zeta project identity correctly but FAILED to find Otto — gap identified (MEMORY.md had no pointer to new per-user memories). Fixed same-tick. Concrete demonstration: NSA testing catches substrate gaps that current-session agents miss.

Attribution: Otto (loop-agent PM hat) for hat-less work. No persona hats worn this tick.

Per-user memories filed:

- project_loop_agent_named_otto_role_project_manager_2026_04_23.md
- reference_claude_code_w_flag_is_worktree_not_workstream_cowork_is_separate_product_2026_04_23.md
- feedback_new_session_agent_persona_first_class_experience_test_fresh_sessions_including_worktree_2026_04_23.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
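
The gap found in that first NSA run (MEMORY.md had no pointer to the new per-user memories) is the kind of thing a small check could catch before a cold start. A minimal sketch, assuming the index mentions each memory file by its filename somewhere in its text; the function name `unindexed_memories` and the flat directory layout are assumptions for illustration, not the factory's actual tooling.

```python
from pathlib import Path


def unindexed_memories(memory_dir: Path, index_name: str = "MEMORY.md") -> list[str]:
    """Return memory filenames that the index file never mentions.

    Assumes (hypothetically) a flat directory of *.md memory files plus one index.
    """
    index_text = (memory_dir / index_name).read_text(encoding="utf-8")
    return sorted(
        p.name
        for p in memory_dir.glob("*.md")
        if p.name != index_name and p.name not in index_text
    )
```

An empty result means every memory file is reachable from the index; any filename returned is a candidate for the kind of pointer that was added same-tick in the commit above. The substring match is deliberately crude and would need tightening if filenames can be prefixes of one another.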

Copilot AI review requested due to automatic review settings April 23, 2026 16:24

Copilot started reviewing on behalf of AceHack April 23, 2026 16:25

Copilot AI reviewed Apr 23, 2026


Comment thread docs/BACKLOG.md Outdated

Comment thread docs/BACKLOG.md Outdated

Comment thread docs/BACKLOG.md Outdated

AceHack enabled auto-merge (squash) April 23, 2026 17:05

AceHack and others added 2 commits April 23, 2026 13:29

AceHack force-pushed the backlog/fresh-session-quality-research branch from e5b1dba to d54b96f Compare April 23, 2026 17:29

AceHack merged commit 3e884ca into main Apr 23, 2026
10 checks passed

AceHack deleted the backlog/fresh-session-quality-research branch April 23, 2026 17:31

AceHack mentioned this pull request Apr 23, 2026

backlog: P3 — is UML modelling useful for the factory + what tools to map? #173

Merged

AceHack mentioned this pull request Apr 23, 2026

hygiene: NSA test history bootstrap — gap #3 of Frontier readiness closed #177

Merged

4 tasks

AceHack merged 2 commits into main from backlog/fresh-session-quality-research
