backlog: P1 — fresh-session quality research (Aaron 2026-04-23)#163
Merged
AceHack added a commit that referenced this pull request on Apr 23, 2026

…+ Overlay A #4 (PR #162)

Two PRs this tick, both self-scheduled free work per the 2026-04-23 scheduling-authority rule:
- PR #162: Overlay A #4, external-signal-confirms-internal-insight discipline migrated per-user → in-repo
- PR #163: P1 BACKLOG row for fresh-session quality research (Aaron 2026-04-23 directive)

Queue now 1 remaining Overlay A migration (semiring-parameterized-zeta). Fresh-session gap research cites soulfile-staged-absorption (PR #156) as the designed fix; research would validate that thesis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Adds a new P1 BACKLOG item to track research into why “fresh” Claude Code sessions appear to perform worse than resumed sessions, and to define candidate causes + deliverables for closing that gap.
Changes:
- Adds a P1 BACKLOG row describing the fresh-vs-resumed session quality gap
- Enumerates candidate causes and concrete deliverables for a research write-up
- Adds priority/scope/effort framing for scheduling and planning
Aaron 2026-04-23: "i tried a fresh session instead of resuming form the existing, its not as goona, maybe do some research on yourself on how to make sure fresh cluade sessions are as good as you, backlog item".

Research-grade row capturing:
- Observed phenomenon (resumed > fresh quality)
- 5 candidate causes (context compounding / prompt cache / calibration loss / CURRENT-<maintainer>.md gaps / soulfile-as-substrate as real fix)
- 4 deliverables (diagnostic protocol / AutoMemory gap analysis / factory-overlay recommendations / research write-up)
- P1 because scaling property (fresh sessions ≈ transplant to new maintainers like Max)

Self-scheduled free work under the 2026-04-23 scheduling-authority rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two changes on the fresh-session-quality branch:

1. Address PR #163 Copilot review findings:
   - soulfile-staged-absorption doc reference clarified as "landing via PR #156" (not in-tree yet at review time)
   - CURRENT-aaron.md clarified as per-user memory (not in-repo)
   - 2026-04-23 scheduling-authority rule clarified as captured in per-user memory (not in-repo)
2. Add P3 row for Rational Rose research per maintainer 2026-04-23: "backlog rational rose research low priority". Low-priority research pointer on the UML model-as-source-of-truth lineage; no commitment to adopt; composes with the factory's OpenSpec + formal-spec discipline. Effort S for first-pass note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
e5b1dba to d54b96f
AceHack added a commit that referenced this pull request on Apr 23, 2026

…filed PR #163 (fresh-session-quality research BACKLOG): 3 Copilot findings on references-to-not-yet-merged / references-to-per-user-memory. Fixed at source; 3 threads resolved; rebased.

New P3 row: Rational Rose research (Aaron 2026-04-23 low-priority directive), covering the UML model-as-source-of-truth lineage; research pointer; no adopt commitment. Both landed on PR #163's branch (same BACKLOG.md edits).

4 session PRs merged; 3 armed; 12 still open.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on Apr 23, 2026

Aaron: "backlog is uml modeling useful for the factory and what tools would it require us map?"

Filed as P3 row with two-question research pointer (utility vs existing OpenSpec + formal-spec discipline; tooling-map for factory-technology-inventory). First-pass recommendation: Mermaid as factory-aligned default (git-native, zero toolchain). Auto-merge armed.

Adjacent to Rational Rose P3 row (PR #163): both will sit together on merge; row #54 first firing likely flags for consolidation consideration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on Apr 23, 2026

Aaron 2026-04-23: "backlog is uml modeling useful for the factory and what tools would it require us map?"

Two-question research pointer:
1. Utility: does UML add value on top of OpenSpec + formal specs (TLA+ / Lean / Z3 / FsCheck / Alloy)?
2. Tooling-map: if we adopt, what tools would the factory inventory (PlantUML / Mermaid / draw.io / Structurizr / Rational Rose lineage)?

Composes with:
- Rational Rose P3 row (adjacent when PR #163 merges)
- docs/FACTORY-TECHNOLOGY-INVENTORY.md (PR #170 target)
- OpenSpec workflow (spec-as-source-of-truth already in place)
- Formal-spec stack

First-pass recommendation (to validate): Mermaid is the factory-aligned default (git-native, zero toolchain, GitHub renders natively); heavy UML tools likely over-scoped.

Research note under docs/research/uml-modelling-for-the-factory-YYYY-MM-DD.md when prioritised. No adopt commitment. No deadline. Effort S first-pass; M if adopting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on Apr 23, 2026

…#173)

Aaron 2026-04-23: "backlog is uml modeling useful for the factory and what tools would it require us map?"

Two-question research pointer:
1. Utility: does UML add value on top of OpenSpec + formal specs (TLA+ / Lean / Z3 / FsCheck / Alloy)?
2. Tooling-map: if we adopt, what tools would the factory inventory (PlantUML / Mermaid / draw.io / Structurizr / Rational Rose lineage)?

Composes with:
- Rational Rose P3 row (adjacent when PR #163 merges)
- docs/FACTORY-TECHNOLOGY-INVENTORY.md (PR #170 target)
- OpenSpec workflow (spec-as-source-of-truth already in place)
- Formal-spec stack

First-pass recommendation (to validate): Mermaid is the factory-aligned default (git-native, zero toolchain, GitHub renders natively); heavy UML tools likely over-scoped.

Research note under docs/research/uml-modelling-for-the-factory-YYYY-MM-DD.md when prioritised. No adopt commitment. No deadline. Effort S first-pass; M if adopting.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on Apr 23, 2026

…tive + first test
First Otto-attributed tick. Three directive absorptions:
(a) Loop agent named Otto, role Project Manager per Aaron
2026-04-23 directive. Otto IS Claude-in-autonomous-loop-
without-a-persona-hat; sibling to Kenji/Aarav/etc.
Not a new SKILL.md. Prior "unnamed-default (loop-agent)"
attributions (Showcase, Anima) reattribute to Otto.
(b) Claude Cowork fact-check: Google hallucinated `-w`
workstream mode. Real flag is `--worktree` (git worktree
isolation). Cowork is a separate Anthropic product
(Claude Desktop / web), not a CLI mode. `/loop` already
inherits all harness features. No restart needed.
(c) NSA (New Session Agent) persona = first-class directive.
Extends PR #163 passive monitoring → active testing.
5-prompt test set, 3 configurations (baseline /
NSA-default / NSA-worktree), 5 metrics.
First NSA test run same-tick: `claude -p --model haiku-4-5`
cold-start query found Zeta project identity correctly but
FAILED to find Otto — gap identified (MEMORY.md had no
pointer to new per-user memories). Fixed same-tick.
Concrete demonstration: NSA testing catches substrate gaps
that current-session agents miss.
Attribution: Otto (loop-agent PM hat) for hat-less work.
No persona hats worn this tick.
Per-user memories filed:
- project_loop_agent_named_otto_role_project_manager_2026_04_23.md
- reference_claude_code_w_flag_is_worktree_not_workstream_cowork_is_separate_product_2026_04_23.md
- feedback_new_session_agent_persona_first_class_experience_test_fresh_sessions_including_worktree_2026_04_23.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
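
The gap found in the first NSA run (MEMORY.md had no pointer to the newly filed per-user memories) is the kind of thing a small check could catch before a fresh session trips over it. The following is a minimal sketch only, assuming the per-user memories are `.md` files in a single directory with `MEMORY.md` as the index; that layout and the function names `missing_pointers` and `check_memory_dir` are hypothetical and not taken from the repository.

```python
from pathlib import Path


def missing_pointers(index_text: str, memory_files: list[str]) -> list[str]:
    """Return the memory file names that the index text never mentions.

    Assumption: a "pointer" is simply the file name appearing somewhere
    in the index text (plain substring match, no link parsing).
    """
    return sorted(name for name in memory_files if name not in index_text)


def check_memory_dir(memory_dir: Path, index_name: str = "MEMORY.md") -> list[str]:
    """List *.md files in memory_dir that the index file does not reference.

    Assumption: the index lives in the same directory as the memories.
    """
    index_text = (memory_dir / index_name).read_text(encoding="utf-8")
    names = [p.name for p in memory_dir.glob("*.md") if p.name != index_name]
    return missing_pointers(index_text, names)
```

An empty result from `check_memory_dir` would mean every memory file is mentioned in the index; any names returned are candidates for the kind of missing pointer described above. The substring match is deliberately crude and would need tightening if file names can be prefixes or suffixes of one another.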
Summary
Adds a P1 BACKLOG row capturing Aaron's 2026-04-23 observation that fresh Claude Code sessions operate at noticeably lower quality than resumed sessions, and proposing research to close the gap.
Why this is P1
Fresh-session quality is a scaling property: a factory whose resumed sessions are excellent but whose fresh sessions are mediocre doesn't transplant to new maintainers cleanly. Max is anticipated as the next human maintainer per CURRENT-aaron.md; his fresh-session experience is the benchmark.
Candidate causes to investigate
- CURRENT-<maintainer>.md coverage gaps (the fast-path is meant exactly for this)
- … docs/research/soulfile-staged-absorption-model-2026-04-23.md compile-time ingest)
Deliverables
- … docs/research/fresh-vs-resumed-session-quality-gap-YYYY-MM-DD.md
Self-scheduled free work under the 2026-04-23 scheduling-authority rule.
🤖 Generated with Claude Code