Round 44: parallel-worktree-safety cartographer research #35
# Parallel worktree safety: research (cartographer pass)

**Date:** 2026-04-22
**Trigger:** Aaron, across nine messages in one tick:

- *"that's nice you can parallelize now with worktrees"*
- *"next time i'll restart with the flag set i think it -w"* / *"IDK if it matters"*
- *"we want to use it always for this software factory now, we want to promote best practices and parallelism"*
- *"i think you are going to have to merge on one of those PRs, want to make sure you don't live lock bouncing back and fourth between the the two PRs too"*
- *"i bet depending on the build speed there is alimit to parallelism, the faster the build the more we can scale, i'm glad incremental builds and running only affected tests are on the backlog."*
- *"yall are going to conflict with each other too problably i bet you edited a bunch of the same files. Wow it's gonna be hard to get you to parallelize wihout live locks."*
- *"it might be better just to wait on the build and do resarch on how to parallel safely with all that taken into account plus the unknow unknowns lol cartographer"*
- *"oh part of the git surface for you is cleaning up stale branches on our repo on a cadence, you could also add preventive measures to stop them from showiing up i the first please, i can you can make the PR close them automaticlly for instance but still need he compesating action in case it regreses."*
- *"oh now how do memory and stuff work when i'm chatting while you are on a worktree?"*

**Author:** opus-4-7 / session round-44
**Status:** map-before-walk. No parallel worktree spawns this tick, no `EnterWorktree` default flip, and no BACKLOG items promoted beyond the P1 queue until Aaron signs off.
**Scope-tag:** factory-universal (not Zeta-project-specific); every software factory using Claude Code worktrees can absorb this.
## 1. What Aaron is asking for

Make parallel execution via worktrees the factory default, but **only after** the safety map is drawn: every known hazard charted, each preventive action paired with a compensating one per the discovered-class principle, and unknown-unknowns enumerated at least by class, even if not by instance.

The cartographer metaphor (`memory/feedback_kanban_factory_metaphor_blade_crystallize_materia_pipeline.md`) is invoked explicitly: *map the territory* of parallel-worktree operation before *walking it*.
## 2. Hazard map: what can go wrong

### 2.1 Live-lock between parallel worktrees (highest-severity known hazard)

**Shape:** Agent A in worktree W₁ edits file F. Agent B in worktree W₂ also edits F. Both attempt to merge back to main. One merges; the other's merge conflicts. Resolving the conflict requires re-running the slow build in the second worktree. Meanwhile, a third tick spawns and edits F again. New conflicting edits arrive faster than the conflict-resolve-rebase cycle can clear them, so neither worktree's work lands: live-lock.

**Why it's worse than deadlock:** deadlock is static, so a single detector catches it; live-lock is *no progress that looks like progress*. Commits keep landing in worktrees and CI keeps running, but nothing integrates.

Aaron named this twice: *"don't live lock bouncing back and fourth between the the two PRs"* and *"gonna be hard to get you to parallelize wihout live locks."*

**Class-detector candidates (pair with every preventive mitigation):**

1. **Overlap registry.** Before spawning a parallel worktree, record `(worktree-name, scope-files-or-globs, spawning-agent, round-tag)` to `docs/hygiene-history/worktree-scope-registry.md` or similar (a proposed path). New worktree requests whose scope overlaps an already-registered scope are either refused or folded into the existing worktree.
2. **Pre-merge conflict probe.** Before kicking off work in a new worktree, run `git merge-tree` between the target branch and each open worktree's branch, and check the result against the new worktree's declared scope. On conflict, refuse or warn.
3. **Round-timeout on unmerged worktree.** A worktree that stays unmerged past N rounds is a stale-branch incident (see §2.4); cadenced hygiene catches it.
4. **Merge-front throughput monitor.** Track `worktree-close → merged-to-main` latency per round. If P95 exceeds a calibrated threshold, the parallelism ceiling has been hit (see §2.3).

**Preventive structural fix:** *scope discipline*. Each worktree carries a declared scope (file-path glob or subsystem name) at spawn time. Two worktrees whose scopes intersect are disallowed. *This is the primary invariant*; the registry and probe enforce it.
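The scope-discipline invariant and the overlap registry can be sketched in a few lines of Python. This is a hypothetical illustration, not an existing factory tool: it assumes scopes are declared as repo-relative directory or file prefixes (real globs would need a richer intersection test), keeps the registry in memory rather than in a Markdown file, and implements only the "refuse" branch, not "fold into the existing worktree".

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


def prefixes_overlap(a: str, b: str) -> bool:
    """True if one scope prefix is an ancestor of (or equal to) the other.

    Comparison is by path component, so "src/Co" does not overlap "src/Core".
    """
    pa, pb = PurePosixPath(a).parts, PurePosixPath(b).parts
    n = min(len(pa), len(pb))
    return pa[:n] == pb[:n]


@dataclass
class ScopeRegistry:
    # worktree name -> (declared scope prefixes, spawning agent, round tag)
    entries: dict[str, tuple[list[str], str, str]] = field(default_factory=dict)

    def conflicts(self, scope: list[str]) -> list[str]:
        """Names of registered worktrees whose declared scope intersects `scope`."""
        return [
            name
            for name, (existing, _agent, _round) in self.entries.items()
            if any(prefixes_overlap(a, b) for a in scope for b in existing)
        ]

    def register(self, name: str, scope: list[str], agent: str, round_tag: str) -> list[str]:
        """Record the worktree unless its scope clashes; return the clashing names.

        An empty list means the spawn was accepted. This sketch only refuses;
        folding the request into an existing worktree is left to the caller.
        """
        clashing = self.conflicts(scope)
        if not clashing:
            self.entries[name] = (scope, agent, round_tag)
        return clashing

    def release(self, name: str) -> None:
        """Drop the entry once the worktree's branch merges or is abandoned."""
        self.entries.pop(name, None)
```

For example, once `pr32-markdownlint` registers `["docs/"]`, a request for `["docs/research/x.md"]` comes back with `["pr32-markdownlint"]` and is not recorded, while `["src/"]` is accepted. Component-wise prefixes were chosen because their intersection test is exact; arbitrary glob-vs-glob intersection is much harder to decide.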
### 2.2 Merge conflicts as a superset of live-lock (expected, not pathological)

Even without live-lock, parallel worktrees will produce ordinary merge conflicts. This is *not* a bug class in itself; it is the expected cost of parallelism. The hazard arises when conflict-resolution *cost* exceeds parallelism *benefit*.

**Rule of thumb (to be measured, not presumed):** if integration cost > parallelism gain, serialize. The threshold is empirical and will vary with build speed (§2.3), file layout (files that everyone touches are anti-parallel), and round tempo.

**What the factory should instrument:**

- Time from worktree spawn to worktree merged.
- Number of conflicting files per merge.
- Rework count (subagent re-runs after conflict resolution).

These become observability signals for the parallel-worktree policy itself.
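The three signals above can be summarised per round with a small helper. A minimal sketch, assuming a hypothetical per-merge record (the factory does not log these fields yet) and using the nearest-rank percentile so the reported P95 is always an observed value; the same `p95` helper could back the merge-front throughput monitor from §2.1.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class MergeRecord:
    worktree: str
    spawn_to_merge_min: float   # minutes from worktree spawn to merged-to-main
    conflict_files: int         # files that conflicted at merge time
    rework_runs: int            # subagent re-runs after conflict resolution


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile: always returns an observed value."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))   # 1-based rank
    return ordered[rank - 1]


def summarise(records: list[MergeRecord]) -> dict[str, float]:
    """Per-round observability signals for the parallel-worktree policy."""
    latencies = [r.spawn_to_merge_min for r in records]
    return {
        "median_spawn_to_merge_min": median(latencies),
        "p95_spawn_to_merge_min": p95(latencies),
        "mean_conflict_files": sum(r.conflict_files for r in records) / len(records),
        "total_rework_runs": sum(r.rework_runs for r in records),
    }
```

With only a handful of merges per round, nearest-rank P95 is simply the slowest merge, which is arguably the honest answer at that sample size; interpolated percentiles would suggest more precision than the data has.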
### 2.3 Build-speed ceiling: parallelism is rate-limited by the gate

Aaron: *"i bet depending on the build speed there is alimit to parallelism, the faster the build the more we can scale, i'm glad incremental builds and running only affected tests are on the backlog."*

**The invariant:** parallel worktrees can only be productively integrated as fast as CI validates their merges. If each CI run takes 15 minutes, merges are validated one at a time, and worktrees spawn every 3 minutes, five worktrees arrive per validated merge and the integration queue grows without bound.
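Under queueing-style simplifications (steady arrivals and one gate run at a time; both are assumptions, not measurements), the 15-minute / 3-minute example works out as follows, with λ the spawn rate, μ the gate's merge rate, and ρ their ratio:

```math
\begin{aligned}
\lambda &= \tfrac{1}{3}\ \text{spawns/min}, \qquad \mu = \tfrac{1}{15}\ \text{merges/min}, \qquad \rho = \lambda / \mu = 5 > 1 \\
\text{backlog}(T) &\approx (\lambda - \mu)\,T = \left(\tfrac{1}{3} - \tfrac{1}{15}\right) T = \tfrac{4}{15}\,T \\
\text{backlog}(60\ \text{min}) &\approx 16\ \text{worktrees waiting for the gate}
\end{aligned}
```

The queue stays bounded only while ρ < 1, i.e. at most one spawn per gate run. If c gate runs could proceed concurrently, the condition would relax to λ < cμ.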

**Existing BACKLOG coverage:**

- Incremental builds (backlog P1/P2): reduces gate time.
- Affected-tests-only (backlog): reduces gate time.
- CI cache warming: reduces gate time.

**New coverage this research surfaces:**

- **Measure before flipping.** Before `EnterWorktree` becomes the factory default, collect a baseline: median/P95 gate time on main and median/P95 gate time on worktree PRs (they share the same gate). The parallelism ceiling N ≈ gate-time-budget / spawn-rate. If N < 2, parallel worktrees are not yet net-positive.
- **Backpressure.** When the integration queue exceeds N, refuse new worktree spawns (or queue them) rather than fanning out further.
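The backpressure rule can be checked with a toy discrete-time model. Everything here is an illustrative assumption: one spawn every `spawn_every` minutes, a single serialized gate taking `gate_min` minutes per merge, every spawned worktree immediately ready to merge, and `cap` standing in for N (a spawn is refused when `cap` worktrees are already waiting or in the gate).

```python
from collections import deque
from typing import Optional


def simulate(minutes: int, spawn_every: int, gate_min: int,
             cap: Optional[int] = None) -> tuple[int, int, int]:
    """Return (merged, max_waiting, refused) for a toy serialized-gate model.

    Each minute: finish the merge in the gate if its time is up, spawn a
    worktree every `spawn_every` minutes (refused when `cap` worktrees are
    already waiting or in the gate), then start the next waiting merge.
    """
    waiting = deque()        # spawn times of worktrees queued for the gate
    in_gate = False
    gate_free_at = 0
    merged = refused = max_waiting = 0
    for t in range(minutes):
        if in_gate and t >= gate_free_at:
            merged += 1
            in_gate = False
        if t % spawn_every == 0:
            in_flight = len(waiting) + (1 if in_gate else 0)
            if cap is not None and in_flight >= cap:
                refused += 1          # backpressure: do not fan out further
            else:
                waiting.append(t)
        if not in_gate and waiting:
            waiting.popleft()
            in_gate = True
            gate_free_at = t + gate_min
        max_waiting = max(max_waiting, len(waiting))
    return merged, max_waiting, refused
```

`simulate(60, 3, 15)` returns `(3, 16, 0)`: three merges in the first hour with 16 worktrees left waiting. `simulate(60, 3, 15, cap=2)` returns `(3, 1, 15)`: the same three merges, at most one worktree waiting, and fifteen spawns refused. Backpressure does not raise throughput (the gate sets that); it converts an unbounded waiting line into explicit refusals.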
### 2.4 Stale-branch accumulation (Aaron's preventive+compensating ask)

Aaron: *"part of the git surface for you is cleaning up stale branches on our repo on a cadence, you could also add preventive measures to stop them from showiing up i the first please, i can you can make the PR close them automaticlly for instance but still need he compesating action in case it regreses."*

This is a direct application of the `feedback_discovered_class_outlives_fix_anti_regression_detector_pair.md` principle: **preventive fix + compensating detector, both permanent**.

**Preventive:**

- **Auto-delete branch on PR merge.** GitHub setting: *Automatically delete head branches* (Settings → General → Pull Requests). One-time toggle; every subsequent PR merge auto-deletes its head branch.
- **PR close (unmerged) needs a separate mechanism.** GitHub's *Automatically delete head branches* setting does **not** delete branches when a PR is closed unmerged. If we want deletion-on-close, document and ship that separately as a workflow/bot policy; otherwise rely on the audits below plus the factory's explicit branch-removal convention.
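The compensating half can be a cadenced audit that flags branches the preventive setting should have removed or never covered. A minimal sketch over plain records (in practice they would come from the forge's branch and pull-request listings); the `Branch` record, the protected set, and the 14-day idle threshold are hypothetical, not settled factory policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed policy knobs, not settled factory values.
PROTECTED = {"main"}
MAX_IDLE_DAYS = 14


@dataclass
class Branch:
    name: str
    last_commit: datetime          # commit timestamp (compare naive with naive)
    pr_state: Optional[str]        # "open", "merged", "closed", or None if no PR


def stale_branches(branches: list[Branch], now: datetime,
                   max_idle_days: int = MAX_IDLE_DAYS) -> list[str]:
    """Names of branches the cadenced cleanup should flag for deletion."""
    cutoff = now - timedelta(days=max_idle_days)
    flagged = []
    for b in branches:
        if b.name in PROTECTED or b.pr_state == "open":
            continue                 # long-lived or still-active branch
        if b.pr_state in ("merged", "closed"):
            flagged.append(b.name)   # PR finished but the branch survived
        elif b.last_commit < cutoff:
            flagged.append(b.name)   # no PR and idle past the threshold
    return flagged
```

Because the GitHub setting fires only on merge, a flagged `"merged"` branch signals that the preventive measure regressed, while `"closed"` branches are the gap it never covered; reporting the two cases separately would tell the audit which one it is catching.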

**Copilot (AI), Apr 21, 2026:**
P2: Grammar issue: “does NOT re-keyed the slug” is ungrammatical; it reads like a mix of active/passive voice. Rephrase to “does not re-key the slug” or “the slug is not re-keyed” to keep the statement clear.

Original:

- **Single session that uses `EnterWorktree`:** the slug is set when the session starts, based on the initial CWD (the main repo root). `EnterWorktree` changes the session's CWD but does NOT re-keyed the slug. Memory continues to load/write from the original slug. Tool calls using absolute paths (which all of mine do) work identically across the boundary. **Verified**: this tick's session started in `/Users/acehack/Documents/src/repos/Zeta`, entered a worktree at `.claude/worktrees/pr32-markdownlint`, wrote three memory files from within the worktree, and `ls ~/.claude/projects/` shows only the main-repo slug — no worktree-specific slug was created.

Suggested change:

- **Single session that uses `EnterWorktree`:** the slug is set when the session starts, based on the initial CWD (the main repo root). `EnterWorktree` changes the session's CWD but does NOT re-key the slug. Memory continues to load/write from the original slug. Tool calls using absolute paths (which all of mine do) work identically across the boundary. **Verified**: this tick's session started in `/Users/acehack/Documents/src/repos/Zeta`, entered a worktree at `.claude/worktrees/pr32-markdownlint`, wrote three memory files from within the worktree, and `ls ~/.claude/projects/` shows only the main-repo slug — no worktree-specific slug was created.

**Copilot (AI), Apr 21, 2026:**
P1 (xref): This section instructs updating docs/AUTONOMOUS-LOOP.md, but that file does not exist in the repo at this path. Please either point to the correct existing doc (if renamed) or add the missing doc/update the reference so the action item is actionable.

Original:

- Document this rule in `docs/AUTONOMOUS-LOOP.md` under "session-start checklist" (if that section exists; otherwise add one).

Suggested change:

- Document this rule in `CLAUDE.md` under a session-start checklist or startup hygiene section (add that section if it does not exist yet).

**Copilot (AI), Apr 21, 2026:**
P1 (xref): The referenced hygiene-history path docs/hygiene-history/loop-tick-history.md (and the docs/hygiene-history/ directory) doesn’t exist in the repo. If the canonical append-only history is stored elsewhere (e.g., ROUND-HISTORY or another log), please update the reference; otherwise mark this as a proposed new file path rather than an existing one.

Original:

A worktree is a separate working directory. Files that are auto-appended-to every tick (e.g. `docs/hygiene-history/loop-tick-history.md`) will diverge between the worktree and main if the tick runs in the worktree.

**Example of the hazard (this tick almost hit it):** I entered `pr32-markdownlint` worktree on `round-42-speculative`. If I had written to `docs/hygiene-history/loop-tick-history.md` *from within the worktree* instead of *from the main repo*, the entry would have landed on the wrong branch and wrong working-tree copy. Then at tick close, the main repo wouldn't reflect the tick.

**Class-detector:** tick-history appends must happen in the main repo, not inside a worktree. Enforcement options:

1. CLAUDE.md rule: "append tick-history *after* `ExitWorktree`, never before."
2. Pre-commit hook: if a commit modifies `docs/hygiene-history/loop-tick-history.md` AND the branch is not `main`/`round-NN-speculative`, refuse.
3. A helper function `append-tick-history.sh` that `cd`s to main root before writing.

Suggested change:

A worktree is a separate working directory. Files that are auto-appended-to every tick (for example, the canonical tick-history log, wherever it lives in the repo) will diverge between the worktree and main if the tick runs in the worktree. If the project later creates `docs/hygiene-history/loop-tick-history.md`, treat that as a proposed path, not a current one.

**Example of the hazard (this tick almost hit it):** I entered `pr32-markdownlint` worktree on `round-42-speculative`. If I had written to the canonical tick-history log *from within the worktree* instead of *from the main repo*, the entry would have landed on the wrong branch and wrong working-tree copy. Then at tick close, the main repo wouldn't reflect the tick.

**Class-detector:** tick-history appends must happen in the main repo, not inside a worktree. Enforcement options:

1. CLAUDE.md rule: "append tick-history *after* `ExitWorktree`, never before."
2. Pre-commit hook: if a commit modifies the canonical tick-history log and the branch is not `main`/`round-NN-speculative`, refuse.
3. A helper function `append-tick-history.sh` that `cd`s to main root before writing to the canonical tick-history log.
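Enforcement option 2 in the passage above (a pre-commit hook that refuses tick-history edits off `main`/`round-NN-speculative`) can be sketched as a pure predicate plus a hook entry point. This is a hypothetical sketch: the function names are illustrative and `LOG_PATH` is only the proposed placeholder path. The two git calls used (`git rev-parse --abbrev-ref HEAD` for the current branch, `git diff --cached --name-only` for staged files) are standard.

```python
import re
import subprocess

# Placeholder: the canonical tick-history location is not settled yet,
# and this path is only a proposed one.
LOG_PATH = "docs/hygiene-history/loop-tick-history.md"
ALLOWED_BRANCH = re.compile(r"main|round-\d+-speculative")


def tick_history_commit_allowed(branch: str, staged_files: list[str],
                                log_path: str = LOG_PATH) -> bool:
    """Allow the commit unless it touches the log from a disallowed branch."""
    if log_path not in staged_files:
        return True
    return ALLOWED_BRANCH.fullmatch(branch) is not None


def _git(*args: str) -> str:
    """Run a git subcommand in the current directory and return its stdout."""
    return subprocess.run(["git", *args], capture_output=True, text=True,
                          check=True).stdout


def main() -> int:
    """Hook entry point: a non-zero exit status makes git abort the commit."""
    branch = _git("rev-parse", "--abbrev-ref", "HEAD").strip()
    staged = _git("diff", "--cached", "--name-only").splitlines()
    if tick_history_commit_allowed(branch, staged):
        return 0
    print(f"refusing: {LOG_PATH} may only be committed on main or "
          f"round-NN-speculative (current branch: {branch})")
    return 1
```

Installed as an executable `pre-commit` hook that ends with `sys.exit(main())`, it would let a worktree branch commit ordinary files but stop it from committing the tick-history log. Keeping the decision in a pure function means the rule can be tested without a repository.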

**Copilot (AI), Apr 21, 2026:**
P2: The markdown table in §3 has an extra leading | on each row (e.g., || Hazard | ...), which creates an empty first column in most renderers. Consider removing the extra pipe so the table renders as intended.

**Copilot (AI), Apr 21, 2026:**
P1 (xref): Several items in “Related docs & memories” point to files that aren’t present in the repo at the cited paths (e.g., memory/feedback_discovered_class_outlives_fix_anti_regression_detector_pair.md, memory/feedback_live_loop_detector_speculative_on_pr_branch.md, docs/research/worktree-pattern-for-live-loop-prevention-2026-04-22.md). Please either add these files in the PR or update the references to the correct existing locations so pointer-integrity audits don’t fail.

Original:

- `memory/feedback_discovered_class_outlives_fix_anti_regression_detector_pair.md` — the principle this research instantiates across 8 hazards.
- `memory/feedback_live_loop_detector_speculative_on_pr_branch.md` — the live-loop class that motivated the worktree pattern in the first place.
- `docs/research/worktree-pattern-for-live-loop-prevention-2026-04-22.md` — the prior research doc that introduced `EnterWorktree` to the factory; this doc extends it from single-worktree to parallel-worktree safety.
- `memory/feedback_kanban_factory_metaphor_blade_crystallize_materia_pipeline.md` — cartographer metaphor source.
- `memory/feedback_never_idle_speculative_work_over_waiting.md` — the rule that drives tick-cadence; parallelism must not violate it via backpressure-starvation.

Suggested change:

- The discovered-class-outlives-fix rule — the principle this research instantiates across 8 hazards.
- The live-loop detector rule for speculative work on PR branches — the live-loop class that motivated the worktree pattern in the first place.
- The prior worktree-pattern research doc that introduced `EnterWorktree` to the factory; this doc extends it from single-worktree to parallel-worktree safety.
- `memory/feedback_kanban_factory_metaphor_blade_crystallize_materia_pipeline.md` — cartographer metaphor source.
- The never-idle speculative-work-over-waiting rule — the rule that drives tick-cadence; parallelism must not violate it via backpressure-starvation.

**Review comment:**
P1 (codebase convention): This doc uses the human maintainer’s personal name (“Aaron”) in narrative headings and throughout. The repo standard is to avoid name attribution in docs/skill bodies and use role refs (e.g., “human maintainer”) instead (see docs/AGENT-BEST-PRACTICES.md:284-290). Please rename the references accordingly (and keep names confined to the allowed locations).