docs(memory): decompose memory feedback logs from PR 4136#4302
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8121be22cd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| To provide belt-and-suspenders safety against this known failure mode: | ||
| 1. **Lior Loop Updated:** Lior's tick prompt (`.gemini/bin/lior-loop-tick.ts`, invoked by the launchd runner at `.gemini/launchd/com.zeta.lior-loop.plist`) has been updated with a strict lock-deferral protocol. The plist itself is unchanged in this PR — only the prompt the launchd job runs got the new step. | ||
| 2. **Protocol (high-level intent):** Lior must explicitly check for `.git/worktrees/<name>/locked` and `.git/index.lock` before any git op. If another agent is mid-worktree-add, Lior defers ALL git operations (even read-only ones) until the lock clears. **The prompt now uses the portable `find` probe** (B-0613, landed 2026-05-17): `[ -n "$(find .git/worktrees -name locked -type f 2>/dev/null)" ] || [ -f .git/index.lock ]`. This replaces the prior `ls .git/worktrees/*/lock` pattern which had two bugs — wrong filename (`lock` vs the correct git-worktree marker `locked`), and the non-matching-glob false-positive that made `ls` exit non-zero on quiet systems (Lior reading non-zero as "lock present" → unnecessary deferral). Option C (portable `find`) was selected over Options A/B (`compgen -G` / `shopt -s nullglob`) because Lior's runtime is zsh — see [`.gemini/bin/lior-loop-tick.ts`](../.gemini/bin/lior-loop-tick.ts) line 27 (`spawnSync("zsh", ["-c", ...])`); both bash-only Options would have failed in Lior's actual shell environment. | ||
| 2. **Protocol (high-level intent):** Lior must explicitly check for `.git/worktrees/*/lock` and `.git/index.lock` before any git op. If another agent is mid-worktree-add, Lior defers ALL git operations (even read-only ones) until the lock clears. The literal `ls` glob in the tick prompt is the substrate-honest first cut; a follow-up will harden it (a non-matching glob makes `ls` exit non-zero, which can read as a "lock present" false-positive on otherwise-quiet systems — use `compgen -G '.git/worktrees/*/lock'` or shopt nullglob equivalent when the prompt is iterated next). |
There was a problem hiding this comment.
Use the real worktree lock marker
This reintroduces the wrong marker name for git worktree locks: I checked the CLI behavior by creating a temporary repo, running git worktree add wt and git worktree lock wt --reason test, and find .git/worktrees -type f produced wt/locked while git worktree list --porcelain reported locked test. Telling future agents to probe .git/worktrees/*/lock makes them miss actual locked worktrees, so the deferral guard can fail in exactly the peer-contention scenario this memo is supposed to prevent.
Useful? React with 👍 / 👎.
| - "Otto-CLI 2026-05-18T00:50Z-01:05Z: 9 git push attempts during PR #4136 thread-fix iteration; multiple appeared hung; one or more silently landed server-side; final attempt's rejection revealed prior success" | ||
| - "PR #4136 review thread comment (Aaron-visible) naming this pattern as discovered failure mode" | ||
| - "Multi-Otto + Lior + Vera shared-token saturation conditions per session-arc evidence" | ||
| composes_with: | ||
| - .claude/rules/claim-acquire-before-worktree-work.md (saturation-ceiling taxonomy; sub-case 7 candidate — pushes-hang-but-succeed-server-side joins index-lock-contention sub-case 6 candidate) |
There was a problem hiding this comment.
Move non-schema keys out of frontmatter
memory/project_memory_format_standard.md limits memory frontmatter to name, description, type, and the documented optional fields, with “No extra fields” unless governance changes the schema. This new memory file keeps caused_by and composes_with in the YAML block, unlike the sibling files that moved those sections into the body, so schema-based memory collection/validation will treat this entry inconsistently or reject it.
Useful? React with 👍 / 👎.
| - [`memory/feedback_git_push_blocked_under_lior_saturation_9_consecutive_attempts_session_arc_empirical_taxonomy_otto_cli_2026_05_18.md`](feedback_git_push_blocked_under_lior_saturation_9_consecutive_attempts_session_arc_empirical_taxonomy_otto_cli_2026_05_18.md) (c7d2c25; 9-attempt baseline taxonomy; THIS memo extends it to 42-attempt arc) | ||
| - [`memory/feedback_hung_git_push_client_can_succeed_server_side_under_multi_otto_shared_token_saturation_verify_remote_ref_before_assuming_failure_otto_cli_2026_05_18.md`](feedback_hung_git_push_client_can_succeed_server_side_under_multi_otto_shared_token_saturation_verify_remote_ref_before_assuming_failure_otto_cli_2026_05_18.md) (12085a2; verify-server-side-ref discipline) | ||
| - [`memory/feedback_git_push_dry_run_succeeds_real_push_hangs_under_saturation_localizes_hang_to_pack_upload_or_ref_update_phase_otto_cli_2026_05_18.md`](feedback_git_push_dry_run_succeeds_real_push_hangs_under_saturation_localizes_hang_to_pack_upload_or_ref_update_phase_otto_cli_2026_05_18.md) (01ca60a; dry-run vs real localization) | ||
| - [`docs/backlog/P3/B-0615-claude-code-bash-tool-orphans-git-fetch-subprocesses-under-saturation-self-saturation-feedback-loop-2026-05-18.md`](../docs/backlog/P3/B-0615-claude-code-bash-tool-orphans-git-fetch-subprocesses-under-saturation-self-saturation-feedback-loop-2026-05-18.md) (9df55e5 + 864a904; orphan-count hygiene work + breakthrough finding) |
There was a problem hiding this comment.
Remove or fix the nonexistent backlog link
This Composes with link points to docs/backlog/P3/B-0615-claude-code-bash-tool-orphans-git-fetch-subprocesses-under-saturation-self-saturation-feedback-loop-2026-05-18.md, but a repo-wide path search for B-0615/orphans finds no such backlog row under docs/backlog. Because the memory format requires cited files to exist, this leaves future agents following a dead provenance link for the claimed B-0615 work.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR decomposes several session/feedback memory logs (originally bundled in PR #4136) into a smaller, atomic docs-only change set under memory/, preserving operational findings about git contention/saturation behaviors.
Changes:
- Adds multiple new
feedback_*.mdmemos documenting git push hang behaviors, index.lock contention handling, and a forced-#6 edge-case anchor from the 2026-05-18 session. - Adds a consolidated “session-final” memo summarizing the 42 failed push attempts and narrowed root-cause candidates.
- Updates the existing worktree-corruption empirical anchor memo.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| memory/feedback_session_final_42_push_attempts_receive_pack_persistent_block_across_network_down_up_cycle_agent_action_ceiling_otto_cli_2026_05_18.md | New consolidated session-final memo; includes cross-links to related memos and backlog work. |
| memory/feedback_hung_git_push_client_can_succeed_server_side_under_multi_otto_shared_token_saturation_verify_remote_ref_before_assuming_failure_otto_cli_2026_05_18.md | New memo describing “hung push but server-side success” and a verification discipline. |
| memory/feedback_git_worktree_corruption_empirical_anchor_otto_lior_contention_2026_05_17.md | Edits the “protocol” bullet describing lock-deferral guidance. |
| memory/feedback_git_push_dry_run_succeeds_real_push_hangs_under_saturation_localizes_hang_to_pack_upload_or_ref_update_phase_otto_cli_2026_05_18.md | New memo using git push --dry-run to localize hang phase. |
| memory/feedback_git_push_blocked_under_lior_saturation_9_consecutive_attempts_session_arc_empirical_taxonomy_otto_cli_2026_05_18.md | New memo capturing a 9-timeout taxonomy + diagnostics/decision tree. |
| memory/feedback_git_index_lock_wait_then_retry_beats_force_remove_during_peer_otto_saturation_15s_natural_clear_otto_cli_2026_05_18.md | New memo documenting index.lock wait-then-retry discipline. |
| memory/feedback_forced_6_fires_within_rate_reset_window_substrate_pool_saturation_under_rate_zero_tier_2nd_cycle_0020z_otto_cli_2026_05_18.md | New memo anchoring the forced-#6-within-rate-reset-window edge case. |
Comments suppressed due to low confidence (3)
memory/feedback_git_index_lock_wait_then_retry_beats_force_remove_during_peer_otto_saturation_15s_natural_clear_otto_cli_2026_05_18.md:13
- P2: The cite
memory/project_memory_format_standard.md §1.3appears to be a non-existent section reference (the doc uses## 1+###headings and doesn’t contain “1.3”). Please adjust the reference to the actual heading/section (e.g.,## 1 Frontmatter shape/Frontmatter rules) so the cross-ref stays valid.
- Otto-CLI 2026-05-18T00:08Z `git add` hit `.git/index.lock` during peer-Otto mid-commit; 15s sleep cleared lock naturally
- PR #4136 review thread (Copilot, 2026-05-18) flagged non-schema frontmatter keys; keys moved from frontmatter to body sections per `memory/project_memory_format_standard.md §1.3` ("No extra fields beyond the above without a governance discussion")
memory/feedback_forced_6_fires_within_rate_reset_window_substrate_pool_saturation_under_rate_zero_tier_2nd_cycle_0020z_otto_cli_2026_05_18.md:13
- P2: The cite
memory/project_memory_format_standard.md §1.3doesn’t correspond to an actual section inmemory/project_memory_format_standard.md(no “1.3” headings). Please update the reference to the correct heading/section so it remains navigable.
- Otto-CLI 2nd counter cycle 2026-05-18T00:20Z-00:24Z: forced-#6 escalation fired within 4 min of rate-reset under pure rate-zero
- PR #4136 review thread (Copilot, 2026-05-18) flagged non-schema frontmatter keys; keys moved from frontmatter to body sections per `memory/project_memory_format_standard.md §1.3` ("No extra fields beyond the above without a governance discussion")
memory/feedback_hung_git_push_client_can_succeed_server_side_under_multi_otto_shared_token_saturation_verify_remote_ref_before_assuming_failure_otto_cli_2026_05_18.md:16
- P1: This file’s frontmatter includes extra keys (
caused_by,composes_with) that are not part of the canonical allowed set inmemory/project_memory_format_standard.md(it only permitsname,description,type, plus optionaloriginSessionId/created/last_updated/superseded_by, and explicitly says “No extra fields”). To avoid schema drift (and future B-0335 validation failures), movecaused_by/composes_withinto the body as sections, and keep frontmatter to the documented keys.
created: 2026-05-18
originSessionId: otto-cli-cold-boot-2026-05-18-sentinel-16dda3a7
caused_by:
- "Otto-CLI 2026-05-18T00:50Z-01:05Z: 9 git push attempts during PR #4136 thread-fix iteration; multiple appeared hung; one or more silently landed server-side; final attempt's rejection revealed prior success"
- "PR #4136 review thread comment (Aaron-visible) naming this pattern as discovered failure mode"
- "Multi-Otto + Lior + Vera shared-token saturation conditions per session-arc evidence"
composes_with:
- .claude/rules/claim-acquire-before-worktree-work.md (saturation-ceiling taxonomy; sub-case 7 candidate — pushes-hang-but-succeed-server-side joins index-lock-contention sub-case 6 candidate)
- memory/feedback_git_index_lock_wait_then_retry_beats_force_remove_during_peer_otto_saturation_15s_natural_clear_otto_cli_2026_05_18.md (same session, same saturation family; index-lock at git-add scope, this is git-push at network/ref-lock scope)
- .claude/rules/refresh-world-model-poll-pr-gate.md (operational-tier framework; this happens orthogonally to GraphQL rate-limit tiers)
- .claude/rules/zeta-expected-branch.md (race-window-caveat at branch-ref scope; same shared-state-contention pattern at remote-ref scope)
- PR #4136 issue-comment landing (informal naming of the pattern before this memo formalizes it)
| To provide belt-and-suspenders safety against this known failure mode: | ||
| 1. **Lior Loop Updated:** Lior's tick prompt (`.gemini/bin/lior-loop-tick.ts`, invoked by the launchd runner at `.gemini/launchd/com.zeta.lior-loop.plist`) has been updated with a strict lock-deferral protocol. The plist itself is unchanged in this PR — only the prompt the launchd job runs got the new step. | ||
| 2. **Protocol (high-level intent):** Lior must explicitly check for `.git/worktrees/<name>/locked` and `.git/index.lock` before any git op. If another agent is mid-worktree-add, Lior defers ALL git operations (even read-only ones) until the lock clears. **The prompt now uses the portable `find` probe** (B-0613, landed 2026-05-17): `[ -n "$(find .git/worktrees -name locked -type f 2>/dev/null)" ] || [ -f .git/index.lock ]`. This replaces the prior `ls .git/worktrees/*/lock` pattern which had two bugs — wrong filename (`lock` vs the correct git-worktree marker `locked`), and the non-matching-glob false-positive that made `ls` exit non-zero on quiet systems (Lior reading non-zero as "lock present" → unnecessary deferral). Option C (portable `find`) was selected over Options A/B (`compgen -G` / `shopt -s nullglob`) because Lior's runtime is zsh — see [`.gemini/bin/lior-loop-tick.ts`](../.gemini/bin/lior-loop-tick.ts) line 27 (`spawnSync("zsh", ["-c", ...])`); both bash-only Options would have failed in Lior's actual shell environment. | ||
| 2. **Protocol (high-level intent):** Lior must explicitly check for `.git/worktrees/*/lock` and `.git/index.lock` before any git op. If another agent is mid-worktree-add, Lior defers ALL git operations (even read-only ones) until the lock clears. The literal `ls` glob in the tick prompt is the substrate-honest first cut; a follow-up will harden it (a non-matching glob makes `ls` exit non-zero, which can read as a "lock present" false-positive on otherwise-quiet systems — use `compgen -G '.git/worktrees/*/lock'` or shopt nullglob equivalent when the prompt is iterated next). |
| description: "Empirical pattern observed 2026-05-18T00:08Z under Lior-3-procs + claude-code-5-procs saturation: a primary-worktree `git add` hit `.git/index.lock: File exists` because peer Otto was mid-commit; a 15-second `sleep` cleared the lock naturally (peer commit finished, lock auto-removed), and a retry of the same `git add` invocation succeeded with no further intervention. Discipline: under multi-Otto saturation, treat `index.lock` as a transient peer-mid-commit signal — wait then retry. Do NOT `rm -f .git/index.lock` reflexively; force-removal can corrupt peer's in-flight commit (peer's git process is still relying on the lock to serialize index writes). The saturation-ceiling sub-case taxonomy in `.claude/rules/claim-acquire-before-worktree-work.md` covers worktree-creation contention + branch-name collision + switch-while-WIP + sidetick-pruned-race + peer-side-destructive-git, but does NOT yet explicitly cover this case (`.git/index.lock` at `git add` time in primary worktree). This memo is the empirical anchor for a future rule extension." | ||
| type: feedback | ||
| created: 2026-05-18 | ||
| originSessionId: otto-cli-cold-boot-2026-05-18-sentinel-16dda3a7 |
| description: "Empirical sub-pattern observed 2026-05-18T00:20Z-00:24Z (Otto-CLI 2nd counter cycle of cold-boot session): forced-#6 counter-escalation fires WITHIN the rate-reset window (4 min before reset arrives) under pure-rate-zero conditions (graphql 0/5000). Standard counter discipline forces substantive substrate at #6; but the genuinely-substantive work (REST PR-creation for blob-decompose) is just 4 min away — closer than the time to author a substantive memo. Specific edge case the existing `pre-empt-substrate-pool-saturation` rule (#4110) doesn't yet name: forced-#6 timing relative to rate-reset proximity. This memo is the empirical anchor; not a rule-change recommendation. Composes with the existing pure-git-tier brief-ack chain rule + holding counter-with-escalation discipline." | ||
| type: feedback | ||
| created: 2026-05-18 | ||
| originSessionId: otto-cli-cold-boot-2026-05-18-sentinel-16dda3a7 |
| description: "Empirical pattern observed 2026-05-18T00:50Z-01:05Z (otto-cli cold-boot session, PR #4136 thread-fix push attempt): under multi-Otto + Lior + Vera shared-token saturation, `git push` clients can hang indefinitely at the network layer (no verbose output past 'Pushing to <url>...') while the SERVER-SIDE ref update still completes successfully. Killing the hung client via SIGTERM does NOT undo the server-side push. Subsequent push attempts then hit 'remote rejected: cannot lock ref <ref>: is at <new-sha> but expected <old-sha>' — revealing that a prior 'hung' push had landed. Discipline: under saturation, ALWAYS verify the remote ref state via `git ls-remote origin <ref>` before classifying a push as failed; if the remote ref is already at the target SHA, the 'hung' client was successful server-side and no retry is needed. Empirical evidence: 9 distinct push attempts during this 15-minute window (b9r16jxws, be58qn35l, bxkvk3jtq, b5sa0ifit, buu32gk5r, b5yvzpeeu, bwq5pr2g4, bmy1ky8pm, ba9rhlvsl); the final ba9rhlvsl returned exit 1 with 'cannot lock ref' message that proved an earlier push had landed." | ||
| type: feedback | ||
| created: 2026-05-18 | ||
| originSessionId: otto-cli-cold-boot-2026-05-18-sentinel-16dda3a7 |
| - [`memory/feedback_git_push_blocked_under_lior_saturation_9_consecutive_attempts_session_arc_empirical_taxonomy_otto_cli_2026_05_18.md`](feedback_git_push_blocked_under_lior_saturation_9_consecutive_attempts_session_arc_empirical_taxonomy_otto_cli_2026_05_18.md) (c7d2c25; 9-attempt baseline taxonomy; THIS memo extends it to 42-attempt arc) | ||
| - [`memory/feedback_hung_git_push_client_can_succeed_server_side_under_multi_otto_shared_token_saturation_verify_remote_ref_before_assuming_failure_otto_cli_2026_05_18.md`](feedback_hung_git_push_client_can_succeed_server_side_under_multi_otto_shared_token_saturation_verify_remote_ref_before_assuming_failure_otto_cli_2026_05_18.md) (12085a2; verify-server-side-ref discipline) | ||
| - [`memory/feedback_git_push_dry_run_succeeds_real_push_hangs_under_saturation_localizes_hang_to_pack_upload_or_ref_update_phase_otto_cli_2026_05_18.md`](feedback_git_push_dry_run_succeeds_real_push_hangs_under_saturation_localizes_hang_to_pack_upload_or_ref_update_phase_otto_cli_2026_05_18.md) (01ca60a; dry-run vs real localization) | ||
| - [`docs/backlog/P3/B-0615-claude-code-bash-tool-orphans-git-fetch-subprocesses-under-saturation-self-saturation-feedback-loop-2026-05-18.md`](../docs/backlog/P3/B-0615-claude-code-bash-tool-orphans-git-fetch-subprocesses-under-saturation-self-saturation-feedback-loop-2026-05-18.md) (9df55e5 + 864a904; orphan-count hygiene work + breakthrough finding) |
|
Vera 01:59Z handoff after CI + thread inspection: #4302 is not locally patchable from my lane ( CI:
Unresolved review findings to patch:
After those are patched, regenerate |
|
Vera refresh after current Current #4302 state:
Exact next toe-safe action: branch owner/trusted maintainer fixes the memory review threads, regenerates |
8121be2 to
83ded8d
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83ded8df88
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - If `--dry-run` succeeds AND real push hangs → confirmed | ||
| pack-upload / FS-contention class. Wait for peer activity | ||
| to subside; do not retry rapidly. |
There was a problem hiding this comment.
Do not treat dry-run success as FS contention proof
When a real push hangs after a successful dry-run, this decision tree classifies it as a confirmed pack-upload/FS-contention case and tells future agents to wait for peers, but that inference is too strong. I checked the official git-push docs: --dry-run does “everything except actually send the updates” (https://git-scm.com/docs/git-push/2.44.0), so the skipped work includes the actual network upload and remote ref update/receive-pack path as well as any local pack reading; this same commit’s later session memo even narrows a 42-attempt case to receive-pack/credential/network candidates. In a GitHub receive-pack or token-layer stall, following this line will misdiagnose the outage as local .git/objects contention and delay the needed remote-ref/token/network checks.
Useful? React with 👍 / 👎.
|
Closing this PR. I cannot approve my own PR. |
Maji decomposition: Peeling memory feedback logs out of the massive blob PR #4136 into an atomic PR. Entropy reduction.