diff --git a/memory/MEMORY.md b/memory/MEMORY.md index a6ecc993..c6b54e56 100644 --- a/memory/MEMORY.md +++ b/memory/MEMORY.md @@ -2,6 +2,8 @@ **๐Ÿ“Œ Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-25 with the Otto-281..285 substrate cluster + factory-as-superfluid framing โ€” sections 18-22; prior refresh 2026-04-24 covered sections 13-17.) +- [**Reviewer false-positive pattern catalog โ€” 7-class taxonomy + per-class resolution forms + ROI-ranked prevention (Aaron 2026-04-28)**](feedback_reviewer_false_positive_pattern_catalog_aaron_2026_04_28.md) โ€” Stale-snapshot / carve-out blind spot / schema drift / wrong-language parser / convention conflict / broken xref / recursive-CI-new-threads; speeds future thread classification; high-ROI prevention candidates listed. +- [**CALIBRATION โ€” `requiredApprovingReviewCount=0` on both Zeta forks; BLOCKED โ‰  reviewer; 5-class taxonomy + complete enum coverage (Aaron 2026-04-28)**](feedback_no_required_approval_on_zeta_BLOCKED_means_threads_or_ci_aaron_2026_04_28.md) โ€” 5 BLOCKED classes (threads / failing-or-pending CI / merge conflicts / required-check-MISSING-from-rollup / repository-ruleset gates); failed-conclusion enum covers FAILURE/CANCELLED/TIMED_OUT/ACTION_REQUIRED/STARTUP_FAILURE/STALE; pending-status enum covers IN_PROGRESS/QUEUED/WAITING/REQUESTED/PENDING; CheckRun.name vs StatusContext.context union extraction; always-double-check-after-CI rule. - [**Otto-355 โ€” BLOCKED-with-green-CI means investigate review threads FIRST (Aaron 2026-04-27)**](feedback_otto_355_blocked_with_green_ci_means_investigate_review_threads_first_dont_wait_2026_04_27.md) โ€” 5th wake-time discipline. When GitHub reports BLOCKED + all CI green + auto-merge armed, query unresolved review threads via GraphQL BEFORE classifying as wait. Most BLOCKEDs are unresolved threads, not opaque gates. - [**Otto-359 โ€” Otto uniquely positioned to clean Aaron-Mirror from substrate (Aaron 2026-04-27)**](feedback_otto_359_otto_uniquely_positioned_to_clean_aaron_mirror_language_from_substrate_aaron_cant_see_own_jargon_2026_04_27.md) โ€” Substrate-cleanup authority granted. Aaron can't see his own Mirror jargon; Otto is uniquely poised to clean it. Preserve Aaron-coinages (Maji/Glass Halo/ECRP/Linguistic Seed); narrow catch-all overreaches per Otto-358; discrete tractable PRs not big-bang rewrite. - [Otto-356 MIRROR-vs-BEACON LANGUAGE REGISTER โ€” Aaron 2026-04-27 clarification: Mirror = internal jargon Aaron+Otto share (Maji / ECRP / Glass Halo / Linguistic Seed / Otto-NN / Zetaspace / etc.); Beacon = external-safe / standard / common-vernacular any human or AI recognizes; rule โ€” public-facing surfaces (skill descriptions, PR comments to outside reviewers, README, error messages, math papers, ADRs) use Beacon; internal substrate (Otto-NN memos, persona notebooks, agent-ferries with shared context) keeps Mirror](feedback_otto_356_mirror_internal_vs_beacon_external_language_register_discipline_2026_04_27.md) โ€” 2026-04-27: register-discipline NOT philosophical-framing-shift (I W_t-overcomplicated as Wittgenstein-style passive-vs-active emission); audience-has-index test โ†’ Mirror fine; no-index โ†’ Beacon required; Aaron's coinages STAY, get glossed for external surfaces; Otto-356 IS itself a Zetaspace-failure-and-correction example (substrate-default beats W_t-default). diff --git a/memory/feedback_no_required_approval_on_zeta_BLOCKED_means_threads_or_ci_aaron_2026_04_28.md b/memory/feedback_no_required_approval_on_zeta_BLOCKED_means_threads_or_ci_aaron_2026_04_28.md new file mode 100644 index 00000000..db46700a --- /dev/null +++ b/memory/feedback_no_required_approval_on_zeta_BLOCKED_means_threads_or_ci_aaron_2026_04_28.md @@ -0,0 +1,441 @@ +--- +name: >- + BOTH FORKS โ€” requiredApprovingReviewCount=0 on Zeta; BLOCKED never + means "waiting for reviewer approval"; the only blockers are + unresolved review threads, failing/pending status checks, or merge + conflicts; this is a CALIBRATION CONSTANT for the project's branch + protection ruleset on AceHack/Zeta and Lucent-Financial-Group/Zeta + โ€” verified empirically via gh api graphql against + branchProtectionRule 2026-04-28; Aaron 2026-04-28 caught me + parroting "blocked on reviewer approval" multiple times in this + session despite zero approvals being required +description: >- + Aaron 2026-04-28 input after I claimed LFG #660 was "BLOCKED + awaiting reviewer" โ€” he asked *"are you sure, it's not something + simple you can figure out?"* prompting me to actually query the + branch-protection rule via GraphQL. Result โ€” + requiredApprovingReviewCount=0 on origin/main (LFG). The same is + true on AceHack/main. **No approval is required to merge on this + project โ€” neither fork.** A mergeStateStatus=BLOCKED with green CI + on Zeta MUST mean one or more of: (1) unresolved review threads + (requiresConversationResolution=true), (2) pending or failing + status checks in the required list, (3) merge conflicts. NEVER + means "waiting for human reviewer approval" โ€” there is no + human-reviewer-approval gate configured. I made this same + misdiagnosis multiple times in this session despite Otto-355 + (BLOCKED-investigate-threads-first) being a wake-time discipline. + Aaron explicit ask โ€” save somewhere that + requiredApprovingReviewCount=0 on this project โ€” this memory IS + that durable reminder, indexed in MEMORY.md so fresh sessions hit + it before falling into the same misdiagnosis. +type: project +--- + +# Calibration constant โ€” `requiredApprovingReviewCount: 0` on Zeta + +## The constant (verified 2026-04-28) + +Both forks of Zeta have `requiredApprovingReviewCount: 0` configured +on the `main` branch protection ruleset: + +- `https://github.com/AceHack/Zeta` โ€” main +- `https://github.com/Lucent-Financial-Group/Zeta` โ€” main + +**No human reviewer approval is required to merge any PR.** The +`requiresApprovingReviews: true` flag is on (so the review system is +*enabled*) but the *count* required is zero โ€” meaning a PR can merge +with zero approving reviews as long as the other gates clear. + +## What `mergeStateStatus: BLOCKED` actually means on Zeta + +When the GitHub API reports `mergeStateStatus: BLOCKED` on a Zeta PR, +the blocker is **one OR MORE** of these FIVE classes (they CAN +coexist โ€” e.g., a PR can simultaneously have unresolved threads AND +pending CI; fixing only one class won't unblock the merge until ALL +classes are clear; the diagnostic playbook below MUST check all five +before declaring the diagnosis exhausted): + +1. **Unresolved review threads.** `requiresConversationResolution: true` + is set on both forks. Even ONE unresolved thread blocks merge. Even + if the thread is `isOutdated: true` after a force-push, it still + blocks until explicitly resolved (per + `feedback_outdated_review_threads_block_merge_resolve_explicitly_after_force_push_2026_04_27.md`). + +2. **Failing or pending required status checks.** The required-context + list on both forks includes: + - `lint (semgrep)` + - `lint (shellcheck)` + - `lint (actionlint)` + - `lint (markdownlint)` + - `build-and-test (macos-26)` + - `build-and-test (ubuntu-24.04)` + - `build-and-test (ubuntu-24.04-arm)` + + Any of these in `FAILURE`, `IN_PROGRESS`, or `QUEUED` blocks merge. + +3. **Merge conflicts** with the base branch. Surfaces as + `mergeable: CONFLICTING` in the same API response and as `DIRTY` + in `mergeStateStatus`. + +4. **Required check missing entirely from the tip commit's rollup + contexts.** This is the SNEAKIEST class โ€” every reported context + is `SUCCESS`, no failures, no pending, no conflicts, all threads + resolved, and `statusCheckRollup.state == SUCCESS` โ€” but a + required check from `branchProtectionRule.requiredStatusCheckContexts` + is **absent** from the contexts list (it never reported). Branch + protection treats absent-required as blocking even though the + visible signal is all-green. + + **How this happens:** matrix workflows where one leg failed to + start (resource unavailable, fork-permission gate, runner-class + capacity exhaustion, transient infrastructure error pre-job- + queue), workflows that didn't trigger because of `paths:` filter + on a PR that didn't touch matching paths, deleted required-check + names that no longer match any workflow output. + + **Diagnostic:** compare + `branchProtectionRule.requiredStatusCheckContexts` (the required + list) against the actual context-name set. **Important:** the + contexts query returns a UNION of `CheckRun` and `StatusContext` + nodes; the name field is `name` on CheckRun, `context` on + StatusContext. Extract both: + + ```python + actual = {n['name'] if n['__typename']=='CheckRun' else n['context'] + for n in contexts} + missing = set(required) - actual + ``` + + ANY required name not in `actual` is a class-4 blocker. + + **Empirically observed 2026-04-28 on LFG #660:** required list + includes `build-and-test (macos-26)` but the tip commit's rollup + only had `build-and-test (ubuntu-24.04)` and + `build-and-test (ubuntu-24.04-arm)` โ€” the macos-26 leg never + reported. This was discovered AFTER claiming "all green, all + threads resolved" โ€” the rollup state was misleadingly SUCCESS + because GitHub's rollup state only reflects the contexts that + DID report, not the contexts that SHOULD have reported. + + **Resolution:** find the workflow run, check why the missing leg + didn't report (failed to start / not triggered / dispatched via + different workflow), then either re-trigger the leg, fix the + workflow config so the leg runs, or (last resort) update the + branch protection rule to remove the absent required-check name. + +5. **Repository ruleset gates (newer GitHub primitive, separate + from `branchProtectionRule`).** GitHub's repository rulesets + (rolled out 2024-2025) can impose required status checks, + conversation resolution, or merge-queue requirements that don't + appear in the legacy `branchProtectionRule` GraphQL field. + `branchProtectionRule` returns null or partial state when a + ruleset is the active gate. If classes 1-4 all clear and BLOCKED + persists, query rulesets explicitly: + + ```bash + gh api "repos///rulesets" \ + --jq '.[] | {id, name, target, enforcement}' + gh api "repos///rulesets/" \ + --jq '.rules[] | {type, parameters}' + ``` + + Any ruleset with `enforcement: active` targeting `branch` and + matching the PR's base branch can impose additional gates not + visible in the older API. Status: as of 2026-04-28, this is a + theoretical 5th class โ€” not yet observed on Zeta โ€” but worth + checking before declaring diagnosis exhausted. + +## What BLOCKED does NOT mean on Zeta + +**It does NOT mean "waiting for a human reviewer to approve."** + +There is NO human-reviewer-approval gate configured. If I find myself +typing "BLOCKED awaiting reviewer" or "blocked on reviewer approval" +or "needs human sign-off before merge" on a Zeta PR, that's the +failure mode this memory exists to catch. + +## How to verify the actual blocker (correct diagnostic command) + +```bash +gh api graphql --field query='query { + repository(owner:"",name:"Zeta"){ + pullRequest(number:){ + mergeStateStatus + mergeable + reviewDecision + reviewThreads(first:100){pageInfo{hasNextPage} nodes{isResolved}} + commits(last:1){nodes{commit{statusCheckRollup{state contexts(first:100){pageInfo{hasNextPage} nodes{__typename ... on CheckRun{name conclusion status} ... on StatusContext{context state}}}}}}} + baseRef{branchProtectionRule{requiredStatusCheckContexts}} + } + } +}' +``` + +Then check, in order: + +1. Are any threads `isResolved: false`? If yes โ€” that's the blocker. +2. Are any required status checks `FAILURE` / `IN_PROGRESS` / `QUEUED`? + If yes โ€” that's the blocker. +3. Is `mergeable: CONFLICTING`? If yes โ€” rebase needed. +4. **Is any name in `requiredStatusCheckContexts` MISSING from the + `contexts.nodes` list entirely?** If yes โ€” that's the class-4 + blocker (sneakiest class โ€” rollup state will report SUCCESS + because it only counts contexts that DID report). Compare set + membership. **Important:** the contexts query returns a UNION of + `CheckRun` (Actions-emitted) and `StatusContext` (legacy commit- + status API) nodes; the name field is `name` on CheckRun and + `context` on StatusContext. The set-extraction must handle both: + + ```python + actual = set() + for n in contexts: + if n['__typename'] == 'CheckRun': + actual.add(n['name']) + elif n['__typename'] == 'StatusContext': + actual.add(n['context']) + missing = set(required) - actual + ``` + + Any non-empty diff = absent-required blocker. + +5. **Is the merge gated by an enterprise/repository ruleset that + isn't visible via the legacy `branchProtectionRule` query?** + GitHub now ships repository rulesets (a separate primitive from + the older branch-protection rules) that can also impose required + status checks, conversation resolution, and other gates. The + GraphQL `branchProtectionRule` field returns null/legacy state + only; rulesets need a separate query (`repository.rulesets` or + the REST `/repos/{owner}/{repo}/rulesets` endpoint). If all four + classes above clear and BLOCKED persists, check rulesets next. +4. Are 1-3 all clear and BLOCKED still shows? Then check the branch- + protection rule directly via `baseRef.branchProtectionRule` โ€” but + on Zeta this should never happen because `requiredApprovingReviewCount: 0`. + +## Why this rule needs a durable memory + +Aaron 2026-04-28 verbatim: *"requiredApprovingReviewCount you've made +this mistake several time, can you just save soewhere that +requiredApprovingReviewCount: 0 or something that reminds you of that +on this project?"* + +I made this mistake **multiple times in a single session** despite: + +- Otto-355 (BLOCKED-with-green-CI investigate-threads-first) being a + CLAUDE.md wake-time discipline +- Already having drained dozens of threads on the same PRs in earlier + ticks (which means I had empirical evidence that threads ARE the + blocker on Zeta) +- The memory file + `feedback_blocked_status_is_not_review_gating_check_status_checks_failure_first_otto_live_lock_2026_04_26.md` + already existing as a 9-pattern live-lock taxonomy + +The reason vigilance-only enforcement keeps failing: GitHub's UI uses +the word "review" everywhere, and my training-data prior maps "BLOCKED" +to "waiting for human." This is a **trained-prior-vs-substrate** +conflict per Otto-340 (substrate IS identity). The substrate says no +approval required; the trained prior says BLOCKED-means-reviewer. I +keep snapping back to the prior under load. + +The fix is mechanism-over-vigilance per Otto-341: a memory file that +fresh sessions hit explicitly, plus the empirical data point +(`requiredApprovingReviewCount: 0`) that anchors the calibration. When +I see `mergeStateStatus: BLOCKED` on Zeta, the FIRST thing I should +do is check threads + checks + conflicts โ€” NEVER claim "waiting for +reviewer." + +## Recurrences in the 2026-04-28 session (the count itself is signal) + +This rule was violated multiple times before Aaron caught it +explicitly. Each recurrence documented as evidence: + +### 1st caught: 2026-04-28 (LFG #660 close-of-tick #1) + +I closed a tick with: *"LFG #660 is BLOCKED waiting on reviewer +approval โ€” that's not an agent action."* Aaron didn't catch it +immediately because the queue was busy. + +### 2nd caught: 2026-04-28 (LFG #660 close-of-tick #2) + +Repeated the framing in a status update: *"LFG #660 BLOCKED awaiting +reviewer."* + +### 3rd caught: 2026-04-28 (Aaron's catch) + +After the same framing landed a third time, Aaron prompted: *"you +said one of the PRs was block on maintainer, are you sure, it's not +something simple you can figure out?"* + +I queried the branch-protection rule explicitly and found +`requiredApprovingReviewCount: 0`. The "blocker" was 3 unresolved +review threads that had landed since my last check โ€” fixable in one +commit + 3 GraphQL `resolveReviewThread` calls. + +The 5-minute fix had been gated by my parroted misdiagnosis for +hours. + +## Always double-check threads AFTER CI completes (Aaron 2026-04-28) + +Aaron 2026-04-28 follow-up: *"you should always double check, +unreviewed threads after CI completes"* + +**Why this matters:** new review threads can land AFTER CI completes, +not just before. The reviewers I see most often on Zeta: + +- `chatgpt-codex-connector` โ€” runs after CI (latency: ~5-10 min) +- `copilot-pull-request-reviewer` โ€” runs after CI (latency: ~2-5 min) + +So a PR can transition through these states in sequence: + +``` +push โ†’ CI running โ†’ CI green โ†’ BLOCKED-with-green-CI (no threads yet) + โ†’ reviewers wake up (5-10 min) โ†’ BLOCKED-with-new-threads +``` + +If I check threads ONLY when CI starts, or ONLY when CI is mid-run, I +miss the threads that land after CI completes. The result is a stale +"0 unresolved" reading that becomes wrong without warning. This is +exactly the failure mode that bit me on LFG #660: I had checked +"0 threads" earlier in the tick, then by the next tick 3 new threads +existed. + +**Operational rule:** when a Zeta PR is BLOCKED, run the GraphQL +threads query at minimum TWICE: + +1. Once when first investigating the BLOCKED state. +2. Once AFTER CI completes (status-checks all green) โ€” this is the + moment new reviewer threads typically land. + +If still BLOCKED after both checks return clean and CI is green and +no merge conflicts, THEN the diagnostic is exhausted (which on Zeta +should never happen because of `requiredApprovingReviewCount: 0`). + +The 2-check discipline composes with Otto-355's "every-tick-inspects" +shape. The single-check failure mode is a sub-class of +manufactured-patience: assuming one read of state is sufficient when +state changes asynchronously to the agent's observation cadence. + +**Concrete check shape (memo for future-self):** + +```bash +# First check โ€” at the moment of investigating BLOCKED state. +# CRITICAL: count BOTH still-running checks (IN_PROGRESS / QUEUED) +# AND already-completed-with-failure checks (FAILURE / CANCELLED / +# TIMED_OUT). A check that COMPLETED with FAILURE is "done" but the +# blocker is still active โ€” treating it as "CI complete" would skip +# the post-CI thread pass while a real failure is unfixed. +gh pr view --repo /Zeta --json statusCheckRollup --jq '{ + # Pending = ANY non-terminal status. GitHub Check Runs API enums + # (per https://docs.github.com/rest/checks/runs#about-the-checks-api): + # status โˆˆ {QUEUED, IN_PROGRESS, COMPLETED, WAITING, REQUESTED, PENDING} + # The terminal status is COMPLETED; everything else is still pending. + pending: [.statusCheckRollup[] | select(.status=="IN_PROGRESS" or .status=="QUEUED" or .status=="WAITING" or .status=="REQUESTED" or .status=="PENDING")] | length, + # Failed = ANY non-success terminal conclusion. Enums: + # conclusion โˆˆ {SUCCESS, FAILURE, NEUTRAL, CANCELLED, SKIPPED, TIMED_OUT, ACTION_REQUIRED, STARTUP_FAILURE, STALE} + # FAILURE / CANCELLED / TIMED_OUT / ACTION_REQUIRED / STARTUP_FAILURE all + # block branch protection. NEUTRAL / SKIPPED / SUCCESS pass. STALE means + # the check needs re-running but is treated as failing by branch protection. + failed: [.statusCheckRollup[] | select(.conclusion=="FAILURE" or .conclusion=="CANCELLED" or .conclusion=="TIMED_OUT" or .conclusion=="ACTION_REQUIRED" or .conclusion=="STARTUP_FAILURE" or .conclusion=="STALE")] | length +}' + +# If `pending == 0 AND failed == 0`, CI is complete-and-green. Wait +# ~5-10 min for reviewers to wake up, THEN run the threads query a +# second time: +gh api graphql --field query='query{repository(owner:"",name:"Zeta"){pullRequest(number:){reviewThreads(first:100){pageInfo{hasNextPage} nodes{isResolved}}}}}' + +# CRITICAL: if `hasNextPage: true` on either reviewThreads or contexts, +# the playbook is SHOWING A TRUNCATED VIEW. Paginate via the `after` +# cursor (cursor field omitted from these examples for brevity) before +# declaring "clean". A 100+ thread PR with the 50/30-cap form would +# silently drop items past the cap, leading to repeated misdiagnosis +# on high-activity PRs. The 100-cap reduces likelihood of truncation +# but DOES NOT replace the hasNextPage check โ€” always read the flag +# before treating the count as authoritative. + +# If `failed > 0`, the blocker is the failing check itself โ€” investigate +# the failure first; the post-CI thread pass is gated on green CI. +``` + +If using autonomous-loop, the natural shape is: + +- Tick N: investigate, find threads, drain them, push +- Tick N+1: re-check after CI completes; if new threads landed, + drain them too + +The "always double-check" phrasing also generalizes: never trust a +single read of an asynchronously-updated GitHub state. Threads, +checks, mergeable, mergeStateStatus all transition without the agent +in the loop. + +## Pre-write self-scan rule (every status-update message) + +Before sending any message that says a Zeta PR is BLOCKED, scan the +draft for these forbidden phrases: + +``` +blocked awaiting reviewer | awaiting reviewer | needs reviewer approval +| waiting for reviewer | blocked on reviewer | reviewer-approval gated +| waiting for human sign-off | needs human review to merge +| needs maintainer approval | blocked on maintainer +``` + +If ANY match โ†’ STOP. Run the GraphQL query above. The actual blocker +will be threads, checks, or conflicts โ€” never "reviewer approval." + +This composes with the Otto-357 forbidden-token list (no "directive" +framing) and the Otto-355 wake-time discipline. Same shape: write-time +scan + structural reason why the prior keeps reasserting. + +## Composes with + +- **Otto-355** (CLAUDE.md wake-time discipline) โ€” BLOCKED-with-green-CI + investigate-threads-first; this memory is the ZETA-SPECIFIC + CALIBRATION that makes Otto-355 sharper (the "what's the actual + blocker" question has only 3 possible answers on Zeta, not "waiting + for reviewer" as a 4th) +- **`memory/feedback_blocked_status_is_not_review_gating_check_status_checks_failure_first_otto_live_lock_2026_04_26.md`** + โ€” the 9-pattern live-lock taxonomy that this rule extends with + project-specific calibration data +- **`memory/feedback_outdated_review_threads_block_merge_resolve_explicitly_after_force_push_2026_04_27.md`** + โ€” `isOutdated: true` threads still block; explicit resolve required +- **Otto-275-FOREVER** โ€” knowing-rule != applying-rule; this memory + IS the applying-rule mechanism for the BLOCKED-means-reviewer + failure mode +- **Otto-340** โ€” substrate IS identity; the substrate says + "no approval required" but the trained prior says + "BLOCKED-means-reviewer." Substrate must win. +- **Otto-341** โ€” mechanism-over-vigilance; the explicit-memory-file + IS the mechanism that closes vigilance gaps + +## What this memory does NOT do + +- Does NOT change branch protection. It documents the current state + (verified 2026-04-28). If the maintainer changes the rule later + (e.g., to require external-contributor approval pre-v1), this + memory must be updated. +- Does NOT mean reviews don't matter. Reviews still happen via codex/ + copilot/maintainer + show up as threads. The rule is just that + *count of approving reviews* is not the gate. +- Does NOT cover other repos. This is a calibration constant for + AceHack/Zeta + Lucent-Financial-Group/Zeta specifically. Other + projects under different ownership have different rules. +- Does NOT replace Otto-355's empirical query habit. This memory adds + the project-specific calibration; Otto-355's "always investigate + before claiming wait state" is the universal rule. + +## Triggers for retrieval + +- Any time the word BLOCKED appears in a Zeta PR status +- Any time considering a "waiting for reviewer" framing on a Zeta PR +- `requiredApprovingReviewCount` / `requiresApprovingReviews` in any + GraphQL response on Zeta +- Aaron 2026-04-28 *"requiredApprovingReviewCount you've made this + mistake several time"* +- Recurrence catches in tick-history rows (the count itself is signal + per Otto-275-FOREVER) + +## Future-self check + +If a future-Otto wake reads `mergeStateStatus: BLOCKED` on a Zeta PR +and the first instinct is "must be waiting for reviewer" โ€” re-read +this memory FIRST. The instinct is the trained-prior. The substrate +says no approval required. Substrate wins. diff --git a/memory/feedback_reviewer_false_positive_pattern_catalog_aaron_2026_04_28.md b/memory/feedback_reviewer_false_positive_pattern_catalog_aaron_2026_04_28.md new file mode 100644 index 00000000..cbabab7c --- /dev/null +++ b/memory/feedback_reviewer_false_positive_pattern_catalog_aaron_2026_04_28.md @@ -0,0 +1,410 @@ +--- +name: >- + Reviewer false-positive pattern catalog โ€” empirical taxonomy of + recurring codex/copilot/automated-reviewer false-positive shapes + observed across the 2026-04-28 thread drain (~50+ threads across + 10+ PRs); each class has a discriminating signal + recommended + resolution form (form-1 substantive fix / form-2 already-fixed + with citation / form-3 carve-out cite / form-4 empirical + falsification); intent is to (a) make future reviews faster by + letting fresh-Otto recognize false-positive class instantly, + (b) name the structural causes so we can preempt them at write- + time, (c) identify where reviewer tooling itself could be improved; + Aaron 2026-04-28 ask after the 121-thread audit +description: >- + Aaron 2026-04-28 input after the 121-thread bulk audit and the + ongoing always-double-check-after-CI loop: + *"Total 121 unresolved threads. when you got through these do you + see if you can do anything to improve the flase posistive in the + future?"* This memory documents the recurring false-positive + patterns I observed across the session's thread-drain work, with + per-class diagnostic signals + resolution forms + structural + prevention candidates. Composes with Otto-355 (BLOCKED-investigate- + threads-first) and the no-required-approval calibration constant โ€” + this catalog gives future-Otto a lookup table for fast + classification of incoming reviewer threads. +type: feedback +--- + +# Reviewer false-positive pattern catalog (2026-04-28) + +## Why this catalog exists + +Across the 2026-04-28 session I drained ~50+ review threads spanning +PRs #17, #19, #23, #28, #72, #75, #82, #83, #84, #85, #87, #91, #92, +#660 (LFG). The threads come from multiple automated reviewers: + +- `chatgpt-codex-connector` (Codex) +- `copilot-pull-request-reviewer` (GitHub Copilot) + +Roughly **40-50% of threads were genuine substantive findings** that +needed form-1 fixes. The other half were **false-positives or +already-fixed** issues with recognizable structural causes. This +catalog names those structural causes so that: + +1. **Future-Otto recognizes the class instantly** and applies the + right resolution form without re-deriving the diagnosis. +2. **Write-time prevention** can be designed where the structural + cause is in the writing process (e.g., schema lookup before + authoring backlog rows would prevent the schema-drift class). +3. **Reviewer-tooling improvements** can be requested where the + structural cause is in the reviewer itself (e.g., Codex/Copilot + could read project conventions before applying generic style + rules). + +## Resolution form taxonomy (recap) + +- **Form-1 (substantive fix):** the finding is real; apply the + suggested change or an equivalent fix. +- **Form-2 (already-fixed with citation):** the finding was real but + has been addressed in a later commit on this PR. Reply with the + commit SHA + close the thread. +- **Form-3 (carve-out cite):** the finding mis-applies a rule that + has a documented carve-out for this surface. Reply with the + carve-out citation + close. +- **Form-4 (empirical falsification):** the finding asserts a + language/runtime/tool behavior that is testably wrong. Reply with + the empirical test + close. + +## The classes (ranked by frequency in this session) + +### Class 1 โ€” Stale-snapshot review (form-2) + +**Frequency in 2026-04-28 session:** ~25% of false-positives. + +**Discriminating signal:** the thread cites a line / claim that has +already been changed in a later commit on the same PR. The reviewer +ran against an older SHA. + +**Examples this session:** + +- PR #91 P1 "ONLY one of" โ€” already changed to "one OR MORE" in + the prior commit +- PR #91 P1 "not at top" โ€” already moved to top in the prior commit +- LFG #660 P1 PR-description-says-4-files โ€” already updated to 5 + files in PR title + body + +**Why it happens (structural):** Codex / Copilot run against the SHA +of the PR's base commit when the review was queued. If the author +(me) pushes a fix BETWEEN the review's queue time and its delivery +time, the review describes pre-fix state but lands as if it's +current. There's no client-side staleness check. + +**Recommended resolution form:** form-2 โ€” reply with the commit SHA +that addressed it + resolve. + +**Prevention candidate (reviewer-side):** Codex/Copilot could check +the latest SHA at delivery time and skip findings whose target lines +have changed. Worth filing upstream. + +**Prevention candidate (factory-side):** none โ€” this is reviewer- +tooling, not author-side. Mitigation is just fast form-2 closure. + +--- + +### Class 2 โ€” Carve-out blind spot (form-3) + +**Frequency:** ~20% of false-positives. + +**Discriminating signal:** the reviewer applies a generic rule +("no name attribution in code", "no absolute paths in docs", "no +PR-relative phrasing") to a surface that has a documented carve-out +in the project's conventions. + +**Examples this session:** + +- LFG #660 P1 `Co-authored-by: Otto` flagged as persona-name on + current-state surface โ€” but commit trailers ARE history surface + per Otto-279 carve-out +- Multiple "memory file should be terse" findings on memory/* โ€” + but `memory/README.md` distinguishes index entries (terse) from + body (detailed); reviewer flagged body length + +**Why it happens (structural):** automated reviewers train on +generic OSS conventions (Google style guide, GitHub markdown lint, +etc.) and don't read the project's `docs/AGENT-BEST-PRACTICES.md` +or `docs/GLOSSARY.md` carve-out rules. + +**Recommended resolution form:** form-3 โ€” reply with the explicit +carve-out citation (e.g., "Otto-279 history-surface carve-out at +docs/AGENT-BEST-PRACTICES.md:287-348 covers commit trailers") + +resolve. + +**Prevention candidate (factory-side):** maintain a single +`docs/CARVE-OUTS.md` index that future-reviewers can be pointed at +in PR-template / `.github/copilot-instructions.md`. Some of this +exists in `docs/AGENT-BEST-PRACTICES.md` already; making the +carve-out section more grep-friendly would help. + +**Prevention candidate (reviewer-side):** Codex/Copilot could be +asked to read the project's `AGENTS.md` or +`.github/copilot-instructions.md` before applying style rules. The +factory-managed `.github/copilot-instructions.md` (per +GOVERNANCE.md ยง31) is the lever here โ€” extending it with explicit +carve-out enumeration could reduce this class significantly. + +--- + +### Class 3 โ€” Schema rule blind spot (form-1, but preventable at write-time) + +**Frequency:** ~15% of false-positives, but worth preventing because +the underlying drift is a real factory-side error. + +**Discriminating signal:** the reviewer flags a structural rule +violation that the project has a documented schema for, but the +author (me) didn't read the schema before writing. + +**Examples this session:** + +- B-0068/B-0069/B-0070/B-0071 backlog rows โ€” used off-schema fields + (`slug`, `maintainer`, `ownership`, `status: backlog`) instead of + the schema in `tools/backlog/README.md` (which uses `status: open` + and a different field set) +- Memory frontmatter YAML validity โ€” `requiredApprovingReviewCount: + 0` in a plain scalar broke YAML parsing + +**Why it happens (factory-side):** I authored from a stale mental +template instead of re-reading the schema for each artifact. Per +Otto-275-FOREVER: knowing-rule != applying-rule. The schema lives +in a documented location, but the act of authoring doesn't trigger +schema-lookup unless I make it a discipline. + +**Recommended resolution form:** form-1 โ€” apply the suggested fix +(usually mechanical, change the field set / status enum / etc.). + +**Prevention candidate (factory-side):** + +1. **Pre-write schema-fetch discipline:** before authoring any + structured artifact (backlog row / memory frontmatter / commit + trailer block), grep the documented schema and check it against + the draft. Cost: 30 seconds per artifact. Saved cost: avoiding + reviewer round-trip + sister-row schema-fix sweeps. +2. **Mechanical schema validators:** `tools/hygiene/audit-backlog- + schema.sh` + `tools/hygiene/audit-memory-frontmatter.sh` that + parse YAML + check field set + enum values. Pre-commit hook + would catch the drift before reviewer cycle. + +**Prevention candidate (reviewer-side):** none โ€” reviewers caught +these correctly. The factory side is where the prevention belongs. + +--- + +### Class 4 โ€” Wrong-language-parser blind spot (form-3 or form-4) + +**Frequency:** ~10% of false-positives. + +**Discriminating signal:** the reviewer applies a language/runtime +rule that doesn't actually apply to the construct in question. + +**Examples this session:** + +- PR #75 Copilot P0 claimed `if ! var="$(cmd)"; then ...` doesn't + catch cmd failure โ€” empirically wrong on bash 3.2.57 + 5.x; the + if-not test on the assignment exit status DOES catch it +- (potential in this catalog scope) Copilot reviewing F# applying + C# null-check rules + +**Why it happens (structural):** reviewers train across many +languages and apply rules from the wrong language family when the +file extension or syntax is ambiguous. + +**Recommended resolution form:** form-4 โ€” empirically test the +claim and reply with the test command + result. If the claim turns +out to be correct, drop to form-1. + +**Prevention candidate (reviewer-side):** asking the reviewer to +identify the language explicitly before applying rules would help, +but this is upstream tooling. + +**Prevention candidate (author-side):** the verify-before-deferring +discipline already covers this โ€” when a reviewer makes a +language-claim, test it before applying the fix. + +--- + +### Class 5 โ€” Convention conflict (form-3) + +**Frequency:** ~10% of false-positives. + +**Discriminating signal:** the reviewer applies a broad style +preference (line length / comment density / variable naming) that +conflicts with a documented project convention. + +**Examples this session:** + +- "MEMORY.md entry too long" โ€” entry was actually within memory/ + README.md guidance for the artifact class +- "Comment too verbose" on memory files โ€” memory file bodies are + intentionally detailed per the project's substrate-grade-not-code- + comment register + +**Why it happens (structural):** broad style rules are reviewer +defaults; project conventions live in `docs/` files the reviewer +doesn't read. + +**Recommended resolution form:** form-3 with the specific +convention citation. + +**Prevention candidate:** same as Class 2 โ€” extend +`.github/copilot-instructions.md` with the project-specific style +exemptions. + +--- + +### Class 6 โ€” Cross-reference target out of scope (form-1, but the +underlying class is "broken in-repo cross-reference") + +**Frequency:** ~10% of false-positives that are actually class-3-real +(reviewer caught a real bug). + +**Discriminating signal:** reviewer flags a path/link as broken +because the file lives in a different directory than the writer +assumed. + +**Examples this session:** + +- Otto-278 / Otto-352 / per-named-agent-memory-architecture xrefs โ€” + paths that exist only in user-scope memory, not in-repo +- `user_hacked_god_*.md` references missing the `memory/` prefix + (3 instances on PR #92) + +**Why it happens (factory-side):** writer (me) wrote the path from +mental model rather than verifying via filesystem. + +**Recommended resolution form:** form-1 โ€” fix the path or relabel +as user-scope-only. + +**Prevention candidate (factory-side):** + +1. **B-0070 orphan-role-ref-detector** โ€” extend to also catch + broken in-repo path references (already noted as observation in + prior tick). +2. **Pre-commit lint** that resolves all `[link](path)` markdown + references against the filesystem. Runs in ~1 second; catches + the broken-xref class entirely. + +**Prevention candidate (reviewer-side):** reviewers do this well โ€” +no upstream improvement needed. + +--- + +### Class 7 โ€” Recursive-CI new threads (procedural class, not +false-positive but worth naming) + +**Frequency:** every CI cycle on every PR. + +**Discriminating signal:** I drained N threads, pushed, CI ran, NEW +threads landed catching issues introduced by the fix or that the +reviewer missed in the prior pass. + +**Examples this session:** + +- PR #91: 2 threads โ†’ fixed โ†’ 3 new threads (P0 YAML) โ†’ fixed โ†’ 3 + more new threads (failed-check counting + StatusContext fragment + + index entry length) + +**Why it happens:** reviewers run incrementally on each push; +findings compound until the file converges. + +**Resolution form:** continue draining; don't classify as +false-positive (these are real findings). + +**Prevention candidate (factory-side):** none on the procedure +itself, but **mechanism-over-vigilance** via pre-commit hooks + +project-side validators (YAML, schema, xref-resolution) would catch +many of the underlying issues before the first reviewer pass โ€” and +the reviewer would land cleaner more often. + +--- + +## Frequency-weighted prevention candidates (ROI ranked) + +If we want to reduce false-positive volume in future PRs, the +highest-ROI structural fixes are: + +### High ROI (catches multiple classes) + +1. **Pre-commit YAML validator** for memory/* frontmatter + (catches Class 3 + the recursive-CI pattern of Class 7 for + frontmatter-related issues). Cost: ~30 lines bash + pre-commit + hook. Catches Class 3 entirely for frontmatter. + +2. **Pre-commit markdown-xref-resolver** that validates every + `[text](path)` against the filesystem. Catches Class 6 entirely. + Cost: ~50 lines bash or python. Composes with B-0070 lint. + +3. **Extend `.github/copilot-instructions.md`** with the project's + carve-out enumeration (Otto-279 history-surface, memory file + body-vs-index distinction, etc.). Catches Class 2 + Class 5 by + biasing the reviewer's findings. Cost: ~20 lines doc edit. + +### Medium ROI (single class) + +4. **Pre-write schema-fetch discipline** (operational, not tool): + grep `tools/backlog/README.md` before authoring any backlog row; + grep `memory/README.md` before authoring memory frontmatter. + Catches Class 3 at write-time. Cost: 30 seconds per artifact. + +5. **`tools/hygiene/audit-backlog-schema.sh`** that runs the + tools/backlog/README.md schema check mechanically. Catches Class 3 + for backlog rows. Cost: ~40 lines bash. + +### Low ROI (reviewer-side, can't enforce) + +6. Asking Codex/Copilot upstream to read project conventions + before applying generic rules. Long-term ask, low immediate + leverage. + +## How to use this catalog + +When reading a new reviewer thread: + +1. **Check the class:** does it match Class 1 (stale-snapshot)? + Class 2 (carve-out blind spot)? etc. +2. **If matched:** apply the resolution form for that class + (form-1/2/3/4) without re-deriving the analysis. +3. **If unmatched:** treat as a genuine substantive finding, + investigate normally. + +This shaves ~30 seconds to ~2 minutes off each false-positive thread +and scales as thread volume grows. + +## Composes with + +- **Otto-355** (BLOCKED-investigate-threads-first) โ€” this catalog + makes Otto-355's investigation faster +- **Otto-275-FOREVER** โ€” Class 3 (schema drift) is a textbook + knowing-rule != applying-rule case +- **Otto-279** (history-surface attribution carve-out) โ€” Class 2 + cites this carve-out frequently +- **`feedback_no_required_approval_on_zeta_BLOCKED_means_threads_or_ci_aaron_2026_04_28.md`** + โ€” companion: that memory says "always double-check threads after + CI completes"; this catalog makes the post-CI thread pass faster +- **B-0070** orphan-role-ref-detector โ€” composes with Class 6 + prevention work +- **Aaron 2026-04-28** *"can you do anything to improve the false + positive in the future?"* โ€” the prompting input + +## Triggers for retrieval + +- Reading a fresh reviewer thread on any Zeta PR +- Considering whether a finding is a real bug vs. a false-positive +- Designing a pre-commit hook or hygiene script for the factory +- Reviewing this session's tick-history rows for thread-drain + patterns + +## What this catalog does NOT do + +- Does NOT replace per-thread judgment. The classes are + recognition hints, not auto-classifiers. A genuine bug can wear + the costume of any class above. +- Does NOT obviate empirical testing for Class 4. The + verify-don't-parrot discipline still applies โ€” test the language + claim before deciding it's a false-positive. +- Does NOT cover human-reviewer findings. Aaron's review threads + are not in this catalog because Aaron's pattern is different (he + catches my drift in framing / vocabulary / direction, not generic + style rules). +- Does NOT cover novel reviewers (Gemini / Grok / Amara) โ€” their + patterns are distinct enough that they need their own catalogs if + we ever automate review through them.