feat(bg-notifier): B-0501 slice 5a — assignment-history cooldown gate #4449
Adds the assignment-history dedup/cooldown mechanism specified in B-0501:
the notifier no longer re-publishes the same `work-assignment` envelope
every poll cycle when an idle agent hasn't acted on it yet.
Changes:
- NotifierConfig gains `historyFile` (default resolves to
`${ZETA_BUS_DIR ?? "/tmp/zeta-bus"}/assignment-history.json` via the
new `defaultHistoryFile()` helper) and `cooldownMin` (default 30)
- PollResult gains `skippedDueToCooldown: string[]`
- Adapters interface gains `readHistoryFile` + `writeHistoryFile`;
REAL_ADAPTERS uses atomic-rename (`writeFileSync` to `<path>.tmp`
then `renameSync`) to survive concurrent notifier instances
- pollOnce reads history before the publish loop, computes the active-
cooldown set, partitions toAssign into actually-publishing vs skipped,
then writes pruned-history + new entries atomically when publishes
occurred
- parseArgs gains `--history-file` and `--cooldown-min` flags
- B-0501 closed; B-0441 parent acceptance bullet ("Tracks assignment
history...") checked off
Tests added (8 new, 45 total): cooldown-skip within window, re-assign
after window, history-absent first-assignment, multi-row partial-skip,
pruning, defaultHistoryFile env-var honoring, --history-file/--cooldown-min
parse, --history-file rejects missing value. All 45 tests pass.
Per .claude/rules/claim-acquire-before-worktree-work.md: claim
acquired (7152b349) before starting; isolated FETCH_HEAD-anchored
worktree at /private/tmp/zeta-shard-1807z-coldboot.
Co-Authored-By: Claude <noreply@anthropic.com>
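The cooldown gate described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `tools/bg/backlog-ready-notifier.ts`: the `HistoryEntry` fields (`rowId`, `assignedAtMs`) and the helper names `activeCooldownSet`, `partitionByCooldown`, and `nextHistory` are assumptions made for the example; only the `{ entries: [] }` shape, `cooldownMin`, and `skippedDueToCooldown` come from the PR text.

```typescript
// Hypothetical shapes; the real types in the notifier may differ.
interface HistoryEntry {
  rowId: string;
  assignedAtMs: number; // epoch milliseconds of the last publish (assumed field)
}

interface AssignmentHistory {
  entries: HistoryEntry[];
}

/** Row IDs whose last assignment is still inside the cooldown window. */
function activeCooldownSet(
  history: AssignmentHistory,
  nowMs: number,
  cooldownMin: number,
): Set<string> {
  const windowMs = cooldownMin * 60_000;
  return new Set(
    history.entries
      .filter((e) => nowMs - e.assignedAtMs < windowMs)
      .map((e) => e.rowId),
  );
}

/** Split candidate rows into those to publish and those skipped by cooldown. */
function partitionByCooldown(
  toAssign: string[],
  cooling: ReadonlySet<string>,
): { publishing: string[]; skippedDueToCooldown: string[] } {
  const publishing: string[] = [];
  const skippedDueToCooldown: string[] = [];
  for (const rowId of toAssign) {
    (cooling.has(rowId) ? skippedDueToCooldown : publishing).push(rowId);
  }
  return { publishing, skippedDueToCooldown };
}

/** Drop expired entries, then record the rows that were just published. */
function nextHistory(
  history: AssignmentHistory,
  published: string[],
  nowMs: number,
  cooldownMin: number,
): AssignmentHistory {
  const windowMs = cooldownMin * 60_000;
  const kept = history.entries.filter(
    (e) => nowMs - e.assignedAtMs < windowMs && !published.includes(e.rowId),
  );
  const fresh = published.map((rowId) => ({ rowId, assignedAtMs: nowMs }));
  return { entries: [...kept, ...fresh] };
}
```

With the default `cooldownMin` of 30, a row assigned 10 minutes ago would be skipped, while one assigned 45 minutes ago would be eligible again and its stale entry pruned on the next write.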
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74dc2f0fe9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
Adds an assignment-history “cooldown” mechanism to the backlog ready-to-grind notifier to avoid re-sending identical work-assignment envelopes to idle agents on every poll cycle, and updates the associated backlog rows/docs to mark the slice as shipped.
Changes:
- Extend `NotifierConfig`/`PollResult` and `Adapters` to support a persisted assignment history file and a cooldown window.
- Implement cooldown gating in `pollOnce`, including history read, skip tracking, and history pruning/write-back.
- Add targeted tests for cooldown behavior and CLI parsing; close out B-0501/B-0441 checklist items in docs.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/bg/backlog-ready-notifier.ts | Implements cooldown gate + history persistence hooks and CLI flags. |
| tools/bg/backlog-ready-notifier.test.ts | Adds 8 tests covering cooldown behavior, pruning, and arg parsing. |
| docs/backlog/P1/B-0501-b0441-slice-5-assignment-history-dedup-cooldown-2026-05-14.md | Marks B-0501 closed and documents the shipped resolution. |
| docs/backlog/P1/B-0441-backlog-row-ready-to-grind-notifier-background-service-2026-05-13.md | Checks off the slice-5a acceptance bullet as shipped. |
| docs/BACKLOG.md | Regenerates index entry to reflect B-0501 as closed. |
Vera triage 2026-05-20T19:53Z: #4449 is owner-only ( Current blockers:
Next owner action: push a new head that uses a unique temp file for history writes and applies cooldown filtering before
… P0/P1 + CodeQL)
5 unresolved threads — all valid; all addressed:
1. Codex P1 (line 370): cooled-down rows must not consume maxAssignments
quota. Restructured the publish loop to scan readyRows in order and
apply the cooldown check PER ROW, breaking only when
publishedEnvelopeIds.length === maxAssignments. Cooled-down rows are
recorded in skippedDueToCooldown but do NOT count toward the cap.
2. Codex P1 + Copilot P0 + CodeQL (lines 262-264): fixed-path `${path}.tmp`
is racy with concurrent notifier instances (the exact case slice 5a
addresses). Switched to a process-unique temp filename:
`${path}.tmp.${process.pid}.${randomBytes(6).toString("hex")}`.
Two writers can no longer share a temp path.
3. Copilot P1 (line 365): history file was read on every poll cycle even
when nothing would be published. Deferred the read+parse inside the
`!noPublish && readyRows.length > 0` branch so dry-runs and
ready-empty polls skip the IO entirely.
Tests added (3 new, 48 total):
- cooled-down rows do NOT consume maxAssignments quota — 3 cooled + 3
eligible → 3 publishes go to eligible
- readHistoryFile NOT called when noPublish: true
- readHistoryFile NOT called when readyRows is empty
Co-Authored-By: Claude <noreply@anthropic.com>
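The per-row cooldown check from item 1 above can be sketched like this. It is a simplified illustration with row IDs as plain strings; `selectForPublish` is a hypothetical name, and the real loop in `pollOnce` also publishes envelopes, which is omitted here.

```typescript
/**
 * Scan ready rows in order. Cooled-down rows are recorded as skipped but do
 * not count toward the cap; the scan stops once the cap is reached.
 */
function selectForPublish(
  readyRows: string[],
  cooling: ReadonlySet<string>,
  maxAssignments: number,
): { published: string[]; skippedDueToCooldown: string[] } {
  const published: string[] = [];
  const skippedDueToCooldown: string[] = [];
  for (const rowId of readyRows) {
    if (published.length === maxAssignments) break; // quota exhausted
    if (cooling.has(rowId)) {
      skippedDueToCooldown.push(rowId); // skipped, quota untouched
      continue;
    }
    published.push(rowId);
  }
  return { published, skippedDueToCooldown };
}
```

Applying the cap before the cooldown filter would let three cooled-down rows at the head of the list use up a quota of three, so nothing would be published even though eligible rows follow them.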
Vera recheck 2026-05-20T19:56Z after new head The prior owner-blocking review findings on #4449 have moved forward:
Next toe-safe action: wait for this new-head CI run to finish; do not rerun or merge yet.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe47416885
Vera recheck 2026-05-20T19:58Z after inspecting current CodeQL/review state on head Correction to my 19:56Z wait-state: #4449 is not just waiting on CI anymore. Current blockers:
Next owner action: fix the temp-file creation so CodeQL clears, and make the history write merge concurrent updates before replacing the history file; then push a new head and let CodeQL/reviews refresh.
Vera settled recheck 2026-05-20T20:00Z on head The new-head run has effectively settled. All fetched build/lint/analysis jobs are completed/success except CodeQL, which remains Current blockers remain actionable, not transient:
No rerun is useful yet. Next owner action is still to push a fix for the temp-file alert and concurrent history-write merge behavior, then let checks/reviews refresh.
…ead-merge-write
Two new threads on the first-round-fix commit:
1. CodeQL (line 267 — js/insecure-temporary-file, same alert #91): the unique-temp-filename was correct for the multi-instance race but the underlying file-creation was still O_WRONLY|O_CREAT|O_TRUNC (no O_EXCL). An attacker with bus-directory write access could pre-create a symlink at our temp path. Added `{ flag: "wx" }` to writeFileSync (O_WRONLY|O_CREAT|O_EXCL) so the create fails if the file exists.
2. Codex P1 (line 412): read-modify-write race on history. Two notifier instances both reading the same pre-write snapshot, each adding their own row, then both writing — the later rename wins and the first instance's row gets DROPPED from history (extending the double-assignment risk into the full cooldown window for the lost entry). Added read-merge-write: re-read on-disk history immediately before computing the next write, union any rowIds the peer recorded between our initial read and now, then write the merged result. Reduces (but does not strictly eliminate) the race — strict elimination needs flock or an append-only log, both out of scope for slice 5a per the B-0501 spec atomic-write-note.
Tests added (1 new, 49 total): peer-wrote-between-our-reads scenario verifies the merge preserves the peer's entry alongside ours.
Co-Authored-By: Claude <noreply@anthropic.com>
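The merge step of the read-merge-write fix might look something like the sketch below. The entry fields and the `mergeHistories` name are assumptions for illustration, and the tie-break rule (keep the newest timestamp per row) is one reasonable choice, not necessarily what the PR does.

```typescript
// Hypothetical shapes; field names are assumed for this example.
interface HistoryEntry {
  rowId: string;
  assignedAtMs: number;
}

interface AssignmentHistory {
  entries: HistoryEntry[];
}

/**
 * Union our pending history with whatever is on disk right now, so rows a
 * peer recorded after our first read are not lost. When both sides have an
 * entry for the same row, the newer timestamp is kept.
 */
function mergeHistories(
  ours: AssignmentHistory,
  onDisk: AssignmentHistory,
): AssignmentHistory {
  const byRow = new Map<string, HistoryEntry>();
  for (const entry of [...onDisk.entries, ...ours.entries]) {
    const prev = byRow.get(entry.rowId);
    if (prev === undefined || entry.assignedAtMs > prev.assignedAtMs) {
      byRow.set(entry.rowId, entry);
    }
  }
  return { entries: [...byRow.values()] };
}
```

As the commit message notes, this only narrows the window: a peer can still write between the re-read and the rename, so a lock or an append-only log would be needed to close the race entirely.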
PR #4449 CI surfaced TS18047: `'history' is possibly 'null'` at the read-merge-write line. The `let history: AssignmentHistory | null = null` declared outside the publish branch wasn't narrowed by tsc's flow analysis inside the `if (publishedRowIds.length > 0)` nested block. Solved by moving the declaration into the outer publish-branch as a const: `const history: AssignmentHistory = adapters.readHistoryFile(...) ?? { entries: [] };`. Equivalent runtime behavior, narrower scope, satisfies tsc. Co-Authored-By: Claude <noreply@anthropic.com>
…me cleanup
Two PR #4449 review findings on commit 6052f30:
1. CodeQL alert #92 (line 280, js/insecure-temporary-file): the `pid + randomBytes(6) + flag:"wx"` pattern was secure, but CodeQL still flags predictable filenames in OS temp directories. Switched to the stricter `mkdtempSync` pattern: create a private (0700-mode) directory in the parent dir with a cryptographically unguessable suffix, write history inside that dir, rename onto the target, then clean up the now-empty dir. Defeats symlink attacks AND satisfies the CodeQL rule.
2. Copilot P1 (line 417): persona-name attribution in non-history code comments. Repo convention per .github/copilot-instructions.md:305-366 is to use role-refs / generic references outside history surfaces. Rephrased "Codex PR #4449 P1" → "PR #4449 review finding".
`randomBytes` import dropped (replaced by mkdtempSync's built-in uniqueness). All 49 tests still pass.
Co-Authored-By: Claude <noreply@anthropic.com>
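A minimal sketch of the `mkdtempSync`-then-rename pattern described above, using only Node's built-in `fs`, `os`, and `path` modules. The function name `writeFileAtomic`, the `.history-tmp-` prefix, and the staged filename are assumptions for illustration; the PR's actual adapter may be structured differently.

```typescript
import {
  mkdtempSync,
  readFileSync,
  readdirSync,
  renameSync,
  rmSync,
  writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

/** Write `contents` to `targetPath` via a private staging directory. */
function writeFileAtomic(targetPath: string, contents: string): void {
  // Staging dir sits next to the target so the rename stays on one
  // filesystem. mkdtempSync appends six random characters to the prefix.
  const stagingDir = mkdtempSync(join(dirname(targetPath), ".history-tmp-"));
  try {
    const stagedFile = join(stagingDir, "history.json");
    // "wx" fails if the file already exists (exclusive create).
    writeFileSync(stagedFile, contents, { flag: "wx" });
    renameSync(stagedFile, targetPath);
  } finally {
    // Remove the staging dir whether or not the rename succeeded.
    rmSync(stagingDir, { recursive: true, force: true });
  }
}

// Usage: write into a scratch directory and read the result back.
const demoDir = mkdtempSync(join(tmpdir(), "notifier-demo-"));
const historyFile = join(demoDir, "assignment-history.json");
writeFileAtomic(historyFile, JSON.stringify({ entries: [] }));
const roundTripped = readFileSync(historyFile, "utf8");
const leftovers = readdirSync(demoDir); // only the history file should remain
```

Creating the staging directory beside the target, rather than in the OS temp directory, matters because `renameSync` is only atomic within a single filesystem.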
Vera queue note — 2026-05-20T20:12Z Current head CI is also not fully settled yet:
…ons (#4450)
PR #4449 review finding (Copilot P1, post-merge): the test "defaultHistoryFile honors ZETA_BUS_DIR" hard-coded "/var/zeta-test/assignment-history.json" but defaultHistoryFile uses path.join, which returns OS-native separators (backslashes on Windows). The assertion would fail on a Windows CI leg even though no Windows workflow exists today. Computed expected values with path.join() for portability.
49/49 tests still pass; tsc clean.
Co-authored-by: Claude <noreply@anthropic.com>
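The portability fix amounts to building the expected value with the same `path.join` call the helper uses. The sketch below assumes a `defaultHistoryFile` that accepts an environment map as a parameter purely to keep the example self-contained; the real helper's signature is not shown in this PR.

```typescript
import { join } from "node:path";

// Hypothetical stand-in for the real helper; the env parameter is an
// assumption made so the example does not mutate process.env.
function defaultHistoryFile(
  env: Record<string, string | undefined> = process.env,
): string {
  return join(env.ZETA_BUS_DIR ?? "/tmp/zeta-bus", "assignment-history.json");
}

// Compute the expectation with path.join instead of hard-coding "/" so the
// comparison uses the platform's native separator on both sides.
const expected = join("/var/zeta-test", "assignment-history.json");
const actual = defaultHistoryFile({ ZETA_BUS_DIR: "/var/zeta-test" });
const fallback = defaultHistoryFile({});
```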
Summary
Closes B-0501 (B-0441 slice 5a). Adds the assignment-history dedup/cooldown mechanism to `tools/bg/backlog-ready-notifier.ts` so an idle agent isn't spammed with the same `work-assignment` envelope every poll cycle.
Test plan
🤖 Generated with Claude Code