fix(dispatcher): idempotent createWorkflowRecord so a retried step can't mask errors with UNIQUE #159

Merged
thejustinwalsh merged 4 commits into main from middle-issue-108 on May 28, 2026

thejustinwalsh commented May 26, 2026


Summary

Closes #108

createWorkflowRecord was a plain INSERT into workflows (PK = id, the bunqueue execution id). The implementation workflow's prepare-worktree step runs under bunqueue's default retry: 3 and calls createWorkflowRecord before createWorktree. A transient createWorktree failure retried the whole step, re-ran the INSERT for the same id, and surfaced UNIQUE constraint failed: workflows.id, masking the real error. The recommender/documentation check-rate-limit steps had dodged the same class of bug with a per-step retry: 1 workaround.
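A minimal sketch of the failure mode, assuming the retry loop re-runs the whole step from the top and rethrows the last attempt's error. It models the workflows table as an in-memory set of ids rather than SQLite; withRetry, prepareWorktreeOutcome, the execution id, and the git error message are hypothetical stand-ins, not bunqueue's or the dispatcher's API, and how `attempts` maps onto bunqueue's retry option is also an assumption. With createWorktree failing on every attempt, the plain insert surfaces the UNIQUE error instead of the git error:

```typescript
// Hypothetical retry loop: re-runs the whole step up to `attempts` times and
// rethrows the error from the final attempt (an assumed model of step retry).
function withRetry(attempts: number, step: () => void): void {
  let lastError: unknown = new Error("step never ran");
  for (let i = 0; i < attempts; i++) {
    try {
      step();
      return;
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Simulates prepare-worktree: record the workflow row, then create the worktree.
// `idempotentInsert` toggles between a plain INSERT and a PK-scoped no-op.
function prepareWorktreeOutcome(idempotentInsert: boolean): string {
  const workflowIds = new Set<string>(); // in-memory stand-in for workflows.id
  const executionId = "exec-1"; // hypothetical execution id
  const step = (): void => {
    // createWorkflowRecord stand-in
    if (workflowIds.has(executionId) && !idempotentInsert) {
      throw new Error("UNIQUE constraint failed: workflows.id");
    }
    workflowIds.add(executionId);
    // createWorktree stand-in that fails on every attempt
    throw new Error("git worktree add failed");
  };
  try {
    withRetry(3, step);
    return "completed";
  } catch (err) {
    return (err as Error).message;
  }
}

console.log(prepareWorktreeOutcome(false)); // UNIQUE constraint failed: workflows.id
console.log(prepareWorktreeOutcome(true)); // git worktree add failed
```

The first attempt hits the real error; every later attempt dies on the INSERT before reaching createWorktree, so only the UNIQUE error is left to report.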

Fixed at the source: INSERT ... ON CONFLICT(id) DO NOTHING makes the record-creating step idempotent. The only way the PK collides is a same-execution retry — exactly the case to no-op — so a retried step now surfaces the real downstream error. Scoped to the id PK (not a blanket INSERT OR IGNORE) so a genuine CHECK/NOT-NULL violation still throws. Hardens all three workflows (implementation, recommender, documentation) at one point.
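The difference between the PK-scoped clause and a blanket INSERT OR IGNORE can be sketched with an in-memory model. This is not SQLite: WorkflowsModel, the status values, and the CHECK on kind (assumed here to allow only the three workflow kinds) are hypothetical. The model encodes the behavior described above: ON CONFLICT(id) DO NOTHING only absorbs a uniqueness conflict on id (SQLite's upsert clause does not intervene for CHECK or NOT NULL failures), while OR IGNORE skips the row for either kind of violation.

```typescript
type Row = { id: string; kind: string; status: string };
type InsertMode = "on-conflict-id-do-nothing" | "or-ignore";

// Assumed CHECK constraint: kind must be one of the three workflow kinds.
const ALLOWED_KINDS = new Set(["implementation", "recommender", "documentation"]);

// Hypothetical in-memory model of the workflows table (not real SQLite).
class WorkflowsModel {
  readonly rows = new Map<string, Row>();

  insert(row: Row, mode: InsertMode): void {
    const checkFails = !ALLOWED_KINDS.has(row.kind);
    const idConflict = this.rows.has(row.id);
    if (mode === "or-ignore") {
      // Blanket ignore: any constraint violation silently skips the row.
      if (checkFails || idConflict) return;
    } else {
      // PK-scoped: a CHECK failure still throws; only an id conflict no-ops.
      if (checkFails) throw new Error("CHECK constraint failed: kind");
      if (idConflict) return;
    }
    this.rows.set(row.id, { ...row });
  }
}

// Idempotent retry: the second insert leaves the already-advanced row untouched.
const table = new WorkflowsModel();
table.insert({ id: "exec-1", kind: "implementation", status: "pending" }, "on-conflict-id-do-nothing");
table.rows.get("exec-1")!.status = "running"; // the workflow has moved on
table.insert({ id: "exec-1", kind: "implementation", status: "pending" }, "on-conflict-id-do-nothing");
console.log(table.rows.get("exec-1")!.status); // running

// A bad kind: throws under the PK-scoped clause, vanishes under OR IGNORE.
function badKindThrows(mode: InsertMode): boolean {
  try {
    new WorkflowsModel().insert({ id: "exec-2", kind: "bogus", status: "pending" }, mode);
    return false;
  } catch {
    return true;
  }
}
console.log(badKindThrows("on-conflict-id-do-nothing")); // true
console.log(badKindThrows("or-ignore")); // false
```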

What changed

  • packages/dispatcher/src/workflow-record.ts: createWorkflowRecord INSERT → ON CONFLICT(id) DO NOTHING, with a doc note on the idempotency contract and why it's PK-scoped.
  • packages/dispatcher/test/workflow-record.test.ts — idempotent-retry no-op test (a second create with the same id leaves the existing row untouched); bad-kind test (a real CHECK violation still throws — guards the PK-scoping).
  • packages/dispatcher/test/implementation-workflow.test.ts — a transient createWorktree failure retries and recovers to completed instead of failing on a masked UNIQUE.

Why these changes

prepare-worktree's createWorktree is legitimately retriable (a transient git failure can succeed on retry), so retry: 1 (the existing workaround) would throw that recovery away. The INSERT is the only non-idempotent thing in the step; making it idempotent on the PK preserves the retry semantics and fixes the whole class at the source rather than per-step. ON CONFLICT(id) over INSERT OR IGNORE keeps genuine schema-constraint violations throwing.
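The retry-budget trade-off can be sketched under the same assumptions: the whole step re-runs from the top, `attempts` counts total runs (which may not match how bunqueue counts retry), and withRetry, simulate, and the error messages are hypothetical stand-ins. Here createWorktree fails only on its first call, as with a transient git failure:

```typescript
// Hypothetical retry loop: re-runs the step up to `attempts` times in total.
function withRetry(attempts: number, step: () => void): void {
  let lastError: unknown = new Error("step never ran");
  for (let i = 0; i < attempts; i++) {
    try {
      step();
      return;
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

function simulate(attempts: number, idempotentInsert: boolean): string {
  const workflowIds = new Set<string>(); // stand-in for workflows.id
  let worktreeCalls = 0;
  const step = (): void => {
    // createWorkflowRecord stand-in
    if (workflowIds.has("exec-1") && !idempotentInsert) {
      throw new Error("UNIQUE constraint failed: workflows.id");
    }
    workflowIds.add("exec-1");
    // createWorktree stand-in: fails on its first call only (transient)
    worktreeCalls++;
    if (worktreeCalls === 1) throw new Error("transient git failure");
  };
  let result = "completed";
  try {
    withRetry(attempts, step);
  } catch (err) {
    result = (err as Error).message;
  }
  return `${result} (createWorktree calls: ${worktreeCalls})`;
}

console.log(simulate(1, false)); // transient git failure (createWorktree calls: 1)
console.log(simulate(3, false)); // UNIQUE constraint failed: workflows.id (createWorktree calls: 1)
console.log(simulate(3, true)); // completed (createWorktree calls: 2)
```

Only the last configuration recovers; its two createWorktree calls (one failed attempt plus one successful retry) correspond to the calls === 2 assertion in the implementation-workflow test.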

Status

  • Phase 1: Idempotent createWorkflowRecord + tests

Verification

  • bun test packages/dispatcher/ → 405 pass, 0 fail
  • bun test (full repo) → 722 pass, 0 fail
  • bun run typecheck → clean
  • bun run lint → clean (--deny-warnings)
  • Both new tests proven red→green: with the fix reverted, the unit test throws UNIQUE constraint failed: workflows.id and the workflow test fails to reach completed; with the fix both pass.

Acceptance criteria

Stumbling points

  • Dependencies weren't installed in the fresh worktree; bun install had to run first. Because bunqueue installs under node_modules/.bun/, confirming its default retry: 3 meant resolving the hoisted path manually.

Suggested CLAUDE.md updates

None. The existing ## state-issue contract / dispatcher conventions already cover the relevant invariants.

Follow-up issues

None. The reviewer's pass confirmed recordEvent (append-only, autoincrement PK) is not the same class, and the other workflow INSERTs already use INSERT OR IGNORE or aren't on the retry path.
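Why recordEvent is a different class can be sketched with a hypothetical model of an append-only table whose key is assigned by the table itself (standing in for an autoincrement PK). Because the caller never supplies the key, re-running the same insert cannot collide; on a retry path it would at most append another row. EventsModel and the event name are illustrative, not the dispatcher's schema.

```typescript
type EventRow = { seq: number; executionId: string; type: string };

// Hypothetical append-only table: the key comes from a counter the table owns,
// like an autoincrement PRIMARY KEY, so callers never supply it.
class EventsModel {
  private nextSeq = 1;
  readonly rows: EventRow[] = [];

  record(executionId: string, type: string): number {
    const seq = this.nextSeq++;
    this.rows.push({ seq, executionId, type });
    return seq;
  }
}

const events = new EventsModel();
// Re-running the same insert (as a retry would) gets a fresh key, not a UNIQUE error.
const first = events.record("exec-1", "worktree-prepared");
const second = events.record("exec-1", "worktree-prepared");
console.log(first, second, events.rows.length); // 1 2 2
```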

Out of scope

Decisions

Posted as an inline review comment on workflow-record.ts; full log in planning/issues/108/decisions.md.

Strategy

Branch is even with origin/main (no rebase/merge needed); mergeable: MERGEABLE.

Summary by CodeRabbit

  • Bug Fixes

    • Improved workflow retry resilience by preventing duplicate key constraint failures during retried executions.
  • Tests

    • Added test coverage for workflow idempotency on retry and error handling during step retries.

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8723ed63-76ad-4c1d-af0c-02b116e9a093

📥 Commits

Reviewing files that changed from the base of the PR and between 720044c and 09e15a8.

📒 Files selected for processing (5)
  • packages/dispatcher/src/workflow-record.ts
  • packages/dispatcher/test/implementation-workflow.test.ts
  • packages/dispatcher/test/workflow-record.test.ts
  • planning/issues/108/decisions.md
  • planning/issues/108/plan.md

📝 Walkthrough

This PR fixes a latent bug in the prepare-worktree workflow step: when createWorkflowRecord is called and a downstream operation fails, bunqueue retries the entire step, causing the record insert to fail on the primary key UNIQUE constraint and masking the original error. The fix makes the insert idempotent using ON CONFLICT(id) DO NOTHING, with unit and integration tests validating the behavior.

Changes

Idempotent workflow record creation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Idempotent workflow record insertion | packages/dispatcher/src/workflow-record.ts | createWorkflowRecord insert now includes ON CONFLICT(id) DO NOTHING to safely handle retries without UNIQUE constraint violations. Documentation expanded to clarify the idempotency scope and meta_json initialization. |
| Unit tests for idempotent insert | packages/dispatcher/test/workflow-record.test.ts | Added a type import and two new test cases: one verifying that duplicate inserts with the same id are no-ops that preserve the original row, and another confirming that non-PK constraint violations (an invalid kind cast) still throw. |
| Integration test for step retry resilience | packages/dispatcher/test/implementation-workflow.test.ts | New test suite for issue #108 injects a transient createWorktree failure, verifies the workflow retries and eventually completes, confirms createWorktree was called exactly twice (the initial attempt plus one retry), validates that worktreePath is set, and checks for tmux session leaks. |
| Planning documentation | planning/issues/108/plan.md, planning/issues/108/decisions.md | Plan outlines the root cause of UNIQUE masking during bunqueue retries and the idempotent insert approach. Decision document records the selected approach, the rationale for not using broader conflict suppression, and references to test evidence. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • thejustinwalsh/middle#132: Both PRs modify packages/dispatcher/src/workflow-record.ts's createWorkflowRecord insert behavior (main PR adds ON CONFLICT(id) DO NOTHING idempotency; retrieved PR extends the insert to persist meta_json.source), so they overlap directly in the same code path.
  • thejustinwalsh/middle#105: #105's recommender workflow and tests rely on check-rate-limit retry semantics to avoid (or observe) UNIQUE masking on row re-insert; this PR's idempotent createWorkflowRecord removes that failure mode at its source, so the two PRs touch the same behavior.

Suggested labels

ready-for-review

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately captures the main change: making createWorkflowRecord idempotent to prevent UNIQUE constraint errors from masking downstream failures on step retry. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #108 are met: createWorkflowRecord is now idempotent via ON CONFLICT(id) DO NOTHING, tests verify idempotent retry behavior and that constraint violations still throw, and the workflow-level test confirms step retry recovery. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the idempotency fix for issue #108: source code changes to workflow-record.ts, corresponding unit and integration tests, and supporting documentation and plan files are all in scope. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



…n't mask errors with UNIQUE

prepare-worktree (implementation workflow) runs createWorkflowRecord (INSERT on
the workflows PK) then createWorktree, under bunqueue's default retry: 3. A
transient createWorktree failure retries the step from the top, re-runs the
INSERT for the same execution id, and surfaces `UNIQUE constraint failed`,
masking the real error. The recommender/documentation check-rate-limit steps
dodged this with retry: 1; prepare-worktree legitimately wants to retry.

Fix at the source: INSERT OR IGNORE. The only PK collision is a same-execution
retry — the no-op case — so a retried record-creating step surfaces the real
downstream error. Hardens all three workflows at one point.

Tests: unit test that a second create with the same id is a no-op (reproduces
the masking UNIQUE before the fix); workflow test that a transient createWorktree
failure now recovers to completed instead of failing on a masked UNIQUE.
…d PK conflict

ON CONFLICT(id) DO NOTHING rather than a blanket INSERT OR IGNORE, so a genuine
CHECK/NOT-NULL violation still throws instead of being silently swallowed; only
the same-execution retry no-ops. Add a test that a bad kind still throws.
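```typescript
// Diff context shown with the inline comment on workflow-record.ts (excerpt; the statement continues past this point)
const now = Date.now();
const metaJson = input.source === undefined ? null : JSON.stringify({ source: input.source });
db.run(
  `INSERT INTO workflows
```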


Decision: ON CONFLICT(id) DO NOTHING, not per-step retry: 1 or blanket INSERT OR IGNORE.

Two candidate fixes for the retry-masking bug:

  1. retry: 1 on prepare-worktree (mirror the check-rate-limit workaround) — but createWorktree is legitimately retriable (a transient git failure can succeed on retry); retry: 1 throws that recovery away just to dodge the INSERT.
  2. Idempotent INSERT — the INSERT is the only thing in the step that isn't safe to re-run. The PK is ctx.executionId, unique per bunqueue execution, so the only way it collides is the same execution retrying — precisely the no-op case. Keeps the worktree retry intact and hardens all three record-creating steps (implementation/recommender/documentation) at one point.

Chose (2). Scoped to the id PK conflict rather than INSERT OR IGNORE on purpose: a blanket ignore would also swallow a genuine CHECK/NOT-NULL violation (a real bug), whereas ON CONFLICT(id) only no-ops the same-execution retry. Guarded by a test asserting a bad kind still throws.

The existing retry: 1 on the recommender/documentation check-rate-limit steps is left in place — independent rationale (a deterministic db-state check gains nothing from retrying), now belt-and-suspenders.

@thejustinwalsh thejustinwalsh marked this pull request as ready for review May 26, 2026 20:42
@thejustinwalsh thejustinwalsh added the ready-for-review All phases done and verified — PR ready for final human review and merge label May 26, 2026
@thejustinwalsh


Reviewer's brief — PR #159 (Closes #108)

What this fixes: createWorkflowRecord was a plain INSERT. Workflow steps that call it run under bunqueue's default retry: 3 (prepare-worktree in the implementation workflow being the unguarded one), so a transient failure after the INSERT retried the whole step, re-INSERTed the same PK id, and surfaced UNIQUE constraint failed — masking the real error. Now INSERT ... ON CONFLICT(id) DO NOTHING: the same-execution retry no-ops, the real downstream error surfaces.

How to run it:

```shell
bun install
bun test packages/dispatcher/test/workflow-record.test.ts \
         packages/dispatcher/test/implementation-workflow.test.ts
bun run typecheck && bun run lint
```

What to verify (and what "correct" looks like):

  • packages/dispatcher/src/workflow-record.ts:67 — the INSERT uses ON CONFLICT(id) DO NOTHING, not a blanket INSERT OR IGNORE. This is deliberate: a real CHECK/NOT-NULL violation must still throw; only the PK collision (a same-execution retry) no-ops.
  • workflow-record.test.ts — "idempotent on retry" proves a second create with the same id leaves the existing (already-advanced) row untouched; "bad kind still throws" proves the PK-scoping (would fail if downgraded to INSERT OR IGNORE).
  • implementation-workflow.test.ts — a transient createWorktree failure retries and reaches completed (asserts calls === 2 so the retry path genuinely ran). Reverting the fix makes this fail on the masked UNIQUE.

How to review it: the change is one SQL clause + doc + three tests. Both tests were confirmed red→green (revert the clause and they fail with UNIQUE constraint failed: workflows.id). The design rationale (why this over retry: 1 and over INSERT OR IGNORE) is an inline review comment on workflow-record.ts and in planning/issues/108/decisions.md.

Fragile / worth extra eyes: the existing retry: 1 on the recommender/documentation check-rate-limit steps is intentionally left in place (independent rationale: a deterministic db check gains nothing from retrying) — it's now belt-and-suspenders, not the load-bearing guard. recordEvent is append-only with an autoincrement PK, so it is not the same class and was correctly left alone.

Mergeability: branch is even with origin/main; mergeable: MERGEABLE.

@thejustinwalsh thejustinwalsh merged commit 851efcf into main May 28, 2026
1 check passed
@thejustinwalsh thejustinwalsh deleted the middle-issue-108 branch May 28, 2026 19:26

Labels

ready-for-review All phases done and verified — PR ready for final human review and merge

Development

Successfully merging this pull request may close these issues.

implementation workflow: prepare-worktree can hit UNIQUE on step retry

1 participant