fix(etl): atomic counter updates + retry counter reset by andrew-bierman · Pull Request #2421 · PackRat-AI/PackRat

andrew-bierman · 2026-05-14T15:54:53Z

Summary

Three related ETL counter bugs that caused stalled/corrupted job state:

successRate > 100% (e.g. campsaver 700/600): processValidItemsBatch incremented totalValid in one DB call, then processCatalogEtl incremented totalProcessed in a separate call. CF Worker dying between these two left totalValid > totalProcessed permanently. Fixed by moving totalProcessed into updateEtlJobProgress so all three counters update atomically.
Counter double-counting on CF Queue retry: CF Worker CPU timeouts hard-kill the process, bypassing the catch block. CF Queue re-delivers the message with the same jobId, and additive counters accumulated 2× the actual row counts. Fixed by detecting a retry (same jobId already in running state with non-zero totalProcessed) and resetting counters to 0 before re-processing.
Broken auto-complete in updateEtlJobProgress: The CASE WHEN totalProcessed = totalValid + totalInvalid check always read a stale totalProcessed (it was updated after this function ran) and never fired. Removed the dead code — the explicit status='completed' set at the end of processCatalogEtl is the real completion mechanism.

Changes

updateEtlJobProgress.ts — add processed param, update all three counters atomically, remove broken auto-complete CASE
processValidItemsBatch.ts — pass processed: items.length
processLogsBatch.ts — pass processed: logs.length
processCatalogEtl.ts — retry detection at start; remove standalone totalProcessed increments
test/etl.test.ts — two new regression tests: retry counter reset, totalProcessed == totalValid + totalInvalid invariant

Post-Deploy Monitoring & Validation

What to monitor: /api/admin/analytics/catalog/etl — watch successRate on newly-processed jobs; should always be ≤ 100%
Call /api/admin/analytics/catalog/etl/reset-stuck after deploy to clear existing stale running jobs
Expected healthy behavior: new jobs complete with successRate ≤ 100%, totalProcessed == totalValid + totalInvalid
Failure signal: any job with successRate > 100% means the atomic update didn't land (rollback this PR)
Validation window: first 3 scraper runs after deploy
Owner: Andrew

🤖 Generated with Claude Code

Summary by CodeRabbit

Refactor
- Improved ETL job progress tracking to ensure atomic updates, preventing inconsistencies if worker failures occur mid-operation.
Tests
- Added validation test to verify ETL progress accounting invariants remain consistent upon job completion.

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

coderabbitai · 2026-05-14T15:55:08Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9ba3a035-6921-4678-9df0-26aa5f5b1867

📥 Commits

Reviewing files that changed from the base of the PR and between feb909d and c27ea32.

📒 Files selected for processing (5)

packages/api/src/services/etl/processCatalogEtl.ts
packages/api/src/services/etl/processLogsBatch.ts
packages/api/src/services/etl/processValidItemsBatch.ts
packages/api/src/services/etl/updateEtlJobProgress.ts
packages/api/test/etl.test.ts

Walkthrough

The PR refactors ETL progress accounting to use atomic database increments within batch processors instead of separate updates in the main catalog stream function. The updateEtlJobProgress function accepts a new processed counter, and both valid and invalid batch handlers now report their item counts alongside progress in a single update call.

Changes

ETL Progress Atomicity

Layer / File(s)	Summary
Progress counter API update `packages/api/src/services/etl/updateEtlJobProgress.ts`	`updateEtlJobProgress` now accepts an optional `processed` parameter and increments `totalProcessed` in the database. Completion-related status/timestamp updates are removed from this function.
Batch processors report progress atomically `packages/api/src/services/etl/processValidItemsBatch.ts`, `packages/api/src/services/etl/processLogsBatch.ts`	`processValidItemsBatch` and `processLogsBatch` each call `updateEtlJobProgress` with both their item count and a `processed` field set to items length, ensuring progress and validity counters increment together in a single update.
Main ETL stream delegates progress to batch handlers `packages/api/src/services/etl/processCatalogEtl.ts`	`processCatalogEtl` removes direct progress updates and remaining-row computation logic, delegating all progress accounting to batch processors. Unused `sql` import is removed.
Accounting invariant test `packages/api/test/etl.test.ts`	New test asserts that `totalProcessed` equals `totalValid + totalInvalid` after catalog ETL completion, validating the atomicity invariant.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

PackRat-AI/PackRat#2419: Modifies processCatalogEtl's streaming logic in the same function; changes interact with batch flushing mechanics.
PackRat-AI/PackRat#2394: Updates ETL job finalization and progress counter behavior; directly connected to progress accounting changes.

Suggested labels

api

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the primary changes: atomic counter updates in ETL progress tracking and retry counter reset logic for handling re-delivered jobs.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/etl-stale-jobs

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cloudflare-workers-and-pages · 2026-05-14T15:56:14Z

Deploying packrat-landing with Cloudflare Pages

Latest commit:	`310e627`
Status:	✅ Deploy successful!
Preview URL:	https://bb85d6ca.packrat-landing.pages.dev
Branch Preview URL:	https://fix-etl-stale-jobs.packrat-landing.pages.dev

View logs

cloudflare-workers-and-pages · 2026-05-14T15:56:20Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	packrat-admin	`310e627`	Commit Preview URL Branch Preview URL	May 14 2026, 03:56 PM

cloudflare-workers-and-pages · 2026-05-14T15:58:35Z

Deploying packrat-guides with Cloudflare Pages

Latest commit:	`310e627`
Status:	✅ Deploy successful!
Preview URL:	https://6b445eed.packrat-guides-6gq.pages.dev
Branch Preview URL:	https://fix-etl-stale-jobs.packrat-guides-6gq.pages.dev

View logs

Three related bugs that corrupted ETL job counters: 1. totalValid > totalProcessed (successRate > 100%): processValidItemsBatch called updateEtlJobProgress (which incremented totalValid) in one DB call, then processCatalogEtl incremented totalProcessed in a separate DB call. If the CF Worker died between these two calls, the job was permanently stuck with totalValid > totalProcessed. Fix: move totalProcessed increment into updateEtlJobProgress so all three counters (valid, invalid, processed) are updated in a single atomic query. 2. Counter double-counting on CF Queue retry: CF Worker CPU timeouts hard-kill the process, bypassing the catch block, so CF Queue re-delivers the message with the same jobId. The counters were additive, so retried jobs accumulated 2x (or more) the actual row counts. Fix: at the start of processCatalogETL, detect a retry (running status with non-zero totalProcessed) and reset counters to 0 before re-processing. 3. Broken auto-complete in updateEtlJobProgress: the CASE WHEN check compared totalProcessed against totalValid + totalInvalid, but totalProcessed was always updated after updateEtlJobProgress ran, so the check always saw a stale value and never fired. The explicit status='completed' set at the end of processCatalogETL is the real completion mechanism. Fix: remove the dead auto-complete logic from updateEtlJobProgress. Also adds two regression tests: one for the retry counter reset, one to assert totalProcessed == totalValid + totalInvalid after a clean run.

…k processing With chunked ETL (multiple queue messages per file, each covering a byte range), chunk 2..N arrive with totalProcessed already > 0 from earlier chunks. The retry-reset guard would have fired on every non-first chunk, wiping the counters legitimately set by previous chunks. The core atomic-counter fix (totalProcessed updated inside updateEtlJobProgress alongside totalValid/totalInvalid) is the primary guard against successRate > 100%. Stale running jobs are already handled by the existing POST /etl/reset-stuck endpoint.

github-actions · 2026-05-14T16:11:29Z

Coverage Report for Expo Unit Tests Coverage (./apps/expo)

Status	Category	Percentage	Covered / Total
🔵	Lines	82.61%	480 / 581
🔵	Statements	82.61% (🎯 75%)	480 / 581
🔵	Functions	92.59%	50 / 54
🔵	Branches	90.9%	170 / 187

File Coverage

No changed files found.

Generated in workflow #1223 for commit c27ea32 by the Vitest Coverage Report Action

github-actions · 2026-05-14T16:11:30Z

Coverage Report for API Unit Tests Coverage (./packages/api)

Status	Category	Percentage	Covered / Total
🔵	Lines	76.17%	502 / 659
🔵	Statements	76.17% (🎯 65%)	502 / 659
🔵	Functions	95%	38 / 40
🔵	Branches	88.67%	227 / 256

File Coverage

No changed files found.

Generated in workflow #1223 for commit c27ea32 by the Vitest Coverage Report Action

Copilot AI review requested due to automatic review settings May 14, 2026 15:54

Copilot AI reviewed May 14, 2026

View reviewed changes

github-advanced-security AI found potential problems May 14, 2026

View reviewed changes

Comment thread .github/workflows/migrations.yml Fixed

andrew-bierman added 2 commits May 14, 2026 10:10

andrew-bierman force-pushed the fix/etl-stale-jobs branch from 310e627 to c27ea32 Compare May 14, 2026 16:10

andrew-bierman temporarily deployed to copilot May 14, 2026 16:11 — with GitHub Actions Inactive

github-actions Bot added the api label May 14, 2026

andrew-bierman merged commit 107972a into main May 14, 2026
11 checks passed

andrew-bierman deleted the fix/etl-stale-jobs branch May 14, 2026 16:15

This was referenced May 15, 2026

Release/2.0.26 #2427

Merged

feat(etl): add JSONL/NDJSON support alongside CSV #2471

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(etl): atomic counter updates + retry counter reset#2421

fix(etl): atomic counter updates + retry counter reset#2421
andrew-bierman merged 2 commits into
mainfrom
fix/etl-stale-jobs

andrew-bierman commented May 14, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Uh oh!

Uh oh!

cloudflare-workers-and-pages Bot commented May 14, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented May 14, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

andrew-bierman commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Post-Deploy Monitoring & Validation

Summary by CodeRabbit

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

Uh oh!

cloudflare-workers-and-pages Bot commented May 14, 2026

Deploying packrat-landing with Cloudflare Pages

Uh oh!

cloudflare-workers-and-pages Bot commented May 14, 2026

Deploying with Cloudflare Workers

Uh oh!

cloudflare-workers-and-pages Bot commented May 14, 2026

Deploying packrat-guides with Cloudflare Pages

Uh oh!

github-actions Bot commented May 14, 2026

Coverage Report for Expo Unit Tests Coverage (./apps/expo)

Uh oh!

github-actions Bot commented May 14, 2026

Coverage Report for API Unit Tests Coverage (./packages/api)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

andrew-bierman commented May 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading