fix(etl): flush remaining items before totalProcessed + use ORM for completion status #2413
Two bugs caused phantom failures and inflated totalProcessed counts:
1. Remaining-items ordering: totalProcessed was incremented *before* the final
valid/invalid flushes. If processValidItemsBatch or processLogsBatch threw,
totalProcessed showed an inflated count while totalValid/totalInvalid stayed
null — making job records misleading and harder to diagnose on retry.
Fixed by flushing remaining items first, then updating totalProcessed.
2. Silent completion failure: the final status='completed' update used raw SQL
with an ::etl_job_status cast that silently failed in some Neon HTTP driver
versions. The inner try-catch swallowed the error, leaving the job in
'running' state so the reset-stuck sweep would mark it 'failed' even though
all items were successfully ingested.
Fixed by using the same db.update().set({ status: 'completed' }) pattern
that already works reliably for the 'failed' path in the outer catch.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
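The ordering fix in item 1 can be illustrated with a minimal sketch. It is not the processor's actual code: the `JobRecord` shape, the `finishRemaining` helper, and the `Flush` callback type are hypothetical, and the callbacks only stand in for `processValidItemsBatch` and `processLogsBatch`. The point is that no counter is touched until both flushes have succeeded.

```typescript
// Hypothetical in-memory job record; field names mirror the counters named above.
interface JobRecord {
  totalProcessed: number;
  totalValid: number | null;
  totalInvalid: number | null;
}

// Hypothetical callback type standing in for processValidItemsBatch / processLogsBatch.
type Flush<T> = (items: T[]) => Promise<void>;

async function finishRemaining<V, I>(
  job: JobRecord,
  remainingValid: V[],
  remainingInvalid: I[],
  flushValid: Flush<V>,
  flushInvalid: Flush<I>,
): Promise<void> {
  // 1. Flush first. If either call throws, no counter has been changed yet.
  if (remainingValid.length > 0) await flushValid(remainingValid);
  if (remainingInvalid.length > 0) await flushInvalid(remainingInvalid);

  // 2. Only after both flushes succeed, update all counters together.
  job.totalValid = (job.totalValid ?? 0) + remainingValid.length;
  job.totalInvalid = (job.totalInvalid ?? 0) + remainingInvalid.length;
  job.totalProcessed += remainingValid.length + remainingInvalid.length;
}
```

With this ordering, a flush that throws leaves `totalProcessed` at its previous value, so a retried job does not start from an inflated count.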
Pull request overview
Fixes two bugs in the catalog ETL processor: (1) totalProcessed was incremented before the final batch flush so a flush failure produced inflated counts with null valid/invalid totals, and (2) the final status='completed' update used a raw-SQL enum cast that silently failed under some Neon HTTP driver versions, leaving jobs stuck and later marked as failed by the stuck-job sweep.
Changes:
- Reorder remaining-items handling so valid/invalid batches are flushed before totalProcessed is updated.
- Replace the raw-SQL completion update with the same Drizzle ORM update().set({ status: 'completed' }) pattern used by the failure path.
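To make the second change concrete, here is a hedged sketch of routing both status transitions through one helper. It does not use Drizzle itself: `JobDb` is a hypothetical stand-in that only mimics the `update(...).set(...).where(...)` call shape, and `setJobStatus`, `runJob`, and `createFakeDb` are illustrative names that do not come from this PR.

```typescript
type EtlJobStatus = "running" | "completed" | "failed";

interface JobRow {
  id: string;
  status: EtlJobStatus;
}

// Hypothetical stand-in for an update(...).set(...).where(...) chain.
// This is NOT Drizzle; it only mirrors the call shape for illustration.
interface JobDb {
  update(table: "etl_jobs"): {
    set(values: Partial<JobRow>): { where(id: string): Promise<void> };
  };
}

// One helper for every status transition, so 'completed' and 'failed'
// cannot drift onto different code paths.
async function setJobStatus(db: JobDb, jobId: string, status: EtlJobStatus): Promise<void> {
  await db.update("etl_jobs").set({ status }).where(jobId);
}

async function runJob(db: JobDb, jobId: string, work: () => Promise<void>): Promise<void> {
  try {
    await work();
    // A failed write here is not swallowed: it falls through to the catch below.
    await setJobStatus(db, jobId, "completed");
  } catch (err) {
    await setJobStatus(db, jobId, "failed");
    throw err;
  }
}

// In-memory fake used only to exercise the sketch.
function createFakeDb(rows: Map<string, JobRow>): JobDb {
  return {
    update: () => ({
      set: (values: Partial<JobRow>) => ({
        where: async (id: string): Promise<void> => {
          const row = rows.get(id);
          if (!row) throw new Error(`job ${id} not found`);
          rows.set(id, { ...row, ...values });
        },
      }),
    }),
  };
}
```

In this sketch a failed completion write surfaces as an explicit 'failed' status plus a rethrown error, rather than leaving the row in 'running' for a later sweep to reinterpret. Whether the real processor should behave that way is a design decision outside this sketch.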
Summary
- Remaining-items ordering bug: totalProcessed was incremented before flushing the final valid/invalid batches. If processValidItemsBatch or processLogsBatch threw, totalProcessed showed an inflated count while totalValid/totalInvalid stayed null, making job records misleading (e.g. the May 7 EMS job: 15,975 totalProcessed, null valid/invalid). Fixed by flushing remaining items first, then updating totalProcessed.
- Silent completion failure: the final status = 'completed' update used raw SQL with an ::etl_job_status cast that silently failed in some Neon HTTP driver versions (the inner try-catch swallowed the error). This left jobs stuck in 'running', which the reset-stuck sweep then marked as 'failed', even though all items were successfully ingested (the "phantom failure" pattern visible as 100% successRate + status=failed). Fixed by using the same db.update().set({ status: 'completed' }) ORM pattern that already works reliably for the 'failed' path in the outer catch.
Root cause
These two bugs together explain the patterns observed in the May 7–8 ETL run:
- status: failed with successRate: 100 → phantom failures from the completion update bug
- totalProcessed but null totalValid/totalInvalid → ordering bug in the remaining-items flush
Test plan
- status: completed and non-null totalValid/totalInvalid
- /catalog/etl endpoint no longer shows phantom failures after successful runs
Post-Deploy Monitoring & Validation
- GET /api/admin/analytics/catalog/etl: watch for jobs completing with status: completed instead of status: failed after the fix ships
- status: completed, successRate: 100, non-null totalValid/totalInvalid
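For the post-deploy check, the two symptom patterns could be flagged mechanically. The sketch below is an assumption-heavy illustration: the response schema of the analytics endpoint is not shown in this PR, so `EtlJobSummary`, `Anomaly`, and `classifyJob` are hypothetical names and shapes built only from the fields mentioned above.

```typescript
// Hypothetical shape of one job entry; the real response schema of the
// analytics endpoint is not shown in this PR.
interface EtlJobSummary {
  id: string;
  status: "running" | "completed" | "failed";
  successRate: number | null;
  totalProcessed: number | null;
  totalValid: number | null;
  totalInvalid: number | null;
}

type Anomaly = "phantom-failure" | "counts-without-breakdown";

function classifyJob(job: EtlJobSummary): Anomaly[] {
  const anomalies: Anomaly[] = [];

  // Completion-update bug: every item succeeded, yet the job is marked failed.
  if (job.status === "failed" && job.successRate === 100) {
    anomalies.push("phantom-failure");
  }

  // Ordering bug: items were counted, but the valid/invalid totals were never written.
  const processed = job.totalProcessed ?? 0;
  if (processed > 0 && (job.totalValid === null || job.totalInvalid === null)) {
    anomalies.push("counts-without-breakdown");
  }

  return anomalies;
}
```

A job shaped like the May 7 EMS example (15,975 totalProcessed with null valid/invalid totals) would be flagged as "counts-without-breakdown". A healthy run returns an empty list.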