fix(etl): prevent Worker OOM on large CSV files via stream backpressure #2418
…files

Without drain-wait, R2 delivers the entire file into the csv-parse buffer before the main processing loop can drain any rows. For files >50 MB this fills the 128 MB Worker memory limit → Worker killed externally → outer catch never runs → job stays stuck in 'running'.

Fix: check the return value of parser.write() and await 'drain' before pushing more chunks, exactly as the Node.js streams backpressure contract requires.
Caution: Review failed. Pull request was closed or merged during review.

Walkthrough
The ETL catalog processor now implements backpressure handling when feeding CSV chunks to the parser. The parser's

Changes
CSV Parser Backpressure
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Coverage Report for API Unit Tests Coverage (./packages/api)
File Coverage: No changed files found.

Coverage Report for Expo Unit Tests Coverage (./apps/expo)
File Coverage: No changed files found.
Pull request overview
Adds Node stream backpressure handling to the catalog ETL CSV ingestion loop so the csv-parse parser doesn't buffer entire multi-hundred-MB R2 files in memory and crash the Worker with OOM.
Changes:
- Capture `parser.write()` return value and `await` the `drain` event when it returns `false`.
- Adds an inline comment explaining the OOM root cause and rationale.
Summary
- Without a drain wait, `parser.write()` buffers the entire file before a single row is processed → Worker hits 128 MB memory limit → killed externally → outer `catch` never runs → job stuck `running` forever.
- Check the `parser.write()` return value and `await drain` before pushing more chunks, which is the standard Node.js streams backpressure contract.

Root Cause Detail
Fix
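The backpressure pattern described in this PR can be sketched roughly as follows. This is a minimal illustration rather than the actual diff: the helper name `writeWithBackpressure` and its parameters are hypothetical, and it targets a generic Node.js `Writable` instead of the real csv-parse parser.

```typescript
import { once } from "node:events";
import { Writable } from "node:stream";

// Hypothetical helper (not from the PR): feed chunks into a Writable
// while honoring the Node.js streams backpressure contract.
async function writeWithBackpressure(
  chunks: AsyncIterable<Uint8Array>,
  sink: Writable,
): Promise<void> {
  for await (const chunk of chunks) {
    // write() returns false once the sink's internal buffer has reached
    // its highWaterMark; the chunk is still accepted, but the producer
    // is expected to stop writing until 'drain' fires.
    if (!sink.write(chunk)) {
      // once() also rejects if the sink emits 'error' while we wait,
      // so a failing sink does not leave this loop hanging.
      await once(sink, "drain");
    }
  }
  // Signal end of input so the sink can flush and emit 'finish'.
  sink.end();
}
```

When the sink is a Transform stream, as csv-parse's parser is, its readable side has to be consumed concurrently (the "main processing loop" in this PR); otherwise the writable buffer never empties and `drain` would never fire.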
Affected Jobs
Seven jobs were stuck/failed due to this bug and have been re-queued:
Post-Deploy Monitoring & Validation
- `wrangler tail packrat-api --format pretty`: watch for `✅ Done processing` log lines for the 7 re-queued jobs
- `running → completed`, `totalProcessed` matches CSV row count
- `running` > 10 min → still OOM (unlikely after fix)

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>