Skip to content

fix(etl): prevent Worker OOM on large CSV files via stream backpressure#2418

Merged
andrew-bierman merged 1 commit into
mainfrom
fix/etl-completion-ordering
May 13, 2026
Merged

fix(etl): prevent Worker OOM on large CSV files via stream backpressure#2418
andrew-bierman merged 1 commit into
mainfrom
fix/etl-completion-ordering

Conversation

@andrew-bierman
Copy link
Copy Markdown
Collaborator

@andrew-bierman andrew-bierman commented May 13, 2026

Summary

  • Root cause: R2 delivers large files (329–607 MB) faster than the csv-parse loop can consume rows. Without backpressure, parser.write() buffers the entire file before a single row is processed → Worker hits 128 MB memory limit → killed externally → outer catch never runs → job stuck running forever.
  • Fix: Check parser.write() return value and await drain before pushing more chunks — standard Node.js streams backpressure contract.
  • Secondary fix (previous commit): yield to event loop every 100 rows instead of every row, which was causing wall-clock limit issues on large files.

Root Cause Detail

R2 stream → streamToText → parser.write() [no backpressure]
                             ↓
                    Parser buffer grows to 329-607 MB
                             ↓
                    Worker OOM (128 MB limit)
                             ↓
                    Worker killed externally
                             ↓
                    catch{} block never runs
                             ↓
                    Job stays in 'running' state forever

Fix

const ok = parser.write(chunk);
if (!ok) await new Promise<void>((resolve) => parser.once('drain', resolve));

Affected Jobs

Seven jobs were stuck/failed due to this bug and have been re-queued:

  • backcountry (329 MB file, failed 5×)
  • campsaver, rei, campmor, omcgear, geartrade, evo

Post-Deploy Monitoring & Validation

  • What to monitor: wrangler tail packrat-api --format pretty — watch for ✅ Done processing log lines for the 7 re-queued jobs
  • Expected healthy signal: Each job transitions running → completed, totalProcessed matches CSV row count
  • Failure signal: Job stays running > 10 min → still OOM (unlikely after fix)
  • Validation window: 30 min after deploy; owner: @andrew-bierman
  • Queries:
    SELECT id, source, status, "totalProcessed", "totalValid" 
    FROM etl_jobs 
    WHERE status != 'completed' 
    ORDER BY "startedAt" DESC LIMIT 20;

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

Summary by CodeRabbit

  • Bug Fixes
    • Improved stability and reliability of catalog data processing for large files, reducing potential performance issues during data import operations.

Review Change Stack

…files

Without drain-wait, R2 delivers the entire file into the csv-parse
buffer before the main processing loop can drain any rows. For files
>50 MB this fills the 128 MB Worker memory limit → Worker killed
externally → outer catch never runs → job stays stuck in 'running'.

Fix: check the return value of parser.write() and await 'drain' before
pushing more chunks, exactly as the Node.js streams backpressure
contract requires.
Copilot AI review requested due to automatic review settings May 13, 2026 18:38
@github-actions github-actions Bot added the api label May 13, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 13, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The ETL catalog processor now implements backpressure handling when feeding CSV chunks to the parser. The parser's write() method returns a boolean indicating buffer state; when false, the processor waits for a drain event before continuing, preventing memory exhaustion on large files.

Changes

CSV Parser Backpressure

Layer / File(s) Summary
CSV parser backpressure mechanism
packages/api/src/services/etl/processCatalogEtl.ts
Parser backpressure is now honored by checking the return value of parser.write(chunk) and awaiting the drain event when the buffer is full, preventing the reader from outpacing the parser on large file ingestion.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • PackRat-AI/PackRat#2395: Modifies the same processCatalogETL function with a separate change to job completion status handling.
  • PackRat-AI/PackRat#2394: Also modifies processCatalogETL to change job completion behavior on the success path.

Suggested labels

api

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing stream backpressure handling in the ETL process to prevent Worker out-of-memory errors on large CSV files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/etl-completion-ordering

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

Coverage Report for API Unit Tests Coverage (./packages/api)

Status Category Percentage Covered / Total
🔵 Lines 76.17% 502 / 659
🔵 Statements 76.17% (🎯 65%) 502 / 659
🔵 Functions 95% 38 / 40
🔵 Branches 88.67% 227 / 256
File CoverageNo changed files found.
Generated in workflow #1188 for commit 7654123 by the Vitest Coverage Report Action

@github-actions
Copy link
Copy Markdown
Contributor

Coverage Report for Expo Unit Tests Coverage (./apps/expo)

Status Category Percentage Covered / Total
🔵 Lines 82.61% 480 / 581
🔵 Statements 82.61% (🎯 75%) 480 / 581
🔵 Functions 92.59% 50 / 54
🔵 Branches 90.9% 170 / 187
File CoverageNo changed files found.
Generated in workflow #1188 for commit 7654123 by the Vitest Coverage Report Action

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Node stream backpressure handling to the catalog ETL CSV ingestion loop so the csv-parse parser doesn't buffer entire multi-hundred-MB R2 files in memory and crash the Worker with OOM.

Changes:

  • Capture parser.write() return value and await the drain event when it returns false.
  • Adds an inline comment explaining the OOM root cause and rationale.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@andrew-bierman andrew-bierman merged commit b875cd2 into main May 13, 2026
13 of 14 checks passed
@andrew-bierman andrew-bierman deleted the fix/etl-completion-ordering branch May 13, 2026 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants