fix(etl): raise cpu_ms limit to 400k (CF max) for large-file queue processing by andrew-bierman · Pull Request #2419 · PackRat-AI/PackRat

andrew-bierman · 2026-05-13T19:19:30Z

Problem

The backpressure fix (PR #2418) solved Worker OOM. But large CSV files now hit a second limit: the 5-minute CPU time limit (cpu_ms: 300000).

CSV parsing 42-field rows costs ~6ms CPU/row
Budget of 300,000ms ÷ 6ms/row ≈ 50k rows per invocation
Evo file has ~68k rows → Worker killed at row ~50k mid-file

Observed in wrangler tail:

Queue packrat-etl-queue (1 message) - Exceeded CPU Limit @ 5/13/2026, 12:57:16 PM

Fix

Raise cpu_ms from 300,000ms to 400,000ms (the Cloudflare maximum).
This extends the per-invocation budget to ~66k rows — enough for evo.

Limitations

400k is the CF hard cap. Backcountry (329 MB, likely 200k+ rows) will still hit it. The permanent solution is chunking large files in ScrapyD so each queue message processes ≤40k rows. That work is tracked separately.

Post-Deploy Validation

After deploy, re-queue evo and confirm it completes end-to-end.

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

Summary by CodeRabbit

New Features
- Added support for processing large catalog files by automatically splitting files larger than ~20MB into smaller chunks for more efficient handling.
Bug Fixes
- Fixed out-of-memory issues during large file processing by implementing improved memory management and backpressure controls.
Chores
- Increased CPU time allocation for catalog ETL job processing.

…ocessing 50k-row files exhaust the 300k CPU budget (~6ms CPU/row for 42-field CSV). This extends the per-invocation limit to ~66k rows. Chunking in ScrapyD is the long-term fix for files beyond that threshold.

coderabbitai · 2026-05-13T19:19:37Z

Warning

Rate limit exceeded

@andrew-bierman has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 58 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ec93d4bb-65a8-481d-99ec-22ced5f8ce0f

📥 Commits

Reviewing files that changed from the base of the PR and between 0715bd9 and 9b744c1.

📒 Files selected for processing (1)

packages/api/src/routes/admin/analytics/catalog.ts

Walkthrough

This PR implements large-file chunking for catalog ETL processing. R2 objects larger than ~20MB are now split into byte-range chunks at the queue endpoint, each chunk is processed independently with header injection and boundary-row skipping in the worker, and parser backpressure is enforced to prevent out-of-memory conditions. The CPU time limit is increased to 400 seconds to support this larger scope.

Changes

Large-file catalog ETL chunking

Layer / File(s)	Summary
Type contracts for chunked ETL `packages/api/src/services/etl/types.ts`	`CatalogETLMessage.data` and `QueueCatalogETLParams` updated to carry optional byte-range fields (`byteStart`, `byteEnd`) and `chunks` array structure instead of individual object keys and scraper metadata.
Queue service refactored for chunks `packages/api/src/services/etl/queue.ts`	`queueCatalogETL` signature changed from `objectKeys: string[]` to `chunks: Array<{ objectKey, byteStart?, byteEnd? }>`. Batching loop iterates over chunks and includes byte-range metadata in queued messages.
Catalog endpoint chunk-splitting logic `packages/api/src/routes/catalog/index.ts`	`/etl` route imports `R2BucketService`, fetches object size metadata, computes byte-range chunks for objects larger than ~20MB, and passes the chunks array to `queueCatalogETL`.
Process worker chunk handling and backpressure `packages/api/src/services/etl/processCatalogEtl.ts`	`processCatalogETL` extracts byte-range metadata, performs range-scoped R2 reads, fetches and injects CSV headers for non-first chunks, skips partial boundary rows, and honors parser backpressure via `drain` event waiting.
Admin retry endpoint aligned to new chunk format `packages/api/src/routes/admin/analytics/catalog.ts`	`/catalog/etl/:jobId/retry` retry invocation updated to use new `chunks: [{ objectKey }]` format.
Increase CPU time limit `packages/api/wrangler.jsonc`	Cloudflare worker CPU limit increased from 300 seconds to 400 seconds to accommodate larger catalog ETL workloads.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

PackRat-AI/PackRat#2418: Both PRs add parser backpressure handling to processCatalogETL to prevent large-file memory pressure.
PackRat-AI/PackRat#2394: Both PRs modify the admin /catalog/etl/:jobId/retry retry flow to align with updates to queueCatalogETL.
PackRat-AI/PackRat#2409: Both PRs modify the same admin retry flow and may have related changes to the ETL queuing mechanism.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: raising the cpu_ms limit from 300k to 400k (CF max) for large-file queue processing, which directly addresses the CPU time bottleneck blocking large CSV file processing.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/etl-completion-ordering

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR updates the Cloudflare Worker execution limits for the API’s queue processing so large CSV ETL jobs have more CPU time per invocation and are less likely to be terminated mid-file.

Changes:

Increased Cloudflare Worker limits.cpu_ms from 300,000ms (5 minutes) to 400,000ms (Cloudflare maximum).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

… time Instead of streaming the full file (up to 600 MB) in one Worker invocation, the notify endpoint now HEADs each object and splits files >20 MB into sequential byte-range messages. Each Worker invocation fetches only its slice via a Range request (~30k rows) — well within the 400k CPU budget. Non-first chunks fetch the header row via a separate 4 KB range request and skip the partial row at the chunk boundary. No ScrapyD changes needed.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

packages/api/src/services/etl/processCatalogEtl.ts (2)

205-209: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't mark a chunked job completed from each chunk worker.

After this PR, one ETL job can fan out into multiple byte-range messages. The first chunk that reaches Lines 205-209 flips the whole job to completed even if sibling chunks are still running, so admin history can show a finished job with incomplete totals. Completion needs to move behind a chunk-level completion counter or an orchestrator step that runs after the last chunk.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/api/src/services/etl/processCatalogEtl.ts` around lines 205 - 209,
The current chunk worker code in processCatalogEtl.ts flips the entire ETL job
to status 'completed' (the db.update on etlJobs using jobId) from within each
chunk handler, which prematurely completes the job; instead, stop marking the
job completed here and implement a chunk-level completion mechanism: have the
chunk worker mark its chunk record as finished and atomically increment a
completedChunks counter (or insert/update a chunk status table) in the same
transaction, then check totalChunks vs completedChunks and only when they match
perform the etlJobs.update to set status: 'completed' and completedAt using
jobId; alternatively, publish a "chunk-finished" message to an orchestrator
service which performs the final completion update after verifying all chunks
(reference symbols: etlJobs, jobId, the try block in processCatalogEtl.ts).

89-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Await the stream pump to prevent consumer stall on read failures.

The async pump (lines 89–117) is fire-and-forget. If the R2 stream fails mid-read, parser.end() never executes, and the consumer loop (for await (const record of parser)) hangs indefinitely waiting for more data—eventually timing out the Worker instead of failing cleanly. Capture the pump promise, await it after the consumer loop, and tear down the parser on error.

Proposed direction

-    (async () => {
+    const pump = (async () => {
       if (injectedHeader) {
         parser.write(`${injectedHeader}\n`);
       }
       let skipPartialRow = byteStart !== undefined && byteStart > 0;

       for await (const chunk of streamToText(r2Object.body)) {
         let text = chunk;
         // ...
         const ok = parser.write(text);
         if (!ok) await new Promise<void>((resolve) => parser.once('drain', resolve));
       }
       parser.end();
-    })();
+    })();

-    for await (const record of parser) {
-      // ...
-    }
+    try {
+      for await (const record of parser) {
+        // ...
+      }
+      await pump;
+    } catch (error) {
+      parser.destroy(error as Error);
+      await pump.catch(() => undefined);
+      throw error;
+    }

This also aligns with the guideline to use async/await everywhere in the API package.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/api/src/services/etl/processCatalogEtl.ts` around lines 89 - 117,
The stream pump launched as a fire-and-forget IIFE must be captured and awaited
so read failures don't leave `parser` hanging; change the IIFE to assign its
promise (e.g., const pumpPromise = (async () => { ... })()), wrap its body in
try/catch that calls `parser.end()` (or `parser.destroy()` if available) on
error and rethrows, and after the consumer loop (for await (const record of
parser)) await pumpPromise so any pump errors propagate and the parser is torn
down cleanly; reference `parser`, `streamToText(r2Object.body)`,
`injectedHeader`, and `byteStart` when locating the pump logic to update.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/api/src/routes/admin/analytics/catalog.ts`:
- Line 435: The retry path is requeuing the entire object as a single chunk by
calling queueCatalogETL({ ..., chunks: [{ objectKey }], ... }); instead, call
the same chunk-planning service used by the main /catalog/etl route to split the
object into chunks before calling queueCatalogETL; locate the chunk-planning
helper used by the /catalog/etl handler (the function that computes/splits
chunks for an object) and replace the hard-coded [{ objectKey }] with its
returned chunk list, then pass that chunk list into queueCatalogETL along with
the same env.ETL_QUEUE and newJobId.

In `@packages/api/src/routes/catalog/index.ts`:
- Around line 187-189: The code currently treats a null result from
r2.head(objectKey) as a valid whole-file job; change the logic in the handler
that calls r2.head so that if meta is null you fail fast instead of pushing to
queueChunks: explicitly check if meta === null (or falsy but distinguish null vs
size) and return/throw an error (or mark the job as invalid) with a clear
message referencing objectKey so the caller/workflow does not create a running
job for a missing object; keep the existing branch that uses meta.size <=
CHUNK_BYTES to enqueue whole-file jobs only when meta is present.
- Around line 180-205: Extract the chunk planning logic (CHUNK_BYTES constant,
R2BucketService usage, loop building queueChunks) into a new ETL service
function (e.g., planCatalogChunks or buildChunkRanges) under the API services
namespace; the new function should accept (env, chunks) and return the array of
{ objectKey, byteStart?, byteEnd? } entries after performing r2.head and the
byte-range math, leaving the route handler to only call that service and then
call queueCatalogETL with its result. Ensure the route no longer instantiates
R2BucketService or contains the loop — instead import and await the new
planCatalogChunks(env, chunks) function, preserve existing semantics
(CHUNK_BYTES, Math.ceil, byteEnd calculation), and surface any errors from the
service so the handler can handle validation/persistence/dispatch only.

In `@packages/api/src/services/etl/processCatalogEtl.ts`:
- Around line 95-105: skipPartialRow currently treats every non-zero byteStart
as mid-row and can drop a full row that starts exactly at byteStart; update the
logic so we only discard a leading fragment when the chunk truly begins mid-row.
Either propagate a startsMidRow flag from the splitter into the consumer
(preferred) or, before setting skipPartialRow in processCatalogEtl,
fetch/inspect the single byte immediately before byteStart (byteStart-1) from
the R2 object and set skipPartialRow = byteStart > 0 && previousByte !== '\n'.
Apply this change around the streamToText(r2Object.body) loop where
skipPartialRow is used so the code only slices off a partial row when an actual
row fragment precedes the chunk.

In `@packages/api/src/services/etl/types.ts`:
- Around line 6-16: Define and export a single shared chunk union type (e.g.,
CatalogChunk = { objectKey: string } | { objectKey: string; byteStart: number;
byteEnd: number }) and replace the ad-hoc inline chunk shapes with that type:
update the existing `data` shape and the `QueueCatalogETLParams.chunks`
signature to use `CatalogChunk[]`, ensuring ranged chunks require both
`byteStart` and `byteEnd` (no optionals) while whole-object chunks have only
`objectKey`; export the new `CatalogChunk` type so route and queue layers can
import and reuse it to prevent contract drift.

---

Outside diff comments:
In `@packages/api/src/services/etl/processCatalogEtl.ts`:
- Around line 205-209: The current chunk worker code in processCatalogEtl.ts
flips the entire ETL job to status 'completed' (the db.update on etlJobs using
jobId) from within each chunk handler, which prematurely completes the job;
instead, stop marking the job completed here and implement a chunk-level
completion mechanism: have the chunk worker mark its chunk record as finished
and atomically increment a completedChunks counter (or insert/update a chunk
status table) in the same transaction, then check totalChunks vs completedChunks
and only when they match perform the etlJobs.update to set status: 'completed'
and completedAt using jobId; alternatively, publish a "chunk-finished" message
to an orchestrator service which performs the final completion update after
verifying all chunks (reference symbols: etlJobs, jobId, the try block in
processCatalogEtl.ts).
- Around line 89-117: The stream pump launched as a fire-and-forget IIFE must be
captured and awaited so read failures don't leave `parser` hanging; change the
IIFE to assign its promise (e.g., const pumpPromise = (async () => { ... })()),
wrap its body in try/catch that calls `parser.end()` (or `parser.destroy()` if
available) on error and rethrows, and after the consumer loop (for await (const
record of parser)) await pumpPromise so any pump errors propagate and the parser
is torn down cleanly; reference `parser`, `streamToText(r2Object.body)`,
`injectedHeader`, and `byteStart` when locating the pump logic to update.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22f763d7-f274-45b6-846f-7a3ea6c252e4

📥 Commits

Reviewing files that changed from the base of the PR and between b875cd2 and 0715bd9.

📒 Files selected for processing (6)

packages/api/src/routes/admin/analytics/catalog.ts
packages/api/src/routes/catalog/index.ts
packages/api/src/services/etl/processCatalogEtl.ts
packages/api/src/services/etl/queue.ts
packages/api/src/services/etl/types.ts
packages/api/wrangler.jsonc

coderabbitai · 2026-05-13T19:51:03Z

        });

-        await queueCatalogETL({ queue: env.ETL_QUEUE, objectKeys: [objectKey], jobId: newJobId });
+        await queueCatalogETL({ queue: env.ETL_QUEUE, chunks: [{ objectKey }], jobId: newJobId });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Retry still requeues large files as a single chunk.

This matches the new queueCatalogETL shape, but it bypasses the chunk-splitting logic added on the main /catalog/etl path. Retrying a previously failed large object will enqueue the full file again and can reproduce the CPU-limit failure this PR is trying to avoid. Reuse the same chunk-planning service here before calling queueCatalogETL.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/api/src/routes/admin/analytics/catalog.ts` at line 435, The retry path is requeuing the entire object as a single chunk by calling queueCatalogETL({ ..., chunks: [{ objectKey }], ... }); instead, call the same chunk-planning service used by the main /catalog/etl route to split the object into chunks before calling queueCatalogETL; locate the chunk-planning helper used by the /catalog/etl handler (the function that computes/splits chunks for an object) and replace the hard-coded [{ objectKey }] with its returned chunk list, then pass that chunk list into queueCatalogETL along with the same env.ETL_QUEUE and newJobId.

coderabbitai · 2026-05-13T19:51:03Z

+      // Split large files into 20 MB byte-range chunks so each Worker
+      // invocation stays within the CPU time budget (~30k rows / chunk).
+      const CHUNK_BYTES = 20 * 1024 * 1024;
+      const r2 = new R2BucketService({ env, bucketType: 'catalog' });
+      const queueChunks: Array<{ objectKey: string; byteStart?: number; byteEnd?: number }> = [];
+
+      for (const objectKey of chunks) {
+        const meta = await r2.head(objectKey);
+        if (!meta || meta.size <= CHUNK_BYTES) {
+          queueChunks.push({ objectKey });
+        } else {
+          const n = Math.ceil(meta.size / CHUNK_BYTES);
+          for (let i = 0; i < n; i++) {
+            queueChunks.push({
+              objectKey,
+              byteStart: i * CHUNK_BYTES,
+              byteEnd: Math.min((i + 1) * CHUNK_BYTES - 1, meta.size - 1),
+            });
+          }
+        }
+      }
+
      await queueCatalogETL({
        queue: env.ETL_QUEUE,
-        objectKeys: chunks,
+        chunks: queueChunks,
        jobId,


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Move chunk planning into an ETL service.

This block is ETL business logic, not route orchestration. Extract the R2 head + byte-range planning into src/services/etl/* and keep the handler focused on validation, persistence, and dispatch.

As per coding guidelines, "Keep business logic in src/services/ of the API package, not in route handlers".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/api/src/routes/catalog/index.ts` around lines 180 - 205, Extract the chunk planning logic (CHUNK_BYTES constant, R2BucketService usage, loop building queueChunks) into a new ETL service function (e.g., planCatalogChunks or buildChunkRanges) under the API services namespace; the new function should accept (env, chunks) and return the array of { objectKey, byteStart?, byteEnd? } entries after performing r2.head and the byte-range math, leaving the route handler to only call that service and then call queueCatalogETL with its result. Ensure the route no longer instantiates R2BucketService or contains the loop — instead import and await the new planCatalogChunks(env, chunks) function, preserve existing semantics (CHUNK_BYTES, Math.ceil, byteEnd calculation), and surface any errors from the service so the handler can handle validation/persistence/dispatch only.

coderabbitai · 2026-05-13T19:51:03Z

+        const meta = await r2.head(objectKey);
+        if (!meta || meta.size <= CHUNK_BYTES) {
+          queueChunks.push({ objectKey });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when head() returns no metadata.

When r2.head(objectKey) returns null, this branch still enqueues the object as if it were a normal whole-file job. That turns a known validation failure into a worker failure later, and the job has already been created as running. Reject missing objects here instead of pushing them into the queue.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/api/src/routes/catalog/index.ts` around lines 187 - 189, The code currently treats a null result from r2.head(objectKey) as a valid whole-file job; change the logic in the handler that calls r2.head so that if meta is null you fail fast instead of pushing to queueChunks: explicitly check if meta === null (or falsy but distinguish null vs size) and return/throw an error (or mark the job as invalid) with a clear message referencing objectKey so the caller/workflow does not create a running job for a missing object; keep the existing branch that uses meta.size <= CHUNK_BYTES to enqueue whole-file jobs only when meta is present.

coderabbitai · 2026-05-13T19:51:03Z

+      let skipPartialRow = byteStart !== undefined && byteStart > 0;
+
      for await (const chunk of streamToText(r2Object.body)) {
+        let text = chunk;
+
+        if (skipPartialRow) {
+          // Discard bytes up to and including the first newline — those bytes are
+          // the tail of the row that the previous chunk already processed.
+          const nl = text.indexOf('\n');
+          if (nl === -1) continue; // entire buffer is still the partial row tail
+          text = text.slice(nl + 1);


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Guard the boundary skip with an actual row-fragment check.

skipPartialRow is enabled for every non-first chunk. If a slice begins exactly on the first byte of a real row, Lines 100-105 still discard everything through the next newline, so that row is silently lost. Carry a startsMidRow flag from the splitter or inspect the byte before byteStart before skipping.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/api/src/services/etl/processCatalogEtl.ts` around lines 95 - 105, skipPartialRow currently treats every non-zero byteStart as mid-row and can drop a full row that starts exactly at byteStart; update the logic so we only discard a leading fragment when the chunk truly begins mid-row. Either propagate a startsMidRow flag from the splitter into the consumer (preferred) or, before setting skipPartialRow in processCatalogEtl, fetch/inspect the single byte immediately before byteStart (byteStart-1) from the R2 object and set skipPartialRow = byteStart > 0 && previousByte !== '\n'. Apply this change around the streamToText(r2Object.body) loop where skipPartialRow is used so the code only slices off a partial row when an actual row fragment precedes the chunk.

coderabbitai · 2026-05-13T19:51:03Z

  data: {
    objectKey: string;
+    byteStart?: number;
+    byteEnd?: number;
  };
 }

 export interface QueueCatalogETLParams {
  queue: Queue;
-  objectKey: string;
-  userId: string;
-  source: string;
-  scraperRevision: string;
+  chunks: Array<{ objectKey: string; byteStart?: number; byteEnd?: number }>;
+  jobId: string;


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Model chunk ranges as an all-or-nothing contract.

{ byteStart?: number; byteEnd?: number } allows invalid states like only one bound being present. This contract is safer as a union: either a whole object { objectKey } or a ranged chunk { objectKey, byteStart, byteEnd }. Export that shared chunk type here and reuse it across the queue and route layers so the contract cannot drift.

Proposed refactor

+export type CatalogETLChunk = + | { objectKey: string } + | { objectKey: string; byteStart: number; byteEnd: number }; + export interface CatalogETLMessage { timestamp: number; id: string; - data: { - objectKey: string; - byteStart?: number; - byteEnd?: number; - }; + data: CatalogETLChunk; } export interface QueueCatalogETLParams { queue: Queue; - chunks: Array<{ objectKey: string; byteStart?: number; byteEnd?: number }>; + chunks: CatalogETLChunk[]; jobId: string; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

data: {

objectKey: string;

byteStart?: number;

byteEnd?: number;

};

}

export interface QueueCatalogETLParams {

queue: Queue;

objectKey: string;

userId: string;

source: string;

scraperRevision: string;

chunks: Array<{ objectKey: string; byteStart?: number; byteEnd?: number }>;

jobId: string;

export type CatalogETLChunk =

| { objectKey: string }

| { objectKey: string; byteStart: number; byteEnd: number };

export interface CatalogETLMessage {

timestamp: number;

id: string;

data: CatalogETLChunk;

}

export interface QueueCatalogETLParams {

queue: Queue;

chunks: CatalogETLChunk[];

jobId: string;

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/api/src/services/etl/types.ts` around lines 6 - 16, Define and export a single shared chunk union type (e.g., CatalogChunk = { objectKey: string } | { objectKey: string; byteStart: number; byteEnd: number }) and replace the ad-hoc inline chunk shapes with that type: update the existing `data` shape and the `QueueCatalogETLParams.chunks` signature to use `CatalogChunk[]`, ensuring ranged chunks require both `byteStart` and `byteEnd` (no optionals) while whole-object chunks have only `objectKey`; export the new `CatalogChunk` type so route and queue layers can import and reuse it to prevent contract drift.

cloudflare-workers-and-pages · 2026-05-13T19:53:06Z

Deploying packrat-guides with Cloudflare Pages

Latest commit:	`9b744c1`
Status:	🚫 Build failed.

View logs

cloudflare-workers-and-pages · 2026-05-13T19:53:13Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	packrat-admin	`9b744c1`	Commit Preview URL Branch Preview URL	May 13 2026, 07:54 PM

github-actions · 2026-05-13T19:53:34Z

Coverage Report for Expo Unit Tests Coverage (./apps/expo)

Status	Category	Percentage	Covered / Total
🔵	Lines	82.61%	480 / 581
🔵	Statements	82.61% (🎯 75%)	480 / 581
🔵	Functions	92.59%	50 / 54
🔵	Branches	90.9%	170 / 187

File Coverage

No changed files found.

Generated in workflow #1190 for commit 9b744c1 by the Vitest Coverage Report Action

github-actions · 2026-05-13T19:53:34Z

Coverage Report for API Unit Tests Coverage (./packages/api)

Status	Category	Percentage	Covered / Total
🔵	Lines	76.17%	502 / 659
🔵	Statements	76.17% (🎯 65%)	502 / 659
🔵	Functions	95%	38 / 40
🔵	Branches	88.67%	227 / 256

File Coverage

No changed files found.

Generated in workflow #1190 for commit 9b744c1 by the Vitest Coverage Report Action

cloudflare-workers-and-pages · 2026-05-13T19:54:56Z

Deploying packrat-landing with Cloudflare Pages

Latest commit:	`9b744c1`
Status:	✅ Deploy successful!
Preview URL:	https://533fb489.packrat-landing.pages.dev
Branch Preview URL:	https://fix-etl-completion-ordering.packrat-landing.pages.dev

View logs

fix(etl): raise cpu_ms limit to 400k (CF max) for large-file queue pr…

df9d527

…ocessing 50k-row files exhaust the 300k CPU budget (~6ms CPU/row for 42-field CSV). This extends the per-invocation limit to ~66k rows. Chunking in ScrapyD is the long-term fix for files beyond that threshold.

Copilot AI review requested due to automatic review settings May 13, 2026 19:19

github-actions Bot added the api label May 13, 2026

Copilot started reviewing on behalf of andrew-bierman May 13, 2026 19:20 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

chore: resolve merge conflict — keep enqueue error handler from main

9b744c1

andrew-bierman merged commit 9b90c6c into main May 13, 2026
12 of 13 checks passed

andrew-bierman deleted the fix/etl-completion-ordering branch May 13, 2026 20:17

This was referenced May 14, 2026

fix(etl): atomic counter updates + retry counter reset #2421

Merged

fix: migrate catalog ETL to Cloudflare Workflows (audit remediation, phase 1) #2462

Merged

Conversation

andrew-bierman commented May 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Limitations

Post-Deploy Validation

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

cloudflare-workers-and-pages Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying packrat-guides with Cloudflare Pages

Uh oh!

cloudflare-workers-and-pages Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying with Cloudflare Workers

Uh oh!

github-actions Bot commented May 13, 2026

Coverage Report for Expo Unit Tests Coverage (./apps/expo)

Uh oh!

github-actions Bot commented May 13, 2026

Coverage Report for API Unit Tests Coverage (./packages/api)

Uh oh!

cloudflare-workers-and-pages Bot commented May 13, 2026

Deploying packrat-landing with Cloudflare Pages

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

andrew-bierman commented May 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 13, 2026 •

edited

Loading

cloudflare-workers-and-pages Bot commented May 13, 2026 •

edited

Loading

cloudflare-workers-and-pages Bot commented May 13, 2026 •

edited

Loading