perf(core): Skip worker pool for lightweight file processing#1338
Conversation
When files don't require tree-sitter parsing (compress/removeComments), the worker pool IPC overhead is unnecessary. Processing them on the main thread avoids serialization costs and is faster for lightweight workloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
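The gating idea described above can be sketched in a few lines. This is a minimal, hypothetical stand-in (names are illustrative, not the actual repomix code): only the transforms that need tree-sitter justify worker IPC, so everything else stays on the main thread.

```typescript
// Minimal sketch of the fast-path decision (illustrative names,
// not the actual repomix implementation). Only compress and
// removeComments need tree-sitter parsing.
interface OutputConfig {
  compress: boolean;
  removeComments: boolean;
}

const needsWorkerProcessing = (output: OutputConfig): boolean =>
  output.compress || output.removeComments;

// Fast path: no worker pool, no serialization of file contents over IPC.
const shouldUseMainThread = (output: OutputConfig): boolean =>
  !needsWorkerProcessing(output);
```

Under this shape, a run with both options disabled never pays the worker-pool startup or per-file serialization cost.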
⚡ Performance Benchmark
This comment has been minimized.
Deploying repomix with Cloudflare Pages

| | |
| --- | --- |
| **Latest commit:** | 6fecdca |
| **Status:** | ✅ Deploy successful! |
| **Preview URL:** | https://e2ebb05e.repomix.pages.dev |
| **Branch Preview URL:** | https://perf-skip-worker-pool-lightw-ibtg.repomix.pages.dev |
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1338      +/-   ##
==========================================
+ Coverage   87.13%   87.14%   +0.01%
==========================================
  Files         115      115
  Lines        4367     4389      +22
  Branches     1015     1020       +5
==========================================
+ Hits         3805     3825      +20
- Misses        562      564       +2
```

☔ View full report in Codecov by Sentry.
This comment has been minimized.
This comment has been minimized.
🧹 Nitpick comments (2)
src/core/file/fileProcess.ts (2)
29-59: Don’t use dependency injection as the mode switch.
`deps === null` now doubles as "use the fast path", while `processFilesMainThread()` hard-codes the module-level `getFileManipulator`. Any caller that injects deps loses the optimization, and this branch becomes awkward to cover with mocked manipulators. Resolve deps first and pass the resolved manipulator into the helper instead; if tests need workers, make that an explicit flag. Longer term, a shared per-file transform helper reused by `src/core/file/workers/fileProcessWorker.ts` would keep both branches in sync.

♻️ Suggested refactor

```diff
 const processFilesMainThread = async (
   rawFiles: RawFile[],
   config: RepomixConfigMerged,
   progressCallback: RepomixProgressCallback,
+  resolveFileManipulator: GetFileManipulator,
 ): Promise<ProcessedFile[]> => {
@@
   if (config.output.removeEmptyLines) {
-    const manipulator = getFileManipulator(rawFile.path);
+    const manipulator = resolveFileManipulator(rawFile.path);
     if (manipulator) {
       content = manipulator.removeEmptyLines(content);
     }
   }
@@
-  if (!needsWorkerProcessing(config) && deps === null) {
-    return processFilesMainThread(rawFiles, config, progressCallback);
-  }
-
   const resolvedDeps = deps ?? {
     initTaskRunner,
     getFileManipulator,
   };
+
+  if (!needsWorkerProcessing(config)) {
+    return processFilesMainThread(
+      rawFiles,
+      config,
+      progressCallback,
+      resolvedDeps.getFileManipulator,
+    );
+  }
```

Also applies to: 89-103
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/file/fileProcess.ts` around lines 29 - 59, processFilesMainThread currently treats deps === null as the “fast path” and directly calls the module-level getFileManipulator, which makes behavior diverge when callers inject deps; resolve the deps up-front and pass the concrete manipulator (or a per-file manipulator factory) into processFilesMainThread instead of calling getFileManipulator internally, so callers that pass deps keep the same optimized path; make worker-vs-main an explicit flag or parameter (not implicit via deps) and extract the per-file transform into a shared helper used by both processFilesMainThread and the worker (see getFileManipulator and the loop logic in processFilesMainThread and corresponding worker code) so both branches share the same resolved manipulator behavior.
19-20: Make the worker decision file-aware.

When `compress` is `false` and `removeComments` is `true`, this still forces the worker pool even if every input path is unsupported and manipulator lookup would return `null`. Since that lookup is only an extension check, you can cheaply gate on `rawFiles` here and keep the fast path for Markdown/JSON-only repos too.

Also applies to: 96-97
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/file/fileProcess.ts` around lines 19 - 20, The worker decision in needsWorkerProcessing currently ignores the repository's files and returns true whenever compress or removeComments is set; change needsWorkerProcessing to also accept the repository file list (rawFiles) and only return true if (config.output.compress || config.output.removeComments) AND at least one file in rawFiles would be handled by a manipulator (i.e., perform the same cheap extension-based manipulator lookup used elsewhere for each file and return true if any lookup is non-null). Update the function signature (needsWorkerProcessing) and any call sites (also the usage around the similar check at the other spot mentioned) to pass rawFiles so the fast path is preserved for Markdown/JSON-only repos.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/core/file/fileProcess.ts`:
- Around line 29-59: processFilesMainThread currently treats deps === null as
the “fast path” and directly calls the module-level getFileManipulator, which
makes behavior diverge when callers inject deps; resolve the deps up-front and
pass the concrete manipulator (or a per-file manipulator factory) into
processFilesMainThread instead of calling getFileManipulator internally, so
callers that pass deps keep the same optimized path; make worker-vs-main an
explicit flag or parameter (not implicit via deps) and extract the per-file
transform into a shared helper used by both processFilesMainThread and the
worker (see getFileManipulator and the loop logic in processFilesMainThread and
corresponding worker code) so both branches share the same resolved manipulator
behavior.
- Around line 19-20: The worker decision in needsWorkerProcessing currently
ignores the repository's files and returns true whenever compress or
removeComments is set; change needsWorkerProcessing to also accept the
repository file list (rawFiles) and only return true if (config.output.compress
|| config.output.removeComments) AND at least one file in rawFiles would be
handled by a manipulator (i.e., perform the same cheap extension-based
manipulator lookup used elsewhere for each file and return true if any lookup is
non-null). Update the function signature (needsWorkerProcessing) and any call
sites (also the usage around the similar check at the other spot mentioned) to
pass rawFiles so the fast path is preserved for Markdown/JSON-only repos.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 94d5ab5c-cc6c-40bd-bf71-c0bc91495487
📒 Files selected for processing (1)
src/core/file/fileProcess.ts
Extract lightweight file transforms (truncateBase64, removeEmptyLines, trim, showLineNumbers) into applyLightweightTransforms() on the main thread, keeping only heavy operations (removeComments, compress) in worker processContent(). This eliminates dual management of the same logic across worker and main thread paths. Also pre-compile base64 regex patterns at module level to avoid re-creation per file call.

Action: split processContent into heavy (worker) and lightweight (main thread) phases
Action: extract applyLightweightTransforms() as single source of truth for lightweight ops
Action: hoist regex patterns in truncateBase64.ts to module scope with lastIndex reset
Why: lightweight transforms were duplicated in both processFilesMainThread and processContent
Why: regex re-compilation per file added unnecessary overhead for large repos

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
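The regex-hoisting point can be illustrated like this (the pattern and names are illustrative placeholders, not the actual `truncateBase64.ts` source). The subtlety is that a `/g` regex used with `test()` or `exec()` carries `lastIndex` state across calls, so a module-level pattern must be reset before each fresh scan:

```typescript
// Hoisted to module scope: compiled once, reused for every file,
// instead of being re-created on each call. Pattern is a placeholder.
const BASE64_CHUNK = /[A-Za-z0-9+/]{40,}={0,2}/g;

const containsLongBase64 = (input: string): boolean => {
  // A /g regex is stateful: test() resumes from lastIndex, so
  // consecutive calls on different inputs need an explicit reset.
  BASE64_CHUNK.lastIndex = 0;
  return BASE64_CHUNK.test(input);
};

const truncateBase64 = (input: string): string =>
  // String.replace with a /g regex always scans the whole string.
  input.replace(BASE64_CHUNK, "[base64 truncated]");
```

The reset is what the "consecutive calls" test later in this thread guards against: without it, the second `test()` call could start scanning mid-string and miss a match.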
This comment has been minimized.
…safety

Add test for consecutive truncateBase64Content calls to verify global regex lastIndex reset works correctly. Add test for truncateBase64 config branch in applyLightweightTransforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This comment has been minimized.
…ss phases

Split applyLightweightTransforms into applyPreCompressTransforms and applyPostCompressTransforms to preserve the original execution order: truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers

Pre-compress transforms (truncateBase64, removeEmptyLines) must run before tree-sitter parsing to avoid performance regression with large base64 strings and to ensure empty line removal affects chunk merging.

Action: split lightweight transforms into pre-compress and post-compress phases
Why: previous refactor changed execution order, causing tree-sitter to receive untreated base64 and content with empty lines, altering compress output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This comment has been minimized.
Move removeEmptyLines from applyPreCompressTransforms to applyPostCompressTransforms so it runs after removeComments. This ensures empty lines created by comment removal are cleaned up.

Transform order: truncateBase64 (pre) → [removeComments → compress] (worker) → removeEmptyLines → trim → showLineNumbers (post)

Simplify applyPreCompressTransforms to only handle truncateBase64 with an early return when disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
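The ordering rationale is easy to see with two toy transforms (simplified stand-ins, not the real implementations): stripping a full-line comment leaves an empty line behind, which only a later removeEmptyLines pass can clean up.

```typescript
// Toy stand-ins showing why removeEmptyLines must follow comment removal.
const removeComments = (s: string): string =>
  s.split("\n").map((line) => line.replace(/\/\/.*$/, "")).join("\n");

const removeEmptyLines = (s: string): string =>
  s.split("\n").filter((line) => line.trim() !== "").join("\n");

const src = "const a = 1;\n// a comment\nconst b = 2;";
// Comment removal leaves a blank middle line; the later pass removes it.
const cleaned = removeEmptyLines(removeComments(src));
// cleaned === "const a = 1;\nconst b = 2;"
```

Running removeEmptyLines first would leave that comment-turned-blank line in the output.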
This comment has been minimized.
…emove redundant trim

Merge applyPreCompressTransforms and applyPostCompressTransforms into a single applyLightweightTransforms function. Move truncateBase64 to post-worker phase since tree-sitter handles string literals as single AST nodes regardless of content size. Remove redundant trim from worker processContent — the main thread applyLightweightTransforms already handles it.

Final pipeline:
Worker: removeComments → compress
Main: truncateBase64 → removeEmptyLines → trim → showLineNumbers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
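Put together, the final pipeline order reads roughly like this simplified synchronous sketch. All function bodies are hypothetical stand-ins, and the real worker phase is asynchronous:

```typescript
// Stand-ins for the real transforms (hypothetical, simplified):
const workerPhase = (s: string): string => s; // removeComments → compress
const truncateBase64 = (s: string): string => s;
const removeEmptyLines = (s: string): string =>
  s.split("\n").filter((line) => line.trim() !== "").join("\n");
const showLineNumbers = (s: string): string =>
  s.split("\n").map((line, i) => `${i + 1}: ${line}`).join("\n");

const processFileContent = (content: string, needsWorker: boolean): string => {
  // Heavy phase runs in a worker only when tree-sitter work is needed.
  let out = needsWorker ? workerPhase(content) : content;
  // Lightweight phase always runs on the main thread, in this order:
  out = truncateBase64(out);
  out = removeEmptyLines(out);
  out = out.trim();
  return showLineNumbers(out);
};
```

Because the lightweight phase lives in one place, the main-thread fast path and the worker path can no longer drift apart in transform order.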
This comment has been minimized.
Replace the needsWorkerProcessing function with a local const variable inside processFiles for simplicity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review — PR #1338

Overall this is a well-motivated optimization with clear benchmark wins. The two-phase pipeline (heavy worker → lightweight main thread) is a clean architectural split. A few items worth considering:

1. Transform ordering changed when workers are active
Revert deps parameter from `| null = null` with internal resolution
to the standard `deps = { ... }` default parameter pattern used
throughout the codebase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test that exercises all transforms together: removeComments (worker) + truncateBase64 + removeEmptyLines + showLineNumbers (lightweight) to verify the full two-phase pipeline produces correct output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review — PR #1338 (Updated)

Good performance optimization with clear benchmark wins (~5-7% across platforms). The two-phase pipeline is architecturally clean. Since the previous review, the code has been significantly refined (deps pattern restored, transforms unified into a single function). Here are remaining findings:

1.
When files don't require tree-sitter parsing (compress/removeComments), the worker pool IPC overhead is unnecessary. Processing them on the main thread avoids serialization costs and is faster for lightweight workloads. Benchmark: Ubuntu -0.19s.
Checklist

- `npm run test`
- `npm run lint`

🤖 Generated with Claude Code