
perf(core): Skip worker pool for lightweight file processing#1338

Merged
yamadashy merged 10 commits into main from perf/skip-worker-pool-lightweight-v2 on Mar 28, 2026

Conversation

@yamadashy
Owner

@yamadashy yamadashy commented Mar 28, 2026

When files don't require tree-sitter parsing (i.e., neither compress nor removeComments is enabled), the worker pool's IPC overhead is unnecessary. Processing such files on the main thread avoids serialization costs and is faster for lightweight workloads. Benchmark: Ubuntu -0.19s.

Checklist

  • Run npm run test
  • Run npm run lint

🤖 Generated with Claude Code



When files don't require tree-sitter parsing (compress/removeComments),
the worker pool IPC overhead is unnecessary. Processing them on the main
thread avoids serialization costs and is faster for lightweight workloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
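The decision described above can be sketched as follows. This is a simplified illustration, not repomix's actual implementation: the interfaces and transform bodies are assumptions standing in for the real code in src/core/file/fileProcess.ts and its worker.

```typescript
// Sketch of the worker-skip decision. The interfaces and transform bodies
// are simplified assumptions; the real logic lives in
// src/core/file/fileProcess.ts and its worker.
interface OutputConfig {
  compress: boolean;
  removeComments: boolean;
  removeEmptyLines: boolean;
  showLineNumbers: boolean;
}

interface RawFile {
  path: string;
  content: string;
}

// Only compress and removeComments need tree-sitter, which runs in the
// worker pool; everything else is a cheap string transform.
function needsWorkerProcessing(output: OutputConfig): boolean {
  return output.compress || output.removeComments;
}

// Main-thread fast path: no IPC, no structured-clone serialization of
// file contents to and from workers.
function processFilesMainThread(files: RawFile[], output: OutputConfig): RawFile[] {
  return files.map((file) => {
    let content = file.content;
    if (output.removeEmptyLines) {
      content = content.split("\n").filter((line) => line.trim() !== "").join("\n");
    }
    if (output.showLineNumbers) {
      content = content.split("\n").map((line, i) => `${i + 1}: ${line}`).join("\n");
    }
    return { ...file, content };
  });
}
```

The win comes from the branch itself: for lightweight-only configs, file contents never cross a worker boundary at all.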
@github-actions
Contributor

github-actions bot commented Mar 28, 2026

⚡ Performance Benchmark

Latest commit: 6fecdca test(core): Add combined worker + lightweight pipeline integration test
Status: ✅ Benchmark complete!
Ubuntu: 2.21s (±0.01s) → 2.11s (±0.05s) · -0.11s (-4.8%)
macOS: 1.13s (±0.04s) → 1.09s (±0.07s) · -0.04s (-4.0%)
Windows: 2.59s (±0.03s) → 2.57s (±0.25s) · -0.03s (-1.0%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded)
  • Measurement: 10 runs / 20 on macOS (median ± IQR)
  • Workflow run
History

a2fb57e [autofix.ci] apply automated fixes

Ubuntu: 2.24s (±0.02s) → 2.09s (±0.01s) · -0.15s (-6.6%)
macOS: 1.18s (±0.10s) → 1.13s (±0.11s) · -0.05s (-4.6%)
Windows: 2.58s (±0.01s) → 2.42s (±0.05s) · -0.16s (-6.3%)

cdaca77 refactor(core): Inline needsWorkerProcessing as local variable

Ubuntu: 2.07s (±0.03s) → 1.95s (±0.02s) · -0.12s (-5.8%)
macOS: 1.19s (±0.16s) → 1.14s (±0.09s) · -0.05s (-4.5%)
Windows: 2.58s (±0.02s) → 2.40s (±0.02s) · -0.17s (-6.7%)

f106716 refactor(core): Simplify into single applyLightweightTransforms and remove redundant trim

Ubuntu: 2.20s (±0.02s) → 2.08s (±0.04s) · -0.12s (-5.5%)
macOS: 1.15s (±0.05s) → 1.14s (±0.09s) · -0.01s (-0.6%)
Windows: 2.81s (±0.18s) → 2.45s (±0.03s) · -0.36s (-12.7%)

47e4a65 fix(core): Move removeEmptyLines to post-compress to preserve ordering

Ubuntu: 2.23s (±0.02s) → 2.11s (±0.04s) · -0.12s (-5.3%)
macOS: 1.35s (±0.19s) → 1.36s (±0.12s) · +0.01s (+0.5%)
Windows: 2.68s (±0.04s) → 2.48s (±0.02s) · -0.20s (-7.4%)

cac35d0 fix(core): Preserve transform order by splitting into pre/post compress phases

Ubuntu: 2.24s (±0.02s) → 2.05s (±0.01s) · -0.19s (-8.7%)
macOS: 1.18s (±0.03s) → 1.16s (±0.07s) · -0.02s (-1.8%)
Windows: 2.65s (±0.03s) → 2.46s (±0.02s) · -0.19s (-7.3%)

e978dec test(core): Add regression tests for base64 truncation and lastIndex safety

Ubuntu: 2.27s (±0.02s) → 2.12s (±0.03s) · -0.15s (-6.7%)
macOS: 1.15s (±0.05s) → 1.09s (±0.04s) · -0.05s (-4.5%)
Windows: 2.63s (±0.10s) → 2.44s (±0.02s) · -0.19s (-7.2%)

3e70628 refactor(core): Separate lightweight transforms from worker processing

Ubuntu: 2.17s (±0.02s) → 2.04s (±0.02s) · -0.14s (-6.4%)
macOS: 1.14s (±0.12s) → 1.11s (±0.20s) · -0.03s (-2.5%)
Windows: 2.80s (±0.12s) → 2.65s (±0.10s) · -0.15s (-5.4%)

d77d758 perf(core): Skip worker pool for lightweight file processing

Ubuntu: 2.19s (±0.02s) → 2.05s (±0.02s) · -0.15s (-6.7%)
macOS: 1.17s (±0.06s) → 1.14s (±0.10s) · -0.03s (-2.9%)
Windows: 2.75s (±0.06s) → 2.56s (±0.05s) · -0.19s (-6.9%)


@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 28, 2026

Deploying repomix with Cloudflare Pages

Latest commit: 6fecdca
Status: ✅  Deploy successful!
Preview URL: https://e2ebb05e.repomix.pages.dev
Branch Preview URL: https://perf-skip-worker-pool-lightw-ibtg.repomix.pages.dev

View logs

@codecov

codecov bot commented Mar 28, 2026

Codecov Report

❌ Patch coverage is 96.36364% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.14%. Comparing base (d762d38) to head (6fecdca).
⚠️ Report is 15 commits behind head on main.

Files with missing lines        Patch %   Lines
src/core/file/fileProcess.ts    96.07%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1338      +/-   ##
==========================================
+ Coverage   87.13%   87.14%   +0.01%     
==========================================
  Files         115      115              
  Lines        4367     4389      +22     
  Branches     1015     1020       +5     
==========================================
+ Hits         3805     3825      +20     
- Misses        562      564       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.



Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/core/file/fileProcess.ts (2)

29-59: Don’t use dependency injection as the mode switch.

deps === null now doubles as “use the fast path”, while processFilesMainThread() hard-codes the module-level getFileManipulator. Any caller that injects deps loses the optimization, and this branch becomes awkward to cover with mocked manipulators. Resolve deps first and pass the resolved manipulator into the helper instead; if tests need workers, make that an explicit flag. Longer term, a shared per-file transform helper reused by src/core/file/workers/fileProcessWorker.ts would keep both branches in sync.

♻️ Suggested refactor
 const processFilesMainThread = async (
   rawFiles: RawFile[],
   config: RepomixConfigMerged,
   progressCallback: RepomixProgressCallback,
+  resolveFileManipulator: GetFileManipulator,
 ): Promise<ProcessedFile[]> => {
@@
     if (config.output.removeEmptyLines) {
-      const manipulator = getFileManipulator(rawFile.path);
+      const manipulator = resolveFileManipulator(rawFile.path);
       if (manipulator) {
         content = manipulator.removeEmptyLines(content);
       }
     }
@@
-  if (!needsWorkerProcessing(config) && deps === null) {
-    return processFilesMainThread(rawFiles, config, progressCallback);
-  }
-
   const resolvedDeps = deps ?? {
     initTaskRunner,
     getFileManipulator,
   };
+
+  if (!needsWorkerProcessing(config)) {
+    return processFilesMainThread(
+      rawFiles,
+      config,
+      progressCallback,
+      resolvedDeps.getFileManipulator,
+    );
+  }

Also applies to: 89-103

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/file/fileProcess.ts` around lines 29 - 59, processFilesMainThread
currently treats deps === null as the “fast path” and directly calls the
module-level getFileManipulator, which makes behavior diverge when callers
inject deps; resolve the deps up-front and pass the concrete manipulator (or a
per-file manipulator factory) into processFilesMainThread instead of calling
getFileManipulator internally, so callers that pass deps keep the same optimized
path; make worker-vs-main an explicit flag or parameter (not implicit via deps)
and extract the per-file transform into a shared helper used by both
processFilesMainThread and the worker (see getFileManipulator and the loop logic
in processFilesMainThread and corresponding worker code) so both branches share
the same resolved manipulator behavior.

19-20: Make the worker decision file-aware.

When compress is false and removeComments is true, this still forces the worker pool even if every input path is unsupported and manipulator lookup would return null. Since that lookup is only an extension check, you can cheaply gate on rawFiles here and keep the fast path for Markdown/JSON-only repos too.

Also applies to: 96-97

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/file/fileProcess.ts` around lines 19 - 20, The worker decision in
needsWorkerProcessing currently ignores the repository's files and returns true
whenever compress or removeComments is set; change needsWorkerProcessing to also
accept the repository file list (rawFiles) and only return true if
(config.output.compress || config.output.removeComments) AND at least one file
in rawFiles would be handled by a manipulator (i.e., perform the same cheap
extension-based manipulator lookup used elsewhere for each file and return true
if any lookup is non-null). Update the function signature
(needsWorkerProcessing) and any call sites (also the usage around the similar
check at the other spot mentioned) to pass rawFiles so the fast path is
preserved for Markdown/JSON-only repos.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94d5ab5c-cc6c-40bd-bf71-c0bc91495487

📥 Commits

Reviewing files that changed from the base of the PR and between d762d38 and d77d758.

📒 Files selected for processing (1)
  • src/core/file/fileProcess.ts

Extract lightweight file transforms (truncateBase64, removeEmptyLines,
trim, showLineNumbers) into applyLightweightTransforms() on the main
thread, keeping only heavy operations (removeComments, compress) in
worker processContent(). This eliminates dual management of the same
logic across worker and main thread paths.

Also pre-compile base64 regex patterns at module level to avoid
re-creation per file call.

Action: split processContent into heavy (worker) and lightweight (main thread) phases
Action: extract applyLightweightTransforms() as single source of truth for lightweight ops
Action: hoist regex patterns in truncateBase64.ts to module scope with lastIndex reset
Why: lightweight transforms were duplicated in both processFilesMainThread and processContent
Why: regex re-compilation per file added unnecessary overhead for large repos

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
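The regex hoisting mentioned in this commit can be sketched as below. The pattern itself is an illustrative stand-in for repomix's actual base64 detection, not the real one from truncateBase64.ts.

```typescript
// Sketch of a module-level (pre-compiled) global regex with an explicit
// lastIndex reset. The pattern is an assumed stand-in, not repomix's
// actual base64 matcher.
const BASE64_PATTERN = /[A-Za-z0-9+/]{40,}={0,2}/g;

function truncateBase64Content(content: string, keep = 8): string {
  // A /g regex is stateful: test() and exec() advance lastIndex between
  // calls. String.replace resets it on its own, but resetting defensively
  // keeps consecutive calls safe if the shared pattern is ever reused
  // with test/exec elsewhere.
  BASE64_PATTERN.lastIndex = 0;
  return content.replace(
    BASE64_PATTERN,
    (match) => `${match.slice(0, keep)}...[truncated]`,
  );
}
```

Hoisting avoids re-compiling the pattern once per file, which is the per-call overhead the commit message refers to; the lastIndex reset is what the regression tests in the next commit exercise with consecutive calls.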

…safety

Add test for consecutive truncateBase64Content calls to verify global
regex lastIndex reset works correctly. Add test for truncateBase64
config branch in applyLightweightTransforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ss phases

Split applyLightweightTransforms into applyPreCompressTransforms and
applyPostCompressTransforms to preserve the original execution order:
truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers

Pre-compress transforms (truncateBase64, removeEmptyLines) must run
before tree-sitter parsing to avoid performance regression with large
base64 strings and to ensure empty line removal affects chunk merging.

Action: split lightweight transforms into pre-compress and post-compress phases
Why: previous refactor changed execution order, causing tree-sitter to receive
untreated base64 and content with empty lines, altering compress output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move removeEmptyLines from applyPreCompressTransforms to
applyPostCompressTransforms so it runs after removeComments.
This ensures empty lines created by comment removal are cleaned up.

Transform order: truncateBase64 (pre) → [removeComments → compress] (worker) → removeEmptyLines → trim → showLineNumbers (post)

Simplify applyPreCompressTransforms to only handle truncateBase64
with an early return when disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…emove redundant trim

Merge applyPreCompressTransforms and applyPostCompressTransforms into
a single applyLightweightTransforms function. Move truncateBase64 to
post-worker phase since tree-sitter handles string literals as single
AST nodes regardless of content size.

Remove redundant trim from worker processContent — the main thread
applyLightweightTransforms already handles it.

Final pipeline:
  Worker: removeComments → compress
  Main:   truncateBase64 → removeEmptyLines → trim → showLineNumbers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
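The final pipeline from this commit can be sketched end to end. Everything here is a simplified assumption: the heavy phase is synchronous (the real one runs tree-sitter in the worker pool), and the transform bodies are minimal stand-ins that only demonstrate the ordering.

```typescript
// Sketch of the final two-phase pipeline. The transform bodies are
// simplified stand-ins; in the real code the heavy phase runs in workers.
interface Config {
  removeComments: boolean;
  compress: boolean;
  truncateBase64: boolean;
  removeEmptyLines: boolean;
  showLineNumbers: boolean;
}

// Heavy phase (worker in the real code): removeComments → compress.
function processContentHeavy(content: string, config: Config): string {
  if (config.removeComments) content = content.replace(/\/\/.*$/gm, "");
  if (config.compress) {
    // tree-sitter compression elided in this sketch
  }
  return content;
}

// Lightweight phase (main thread): truncateBase64 → removeEmptyLines →
// trim → showLineNumbers, in that order.
function applyLightweightTransforms(content: string, config: Config): string {
  if (config.truncateBase64) {
    content = content.replace(/[A-Za-z0-9+/]{40,}={0,2}/g, (m) => `${m.slice(0, 8)}...`);
  }
  if (config.removeEmptyLines) {
    content = content.split("\n").filter((l) => l.trim() !== "").join("\n");
  }
  content = content.trim();
  if (config.showLineNumbers) {
    content = content.split("\n").map((l, i) => `${i + 1}: ${l}`).join("\n");
  }
  return content;
}

function processFile(content: string, config: Config): string {
  return applyLightweightTransforms(processContentHeavy(content, config), config);
}
```

Note how removeEmptyLines running after the heavy phase cleans up blank lines left behind by comment removal, which is exactly the ordering concern the earlier commits iterated on.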

Replace the needsWorkerProcessing function with a local const
variable inside processFiles for simplicity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Contributor

claude bot commented Mar 28, 2026

Code Review — PR #1338

Overall this is a well-motivated optimization with clear benchmark wins. The two-phase pipeline (heavy worker → lightweight main thread) is a clean architectural split. A few items worth considering:

1. Transform ordering changed when workers are active ⚠️

This is the most notable behavioral change. The old pipeline was:

truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers

The new pipeline (when workers are used) is:

[removeComments → compress] (worker) → truncateBase64 → removeEmptyLines → trim → showLineNumbers (main)

Three transforms that previously ran before tree-sitter (truncateBase64, removeEmptyLines, trim) now run after it.

Details
  • trim() removed from worker: Previously, processContent called .trim() before compress/tree-sitter parsing. Now trim only runs after workers return. If tree-sitter parsing is sensitive to leading/trailing whitespace (e.g., indentation-significant languages), this could produce different compressed output.
  • truncateBase64 after compress: Large base64 strings are now parsed by tree-sitter before truncation. Commit 6 justifies this with "tree-sitter handles string literals as single AST nodes regardless of content size" — plausible for most languages, but worth validating that compressed output is identical for a representative set of files.
  • removeEmptyLines after compress: Tree-sitter compression now operates on content that still has empty lines, which may produce different compressed output than before.

The commit history (7 commits) shows iterative back-and-forth on this ordering, suggesting it was tricky to get right. An integration test verifying compressed output is identical before/after this change would give confidence.

2. Missing test for combined worker + lightweight pipeline

The worker-path test ("should process multiple files with worker path") sets removeComments: true but doesn't enable any lightweight transforms. There's no integration test verifying the combined pipeline (e.g., removeComments: true + truncateBase64: true + showLineNumbers: true). A combined test would catch ordering regressions.

Details

Also worth adding:

  • A transform ordering test where multiple transforms interact (e.g., removeEmptyLines + showLineNumbers together) to verify empty lines are removed before line numbers are assigned.
  • A negative assertion in processContent tests confirming removeEmptyLines is not called even when configured, to guard against accidental re-introduction.

3. deps parameter convention (minor)

Details

The processFiles deps parameter changed from the project's established pattern:

// Before (project convention):
deps: { initTaskRunner, getFileManipulator } = { initTaskRunner, getFileManipulator }

// After:
deps: { initTaskRunner, getFileManipulator } | null = null

This works fine but departs from the dependency injection idiom used elsewhere in the codebase. The original default-parameter pattern was arguably cleaner and consistent with other functions.

Additionally, applyLightweightTransforms takes deps as a required parameter with no default, which is inconsistent with the convention of providing defaults for testability convenience.

Security & Performance

No security concerns. Performance approach is sound — skipping IPC for lightweight-only configs is the right call, and benchmarks confirm the win.


🤖 Generated with Claude Code


yamadashy and others added 3 commits March 28, 2026 18:19
Revert deps parameter from `| null = null` with internal resolution
to the standard `deps = { ... }` default parameter pattern used
throughout the codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test that exercises all transforms together: removeComments (worker)
+ truncateBase64 + removeEmptyLines + showLineNumbers (lightweight) to
verify the full two-phase pipeline produces correct output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yamadashy yamadashy merged commit 28f93e9 into main Mar 28, 2026
61 checks passed
@yamadashy yamadashy deleted the perf/skip-worker-pool-lightweight-v2 branch March 28, 2026 09:31
@claude
Contributor

claude bot commented Mar 28, 2026

Code Review — PR #1338 (Updated)

Good performance optimization with clear benchmark wins (~5-7% across platforms). The two-phase pipeline is architecturally clean. Since the previous review, the code has been significantly refined (deps pattern restored, transforms unified into a single function). Here are remaining findings:

1. processContent is now a partial processor — fragile contract ⚠️

processContent no longer produces fully processed output — it only handles removeComments and compress, requiring applyLightweightTransforms to finish the job. However, it's still exported and called by workers. The function name doesn't convey that it only does "heavy" processing. If a future contributor calls processContent directly, they'll get incompletely processed output with no compile-time guard.

Suggestion: Consider renaming to processContentHeavy or adding a @internal TSDoc tag to clarify the contract.

2. Silent progress callback during lightweight phase

files = applyLightweightTransforms(files, config, () => {}, deps);

When workers are used, the lightweight phase runs with a no-op progress callback. For very large repos, this synchronous loop could take noticeable time with no progress indication. Additionally, pc.dim(file.path) string construction still occurs every 50 files even though the callback discards it.

Suggestion: Either forward progressCallback or guard the string construction:

if (progressCallback && ((i + 1) % 50 === 0 || i === totalFiles - 1)) {
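A runnable version of this guard might look like the sketch below. The names (progressCallback, totalFiles) mirror the review snippet, while the loop body, the dim() helper (a stand-in for picocolors' pc.dim), and the return value are illustrative assumptions.

```typescript
// Sketch of the guarded progress callback: with no callback, the message
// string (including the dim() path formatting) is never constructed.
// dim() is a stand-in for picocolors' pc.dim.
type ProgressCallback = ((message: string) => void) | undefined;

const dim = (s: string) => `\u001b[2m${s}\u001b[22m`;

// Returns how many progress messages were actually built, to make the
// "no wasted string construction" property observable.
function reportProgress(paths: string[], progressCallback: ProgressCallback): number {
  let messagesBuilt = 0;
  const totalFiles = paths.length;
  for (let i = 0; i < totalFiles; i++) {
    // Guard first: short-circuit before any formatting work happens.
    if (progressCallback && ((i + 1) % 50 === 0 || i === totalFiles - 1)) {
      messagesBuilt++;
      progressCallback(`Processing file... (${i + 1}/${totalFiles}) ${dim(paths[i])}`);
    }
  }
  return messagesBuilt;
}
```

With 120 files and a callback, messages fire at files 50, 100, and 120; with no callback, zero strings are built.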

3. applyLightweightTransforms missing deps default — convention deviation

Details

The project convention is to provide defaults for deps parameters so functions are independently callable/testable. applyLightweightTransforms requires deps explicitly with no default, while being a public export. Consider adding a default:

deps = { getFileManipulator }

4. Worker decision is all-or-nothing (future optimization opportunity)

Details

useWorkers = config.output.compress || config.output.removeComments sends ALL files through workers even if every file is an unsupported type (e.g., Markdown-only repo with removeComments: true). A per-file check against getFileManipulator would allow the lightweight path for repos where no file actually benefits from worker processing. This is a follow-up optimization, not a blocker.

5. Test gap: worker-skip path not asserted

Details

The "should process files with lightweight-only config" test verifies correct output but doesn't assert that workers were actually skipped. Since the core claim of this PR is skipping workers for lightweight configs, a spy/assertion on initTaskRunner not being called would strengthen confidence in the optimization.
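The worker-skip assertion suggested here can be sketched with a hand-rolled spy. The processFiles stub and Deps shape below are illustrative assumptions standing in for the real function, which accepts deps = { initTaskRunner, getFileManipulator }; the point is only the pattern of asserting that initTaskRunner is never invoked on the fast path.

```typescript
// Sketch of asserting the worker pool is skipped: inject a spying
// initTaskRunner and count its calls. processFiles here is a simplified
// stand-in for the real repomix function.
interface Deps {
  initTaskRunner: () => { run: (content: string) => string };
}

function processFiles(
  contents: string[],
  output: { compress: boolean; removeComments: boolean },
  deps: Deps,
): string[] {
  const useWorkers = output.compress || output.removeComments;
  if (!useWorkers) {
    return contents.map((c) => c.trim()); // lightweight main-thread path
  }
  const runner = deps.initTaskRunner();
  return contents.map((c) => runner.run(c).trim());
}

// Minimal spy: records how many times initTaskRunner was invoked.
function makeSpy(): { deps: Deps; calls: () => number } {
  let count = 0;
  const deps: Deps = {
    initTaskRunner: () => {
      count++;
      return { run: (c: string) => c };
    },
  };
  return { deps, calls: () => count };
}
```

In the actual test suite this would likely use the framework's mock facility instead of a hand-rolled counter, but the assertion is the same: zero calls for a lightweight-only config.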

Overall

Clean refactoring with real performance benefits backed by benchmarks. The code has improved significantly through the commit series. The main actionable items are (1) clarifying the processContent contract and (2) the silent progress callback. Everything else is minor or future-work.

🤖 Generated with Claude Code
