
perf(core): Skip worker pool for lightweight file processing#1338

Merged
yamadashy merged 10 commits into main from perf/skip-worker-pool-lightweight-v2 on Mar 28, 2026

Conversation

@yamadashy
Owner

@yamadashy yamadashy commented Mar 28, 2026

When files don't require tree-sitter parsing (i.e., neither compress nor removeComments is enabled), the worker pool's IPC overhead is unnecessary. Processing such files on the main thread avoids serialization costs and is faster for lightweight workloads. Benchmark: Ubuntu -0.19s.

Checklist

  • Run npm run test
  • Run npm run lint

🤖 Generated with Claude Code



When files don't require tree-sitter parsing (compress/removeComments),
the worker pool IPC overhead is unnecessary. Processing them on the main
thread avoids serialization costs and is faster for lightweight workloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
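The decision described above can be sketched as follows. This is a simplified illustration, not repomix's actual implementation: the interfaces and transform bodies are assumptions standing in for the real code in src/core/file/fileProcess.ts and its worker.

```typescript
// Sketch of the worker-skip decision. The interfaces and transform bodies
// are simplified assumptions; the real logic lives in
// src/core/file/fileProcess.ts and its worker.
interface OutputConfig {
  compress: boolean;
  removeComments: boolean;
  removeEmptyLines: boolean;
  showLineNumbers: boolean;
}

interface RawFile {
  path: string;
  content: string;
}

// Only compress and removeComments need tree-sitter, which runs in the
// worker pool; everything else is a cheap string transform.
function needsWorkerProcessing(output: OutputConfig): boolean {
  return output.compress || output.removeComments;
}

// Main-thread fast path: no IPC, no structured-clone serialization of
// file contents to and from workers.
function processFilesMainThread(files: RawFile[], output: OutputConfig): RawFile[] {
  return files.map((file) => {
    let content = file.content;
    if (output.removeEmptyLines) {
      content = content.split("\n").filter((line) => line.trim() !== "").join("\n");
    }
    if (output.showLineNumbers) {
      content = content.split("\n").map((line, i) => `${i + 1}: ${line}`).join("\n");
    }
    return { ...file, content };
  });
}
```

The win comes from the branch itself: for lightweight-only configs, file contents never cross a worker boundary at all.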
@github-actions
Contributor

github-actions bot commented Mar 28, 2026

⚡ Performance Benchmark

Latest commit: 6fecdca test(core): Add combined worker + lightweight pipeline integration test
Status: ✅ Benchmark complete!
Ubuntu: 2.21s (±0.01s) → 2.11s (±0.05s) · -0.11s (-4.8%)
macOS: 1.13s (±0.04s) → 1.09s (±0.07s) · -0.04s (-4.0%)
Windows: 2.59s (±0.03s) → 2.57s (±0.25s) · -0.03s (-1.0%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded)
  • Measurement: 10 runs / 20 on macOS (median ± IQR)
  • Workflow run
History

a2fb57e [autofix.ci] apply automated fixes

Ubuntu: 2.24s (±0.02s) → 2.09s (±0.01s) · -0.15s (-6.6%)
macOS: 1.18s (±0.10s) → 1.13s (±0.11s) · -0.05s (-4.6%)
Windows: 2.58s (±0.01s) → 2.42s (±0.05s) · -0.16s (-6.3%)

cdaca77 refactor(core): Inline needsWorkerProcessing as local variable

Ubuntu: 2.07s (±0.03s) → 1.95s (±0.02s) · -0.12s (-5.8%)
macOS: 1.19s (±0.16s) → 1.14s (±0.09s) · -0.05s (-4.5%)
Windows: 2.58s (±0.02s) → 2.40s (±0.02s) · -0.17s (-6.7%)

f106716 refactor(core): Simplify into single applyLightweightTransforms and remove redundant trim

Ubuntu: 2.20s (±0.02s) → 2.08s (±0.04s) · -0.12s (-5.5%)
macOS: 1.15s (±0.05s) → 1.14s (±0.09s) · -0.01s (-0.6%)
Windows: 2.81s (±0.18s) → 2.45s (±0.03s) · -0.36s (-12.7%)

47e4a65 fix(core): Move removeEmptyLines to post-compress to preserve ordering

Ubuntu: 2.23s (±0.02s) → 2.11s (±0.04s) · -0.12s (-5.3%)
macOS: 1.35s (±0.19s) → 1.36s (±0.12s) · +0.01s (+0.5%)
Windows: 2.68s (±0.04s) → 2.48s (±0.02s) · -0.20s (-7.4%)

cac35d0 fix(core): Preserve transform order by splitting into pre/post compress phases

Ubuntu: 2.24s (±0.02s) → 2.05s (±0.01s) · -0.19s (-8.7%)
macOS: 1.18s (±0.03s) → 1.16s (±0.07s) · -0.02s (-1.8%)
Windows: 2.65s (±0.03s) → 2.46s (±0.02s) · -0.19s (-7.3%)

e978dec test(core): Add regression tests for base64 truncation and lastIndex safety

Ubuntu: 2.27s (±0.02s) → 2.12s (±0.03s) · -0.15s (-6.7%)
macOS: 1.15s (±0.05s) → 1.09s (±0.04s) · -0.05s (-4.5%)
Windows: 2.63s (±0.10s) → 2.44s (±0.02s) · -0.19s (-7.2%)

3e70628 refactor(core): Separate lightweight transforms from worker processing

Ubuntu: 2.17s (±0.02s) → 2.04s (±0.02s) · -0.14s (-6.4%)
macOS: 1.14s (±0.12s) → 1.11s (±0.20s) · -0.03s (-2.5%)
Windows: 2.80s (±0.12s) → 2.65s (±0.10s) · -0.15s (-5.4%)

d77d758 perf(core): Skip worker pool for lightweight file processing

Ubuntu: 2.19s (±0.02s) → 2.05s (±0.02s) · -0.15s (-6.7%)
macOS: 1.17s (±0.06s) → 1.14s (±0.10s) · -0.03s (-2.9%)
Windows: 2.75s (±0.06s) → 2.56s (±0.05s) · -0.19s (-6.9%)


@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 28, 2026

Deploying repomix with Cloudflare Pages

Latest commit: 6fecdca
Status: ✅  Deploy successful!
Preview URL: https://e2ebb05e.repomix.pages.dev
Branch Preview URL: https://perf-skip-worker-pool-lightw-ibtg.repomix.pages.dev

View logs

@codecov

codecov bot commented Mar 28, 2026

Codecov Report

❌ Patch coverage is 96.36364% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.14%. Comparing base (d762d38) to head (6fecdca).
⚠️ Report is 15 commits behind head on main.

Files with missing lines        Patch %   Lines
src/core/file/fileProcess.ts    96.07%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1338      +/-   ##
==========================================
+ Coverage   87.13%   87.14%   +0.01%     
==========================================
  Files         115      115              
  Lines        4367     4389      +22     
  Branches     1015     1020       +5     
==========================================
+ Hits         3805     3825      +20     
- Misses        562      564       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.



Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/core/file/fileProcess.ts (2)

29-59: Don’t use dependency injection as the mode switch.

deps === null now doubles as “use the fast path”, while processFilesMainThread() hard-codes the module-level getFileManipulator. Any caller that injects deps loses the optimization, and this branch becomes awkward to cover with mocked manipulators. Resolve deps first and pass the resolved manipulator into the helper instead; if tests need workers, make that an explicit flag. Longer term, a shared per-file transform helper reused by src/core/file/workers/fileProcessWorker.ts would keep both branches in sync.

♻️ Suggested refactor
 const processFilesMainThread = async (
   rawFiles: RawFile[],
   config: RepomixConfigMerged,
   progressCallback: RepomixProgressCallback,
+  resolveFileManipulator: GetFileManipulator,
 ): Promise<ProcessedFile[]> => {
@@
     if (config.output.removeEmptyLines) {
-      const manipulator = getFileManipulator(rawFile.path);
+      const manipulator = resolveFileManipulator(rawFile.path);
       if (manipulator) {
         content = manipulator.removeEmptyLines(content);
       }
     }
@@
-  if (!needsWorkerProcessing(config) && deps === null) {
-    return processFilesMainThread(rawFiles, config, progressCallback);
-  }
-
   const resolvedDeps = deps ?? {
     initTaskRunner,
     getFileManipulator,
   };
+
+  if (!needsWorkerProcessing(config)) {
+    return processFilesMainThread(
+      rawFiles,
+      config,
+      progressCallback,
+      resolvedDeps.getFileManipulator,
+    );
+  }

Also applies to: 89-103

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/file/fileProcess.ts` around lines 29 - 59, processFilesMainThread
currently treats deps === null as the “fast path” and directly calls the
module-level getFileManipulator, which makes behavior diverge when callers
inject deps; resolve the deps up-front and pass the concrete manipulator (or a
per-file manipulator factory) into processFilesMainThread instead of calling
getFileManipulator internally, so callers that pass deps keep the same optimized
path; make worker-vs-main an explicit flag or parameter (not implicit via deps)
and extract the per-file transform into a shared helper used by both
processFilesMainThread and the worker (see getFileManipulator and the loop logic
in processFilesMainThread and corresponding worker code) so both branches share
the same resolved manipulator behavior.

19-20: Make the worker decision file-aware.

When compress is false and removeComments is true, this still forces the worker pool even if every input path is unsupported and manipulator lookup would return null. Since that lookup is only an extension check, you can cheaply gate on rawFiles here and keep the fast path for Markdown/JSON-only repos too.

Also applies to: 96-97

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/file/fileProcess.ts` around lines 19 - 20, The worker decision in
needsWorkerProcessing currently ignores the repository's files and returns true
whenever compress or removeComments is set; change needsWorkerProcessing to also
accept the repository file list (rawFiles) and only return true if
(config.output.compress || config.output.removeComments) AND at least one file
in rawFiles would be handled by a manipulator (i.e., perform the same cheap
extension-based manipulator lookup used elsewhere for each file and return true
if any lookup is non-null). Update the function signature
(needsWorkerProcessing) and any call sites (also the usage around the similar
check at the other spot mentioned) to pass rawFiles so the fast path is
preserved for Markdown/JSON-only repos.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94d5ab5c-cc6c-40bd-bf71-c0bc91495487

📥 Commits

Reviewing files that changed from the base of the PR and between d762d38 and d77d758.

📒 Files selected for processing (1)
  • src/core/file/fileProcess.ts

Extract lightweight file transforms (truncateBase64, removeEmptyLines,
trim, showLineNumbers) into applyLightweightTransforms() on the main
thread, keeping only heavy operations (removeComments, compress) in
worker processContent(). This eliminates dual management of the same
logic across worker and main thread paths.

Also pre-compile base64 regex patterns at module level to avoid
re-creation per file call.

Action: split processContent into heavy (worker) and lightweight (main thread) phases
Action: extract applyLightweightTransforms() as single source of truth for lightweight ops
Action: hoist regex patterns in truncateBase64.ts to module scope with lastIndex reset
Why: lightweight transforms were duplicated in both processFilesMainThread and processContent
Why: regex re-compilation per file added unnecessary overhead for large repos

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
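The regex hoisting mentioned in this commit can be sketched as below. The pattern itself is an illustrative stand-in for repomix's actual base64 detection, not the real one from truncateBase64.ts.

```typescript
// Sketch of a module-level (pre-compiled) global regex with an explicit
// lastIndex reset. The pattern is an assumed stand-in, not repomix's
// actual base64 matcher.
const BASE64_PATTERN = /[A-Za-z0-9+/]{40,}={0,2}/g;

function truncateBase64Content(content: string, keep = 8): string {
  // A /g regex is stateful: test() and exec() advance lastIndex between
  // calls. String.replace resets it on its own, but resetting defensively
  // keeps consecutive calls safe if the shared pattern is ever reused
  // with test/exec elsewhere.
  BASE64_PATTERN.lastIndex = 0;
  return content.replace(
    BASE64_PATTERN,
    (match) => `${match.slice(0, keep)}...[truncated]`,
  );
}
```

Hoisting avoids re-compiling the pattern once per file, which is the per-call overhead the commit message refers to; the lastIndex reset is what the regression tests in the next commit exercise with consecutive calls.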

…safety

Add test for consecutive truncateBase64Content calls to verify global
regex lastIndex reset works correctly. Add test for truncateBase64
config branch in applyLightweightTransforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ss phases

Split applyLightweightTransforms into applyPreCompressTransforms and
applyPostCompressTransforms to preserve the original execution order:
truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers

Pre-compress transforms (truncateBase64, removeEmptyLines) must run
before tree-sitter parsing to avoid performance regression with large
base64 strings and to ensure empty line removal affects chunk merging.

Action: split lightweight transforms into pre-compress and post-compress phases
Why: previous refactor changed execution order, causing tree-sitter to receive
untreated base64 and content with empty lines, altering compress output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move removeEmptyLines from applyPreCompressTransforms to
applyPostCompressTransforms so it runs after removeComments.
This ensures empty lines created by comment removal are cleaned up.

Transform order: truncateBase64 (pre) → [removeComments → compress] (worker) → removeEmptyLines → trim → showLineNumbers (post)

Simplify applyPreCompressTransforms to only handle truncateBase64
with an early return when disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…emove redundant trim

Merge applyPreCompressTransforms and applyPostCompressTransforms into
a single applyLightweightTransforms function. Move truncateBase64 to
post-worker phase since tree-sitter handles string literals as single
AST nodes regardless of content size.

Remove redundant trim from worker processContent — the main thread
applyLightweightTransforms already handles it.

Final pipeline:
  Worker: removeComments → compress
  Main:   truncateBase64 → removeEmptyLines → trim → showLineNumbers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
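The final pipeline from this commit can be sketched end to end. Everything here is a simplified assumption: the heavy phase is synchronous (the real one runs tree-sitter in the worker pool), and the transform bodies are minimal stand-ins that only demonstrate the ordering.

```typescript
// Sketch of the final two-phase pipeline. The transform bodies are
// simplified stand-ins; in the real code the heavy phase runs in workers.
interface Config {
  removeComments: boolean;
  compress: boolean;
  truncateBase64: boolean;
  removeEmptyLines: boolean;
  showLineNumbers: boolean;
}

// Heavy phase (worker in the real code): removeComments → compress.
function processContentHeavy(content: string, config: Config): string {
  if (config.removeComments) content = content.replace(/\/\/.*$/gm, "");
  if (config.compress) {
    // tree-sitter compression elided in this sketch
  }
  return content;
}

// Lightweight phase (main thread): truncateBase64 → removeEmptyLines →
// trim → showLineNumbers, in that order.
function applyLightweightTransforms(content: string, config: Config): string {
  if (config.truncateBase64) {
    content = content.replace(/[A-Za-z0-9+/]{40,}={0,2}/g, (m) => `${m.slice(0, 8)}...`);
  }
  if (config.removeEmptyLines) {
    content = content.split("\n").filter((l) => l.trim() !== "").join("\n");
  }
  content = content.trim();
  if (config.showLineNumbers) {
    content = content.split("\n").map((l, i) => `${i + 1}: ${l}`).join("\n");
  }
  return content;
}

function processFile(content: string, config: Config): string {
  return applyLightweightTransforms(processContentHeavy(content, config), config);
}
```

Note how removeEmptyLines running after the heavy phase cleans up blank lines left behind by comment removal, which is exactly the ordering concern the earlier commits iterated on.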

Replace the needsWorkerProcessing function with a local const
variable inside processFiles for simplicity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Contributor

claude bot commented Mar 28, 2026

Code Review — PR #1338

Overall this is a well-motivated optimization with clear benchmark wins. The two-phase pipeline (heavy worker → lightweight main thread) is a clean architectural split. A few items worth considering:

1. Transform ordering changed when workers are active ⚠️

This is the most notable behavioral change. The old pipeline was:

truncateBase64 → removeComments → removeEmptyLines → trim → compress → showLineNumbers

The new pipeline (when workers are used) is:

[removeComments → compress] (worker) → truncateBase64 → removeEmptyLines → trim → showLineNumbers (main)

Three transforms that previously ran before tree-sitter (truncateBase64, removeEmptyLines, trim) now run after it.

Details
  • trim() removed from worker: Previously, processContent called .trim() before compress/tree-sitter parsing. Now trim only runs after workers return. If tree-sitter parsing is sensitive to leading/trailing whitespace (e.g., indentation-significant languages), this could produce different compressed output.
  • truncateBase64 after compress: Large base64 strings are now parsed by tree-sitter before truncation. Commit 6 justifies this with "tree-sitter handles string literals as single AST nodes regardless of content size" — plausible for most languages, but worth validating that compressed output is identical for a representative set of files.
  • removeEmptyLines after compress: Tree-sitter compression now operates on content that still has empty lines, which may produce different compressed output than before.

The commit history (7 commits) shows iterative back-and-forth on this ordering, suggesting it was tricky to get right. An integration test verifying compressed output is identical before/after this change would give confidence.

2. Missing test for combined worker + lightweight pipeline

The worker-path test ("should process multiple files with worker path") sets removeComments: true but doesn't enable any lightweight transforms. There's no integration test verifying the combined pipeline (e.g., removeComments: true + truncateBase64: true + showLineNumbers: true). A combined test would catch ordering regressions.

Details

Also worth adding:

  • A transform ordering test where multiple transforms interact (e.g., removeEmptyLines + showLineNumbers together) to verify empty lines are removed before line numbers are assigned.
  • A negative assertion in processContent tests confirming removeEmptyLines is not called even when configured, to guard against accidental re-introduction.

3. deps parameter convention (minor)

Details

The processFiles deps parameter changed from the project's established pattern:

// Before (project convention):
deps: { initTaskRunner, getFileManipulator } = { initTaskRunner, getFileManipulator }

// After:
deps: { initTaskRunner, getFileManipulator } | null = null

This works fine but departs from the dependency injection idiom used elsewhere in the codebase. The original default-parameter pattern was arguably cleaner and consistent with other functions.

Additionally, applyLightweightTransforms takes deps as a required parameter with no default, which is inconsistent with the convention of providing defaults for testability convenience.

Security & Performance

No security concerns. Performance approach is sound — skipping IPC for lightweight-only configs is the right call, and benchmarks confirm the win.


🤖 Generated with Claude Code


yamadashy and others added 3 commits March 28, 2026 18:19
Revert deps parameter from `| null = null` with internal resolution
to the standard `deps = { ... }` default parameter pattern used
throughout the codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test that exercises all transforms together: removeComments (worker)
+ truncateBase64 + removeEmptyLines + showLineNumbers (lightweight) to
verify the full two-phase pipeline produces correct output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yamadashy yamadashy merged commit 28f93e9 into main Mar 28, 2026
61 checks passed
@yamadashy yamadashy deleted the perf/skip-worker-pool-lightweight-v2 branch March 28, 2026 09:31
@claude
Contributor

claude bot commented Mar 28, 2026

Code Review — PR #1338 (Updated)

Good performance optimization with clear benchmark wins (~5-7% across platforms). The two-phase pipeline is architecturally clean. Since the previous review, the code has been significantly refined (deps pattern restored, transforms unified into a single function). Here are remaining findings:

1. processContent is now a partial processor — fragile contract ⚠️

processContent no longer produces fully processed output — it only handles removeComments and compress, requiring applyLightweightTransforms to finish the job. However, it's still exported and called by workers. The function name doesn't convey that it only does "heavy" processing. If a future contributor calls processContent directly, they'll get incompletely processed output with no compile-time guard.

Suggestion: Consider renaming to processContentHeavy or adding a @internal TSDoc tag to clarify the contract.

2. Silent progress callback during lightweight phase

files = applyLightweightTransforms(files, config, () => {}, deps);

When workers are used, the lightweight phase runs with a no-op progress callback. For very large repos, this synchronous loop could take noticeable time with no progress indication. Additionally, pc.dim(file.path) string construction still occurs every 50 files even though the callback discards it.

Suggestion: Either forward progressCallback or guard the string construction:

if (progressCallback && ((i + 1) % 50 === 0 || i === totalFiles - 1)) {
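A runnable version of this guard might look like the sketch below. The names (progressCallback, totalFiles) mirror the review snippet, while the loop body, the dim() helper (a stand-in for picocolors' pc.dim), and the return value are illustrative assumptions.

```typescript
// Sketch of the guarded progress callback: with no callback, the message
// string (including the dim() path formatting) is never constructed.
// dim() is a stand-in for picocolors' pc.dim.
type ProgressCallback = ((message: string) => void) | undefined;

const dim = (s: string) => `\u001b[2m${s}\u001b[22m`;

// Returns how many progress messages were actually built, to make the
// "no wasted string construction" property observable.
function reportProgress(paths: string[], progressCallback: ProgressCallback): number {
  let messagesBuilt = 0;
  const totalFiles = paths.length;
  for (let i = 0; i < totalFiles; i++) {
    // Guard first: short-circuit before any formatting work happens.
    if (progressCallback && ((i + 1) % 50 === 0 || i === totalFiles - 1)) {
      messagesBuilt++;
      progressCallback(`Processing file... (${i + 1}/${totalFiles}) ${dim(paths[i])}`);
    }
  }
  return messagesBuilt;
}
```

With 120 files and a callback, messages fire at files 50, 100, and 120; with no callback, zero strings are built.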

3. applyLightweightTransforms missing deps default — convention deviation

Details

The project convention is to provide defaults for deps parameters so functions are independently callable/testable. applyLightweightTransforms requires deps explicitly with no default, while being a public export. Consider adding a default:

deps = { getFileManipulator }

4. Worker decision is all-or-nothing (future optimization opportunity)

Details

useWorkers = config.output.compress || config.output.removeComments sends ALL files through workers even if every file is an unsupported type (e.g., Markdown-only repo with removeComments: true). A per-file check against getFileManipulator would allow the lightweight path for repos where no file actually benefits from worker processing. This is a follow-up optimization, not a blocker.

5. Test gap: worker-skip path not asserted

Details

The "should process files with lightweight-only config" test verifies correct output but doesn't assert that workers were actually skipped. Since the core claim of this PR is skipping workers for lightweight configs, a spy/assertion on initTaskRunner not being called would strengthen confidence in the optimization.
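The worker-skip assertion suggested here can be sketched with a hand-rolled spy. The processFiles stub and Deps shape below are illustrative assumptions standing in for the real function, which accepts deps = { initTaskRunner, getFileManipulator }; the point is only the pattern of asserting that initTaskRunner is never invoked on the fast path.

```typescript
// Sketch of asserting the worker pool is skipped: inject a spying
// initTaskRunner and count its calls. processFiles here is a simplified
// stand-in for the real repomix function.
interface Deps {
  initTaskRunner: () => { run: (content: string) => string };
}

function processFiles(
  contents: string[],
  output: { compress: boolean; removeComments: boolean },
  deps: Deps,
): string[] {
  const useWorkers = output.compress || output.removeComments;
  if (!useWorkers) {
    return contents.map((c) => c.trim()); // lightweight main-thread path
  }
  const runner = deps.initTaskRunner();
  return contents.map((c) => runner.run(c).trim());
}

// Minimal spy: records how many times initTaskRunner was invoked.
function makeSpy(): { deps: Deps; calls: () => number } {
  let count = 0;
  const deps: Deps = {
    initTaskRunner: () => {
      count++;
      return { run: (c: string) => c };
    },
  };
  return { deps, calls: () => count };
}
```

In the actual test suite this would likely use the framework's mock facility instead of a hand-rolled counter, but the assertion is the same: zero calls for a lightweight-only config.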

Overall

Clean refactoring with real performance benefits backed by benchmarks. The code has improved significantly through the commit series. The main actionable items are (1) clarifying the processContent contract and (2) the silent progress callback. Everything else is minor or future-work.

🤖 Generated with Claude Code
