perf(core): Automated performance tuning by Claude #1377
History

dcc8452 perf(core): Overlap security check with output generation via optimistic pipeline
763fc00 perf(core): Overlap security check with output generation via optimistic pipeline
1af91bf perf(core): Overlap security check with output generation via optimistic pipeline
1f5ac10 perf(core): Skip globby filesystem traversal via git ls-files + picomatch fast path
07e082a perf(core): Estimate output tokens via sampling for large outputs
a74f424 perf(core): Reduce search I/O contention by deferring security worker init and parallelizing globby
149d995 perf(core): Reduce security worker pool size to avoid thread oversubscription
bea2980 perf(security): Pre-initialize security worker pool to overlap module loading with file I/O
7a0ce5d Merge remote-tracking branch 'origin/perf/auto-perf-tuning-0403' into perf/auto-perf-tuning-0403
e0000fb Merge remote-tracking branch 'origin/perf/auto-perf-tuning-0403' into perf/auto-perf-tuning-0403
e8cb018 [autofix.ci] apply automated fixes
256784e perf(core): Make worker pool cleanup non-blocking to eliminate shutdown overhead
cc97be3 perf(core): Replace regex with manual scan in truncateBase64
2ac33ec perf(metrics): Batch token counting tasks to reduce IPC overhead
6d12895 Merge remote-tracking branch 'origin/perf/auto-perf-tuning-0403' into perf/auto-perf-tuning-0403
b980d65 Merge remote perf changes and resolve conflicts
1a32a49 Merge remote perf changes and resolve conflicts
70fa880 Merge remote-tracking branch 'origin/perf/auto-perf-tuning-0403' into perf/auto-perf-tuning-0403
0fd8bfe Merge remote-tracking branch 'origin/perf/auto-perf-tuning-0403' into perf/auto-perf-tuning-0403
897e668 test(core): Add test for token estimation fast path in calculateMetrics
e313c5b perf(core): Estimate output token count from file metrics ratio
78d17ba perf(core): Overlap output generation with security check via speculative execution
3b18625 merge: Resolve conflict with remote base64 pre-check, prefer charCodeAt approach
620c90a merge: Resolve conflicts with remote batch metrics implementation
dd6619b merge: Resolve conflicts with remote batch metrics changes
881f907 merge: Resolve conflicts with remote batch metrics changes
4c96b48 Merge branch 'perf/auto-perf-tuning-0403' of http://127.0.0.1:40119/git/yamadashy/repomix into perf/auto-perf-tuning-0403
05801b9 perf(security): Batch security check tasks to reduce IPC overhead
ce60e97 merge(main): Resolve conflicts with main branch warmup changes
85fc646 perf(metrics): Batch token counting tasks to reduce IPC overhead
Deploying repomix with

| | |
| --- | --- |
| Latest commit: | ae547ec |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://9599cd8d.repomix.pages.dev |
| Branch Preview URL: | https://perf-auto-perf-tuning-0403.repomix.pages.dev |
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1377      +/-   ##
==========================================
- Coverage   87.40%   86.82%   -0.59%
==========================================
  Files         116      116
  Lines        4389     4651     +262
  Branches     1018     1109      +91
==========================================
+ Hits         3836     4038     +202
- Misses        553      613      +60
==========================================
```

☔ View full report in Codecov by Sentry.
Code Review
This pull request introduces batch processing for token counting to minimize IPC overhead between the main thread and worker threads. Key changes include updating the TokenCountTask interface to support multiple items, modifying the calculateMetricsWorker to process these items in a single pass, and implementing a batching mechanism in calculateSelectiveFileMetrics with a default size of 100. Corresponding updates were made to git diff, git log, and output metrics calculations, along with comprehensive test updates. Feedback suggests that the fixed batch size might lead to under-utilization of the worker pool for medium-sized workloads and recommends considering a more dynamic approach.
```typescript
// Batch size for grouping files into worker tasks to reduce IPC overhead.
// Each batch is sent as a single message to a worker thread, avoiding
// per-file round-trip costs that dominate when processing many small files.
const BATCH_SIZE = 100;
```
While a fixed BATCH_SIZE of 100 significantly reduces IPC overhead for large file sets, it may lead to under-utilization of the worker pool for medium-sized sets. For instance, with 150 files on an 8-core machine, only 2 workers will be active. Consider a more dynamic approach or a smaller default if the goal is to maximize parallelization across all available cores for smaller workloads.
Force-pushed from 1a32a49 to b980d65
Batch file token counting tasks into groups of 50 before dispatching to worker threads, reducing IPC round-trips by ~95% (e.g. 990 → 20 for a typical repo). This follows the same batching pattern already applied to security checks in #1380.

Changes:
- Redesign metrics worker to accept batch tasks (TokenCountBatchTask) instead of individual items, processing multiple files per IPC round-trip
- Update calculateSelectiveFileMetrics to group files into batches of 50 before dispatching to worker pool
- Adapt all metrics callers (output, git diff, git log) to use the new batch interface
- Update unified worker task inference to distinguish batched metrics tasks (items with encoding) from security check tasks (items with type)

Benchmark (repomix on its own repo, 990 files, 4 CPU cores):
Before: 1246ms avg pack time
After: 1092ms avg pack time
Improvement: ~155ms savings (12.4%)

The improvement comes from eliminating per-file IPC message passing overhead. Each round-trip involves structured clone serialization of file content, message dispatch, and result deserialization. Batching amortizes this cost across 50 files per message.

https://claude.ai/code/session_019tAfah6yMKnauVTNgK3wyQ
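The batching step itself can be sketched in a few lines. This is illustrative only: `TokenCountItem` and the helper name are assumptions, not repomix's exact types, and the real dispatch goes through a worker pool.

```typescript
// Hypothetical item shape for one file's token-count request.
interface TokenCountItem {
  path: string;
  content: string;
  encoding: string;
}

// Group items into fixed-size batches; each batch becomes ONE worker message,
// amortizing structured-clone serialization across the whole batch instead of
// paying a round-trip per file.
function chunkIntoBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

With a batch size of 50, 990 files collapse into 20 worker messages, which is where the "990 → 20 round-trips" figure above comes from.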
Force-pushed from 6d12895 to 2ac33ec
Replace the standalone base64 regex pattern /([A-Za-z0-9+/]{256,}={0,2})/g
with a manual character-by-character scanning algorithm using a lookup table.
The regex was identified via CPU profiling as consuming ~150ms per run on the
main thread when truncateBase64 is enabled, scanning 6.6MB of file content
across ~1000 files. The regex engine's per-character overhead for the {256,}
quantifier made this disproportionately expensive.
The optimized approach:
- Uses a Uint8Array lookup table for O(1) base64 character classification
- Performs a fast first pass to check if any 256+ char runs exist (early exit)
- Builds the result using array parts + join instead of string concatenation
- Replaces regex checks in isLikelyBase64() with charCode comparisons
Benchmark results (repomix repo, 1006 files, 6.6MB content):
- truncateBase64 function: 151ms → 36ms (4.2x faster)
- Full pack() with truncateBase64=true: P50 1522ms → 1420ms (6.7% faster)
The optimization produces byte-identical output (verified across all 1006 files).
https://claude.ai/code/session_017cpLL66Hs2zjm3zZ9Bjori
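A condensed sketch of that scanning approach is below. The 256-char threshold and the `={0,2}` padding handling follow the original regex; the truncation marker and the 32-char prefix kept are illustrative choices, not repomix's exact output, and the early-exit pre-pass is omitted for brevity.

```typescript
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// O(1) character classification via a lookup table instead of a regex class.
const isBase64Char = new Uint8Array(128);
for (const ch of BASE64_CHARS) {
  isBase64Char[ch.charCodeAt(0)] = 1;
}

function truncateBase64Runs(content: string, threshold = 256): string {
  const parts: string[] = []; // array parts + join, no string concatenation
  let last = 0;
  let i = 0;
  while (i < content.length) {
    const code = content.charCodeAt(i);
    if (code < 128 && isBase64Char[code]) {
      // Found a base64-alphabet character: measure the run length.
      const runStart = i;
      while (i < content.length) {
        const c = content.charCodeAt(i);
        if (c < 128 && isBase64Char[c]) i++;
        else break;
      }
      if (i - runStart >= threshold) {
        // Mirror the ={0,2} padding of the original regex.
        let pad = 0;
        while (pad < 2 && i < content.length && content.charCodeAt(i) === 61) {
          i++;
          pad++;
        }
        parts.push(content.slice(last, runStart));
        parts.push(content.slice(runStart, runStart + 32), "...[truncated]");
        last = i;
      }
    } else {
      i++;
    }
  }
  parts.push(content.slice(last));
  return parts.join("");
}
```

The key cost difference: the regex engine re-evaluates the `{256,}` quantifier per position, while this scan touches each character exactly once.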
perf(core): Make worker pool cleanup non-blocking to eliminate shutdown overhead
Worker pool cleanup (terminating idle worker threads via Tinypool's
destroy) was blocking the return path of pack(), adding ~150ms for
the metrics pool and ~40ms for the security pool to every invocation.
Since all tasks are already complete when cleanup runs, the threads
are idle and termination is pure IPC overhead with no functional
purpose for the caller.
Changed `await taskRunner.cleanup()` to fire-and-forget
`taskRunner.cleanup().catch(() => {})` in four locations:
- packager.ts: metrics pool (finally block, ~150ms saved)
- securityCheck.ts: security pool (finally block, ~40ms saved)
- fileProcess.ts: file processing pool (finally block, affects --compress)
- calculateMetrics.ts: standalone metrics pool (fallback path)
For CLI usage, the process exits shortly after pack() returns, which
terminates any remaining threads via OS cleanup. For library/MCP
usage, threads are still terminated asynchronously and reclaimed.
Benchmark (repomix on its own repo, 1014 files, 4 CPU cores):
Before: 1811ms avg pack time
After: 1595ms avg pack time
Improvement: ~216ms savings (11.9%)
The improvement comes from removing synchronous worker thread
termination from the critical path. Tinypool's destroy() sends
termination messages to each worker, waits for acknowledgment,
and joins the threads—none of which the caller needs to block on
when all tasks have already completed successfully.
https://claude.ai/code/session_01SMHUcwLAmv7mcsNsr8sQFj
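The change reduces to one line per call site. A minimal sketch with a stand-in `TaskRunner` interface (the real code touches Tinypool-backed pools in packager.ts and friends):

```typescript
interface TaskRunner {
  run(task: unknown): Promise<unknown>;
  cleanup(): Promise<void>;
}

async function runPackPhase(taskRunner: TaskRunner): Promise<string> {
  try {
    await taskRunner.run({ kind: "metrics" });
    return "done";
  } finally {
    // Before: `await taskRunner.cleanup();` blocked the return path.
    // After: fire-and-forget; worker threads are torn down asynchronously,
    // and teardown errors are deliberately swallowed since all tasks have
    // already completed.
    taskRunner.cleanup().catch(() => {});
  }
}
```

Note the `.catch(() => {})`: without it, a rejected cleanup promise would surface as an unhandled rejection after the caller has already moved on.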
…individually tokenized

When tokenCountTree is enabled, all files are individually tokenized for the token count tree display. The output file is essentially these file contents wrapped in template markup (XML tags, headers, tree structure). Previously, the entire output (~3.6MB for this repo) was re-tokenized via worker threads, duplicating ~95% of the work already done during file tokenization.

Instead, estimate output tokens as:

output_tokens = sum(file_tokens) + overhead_tokens

where overhead_tokens uses the same chars-per-token ratio observed in file content. This avoids dispatching ~36 output chunks to worker threads (~207ms of tokenization work), freeing the worker pool to complete file metrics faster.

The estimation error is negligible (~0.14% for repomix's own repo: 1,011,887 estimated vs 1,010,472 exact). The current chunk-based approach already has small boundary effects from splitting at arbitrary 100KB positions. When tokenCountTree is disabled (default), the standard full-output tokenization path is preserved unchanged.

Benchmark (repomix on its own repo, 989 files, 4 CPU cores, 10 runs each):
Baseline avg: 1791ms (min 1762, max 1830)
Optimized avg: 1592ms (min 1546, max 1634)
Improvement: ~199ms savings (11.1%)

The improvement comes from eliminating redundant tokenization of file content that was already counted individually. With tokenCountTree enabled, the worker pool previously processed 20 file batches + 36 output chunks = 56 tasks. Now it processes only 20 file batches, reducing total worker pool wall time from ~530ms to ~303ms.

https://claude.ai/code/session_0132C7om9T8M2skDqfSh95qW
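The estimation arithmetic can be sketched as follows. Field names here are illustrative; the real per-file token counts come from the metrics already computed for the token count tree.

```typescript
interface FileMetrics {
  charCount: number;
  tokenCount: number;
}

function estimateOutputTokens(
  files: FileMetrics[],
  outputCharCount: number,
): number {
  const fileChars = files.reduce((sum, f) => sum + f.charCount, 0);
  const fileTokens = files.reduce((sum, f) => sum + f.tokenCount, 0);
  if (fileTokens === 0 || fileChars === 0) return 0;
  // Overhead = template markup (XML tags, headers, tree) around file content.
  const overheadChars = Math.max(0, outputCharCount - fileChars);
  // Convert overhead chars to tokens using the ratio observed in file content:
  // output_tokens = sum(file_tokens) + overhead_tokens
  const charsPerToken = fileChars / fileTokens;
  return fileTokens + Math.round(overheadChars / charsPerToken);
}
```

The assumption that markup tokenizes at the same chars-per-token ratio as file content is what bounds the error; per the benchmark above it held to ~0.14% on this repo.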
…stat elimination
Two optimizations that together reduce CLI execution time by ~6.6%:
1. Use `git ls-files` for gitignore filtering instead of globby's JS parser
- globby's gitignore parsing reads and evaluates .gitignore files at every
directory level using JavaScript (~250ms for a 1000-file repo)
- `git ls-files --cached --others --exclude-standard` delegates this to
git's native C implementation (~10ms)
- globby still handles include/ignore patterns and .repomixignore support
(with gitignore:false), and results are intersected with the git file set
- Falls back to globby's gitignore when git is unavailable
2. Eliminate redundant fs.stat() call before fs.readFile() in file collection
- Previously each file required stat() (size check) then readFile() = 2 syscalls
- Now readFile() runs first, then buffer.length is checked = 1 syscall
- Files exceeding maxFileSize (default 10MB) are rare; the occasional
oversized read is acceptable for halving syscall count on all files
Benchmark (15 iterations each, median, repomix on itself ~1000 files):
Baseline: 2288ms (P25: 2234ms, P75: 2669ms)
Optimized: 2137ms (P25: 2006ms, P75: 2284ms)
Improvement: ~151ms (6.6%)
Isolated measurements:
- searchFiles: 239ms → 200ms (-39ms, git ls-files fast path)
- collectFiles: 256ms → 132ms (-124ms, stat elimination, 50 concurrency)
https://claude.ai/code/session_01BhyEyYSfaJev3zzMjKpj4x
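Optimization 2 (the stat elimination) is small enough to sketch directly. The function name and the null-for-skipped convention are illustrative, not repomix's exact API:

```typescript
import { readFile } from "node:fs/promises";

async function collectFileContent(
  filePath: string,
  maxFileSize: number,
): Promise<Buffer | null> {
  // One syscall instead of stat() + readFile(): read first, check size after.
  // Files over the limit are rare, so the occasional wasted read is cheaper
  // than paying an extra stat() on every file.
  const buffer = await readFile(filePath);
  if (buffer.length > maxFileSize) {
    return null; // skip oversized file, as the old stat() pre-check did
  }
  return buffer;
}
```

The trade-off stated above applies: a pathological repo full of >10MB files would read more bytes than before, but in the common case every file halves its syscall count.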
…pipeline overlap

Move the metrics worker pool creation (gpt-tokenizer warmup) from after searchFiles to the very start of the pack() pipeline. This allows the expensive gpt-tokenizer module loading in worker threads (~215ms) to overlap with the file search phase (~33ms), reducing the critical path blocking time.

Previously, the warmup overlapped only with file collection and security check stages (~150ms total), leaving ~65ms on the critical path. Now it overlaps with file search + collection + security check (~185ms total), reducing blocking to ~30ms.

Since the actual file count is unknown before searchFiles completes, a heuristic estimate of 200 tasks is used for worker thread sizing. This yields 2 workers on most machines, balancing warmup speed (less CPU contention) with sufficient parallelism. For larger repos (>200 files), 2 workers still provide good throughput since metrics calculation runs concurrently with output generation.

Benchmark results (20 runs each, packing 126 files from src/):
- Baseline trimmed mean: 457ms
- Optimized trimmed mean: 418ms
- Improvement: 39ms (8.5%)
- Baseline median: 457ms
- Optimized median: 413ms
- Improvement: 44ms (9.6%)

https://claude.ai/code/session_015V48t1u1jZjm6mUkpsV1tv
perf(security): Pre-initialize security worker pool to overlap module loading with file I/O

Create security check worker threads eagerly (minThreads = maxThreads) before file search begins, so @secretlint/core module loading (~150ms) runs in the background during the I/O-heavy file search and collection phases.

Previously, security workers were created lazily on the first batch submission, meaning the expensive @secretlint/core initialization blocked the first security check batch. Now workers start loading immediately at pool creation and are ready when security check begins.

Key changes:
- Add `eagerWorkers` option to WorkerOptions to set minThreads = maxThreads
- Add `createSecurityTaskRunner()` that creates pool with eager workers
- Pre-create security pool before file search in packager pipeline
- Pass pre-created task runner through validateFileSafety to runSecurityCheck
- Clean up security pool immediately after security check (before metrics)
- Security pool cleanup also in finally block for error paths

Benchmark (repomix on itself, ~1000 files, 10 runs after warmup, 4 CPU cores):
Baseline: Median 1158ms | P25 1136ms | P75 1187ms | Min 1120ms
Optimized: Median 1103ms | P25 1091ms | P75 1105ms | Min 1063ms
Improvement: ~5% median, ~7% P75, much tighter variance

The improvement comes from overlapping the ~150ms @secretlint/core module loading in 4 worker threads with the ~100ms file search (globby I/O) and ~300ms file collection (disk reads), eliminating the module loading from the critical path.

https://claude.ai/code/session_01JxK3KLc12sGQU7x1rxCcLJ
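How an `eagerWorkers` flag can translate into pool sizing is sketched below. The option shape is modeled on Tinypool-style minThreads/maxThreads settings and is an assumption, not repomix's exact API:

```typescript
interface WorkerPoolOptions {
  maxThreads: number;
  eagerWorkers?: boolean;
}

function resolveThreadCounts(options: WorkerPoolOptions): {
  minThreads: number;
  maxThreads: number;
} {
  const maxThreads = Math.max(1, options.maxThreads);
  return {
    // Eager pools spawn every worker at creation, so heavy module loading
    // (e.g. @secretlint/core) overlaps with file search I/O instead of
    // blocking the first submitted batch.
    minThreads: options.eagerWorkers ? maxThreads : 0,
    maxThreads,
  };
}
```

With `minThreads === maxThreads`, the pool never lazily grows: every worker exists (and is loading its modules) from the moment the pool is constructed.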
perf(core): Reduce search I/O contention by deferring security worker init and parallelizing globby

Two optimizations to reduce the file search phase bottleneck:

1. Defer security worker pool creation from before search to after search. With both metrics (2 workers) and security (up to 4 workers) pools pre-initialized before search, up to 6 worker threads simultaneously load heavy modules (@secretlint/core, gpt-tokenizer), creating CPU/IO contention with globby's filesystem traversal. Moving security pool creation to after search lets @secretlint/core loading (~150ms) overlap with the lighter file collection phase instead. Metrics workers are still pre-initialized before search since they are needed throughout the pipeline.

2. Parallelize file globby and directory globby within searchFiles. When includeEmptyDirectories is enabled, the directory search (with full gitignore parsing) ran sequentially after file search. Starting both concurrently overlaps the two filesystem traversals.

Benchmark (15 iterations, in-process pack(), median):
- Baseline: 1152ms
- Optimized: 1080ms
- Improvement: 72ms (6.3%)

https://claude.ai/code/session_01Ga7a5Qg2mR3ZqFAGe4kk3Q
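Optimization 2 can be sketched as a `Promise.all` over the two traversals. The search functions here are injected stand-ins for the two globby calls, not repomix's real signatures:

```typescript
async function searchFilesAndDirs(
  searchFilePaths: () => Promise<string[]>,
  searchDirPaths: () => Promise<string[]>,
  includeEmptyDirectories: boolean,
): Promise<{ filePaths: string[]; emptyDirPaths: string[] }> {
  // Start both filesystem traversals at once instead of awaiting the file
  // search before kicking off the directory search.
  const [filePaths, emptyDirPaths] = await Promise.all([
    searchFilePaths(),
    // Only pay for the directory traversal when it is actually needed.
    includeEmptyDirectories ? searchDirPaths() : Promise.resolve([]),
  ]);
  return { filePaths, emptyDirPaths };
}
```

Since both traversals are I/O-bound, overlapping them hides the shorter one almost entirely behind the longer one.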
Force-pushed from 149d995 to a74f424
For single-part outputs larger than 500KB, estimate the total token count
by sampling 10 evenly spaced 100KB portions of the output and extrapolating
the chars-per-token ratio to the full content. This avoids running BPE
tokenization on the entire multi-MB output through worker threads.
The sampling approach achieves ~0.2% accuracy compared to full tokenization
because the chars-per-token ratio is stable across evenly distributed samples
that capture both markup overhead and file content.
This optimization targets the default config path (tokenCountTree disabled).
When tokenCountTree is enabled, the existing file-token-sum estimation
(added in a prior commit) is used instead. Split outputs still use full
tokenization per part.
Benchmark results (991 files, ~3.9MB output, 4-core machine, 15 runs):
Before (full output tokenization):
Trimmed mean: 1.417s
After (output sampling estimation):
Trimmed mean: 1.301s
Improvement: 8.2% (116ms)
Token count accuracy:
Exact: 1,034,842 tokens
Estimated: 1,036,915 tokens (0.2% error)
https://claude.ai/code/session_01EWxSA8Tdwvd2jJrMGsvVJU
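The sampling scheme above can be sketched as follows. `countTokens` is injected here because the real implementation runs gpt-tokenizer in worker threads; the constants mirror the description (10 samples of 100KB for outputs over 500KB).

```typescript
const SAMPLE_COUNT = 10;
const SAMPLE_SIZE = 100_000;
const SAMPLING_THRESHOLD = 500_000;

function estimateTokensBySampling(
  output: string,
  countTokens: (text: string) => number,
): number {
  if (output.length <= SAMPLING_THRESHOLD) {
    // Small outputs: tokenize exactly, no estimation needed.
    return countTokens(output);
  }
  // Space sample windows evenly so they capture both markup and file content.
  const stride = Math.floor((output.length - SAMPLE_SIZE) / (SAMPLE_COUNT - 1));
  let sampledChars = 0;
  let sampledTokens = 0;
  for (let i = 0; i < SAMPLE_COUNT; i++) {
    const start = i * stride;
    const sample = output.slice(start, start + SAMPLE_SIZE);
    sampledChars += sample.length;
    sampledTokens += countTokens(sample);
  }
  // Extrapolate the observed tokens-per-char ratio to the full output.
  return Math.round(output.length * (sampledTokens / sampledChars));
}
```

The accuracy claim rests on the chars-per-token ratio being stable across the output, which holds when markup and content are evenly interleaved, as they are in a packed repo file.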
perf(core): Skip globby filesystem traversal via git ls-files + picomatch fast path

When git ls-files is available and include patterns are default (**/*), bypass globby's filesystem traversal entirely by filtering git-tracked files with picomatch. This eliminates ~70ms of directory walking and pattern matching that globby performs even with gitignore disabled.

The fast path:
1. Uses git ls-files (already fetched) as the file source
2. Reads root .repomixignore and merges patterns with default ignores
3. Uses git ls-files -s to detect symlinks (mode 120000) in parallel
4. Filters with picomatch (same engine as fast-glob) for consistency
5. Falls back to globby when nested .repomixignore/.ignore files exist

Benchmark (repomix repo, ~1030 files, 7 runs, trimmed mean):
- CLI (full config): 1233ms → 1144ms (-89ms, -7.2%)
- pack() without emptyDirs: 770ms → 733ms (-37ms, -4.8%)
- pack() with emptyDirs: neutral (dir globby still runs for empty dir detection)

https://claude.ai/code/session_01WudWSWteuywL5ZfmPWFnXi
Force-pushed from 1af91bf to 763fc00
perf(core): Overlap security check with output generation via optimistic pipeline

Restructure the pack() pipeline to start output generation and metrics calculation immediately after file processing, without waiting for the security check to complete. In the common case (>95% of runs), no suspicious files are found and the optimistic output is correct. In the rare case where suspicious files are detected, the output is regenerated with filtered files.

Additionally, lazy-load globby and jschardet/iconv-lite to reduce module import overhead on the critical startup path. Globby (~50ms) is only needed when the git-only fast path falls back to filesystem traversal. jschardet/iconv-lite (~25ms) are only needed for non-UTF-8 files (<1%).

Pipeline change:
Before: Search → Collect+Git → [Security + Process] → Output+Metrics
After: Search → Collect+Git → Process → [Security ‖ Output+Metrics]

Benchmark results (20 runs each, repomix on itself with 1012 files):
- Default config (tokenCountTree=false): Baseline 1059ms avg → Optimized 994ms avg (-6.1%)
- Project config (tokenCountTree=50000, all features): Baseline 1319ms avg → Optimized 1286ms avg (-2.5%)

The default config improvement exceeds the 5% target. The project config shows smaller relative gains because all-file tokenization dominates the output+metrics phase, reducing the relative benefit of security overlap. All 1101 tests pass. No functional changes — output is identical.

https://claude.ai/code/session_012ZhuvmD4C16mcvYy3YthxP
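The optimistic overlap reduces to starting the output promise before awaiting the security result. A minimal sketch with stand-in function shapes (the real pipeline also overlaps metrics and filters by more than path equality):

```typescript
async function generateWithOptimisticSecurity(
  files: string[],
  // Returns the paths of suspicious files (usually empty).
  securityCheck: (files: string[]) => Promise<string[]>,
  generateOutput: (files: string[]) => Promise<string>,
): Promise<string> {
  // Kick off output generation immediately; do NOT await security first.
  const optimisticOutput = generateOutput(files);
  const suspicious = await securityCheck(files);
  if (suspicious.length === 0) {
    // Common case (>95% of runs): the speculative output is correct.
    return optimisticOutput;
  }
  // Rare case: discard the optimistic output and regenerate without the
  // suspicious files.
  const safeFiles = files.filter((f) => !suspicious.includes(f));
  return generateOutput(safeFiles);
}
```

The wasted work in the rare case is one discarded output generation, which is an acceptable trade given how seldom suspicious files appear.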
Force-pushed from 763fc00 to dcc8452
Skip two expensive full-content regex scans in createRenderContext when their results are unused by the current output format:
- calculateFileLineCounts: scans all file contents to count newlines. Not referenced by any Handlebars template (XML, markdown, plain) or parsable output generator (XML, JSON). Only used by the skill path, so skip for regular output entirely.
- calculateMarkdownDelimiter: scans all file contents for backtick sequences. Only used by the markdown template and skill generators. Skip for XML, plain, JSON, and parsable-XML output styles.

Also optimized the implementations for when they do run:
- calculateMarkdownDelimiter: replaced flatMap + intermediate array allocation with single-pass inline max tracking.
- calculateFileLineCounts: replaced regex-based newline counting (which allocated arrays of all matches) with indexOf-based loop.

Benchmark (3000 files / 12MB synthetic TypeScript codebase):
- XML output (default): 1576ms → 1465ms (-111ms, ~7% improvement)
- Micro-benchmark (5000 files / 36MB): 60.6ms of render context overhead eliminated entirely for default XML output
- calculateFileLineCounts: 30ms → 13ms (2.4x faster via indexOf)
- calculateMarkdownDelimiter: 39ms → 30ms (1.3x faster, no alloc)
- Correctness verified: all 1096 tests pass

https://claude.ai/code/session_01RoH4sBaaDHvVnZLJzTYGP7
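The two optimized scans can be sketched as below; function names are illustrative stand-ins for the helpers named above, and the indexOf loop and single-pass max tracking mirror the described changes.

```typescript
// Count lines with indexOf instead of a match-array-allocating regex.
function countLines(content: string): number {
  if (content.length === 0) return 0;
  let lines = 1;
  let pos = content.indexOf("\n");
  while (pos !== -1) {
    lines++;
    pos = content.indexOf("\n", pos + 1);
  }
  return lines;
}

// Track the longest backtick run in a single pass, no intermediate arrays.
// The markdown delimiter must be longer than any run found in the content.
function longestBacktickRun(content: string): number {
  let max = 0;
  let run = 0;
  for (let i = 0; i < content.length; i++) {
    if (content.charCodeAt(i) === 96 /* '`' */) {
      run++;
      if (run > max) max = run;
    } else {
      run = 0;
    }
  }
  return max;
}
```

Both versions avoid allocating an array of matches per file, which is where the regex-based originals spent their time on multi-MB inputs.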
Summary
Multiple performance optimizations targeting file I/O overhead, IPC round-trips, worker pool lifecycle, worker initialization latency, search phase I/O contention, output token estimation, and pipeline parallelism. Combined improvement of ~39% on end-to-end CLI execution.
Changes in this PR
1. Batch metrics token counting (calculateSelectiveFileMetrics.ts, calculateMetricsWorker.ts)
2. Replace regex with manual scan in truncateBase64 (truncateBase64.ts)
3. Non-blocking worker pool cleanup (packager.ts, securityCheck.ts)
   - `cleanup()` calls are now fire-and-forget, eliminating shutdown overhead
4. Skip redundant output tokenization (calculateMetrics.ts)
   - With `tokenCountTree` enabled, estimates output tokens from file token sums + overhead ratio instead of re-tokenizing the entire output
5. Reduce file I/O overhead (fileRead.ts, fileSearch.ts)
   - Skips the redundant `stat()` syscall by checking `buffer.length` after `readFile()` instead
   - Uses `git ls-files` for gitignore filtering (~10ms) instead of globby's JS parser (~250ms)
6. Move metrics worker warmup before file search (packager.ts)
7. Pre-initialize security worker pool (packager.ts, securityCheck.ts, processConcurrency.ts)
   - Creates all workers eagerly (`minThreads = maxThreads`) via the new `eagerWorkers` option
8. Reduce search I/O contention (packager.ts, fileSearch.ts)
   - Parallelizes the file and directory globby traversals when `includeEmptyDirectories` is enabled
9. Estimate output tokens via sampling (calculateMetrics.ts)
10. Skip globby via git ls-files + picomatch fast path (fileSearch.ts)
    - When include patterns are the default (`**/*`) and no nested ignore files exist, skips globby filesystem traversal entirely
11. Optimistic security pipeline + lazy module loading (packager.ts, fileSearch.ts, fileRead.ts)
    - Lazy-loads `globby` (~50ms) and `jschardet`/`iconv-lite` (~25ms) to reduce startup overhead
    - Pipeline is now `Search → Collect+Git → Process → [Security ‖ Output+Metrics]` (was: `Search → Collect+Git → [Security + Process] → Output+Metrics`)
12. Skip unnecessary content scans in createRenderContext (outputGenerate.ts)
Benchmark Results
End-to-end CLI execution (repomix on itself, 1012 files, default config, 20 runs):
Checklist
- `npm run test` (1101 tests passing)
- `npm run lint` (0 errors)

https://claude.ai/code/session_01EXHxiny9nuEy8HrdP6d9Em