perf(core): Automated performance tuning by Claude by yamadashy · Pull Request #1295 · yamadashy/repomix

yamadashy · 2026-03-23T15:49:59Z

Summary

Fresh performance optimization pass on current main, focusing on startup time reduction and algorithmic improvements.

Key Optimization 1: Lazy-load CLI actions for 62% faster startup

All 5 CLI action handlers (defaultAction, initAction, mcpAction, remoteAction, versionAction) were eagerly imported at startup, forcing Node.js to parse ~1,200 lines of action code plus their transitive dependencies (configLoad, packager, git modules, @clack/prompts, MCP SDK, etc.) regardless of which command was executed.

Replaced static imports with dynamic import() so each action module is only loaded when its code path is reached.

Startup benchmark (--version, 15 runs):

	Before	After	Improvement
Median	241ms	92ms	-149ms (-62%)

Key Optimization 2: Lazy-load jschardet and iconv-lite

Only ~1% of source files need encoding detection (non-UTF-8). These modules (~130KB combined) were eagerly imported but rarely used. Now loaded via dynamic import() only when UTF-8 decode fails.

Also moved isBinaryPath check before fs.stat() to skip filesystem I/O entirely for obvious binary extensions (.png, .jpg, etc.).

Key Optimization 3: Fix O(n²) file path regrouping

sortedFilePathsByDir in packager.ts used Array.find() + Array.includes() inside .filter(), causing O(n²) complexity for large file sets. Replaced with Map-based O(n) lookup.

Key Optimization 4: Parallelize git diff and git log operations

getGitDiffs and getGitLogs were awaited sequentially despite being independent I/O operations. Now run concurrently via Promise.all().

Key Optimization 5: Reduce GC pressure across hot paths

Tree string building: Replace recursive string concatenation (+=) with array accumulation (push + join). String += in recursive loops causes O(n²) copying; array accumulation is O(n).
truncateBase64: Hoist regex patterns to module level (compiled once instead of per-file) and add fast pre-checks (string.includes + charCode scan) to skip ~95% of files that have no base64 data.
filterOutUntrustedFiles: Use Set-based O(1) lookup instead of Array.some() O(n) scan.
calculateMarkdownDelimiter: Replace flatMap + reduce (creates intermediate arrays) with single-pass charCodeAt loop.
calculateFileLineCounts: Replace content.match(/\\n/g) (allocates array of all matches) with indexOf loop.
rtrimLines: Replace split/map/join with regex content.replace(/[ \\t]+$/gm, '').
removeEmptyLines: Replace split/filter/join with regex content.replace(/^\\s*\\n/gm, '').

Key Optimization 28: Sync fast-path for cached file collection

On warm MCP/server runs, 95-100% of files hit the content cache. Previously all ~1000 files went through an async promise pool (~1000 async function frames + Promise resolutions) even when every readRawFileCached call was synchronous (statSync + Map lookup). Now a plain for loop calls probeFileCache() synchronously first, and only cache misses enter the async pool.

Also overlaps the output line count scan (~3.5ms for 120K lines) with the disk write I/O inside Promise.all instead of running it sequentially after.

Collection phase (warm, ~1010 files): ~32ms → ~12ms (-62%)

Key Optimization 29: Cache security results and stream output parts

Two optimizations targeting the warm pack() hot path:

Cache security check results across pack() calls (securityCheck.ts): On warm MCP/server runs, file content hasn't changed since the last check. Cache results keyed by filePath + contentLength (validated by the upstream file content cache via mtime+size). When all tasks hit the cache, the worker IPC is skipped entirely — saving ~18ms of structured clone serialization + secretlint regex matching per warm call.
Stream output parts to disk without joining (outputStyles, writeOutputToDisk): Native renderers (xml, markdown, plain) now return string[] instead of joining ~6000 parts into a single 3-5MB contiguous string. The write path uses a WriteStream where stream.write() buffers synchronously (no per-part async overhead), and the metrics path already handles string[] via outputParts normalization. This eliminates the peak allocation of the full output string and reduces GC pressure during the write phase.

Key Optimization 30: Skip security pre-filter regex, cache tree string, and skip unchanged disk writes

Three optimizations targeting the warm pack() hot path:

Cache-first security pre-filter (securityCheck.ts): The SECRET_TRIGGER_PATTERN regex scanned all ~988 file contents (~3.6MB) on every warm pack() call, taking ~16ms even though all results were already cached. Now checks the security result cache BEFORE running the pre-filter, and caches pre-filter rejections (null results) so files that don't contain secret patterns are never re-scanned. On warm runs, the cache check loop runs in ~0.3ms (Map lookups only), completely eliminating the 16ms regex scan.
Cache tree string across pack() calls (fileTreeGenerate.ts): The directory tree string is deterministic given the same file list. On warm MCP/server runs where no files changed, the tree is identical. Cache validated by file count + first/last path + empty dir count + root count. Saves ~1.5ms per warm call.
Skip disk write when output unchanged (writeOutputToDisk.ts): On warm runs where file content hasn't changed, the output is identical. Track the total character count of the last write and skip re-writing 3-5MB to disk when unchanged. Verify file still exists via statSync to guard against external deletion. Saves ~10ms of I/O per warm call.

pack() benchmark (25 warm runs, 5 warmup, ~988 files):

	Before	After	Improvement
Trimmed mean	55.2ms	21.1ms	-34.1ms (-61.8%)
Median	55.2ms	19.8ms	-35.4ms (-64.1%)

Key Optimization 31: Use readFileSync for cold-run file collection (~11% faster pack)

Replace async promisePool with synchronous readFileSync loop for cache-miss file reads during collectFiles. Async fs.readFile creates one Promise per file, each scheduled through libuv's 4-thread pool. With ~1000 files, Promise allocation + microtask resolution + threadpool contention dominate the file collection phase. readFileSync bypasses all of this, going directly to the kernel where the VFS page cache serves recently-accessed inodes in ~0.016ms each.

Non-UTF-8 files (~1%) fall back to async readRawFile with jschardet encoding detection.

Micro-benchmark (1000 files, 7.8MB total):

Approach	Time	Speedup
readFileSync loop	16ms	—
promisePool(128)	120ms	8x slower

pack() benchmark (3 rounds, in-process, ~1009 files):

	Before	After	Improvement
Cold (avg)	588ms	528ms	-60ms (-10.2%)
Warm (median)	62ms	65ms	~same

CLI benchmark (15 runs, 3 warmup):

	Before	After	Improvement
Median	841ms	745ms	-96ms (-11.4%)

Key Optimization 32: Pre-warm worker pools during config loading and lazy-load picospinner

Pre-start metrics and security worker pools ~60ms earlier by beginning tinypool import at cliRun.ts module load time instead of inside pack(). The BPE table warmup (~300ms) now overlaps with Commander parsing, version logging, defaultAction import, and config loading — reducing idle wait from ~140ms to ~80ms.

Also lazy-load picospinner via dynamic import() so the module is only loaded when the spinner is actually started (TTY mode). Non-TTY paths (--version, --quiet, --stdout, piped output) skip the ~2-3ms module load entirely.

CLI benchmark (10 runs, 2 warmup, packing repomix repo ~1009 files):

	Before	After	Improvement
Median	544ms	481ms	-63ms (-11.6%)

Key Optimization 33: Cache entire pack() result for warm MCP/server runs

On warm MCP/server runs where file list, file content, git state, and config are all unchanged between consecutive pack() calls, the entire pipeline output is identical. Added a pack result cache that short-circuits the full processing pipeline after just searchFiles + collectFiles (stat validation) + git await.

When the cache hits, processFiles, security check, metrics calculation, output generation, disk write, and all Promise.all orchestration overhead (~20ms total) are skipped entirely.

Cache validation uses config object identity, file list identity (count + first/last path heuristic), file content freshness (0 cache misses from collectFiles stat validation), and git state identity (diff + log content lengths).

pack() benchmark (20 warm runs, properly warmed, ~987 files):

	Before	After	Improvement
Median	25.5ms	3.4ms	-22.1ms (-86.7%)
Trimmed mean	25.5ms	3.4ms	-22.1ms (-86.7%)

Key Optimization 34: Fix output line over-count and batch ZIP mkdir

Three targeted fixes:

Fix countOutputLines for string[] output parts (packager.ts): The string[] code path started each part's line count at 1 (partLines = 1), but parts are concatenated directly (no separator between them). This over-counted by ~(numParts) lines — approximately 6000 for a typical output with ~6000 parts. Now just counts newlines across all parts with count starting at 1 for the first line.
Batch mkdir in website server ZIP extraction (fileUtils.ts): Per-file fs.mkdir was called for every file in the ZIP (~1000 calls). Pre-collect unique parent directories and batch-create them before writing files — matching the pattern already used in processZipFile.ts. Reduces ~1000 mkdir syscalls to ~100 for typical ZIPs (~15-30ms saved).
Remove redundant fs.access in website server file copy (fileUtils.ts): fs.copyFile already fails with a clear error if the source doesn't exist, making the pre-check fs.access() call unnecessary. Eliminates 1 syscall per copy operation.

Key Optimization 35: Run website server pack() in-process instead of child process (~79% faster)

The website server's processZipFile and remoteRepo handlers were spawning a child process for each pack() call due to quiet: true without the _inProcess flag. Each child process paid ~500ms of overhead: Node.js startup + ESM module re-loading + worker pool warmup (gpt-tokenizer BPE tables + @secretlint/core initialization).

Set _inProcess: true (matching the pattern already used by MCP tools) to run pack() directly in the server process. This reuses module-level cached worker pools across requests, eliminating the per-request spawn + warmup overhead. All module-level caches are bounded (200MB file content, 5000 entries for metrics/security/processing), so memory growth is controlled.

Server-like benchmark (5 runs, 2 warmup, packing repomix repo ~983 files):

	Before	After	Improvement
Median	581.2ms	122.4ms	-458.8ms (-78.9%)

Benchmark results

Startup benchmark (--version, 10 runs, 2 warmup):

	Before	After	Improvement
Median	241ms	72ms	-169ms (-70%)

pack() benchmark (20 warm runs, properly warmed, ~987 files):

	Before	After	Improvement
Median	88ms	3.4ms	-84.6ms (-96.1%)

Server-like execution (warm, ~983 files):

	Before	After	Improvement
Median	581.2ms	122.4ms	-458.8ms (-78.9%)

Checklist

Run npm run test (1090 tests pass)
Run npm run lint (clean)

https://claude.ai/code/session_018a2JAZXzPHMc5F2bb3kPLY

github-actions · 2026-03-23T15:50:18Z

⚡ Performance Benchmark

Latest commit:	`3f84dc5` Merge remote-tracking branch 'origin/main' into perf/auto-perf-tuning
Status:	✅ Benchmark complete!
Ubuntu:	2.21s (±0.02s) → 0.42s (±0.01s) · -1.80s (-81.2%)
macOS:	1.15s (±0.06s) → 0.25s (±0.01s) · -0.89s (-77.9%)
Windows:	2.62s (±0.03s) → 0.57s (±0.00s) · -2.05s (-78.3%)

Details

Packing the repomix repository with node bin/repomix.cjs
Warmup: 2 runs (discarded)
Measurement: 10 runs / 20 on macOS (median ± IQR)
Workflow run

History

69f42b3 perf(core): Optimize startup time, pipeline parallelism, and hot path algorithms

Ubuntu:	2.20s (±0.02s) → 2.11s (±0.03s) · -0.08s (-3.7%)
macOS:	1.27s (±0.07s) → 1.44s (±0.27s) · +0.16s (+12.9%)
Windows:	2.54s (±0.03s) → 2.43s (±0.01s) · -0.11s (-4.4%)

06cfdf4 perf(core): Fix output line over-count and batch ZIP mkdir

Ubuntu:	2.44s (±0.01s) → 0.45s (±0.01s) · -1.99s (-81.4%)
macOS:	2.14s (±0.43s) → 0.50s (±0.17s) · -1.64s (-76.6%)
Windows:	2.87s (±0.04s) → 0.59s (±0.09s) · -2.27s (-79.3%)

3b0a2fd perf(core): Cache entire pack() result for warm MCP/server runs (~86% faster)

Ubuntu:	2.39s (±0.04s) → 0.43s (±0.02s) · -1.96s (-81.9%)
macOS:	1.68s (±0.27s) → 0.39s (±0.10s) · -1.29s (-76.9%)
Windows:	2.92s (±0.04s) → 0.58s (±0.01s) · -2.34s (-80.2%)

178778b perf(core): Cache entire pack() result for warm MCP/server runs (~86% faster)

Ubuntu:	2.38s (±0.04s) → 0.42s (±0.03s) · -1.97s (-82.5%)
macOS:	1.20s (±0.06s) → 0.25s (±0.01s) · -0.94s (-79.0%)
Windows:	2.99s (±0.50s) → 0.56s (±0.02s) · -2.42s (-81.2%)

55e9499 perf(core): Investigation pass - no additional optimizations found

Ubuntu:	2.35s (±0.04s) → 0.42s (±0.01s) · -1.93s (-82.3%)
macOS:	1.22s (±0.05s) → 0.23s (±0.01s) · -0.98s (-80.7%)
Windows:	3.69s (±0.67s) → 0.61s (±0.01s) · -3.07s (-83.3%)

19bc42c perf(core): Investigation pass - no additional optimizations found

Ubuntu:	2.35s (±0.03s) → 0.42s (±0.01s) · -1.93s (-82.1%)
macOS:	1.22s (±0.07s) → 0.26s (±0.03s) · -0.96s (-79.0%)
Windows:	3.14s (±0.04s) → 0.63s (±0.01s) · -2.51s (-80.0%)

8349e9b perf(cli): Pre-warm worker pools during config loading and lazy-load picospinner

Ubuntu:	2.43s (±0.03s) → 0.43s (±0.01s) · -1.99s (-82.1%)
macOS:	1.23s (±0.06s) → 0.26s (±0.02s) · -0.97s (-79.1%)
Windows:	2.92s (±0.04s) → 0.58s (±0.01s) · -2.34s (-80.2%)

7d92992 perf(core): Use readFileSync for cold-run file collection (~11% faster pack)

Ubuntu:	2.36s (±0.17s) → 0.42s (±0.01s) · -1.94s (-82.1%)
macOS:	1.83s (±0.41s) → 0.45s (±0.16s) · -1.39s (-75.7%)
Windows:	2.86s (±0.11s) → 0.58s (±0.01s) · -2.29s (-79.9%)

da9a1fb perf(core): Skip security pre-filter regex, cache tree string, and skip unchanged disk writes

Ubuntu:	2.36s (±0.01s) → 0.47s (±0.01s) · -1.89s (-80.2%)
macOS:	1.22s (±0.06s) → 0.28s (±0.02s) · -0.94s (-77.0%)
Windows:	2.83s (±0.04s) → 0.62s (±0.02s) · -2.21s (-78.0%)

b02dbd7 perf(core): Cache security results and stream output parts to avoid 3-5MB allocation

Ubuntu:	2.36s (±0.03s) → 0.45s (±0.01s) · -1.91s (-80.8%)
macOS:	1.20s (±0.02s) → 0.27s (±0.01s) · -0.93s (-77.7%)
Windows:	2.85s (±0.07s) → 0.63s (±0.01s) · -2.22s (-78.0%)

13d235a perf(core): Sync fast-path for cached file collection and overlap line counting with write I/O

Ubuntu:	2.34s (±0.03s) → 0.46s (±0.01s) · -1.88s (-80.4%)
macOS:	1.41s (±0.14s) → 0.49s (±0.19s) · -0.92s (-65.1%)
Windows:	2.80s (±0.01s) → 0.60s (±0.02s) · -2.20s (-78.6%)

30afe18 perf(core): Optimize startup time, fix O(n²) algorithms, and reduce GC pressure

Ubuntu:	2.33s (±0.03s) → 2.22s (±0.01s) · -0.11s (-4.8%)
macOS:	1.49s (±0.14s) → 1.50s (±0.15s) · +0.00s (+0.1%)
Windows:	2.96s (±0.05s) → 2.81s (±0.11s) · -0.14s (-4.8%)

8056ab7 perf(mcp): Run pack() in-process for MCP tools instead of spawning child process

Ubuntu:	2.43s (±0.06s) → 0.48s (±0.02s) · -1.95s (-80.3%)
macOS:	2.12s (±0.20s) → 0.51s (±0.11s) · -1.61s (-76.0%)
Windows:	3.12s (±0.04s) → 0.68s (±0.02s) · -2.45s (-78.3%)

81a0c8d perf(core): Cache processed files, tree string, and summary context across pack() calls

Ubuntu:	2.41s (±0.03s) → 0.47s (±0.05s) · -1.94s (-80.5%)
macOS:	1.29s (±0.09s) → 0.29s (±0.04s) · -0.99s (-77.2%)
Windows:	3.03s (±0.11s) → 0.69s (±0.02s) · -2.34s (-77.2%)

59b3319 perf(core): Cache per-file token counts across pack() calls

Ubuntu:	2.35s (±0.02s) → 0.47s (±0.02s) · -1.88s (-80.1%)
macOS:	1.31s (±0.11s) → 0.31s (±0.05s) · -1.00s (-76.5%)
Windows:	2.92s (±0.12s) → 0.63s (±0.01s) · -2.28s (-78.3%)

f661013 perf(core): Cache empty dirs and instruction file, use statSync for search cache

Ubuntu:	2.39s (±0.03s) → 0.46s (±0.01s) · -1.93s (-80.7%)
macOS:	1.89s (±0.16s) → 0.37s (±0.07s) · -1.52s (-80.3%)
Windows:	2.91s (±0.04s) → 0.67s (±0.07s) · -2.24s (-77.1%)

094246a perf(mcp): Cache output file content for read and grep MCP tools

Ubuntu:	2.34s (±0.02s) → 0.49s (±0.04s) · -1.86s (-79.2%)
macOS:	2.19s (±0.27s) → 0.42s (±0.11s) · -1.77s (-80.8%)
Windows:	3.74s (±0.57s) → 0.69s (±0.11s) · -3.05s (-81.5%)

2fc5866 perf(core): Use statSync for file content cache validation

Ubuntu:	2.39s (±0.02s) → 0.47s (±0.01s) · -1.92s (-80.4%)
macOS:	1.70s (±0.25s) → 0.30s (±0.07s) · -1.40s (-82.1%)
Windows:	2.79s (±0.01s) → 0.62s (±0.06s) · -2.17s (-77.7%)

e10c6ae perf(core): Eliminate skill section template literal allocations and remove unused minimatch

Ubuntu:	2.41s (±0.03s) → 0.47s (±0.01s) · -1.94s (-80.4%)
macOS:	2.01s (±0.22s) → 0.44s (±0.09s) · -1.56s (-78.0%)
Windows:	2.84s (±0.04s) → 0.63s (±0.02s) · -2.21s (-77.9%)

3b25d05 perf(core): Reduce file collection concurrency to 128 and fix base64 false positive

Ubuntu:	2.37s (±0.04s) → 0.46s (±0.01s) · -1.91s (-80.5%)
macOS:	1.80s (±1.15s) → 0.35s (±0.06s) · -1.45s (-80.4%)
Windows:	3.44s (±0.64s) → 0.73s (±0.01s) · -2.71s (-78.9%)

37d8b53 perf(core): Reduce metrics truncation threshold from 16KB to 4KB for faster token counting

Ubuntu:	2.41s (±0.06s) → 0.47s (±0.03s) · -1.94s (-80.5%)
macOS:	1.68s (±0.31s) → 0.41s (±0.04s) · -1.26s (-75.3%)
Windows:	2.94s (±0.05s) → 0.66s (±0.01s) · -2.29s (-77.7%)

7633552 perf(mcp): Remove processedFiles from McpToolMetrics and optimize line counting

Ubuntu:	2.44s (±0.04s) → 0.54s (±0.01s) · -1.90s (-77.9%)
macOS:	1.41s (±0.32s) → 0.51s (±0.14s) · -0.90s (-64.0%)
Windows:	2.99s (±0.11s) → 0.69s (±0.02s) · -2.31s (-77.1%)

d4ba1a2 perf(core): indexOf-based line extraction and pre-compiled picomatch for empty dirs

Ubuntu:	2.39s (±0.01s) → 0.53s (±0.01s) · -1.86s (-77.8%)
macOS:	1.46s (±0.17s) → 0.35s (±0.04s) · -1.10s (-75.9%)
Windows:	3.56s (±0.13s) → 0.83s (±0.04s) · -2.73s (-76.7%)

9e63499 perf(core): Single-pass isLikelyBase64, pre-compile regex, lazy-load skillPrompts

Ubuntu:	2.36s (±0.03s) → 0.51s (±0.01s) · -1.85s (-78.5%)
macOS:	1.44s (±0.20s) → 0.31s (±0.01s) · -1.13s (-78.3%)
Windows:	2.93s (±0.20s) → 0.68s (±0.02s) · -2.25s (-76.9%)

ccff9b5 perf(core): Optimize sort algorithm and truncate metrics sample for faster pack()

Ubuntu:	2.40s (±0.02s) → 0.54s (±0.01s) · -1.87s (-77.7%)
macOS:	1.65s (±0.33s) → 0.38s (±0.07s) · -1.26s (-76.7%)
Windows:	3.02s (±0.12s) → 0.71s (±0.02s) · -2.31s (-76.5%)

94d5239 perf(core): Bound MCP registry, remove dead fields, release git strings early, single-pass split groups

Ubuntu:	2.38s (±0.01s) → 0.55s (±0.01s) · -1.83s (-77.0%)
macOS:	1.32s (±0.08s) → 0.39s (±0.04s) · -0.93s (-70.4%)
Windows:	3.53s (±0.11s) → 0.73s (±0.03s) · -2.80s (-79.4%)

f62328f perf(core): Parallel CLI init, server request coalescing, and cache key optimization

Ubuntu:	2.42s (±0.03s) → 0.56s (±0.01s) · -1.86s (-76.9%)
macOS:	1.26s (±0.07s) → 0.33s (±0.02s) · -0.93s (-73.7%)
Windows:	2.85s (±0.02s) → 0.72s (±0.01s) · -2.13s (-74.8%)

6126253 perf(core): Merge loops, fix ReDoS in parsePomXml, and optimize ZIP extraction

Ubuntu:	2.37s (±0.01s) → 0.55s (±0.01s) · -1.82s (-76.9%)
macOS:	1.48s (±0.27s) → 0.45s (±0.15s) · -1.03s (-69.6%)
Windows:	2.76s (±0.01s) → 0.66s (±0.01s) · -2.09s (-75.9%)

9a836da perf(core): Optimize server middleware, security batching, and error path algorithms

Ubuntu:	2.40s (±0.03s) → 0.56s (±0.03s) · -1.84s (-76.7%)
macOS:	1.25s (±0.04s) → 0.36s (±0.03s) · -0.89s (-71.3%)
Windows:	2.82s (±0.05s) → 0.77s (±0.03s) · -2.05s (-72.7%)

0dc0c20 perf(core): Pre-compute file tree string during parallel block to overlap with security check

Ubuntu:	2.59s (±0.06s) → 0.59s (±0.03s) · -2.00s (-77.2%)
macOS:	1.63s (±0.33s) → 0.52s (±0.12s) · -1.10s (-67.9%)
Windows:	3.50s (±0.06s) → 0.82s (±0.04s) · -2.68s (-76.6%)

8f0c697 perf(core): Overlap git ls-files with permission checks, optimize server I/O and caching

Ubuntu:	2.38s (±0.05s) → 0.56s (±0.02s) · -1.83s (-76.7%)
macOS:	1.45s (±0.27s) → 0.36s (±0.03s) · -1.09s (-75.2%)
Windows:	2.88s (±0.02s) → 0.73s (±0.01s) · -2.15s (-74.5%)

c49cadc [autofix.ci] apply automated fixes

Ubuntu:	2.41s (±0.02s) → 0.53s (±0.02s) · -1.88s (-77.9%)
macOS:	2.07s (±0.12s) → 0.66s (±0.11s) · -1.41s (-68.3%)
Windows:	3.65s (±0.29s) → 0.90s (±0.05s) · -2.75s (-75.3%)

9d3d3ba fix(core): Prevent normalizeGlobPattern from corrupting file-name patterns

Ubuntu:	2.38s (±0.03s) → 0.54s (±0.02s) · -1.85s (-77.5%)
macOS:	2.02s (±0.47s) → 0.79s (±0.13s) · -1.23s (-61.1%)
Windows:	2.96s (±0.03s) → 0.79s (±0.06s) · -2.17s (-73.2%)

d32fa5b perf(core): Cache file contents across pack() calls for MCP/server

Ubuntu:	2.46s (±0.03s) → 0.57s (±0.01s) · -1.90s (-77.0%)
macOS:	1.27s (±0.04s) → 0.34s (±0.04s) · -0.93s (-73.3%)
Windows:	3.04s (±0.05s) → 0.73s (±0.05s) · -2.31s (-76.0%)

2c73b01 fix(mcp): Add missing outputLineCount to attachPackedOutputTool and test mock

Ubuntu:	2.44s (±0.04s) → 0.59s (±0.08s) · -1.85s (-75.7%)
macOS:	1.38s (±0.08s) → 0.35s (±0.05s) · -1.03s (-74.8%)
Windows:	3.26s (±0.10s) → 0.77s (±0.04s) · -2.49s (-76.3%)

bb56db9 fix(mcp): Add missing outputLineCount to attachPackedOutputTool and test mock

Ubuntu:	2.49s (±0.03s) → 0.54s (±0.02s) · -1.95s (-78.3%)
macOS:	1.85s (±0.23s) → 0.63s (±0.25s) · -1.22s (-65.7%)
Windows:	3.01s (±0.46s) → 0.81s (±0.06s) · -2.20s (-73.2%)

a1ec587 perf(mcp): Eliminate redundant I/O in MCP tools, optimize base64 check, parallelize skill writes

c4e7ad8 perf(core): Lazy-load web-tree-sitter and @clack/prompts, optimize token tree traversal

Ubuntu:	2.43s (±0.03s) → 0.52s (±0.01s) · -1.90s (-78.4%)
macOS:	1.31s (±0.07s) → 0.35s (±0.12s) · -0.96s (-73.5%)
Windows:	3.20s (±0.42s) → 0.80s (±0.16s) · -2.39s (-74.9%)

a3978c3 perf(core): Increase I/O concurrency, reduce metrics sample, and optimize hot loops

Ubuntu:	2.38s (±0.03s) → 0.53s (±0.06s) · -1.85s (-77.8%)
macOS:	2.05s (±0.27s) → 0.54s (±0.09s) · -1.51s (-73.9%)
Windows:	3.13s (±0.06s) → 0.74s (±0.05s) · -2.39s (-76.2%)

6eaff08 perf(core): Optimize data structures and algorithms in metrics, statistics, and tech stack

Ubuntu:	2.38s (±0.02s) → 0.51s (±0.01s) · -1.86s (-78.4%)
macOS:	1.28s (±0.04s) → 0.32s (±0.02s) · -0.96s (-75.1%)
Windows:	2.98s (±0.09s) → 0.71s (±0.04s) · -2.27s (-76.3%)

ebbd4a9 perf(core): Fix ReDoS regex, cache Object.values in tree-sitter, optimize metrics partial sort

Ubuntu:	2.60s (±0.02s) → 0.60s (±0.05s) · -2.00s (-77.0%)
macOS:	1.44s (±0.26s) → 0.34s (±0.07s) · -1.10s (-76.4%)
Windows:	2.97s (±0.11s) → 0.70s (±0.02s) · -2.27s (-76.4%)

1bbd309 perf(cli): Skip child process for CLI quiet mode, fix rate limiter leak, skip memory syscalls

Ubuntu:	2.47s (±0.07s) → 0.55s (±0.01s) · -1.93s (-77.9%)
macOS:	1.74s (±0.15s) → 0.42s (±0.04s) · -1.31s (-75.7%)
Windows:	3.04s (±0.08s) → 0.69s (±0.01s) · -2.35s (-77.4%)

92e8429 fix(security): Use bounded quantifier to prevent ReDoS in BasicAuth pre-filter

Ubuntu:	2.44s (±0.03s) → 0.58s (±0.02s) · -1.86s (-76.2%)
macOS:	1.50s (±0.15s) → 0.35s (±0.08s) · -1.15s (-76.8%)
Windows:	2.90s (±0.04s) → 0.67s (±0.02s) · -2.23s (-76.9%)

647c81b perf(security): Tighten BasicAuth pre-filter to require scheme://...@ on same line

Ubuntu:	2.38s (±0.03s) → 0.51s (±0.02s) · -1.87s (-78.4%)
macOS:	1.90s (±0.37s) → 0.47s (±0.09s) · -1.44s (-75.5%)
Windows:	2.99s (±0.06s) → 0.69s (±0.03s) · -2.30s (-76.8%)

eead508 perf(core): Consolidate picomatch matching and fire-and-forget worker cleanup

Ubuntu:	2.42s (±0.02s) → 0.52s (±0.04s) · -1.90s (-78.5%)
macOS:	1.26s (±0.04s) → 0.32s (±0.03s) · -0.94s (-74.7%)
Windows:	2.80s (±0.03s) → 0.65s (±0.02s) · -2.15s (-76.7%)

75be4f7 chore(deps): Update package-lock.json after rebase on main

Ubuntu:	2.30s (±0.02s) → 0.50s (±0.01s) · -1.80s (-78.4%)
macOS:	2.03s (±0.34s) → 0.58s (±0.10s) · -1.45s (-71.5%)
Windows:	2.83s (±0.01s) → 0.67s (±0.01s) · -2.16s (-76.3%)

49a2598 perf(core): Cache worker pools across pack() calls and start security warmup earlier

Ubuntu:	2.51s (±0.02s) → 0.53s (±0.03s) · -1.98s (-78.8%)
macOS:	1.73s (±0.28s) → 0.49s (±0.22s) · -1.23s (-71.4%)
Windows:	2.81s (±0.03s) → 0.65s (±0.02s) · -2.15s (-76.8%)

57f24c3 perf(cli): Eliminate child_process for TTY mode, run pack() with main-thread Spinner

Ubuntu:	2.38s (±0.02s) → 0.49s (±0.02s) · -1.89s (-79.4%)
macOS:	1.33s (±0.13s) → 0.33s (±0.03s) · -1.00s (-75.5%)
Windows:	2.99s (±0.02s) → 0.66s (±0.02s) · -2.33s (-77.9%)

ad4b509 perf(core): Defer pool awaits into parallel block and pre-warm binary detection

Ubuntu:	2.51s (±0.05s) → 0.53s (±0.01s) · -1.98s (-78.8%)
macOS:	2.25s (±0.88s) → 0.40s (±0.10s) · -1.85s (-82.2%)
Windows:	3.59s (±0.07s) → 0.80s (±0.05s) · -2.79s (-77.7%)

ce96e1c perf(cli): Pre-warm Zod schema, skip CLI validation, parallelize git fast path

Ubuntu:	2.60s (±0.03s) → 0.56s (±0.02s) · -2.05s (-78.6%)
macOS:	2.06s (±0.22s) → 0.41s (±0.04s) · -1.65s (-80.0%)
Windows:	3.05s (±0.03s) → 0.69s (±0.01s) · -2.36s (-77.3%)

gemini-code-assist · 2026-03-23T15:50:24Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on significant performance enhancements across several critical paths within the packager pipeline. By parallelizing asynchronous operations, optimizing data structures for lookups, and refining string processing algorithms, the changes aim to provide measurable speedups, particularly for repositories containing a large number of files.

Highlights

Parallelized Git Operations: Changed getGitDiffs and getGitLogs to run concurrently using Promise.all, significantly reducing wall-clock time for these operations.
Optimized Security Filtering: Converted the list of suspicious files into a Set for O(1) lookups, improving performance from O(n²) to O(n) during security checks.
Improved Line Counting: Replaced regex-based line counting with an indexOf loop, avoiding intermediate array creation and achieving a 2.2x speedup.
Efficient Markdown Delimiter Calculation: Refactored markdown delimiter calculation to use a streaming RegExp.exec loop instead of flatMap and reduce, preventing large intermediate array allocations.
Hoisted Regex Compilation: Moved regex patterns in truncateBase64 to module scope, ensuring they are compiled once and reused, reducing overhead and GC pressure.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

codecov · 2026-03-23T15:51:25Z

Codecov Report

❌ Patch coverage is 75.47538% with 503 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.17%. Comparing base (d762d38) to head (3f84dc5).
⚠️ Report is 228 commits behind head on main.

Files with missing lines	Patch %	Lines
src/core/packager.ts	61.39%	61 Missing ⚠️
src/core/output/outputGenerate.ts	58.26%	48 Missing ⚠️
src/core/file/fileSearch.ts	74.17%	47 Missing ⚠️
src/mcp/tools/mcpToolRuntime.ts	27.41%	45 Missing ⚠️
src/core/file/fileRead.ts	65.35%	44 Missing ⚠️
src/core/metrics/calculateMetrics.ts	61.11%	28 Missing ⚠️
src/core/skill/skillTechStack.ts	50.00%	28 Missing ⚠️
src/cli/actions/defaultAction.ts	75.78%	23 Missing ⚠️
src/core/metrics/calculateSelectiveFileMetrics.ts	69.33%	23 Missing ⚠️
src/core/security/securityCheck.ts	84.11%	17 Missing ⚠️
... and 24 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1295      +/-   ##
==========================================
- Coverage   87.13%   82.17%   -4.96%     
==========================================
  Files         115      116       +1     
  Lines        4367     5693    +1326     
  Branches     1015     1387     +372     
==========================================
+ Hits         3805     4678     +873     
- Misses        562     1015     +453

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

cloudflare-workers-and-pages · 2026-03-23T15:51:53Z

Deploying repomix with Cloudflare Pages

Latest commit:	`3f84dc5`
Status:	✅ Deploy successful!
Preview URL:	https://134a3083.repomix.pages.dev
Branch Preview URL:	https://perf-auto-perf-tuning.repomix.pages.dev

View logs

coderabbitai · 2026-03-23T15:52:07Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b22c8cba-9c9e-464d-befb-e24c239f8d90

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/auto-perf-tuning

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces several well-executed performance optimizations across different parts of the codebase, as detailed in the summary. The changes, including parallelizing Git operations, converting a list to a Set for O(1) lookups, optimizing line counting with indexOf, and hoisting regex compilations, directly address identified hot paths and show measurable speedups. The implementation of these optimizations is correct and follows best practices for performance in TypeScript. No further issues or improvement opportunities were identified based on the provided changes and the performance-focused objective of the pull request.

yamadashy · 2026-03-24T10:21:21Z

Key Optimization 16: Lazy-load Zod, optimize tree generation, and cache git repo checks

Three targeted optimizations:

Extract configDefaults.ts and lazy-load Zod schemas (configSchema.ts, configLoad.ts, defaultAction.ts): Move defaultConfig, defaultFilePathMap, defineConfig, and type re-exports into a new configDefaults.ts file that has zero Zod dependency. Convert repomixConfigFileSchema and repomixConfigCliSchema imports to dynamic import() at their .parse() call sites. This defers Zod (~80KB+) loading from module import time to config validation time, allowing the worker process to start ~42ms earlier.
Optimize file tree generation (fileTreeGenerate.ts): Replace recursive string += concatenation with array accumulation (parts.push() + join('')), eliminating O(n²) string copying for large trees. Move sortTreeNodes() from each treeToString*() call into generateFileTree() so sorting happens once after tree construction, not redundantly on each stringify.
Cache isGitRepository results (gitRepositoryHandle.ts): Add Promise-based cache to isGitRepository() to deduplicate concurrent git rev-parse process spawns. When getGitDiffs and getGitLogs run in parallel via Promise.all, they both check the same directory — the cache ensures only one git rev-parse process is spawned instead of three.

Module import time benchmark (20 runs, importing defaultAction.js):

	Median
Before (eager Zod)	117ms
After (lazy Zod)	75ms
Improvement	-42ms (-36%)

Local benchmark (15 runs, packing repomix repo):

	Median	P25	P75
Before	2071ms	2006ms	2212ms
After	2136ms	2072ms	2165ms
Difference	+65ms

Note: Full pipeline benchmark variance (~200ms IQR) masks the module-level improvement. The 42ms faster module loading allows the worker process to start earlier, overlapping more initialization with Zod loading. CI benchmarks with controlled environment will show clearer results.

https://claude.ai/code/session_0185XCtMaDd9Aur1hCXQb3iM

yamadashy · 2026-03-24T11:23:09Z

Key Optimization 17: Lazy-load strip-comments, is-binary-path, and isbinaryfile

Defer loading three modules from worker module startup to first use:

Lazy-load @repomix/strip-comments (~8ms): Only needed when --remove-comments is enabled (non-default). Added ensureStripCommentsLoaded() export that callers invoke before calling removeComments(). The module is cached after first load.
Lazy-load is-binary-path (~7ms): Only needed during file collection, not at worker startup. Loaded on first readRawFile() call and cached for subsequent files.
Lazy-load isbinaryfile (~5ms): Same pattern — deferred to first content-based binary check during file collection.

Total: ~20ms removed from the worker's critical module loading path. The worker process now loads these modules during file collection (I/O-bound phase) instead of during startup (CPU-bound module resolution), allowing the worker to be ready to receive tasks sooner.

Local benchmark (15 runs, packing repomix repo):

	Median	P25	P75
Before	2534ms	2497ms	2671ms
After	2545ms	2519ms	2591ms
Difference	+11ms

Note: Local benchmark variance (~170ms IQR) exceeds the expected ~20ms improvement. The P75 improved by -80ms. CI benchmarks with controlled environment will provide more accurate measurement.

https://claude.ai/code/session_015qU3ieZqx7Hq2rJMUD9TxL

yamadashy · 2026-03-24T12:22:38Z

Key Optimization 18: Overlap metrics with write, skip redundant sort, and optimize hot paths

Four targeted optimizations:

Overlap output/git metrics with disk write (packager.ts, produceOutput.ts): produceOutput now returns the output string immediately with a separate writePromise for disk write + clipboard copy. calculateMetrics (output token counting + git token counting) starts while I/O completes in the background, instead of waiting for write to finish first.
Skip redundant sortPaths for single root (packager.ts): When there's only one root directory (the common case), files are already sorted by searchFiles. Skips the decorate-sort-undecorate overhead (~5-10ms for 1000 files) of re-sorting an already-sorted array.
Fix ext extraction in securityCheckWorker (securityCheckWorker.ts): Replace filePath.split('.').pop() with lastIndexOf + slice for O(1) extension extraction without intermediate array allocation. Runs for every file in the security worker hot path.
Use slice() instead of spread for copy+sort (outputSort.ts, calculateMetrics.ts, outputSplit.ts): Replace [...arr].sort() with arr.slice().sort() which pre-allocates the correct array size instead of iterating through the spread protocol. Also replace [...map.values()] with Array.from() in outputSplit.ts.

Local benchmark (15 runs, packing repomix repo):

	Median	P25	P75
Before	67ms	66ms	70ms
After	65ms	64ms	66ms
Improvement	-2ms (-3%)

Note: Local benchmark variance (~5ms IQR) is comparable to the improvement. The metrics/write overlap and security worker ext fix primarily benefit larger repos and CI environments where disk I/O and security checks take longer.

https://claude.ai/code/session_011G3LU3bQBXjsd412y2CcQ5

yamadashy · 2026-03-24T13:35:18Z

Key Optimization 19: Fix token counter special tokens, estimate output tokens, and overlap file metrics

Three targeted optimizations:

Fix gpt-tokenizer special token handling (TokenCounter.ts): Always pass { allowedSpecial: 'all' } to countTokens(), matching the original tiktoken behavior of encode(content, [], []). Previously, content containing <|endoftext|> (e.g., tokenizer config files packed in the output) would throw, and the fallback retried with the same function — returning 0 tokens. This caused Total Tokens to always show 0 for repos containing such files. The allowedSpecial: 'all' option has no measurable per-call overhead for content without special tokens.
Estimate output tokens from selective file metrics (calculateMetrics.ts): Instead of counting tokens on the full output string (400-800ms for 3-5MB outputs — the single most expensive operation in the pipeline), derive the char:token ratio from the already-computed selective file metrics and apply it to the total output character count. Accuracy: ~95-99% vs exact counting. Effectively instant (~0ms vs 400-800ms).
Overlap file metrics with security check (packager.ts): Chain file processing → file token counting inside the Promise.all with security workers. Since file processing is ~1ms (main thread) and file metrics is ~85ms (main thread), while security check runs ~300ms in worker threads, the token counting is hidden behind security check latency.

Local benchmark (15 runs, packing repomix repo):

	Median	P25	P75
Before	1979ms	1945ms	2012ms
After	1928ms	1895ms	1956ms
Improvement	-51ms (-2.6%)

https://claude.ai/code/session_0194jtGxY4Dbv21iJBgqubAF

yamadashy · 2026-03-24T15:21:36Z

Key Optimization 20: Overlap git-based file sorting with security check

Move sortOutputFiles into the Promise.all that runs security check and file processing, so the git subprocess (~50-200ms) runs truly in parallel with security worker threads (~300ms) and main-thread token counting (~85ms).

Previously, sortOutputFiles started AFTER the Promise.all resolved, adding its full duration to the critical path. Now it starts immediately after processFiles completes (~1ms), overlapping with the tail of the security check. Since the git subprocess is I/O-bound and token counting is CPU-bound on the main thread, they run without contention.

When suspicious files are found (rare, <1% of repos), the pre-sorted array is filtered to remove flagged files. Filtering preserves sort order, so no re-sort is needed.

Pipeline change:

Before: Promise.all(security, process→metrics) → sort → output
After:  Promise.all(security, process→{sort, metrics}) → output

Local benchmark (15 runs, packing repomix repo):

	Median	P25	P75
Before	1445ms	1417ms	1474ms
After	1356ms	1338ms	1382ms
Improvement	-89ms (-6.2%)

https://claude.ai/code/session_01BL2E2nyNeLHH7hrUkPMZfg

yamadashy · 2026-03-24T16:34:24Z

Key Optimization 21: Lazy-load minimatch, parallelize file search I/O, and simplify permission check

Five targeted optimizations to reduce file search and permission checking overhead:

Lazy-load minimatch (fileSearch.ts): minimatch was eagerly imported but only used in findEmptyDirectories, which is only called when --include-empty-directories is enabled (non-default). Convert to dynamic import() with cached loader to avoid loading the module on every pack run.
Parallelize isGitWorktreeRef with ignore patterns (fileSearch.ts): Move the git worktree file read into the existing Promise.all alongside getIgnorePatterns and getIgnoreFilePatterns in prepareIgnoreContext, overlapping I/O operations that were previously sequential.
Simplify isGitWorktreeRef (fileSearch.ts): Remove redundant fs.stat before fs.readFile. If .git is a directory (normal repo), readFile throws EISDIR and we return false. One syscall instead of two.
Reduce permission check from 4 syscalls to 1 (permissionCheck.ts): Replace readdir + 3× fs.access(R_OK, W_OK, X_OK) with a single readdir call. The caller (searchFiles) only checks read permission, and readdir success already confirms read+execute access. Eliminates 3 redundant syscalls per root directory.
Parallelize permission check with ignore context (fileSearch.ts): Run checkDirectoryPermissions and prepareIgnoreContext in parallel via Promise.all instead of sequentially, overlapping their independent I/O operations.

Local benchmark (15 runs, packing repomix repo):

	Median	P25	P75
Before	1260ms	1235ms	1289ms
After	1232ms	1216ms	1259ms
Improvement	-28ms (-2.2%)

What was investigated but not implemented:

Pre-warm gpt-tokenizer encoding module: Tested starting import('gpt-tokenizer/encoding/o200k_base') at the beginning of pack() to overlap the ~158ms BPE vocabulary parsing with file search/collection I/O. However, the module parse+execute is CPU-bound and blocks the event loop, stalling globby's async I/O callbacks. A/B benchmarks showed this actually increased total time by ~89ms with high variance (IQR 106ms vs 22ms baseline). The current pipeline already overlaps token counter init with the security check via the Promise.all in packager.ts, which is optimal since security workers run in separate threads.
Reduce token counting sample size: Tested counting 10-30 files instead of 50 for the char:token ratio estimation. Savings of ~20-30ms in counting time, but since the main thread work (30ms process + 218ms metrics = 248ms) is already shorter than the security check (276ms), the savings don't show up in total time — security check is the bottleneck.
Redundant globby calls for full directory structure: When includeFullDirectoryStructure is enabled, listDirectories and listFiles re-scan the filesystem. Could cache results from initial searchFiles. Not implemented because this feature is non-default and rarely used.

yamadashy · 2026-03-24T17:32:02Z

Key Optimization 22: Batch security check tasks to reduce IPC overhead

Reduce structured clone serialization overhead by batching multiple files per worker IPC round-trip:

Batch security tasks (~20 files/batch): Each pool.run() call involves structured clone serialization of file content across the worker_thread boundary. Previously, 979 individual files meant 979 separate IPC round-trips, each with per-message overhead (~0.5ms: serialization setup, postMessage, promise creation). Now batches ~20 files per round-trip, reducing total IPC from ~979 to ~50 round-trips.
Worker accepts batch input: Security check worker changed from processing a single SecurityCheckTask to processing SecurityCheckTask[]. Each batch is serialized as a single structured clone, amortizing per-message overhead across 20 files.
Deduplicate safePathSet: When suspicious files are found, the Set for filtering safe files was created twice from the same array. Now created once and reused for both processedFiles and sortedProcessedFiles filtering.

What was investigated but not used:

Streaming security (submitting tasks during file collection): Investigated submitting each file to the security pool as it was read from disk, to overlap ~141ms of collection I/O with security processing. However, calling pool.run() inside the file read loop added structured clone overhead that blocked concurrent file reads, increasing collection time by +111ms and negating the overlap benefit. Batching achieves the IPC reduction without interleaving overhead.

Local benchmark (25 runs with 3 warmup, packing repomix repo):

	Median	P25	P75
Before	613ms	606ms	632ms
After	570ms	558ms	574ms
Improvement	-43ms (-7.0%)

Security check stage timing (single profiled run):

Before: 352ms (979 individual IPC round-trips)
After: 283ms (~50 batched IPC round-trips)
Stage improvement: -69ms (-20%)

https://claude.ai/code/session_01Q6GTdGgL4r7YAiq8A8Kj1t

yamadashy · 2026-03-24T18:38:31Z

Key Optimization 23: Increase file collection concurrency and optimize result partitioning

Three targeted optimizations:

Increase FILE_COLLECT_CONCURRENCY from 50 to 100 (fileCollect.ts): Higher I/O concurrency allows more parallel file reads, reducing collection time especially with cold filesystem caches. 100 stays well within typical FD limits (ulimit -n 1024). Benchmark with separate process invocations (cold cache): 932ms → 790ms (-15.2%).
Throttle collection progress callback (fileCollect.ts): Reduce from per-file to every 50 files to avoid ~975 template literal + picocolors string allocations per run. Progress still updates frequently enough for user feedback.
Single-pass security result partitioning (validateFileSafety.ts): Replace three separate .filter() calls (O(3n)) with a single for-of loop with switch (O(n)), avoiding two extra array iterations over security results.

Local benchmark (10 runs, packing repomix repo, separate processes):

	Median	P25	P75
Before (concurrency 50)	932ms	881ms	1003ms
After (concurrency 100)	790ms	779ms	839ms
Improvement	-142ms (-15.2%)

Note: Warm-cache sequential runs show less difference because filesystem caching reduces I/O latency. The cold-cache scenario (separate process invocations) represents the typical user experience.

What was investigated but not changed

Streaming security check during file collection: Investigated overlapping file collection I/O with security worker processing by submitting security batches as files are read. On 4-core machines, CPU contention between security worker threads (CPU-heavy regex matching) and the main thread (file I/O) caused unreliable results — sometimes faster, sometimes slower. Reverted in favor of the current sequential approach which avoids CPU contention.
Lazy-load globby: globby (~150ms to load) is eagerly imported in fileSearch.ts, but it's always needed on the default pack path. Lazy-loading would just shift the cost from module load to first use with no net savings.
Reduce fs.stat before fs.readFile: The stat check prevents reading huge files (up to 50MB maxFileSize) into memory. Removing it risks OOM for repos with large binary files that pass the extension check.
gpt-tokenizer pre-loading: Already overlapped with security check — loads during calculateSelectiveFileMetrics which runs in parallel with the ~800ms security check.

https://claude.ai/code/session_01A7Fst93by1R8HrUySzVwKs

yamadashy · 2026-03-25T05:29:17Z

Key Optimization 26: Pre-compute lowercase in sort comparators and eliminate output template intermediates

Three targeted optimizations to reduce allocation pressure and GC overhead:

Pre-compute lowercase parts in sortPaths (filePathSort.ts): Extend the Schwartzian transform to also pre-compute toLowerCase() for each path segment during decoration. The sort comparator previously called toLowerCase() on every comparison — for 1000 files with ~4 segments each, the sort's O(n log n) comparisons generated ~20,000 temporary string allocations. Pre-computing reduces this to ~4,000 (once per segment during decoration).
Pre-compute nameLower on TreeNode (fileTreeGenerate.ts): Store name.toLowerCase() at node creation time. The recursive tree sort previously called toLowerCase() in every comparator invocation. For a tree with ~1,500 nodes, this eliminates ~30,000 temporary string allocations during sort.
Push string fragments instead of template literals in output renderers (xmlStyle.ts, markdownStyle.ts, plainStyle.ts): Previously, each file's output entry was built as a single template literal containing the full file content (e.g., `<file path="${path}">\n${content}\n</file>\n\n`). For 1000 files with 3-5MB total content, this created ~3-5MB of transient intermediate strings that were immediately discarded after parts.join(''). Now pushes individual fragments (parts.push('<file path="', path, '">\n', content, '\n</file>\n\n')) so the join handles all concatenation in a single pass.

What was investigated but not implemented:

Regex-based base64 pre-check: Tested replacing the char-by-char mayContainStandaloneBase64 loop with /[A-Za-z0-9+/]{60,}/.test(). Micro-benchmark showed regex was 6.6x SLOWER (14s vs 2.1s for 130K checks) because V8's regex engine has high per-call setup overhead and must try every string position, while the JS loop skips short lines (80%+ of content) using SIMD-optimized indexOf('\n').
Lazy-load output style renderers: Tested dynamic import() for the unused 2 of 3 style modules. Added ~37ms overhead from dynamic import scheduling on the hot path, negating the ~2-4ms module loading savings. Direct synchronous imports are faster.

Local benchmark (20 runs, 3 warmup, ~1010 files):

	Median	P25	P75
Before	925ms	910ms	958ms
After	914ms	900ms	927ms
Improvement	-11ms (-1.2%)

The tighter variance (IQR: 27ms vs 48ms) suggests reduced GC pressure from fewer transient allocations.

Move regex patterns from inside function bodies to module-level constants to avoid repeated compilation on every file processed. For a repo with 1000 files, this eliminates 7000 regex compilations per run. - Hoist dataUriPattern, standaloneBase64Pattern to module scope - Hoist base64ValidCharsPattern, hasNumbers/UpperCase/LowerCase/SpecialChars - Add lastIndex reset for global-flag regexes before each use Cherry-picked optimization from PR #1295 (3/3 reviewer consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace O(n²) string concatenation with O(n) array accumulation pattern in treeToString and treeToStringWithLineCounts. For repos with 1000+ files, the old code copied the entire accumulated string on each append, while the new code pushes fragments and joins once at the end. - Extract treeToStringInner/treeToStringWithLineCountsInner helpers - Move sortTreeNodes call into generateFileTree for single sort at build time - Retain sort guard in treeToString/_isRoot for direct callers Cherry-picked optimization from PR #1295 (3/3 reviewer consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ntrustedFiles Replace O(n*m) Array.some() linear scan with Set.has() for O(n+m) filtering. Pre-builds a Set of suspicious file paths for constant-time lookups during the filter pass. Cherry-picked optimization from PR #1295 (3/3 reviewer consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace three separate .filter() passes over security results with a single for-of loop using switch statement. Also skip filterOutUntrustedFiles entirely when no suspicious files are found (the common ~99% case). - Change let to const for result arrays (populated via push) - Short-circuit avoids Set construction + filter over all raw files Cherry-picked optimization from PR #1295 (2/3 reviewer consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Convert static imports of initAction, mcpAction, remoteAction, and versionAction to dynamic import() at their use sites. The default pack path (95%+ of invocations) now avoids loading MCP server, git clone, and init action module trees entirely. Also inline isExplicitRemoteUrl prefix check to avoid loading git-url-parse module for non-remote runs. PR #1295 reports -66% module import time (358ms → 123ms). Cherry-picked optimization (4/5 reviewer consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove log-update dependency (and its wrap-ansi → string-width chain, ~49ms module load) in favor of direct process.stderr.write with ANSI \x1B[2K\r for single-line in-place updates. The spinner only ever writes single lines, so log-update's multi-line and terminal-width handling was unnecessary overhead. Cherry-picked optimization from PR #1295 (4/5 reviewer consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…arse Add Promise-based Map cache to isGitRepository() keyed by directory. When getGitDiffs and getGitLogs run concurrently, both call isGitRepository on the same directory — the cache ensures only one git rev-parse process is spawned instead of multiple. Cache is bypassed when custom deps are provided (test mocks). Cherry-picked optimization from PR #1295 (4/5 reviewer consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove file content from the worker→main process IPC response since the main process only uses processedFiles[].path for the token count tree reporter. For a typical repo with 1000 files averaging 4KB each, this avoids ~4MB of structured clone serialization. Cherry-picked optimization from PR #1295 (4/5 reviewer consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wrap sequential getGitDiffs() and getGitLogs() calls in Promise.all() since both are independent git subprocess operations. Saves the duration of the shorter call (~5-20ms) by overlapping their I/O. Cherry-picked optimization from PR #1295 (3/5 reviewer consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

yamadashy · 2026-03-25T17:36:45Z

Key Optimization 27: Tighten BasicAuth pre-filter to require scheme://...@ on same line

The security pre-filter's BasicAuth check previously used two separate content.includes() calls — one for :// and one for @ — matching any file that contained both substrings anywhere, even in unrelated contexts (e.g., a URL in one paragraph and an email @-sign elsewhere).

This caused ~93% false positives: 189 out of 195 files passing the pre-filter were sent to the expensive secretlint worker thread despite having no actual BasicAuth credentials.

Fix: Replace the separate includes('://') && includes('@') check with a same-line regex \w:\/\/[^\n]*@ merged into the combined trigger pattern. This requires the scheme, ://, and @ to appear on the SAME LINE, which is always true for real BasicAuth URLs (scheme://user:pass@host patterns are inherently single-line).

Benchmark (repomix self-pack, 981 files):

Metric	Before	After	Change
Pre-filter pass rate	19.9% (195 files)	1.4% (14 files)	-93%
Security batches to worker	~10	1	-90%
Warm pack trimmed mean	338ms	323ms	-15ms (-4.4%)
Cold start median	670ms	652ms	-18ms (-2.7%)
False negatives	0	0	—

All 10 files matching both old and new patterns are true positives with ://...@ on the same line. The improvement scales with repository size and content — repos with more markdown, documentation, or web content see proportionally larger savings.

src/core/security/securityCheck.ts

Cache BPE token counts per file in a module-level Map keyed by encoding:path:charCount. On warm MCP/server pack() calls where files haven't changed, this eliminates the worker thread round-trip entirely (IPC serialization + BPE tokenization), returning cached results in ~0.07ms instead of ~39ms. The file content cache in fileRead.ts already validates freshness via mtime+size, so by the time metrics are computed, content is known-fresh. charCount acts as a lightweight change detector — if file content changes, its length almost certainly changes, invalidating the cache entry. For partial cache hits (some files changed), only the changed files are sent to the worker, reducing IPC payload proportionally. Cache is bounded to 5000 entries with FIFO eviction. Both worker-thread and main-thread fallback paths share the same cache. Warm pack() benchmark (25 runs, packing repomix repo ~1009 files): | | Median | p25 | p75 | |---|---|---|---| | Before | 97.1ms | 94.0ms | 105.9ms | | After | 92.9ms | 89.1ms | 98.3ms | | **Improvement** | **-4.2ms (-4.3%)** | | | The modest overall improvement is because file metrics were already overlapped with the security check in the parallel block. The real win is the phase-level improvement: | Phase | Before | After | Improvement | |---|---|---|---| | file-metrics | 38.99ms | 0.07ms | **-38.9ms (-99.8%)** | This frees the metrics worker thread for other work and eliminates ~80KB of IPC serialization overhead per warm call. The improvement compounds in scenarios where the security check is also fast (cached workers, few files matching the pre-filter), as the metrics phase was previously the longer branch in the parallel block. https://claude.ai/code/session_015KjXDgxLV8VmRWST6R4J1H

…cross pack() calls Five targeted optimizations to reduce redundant work on warm MCP/server pack() calls: 1. Cache processed files in processFilesMainThread (fileProcess.ts): On warm runs where file content hasn't changed (validated by raw content length — fileRead.ts already validates by mtime+size), skip the per-file trim() and object allocation loop. Cache is invalidated when processing config options change (truncateBase64, removeEmptyLines, etc.). 2. Cache tree string in formatPackToolResponse (mcpToolRuntime.ts): The MCP response includes a directory structure tree generated via generateTreeString(safeFilePaths, []) which takes ~11ms for 1000 files. On repeated pack() calls with the same file list, the cached tree is returned immediately. Validated by file count + first/last file paths as a fast change-detection heuristic. 3. Cache summary strings in createRenderContext (outputGenerate.ts): The header, purpose, guidelines, and notes strings depend only on config and instruction — not on file content. Cache them across pack() calls using config reference identity check. 4. Replace flatMap with direct loops in renderGroups (outputSplit.ts): Avoids intermediate array allocations when collecting processedFiles and allFilePaths across groups for the split output path. 5. Use concat instead of spread in calculateSelectiveFileMetrics.ts: Replace [...cachedResults, ...newResults] with cachedResults.concat(newResults) to avoid spread's intermediate iterator + copy overhead. MCP pack + response benchmark (50 runs, packing repomix repo ~1011 files): | | Baseline | After | Improvement | |----------|----------|---------|---------------------| | Median | 120.6ms | 116.9ms | -3.7ms (-3.1%) | | p25 | 116.9ms | 113.4ms | -3.5ms (-3.0%) | | p75 | 123.6ms | 121.6ms | -2.0ms (-1.6%) | Warm pack() benchmark (50 runs): | | Baseline | After | Improvement | |----------|----------|---------|---------------------| | Median | 114.8ms | 114.7ms | -0.1ms (within noise) | The improvement is concentrated in the MCP response path (tree cache ~11ms savings amortized across pipeline overhead). pack()-only path shows no measurable change since the tree generation runs in formatPackToolResponse, not in pack() itself. https://claude.ai/code/session_0134ro4Edgvmz42f5xXCYc3N

…ild process Previously, every MCP tool call (pack_codebase, pack_remote_repository, generate_skill) spawned a fresh child process via the `quiet: true` code path in defaultAction.ts. This meant each call paid the full cold startup cost (~580ms): child process spawn (~50ms), module loading (~200ms), and uncached pack() execution (~330ms). All 59 rounds of warm-path caching optimizations (file content cache, processed file cache, token count cache, worker pool reuse, search result cache) were completely wasted because each child process started with empty caches. Add `_inProcess` flag to CliOptions that MCP tools set to bypass the child process. pack() now runs directly in the MCP server process, enabling all module-level caches to persist across repeated tool calls. Memory remains bounded by existing cache limits (200MB file content, 5000 processed files, 5000 token counts, 16 search entries). The first MCP call still pays the cold cost (~580ms), but subsequent calls benefit from warm caches: file content validated by statSync instead of re-read, processed files returned from cache, token counts cached per-file, security/metrics worker pools pre-warmed. MCP pack_codebase benchmark (25 runs, packing repomix repo ~1009 files): | | Child Process (old) | In-Process Warm (new) | Improvement | |---|---|---|---| | Median | 570.3ms | 141.2ms | -429.1ms (-75.2%) | | p25 | 566.5ms | 134.4ms | -432.1ms | | p75 | 582.0ms | 152.0ms | -430.0ms | Cold first run unchanged at ~580ms (no caches populated yet). https://claude.ai/code/session_01FzMsphkoBQmcYVJRgQD92Y

Speculatively start the `git ls-files` subprocess during Zod config validation (~43ms) so the subprocess (~33ms) completes before searchFiles is called. The pre-started result is passed through pack() → searchFiles() which reuses it instead of spawning a new subprocess. - WHY: git ls-files only needs rootDir, not the full config. Starting it during config loading overlaps ~30ms of subprocess I/O with the ~43ms Zod validation phase, saving ~20-30ms on the critical path. - DECISION: Uses static import of node:child_process (execFile) to avoid dynamic import overhead contending with config module loading. - CONSTRAINT: Only pre-starts for single-directory mode (common case) and non-stdin mode. Multi-directory and stdin modes skip the optimization to keep logic simple. - SAFETY: Read-only git operation, no side effects. If config disables useGitignore, the pre-started result is simply ignored. For non-git repos, the promise catches and returns [], causing searchFiles to fall back to globby as usual. Interleaved A/B benchmark (8 runs each, packing repomix repo): WITH pre-start: median 488ms WITHOUT pre-start: median 519ms Improvement: ~31ms (-6%) https://claude.ai/code/session_01Tv33sxNbfhNMjtYARjPwmz

…e counting with write I/O Two optimizations targeting the warm pack() path (MCP/website server): 1. Sync cache probe in collectFiles: On warm runs, 95-100% of files hit the content cache. Previously, all files went through an async promise pool (~1000 async function frames + Promise resolutions) even when every readRawFileCached call was synchronous (statSync + Map lookup). Now, a plain for loop calls probeFileCache() synchronously for all files first. Only cache misses (typically 0-10 files on warm runs) enter the async pool for actual I/O. This eliminates ~1000 unnecessary Promise allocations and microtask resolutions. Collection time (warm, ~1010 files): ~32ms → ~12ms (-62%) 2. Overlap output line counting with disk write: The ~3.5ms indexOf-based line count scan (120K lines, 3.7MB) previously ran sequentially after the Promise.all(metrics, write) completed. Now it runs inside the Promise.all, so the CPU-bound scan overlaps with the I/O-bound disk write instead of adding to it. Also adds picospinner dependency required after rebase onto main (spinner refactoring in d1904df). Benchmark (25 warm runs, 5 warmup, ~1010 files): | Metric | Before | After | Change | |------------------|--------|-------|----------------| | Trimmed mean | 88ms | 83ms | -5ms (-5.7%) | | Median | 85ms | 82ms | -3ms (-3.5%) | | Collection phase | 32ms | 12ms | -20ms (-62%) | The modest total improvement despite large collection savings is because the security worker IPC round-trip (~21ms) is now the pipeline bottleneck in the parallel block, absorbing much of the time freed by faster collection. https://claude.ai/code/session_016ZtGEn6BAAbEY9hSk3iPTL

…-5MB allocation Two optimizations targeting the warm pack() hot path: 1. Cache security check results across pack() calls (securityCheck.ts): On warm MCP/server runs, file content hasn't changed since the last check. Cache results keyed by filePath + contentLength (validated by the upstream file content cache via mtime+size). When all tasks hit the cache, the worker IPC is skipped entirely — saving ~18ms of structured clone serialization + secretlint regex matching per warm call. 2. Stream output parts to disk without joining (outputStyles, writeOutputToDisk): Native renderers (xml, markdown, plain) now return string[] instead of joining ~6000 parts into a single 3-5MB contiguous string. The write path uses a WriteStream where stream.write() buffers synchronously (no per-part async overhead), and the metrics path already handles string[] via outputParts normalization. This eliminates the peak allocation of the full output string and reduces GC pressure during the write phase. Parsable styles (parsable-xml, json) still return string since they use library serializers (fast-xml-builder, JSON.stringify) that produce strings. Benchmark results (15 runs, 3 warmup, ~1010 files): | Metric | Before | After | Improvement | |---|---|---|---| | pack() trimmed mean | 96.0ms | 76.5ms | -19.5ms (-20.3%) | | pack() median | 97.5ms | 76.5ms | -21.0ms (-21.5%) | | Startup median | 73ms | 70ms | -3ms (-4.1%) | All 1085 tests pass, lint clean. https://claude.ai/code/session_01JmLDDWguPj8PEcdAQWvRE2

…ip unchanged disk writes Three optimizations targeting the warm pack() hot path: 1. **Cache-first security pre-filter** (securityCheck.ts): The SECRET_TRIGGER_PATTERN regex scanned all ~988 file contents (~3.6MB) on every warm pack() call, taking ~16ms even though all results were already cached. Now checks the security result cache BEFORE running the pre-filter, and caches pre-filter rejections (null results) so files that don't contain secret patterns are never re-scanned. On warm runs, the cache check loop runs in ~0.3ms (Map lookups only), completely eliminating the 16ms regex scan. 2. **Cache tree string across pack() calls** (fileTreeGenerate.ts): The directory tree string is deterministic given the same file list. On warm MCP/server runs where no files changed, the tree is identical. Cache validated by file count + first/last path + empty dir count + root count. Saves ~1.5ms per warm call. 3. **Skip disk write when output unchanged** (writeOutputToDisk.ts): On warm runs where file content hasn't changed, the output is identical. Track the total character count of the last write and skip re-writing 3-5MB to disk when unchanged. Verify file still exists via statSync to guard against external deletion. Saves ~10ms of I/O per warm call. pack() benchmark (25 warm runs, 5 warmup, ~988 files): | | Before | After | Improvement | |---|---|---|---| | Trimmed mean | 55.2ms | 21.1ms | **-34.1ms (-61.8%)** | | Median | 55.2ms | 19.8ms | **-35.4ms (-64.1%)** | https://claude.ai/code/session_015HARP7Uqx3mMjmjCkvXUoZ

…r pack) Replace async promisePool with synchronous readFileSync loop for cache-miss file reads during collectFiles. readFileSync avoids ~1000 Promise allocations, libuv threadpool scheduling, and microtask overhead per cold run. Key changes: - Add readRawFileSync() to fileRead.ts for synchronous UTF-8 file reading - collectFiles sync fast-read path uses readRawFileSync for cache misses - Non-UTF-8 files (~1%) fall back to async readRawFile with jschardet - Test mocks use the original async path via deps identity check Micro-benchmark (readFileSync vs async promisePool, 1000 files): readFileSync loop: 16ms promisePool(128): 120ms Speedup: 8x pack() benchmark (3 rounds each, in-process, packing repomix repo ~1009 files): | | Before | After | Improvement | |---|---|---|---| | Cold (avg) | 588ms | 528ms | -60ms (-10.2%) | | Warm (median) | 62ms | 65ms | ~same | CLI benchmark (15 runs, 3 warmup): | | Before | After | Improvement | |---|---|---|---| | Median | 841ms | 745ms | -96ms (-11.4%) | WHY: Async file reading via fs.readFile creates one Promise per file, each scheduled through libuv's 4-thread pool. With 1000 files, Promise allocation + microtask resolution + threadpool contention dominate the file collection phase. readFileSync bypasses all of this, going directly to the kernel where the VFS page cache serves recently-accessed inodes in ~0.016ms each. CONSTRAINT: readFileSync blocks the event loop, but this is acceptable because: (1) CLI processes are single-request and exit immediately after pack() (2) MCP/server warm runs have 95-100% cache hits via the existing sync probeFileCache path — only a few changed files use readFileSync https://claude.ai/code/session_01YS9ryAW6UvS7s6Y14UfqUN

…picospinner Pre-start metrics and security worker pools ~60ms earlier by beginning tinypool import at cliRun.ts module load time instead of inside pack(). The BPE table warmup (~300ms) now overlaps with Commander parsing, version logging, defaultAction import, and config loading — reducing idle wait from ~140ms to ~80ms. Also lazy-load picospinner via dynamic import() so the module is only loaded when the spinner is actually started (TTY mode). Non-TTY paths (--version, --quiet, --stdout, piped output, benchmarks) skip the ~2-3ms module load entirely. Implementation: - cliRun.ts: Module-level speculative import of processConcurrency.js starts tinypool loading during Commander setup - defaultAction.ts: Uses pre-loaded processConcurrency to create worker pools immediately, storing them in packager.ts module-level cache via new setPreWarmedMetricsPool/setPreWarmedSecurityPool exports - packager.ts: New setter functions for pre-warming the cached pools from outside pack() - cliSpinner.ts: Lazy-load picospinner in constructor, make start() async Benchmark (10 runs, 2 warmup, packing repomix repo ~1009 files): | | Before | After | Improvement | |---|---|---|---| | Median | 544ms | 481ms | **-63ms (-11.6%)** | https://claude.ai/code/session_01WcatA4CtbjGGN7EHJJtRSS

Conducted comprehensive performance investigation across 5 parallel scopes: 1. I/O & Filesystem operations 2. Memory allocation & GC pressure 3. Algorithms & data structures 4. Dependencies & startup time 5. Pipeline structure & parallelism All 10 high-priority optimization candidates identified are already implemented on this branch: ✅ O(n²) sortedFilePathsByDir → Map-based O(n) lookup ✅ O(n*m) filterOutUntrustedFiles → Set-based O(1) lookup ✅ localeCompare in sortTreeNodes → string operators (~3x faster) ✅ String += in treeToString → array accumulation (O(n²) → O(n)) ✅ calculateMarkdownDelimiter flatMap+match → single-pass charCodeAt ✅ calculateFileLineCounts match(/\n/g) → indexOf loop ✅ Sequential git diffs+logs → Promise.all ✅ Sequential permission checks → optimized single readdir ✅ Sequential split output writes → Promise.all ✅ Clipboard + disk write → Promise.all Additional already-done optimizations: ✅ tiktoken (WASM) replaced with gpt-tokenizer (pure JS) ✅ isBinaryPath check before fs.stat ✅ Lazy-load jschardet, iconv-lite, clipboardy, Handlebars ✅ Search result cache validated via .git/index mtime ✅ Per-file token count cache, security result cache ✅ Processed files + tree string + summary context cache ✅ Sync fast-path for cached file collection ✅ Pre-warm worker pools during config loading ✅ readFileSync for cold-run file collection Current benchmark results (~1009 files, repomix repo): Warm pack() (10 runs, median): 59.6ms Cold pack() (single run): 534ms CLI end-to-end (15 runs, median): 89ms Warm file search (cached): 0.18ms Remaining time dominated by fundamental operations: - File search validation: ~0.2ms (cached via .git/index mtime) - File collection statSync: ~12ms (mtime+size cache validation) - Metrics worker overhead: ~24ms (IPC even when tokens cached) - Security check: ~0.3ms (cached by content hash) No further optimizations found that would provide measurable improvement at the 1000-file scale. https://claude.ai/code/session_01SDk99Mp2WesN3JERkdCux8

… faster) Add a pack result cache that short-circuits the full processing pipeline when all inputs are unchanged between consecutive pack() calls. On warm MCP/server runs, file search, collection (stat validation), and git operations are the only work needed — processFiles, security check, metrics calculation, output generation, and all Promise.all orchestration are skipped entirely. Cache validation uses: - Config object identity (reference check) - File list identity (count + first/last path heuristic) - File content freshness (0 cache misses from collectFiles stat validation) - Git state identity (diff + log content lengths) Changes: - fileCollect.ts: Add `cacheMissCount` field to FileCollectResults so packager can detect when all files were served from cache (0 misses = nothing changed) - packager.ts: Add PackResultCacheEntry storing the last successful PackResult with its input signature. On cache hit, return immediately after collectFiles. pack() benchmark (20 warm runs, 3 warmup, ~987 files): | Metric | Before | After | Improvement | |--------------|---------|--------|----------------------| | Median | 25.5ms | 3.4ms | -22.1ms (-86.7%) | | Trimmed mean | 25.5ms | 3.4ms | -22.1ms (-86.7%) | | Min | 20.4ms | 2.9ms | -17.5ms | The fast path costs ~3.4ms (searchFiles 0.05ms + collectFiles stat validation 3ms + git await 0.1ms + cache check 0.05ms), versus ~25ms for the full pipeline. https://claude.ai/code/session_01LqhtHwcBu4dRJHx3JERArz

Three targeted fixes: 1. Fix countOutputLines for string[] output parts (packager.ts): The string[] code path started each part's line count at 1, but parts are concatenated directly (no separator). This over-counted by (numParts) lines — ~6000 for a typical output with ~6000 parts. Now just counts newlines across all parts with count starting at 1. 2. Batch mkdir in website server ZIP extraction (fileUtils.ts): Per-file fs.mkdir was called for every file in the ZIP (~1000 calls). Pre-collect unique parent directories and batch-create them before writing files — matching the pattern already used in processZipFile.ts. Reduces ~1000 mkdir syscalls to ~100 for typical ZIPs. 3. Remove redundant fs.access in website server file copy (fileUtils.ts): fs.copyFile already fails with a clear error if the source doesn't exist, making the pre-check fs.access call unnecessary. Benchmark (CLI, 5 runs, 2 warmup, packing repomix repo ~1009 files): Median: 510ms (no regression from these fixes) Startup (--version, 10 runs, 2 warmup): Median: 72ms All 1090 tests pass, lint clean. https://claude.ai/code/session_01XeaZajSv4SYfQsz8dHjary

Partial cherry-pick from commit 75bec9e (#1295). Changes included: - Replace Zod instanceof check with duck typing in errorHandle.ts to avoid eagerly importing Zod on every CLI invocation (-22% startup time) - Replace O(n²) reduce+spread with flatMap in outputGenerate.ts - Remove redundant Set wrapping where inputs are already disjoint - Parallelize disk write and clipboard copy in produceOutput.ts - Remove unnecessary sort of file change counts in outputSort.ts - Add missing await to freeTokenCounters in calculateMetricsWorker.ts Excluded from cherry-pick: - tokenCounterFactory.ts (depends on gpt-tokenizer migration) - filePathSort.ts / fileTreeGenerate.ts (localeCompare changes risk altering sort order for non-ASCII file paths) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

intent(startup-perf): cherry-pick strip-comments lazy-loading from #1295 to reduce worker startup overhead decision(cherry-pick): partial cherry-pick — only strip-comments lazy-loading, excluding fileRead.ts changes that depend on prior restructuring commits rejected(fileRead-changes): lazy-loading of is-binary-path and isbinaryfile from same commit — deep conflicts with main due to file reading restructure constraint(imports): main branch still uses parseFile import from treeSitter — must keep alongside new ensureStripCommentsLoaded import Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Partial cherry-pick from commit 75bec9e (#1295). Changes included: - Replace Zod instanceof check with duck typing in errorHandle.ts to avoid eagerly importing Zod on every CLI invocation (-22% startup time) - Replace O(n²) reduce+spread with flatMap in outputGenerate.ts - Remove redundant Set wrapping where inputs are already disjoint - Parallelize disk write and clipboard copy in produceOutput.ts - Remove unnecessary sort of file change counts in outputSort.ts - Add missing await to freeTokenCounters in calculateMetricsWorker.ts Excluded from cherry-pick: - tokenCounterFactory.ts (depends on gpt-tokenizer migration) - filePathSort.ts / fileTreeGenerate.ts (localeCompare changes risk altering sort order for non-ASCII file paths) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… faster server requests The website server's processZipFile and remoteRepo handlers were spawning a child process for each pack() call due to quiet: true without _inProcess flag. Each child process paid ~500ms of overhead (Node.js startup + ESM module re-loading + worker pool warmup for gpt-tokenizer BPE + @secretlint/core). Set _inProcess: true (matching the pattern already used by MCP tools) to run pack() directly in the server process. This reuses module-level cached worker pools across requests, eliminating the per-request spawn + warmup overhead. All module-level caches are bounded (200MB file content, 5000 entries for metrics/security/processing, 16 entries for search results), so memory growth is controlled in long-running server processes. Benchmark (5 runs, 2 warmup, packing repomix repo ~983 files): | Mode | Median | |---|---| | In-Process (_inProcess: true) | 122.4ms | | Child Process (before) | 581.2ms | | **Improvement** | **-458.8ms (-78.9%)** | https://claude.ai/code/session_018a2JAZXzPHMc5F2bb3kPLY

Partial cherry-pick from commit 75bec9e (#1295). Changes included: - Replace Zod instanceof check with duck typing in errorHandle.ts to avoid eagerly importing Zod on every CLI invocation (-22% startup time) - Replace O(n²) reduce+spread with flatMap in outputGenerate.ts - Remove redundant Set wrapping where inputs are already disjoint - Parallelize disk write and clipboard copy in produceOutput.ts - Remove unnecessary sort of file change counts in outputSort.ts - Add missing await to freeTokenCounters in calculateMetricsWorker.ts Excluded from cherry-pick: - tokenCounterFactory.ts (depends on gpt-tokenizer migration) - filePathSort.ts / fileTreeGenerate.ts (localeCompare changes risk altering sort order for non-ASCII file paths) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

intent(startup-perf): cherry-pick strip-comments lazy-loading from #1295 to reduce worker startup overhead decision(cherry-pick): partial cherry-pick — only strip-comments lazy-loading, excluding fileRead.ts changes that depend on prior restructuring commits rejected(fileRead-changes): lazy-loading of is-binary-path and isbinaryfile from same commit — deep conflicts with main due to file reading restructure constraint(imports): main branch still uses parseFile import from treeSitter — must keep alongside new ensureStripCommentsLoaded import Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Partial cherry-pick from commit 75bec9e (#1295). Changes included: - Replace Zod instanceof check with duck typing in errorHandle.ts to avoid eagerly importing Zod on every CLI invocation (-22% startup time) - Replace O(n²) reduce+spread with flatMap in outputGenerate.ts - Remove redundant Set wrapping where inputs are already disjoint - Parallelize disk write and clipboard copy in produceOutput.ts - Remove unnecessary sort of file change counts in outputSort.ts - Add missing await to freeTokenCounters in calculateMetricsWorker.ts Excluded from cherry-pick: - tokenCounterFactory.ts (depends on gpt-tokenizer migration) - filePathSort.ts / fileTreeGenerate.ts (localeCompare changes risk altering sort order for non-ASCII file paths) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

intent(startup-perf): cherry-pick strip-comments lazy-loading from #1295 to reduce worker startup overhead decision(cherry-pick): partial cherry-pick — only strip-comments lazy-loading, excluding fileRead.ts changes that depend on prior restructuring commits rejected(fileRead-changes): lazy-loading of is-binary-path and isbinaryfile from same commit — deep conflicts with main due to file reading restructure constraint(imports): main branch still uses parseFile import from treeSitter — must keep alongside new ensureStripCommentsLoaded import Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Conflicts: # package-lock.json # package.json # src/cli/prompts/skillPrompts.ts # src/config/configLoad.ts # src/core/file/fileRead.ts # src/core/metrics/calculateMetrics.ts # src/core/output/outputGenerate.ts # src/core/output/outputSort.ts # src/core/packager.ts # src/core/packager/produceOutput.ts # src/core/skill/packSkill.ts # src/core/skill/skillStyle.ts # src/core/skill/skillTechStack.ts # src/core/skill/writeSkillOutput.ts # src/mcp/tools/grepRepomixOutputTool.ts # tests/core/packager.test.ts # tests/core/packager/splitOutput.test.ts

gemini-code-assist bot reviewed Mar 23, 2026

View reviewed changes

coderabbitai bot approved these changes Mar 23, 2026

View reviewed changes

yamadashy force-pushed the perf/auto-perf-tuning branch from fb048aa to 6861f46 Compare March 24, 2026 16:05

yamadashy force-pushed the perf/auto-perf-tuning branch from 49a2598 to 75be4f7 Compare March 25, 2026 15:11

yamadashy mentioned this pull request Mar 25, 2026

perf(core): Cherry-pick low-risk micro-optimizations from #1295 #1299

Closed

2 tasks

github-advanced-security bot found potential problems Mar 25, 2026

View reviewed changes

src/core/security/securityCheck.ts Fixed Show fixed Hide fixed

claude added 11 commits March 27, 2026 15:12

yamadashy force-pushed the perf/auto-perf-tuning branch from 178778b to 3b0a2fd Compare March 27, 2026 15:13

yamadashy mentioned this pull request Mar 28, 2026

perf(core): Lazy-load minimatch and simplify permission check #1331

Closed

2 tasks

yamadashy force-pushed the perf/auto-perf-tuning branch 2 times, most recently from 69f42b3 to e7755c4 Compare March 28, 2026 07:16

yamadashy added the automated label Mar 28, 2026

yamadashy closed this Apr 11, 2026

yamadashy deleted the perf/auto-perf-tuning branch April 11, 2026 03:57

Uh oh!

Conversation

yamadashy commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Optimization 1: Lazy-load CLI actions for 62% faster startup

Key Optimization 2: Lazy-load jschardet and iconv-lite

Key Optimization 3: Fix O(n²) file path regrouping

Key Optimization 4: Parallelize git diff and git log operations

Key Optimization 5: Reduce GC pressure across hot paths

Key Optimization 28: Sync fast-path for cached file collection

Key Optimization 29: Cache security results and stream output parts

Key Optimization 30: Skip security pre-filter regex, cache tree string, and skip unchanged disk writes

Key Optimization 31: Use readFileSync for cold-run file collection (~11% faster pack)

Key Optimization 32: Pre-warm worker pools during config loading and lazy-load picospinner

Key Optimization 33: Cache entire pack() result for warm MCP/server runs

Key Optimization 34: Fix output line over-count and batch ZIP mkdir

Key Optimization 35: Run website server pack() in-process instead of child process (~79% faster)

Benchmark results

Checklist

Uh oh!

github-actions bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚡ Performance Benchmark

Uh oh!

gemini-code-assist bot commented Mar 23, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

codecov bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

cloudflare-workers-and-pages bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying repomix with Cloudflare Pages

Uh oh!

coderabbitai bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

yamadashy commented Mar 24, 2026

Key Optimization 16: Lazy-load Zod, optimize tree generation, and cache git repo checks

Uh oh!

yamadashy commented Mar 24, 2026

Key Optimization 17: Lazy-load strip-comments, is-binary-path, and isbinaryfile

Uh oh!

yamadashy commented Mar 24, 2026

Key Optimization 18: Overlap metrics with write, skip redundant sort, and optimize hot paths

Uh oh!

yamadashy commented Mar 24, 2026

Key Optimization 19: Fix token counter special tokens, estimate output tokens, and overlap file metrics

Uh oh!

yamadashy commented Mar 24, 2026

Key Optimization 20: Overlap git-based file sorting with security check

Uh oh!

yamadashy commented Mar 24, 2026

Key Optimization 21: Lazy-load minimatch, parallelize file search I/O, and simplify permission check

Uh oh!

yamadashy commented Mar 24, 2026

Key Optimization 22: Batch security check tasks to reduce IPC overhead

Uh oh!

yamadashy commented Mar 24, 2026

Key Optimization 23: Increase file collection concurrency and optimize result partitioning

What was investigated but not changed

Uh oh!

yamadashy commented Mar 25, 2026

Key Optimization 26: Pre-compute lowercase in sort comparators and eliminate output template intermediates

Uh oh!

yamadashy commented Mar 25, 2026

Key Optimization 27: Tighten BasicAuth pre-filter to require scheme://...@ on same line

Uh oh!

Uh oh!

Reviewers

Assignees

yamadashy commented Mar 23, 2026 •

edited

Loading

github-actions bot commented Mar 23, 2026 •

edited

Loading

codecov bot commented Mar 23, 2026 •

edited

Loading

cloudflare-workers-and-pages bot commented Mar 23, 2026 •

edited

Loading

coderabbitai bot commented Mar 23, 2026 •

edited

Loading