perf(core): Async git ls-files and overlap git ops with file collection#1477
perf(core): Async git ls-files and overlap git ops with file collection#1477
Conversation
Replace globby filesystem walk with git ls-files for file enumeration in git repositories, reducing search time from ~250ms to ~15ms (~94ms wall-clock improvement, 6.0% of total CLI time). ## What changed - Added `searchFilesGitFastPath()` that reads from the pre-built git index via `git ls-files --cached --others --exclude-standard` - Post-filters the result through the `ignore` package (for default/custom/.repomixignore patterns) and `minimatch` (for include patterns) to produce the same file set as globby - Falls back to globby when: not a git repo, useGitignore disabled, stdin/explicit files mode, or git command failure - Added `rawIgnorePatterns` to `prepareIgnoreContext` to avoid the `normalizeGlobPattern` transform that converts `**/file.ext` to `**/file.ext/**` — correct for globby but breaks the `ignore` package's gitignore-style matching ## Why it works globby walks the entire filesystem tree (~250ms for ~1000 files), applying .gitignore rules at each directory level. `git ls-files` reads from the pre-built index file in a single syscall (~5ms), producing the same set of non-ignored files. The subsequent post-filtering with the `ignore` package for repomix-specific patterns (defaultIgnoreList, .repomixignore) and symlink detection via lstatSync adds only ~10ms. ## Benchmark `node bin/repomix.cjs --quiet --output repomix-output.xml` on the repomix repo (~1019 files, default config). 20 interleaved A/B pairs: | Metric | Baseline | This patch | Delta | |------------|----------|------------|----------------------| | median | 1573 ms | 1475 ms | −98 ms (−6.2%) | | mean | 1567 ms | 1473 ms | −94 ms (−6.0%) | | positive | | | 19/20 pairs improved | ## Correctness - 1115 tests pass - Lint clean (0 new warnings) - Output file list is identical (1019 files, same paths) - Output content is byte-identical (verified with diff) - Empty directory detection unchanged (still uses globby) https://claude.ai/code/session_01DiePCbPAioXXYL9rGkNFsP
Two changes that compound to a 2.4% (−31ms) CLI wall-time improvement: 1. Convert execFileSync → async execFile for git ls-files The git ls-files call takes ~5-17ms. execFileSync blocked the Node.js event loop during this time, preventing any concurrent async work from progressing (metrics warmup, prefetch sort data, etc.). Using the async execFile via promisify frees the event loop, allowing these background operations to make progress during the git wait. 2. Start git diff/log and empty dir scan before searchFiles getGitDiffs, getGitLogs, and searchEmptyDirectories only need rootDirs and config — they don't depend on search results. Previously they ran in the collect phase (after searchFiles completed). Moving them to fire before searchFiles gives their git subprocesses a head start. Combined with the async execFile change, the event loop processes git subprocess completions during the git ls-files await, so by the time the collect phase starts, the git operations may already be complete. Benchmark: `node bin/repomix.cjs --quiet` on the repomix repo (~1020 files). 20 interleaved A/B runs: | Metric | Baseline | This patch | Delta | |--------------|----------|------------|-----------------| | median | 1308 ms | 1277 ms | −31 ms (−2.4%) | | trimmed mean | 1310 ms | 1279 ms | −31 ms (−2.4%) | All 1115 tests pass, lint clean. https://claude.ai/code/session_01SkQSWcbnFJE9czCTzS6qcV
📝 WalkthroughWalkthroughThis PR introduces a git ls-files-based fast path for file enumeration and optimizes concurrent execution of git operations. The fileSearch module adds conditional fast-path logic with subprocess-based file tracking, symlink/type filtering, and dynamic ignore file resolution. The packager module starts git diff/log operations earlier to overlap I/O with file collection. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant SearchFiles as searchFiles()
participant GitFastPath as git ls-files<br/>subprocess
participant IgnoreFilter as ignore filter +<br/>nested files
participant Globby as globby fallback
participant FileSystem as filesystem
Caller->>SearchFiles: searchFiles(config, patterns)
alt useGitignore enabled & no stdin
SearchFiles->>GitFastPath: spawn git ls-files
GitFastPath-->>SearchFiles: NUL-separated files
SearchFiles->>IgnoreFilter: filter with patterns +<br/>nested .gitignore
IgnoreFilter->>FileSystem: check symlinks & types
FileSystem-->>IgnoreFilter: stat results
IgnoreFilter-->>SearchFiles: filtered files
alt git fast path succeeds
SearchFiles-->>Caller: files
else git fast path fails/unavailable
SearchFiles->>Globby: fallback search
Globby-->>SearchFiles: files
SearchFiles-->>Caller: files
end
else fallback to globby
SearchFiles->>Globby: globby search
Globby-->>SearchFiles: files
SearchFiles-->>Caller: files
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a performance optimization for file searching by implementing a 'fast path' using git ls-files in git repositories, which significantly reduces the time spent walking the filesystem. It also refactors the packaging process to initiate git operations concurrently with file collection. Feedback highlights a potential regression in include pattern matching where directory names are not automatically expanded, unlike the globby fallback. Suggestions were also provided to improve performance by replacing synchronous I/O with concurrent asynchronous calls and optimizing ignore file processing, along with a minor cleanup of promise handling.
| const matchers = includePatterns.map((p) => new Minimatch(p, { dot: true })); | ||
| files = files.filter((f) => matchers.some((m) => m.match(f))); |
There was a problem hiding this comment.
There is a potential regression here compared to globby semantics. globby (and fast-glob) automatically expands directory names in include patterns (e.g., src becomes src/**/*), whereas Minimatch performs literal glob matching. If a user provides a directory name in the include config, the git fast path will fail to match any files within that directory. Additionally, the PR description mentions using picomatch, but the implementation uses Minimatch. Consider using picomatch with appropriate options or expanding directory patterns to maintain consistency with the globby fallback.
| files = files.filter((f) => { | ||
| try { | ||
| return lstatSync(path.join(rootDir, f)).isFile(); | ||
| } catch { | ||
| return false; | ||
| } | ||
| }); |
There was a problem hiding this comment.
Using lstatSync in a loop for every file returned by git ls-files can be a significant performance bottleneck, especially in large repositories with thousands of files. This synchronous I/O blocks the Node.js event loop, which partially defeats the purpose of using async execFile to keep the event loop free. Consider using fs.lstat (async) with Promise.all to perform these checks concurrently.
| files = files.filter((f) => { | |
| try { | |
| return lstatSync(path.join(rootDir, f)).isFile(); | |
| } catch { | |
| return false; | |
| } | |
| }); | |
| const fileStats = await Promise.all( | |
| files.map(async (f) => { | |
| try { | |
| const stats = await fs.lstat(path.join(rootDir, f)); | |
| return stats.isFile() ? f : null; | |
| } catch { | |
| return null; | |
| } | |
| }), | |
| ); | |
| files = fileStats.filter((f): f is string => f !== null); |
| for (const filePattern of ignoreFilePatterns) { | ||
| const fileName = filePattern.replace(/^\*\*\//, ''); | ||
| const matchingFiles = files.filter((f) => f === fileName || f.endsWith(`/${fileName}`)); |
There was a problem hiding this comment.
Filtering the entire files array inside a loop for each ignore file pattern is inefficient (O(N * M)). Since ignoreFilePatterns usually contains a small number of fixed filenames (like .repomixignore), it is more efficient to perform a single pass over the files array to identify all relevant ignore files.
const ignoreFileNames = ignoreFilePatterns.map((p) => p.replace(/^\*\*\//, ''));
const ignoreFilesToProcess = files.filter((f) => {
const base = path.basename(f);
return ignoreFileNames.includes(base);
});
for (const ignoreFile of ignoreFilesToProcess) {| Promise.resolve(gitDiffPromise).catch(() => {}); | ||
| Promise.resolve(gitLogPromise).catch(() => {}); |
There was a problem hiding this comment.
The use of Promise.resolve() here is redundant because gitDiffPromise and gitLogPromise are already promises returned by async functions. You can attach the .catch() directly to the promises to suppress the unhandled rejection warning.
| Promise.resolve(gitDiffPromise).catch(() => {}); | |
| Promise.resolve(gitLogPromise).catch(() => {}); | |
| gitDiffPromise.catch(() => {}); | |
| gitLogPromise.catch(() => {}); |
⚡ Performance Benchmark
Details
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1477 +/- ##
==========================================
+ Coverage 87.18% 87.25% +0.07%
==========================================
Files 117 117
Lines 4470 4529 +59
Branches 1032 1046 +14
==========================================
+ Hits 3897 3952 +55
- Misses 573 577 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/core/file/fileSearch.ts (1)
159-188: Nested ignore-file reads are serialized.The
for (const filePattern …)/for (const ignoreFile …)pair awaits eachfs.readFileone at a time, and each inner iteration also adds patterns to a sharediginstance. The reads can be done in parallel and theig.addcalls sequenced at the end to preserve ordering semantics. Typically there are only a handful of ignore files so the practical win is modest, but it stays consistent with the async-overlap theme of this PR.♻️ Parallelize reads
- for (const filePattern of ignoreFilePatterns) { - const fileName = filePattern.replace(/^\*\*\//, ''); - const matchingFiles = files.filter((f) => f === fileName || f.endsWith(`/${fileName}`)); - - for (const ignoreFile of matchingFiles) { - try { - const content = await fs.readFile(path.join(rootDir, ignoreFile), 'utf-8'); - const patterns = parseIgnoreContent(content); - const dir = path.dirname(ignoreFile); - ... - } catch { - logger.trace(`Could not read ignore file: ${ignoreFile}`); - } - } - } + const ignoreFilesToRead: string[] = []; + for (const filePattern of ignoreFilePatterns) { + const fileName = filePattern.replace(/^\*\*\//, ''); + for (const f of files) { + if (f === fileName || f.endsWith(`/${fileName}`)) ignoreFilesToRead.push(f); + } + } + const readResults = await Promise.all( + ignoreFilesToRead.map(async (ignoreFile) => { + try { + const content = await fs.readFile(path.join(rootDir, ignoreFile), 'utf-8'); + return { ignoreFile, patterns: parseIgnoreContent(content) }; + } catch { + logger.trace(`Could not read ignore file: ${ignoreFile}`); + return null; + } + }), + ); + for (const res of readResults) { + if (!res) continue; + const dir = path.dirname(res.ignoreFile); + if (dir === '.') { + ig.add(res.patterns); + } else { + ig.add(res.patterns.map((p) => { + if (p.startsWith('!')) return `!${dir}/${p.slice(1)}`; + if (p.startsWith('/')) return `${dir}${p}`; + return `${dir}/${p}`; + })); + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/file/fileSearch.ts` around lines 159 - 188, The nested loops in the ignore-file handling (iterating ignoreFilePatterns and matchingFiles) perform fs.readFile sequentially and call ig.add inline, slowing I/O and mutating the shared ignore instance; change this by mapping each matching ignoreFile to a read promise (use fs.readFile for each ignoreFile) and await Promise.all to get contents, then for each resolved content (preserving original file order) call parseIgnoreContent and add patterns to ig (handling root vs nested dirs and prefixing patterns the same way); ensure per-file read errors are caught and logged with logger.trace so failures don’t break the batch, and keep the same transformation logic for '!' and '/' prefixes when calling ig.add.src/core/packager.ts (1)
128-133: RedundantPromise.resolvewrapper for rejection suppression.
Promise.resolve(p)returnspitself whenpis already a native promise, soPromise.resolve(gitDiffPromise).catch(() => {})is equivalent togitDiffPromise.catch(() => {}). The simpler form is clearer and equally effective at registering a handler on the original promise to prevent an unhandled-rejection warning ifcollectFilesrejects first. The awaitedgitDiffPromise/gitLogPromisein the laterPromise.allstill surface the real error.- // Suppress unhandled rejection if collectFiles throws before Promise.all - // consumes these promises. The original promises are still awaited below. - Promise.resolve(gitDiffPromise).catch(() => {}); - Promise.resolve(gitLogPromise).catch(() => {}); + // Suppress unhandled rejection if collectFiles throws before Promise.all + // consumes these promises. The original promises are still awaited below. + gitDiffPromise.catch(() => {}); + gitLogPromise.catch(() => {});🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/packager.ts` around lines 128 - 133, The Promise.resolve wrappers are redundant; replace Promise.resolve(gitDiffPromise).catch(() => {}) and Promise.resolve(gitLogPromise).catch(() => {}) with direct handlers on the original promises (gitDiffPromise.catch(() => {}) and gitLogPromise.catch(() => {})) to suppress unhandled-rejection warnings while still allowing the awaited Promise.all(gitDiffPromise, gitLogPromise, ...) to surface real errors; locate usages around deps.getGitDiffs/deps.getGitLogs and the collectFiles/Promise.all flow and update there.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/file/fileSearch.ts`:
- Around line 95-107: The docstring in src/core/file/fileSearch.ts incorrectly
mentions "picomatch" while the implementation uses Minimatch; update the comment
so it references the actual matcher (minimatch/Minimatch) used by the code
instead of picomatch to avoid confusion, e.g., replace the "picomatch (for
include patterns)" text with "minimatch (Minimatch) (for include patterns)" and
ensure any wording aligns with the matcher instantiation found in the file (new
Minimatch(...)).
- Around line 141-149: The synchronous lstatSync loop in
src/core/file/fileSearch.ts blocks the event loop for large repos; replace the
files = files.filter(...) block with an async filtering step that uses
fs.promises.lstat (or node:fs/promises.lstat) and bounded concurrency (e.g., a
small concurrency pool or Promise.allSettled for moderate sizes) to test
isFile() for each path, await the results, and then produce the filtered files
array; also remove the now-unused lstatSync import. Target the code around the
files variable and the current filter callback to perform the async lstat checks
and handle errors by excluding non-files.
---
Nitpick comments:
In `@src/core/file/fileSearch.ts`:
- Around line 159-188: The nested loops in the ignore-file handling (iterating
ignoreFilePatterns and matchingFiles) perform fs.readFile sequentially and call
ig.add inline, slowing I/O and mutating the shared ignore instance; change this
by mapping each matching ignoreFile to a read promise (use fs.readFile for each
ignoreFile) and await Promise.all to get contents, then for each resolved
content (preserving original file order) call parseIgnoreContent and add
patterns to ig (handling root vs nested dirs and prefixing patterns the same
way); ensure per-file read errors are caught and logged with logger.trace so
failures don’t break the batch, and keep the same transformation logic for '!'
and '/' prefixes when calling ig.add.
In `@src/core/packager.ts`:
- Around line 128-133: The Promise.resolve wrappers are redundant; replace
Promise.resolve(gitDiffPromise).catch(() => {}) and
Promise.resolve(gitLogPromise).catch(() => {}) with direct handlers on the
original promises (gitDiffPromise.catch(() => {}) and gitLogPromise.catch(() =>
{})) to suppress unhandled-rejection warnings while still allowing the awaited
Promise.all(gitDiffPromise, gitLogPromise, ...) to surface real errors; locate
usages around deps.getGitDiffs/deps.getGitLogs and the collectFiles/Promise.all
flow and update there.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7ff4c18b-47d2-48e7-8a6b-6c6bb9568cd3
📒 Files selected for processing (2)
src/core/file/fileSearch.tssrc/core/packager.ts
| /** | ||
| * Fast file enumeration using `git ls-files` for git repositories. | ||
| * | ||
| * `git ls-files --cached --others --exclude-standard` reads from the | ||
| * pre-built git index (~5ms) instead of walking the filesystem (~250ms | ||
| * with globby). The result is then post-filtered through the `ignore` | ||
| * package (for default/custom/repomixignore patterns) and `picomatch` | ||
| * (for include patterns) to produce the same file set as globby. | ||
| * | ||
| * Returns `null` when the fast path is not applicable (not a git repo, | ||
| * `useGitignore` disabled, or git command failure), signalling the | ||
| * caller to fall back to globby. | ||
| */ |
There was a problem hiding this comment.
Docstring says picomatch, but the implementation uses minimatch.
Small inconsistency in the new docstring — line 101 references picomatch, while the actual matcher below (line 195) is new Minimatch(...). Please update the comment to avoid future confusion.
- * package (for default/custom/repomixignore patterns) and `picomatch`
+ * package (for default/custom/repomixignore patterns) and `minimatch`📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /** | |
| * Fast file enumeration using `git ls-files` for git repositories. | |
| * | |
| * `git ls-files --cached --others --exclude-standard` reads from the | |
| * pre-built git index (~5ms) instead of walking the filesystem (~250ms | |
| * with globby). The result is then post-filtered through the `ignore` | |
| * package (for default/custom/repomixignore patterns) and `picomatch` | |
| * (for include patterns) to produce the same file set as globby. | |
| * | |
| * Returns `null` when the fast path is not applicable (not a git repo, | |
| * `useGitignore` disabled, or git command failure), signalling the | |
| * caller to fall back to globby. | |
| */ | |
| /** | |
| * Fast file enumeration using `git ls-files` for git repositories. | |
| * | |
| * `git ls-files --cached --others --exclude-standard` reads from the | |
| * pre-built git index (~5ms) instead of walking the filesystem (~250ms | |
| * with globby). The result is then post-filtered through the `ignore` | |
| * package (for default/custom/repomixignore patterns) and `minimatch` | |
| * (for include patterns) to produce the same file set as globby. | |
| * | |
| * Returns `null` when the fast path is not applicable (not a git repo, | |
| * `useGitignore` disabled, or git command failure), signalling the | |
| * caller to fall back to globby. | |
| */ |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/file/fileSearch.ts` around lines 95 - 107, The docstring in
src/core/file/fileSearch.ts incorrectly mentions "picomatch" while the
implementation uses Minimatch; update the comment so it references the actual
matcher (minimatch/Minimatch) used by the code instead of picomatch to avoid
confusion, e.g., replace the "picomatch (for include patterns)" text with
"minimatch (Minimatch) (for include patterns)" and ensure any wording aligns
with the matcher instantiation found in the file (new Minimatch(...)).
| // Filter out symlinks and non-regular files to match globby's behavior | ||
| // (followSymbolicLinks:false + onlyFiles:true reports symlinks as non-files). | ||
| files = files.filter((f) => { | ||
| try { | ||
| return lstatSync(path.join(rootDir, f)).isFile(); | ||
| } catch { | ||
| return false; | ||
| } | ||
| }); |
There was a problem hiding this comment.
Sync lstatSync in a per-file loop can block the event loop.
This filter calls lstatSync once per candidate file on the main thread. For a repo with, say, 10k tracked files that's 10k synchronous filesystem syscalls executed back-to-back, which partially defeats this PR's stated goal of keeping the event loop free so getGitDiffs/getGitLogs subprocesses and metricsWarmupPromise can make progress concurrently with file enumeration.
Using async fs.lstat with bounded concurrency (or Promise.all for moderate sizes) lets the concurrent git/metrics work actually be scheduled while stat syscalls are in flight.
♻️ Proposed async variant
- // Filter out symlinks and non-regular files to match globby's behavior
- // (followSymbolicLinks:false + onlyFiles:true reports symlinks as non-files).
- files = files.filter((f) => {
- try {
- return lstatSync(path.join(rootDir, f)).isFile();
- } catch {
- return false;
- }
- });
+ // Filter out symlinks and non-regular files to match globby's behavior
+ // (followSymbolicLinks:false + onlyFiles:true reports symlinks as non-files).
+ const isRegularFile = await Promise.all(
+ files.map(async (f) => {
+ try {
+ return (await fs.lstat(path.join(rootDir, f))).isFile();
+ } catch {
+ return false;
+ }
+ }),
+ );
+ files = files.filter((_, i) => isRegularFile[i]);The corresponding lstatSync / node:fs import becomes unused; drop it and rely on fs.lstat from node:fs/promises.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/file/fileSearch.ts` around lines 141 - 149, The synchronous
lstatSync loop in src/core/file/fileSearch.ts blocks the event loop for large
repos; replace the files = files.filter(...) block with an async filtering step
that uses fs.promises.lstat (or node:fs/promises.lstat) and bounded concurrency
(e.g., a small concurrency pool or Promise.allSettled for moderate sizes) to
test isFile() for each path, await the results, and then produce the filtered
files array; also remove the now-unused lstatSync import. Target the code around
the files variable and the current filter callback to perform the async lstat
checks and handle errors by excluding non-files.
Deploying repomix with
|
| Latest commit: |
bb274aa
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://0f927342.repomix.pages.dev |
| Branch Preview URL: | https://perf-async-git-ls-files.repomix.pages.dev |
Code Review — ClaudeOverall this is a well-motivated performance PR. The packager pipelining change (starting git diff/log early) is clean and low-risk. The Key Findings1.
|
| const ig = ignore(); | ||
| ig.add(adjustedIgnorePatterns); | ||
|
|
There was a problem hiding this comment.
🔴 Git fast path over-filters files: ignore package treats slash-less patterns as matching at any depth, unlike globby
The ignore package follows gitignore semantics where patterns without a / character (like *.pid, .env, *.tgz, *.cache) match against the filename at any directory depth. Globby, in contrast, treats these as glob patterns that only match at the root level. This causes the git fast path to silently exclude files that the globby fallback would include.
Confirmed behavioral difference with test
globby with ignore: ['*.pid', '.env']:
packages/app/.env→ INCLUDEDsubdir/foo.pid→ INCLUDED
ignore package with same patterns:
packages/app/.env→ EXCLUDEDsubdir/foo.pid→ EXCLUDED
Affected default ignore patterns (from src/config/defaultIgnore.ts) include: .env, *.pid, *.seed, *.pid.lock, *.tgz, *.cache, .eslintcache, .node_repl_history, .lock-wscript, .hgignore. Any user-defined custom patterns without / are also affected. Since the git fast path is the default for all git repos (config.ignore.useGitignore && !explicitFiles at line 307), this silently changes which files are included in the output — for example, packages/app/.env in a monorepo that is tracked by git would be excluded by the fast path but included by globby.
Prompt for agents
The root cause is that rawIgnorePatterns (from getIgnorePatterns) are designed for globby's glob-based matching, but the ignore package uses gitignore semantics where slash-less patterns match at any depth. To fix this, the patterns passed to ig.add() need to be transformed so that slash-less patterns are anchored to the root. In gitignore semantics, prefixing a pattern with / anchors it to the root directory. So for each pattern in rawIgnorePatterns that does not contain a / character (e.g. *.pid, .env, *.tgz, *.cache), prefix it with / before adding it to the ignore instance. Patterns that already contain / (like dist/**, **/node_modules/**) should be left as-is since they already have correct path-anchoring behavior in both globby and the ignore package. The transformation should happen in searchFilesGitFastPath around line 153-155 where ig.add(adjustedIgnorePatterns) is called, or alternatively in prepareIgnoreContext when building rawIgnorePatterns.
Was this helpful? React with 👍 or 👎 to provide feedback.
| const fileName = filePattern.replace(/^\*\*\//, ''); | ||
| const matchingFiles = files.filter((f) => f === fileName || f.endsWith(`/${fileName}`)); | ||
|
|
There was a problem hiding this comment.
🟡 Git fast path misses .repomixignore/.ignore files that are gitignored but exist on disk
The git fast path discovers .repomixignore and .ignore files by searching within the git ls-files output (src/core/file/fileSearch.ts:161-162). However, git ls-files --cached --others --exclude-standard excludes files that are in .gitignore. If a .repomixignore file exists on disk but is gitignored (and untracked), it won't appear in the git output, so its patterns won't be applied. Globby's ignoreFiles option finds these files via filesystem traversal regardless of gitignore status. This means the git fast path could include files that should have been excluded by a gitignored .repomixignore.
Was this helpful? React with 👍 or 👎 to provide feedback.
|
Closing this PR after review:
Closing in favor of focusing on optimizations with clearer independent effect. |
Summary
Two compounding changes in the file search and packager pipeline:
Convert `execFileSync` → async `execFile` for `git ls-files`: The git ls-files call takes ~5-17ms. `execFileSync` blocked the Node.js event loop during this time, preventing concurrent async work (metrics warmup, prefetch sort data) from progressing. Using async `execFile` via `promisify` frees the event loop.
Start git diff/log subprocesses early: `getGitDiffs` and `getGitLogs` only need `rootDirs` and `config` — they don't depend on collected file content. Move them to fire immediately at the start of the collect-phase `try` block (instead of lazily inside the `Promise.all`), so their git subprocesses get a head start while collectFiles is still running.
Note: This PR includes the prerequisite commit "Use git ls-files for fast file search in git repositories" (#29ac8820) which introduces the git fast-path. Compared to the original commit on the perf branch, the packager-side restructure here is simplified (does not depend on `prefetchSortData` / separate empty-dir scan, which are part of other changes).
Benchmark
Original measurement (20 interleaved A/B runs):
Note: Independent re-measurement on a different machine showed mixed results (round 1 -10ms, round 2 +10ms with σ=10-18ms noise). The simplified packager restructure in this PR may have smaller effect than the original full restructure.
Checklist
🤖 Generated with Claude Code