Skip to content

perf(core): Async git ls-files and overlap git ops with file collection#1477

Closed
yamadashy wants to merge 2 commits intomainfrom
perf/async-git-ls-files
Closed

perf(core): Async git ls-files and overlap git ops with file collection#1477
yamadashy wants to merge 2 commits intomainfrom
perf/async-git-ls-files

Conversation

@yamadashy
Copy link
Copy Markdown
Owner

@yamadashy yamadashy commented Apr 18, 2026

Summary

Two compounding changes in the file search and packager pipeline:

  1. Convert `execFileSync` → async `execFile` for `git ls-files`: The git ls-files call takes ~5-17ms. `execFileSync` blocked the Node.js event loop during this time, preventing concurrent async work (metrics warmup, prefetch sort data) from progressing. Using async `execFile` via `promisify` frees the event loop.

  2. Start git diff/log subprocesses early: `getGitDiffs` and `getGitLogs` only need `rootDirs` and `config` — they don't depend on collected file content. Move them to fire immediately at the start of the collect-phase `try` block (instead of lazily inside the `Promise.all`), so their git subprocesses get a head start while collectFiles is still running.

Note: This PR includes the prerequisite commit "Use git ls-files for fast file search in git repositories" (#29ac8820) which introduces the git fast-path. Compared to the original commit on the perf branch, the packager-side restructure here is simplified (does not depend on `prefetchSortData` / separate empty-dir scan, which are part of other changes).

Benchmark

Original measurement (20 interleaved A/B runs):

Metric Baseline This patch Delta
median 1308 ms 1277 ms -31 ms (-2.4%)
trimmed mean 1310 ms 1279 ms -31 ms (-2.4%)

Note: Independent re-measurement on a different machine showed mixed results (round 1 -10ms, round 2 +10ms with σ=10-18ms noise). The simplified packager restructure in this PR may have smaller effect than the original full restructure.

Checklist

  • Run `npm run test` (1115 passed)
  • Run `npm run lint`

🤖 Generated with Claude Code


Open with Devin

claude added 2 commits April 18, 2026 11:46
Replace globby filesystem walk with git ls-files for file enumeration
in git repositories, reducing search time from ~250ms to ~15ms (~94ms
wall-clock improvement, 6.0% of total CLI time).

## What changed

- Added `searchFilesGitFastPath()` that reads from the pre-built git
  index via `git ls-files --cached --others --exclude-standard`
- Post-filters the result through the `ignore` package (for
  default/custom/.repomixignore patterns) and `minimatch` (for include
  patterns) to produce the same file set as globby
- Falls back to globby when: not a git repo, useGitignore disabled,
  stdin/explicit files mode, or git command failure
- Added `rawIgnorePatterns` to `prepareIgnoreContext` to avoid the
  `normalizeGlobPattern` transform that converts `**/file.ext` to
  `**/file.ext/**` — correct for globby but breaks the `ignore`
  package's gitignore-style matching

## Why it works

globby walks the entire filesystem tree (~250ms for ~1000 files),
applying .gitignore rules at each directory level. `git ls-files` reads
from the pre-built index file in a single syscall (~5ms), producing
the same set of non-ignored files. The subsequent post-filtering with
the `ignore` package for repomix-specific patterns (defaultIgnoreList,
.repomixignore) and symlink detection via lstatSync adds only ~10ms.

## Benchmark

`node bin/repomix.cjs --quiet --output repomix-output.xml` on the
repomix repo (~1019 files, default config). 20 interleaved A/B pairs:

| Metric     | Baseline | This patch | Delta                |
|------------|----------|------------|----------------------|
| median     | 1573 ms  | 1475 ms    | −98 ms (−6.2%)       |
| mean       | 1567 ms  | 1473 ms    | −94 ms (−6.0%)       |
| positive   |          |            | 19/20 pairs improved |

## Correctness

- 1115 tests pass
- Lint clean (0 new warnings)
- Output file list is identical (1019 files, same paths)
- Output content is byte-identical (verified with diff)
- Empty directory detection unchanged (still uses globby)

https://claude.ai/code/session_01DiePCbPAioXXYL9rGkNFsP
Two changes that compound to a 2.4% (−31ms) CLI wall-time improvement:

1. Convert execFileSync → async execFile for git ls-files
   The git ls-files call takes ~5-17ms. execFileSync blocked the Node.js
   event loop during this time, preventing any concurrent async work from
   progressing (metrics warmup, prefetch sort data, etc.). Using the async
   execFile via promisify frees the event loop, allowing these background
   operations to make progress during the git wait.

2. Start git diff/log and empty dir scan before searchFiles
   getGitDiffs, getGitLogs, and searchEmptyDirectories only need rootDirs
   and config — they don't depend on search results. Previously they ran
   in the collect phase (after searchFiles completed). Moving them to fire
   before searchFiles gives their git subprocesses a head start. Combined
   with the async execFile change, the event loop processes git subprocess
   completions during the git ls-files await, so by the time the collect
   phase starts, the git operations may already be complete.

Benchmark: `node bin/repomix.cjs --quiet` on the repomix repo (~1020 files).
20 interleaved A/B runs:

| Metric       | Baseline | This patch | Delta           |
|--------------|----------|------------|-----------------|
| median       | 1308 ms  | 1277 ms    | −31 ms (−2.4%)  |
| trimmed mean | 1310 ms  | 1279 ms    | −31 ms (−2.4%)  |

All 1115 tests pass, lint clean.

https://claude.ai/code/session_01SkQSWcbnFJE9czCTzS6qcV
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 18, 2026

📝 Walkthrough

Walkthrough

This PR introduces a git ls-files-based fast path for file enumeration and optimizes concurrent execution of git operations. The fileSearch module adds conditional fast-path logic with subprocess-based file tracking, symlink/type filtering, and dynamic ignore file resolution. The packager module starts git diff/log operations earlier to overlap I/O with file collection.

Changes

Cohort / File(s) Summary
Git Fast Path Implementation
src/core/file/fileSearch.ts
Adds searchFilesGitFastPath function using git ls-files for NUL-separated file enumeration, with symlink/type filtering and ignore pattern matching via subprocess. Updates searchFiles control flow to attempt git fast path when useGitignore is enabled and no stdin input provided; falls back to existing globby on fast-path failure. Extends prepareIgnoreContext to return rawIgnorePatterns for git path consumption. Handles git worktrees with .git/** adjustments.
Concurrency Optimization
src/core/packager.ts
Starts getGitDiffs and getGitLogs subprocess promises immediately before file collection via collectFiles, rather than inlining calls later. Adds .catch(() => {}) handlers to suppress unhandled rejections if file collection throws. Preserves awaited results in Promise.all while enabling earlier subprocess execution for I/O overlap.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant SearchFiles as searchFiles()
    participant GitFastPath as git ls-files<br/>subprocess
    participant IgnoreFilter as ignore filter +<br/>nested files
    participant Globby as globby fallback
    participant FileSystem as filesystem

    Caller->>SearchFiles: searchFiles(config, patterns)
    
    alt useGitignore enabled & no stdin
        SearchFiles->>GitFastPath: spawn git ls-files
        GitFastPath-->>SearchFiles: NUL-separated files
        SearchFiles->>IgnoreFilter: filter with patterns +<br/>nested .gitignore
        IgnoreFilter->>FileSystem: check symlinks & types
        FileSystem-->>IgnoreFilter: stat results
        IgnoreFilter-->>SearchFiles: filtered files
        
        alt git fast path succeeds
            SearchFiles-->>Caller: files
        else git fast path fails/unavailable
            SearchFiles->>Globby: fallback search
            Globby-->>SearchFiles: files
            SearchFiles-->>Caller: files
        end
    else fallback to globby
        SearchFiles->>Globby: globby search
        Globby-->>SearchFiles: files
        SearchFiles-->>Caller: files
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately summarizes the main changes: async git ls-files and overlapping git operations with file collection for performance improvements.
Description check ✅ Passed Description includes detailed summary, benchmark results, and completed checklist items. All required sections are present and well-documented.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/async-git-ls-files

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a performance optimization for file searching by implementing a 'fast path' using git ls-files in git repositories, which significantly reduces the time spent walking the filesystem. It also refactors the packaging process to initiate git operations concurrently with file collection. Feedback highlights a potential regression in include pattern matching where directory names are not automatically expanded, unlike the globby fallback. Suggestions were also provided to improve performance by replacing synchronous I/O with concurrent asynchronous calls and optimizing ignore file processing, along with a minor cleanup of promise handling.

Comment on lines +195 to +196
const matchers = includePatterns.map((p) => new Minimatch(p, { dot: true }));
files = files.filter((f) => matchers.some((m) => m.match(f)));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a potential regression here compared to globby semantics. globby (and fast-glob) automatically expands directory names in include patterns (e.g., src becomes src/**/*), whereas Minimatch performs literal glob matching. If a user provides a directory name in the include config, the git fast path will fail to match any files within that directory. Additionally, the PR description mentions using picomatch, but the implementation uses Minimatch. Consider using picomatch with appropriate options or expanding directory patterns to maintain consistency with the globby fallback.

Comment on lines +143 to +149
files = files.filter((f) => {
try {
return lstatSync(path.join(rootDir, f)).isFile();
} catch {
return false;
}
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using lstatSync in a loop for every file returned by git ls-files can be a significant performance bottleneck, especially in large repositories with thousands of files. This synchronous I/O blocks the Node.js event loop, which partially defeats the purpose of using async execFile to keep the event loop free. Consider using fs.lstat (async) with Promise.all to perform these checks concurrently.

Suggested change
files = files.filter((f) => {
try {
return lstatSync(path.join(rootDir, f)).isFile();
} catch {
return false;
}
});
const fileStats = await Promise.all(
files.map(async (f) => {
try {
const stats = await fs.lstat(path.join(rootDir, f));
return stats.isFile() ? f : null;
} catch {
return null;
}
}),
);
files = fileStats.filter((f): f is string => f !== null);

Comment on lines +159 to +161
for (const filePattern of ignoreFilePatterns) {
const fileName = filePattern.replace(/^\*\*\//, '');
const matchingFiles = files.filter((f) => f === fileName || f.endsWith(`/${fileName}`));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Filtering the entire files array inside a loop for each ignore file pattern is inefficient (O(N * M)). Since ignoreFilePatterns usually contains a small number of fixed filenames (like .repomixignore), it is more efficient to perform a single pass over the files array to identify all relevant ignore files.

  const ignoreFileNames = ignoreFilePatterns.map((p) => p.replace(/^\*\*\//, ''));
  const ignoreFilesToProcess = files.filter((f) => {
    const base = path.basename(f);
    return ignoreFileNames.includes(base);
  });

  for (const ignoreFile of ignoreFilesToProcess) {

Comment thread src/core/packager.ts
Comment on lines +132 to +133
Promise.resolve(gitDiffPromise).catch(() => {});
Promise.resolve(gitLogPromise).catch(() => {});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of Promise.resolve() here is redundant because gitDiffPromise and gitLogPromise are already promises returned by async functions. You can attach the .catch() directly to the promises to suppress the unhandled rejection warning.

Suggested change
Promise.resolve(gitDiffPromise).catch(() => {});
Promise.resolve(gitLogPromise).catch(() => {});
gitDiffPromise.catch(() => {});
gitLogPromise.catch(() => {});

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 18, 2026

⚡ Performance Benchmark

Latest commit:bb274aa perf(core): Async git ls-files and overlap git ops with file search
Status:✅ Benchmark complete!
Ubuntu:1.31s (±0.03s) → 1.21s (±0.02s) · -0.11s (-8.3%)
macOS:1.14s (±0.29s) → 1.07s (±0.28s) · -0.06s (-5.6%)
Windows:1.78s (±0.09s) → 1.68s (±0.10s) · -0.10s (-5.8%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs / 30 on macOS (median ± IQR)
  • Workflow run

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 18, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.25%. Comparing base (c55528d) to head (bb274aa).
⚠️ Report is 14 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/core/file/fileSearch.ts 86.76% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1477      +/-   ##
==========================================
+ Coverage   87.18%   87.25%   +0.07%     
==========================================
  Files         117      117              
  Lines        4470     4529      +59     
  Branches     1032     1046      +14     
==========================================
+ Hits         3897     3952      +55     
- Misses        573      577       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/core/file/fileSearch.ts (1)

159-188: Nested ignore-file reads are serialized.

The for (const filePattern …) / for (const ignoreFile …) pair awaits each fs.readFile one at a time, and each inner iteration also adds patterns to a shared ig instance. The reads can be done in parallel and the ig.add calls sequenced at the end to preserve ordering semantics. Typically there are only a handful of ignore files so the practical win is modest, but it stays consistent with the async-overlap theme of this PR.

♻️ Parallelize reads
-  for (const filePattern of ignoreFilePatterns) {
-    const fileName = filePattern.replace(/^\*\*\//, '');
-    const matchingFiles = files.filter((f) => f === fileName || f.endsWith(`/${fileName}`));
-
-    for (const ignoreFile of matchingFiles) {
-      try {
-        const content = await fs.readFile(path.join(rootDir, ignoreFile), 'utf-8');
-        const patterns = parseIgnoreContent(content);
-        const dir = path.dirname(ignoreFile);
-        ...
-      } catch {
-        logger.trace(`Could not read ignore file: ${ignoreFile}`);
-      }
-    }
-  }
+  const ignoreFilesToRead: string[] = [];
+  for (const filePattern of ignoreFilePatterns) {
+    const fileName = filePattern.replace(/^\*\*\//, '');
+    for (const f of files) {
+      if (f === fileName || f.endsWith(`/${fileName}`)) ignoreFilesToRead.push(f);
+    }
+  }
+  const readResults = await Promise.all(
+    ignoreFilesToRead.map(async (ignoreFile) => {
+      try {
+        const content = await fs.readFile(path.join(rootDir, ignoreFile), 'utf-8');
+        return { ignoreFile, patterns: parseIgnoreContent(content) };
+      } catch {
+        logger.trace(`Could not read ignore file: ${ignoreFile}`);
+        return null;
+      }
+    }),
+  );
+  for (const res of readResults) {
+    if (!res) continue;
+    const dir = path.dirname(res.ignoreFile);
+    if (dir === '.') {
+      ig.add(res.patterns);
+    } else {
+      ig.add(res.patterns.map((p) => {
+        if (p.startsWith('!')) return `!${dir}/${p.slice(1)}`;
+        if (p.startsWith('/')) return `${dir}${p}`;
+        return `${dir}/${p}`;
+      }));
+    }
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/file/fileSearch.ts` around lines 159 - 188, The nested loops in the
ignore-file handling (iterating ignoreFilePatterns and matchingFiles) perform
fs.readFile sequentially and call ig.add inline, slowing I/O and mutating the
shared ignore instance; change this by mapping each matching ignoreFile to a
read promise (use fs.readFile for each ignoreFile) and await Promise.all to get
contents, then for each resolved content (preserving original file order) call
parseIgnoreContent and add patterns to ig (handling root vs nested dirs and
prefixing patterns the same way); ensure per-file read errors are caught and
logged with logger.trace so failures don’t break the batch, and keep the same
transformation logic for '!' and '/' prefixes when calling ig.add.
src/core/packager.ts (1)

128-133: Redundant Promise.resolve wrapper for rejection suppression.

Promise.resolve(p) returns p itself when p is already a native promise, so Promise.resolve(gitDiffPromise).catch(() => {}) is equivalent to gitDiffPromise.catch(() => {}). The simpler form is clearer and equally effective at registering a handler on the original promise to prevent an unhandled-rejection warning if collectFiles rejects first. The awaited gitDiffPromise/gitLogPromise in the later Promise.all still surface the real error.

-    // Suppress unhandled rejection if collectFiles throws before Promise.all
-    // consumes these promises. The original promises are still awaited below.
-    Promise.resolve(gitDiffPromise).catch(() => {});
-    Promise.resolve(gitLogPromise).catch(() => {});
+    // Suppress unhandled rejection if collectFiles throws before Promise.all
+    // consumes these promises. The original promises are still awaited below.
+    gitDiffPromise.catch(() => {});
+    gitLogPromise.catch(() => {});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/packager.ts` around lines 128 - 133, The Promise.resolve wrappers
are redundant; replace Promise.resolve(gitDiffPromise).catch(() => {}) and
Promise.resolve(gitLogPromise).catch(() => {}) with direct handlers on the
original promises (gitDiffPromise.catch(() => {}) and gitLogPromise.catch(() =>
{})) to suppress unhandled-rejection warnings while still allowing the awaited
Promise.all(gitDiffPromise, gitLogPromise, ...) to surface real errors; locate
usages around deps.getGitDiffs/deps.getGitLogs and the collectFiles/Promise.all
flow and update there.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/core/file/fileSearch.ts`:
- Around line 95-107: The docstring in src/core/file/fileSearch.ts incorrectly
mentions "picomatch" while the implementation uses Minimatch; update the comment
so it references the actual matcher (minimatch/Minimatch) used by the code
instead of picomatch to avoid confusion, e.g., replace the "picomatch (for
include patterns)" text with "minimatch (Minimatch) (for include patterns)" and
ensure any wording aligns with the matcher instantiation found in the file (new
Minimatch(...)).
- Around line 141-149: The synchronous lstatSync loop in
src/core/file/fileSearch.ts blocks the event loop for large repos; replace the
files = files.filter(...) block with an async filtering step that uses
fs.promises.lstat (or node:fs/promises.lstat) and bounded concurrency (e.g., a
small concurrency pool or Promise.allSettled for moderate sizes) to test
isFile() for each path, await the results, and then produce the filtered files
array; also remove the now-unused lstatSync import. Target the code around the
files variable and the current filter callback to perform the async lstat checks
and handle errors by excluding non-files.

---

Nitpick comments:
In `@src/core/file/fileSearch.ts`:
- Around line 159-188: The nested loops in the ignore-file handling (iterating
ignoreFilePatterns and matchingFiles) perform fs.readFile sequentially and call
ig.add inline, slowing I/O and mutating the shared ignore instance; change this
by mapping each matching ignoreFile to a read promise (use fs.readFile for each
ignoreFile) and await Promise.all to get contents, then for each resolved
content (preserving original file order) call parseIgnoreContent and add
patterns to ig (handling root vs nested dirs and prefixing patterns the same
way); ensure per-file read errors are caught and logged with logger.trace so
failures don’t break the batch, and keep the same transformation logic for '!'
and '/' prefixes when calling ig.add.

In `@src/core/packager.ts`:
- Around line 128-133: The Promise.resolve wrappers are redundant; replace
Promise.resolve(gitDiffPromise).catch(() => {}) and
Promise.resolve(gitLogPromise).catch(() => {}) with direct handlers on the
original promises (gitDiffPromise.catch(() => {}) and gitLogPromise.catch(() =>
{})) to suppress unhandled-rejection warnings while still allowing the awaited
Promise.all(gitDiffPromise, gitLogPromise, ...) to surface real errors; locate
usages around deps.getGitDiffs/deps.getGitLogs and the collectFiles/Promise.all
flow and update there.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7ff4c18b-47d2-48e7-8a6b-6c6bb9568cd3

📥 Commits

Reviewing files that changed from the base of the PR and between c55528d and bb274aa.

📒 Files selected for processing (2)
  • src/core/file/fileSearch.ts
  • src/core/packager.ts

Comment on lines +95 to +107
/**
* Fast file enumeration using `git ls-files` for git repositories.
*
* `git ls-files --cached --others --exclude-standard` reads from the
* pre-built git index (~5ms) instead of walking the filesystem (~250ms
* with globby). The result is then post-filtered through the `ignore`
* package (for default/custom/repomixignore patterns) and `picomatch`
* (for include patterns) to produce the same file set as globby.
*
* Returns `null` when the fast path is not applicable (not a git repo,
* `useGitignore` disabled, or git command failure), signalling the
* caller to fall back to globby.
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Docstring says picomatch, but the implementation uses minimatch.

Small inconsistency in the new docstring — line 101 references picomatch, while the actual matcher below (line 195) is new Minimatch(...). Please update the comment to avoid future confusion.

- * package (for default/custom/repomixignore patterns) and `picomatch`
+ * package (for default/custom/repomixignore patterns) and `minimatch`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
* Fast file enumeration using `git ls-files` for git repositories.
*
* `git ls-files --cached --others --exclude-standard` reads from the
* pre-built git index (~5ms) instead of walking the filesystem (~250ms
* with globby). The result is then post-filtered through the `ignore`
* package (for default/custom/repomixignore patterns) and `picomatch`
* (for include patterns) to produce the same file set as globby.
*
* Returns `null` when the fast path is not applicable (not a git repo,
* `useGitignore` disabled, or git command failure), signalling the
* caller to fall back to globby.
*/
/**
* Fast file enumeration using `git ls-files` for git repositories.
*
* `git ls-files --cached --others --exclude-standard` reads from the
* pre-built git index (~5ms) instead of walking the filesystem (~250ms
* with globby). The result is then post-filtered through the `ignore`
* package (for default/custom/repomixignore patterns) and `minimatch`
* (for include patterns) to produce the same file set as globby.
*
* Returns `null` when the fast path is not applicable (not a git repo,
* `useGitignore` disabled, or git command failure), signalling the
* caller to fall back to globby.
*/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/file/fileSearch.ts` around lines 95 - 107, The docstring in
src/core/file/fileSearch.ts incorrectly mentions "picomatch" while the
implementation uses Minimatch; update the comment so it references the actual
matcher (minimatch/Minimatch) used by the code instead of picomatch to avoid
confusion, e.g., replace the "picomatch (for include patterns)" text with
"minimatch (Minimatch) (for include patterns)" and ensure any wording aligns
with the matcher instantiation found in the file (new Minimatch(...)).

Comment on lines +141 to +149
// Filter out symlinks and non-regular files to match globby's behavior
// (followSymbolicLinks:false + onlyFiles:true reports symlinks as non-files).
files = files.filter((f) => {
try {
return lstatSync(path.join(rootDir, f)).isFile();
} catch {
return false;
}
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Sync lstatSync in a per-file loop can block the event loop.

This filter calls lstatSync once per candidate file on the main thread. For a repo with, say, 10k tracked files that's 10k synchronous filesystem syscalls executed back-to-back, which partially defeats this PR's stated goal of keeping the event loop free so getGitDiffs/getGitLogs subprocesses and metricsWarmupPromise can make progress concurrently with file enumeration.

Using async fs.lstat with bounded concurrency (or Promise.all for moderate sizes) lets the concurrent git/metrics work actually be scheduled while stat syscalls are in flight.

♻️ Proposed async variant
-  // Filter out symlinks and non-regular files to match globby's behavior
-  // (followSymbolicLinks:false + onlyFiles:true reports symlinks as non-files).
-  files = files.filter((f) => {
-    try {
-      return lstatSync(path.join(rootDir, f)).isFile();
-    } catch {
-      return false;
-    }
-  });
+  // Filter out symlinks and non-regular files to match globby's behavior
+  // (followSymbolicLinks:false + onlyFiles:true reports symlinks as non-files).
+  const isRegularFile = await Promise.all(
+    files.map(async (f) => {
+      try {
+        return (await fs.lstat(path.join(rootDir, f))).isFile();
+      } catch {
+        return false;
+      }
+    }),
+  );
+  files = files.filter((_, i) => isRegularFile[i]);

The corresponding lstatSync / node:fs import becomes unused; drop it and rely on fs.lstat from node:fs/promises.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/file/fileSearch.ts` around lines 141 - 149, The synchronous
lstatSync loop in src/core/file/fileSearch.ts blocks the event loop for large
repos; replace the files = files.filter(...) block with an async filtering step
that uses fs.promises.lstat (or node:fs/promises.lstat) and bounded concurrency
(e.g., a small concurrency pool or Promise.allSettled for moderate sizes) to
test isFile() for each path, await the results, and then produce the filtered
files array; also remove the now-unused lstatSync import. Target the code around
the files variable and the current filter callback to perform the async lstat
checks and handle errors by excluding non-files.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying repomix with  Cloudflare Pages  Cloudflare Pages

Latest commit: bb274aa
Status: ✅  Deploy successful!
Preview URL: https://0f927342.repomix.pages.dev
Branch Preview URL: https://perf-async-git-ls-files.repomix.pages.dev

View logs

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 18, 2026

Code Review — Claude

Overall this is a well-motivated performance PR. The packager pipelining change (starting git diff/log early) is clean and low-risk. The git ls-files fast path is more complex and carries some behavioral parity risks worth addressing before merge.

Key Findings

1. lstatSync blocks the event loop (high)

src/core/file/fileSearch.ts:143-149

The PR's stated goal is to keep the event loop free, but lstatSync is called synchronously for every file returned by git ls-files. For a repo with thousands of files, this blocks the main thread for 2–200ms depending on filesystem (cold cache, NFS, Docker bind-mounts are worst). This directly contradicts the async theme.

Suggestion: Replace with Promise.all + async fs.lstat from node:fs/promises (already imported):

const statResults = await Promise.all(
  files.map(async (f) => {
    try { return (await fs.lstat(path.join(rootDir, f))).isFile() ? f : null; }
    catch { return null; }
  })
);
files = statResults.filter((f): f is string => f !== null);

2. Docstring mentions "picomatch" but code uses Minimatch (medium)

src/core/file/fileSearch.ts:101

The JSDoc says patterns are filtered through "picomatch" but the implementation uses Minimatch. These are different libraries. Update the comment to reference minimatch.

3. Behavioral parity risk — directory include patterns (high)

src/core/file/fileSearch.ts:192-197

globby automatically expands directory-style include patterns (e.g., src/ matches all files under src/). The Minimatch filter operates on flat file paths from git ls-files. A pattern like src/ or src may not match src/foo/bar.ts through Minimatch, silently dropping files. The Gemini review also flagged this. Consider adding normalizeGlobPattern-style expansion for include patterns in the fast path, or adding a test that verifies parity with globby for directory-style patterns.

4. Promise.resolve(p).catch(() => {}) is redundant (low)

src/core/packager.ts:132-133

gitDiffPromise and gitLogPromise are already Promise instances. Promise.resolve() wrapping is unnecessary. Simplify to:

gitDiffPromise.catch(() => {});
gitLogPromise.catch(() => {});

5. Duplicated worktree adjustment logic (medium)

src/core/file/fileSearch.ts:442-449

The .git/**.git splice for worktrees is copy-pasted between adjustedIgnorePatterns and rawIgnorePatterns. Extract the worktree adjustment into a shared step to avoid divergence.

Test Coverage Gap

The new searchFilesGitFastPath function (~90 lines) has zero test coverage. The existing tests pass because git ls-files fails on the mock path and silently falls through to the mocked globby. Key untested logic:

  • Nested .repomixignore pattern scoping (lines 159-188) — the most algorithmically complex piece, manually prefixing patterns with directory paths. A bug here silently produces wrong file sets with no globby fallback protection.
  • Fast path gating — no test verifies the fast path is used when expected or skipped when useGitignore: false.
  • Include pattern matching via Minimatch — not verified to produce the same results as globby.

At minimum, a unit test mocking child_process.execFile to return a fixed NUL-separated file list would cover the core filtering logic.

Convention Violations

  • File size: fileSearch.ts grows to ~580 lines (project guideline: 250 max). searchFilesGitFastPath is self-contained and could be extracted to fileSearchGitFastPath.ts.
  • Dependency injection: The new function directly calls execFileAsync, lstatSync, fs.readFile, ignore(), Minimatch with no deps parameter, making it untestable per project conventions.

Security

No new security concerns. execFileAsync (not exec) prevents shell injection. The path.join(rootDir, f) pattern without containment check is pre-existing across the codebase.

Suggestion

Consider splitting the PR

The packager pipelining change (starting git diff/log early) is unambiguously correct and zero-risk — it could be merged independently. The searchFilesGitFastPath logic carries more complexity and behavioral parity risk for a speedup that showed mixed results on re-measurement (±10ms, σ=10-18ms). Deferring the fast path until behavioral parity is verified with tests would reduce merge risk.


🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment on lines +153 to +155
const ig = ignore();
ig.add(adjustedIgnorePatterns);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Git fast path over-filters files: ignore package treats slash-less patterns as matching at any depth, unlike globby

The ignore package follows gitignore semantics where patterns without a / character (like *.pid, .env, *.tgz, *.cache) match against the filename at any directory depth. Globby, in contrast, treats these as glob patterns that only match at the root level. This causes the git fast path to silently exclude files that the globby fallback would include.

Confirmed behavioral difference with test

globby with ignore: ['*.pid', '.env']:

  • packages/app/.env → INCLUDED
  • subdir/foo.pid → INCLUDED

ignore package with same patterns:

  • packages/app/.env → EXCLUDED
  • subdir/foo.pid → EXCLUDED

Affected default ignore patterns (from src/config/defaultIgnore.ts) include: .env, *.pid, *.seed, *.pid.lock, *.tgz, *.cache, .eslintcache, .node_repl_history, .lock-wscript, .hgignore. Any user-defined custom patterns without / are also affected. Since the git fast path is the default for all git repos (config.ignore.useGitignore && !explicitFiles at line 307), this silently changes which files are included in the output — for example, packages/app/.env in a monorepo that is tracked by git would be excluded by the fast path but included by globby.

Prompt for agents
The root cause is that rawIgnorePatterns (from getIgnorePatterns) are designed for globby's glob-based matching, but the ignore package uses gitignore semantics where slash-less patterns match at any depth. To fix this, the patterns passed to ig.add() need to be transformed so that slash-less patterns are anchored to the root. In gitignore semantics, prefixing a pattern with / anchors it to the root directory. So for each pattern in rawIgnorePatterns that does not contain a / character (e.g. *.pid, .env, *.tgz, *.cache), prefix it with / before adding it to the ignore instance. Patterns that already contain / (like dist/**, **/node_modules/**) should be left as-is since they already have correct path-anchoring behavior in both globby and the ignore package. The transformation should happen in searchFilesGitFastPath around line 153-155 where ig.add(adjustedIgnorePatterns) is called, or alternatively in prepareIgnoreContext when building rawIgnorePatterns.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +160 to +162
const fileName = filePattern.replace(/^\*\*\//, '');
const matchingFiles = files.filter((f) => f === fileName || f.endsWith(`/${fileName}`));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Git fast path misses .repomixignore/.ignore files that are gitignored but exist on disk

The git fast path discovers .repomixignore and .ignore files by searching within the git ls-files output (src/core/file/fileSearch.ts:161-162). However, git ls-files --cached --others --exclude-standard excludes files that are in .gitignore. If a .repomixignore file exists on disk but is gitignored (and untracked), it won't appear in the git output, so its patterns won't be applied. Globby's ignoreFiles option finds these files via filesystem traversal regardless of gitignore status. This means the git fast path could include files that should have been excluded by a gitignored .repomixignore.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@yamadashy
Copy link
Copy Markdown
Owner Author

Closing this PR after review:

  • The packager.ts change is essentially a no-op (getGitDiffs/getGitLogs already start in parallel with collectFiles in main's structure).
  • Cherry-picking the prerequisite git fast-path commit (Issue with Ignore #1) brings ~150 lines of new code without test coverage.
  • Independent benchmarking showed no significant improvement (round 1 -10ms, round 2 +10ms, within noise).

Closing in favor of focusing on optimizations with clearer independent effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants