perf(core): Cache empty directory paths to avoid redundant file search#1356

Merged: yamadashy merged 1 commit into main from perf/cache-empty-dir-paths on Apr 1, 2026

@yamadashy yamadashy commented Mar 29, 2026

When includeEmptyDirectories is enabled, buildOutputGeneratorContext called searchFiles a second time just to obtain emptyDirPaths — despite these already being computed during the initial file search in packager. This PR eliminates the redundant search by threading the pre-computed paths through the output pipeline.

Changes

  • Cache emptyDirPaths from initial search: Thread pre-computed paths through the pipeline (packager → produceOutput → generateOutput/outputSplit → buildOutputGeneratorContext), skipping the redundant searchFiles call
  • Guard with includeEmptyDirectories check: Skip emptyDirPaths dedup/sort when the feature is disabled (the common case), avoiding unnecessary allocations
  • Fix split output path: emptyDirPaths was not being passed through to generateSplitOutputParts — now it is

Pipeline flow

searchFiles (returns filePaths + emptyDirPaths)
  → packager (dedup + sort once, only when includeEmptyDirectories is enabled)
  → produceOutput → generateOutput / outputSplit
  → buildOutputGeneratorContext (uses cached paths, skips redundant search)
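
The packager step in the flow above might look roughly like the following. This is a minimal sketch, not the repository's code: `SearchResult`, `PackConfig`, and `collectEmptyDirPaths` are hypothetical names, and the real types carry many more fields.

```typescript
// Hypothetical, simplified shapes; the real repomix types are richer.
interface SearchResult {
  filePaths: string[];
  emptyDirPaths: string[];
}

interface PackConfig {
  output: { includeEmptyDirectories: boolean };
}

// Dedup and sort empty-directory paths once, and only when the feature is on.
// Returning undefined signals "nothing cached" to downstream stages.
function collectEmptyDirPaths(
  results: SearchResult[],
  config: PackConfig,
): string[] | undefined {
  if (!config.output.includeEmptyDirectories) {
    return undefined; // common case: skip the Set and array allocations
  }
  const unique = new Set<string>();
  for (const result of results) {
    for (const dirPath of result.emptyDirPaths) {
      unique.add(dirPath);
    }
  }
  return Array.from(unique).sort();
}
```

The returned value would then be passed along to produceOutput and onward, so later stages never need to search again.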

Benchmark

Local benchmark (repomix on itself, includeEmptyDirectories: true, 10 runs):

Branch    Mean ± σ
main      696.6ms ± 4.2ms
this PR   637.1ms ± 2.6ms
Improvement: ~60ms (~8.5%)

Note: CI benchmark shows no change because the improvement only applies when includeEmptyDirectories is enabled.
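
For reference, the improvement figures follow directly from the two means. A small sketch of the arithmetic (the `improvement` helper is hypothetical, written only for this illustration):

```typescript
// Compare two mean timings and report absolute and relative savings.
function improvement(
  baselineMs: number,
  candidateMs: number,
): { savedMs: number; savedPct: number } {
  const savedMs = baselineMs - candidateMs;
  const savedPct = (savedMs / baselineMs) * 100;
  return { savedMs, savedPct };
}

const localRun = improvement(696.6, 637.1);
// prints "59.5ms (8.5%)"
console.log(`${localRun.savedMs.toFixed(1)}ms (${localRun.savedPct.toFixed(1)}%)`);
```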

Checklist

  • Run npm run test (1100 tests pass)
  • Run npm run lint (clean)

github-actions bot commented Mar 29, 2026

⚡ Performance Benchmark

Latest commit: 96a6a7c perf(core): Cache empty directory paths to avoid redundant file search
Status: ✅ Benchmark complete!
Ubuntu: 1.78s (±0.02s) → 1.78s (±0.02s) · -0.00s (-0.2%)
macOS: 1.05s (±0.08s) → 1.04s (±0.07s) · -0.01s (-1.2%)
Windows: 2.25s (±0.18s) → 2.24s (±0.14s) · -0.01s (-0.4%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs / 30 on macOS (median ± IQR)
History

76b73c6 fix(core): Add PermissionError handling to directory globby and test emptyDirPaths cache

Ubuntu: 1.79s (±0.02s) → 1.80s (±0.02s) · +0.01s (+0.3%)
macOS: 1.15s (±0.11s) → 1.11s (±0.23s) · -0.04s (-3.6%)
Windows: 2.31s (±0.51s) → 2.48s (±0.28s) · +0.17s (+7.4%)

1766de9 perf(core): Cache empty directory paths and parallelize directory globby

Ubuntu: 1.79s (±0.04s) → 1.79s (±0.03s) · -0.00s (-0.2%)
macOS: 1.25s (±0.21s) → 1.24s (±0.21s) · -0.00s (-0.3%)
Windows: 2.13s (±0.03s) → 2.13s (±0.03s) · +0.00s (+0.2%)

bb73580 perf(core): Cache empty directory paths and parallelize directory globby

Ubuntu: 1.89s (±0.03s) → 1.88s (±0.03s) · -0.00s (-0.2%)
macOS: 1.07s (±0.08s) → 1.05s (±0.11s) · -0.02s (-1.7%)
Windows: 2.29s (±0.03s) → 2.30s (±0.03s) · +0.00s (+0.0%)

coderabbitai bot commented Mar 29, 2026

Walkthrough

This PR refactors searchFiles to concurrently search for files and directories, returning both results. The emptyDirPaths result is then threaded through the output generation pipeline (produceOutput, generateOutput, generateSplitOutputParts) as an optional parameter, enabling reuse of pre-computed empty directory paths instead of recomputation.

Changes

Cohort: File search refactoring
File(s): src/core/file/fileSearch.ts
Summary: Refactored searchFiles to compute globby options once and run the file and directory scans concurrently via Promise.all rather than sequentially. It now returns both filePaths and a nullable directories value.

Cohort: Output generation parameter threading
File(s): src/core/output/outputGenerate.ts, src/core/output/outputSplit.ts, src/core/packager/produceOutput.ts
Summary: Added optional emptyDirPaths?: string[] parameter to generateOutput, buildOutputGeneratorContext, generateSplitOutputParts, and produceOutput, threading it through both single and split-output code paths.

Cohort: Packager integration
File(s): src/core/packager.ts
Summary: Updated the file-search stage to capture full searchFiles results (both files and directories), compute deduplicated empty-directory paths when enabled, and forward them to output generation.

Cohort: Test updates
File(s): tests/core/output/diffsInOutput.test.ts, tests/core/output/flagFullDirectoryStructure.test.ts, tests/core/output/outputGenerate.test.ts, tests/core/output/outputGenerateDiffs.test.ts, tests/core/packager.test.ts, tests/core/packager/produceOutput.test.ts, tests/core/packager/splitOutput.test.ts
Summary: Updated test invocations to pass undefined for the new optional emptyDirPaths parameter, aligning with updated function signatures across output generation and packager flows.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name: Status. Explanation
Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check: ✅ Passed. The title accurately captures the main optimization: caching empty directory paths to avoid redundant file search operations.
Description check: ✅ Passed. The description includes all required template sections with comprehensive explanation of changes, pipeline flow, benchmark results, and completed checklist items.

codecov bot commented Mar 29, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.05%. Comparing base (729427f) to head (96a6a7c).
⚠️ Report is 2 commits behind head on main.

Files with missing lines            Patch %   Lines
src/core/output/outputGenerate.ts   70.00%    3 Missing ⚠️
src/core/packager.ts                88.88%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1356      +/-   ##
==========================================
+ Coverage   86.96%   87.05%   +0.09%     
==========================================
  Files         116      116              
  Lines        4425     4433       +8     
  Branches     1025     1029       +4     
==========================================
+ Hits         3848     3859      +11     
+ Misses        577      574       -3     

cloudflare-workers-and-pages bot commented Mar 29, 2026

Deploying repomix with Cloudflare Pages

Latest commit: 96a6a7c
Status: ✅  Deploy successful!
Preview URL: https://138288ed.repomix.pages.dev
Branch Preview URL: https://perf-cache-empty-dir-paths.repomix.pages.dev

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request optimizes the file and directory search process by executing globbing operations in parallel and caching empty directory paths for reuse during output generation. By passing pre-computed directory paths through the packaging pipeline, the implementation avoids redundant file system scans when building the directory structure. I have no feedback to provide.

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/core/packager/produceOutput.test.ts (1)

125-140: Please exercise a real emptyDirPaths value in the split-output test.

Line 134 still passes undefined, and this test never inspects mockDeps.generateOutput.mock.calls, so it would miss a regression in the new empty-directory threading. Seed a non-empty emptyDirPaths array here and assert every split generateOutput call receives it unchanged.
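
The check being requested could be sketched along these lines. This is a simplified illustration under stated assumptions: `generateSplitParts` and the `GenerateOutput` type are hypothetical stand-ins for the real split-output plumbing, and a hand-rolled recording stub takes the place of the test suite's mockDeps.generateOutput.

```typescript
// Hypothetical stand-in for the real generateOutput signature.
type GenerateOutput = (
  files: string[],
  emptyDirPaths?: string[],
) => Promise<string>;

// Fan the same pre-computed emptyDirPaths out to every split part.
async function generateSplitParts(
  fileGroups: string[][],
  emptyDirPaths: string[] | undefined,
  generateOutput: GenerateOutput,
): Promise<string[]> {
  return Promise.all(
    fileGroups.map((group) => generateOutput(group, emptyDirPaths)),
  );
}

// Recording stub: remembers the emptyDirPaths argument of every call.
const recordedEmptyDirArgs: Array<string[] | undefined> = [];
const recordingGenerateOutput: GenerateOutput = async (files, emptyDirPaths) => {
  recordedEmptyDirArgs.push(emptyDirPaths);
  return files.join("\n");
};
```

A test would seed a non-empty array, run the split, and then assert that every recorded argument is that same array.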

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/packager/produceOutput.test.ts` around lines 125 - 140, The test
calls produceOutput with undefined for the emptyDirPaths parameter, so it won't
catch regressions in passing empty-directory info to split outputs; update the
produceOutput call in produceOutput.test.ts to pass a non-empty array (e.g.,
['path/to/emptyDir']) as the emptyDirPaths argument, then add assertions that
mockDeps.generateOutput was called for each split and that each call received
the same emptyDirPaths array (inspect mockDeps.generateOutput.mock.calls and
verify the argument matching emptyDirPaths) to ensure empty-directory threading
is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/core/file/fileSearch.ts`:
- Around line 209-213: The directory scan promise (dirPromise) doesn't convert
EACCES/EPERM into PermissionError the way filePromise does, so permission
failures under includeEmptyDirectories are treated as generic errors; update
the dirPromise creation in fileSearch.ts to mirror the filePromise error
mapping: wrap the globby call used for dirPromise with the same catch/transform
logic that checks for error.code === 'EACCES' || error.code === 'EPERM' and
throws PermissionError (the same class/constructor used by filePromise), then
await Promise.all([filePromise, dirPromise]) as before so both paths yield
consistent PermissionError behavior.


📥 Commits

Reviewing files that changed from the base of the PR and between 81fc9eb and bb73580.

📒 Files selected for processing (12)
  • src/core/file/fileSearch.ts
  • src/core/output/outputGenerate.ts
  • src/core/output/outputSplit.ts
  • src/core/packager.ts
  • src/core/packager/produceOutput.ts
  • tests/core/output/diffsInOutput.test.ts
  • tests/core/output/flagFullDirectoryStructure.test.ts
  • tests/core/output/outputGenerate.test.ts
  • tests/core/output/outputGenerateDiffs.test.ts
  • tests/core/packager.test.ts
  • tests/core/packager/produceOutput.test.ts
  • tests/core/packager/splitOutput.test.ts

@yamadashy yamadashy force-pushed the perf/cache-empty-dir-paths branch from bb73580 to 1766de9 Compare April 1, 2026 14:42

claude bot commented Apr 1, 2026

Code Review

The optimization is well-motivated and correctly implemented — eliminating the redundant searchFiles call and fixing the split-output bug are both valuable. A few items worth considering:

1. Function signature bloat (positional parameters)

produceOutput now has 10 positional parameters, generateOutput and buildOutputGeneratorContext have 8 each. Test call sites show the fragility: multiple locations with 5+ consecutive undefined arguments (e.g., produceOutput.test.ts).

generateSplitOutputParts already uses a destructured options object — consider grouping the two related optional hints (filePathsByRoot and emptyDirPaths) into a single typed object at the generateOutput/buildOutputGeneratorContext boundary:

interface OutputGenerationHints {
  filePathsByRoot?: FilesByRoot[];
  emptyDirPaths?: string[];
}

This would reduce arity, improve call-site readability, and provide a natural home for future pre-computed data without further signature growth. Could be a follow-up.
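
To illustrate how such a hints object might read at a call site, here is a small sketch. The `FilesByRoot` shape and the `summarizeContext` function are assumptions made for this example and do not reflect the real signatures.

```typescript
// Assumed shape for illustration only; the real FilesByRoot type may differ.
interface FilesByRoot {
  rootLabel: string;
  files: string[];
}

interface OutputGenerationHints {
  filePathsByRoot?: FilesByRoot[];
  emptyDirPaths?: string[];
}

// Hypothetical reduced signature: required inputs stay positional,
// optional pre-computed data travels in one named object.
function summarizeContext(
  rootDirs: string[],
  hints: OutputGenerationHints = {},
): string {
  const { filePathsByRoot = [], emptyDirPaths = [] } = hints;
  return (
    `${rootDirs.length} root(s), ` +
    `${filePathsByRoot.length} file group(s), ` +
    `${emptyDirPaths.length} empty dir(s)`
  );
}

// Call sites name only what they have instead of padding with undefined.
console.log(summarizeContext(["/repo"], { emptyDirPaths: ["tmp"] }));
```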

2. Test coverage for the new behavior

All test changes are mechanical undefined padding — no test exercises the actual optimization:

  • No test passes a populated emptyDirPaths to verify searchFiles is not called a second time
  • No test verifies the fallback path (when emptyDirPaths is undefined but includeEmptyDirectories is true)
  • The split-output bug fix (now threading emptyDirPaths through generateSplitOutputParts) is untested
  • Edge cases (empty array, multi-root merge) are not covered

Suggested additions:

  1. Call buildOutputGeneratorContext with emptyDirPaths provided → assert deps.searchFiles is not called
  2. Call without emptyDirPaths + includeEmptyDirectories: true → assert deps.searchFiles is called

3. Promise.all with undefined is non-obvious

const dirPromise = config.output.includeEmptyDirectories
  ? globby(includePatterns, { ...baseGlobbyOptions, onlyDirectories: true })
  : undefined;
const [filePaths, directories] = await Promise.all([filePromise, dirPromise]);

Promise.all silently resolves undefined elements, which works but is non-obvious. Consider using Promise.resolve([] as string[]) as the fallback — then the downstream check becomes if (directories.length > 0) instead of if (directories), which is more semantically clear.
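
A sketch of the suggested fallback, under the assumption that the two scans are injected as plain functions (`scanFiles` and `scanDirectories` are hypothetical parameters, used here so the example does not depend on globby):

```typescript
type Scan = () => Promise<string[]>;

// Run the file scan always, and the directory scan only when requested.
async function scanFilesAndDirectories(
  scanFiles: Scan,
  scanDirectories: Scan,
  includeEmptyDirectories: boolean,
): Promise<{ filePaths: string[]; directories: string[] }> {
  const filePromise = scanFiles();
  const dirPromise: Promise<string[]> = includeEmptyDirectories
    ? scanDirectories()
    : Promise.resolve([]); // empty array instead of undefined
  const [filePaths, directories] = await Promise.all([filePromise, dirPromise]);
  return { filePaths, directories };
}
```

Because directories is always an array here, callers can test directories.length > 0 without a separate undefined check.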

4. Removed timing instrumentation

The per-phase directory-search timing log was removed. The combined globbyElapsedTime now covers both parallel calls, making it harder to identify whether the file or directory scan is the bottleneck when includeEmptyDirectories is enabled. Consider keeping individual timing or a comment noting the merged measurement.

Minor notes

  • The dirPromise path lacks the EPERM/EACCES error handler that filePromise has — pre-existing asymmetry made more visible by the refactor
  • Benchmark results (-0.2% to +0.0%) are expected since includeEmptyDirectories defaults to false; the real gains occur only when the feature is enabled
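
One way to close the asymmetry from the first note is to wrap both scans in the same error mapping. The sketch below is an assumption about how that could look: the `PermissionError` class is a stand-in for the project's own class, and `withPermissionMapping` is a hypothetical helper, not an existing function in the codebase.

```typescript
// Hypothetical stand-in for the project's PermissionError class.
class PermissionError extends Error {
  readonly path: string;
  constructor(message: string, path: string) {
    super(message);
    this.name = "PermissionError";
    this.path = path;
  }
}

// Run a scan and translate EACCES/EPERM failures into PermissionError.
async function withPermissionMapping<T>(
  scan: () => Promise<T>,
  rootDir: string,
): Promise<T> {
  try {
    return await scan();
  } catch (error) {
    const code = (error as { code?: string } | null | undefined)?.code;
    if (code === "EACCES" || code === "EPERM") {
      throw new PermissionError(
        `Permission denied while scanning ${rootDir}`,
        rootDir,
      );
    }
    throw error; // anything else propagates unchanged
  }
}
```

Both the file scan and the directory scan could be passed through this wrapper before being handed to Promise.all, so either one failing on permissions surfaces the same error type.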

Overall: solid optimization with a genuine bug fix for split output. The main suggestion is to consider an options object pattern (as a follow-up) and add targeted tests for the cache/fallback behavior.


Review by Claude

When includeEmptyDirectories is enabled, buildOutputGeneratorContext
called searchFiles a second time just to obtain emptyDirPaths, despite
these already being computed during the initial file search in packager.

Changes:
- Capture emptyDirPaths from the initial searchFiles result in packager
  and thread them through the pipeline (packager → produceOutput →
  generateOutput/outputSplit → buildOutputGeneratorContext)
- Guard emptyDirPaths processing with includeEmptyDirectories check to
  skip unnecessary work when the feature is disabled
- Fix split output path which was not receiving emptyDirPaths despite
  the parameter being declared in produceOutput's signature
- Add tests for cache hit (searchFiles not called) and fallback paths

Local benchmark (repomix on itself, includeEmptyDirectories: true):
  main:   696.6ms ± 4.2ms
  branch: 637.1ms ± 2.6ms
  Improvement: ~60ms (~8.5%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yamadashy yamadashy force-pushed the perf/cache-empty-dir-paths branch from 76b73c6 to 96a6a7c Compare April 1, 2026 15:03
@yamadashy yamadashy changed the title perf(core): Cache empty directory paths and parallelize directory globby perf(core): Cache empty directory paths to avoid redundant file search Apr 1, 2026
@yamadashy yamadashy merged commit 9d6e224 into main Apr 1, 2026
68 of 69 checks passed
@yamadashy yamadashy deleted the perf/cache-empty-dir-paths branch April 1, 2026 15:26