Skip to content

perf(core): Reuse emptyDirPaths from initial searchFiles to eliminate redundant filesystem scan#1272

Closed
yamadashy wants to merge 2 commits intomainfrom
perf/reuse-empty-dir-paths
Closed

perf(core): Reuse emptyDirPaths from initial searchFiles to eliminate redundant filesystem scan#1272
yamadashy wants to merge 2 commits intomainfrom
perf/reuse-empty-dir-paths

Conversation

@yamadashy
Copy link
Owner

@yamadashy yamadashy commented Mar 21, 2026

Summary

  • Reuse emptyDirPaths already collected during the initial searchFiles call in the packager, threading them through produceOutputgenerateOutputbuildOutputGeneratorContext
  • Eliminates a redundant searchFiles call in buildOutputGeneratorContext that was re-scanning the filesystem solely to obtain empty directory paths
  • Simplifies the fallback merge logic using flatMap instead of manual reduce

This removes an unnecessary filesystem traversal that duplicated work already done during the file collection phase.

Checklist

  • Run npm run test
  • Run npm run lint

Open with Devin

yamadashy and others added 2 commits March 20, 2026 01:54
… redundant filesystem scan

When `includeEmptyDirectories` is enabled, `buildOutputGeneratorContext` was calling
`searchFiles` a second time solely to retrieve `emptyDirPaths` that had already been
computed (and discarded) by the initial `searchFiles` call in `pack()`.

This change preserves `emptyDirPaths` from the initial search and threads it through
the output generation pipeline (`pack` → `produceOutput` → `generateOutput` →
`buildOutputGeneratorContext`), eliminating the redundant filesystem scan.

Benchmark results (5-run average on Repomix's own codebase, 963 files):
- produceOutput: 360ms → 52ms (-86%)
- Total pipeline: 3106ms → 2660ms (-14%)

A fallback path is retained for direct callers (e.g., packSkill) that do not
provide the pre-computed emptyDirPaths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the reduce-based merge that unnecessarily accumulated filePaths
(immediately discarded) with a cleaner flatMap approach, consistent with
the allEmptyDirPaths computation in packager.ts.

Addresses Gemini review feedback on PR #1244.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 21, 2026

⚡ Performance Benchmark

Latest commit:4f0fac1
Status:✅ Benchmark complete!
Ubuntu:2.35s (±0.01s) → 2.36s (±0.03s) · +0.01s (+0.4%)
macOS:1.45s (±0.08s) → 1.46s (±0.10s) · +0.01s (+0.8%)
Windows:3.01s (±0.05s) → 2.98s (±0.02s) · -0.04s (-1.2%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded)
  • Measurement: 10 runs / 20 on macOS (median ± IQR)
  • Workflow run

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b12bf3fe-107d-4a04-816d-2e6c6fea30f1

📥 Commits

Reviewing files that changed from the base of the PR and between 4f50a98 and 4f0fac1.

📒 Files selected for processing (11)
  • src/core/output/outputGenerate.ts
  • src/core/output/outputSplit.ts
  • src/core/packager.ts
  • src/core/packager/produceOutput.ts
  • tests/core/output/diffsInOutput.test.ts
  • tests/core/output/flagFullDirectoryStructure.test.ts
  • tests/core/output/outputGenerate.test.ts
  • tests/core/output/outputGenerateDiffs.test.ts
  • tests/core/packager.test.ts
  • tests/core/packager/produceOutput.test.ts
  • tests/core/packager/splitOutput.test.ts

📝 Walkthrough

Walkthrough

This PR threads an optional emptyDirPaths parameter through the output generation pipeline, from the packager down through produceOutput, generateSplitOutputParts, and into generateOutput/buildOutputGeneratorContext. When provided, empty directory paths are used directly; otherwise, they are computed via searchFiles. Tests are updated to match the new signatures.

Changes

Cohort / File(s) Summary
Core Output Generation
src/core/output/outputGenerate.ts
Added optional emptyDirPaths?: string[] parameter to generateOutput and buildOutputGeneratorContext. When empty directories are enabled and emptyDirPaths is provided, it bypasses searchFiles execution; otherwise falls back to computing paths via searchFiles.
Output Split Handler
src/core/output/outputSplit.ts
Updated generateSplitOutputParts to accept optional emptyDirPaths and thread it through to renderGroups, which forwards it to deps.generateOutput.
Packager & Output Production
src/core/packager.ts, src/core/packager/produceOutput.ts
pack now captures full search results and computes allEmptyDirPaths aggregated across all roots. produceOutput accepts optional emptyDirPaths and propagates it through generateAndWriteSplitOutput and generateAndWriteSingleOutput to downstream functions.
Test Updates — Output Generation
tests/core/output/diffsInOutput.test.ts, tests/core/output/outputGenerate.test.ts, tests/core/output/outputGenerateDiffs.test.ts, tests/core/output/flagFullDirectoryStructure.test.ts
Updated generateOutput and buildOutputGeneratorContext call sites with additional undefined argument to match signature changes.
Test Updates — Packager & Output
tests/core/packager.test.ts, tests/core/packager/produceOutput.test.ts, tests/core/packager/splitOutput.test.ts
Updated produceOutput call sites and assertions to include additional empty array [] or undefined argument, reflecting the new parameter in the call chain.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • #1013: Threads emptyDirPaths through the same output generation functions (generateOutput, outputSplit, produceOutput) to avoid re-computing empty directory paths.
  • #1024: Modifies the same call chain (packagerproduceOutputgenerateOutput) to thread an additional optional parameter (filePathsByRoot) through output generation.
  • #338: Alters generateOutput and buildOutputGeneratorContext signatures in src/core/output/outputGenerate.ts and changes how searchFiles results are handled, directly overlapping with this PR's modifications.
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: reusing emptyDirPaths from the initial searchFiles to eliminate a redundant filesystem scan, which is the core optimization in the PR.
Description check ✅ Passed The description covers the main changes, includes a summary section, and the checklist items are completed, but lacks some formatting details from the template.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/reuse-empty-dir-paths

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves performance by optimizing how empty directory paths are collected and utilized. Instead of performing a redundant filesystem scan during output generation, the system now reuses the emptyDirPaths discovered during the initial file search phase. This change streamlines the process, reduces unnecessary I/O operations, and simplifies the related code paths, leading to a more efficient and cleaner codebase.

Highlights

  • Performance Optimization: Reused emptyDirPaths from the initial searchFiles call, passing them through the output generation pipeline.
  • Redundant Filesystem Scan Elimination: Removed a duplicate searchFiles call in buildOutputGeneratorContext that was previously re-scanning the filesystem solely to obtain empty directory paths.
  • Code Simplification: Refactored the fallback merge logic for empty directory paths to use flatMap instead of a manual reduce operation.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively improves performance by reusing emptyDirPaths from the initial searchFiles call, which successfully eliminates a redundant filesystem scan. The implementation correctly threads the emptyDirPaths parameter through various functions, from packager down to outputGenerate. Additionally, the fallback logic for generating these paths has been refactored to use flatMap, which is a nice simplification over the previous reduce implementation, improving code clarity. The associated tests have been updated to accommodate these signature changes. The changes are well-executed and represent a good optimization.

@claude
Copy link
Contributor

claude bot commented Mar 21, 2026

Code Review

Overall: Clean, well-scoped performance optimization. The approach of threading emptyDirPaths from the initial searchFiles call through the pipeline is sound and eliminates a redundant filesystem traversal.

Positives

  • The fallback path in buildOutputGeneratorContext is correctly preserved for callers like packSkill that don't pass emptyDirPaths
  • The flatMap refactor is cleaner than the previous reduce
  • Test coverage is updated across all affected call sites
  • No behavioral changes — pure performance improvement

Observations

1. Growing positional parameter lists

Functions like generateOutput, buildOutputGeneratorContext, produceOutput, and renderGroups now have 9-10 positional parameters, many of which are optional/undefined. This makes call sites fragile (easy to mix up parameter order) and is already visible in the tests where long chains of undefined are needed.

This PR isn't the right place to fix it, but a follow-up to consolidate these into an options object would improve maintainability. For example:

interface GenerateOutputOptions {
  rootDirs: string[];
  config: RepomixConfigMerged;
  processedFiles: ProcessedFile[];
  allFilePaths: string[];
  gitDiffResult?: GitDiffResult;
  gitLogResult?: GitLogResult;
  filePathsByRoot?: FilesByRoot[];
  emptyDirPaths?: string[];
}
2. packSkill doesn't benefit from the optimization

packSkill calls buildOutputGeneratorContext directly (line 106 of packSkill.ts) without passing emptyDirPaths, so it still triggers the fallback filesystem scan. This is correctly handled, but if packSkill is called frequently, it might be worth threading emptyDirPaths there too in a future PR.

3. Minor: early computation of emptyDirPaths

In packager.ts:82, allEmptyDirPaths is computed eagerly even when:

  • The skill generation path is taken (line 134), which returns before produceOutput is called
  • config.output.includeEmptyDirectories is false, so emptyDirPaths won't be used

The cost is negligible (a flatMap + Set + sort over an already-collected array), so this isn't a real concern — just noting it for completeness.

Verdict

Looks good to merge. No bugs or correctness issues found. The change is well-contained and the fallback ensures backward compatibility for all callers.


🤖 Generated with Claude Code

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

@yamadashy yamadashy closed this Mar 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant