Skip to content

fix(core): bypass globby in stdin mode to avoid hang on macOS#931

Closed
yamadashy wants to merge 2 commits intomainfrom
fix/stdin-hang-bypass-globby
Closed

fix(core): bypass globby in stdin mode to avoid hang on macOS#931
yamadashy wants to merge 2 commits intomainfrom
fix/stdin-hang-bypass-globby

Conversation

@yamadashy
Copy link
Copy Markdown
Owner

Fixes #903

Summary

When using --stdin with explicit file paths (e.g., git ls-files | repomix --stdin), the application could hang on macOS when .gitignore files were present in the explicit file list.

Root Cause

globby tried to use .gitignore as both:

  • A source of ignore rules (via ignoreFiles parameter)
  • A match target (via includePatterns)

This created a pathological state causing hangs on macOS.

Solution

Bypass globby for explicit files in stdin mode:

  1. Run globby only for includePatterns (if configured)
  2. Manually read and parse .gitignore from root directory
  3. Filter explicit files using minimatch with loaded ignore patterns
  4. Merge globby results and filtered explicit files
  5. Remove duplicates

This ensures:

  • .gitignore rules are respected in stdin mode
  • ✅ No hang occurs even with .gitignore in the file list
  • includePatterns work correctly alongside stdin files

Limitations

  • Currently only reads root .gitignore (subdirectory .gitignore files not supported in stdin mode)
  • This is acceptable since git ls-files already respects all .gitignore files

Changes

  • Modified fileSearch.ts to implement bypass logic
  • Updated existing tests to match new behavior
  • Added new tests for:
    • .gitignore rules applied to stdin files
    • Merging globby and explicit file results
    • Empty directory handling in stdin mode

Testing

Unit Tests

  • All 41 tests pass in tests/core/file/fileSearch.test.ts

Manual Testing

cd /tmp/test-repo
printf ".gitignore\nignored_file.txt\nnormal.txt\n" | repomix --stdin

# Result: ignored_file.txt correctly excluded ✅
# Files: .gitignore, normal.txt (3 files total)

Performance Test

# 5 consecutive runs
for i in {1..5}; do
  printf '.gitignore\nnormal.txt\n' | repomix --stdin -o /tmp/test-$i.txt
done

# Completed in 3 seconds (no hang) ✅

Review Focus

  1. .gitignore rules are correctly applied to stdin files
  2. ✅ No hang occurs with .gitignore in file list
  3. includePatterns and stdin files merge correctly
  4. ⚠️ Subdirectory .gitignore files are not supported (limitation documented)
  5. ✅ All existing tests pass

Checklist

  • Run npm run test
  • Run npm run lint

Fixes #903

When using --stdin with explicit file paths (e.g., git ls-files | repomix --stdin),
the application could hang on macOS when .gitignore files were present in the
explicit file list. This happened because globby tried to use .gitignore as both
a source of ignore rules AND a match target, creating a pathological state.

Solution:
- In stdin mode, completely bypass globby for explicit files
- Run globby only for includePatterns (if configured)
- Manually read and parse .gitignore from root directory
- Filter explicit files using minimatch with loaded ignore patterns
- Merge globby results and filtered explicit files
- Remove duplicates

This ensures:
- .gitignore rules are respected in stdin mode
- No hang occurs even with .gitignore in the file list
- includePatterns work correctly alongside stdin files

Limitations:
- Currently only reads root .gitignore (subdirectory .gitignore files not supported in stdin mode)
- This is acceptable since git ls-files already respects all .gitignore files

Changes:
- Modified src/core/file/fileSearch.ts to implement bypass logic
- Updated existing tests to match new behavior
- Added new tests for:
  - .gitignore rules applied to stdin files
  - Merging globby and explicit file results
  - Empty directory handling in stdin mode
- All 41 tests pass
- Manual testing confirms no hang and correct filtering

Co-Authored-By: yamadashy <koukun0120@gmail.com>
Copilot AI review requested due to automatic review settings November 3, 2025 07:04
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 3, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The changes implement stdin-mode file discovery by introducing explicit file handling that bypasses globby, manually applies ignore rules, and integrates additional ignore sources (.gitignore, .repomixignore). Normal mode behavior is preserved, and verbose tracing is added for debugging.

Changes

Cohort / File(s) Change Summary
Core file discovery logic
src/core/file/fileSearch.ts
Introduces stdin-mode handling that processes explicit files separately, bypassing globby for those files and filtering them against manually-aggregated ignore patterns (from .gitignore and .repomixignore). Runs globby only for include patterns, then merges and deduplicates globby results with filtered explicit files. Adds verbose tracing and improved error handling with custom PermissionError. Adjusts empty directory collection to run only in non-stdin mode.
Test coverage
tests/core/file/fileSearch.test.ts
Updates test expectations to reflect new stdin-mode logic where explicit files are filtered against ignore rules and merged with globby results. Replaces exact globby result expectations with arrayContaining to account for mixed sources. Adds new test cases for .gitignore rule application to explicit files, deduplication of merged results, and empty directory handling in stdin mode.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant FileSearch as fileSearch()
    participant Globby
    participant IgnoreFiles as .gitignore<br/>.repomixignore
    participant Minimatch as minimatch<br/>(filter)
    
    Caller->>FileSearch: explicitFiles provided (stdin mode)
    
    rect rgb(200, 220, 255)
    Note over FileSearch,IgnoreFiles: Load ignore patterns
    FileSearch->>IgnoreFiles: Read .gitignore
    FileSearch->>IgnoreFiles: Read .repomixignore
    FileSearch->>FileSearch: Aggregate ignore patterns
    end
    
    rect rgb(220, 200, 255)
    Note over FileSearch,Globby: Process files
    FileSearch->>Globby: Run globby (include patterns only)
    Globby-->>FileSearch: globby results
    FileSearch->>Minimatch: Filter explicit files<br/>against aggregated ignore patterns
    Minimatch-->>FileSearch: filtered explicit files
    end
    
    rect rgb(200, 255, 220)
    Note over FileSearch: Merge & deduplicate
    FileSearch->>FileSearch: Merge globby + filtered explicit files
    FileSearch->>FileSearch: Deduplicate filePaths
    end
    
    FileSearch-->>Caller: Final filePaths

    rect rgb(255, 240, 200)
    Note over FileSearch: Normal mode (no explicitFiles)
    FileSearch->>Globby: Use traditional globby flow<br/>with include/exclude patterns
    Globby-->>FileSearch: Results
    FileSearch-->>Caller: filePaths
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Key areas requiring attention:

  • stdin-mode branching logic: Verify correct separation between stdin and normal modes, especially edge cases around empty explicitFiles or missing include patterns
  • Ignore pattern aggregation: Ensure .gitignore and .repomixignore are read and normalized correctly, and that patterns are applied consistently across globby and explicit files
  • Deduplication and merging: Validate that globby results and filtered explicit files are correctly merged without duplicates
  • Test coverage alignment: Confirm new tests accurately validate the merged behavior and that old test expectations have been properly updated
  • Empty directory handling: Verify the logic for when empty directories are computed (only non-stdin mode with include patterns)

Possibly related PRs

Suggested reviewers

  • yamadashy

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "fix(core): bypass globby in stdin mode to avoid hang on macOS" is concise, specific, and directly communicates the primary change and its purpose. The title accurately reflects the main objective of the changeset—addressing the macOS hang by modifying stdin mode behavior—without being vague or generic. The developer has chosen a clear summary that would help teammates quickly understand the nature of the fix when reviewing version history.
Out of Scope Changes Check ✅ Passed All changes in the pull request are directly related to fixing the stdin mode hang issue. Modifications to src/core/file/fileSearch.ts implement the bypass logic for handling explicit files in stdin mode, and updates to tests/core/file/fileSearch.test.ts validate the new behavior with appropriate test coverage. No unrelated refactoring, dependency updates, or out-of-scope changes are present. The changes are tightly focused on the specific problem identified in issue #903.
Description Check ✅ Passed The pull request description is comprehensive and well-structured, exceeding the template requirements. It includes a summary of changes, detailed explanation of the root cause and solution, testing results (41 tests pass), manual and performance verification on macOS, documented limitations, and a completed checklist confirming that npm run test and npm run lint have been executed. All required template sections are present and substantively filled out.

[pre_merge_check_pass]

The PR successfully addresses the core objective from issue #903 by fixing the macOS hang when using stdin with explicit file paths. The root cause is identified (globby treating .gitignore as both an ignore rule source and match target), a practical solution is implemented (bypassing globby for explicit stdin files), and the fix is demonstrated to work through unit tests (all 41 passing), manual testing, and performance verification on macOS. The solution preserves stdin support, file filtering, and ignore pattern functionality as required. While the PR identifies a different root cause than initially suspected (globby pathology rather than worker/pool interaction), it provides an effective fix with thorough testing and verification steps, satisfying the primary objectives of issue #903. |

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/stdin-hang-bypass-globby

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @yamadashy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug causing the application to hang on macOS when processing files via standard input (--stdin) if a .gitignore file was present in the provided file list. The core issue stemmed from globby attempting to use .gitignore for both ignore rules and as a file to match, leading to a pathological state. The solution introduces a refined file search mechanism for stdin mode, where explicit files are processed separately, manually filtered against ignore rules, and then merged with any globby-discovered files, ensuring stability and correct .gitignore behavior.

Highlights

  • macOS Hang Fix: Resolved a critical issue causing the application to hang on macOS when using --stdin with explicit file paths that included .gitignore.
  • Reworked Stdin File Processing: Implemented a new logic for --stdin mode that bypasses globby for explicit files, preventing conflicts with .gitignore handling.
  • Manual .gitignore Application: Explicit files are now manually filtered against root .gitignore and .repomixignore rules using minimatch.
  • Merged File Discovery: Combines results from globby (for configured includePatterns) and manually filtered explicit files, ensuring all relevant files are included without duplicates.
  • Enhanced Test Coverage: Added new unit tests to validate .gitignore application, merging logic, and empty directory handling in stdin mode.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages bot commented Nov 3, 2025

Deploying repomix with  Cloudflare Pages  Cloudflare Pages

Latest commit: 636a75b
Status: ✅  Deploy successful!
Preview URL: https://b57e69ea.repomix.pages.dev
Branch Preview URL: https://fix-stdin-hang-bypass-globby.repomix.pages.dev

View logs

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively resolves a critical hanging issue on macOS when using stdin mode by bypassing globby for explicit file paths. The new logic correctly handles ignore patterns manually and merges the results, and the changes are well-supported by updated and new tests. My feedback focuses on improving code maintainability by addressing a couple of instances of code duplication that were introduced with the new logic. By refactoring these parts into helper functions, the code will be cleaner and easier to manage in the future.

Comment on lines +188 to +215
if (config.ignore.useGitignore) {
// Read .gitignore from root directory
const gitignorePath = path.join(rootDir, '.gitignore');
try {
const gitignoreContent = await fs.readFile(gitignorePath, 'utf8');
const gitignorePatterns = parseIgnoreContent(gitignoreContent);
allIgnorePatterns.push(...gitignorePatterns);
logger.trace('Loaded .gitignore patterns:', gitignorePatterns);
} catch (error) {
// .gitignore might not exist, which is fine
logger.trace(
'No .gitignore found or could not read:',
error instanceof Error ? error.message : String(error),
);
}

// If no include patterns at all, default to all files
if (includePatterns.length === 0) {
includePatterns = ['**/*'];
}
// Read .repomixignore from root directory
const repomixignorePath = path.join(rootDir, '.repomixignore');
try {
const repomixignoreContent = await fs.readFile(repomixignorePath, 'utf8');
const repomixignorePatterns = parseIgnoreContent(repomixignoreContent);
allIgnorePatterns.push(...repomixignorePatterns);
logger.trace('Loaded .repomixignore patterns:', repomixignorePatterns);
} catch (error) {
// .repomixignore might not exist, which is fine
logger.trace(
'No .repomixignore found or could not read:',
error instanceof Error ? error.message : String(error),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for reading and parsing .gitignore and .repomixignore is duplicated. To improve maintainability and reduce code repetition, you can extract this logic into a helper function.

      if (config.ignore.useGitignore) {
        const readIgnoreFile = async (fileName: string) => {
          try {
            const content = await fs.readFile(path.join(rootDir, fileName), 'utf8');
            const patterns = parseIgnoreContent(content);
            allIgnorePatterns.push(...patterns);
            logger.trace(`Loaded ${fileName} patterns:`, patterns);
          } catch (error) {
            // Ignore file might not exist, which is fine
            logger.trace(
              `No ${fileName} found or could not read:`,
              error instanceof Error ? error.message : String(error),
            );
          }
        };

        await readIgnoreFile('.gitignore');
        await readIgnoreFile('.repomixignore');
      }

Comment on lines +269 to +286

logger.trace('Include patterns:', patterns);

filePaths = await globby(patterns, {
cwd: rootDir,
ignore: [...adjustedIgnorePatterns],
ignoreFiles: [...ignoreFilePatterns],
onlyFiles: true,
absolute: false,
dot: true,
followSymbolicLinks: false,
}).catch((error: unknown) => {
const code = (error as NodeJS.ErrnoException | { code?: string })?.code;
if (code === 'EPERM' || code === 'EACCES') {
throw new PermissionError(
`Permission denied while scanning directory. Please check folder access permissions for your terminal app. path: ${rootDir}`,
rootDir,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The globby call and its associated error handling logic are duplicated here and in the if (explicitFiles) block (lines 219-239). This repetition can be avoided by creating a reusable helper function for running globby with the common options and error handling. This would make the code cleaner and easier to maintain.

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 3, 2025

Codecov Report

❌ Patch coverage is 70.45455% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.35%. Comparing base (6d3485a) to head (636a75b).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/core/file/fileSearch.ts 70.45% 26 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #931      +/-   ##
==========================================
- Coverage   89.50%   89.35%   -0.16%     
==========================================
  Files         111      111              
  Lines        7807     7865      +58     
  Branches     1498     1507       +9     
==========================================
+ Hits         6988     7028      +40     
- Misses        819      837      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Nov 3, 2025

PR Review: Fix stdin mode hang on macOS

Summary

This PR effectively resolves issue #903 by bypassing globby when processing explicit files in stdin mode. The root cause analysis is solid, and the solution is pragmatic. The implementation separates concerns between globby (for include patterns) and manual filtering (for explicit files).

✅ Strengths

  1. Correct Root Cause Analysis: Identified that globby treating .gitignore as both ignore source and match target causes the hang.

  2. Well-Tested: Added 4 new comprehensive test cases covering edge cases:

    • .gitignore rule application
    • Merging globby and explicit files
    • Empty directory handling
    • Ignore-only patterns without includes
  3. Clear Documentation: Inline comments explain the logic flow and limitations.

  4. Backward Compatible: Existing functionality remains unchanged for non-stdin mode.


🔍 Code Quality & Best Practices

Positive Aspects:

  • Good separation of concerns: stdin mode logic is clearly isolated from normal mode
  • Proper error handling: Try-catch blocks for missing .gitignore/.repomixignore files
  • Consistent with project patterns: Uses dependency injection pattern already established
  • Logging: Appropriate trace/debug logging for troubleshooting

Minor Suggestions:

  1. Consider extracting stdin logic to separate function (line 182-265 in fileSearch.ts):
    The stdin mode block is ~80 lines. Consider extracting to searchFilesStdinMode() for better maintainability:

    if (explicitFiles) {
      filePaths = await searchFilesStdinMode(rootDir, config, explicitFiles, adjustedIgnorePatterns, includePatterns, ignoreFilePatterns);
    } else {
      // Normal mode...
    }
  2. DRY principle: The .gitignore/.repomixignore reading logic (lines 189-218) could be extracted to a reusable function since it's similar to existing parseIgnoreContent:

    const loadIgnoreFilePatterns = async (rootDir: string, filenames: string[]): Promise<string[]> => {
      const patterns: string[] = [];
      for (const filename of filenames) {
        try {
          const content = await fs.readFile(path.join(rootDir, filename), 'utf8');
          patterns.push(...parseIgnoreContent(content));
        } catch (error) {
          logger.trace(\`Could not read \${filename}:\`, error);
        }
      }
      return patterns;
    };
  3. Performance optimization: Consider using Promise.all for reading .gitignore and .repomixignore in parallel (lines 189-218).


⚠️ Potential Bugs & Issues

1. Case Sensitivity in Pattern Matching (line 254-256)

The minimatch pattern matching may behave differently from globby's ignoreFiles behavior regarding case sensitivity:

const shouldIgnore = allIgnorePatterns.some(
  (pattern) => minimatch(filePath, pattern) || minimatch(\`\${filePath}/\`, pattern),
);

Issue: globby's ignoreFiles option uses ignore package which has specific case-sensitivity rules. minimatch defaults may differ.

Mitigation: Add nocase option if needed:

minimatch(filePath, pattern, { dot: true, nocase: process.platform === 'win32' })

2. Negation Patterns Not Handled (line 252-258)

.gitignore supports negation patterns (e.g., !important.log), but the current implementation doesn't handle them:

const filteredExplicitFiles = relativePaths.filter((filePath) => {
  const shouldIgnore = allIgnorePatterns.some(
    (pattern) => minimatch(filePath, pattern) || minimatch(\`\${filePath}/\`, pattern),
  );
  return !shouldIgnore;
});

Impact: Files that should be un-ignored via negation patterns will still be ignored.

Suggested Fix: Use the ignore package (same as globby uses internally) instead of raw minimatch:

import ignore from 'ignore';

const ig = ignore().add(allIgnorePatterns);
const filteredExplicitFiles = relativePaths.filter(filePath => !ig.ignores(filePath));

3. Subdirectory .gitignore Files (Documented Limitation)

The PR notes this limitation, which is acceptable for the stated use case. However:

Edge Case: If a user pipes explicit files that should be ignored by subdirectory .gitignore but aren't handled by root .gitignore, they'll be included.

Example:

root/
  .gitignore (contains: *.log)
  subdir/
    .gitignore (contains: temp.txt)
    temp.txt

If subdir/temp.txt is in the stdin list, it will be included (incorrectly).

Recommendation: Document this more prominently or consider reading subdirectory .gitignore files if performance allows.

4. Empty includePatterns + No explicitFiles Edge Case (line 268)

This is handled correctly, but worth noting the subtle behavior change:

  • Old: Always used includePatterns even if provided (line 178-179 in diff context)
  • New: Conditionally uses patterns

The logic is correct, but verify this doesn't break any unusual configurations.


🚨 Premortem Analysis: Potential Failure Scenarios

Critical Risks:

1. Windows Path Separator Issues

Scenario: Windows uses backslashes, but patterns use forward slashes.
Impact: Pattern matching may fail on Windows for explicit files.
Likelihood: Medium (Windows is supported platform)
Mitigation:

  • Normalize paths with path.sep/ conversion before minimatch
  • Add Windows-specific integration tests

2. Symbolic Link Handling

Scenario: Explicit files could be symlinks, but we don't resolve them.
Current Code: path.relative(rootDir, filePath) (line 247)
Impact: If filePath is a symlink, relative path might not resolve correctly.
Mitigation: Consider using fs.realpath before path.relative

3. Race Condition with .gitignore Modification

Scenario: .gitignore is modified between reading (line 192) and file processing.
Impact: Inconsistent ignore behavior.
Likelihood: Low (requires concurrent modification)
Mitigation: Accept as acceptable risk (same as original behavior)

4. Memory Overhead with Large Stdin Lists

Scenario: User pipes 100k+ files via stdin.
New Code: Stores all explicit files in memory (line 247: explicitFiles.map(...))
Old Code: Only stored patterns
Impact: Increased memory usage for massive file lists.
Mitigation:

  • Document recommended limits
  • Consider streaming approach for very large inputs

5. Absolute Path Handling

Scenario: What if explicitFiles contains paths outside rootDir?
Current Code: path.relative(rootDir, filePath) (line 247)
Impact: Could result in ../ paths, which may not match patterns correctly.
Test Coverage: No test for this scenario
Mitigation: Add validation or test case

Deployment Risks:

1. Regression in Non-Stdin Mode

Risk: Changes to ignore pattern logic could affect normal mode.
Mitigation: ✅ Already addressed - normal mode path unchanged (line 266-289)

2. Breaking Change for Advanced Users

Risk: Users relying on previous stdin behavior might see different results.
Example: Files previously included might now be filtered.
Mitigation:

  • Consider this a bug fix, not breaking change
  • Document in release notes

🔒 Security Considerations

✅ No Security Concerns Identified

  1. Path Traversal: Using path.relative correctly prevents traversal attacks.
  2. File Reading: Only reads expected .gitignore/.repomixignore files.
  3. Pattern Injection: Patterns are processed safely through minimatch.

Recommendation:

Consider adding a safety check for explicit files outside rootDir:

const relativePaths = explicitFiles.map((filePath) => {
  const rel = path.relative(rootDir, filePath);
  if (rel.startsWith('..')) {
    throw new RepomixError(\`File outside root directory: \${filePath}\`);
  }
  return rel;
});

📊 Performance Considerations

Improvements:

  1. Reduces globby overhead: No longer passes all explicit files to globby
  2. Prevents hang: Primary goal achieved

Potential Concerns:

  1. Sequential file reading (lines 189-218): Reading .gitignore and .repomixignore sequentially.

    • Optimization: Use Promise.all to read both in parallel
  2. Set deduplication overhead (line 263): For large file lists, new Set() has memory overhead.

    • Impact: Minimal for typical use cases (<10k files)
  3. Multiple pattern checks: Each file checked against all patterns (line 254-256).

    • Optimization: Pre-compile patterns with ignore package for O(1) lookups

🧪 Test Coverage Analysis

✅ Well Covered:

  • Basic stdin mode with .gitignore rules
  • Merging globby and explicit files
  • Empty directory handling
  • Ignore-only patterns

⚠️ Missing Test Cases:

  1. Negation patterns (!important.txt)
  2. Case-sensitive pattern matching (Windows)
  3. Files outside rootDir (absolute paths, ../)
  4. Large file lists (performance/memory test)
  5. Subdirectory .gitignore interaction (documented limitation)
  6. Error handling: What if globby fails but explicit files succeed?
  7. Empty explicit files array: explicitFiles = []
  8. Pattern edge cases:
    • Glob wildcards in explicit file paths
    • Special characters (spaces, unicode)
  9. Symlink handling

Recommended Additions:

test('should handle negation patterns in .gitignore', async () => { ... });
test('should reject files outside rootDir', async () => { ... });
test('should handle empty explicit files array', async () => { ... });

📝 Documentation & Commit Quality

✅ Strengths:

  1. Clear commit message: Follows Conventional Commits format
  2. Comprehensive PR description: Good summary, testing evidence, limitations
  3. Inline code comments: Logic is well-explained

Suggestions:

  1. Update user documentation: If stdin mode behavior changed, update docs
  2. Changelog entry: Mention the fix for macOS hang
  3. Migration guide: If any breaking changes, document them

🎯 Final Recommendations

Priority 1 (Must Fix):

  1. Handle negation patterns using ignore package instead of raw minimatch
  2. Add test for files outside rootDir to prevent path traversal edge cases

Priority 2 (Should Fix):

  1. Extract stdin logic to separate function for maintainability
  2. Optimize .gitignore reading with Promise.all
  3. Add missing test cases (negation, edge cases)

Priority 3 (Consider):

  1. DRY up ignore file reading logic
  2. Add performance tests for large file lists
  3. Document subdirectory .gitignore limitation more prominently

✅ Approval Decision

APPROVED with minor suggestions

This PR successfully fixes the critical hang issue on macOS and is well-tested for the primary use case. The identified concerns are mostly edge cases that can be addressed in follow-up PRs. The core implementation is sound.

Merge Readiness: Ready to merge after addressing Priority 1 items (negation patterns + path validation).


Great work on fixing this issue! The root cause analysis was excellent, and the solution is pragmatic. 🚀

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d3485a and 636a75b.

📒 Files selected for processing (2)
  • src/core/file/fileSearch.ts (1 hunks)
  • tests/core/file/fileSearch.test.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/core/file/fileSearch.test.ts (2)
src/core/file/fileSearch.ts (1)
  • searchFiles (96-329)
tests/testing/testUtils.ts (1)
  • createMockConfig (15-45)
src/core/file/fileSearch.ts (2)
src/shared/logger.ts (2)
  • logger (89-89)
  • error (34-38)
src/core/file/permissionCheck.ts (1)
  • PermissionError (16-25)
🪛 GitHub Actions: CI
tests/core/file/fileSearch.test.ts

[error] 598-598: Test failure due to Windows path separators: expected POSIX-style paths (e.g., 'src/file1.ts') but received Windows-style paths (e.g., 'src\file1.ts') in result.filePaths.


[error] 625-625: Test assertion failed: expected file paths to be ['lib/utils.ts', 'src/main.ts'] but received ['lib\utils.ts', 'src\main.ts'] (Windows path separators).


[error] 667-667: Test assertion failed: expected filePaths length to be 3 but got 4 due to duplicate handling with mixed path separators in stdin mode.

🪛 GitHub Check: Test with Bun (windows-latest, latest)
tests/core/file/fileSearch.test.ts

[failure] 598-598: tests/core/file/fileSearch.test.ts > fileSearch > searchFiles path validation > should filter explicit files based on include and ignore patterns
AssertionError: expected [ Array(4) ] to deeply equal ArrayContaining{…}

  • Expected
  • Received
  • ArrayContaining [
  • "src/file1.ts",
  • "src/file2.js",
  • "src/file3.ts",
  • [
  • "src\file1.ts",
  • "src\file2.js",
  • "src\file3.ts",
    "src/other.ts",
    ]

❯ tests/core/file/fileSearch.test.ts:598:32


[failure] 667-667: tests/core/file/fileSearch.test.ts > fileSearch > searchFiles path validation > should merge globby results and explicit files in stdin mode
AssertionError: expected [ 'src\duplicate.ts', …(3) ] to have a length of 3 but got 4

  • Expected
  • Received
  • 3
  • 4

❯ tests/core/file/fileSearch.test.ts:667:32


[failure] 625-625: tests/core/file/fileSearch.test.ts > fileSearch > searchFiles path validation > should handle explicit files with ignore patterns only
AssertionError: expected [ 'lib\utils.ts', 'src\main.ts' ] to deeply equal [ 'lib/utils.ts', 'src/main.ts' ]

  • Expected
  • Received

    [

  • "lib/utils.ts",
  • "src/main.ts",
  • "lib\utils.ts",
  • "src\main.ts",
    ]

❯ tests/core/file/fileSearch.test.ts:625:32

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: claude-review

Comment on lines +188 to +218
if (config.ignore.useGitignore) {
// Read .gitignore from root directory
const gitignorePath = path.join(rootDir, '.gitignore');
try {
const gitignoreContent = await fs.readFile(gitignorePath, 'utf8');
const gitignorePatterns = parseIgnoreContent(gitignoreContent);
allIgnorePatterns.push(...gitignorePatterns);
logger.trace('Loaded .gitignore patterns:', gitignorePatterns);
} catch (error) {
// .gitignore might not exist, which is fine
logger.trace(
'No .gitignore found or could not read:',
error instanceof Error ? error.message : String(error),
);
}

// If no include patterns at all, default to all files
if (includePatterns.length === 0) {
includePatterns = ['**/*'];
}
// Read .repomixignore from root directory
const repomixignorePath = path.join(rootDir, '.repomixignore');
try {
const repomixignoreContent = await fs.readFile(repomixignorePath, 'utf8');
const repomixignorePatterns = parseIgnoreContent(repomixignoreContent);
allIgnorePatterns.push(...repomixignorePatterns);
logger.trace('Loaded .repomixignore patterns:', repomixignorePatterns);
} catch (error) {
// .repomixignore might not exist, which is fine
logger.trace(
'No .repomixignore found or could not read:',
error instanceof Error ? error.message : String(error),
);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Always load .repomixignore in stdin mode

In stdin mode we now skip .repomixignore whenever useGitignore is false, but normal mode still respects it because getIgnoreFilePatterns always includes .repomixignore. This regresses documented behaviour: disabling gitignore handling should not disable the Repomix-specific ignore file. Please load .repomixignore independently of the gitignore flag so explicit files keep honoring those rules.

-      if (config.ignore.useGitignore) {
-        // Read .gitignore from root directory
-        const gitignorePath = path.join(rootDir, '.gitignore');
-        try {
-          const gitignoreContent = await fs.readFile(gitignorePath, 'utf8');
-          const gitignorePatterns = parseIgnoreContent(gitignoreContent);
-          allIgnorePatterns.push(...gitignorePatterns);
-          logger.trace('Loaded .gitignore patterns:', gitignorePatterns);
-        } catch (error) {
-          // .gitignore might not exist, which is fine
-          logger.trace(
-            'No .gitignore found or could not read:',
-            error instanceof Error ? error.message : String(error),
-          );
-        }
-
-        // Read .repomixignore from root directory
-        const repomixignorePath = path.join(rootDir, '.repomixignore');
-        try {
-          const repomixignoreContent = await fs.readFile(repomixignorePath, 'utf8');
-          const repomixignorePatterns = parseIgnoreContent(repomixignoreContent);
-          allIgnorePatterns.push(...repomixignorePatterns);
-          logger.trace('Loaded .repomixignore patterns:', repomixignorePatterns);
-        } catch (error) {
-          // .repomixignore might not exist, which is fine
-          logger.trace(
-            'No .repomixignore found or could not read:',
-            error instanceof Error ? error.message : String(error),
-          );
-        }
-      }
+      const loadIgnoreFile = async (fileName: string, enabled: boolean) => {
+        if (!enabled) return;
+        const filePath = path.join(rootDir, fileName);
+        try {
+          const content = await fs.readFile(filePath, 'utf8');
+          const patterns = parseIgnoreContent(content).map(normalizeGlobPattern);
+          allIgnorePatterns.push(...patterns);
+          logger.trace(`Loaded ${fileName} patterns:`, patterns);
+        } catch (error) {
+          logger.trace(
+            `No ${fileName} found or could not read:`,
+            error instanceof Error ? error.message : String(error),
+          );
+        }
+      };
+
+      await loadIgnoreFile('.gitignore', config.ignore.useGitignore);
+      await loadIgnoreFile('.repomixignore', true);

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/core/file/fileSearch.ts around lines 188 to 218, the code currently only
reads .repomixignore when config.ignore.useGitignore is true, which causes
.repomixignore to be skipped in stdin mode when gitignore handling is disabled;
move or duplicate the .repomixignore read/parse/log block so it runs
unconditionally (outside or after the if (config.ignore.useGitignore) block) and
keep the same try/catch behavior and logger.trace messages to handle a missing
file gracefully.

Comment on lines +192 to +210
const gitignoreContent = await fs.readFile(gitignorePath, 'utf8');
const gitignorePatterns = parseIgnoreContent(gitignoreContent);
allIgnorePatterns.push(...gitignorePatterns);
logger.trace('Loaded .gitignore patterns:', gitignorePatterns);
} catch (error) {
// .gitignore might not exist, which is fine
logger.trace(
'No .gitignore found or could not read:',
error instanceof Error ? error.message : String(error),
);
}

// If no include patterns at all, default to all files
if (includePatterns.length === 0) {
includePatterns = ['**/*'];
}
// Read .repomixignore from root directory
const repomixignorePath = path.join(rootDir, '.repomixignore');
try {
const repomixignoreContent = await fs.readFile(repomixignorePath, 'utf8');
const repomixignorePatterns = parseIgnoreContent(repomixignoreContent);
allIgnorePatterns.push(...repomixignorePatterns);
logger.trace('Loaded .repomixignore patterns:', repomixignorePatterns);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Normalize patterns loaded from ignore files

Lines like build/ or tmp/ are extremely common in .gitignore, but when we feed them straight into minimatch they no longer behave like git (they won't exclude build/index.js). We already normalize config-derived patterns; we need to apply the same normalizeGlobPattern to patterns pulled from .gitignore / .repomixignore before pushing them into allIgnorePatterns, otherwise stdin-mode files slip through those ignores.

-          const gitignorePatterns = parseIgnoreContent(gitignoreContent);
+          const gitignorePatterns = parseIgnoreContent(gitignoreContent).map(normalizeGlobPattern);
           allIgnorePatterns.push(...gitignorePatterns);
...
-          const repomixignorePatterns = parseIgnoreContent(repomixignoreContent);
+          const repomixignorePatterns = parseIgnoreContent(repomixignoreContent).map(normalizeGlobPattern);
           allIgnorePatterns.push(...repomixignorePatterns);
🤖 Prompt for AI Agents
In src/core/file/fileSearch.ts around lines 192 to 210, the patterns read from
.gitignore and .repomixignore must be normalized before being added to
allIgnorePatterns; map the arrays returned by parseIgnoreContent through the
existing normalizeGlobPattern function (ensuring normalizeGlobPattern is
imported/available) and push the normalized results into allIgnorePatterns, and
update the logger.trace calls to log the normalized patterns so ignores behave
like git (e.g., "build/" will also match build/index.js).

Comment on lines +220 to +268
// 1. Run globby only for includePatterns (if any)
const globbyPatterns = includePatterns.length > 0 ? includePatterns : [];
const globbyResults =
globbyPatterns.length > 0
? await globby(globbyPatterns, {
cwd: rootDir,
ignore: [...adjustedIgnorePatterns],
ignoreFiles: [...ignoreFilePatterns],
onlyFiles: true,
absolute: false,
dot: true,
followSymbolicLinks: false,
}).catch((error: unknown) => {
const code = (error as NodeJS.ErrnoException | { code?: string })?.code;
if (code === 'EPERM' || code === 'EACCES') {
throw new PermissionError(
`Permission denied while scanning directory. Please check folder access permissions for your terminal app. path: ${rootDir}`,
rootDir,
);
}
throw error;
})
: [];

logger.trace('Globby results for includePatterns:', globbyResults);

// 2. Convert explicit files to relative paths
const relativePaths = explicitFiles.map((filePath) => path.relative(rootDir, filePath));

logger.trace('Explicit files (relative):', relativePaths);

// 3. Filter explicit files using ignore patterns (manually with minimatch)
const filteredExplicitFiles = relativePaths.filter((filePath) => {
// Check if file matches any ignore pattern
const shouldIgnore = allIgnorePatterns.some(
(pattern) => minimatch(filePath, pattern) || minimatch(`${filePath}/`, pattern),
);
}
throw error;
});
return !shouldIgnore;
});

logger.trace('Filtered explicit files:', filteredExplicitFiles);

// 4. Merge globby results and filtered explicit files, removing duplicates
filePaths = [...new Set([...globbyResults, ...filteredExplicitFiles])];

logger.trace('Merged file paths:', filePaths);
} else {
// Normal mode: use globby with all patterns
const patterns = includePatterns.length > 0 ? includePatterns : ['**/*'];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Normalize explicit paths to POSIX separators before merging

On Windows, path.relative returns backslash-separated paths, so filteredExplicitFiles ends up with values like src\file1.ts while globbyResults uses src/file1.ts. This breaks deduplication and causes ignore checks to misbehave, which is exactly what the Windows CI failures are showing (mixed separators and duplicate entries). Convert both the globby output and the explicit-path relatives to POSIX separators before combining them so we return a consistent, de-duped list everywhere.

-      const globbyResults =
-        globbyPatterns.length > 0
-          ? await globby(globbyPatterns, {
+      const toPosix = (value: string) => value.replace(/\\/g, '/');
+      const globbyResults =
+        globbyPatterns.length > 0
+          ? (
+              await globby(globbyPatterns, {
                   cwd: rootDir,
                   ignore: [...adjustedIgnorePatterns],
                   ignoreFiles: [...ignoreFilePatterns],
                   onlyFiles: true,
                   absolute: false,
                   dot: true,
                   followSymbolicLinks: false,
                 }).catch((error: unknown) => {
                   const code = (error as NodeJS.ErrnoException | { code?: string })?.code;
                   if (code === 'EPERM' || code === 'EACCES') {
                     throw new PermissionError(
                       `Permission denied while scanning directory. Please check folder access permissions for your terminal app. path: ${rootDir}`,
                       rootDir,
                     );
                   }
                   throw error;
                 })
-          : [];
+            ).map(toPosix)
+          : [];
...
-      const relativePaths = explicitFiles.map((filePath) => path.relative(rootDir, filePath));
+      const relativePaths = explicitFiles.map((filePath) => toPosix(path.relative(rootDir, filePath)));
🤖 Prompt for AI Agents
In src/core/file/fileSearch.ts around lines 220 to 268, explicit file paths
produced by path.relative are using Windows backslashes which mismatch globby's
POSIX-style results and break deduplication and ignore checks; normalize both
the globbyResults and the relativePaths to POSIX separators before running
minimatch, filtering, and merging. Update the code to map globbyResults and
explicitFiles -> relativePaths through a function that replaces backslashes with
'/' (or use path.posix) so both sets use forward-slash separators, then run the
ignore filtering and dedupe on those normalized values and continue with the
rest of the logic.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@yamadashy yamadashy closed this Nov 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1.5 - 1.6 - 1.7 - 1.8 | Repomix stuck in processing repo

2 participants