Handle large files gracefully#302
Conversation
…mory errors BREAKING CHANGE: files larger than 50MB will now be skipped with a warning instead of being processed. This prevents out-of-memory crashes but may change behavior for repositories containing large files. Changes: - Added 50MB file size limit to prevent memory issues - Added user-friendly warning message The 50MB limit matches GitHub's recommended file size limit and ensures stable memory usage even on systems with limited RAM.
📝 WalkthroughWalkthroughThe changes in Sequence DiagramsequenceDiagram
participant Caller
participant readRawFile
participant fs
Caller->>readRawFile: Request to read file
readRawFile->>fs: Check file size (fs.stat)
alt File size exceeds MAX_FILE_SIZE
readRawFile-->>Caller: Return null (log warning)
else File size within limit
readRawFile->>readRawFile: Check if binary file
alt Is binary file
readRawFile-->>Caller: Return null (log warning)
else Not a binary file
readRawFile->>readRawFile: Read file contents
readRawFile-->>Caller: Return file contents
end
end
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/core/file/fileCollect.ts (1)
35-46: LGTM! Efficient size check with user-friendly warning.The implementation efficiently checks file size before loading content and provides clear, actionable feedback to users.
Consider extracting formatting logic.
Consider moving the size formatting and warning message template to utility functions for better maintainability and reuse.
+const formatSizeInMB = (bytes: number): string => (bytes / 1024 / 1024).toFixed(1); + +const logFileSizeWarning = (filePath: string, actualSize: number, maxSize: number): void => { + logger.log('⚠️ Large File Warning:'); + logger.log('────────────────────────────────────────────────'); + logger.log(`File exceeds size limit: ${formatSizeInMB(actualSize)}MB > ${formatSizeInMB(maxSize)}MB (${filePath})`); + logger.note('Add this file to .repomixignore if you want to exclude it permanently'); + logger.log(''); +}; + const readRawFile = async (filePath: string): Promise<string | null> => { try { const stats = await fs.stat(filePath); if (stats.size > MAX_FILE_SIZE) { - const sizeMB = (stats.size / 1024 / 1024).toFixed(1); - logger.log('⚠️ Large File Warning:'); - logger.log('────────────────────────────────────────────────'); - logger.log(`File exceeds size limit: ${sizeMB}MB > ${MAX_FILE_SIZE / 1024 / 1024}MB (${filePath})`); - logger.note('Add this file to .repomixignore if you want to exclude it permanently'); - logger.log(''); + logFileSizeWarning(filePath, stats.size, MAX_FILE_SIZE); return null; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/core/file/fileCollect.ts(2 hunks)
🧰 Additional context used
🪛 GitHub Actions: CI
src/core/file/fileCollect.ts
[warning] 34-34: Extra whitespace found and automatically fixed by Biome formatter
🔇 Additional comments (4)
src/core/file/fileCollect.ts (4)
11-13: LGTM! Well-documented constant with appropriate size limit.The 50MB limit is a reasonable choice that aligns with GitHub's recommendations and provides a good balance between accommodating legitimate source files while preventing memory issues.
48-52: LGTM! Optimal order of file checks.The binary file check is correctly positioned after the size check, maintaining the efficient check order (quick checks before expensive ones).
34-34: Formatting issue already addressed.The extra whitespace was automatically fixed by the Biome formatter.
🧰 Tools
🪛 GitHub Actions: CI
[warning] 34-34: Extra whitespace found and automatically fixed by Biome formatter
11-13: Verify potential impact on existing large files.Let's check if there are any existing files in the codebase that would be affected by this size limit.
✅ Verification successful
No large files found that would be affected by the size limit
The 50MB file size limit is safe to implement as there are no existing files in the codebase that approach this limit.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Find files larger than 45MB (90% of limit) to identify potential issues find_large_files() { find . -type f -size +45M ! -path "*/\.*" -exec du -h {} \; | \ sort -rh | \ while read size file; do echo "Large file found: $file ($size)" done } echo "Checking for large files (>45MB)..." find_large_filesLength of output: 194
- Add missing fs.stat mocks to handle large files check - Change mocking approach to be more consistent with codebase - Extract common fs.stat mock to beforeEach hook
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #302 +/- ##
==========================================
+ Coverage 92.12% 92.17% +0.04%
==========================================
Files 44 44
Lines 2236 2248 +12
Branches 491 493 +2
==========================================
+ Hits 2060 2072 +12
Misses 176 176 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/core/file/fileCollect.test.ts (1)
66-88: Enhance test coverage and maintainability.While the test correctly verifies basic large file handling, consider these improvements:
- Verify the complete warning message to ensure proper user guidance
- Add assertion that
fs.readFilewasn't called for the large file- Extract the size constant for better maintainability
+ const SIXTY_MB = 60 * 1024 * 1024; it('should skip large files', async () => { const mockFilePaths = ['large.txt', 'normal.txt']; const mockRootDir = '/root'; vi.mocked(fs.stat) .mockResolvedValueOnce({ // for large.txt - size: 60 * 1024 * 1024, + size: SIXTY_MB, isFile: () => true, } as Stats) .mockResolvedValueOnce({ // for normal.txt size: 1024, isFile: () => true, } as Stats); vi.mocked(isBinary).mockReturnValue(false); vi.mocked(fs.readFile).mockResolvedValue(Buffer.from('file content')); vi.mocked(jschardet.detect).mockReturnValue({ encoding: 'utf-8', confidence: 0.99 }); vi.mocked(iconv.decode).mockReturnValue('decoded content'); const result = await collectFiles(mockFilePaths, mockRootDir); expect(result).toEqual([{ path: 'normal.txt', content: 'decoded content' }]); - expect(logger.log).toHaveBeenCalledWith('⚠️ Large File Warning:'); + expect(logger.log).toHaveBeenCalledWith( + expect.stringContaining('⚠️ Large File Warning:'), + expect.stringContaining('large.txt'), + expect.stringContaining('50MB'), + expect.stringContaining('.repomixignore') + ); + expect(fs.readFile).not.toHaveBeenCalledWith( + path.resolve('/root/large.txt'), + expect.any(Object) + ); });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/core/file/fileCollect.test.ts(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Test (windows-latest, 23.x)
- GitHub Check: Test (windows-latest, 22.x)
- GitHub Check: Test (windows-latest, 21.x)
- GitHub Check: Test (windows-latest, 20.x)
- GitHub Check: Test (windows-latest, 19.x)
- GitHub Check: Test (windows-latest, 18.x)
- GitHub Check: Test (windows-latest, 18.0.0)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
tests/core/file/fileCollect.test.ts (2)
1-1: LGTM! Well-structured mock setup for file stats.The Stats type import and fs.stat mock setup are properly implemented to support the new file size checking functionality.
Also applies to: 20-25
53-55: LGTM! Improved mock readability.The reformatted mock setup with inline comments makes the test's behavior more clear and maintainable.
- Remove trailing whitespace - Fix comment alignment in mock definitions - Move inline comments to their own lines in object literals
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/core/file/fileCollect.test.ts (1)
1-1: LGTM! Consider enhancing the stat mock.The Stats import and basic stat mock setup look good. However, consider making the mock more robust by implementing all required Stats properties.
vi.mocked(fs.stat).mockResolvedValue({ size: 1024, isFile: () => true, + isDirectory: () => false, + isSymbolicLink: () => false, } as Stats);Also applies to: 20-25
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/core/file/fileCollect.test.ts(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Test (windows-latest, 23.x)
- GitHub Check: Test (windows-latest, 22.x)
- GitHub Check: Test (windows-latest, 21.x)
- GitHub Check: Build and run (windows-latest, 22.x)
- GitHub Check: Test (windows-latest, 20.x)
- GitHub Check: Test (windows-latest, 19.x)
- GitHub Check: Test (windows-latest, 18.x)
🔇 Additional comments (1)
tests/core/file/fileCollect.test.ts (1)
53-55: Clean formatting improvement!The mock setup is now more readable with explicit comments indicating the behavior for each file type.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/core/file/fileCollect.ts (1)
35-48: LGTM! Efficient size check with user-friendly warning.The implementation efficiently checks file size before loading content and provides clear, actionable feedback to users.
Consider adding the exact file size in bytes to the warning message for debugging purposes:
-logger.log(`File exceeds size limit: ${sizeMB}MB > ${MAX_FILE_SIZE / 1024 / 1024}MB (${filePath})`); +logger.log(`File exceeds size limit: ${sizeMB}MB > ${MAX_FILE_SIZE / 1024 / 1024}MB (${stats.size} bytes) (${filePath})`);tests/core/file/fileCollect.test.ts (1)
66-99: Enhance size-related assertions in the large file test.The test implementation looks good and addresses previous review feedback. Consider adding these assertions for more precise size verification:
expect(logger.log).toHaveBeenCalledWith(expect.stringContaining('File exceeds size limit:')); +const expectedSizeMB = ((MAX_FILE_SIZE + 1024) / 1024 / 1024).toFixed(1); +expect(logger.log).toHaveBeenCalledWith( + expect.stringContaining(`${expectedSizeMB}MB > ${MAX_FILE_SIZE / 1024 / 1024}MB`) +); expect(logger.log).toHaveBeenCalledWith(expect.stringContaining(largePath));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/core/file/fileCollect.ts(2 hunks)tests/core/file/fileCollect.test.ts(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Test (windows-latest, 23.x)
- GitHub Check: Test (windows-latest, 22.x)
- GitHub Check: Test (windows-latest, 20.x)
- GitHub Check: Test (windows-latest, 19.x)
- GitHub Check: Test (windows-latest, 18.0.0)
🔇 Additional comments (4)
src/core/file/fileCollect.ts (2)
11-13: LGTM! Well-documented constant with appropriate size limit.The 50MB limit aligns with the PR objectives and GitHub's recommended file size limit.
49-54: LGTM! Proper ordering of file checks.The binary check is correctly positioned after the size check, with appropriate debug logging.
tests/core/file/fileCollect.test.ts (2)
8-8: LGTM! Proper test setup with necessary imports.The import and mock setup provide a good foundation for testing the new functionality.
Also applies to: 21-25
53-55: LGTM! Improved mock readability.The mock chain is now more readable with clear comments.
|
Hi, @slavashvets ! You're right that we don't need to process such large files. Going to merge this. Looking forward to your future contributions! |
Fixes critical out-of-memory errors when processing repositories containing large files
The Problem
When accidentally adding a large file to a repository (such as a 480MB txt file), Repomix would try to load it into memory and crash with a Node.js heap out-of-memory error:
This provides a poor user experience as:
The Solution
Added a size limit check (50MB) before loading files into memory, with clear user guidance:
Changes
Why
Large files are often added by accident (logfiles, lock files, etc.). Instead of crashing, Repomix should help users identify and handle these files appropriately. The 50MB limit was chosen because: