perf(core): Eliminate stat() syscall and lazy-load encoding libraries in fileRead#1399
perf(core): Eliminate stat() syscall and lazy-load encoding libraries in fileRead#1399
Conversation
… in fileRead
Two optimizations to reduce file reading overhead:
1. Remove redundant fs.stat() before fs.readFile()
- Previously each file required stat() (size check) then readFile() = 2 syscalls
- Now readFile() runs first, then buffer.length is checked = 1 syscall
- Files exceeding maxFileSize (default 10MB) are rare; the occasional
oversized read is acceptable for halving syscall count on all files
2. Lazy-load jschardet and iconv-lite
- These libraries have ~25ms combined import cost at startup
- The fast UTF-8 path (covers ~99% of source code files) never needs them
- They are only loaded on first encounter of a non-UTF-8 file
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe PR optimizes file reading in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
⚡ Performance Benchmark
Details
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1399 +/- ##
==========================================
+ Coverage 87.40% 87.41% +0.01%
==========================================
Files 116 116
Lines 4389 4393 +4
Branches 1018 1019 +1
==========================================
+ Hits 3836 3840 +4
Misses 553 553 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request optimizes file reading by lazy-loading encoding detection libraries and removing the fs.stat syscall to reduce I/O operations. Feedback indicates a potential race condition and type safety issues in the lazy-loading logic. Additionally, there is a concern that removing the early size check via fs.stat could lead to high memory consumption or out-of-memory errors when processing very large files.
| let _jschardet: typeof import('jschardet') | undefined; | ||
| let _iconv: typeof import('iconv-lite') | undefined; | ||
| const getEncodingDeps = async () => { | ||
| if (!_jschardet || !_iconv) { | ||
| [_jschardet, _iconv] = await Promise.all([import('jschardet'), import('iconv-lite')]); | ||
| } | ||
| return { jschardet: _jschardet, iconv: _iconv }; | ||
| }; |
There was a problem hiding this comment.
The current implementation of getEncodingDeps has a potential race condition. If multiple concurrent calls to readRawFile trigger the slow path (non-UTF-8 files) before the first import() completes, the modules will be imported multiple times. Additionally, the return type of the function includes undefined for both dependencies, which will cause TypeScript errors or runtime crashes when calling encodingDeps.jschardet.detect() at line 73.
Using a single promise to manage the lazy loading ensures that the modules are only loaded once and provides a clean, non-nullable return type for the caller.
let encodingDepsPromise: Promise<{
jschardet: typeof import('jschardet');
iconv: typeof import('iconv-lite');
}> | undefined;
const getEncodingDeps = async () => {
if (!encodingDepsPromise) {
encodingDepsPromise = Promise.all([
import('jschardet'),
import('iconv-lite'),
]).then(([jschardet, iconv]) => ({
jschardet: (jschardet as any).default || jschardet,
iconv: (iconv as any).default || iconv,
}));
}
return encodingDepsPromise;
};| // Read the file directly and check size afterward, avoiding a separate stat() syscall. | ||
| // This halves the number of I/O operations per file. | ||
| // Files exceeding maxFileSize are rare, so the occasional oversized read is acceptable. | ||
| const buffer = await fs.readFile(filePath); |
There was a problem hiding this comment.
Removing the fs.stat check before fs.readFile introduces a risk of high memory consumption. While it reduces the number of syscalls, any file exceeding maxFileSize (e.g., a large log file, dataset, or binary not caught by extension checks) will now be fully loaded into memory before the size check occurs. In extreme cases, this could lead to OutOfMemoryError or process crashes if the file is very large (e.g., several GBs).
Consider if the performance gain of one less stat syscall (which is metadata-only and very fast) outweighs the safety of the early size check, especially for a tool that might be run on diverse and potentially large repositories.
Code Review —
|
Two optimizations to reduce file reading overhead:
1. Remove redundant
fs.stat()beforefs.readFile()stat()(size check) thenreadFile()= 2 syscallsreadFile()runs first, thenbuffer.lengthis checked = 1 syscallmaxFileSize(default 10MB) are rare; the occasional oversized read is acceptable for halving syscall count on all files2. Lazy-load
jschardetandiconv-liteExtracted from #1377. Pure performance optimization with no functional changes.
Checklist
npm run testnpm run lint