feat(cli): Report files detected as binary by content inspection#798
feat(cli): Report files detected as binary by content inspection#798
Conversation
Add new "Binary Files Detected" section to CLI output that shows files which were skipped due to binary content detection (not extension-based). This addresses issue #752 where users were not informed about files being silently excluded. Changes: - Update fileRead.ts to return detailed skip reasons (binary-extension, binary-content, size-limit, encoding-error) - Modify file collection pipeline to track and propagate skipped files - Add reportSkippedFiles function to display binary-content detected files - Show files with relative paths and helpful exclusion messages - Only display section when binary-content files are found - Add comprehensive test coverage for new functionality The implementation follows existing security check reporting patterns and provides users clear visibility into why files were excluded from output.
Remove redundant explanatory message from binary file listing to simplify output. The main header already explains the detection method, making individual file annotations unnecessary. Changes: - Remove "Detected as binary despite text extension" message from each file - Improve type safety by using FileSkipReason type instead of string - Update corresponding test expectations - Clean up unused import in fileCollectWorker The simplified output is cleaner while maintaining clarity about why files were excluded.
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughIntroduces structured skip-reason reporting across file read/collect/pack flows; aggregates skipped files into PackResult; adds a CLI reporter that lists files skipped due to binary-content; updates tests to the new return types and result shapes; and expands PackResult with metrics and skippedFiles for downstream reporting. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as CLI
participant Pack as pack()
participant Collect as collectFiles()
participant Worker as fileCollectWorker
participant Read as readRawFile()
CLI->>Pack: run pack()
Pack->>Collect: collectFiles(rootDir, config, cb)
loop per file
Collect->>Worker: process(filePath, maxFileSize)
Worker->>Read: readRawFile(filePath, maxFileSize)
Read-->>Worker: { content } or { skippedReason }
alt content present
Worker-->>Collect: { rawFile: { path, content } }
else skippedReason present
Worker-->>Collect: { skippedFile: { path, reason } }
end
end
Collect-->>Pack: { rawFiles[], skippedFiles[] }
Pack-->>CLI: PackResult { processedFiles..., skippedFiles[], metrics... }
CLI->>CLI: reportSkippedFiles(binary-content only)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changes
Possibly related PRs
Suggested labels
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
There was a problem hiding this comment.
Summary of Changes
Hello @yamadashy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new feature to the CLI that reports files identified as binary through content inspection, rather than just by their file extension. Previously, such files were silently excluded, which could lead to user confusion. The update provides clear visibility into why these files are skipped and offers guidance for users.
Highlights
- CLI Output Enhancement: Introduced a new 'Binary Files Detected' section in the CLI output to clearly list files identified as binary by content.
- Enhanced File Processing: Refactored file reading and collection pipelines to return detailed skip reasons (e.g., binary content, size limit, encoding errors) and propagate this information throughout the system.
- Type Safety: Ensured type safety by adding new TypeScript types for file skip reasons and collection results, improving code robustness and maintainability.
- User Guidance: Provided user-friendly messages explaining why files were excluded and advising on next steps, improving the user experience.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Deploying repomix with
|
| Latest commit: |
a221ee7
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://19bc1248.repomix.pages.dev |
| Branch Preview URL: | https://feat-binary-check.repomix.pages.dev |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #798 +/- ##
==========================================
- Coverage 87.49% 87.44% -0.06%
==========================================
Files 113 113
Lines 6437 6491 +54
Branches 1316 1331 +15
==========================================
+ Hits 5632 5676 +44
- Misses 805 815 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request does a great job of implementing the new feature to report files detected as binary by content inspection. The changes are well-structured and propagated through the different layers of the application, from file reading to the final CLI report. The addition of tests for the new reporting logic is also a good practice.
I've identified a critical issue in the file collection worker that could lead to files being silently dropped, which would undermine the goal of this feature. I've also suggested a minor refactoring to improve code clarity and maintainability in the packager. Addressing these points will make the implementation more robust and reliable.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (13)
src/core/file/fileRead.ts (2)
31-34: Lower verbosity for extension-based skips to avoid confusion with new CLI content-based section.Since the CLI prominently reports content-detected binaries, extension-based skips are incidental context. Consider
tracehere to reduce noise while keepingdebugfor content-based detection.- logger.debug(`Skipping binary file: ${filePath}`); + logger.trace(`Skipping binary file by extension: ${filePath}`);
40-43: Optional: sample subset for binary sniff to reduce cost on large files.
isBinary(null, buffer)scans entire buffers. Passing a reasonable slice (e.g., first 4–8KB) is typically sufficient and cheaper.- if (isBinary(null, buffer)) { + // Inspect a slice to keep the check lightweight; full buffer is rarely needed. + const sample = buffer.subarray(0, Math.min(buffer.length, 8192)); + if (isBinary(null, sample)) {tests/mcp/tools/packCodebaseTool.test.ts (1)
64-107: Add a focused test to assert skippedFiles are forwarded to the formatter.Right now we never exercise a non-empty
skippedFiles. Add a case that injects one and asserts the formatter receives it (guarding against regressions).Example minimal addition:
+ test('should forward skippedFiles to formatter', async () => { + const testDir = '/test/project'; + const skipped = [{ path: 'bin.dat', reason: 'binary-content' as const }]; + vi.mocked(runCli).mockResolvedValueOnce({ + packResult: { ...defaultPackResult, skippedFiles: skipped }, + config: createMockConfig({ cwd: testDir }), + }); + + await toolHandler({ directory: testDir }); + + // Ensure formatter was called (signature is mocked, so we assert by call args shape) + expect(formatPackToolResponse).toHaveBeenCalledWith( + expect.objectContaining({ + packResult: expect.objectContaining({ skippedFiles: skipped }), + }), + ); + });tests/core/packager.test.ts (1)
121-122: Add a positive-case asserting skippedFiles passthrough and exclusion from metrics.Include a test where
collectFilesreturns non-emptyskippedFilesand assert:
result.skippedFilesequals the provided list,- token/char/file counts exclude skipped items.
Illustrative change:
+ test('pack should propagate skippedFiles and exclude them from metrics', async () => { + const skipped = [{ path: 'ignored.bin', reason: 'binary-content' as const }]; + const deps = { ...mockDeps, collectFiles: vi.fn().mockResolvedValue({ rawFiles: mockRawFiles, skippedFiles: skipped }) }; + const result = await pack(['root'], mockConfig, progressCallback, deps as any); + expect(result.skippedFiles).toEqual(skipped); + // Existing expectations for totals implicitly assert skips are not included. + });tests/cli/actions/defaultAction.test.ts (1)
670-671: Add a skipped‐files reporting test to defaultAction.test.tsWe should ensure that when
pack()returns a non‐emptyskippedFilesarray, the CLI wiring invokescliReport.reportSkippedFiles.• In tests/cli/actions/defaultAction.test.ts (around lines 670–671):
Import the reporter:
import * as cliReport from '../../../src/cli/cliReport.js';Add a new case after the existing
pack()tests:// existing tests... vi.mocked(packager.pack).mockResolvedValue({ // ...other PackResult fields...skippedFiles: [],
skippedFiles: [{ path: 'bin.dat', reason: 'binary-content' }],} as PackResult);
it('reports skipped files when pack returns skippedFiles', async () => {
// Arrange: return one skipped file
vi.mocked(packager.pack).mockResolvedValue({
// ...reuse other required fields...skippedFiles: [{ path: 'bin.dat', reason: 'binary-content' }],} as PackResult);
// Act
await runDefaultAction(['.'], process.cwd(), {});
// Assert
expect(cliReport.reportSkippedFiles).toHaveBeenCalledWith(
expect.any(String),[{ path: 'bin.dat', reason: 'binary-content' }],);
});
This addition validates that
runDefaultActioncorrectly delegates skipped‐files reporting to the CLI reporter.tests/cli/actions/remoteAction.test.ts (2)
53-53: Added packResult.skippedFiles in mocks — LGTMThe shape update aligns with the new PackResult contract and keeps these tests compiling and semantically correct.
To improve signal, consider adding one test where
runDefaultActionreturns a non-emptyskippedFiles(e.g., abinary-contentcase) and assert thatrunRemoteActionbehavior remains unchanged (no accidental logging or failures).Also applies to: 96-96, 142-142, 188-188
26-28: Avoid real filesystem writes in tests (optional)
execGitShallowCloneMockcallsfs.writeFile. Sincefs.writeFileisn’t mocked in this suite, this introduces real FS writes. It’s harmless here but can introduce flakiness in constrained CI. Mocking it keeps the test hermetic.Apply this diff near the
vi.mock('node:fs/promises', ...)block:vi.mock('node:fs/promises', async (importOriginal) => { const actual = await importOriginal<typeof import('node:fs/promises')>(); return { ...actual, copyFile: vi.fn(), mkdir: vi.fn(), + writeFile: vi.fn().mockResolvedValue(undefined), }; });tests/core/file/fileCollect.test.ts (3)
89-93: Clarify binary skip reason and add content-based coverageThis test exercises the extension-based skip (
binary-extension). Given the PR’s goal also covers content inspection, add a companion test where decoding/character inspection yieldsbinary-contentto prevent regressions.Example additional test (outside the shown range):
it('should skip files detected as binary by content inspection', async () => { const mockFilePaths = ['malformed.txt']; const mockRootDir = '/root'; const mockConfig = createMockConfig(); vi.mocked(isBinary).mockReturnValue(false); vi.mocked(fs.readFile).mockResolvedValue(Buffer.from([0xff, 0xfe, 0x00])); // arbitrary bytes // Simulate undecodable/malformed input leading to content-based binary detection vi.mocked(jschardet.detect).mockReturnValue({ encoding: 'utf-8', confidence: 0.2 }); // low confidence vi.mocked(iconv.decode).mockImplementation(() => { throw new Error('Malformed sequence'); // or simulate replacement-character detection in your impl }); const result = await collectFiles(mockFilePaths, mockRootDir, mockConfig, () => {}, { initTaskRunner: mockInitTaskRunner, }); expect(result).toEqual({ rawFiles: [], skippedFiles: [{ path: 'malformed.txt', reason: 'binary-content' }], }); });
165-168: Custom maxFileSize — numeric formatting in trace is brittleThe subsequent assertion compares a specific formatted string (
10240.0KB > 5120.0KB). If units/precision change (e.g., MB vs KB), the test will fail unnecessarily. Match flexibly.Suggested assertion replacement for the nearby line:
expect(logger.trace).toHaveBeenCalledWith( expect.stringMatching(/File exceeds size limit:.*(10(\.0)?MB|10240(\.0)?KB)\s*>\s*(5(\.0)?MB|5120(\.0)?KB)/), );
190-193: Read errors reported asencoding-error— consider separating IO vs decodingCurrently an I/O failure (
fs.readFilerejection) maps toencoding-error. Consider a distinct reason (e.g.,read-errororio-error) to differentiate environment/permissions issues from malformed content, improving CLI guidance.If the enum is intentionally limited to four reasons in this PR, please confirm and we can open a follow-up to introduce
read-errorwithout scope-creep here.src/cli/cliReport.ts (1)
164-187: Simplify skipped file path display in reportSkippedFilesThe
SkippedFileInfo.pathvalue is already emitted by the file collector as a path relative to the project root, so usingpath.relative(rootDir, file.path)is redundant and can introduce unexpected../…segments ifrootDirever differs from the CLI’s working directory. You can logfile.pathdirectly.Apply this refactor in
src/cli/cliReport.ts:src/cli/cliReport.ts export const reportSkippedFiles = (rootDir: string, skippedFiles: SkippedFileInfo[]) => { const binaryContentFiles = skippedFiles.filter((file) => file.reason === 'binary-content'); if (binaryContentFiles.length === 0) { return; } logger.log(pc.white('📄 Binary Files Detected:')); logger.log(pc.dim('─────────────────────────')); if (binaryContentFiles.length === 1) { logger.log(pc.yellow('1 file detected as binary by content inspection:')); } else { logger.log(pc.yellow(`${binaryContentFiles.length} files detected as binary by content inspection:`)); } - binaryContentFiles.forEach((file, index) => { - const relativeFilePath = path.relative(rootDir, file.path); - logger.log(`${pc.white(`${index + 1}.`)} ${pc.white(relativeFilePath)}`); - }); + // file.path is already relative to rootDir + binaryContentFiles.forEach((file, index) => { + logger.log(`${pc.white(`${index + 1}.`)} ${pc.white(file.path)}`); + }); logger.log(pc.yellow('\nThese files have been excluded from the output.')); logger.log(pc.yellow('Please review these files if you expected them to contain text content.')); };Non-blocking improvement: issue #752 tracks surfacing encoding/character errors (reason
'encoding-error'). You could mirror the binary section with a “⚠️ Files With Encoding Errors” block to clearly distinguish and explain why those files were skipped.src/core/file/workers/fileCollectWorker.ts (1)
26-37: Tight result mapping; handle “impossible” branch explicitlyMapping readRawFile to either rawFile or skippedFile is solid. The trailing return {} should be unreachable given current readRawFile semantics. Consider making that path explicit to aid diagnostics.
- return {}; + // This should never happen; add a safeguard to aid diagnostics if contracts change. + return { + skippedFile: { + path: filePath, + reason: 'encoding-error', // fallback to a safe, non-destructive reason + }, + };Alternatively, log a warning here (if bringing logger back into the worker is acceptable).
Also applies to: 39-45, 48-48
src/core/packager.ts (1)
97-99: Avoid spreading large arrays into concat; prefer flatMapacc.concat(...curr.rawFiles) and acc.concat(...curr.skippedFiles) spread each element into arguments, which can degrade performance or hit argument limits at scale. Use flatMap (or concat without spread).
- const rawFiles = collectResults.reduce((acc: RawFile[], curr) => acc.concat(...curr.rawFiles), []); - const allSkippedFiles = collectResults.reduce((acc: SkippedFileInfo[], curr) => acc.concat(...curr.skippedFiles), []); + const rawFiles = collectResults.flatMap((curr) => curr.rawFiles); + const allSkippedFiles = collectResults.flatMap((curr) => curr.skippedFiles);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (14)
src/cli/cliReport.ts(3 hunks)src/core/file/fileCollect.ts(3 hunks)src/core/file/fileRead.ts(2 hunks)src/core/file/workers/fileCollectWorker.ts(2 hunks)src/core/packager.ts(4 hunks)tests/cli/actions/defaultAction.test.ts(4 hunks)tests/cli/actions/remoteAction.test.ts(4 hunks)tests/cli/cliReport.binaryFiles.test.ts(1 hunks)tests/cli/cliReport.test.ts(2 hunks)tests/cli/cliRun.test.ts(2 hunks)tests/core/file/fileCollect.test.ts(5 hunks)tests/core/packager.test.ts(2 hunks)tests/core/packager/diffsFunctionality.test.ts(2 hunks)tests/mcp/tools/packCodebaseTool.test.ts(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/cli/cliReport.binaryFiles.test.ts (3)
src/core/file/workers/fileCollectWorker.ts (1)
SkippedFileInfo(16-19)src/cli/cliReport.ts (1)
reportSkippedFiles(164-187)src/shared/logger.ts (1)
logger(89-89)
src/core/file/fileRead.ts (1)
src/shared/logger.ts (2)
logger(89-89)error(34-38)
src/cli/cliReport.ts (1)
src/core/file/workers/fileCollectWorker.ts (1)
SkippedFileInfo(16-19)
src/core/file/workers/fileCollectWorker.ts (2)
src/core/file/fileRead.ts (2)
FileSkipReason(7-7)readRawFile(20-53)src/core/file/fileTypes.ts (1)
RawFile(1-4)
src/core/file/fileCollect.ts (2)
src/core/file/fileTypes.ts (1)
RawFile(1-4)src/core/file/workers/fileCollectWorker.ts (4)
SkippedFileInfo(16-19)FileCollectTask(10-14)FileCollectTask(26-49)FileCollectResult(21-24)
src/core/packager.ts (3)
src/core/file/workers/fileCollectWorker.ts (1)
SkippedFileInfo(16-19)src/shared/memoryUtils.ts (1)
withMemoryLogging(71-87)src/core/file/fileTypes.ts (1)
RawFile(1-4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build and run (windows-latest, 21.x)
- GitHub Check: Build and run (windows-latest, 24.x)
- GitHub Check: Build and run (windows-latest, 22.x)
- GitHub Check: Build and run (windows-latest, 23.x)
- GitHub Check: Build and run (windows-latest, 20.x)
- GitHub Check: Test with Bun (windows-latest, latest)
🔇 Additional comments (25)
src/core/file/fileRead.ts (2)
24-29: LGTM: clear size-limit telemetry and structured skip reason.The trace message is informative and the return shape matches the new contract.
7-12: LGTM: well-scoped public types for skip semantics.The union covers all current cases and aligns with downstream reporting.
tests/mcp/tools/packCodebaseTool.test.ts (1)
47-48: LGTM: default result shape aligned with new PackResult.skippedFiles.This keeps the tool scaffold compatible with the expanded API.
tests/core/packager.test.ts (1)
45-46: LGTM: collectFiles mock updated to new return shape.The pack orchestration remains readable with the new
{ rawFiles, skippedFiles }contract.tests/cli/actions/defaultAction.test.ts (3)
110-111: LGTM: default PackResult includes skippedFiles field.Keeps test fixtures in sync with the new PackResult contract.
802-803: LGTM: directory-processing path also aligned with PackResult.skippedFiles.Consistent fixtures across code paths.
872-873: LGTM: progress-callback path returns the updated shape.The mocked implementation preserves
skippedFiles, matching the production contract.tests/cli/cliRun.test.ts (2)
103-103: LGTM: defaultAction result fixture includes skippedFiles.Shape matches
PackResultupdates and prevents type drift.
159-159: LGTM: remoteAction result fixture includes skippedFiles.Good parity between local and remote flows.
tests/cli/cliReport.test.ts (1)
49-49: PackResult now includes skippedFiles — tests updated appropriatelyThe added
skippedFiles: []keeps these fixtures consistent with the new public API. No further action needed here.Also applies to: 75-75
tests/core/file/fileCollect.test.ts (2)
64-70: Assert new collectFiles result shape — LGTMGood assertion of
{ rawFiles, skippedFiles }. This validates the new aggregation contract from the worker to the caller.
122-126: Size-based skipping assertion — LGTMCorrect expectation of
{ rawFiles, skippedFiles }and the single skipped file withsize-limit. The follow-up trace assertions and read guards are meaningful.tests/cli/cliReport.binaryFiles.test.ts (1)
58-65: Binary-content reporting UX — LGTMGood coverage of header, count grammar, relative paths, and the exclusion guidance block. Filtering out
binary-extensionentries is correctly validated.tests/core/packager/diffsFunctionality.test.ts (1)
56-56: Adapted collectFiles mock to new result shape — LGTMBoth tests now return
{ rawFiles, skippedFiles }, matching the updated packer pipeline expectations.Also applies to: 107-107
src/cli/cliReport.ts (2)
4-4: Type re-export import is correct under NodeNextGood call importing SkippedFileInfo as a type via the public re-export (../core/file/fileCollect.js). Keeps the CLI decoupled from worker internals.
40-42: Nice placement in the reporting flowRunning reportSkippedFiles after Security Check and before the summary, with a spacer line, reads well and matches existing section rhythms.
src/core/file/fileCollect.ts (3)
7-16: Public result shape + type re-export look goodIntroducing FileCollectResults and re-exporting SkippedFileInfo gives the CLI and packager a clean, stable surface. Nice separation of raw vs. skipped.
61-74: Partitioning results is straightforward and clearSimple accumulation with explicit branches keeps intent obvious. No concerns.
25-26: All collectFiles call sites correctly handle the new return shapeVerified usages:
- In
src/core/packager.ts,collectFilesresults are reduced over.rawFilesand.skippedFilesbefore further processing.- In the integration tests (
tests/integration-tests/packager.test.ts), the stubbedcollectFilesreturns aFileCollectResultsobject, whichpackthen handles via its existing reductions.- In the unit tests (
tests/core/file/fileCollect.test.ts), assertions explicitly checkresult.rawFilesandresult.skippedFiles.No call sites assume a flat
RawFile[].src/core/file/workers/fileCollectWorker.ts (2)
2-4: Worker log-level initializationInitializing logger config from workerData at module load is correct; it ensures worker traces respect CLI verbosity without extra plumbing.
16-24: Type additions are cohesiveSkippedFileInfo and FileCollectResult map cleanly to readRawFile’s contract. This unlocks end-to-end skip reporting without leaking worker internals.
src/core/packager.ts (4)
5-5: Publicly surfacing skipped file info is the right callImporting the re-exported type and wiring collectFiles aligns the pack result with the new CLI section.
32-33: PackResult extension is coherentAdding skippedFiles to PackResult keeps reporting concerns out of processing code and mirrors existing suspicious* fields.
86-95: Per-dir collection with withMemoryLogging is goodThe step boundary makes memory deltas actionable for large repos. No issues spotted.
142-143: Wiring skippedFiles through the final resultLooks correct; nothing else in the pipeline consumes it, keeping concerns separated from processing/metrics.
- Replace empty return in fileCollectWorker with descriptive error - Simplify array flattening using flatMap instead of reduce - Make test assertions OS-agnostic using path.join - Improve encoding error detection with confidence checks and BOM handling
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new feature to report files detected as binary by content inspection, which were previously ignored silently. The implementation is solid, propagating file skip reasons throughout the file collection pipeline and adding a new section to the CLI report. I've found one issue related to path handling when multiple directories are processed, which could lead to ambiguous file paths in the new report section. My review includes a suggestion to fix this to ensure the new feature is robust.
related:
Summary
This PR implements a new feature that reports files which are detected as binary by content inspection (not by extension) in the CLI output. Previously, these files were silently excluded with only debug-level logging, which could confuse users when files with text extensions contained binary data.
Changes Made
fileRead.tsto return detailed skip reasons (binary-extension,binary-content,size-limit,encoding-error)Key Features
Example Output
Technical Implementation
fileRead.tswith result objects containing skip reasonsChecklist
npm run testnpm run lint