Skip to content

feat(cli): Report files detected as binary by content inspection#798

Merged
yamadashy merged 3 commits intomainfrom
feat/binary-check
Aug 23, 2025
Merged

feat(cli): Report files detected as binary by content inspection#798
yamadashy merged 3 commits intomainfrom
feat/binary-check

Conversation

@yamadashy
Copy link
Owner

@yamadashy yamadashy commented Aug 23, 2025

related:

Summary

This PR implements a new feature that reports files which are detected as binary by content inspection (not by extension) in the CLI output. Previously, these files were silently excluded with only debug-level logging, which could confuse users when files with text extensions contained binary data.

Changes Made

  • New Binary Files Detection Section: Added "📄 Binary Files Detected" section to CLI output
  • Enhanced File Reading: Modified fileRead.ts to return detailed skip reasons (binary-extension, binary-content, size-limit, encoding-error)
  • Updated File Collection Pipeline: Enhanced file collection to track and propagate skipped files throughout the system
  • Clean Output Format: Simple, numbered list of detected files with clear messaging
  • Type Safety: Added proper TypeScript types for file skip reasons and results

Key Features

  1. Content-Based Detection: Only reports files detected as binary by actual content inspection, not by file extension
  2. Clear User Guidance: Provides helpful messages about why files were excluded and what users should do
  3. Scalable Display: Handles single and multiple files with appropriate grammar (1 file vs N files)
  4. Integration with Existing Flow: Follows the same patterns as the Security Check section

Example Output

📄 Binary Files Detected:
─────────────────────────
3 files detected as binary by content inspection:
1. config/corrupted.txt
2. data/malformed.json  
3. logs/output.log

These files have been excluded from the output.
Please review these files if you expected them to contain text content.
スクリーンショット 2025-08-23 14 52 16

Technical Implementation

  • Enhanced fileRead.ts with result objects containing skip reasons
  • Updated file collection workers to handle new result format
  • Propagated skipped file information through the entire pipeline
  • Added comprehensive test coverage for all scenarios
  • Maintained backward compatibility and performance

Checklist

  • Run npm run test
  • Run npm run lint

Add new "Binary Files Detected" section to CLI output that shows files which were
skipped due to binary content detection (not extension-based). This addresses issue #752
where users were not informed about files being silently excluded.

Changes:
- Update fileRead.ts to return detailed skip reasons (binary-extension, binary-content, size-limit, encoding-error)
- Modify file collection pipeline to track and propagate skipped files
- Add reportSkippedFiles function to display binary-content detected files
- Show files with relative paths and helpful exclusion messages
- Only display section when binary-content files are found
- Add comprehensive test coverage for new functionality

The implementation follows existing security check reporting patterns and provides
users clear visibility into why files were excluded from output.
Remove redundant explanatory message from binary file listing to simplify output.
The main header already explains the detection method, making individual file
annotations unnecessary.

Changes:
- Remove "Detected as binary despite text extension" message from each file
- Improve type safety by using FileSkipReason type instead of string
- Update corresponding test expectations
- Clean up unused import in fileCollectWorker

The simplified output is cleaner while maintaining clarity about why files were excluded.
Copilot AI review requested due to automatic review settings August 23, 2025 05:53
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 23, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces structured skip-reason reporting across file read/collect/pack flows; aggregates skipped files into PackResult; adds a CLI reporter that lists files skipped due to binary-content; updates tests to the new return types and result shapes; and expands PackResult with metrics and skippedFiles for downstream reporting.

Changes

Cohort / File(s) Summary
CLI reporting for skipped files
src/cli/cliReport.ts, tests/cli/cliReport.binaryFiles.test.ts, tests/cli/cliReport.test.ts, tests/cli/cliRun.test.ts, tests/cli/actions/*.test.ts
Adds reportSkippedFiles to print binary-content skipped files; integrates into CLI flow; adjusts tests to include packResult.skippedFiles.
File read result structuring
src/core/file/fileRead.ts
readRawFile now returns { content, skippedReason }; adds FileSkipReason and FileReadResult types; encodes reasons: binary-extension, binary-content, size-limit, encoding-error.
Worker result structuring
src/core/file/workers/fileCollectWorker.ts
Worker returns FileCollectResult with rawFile or skippedFile (path, reason); maps readRawFile result to structured outputs.
File collection aggregation
src/core/file/fileCollect.ts, tests/core/file/fileCollect.test.ts
collectFiles now returns { rawFiles, skippedFiles }; introduces FileCollectResults; re-exports SkippedFileInfo; updates tests accordingly.
Packager output expansion
src/core/packager.ts, tests/core/packager*.test.ts, tests/mcp/tools/packCodebaseTool.test.ts
pack() aggregates skippedFiles and adds token/char metrics and suspiciousFilesResults to PackResult; tests updated to new shape.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as CLI
  participant Pack as pack()
  participant Collect as collectFiles()
  participant Worker as fileCollectWorker
  participant Read as readRawFile()

  CLI->>Pack: run pack()
  Pack->>Collect: collectFiles(rootDir, config, cb)
  loop per file
    Collect->>Worker: process(filePath, maxFileSize)
    Worker->>Read: readRawFile(filePath, maxFileSize)
    Read-->>Worker: { content } or { skippedReason }
    alt content present
      Worker-->>Collect: { rawFile: { path, content } }
    else skippedReason present
      Worker-->>Collect: { skippedFile: { path, reason } }
    end
  end
  Collect-->>Pack: { rawFiles[], skippedFiles[] }
  Pack-->>CLI: PackResult { processedFiles..., skippedFiles[], metrics... }
  CLI->>CLI: reportSkippedFiles(binary-content only)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Assessment against linked issues

Objective Addressed Explanation
Emit clear warning/error when a file is skipped due to undecodable characters; include filename and ideally line number. [#752] CLI only reports binary-content skips; no explicit reporting for encoding-error or line numbers observed.

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Expand PackResult with totalFiles/totalCharacters/totalTokens, fileCharCounts/fileTokenCounts, gitDiffTokenCount/gitLogTokenCount, suspiciousFilesResults (src/core/packager.ts) Metrics and additional suspicious results are unrelated to reporting undecodable-character skips in #752.
Add CLI reporting specifically for binary-content skipped files (src/cli/cliReport.ts) The linked issue targets encoding-related skips; binary-content reporting is orthogonal.
Test updates around token/diff flows reflecting new collectFiles shape (tests/core/packager/diffsFunctionality.test.ts) Diff/token behavior adjustments are outside the encoding-skip warning objective.

Possibly related PRs

Suggested labels

enhancement

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/binary-check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@claude
Copy link
Contributor

claude bot commented Aug 23, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @yamadashy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new feature to the CLI that reports files identified as binary through content inspection, rather than just by their file extension. Previously, such files were silently excluded, which could lead to user confusion. The update provides clear visibility into why these files are skipped and offers guidance for users.

Highlights

  • CLI Output Enhancement: Introduced a new 'Binary Files Detected' section in the CLI output to clearly list files identified as binary by content.
  • Enhanced File Processing: Refactored file reading and collection pipelines to return detailed skip reasons (e.g., binary content, size limit, encoding errors) and propagate this information throughout the system.
  • Type Safety: Ensured type safety by adding new TypeScript types for file skip reasons and collection results, improving code robustness and maintainability.
  • User Guidance: Provided user-friendly messages explaining why files were excluded and advising on next steps, improving the user experience.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Aug 23, 2025

Deploying repomix with  Cloudflare Pages  Cloudflare Pages

Latest commit: a221ee7
Status: ✅  Deploy successful!
Preview URL: https://19bc1248.repomix.pages.dev
Branch Preview URL: https://feat-binary-check.repomix.pages.dev

View logs

@codecov
Copy link

codecov bot commented Aug 23, 2025

Codecov Report

❌ Patch coverage is 86.84211% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.44%. Comparing base (5e6918f) to head (a221ee7).
⚠️ Report is 84 commits behind head on main.

Files with missing lines Patch % Lines
src/core/file/fileRead.ts 66.66% 5 Missing ⚠️
src/core/file/workers/fileCollectWorker.ts 85.00% 3 Missing ⚠️
src/cli/cliReport.ts 90.47% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #798      +/-   ##
==========================================
- Coverage   87.49%   87.44%   -0.06%     
==========================================
  Files         113      113              
  Lines        6437     6491      +54     
  Branches     1316     1331      +15     
==========================================
+ Hits         5632     5676      +44     
- Misses        805      815      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request does a great job of implementing the new feature to report files detected as binary by content inspection. The changes are well-structured and propagated through the different layers of the application, from file reading to the final CLI report. The addition of tests for the new reporting logic is also a good practice.

I've identified a critical issue in the file collection worker that could lead to files being silently dropped, which would undermine the goal of this feature. I've also suggested a minor refactoring to improve code clarity and maintainability in the packager. Addressing these points will make the implementation more robust and reliable.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (13)
src/core/file/fileRead.ts (2)

31-34: Lower verbosity for extension-based skips to avoid confusion with new CLI content-based section.

Since the CLI prominently reports content-detected binaries, extension-based skips are incidental context. Consider trace here to reduce noise while keeping debug for content-based detection.

-      logger.debug(`Skipping binary file: ${filePath}`);
+      logger.trace(`Skipping binary file by extension: ${filePath}`);

40-43: Optional: sample subset for binary sniff to reduce cost on large files.

isBinary(null, buffer) scans entire buffers. Passing a reasonable slice (e.g., first 4–8KB) is typically sufficient and cheaper.

-    if (isBinary(null, buffer)) {
+    // Inspect a slice to keep the check lightweight; full buffer is rarely needed.
+    const sample = buffer.subarray(0, Math.min(buffer.length, 8192));
+    if (isBinary(null, sample)) {
tests/mcp/tools/packCodebaseTool.test.ts (1)

64-107: Add a focused test to assert skippedFiles are forwarded to the formatter.

Right now we never exercise a non-empty skippedFiles. Add a case that injects one and asserts the formatter receives it (guarding against regressions).

Example minimal addition:

+  test('should forward skippedFiles to formatter', async () => {
+    const testDir = '/test/project';
+    const skipped = [{ path: 'bin.dat', reason: 'binary-content' as const }];
+    vi.mocked(runCli).mockResolvedValueOnce({
+      packResult: { ...defaultPackResult, skippedFiles: skipped },
+      config: createMockConfig({ cwd: testDir }),
+    });
+
+    await toolHandler({ directory: testDir });
+
+    // Ensure formatter was called (signature is mocked, so we assert by call args shape)
+    expect(formatPackToolResponse).toHaveBeenCalledWith(
+      expect.objectContaining({
+        packResult: expect.objectContaining({ skippedFiles: skipped }),
+      }),
+    );
+  });
tests/core/packager.test.ts (1)

121-122: Add a positive-case asserting skippedFiles passthrough and exclusion from metrics.

Include a test where collectFiles returns non-empty skippedFiles and assert:

  • result.skippedFiles equals the provided list,
  • token/char/file counts exclude skipped items.

Illustrative change:

+  test('pack should propagate skippedFiles and exclude them from metrics', async () => {
+    const skipped = [{ path: 'ignored.bin', reason: 'binary-content' as const }];
+    const deps = { ...mockDeps, collectFiles: vi.fn().mockResolvedValue({ rawFiles: mockRawFiles, skippedFiles: skipped }) };
+    const result = await pack(['root'], mockConfig, progressCallback, deps as any);
+    expect(result.skippedFiles).toEqual(skipped);
+    // Existing expectations for totals implicitly assert skips are not included.
+  });
tests/cli/actions/defaultAction.test.ts (1)

670-671: Add a skipped‐files reporting test to defaultAction.test.ts

We should ensure that when pack() returns a non‐empty skippedFiles array, the CLI wiring invokes cliReport.reportSkippedFiles.

• In tests/cli/actions/defaultAction.test.ts (around lines 670–671):

  • Import the reporter:

    import * as cliReport from '../../../src/cli/cliReport.js';
  • Add a new case after the existing pack() tests:

     // existing tests...
     vi.mocked(packager.pack).mockResolvedValue({
       // ...other PackResult fields...
  •  skippedFiles: [],
    
  •  skippedFiles: [{ path: 'bin.dat', reason: 'binary-content' }],
    

    } as PackResult);

  • it('reports skipped files when pack returns skippedFiles', async () => {

  • // Arrange: return one skipped file

  • vi.mocked(packager.pack).mockResolvedValue({

  •  // ...reuse other required fields...
    
  •  skippedFiles: [{ path: 'bin.dat', reason: 'binary-content' }],
    
  • } as PackResult);

  • // Act

  • await runDefaultAction(['.'], process.cwd(), {});

  • // Assert

  • expect(cliReport.reportSkippedFiles).toHaveBeenCalledWith(

  •  expect.any(String),
    
  •  [{ path: 'bin.dat', reason: 'binary-content' }],
    
  • );

  • });

    
    

This addition validates that runDefaultAction correctly delegates skipped‐files reporting to the CLI reporter.

tests/cli/actions/remoteAction.test.ts (2)

53-53: Added packResult.skippedFiles in mocks — LGTM

The shape update aligns with the new PackResult contract and keeps these tests compiling and semantically correct.

To improve signal, consider adding one test where runDefaultAction returns a non-empty skippedFiles (e.g., a binary-content case) and assert that runRemoteAction behavior remains unchanged (no accidental logging or failures).

Also applies to: 96-96, 142-142, 188-188


26-28: Avoid real filesystem writes in tests (optional)

execGitShallowCloneMock calls fs.writeFile. Since fs.writeFile isn’t mocked in this suite, this introduces real FS writes. It’s harmless here but can introduce flakiness in constrained CI. Mocking it keeps the test hermetic.

Apply this diff near the vi.mock('node:fs/promises', ...) block:

 vi.mock('node:fs/promises', async (importOriginal) => {
   const actual = await importOriginal<typeof import('node:fs/promises')>();
   return {
     ...actual,
     copyFile: vi.fn(),
     mkdir: vi.fn(),
+    writeFile: vi.fn().mockResolvedValue(undefined),
   };
 });
tests/core/file/fileCollect.test.ts (3)

89-93: Clarify binary skip reason and add content-based coverage

This test exercises the extension-based skip (binary-extension). Given the PR’s goal also covers content inspection, add a companion test where decoding/character inspection yields binary-content to prevent regressions.

Example additional test (outside the shown range):

it('should skip files detected as binary by content inspection', async () => {
  const mockFilePaths = ['malformed.txt'];
  const mockRootDir = '/root';
  const mockConfig = createMockConfig();

  vi.mocked(isBinary).mockReturnValue(false);
  vi.mocked(fs.readFile).mockResolvedValue(Buffer.from([0xff, 0xfe, 0x00])); // arbitrary bytes
  // Simulate undecodable/malformed input leading to content-based binary detection
  vi.mocked(jschardet.detect).mockReturnValue({ encoding: 'utf-8', confidence: 0.2 }); // low confidence
  vi.mocked(iconv.decode).mockImplementation(() => {
    throw new Error('Malformed sequence'); // or simulate replacement-character detection in your impl
  });

  const result = await collectFiles(mockFilePaths, mockRootDir, mockConfig, () => {}, {
    initTaskRunner: mockInitTaskRunner,
  });

  expect(result).toEqual({
    rawFiles: [],
    skippedFiles: [{ path: 'malformed.txt', reason: 'binary-content' }],
  });
});

165-168: Custom maxFileSize — numeric formatting in trace is brittle

The subsequent assertion compares a specific formatted string (10240.0KB > 5120.0KB). If units/precision change (e.g., MB vs KB), the test will fail unnecessarily. Match flexibly.

Suggested assertion replacement for the nearby line:

expect(logger.trace).toHaveBeenCalledWith(
  expect.stringMatching(/File exceeds size limit:.*(10(\.0)?MB|10240(\.0)?KB)\s*>\s*(5(\.0)?MB|5120(\.0)?KB)/),
);

190-193: Read errors reported as encoding-error — consider separating IO vs decoding

Currently an I/O failure (fs.readFile rejection) maps to encoding-error. Consider a distinct reason (e.g., read-error or io-error) to differentiate environment/permissions issues from malformed content, improving CLI guidance.

If the enum is intentionally limited to four reasons in this PR, please confirm and we can open a follow-up to introduce read-error without scope-creep here.

src/cli/cliReport.ts (1)

164-187: Simplify skipped file path display in reportSkippedFiles

The SkippedFileInfo.path value is already emitted by the file collector as a path relative to the project root, so using path.relative(rootDir, file.path) is redundant and can introduce unexpected ../… segments if rootDir ever differs from the CLI’s working directory. You can log file.path directly.

Apply this refactor in src/cli/cliReport.ts:

 src/cli/cliReport.ts
 export const reportSkippedFiles = (rootDir: string, skippedFiles: SkippedFileInfo[]) => {
   const binaryContentFiles = skippedFiles.filter((file) => file.reason === 'binary-content');

   if (binaryContentFiles.length === 0) {
     return;
   }

   logger.log(pc.white('📄 Binary Files Detected:'));
   logger.log(pc.dim('─────────────────────────'));

   if (binaryContentFiles.length === 1) {
     logger.log(pc.yellow('1 file detected as binary by content inspection:'));
   } else {
     logger.log(pc.yellow(`${binaryContentFiles.length} files detected as binary by content inspection:`));
   }

-  binaryContentFiles.forEach((file, index) => {
-    const relativeFilePath = path.relative(rootDir, file.path);
-    logger.log(`${pc.white(`${index + 1}.`)} ${pc.white(relativeFilePath)}`);
-  });
+  // file.path is already relative to rootDir
+  binaryContentFiles.forEach((file, index) => {
+    logger.log(`${pc.white(`${index + 1}.`)} ${pc.white(file.path)}`);
+  });

   logger.log(pc.yellow('\nThese files have been excluded from the output.'));
   logger.log(pc.yellow('Please review these files if you expected them to contain text content.'));
 };

Non-blocking improvement: issue #752 tracks surfacing encoding/character errors (reason 'encoding-error'). You could mirror the binary section with a “⚠️ Files With Encoding Errors” block to clearly distinguish and explain why those files were skipped.

src/core/file/workers/fileCollectWorker.ts (1)

26-37: Tight result mapping; handle “impossible” branch explicitly

Mapping readRawFile to either rawFile or skippedFile is solid. The trailing return {} should be unreachable given current readRawFile semantics. Consider making that path explicit to aid diagnostics.

-  return {};
+  // This should never happen; add a safeguard to aid diagnostics if contracts change.
+  return {
+    skippedFile: {
+      path: filePath,
+      reason: 'encoding-error', // fallback to a safe, non-destructive reason
+    },
+  };

Alternatively, log a warning here (if bringing logger back into the worker is acceptable).

Also applies to: 39-45, 48-48

src/core/packager.ts (1)

97-99: Avoid spreading large arrays into concat; prefer flatMap

acc.concat(...curr.rawFiles) and acc.concat(...curr.skippedFiles) spread each element into arguments, which can degrade performance or hit argument limits at scale. Use flatMap (or concat without spread).

-  const rawFiles = collectResults.reduce((acc: RawFile[], curr) => acc.concat(...curr.rawFiles), []);
-  const allSkippedFiles = collectResults.reduce((acc: SkippedFileInfo[], curr) => acc.concat(...curr.skippedFiles), []);
+  const rawFiles = collectResults.flatMap((curr) => curr.rawFiles);
+  const allSkippedFiles = collectResults.flatMap((curr) => curr.skippedFiles);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5e6918f and 8d117d9.

📒 Files selected for processing (14)
  • src/cli/cliReport.ts (3 hunks)
  • src/core/file/fileCollect.ts (3 hunks)
  • src/core/file/fileRead.ts (2 hunks)
  • src/core/file/workers/fileCollectWorker.ts (2 hunks)
  • src/core/packager.ts (4 hunks)
  • tests/cli/actions/defaultAction.test.ts (4 hunks)
  • tests/cli/actions/remoteAction.test.ts (4 hunks)
  • tests/cli/cliReport.binaryFiles.test.ts (1 hunks)
  • tests/cli/cliReport.test.ts (2 hunks)
  • tests/cli/cliRun.test.ts (2 hunks)
  • tests/core/file/fileCollect.test.ts (5 hunks)
  • tests/core/packager.test.ts (2 hunks)
  • tests/core/packager/diffsFunctionality.test.ts (2 hunks)
  • tests/mcp/tools/packCodebaseTool.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/cli/cliReport.binaryFiles.test.ts (3)
src/core/file/workers/fileCollectWorker.ts (1)
  • SkippedFileInfo (16-19)
src/cli/cliReport.ts (1)
  • reportSkippedFiles (164-187)
src/shared/logger.ts (1)
  • logger (89-89)
src/core/file/fileRead.ts (1)
src/shared/logger.ts (2)
  • logger (89-89)
  • error (34-38)
src/cli/cliReport.ts (1)
src/core/file/workers/fileCollectWorker.ts (1)
  • SkippedFileInfo (16-19)
src/core/file/workers/fileCollectWorker.ts (2)
src/core/file/fileRead.ts (2)
  • FileSkipReason (7-7)
  • readRawFile (20-53)
src/core/file/fileTypes.ts (1)
  • RawFile (1-4)
src/core/file/fileCollect.ts (2)
src/core/file/fileTypes.ts (1)
  • RawFile (1-4)
src/core/file/workers/fileCollectWorker.ts (4)
  • SkippedFileInfo (16-19)
  • FileCollectTask (10-14)
  • FileCollectTask (26-49)
  • FileCollectResult (21-24)
src/core/packager.ts (3)
src/core/file/workers/fileCollectWorker.ts (1)
  • SkippedFileInfo (16-19)
src/shared/memoryUtils.ts (1)
  • withMemoryLogging (71-87)
src/core/file/fileTypes.ts (1)
  • RawFile (1-4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build and run (windows-latest, 21.x)
  • GitHub Check: Build and run (windows-latest, 24.x)
  • GitHub Check: Build and run (windows-latest, 22.x)
  • GitHub Check: Build and run (windows-latest, 23.x)
  • GitHub Check: Build and run (windows-latest, 20.x)
  • GitHub Check: Test with Bun (windows-latest, latest)
🔇 Additional comments (25)
src/core/file/fileRead.ts (2)

24-29: LGTM: clear size-limit telemetry and structured skip reason.

The trace message is informative and the return shape matches the new contract.


7-12: LGTM: well-scoped public types for skip semantics.

The union covers all current cases and aligns with downstream reporting.

tests/mcp/tools/packCodebaseTool.test.ts (1)

47-48: LGTM: default result shape aligned with new PackResult.skippedFiles.

This keeps the tool scaffold compatible with the expanded API.

tests/core/packager.test.ts (1)

45-46: LGTM: collectFiles mock updated to new return shape.

The pack orchestration remains readable with the new { rawFiles, skippedFiles } contract.

tests/cli/actions/defaultAction.test.ts (3)

110-111: LGTM: default PackResult includes skippedFiles field.

Keeps test fixtures in sync with the new PackResult contract.


802-803: LGTM: directory-processing path also aligned with PackResult.skippedFiles.

Consistent fixtures across code paths.


872-873: LGTM: progress-callback path returns the updated shape.

The mocked implementation preserves skippedFiles, matching the production contract.

tests/cli/cliRun.test.ts (2)

103-103: LGTM: defaultAction result fixture includes skippedFiles.

Shape matches PackResult updates and prevents type drift.


159-159: LGTM: remoteAction result fixture includes skippedFiles.

Good parity between local and remote flows.

tests/cli/cliReport.test.ts (1)

49-49: PackResult now includes skippedFiles — tests updated appropriately

The added skippedFiles: [] keeps these fixtures consistent with the new public API. No further action needed here.

Also applies to: 75-75

tests/core/file/fileCollect.test.ts (2)

64-70: Assert new collectFiles result shape — LGTM

Good assertion of { rawFiles, skippedFiles }. This validates the new aggregation contract from the worker to the caller.


122-126: Size-based skipping assertion — LGTM

Correct expectation of { rawFiles, skippedFiles } and the single skipped file with size-limit. The follow-up trace assertions and read guards are meaningful.

tests/cli/cliReport.binaryFiles.test.ts (1)

58-65: Binary-content reporting UX — LGTM

Good coverage of header, count grammar, relative paths, and the exclusion guidance block. Filtering out binary-extension entries is correctly validated.

tests/core/packager/diffsFunctionality.test.ts (1)

56-56: Adapted collectFiles mock to new result shape — LGTM

Both tests now return { rawFiles, skippedFiles }, matching the updated packer pipeline expectations.

Also applies to: 107-107

src/cli/cliReport.ts (2)

4-4: Type re-export import is correct under NodeNext

Good call importing SkippedFileInfo as a type via the public re-export (../core/file/fileCollect.js). Keeps the CLI decoupled from worker internals.


40-42: Nice placement in the reporting flow

Running reportSkippedFiles after Security Check and before the summary, with a spacer line, reads well and matches existing section rhythms.

src/core/file/fileCollect.ts (3)

7-16: Public result shape + type re-export look good

Introducing FileCollectResults and re-exporting SkippedFileInfo gives the CLI and packager a clean, stable surface. Nice separation of raw vs. skipped.


61-74: Partitioning results is straightforward and clear

Simple accumulation with explicit branches keeps intent obvious. No concerns.


25-26: All collectFiles call sites correctly handle the new return shape

Verified usages:

  • In src/core/packager.ts, collectFiles results are reduced over .rawFiles and .skippedFiles before further processing.
  • In the integration tests (tests/integration-tests/packager.test.ts), the stubbed collectFiles returns a FileCollectResults object, which pack then handles via its existing reductions.
  • In the unit tests (tests/core/file/fileCollect.test.ts), assertions explicitly check result.rawFiles and result.skippedFiles.

No call sites assume a flat RawFile[].

src/core/file/workers/fileCollectWorker.ts (2)

2-4: Worker log-level initialization

Initializing logger config from workerData at module load is correct; it ensures worker traces respect CLI verbosity without extra plumbing.


16-24: Type additions are cohesive

SkippedFileInfo and FileCollectResult map cleanly to readRawFile’s contract. This unlocks end-to-end skip reporting without leaking worker internals.

src/core/packager.ts (4)

5-5: Publicly surfacing skipped file info is the right call

Importing the re-exported type and wiring collectFiles aligns the pack result with the new CLI section.


32-33: PackResult extension is coherent

Adding skippedFiles to PackResult keeps reporting concerns out of processing code and mirrors existing suspicious* fields.


86-95: Per-dir collection with withMemoryLogging is good

The step boundary makes memory deltas actionable for large repos. No issues spotted.


142-143: Wiring skippedFiles through the final result

Looks correct; nothing else in the pipeline consumes it, keeping concerns separated from processing/metrics.

- Replace empty return in fileCollectWorker with descriptive error
- Simplify array flattening using flatMap instead of reduce
- Make test assertions OS-agnostic using path.join
- Improve encoding error detection with confidence checks and BOM handling
@yamadashy
Copy link
Owner Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new feature to report files detected as binary by content inspection, which were previously ignored silently. The implementation is solid, propagating file skip reasons throughout the file collection pipeline and adding a new section to the CLI report. I've found one issue related to path handling when multiple directories are processed, which could lead to ambiguous file paths in the new report section. My review includes a suggestion to fix this to ensure the new feature is robust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant