feat(core): Replace istextorbinary with is-binary-path and isbinaryfile #1006

Merged
yamadashy merged 1 commit into main from refactor/replace-istextorbinary on Dec 14, 2025
Conversation

@yamadashy
Owner

Summary

Replace istextorbinary package (last updated Dec 2023) with actively maintained alternatives:

  • is-binary-path: Extension-based binary detection (updated Apr 2024)
  • isbinaryfile: Content-based binary detection with zero dependencies (updated Dec 2025)

Why this change?

| Package | Weekly Downloads | Last Update | Dependencies |
| --- | --- | --- | --- |
| istextorbinary | 2M | 2023-12 | 3 (binaryextensions, editions, textextensions) |
| is-binary-path | 47M | 2024-04 | 1 (binary-extensions) |
| isbinaryfile | 9M | 2025-12 | 0 |

Binary extension coverage improvement

| Package | Binary Extensions |
| --- | --- |
| binaryextensions (old) | 13 |
| binary-extensions (new) | 262 |

The new setup provides ~20x more binary extension coverage, reducing unnecessary content checks for common binary formats.

Behavior

The two-stage detection logic remains the same:

  1. Extension check (fast path): Skip files with known binary extensions
  2. Content check (fallback): Analyze file content for files with unknown extensions
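The two stages above can be sketched roughly as follows. This is a minimal illustration, not the code in fileRead.ts: `classifyFile`, the `ReadResult` type, the `"binary-extension"` reason string, and the toy detectors are all hypothetical names, and the detectors are passed in as parameters so the sketch does not depend on the exact APIs of is-binary-path or isbinaryfile.

```typescript
// Roles played in the PR by is-binary-path (sync, extension-based)
// and isbinaryfile (async, content-based). Typed here as plain functions.
type PathCheck = (filePath: string) => boolean;
type ContentCheck = (content: Uint8Array) => Promise<boolean>;

type ReadResult =
  | { skipped: true; reason: "binary-extension" | "binary-content" }
  | { skipped: false; content: Uint8Array };

// Hypothetical helper: run the cheap extension check first, and only load
// and inspect the bytes when the extension is not known to be binary.
async function classifyFile(
  filePath: string,
  loadContent: () => Promise<Uint8Array>,
  isBinaryPath: PathCheck,
  isBinaryContent: ContentCheck,
): Promise<ReadResult> {
  if (isBinaryPath(filePath)) {
    return { skipped: true, reason: "binary-extension" };
  }
  const content = await loadContent();
  if (await isBinaryContent(content)) {
    return { skipped: true, reason: "binary-content" };
  }
  return { skipped: false, content };
}

// Toy detectors for illustration only; the real libraries are far more thorough.
const toyBinaryExtensions = new Set(["png", "zip", "exe"]);
const toyIsBinaryPath: PathCheck = (filePath) => {
  const ext = filePath.slice(filePath.lastIndexOf(".") + 1).toLowerCase();
  return toyBinaryExtensions.has(ext);
};
// Treats any NUL byte as a sign of binary content.
const toyIsBinaryContent: ContentCheck = async (content) =>
  content.some((byte) => byte === 0);
```

Because the extension check runs first, a file such as `logo.png` is skipped without its content ever being read, which is where a larger extension list saves work.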

Checklist

  • Run npm run test
  • Run npm run lint

@coderabbitai
Contributor

coderabbitai bot commented Dec 14, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The pull request replaces the istextorbinary library with a two-library approach using is-binary-path and isbinaryfile for binary file detection. Dependencies are updated in package.json, and the binary detection logic in fileRead.ts is refactored to perform a path-based check first, followed by a content-based check. Test mocks are updated to reflect the new detection strategy.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Dependency updates: package.json | Replaced istextorbinary dependency with is-binary-path and isbinaryfile libraries |
| Core binary detection logic: src/core/file/fileRead.ts | Refactored binary detection from single istextorbinary call to two-step approach: path-based check via is-binary-path, then content-based check via isbinaryfile |
| Test suite: tests/core/file/fileCollect.test.ts | Updated test mocks and imports to use isBinaryPath and isBinaryFile; adjusted test expectations to reflect new two-step binary detection flow |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Attention areas:
    • Verify that the two-step detection logic (path-based followed by content-based) produces the same or improved binary detection accuracy compared to the previous single-library approach
    • Ensure that is-binary-path and isbinaryfile handle edge cases consistently and cover the file types previously detected by istextorbinary
    • Confirm that test mocks accurately reflect the async nature of isbinaryfile (resolved values) versus the synchronous behavior of is-binary-path

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: replacing istextorbinary with is-binary-path and isbinaryfile packages. |
| Description check | ✅ Passed | The description is comprehensive and follows the template structure. It includes a clear summary, detailed rationale with comparison tables, behavior explanation, and completed checklist items. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@gemini-code-assist
Contributor

Summary of Changes

Hello @yamadashy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the core binary file detection mechanism by replacing an outdated dependency with a pair of modern, actively maintained libraries. This change aims to improve the accuracy and efficiency of binary file identification, particularly through expanded extension coverage, while preserving the existing detection logic.

Highlights

  • Dependency Update: The istextorbinary package has been replaced with two actively maintained alternatives: is-binary-path for extension-based detection and isbinaryfile for content-based detection.
  • Improved Binary Extension Coverage: The new setup significantly increases binary extension coverage from 13 to 262, reducing the need for content checks on common binary formats.
  • Maintained Logic: The core two-stage detection logic (extension check first, then content check as fallback) remains unchanged, ensuring consistent behavior.
  • Code Refactoring: Imports and function calls related to binary detection have been updated across src/core/file/fileRead.ts and tests/core/file/fileCollect.test.ts to use the new libraries.
@codecov

codecov bot commented Dec 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.83%. Comparing base (f06528a) to head (7f0d05d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1006   +/-   ##
=======================================
  Coverage   89.83%   89.83%           
=======================================
  Files         120      120           
  Lines        9234     9235    +1     
  Branches     1683     1683           
=======================================
+ Hits         8295     8296    +1     
  Misses        939      939           

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully replaces the istextorbinary package with the more modern is-binary-path and isbinaryfile packages. The implementation correctly adapts to the new APIs, including handling asynchronous operations, and the tests have been updated accordingly. However, I've identified a critical compatibility issue with the chosen version of isbinaryfile, which requires a newer Node.js version than what is specified for this project. This could break the setup for developers on older Node.js versions.

@claude
Contributor

claude bot commented Dec 14, 2025

PR Review: refactor(core): Replace istextorbinary with is-binary-path and isbinaryfile

Summary

This is a well-structured refactoring PR that replaces the istextorbinary package with two actively maintained alternatives. The changes are straightforward and maintain the existing two-stage binary detection logic.

✅ Strengths

  1. Good package selection: The new packages are well-maintained with high download counts
  2. Significant coverage improvement: ~20x more binary extension coverage (13 → 262)
  3. Preserved behavior: The two-stage detection logic remains intact
  4. Clean implementation: Minimal code changes with proper async handling

⚠️ Critical Issue: Node.js Version Compatibility

Details

The isbinaryfile v6.0.0 in package-lock.json shows:

"engines": {
  "node": ">= 24.0.0"
}

However, repomix/package.json specifies:

"engines": {
  "node": ">=20.0.0"
}

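Node.js 24 is newer than the minimum version repomix declares, so users on Node.js 20 or 22 fall outside the range that isbinaryfile says it supports. Whether the constraint is intentional is unclear from the lockfile alone, but npm will warn about the mismatch and will refuse installation when strict engine checking is enabled.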

Recommendation: Verify that the package actually works on Node 20. Run the test suite on Node 20 to confirm compatibility. The package likely works fine (the engine constraint may be a typo in their package.json), but it is worth confirming.
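To make the mismatch concrete, here is a small sketch that compares the two `engines.node` minimums quoted above. It is only an illustration: `minVersion` and `dependencyNarrowsSupport` are hypothetical helpers, and the parser handles only the simple `>= X.Y.Z` shape seen in this thread, not full semver range syntax.

```typescript
// Parse the minimum version out of a simple ">= X.Y.Z" engines range.
// Only this one range shape is handled; real semver ranges are richer.
function minVersion(range: string): [number, number, number] {
  const match = /^>=\s*(\d+)\.(\d+)\.(\d+)$/.exec(range.trim());
  if (!match) {
    throw new Error(`Unsupported range: ${range}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when the dependency demands a newer Node.js than the project's
// own declared minimum, i.e. installing it narrows the supported range.
function dependencyNarrowsSupport(projectRange: string, dependencyRange: string): boolean {
  const [pMajor, pMinor, pPatch] = minVersion(projectRange);
  const [dMajor, dMinor, dPatch] = minVersion(dependencyRange);
  if (dMajor !== pMajor) return dMajor > pMajor;
  if (dMinor !== pMinor) return dMinor > pMinor;
  return dPatch > pPatch;
}

// Ranges quoted in this review: project ">=20.0.0", dependency ">= 24.0.0".
const narrows = dependencyNarrowsSupport(">=20.0.0", ">= 24.0.0");
```

A check along these lines could run in CI to flag a dependency whose declared minimum is above the project's, independently of whether the package happens to work on the older runtime.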

💡 Suggestions

1. Consider adding a test for content-based binary detection

The current tests mock isBinaryFile but there is no explicit test case that verifies the content-based binary detection fallback path works correctly (i.e., when extension check passes but content check fails). Consider adding:

it("should skip files that pass extension check but fail content check", async () => {
  // File with non-binary extension but binary content
  vi.mocked(isBinaryPath).mockReturnValue(false);
  vi.mocked(isBinaryFile).mockResolvedValue(true);  // Binary content detected
  // ...verify skippedReason is "binary-content"
});

2. Minor: `isBinaryFile` is now async

The change from synchronous isBinary(null, buffer) to async await isBinaryFile(buffer) is handled correctly. Just noting that this could have minor performance implications if processing many files, though it is likely negligible given the async file reading context.

🔍 Premortem Analysis

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Node.js engine mismatch | Medium | High | Test on Node 20 before merging |
| Different binary detection behavior | Low | Medium | The new packages use similar heuristics; already tested |
| Edge case: file misclassified | Low | Low | Both extension and content checks provide redundancy |

Checklist

  • Code follows Airbnb style guidelines
  • Tests updated appropriately
  • No security concerns introduced
  • Verify Node.js 20 compatibility

Overall, this is a solid refactoring PR. Just verify the Node.js compatibility concern before merging.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Dec 14, 2025

Deploying repomix with Cloudflare Pages

Latest commit: 7f0d05d
Status: ✅  Deploy successful!
Preview URL: https://91b02ecb.repomix.pages.dev
Branch Preview URL: https://refactor-replace-istextorbin.repomix.pages.dev

@yamadashy yamadashy changed the title from "refactor(core): Replace istextorbinary with is-binary-path and isbinaryfile" to "feat(core): Replace istextorbinary with is-binary-path and isbinaryfile" on Dec 14, 2025

Migrate from istextorbinary (last updated 2023-12) to actively maintained packages:
- is-binary-path: Extension-based binary detection (updated 2024-04)
- isbinaryfile: Content-based binary detection with zero dependencies (updated 2025-12)

Improvements:
- Binary extension coverage: 13 → 262 extensions (~20x increase)
- Content detection: Better UTF-16/CJK support, statistical analysis (512 bytes vs 72 bytes)

The two-stage detection logic (extension check → content check) is preserved.
@yamadashy yamadashy force-pushed the refactor/replace-istextorbinary branch from de6ac28 to 7f0d05d on December 14, 2025 09:03

@claude
Contributor

claude bot commented Dec 14, 2025

PR Review: feat(core): Replace istextorbinary with is-binary-path and isbinaryfile

Summary

This is a well-executed dependency modernization PR that replaces istextorbinary with two actively maintained alternatives (is-binary-path and isbinaryfile). The implementation is clean and preserves the existing two-stage binary detection logic.

✅ Strengths

  1. Good package selection: Both new packages are actively maintained with strong download counts
  2. Significant coverage improvement: ~20x more binary extension coverage (13 → 262 extensions)
  3. Clean implementation: Minimal, focused changes with proper async handling at src/core/file/fileRead.ts:41
  4. Well-tested: Test mocks properly updated to reflect the new two-function approach

⚠️ Previous Node.js Version Concern - Resolved

Details

The previous review noted that isbinaryfile v6.0.0 declares "node": ">= 24.0.0" in package-lock.json. However, this appears to be an error in the upstream package.json since:

  1. CI confirms Node 20 compatibility: Tests on macos-latest with Node 20.x have passed ✅
  2. The package likely works fine: The engine constraint appears to be a typo or overly aggressive constraint in the upstream package

This is worth noting, but not a blocker since the CI validates actual compatibility.

💡 Minor Observations

1. Consider an explicit test for binary content detection fallback

The tests cover the case where isBinaryPath returns true, but there is no explicit test case verifying the content-based fallback path (when extension check passes but content check detects binary). This would ensure both detection paths are validated:

it("should skip files that pass extension check but are detected as binary by content", async () => {
  vi.mocked(isBinaryPath).mockReturnValue(false);  // Extension check passes
  vi.mocked(isBinaryFile).mockResolvedValue(true); // Content check detects binary
  // ... verify skippedReason is "binary-content"
});

This is a minor enhancement suggestion, not a blocker.
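For readers who want to see the shape of such a test without a mocking framework, here is a rough, self-contained sketch. Everything in it is an assumption made for illustration: `detectSkipReason` is a hypothetical function under test (not repomix's real API), `makeStub` is a hand-rolled stand-in for `vi.mocked(...)`, and only the `"binary-content"` reason string comes from the snippet above.

```typescript
type SkipReason = "binary-extension" | "binary-content" | null;

// Hypothetical function under test: extension check first, content check second.
async function detectSkipReason(
  filePath: string,
  content: Uint8Array,
  isBinaryPath: (p: string) => boolean,
  isBinaryFile: (c: Uint8Array) => Promise<boolean>,
): Promise<SkipReason> {
  if (isBinaryPath(filePath)) return "binary-extension";
  return (await isBinaryFile(content)) ? "binary-content" : null;
}

// Builds a stub that records its calls, similar in spirit to a mock function.
function makeStub<A, R>(result: R): { fn: (arg: A) => R; calls: A[] } {
  const calls: A[] = [];
  return {
    fn: (arg: A) => {
      calls.push(arg);
      return result;
    },
    calls,
  };
}

// The fallback case: the extension looks like text, but the content is binary.
async function contentFallbackCase(): Promise<{ reason: SkipReason; contentChecks: number }> {
  // The path check is synchronous, so its stub returns a plain boolean.
  const pathStub = makeStub<string, boolean>(false);
  // The content check is asynchronous, so its stub returns a resolved promise.
  const fileStub = makeStub<Uint8Array, Promise<boolean>>(Promise.resolve(true));
  const reason = await detectSkipReason(
    "data.txt",
    new Uint8Array([0, 1, 2]),
    pathStub.fn,
    fileStub.fn,
  );
  return { reason, contentChecks: fileStub.calls.length };
}
```

The sync/async split in the two stubs mirrors the difference between `mockReturnValue` and `mockResolvedValue` in the suggested test.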

2. The async change is handled correctly

The change from synchronous isBinary(null, buffer) to async await isBinaryFile(buffer) at line 41 is properly implemented. Since file reading is already async, this introduces no performance concerns.

🔍 Premortem Analysis

| Risk | Likelihood | Impact | Status |
| --- | --- | --- | --- |
| Node.js engine mismatch | Low | Medium | Mitigated by CI |
| Different binary detection behavior | Low | Low | Same heuristics, redundant checks |
| Edge case file misclassification | Low | Low | Two-stage detection provides safety net |

Checklist

  • Code follows Airbnb style guidelines
  • Tests updated appropriately
  • No security concerns
  • CI passing on multiple Node versions

Overall: This PR is ready to merge. The code is clean, well-tested, and the dependency upgrade provides tangible benefits (20x more binary extension coverage, actively maintained packages).

🤖 Generated with Claude Code

@yamadashy yamadashy merged commit a022d89 into main Dec 14, 2025
54 checks passed
@yamadashy yamadashy deleted the refactor/replace-istextorbinary branch December 14, 2025 09:22
yamadashy added a commit that referenced this pull request Dec 14, 2025
Remove the confidence < 0.2 check that was causing valid UTF-8/ASCII files
to be incorrectly skipped. Files are now only skipped if they contain actual
decode errors (U+FFFD replacement characters).

This fixes issues where:
- Valid Python files were skipped with confidence=0.00 (#869)
- HTML files with Thymeleaf syntax (~{}) were incorrectly detected as binary (#847)

The isbinaryfile library (added in PR #1006) now handles binary detection more
accurately, making the confidence-based heuristic unnecessary.

Fixes #869
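
As a rough illustration of the rule described in this commit message, the check reduces to looking for U+FFFD in the already-decoded text. This is a sketch under assumptions, not the actual repomix code: `hasDecodeErrors` is a hypothetical name, and decoding is assumed to have happened upstream with a decoder that substitutes U+FFFD for invalid byte sequences.

```typescript
const REPLACEMENT_CHARACTER = "\uFFFD";

// Hypothetical helper: after decoding, treat a file as undecodable only when
// the decoder had to substitute U+FFFD for invalid byte sequences.
// No confidence score is involved, so plain ASCII or UTF-8 text always passes.
function hasDecodeErrors(decodedText: string): boolean {
  return decodedText.includes(REPLACEMENT_CHARACTER);
}

// A Thymeleaf-style fragment expression like the one mentioned in #847.
const thymeleafSnippet = '<div th:replace="~{layout}"></div>';
const thymeleafSkipped = hasDecodeErrors(thymeleafSnippet);
```

One caveat of this sketch: a file that legitimately contains a literal U+FFFD character would also be flagged.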
yamadashy added a commit that referenced this pull request Dec 14, 2025
isbinaryfile v6.0.0 requires Node.js >= 24.0.0, but repomix supports
Node.js >= 20.0.0. Downgrade to v5.0.2 (requires Node.js >= 18.0.0)
to maintain compatibility with current LTS versions.

Addresses gemini-code-assist review on PR #1006