Skip to content

fix: Improve error handling for permission errors and special tokens (Issue #868)#907

Merged
yamadashy merged 4 commits intomainfrom
fix/issue-868-permission-and-token-errors
Oct 19, 2025
Merged

fix: Improve error handling for permission errors and special tokens (Issue #868)#907
yamadashy merged 4 commits intomainfrom
fix/issue-868-permission-and-token-errors

Conversation

@yamadashy
Copy link
Owner

Summary

Fixes #868

This PR addresses two distinct issues reported in #868:

  1. Permission errors when copying output files (Windows-specific)
  2. Token counting errors with special tokens (all platforms)

Changes

1. Permission Error Improvements

File: src/cli/actions/remoteAction.ts

When file copy operations fail due to permission errors (EPERM/EACCES), the error message now:

  • Clearly explains the problem (permission denied)
  • Identifies the likely cause (protected directory)
  • Provides 3 specific solutions:
    • Run from a different directory
    • Use --output flag to specify writable location
    • Use --stdout to print directly to console

This commonly occurs when users run repomix from protected directories like C:\Windows\System32 on Windows.

2. Special Token Handling

File: src/core/metrics/TokenCounter.ts

Fixed token counting errors when processing files containing special tokens (e.g., <|endoftext|> in tokenizer vocab.json files):

  • Disabled special token validation by passing empty arrays to tiktoken's encode() method
  • Special tokens are now treated as ordinary text (appropriate for code analysis)
  • Eliminates "special token not allowed" errors
  • Works across all platforms (Mac, Windows, Linux)

Before:

Failed to count tokens. error: The text contains a special token that is not allowed: <|endoftext|>
(repeated many times...)

After:

✔ Packing completed successfully!
Total Tokens: 13,369,990 tokens

Test Plan

  • Run npm run test - All 800 tests passing
  • Run npm run lint - No errors
  • Added test cases for EPERM and EACCES errors
  • Updated test cases for new token encoding parameters
  • Tested on DiffSynth-Studio repository - no warnings, accurate token counts

Checklist

  • Run npm run test
  • Run npm run lint

…itory output

When copying output files fails due to permission errors (EPERM/EACCES),
provide clear and actionable error messages that explain the issue and
offer specific solutions.

This commonly occurs when users run repomix from protected directories
like C:\Windows\System32 on Windows.

Fixes #868
Disable special token validation in tiktoken encoding by passing empty
arrays for allowed_special and disallowed_special parameters. This
prevents "special token not allowed" errors when processing files that
contain special token sequences like <|endoftext|>.

This approach treats special tokens as ordinary text, which is appropriate
for general code/text analysis where we're not sending content to LLM APIs.

Common in tokenizer configuration files (vocab.json) which frequently
contain special token definitions.

Fixes #868
Copilot AI review requested due to automatic review settings October 19, 2025 14:15
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 19, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This PR addresses two issues: (1) improves error handling for permission-related failures when copying output files to the current directory by throwing descriptive RepomixError messages with actionable guidance, and (2) modifies token counting to treat potential special token sequences as ordinary text by disabling special token validation in the encoder.

Changes

Cohort / File(s) Summary
Remote Action Error Handling
src/cli/actions/remoteAction.ts, tests/cli/actions/remoteAction.test.ts
Enhanced error handling in copyOutputToCurrentDirectory to detect permission-related errors (EPERM, EACCES) and throw descriptive RepomixError with guidance on writable locations and alternatives. Added two test cases validating the human-friendly error messages for both EPERM and EACCES scenarios.
Token Counter Encoding
src/core/metrics/TokenCounter.ts, tests/core/metrics/TokenCounter.test.ts
Modified countTokens to call encode(content, [], []) with two additional empty array arguments, disabling special token validation and treating special token sequences as ordinary text. Updated all corresponding test expectations to verify the new encoder call signature.

Sequence Diagram

sequenceDiagram
    participant User
    participant RemoteAction
    participant FileSystem
    participant ErrorHandler

    User->>RemoteAction: copyOutputToCurrentDirectory()
    RemoteAction->>FileSystem: copyFile(source, dest)
    
    alt Success
        FileSystem-->>RemoteAction: file copied
        RemoteAction-->>User: ✓ success
    else Permission Error (EPERM/EACCES)
        FileSystem-->>RemoteAction: error
        RemoteAction->>ErrorHandler: detect permission error
        ErrorHandler->>ErrorHandler: compose friendly message
        ErrorHandler-->>RemoteAction: RepomixError with guidance
        RemoteAction-->>User: ✗ descriptive error
    else Other Error
        FileSystem-->>RemoteAction: error
        RemoteAction-->>User: ✗ original error
    end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

The changes follow consistent patterns: one is a straightforward error handling enhancement with corresponding tests, while the other is a targeted parameter modification to function calls across test files. Both modifications are localized with clear intent and existing logic paths remain unchanged.

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "fix: Improve error handling for permission errors and special tokens (Issue #868)" clearly and specifically summarizes the primary changes in the changeset. It identifies two distinct improvements (permission error handling and special token handling) and references the linked issue. The title is concise, avoids vague terminology, and accurately reflects the main objectives of the pull request.
Linked Issues Check ✅ Passed The pull request successfully addresses both primary objectives from linked issue #868. For permission error handling, the changes to remoteAction.ts enhance error messages with clear guidance on running from different directories, using the --output flag, or using --stdout—directly addressing the EPERM error encountered when running from protected directories. For token counting, the TokenCounter.ts changes disable special token validation by passing empty arrays to encode(), treating special tokens as ordinary text and eliminating the "special token not allowed" errors. Both coding requirements are met with supporting test cases added.
Out of Scope Changes Check ✅ Passed All code changes in this pull request are directly related to the stated objectives in linked issue #868. The modifications to remoteAction.ts and TokenCounter.ts directly address the permission error and special token handling failures reported in the issue, while the test updates validate these specific fixes. No extraneous or out-of-scope changes are present in the changeset.
Description Check ✅ Passed The PR description exceeds the template requirements by providing a comprehensive summary of changes, detailed explanations for each of the two issues being addressed, concrete before/after examples, a thorough test plan with confirmation that testing was completed, and a fully completed checklist. The description clearly explains the problem context, solution approach, affected files, and validation steps, making it easy for reviewers to understand the scope and impact of the changes.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @yamadashy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness and user experience of the application by addressing two critical issues: providing more informative and helpful error messages for file permission failures, and ensuring accurate token counting by correctly handling special tokens in various file types. These changes aim to prevent common user frustrations and ensure consistent functionality across different operating systems and content types.

Highlights

  • Improved Permission Error Handling: Enhanced error messages for EPERM and EACCES permission errors during file copy operations, providing clear explanations and actionable solutions to users, especially for Windows-specific issues.
  • Fixed Special Token Counting: Resolved token counting errors that occurred when files contained special tokens (e.g., <|endoftext|>) by disabling special token validation in the tiktoken encoder, treating them as ordinary text for accurate code analysis.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Purpose: Improve handling of permission errors during output file copy operations and adjust token counting to ignore special token validation (fixing Issue #868).

  • Adds explicit EPERM/EACCES handling with actionable guidance in copyOutputToCurrentDirectory.
  • Modifies TokenCounter to pass empty arrays to encode to treat special tokens as ordinary text and updates related tests accordingly.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/cli/actions/remoteAction.ts Adds specific permission error handling with a detailed user-facing message.
src/core/metrics/TokenCounter.ts Adjusts token encoding to disable special token validation and adds explanatory comments.
tests/cli/actions/remoteAction.test.ts Adds tests asserting improved permission error messaging for EPERM and EACCES.
tests/core/metrics/TokenCounter.test.ts Updates expectations to reflect new encode invocation signature with empty arrays.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses two separate issues: improving error messages for permission errors on Windows and fixing token counting for files with special tokens. The changes are well-implemented. The error handling for file copy operations is now much more user-friendly, providing clear instructions on how to resolve the issue. The fix for token counting correctly handles special tokens by treating them as regular text, which is appropriate for this tool's purpose. The accompanying tests are thorough and validate the new behavior. I have one suggestion to make the error handling code in remoteAction.ts even more robust and readable.

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Oct 19, 2025

Deploying repomix with  Cloudflare Pages  Cloudflare Pages

Latest commit: 4a87223
Status: ✅  Deploy successful!
Preview URL: https://a47cd64c.repomix.pages.dev
Branch Preview URL: https://fix-issue-868-permission-and.repomix.pages.dev

View logs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/core/metrics/TokenCounter.test.ts (1)

50-136: Test expectations correctly updated to match new encode signature.

All tests now expect encode(text, [], []) calls, which aligns with the implementation change in TokenCounter.ts.

Consider adding a specific test case that validates token counting for content containing special token sequences (e.g., <|endoftext|>). This would explicitly verify the fix for Issue #868:

test('should handle content with special token sequences', () => {
  const text = 'tokenizer_config = {"eos_token": "<|endoftext|>"}';
  const mockTokens = [1, 2, 3, 4, 5];
  mockEncoder.encode.mockReturnValue(mockTokens);

  const count = tokenCounter.countTokens(text);

  expect(count).toBe(5);
  expect(mockEncoder.encode).toHaveBeenCalledWith(text, [], []);
});
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76a8c59 and a8795d7.

📒 Files selected for processing (4)
  • src/cli/actions/remoteAction.ts (1 hunks)
  • src/core/metrics/TokenCounter.ts (1 hunks)
  • tests/cli/actions/remoteAction.test.ts (1 hunks)
  • tests/core/metrics/TokenCounter.test.ts (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/cli/actions/remoteAction.test.ts (1)
src/cli/actions/remoteAction.ts (1)
  • copyOutputToCurrentDirectory (186-225)
src/cli/actions/remoteAction.ts (1)
src/shared/errorHandle.ts (1)
  • RepomixError (6-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Build and run (macos-latest, 24.x)
  • GitHub Check: Test with Bun (macos-latest, latest)
  • GitHub Check: Lint TypeScript
  • GitHub Check: Build and run (windows-latest, 22.x)
  • GitHub Check: Build and run (windows-latest, 24.x)
  • GitHub Check: Build and run (windows-latest, 20.x)
  • GitHub Check: Build and run (ubuntu-latest, 24.x)
  • GitHub Check: Build and run (ubuntu-latest, 20.x)
  • GitHub Check: Build and run (ubuntu-latest, 22.x)
  • GitHub Check: Lint GitHub Actions
  • GitHub Check: Memory Test
  • GitHub Check: claude-review
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (4)
tests/cli/actions/remoteAction.test.ts (2)

260-272: LGTM! Good coverage for Windows permission scenario.

The test correctly simulates an EPERM error when copying to a protected Windows directory (C:\Windows\System32) and validates that the error message includes actionable guidance.


274-286: LGTM! Good coverage for Unix/Linux permission scenario.

The test correctly simulates an EACCES error when copying to a protected directory and validates that the same helpful error message is provided.

src/cli/actions/remoteAction.ts (1)

209-221: Excellent enhancement for permission error handling.

The implementation properly detects EPERM and EACCES errors and provides clear, actionable guidance with three remediation options. This directly addresses the Windows System32 permission error from Issue #868.

src/core/metrics/TokenCounter.ts (1)

21-26: Good solution for special token handling.

Passing empty arrays to encode() disables special token validation, treating special tokens as ordinary text. This is appropriate for general code analysis where the content may contain token sequences like <|endoftext|> in tokenizer configuration files.

The comment clearly explains the rationale and use case.

@codecov
Copy link

codecov bot commented Oct 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.60%. Comparing base (76a8c59) to head (4a87223).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #907      +/-   ##
==========================================
+ Coverage   74.36%   74.60%   +0.23%     
==========================================
  Files         111      111              
  Lines        7776     7782       +6     
  Branches     1448     1450       +2     
==========================================
+ Hits         5783     5806      +23     
+ Misses       1993     1976      -17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude
Copy link
Contributor

claude bot commented Oct 19, 2025

Code Review - PR #907

This PR effectively addresses both issues reported in #868. The implementation is solid with good test coverage.

Strengths

  1. Clear Problem Solving - Both fixes directly address the reported issues
  2. Excellent Error Messages - Permission error guidance is specific and actionable
  3. Good Test Coverage - Tests for both EPERM and EACCES error codes
  4. Consistent with Codebase - Matches existing patterns in fileSearch.ts and permissionCheck.ts
  5. Well-Documented - TokenCounter change includes clear rationale

Key Suggestions

  1. EROFS Handling - Consider adding EROFS (read-only filesystem) to permission error checks for Docker/CI environments
  2. Windows Path Examples - Error message uses ~/repomix-output.xml which does not work on Windows. Consider platform-aware examples
  3. Test Coverage - Add test for special token handling to verify files with endoftext no longer throw errors

Security & Performance

  • No security issues identified
  • No performance regressions expected - token counting change likely improves performance by skipping validation
  • Successfully tested on 13M+ token repository

Summary

Recommendation: Approve and merge

This is a well-implemented PR solving real user problems with clear error messages, proper error handling, and good test coverage. The suggestions above are minor improvements that could be addressed in follow-up PRs. Current implementation is production-ready.

Review conducted per CLAUDE.md guidelines

Use a single template literal instead of string concatenation for
better readability and modern JavaScript best practices.

Addresses feedback from AI code review (Gemini Code Assist).
Use 'latest' instead of specific version number to avoid schema
version mismatch warnings.
@claude
Copy link
Contributor

claude bot commented Oct 19, 2025

PR Review: Fix Permission Errors and Special Token Handling

Summary

This PR effectively addresses two critical issues from #868:

  1. ✅ Improved error messaging for permission errors (Windows-specific)
  2. ✅ Fixed token counting errors with special tokens (all platforms)

Strengths

  • Clear problem identification: Both fixes target well-documented, reproducible issues
  • User-friendly error messages: The permission error now provides actionable guidance
  • Appropriate fix for token counting: Disabling special token validation is the right approach for code analysis
  • Good test coverage: Added tests for both EPERM and EACCES error codes
  • Minimal changes: Surgical fixes without unnecessary refactoring

Code Quality & Best Practices

✅ Positive Observations

  1. Error handling improvement (remoteAction.ts:209-222):

    • Properly checks both EPERM and EACCES error codes
    • Excellent multi-line error message with clear formatting
    • Provides 3 actionable solutions to users
  2. Token counter fix (TokenCounter.ts:26):

    • Well-commented explanation of why special tokens are disabled
    • Aligns with tiktoken's API: encode(text, allowed_special=[], disallowed_special=[])
    • Appropriate for general code analysis (not LLM API usage)
  3. Test coverage:

    • Tests verify both error codes (EPERM, EACCES)
    • Regex pattern correctly validates multi-line error message
    • All existing tests updated to match new API signature

💡 Suggestions

Minor: Consider extracting the error message to a constant

// In remoteAction.ts, consider:
const PERMISSION_ERROR_MESSAGE = `Failed to copy output file to {path}: Permission denied.

The current directory may be protected or require elevated permissions.
Please try one of the following:
  • Run from a different directory (e.g., your home directory or Documents folder)
  • Use the --output flag to specify a writable location: --output ~/repomix-output.xml
  • Use --stdout to print output directly to the console`;

This would make testing easier and improve maintainability. However, this is very minor and can be addressed in future refactoring.

Potential Bugs & Edge Cases

⚠️ Minor Issue: Inconsistency with Website Server Code

The website server has a similar copyOutputToCurrentDirectory function at /website/server/src/domains/pack/utils/fileUtils.ts (lines 161-184) that does NOT include the same permission error handling.

Impact: Low - The website server runs in a controlled environment, but consistency would be beneficial.

Recommendation: Consider applying the same permission error handling to the website server in a follow-up PR for consistency, or document why it's not needed there.

✅ Edge Cases Considered

  1. Permission errors: Both EPERM and EACCES properly handled ✓
  2. Token encoding errors: Still caught and logged (line 27-42) ✓
  3. File path validation: Already present in line 191-199 ✓
Performance Considerations

✅ No Performance Impact

  1. Token counting change: Minimal performance impact

    • The encode(content, [], []) call bypasses special token validation
    • This may actually be slightly faster than validation
    • No observable performance degradation expected
  2. Error handling: Only executes on error path

    • No performance impact on happy path
Security Analysis

✅ Security Posture Improved

  1. Permission errors now surface clearly: Users are less likely to run with elevated privileges unnecessarily
  2. No security regressions: The changes don't introduce new attack vectors
  3. Token counting: Treating special tokens as text is appropriate for code analysis (defensive approach)

ℹ️ Note on Special Token Handling

The fix correctly identifies that for code analysis (repomix's use case), special tokens like <|endoftext|> should be counted as regular text. This is different from LLM API usage where special tokens have semantic meaning.

Premortem Analysis: Potential Failure Scenarios

🔍 Deployment Risks (All Low)

1. Tiktoken API Compatibility

  • Risk: Future tiktoken versions might change the encode() signature
  • Likelihood: Low (tiktoken is stable)
  • Mitigation:
    • ✅ Already mitigated: package.json pins tiktoken to ^1.0.22
    • Consider: Add a comment referencing the tiktoken version when this API was verified

2. Error Message Localization

  • Risk: Hardcoded English error messages may not be ideal for international users
  • Likelihood: Medium (if repomix grows internationally)
  • Impact: Low (error messages are still understandable)
  • Mitigation: Not urgent, but consider i18n in future for all error messages

3. Platform-Specific Error Codes

  • Risk: Other platforms might return different error codes for permission issues
  • Likelihood: Low (EPERM/EACCES are POSIX standard)
  • Impact: Medium (users on exotic platforms might not get helpful message)
  • Mitigation: Current implementation is good; monitor for other error codes in issue reports

4. Website Server Inconsistency

  • Risk: Users of the website server might encounter different error behavior
  • Likelihood: Low (website runs in controlled environment)
  • Impact: Low
  • Mitigation: Document or apply similar fix to website server in follow-up

🎯 Integration Risks (All Low)

1. Test Coverage Gap

  • Risk: Real tiktoken behavior might differ from mocked tests
  • Likelihood: Very Low
  • Current mitigation:
    • ✅ PR description mentions testing on DiffSynth-Studio repository
    • ✅ All 800+ tests passing
  • Recommendation: Consider adding an integration test with actual tiktoken encoding if one doesn't exist

2. Backward Compatibility

  • Risk: Breaking changes for users importing repomix as a library
  • Likelihood: Very Low
  • Impact: None (internal implementation change only)
  • Mitigation: No action needed
Test Coverage Assessment

✅ Excellent Coverage

  1. Permission errors: Both EPERM and EACCES tested with realistic scenarios
  2. Token counting: All existing tests updated to match new signature
  3. Regression prevention: Tests ensure the fix continues working

💭 Optional Future Enhancement

Consider adding an integration test that processes a file containing actual special tokens (like the DiffSynth-Studio tokenizer configs) to ensure end-to-end behavior is correct. However, this is not critical given the thorough manual testing mentioned in the PR.


Final Recommendation: ✅ APPROVE with minor suggestions

This PR successfully fixes both reported issues with:

  • ✅ Minimal, focused changes
  • ✅ Clear documentation and comments
  • ✅ Appropriate test coverage
  • ✅ No security concerns
  • ✅ No performance degradation
  • ✅ User-friendly error messages

Suggested Actions (Optional, Non-Blocking)

  1. Consider extracting the permission error message to a constant for easier testing/maintenance
  2. In a follow-up PR, evaluate whether the website server needs similar permission error handling
  3. Add a comment referencing tiktoken version when the API was verified

Great work on this fix! 🎉 The improved error messages will significantly help users who encounter these issues, especially on Windows.

@claude
Copy link
Contributor

claude bot commented Oct 19, 2025

PR Review: Fix permission errors and special token handling

Overall Assessment: Approved with Minor Suggestions

This PR effectively addresses both issues reported in #868 with clean, well-tested solutions. The code quality is excellent and aligns with the project's standards.


Strengths

  1. Excellent Error Messages - The permission error handling provides actionable guidance with three specific solutions
  2. Proper Root Cause Fix - The tiktoken special token issue is fixed at the right abstraction level
  3. Good Test Coverage - Both new features have corresponding test cases
  4. Clear Documentation - Inline comments explain the rationale for the tiktoken changes
  5. Follows Project Guidelines - Adheres to Conventional Commits, dependency injection pattern, and feature-based structure

Code Quality and Best Practices

Permission Error Handling (remoteAction.ts)

  • Correctly checks both EPERM and EACCES error codes
  • Provides user-friendly, actionable error messages
  • Maintains existing error handling for other failure types
  • Type-safe with proper NodeJS.ErrnoException casting

Token Counter Changes (TokenCounter.ts)

  • Passing empty arrays [], [] to tiktoken encode() disables special token validation
  • Well-documented with clear inline comments explaining the rationale
  • Appropriate for general code analysis use case (not LLM API calls)
  • Maintains error handling and returns 0 on failure

Test Coverage

  • Added tests for both EPERM and EACCES permission errors
  • Updated all existing TokenCounter tests to use new signature
  • Uses regex matching for flexible assertion on error message content
  • Good test case names and structure

Potential Issues and Edge Cases

Minor Observations:

  1. Error Message Formatting - The multi-line error message in remoteAction.ts:214-220 uses a template literal that preserves leading spaces. This is fine for terminal output, but verify it displays correctly across different terminal types.

  2. Tiktoken API Stability - The encode(content, [], []) signature appears to be passing allowed_special and disallowed_special parameters as empty arrays. While this works, consider:

    • Adding a comment about which tiktoken version introduced this signature
    • Verifying this works across the minimum Node.js version (20.0.0)
  3. Windows Path Escaping - In test at line 262, the Windows path C:\Windows\System32 is correctly double-escaped. Good attention to detail.


Performance Considerations
  • No Performance Impact - Changes are only in error paths (permission handling) and parameter passing (tiktoken)
  • Token Counting - The encode(content, [], []) call should have identical or marginally better performance than the previous version since special token validation is skipped
  • No Algorithmic Changes - Core logic remains the same

Security Concerns
  • No Security Issues Detected
  • Error Message Information Disclosure - The error message reveals the target path (targetPath), but this is acceptable as it is the user's own working directory
  • No Injection Risks - All user inputs are properly handled
  • Special Token Handling - Treating special tokens as regular text is safe for static analysis; this is not code that will be executed

Test Coverage Assessment

Current Coverage: Good

What is Tested:

  • EPERM permission errors
  • EACCES permission errors
  • Token counter with new encode parameters (8 test cases updated)
  • Error message content validation using regex

Potential Additions (Optional):

  • Integration test with actual special token content (e.g., file containing endoftext)
  • Test for the specific error message suggestions (--output, --stdout) are present
  • Cross-platform path handling tests (though current tests cover both Unix and Windows paths)

Test Quality:

  • Tests use appropriate mocking
  • Regex patterns are flexible enough to handle minor message changes
  • Both positive and negative cases covered

Premortem Analysis: Potential Failure Scenarios

Identified Risks and Mitigations:

1. tiktoken API Breaking Changes

Risk: Future tiktoken updates might change the encode() signature or behavior
Likelihood: Low
Impact: High (token counting would break)
Mitigation:

  • Already mitigated: Error handling catches failures and returns 0
  • Consider: Add a comment referencing tiktoken version (v1.0.22) where this works
  • Consider: Pin tiktoken to a specific major version in package.json

2. Permission Error on Other Platforms

Risk: Different error codes on other OS (macOS, Linux variants, BSDs)
Likelihood: Low-Medium
Impact: Low (falls back to generic error)
Mitigation:

  • Generic error handler still works for unknown error codes
  • Consider: Log/track unhandled error codes to discover new cases

3. Token Count Accuracy Change

Risk: Disabling special token validation might slightly change token counts
Likelihood: High (expected behavior change)
Impact: Low (more accurate for code analysis)
Mitigation:

  • This is the intended fix - treating special tokens as regular text
  • Better represents actual content for AI consumption
  • Document in release notes that token counts may differ slightly

4. Error Message UX Issues

Risk: Multi-line error message formatting issues in different terminals
Likelihood: Low
Impact: Low (message still readable, just formatting)
Mitigation:

  • Uses standard template literals that work across terminals
  • Test on Windows Command Prompt, PowerShell, and Git Bash

5. Integration Issues

Risk: Copy operation might fail for other reasons (disk full, network drives)
Likelihood: Medium
Impact: Low (generic error handler catches these)
Mitigation:

  • Generic error handler at line 224 catches all other cases
  • Error messages include the original error message

6. Backward Compatibility

Risk: Breaking changes for users relying on special token error behavior
Likelihood: Very Low
Impact: None (this was a bug, not a feature)
Mitigation:

  • The previous behavior was throwing errors - this is strictly an improvement
  • No API changes for library consumers

Suggestions for Improvement

  1. Optional: Add tiktoken version comment - Reference the tiktoken version where the encode signature with allowed_special and disallowed_special parameters is supported

  2. Optional: Consider extracting error message to constant - For easier maintenance and potential i18n in the future

  3. Optional: Add integration test - Test with a file actually containing special token sequences to verify the fix end-to-end


Checklist Verification

  • Tests claimed to pass (All 800 tests passing)
  • Linting claimed to pass (No errors)
  • Test cases added for both fixes
  • Follows Conventional Commits format
  • Includes scope in commit message
  • PR description follows template

Conclusion

This is a high-quality PR that solves real user problems with minimal code changes. The fixes are well-targeted, properly tested, and include excellent user-facing improvements. The premortem analysis reveals only minor risks, all of which are already mitigated.

Recommendation: Approve and merge

Great work addressing both issues in #868! The improved error messages will significantly enhance user experience, especially for Windows users running from protected directories.

@yamadashy yamadashy merged commit ba16a97 into main Oct 19, 2025
48 checks passed
@yamadashy yamadashy deleted the fix/issue-868-permission-and-token-errors branch October 19, 2025 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Not able to repomix this repo, what is wrong here?

2 participants