feat(core): Improve error handling and performance optimizations#746

Merged
yamadashy merged 13 commits into main from feat/perf
Jul 22, 2025

Conversation

@yamadashy
Owner

@yamadashy yamadashy commented Jul 22, 2025

yamadashy and others added 10 commits July 23, 2025 00:05
- Set TASKS_PER_THREAD to 100 for better balance between performance and resource usage
- Add comment explaining that worker initialization is expensive
- Update tests to match new thread allocation logic

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
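The thread allocation described in the commit above can be sketched roughly as follows. Only the `TASKS_PER_THREAD = 100` value comes from the commit message; the helper name `getWorkerThreadCount`, its signature, and the use of the CPU count as an upper bound are assumptions made for illustration, not necessarily what `processConcurrency.ts` does.

```typescript
import * as os from 'node:os';

// Value taken from the commit message. Worker initialization is expensive,
// so each thread should have a reasonable amount of work to amortize it.
const TASKS_PER_THREAD = 100;

// Hypothetical helper: scale the number of threads with the workload,
// but never exceed the available CPUs and never go below one thread.
function getWorkerThreadCount(
  numOfTasks: number,
  cpuCount: number = os.cpus().length,
): { minThreads: number; maxThreads: number } {
  const minThreads = 1;
  const threadsForWorkload = Math.ceil(numOfTasks / TASKS_PER_THREAD);
  const maxThreads = Math.max(minThreads, Math.min(cpuCount, threadsForWorkload));
  return { minThreads, maxThreads };
}
```

Under these assumptions, a small repository with 50 files would use a single worker, while 10,000 files on an 8-core machine would be capped at 8 workers.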
Add performance monitoring to track how long it takes to initialize the Tinypool worker pool. This helps identify potential bottlenecks during startup.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
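A minimal sketch of the kind of timing measurement this commit describes. The `timed` helper and its `log` parameter are hypothetical names introduced here; the actual change presumably measures the Tinypool constructor call inline and reports through the project's logger.

```typescript
import { performance } from 'node:perf_hooks';

// Hypothetical helper: run an initializer, log how long it took, return its result.
function timed<T>(
  label: string,
  init: () => T,
  log: (message: string) => void = console.debug,
): T {
  const start = performance.now();
  const result = init();
  const elapsedMs = performance.now() - start;
  log(`${label} took ${elapsedMs.toFixed(2)}ms`);
  return result;
}
```

Usage would look like `const pool = timed('Worker pool initialization', () => createPool())`, where `createPool` stands in for whatever constructs the pool.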
…er log level

Replace the environment variable approach for passing log levels to workers with Tinypool's workerData mechanism, which is more idiomatic for worker thread configuration.

Changes:
- Add setLogLevelByWorkerData() method to handle workerData-based log level setting
- Update Tinypool configuration to use workerData instead of env variables
- Update all 5 worker files to use setLogLevelByWorkerData()
- Remove unused setLogLevelByEnv function and related test mocks
- Update tests to reflect new workerData configuration

This provides better isolation and follows Node.js worker thread best practices.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fix setLogLevelByWorkerData to properly handle workerData when it comes as an array format like [{ workerId: 1 }, { logLevel: 3 }]. The logLevel is in the second element of the array.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
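The two commits above (passing the log level through workerData, then handling the array format) could be sketched as below. The function name `extractLogLevel` is hypothetical, and accepting a plain object in addition to the array form is an assumption; the commit only states that the value may arrive as `[{ workerId: 1 }, { logLevel: 3 }]` with the log level in the second element.

```typescript
// Hypothetical helper: pull a numeric log level out of whatever workerData a worker receives.
function extractLogLevel(workerData: unknown): number | undefined {
  // Array format from the commit message: [{ workerId: 1 }, { logLevel: 3 }].
  // The user-supplied data is the second element.
  const candidate = Array.isArray(workerData) ? workerData[1] : workerData;

  if (typeof candidate === 'object' && candidate !== null && 'logLevel' in candidate) {
    const level = (candidate as { logLevel: unknown }).logLevel;
    return typeof level === 'number' ? level : undefined;
  }
  // No usable log level: the caller keeps its default.
  return undefined;
}
```

In a real worker, the argument would be the `workerData` export of `node:worker_threads`, and the result would be passed to the logger only when it is defined.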
…comments

Move setLogLevelByWorkerData() calls to the top of all worker files (immediately after imports) with clear documentation. This ensures logger configuration is set up before any other code execution in worker threads.

- Move logger initialization to module load time in all 5 worker files
- Add standardized comment explaining the importance of early initialization
- Ensure consistent pattern across all worker implementations

This improves debugging capabilities and ensures proper logging from worker startup.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…-encoding support

Create tokenCounterFactory.ts to centralize TokenCounter singleton management and add support for multiple encodings using a Map-based cache.

Changes:
- Add tokenCounterFactory.ts with Map<TiktokenEncoding, TokenCounter> cache
- Support multiple encodings simultaneously in worker threads
- Add getTokenCounter() and freeTokenCounter() functions for lifecycle management
- Update fileMetricsWorker.ts and outputMetricsWorker.ts to use factory
- Remove duplicate singleton implementations from worker files
- Add timing measurement to TokenCounter initialization for performance monitoring

This improves code organization, reduces duplication, and enables proper handling of multiple encoding types while maintaining memory efficiency.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
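A rough sketch of the Map-based cache this commit describes. The `getTokenCounter` and `freeTokenCounter` names come from the commit message; the `TokenCounter` class and the `TiktokenEncoding` type below are simplified stand-ins so the example is self-contained, not the real implementations.

```typescript
// Stand-in for the encoding type exported by the tokenizer library.
type TiktokenEncoding = 'o200k_base' | 'cl100k_base';

// Stand-in for the real TokenCounter, which wraps an expensive-to-create encoder.
class TokenCounter {
  readonly encoding: TiktokenEncoding;
  freed = false;

  constructor(encoding: TiktokenEncoding) {
    this.encoding = encoding;
  }

  free(): void {
    this.freed = true;
  }
}

// One cache per module instance. Each worker thread loads its own copy of the
// module, so the Map is never shared across threads.
const tokenCounters = new Map<TiktokenEncoding, TokenCounter>();

function getTokenCounter(encoding: TiktokenEncoding): TokenCounter {
  let counter = tokenCounters.get(encoding);
  if (!counter) {
    counter = new TokenCounter(encoding);
    tokenCounters.set(encoding, counter);
  }
  return counter;
}

function freeTokenCounter(): void {
  // Release every cached counter, then forget them.
  for (const counter of tokenCounters.values()) {
    counter.free();
  }
  tokenCounters.clear();
}
```

Keying the cache by encoding lets one worker serve requests for several encodings while still creating each counter only once.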
Update dependency injection parameter names to be more descriptive of the actual functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Enable error chaining by accepting ErrorOptions parameter in RepomixError and RepomixConfigValidationError constructors. Enhanced error handler to display cause information during debugging.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
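Error chaining with `ErrorOptions` can be illustrated as follows. The `RepomixError` constructor shape follows the commit message; `formatErrorChain` is a hypothetical helper showing one way the cause could be surfaced, and is not claimed to match the project's error handler.

```typescript
class RepomixError extends Error {
  constructor(message: string, options?: ErrorOptions) {
    // Forwarding options lets callers attach the original error as `cause`.
    super(message, options);
    this.name = 'RepomixError';
  }
}

// Hypothetical helper: flatten an error and its causes into printable lines.
function formatErrorChain(error: unknown, maxDepth = 10): string[] {
  const lines: string[] = [];
  let current: unknown = error;
  // maxDepth guards against accidental cycles in the cause chain.
  for (let depth = 0; depth < maxDepth && current !== undefined && current !== null; depth++) {
    const message = current instanceof Error ? current.message : String(current);
    lines.push(depth === 0 ? message : `Caused by: ${message}`);
    current = current instanceof Error ? current.cause : undefined;
  }
  return lines;
}
```

A caller would wrap a low-level failure as `new RepomixError('Failed to write output', { cause: originalError })`, keeping the original error available for debugging.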
…ssage

Add specific error handling for JavaScript string size limit (~512MB) in Handlebars template compilation. Provides actionable guidance to use --include flag for processing specific directories.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…re options

Update 'Invalid string length' error message to suggest both --include and --ignore flags as solutions for handling large repositories.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
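The string-size handling from the two commits above might look roughly like this. The `renderWithSizeGuard` wrapper and the exact wording of the message are assumptions; the commits only state that the 'Invalid string length' error is detected during template compilation and that the message points users to `--include` and `--ignore`.

```typescript
// Hypothetical wrapper around any function that builds one large output string.
function renderWithSizeGuard(render: () => string): string {
  try {
    return render();
  } catch (error) {
    // V8 raises RangeError('Invalid string length') when a string would exceed its limit.
    if (error instanceof RangeError && error.message === 'Invalid string length') {
      throw new Error(
        'Output size exceeds JavaScript string limit. ' +
          'Use --include to process specific directories or --ignore to exclude large files.',
        { cause: error },
      );
    }
    // Anything else is not ours to translate.
    throw error;
  }
}
```

Keeping the original `RangeError` as the `cause` means the user-facing message stays short while the underlying failure remains available in verbose output.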
Copilot AI review requested due to automatic review settings July 22, 2025 15:08
@coderabbitai
Contributor

coderabbitai bot commented Jul 22, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update refactors logger initialization across worker modules to use setLogLevelByWorkerData instead of environment variables, ensuring logger configuration occurs at module load. It introduces a centralized tokenCounterFactory for managing TokenCounter instances, replaces singleton logic in metrics workers, and renames the output handler dependency in the packager and related tests. Error handling is enhanced with error causes and improved reporting.

Changes

| File(s) | Change Summary |
| --- | --- |
| src/core/file/workers/fileCollectWorker.ts, src/core/file/workers/fileProcessWorker.ts, src/core/security/workers/securityCheckWorker.ts, src/core/metrics/workers/fileMetricsWorker.ts, src/core/metrics/workers/outputMetricsWorker.ts | Logger initialization refactored to use setLogLevelByWorkerData at module load, replacing environment-based setup. |
| src/core/metrics/tokenCounterFactory.ts | New module introduced to manage and cache TokenCounter instances per encoding; provides getTokenCounter and freeTokenCounter functions. |
| src/core/metrics/TokenCounter.ts | Logs initialization duration of encoding in the constructor for debugging/performance tracking. |
| src/core/metrics/workers/fileMetricsWorker.ts, src/core/metrics/workers/outputMetricsWorker.ts | Replaces local singleton TokenCounter logic with imports from tokenCounterFactory for centralized management and cleanup. |
| src/core/output/outputGenerate.ts | Enhanced error handling: attaches error causes to RepomixError, and adds specific handling for string size RangeError in template output. |
| src/core/packager.ts, tests/core/packager.test.ts, tests/core/packager/diffsFunctionality.test.ts, tests/integration-tests/packager.test.ts | Renames dependency property from handleOutput to writeOutputToDisk in packager and all related tests. |
| src/shared/errorHandle.ts | Updates error class constructors to accept ErrorOptions and log error causes; improves error reporting and guidance for users. |
| src/shared/logger.ts | Replaces setLogLevelByEnv with setLogLevelByWorkerData, configuring log level from worker data instead of environment variables. |
| src/shared/processConcurrency.ts | Introduces TASKS_PER_THREAD constant, logs worker pool initialization duration, and passes log level via workerData instead of environment variables. |
| tests/cli/cliRun.test.ts | Updates logger mock from setLogLevelByEnv to setLogLevelByWorkerData. |
| tests/shared/processConcurrency.test.ts | Updates comments and expected config shapes to reflect new concurrency logic and logger configuration via workerData. |

Sequence Diagram(s)

sequenceDiagram
    participant MainProcess
    participant Worker
    participant Logger

    MainProcess->>Worker: Spawn worker with workerData (includes logLevel)
    Worker->>Logger: setLogLevelByWorkerData()
    Logger-->>Worker: Logger configured
    Worker->>TokenCounterFactory: getTokenCounter(encoding)
    TokenCounterFactory->>TokenCounter: (create if not cached)
    TokenCounter-->>TokenCounterFactory: instance
    TokenCounterFactory-->>Worker: TokenCounter instance
    Worker->>MainProcess: Processed results
    Worker->>TokenCounterFactory: freeTokenCounter() on exit
    TokenCounterFactory->>TokenCounter: free()

Estimated code review effort

3 (~45 minutes)

@claude
Contributor

claude bot commented Jul 22, 2025

Claude finished @yamadashy's task


PR Review Analysis

I'll provide a comprehensive review of this PR focusing on error handling improvements and performance optimizations.

Review Tasks:

  • Analyze error handling improvements in errorHandle.ts
  • Review TokenCounter factory pattern implementation
  • Examine performance optimizations in processConcurrency.ts
  • Check function renaming and code clarity improvements
  • Review worker file changes
  • Evaluate test updates
  • Provide overall assessment and recommendations

Review Feedback

Error Handling Improvements

Excellent work on error chaining! The addition of ErrorOptions support to RepomixError classes (errorHandle.ts:6, errorHandle.ts:13) is a solid improvement that enables proper error cause tracking.

Great specific error handling for the "Invalid string length" case in outputGenerate.ts:122-127. The error message provides actionable guidance mentioning both --include and --ignore options, which directly addresses the user pain point mentioned in the PR description.

Code Quality: The error handling logic in handleError function (errorHandle.ts:19-58) properly handles all error types and provides helpful debugging information.

TokenCounter Factory Pattern

Well-designed factory pattern in tokenCounterFactory.ts. The implementation correctly:

  • Caches instances per encoding per worker thread (tokenCounterFactory.ts:5)
  • Provides proper resource cleanup with freeTokenCounter (tokenCounterFactory.ts:24-29)
  • Uses clear, documented API

Performance benefit: This addresses the expensive TokenCounter initialization by ensuring only one instance per encoding per worker.

Minor suggestion: Consider adding type guards or validation in getTokenCounter if invalid encodings could be passed.

Performance Optimizations

Great addition of timing measurements in both TokenCounter.ts:8-16 and processConcurrency.ts:33-48. This provides valuable visibility into performance bottlenecks.

Proper worker initialization timing (processConcurrency.ts:26-51): The timing measurements for Tinypool initialization will help with future optimization efforts.

Code Clarity Improvements

Excellent renaming: handleOutput → writeOutputToDisk (packager.ts:36, packager.ts:101) is much more semantically clear and accurately describes the function's purpose.

Worker initialization positioning: Moving setLogLevelByWorkerData() to module load time (fileMetricsWorker.ts:7-9) ensures logging is properly configured before any operations.

🔍 Areas for Consideration

  1. Logger initialization in workers: While moving setLogLevelByWorkerData() to module level works, consider documenting this pattern across all worker files for consistency. Currently I see it in fileMetricsWorker.ts:9 but should verify it's consistent across all workers.

  2. Error cause logging: In errorHandle.ts:31-33, the cause is logged but the formatting could be improved:

// Current
logger.note('Caused by:', error.cause);

// Consider
logger.note('Caused by:', error.cause instanceof Error ? error.cause.message : error.cause);
  3. TokenCounter factory thread safety: While the current implementation is worker-scoped (good!), consider adding a comment clarifying this is safe because each worker has its own Map instance.

Test Updates

The test updates properly reflect the function renaming and maintain coverage. Good attention to keeping tests in sync with implementation changes.

📋 Overall Assessment

This is a well-executed PR that delivers meaningful improvements:

  • Error handling: Proper error chaining and user-friendly messages
  • Performance: Smart caching strategy for TokenCounters and timing visibility
  • Code quality: Clear naming and better organization
  • Maintainability: Factory pattern reduces complexity and improves resource management

The changes follow the project's coding guidelines, maintain backward compatibility, and include appropriate test updates.

🏆 Recommendation: Approve

This PR successfully addresses the stated goals with clean, well-structured code. The improvements are meaningful and the implementation is solid. No blocking issues identified.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @yamadashy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on making the application more robust and efficient. I've introduced significant improvements to how errors are handled, especially for large data outputs, by providing clearer messages and better error chaining. Concurrently, I've optimized performance by refining worker initialization processes and resource management, ensuring a smoother and faster user experience. A minor refactoring was also included to enhance code readability.

Highlights

  • Enhanced Error Handling: I've improved the error handling mechanisms, particularly for cases where generated output exceeds JavaScript's string size limits. The system now provides more user-friendly messages with actionable guidance (e.g., suggesting --include or --ignore options). Additionally, RepomixError classes now support ErrorOptions for better error chaining and debugging, allowing the original cause of an error to be preserved and displayed.
  • Performance Optimizations: I've implemented several performance enhancements. The TokenCounter now uses a factory pattern to ensure efficient management and reuse of instances across worker threads, reducing redundant initializations. I've also added timing measurements for TokenCounter and Tinypool worker pool initializations to better understand and optimize startup performance. Worker logger initialization has been refined to occur earlier in the worker lifecycle.
  • Code Clarity and Maintainability: For improved code clarity, I've renamed the handleOutput function to writeOutputToDisk. This change provides a more semantically accurate description of the function's purpose, making the codebase easier to understand and maintain.

This comment was marked as outdated.

@codecov

codecov bot commented Jul 22, 2025

Codecov Report

Attention: Patch coverage is 51.04167% with 47 lines in your changes missing coverage. Please review.

Project coverage is 89.68%. Comparing base (ee9d71d) to head (3468504).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/core/output/outputGenerate.ts | 24.13% | 22 Missing ⚠️ |
| src/shared/logger.ts | 16.66% | 10 Missing ⚠️ |
| src/shared/errorHandle.ts | 33.33% | 8 Missing ⚠️ |
| src/core/metrics/tokenCounterFactory.ts | 68.75% | 5 Missing ⚠️ |
| src/core/metrics/workers/fileMetricsWorker.ts | 50.00% | 1 Missing ⚠️ |
| src/core/metrics/workers/outputMetricsWorker.ts | 75.00% | 1 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (51.04%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #746      +/-   ##
==========================================
- Coverage   90.23%   89.68%   -0.56%     
==========================================
  Files         100      101       +1     
  Lines        5768     5806      +38     
  Branches     1205     1203       -2     
==========================================
+ Hits         5205     5207       +2     
- Misses        563      599      +36     

@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Jul 22, 2025

Deploying repomix with Cloudflare Pages

Latest commit: 3468504
Status: ✅  Deploy successful!
Preview URL: https://38310027.repomix.pages.dev
Branch Preview URL: https://feat-perf.repomix.pages.dev

… guidance

Enhanced error message to be more user-friendly:
- Clearly explain the issue (repository contains files too large to process)
- Provide concrete examples for --ignore usage
- Structure solutions in easy-to-follow bullet points
- Add option to process smaller portions of the repository

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

Changed RepomixError stack trace output from logger.note to logger.debug to reduce noise in standard error output. Stack traces are now only shown when verbose flag is used.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
When output size exceeds JavaScript string limit, now displays the top 5 largest files with their sizes to help users identify which files to exclude. This makes it easier to decide which --ignore patterns to use.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
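Picking the largest files to report could be sketched as below. The `ProcessedFile` shape, the helper names, and measuring size by string length (UTF-16 code units rather than bytes) are all assumptions for illustration; only the "top 5 largest files with their sizes" behavior comes from the commit message.

```typescript
// Assumed minimal shape of a processed file.
interface ProcessedFile {
  path: string;
  content: string;
}

interface FileSize {
  path: string;
  chars: number;
}

// Hypothetical helper: return the `limit` largest files, biggest first.
function findLargestFiles(files: readonly ProcessedFile[], limit = 5): FileSize[] {
  return files
    .map((file) => ({ path: file.path, chars: file.content.length }))
    .sort((a, b) => b.chars - a.chars)
    .slice(0, limit);
}

// Hypothetical helper: render the list for inclusion in an error message.
function formatLargestFiles(files: readonly ProcessedFile[]): string {
  return findLargestFiles(files)
    .map((file, index) => `${index + 1}. ${file.path} (${file.chars} chars)`)
    .join('\n');
}
```

Because `map` creates a new array before sorting, the caller's file list is left in its original order.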
@yamadashy yamadashy requested a review from Copilot July 22, 2025 15:32
Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves error handling and implements performance optimizations across the codebase. The changes focus on better error reporting with cause chains, worker thread initialization optimizations, and resource management improvements.

  • Enhanced error handling with detailed debugging information and better error messages for common issues
  • Optimized worker thread initialization by switching from environment variables to workerData for configuration
  • Improved resource management with token counter factory pattern and performance timing metrics

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/shared/errorHandle.ts | Enhanced error constructors to support error chaining and improved error logging |
| src/shared/logger.ts | Replaced environment-based configuration with workerData-based approach for workers |
| src/shared/processConcurrency.ts | Added performance timing and switched to workerData for worker configuration |
| src/core/output/outputGenerate.ts | Enhanced error handling with better messages for string length limits and added processedFiles parameter |
| src/core/metrics/tokenCounterFactory.ts | New factory for managing TokenCounter instances with proper resource cleanup |
| Multiple worker files | Updated to use workerData configuration and improved initialization patterns |
| Multiple test files | Updated function names and mock implementations to match API changes |

Comments suppressed due to low confidence (1)

src/core/output/outputGenerate.ts:138

  • [nitpick] The error message spans multiple lines but is constructed as a template literal. Consider breaking this into multiple concatenated strings or ensuring consistent formatting across all lines for better readability.
        `Output size exceeds JavaScript string limit. The repository contains files that are too large to process.

2 participants