feat: Add parameter --summarize-token-counts #747

Merged
yamadashy merged 19 commits into yamadashy:main from gudber:token_count
Aug 3, 2025

Conversation

@gudber
Contributor

@gudber gudber commented Jul 23, 2025

Overview

This PR introduces a new parameter, --summarize-token-counts, which displays summarized token counts for each entry in the file tree of a repo processed with Repomix. For example, on the Repomix repository itself this would show something like:

📊 Token Count Summary:
────────────────────────
....
├── tsconfig.json (177 tokens)
├── typos.toml (80 tokens)
├── vitest.config.ts (89 tokens)
├── .agents/ (2874 tokens)
│   └── rules/ (2874 tokens)
│       ├── base.md (1988 tokens)
│       ├── browser-extension.md (453 tokens)
│       └── website.md (433 tokens)
...

Usage Examples

repomix --summarize-token-counts
Would display something like:
📊 Token Count Summary:
────────────────────────
....
├── tsconfig.json (177 tokens)
├── typos.toml (80 tokens)
├── vitest.config.ts (89 tokens)
├── .agents/ (2874 tokens)
│   └── rules/ (2874 tokens)
│       ├── base.md (1988 tokens)
│       ├── browser-extension.md (453 tokens)
│       └── website.md (433 tokens)
...

repomix --summarize-token-counts 1000
Would display something like:
📊 Token Count Summary:
────────────────────────
....
├── .agents/ (2874 tokens)
│   └── rules/ (2874 tokens)
│       ├── base.md (1988 tokens)
...

CLI Command

repomix --summarize-token-counts [minimum threshold to display token count]
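
A minimal sketch of how a summary like the one above could be produced, assuming file paths are forward-slash separated and token counts are already known. The names `FileTokens`, `TreeNode`, `buildTree`, and `renderTree` are hypothetical and chosen for illustration; they are not taken from this PR's code.

```typescript
// Hypothetical shapes for illustration only.
interface FileTokens {
  path: string; // forward-slash separated, relative to the repo root
  tokens: number;
}

interface TreeNode {
  tokens: number; // sum of all files beneath this node
  dirs: Map<string, TreeNode>;
  files: { name: string; tokens: number }[];
}

interface Entry {
  label: string;
  tokens: number;
  child?: TreeNode;
}

const newNode = (): TreeNode => ({ tokens: 0, dirs: new Map(), files: [] });

// Insert every file and add its tokens to each ancestor directory.
function buildTree(files: FileTokens[]): TreeNode {
  const root = newNode();
  for (const file of files) {
    const parts = file.path.split('/');
    const name = parts.pop() as string;
    let node = root;
    node.tokens += file.tokens;
    for (const dir of parts) {
      let child = node.dirs.get(dir);
      if (!child) {
        child = newNode();
        node.dirs.set(dir, child);
      }
      child.tokens += file.tokens;
      node = child;
    }
    node.files.push({ name, tokens: file.tokens });
  }
  return root;
}

// Render the tree as lines, skipping entries below the minimum token count.
function renderTree(node: TreeNode, minTokens = 0, prefix = ''): string[] {
  const files: Entry[] = node.files
    .filter((f) => f.tokens >= minTokens)
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((f) => ({ label: f.name, tokens: f.tokens }));
  const dirs: Entry[] = Array.from(node.dirs.entries())
    .filter(([, d]) => d.tokens >= minTokens)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([name, d]) => ({ label: `${name}/`, tokens: d.tokens, child: d }));
  const entries = [...files, ...dirs];
  const lines: string[] = [];
  entries.forEach((entry, i) => {
    const last = i === entries.length - 1;
    lines.push(`${prefix}${last ? '└── ' : '├── '}${entry.label} (${entry.tokens} tokens)`);
    if (entry.child) {
      lines.push(...renderTree(entry.child, minTokens, prefix + (last ? '    ' : '│   ')));
    }
  });
  return lines;
}

const sample: FileTokens[] = [
  { path: 'tsconfig.json', tokens: 177 },
  { path: '.agents/rules/base.md', tokens: 1988 },
  { path: '.agents/rules/browser-extension.md', tokens: 453 },
  { path: '.agents/rules/website.md', tokens: 433 },
];
const allLines = renderTree(buildTree(sample)); // no threshold
const bigLines = renderTree(buildTree(sample), 1000); // threshold of 1000 tokens
```

With a threshold of 1000, only `.agents/`, `rules/`, and `base.md` survive in this sample, because a directory's count is the sum of everything beneath it.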

@gudber gudber requested a review from yamadashy as a code owner July 23, 2025 22:39
@coderabbitai
Contributor

coderabbitai bot commented Jul 23, 2025

Walkthrough

This update introduces a hierarchical token count tree feature to the CLI tool, allowing users to visualize token usage per directory and file. It reorganizes and extends CLI options, updates configuration schemas, centralizes reporting logic, and adds new modules and tests for token count tree generation and display. Documentation is updated accordingly.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| CLI Action Refactor & Token Count Tree Integration: src/cli/actions/defaultAction.ts, src/cli/cliRun.ts, src/cli/types.ts | Refactored default action workflow for unified spinner and result reporting; added and reorganized CLI options including tokenCountTree; updated types to support new options. |
| Token Count Tree Core & Types: src/core/tokenCount/buildTokenCountStructure.ts, src/core/tokenCount/types.ts | Added new core module and types to build and represent a hierarchical token count tree from file lists. |
| Token Count Tree Reporter: src/cli/reporters/tokenCountTreeReporter.ts | Added a new reporter to display the token count tree in the CLI, including filtering and formatting logic. |
| Reporting Refactor: src/cli/cliReport.ts | Centralized and renamed reporting functions; introduced reportResults to orchestrate output including token count tree display. |
| Config Schema Update: src/config/configSchema.ts | Added tokenCountTree option to configuration schemas with type validation and defaults. |
| Metrics Calculation Update: src/core/metrics/calculateMetrics.ts | Modified logic to calculate token counts for all files if tokenCountTree is enabled, optimizing metric calculations. |
| Documentation: README.md | Updated CLI option documentation, added "Token Count Optimization" section, and documented new configuration options. |
| Test Updates & Additions: tests/cli/actions/defaultAction.test.ts, tests/cli/actions/defaultAction.tokenCountTree.test.ts, tests/cli/cliReport.test.ts, tests/cli/reporters/tokenCountTreeReporter.test.ts, tests/config/configSchema.test.ts, tests/core/tokenCount/buildTokenCountStructure.test.ts | Updated and added tests for refactored CLI actions, token count tree feature, reporting, config schema, and core tree-building logic. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant Config
    participant Metrics
    participant Reporter

    User->>CLI: Run CLI with --token-count-tree
    CLI->>Config: Load and merge config (includes tokenCountTree)
    CLI->>Metrics: Process files & calculate token counts (all files if tokenCountTree)
    CLI->>Reporter: reportResults (includes reportTokenCountTree)
    Reporter->>Reporter: Build and display token count tree
    Reporter->>User: Output hierarchical token count summary

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

Suggested labels

enhancement

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @gudber, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the repomix CLI by adding a new feature that provides a hierarchical summary of token counts across the repository's file structure. This allows users to quickly understand the token distribution within their codebase, with an option to filter results by a minimum token count.

Highlights

  • New CLI Parameter: I've introduced a new command-line parameter, --summarize-token-counts, to the repomix tool. This allows users to generate a detailed summary of token counts for their repository's file tree.
  • Hierarchical Token Count Display: The new feature calculates and presents token counts for individual files and their parent directories in a clear, tree-like structure, providing an intuitive overview of token distribution.
  • Configurable Threshold Filtering: Users can optionally specify a minimum token count threshold with the --summarize-token-counts parameter. This filters the output, displaying only files and directories that meet or exceed the specified token count, helping to focus on significant parts of the codebase.
  • Modular Implementation: The token summarization logic is implemented in new, dedicated modules responsible for building the hierarchical data structure, calculating token sums for directories, and rendering the output to the console.
  • Comprehensive Test Coverage: New unit tests have been added to validate the CLI integration, the correctness of the token tree building algorithm, the display formatting, and the threshold filtering functionality.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the --summarize-token-counts feature. The review focuses on input validation and code duplication.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (5)
src/cli/actions/defaultAction.ts (2)

103-115: Good integration with minor suggestion for robustness.

The token count summarization integration is well-implemented with proper conditional execution and progress reporting.

Consider adding validation for the threshold parsing to handle invalid input gracefully:

-    const threshold =
-      typeof cliOptions.summarizeTokenCounts === 'string' ? Number.parseInt(cliOptions.summarizeTokenCounts, 10) : 0;
+    const threshold =
+      typeof cliOptions.summarizeTokenCounts === 'string' 
+        ? Math.max(0, Number.parseInt(cliOptions.summarizeTokenCounts, 10) || 0)
+        : 0;

This ensures that invalid numeric strings default to 0 and negative values are clamped to 0.
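
A small sketch of the clamping behavior described above, pulled out into a standalone function so the edge cases are easy to check. The helper name `parseThreshold` is hypothetical and not part of the PR.

```typescript
// Hypothetical helper: normalizes the value of a CLI flag that takes an optional argument.
function parseThreshold(value: boolean | string | undefined): number {
  if (typeof value !== 'string') {
    return 0; // flag absent, or passed without a value
  }
  // Number.parseInt returns NaN for non-numeric input; `|| 0` maps NaN to 0,
  // and Math.max clamps negative values to 0.
  return Math.max(0, Number.parseInt(value, 10) || 0);
}

const fromNumber = parseThreshold('1000'); // 1000
const fromGarbage = parseThreshold('abc'); // 0
const fromNegative = parseThreshold('-5'); // 0
const fromFlagOnly = parseThreshold(true); // 0
```

One caveat of this approach: `Number.parseInt` accepts a numeric prefix, so an input such as `'12abc'` yields 12 rather than falling back to 0.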


152-164: Consider extracting the duplicated token summarization logic.

The implementation is correct and consistent with the stdin workflow, but there's code duplication. Consider extracting this into a helper function:

const handleTokenCountSummary = async (
  cliOptions: CliOptions,
  packResult: PackResult,
  config: RepomixConfigMerged,
  spinner: Spinner,
): Promise<void> => {
  if (cliOptions.summarizeTokenCounts) {
    const threshold =
      typeof cliOptions.summarizeTokenCounts === 'string' 
        ? Math.max(0, Number.parseInt(cliOptions.summarizeTokenCounts, 10) || 0)
        : 0;
    await summarizeTokenCounts(
      packResult.processedFiles,
      config.tokenCount.encoding as TiktokenEncoding,
      (message) => spinner.update(message),
      threshold,
    );
  }
};

Then call this helper in both handleStdinProcessing and handleDirectoryProcessing.

.agents/rules/base.md (1)

1-4: File header inconsistency needs clarification.

The file starts with "# CLAUDE.md" but the filename is .agents/rules/base.md. This mismatch could cause confusion about the file's purpose and location.

Consider updating the header to match the actual filename:

-# CLAUDE.md
+# Base Rules for Repomix Development
src/core/tokenCount/displayTokenCountTree.ts (2)

22-83: Consider breaking down the complex displayNode function.

The displayNode function handles multiple responsibilities: filtering, sorting, formatting, and recursive traversal. This makes it harder to test and maintain.

Consider extracting helper functions:

+const filterAndSortEntries = (node: TreeNode, minTokenCount: number) => {
+  const allEntries = Object.entries(node).filter(
+    ([key, value]) => !key.startsWith('_') && value && typeof value === 'object' && !Array.isArray(value),
+  );
+  
+  const entries = allEntries.filter(([, value]) => {
+    const tokenSum = (value as TreeNode)._tokenSum || 0;
+    return tokenSum >= minTokenCount;
+  });
+  
+  const allFiles = node._files || [];
+  const files = allFiles.filter((file) => file.tokens >= minTokenCount);
+  
+  entries.sort(([a], [b]) => a.localeCompare(b));
+  files.sort((a, b) => a.name.localeCompare(b.name));
+  
+  return { entries, files };
+};
+
+const formatTreeLine = (prefix: string, connector: string, name: string, tokenInfo: string, isRoot: boolean) => {
+  return isRoot && prefix === '' ? `${connector}${name} ${tokenInfo}` : `${prefix}${connector}${name} ${tokenInfo}`;
+};

48-52: Simplify root handling logic.

The repeated isRoot && prefix === '' checks can be simplified by extracting the prefix calculation.

+const getDisplayPrefix = (prefix: string, isRoot: boolean) => isRoot && prefix === '' ? '' : prefix;

-    if (isRoot && prefix === '') {
-      logger.log(`${connector}${file.name} ${tokenInfo}`);
-    } else {
-      logger.log(`${prefix}${connector}${file.name} ${tokenInfo}`);
-    }
+    const displayPrefix = getDisplayPrefix(prefix, isRoot);
+    logger.log(`${displayPrefix}${connector}${file.name} ${tokenInfo}`);

Also applies to: 62-66

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ecf7a8 and dc82ff6.

📒 Files selected for processing (12)
  • .agents/rules/base.md (4 hunks)
  • src/cli/actions/defaultAction.ts (4 hunks)
  • src/cli/cliRun.ts (1 hunks)
  • src/cli/types.ts (1 hunks)
  • src/core/tokenCount/buildTokenCountStructure.ts (1 hunks)
  • src/core/tokenCount/displayTokenCountTree.ts (1 hunks)
  • src/core/tokenCount/saveTokenCounts.ts (1 hunks)
  • src/core/tokenCount/types.ts (1 hunks)
  • tests/cli/actions/defaultAction.saveTokenCounts.test.ts (1 hunks)
  • tests/core/tokenCount/buildTokenCountStructure.test.ts (1 hunks)
  • tests/core/tokenCount/displayTokenCountTree.test.ts (1 hunks)
  • tests/core/tokenCount/saveTokenCounts.test.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
PR: yamadashy/repomix#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-18T15:12:57.169Z
Learning: .agents/rules/base.md
Learnt from: CR
PR: yamadashy/repomix#0
File: .cursorrules:0-0
Timestamp: 2025-06-30T16:07:18.316Z
Learning: Applies to .agents/rules/base.md : Check the rules written in `.agents/rules/base.md` as they contain important project-specific guidelines and instructions.
.agents/rules/base.md (2)

Learnt from: CR
PR: yamadashy/repomix#0
File: .cursorrules:0-0
Timestamp: 2025-06-30T16:07:18.316Z
Learning: Applies to .agents/rules/base.md : Check the rules written in .agents/rules/base.md as they contain important project-specific guidelines and instructions.

Learnt from: CR
PR: yamadashy/repomix#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-18T15:12:57.169Z
Learning: .agents/rules/base.md

🧬 Code Graph Analysis (7)
tests/core/tokenCount/buildTokenCountStructure.test.ts (1)
src/core/tokenCount/buildTokenCountStructure.ts (2)
  • FileWithTokens (4-7)
  • buildTokenCountStructure (69-72)
tests/cli/actions/defaultAction.saveTokenCounts.test.ts (3)
src/core/tokenCount/saveTokenCounts.ts (1)
  • summarizeTokenCounts (8-34)
src/cli/types.ts (1)
  • CliOptions (4-58)
src/cli/actions/defaultAction.ts (1)
  • runDefaultAction (28-57)
src/core/tokenCount/displayTokenCountTree.ts (2)
src/core/tokenCount/buildTokenCountStructure.ts (2)
  • FileWithTokens (4-7)
  • buildTokenCountTree (15-47)
src/shared/logger.ts (1)
  • logger (89-89)
src/core/tokenCount/saveTokenCounts.ts (3)
src/core/file/fileTypes.ts (1)
  • ProcessedFile (6-9)
src/core/tokenCount/buildTokenCountStructure.ts (1)
  • FileWithTokens (4-7)
src/core/tokenCount/displayTokenCountTree.ts (1)
  • displayTokenCountTree (10-20)
src/core/tokenCount/buildTokenCountStructure.ts (1)
src/core/tokenCount/types.ts (3)
  • FileTokenInfo (1-4)
  • TokenCountOutput (12-12)
  • DirectoryTokenInfo (6-10)
src/cli/actions/defaultAction.ts (1)
src/core/tokenCount/saveTokenCounts.ts (1)
  • summarizeTokenCounts (8-34)
tests/core/tokenCount/saveTokenCounts.test.ts (3)
src/core/tokenCount/displayTokenCountTree.ts (1)
  • displayTokenCountTree (10-20)
src/core/file/fileTypes.ts (1)
  • ProcessedFile (6-9)
src/core/tokenCount/saveTokenCounts.ts (1)
  • summarizeTokenCounts (8-34)
🔇 Additional comments (10)
src/cli/cliRun.ts (1)

104-107: LGTM! Well-structured CLI option implementation.

The new --summarize-token-counts option is properly integrated with clear documentation and appropriate optional parameter handling. The placement in the "Token Count Options" group maintains good CLI organization.

src/cli/types.ts (1)

49-49: LGTM! Appropriate type definition for the CLI option.

The summarizeTokenCounts?: boolean | string type correctly represents the dual nature of the CLI option (flag or threshold value) and follows established naming conventions.

tests/core/tokenCount/buildTokenCountStructure.test.ts (1)

1-108: Excellent test coverage for the token count structure builder.

The test suite comprehensively covers various scenarios including:

  • Simple and nested directory structures
  • Multiple root directories
  • Empty input handling
  • File name collisions in different directories

Each test case is well-structured with clear input/output validation, providing confidence in the robustness of the buildTokenCountStructure function.

src/core/tokenCount/types.ts (1)

1-12: Well-designed type definitions for hierarchical token count structure.

The type definitions effectively model the hierarchical nature of the token count feature:

  • FileTokenInfo provides a clean interface for individual files with token counts
  • DirectoryTokenInfo properly supports recursive directory structures with the optional directories property
  • TokenCountOutput as an array accommodates multiple root directories

The types are intuitive, properly typed, and will provide good IntelliSense support throughout the codebase.
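
For readers without the diff open, the shapes described above might look roughly like the following. Only the type names and the optional `directories` property come from the review text; the remaining field names (`name`, `tokens`, `files`) and the `sumTokens` helper are assumptions made for illustration.

```typescript
// Assumed field names; only `directories` is mentioned in the review.
interface FileTokenInfo {
  name: string;
  tokens: number;
}

interface DirectoryTokenInfo {
  name: string;
  files: FileTokenInfo[];
  directories?: DirectoryTokenInfo[]; // optional, allows arbitrary nesting
}

// An array, so several root directories can sit side by side.
type TokenCountOutput = DirectoryTokenInfo[];

// Hypothetical helper: total tokens of a directory, including nested directories.
function sumTokens(dir: DirectoryTokenInfo): number {
  const own = dir.files.reduce((total, file) => total + file.tokens, 0);
  const nested = (dir.directories ?? []).reduce((total, sub) => total + sumTokens(sub), 0);
  return own + nested;
}

const output: TokenCountOutput = [
  {
    name: '.agents',
    files: [],
    directories: [
      {
        name: 'rules',
        files: [
          { name: 'base.md', tokens: 1988 },
          { name: 'browser-extension.md', tokens: 453 },
          { name: 'website.md', tokens: 433 },
        ],
      },
    ],
  },
];
const agentsTotal = sumTokens(output[0]);
```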

tests/core/tokenCount/saveTokenCounts.test.ts (1)

1-102: Excellent test coverage and structure!

This test suite demonstrates comprehensive testing practices:

  • Proper mock setup and cleanup
  • Tests cover success path, error scenarios, empty input, and parameter variations
  • Resource cleanup verification (TokenCounter.free()) even in error cases
  • Clear test descriptions and well-structured assertions

The dependency injection testing approach using mocks is well-implemented and follows best practices.

src/core/tokenCount/saveTokenCounts.ts (1)

8-34: Well-implemented async function with proper resource management!

The implementation demonstrates excellent practices:

  • Clean function signature with appropriate default parameter
  • Proper resource management using try-finally to ensure TokenCounter cleanup
  • Progress callback integration for user feedback
  • Clear, readable code structure

The guaranteed cleanup in the finally block is particularly important for preventing resource leaks.
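
The try-finally pattern praised here can be sketched as follows. `FakeTokenCounter` is a hypothetical stand-in with a crude whitespace-based count; it is not the project's TokenCounter, and `countAll` is likewise an illustrative name.

```typescript
// Hypothetical stand-in for a tokenizer that holds resources needing explicit release.
class FakeTokenCounter {
  freed = false;

  countTokens(content: string): number {
    if (content.length === 0) {
      throw new Error('empty content');
    }
    // Crude whitespace split, not a real tokenizer.
    return content.split(/\s+/).filter(Boolean).length;
  }

  free(): void {
    this.freed = true;
  }
}

function countAll(counter: FakeTokenCounter, contents: string[]): number[] {
  try {
    return contents.map((content) => counter.countTokens(content));
  } finally {
    counter.free(); // runs whether counting succeeded or threw
  }
}
```

Because the cleanup sits in `finally`, a caller that catches the error from a failed count still finds the counter released.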

tests/core/tokenCount/displayTokenCountTree.test.ts (1)

1-178: Comprehensive and well-structured display tests!

This test suite excellently covers all aspects of the token count tree display:

  • Various tree structures (simple, nested, multiple files)
  • Edge cases (empty lists, filtering scenarios)
  • Sorting verification with proper index comparisons
  • Threshold filtering with detailed assertions
  • Proper logger mocking and output verification

The tests are particularly well-organized and demonstrate thorough understanding of the display requirements.

tests/cli/actions/defaultAction.saveTokenCounts.test.ts (1)

1-120: Excellent CLI integration test coverage!

This test suite effectively validates the integration of token count summarization into the CLI workflow:

  • Comprehensive mocking strategy isolates the integration logic
  • Tests cover various CLI option combinations (boolean/string values, custom encodings)
  • Proper verification of parameter passing to the core function
  • Both positive and negative test cases are included
  • Mock setup is realistic and consistent

The tests ensure the CLI properly bridges user options to the core functionality.

.agents/rules/base.md (1)

21-221: Comprehensive and well-structured development documentation!

The extensive updates provide excellent guidance covering:

  • Essential development commands for common workflows
  • High-level architecture with clear pipeline explanation
  • Key architectural patterns with code examples
  • Detailed coding guidelines and commit message standards
  • Important procedural instructions for AI assistants

The documentation strikes a good balance between being comprehensive and remaining actionable for developers and AI assistants working with the codebase.

src/core/tokenCount/buildTokenCountStructure.ts (1)

115-122: No change needed for root-level file handling
The current implementation intentionally represents each root-level file as a DirectoryTokenInfo with the file name and its tokens. All existing tests in buildTokenCountStructure.test.ts explicitly verify this behavior (e.g. the “nested directory structure” and “multiple root directories” cases), confirming it aligns with the intended design.

@yamadashy
Owner

Hi, @gudber!
Thank you for this great PR!
This feature will definitely make token reduction much easier.

I'll make some adjustments to the options and code structure.

gudber and others added 5 commits August 3, 2025 14:41
- Move spinner initialization to runDefaultAction for shared usage across all operations
- Extract handleTokenCountSummary method from inline code for better modularity
- Remove unused cliOptions parameters from handleStdinProcessing and handleDirectoryProcessing
- Update test signatures to match new function parameters
- Enable spinner reuse in summarizeTokenCounts functionality

This refactoring improves code organization by centralizing spinner management
and separating token count summary logic into a dedicated method.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…rint

Major restructuring of CLI options and tokenCountTree functionality:

CLI Options:
- Reorganized options into logical groups: "CLI Input/Output Options", "Repomix Output Options", and "File Selection Options"
- Moved --verbose, --quiet, --stdout, --stdin, --copy, --token-count-tree, --top-files-len to CLI I/O group
- Renamed --summarize-token-counts to --token-count-tree for clarity
- Updated README.md documentation to reflect new option organization

TokenCountTree Refactoring:
- Moved tokenCountTree functionality from saveTokenCounts.ts to cliPrint.ts as printTokenCountTree
- Deleted saveTokenCounts.ts (no longer needed)
- Removed handleTokenCountTree function from defaultAction.ts
- Integrated printTokenCountTree into printResults workflow for consistency
- Updated threshold calculation to be handled within printTokenCountTree

Configuration:
- Added tokenCountTree option to config schema with default value false
- Changed schema from z.union([z.boolean(), z.string()]) to z.union([z.boolean(), z.number()])
- CLI now converts string thresholds to numbers during buildCliConfig

Optimizations:
- Modified calculateMetrics to calculate all file tokens when tokenCountTree is enabled
- Prevents double token calculation for better performance

Display:
- Changed emoji from 📊 to 🔢 for Token Count Tree
- Updated title from "Token Count Summary" to "Token Count Tree"

Tests:
- Renamed and updated test files to match new structure
- Updated tests to work with new printTokenCountTree function signature
- All tests pass with new implementation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…functions

- Rename cliPrint.ts to cliReport.ts with all functions changed from print* to report*
- Move printResults from defaultAction to cliReport as reportResults for better organization
- Move reportTokenCountTree to cli/reporters/tokenCountTreeReporter.ts for cleaner separation
- Move test-only functions (buildTokenCountStructure, convertToOutput) from core module to test files
- Update all imports and test files to reflect new naming conventions
- Maintain all functionality while improving code organization and module cohesion

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Merge displayTokenCountTree functionality directly into reportTokenCountTree
- Remove separate displayTokenCountTree export and test file for better code organization
- Update configSchema to accept string values for tokenCountTree (boolean | number | string)
- Enhance tokenCountTreeReporter to handle string to number conversion
- Fix config tests to properly validate tokenCountTree type union

This consolidates the token count tree display logic into a single function
while maintaining all existing functionality and improving type safety.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
yamadashy and others added 2 commits August 3, 2025 16:36
…play

- Apply pc.dim() to token count parentheses for consistent styling with other CLI outputs
- Add toLocaleString() for comma-separated number formatting
- Improves readability of large token counts in tree view

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
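
As a rough illustration of the number formatting mentioned in the commit above: the commit refers to `toLocaleString()`, and this sketch pins the `'en-US'` locale (an assumption) so the output does not depend on the machine's locale. The `pc.dim()` styling is omitted, and `formatTokenCount` is a hypothetical name.

```typescript
// Hypothetical helper: formats a token count with thousands separators.
function formatTokenCount(tokens: number): string {
  // 'en-US' is pinned here for reproducible output; the commit message does not specify a locale.
  return `(${tokens.toLocaleString('en-US')} tokens)`;
}

const small = formatTokenCount(80); // "(80 tokens)"
const medium = formatTokenCount(2874); // "(2,874 tokens)"
const large = formatTokenCount(1234567); // "(1,234,567 tokens)"
```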
- Enhance token counting feature description with tree visualization details
- Expand --token-count-tree CLI option explanation with optimization use cases
- Add new "Token Count Optimization" section with practical examples and output samples
- Include threshold usage examples and token reduction strategies
- Document use cases for identifying large files, optimizing selection, and planning compression

This helps users understand how to leverage token counting for AI context optimization.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

This comment was marked as outdated.

@yamadashy
Owner

@claude review

@claude
Contributor

claude bot commented Aug 3, 2025

Claude encountered an error

Failed with exit code 128

I'll analyze this and get back to you.

yamadashy and others added 6 commits August 3, 2025 16:58
- Remove leading newline from "🔢 Token Count Tree:" output for cleaner formatting
- Update corresponding test expectations to match the new output format
- Improves consistency with other CLI output sections

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
… location

- Move test file from tests/cli/cliReport.reportTokenCountTree.test.ts to tests/cli/reporters/tokenCountTreeReporter.test.ts
- Update import paths to reflect new directory structure
- Improve test organization by co-locating test with the module it tests

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove .js extension from vi.mock('../../../src/shared/logger') for consistency
- Follow project convention for mock imports in test files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add content verification for tree structure output instead of just checking headers
- Verify actual file paths and directories are displayed in the tree
- Test that files without token counts are properly skipped
- Make tests more robust by checking actual functionality rather than just basic output

This should resolve CI test failures by properly validating the complete tree output.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove toHaveBeenCalledWith() assertions that depend on call order
- Replace with content-based checks using .some() to verify required output exists
- Make tests more resilient to output order changes in CI/CD environments
- Ensure tests focus on functionality rather than specific call sequences

This should resolve the CI test failures caused by different output ordering.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add picocolors mock following the same pattern as cliReport.test.ts
- Update test expectations to match mocked color output (e.g., DIM:────────)
- Ensure consistent test behavior across different environments
- Fix CI test failures caused by environment-specific color handling

This aligns with the existing test patterns in the codebase.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@codecov

codecov bot commented Aug 3, 2025

Codecov Report

❌ Patch coverage is 86.22449% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.56%. Comparing base (8478218) to head (57d44c9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cli/cliReport.ts | 23.07% | 20 Missing ⚠️ |
| src/cli/reporters/tokenCountTreeReporter.ts | 93.75% | 5 Missing ⚠️ |
| src/cli/actions/defaultAction.ts | 95.65% | 1 Missing ⚠️ |
| src/core/metrics/calculateMetrics.ts | 92.30% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #747      +/-   ##
==========================================
+ Coverage   88.38%   88.56%   +0.17%     
==========================================
  Files         106      109       +3     
  Lines        5940     6084     +144     
  Branches     1212     1261      +49     
==========================================
+ Hits         5250     5388     +138     
- Misses        690      696       +6     


- Add parentheses around arrow function parameters for consistency
- Apply automatic code style fixes from biome linter

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace path.sep with forward slash in buildTokenCountTree for consistent behavior
- Remove unused path import after changing to forward slash split
- Fix Windows CI test failures caused by path separator differences
- Ensure tests work consistently across Unix and Windows environments
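
Why a fixed `/` separator helps can be seen in a small sketch. It assumes file paths reach the tree builder already using forward slashes; `DirNode`, `FileTokens` and `buildTree` are illustrative names, not the PR's actual `buildTokenCountTree` implementation.

```typescript
// Illustrative node type; not the PR's actual TreeNode interface.
interface DirNode {
  tokens: number; // tokens of this file, or of everything below this directory
  children: Map<string, DirNode>;
}

interface FileTokens {
  path: string; // assumed to use '/' as the separator on every platform
  tokens: number;
}

const buildTree = (files: FileTokens[]): DirNode => {
  const rootNode: DirNode = { tokens: 0, children: new Map() };
  for (const file of files) {
    // Splitting on '/' rather than path.sep keeps the result identical on
    // Windows, where path.sep is '\\' and would leave 'src/a.ts' unsplit.
    const parts = file.path.split('/').filter((part) => part.length > 0);
    let current = rootNode;
    current.tokens += file.tokens;
    for (const part of parts) {
      let child = current.children.get(part);
      if (!child) {
        child = { tokens: 0, children: new Map() };
        current.children.set(part, child);
      }
      // Every ancestor directory accumulates the file's token count.
      child.tokens += file.tokens;
      current = child;
    }
  }
  return rootNode;
};

const sampleTree = buildTree([
  { path: 'src/cli/run.ts', tokens: 120 },
  { path: 'src/index.ts', tokens: 30 },
  { path: 'README.md', tokens: 50 },
]);
console.log(sampleTree.tokens); // 200
console.log(sampleTree.children.get('src')?.tokens); // 150
```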

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@yamadashy yamadashy requested a review from Copilot August 3, 2025 10:16
@yamadashy
Owner

/gemini review

@yamadashy
Owner

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Aug 3, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

This comment was marked as outdated.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new --token-count-tree feature to display a file tree with token counts, which is a great addition for understanding repository composition. The implementation is solid, with good refactoring of the CLI reporting logic and comprehensive test coverage for the new functionality.

I have a couple of suggestions to improve robustness:

  1. In src/cli/actions/defaultAction.ts, the use of parseInt with radix 0 should be changed to radix 10 to ensure consistent behavior.
  2. In src/cli/reporters/tokenCountTreeReporter.ts, the logic for parsing the token count threshold should be made more robust to handle invalid numeric inputs, which currently lead to a silent failure where an empty tree is displayed.

These changes will make the new feature more reliable for users. Overall, great work on this feature.
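
To illustrate the radix point: with a radix of 0, `parseInt` infers the base from the string's prefix, while radix 10 always parses decimal. The helper `parseThreshold` below is hypothetical and only sketches how a non-numeric threshold could fail loudly instead of silently yielding an empty tree.

```typescript
// With radix 0, parseInt infers the base from the string prefix.
console.log(Number.parseInt('0x10', 0)); // 16 (treated as hexadecimal)
// With radix 10, parsing stops at the first non-decimal character.
console.log(Number.parseInt('0x10', 10)); // 0

// Hypothetical helper: parse a threshold and throw on non-numeric input
// rather than passing NaN on to the reporter.
const parseThreshold = (raw: string): number => {
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value)) {
    throw new Error(`Invalid token count threshold: '${raw}'`);
  }
  return value;
};

console.log(parseThreshold('1000')); // 1000
```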

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/cli/reporters/tokenCountTreeReporter.ts (1)

48-115: Remove unused parameter and improve function clarity.

The displayNode function is well-implemented with proper recursive tree rendering. However, the _isLast parameter is unused, which suggests it was intended for different logic or is vestigial.

 const displayNode = (
   node: TreeNode,
   prefix: string,
-  _isLast: boolean,
   isRoot: boolean,
   minTokenCount: number,
 ): void => {

Also, update the recursive call:

-    displayNode(childNode as TreeNode, childPrefix, isLastEntry, false, minTokenCount);
+    displayNode(childNode as TreeNode, childPrefix, false, minTokenCount);
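
A recursive renderer without a separate `isLast` argument might look like the sketch below. It is an assumption-laden illustration, not the PR's `displayNode`: `TokenNode`, `renderChildren` and `leaf` are hypothetical names, and lines are collected in an array instead of being sent to the logger.

```typescript
// Illustrative tree shape; the PR's real TreeNode may differ.
interface TokenNode {
  tokens: number;
  children: Map<string, TokenNode>;
}

// Collects one output line per child of `node` that meets the threshold.
const renderChildren = (node: TokenNode, prefix: string, minTokens: number, out: string[]): void => {
  const visible = Array.from(node.children.entries()).filter(([, child]) => child.tokens >= minTokens);
  visible.forEach(([label, child], index) => {
    const isLastEntry = index === visible.length - 1;
    const connector = isLastEntry ? '└── ' : '├── ';
    const suffix = child.children.size > 0 ? '/' : '';
    out.push(`${prefix}${connector}${label}${suffix} (${child.tokens} tokens)`);
    // The "last entry" state is folded into the child prefix, so the
    // recursive call needs no isLast argument of its own.
    renderChildren(child, prefix + (isLastEntry ? '    ' : '│   '), minTokens, out);
  });
};

const leaf = (tokens: number): TokenNode => ({ tokens, children: new Map() });
const rulesDir: TokenNode = {
  tokens: 2874,
  children: new Map<string, TokenNode>([
    ['base.md', leaf(1988)],
    ['browser-extension.md', leaf(453)],
    ['website.md', leaf(433)],
  ]),
};
const sampleRoot: TokenNode = {
  tokens: 2954,
  children: new Map<string, TokenNode>([
    ['.agents', { tokens: 2874, children: new Map<string, TokenNode>([['rules', rulesDir]]) }],
    ['typos.toml', leaf(80)],
  ]),
};

const rendered: string[] = [];
renderChildren(sampleRoot, '', 1000, rendered);
console.log(rendered.join('\n'));
```

With a threshold of 1000, only `.agents/`, `rules/` and `base.md` remain, which mirrors the thresholded example in the PR description.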
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc82ff6 and 57d44c9.

📒 Files selected for processing (16)
  • README.md (5 hunks)
  • src/cli/actions/defaultAction.ts (5 hunks)
  • src/cli/cliReport.ts (6 hunks)
  • src/cli/cliRun.ts (2 hunks)
  • src/cli/reporters/tokenCountTreeReporter.ts (1 hunks)
  • src/cli/types.ts (1 hunks)
  • src/config/configSchema.ts (2 hunks)
  • src/core/metrics/calculateMetrics.ts (1 hunks)
  • src/core/tokenCount/buildTokenCountStructure.ts (1 hunks)
  • src/core/tokenCount/types.ts (1 hunks)
  • tests/cli/actions/defaultAction.test.ts (13 hunks)
  • tests/cli/actions/defaultAction.tokenCountTree.test.ts (1 hunks)
  • tests/cli/cliReport.test.ts (9 hunks)
  • tests/cli/reporters/tokenCountTreeReporter.test.ts (1 hunks)
  • tests/config/configSchema.test.ts (3 hunks)
  • tests/core/tokenCount/buildTokenCountStructure.test.ts (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • src/core/tokenCount/types.ts
  • tests/cli/cliReport.test.ts
  • src/config/configSchema.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/cli/types.ts
  • src/cli/cliRun.ts
  • tests/core/tokenCount/buildTokenCountStructure.test.ts
  • src/core/tokenCount/buildTokenCountStructure.ts
  • src/cli/actions/defaultAction.ts
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/cli/reporters/tokenCountTreeReporter.ts (4)
src/core/file/fileTypes.ts (1)
  • ProcessedFile (6-9)
src/config/configSchema.ts (1)
  • RepomixConfigMerged (160-160)
src/core/tokenCount/buildTokenCountStructure.ts (2)
  • FileWithTokens (3-6)
  • buildTokenCountTree (14-47)
src/shared/logger.ts (1)
  • logger (89-89)
tests/config/configSchema.test.ts (1)
src/config/configSchema.ts (1)
  • repomixConfigBaseSchema (16-69)
tests/cli/actions/defaultAction.test.ts (2)
src/cli/cliSpinner.ts (1)
  • Spinner (9-70)
src/cli/actions/defaultAction.ts (2)
  • handleStdinProcessing (71-109)
  • handleDirectoryProcessing (114-137)
🪛 markdownlint-cli2 (0.17.2)
README.md

713-713: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 LanguageTool
README.md

[style] ~916-~916: To form a complete sentence, be sure to include a subject.
Context: ...y file tree with token count summaries. Can be boolean or number (minimum token cou...

(MISSING_IT_THERE)

🔇 Additional comments (26)
src/core/metrics/calculateMetrics.ts (3)

33-35: LGTM! Clear conditional logic for token count optimization.

The comment and conditional check clearly explain when to calculate token counts for all files versus just the top files. This optimization makes sense to avoid double calculation when the token count tree feature is enabled.


37-45: Well-structured conditional file selection logic.

The implementation correctly handles both scenarios:

  • When tokenCountTree is enabled: calculates for all files to support the tree feature
  • Otherwise: optimizes by calculating only for top files by character count

The fallback logic with Math.max(topFilesLength * 10, topFilesLength) ensures a reasonable number of files are processed even when topFilesLength is small.
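
The selection described above might look roughly like this. The sketch is reconstructed only from the review text: `selectMetricsTargetPaths`, the `SizedFile` shape and sorting by content length are assumptions, not the code in `calculateMetrics.ts`. Note that for any non-negative `topFilesLength`, `Math.max(topFilesLength * 10, topFilesLength)` evaluates to `topFilesLength * 10`.

```typescript
// Hypothetical file shape for illustration.
interface SizedFile {
  path: string;
  content: string;
}

const selectMetricsTargetPaths = (
  files: SizedFile[],
  tokenCountTreeEnabled: boolean,
  topFilesLength: number,
): string[] => {
  if (tokenCountTreeEnabled) {
    // The tree needs a token count for every file.
    return files.map((file) => file.path);
  }
  // Otherwise tokenize only a candidate pool of the largest files by character count.
  const candidateCount = Math.max(topFilesLength * 10, topFilesLength);
  return files
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.content.length - a.content.length)
    .slice(0, candidateCount)
    .map((file) => file.path);
};

// 25 sample files whose content length grows with the index.
const sampleFiles: SizedFile[] = Array.from({ length: 25 }, (_, i) => ({
  path: `file${i}.ts`,
  content: 'x'.repeat(i),
}));
console.log(selectMetricsTargetPaths(sampleFiles, true, 2).length); // 25
console.log(selectMetricsTargetPaths(sampleFiles, false, 2).length); // 20
```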


47-56: Function call updated correctly with new parameter.

The calculateSelectiveFileMetrics call now uses the computed metricsTargetPaths instead of a hardcoded calculation, which aligns with the new conditional logic above.

tests/config/configSchema.test.ts (2)

24-57: Comprehensive test coverage for tokenCountTree option.

The test suite thoroughly covers the tokenCountTree option with appropriate test cases:

  • Boolean values (true/false) ✓
  • String values ✓
  • Invalid type rejection (array) ✓

The tests follow the established pattern and use descriptive names. Good validation of the union type z.union([z.boolean(), z.number(), z.string()]) from the schema.


66-66: Appropriate integration of tokenCountTree in existing tests.

The existing schema validation tests have been properly updated to include the tokenCountTree property with valid values (boolean and string), ensuring the new option is covered in comprehensive schema testing scenarios.

Also applies to: 116-116, 212-212

tests/cli/actions/defaultAction.tokenCountTree.test.ts (5)

1-51: Well-structured test setup with comprehensive mocking.

The test file follows good testing practices:

  • Proper mocking of all external dependencies
  • Clear separation of concerns with individual mock functions
  • Sensible default values in beforeEach setup
  • Clean mock reset between tests

The mock setup covers all the necessary modules and provides realistic return values for testing the tokenCountTree functionality.


53-87: Thorough test for tokenCountTree enabled scenario.

This test properly verifies that when tokenCountTree: true is provided in CLI options, the merged configuration reflects this setting and the reporting function receives the correct parameters. The test structure and assertions are comprehensive.


89-103: Good test coverage for disabled tokenCountTree.

This test ensures the default behavior (tokenCountTree: false) works correctly when the option is not provided, which is important for backward compatibility.


105-133: Multiple directories scenario properly tested.

This test validates that the tokenCountTree functionality works correctly with multiple input directories, ensuring the feature scales appropriately.


135-164: Threshold parameter handling tested correctly.

This test verifies that string threshold values (like '50') are properly converted to numbers (50) in the merged configuration, which aligns with the schema's union type support.

tests/cli/reporters/tokenCountTreeReporter.test.ts (5)

1-25: Excellent mocking strategy for testing reporter output.

The test setup properly mocks both the logger and picocolors dependencies, allowing for precise verification of the output format and content. The mock implementations provide predictable color/formatting prefixes that make assertions clear and reliable.


27-53: Comprehensive test for basic token count tree display.

This test verifies the core functionality by checking that:

  • The header and separator are displayed
  • File paths and directories are included in the output
  • The logger is called with expected content

The test data is realistic and the assertions are thorough.


55-68: Good edge case handling for empty file list.

This test ensures the reporter gracefully handles the edge case of no files being processed, displaying an appropriate "No files found" message. This is important for user experience.


70-92: Proper threshold functionality testing.

This test validates that the minimum token count threshold is correctly communicated to users through the display message, ensuring transparency about filtering behavior.


94-121: Important test for handling missing token counts.

This test verifies that files without token count data are properly skipped, which is crucial for robustness. The test confirms that the reporter doesn't break when some files lack token count information.

tests/cli/actions/defaultAction.test.ts (3)

26-40: Well-implemented mock spinner with comprehensive method coverage.

The mockSpinner object properly mocks all the essential Spinner methods (start, update, succeed, fail, stop) and properties. The mocking of the cliSpinner module ensures that any new Spinner() instantiation returns the mock instance.


47-51: Proper mock setup in beforeEach hook.

The beforeEach hook correctly resets mocks and ensures the Spinner constructor returns the mockSpinner instance, providing clean test isolation.


668-668: Consistent refactoring of function calls throughout test suite.

All calls to handleStdinProcessing and handleDirectoryProcessing have been consistently updated to pass mockSpinner instead of the previous parameter. This refactoring is thorough and maintains the test logic while adapting to the new function signatures.

Also applies to: 674-674, 685-685, 700-700, 719-719, 731-731, 743-743, 799-799, 816-816, 826-826, 835-835, 863-863

src/cli/reporters/tokenCountTreeReporter.ts (1)

13-46: LGTM! Well-structured token count extraction and display logic.

The function correctly handles different config types for tokenCountTree and properly filters files with token counts. The output formatting with emoji and styling is consistent with the rest of the application.

Consider a minor improvement for readability:

-  // Display the token count tree
   logger.log('🔢 Token Count Tree:');
   logger.log(pc.dim('────────────────────'));
README.md (4)

510-518: LGTM! Clear documentation of new CLI options.

The reorganization into "CLI Input/Output Options" improves clarity and the --token-count-tree option is well-documented with its optional threshold parameter.


703-736: Excellent documentation for the new token count feature.

This section provides comprehensive coverage of the --token-count-tree feature with clear examples, practical use cases, and benefits. The sample output accurately represents the tree format generated by the code.


916-916: Configuration documentation is accurate and complete.

The output.tokenCountTree option is properly documented with correct type information (boolean or number threshold) and appropriate default value.


537-537: Good improvement in section naming.

"File Selection Options" is more descriptive and accurate than the previous "Filter Options" naming.

src/cli/cliReport.ts (3)

9-37: Excellent refactoring with improved separation of concerns.

The new reportResults function provides a clean orchestration of the reporting workflow with proper conditional logic and spacing. The integration of token count tree reporting is well-positioned in the sequence.


25-28: Clean integration of token count tree reporting.

The conditional reporting based on config.output.tokenCountTree follows the established pattern and maintains consistency with other reporting sections.


74-146: Consistent function renaming improves code clarity.

The rename from print* to report* functions is well-executed throughout the file. Comments and formatting adjustments maintain consistency and improve readability.

- Add explicit radix 10 to Number.parseInt() for consistent parsing
- Add validation to throw error for non-numeric string inputs
- Export TreeNode interface to eliminate code duplication
- Remove unused rootDirs parameter from test helper
- Remove unused isLast parameter from displayNode function
- Simplify token count threshold handling in reporter

These changes address review feedback to improve robustness and maintainability.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
CLI options use string type for numeric values as they come from commander.js parsing.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@yamadashy yamadashy requested a review from Copilot August 3, 2025 11:52
@yamadashy
Owner

/gemini review

Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces a new --token-count-tree parameter that displays hierarchical token count summaries for repository files, helping users understand token distribution across their codebase for AI context optimization.

Key changes:

  • Adds --token-count-tree CLI option with optional threshold parameter
  • Implements token count tree visualization with directory structure
  • Integrates token counting into metrics calculation for all files when feature is enabled

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
|---|---|
| src/cli/types.ts | Adds tokenCountTree option to CliOptions interface |
| src/config/configSchema.ts | Adds tokenCountTree field to configuration schema |
| src/cli/cliRun.ts | Adds CLI option definition and reorganizes option groups |
| src/cli/actions/defaultAction.ts | Implements CLI config parsing and refactors to use centralized reporting |
| src/cli/cliReport.ts | Refactors from cliPrint to cliReport with centralized result reporting |
| src/core/tokenCount/types.ts | Defines TypeScript interfaces for token count data structures |
| src/core/tokenCount/buildTokenCountStructure.ts | Implements tree building logic for hierarchical token counting |
| src/cli/reporters/tokenCountTreeReporter.ts | Implements tree visualization and display logic |
| src/core/metrics/calculateMetrics.ts | Modifies metrics calculation to include all files when tree feature is enabled |
| README.md | Documents the new feature with usage examples and configuration details |
| tests/ | Comprehensive test coverage for all new functionality |
Comments suppressed due to low confidence (2)

src/cli/reporters/tokenCountTreeReporter.ts:30

  • [nitpick] The emoji and title don't match the PR description example which shows '📊 Token Count Summary:'. Consider using consistent terminology and emoji between the code and documentation.
  logger.log('🔢 Token Count Tree:');

tests/cli/actions/defaultAction.buildCliConfig.test.ts:85

  • The test expects parseInt behavior (10.5 -> 10) but this may not be the intended behavior for a threshold parameter. Consider testing whether this behavior is actually desired or if it should be treated as an error.
      // parseInt returns only the integer part

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a very useful feature, --token-count-tree, to visualize token counts in a file tree, which is great for managing context for AI models. The implementation is solid, with good refactoring of the CLI reporting logic and comprehensive test coverage for the new functionality.

I've identified a few areas for improvement:

  • The parsing of the numeric threshold for --token-count-tree is a bit too lenient and could lead to unexpected behavior. I've suggested a stricter implementation.
  • There's some code duplication in the new reporter file that can be simplified.
  • A couple of new types seem to be unused in the application code.
  • A minor formatting issue in the README.md table.

Overall, this is a great addition to the tool. Addressing these points will improve the robustness and maintainability of the code.

- Move string-to-number parsing from defaultAction to Commander.js option parser
- Update CLI types to reflect Commander.js parsed values (boolean | number)
- Simplify buildCliConfig by removing redundant parsing logic
- Update tests to match new type expectations

This approach follows the same pattern as --top-files-len and centralizes
input validation at the CLI parsing level where it belongs.
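
Parsing at the CLI level could be sketched as below. Commander.js accepts a custom processing function as the third argument to `.option()`; the function names, the strict regex, and the wiring shown in the comment are assumptions for illustration, not the code merged in this PR.

```typescript
// Hypothetical strict parser: accepts only whole, non-negative decimal
// integers, so '10.5' or '12abc' are rejected rather than truncated.
const parseTokenCountThreshold = (value: string): number => {
  if (!/^\d+$/.test(value)) {
    throw new Error(`Invalid threshold '${value}': expected a non-negative integer`);
  }
  return Number.parseInt(value, 10);
};

// Assumed wiring (not executed here):
//   program.option('--token-count-tree [threshold]', 'description', parseTokenCountThreshold)

// For an option with an optional value, the parsed result is boolean | number:
// false when the flag is absent, true when it is given bare, and a number
// when a threshold follows it.
type TokenCountTreeOption = boolean | number;

const resolveOption = (raw: string | undefined, flagPresent: boolean): TokenCountTreeOption => {
  if (!flagPresent) return false;
  return raw === undefined ? true : parseTokenCountThreshold(raw);
};

console.log(resolveOption(undefined, false)); // false
console.log(resolveOption(undefined, true)); // true
console.log(resolveOption('50', true)); // 50
```

Validating once at the parsing boundary means downstream code such as the config builder and the reporter only ever sees `boolean | number`.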

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix radix parameter in Number.parseInt from 0 to 10 for consistent parsing
- Add safety checks for malformed file paths in buildTokenCountTree

These changes improve input validation and prevent potential parsing issues.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@yamadashy
Owner

@gudber
I've made some adjustments:

  • Renamed to --token-count-tree
  • Added config file support
  • Updated README
  • General refactoring

This feature will be super helpful for the community.

Thanks for this great contribution! Merging now 🎉

@yamadashy yamadashy merged commit 22b9d3c into yamadashy:main Aug 3, 2025
50 checks passed
@yamadashy
Owner

@gudber
This feature has been released in v1.3.0!
https://github.com/yamadashy/repomix/releases/tag/v1.3.0

Thank you for your contribution!

@gudber
Contributor Author

gudber commented Aug 21, 2025

Awesome @yamadashy, glad to help. I use this a lot: I feed the output into an LLM and ask it to generate a .repomixignore file that gets a repo under 1M tokens, for example (Gemini Pro 2.5, Sonnet 4), using a prompt like:

"Use this Repomix token summary of each file in a software repo to write me a .repomixignore file to bring the token count under 1 million. Focus on eliminating things like tests, benchmarks, i18n files, JSON data files, CSS, media and other assets, and extra fluff, but try to keep the documentation and most of the core functionality code files of the repo."

@yamadashy
Owner

Wow, that's a great use case!
Now that we can see token counts in a tree view, I'm also finding it much easier to manage my ignore files!
Thank you for your contribution!
