feat: Add parameter --summarize-token-counts #747
Walkthrough
This update introduces a hierarchical token count tree feature to the CLI tool, allowing users to visualize token usage per directory and file. It reorganizes and extends CLI options, updates configuration schemas, centralizes reporting logic, and adds new modules and tests for token count tree generation and display. Documentation is updated accordingly.
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI
participant Config
participant Metrics
participant Reporter
User->>CLI: Run CLI with --token-count-tree
CLI->>Config: Load and merge config (includes tokenCountTree)
CLI->>Metrics: Process files & calculate token counts (all files if tokenCountTree)
CLI->>Reporter: reportResults (includes reportTokenCountTree)
Reporter->>Reporter: Build and display token count tree
Reporter->>User: Output hierarchical token count summary
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Summary of Changes
Hello @gudber, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the repomix CLI by adding a new feature that provides a hierarchical summary of token counts across the repository's file structure. This allows users to quickly understand the token distribution within their codebase, with an option to filter results by a minimum token count.
Highlights
- New CLI Parameter: I've introduced a new command-line parameter, `--summarize-token-counts`, to the `repomix` tool. This allows users to generate a detailed summary of token counts for their repository's file tree.
- Hierarchical Token Count Display: The new feature calculates and presents token counts for individual files and their parent directories in a clear, tree-like structure, providing an intuitive overview of token distribution.
- Configurable Threshold Filtering: Users can optionally specify a minimum token count threshold with the `--summarize-token-counts` parameter. This filters the output, displaying only files and directories that meet or exceed the specified token count, helping to focus on significant parts of the codebase.
- Modular Implementation: The token summarization logic is implemented in new, dedicated modules responsible for building the hierarchical data structure, calculating token sums for directories, and rendering the output to the console.
- Comprehensive Test Coverage: New unit tests have been added to validate the CLI integration, the correctness of the token tree building algorithm, the display formatting, and the threshold filtering functionality.
Actionable comments posted: 5
🧹 Nitpick comments (5)
src/cli/actions/defaultAction.ts (2)
103-115: Good integration with minor suggestion for robustness.
The token count summarization integration is well-implemented with proper conditional execution and progress reporting.
Consider adding validation for the threshold parsing to handle invalid input gracefully:
    - const threshold =
    -   typeof cliOptions.summarizeTokenCounts === 'string' ? Number.parseInt(cliOptions.summarizeTokenCounts, 10) : 0;
    + const threshold =
    +   typeof cliOptions.summarizeTokenCounts === 'string'
    +     ? Math.max(0, Number.parseInt(cliOptions.summarizeTokenCounts, 10) || 0)
    +     : 0;
This ensures that invalid numeric strings default to 0 and negative values are clamped to 0.
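The clamping behaviour suggested above can be sketched as a small standalone helper. This is only an illustration: `parseThreshold` is a hypothetical name, not a function from the PR, and it assumes the raw option arrives as `boolean | string | undefined`, as the `summarizeTokenCounts?: boolean | string` type implies.

```typescript
// Hypothetical helper (not part of the PR): normalizes the raw CLI option
// value into a non-negative integer threshold.
function parseThreshold(raw: boolean | string | undefined): number {
  // A bare flag (true), false, or a missing option means "no threshold".
  if (typeof raw !== "string") {
    return 0;
  }
  // Always pass radix 10 so the string is read as a decimal number.
  const parsed = Number.parseInt(raw, 10);
  // Non-numeric input falls back to 0; negative values are clamped to 0.
  return Number.isNaN(parsed) ? 0 : Math.max(0, parsed);
}

console.log(parseThreshold("50"), parseThreshold("abc"), parseThreshold("-5"));
```

Falling back to 0 keeps the tree unfiltered on bad input, which is one possible policy; rejecting the input with an error would be an equally reasonable choice.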
152-164: Consider extracting the duplicated token summarization logic.
The implementation is correct and consistent with the stdin workflow, but there's code duplication. Consider extracting this into a helper function:
    const handleTokenCountSummary = async (
      cliOptions: CliOptions,
      packResult: PackResult,
      config: RepomixConfigMerged,
      spinner: Spinner,
    ): Promise<void> => {
      if (cliOptions.summarizeTokenCounts) {
        const threshold =
          typeof cliOptions.summarizeTokenCounts === 'string'
            ? Math.max(0, Number.parseInt(cliOptions.summarizeTokenCounts, 10) || 0)
            : 0;
        await summarizeTokenCounts(
          packResult.processedFiles,
          config.tokenCount.encoding as TiktokenEncoding,
          (message) => spinner.update(message),
          threshold,
        );
      }
    };
Then call this helper in both `handleStdinProcessing` and `handleDirectoryProcessing`.
.agents/rules/base.md (1)
1-4: File header inconsistency needs clarification.
The file starts with "# CLAUDE.md" but the filename is `.agents/rules/base.md`. This mismatch could cause confusion about the file's purpose and location.
Consider updating the header to match the actual filename:
    -# CLAUDE.md
    +# Base Rules for Repomix Development
src/core/tokenCount/displayTokenCountTree.ts (2)
22-83: Consider breaking down the complex displayNode function.
The `displayNode` function handles multiple responsibilities: filtering, sorting, formatting, and recursive traversal. This makes it harder to test and maintain.
Consider extracting helper functions:
    +const filterAndSortEntries = (node: TreeNode, minTokenCount: number) => {
    +  const allEntries = Object.entries(node).filter(
    +    ([key, value]) => !key.startsWith('_') && value && typeof value === 'object' && !Array.isArray(value),
    +  );
    +
    +  const entries = allEntries.filter(([, value]) => {
    +    const tokenSum = (value as TreeNode)._tokenSum || 0;
    +    return tokenSum >= minTokenCount;
    +  });
    +
    +  const allFiles = node._files || [];
    +  const files = allFiles.filter((file) => file.tokens >= minTokenCount);
    +
    +  entries.sort(([a], [b]) => a.localeCompare(b));
    +  files.sort((a, b) => a.name.localeCompare(b.name));
    +
    +  return { entries, files };
    +};
    +
    +const formatTreeLine = (prefix: string, connector: string, name: string, tokenInfo: string, isRoot: boolean) => {
    +  return isRoot && prefix === '' ? `${connector}${name} ${tokenInfo}` : `${prefix}${connector}${name} ${tokenInfo}`;
    +};
48-52: Simplify root handling logic.
The repeated `isRoot && prefix === ''` checks can be simplified by extracting the prefix calculation.
    +const getDisplayPrefix = (prefix: string, isRoot: boolean) => isRoot && prefix === '' ? '' : prefix;

    -  if (isRoot && prefix === '') {
    -    logger.log(`${connector}${file.name} ${tokenInfo}`);
    -  } else {
    -    logger.log(`${prefix}${connector}${file.name} ${tokenInfo}`);
    -  }
    +  const displayPrefix = getDisplayPrefix(prefix, isRoot);
    +  logger.log(`${displayPrefix}${connector}${file.name} ${tokenInfo}`);
Also applies to: 62-66
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .agents/rules/base.md (4 hunks)
- src/cli/actions/defaultAction.ts (4 hunks)
- src/cli/cliRun.ts (1 hunks)
- src/cli/types.ts (1 hunks)
- src/core/tokenCount/buildTokenCountStructure.ts (1 hunks)
- src/core/tokenCount/displayTokenCountTree.ts (1 hunks)
- src/core/tokenCount/saveTokenCounts.ts (1 hunks)
- src/core/tokenCount/types.ts (1 hunks)
- tests/cli/actions/defaultAction.saveTokenCounts.test.ts (1 hunks)
- tests/core/tokenCount/buildTokenCountStructure.test.ts (1 hunks)
- tests/core/tokenCount/displayTokenCountTree.test.ts (1 hunks)
- tests/core/tokenCount/saveTokenCounts.test.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
PR: yamadashy/repomix#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-18T15:12:57.169Z
Learning: .agents/rules/base.md
Learnt from: CR
PR: yamadashy/repomix#0
File: .cursorrules:0-0
Timestamp: 2025-06-30T16:07:18.316Z
Learning: Applies to .agents/rules/base.md : Check the rules written in `.agents/rules/base.md` as they contain important project-specific guidelines and instructions.
.agents/rules/base.md (2)
Learnt from: CR
PR: yamadashy/repomix#0
File: .cursorrules:0-0
Timestamp: 2025-06-30T16:07:18.316Z
Learning: Applies to .agents/rules/base.md : Check the rules written in .agents/rules/base.md as they contain important project-specific guidelines and instructions.
Learnt from: CR
PR: yamadashy/repomix#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-18T15:12:57.169Z
Learning: .agents/rules/base.md
🧬 Code Graph Analysis (7)
tests/core/tokenCount/buildTokenCountStructure.test.ts (1)
- src/core/tokenCount/buildTokenCountStructure.ts (2): FileWithTokens (4-7), buildTokenCountStructure (69-72)
tests/cli/actions/defaultAction.saveTokenCounts.test.ts (3)
- src/core/tokenCount/saveTokenCounts.ts (1): summarizeTokenCounts (8-34)
- src/cli/types.ts (1): CliOptions (4-58)
- src/cli/actions/defaultAction.ts (1): runDefaultAction (28-57)
src/core/tokenCount/displayTokenCountTree.ts (2)
- src/core/tokenCount/buildTokenCountStructure.ts (2): FileWithTokens (4-7), buildTokenCountTree (15-47)
- src/shared/logger.ts (1): logger (89-89)
src/core/tokenCount/saveTokenCounts.ts (3)
- src/core/file/fileTypes.ts (1): ProcessedFile (6-9)
- src/core/tokenCount/buildTokenCountStructure.ts (1): FileWithTokens (4-7)
- src/core/tokenCount/displayTokenCountTree.ts (1): displayTokenCountTree (10-20)
src/core/tokenCount/buildTokenCountStructure.ts (1)
- src/core/tokenCount/types.ts (3): FileTokenInfo (1-4), TokenCountOutput (12-12), DirectoryTokenInfo (6-10)
src/cli/actions/defaultAction.ts (1)
- src/core/tokenCount/saveTokenCounts.ts (1): summarizeTokenCounts (8-34)
tests/core/tokenCount/saveTokenCounts.test.ts (3)
- src/core/tokenCount/displayTokenCountTree.ts (1): displayTokenCountTree (10-20)
- src/core/file/fileTypes.ts (1): ProcessedFile (6-9)
- src/core/tokenCount/saveTokenCounts.ts (1): summarizeTokenCounts (8-34)
🔇 Additional comments (10)
src/cli/cliRun.ts (1)
104-107: LGTM! Well-structured CLI option implementation.
The new `--summarize-token-counts` option is properly integrated with clear documentation and appropriate optional parameter handling. The placement in the "Token Count Options" group maintains good CLI organization.
src/cli/types.ts (1)
49-49: LGTM! Appropriate type definition for the CLI option.
The `summarizeTokenCounts?: boolean | string` type correctly represents the dual nature of the CLI option (flag or threshold value) and follows established naming conventions.
tests/core/tokenCount/buildTokenCountStructure.test.ts (1)
1-108: Excellent test coverage for the token count structure builder.
The test suite comprehensively covers various scenarios including:
- Simple and nested directory structures
- Multiple root directories
- Empty input handling
- File name collisions in different directories
Each test case is well-structured with clear input/output validation, providing confidence in the robustness of the `buildTokenCountStructure` function.
src/core/tokenCount/types.ts (1)
1-12: Well-designed type definitions for hierarchical token count structure.
The type definitions effectively model the hierarchical nature of the token count feature:
- `FileTokenInfo` provides a clean interface for individual files with token counts
- `DirectoryTokenInfo` properly supports recursive directory structures with the optional `directories` property
- `TokenCountOutput` as an array accommodates multiple root directories
The types are intuitive, properly typed, and will provide good IntelliSense support throughout the codebase.
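For readers without the diff at hand, the shape described above might look roughly like the following. Only the three type names and the optional `directories` property come from the review; the remaining field names (`name`, `tokens`, `files`) and the `sumDirectoryTokens` helper are assumptions made for illustration.

```typescript
// Assumed field names; the review only confirms the type names and
// the optional `directories` property.
interface FileTokenInfo {
  name: string;
  tokens: number;
}

interface DirectoryTokenInfo {
  name: string;
  files: FileTokenInfo[];
  directories?: DirectoryTokenInfo[]; // recursive, optional
}

// An array, so several root directories can be represented.
type TokenCountOutput = DirectoryTokenInfo[];

// Hypothetical helper: total tokens of a directory including nested ones.
function sumDirectoryTokens(dir: DirectoryTokenInfo): number {
  const own = dir.files.reduce((acc, file) => acc + file.tokens, 0);
  const nested = (dir.directories ?? []).reduce((acc, child) => acc + sumDirectoryTokens(child), 0);
  return own + nested;
}

const example: TokenCountOutput = [
  {
    name: "src",
    files: [{ name: "index.ts", tokens: 120 }],
    directories: [
      {
        name: "core",
        files: [
          { name: "a.ts", tokens: 30 },
          { name: "b.ts", tokens: 50 },
        ],
      },
    ],
  },
  { name: "tests", files: [{ name: "index.test.ts", tokens: 40 }] },
];

console.log(example.map((dir) => `${dir.name}: ${sumDirectoryTokens(dir)}`));
```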
tests/core/tokenCount/saveTokenCounts.test.ts (1)
1-102: Excellent test coverage and structure!
This test suite demonstrates comprehensive testing practices:
- Proper mock setup and cleanup
- Tests cover success path, error scenarios, empty input, and parameter variations
- Resource cleanup verification (TokenCounter.free()) even in error cases
- Clear test descriptions and well-structured assertions
The dependency injection testing approach using mocks is well-implemented and follows best practices.
src/core/tokenCount/saveTokenCounts.ts (1)
8-34: Well-implemented async function with proper resource management!
The implementation demonstrates excellent practices:
- Clean function signature with appropriate default parameter
- Proper resource management using try-finally to ensure TokenCounter cleanup
- Progress callback integration for user feedback
- Clear, readable code structure
The guaranteed cleanup in the finally block is particularly important for preventing resource leaks.
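The try-finally cleanup pattern praised here can be illustrated with a minimal, synchronous sketch. `FakeTokenCounter` and `countAll` are hypothetical stand-ins, not the PR's `TokenCounter` or `summarizeTokenCounts`; the whitespace-based counting is a placeholder for real tokenization.

```typescript
// Hypothetical stand-in for a counter that holds a resource needing release.
class FakeTokenCounter {
  freed = false;

  countTokens(content: string): number {
    if (content.length === 0) {
      throw new Error("empty content");
    }
    // Placeholder tokenization: count whitespace-separated words.
    return content.split(/\s+/).filter(Boolean).length;
  }

  free(): void {
    this.freed = true;
  }
}

function countAll(counter: FakeTokenCounter, contents: string[]): number[] {
  try {
    return contents.map((content) => counter.countTokens(content));
  } finally {
    // Runs on both the success path and the error path.
    counter.free();
  }
}

const counter = new FakeTokenCounter();
console.log(countAll(counter, ["a b c", "d e"]), counter.freed);
```

Because the `finally` block runs even when `countTokens` throws, the caller never has to remember to release the counter on error paths.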
tests/core/tokenCount/displayTokenCountTree.test.ts (1)
1-178: Comprehensive and well-structured display tests!
This test suite excellently covers all aspects of the token count tree display:
- Various tree structures (simple, nested, multiple files)
- Edge cases (empty lists, filtering scenarios)
- Sorting verification with proper index comparisons
- Threshold filtering with detailed assertions
- Proper logger mocking and output verification
The tests are particularly well-organized and demonstrate thorough understanding of the display requirements.
tests/cli/actions/defaultAction.saveTokenCounts.test.ts (1)
1-120: Excellent CLI integration test coverage!
This test suite effectively validates the integration of token count summarization into the CLI workflow:
- Comprehensive mocking strategy isolates the integration logic
- Tests cover various CLI option combinations (boolean/string values, custom encodings)
- Proper verification of parameter passing to the core function
- Both positive and negative test cases are included
- Mock setup is realistic and consistent
The tests ensure the CLI properly bridges user options to the core functionality.
.agents/rules/base.md (1)
21-221: Comprehensive and well-structured development documentation!
The extensive updates provide excellent guidance covering:
- Essential development commands for common workflows
- High-level architecture with clear pipeline explanation
- Key architectural patterns with code examples
- Detailed coding guidelines and commit message standards
- Important procedural instructions for AI assistants
The documentation strikes a good balance between being comprehensive and remaining actionable for developers and AI assistants working with the codebase.
src/core/tokenCount/buildTokenCountStructure.ts (1)
115-122: No change needed for root-level file handling
The current implementation intentionally represents each root-level file as a `DirectoryTokenInfo` with the file name and its tokens. All existing tests in `buildTokenCountStructure.test.ts` explicitly verify this behavior (e.g. the "nested directory structure" and "multiple root directories" cases), confirming it aligns with the intended design.
Hi, @gudber! I'll make some adjustments to the options and code structure.
- Move spinner initialization to runDefaultAction for shared usage across all operations
- Extract handleTokenCountSummary method from inline code for better modularity
- Remove unused cliOptions parameters from handleStdinProcessing and handleDirectoryProcessing
- Update test signatures to match new function parameters
- Enable spinner reuse in summarizeTokenCounts functionality
This refactoring improves code organization by centralizing spinner management and separating token count summary logic into a dedicated method.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
…rint
Major restructuring of CLI options and tokenCountTree functionality:
CLI Options:
- Reorganized options into logical groups: "CLI Input/Output Options", "Repomix Output Options", and "File Selection Options"
- Moved --verbose, --quiet, --stdout, --stdin, --copy, --token-count-tree, --top-files-len to CLI I/O group
- Renamed --summarize-token-counts to --token-count-tree for clarity
- Updated README.md documentation to reflect new option organization
TokenCountTree Refactoring:
- Moved tokenCountTree functionality from saveTokenCounts.ts to cliPrint.ts as printTokenCountTree
- Deleted saveTokenCounts.ts (no longer needed)
- Removed handleTokenCountTree function from defaultAction.ts
- Integrated printTokenCountTree into printResults workflow for consistency
- Updated threshold calculation to be handled within printTokenCountTree
Configuration:
- Added tokenCountTree option to config schema with default value false
- Changed schema from z.union([z.boolean(), z.string()]) to z.union([z.boolean(), z.number()])
- CLI now converts string thresholds to numbers during buildCliConfig
Optimizations:
- Modified calculateMetrics to calculate all file tokens when tokenCountTree is enabled
- Prevents double token calculation for better performance
Display:
- Changed emoji from 📊 to 🔢 for Token Count Tree
- Updated title from "Token Count Summary" to "Token Count Tree"
Tests:
- Renamed and updated test files to match new structure
- Updated tests to work with new printTokenCountTree function signature
- All tests pass with new implementation
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
…functions
- Rename cliPrint.ts to cliReport.ts with all functions changed from print* to report*
- Move printResults from defaultAction to cliReport as reportResults for better organization
- Move reportTokenCountTree to cli/reporters/tokenCountTreeReporter.ts for cleaner separation
- Move test-only functions (buildTokenCountStructure, convertToOutput) from core module to test files
- Update all imports and test files to reflect new naming conventions
- Maintain all functionality while improving code organization and module cohesion
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Merge displayTokenCountTree functionality directly into reportTokenCountTree
- Remove separate displayTokenCountTree export and test file for better code organization
- Update configSchema to accept string values for tokenCountTree (boolean | number | string)
- Enhance tokenCountTreeReporter to handle string to number conversion
- Fix config tests to properly validate tokenCountTree type union
This consolidates the token count tree display logic into a single function while maintaining all existing functionality and improving type safety.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
…play
- Apply pc.dim() to token count parentheses for consistent styling with other CLI outputs
- Add toLocaleString() for comma-separated number formatting
- Improves readability of large token counts in tree view
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
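As a rough illustration of the formatting change in this commit, the sketch below formats a token count with thousands separators and passes it through a dimming function. `formatTokenInfo` is a hypothetical name, the `"(N tokens)"` layout is an assumption about the output, and the `dim` parameter is a stand-in for picocolors' `pc.dim` so the example runs without dependencies. The commit calls `toLocaleString()` without arguments; the explicit `"en-US"` locale here only makes the output deterministic.

```typescript
// Hypothetical formatter; `dim` stands in for picocolors' pc.dim.
function formatTokenInfo(tokens: number, dim: (text: string) => string = (text) => text): string {
  // "en-US" pins the grouping separator to a comma regardless of host locale.
  return dim(`(${tokens.toLocaleString("en-US")} tokens)`);
}

console.log(formatTokenInfo(1234567));
console.log(formatTokenInfo(999, (text) => `DIM:${text}`));
```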
- Enhance token counting feature description with tree visualization details
- Expand --token-count-tree CLI option explanation with optimization use cases
- Add new "Token Count Optimization" section with practical examples and output samples
- Include threshold usage examples and token reduction strategies
- Document use cases for identifying large files, optimizing selection, and planning compression
This helps users understand how to leverage token counting for AI context optimization.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
@claude review
Claude encountered an error. I'll analyze this and get back to you.
- Remove leading newline from "🔢 Token Count Tree:" output for cleaner formatting
- Update corresponding test expectations to match the new output format
- Improves consistency with other CLI output sections
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
… location
- Move test file from tests/cli/cliReport.reportTokenCountTree.test.ts to tests/cli/reporters/tokenCountTreeReporter.test.ts
- Update import paths to reflect new directory structure
- Improve test organization by co-locating test with the module it tests
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove .js extension from vi.mock('../../../src/shared/logger') for consistency
- Follow project convention for mock imports in test files
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add content verification for tree structure output instead of just checking headers
- Verify actual file paths and directories are displayed in the tree
- Test that files without token counts are properly skipped
- Make tests more robust by checking actual functionality rather than just basic output
This should resolve CI test failures by properly validating the complete tree output.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove toHaveBeenCalledWith() assertions that depend on call order
- Replace with content-based checks using .some() to verify required output exists
- Make tests more resilient to output order changes in CI/CD environments
- Ensure tests focus on functionality rather than specific call sequences
This should resolve the CI test failures caused by different output ordering.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add picocolors mock following the same pattern as cliReport.test.ts
- Update test expectations to match mocked color output (e.g., DIM:────────)
- Ensure consistent test behavior across different environments
- Fix CI test failures caused by environment-specific color handling
This aligns with the existing test patterns in the codebase.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #747 +/- ##
==========================================
+ Coverage 88.38% 88.56% +0.17%
==========================================
Files 106 109 +3
Lines 5940 6084 +144
Branches 1212 1261 +49
==========================================
+ Hits 5250 5388 +138
- Misses 690 696 +6
- Add parentheses around arrow function parameters for consistency
- Apply automatic code style fixes from biome linter
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace path.sep with forward slash in buildTokenCountTree for consistent behavior
- Remove unused path import after changing to forward slash split
- Fix Windows CI test failures caused by path separator differences
- Ensure tests work consistently across Unix and Windows environments
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
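To make the separator change concrete, here is a minimal sketch of building a token tree by splitting paths on a literal forward slash. It is not the PR's `buildTokenCountTree`: the `FileWithTokens` field names are assumed, and the node shape uses a `Map` plus `files`/`tokenSum` fields instead of the `_files`/`_tokenSum` keys quoted in the earlier review.

```typescript
// Assumed input shape; only the type name appears in the review.
interface FileWithTokens {
  path: string;
  tokens: number;
}

// Hypothetical node shape for this sketch.
interface TreeNode {
  files: { name: string; tokens: number }[];
  dirs: Map<string, TreeNode>;
  tokenSum: number;
}

const newNode = (): TreeNode => ({ files: [], dirs: new Map(), tokenSum: 0 });

function buildTree(files: FileWithTokens[]): TreeNode {
  const root = newNode();
  for (const file of files) {
    // Split on "/" rather than path.sep so the result is the same on
    // Windows and Unix, assuming the input paths are already "/"-separated.
    const parts = file.path.split("/");
    const fileName = parts.pop() ?? file.path;
    let node = root;
    node.tokenSum += file.tokens;
    for (const part of parts) {
      let child = node.dirs.get(part);
      if (!child) {
        child = newNode();
        node.dirs.set(part, child);
      }
      // Every ancestor directory accumulates the file's tokens.
      child.tokenSum += file.tokens;
      node = child;
    }
    node.files.push({ name: fileName, tokens: file.tokens });
  }
  return root;
}

const tree = buildTree([
  { path: "src/a.ts", tokens: 10 },
  { path: "src/core/b.ts", tokens: 20 },
  { path: "README.md", tokens: 5 },
]);
console.log(tree.tokenSum, tree.dirs.get("src")?.tokenSum);
```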
/gemini review
@coderabbitai review
✅ Actions performed
Review triggered.
Code Review
This pull request introduces a new --token-count-tree feature to display a file tree with token counts, which is a great addition for understanding repository composition. The implementation is solid, with good refactoring of the CLI reporting logic and comprehensive test coverage for the new functionality.
I have a couple of suggestions to improve robustness:
- In `src/cli/actions/defaultAction.ts`, the use of `parseInt` with radix `0` should be changed to radix `10` to ensure consistent behavior.
- In `src/cli/reporters/tokenCountTreeReporter.ts`, the logic for parsing the token count threshold should be made more robust to handle invalid numeric inputs, which currently lead to a silent failure where an empty tree is displayed.
These changes will make the new feature more reliable for users. Overall, great work on this feature.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/cli/reporters/tokenCountTreeReporter.ts (1)
48-115: Remove unused parameter and improve function clarity.
The `displayNode` function is well-implemented with proper recursive tree rendering. However, the `_isLast` parameter is unused, which suggests it was intended for different logic or is vestigial.
     const displayNode = (
       node: TreeNode,
       prefix: string,
    -  _isLast: boolean,
       isRoot: boolean,
       minTokenCount: number,
     ): void => {
Also, update the recursive call:
    - displayNode(childNode as TreeNode, childPrefix, isLastEntry, false, minTokenCount);
    + displayNode(childNode as TreeNode, childPrefix, false, minTokenCount);
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- README.md (5 hunks)
- src/cli/actions/defaultAction.ts (5 hunks)
- src/cli/cliReport.ts (6 hunks)
- src/cli/cliRun.ts (2 hunks)
- src/cli/reporters/tokenCountTreeReporter.ts (1 hunks)
- src/cli/types.ts (1 hunks)
- src/config/configSchema.ts (2 hunks)
- src/core/metrics/calculateMetrics.ts (1 hunks)
- src/core/tokenCount/buildTokenCountStructure.ts (1 hunks)
- src/core/tokenCount/types.ts (1 hunks)
- tests/cli/actions/defaultAction.test.ts (13 hunks)
- tests/cli/actions/defaultAction.tokenCountTree.test.ts (1 hunks)
- tests/cli/cliReport.test.ts (9 hunks)
- tests/cli/reporters/tokenCountTreeReporter.test.ts (1 hunks)
- tests/config/configSchema.test.ts (3 hunks)
- tests/core/tokenCount/buildTokenCountStructure.test.ts (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- src/core/tokenCount/types.ts
- tests/cli/cliReport.test.ts
- src/config/configSchema.ts
🚧 Files skipped from review as they are similar to previous changes (5)
- src/cli/types.ts
- src/cli/cliRun.ts
- tests/core/tokenCount/buildTokenCountStructure.test.ts
- src/core/tokenCount/buildTokenCountStructure.ts
- src/cli/actions/defaultAction.ts
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/cli/reporters/tokenCountTreeReporter.ts (4)
- src/core/file/fileTypes.ts (1): ProcessedFile (6-9)
- src/config/configSchema.ts (1): RepomixConfigMerged (160-160)
- src/core/tokenCount/buildTokenCountStructure.ts (2): FileWithTokens (3-6), buildTokenCountTree (14-47)
- src/shared/logger.ts (1): logger (89-89)
tests/config/configSchema.test.ts (1)
- src/config/configSchema.ts (1): repomixConfigBaseSchema (16-69)
tests/cli/actions/defaultAction.test.ts (2)
- src/cli/cliSpinner.ts (1): Spinner (9-70)
- src/cli/actions/defaultAction.ts (2): handleStdinProcessing (71-109), handleDirectoryProcessing (114-137)
🪛 markdownlint-cli2 (0.17.2)
README.md
713-713: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 LanguageTool
README.md
[style] ~916-~916: To form a complete sentence, be sure to include a subject.
Context: ...y file tree with token count summaries. Can be boolean or number (minimum token cou...
(MISSING_IT_THERE)
🔇 Additional comments (26)
src/core/metrics/calculateMetrics.ts (3)
33-35: LGTM! Clear conditional logic for token count optimization.
The comment and conditional check clearly explain when to calculate token counts for all files versus just the top files. This optimization makes sense to avoid double calculation when the token count tree feature is enabled.
37-45: Well-structured conditional file selection logic.
The implementation correctly handles both scenarios:
- When `tokenCountTree` is enabled: calculates for all files to support the tree feature
- Otherwise: optimizes by calculating only for top files by character count
The fallback logic with `Math.max(topFilesLength * 10, topFilesLength)` ensures a reasonable number of files are processed even when `topFilesLength` is small.
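The selection logic described in this comment could be sketched as follows. This is an approximation based only on the review text: `selectMetricsTargetPaths`, `FileInfo`, and the boolean `tokenCountTreeEnabled` parameter are hypothetical names, and the real `calculateMetrics` may rank or slice files differently.

```typescript
// Hypothetical input shape for this sketch.
interface FileInfo {
  path: string;
  content: string;
}

function selectMetricsTargetPaths(
  files: FileInfo[],
  tokenCountTreeEnabled: boolean,
  topFilesLength: number,
): string[] {
  // The tree needs a token count for every file, so nothing is skipped.
  if (tokenCountTreeEnabled) {
    return files.map((file) => file.path);
  }
  // Otherwise, only tokenize a candidate pool of the largest files by
  // character count; the pool size follows the expression quoted in the review.
  const candidateCount = Math.max(topFilesLength * 10, topFilesLength);
  return [...files]
    .sort((a, b) => b.content.length - a.content.length)
    .slice(0, candidateCount)
    .map((file) => file.path);
}

const sampleFiles: FileInfo[] = [
  { path: "small.ts", content: "x" },
  { path: "large.ts", content: "x".repeat(100) },
];
console.log(selectMetricsTargetPaths(sampleFiles, true, 1));
```

Character count is used as a cheap proxy for token count when picking candidates, which is why the pool is made larger than `topFilesLength` itself.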
47-56: Function call updated correctly with new parameter.
The `calculateSelectiveFileMetrics` call now uses the computed `metricsTargetPaths` instead of a hardcoded calculation, which aligns with the new conditional logic above.
tests/config/configSchema.test.ts (2)
24-57: Comprehensive test coverage for tokenCountTree option.
The test suite thoroughly covers the `tokenCountTree` option with appropriate test cases:
- Boolean values (true/false) ✓
- String values ✓
- Invalid type rejection (array) ✓
The tests follow the established pattern and use descriptive names. Good validation of the union type `z.union([z.boolean(), z.number(), z.string()])` from the schema.
66-66: Appropriate integration of tokenCountTree in existing tests.
The existing schema validation tests have been properly updated to include the `tokenCountTree` property with valid values (boolean and string), ensuring the new option is covered in comprehensive schema testing scenarios.
Also applies to: 116-116, 212-212
tests/cli/actions/defaultAction.tokenCountTree.test.ts (5)
1-51: Well-structured test setup with comprehensive mocking.
The test file follows good testing practices:
- Proper mocking of all external dependencies
- Clear separation of concerns with individual mock functions
- Sensible default values in beforeEach setup
- Clean mock reset between tests
The mock setup covers all the necessary modules and provides realistic return values for testing the tokenCountTree functionality.
53-87: Thorough test for tokenCountTree enabled scenario.
This test properly verifies that when `tokenCountTree: true` is provided in CLI options, the merged configuration reflects this setting and the reporting function receives the correct parameters. The test structure and assertions are comprehensive.
89-103: Good test coverage for disabled tokenCountTree.
This test ensures the default behavior (tokenCountTree: false) works correctly when the option is not provided, which is important for backward compatibility.
105-133: Multiple directories scenario properly tested.
This test validates that the tokenCountTree functionality works correctly with multiple input directories, ensuring the feature scales appropriately.
135-164: Threshold parameter handling tested correctly.
This test verifies that string threshold values (like '50') are properly converted to numbers (50) in the merged configuration, which aligns with the schema's union type support.
tests/cli/reporters/tokenCountTreeReporter.test.ts (5)
1-25: Excellent mocking strategy for testing reporter output. The test setup properly mocks both the logger and picocolors dependencies, allowing for precise verification of the output format and content. The mock implementations provide predictable color/formatting prefixes that make assertions clear and reliable.
27-53: Comprehensive test for basic token count tree display. This test verifies the core functionality by checking that:
- The header and separator are displayed
- File paths and directories are included in the output
- The logger is called with expected content
The test data is realistic and the assertions are thorough.
55-68: Good edge case handling for empty file list. This test ensures the reporter gracefully handles the edge case of no files being processed, displaying an appropriate "No files found" message. This is important for user experience.
70-92: Proper threshold functionality testing. This test validates that the minimum token count threshold is correctly communicated to users through the display message, ensuring transparency about filtering behavior.
94-121: Important test for handling missing token counts. This test verifies that files without token count data are properly skipped, which is crucial for robustness. The test confirms that the reporter doesn't break when some files lack token count information.
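The skipping behavior described for missing token counts could look roughly like the sketch below. The input shape (a list of paths plus a path-to-count record) and the function name `filesWithTokenCounts` are assumptions made for illustration, not the reporter's actual signature.

```typescript
// Hypothetical helper: keep only files that have a token count recorded.
function filesWithTokenCounts(
  filePaths: string[],
  tokenCounts: Record<string, number | undefined>,
): { path: string; tokens: number }[] {
  const result: { path: string; tokens: number }[] = [];
  for (const path of filePaths) {
    const tokens = tokenCounts[path];
    // No data for this file: skip it rather than fail.
    if (tokens === undefined) continue;
    result.push({ path, tokens });
  }
  return result;
}
```

A file with a count of 0 is still kept here, since only a missing entry is treated as "no data"; whether the real reporter makes the same distinction is not stated in the review.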
tests/cli/actions/defaultAction.test.ts (3)
26-40: Well-implemented mock spinner with comprehensive method coverage. The mockSpinner object properly mocks all the essential Spinner methods (start, update, succeed, fail, stop) and properties. The mocking of the cliSpinner module ensures that any new Spinner() instantiation returns the mock instance.
47-51: Proper mock setup in beforeEach hook. The beforeEach hook correctly resets mocks and ensures the Spinner constructor returns the mockSpinner instance, providing clean test isolation.
668-668: Consistent refactoring of function calls throughout test suite. All calls to handleStdinProcessing and handleDirectoryProcessing have been consistently updated to pass mockSpinner instead of the previous parameter. This refactoring is thorough and maintains the test logic while adapting to the new function signatures. Also applies to: 674-674, 685-685, 700-700, 719-719, 731-731, 743-743, 799-799, 816-816, 826-826, 835-835, 863-863
src/cli/reporters/tokenCountTreeReporter.ts (1)
13-46: LGTM! Well-structured token count extraction and display logic. The function correctly handles different config types for tokenCountTree and properly filters files with token counts. The output formatting with emoji and styling is consistent with the rest of the application. Consider a minor improvement for readability:
- // Display the token count tree
logger.log('🔢 Token Count Tree:');
logger.log(pc.dim('────────────────────'));
README.md (4)
510-518: LGTM! Clear documentation of new CLI options. The reorganization into "CLI Input/Output Options" improves clarity and the --token-count-tree option is well-documented with its optional threshold parameter.
703-736: Excellent documentation for the new token count feature. This section provides comprehensive coverage of the --token-count-tree feature with clear examples, practical use cases, and benefits. The sample output accurately represents the tree format generated by the code.
916-916: Configuration documentation is accurate and complete. The output.tokenCountTree option is properly documented with correct type information (boolean or number threshold) and appropriate default value.
537-537: Good improvement in section naming. "File Selection Options" is more descriptive and accurate than the previous "Filter Options" naming.
src/cli/cliReport.ts (3)
9-37: Excellent refactoring with improved separation of concerns. The new reportResults function provides a clean orchestration of the reporting workflow with proper conditional logic and spacing. The integration of token count tree reporting is well-positioned in the sequence.
25-28: Clean integration of token count tree reporting. The conditional reporting based on config.output.tokenCountTree follows the established pattern and maintains consistency with other reporting sections.
74-146: Consistent function renaming improves code clarity. The rename from print* to report* functions is well-executed throughout the file. Comments and formatting adjustments maintain consistency and improve readability.
- Add explicit radix 10 to Number.parseInt() for consistent parsing
- Add validation to throw error for non-numeric string inputs
- Export TreeNode interface to eliminate code duplication
- Remove unused rootDirs parameter from test helper
- Remove unused isLast parameter from displayNode function
- Simplify token count threshold handling in reporter

These changes address review feedback to improve robustness and maintainability.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
CLI options use string type for numeric values as they come from commander.js parsing. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
|
/gemini review |
Pull Request Overview
This PR introduces a new --token-count-tree parameter that displays hierarchical token count summaries for repository files, helping users understand token distribution across their codebase for AI context optimization.
Key changes:
- Adds --token-count-tree CLI option with optional threshold parameter
- Implements token count tree visualization with directory structure
- Integrates token counting into metrics calculation for all files when feature is enabled
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/cli/types.ts | Adds tokenCountTree option to CliOptions interface |
| src/config/configSchema.ts | Adds tokenCountTree field to configuration schema |
| src/cli/cliRun.ts | Adds CLI option definition and reorganizes option groups |
| src/cli/actions/defaultAction.ts | Implements CLI config parsing and refactors to use centralized reporting |
| src/cli/cliReport.ts | Refactors from cliPrint to cliReport with centralized result reporting |
| src/core/tokenCount/types.ts | Defines TypeScript interfaces for token count data structures |
| src/core/tokenCount/buildTokenCountStructure.ts | Implements tree building logic for hierarchical token counting |
| src/cli/reporters/tokenCountTreeReporter.ts | Implements tree visualization and display logic |
| src/core/metrics/calculateMetrics.ts | Modifies metrics calculation to include all files when tree feature is enabled |
| README.md | Documents the new feature with usage examples and configuration details |
| tests/ | Comprehensive test coverage for all new functionality |
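To make the tree-building step in the table more concrete, here is a minimal sketch of aggregating per-file token counts into directory totals. The `DirNode` shape and the `buildTree` name are hypothetical and are not taken from buildTokenCountStructure.ts; the sketch also assumes forward-slash separated relative paths.

```typescript
interface FileTokenCount {
  path: string;
  tokens: number;
}

// Hypothetical node shape: each directory stores the sum of everything below it.
interface DirNode {
  tokens: number;
  files: { name: string; tokens: number }[];
  dirs: Map<string, DirNode>;
}

function buildTree(files: FileTokenCount[]): DirNode {
  const root: DirNode = { tokens: 0, files: [], dirs: new Map() };
  for (const file of files) {
    const parts = file.path.split('/').filter((part) => part.length > 0);
    const fileName = parts.pop();
    if (fileName === undefined) continue; // skip empty or malformed paths

    let node = root;
    node.tokens += file.tokens;
    // Walk (and create) each directory on the path, adding the file's tokens.
    for (const dir of parts) {
      let child = node.dirs.get(dir);
      if (!child) {
        child = { tokens: 0, files: [], dirs: new Map() };
        node.dirs.set(dir, child);
      }
      child.tokens += file.tokens;
      node = child;
    }
    node.files.push({ name: fileName, tokens: file.tokens });
  }
  return root;
}
```

Adding each file's count to every ancestor while descending keeps the totals correct in a single pass, so no separate roll-up step is needed afterwards.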
Comments suppressed due to low confidence (2)
src/cli/reporters/tokenCountTreeReporter.ts:30
- [nitpick] The emoji and title don't match the PR description example which shows '📊 Token Count Summary:'. Consider using consistent terminology and emoji between the code and documentation.
logger.log('🔢 Token Count Tree:');
tests/cli/actions/defaultAction.buildCliConfig.test.ts:85
- The test expects parseInt behavior (10.5 -> 10) but this may not be the intended behavior for a threshold parameter. Consider testing whether this behavior is actually desired or if it should be treated as an error.
// parseInt returns only the integer part
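The difference between the lenient parsing questioned above and a stricter alternative can be sketched as follows. Both function names are hypothetical, and the strict variant is only one possible policy (whole non-negative numbers only), not the implementation that was merged.

```typescript
// Lenient: Number.parseInt stops at the first character that is not part of an integer.
function parseThresholdLenient(raw: string): number {
  return Number.parseInt(raw, 10);
}

// Stricter (hypothetical policy): accept only non-negative whole numbers.
function parseThresholdStrict(raw: string): number {
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`Invalid token count threshold: "${raw}"`);
  }
  return Number.parseInt(trimmed, 10);
}
```

With the lenient version, `'10.5'` silently becomes 10 and `'50abc'` becomes 50; the strict version throws for both, which surfaces typos at the command line instead of applying an unintended threshold.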
Code Review
This pull request introduces a very useful feature, --token-count-tree, to visualize token counts in a file tree, which is great for managing context for AI models. The implementation is solid, with good refactoring of the CLI reporting logic and comprehensive test coverage for the new functionality.
I've identified a few areas for improvement:
- The parsing of the numeric threshold for --token-count-tree is a bit too lenient and could lead to unexpected behavior. I've suggested a stricter implementation.
- There's some code duplication in the new reporter file that can be simplified.
- A couple of new types seem to be unused in the application code.
- A minor formatting issue in the README.md table.
Overall, this is a great addition to the tool. Addressing these points will improve the robustness and maintainability of the code.
- Move string-to-number parsing from defaultAction to Commander.js option parser
- Update CLI types to reflect Commander.js parsed values (boolean | number)
- Simplify buildCliConfig by removing redundant parsing logic
- Update tests to match new type expectations

This approach follows the same pattern as --top-files-len and centralizes input validation at the CLI parsing level where it belongs.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix radix parameter in Number.parseInt from 0 to 10 for consistent parsing
- Add safety checks for malformed file paths in buildTokenCountTree

These changes improve input validation and prevent potential parsing issues.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
|
@gudber
This feature will be super helpful for the community. Thanks for this great contribution! Merging now 🎉 |
|
@gudber Thank you for your contribution! |
|
Awesome @yamadashy, glad to help. I use this a lot to feed into an LLM and ask it to generate a .repomixignore file for me to get a repo under 1M tokens, for example (Gemini Pro 2.5, Sonnet 4). Using a prompt like: "Use this Repomix token summary of each file in a software repo to write me a .repomixignore file to bring token count under 1 million. Focus on eliminating things like tests, benchmarks, i18n files, json data files, css, media and other assets and extra fluff but try to keep the documentation and most of the core functionality code files of the repo" |
|
Wow, that's a great use case! |
Overview
This PR introduces a new parameter --summarize-token-counts which displays summarized token counts for each entry in a file tree of a repo processed with repomix. For example on Repomix this would show something like:
📊 Token Count Summary:
────────────────────────
....
├── tsconfig.json (177 tokens)
├── typos.toml (80 tokens)
├── vitest.config.ts (89 tokens)
├── .agents/ (2874 tokens)
│ └── rules/ (2874 tokens)
│ ├── base.md (1988 tokens)
│ ├── browser-extension.md (453 tokens)
│ └── website.md (433 tokens)
...
Usage Examples
repomix --summarize-token-counts
Would display something like:
📊 Token Count Summary:
────────────────────────
....
├── tsconfig.json (177 tokens)
├── typos.toml (80 tokens)
├── vitest.config.ts (89 tokens)
├── .agents/ (2874 tokens)
│ └── rules/ (2874 tokens)
│ ├── base.md (1988 tokens)
│ ├── browser-extension.md (453 tokens)
│ └── website.md (433 tokens)
...
repomix --summarize-token-counts 1000
Would display something like:
📊 Token Count Summary:
────────────────────────
....
├── .agents/ (2874 tokens)
│ └── rules/ (2874 tokens)
│ ├── base.md (1988 tokens)
...
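The thresholded output above can be approximated with a small recursive renderer. This is a sketch under stated assumptions: the `Entry` shape, the `renderTree` name, and the choice to keep entries whose count is greater than or equal to the threshold are all hypothetical and may differ from the actual reporter.

```typescript
interface Entry {
  name: string;
  tokens: number;
  children?: Entry[]; // present for directories only
}

// Render entries as tree lines, hiding anything below minTokens (assumed inclusive).
function renderTree(entries: Entry[], minTokens = 0, prefix = ''): string[] {
  const visible = entries.filter((entry) => entry.tokens >= minTokens);
  const lines: string[] = [];
  visible.forEach((entry, index) => {
    const isLast = index === visible.length - 1;
    const connector = isLast ? '└── ' : '├── ';
    const label = entry.children ? `${entry.name}/` : entry.name;
    lines.push(`${prefix}${connector}${label} (${entry.tokens} tokens)`);
    if (entry.children) {
      // Continue the vertical guide only when more siblings follow.
      const childPrefix = prefix + (isLast ? '    ' : '│   ');
      lines.push(...renderTree(entry.children, minTokens, childPrefix));
    }
  });
  return lines;
}

// Sample data taken from the example output in this description.
const sample: Entry[] = [
  { name: 'tsconfig.json', tokens: 177 },
  {
    name: '.agents',
    tokens: 2874,
    children: [
      {
        name: 'rules',
        tokens: 2874,
        children: [
          { name: 'base.md', tokens: 1988 },
          { name: 'browser-extension.md', tokens: 453 },
          { name: 'website.md', tokens: 433 },
        ],
      },
    ],
  },
];

console.log(renderTree(sample, 1000).join('\n'));
```

Because the sample holds only a few entries, the connectors differ slightly from the full repository output, but the filtering effect is the same: with a threshold of 1000, only .agents/, rules/, and base.md remain.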
CLI Command
repomix --summarize-token-counts [minimum threshold to display token count]