feat(cli): Add --split-output option#1013
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a new Changes
Sequence DiagramsequenceDiagram
participant User
participant CLI as CLI (cliRun)
participant Config as Config (defaultAction)
participant Packager as Packager (packager)
participant Produce as ProduceOutput
participant Split as Splitter (outputSplit)
participant FS as FileSystem
participant Metrics as Metrics (calculateMetrics)
User->>CLI: repomix --split-output 20mb
CLI->>CLI: parseHumanSizeToBytes("20mb")
CLI->>Config: buildCliConfig(splitOutput)
Config->>Config: validate conflicting options
Config->>Packager: invoke pack(config)
Packager->>Produce: produceOutput(rootDirs, config, files, allPaths, gitDiff, gitLog, progress)
alt splitOutput enabled
Produce->>Split: generateSplitOutputParts(processedFiles, allFilePaths, maxBytesPerPart, generateOutput, progress)
Split->>Split: group by root entry, render groups, measure UTF-8 bytes, assemble parts
Split-->>Produce: OutputSplitPart[] (parts with index, filePath, content)
Produce->>FS: writeOutputToDisk(part.filePath, part.content) for each part
Produce-->>Packager: { outputFiles: [paths...], outputForMetrics: [contents...] }
else single output
Produce->>Produce: generateOutput(...)
Produce->>FS: writeOutputToDisk(outputFilePath, content)
Produce->>Produce: copyToClipboardIfEnabled(...)
Produce-->>Packager: { outputForMetrics: content }
end
Packager->>Metrics: calculateMetrics(processedFiles, outputForMetrics, ..., gitDiff, gitLog)
Metrics->>Metrics: compute per-part counts and aggregate
Metrics-->>Packager: CalculateMetricsResult
Packager-->>User: final result (includes outputFiles when split)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Key areas to inspect closely:
Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @Dango233, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement to the CLI by enabling the splitting of large generated outputs into multiple, more manageable files. This feature is particularly beneficial for users dealing with large codebases or strict context window limitations in AI tools, as it allows them to process output in chunks without losing the context of individual files. The implementation carefully handles file grouping and output generation to ensure a seamless and logical division of content. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Additional context - Google aistudio recently introduced 1MB text file size limit - that was the trigger for this PR |
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable --split-output feature, allowing large repository outputs to be split into multiple smaller files. The implementation is comprehensive, covering CLI options, configuration, core splitting logic, metrics calculation, and reporting. The approach of grouping files by root directory to prevent splitting related files is well-thought-out.
My review includes a couple of suggestions for improvement:
- A correction in
calculateMetrics.tsto ensure consistent file path generation for split parts. - A refactoring of
buildOutputSplitGroupsin the newoutputSplit.tsfile for better clarity and robustness.
The changes are well-tested, with new unit and integration tests covering the new functionality. The documentation has also been updated accordingly. Overall, this is a solid addition to the project.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
website/client/src/en/guide/configuration.md (1)
103-103: LGTM! Clear documentation for the new split output feature.The description accurately explains the feature and its behavior. Consider clarifying what happens when a single file exceeds the specified split size limit (based on the PR description, files are not split mid-content, so understanding the behavior in edge cases would be helpful for users).
src/core/packager.ts (1)
186-199: Consider cleanup on partial write failure.If writing one part fails, previously written parts remain on disk. For a better user experience, consider either:
- Wrapping in a try/catch to clean up previously written files on failure
- Documenting this behavior so users know partial output may exist
This is a minor concern since partial output can still be useful for debugging.
src/core/output/outputSplit.ts (1)
149-189: Performance consideration: Multiple re-renders during splitting.The algorithm re-renders groups multiple times to accurately measure cumulative size (since output generation adds headers/footers). While this ensures correctness, it could be expensive for large repositories with many top-level entries.
This trade-off seems acceptable for correctness, but consider adding a note in documentation about potential performance impact with many small top-level directories.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
README.md(2 hunks)src/cli/actions/defaultAction.ts(2 hunks)src/cli/cliReport.ts(1 hunks)src/cli/cliRun.ts(2 hunks)src/cli/types.ts(1 hunks)src/config/configSchema.ts(2 hunks)src/core/metrics/calculateMetrics.ts(3 hunks)src/core/output/outputSplit.ts(1 hunks)src/core/packager.ts(4 hunks)src/shared/sizeParse.ts(1 hunks)tests/cli/actions/defaultAction.buildCliConfig.test.ts(1 hunks)tests/core/metrics/calculateMetrics.test.ts(1 hunks)tests/core/output/outputSplit.test.ts(1 hunks)tests/core/packager.test.ts(2 hunks)tests/core/packager/splitOutput.test.ts(1 hunks)tests/shared/sizeParse.test.ts(1 hunks)website/client/src/en/guide/command-line-options.md(2 hunks)website/client/src/en/guide/configuration.md(1 hunks)website/client/src/en/guide/usage.md(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
tests/cli/actions/defaultAction.buildCliConfig.test.ts (2)
src/cli/types.ts (1)
CliOptions(4-70)src/cli/actions/defaultAction.ts (1)
buildCliConfig(157-351)
src/cli/cliRun.ts (1)
src/shared/sizeParse.ts (1)
parseHumanSizeToBytes(5-19)
src/core/packager.ts (3)
src/shared/errorHandle.ts (1)
RepomixError(6-11)src/shared/memoryUtils.ts (1)
withMemoryLogging(71-87)src/core/output/outputSplit.ts (1)
generateSplitOutputParts(69-203)
src/shared/sizeParse.ts (1)
src/shared/errorHandle.ts (1)
RepomixError(6-11)
src/core/output/outputSplit.ts (4)
src/config/configSchema.ts (1)
RepomixConfigMerged(160-160)src/core/git/gitDiffHandle.ts (1)
GitDiffResult(7-10)src/core/git/gitLogHandle.ts (1)
GitLogResult(21-24)src/shared/errorHandle.ts (1)
RepomixError(6-11)
tests/core/packager/splitOutput.test.ts (5)
tests/testing/testUtils.ts (1)
createMockConfig(15-47)src/core/packager/writeOutputToDisk.ts (1)
writeOutputToDisk(7-22)src/core/output/outputGenerate.ts (1)
generateOutput(239-280)src/core/metrics/calculateMetrics.ts (1)
calculateMetrics(23-116)src/core/packager.ts (1)
pack(60-234)
tests/shared/sizeParse.test.ts (1)
src/shared/sizeParse.ts (1)
parseHumanSizeToBytes(5-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (32)
README.md (2)
629-629: LGTM! Well-documented CLI option.The description is clear and the placement in the Repomix Output Options section is appropriate. The example format (e.g.,
500kbor2mb) helps users understand the expected input format.
686-687: LGTM! Clear usage example.The example demonstrates practical usage with a reasonable size parameter (20mb) and clearly shows the command syntax.
website/client/src/en/guide/command-line-options.md (2)
29-29: LGTM! Consistent documentation.The option description matches the README.md documentation, maintaining consistency across all documentation surfaces.
80-81: LGTM! Clear example.The example demonstrates the feature usage effectively and is consistent with other examples in the file.
website/client/src/en/guide/usage.md (1)
28-35: LGTM! Excellent usage documentation.The section provides clear context for when to use the feature (large output for AI context windows) and shows the expected output file naming convention (
.1.xml,.2.xml, etc.). This helps users understand both the use case and the results.src/cli/types.ts (1)
26-26: LGTM! Properly typed with clear unit specification.The optional
splitOutputfield is correctly typed and the// bytescomment clarifies the expected unit, which is important for both type safety and developer understanding.tests/core/packager.test.ts (1)
51-52: LGTM! Test updated to reflect expanded safety and metrics structure.The additions of
suspiciousGitDiffResults,suspiciousGitLogResults,gitDiffTokenCount, andgitLogTokenCountalign with the broader PR changes that enhance security validation for git diffs/logs and expand metrics calculation. The mock values (empty arrays and zero counts) are appropriate for this test scenario.Also applies to: 69-70
src/cli/actions/defaultAction.ts (2)
55-67: LGTM! Comprehensive validation for split output constraints.The validation logic appropriately prevents invalid combinations:
- stdout/"-": Split output requires writing multiple files to disk, incompatible with stdout
- skill-generate: Skill output is a directory structure, incompatible with split files
- copy-to-clipboard: Cannot copy multiple files to clipboard
The error messages are clear and explain the constraints to users. This prevents runtime errors and provides good UX.
291-296: LGTM! Proper config propagation.The
splitOutputoption is correctly propagated from CLI options to the merged config when defined, following the same pattern as other output options inbuildCliConfig.tests/core/metrics/calculateMetrics.test.ts (1)
64-64: LGTM! Cleaner async function syntax.The change from
Promise.resolve(30)toasync () => 30is functionally equivalent but uses cleaner, more modern async function syntax. This aligns with the metrics calculation updates to support multi-part outputs mentioned in the PR.tests/cli/actions/defaultAction.buildCliConfig.test.ts (1)
28-38: LGTM!The test correctly validates that the
splitOutputoption is properly mapped from CLI options to the configuration object.tests/core/output/outputSplit.test.ts (1)
1-50: LGTM!The test suite comprehensively covers the output splitting logic:
- Correct root entry extraction for nested paths and root files
- Proper grouping by root entry with all file paths included
- Correct file path formatting with part numbers
src/cli/cliRun.ts (2)
7-7: LGTM!The import correctly brings in the size parsing utility.
120-125: LGTM!The CLI option is well-defined with:
- Clear description and examples
- Proper integration with the argument parser to convert human-readable sizes to bytes
- Appropriate placement in the Repomix Output Options group
tests/shared/sizeParse.test.ts (1)
1-20: LGTM!The test suite thoroughly validates the size parsing utility:
- Correct parsing of kb and mb units (case-insensitive)
- Proper error handling for invalid formats, non-positive values, and unsupported units
src/cli/cliReport.ts (1)
95-108: LGTM!The enhanced output display logic correctly handles multi-part outputs:
- Shows first and last file paths with an ellipsis and part count when multiple outputs exist
- Falls back to single output display when
outputFilesis not present- User-friendly formatting with proper path display
src/config/configSchema.ts (2)
43-43: LGTM!The
splitOutputfield is correctly added to the base schema with appropriate validation (positive integer).
104-104: LGTM!The
splitOutputfield is correctly added to the default schema, maintaining consistency with the base schema definition.tests/core/packager/splitOutput.test.ts (1)
1-83: LGTM!This integration test comprehensively validates the split output functionality:
- Correctly mocks all dependencies including generateOutput with size-dependent behavior
- Verifies multiple output files are written with proper naming (numbered parts)
- Confirms calculateMetrics receives the array of output parts
- Validates the final result contains the expected outputFiles array
src/shared/sizeParse.ts (1)
1-19: LGTM!The size parsing utility is well-implemented:
- Correct regex pattern for matching size format
- Proper validation for format, safe integer range, and positive values
- Accurate byte conversion for kb and mb units
- Clear and helpful error messages
src/core/metrics/calculateMetrics.ts (3)
23-37: LGTM! Clean API extension for multi-part output support.The function signature change from
stringtostring | string[]is backward-compatible and elegantly supports both single and split output scenarios.
74-80: LGTM! Parallel per-part metrics calculation.The approach correctly computes metrics for each output part in parallel and generates consistent part paths for metrics identification.
85-87: Potential runtime error ifoutputPartsis empty.The
reduceon line 85 has an initial value of0, which is correct. However, ifoutputPartsis empty,outputTokenCountswould also be empty, and whilereducewith an initial value handles this gracefully, an empty output scenario might indicate an upstream issue.The logic is correct as written since the initial value
0is provided.src/core/packager.ts (3)
29-29: LGTM! Clean extension of PackResult interface.The optional
outputFilesproperty cleanly extends the result type without breaking existing consumers.
161-167: LGTM! Proper validation for incompatible options.The early validation correctly prevents
splitOutputfrom being used withstdoutorcopyToClipboard, with clear error messages.
215-222: LGTM! Clean integration with metrics calculation.The metrics calculation correctly receives either a single output string or array of parts, and the result assembly properly includes
outputFilesonly when split output was used.src/core/output/outputSplit.ts (6)
22-26: LGTM! Correct cross-platform path handling.The function properly normalizes Windows path separators and extracts the root entry. The fallback
first || normalizedhandles edge cases where no separator exists.
41-53: Consider the relationship between allFilePaths and processedFiles.When a
processedFile.pathdoesn't exist inallFilePaths, the function creates a new group withallFilePaths: [processedFile.path]. This seems intentional to handle edge cases, but worth verifying this aligns with expected behavior whereprocessedFilesshould be a subset ofallFilePaths.
58-65: LGTM! Correct file path generation for split parts.The function correctly handles both extension and no-extension cases, inserting the part index appropriately.
99-101: LGTM! Robust validation for maxBytesPerPart.The validation correctly checks for both safe integer (preventing floating-point issues and overflow) and positive values.
161-166: LGTM! Clear error messages for oversized entries.The error handling provides actionable feedback with the actual size vs. limit, helping users understand why splitting failed and what root entry is causing the issue.
Also applies to: 180-185
191-200: LGTM! Correct finalization of remaining content.The final block ensures any accumulated content that didn't exceed the limit is properly added as the last part.
|
Thanks for the note! In the normal flow, processedFiles should indeed be a subset of allFilePaths (after filtering/security checks). The branch that creates a group when a processedFile.path isn’t present in allFilePaths is meant as a defensive fallback to avoid dropping content in unexpected edge cases (e.g., mismatches introduced by upstream filtering or special execution modes). In practice we don’t expect it to trigger, but it keeps the splitter robust. |
There was a problem hiding this comment.
Pull request overview
This PR adds a --split-output option that enables splitting large packed output into multiple numbered files with configurable maximum size per part, without splitting files across parts. This is useful when the output exceeds tooling or AI context window limits.
Key changes:
- Adds size parsing utility to convert human-readable sizes (e.g., "20mb") to bytes
- Implements split output logic that groups files by root directory and distributes them across parts
- Updates configuration schema, CLI options, and metrics calculation to support split output
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
src/shared/sizeParse.ts |
New utility to parse human-readable size strings (kb/mb) to bytes |
src/core/output/outputSplit.ts |
Core logic for splitting output by grouping files by root entry and respecting size limits |
src/core/packager.ts |
Integrates split output generation and adds validation checks |
src/core/metrics/calculateMetrics.ts |
Updates to handle both single output and array of output parts for token counting |
src/cli/cliRun.ts |
Adds --split-output CLI option with size parser |
src/cli/actions/defaultAction.ts |
Adds validation checks and maps CLI option to config |
src/cli/cliReport.ts |
Updates summary output to display first and last file when multiple parts exist |
src/config/configSchema.ts |
Adds splitOutput field to configuration schema |
src/cli/types.ts |
Adds splitOutput to CLI options type |
tests/shared/sizeParse.test.ts |
Tests for size parsing utility |
tests/core/output/outputSplit.test.ts |
Tests for split output helper functions |
tests/core/packager/splitOutput.test.ts |
Integration test for split output packager behavior |
tests/core/packager.test.ts |
Adds missing fields to mock validation result |
tests/core/metrics/calculateMetrics.test.ts |
Updates mock to use async arrow function |
tests/cli/actions/defaultAction.buildCliConfig.test.ts |
Tests CLI config mapping for splitOutput option |
README.md |
Documents new --split-output CLI option with examples |
website/client/src/en/guide/usage.md |
Adds usage example for split output feature |
website/client/src/en/guide/configuration.md |
Documents splitOutput configuration option |
website/client/src/en/guide/command-line-options.md |
Lists --split-output in command reference |
Comments suppressed due to low confidence (1)
tests/core/metrics/calculateMetrics.test.ts:82
- Missing test coverage for the new functionality where calculateMetrics accepts an array of strings (for split output). Currently only the single string case is tested. Add a test case that passes an array of output parts to ensure the token counting and character counting logic works correctly for split outputs.
describe('calculateMetrics', () => {
it('should calculate metrics and return the result', async () => {
const processedFiles: ProcessedFile[] = [
{ path: 'file1.txt', content: 'a'.repeat(100) },
{ path: 'file2.txt', content: 'b'.repeat(200) },
];
const output = 'a'.repeat(300);
const progressCallback: RepomixProgressCallback = vi.fn();
const fileMetrics = [
{ path: 'file1.txt', charCount: 100, tokenCount: 10 },
{ path: 'file2.txt', charCount: 200, tokenCount: 20 },
];
(calculateSelectiveFileMetrics as unknown as Mock).mockResolvedValue(fileMetrics);
const aggregatedResult = {
totalFiles: 2,
totalCharacters: 300,
totalTokens: 30,
fileCharCounts: {
'file1.txt': 100,
'file2.txt': 200,
},
fileTokenCounts: {
'file1.txt': 10,
'file2.txt': 20,
},
gitDiffTokenCount: 0,
gitLogTokenCount: 0,
};
const config = createMockConfig();
const gitDiffResult: GitDiffResult | undefined = undefined;
const mockTaskRunner = {
run: vi.fn(),
cleanup: vi.fn(),
};
const result = await calculateMetrics(processedFiles, output, progressCallback, config, gitDiffResult, undefined, {
calculateSelectiveFileMetrics,
calculateOutputMetrics: async () => 30,
calculateGitDiffMetrics: () => Promise.resolve(0),
calculateGitLogMetrics: () => Promise.resolve({ gitLogTokenCount: 0 }),
taskRunner: mockTaskRunner,
});
expect(progressCallback).toHaveBeenCalledWith('Calculating metrics...');
expect(calculateSelectiveFileMetrics).toHaveBeenCalledWith(
processedFiles,
['file2.txt', 'file1.txt'], // sorted by character count desc
'o200k_base',
progressCallback,
expect.objectContaining({
taskRunner: expect.any(Object),
}),
);
expect(result).toEqual(aggregatedResult);
});
});
|
Hi @Dango233 ! |
b435269 to
14b8693
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1013 +/- ##
==========================================
+ Coverage 90.24% 90.42% +0.18%
==========================================
Files 120 123 +3
Lines 9017 9430 +413
Branches 1623 1699 +76
==========================================
+ Hits 8137 8527 +390
- Misses 880 903 +23 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Adds a size-based output splitter via --split-output (kb/mb) and writes numbered parts without splitting within a top-level folder. Also updates metrics aggregation for multi-part output and adds unit tests.
Add the new split output flag to README and website docs, including examples and the config option.
Extract conflicting options validation into a dedicated function that uses a data-driven approach. This makes it easier to add new conflicts and reduces code duplication. - Add validateConflictingOptions function with conflict definitions - Support --split-output, --skill-generate, --stdout, --copy conflicts - Handle --output "-" as stdout mode
Document --split-output feature with: - Use case (AI Studio's 1MB file size limit) - Usage example and file naming convention - Note about directory-based grouping
Add detailed explanation with: - Use case (AI Studio's 1MB file size limit) - File naming convention - Note about directory-based grouping
Add `repomix-output.*` to ignore numbered output files generated by --split-output option (e.g., repomix-output.1.xml).
- Use buildSplitOutputFilePath for consistent file path generation in calculateMetrics.ts (fixes repomix-output.xml.part-1 → repomix-output.1.xml) - Remove duplicate validation in packager.ts (already handled by validateConflictingOptions in defaultAction.ts) - Add overflow check for large size values in sizeParse.ts
- Add test for sizeParse overflow case - Use RepomixProgressCallback type in outputSplit.ts for consistency - Improve configuration.md description for splitOutput option
Cover error scenarios: - Single root entry exceeding maxBytesPerPart - Invalid maxBytesPerPart values (0, negative) - Empty files array returning empty result
Move split/single output generation and writing logic to packager/produceOutput.ts to keep packager.ts focused on the high-level orchestration flow. - Create produceOutput module handling both output modes - Simplify packager.ts from 227 to 181 lines - Update related tests to use new dependency structure
Cover both single and split output modes: - Single output: generate, write, clipboard - Split output: multiple file writes, no clipboard - Git diff/log passthrough - Progress callback invocations
- Move makeChunkConfig and renderGroups to module level for better readability - Add GenerateOutputFn type alias using typeof generateOutput - Add comment explaining O(N²) complexity and why it's acceptable - Fix test mock property names to match actual GitDiffResult/GitLogResult types - Update integration tests to use produceOutput instead of individual functions
Allow decimal size values like '2.5mb' or '1.5kb' in parseHumanSizeToBytes. This enables more flexible size configuration for split output.
Update documentation to reflect support for decimal values like 1.5mb in size specifications.
Add module-level caching for compiled Handlebars templates to avoid recompilation on every generateOutput call. This improves performance especially for split output mode where templates are rendered multiple times.
Update help text to show that decimal values like 2.5mb are supported.
Show which root entry is being evaluated during split output generation. The evaluating info is displayed in dimmed color for better readability.
Add module-level caching for git operations to avoid repeated expensive git commands during split output generation. This significantly improves performance when generateOutput is called multiple times (O(N²) in split output mode). - Add fileChangeCountsCache for git log results (~133ms savings per call) - Add gitAvailabilityCache for git installation/folder checks (~65ms savings) - Extract sortFilesByChangeCounts helper function - Add test for cache behavior verification
14b8693 to
5f8be42
Compare
Add comprehensive tests for generateSplitOutputParts: - Test successful splitting into multiple parts when content exceeds limit - Test git diff/log is only included in first part Also optimize byteLength recalculation by tracking currentBytes variable to avoid redundant getUtf8ByteLength() calls.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration-tests/packager.test.ts (1)
159-161: MissingsuspiciousGitLogResultsin metrics mock return.The
calculateMetricsmock returnssuspiciousFilesResultsandsuspiciousGitDiffResultsbut omitssuspiciousGitLogResults. While this may not cause test failures currently, it could lead to issues if the code starts relying on this field.🔎 Proposed fix
suspiciousFilesResults: [], suspiciousGitDiffResults: [], + suspiciousGitLogResults: [], };
🧹 Nitpick comments (6)
.gitignore (1)
26-31: Consider removing redundant ignore patterns.The new
repomix-output.*pattern on line 31 is a superset that already covers all the specific patterns on lines 27-30 (repomix-output.txt,.xml,.md,.json). You could simplify this section by keeping only the wildcard pattern.🔎 Proposed simplification
# Repomix output -repomix-output.txt -repomix-output.xml -repomix-output.md -repomix-output.json repomix-output.*tests/core/output/outputSplit.test.ts (1)
185-258: Consider simplifying the git diff/log test assertions.The test correctly verifies that git diff/log data is only passed to the first part, but the
isFirstPartdetection logic (line 211) relies on checkingconfig.output.git?.includeDiffs !== false, which couples the test to implementation details. A simpler approach would be to track calls by order and assert on the first call having git data while subsequent calls do not.That said, the test does achieve its goal of verifying the behavior.
src/core/packager/produceOutput.ts (2)
63-110: Clipboard copy is intentionally disabled for split output mode.The
generateAndWriteSplitOutputfunction writes multiple files but does not callcopyToClipboardIfEnabled. This is likely intentional since copying multiple large files to clipboard would be impractical, but consider logging a message or documenting this behavior if users havecopy: truein their config.
92-103: Consider adding error handling for partial write failures.If writing one part fails mid-way, previously written parts remain on disk while later parts are missing. This could leave the output in an inconsistent state. Consider either wrapping in a transaction-like pattern (write to temp files, then rename) or at minimum logging which parts succeeded.
🔎 Example approach with cleanup on failure
await withMemoryLogging('Write Split Output', async () => { + const writtenFiles: string[] = []; + try { for (const part of parts) { const partConfig = { ...config, output: { ...config.output, stdout: false, filePath: part.filePath, }, }; // eslint-disable-next-line no-await-in-loop await deps.writeOutputToDisk(part.content, partConfig); + writtenFiles.push(part.filePath); } + } catch (error) { + // Log partial success for debugging + if (writtenFiles.length > 0) { + console.warn(`Partial write: ${writtenFiles.length}/${parts.length} parts written before failure`); + } + throw error; + } });src/core/output/outputSplit.ts (2)
33-61: Defensive fallback for processedFiles not in allFilePaths.Per PR discussion, the else branch (lines 51-57) is intentionally defensive to handle edge cases where a
processedFile.pathmight not exist inallFilePaths. While this shouldn't occur in normal operation, it prevents silent data loss. Consider adding a debug log when this fallback triggers to aid troubleshooting.🔎 Optional: Add debug logging for the fallback case
for (const processedFile of processedFiles) { const rootEntry = getRootEntry(processedFile.path); const existing = groupsByRootEntry.get(rootEntry); if (existing) { existing.processedFiles.push(processedFile); } else { + // Defensive fallback: processedFile.path not found in allFilePaths + // This is unexpected but handled to avoid data loss groupsByRootEntry.set(rootEntry, { rootEntry, processedFiles: [processedFile], allFilePaths: [processedFile.path], }); } }
95-116: Consider documenting the git data restriction for non-first parts.Lines 113-114 pass
undefinedforgitDiffResultandgitLogResultwhenpartIndex !== 1. This is consistent withmakeChunkConfig, but the duplication of this logic could lead to drift. Consider consolidating or adding a comment linking them.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
.gitignore(1 hunks)README.md(3 hunks)src/cli/actions/defaultAction.ts(3 hunks)src/cli/cliReport.ts(1 hunks)src/cli/cliRun.ts(2 hunks)src/cli/types.ts(1 hunks)src/config/configSchema.ts(2 hunks)src/core/metrics/calculateMetrics.ts(4 hunks)src/core/output/outputGenerate.ts(2 hunks)src/core/output/outputSort.ts(1 hunks)src/core/output/outputSplit.ts(1 hunks)src/core/packager.ts(4 hunks)src/core/packager/produceOutput.ts(1 hunks)src/shared/sizeParse.ts(1 hunks)tests/cli/actions/defaultAction.buildCliConfig.test.ts(1 hunks)tests/core/metrics/calculateMetrics.test.ts(1 hunks)tests/core/output/outputSort.test.ts(6 hunks)tests/core/output/outputSplit.test.ts(1 hunks)tests/core/packager.test.ts(4 hunks)tests/core/packager/diffsFunctionality.test.ts(4 hunks)tests/core/packager/produceOutput.test.ts(1 hunks)tests/core/packager/splitOutput.test.ts(1 hunks)tests/integration-tests/packager.test.ts(2 hunks)tests/shared/sizeParse.test.ts(1 hunks)website/client/src/en/guide/command-line-options.md(2 hunks)website/client/src/en/guide/configuration.md(1 hunks)website/client/src/en/guide/usage.md(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
- src/cli/types.ts
- src/config/configSchema.ts
- tests/cli/actions/defaultAction.buildCliConfig.test.ts
- src/shared/sizeParse.ts
- tests/core/metrics/calculateMetrics.test.ts
- website/client/src/en/guide/configuration.md
- src/cli/cliReport.ts
- src/cli/cliRun.ts
- tests/core/packager/splitOutput.test.ts
- website/client/src/en/guide/usage.md
- tests/shared/sizeParse.test.ts
🧰 Additional context used
🧬 Code graph analysis (8)
src/cli/actions/defaultAction.ts (2)
src/config/configSchema.ts (1)
RepomixConfigMerged(160-160)src/shared/errorHandle.ts (1)
RepomixError(6-11)
tests/core/output/outputSort.test.ts (1)
src/core/output/outputSort.ts (1)
sortOutputFiles(95-115)
src/core/output/outputSort.ts (2)
src/core/git/gitRepositoryHandle.ts (2)
getFileChangeCount(4-25)isGitInstalled(41-53)src/config/configSchema.ts (1)
RepomixConfigMerged(160-160)
tests/core/output/outputSplit.test.ts (1)
src/core/output/outputSplit.ts (4)
getRootEntry(27-31)buildOutputSplitGroups(33-61)buildSplitOutputFilePath(63-70)generateSplitOutputParts(118-238)
tests/core/packager/produceOutput.test.ts (1)
src/core/packager/produceOutput.ts (1)
produceOutput(23-61)
src/core/metrics/calculateMetrics.ts (1)
src/core/output/outputSplit.ts (1)
buildSplitOutputFilePath(63-70)
src/core/packager.ts (1)
src/shared/memoryUtils.ts (1)
withMemoryLogging(71-87)
src/core/output/outputSplit.ts (6)
src/core/output/outputGenerate.ts (1)
generateOutput(253-294)src/config/configSchema.ts (1)
RepomixConfigMerged(160-160)src/core/git/gitDiffHandle.ts (1)
GitDiffResult(7-10)src/core/git/gitLogHandle.ts (1)
GitLogResult(21-24)src/index.ts (1)
RepomixProgressCallback(40-40)src/shared/errorHandle.ts (1)
RepomixError(6-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and run (windows-latest, 24.x)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: Test (windows-latest, 20.x)
- GitHub Check: Test (windows-latest, 22.x)
🔇 Additional comments (38)
src/core/output/outputGenerate.ts (2)
26-53: LGTM! Smart optimization for split-output feature.The template compilation cache is well-implemented and directly supports the performance needs of the new
--split-outputfeature, where the same templates may be compiled repeatedly for each output part. The cache is bounded (max 3 entries: xml, markdown, plain), memory-safe, and transparent to callers.
223-223: LGTM! Clean refactoring.The refactored template usage is cleaner and leverages the cache effectively. Error handling is preserved correctly.
src/core/output/outputSort.ts (4)
8-17: LGTM! Module-level caching is appropriate for CLI context.The caching strategy correctly avoids repeated git operations within a single CLI invocation. The cache key format
${cwd}:${maxCommits}properly differentiates configurations.
28-61: LGTM! Graceful fallback when git is unavailable or fails.The function properly handles all failure cases by returning
null, allowing the caller to fall back to the original file order.
67-92: LGTM! Correct handling of boolean cache values.The
cached !== undefinedcheck properly distinguishes between "not cached" and "cached as false", ensuring negative results are also cached.
94-124: LGTM! Clean refactor with dependency injection for testability.The separation of concerns with
sortFilesByChangeCountsand the default deps pattern enables easy testing while maintaining production behavior.tests/core/output/outputSort.test.ts (2)
11-14: LGTM! Proper test isolation for module-level caches.Using
vi.resetModules()inbeforeEachensures each test starts with a fresh module instance, preventing cache state from leaking between tests.
126-160: LGTM! Comprehensive caching verification test.The test correctly validates that
getFileChangeCountandisGitInstalledare called only once across multiplesortOutputFilesinvocations with the same config, confirming the caching behavior works as intended.website/client/src/en/guide/command-line-options.md (2)
29-29: LGTM! Clear and concise option documentation.The description effectively conveys the purpose, file naming convention, and supported size formats.
80-82: LGTM! Helpful example added.The example demonstrates practical usage with a realistic size value.
README.md (2)
629-629: LGTM! Consistent CLI option documentation.The description matches the website documentation and clearly explains the feature.
844-861: LGTM! Well-documented new feature section.The section provides clear context (AI tool file size limits), practical examples, supported formats, and important behavioral notes about file grouping. The
[!NOTE]callout effectively highlights the non-splitting guarantee.tests/core/packager/diffsFunctionality.test.ts (1)
63-65: LGTM!The mock correctly returns
{ outputForMetrics: string }matching theProduceOutputResultinterface. The refactoring from the previous trio of mocks to a singleproduceOutputmock aligns well with the unified output pathway introduced in this PR.Also applies to: 112-114
src/cli/actions/defaultAction.ts (2)
269-274: LGTM!The
splitOutputoption is correctly propagated to the CLI config, following the same pattern as other optional output settings. The conditional assignment ensures config file values are respected when the CLI option is not specified.
370-411: Clean validation design with declarative conflict rules.The centralized conflict validation is well-structured and extensible. The declarative
conflictsarray makes it easy to add new incompatible option pairs in the future. Error messages are clear and explain why the options conflict.tests/core/packager.test.ts (2)
51-56: LGTM!The mock return values correctly reflect the expanded
validateFileSafetyresult interface withsuspiciousGitDiffResultsandsuspiciousGitLogResults, and the newproduceOutputsurface returning{ outputForMetrics }.
93-101: LGTM!The expectation correctly verifies
produceOutputis called with the expected arguments in the right order:rootDirs,config,processedFiles,allFilePaths,gitDiffResult,gitLogResult, andprogressCallback.tests/core/output/outputSplit.test.ts (3)
10-19: LGTM!Good coverage for
getRootEntrywith both nested paths and root-level files. The tests verify the core grouping logic that prevents intra-folder splitting.
21-39: LGTM!The test correctly verifies that files are grouped by their root entry and that
allFilePathsincludes files not present inprocessedFiles(likesrc/ignored.txt), which is important for maintaining accurate file tree structure in split outputs.
68-85: Good error handling coverage.This test addresses the critical error path when a single root entry exceeds
maxBytesPerPart, ensuring users get a clear error message about which root entry is too large.tests/integration-tests/packager.test.ts (1)
130-130: LGTM!The integration test correctly uses the real
produceOutputfunction, which aligns with the purpose of integration testing the full flow rather than mocking internal details.tests/core/packager/produceOutput.test.ts (2)
12-45: LGTM!Good coverage of the single output mode: verifies
generateOutput,writeOutputToDisk, andcopyToClipboardIfEnabledare called correctly, and that the result containsoutputForMetricswithoutoutputFiles.
78-143: LGTM!Good coverage of split output mode including:
- Verifying multiple output files are written
- Confirming
outputFilesis defined in the result- Ensuring
outputForMetricsis an array for multi-part outputs- Verifying
copyToClipboardIfEnabledis not called in split modesrc/core/packager.ts (3)
12-12: LGTM!Clean refactoring introducing
produceOutputas the unified output pathway. The optionaloutputFilesfield inPackResultcorrectly signals when split output is used.Also applies to: 25-25, 39-39
150-159: LGTM!The unified
produceOutputcall cleanly replaces the previous separate steps for generation, disk writing, and clipboard copying. The destructuring of{ outputFiles, outputForMetrics }correctly captures both potential outputs.
166-175: LGTM!The conditional spread
...(outputFiles && { outputFiles })correctly includesoutputFilesin the result only when it's defined (split output mode), keeping the result clean for single-output scenarios.src/core/packager/produceOutput.ts (3)
12-15: LGTM! Clean interface design.The
ProduceOutputResultinterface appropriately distinguishes between the optionaloutputFilesarray (only populated in split mode) and the unifiedoutputForMetricsproperty that supports both single and multi-part outputs.
17-21: LGTM! Well-structured dependency injection pattern.The
defaultDepsobject withoverrideDepsmerge provides clean testability. The conditional branching onsplitMaxBytes !== undefinedcorrectly routes to the appropriate output generation path.Also applies to: 23-61
112-134: LGTM! Clean single-output flow.The single output path correctly generates, writes, and optionally copies to clipboard. The
withMemoryLogginginstrumentation is consistent with the split path.src/core/metrics/calculateMetrics.ts (3)
7-7: Previous review concern addressed.The code now correctly uses
buildSplitOutputFilePathfrom the shared module, ensuring consistent file path generation across the codebase. The conditional on line 78-80 appropriately handles both single-output (uses base path) and multi-part (uses numbered path) scenarios.Also applies to: 75-83
51-51: LGTM! Clean normalization and aggregation.Normalizing
outputtooutputPartsarray on line 51 allows uniform processing. The aggregation oftotalTokensandtotalCharactersviareducecorrectly sums across all parts.Also applies to: 88-90
67-86: Concurrent task submission is safe with the shared taskRunner.The
taskRunnerwraps Tinypool, which is purpose-built to handle concurrent task submissions. The singletaskRunnerinstance shared across all metrics calculations inPromise.allis the intended design pattern, and Tinypool internally manages task queuing and worker thread allocation safely. No modifications needed.src/core/output/outputSplit.ts (6)
63-70: LGTM! Clean file path construction.
buildSplitOutputFilePathcorrectly handles both extensionless files and files with extensions, inserting the part index appropriately.
74-93: LGTM! Correctly disables git content for non-first parts.The
makeChunkConfighelper ensures git diffs and logs are only included in the first part, avoiding duplication across split files. This is a sensible optimization.
141-143: Good input validation.Checking for safe integer and positive value prevents invalid configurations from causing unexpected behavior downstream.
155-163: Excellent documentation of the O(N²) trade-off.The comment clearly explains why the algorithm re-renders for each group (non-linear output size due to templates and file trees) and acknowledges the performance implications for large repositories. This addresses the past review concern with transparent reasoning.
164-224: Core splitting algorithm is sound.The iterative approach correctly:
- Accumulates groups until the next would exceed the limit
- Finalizes the current part and starts fresh with the overflowing group
- Throws a clear error if a single group exceeds the limit
- Re-renders the single group for the new part to get accurate byte count
The only minor concern is duplicate rendering when a group doesn't fit (lines 168-176 then 204-212), but this is necessary for correctness given the non-linear size behavior.
226-237: LGTM! Final part emission.Correctly handles the remaining groups after the loop completes, ensuring no data is lost.
Add test coverage for edge cases as suggested by CodeRabbit: - Empty string input - Paths with leading separators (/absolute/path.ts) - Single separator (/)
|
@Dango233 Everything works well, so I'll merge this now! |
Add --split-output feature documentation to 12 non-English languages: - German (de), Spanish (es), French (fr) - Hindi (hi), Indonesian (id), Italian (it) - Japanese (ja), Korean (ko), Portuguese-BR (pt-br) - Russian (ru), Vietnamese (vi) - Chinese Simplified (zh-cn), Chinese Traditional (zh-tw) Each language includes updates to: - command-line-options.md: --split-output CLI option - configuration.md: output.splitOutput config option - usage.md: New "Split Output" section with examples Based on PR #1013 changes to English documentation.
Add --split-output feature documentation to 12 non-English languages: - German (de), Spanish (es), French (fr) - Hindi (hi), Indonesian (id), Italian (it) - Japanese (ja), Korean (ko), Portuguese-BR (pt-br) - Russian (ru), Vietnamese (vi) - Chinese Simplified (zh-cn), Chinese Traditional (zh-tw) Each language includes updates to: - command-line-options.md: --split-output CLI option - configuration.md: output.splitOutput config option - usage.md: New "Split Output" section with examples Based on PR #1013 changes to English documentation.
Add --split-output feature documentation to 12 non-English languages: - German (de), Spanish (es), French (fr) - Hindi (hi), Indonesian (id), Italian (it) - Japanese (ja), Korean (ko), Portuguese-BR (pt-br) - Russian (ru), Vietnamese (vi) - Chinese Simplified (zh-cn), Chinese Traditional (zh-tw) Each language includes updates to: - command-line-options.md: --split-output CLI option - configuration.md: output.splitOutput config option - usage.md: New "Split Output" section with examples Based on PR #1013 changes to English documentation.
Add --split-output feature documentation to 12 non-English languages: - German (de), Spanish (es), French (fr) - Hindi (hi), Indonesian (id), Italian (it) - Japanese (ja), Korean (ko), Portuguese-BR (pt-br) - Russian (ru), Vietnamese (vi) - Chinese Simplified (zh-cn), Chinese Traditional (zh-tw) Each language includes updates to: - command-line-options.md: --split-output CLI option - configuration.md: output.splitOutput config option - usage.md: New "Split Output" section with examples Based on PR #1013 changes to English documentation.
|
@Dango233 Thank you for the contribution! |
|
Thank you!! ;)2025年12月23日 22:56,Kazuki Yamada ***@***.***>写道:yamadashy left a comment (yamadashy/repomix#1013)
@Dango233
Released in v1.11.0! 🎉
https://github.com/yamadashy/repomix/releases/tag/v1.11.0
Thank you for the contribution!
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Adds
--split-output <size>to split packed output into multiple numbered files with a maximum size per part, without splitting files across parts.Documentation updates:
Checklist
npm run testnpm run lint