feat(pack): Performance Optimization for Large Repositories#309
feat(pack): Performance Optimization for Large Repositories#309
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #309 +/- ##
==========================================
- Coverage 92.17% 90.12% -2.05%
==========================================
Files 44 48 +4
Lines 2248 2471 +223
Branches 493 514 +21
==========================================
+ Hits 2072 2227 +155
- Misses 176 244 +68 ☔ View full report in Codecov by Sentry. |
📝 WalkthroughWalkthroughThis pull request introduces significant changes to the Repomix project, focusing on enhancing concurrency and modularity through the integration of the Piscina library for worker thread management. Key updates include the addition of a new dependency, Sequence DiagramsequenceDiagram
participant CLI as CLI Entrypoint
participant Packager as Core Packager
participant FileCollector as File Collector
participant FileProcessor as File Processor
participant SecurityCheck as Security Checker
participant MetricsCalculator as Metrics Calculator
participant WorkerPool as Piscina Worker Pool
CLI->>Packager: Initiate pack process
Packager->>FileCollector: Collect files with progress callback
FileCollector->>WorkerPool: Create task for file collection
WorkerPool-->>FileCollector: Return collected files
Packager->>SecurityCheck: Validate file safety
SecurityCheck->>WorkerPool: Create security check tasks
WorkerPool-->>SecurityCheck: Return security check results
Packager->>FileProcessor: Process files
FileProcessor->>WorkerPool: Create file processing tasks
WorkerPool-->>FileProcessor: Return processed files
Packager->>MetricsCalculator: Calculate metrics
MetricsCalculator->>WorkerPool: Create metrics calculation tasks
WorkerPool-->>MetricsCalculator: Return file and output metrics
Possibly related PRs
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (26)
tests/core/metrics/calculateMetrics.test.ts (2)
50-53: Enhance test coverage for the new dependency injection pattern.While the dependency injection approach aligns well with the multithreading objectives, the test coverage could be improved:
- Add expectations to verify that
calculateOutputMetricsis called with the correct parameters- Document why
calculateOutputMetricsreturns 30 (appears to be related toaggregatedResult.totalTokens)- Consider testing error scenarios
Here's a suggested improvement:
const result = await calculateMetrics(processedFiles, output, progressCallback, config, { calculateAllFileMetrics, - calculateOutputMetrics: () => Promise.resolve(30), + calculateOutputMetrics: vi.fn().mockResolvedValue(30), // Mock using vi.fn() for better test control }); expect(progressCallback).toHaveBeenCalledWith('Calculating metrics...'); + expect(calculateOutputMetrics).toHaveBeenCalledWith(output, 'o200k_base'); expect(calculateAllFileMetrics).toHaveBeenCalledWith(processedFiles, 'o200k_base', progressCallback);
56-56: Document the meaning of 'o200k_base' and consider extracting it as a constant.The hardcoded string 'o200k_base' appears to be a model identifier, but its purpose and relationship to the test scenario isn't clear. This could impact test maintainability.
Consider these improvements:
+// At the top of the file with other imports +import { MODEL_IDENTIFIERS } from '../../../src/core/constants.js'; + describe('calculateMetrics', () => { it('should calculate metrics and return the result', async () => { // ... existing test setup ... - expect(calculateAllFileMetrics).toHaveBeenCalledWith(processedFiles, 'o200k_base', progressCallback); + expect(calculateAllFileMetrics).toHaveBeenCalledWith( + processedFiles, + MODEL_IDENTIFIERS.DEFAULT, // Use a named constant for better maintainability + progressCallback + );Also, consider adding a comment explaining why this specific model is used in this test scenario.
src/shared/processConcurrency.ts (1)
9-21: Refine maximum thread calculation.
Using a strict limit ofMath.ceil(numOfTasks / 100)might underutilize concurrency for mid-sized tasks. For example, ifnumOfTasksis just over a multiple of 100, you might be creating too few threads. Revisit whether you need a more flexible upper bound for peak performance.src/core/file/fileProcess.ts (1)
22-31: Constructing FileProcessTask objects is concise.
Your use of inline satisfies casting clarifies the intended structure, though consider adding validation forrawFileandconfigin worker tasks for robustness.src/core/file/fileCollect.ts (2)
8-11: Implement test coverage for worker initialization.These lines show the creation of the task runner via
initPiscina, but static analysis suggests no test scenario covers this code path. Adding direct or indirect tests to ensure worker pool initialization occurs as expected would reinforce the reliability of this concurrency layer.I can draft a test that initializes the task runner with a controlled number of tasks and verifies resource cleanup.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 9-11: src/core/file/fileCollect.ts#L9-L11
Added lines #L9 - L11 were not covered by tests
31-46: Parallel execution looks good; watch out for resource usage.The
Promise.allapproach efficiently spins up tasks concurrently. However, for large file lists, concurrency could spike system resource usage. Consider a concurrency limit if scaling becomes a concern.src/core/security/securityCheck.ts (1)
25-31: Consider cleansing or truncating large file contents.Some security checks might not need the entire file content. If very large files appear, it might degrade performance or memory usage. Optionally consider partial reads or chunked checks.
tests/core/security/securityCheck.test.ts (2)
26-30: Consider concurrency in the task runner.
mockInitTaskRunnercallssecurityCheckWorkerserially. If concurrency is desired for stress testing, consider spawning multiple tasks or using a parallel test approach.
58-72: Good error-handling coverage.
This test confirms the function properly reports worker errors. Consider adding a partial-success scenario where one file fails but others succeed for more comprehensive coverage.tests/integration-tests/packager.test.ts (1)
26-30: Mocked file collector initialization.
Similar to other mocked task runners, this approach is straightforward and ensures a predictable environment. Evaluate whether a parallel or concurrent strategy is needed here as well.src/core/metrics/workers/types.ts (1)
1-5: Well-definedFileMetricsinterface.
This is a straightforward and necessary definition. If these fields are never modified after creation, consider marking them as readonly for stronger immutability guarantees.src/core/security/validateFileSafety.ts (1)
18-23: Consider adding error handling for security checks.While the conditional execution is good, consider adding try-catch block around the security check to handle potential failures gracefully.
if (config.security.enableSecurityCheck) { progressCallback('Running security check...'); + try { suspiciousFilesResults = await deps.runSecurityCheck(rawFiles, progressCallback); + } catch (error) { + logger.error('Security check failed:', error); + throw new Error('Security check failed. Please try again or disable security checks.'); + } }tests/core/metrics/calculateAllFileMetrics.test.ts (2)
12-16: Enhance mock task runner test coverage.Consider adding tests for:
- Error scenarios
- Edge cases with empty files
- Concurrent task execution
const mockInitTaskRunner = (numOfTasks: number) => { return async (task: FileMetricsTask) => { + if (!task.content) { + throw new Error('Empty content not supported'); + } return await fileMetricsWorker(task); }; };
26-28: Add test cases for different encodings.The test only covers 'o200k_base' encoding. Consider adding test cases for other supported encodings to ensure compatibility.
src/core/security/workers/securityCheckWorker.ts (1)
30-41: Improve file extension extraction logic.The current implementation of getting file extension could be more robust.
const result = await lintSource({ source: { filePath: filePath, content: content, - ext: filePath.split('.').pop() || '', + ext: filePath.match(/\.([^.]+)$/)?.[1] || '', contentType: 'text', }, options: { config: config, }, });tests/shared/processConcurrency.test.ts (1)
20-59: Add test cases for edge scenarios in getWorkerThreadCount.The test suite should include additional edge cases:
- Negative task counts
- Non-integer task counts
- Fallback behavior when os.availableParallelism is unavailable
describe('getWorkerThreadCount', () => { + it('should handle negative task counts', () => { + const { minThreads, maxThreads } = getWorkerThreadCount(-5); + expect(minThreads).toBe(1); + expect(maxThreads).toBe(1); + }); + + it('should handle non-integer task counts', () => { + const { minThreads, maxThreads } = getWorkerThreadCount(3.7); + expect(minThreads).toBe(1); + expect(maxThreads).toBe(3); + }); + + it('should handle unavailable os.availableParallelism', () => { + vi.mocked(os).availableParallelism = undefined; + const { minThreads, maxThreads } = getWorkerThreadCount(100); + expect(minThreads).toBe(1); + expect(maxThreads).toBeLessThanOrEqual(8); + }); // existing tests... });src/core/file/workers/fileCollectWorker.ts (1)
35-44: Improve large file warning message.The warning message could be more actionable by including the exact file size and suggesting alternative approaches.
if (stats.size > MAX_FILE_SIZE) { const sizeMB = (stats.size / 1024 / 1024).toFixed(1); - logger.log(''); - logger.log('⚠️ Large File Warning:'); - logger.log('──────────────────────'); - logger.log(`File exceeds size limit: ${sizeMB}MB > ${MAX_FILE_SIZE / 1024 / 1024}MB (${filePath})`); - logger.note('Add this file to .repomixignore if you want to exclude it permanently'); - logger.log(''); + logger.warn( + `Large File Warning: ${filePath}\n` + + `Size: ${sizeMB}MB (Limit: ${MAX_FILE_SIZE / 1024 / 1024}MB)\n` + + 'Options:\n' + + '1. Add to .repomixignore to exclude permanently\n' + + '2. Process file in chunks using streaming\n' + + '3. Increase MAX_FILE_SIZE if needed' + );src/core/packager.ts (1)
Line range hint
41-58: Improve progress reporting granularity.Consider adding percentage-based progress updates for long-running operations.
progressCallback('Collecting files...'); - const rawFiles = await deps.collectFiles(filePaths, rootDir, progressCallback); + const rawFiles = await deps.collectFiles( + filePaths, + rootDir, + (status, progress) => progressCallback( + `Collecting files... ${progress ? `${Math.round(progress * 100)}%` : ''}` + ) + ); const { safeFilePaths, safeRawFiles, suspiciousFilesResults } = await deps.validateFileSafety( rawFiles, - progressCallback, + (status, progress) => progressCallback( + `Validating files... ${progress ? `${Math.round(progress * 100)}%` : ''}` + ), config, );tests/core/security/workers/securityCheckWorker.test.ts (1)
51-76: Add test cases for edge scenarios.The test suite should include additional scenarios:
- Empty content
- Very large content
- Mixed valid and invalid content
- Malformed sensitive data
+ test('should handle empty content', async () => { + const result = await runSecretLint('empty.md', '', config); + expect(result).toBeNull(); + }); + + test('should handle very large content', async () => { + const largeContent = 'a'.repeat(1024 * 1024); // 1MB + const result = await runSecretLint('large.md', largeContent, config); + expect(result).toBeNull(); + }); + + test('should detect sensitive info in mixed content', async () => { + const mixedContent = ` + # Mixed Content + Normal text here + FAKE_AWS_KEY=FAKE000TOKEN111 + More normal text + `; + const result = await runSecretLint('mixed.md', mixedContent, config); + expect(result).not.toBeNull(); + }); + + test('should handle malformed sensitive data', async () => { + const malformedContent = ` + # Malformed Content + ghp_incomplete_token + xoxb-invalid-format + `; + const result = await runSecretLint('malformed.md', malformedContent, config); + expect(result).toBeNull(); + });tests/core/metrics/calculateOutputMetrics.test.ts (2)
9-13: Add test coverage for worker pool initialization.The test suite should verify proper initialization and cleanup of worker pools. Consider adding test cases for:
- Worker pool initialization failures
- Concurrent task execution
- Worker pool cleanup after task completion
39-57: Enhance error handling test coverage.While the error handling test verifies basic error propagation, it should also test:
- Worker pool cleanup after errors
- Multiple concurrent worker failures
- Error propagation during worker initialization
tests/core/packager.test.ts (1)
75-75: Add test coverage for concurrent processing scenarios.The test should verify:
- Concurrent file collection and processing
- Progress tracking during parallel operations
- Worker pool cleanup after processing completion
tests/core/file/fileCollect.test.ts (2)
20-24: Add test coverage for worker pool management.The mockInitTaskRunner implementation should be enhanced to test:
- Worker pool creation and destruction
- Concurrent task execution limits
- Resource cleanup after task completion
100-102: Add test coverage for concurrent file processing.The large file test should be extended to verify:
- Concurrent file processing behavior
- Memory usage during parallel operations
- Progress callback accuracy during concurrent operations
tests/core/file/fileProcess.test.ts (1)
11-15: Consider enhancing the mock task runner implementation.The current implementation directly calls
fileProcessWorker, which might not fully simulate the concurrent nature of the worker pool. Consider adding artificial delays or parallel execution simulation for more realistic testing.const mockInitTaskRunner = (numOfTasks: number) => { return async (task: FileProcessTask) => { + // Simulate concurrent execution with artificial delay + await new Promise(resolve => setTimeout(resolve, Math.random() * 10)); return await fileProcessWorker(task); }; };tests/cli/actions/remoteAction.test.ts (1)
38-50: Consider adding edge cases to the mock result.While the mock implementation provides valid data, it could be enhanced to test more scenarios:
- Add files with varying sizes
- Include suspicious files
- Test different character counts
runDefaultAction: async () => { return { packResult: { - totalFiles: 1, - totalCharacters: 1, - totalTokens: 1, - fileCharCounts: {}, - fileTokenCounts: {}, + totalFiles: 3, + totalCharacters: 1000, + totalTokens: 200, + fileCharCounts: { + 'file1.js': 500, + 'file2.js': 300, + 'file3.js': 200 + }, + fileTokenCounts: { + 'file1.js': 100, + 'file2.js': 60, + 'file3.js': 40 + }, suspiciousFilesResults: [], }, config: createMockConfig(), } satisfies DefaultActionRunnerResult; },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (41)
package.json(1 hunks)src/cli/actions/remoteAction.ts(2 hunks)src/config/configLoad.ts(0 hunks)src/core/file/fileCollect.ts(1 hunks)src/core/file/fileProcess.ts(1 hunks)src/core/file/workers/fileCollectWorker.ts(1 hunks)src/core/file/workers/fileProcessWorker.ts(1 hunks)src/core/metrics/aggregateMetrics.ts(0 hunks)src/core/metrics/calculateAllFileMetrics.ts(1 hunks)src/core/metrics/calculateIndividualFileMetrics.ts(0 hunks)src/core/metrics/calculateMetrics.ts(2 hunks)src/core/metrics/calculateOutputMetrics.ts(1 hunks)src/core/metrics/workers/fileMetricsWorker.ts(1 hunks)src/core/metrics/workers/outputMetricsWorker.ts(1 hunks)src/core/metrics/workers/types.ts(1 hunks)src/core/packager.ts(1 hunks)src/core/security/runSecurityCheckIfEnabled.ts(0 hunks)src/core/security/securityCheck.ts(1 hunks)src/core/security/validateFileSafety.ts(1 hunks)src/core/security/workers/securityCheckWorker.ts(1 hunks)src/shared/processConcurrency.ts(1 hunks)tests/cli/actions/remoteAction.test.ts(2 hunks)tests/cli/cliPrint.test.ts(1 hunks)tests/config/configSchema.test.ts(1 hunks)tests/core/file/fileCollect.test.ts(6 hunks)tests/core/file/fileProcess.test.ts(9 hunks)tests/core/metrics/aggregateMetrics.test.ts(0 hunks)tests/core/metrics/calculateAllFileMetrics.test.ts(1 hunks)tests/core/metrics/calculateIndividualFileMetrics.test.ts(0 hunks)tests/core/metrics/calculateMetrics.test.ts(1 hunks)tests/core/metrics/calculateOutputMetrics.test.ts(1 hunks)tests/core/packager.test.ts(1 hunks)tests/core/packager/copyToClipboardIfEnabled.test.ts(0 hunks)tests/core/security/runSecurityCheckIfEnabled.test.ts(0 hunks)tests/core/security/securityCheck.test.ts(1 hunks)tests/core/security/validateFileSafety.test.ts(1 hunks)tests/core/security/workers/securityCheckWorker.test.ts(1 hunks)tests/integration-tests/packager.test.ts(2 hunks)tests/shared/processConcurrency.test.ts(1 hunks)website/compose.yml(1 hunks)website/server/src/utils/cache.ts(1 hunks)
💤 Files with no reviewable changes (8)
- src/config/configLoad.ts
- src/core/metrics/aggregateMetrics.ts
- tests/core/metrics/calculateIndividualFileMetrics.test.ts
- tests/core/metrics/aggregateMetrics.test.ts
- src/core/security/runSecurityCheckIfEnabled.ts
- src/core/metrics/calculateIndividualFileMetrics.ts
- tests/core/security/runSecurityCheckIfEnabled.test.ts
- tests/core/packager/copyToClipboardIfEnabled.test.ts
✅ Files skipped from review due to trivial changes (4)
- tests/config/configSchema.test.ts
- website/compose.yml
- tests/cli/cliPrint.test.ts
- website/server/src/utils/cache.ts
🧰 Additional context used
🪛 Gitleaks (8.21.2)
tests/core/security/workers/securityCheckWorker.test.ts
23-23: Identified a Slack Legacy Workspace token, potentially compromising access to workspace data and legacy features.
(slack-legacy-workspace-token)
29-43: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
24-24: Identified a Slack Bot token, which may compromise bot integrations and communication channel security.
(slack-bot-token)
16-16: Uncovered a GitHub Personal Access Token, potentially leading to unauthorized repository access and sensitive content exposure.
(github-pat)
🪛 GitHub Check: codecov/patch
src/core/metrics/workers/outputMetricsWorker.ts
[warning] 38-42: src/core/metrics/workers/outputMetricsWorker.ts#L38-L42
Added lines #L38 - L42 were not covered by tests
src/core/metrics/calculateOutputMetrics.ts
[warning] 7-9: src/core/metrics/calculateOutputMetrics.ts#L7-L9
Added lines #L7 - L9 were not covered by tests
src/core/metrics/workers/fileMetricsWorker.ts
[warning] 48-52: src/core/metrics/workers/fileMetricsWorker.ts#L48-L52
Added lines #L48 - L52 were not covered by tests
src/core/metrics/calculateAllFileMetrics.ts
[warning] 11-13: src/core/metrics/calculateAllFileMetrics.ts#L11-L13
Added lines #L11 - L13 were not covered by tests
[warning] 55-57: src/core/metrics/calculateAllFileMetrics.ts#L55-L57
Added lines #L55 - L57 were not covered by tests
src/core/security/securityCheck.ts
[warning] 14-16: src/core/security/securityCheck.ts#L14-L16
Added lines #L14 - L16 were not covered by tests
src/core/file/fileCollect.ts
[warning] 9-11: src/core/file/fileCollect.ts#L9-L11
Added lines #L9 - L11 were not covered by tests
[warning] 54-55: src/core/file/fileCollect.ts#L54-L55
Added lines #L54 - L55 were not covered by tests
src/core/security/workers/securityCheckWorker.ts
[warning] 25-27: src/core/security/workers/securityCheckWorker.ts#L25-L27
Added lines #L25 - L27 were not covered by tests
src/core/file/fileProcess.ts
[warning] 10-12: src/core/file/fileProcess.ts#L10-L12
Added lines #L10 - L12 were not covered by tests
src/shared/processConcurrency.ts
[warning] 24-24: src/shared/processConcurrency.ts#L24
Added line #L24 was not covered by tests
[warning] 26-28: src/shared/processConcurrency.ts#L26-L28
Added lines #L26 - L28 were not covered by tests
[warning] 30-35: src/shared/processConcurrency.ts#L30-L35
Added lines #L30 - L35 were not covered by tests
⏰ Context from checks skipped due to timeout of 90000ms (15)
- GitHub Check: Test (macos-latest, 22.x)
- GitHub Check: Test (macos-latest, 21.x)
- GitHub Check: Test (macos-latest, 20.x)
- GitHub Check: Test (macos-latest, 19.x)
- GitHub Check: Test (windows-latest, 19.x)
- GitHub Check: Test (windows-latest, 18.x)
- GitHub Check: Test (windows-latest, 18.0.0)
- GitHub Check: Build and run (windows-latest, 23.x)
- GitHub Check: Build and run (windows-latest, 22.x)
- GitHub Check: Build and run (windows-latest, 21.x)
- GitHub Check: Build and run (windows-latest, 20.x)
- GitHub Check: Test (ubuntu-latest, 19.x)
- GitHub Check: Test (ubuntu-latest, 18.x)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (46)
src/shared/processConcurrency.ts (3)
2-3: Imports look good.
Adding Piscina and logger dependencies is aligned with your new concurrency approach.
5-7: Consider verifying older Node compatibility.
Your fallback toos.cpus().lengthis correct ifavailableParallelismis not available, but ensure Node versions below 19 are tested.
23-35: Add test coverage for worker pool initialization.
Lines 24, 26–28, and 30–35 are flagged by coverage checks. Consider adding tests that verify worker pool creation with variousnumOfTasksvalues to ensure correct thread counts and log outputs.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 24-24: src/shared/processConcurrency.ts#L24
Added line #L24 was not covered by tests
[warning] 26-28: src/shared/processConcurrency.ts#L26-L28
Added lines #L26 - L28 were not covered by tests
[warning] 30-35: src/shared/processConcurrency.ts#L30-L35
Added lines #L30 - L35 were not covered by testssrc/core/metrics/calculateMetrics.ts (5)
5-5: Optional import is consistent.
IncludingcalculateOutputMetricsis a logical extension of metrics.
20-23: Dependency injection pattern looks clean.
Passingdepswith defaults makes testing and future maintenance easier.
27-31: Parallelize metrics calculation for speed.
UsingPromise.allto calculate file metrics and output metrics in parallel is an effective way to reduce overall runtime.
35-40: File-level metric records are straightforward.
BuildingfileCharCountsandfileTokenCountsdirectly simplifies future data consumption.
42-48: Return object structure is solid.
Returning structured, aggregated metrics is clear and maintainable for downstream consumers.src/core/file/fileProcess.ts (5)
4-4: Switching to Piscina is appropriate for worker threads.
Replacing p-map with Piscina aligns with the PR’s multithreading goals, likely contributing to the performance gains noted in the PR summary.
9-12: Ensure test coverage for worker task runner.
Lines 10–12 are indicated as uncovered by codecov. A simple test verifying thatinitTaskRunnerproperly spawns workers would increase confidence in concurrency behavior.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 10-12: src/core/file/fileProcess.ts#L10-L12
Added lines #L10 - L12 were not covered by tests
18-20: Dependency injection fosters maintainability.
Allowing an overridableinitTaskRunnersimplifies testing and future concurrency improvements.
38-47: Good concurrency flow.
AwaitingPromise.allreturns ensures efficient parallel processing while providing a straightforward progress callback.
48-55: Timing calculation is a helpful diagnostic.
Logging execution duration is a valuable metric for benchmarking performance gains and regressions.src/core/file/fileCollect.ts (3)
13-20: Good use of dependency injection for flexibility.By injecting
initTaskRunnerthrough thedepsparameter, you allow for easier testing and more modular code. This design pattern reduces coupling and facilitates mocking in unit tests.
21-28: Ensure validated tasks or consider fallback.The code maps file paths directly into
FileCollectTaskobjects. If there's any possibility of malformed file paths or missing root directories, consider validating data or applying a graceful fallback here.
48-52: Neat performance logging.Capturing start and end timestamps helps measure performance gains accurately. This aligns well with your PR objective to highlight multithreading improvements.
src/core/security/securityCheck.ts (3)
21-23: Excellent use of dependency injection for concurrency.Similar to
fileCollect.ts, passing theinitTaskRunnerdependency simplifies testing and decouples concurrency logic from domain logic.
34-48: Concurrent processing with progress feedback is well structured.The approach of incrementing
completedTasksand providing a progress callback is both user-friendly and traceable. Great job integrating real-time feedback!
13-16: Add unit tests for initTaskRunner logic.The concurrency init code is functionally consistent with the rest of the worker-based approach. However, test coverage is missing. To maintain high reliability, consider adding tests that specifically initialize the runner and mock worker tasks.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 14-16: src/core/security/securityCheck.ts#L14-L16
Added lines #L14 - L16 were not covered by testssrc/core/metrics/calculateAllFileMetrics.ts (4)
10-13: Expose worker pool coverage.Similar to other modules, initialization of the worker pool is untested. Migrating concurrency logic to a separate function is a good step, but consider adding dedicated worker pool tests.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 11-13: src/core/metrics/calculateAllFileMetrics.ts#L11-L13
Added lines #L11 - L13 were not covered by tests
17-17: Dependency injection pattern is consistent.Defaulting to
initTaskRunneris a solid design choice, ensuring consistent concurrency logic and enabling easy stubbing or mocking during tests.Also applies to: 19-21
23-31: Watch out for indexing logic.The code uses
indexto track its position among total files. Ensure that externally or in the worker, no off-by-one logic creeps in, especially if zero-based indexing differs from user-facing progress labels (which might expect 1-based).
34-48: Parallel tasks with progress logging are well done.This approach consistently mirrors the concurrency patterns in other modules. The real-time progress feedback fosters a uniform user experience.
tests/core/security/securityCheck.test.ts (9)
3-9: Great use of imports and naming consistency.
These imports prepare the test environment cleanly, and switching torunSecurityCheckis well-aligned with the refactoring. No issues noticed.
11-12: Mocking the logger.
Mocking the logger as part of the test setup is appropriate to isolate test logs and prevent console noise during test runs.
13-24: Validate inclusion of real-like credentials in test.
The mock filetest1.jsintentionally includes an exposed credential. Verify that this is truly synthetic and won't risk leaking real data.
32-41: Security issue detection test.
This test appropriately validates that the function detects a sensitive string. Great work ensuring the length of messages is also checked.
43-56: Progress callback coverage.
Verifying the callback calls with correct arguments is thorough. This ensures the user sees expected progress logs.
75-81: Empty file list test.
Verifying an edge case of no files ensures robust behavior. No issues here.
83-90: Tracing performance metrics.
Logging performance details at trace level can be invaluable. This test effectively checks that logs are used.
92-104: Parallel processing timing test.
A strict < 1000 ms threshold could introduce flakiness on slower systems. Consider mocking timers or verifying concurrency with a different approach to reduce CI variability.
106-113: Preserving original file content.
Ensuring the files remain unchanged is crucial for data integrity. This test effectively confirms that.tests/integration-tests/packager.test.ts (2)
8-19: New imports for extended file and output processes.
These imports set the stage for deeper integration with file collection, searching, and security. Looks consistent and clear.
70-114: Expanded packing process with concurrency hooks.
Providing a progress callback and aninitTaskRunneroption is a clean way to handle parallelizable tasks. The approach is flexible and modular. Great job!src/core/security/validateFileSafety.ts (1)
6-6: LGTM! Clean dependency updates.The change from
runSecurityCheckIfEnabledtorunSecurityCheckwith conditional execution is a good refactor that improves separation of concerns.Also applies to: 14-14
src/core/metrics/calculateOutputMetrics.ts (2)
21-35: LGTM! Well-structured error handling and performance monitoring.The implementation includes:
- Proper error handling with logging
- Performance timing with nanosecond precision
- Clear progress logging
6-9: Add test coverage for task runner initialization.The task runner initialization logic lacks test coverage. Consider adding tests for pool creation and task execution.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 7-9: src/core/metrics/calculateOutputMetrics.ts#L7-L9
Added lines #L7 - L9 were not covered by testssrc/core/metrics/workers/outputMetricsWorker.ts (2)
11-19: Verify thread safety of TokenCounter singleton.While the singleton pattern is appropriate for a worker context, ensure
TokenCounteris thread-safe when used across multiple workers.✅ Verification successful
TokenCounter singleton implementation is thread-safe
The singleton pattern is correctly implemented as each worker runs in its own isolated context using Piscina worker pools. The TokenCounter instance is scoped to individual workers with no shared state between them, ensuring thread safety.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for TokenCounter usage in worker contexts ast-grep --pattern 'new TokenCounter($_)'Length of output: 332
Script:
#!/bin/bash # Check worker instantiation and TokenCounter implementation ast-grep --pattern 'class TokenCounter { $$$ }' # Check worker creation/usage rg "new Worker|Worker\." -A 3Length of output: 8742
36-42: Add test coverage for worker cleanup.The cleanup logic on worker termination is not covered by tests. Consider adding tests that simulate worker termination.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 38-42: src/core/metrics/workers/outputMetricsWorker.ts#L38-L42
Added lines #L38 - L42 were not covered by teststests/core/security/validateFileSafety.test.ts (1)
24-24: LGTM! Test updated correctly for the new worker-based security check.The test has been properly updated to reflect the API changes, with correct mock setup and assertions.
Also applies to: 30-30
src/core/metrics/workers/fileMetricsWorker.ts (1)
14-22: LGTM! Efficient TokenCounter singleton implementation.The singleton pattern is appropriately used in the worker context, ensuring efficient reuse of the TokenCounter instance.
src/core/security/workers/securityCheckWorker.ts (2)
56-63: LGTM! Secure configuration setup.The security configuration is properly set up using the recommended preset.
24-27: Add test coverage for error handling.The error handling code path is not covered by tests.
✅ Verification successful
Error handling is properly tested ✅
The error handling code path is already covered by the test case
"should handle worker errors gracefully"intests/core/security/securityCheck.test.ts. This test verifies both error propagation and error logging behavior.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check test coverage for security check error handling # Search for test files that might contain error handling tests rg -l "SecurityCheckTask" tests/Length of output: 74
Script:
#!/bin/bash # Check test file content for error handling tests echo "=== Test File Content ===" cat tests/core/security/securityCheck.test.ts echo -e "\n=== Implementation File Content ===" cat src/core/security/workers/securityCheckWorker.tsLength of output: 5581
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 25-27: src/core/security/workers/securityCheckWorker.ts#L25-L27
Added lines #L25 - L27 were not covered by testssrc/cli/actions/remoteAction.ts (1)
50-50: Consider potential race conditions in cleanup.When running multiple instances simultaneously, the cleanup of temporary directories could face race conditions. Consider:
- Adding unique identifiers to temp directories
- Implementing retry logic for cleanup failures
- Adding locks or synchronization mechanisms
tests/core/file/fileProcess.test.ts (1)
36-38: LGTM! Good use of dependency injection for testing.The test properly injects the mock task runner, making it easier to test the concurrent processing behavior in isolation.
package.json (1)
79-79: Verify Piscina version compatibility.The addition of Piscina aligns with the PR's multithreading objectives. However, let's verify the version compatibility and check for any known issues.
✅ Verification successful
Piscina v4.8.0 is appropriate for this PR
Using a stable v4.8.0 release is a good choice, especially over the latest alpha version (5.0.0-alpha.1). The package is actively maintained, not deprecated, and has standard Node.js compatibility requirements (>=18.x).
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for compatibility issues and security advisories for Piscina # Get the latest version and release date echo "Latest version info:" npm view piscina time --json | jq 'to_entries | .[-1]' # Check for known vulnerabilities echo -e "\nVulnerability check:" npm audit piscina --json | jq '.vulnerabilities' # Check minimum Node.js version requirement echo -e "\nNode.js version requirement:" npm view piscina engines.nodeLength of output: 747
Script:
#!/bin/bash # Check for vulnerabilities and deprecation notices echo "Checking vulnerabilities for piscina@4.8.0:" npm audit piscina@4.8.0 --json | jq '.vulnerabilities' echo -e "\nChecking for deprecation notices:" npm view piscina@4.8.0 deprecatedLength of output: 624
Deploying repomix with
|
| Latest commit: |
3b2e45e
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://5699927e.repomix.pages.dev |
| Branch Preview URL: | https://feat-multithread.repomix.pages.dev |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/core/metrics/workers/outputMetricsWorker.ts (1)
11-19: Consider enhancing type safety for the singleton implementation.While the lazy initialization pattern is appropriate, we can make it more robust with TypeScript's strict null checks.
-let tokenCounter: TokenCounter | null = null; +let tokenCounter: TokenCounter | undefined; const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => { - if (!tokenCounter) { + if (tokenCounter === undefined) { tokenCounter = new TokenCounter(encoding); } return tokenCounter; };src/core/metrics/calculateMetrics.ts (1)
20-23: Great use of dependency injection and concurrent execution!The implementation effectively uses Promise.all for parallel processing and follows good practices with dependency injection.
Consider destructuring deps in the function parameters for better readability:
export const calculateMetrics = async ( processedFiles: ProcessedFile[], output: string, progressCallback: RepomixProgressCallback, config: RepomixConfigMerged, - deps = { - calculateAllFileMetrics, - calculateOutputMetrics, - }, + { + calculateAllFileMetrics = calculateAllFileMetrics, + calculateOutputMetrics = calculateOutputMetrics, + } = {}, ): Promise<CalculateMetricsResult> => {Also applies to: 27-30
src/core/metrics/calculateOutputMetrics.ts (2)
6-7: Consider making threshold constants configurable.The chunk size and parallel processing threshold could be made configurable through the application's config to allow tuning based on different workloads.
-const CHUNK_SIZE = 1000; -const MIN_CONTENT_LENGTH_FOR_PARALLEL = 1_000_000; // 1000KB +const DEFAULT_CHUNK_SIZE = 1000; +const DEFAULT_MIN_CONTENT_LENGTH_FOR_PARALLEL = 1_000_000; // 1000KB + +interface MetricsConfig { + chunkSize?: number; + minContentLengthForParallel?: number; +}
64-67: Enhance error logging for better debugging.The error handling could provide more context about the failed operation.
} catch (error) { - logger.error('Error during token count:', error); + logger.error( + `Error during token count${path ? ` for ${path}` : ''} ` + + `(parallel=${shouldRunInParallel}, contentLength=${content.length}):`, + error + ); throw error; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/core/metrics/calculateMetrics.ts(2 hunks)src/core/metrics/calculateOutputMetrics.ts(1 hunks)src/core/metrics/workers/outputMetricsWorker.ts(1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/metrics/calculateOutputMetrics.ts
[warning] 10-12: src/core/metrics/calculateOutputMetrics.ts#L10-L12
Added lines #L10 - L12 were not covered by tests
[warning] 34-35: src/core/metrics/calculateOutputMetrics.ts#L34-L35
Added lines #L34 - L35 were not covered by tests
[warning] 37-39: src/core/metrics/calculateOutputMetrics.ts#L37-L39
Added lines #L37 - L39 were not covered by tests
[warning] 42-50: src/core/metrics/calculateOutputMetrics.ts#L42-L50
Added lines #L42 - L50 were not covered by tests
[warning] 53-53: src/core/metrics/calculateOutputMetrics.ts#L53
Added line #L53 was not covered by tests
src/core/metrics/workers/outputMetricsWorker.ts
[warning] 36-40: src/core/metrics/workers/outputMetricsWorker.ts#L36-L40
Added lines #L36 - L40 were not covered by tests
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Test (windows-latest, 23.x)
- GitHub Check: Test (windows-latest, 22.x)
- GitHub Check: Test (windows-latest, 21.x)
- GitHub Check: Test (windows-latest, 20.x)
- GitHub Check: Test (windows-latest, 19.x)
- GitHub Check: Test (windows-latest, 18.x)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (3)
src/core/metrics/workers/outputMetricsWorker.ts (2)
5-9: Well-defined interface for worker tasks!The
OutputMetricsTaskinterface clearly defines the required properties for token counting tasks.
34-40: Add test coverage for worker cleanup.The cleanup handler is crucial for proper resource management but currently lacks test coverage.
Could you add tests to verify that:
- The cleanup handler is registered
- TokenCounter is properly freed on process exit
- The tokenCounter reference is nullified
Here's a verification script to check the current test coverage:
✅ Verification successful
Test coverage missing for worker cleanup handler
While the
TokenCounter.free()method is well tested, the worker's process exit handler lacks test coverage. The review comment is accurate - there are no tests verifying:
- Registration of the exit handler
- Cleanup execution during process termination
- Nullification of the tokenCounter reference
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check test coverage for worker cleanup # Search for test files related to outputMetricsWorker fd -e test.ts -e spec.ts outputMetricsWorker # Search for cleanup-related test cases rg -l "process.*exit|cleanup|free" $(fd -e test.ts -e spec.ts)Length of output: 1659
Script:
#!/bin/bash # Check content of matching test files and search for worker-related tests echo "=== Token Count Tests ===" rg -A 5 -B 5 "process.*exit|cleanup|free" tests/core/tokenCount/tokenCount.test.ts echo -e "\n=== Calculate Metrics Tests ===" rg -A 5 -B 5 "process.*exit|cleanup|free" tests/core/metrics/calculateMetrics.test.ts echo -e "\n=== Worker Tests ===" # Search for any worker-related test files fd worker.*test echo -e "\n=== Process Exit Tests ===" # Search for process.exit related tests ast-grep --pattern 'process.on("exit"' ast-grep --pattern "process.on('exit'"Length of output: 1917
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 36-40: src/core/metrics/workers/outputMetricsWorker.ts#L36-L40
Added lines #L36 - L40 were not covered by testssrc/core/metrics/calculateOutputMetrics.ts (1)
32-57: Add test coverage for parallel processing logic.The parallel processing implementation looks solid but lacks test coverage for various scenarios.
Could you add tests to verify:
- Content chunking logic
- Parallel processing of chunks
- Result aggregation
- Edge cases with different content sizes
Here's a verification script to analyze the current test coverage:
✅ Verification successful
Review comment is accurate - parallel processing lacks test coverage
Current test suite only covers basic scenarios. Missing test coverage for:
- Content chunking logic and size calculations
- Parallel processing with Promise.all
- Results aggregation from multiple chunks
- Edge cases with varying content sizes
- Behavior when shouldRunInParallel is true
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Analyze test coverage for parallel processing # Search for relevant test files fd -e test.ts -e spec.ts calculateOutputMetrics # Check for parallel processing test cases rg -l "parallel|chunk|Promise.all" $(fd -e test.ts -e spec.ts) # Look for test data files fd -e json -e txt test-dataLength of output: 1699
Script:
#!/bin/bash # Examine test file content and coverage echo "=== Main test file content ===" cat tests/core/metrics/calculateOutputMetrics.test.ts echo -e "\n=== Searching for parallel processing related tests ===" rg "shouldRunInParallel|parallel|chunk" tests/core/metrics/calculateOutputMetrics.test.ts -C 2 echo -e "\n=== Looking for test scenarios ===" rg "describe|it\(" tests/core/metrics/calculateOutputMetrics.test.tsLength of output: 3428
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 34-35: src/core/metrics/calculateOutputMetrics.ts#L34-L35
Added lines #L34 - L35 were not covered by tests
[warning] 37-39: src/core/metrics/calculateOutputMetrics.ts#L37-L39
Added lines #L37 - L39 were not covered by tests
[warning] 42-50: src/core/metrics/calculateOutputMetrics.ts#L42-L50
Added lines #L42 - L50 were not covered by tests
[warning] 53-53: src/core/metrics/calculateOutputMetrics.ts#L53
Added line #L53 was not covered by tests
Performance Improvement
yamadashy/repomix
868.73 millis (usr: 1.11 secs, sys: 0.14 secs)671.26 millis (usr: 1.42 secs, sys: 0.22 secs)No significant change
facebook/react
123.31 secs (usr: 118.64 secs, sys: 1.60 secs)4.19 secs (usr: 22.66 secs, sys: 2.49 secs)29x faster
vercel/next.js
17.85 mins (usr: 16.66 mins, sys: 0.18 mins)17.27 secs (usr: 52.93 secs, sys: 7.11 secs)58x faster
Changes
p-mapwith Piscina worker threads for parallel processingChecklist
npm run testnpm run lint