feat: optimize worker system performance#851
Conversation
…g other workers with worker_threads Implemented a comprehensive worker process architecture that balances performance and isolation: - Created defaultActionWorker.ts that runs entire CLI processing pipeline in child_process for better resource isolation - Migrated all other worker implementations (fileProcess, security, metrics, etc.) from child_process to worker_threads for improved performance - Enhanced logger.ts to support both worker_threads (via workerData) and child_process (via environment variables) - Added color support for child_process workers by passing FORCE_COLOR and TERM environment variables - Moved Spinner from main process to defaultActionWorker for direct stdout control This hybrid approach provides the best of both worlds: the main CLI operation runs in isolated child_process for stability, while intensive parallel tasks use faster worker_threads for optimal performance.
…n process Moved configuration loading (loadFileConfig, buildCliConfig, mergeConfigs) from worker process to main process for better performance and cleaner separation of concerns. Changes: - Config loading now happens in main process before worker creation - Worker receives pre-loaded config via task interface, eliminating redundant config operations - Reduced worker startup overhead and simplified worker responsibilities - Worker now focuses solely on heavy processing tasks (pack + spinner) This optimization maintains the same functionality while improving startup performance and creating clearer architectural boundaries between main and worker processes.
- Use class names for RepomixError type checking instead of hardcoded strings - Remove unused RepomixError import from fileProcess.ts - Simplify comments in errorHandle.ts and fileProcess.ts - Clean up constructor-based error checking logic
Replace executeGlobbyInWorker with direct globby calls since worker isolation is no longer necessary for globby execution. - Remove src/core/file/globbyExecute.ts wrapper - Remove src/core/file/workers/globbyWorker.ts - Update fileSearch.ts to import and use globby directly - Update tests to mock globby instead of executeGlobbyInWorker - Simplify integration tests by removing worker mocks
Previously, each metrics calculation function (calculateSelectiveFileMetrics, calculateOutputMetrics, calculateGitDiffMetrics, calculateGitLogMetrics) was creating its own TaskRunner instance, leading to redundant worker pool initialization overhead. This change consolidates all metrics workers into a single unified worker (unifiedMetricsWorker) that handles all metric types through a discriminated union pattern. The calculateMetrics function now creates one TaskRunner instance and passes it to all calculation functions, with cleanup performed once at the end. Benefits: - Reduces worker pool initialization overhead - Improves resource efficiency by reusing single worker pool - Maintains consistent worker infrastructure across all metrics - Simplifies lifecycle management with centralized cleanup
…Worker Added function overloads for better type inference when passing PingTask vs DefaultActionTask. Implemented separate PingTask/PingResult types for Bun worker initialization workaround. Added waitForWorkerReady method with retry logic to handle ES module timing issues in Bun. This resolves the "Cannot access 'default' before initialization" error that started occurring after commit 8a88f45 when using child_process workers with Bun runtime.
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughRefactors CLI default action to dispatch work via a new worker, adds a worker ping/readiness path, unifies metrics into a single worker, replaces globby worker with direct globby usage, shifts several runtimes to worker_threads, injects shared task runners into metrics, updates logging/error guards, adjusts scripts and tests. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant CLI as CLI (main)
participant Worker as DefaultActionWorker
Note over CLI,Worker: Startup & readiness
User->>CLI: runDefaultAction(args)
CLI->>Worker: PingTask (retry until ready)
Worker-->>CLI: PingResult (config echo)
Note over CLI,Worker: Execute pack
CLI->>Worker: DefaultActionTask (dirs/cwd/config/opts/isStdin)
Worker->>Worker: Validate mode (stdin vs dirs)
Worker->>Worker: pack()/progress via spinner
Worker-->>CLI: DefaultActionWorkerResult (packResult, config)
CLI->>CLI: reportResults()
CLI-->>User: Output
sequenceDiagram
autonumber
participant Orchestrator as calculateMetrics
participant Runner as TaskRunner<UnifiedMetricsTask>
participant UWorker as unifiedMetricsWorker
Orchestrator->>Runner: run 'file' tasks (selected files)
Runner->>UWorker: FileMetricsTask
UWorker-->>Runner: FileMetrics
Orchestrator->>Runner: run 'output' tasks (chunks or whole)
Runner->>UWorker: OutputMetricsTask
UWorker-->>Runner: token count
Orchestrator->>Runner: run 'gitDiff' and 'gitLog'
Runner->>UWorker: GitDiff/GitLogTasks
UWorker-->>Runner: counts
Orchestrator->>Runner: cleanup()
Orchestrator-->>Orchestrator: aggregate totals
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes Possibly related PRs
Suggested labels
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @yamadashy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers substantial improvements to the application's worker system, focusing on both performance and compatibility. It streamlines how tasks are distributed and managed across workers, particularly for file processing and metrics calculation, by introducing a unified task runner. A critical aspect of this update is addressing and resolving compatibility issues with the Bun runtime, ensuring smoother operation. These changes collectively aim to make the application more efficient and reliable across different JavaScript runtimes. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes the worker system by introducing a unified task runner lifecycle, consolidating separate worker types into a single metrics worker, and improves Bun compatibility through better worker initialization handling.
Key changes include:
- Unified worker system with consolidated lifecycle management and single TaskRunner
- Moved config loading from workers to main process for better performance
- Implemented ping/pong worker readiness checks for Bun ES module timing issues
Reviewed Changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/core/metrics/workers/unifiedMetricsWorker.ts | New unified worker consolidating file, output, git diff, and git log metrics workers |
| src/cli/actions/defaultAction.ts | Refactored to use child_process worker with pre-loaded config and Bun compatibility checks |
| src/core/metrics/calculateMetrics.ts | Updated to use single unified task runner for all metrics calculations |
| src/shared/processConcurrency.ts | Added environment variable passing for child_process workers |
| tests/ | Updated test files to work with new unified worker system |
There was a problem hiding this comment.
Code Review
This pull request introduces significant optimizations to the worker system, improves Bun compatibility, and enhances type safety. The refactoring to a unified TaskRunner for metrics calculations is a great improvement for performance and resource management. The introduction of a ping/pong mechanism for worker readiness is a clever solution for the Bun compatibility issue. My review focuses on further improving type safety, efficiency, and maintainability. I've pointed out missing function overloads, a potentially brittle file path for a worker, and some unsafe type assertions that could be improved with type guards.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (30)
src/core/file/fileProcess.ts (1)
28-28: Consider documenting the runtime change decision.The switch from
child_processtoworker_threadsruntime may have performance implications. The comment "High memory usage and leak risk" suggests concerns with the previous approach, but it's unclear ifworker_threadsfully addresses these issues.Consider adding a more detailed comment explaining:
- Why
worker_threadswas chosen overchild_process- Any trade-offs between memory usage, performance, and isolation
- Whether memory leak risks are fully mitigated
- // High memory usage and leak risk + // Using worker_threads for better memory sharing and reduced overhead compared to child_process. + // Note: Less process isolation but significantly lower memory footprint for file processing tasks. runtime: 'worker_threads',src/shared/logger.ts (2)
102-114: Consider extracting log level validation logic.The validation logic for log levels is duplicated between the environment variable path (lines 105-110) and the workerData path (lines 120-125). This could be refactored to reduce duplication.
+const isValidLogLevel = (level: number): level is RepomixLogLevel => { + return ( + level === repomixLogLevels.SILENT || + level === repomixLogLevels.ERROR || + level === repomixLogLevels.WARN || + level === repomixLogLevels.INFO || + level === repomixLogLevels.DEBUG + ); +}; + export const setLogLevelByWorkerData = () => { // Try to get log level from environment variable first (for child_process workers) const envLogLevel = process.env.REPOMIX_LOG_LEVEL; if (envLogLevel !== undefined) { const logLevel = Number(envLogLevel); - if ( - logLevel === repomixLogLevels.SILENT || - logLevel === repomixLogLevels.ERROR || - logLevel === repomixLogLevels.WARN || - logLevel === repomixLogLevels.INFO || - logLevel === repomixLogLevels.DEBUG - ) { + if (isValidLogLevel(logLevel)) { setLogLevel(logLevel); return; } } // Fallback to workerData for worker_threads if (Array.isArray(workerData) && workerData.length > 1 && workerData[1]?.logLevel !== undefined) { const logLevel = workerData[1].logLevel; - if ( - logLevel === repomixLogLevels.SILENT || - logLevel === repomixLogLevels.ERROR || - logLevel === repomixLogLevels.WARN || - logLevel === repomixLogLevels.INFO || - logLevel === repomixLogLevels.DEBUG - ) { + if (isValidLogLevel(logLevel)) { setLogLevel(logLevel); } } };
100-115: Verify edge cases in log level parsing.The code converts the environment variable to a number without handling potential parsing errors (e.g.,
NaNfrom non-numeric strings).const envLogLevel = process.env.REPOMIX_LOG_LEVEL; if (envLogLevel !== undefined) { const logLevel = Number(envLogLevel); - if ( + if (!isNaN(logLevel) && ( logLevel === repomixLogLevels.SILENT || logLevel === repomixLogLevels.ERROR || logLevel === repomixLogLevels.WARN || logLevel === repomixLogLevels.INFO || logLevel === repomixLogLevels.DEBUG - ) { + )) { setLogLevel(logLevel); return; } }src/shared/errorHandle.ts (4)
34-43: Don’t print full stack traces by default for unexpected errors.Leaking stacks at non-verbose levels can expose paths and sensitive details. Gate stack emission behind DEBUG, mirroring the RepomixError path.
- logger.error(`✖ Unexpected error: ${error.message}`); - // If unexpected error, show stack trace by default - logger.note('Stack trace:', error.stack); + logger.error(`✖ Unexpected error: ${error.message}`); + // Emit stack only at DEBUG and above + if (logger.getLogLevel() >= repomixLogLevels.DEBUG && error.stack) { + logger.note('Stack trace:', error.stack); + }
46-46: Labeling nit: “Stack trace” for unknown values can be misleading.This prints the raw object (not necessarily a stack). Adjust the label and guard.
- logger.note('Stack trace:', error); + logger.note('Raw error object:', error);
61-76: Relax isError guard: don’t require a stack property.Some Error-like objects (or sanitized errors) may omit
stack. Accept message-only objects.-const isError = (error: unknown): error is Error => { +const isError = (error: unknown): error is Error => { if (error instanceof Error) return true; if (typeof error !== 'object' || error === null) return false; const obj = error as Record<string, unknown>; return ( - typeof obj.message === 'string' && - typeof obj.stack === 'string' && - ('name' in obj ? typeof obj.name === 'string' : true) + typeof obj.message === 'string' && + // stack is optional across boundaries + (!('stack' in obj) || typeof obj.stack === 'string') && + (!('name' in obj) || typeof obj.name === 'string') ); };
95-102: Preserve the original Zod error as cause.Improves diagnosability while keeping the user-facing message.
- throw new RepomixConfigValidationError( - `${message}\n\n ${zodErrorText}\n\n Please check the config file and try again.`, - ); + throw new RepomixConfigValidationError( + `${message}\n\n ${zodErrorText}\n\n Please check the config file and try again.`, + { cause: error } + );src/core/file/fileSearch.ts (1)
195-213: Harden EPERM/EACCES handling in globby catch and include the code in PermissionError.Type-guard the error before accessing
.codeand pass the code toPermissionErrorfor consistency with earlier checks.- }).catch((error) => { - // Handle EPERM errors specifically - if (error.code === 'EPERM' || error.code === 'EACCES') { - throw new PermissionError( - `Permission denied while scanning directory. Please check folder access permissions for your terminal app. path: ${rootDir}`, - rootDir, - ); - } - throw error; - }); + }).catch((error: unknown) => { + // Handle EPERM/EACCES specifically + const code = (error as NodeJS.ErrnoException | { code?: string })?.code; + if (code === 'EPERM' || code === 'EACCES') { + throw new PermissionError( + `Permission denied while scanning directory. Please check folder access permissions for your terminal app. path: ${rootDir}`, + rootDir, + code + ); + } + throw error; + });tests/cli/actions/defaultAction.tokenCountTree.test.ts (1)
55-76: Assert runner cleanup is invoked.Expose the cleanup mock so the test verifies resource cleanup.
Apply this diff:
- mockInitTaskRunner.mockImplementation(() => ({ - run: vi.fn().mockImplementation(async () => ({ + const cleanupMock = vi.fn().mockResolvedValue(undefined); + const runMock = vi.fn().mockImplementation(async () => ({ packResult: { totalFiles: 3, totalCharacters: 1000, totalTokens: 200, fileCharCounts: {}, fileTokenCounts: {}, gitDiffTokenCount: 0, suspiciousFilesResults: [], suspiciousGitDiffResults: [], processedFiles: [ { path: '/test/file1.js', content: 'content1' }, { path: '/test/file2.js', content: 'content2' }, ], safeFilePaths: ['/test/file1.js', '/test/file2.js'], }, config: mockMergeConfigs.mock.results[mockMergeConfigs.mock.results.length - 1]?.value || {}, })), - cleanup: vi.fn().mockResolvedValue(undefined), - })); + cleanup: cleanupMock, + run: runMock, + })); + // After runDefaultAction(...) + // expect(cleanupMock).toHaveBeenCalled();tests/core/file/fileSearch.test.ts (1)
385-393: Minor: guard against empty call list.If searchFiles short‑circuits, accessing mock.calls[0] would throw. Optional defensive check keeps the test resilient.
-const executeGlobbyCall = vi.mocked(globby).mock.calls[0]; +expect(vi.mocked(globby).mock.calls.length).toBeGreaterThan(0); +const executeGlobbyCall = vi.mocked(globby).mock.calls[0];tests/core/metrics/calculateSelectiveFileMetrics.test.ts (1)
10-12: Remove unused mock.This vi.mock targets a module not imported in this test and can be dropped.
-vi.mock('../../shared/processConcurrency', () => ({ - getProcessConcurrency: () => 1, -}));src/core/metrics/calculateMetrics.ts (1)
41-45: Pool-sizing OK — add Bun/runtime guard for worker_threads.
- getWorkerThreadCount enforces minThreads = 1, so numOfTasks === 0 won't create a zero‑sized pool (src/shared/processConcurrency.ts).
- Several callers still pass runtime: 'worker_threads' (e.g., src/core/metrics/calculateMetrics.ts, src/core/file/fileProcess.ts).
- Bun exposes node:worker_threads and tinypool supports both runtimes, but Bun's worker implementation has caveats — detect process.versions.bun and either default to 'child_process' or fall back at pool creation when runtime === 'worker_threads'. (bun.com)
tests/cli/actions/defaultAction.test.ts (5)
144-149: Loosen initTaskRunner expectation to reduce brittlenessFuture additions to WorkerOptions would break this exact-match assertion. Use partial matching.
- expect(processConcurrency.initTaskRunner).toHaveBeenCalledWith({ - numOfTasks: 1, - workerPath: expect.stringContaining('defaultActionWorker.js'), - runtime: 'child_process', - }); + expect(processConcurrency.initTaskRunner).toHaveBeenCalledWith( + expect.objectContaining({ + numOfTasks: 1, + workerPath: expect.stringContaining('defaultActionWorker.js'), + runtime: 'child_process', + }), + );
150-153: Stabilize assertions: avoid relying on vi.mock.results[N]Accessing mock.results is brittle. Assert directly on the exact mock instance you return from initTaskRunner.
- const taskRunner = vi.mocked(processConcurrency.initTaskRunner).mock.results[0].value; - expect(taskRunner.run).toHaveBeenCalled(); - expect(taskRunner.cleanup).toHaveBeenCalled(); + // Assert on the mock instance returned by initTaskRunner + expect(mockTaskRunner.run).toHaveBeenCalled(); + expect(mockTaskRunner.cleanup).toHaveBeenCalled();Outside this hunk, hoist the runner so tests can reference it:
// at top-level of this file (near other top-level consts) let mockTaskRunner: { run: ReturnType<typeof vi.fn>; cleanup: ReturnType<typeof vi.fn> }; // inside beforeEach(), assign instead of const: mockTaskRunner = { run: vi.fn().mockResolvedValue(/* ... */), cleanup: vi.fn().mockResolvedValue(undefined), }; vi.mocked(processConcurrency.initTaskRunner).mockReturnValue(mockTaskRunner as any);
37-45: Remove redundant mock resetsCalling vi.resetAllMocks() in both beforeEach and afterEach is unnecessary; keep it in one place to avoid accidental unmocking between arrange/act phases.
beforeEach(() => { - vi.resetAllMocks(); - // Reset mockSpinner functions vi.clearAllMocks();Also applies to: 132-134
175-193: Add Bun readiness (ping) test coveragewaitForWorkerReady is Bun‑specific; add a unit test that simulates Bun and verifies a ping task is dispatched before the main task.
Example test to add:
it('sends a ping first in Bun to ensure worker readiness', async () => { const originalBun = (process.versions as any).bun; (process.versions as any).bun = '1.1.0'; const calls: any[] = []; const bunRunner = { run: vi.fn(async (task: any) => { calls.push(task); if (task.ping) return { ping: true, config: createMockConfig() }; return { packResult: { totalFiles: 0, totalCharacters: 0, totalTokens: 0, fileCharCounts: {}, fileTokenCounts: {}, suspiciousFilesResults: [], suspiciousGitDiffResults: [], suspiciousGitLogResults: [], processedFiles: [], safeFilePaths: [], gitDiffTokenCount: 0, gitLogTokenCount: 0, skippedFiles: [] }, config: createMockConfig(), }; }), cleanup: vi.fn().mockResolvedValue(undefined), }; vi.mocked(processConcurrency.initTaskRunner).mockReturnValue(bunRunner as any); await runDefaultAction(['.'], process.cwd(), {}); expect(calls[0]).toMatchObject({ ping: true }); expect(calls[1]).toMatchObject({ directories: ['.'], isStdin: false }); (process.versions as any).bun = originalBun; });
88-103: Redundant packager mockWith defaultAction delegated to a worker via initTaskRunner, this mock is unused in this spec. Removing it simplifies the setup.
src/core/metrics/calculateOutputMetrics.ts (1)
7-8: Rename CHUNK_SIZE (it’s a chunk count) and avoid building a large intermediate arrayCHUNK_SIZE is used as “number of chunks,” not “size.” Also, skip materializing the full chunks array to reduce peak memory.
-const CHUNK_SIZE = 1000; +// Target number of chunks for parallel processing (not bytes) +const TARGET_PARALLEL_CHUNKS = 1000; @@ - // Split content into chunks for parallel processing - const chunkSize = Math.ceil(content.length / CHUNK_SIZE); - const chunks: string[] = []; - - for (let i = 0; i < content.length; i += chunkSize) { - chunks.push(content.slice(i, i + chunkSize)); - } - - // Process chunks in parallel - const chunkResults = await Promise.all( - chunks.map(async (chunk, index) => { - const result = await deps.taskRunner.run({ - type: 'output', - content: chunk, - encoding, - path: path ? `${path}-chunk-${index}` : undefined, - }); - return result as number; // Output tasks always return numbers - }), - ); + // Process chunks in parallel without pre-building an intermediate chunks array + const sliceSize = Math.ceil(content.length / TARGET_PARALLEL_CHUNKS); + const numSlices = Math.ceil(content.length / sliceSize); + const chunkResults = await Promise.all( + Array.from({ length: numSlices }, (_, index) => { + const start = index * sliceSize; + const chunk = content.slice(start, start + sliceSize); + return deps.taskRunner + .run({ + type: 'output', + content: chunk, + encoding, + path: path ? `${path}-chunk-${index}` : undefined, + }) + .then((r) => r as number); // Output tasks always return numbers + }), + );Also applies to: 26-31, 33-44
src/cli/actions/defaultAction.ts (1)
269-309: Consider moving waitForWorkerReady to a shared utilityMultiple workers may need this Bun readiness probe. Extract to shared/processConcurrency (or a small helper module) to centralize behavior and tests.
tests/core/metrics/calculateOutputMetrics.test.ts (1)
12-21: Simplify mock runner: drop unused WorkerOptions parameter and update call sitesWorkerOptions aren’t used by the in-test runner; removing them simplifies the test.
-const mockInitTaskRunner = <T, R>(_options: WorkerOptions) => { +const mockInitTaskRunner = <T, R>() => { @@ - const result = await calculateOutputMetrics(content, encoding, path, { - taskRunner: mockInitTaskRunner({ numOfTasks: 1, workerPath: '', runtime: 'worker_threads' }), + const result = await calculateOutputMetrics(content, encoding, path, { + taskRunner: mockInitTaskRunner(), }); @@ - const result = await calculateOutputMetrics(content, encoding, undefined, { - taskRunner: mockInitTaskRunner({ numOfTasks: 1, workerPath: '', runtime: 'worker_threads' }), + const result = await calculateOutputMetrics(content, encoding, undefined, { + taskRunner: mockInitTaskRunner(), }); @@ - calculateOutputMetrics(content, encoding, undefined, { - taskRunner: mockErrorTaskRunner({ numOfTasks: 1, workerPath: '', runtime: 'worker_threads' }), + calculateOutputMetrics(content, encoding, undefined, { + taskRunner: mockErrorTaskRunner(), }), @@ - const result = await calculateOutputMetrics(content, encoding, undefined, { - taskRunner: mockInitTaskRunner({ numOfTasks: 1, workerPath: '', runtime: 'worker_threads' }), + const result = await calculateOutputMetrics(content, encoding, undefined, { + taskRunner: mockInitTaskRunner(), }); @@ - const result = await calculateOutputMetrics(content, encoding, path, { - taskRunner: mockParallelTaskRunner({ numOfTasks: 1, workerPath: '', runtime: 'worker_threads' }), + const result = await calculateOutputMetrics(content, encoding, path, { + taskRunner: mockParallelTaskRunner(), }); @@ - calculateOutputMetrics(content, encoding, undefined, { - taskRunner: mockErrorTaskRunner({ numOfTasks: 1, workerPath: '', runtime: 'worker_threads' }), + calculateOutputMetrics(content, encoding, undefined, { + taskRunner: mockErrorTaskRunner(), }), @@ - await calculateOutputMetrics(content, encoding, undefined, { - taskRunner: mockChunkTrackingTaskRunner({ numOfTasks: 1, workerPath: '', runtime: 'worker_threads' }), + await calculateOutputMetrics(content, encoding, undefined, { + taskRunner: mockChunkTrackingTaskRunner(), });Also applies to: 30-31, 41-42, 65-66, 77-78, 88-89, 116-117, 141-142, 167-168
src/cli/actions/workers/defaultActionWorker.ts (6)
38-41: Add real TS overloads (current comment is misleading).
Provide proper overload signatures to improve inference at call sites.Apply:
-// Function overloads for better type inference -async function defaultActionWorker( - task: DefaultActionTask | PingTask, -): Promise<DefaultActionWorkerResult | PingResult> { +// Function overloads for better type inference +function defaultActionWorker(task: DefaultActionTask): Promise<DefaultActionWorkerResult>; +function defaultActionWorker(task: PingTask): Promise<PingResult>; +async function defaultActionWorker( + task: DefaultActionTask | PingTask, +): Promise<DefaultActionWorkerResult | PingResult> {
51-51: Minor: redundant cast after discriminator.
TypeScript narrowstasktoDefaultActionTaskin theelsebranch; theas DefaultActionTaskcast isn’t needed.Apply:
- const { directories, cwd, config, cliOptions, isStdin } = task as DefaultActionTask; + const { directories, cwd, config, cliOptions, isStdin } = task;
53-54: Avoid logging full config objects.
Configs can include sensitive paths/flags; log keys/summary instead.Apply:
- logger.trace('Worker: Using pre-loaded config:', config); + logger.trace(`Worker: Using pre-loaded config. Keys: ${Object.keys(config ?? {}).join(', ')}`);
65-71: Clarify stdin mode validation message.
Users often pass.by habit; the message could mention that no directory args are needed (repomix assumes.).Proposed copy: “When using --stdin, do not pass directory arguments (repomix assumes '.' by default). File paths are read from stdin.”
86-91: Deduplicate resolved target paths (edge-case polish).
Avoid duplicate work if the same dir is passed multiple times.Apply:
- const targetPaths = directories.map((directory) => path.resolve(cwd, directory)); + const targetPaths = Array.from(new Set(directories.map((d) => path.resolve(cwd, d))));
99-102: Surface error details in spinner fail.
Helps users debug without waiting for upper-layer logging.Apply:
- spinner.fail('Error during packing'); + const msg = error instanceof Error ? error.message : String(error); + spinner.fail(`Error during packing: ${msg}`);src/core/metrics/calculateSelectiveFileMetrics.ts (2)
37-37: Nit: update log copy to reflect DI runner, not a local pool.Apply:
- logger.trace(`Starting selective metrics calculation for ${filesToProcess.length} files using worker pool`); + logger.trace(`Starting selective metrics calculation for ${filesToProcess.length} files using task runner`);
41-49: Progress updates per completion are fine; order will be non-deterministic.
If deterministic progress is desired, throttle or serialize; otherwise this is OK.src/core/metrics/workers/unifiedMetricsWorker.ts (2)
11-12: Remove unusedMetricsTaskType.
Dead type alias; keep surface lean.Apply:
-export type MetricsTaskType = 'output' | 'file' | 'gitDiff' | 'gitLog';
64-83: Optional: micro-simplify diff counting.
Not measurable, but shorter.Apply:
- const tokenCounter = getTokenCounter(task.encoding); - const countPromises = []; - - if (task.workTreeDiffContent) { - const content = task.workTreeDiffContent; - countPromises.push(Promise.resolve().then(() => tokenCounter.countTokens(content))); - } - if (task.stagedDiffContent) { - const content = task.stagedDiffContent; - countPromises.push(Promise.resolve().then(() => tokenCounter.countTokens(content))); - } - - const results = await Promise.all(countPromises); + const tokenCounter = getTokenCounter(task.encoding); + const inputs = [task.workTreeDiffContent, task.stagedDiffContent].filter(Boolean) as string[]; + const results = await Promise.all(inputs.map((c) => Promise.resolve().then(() => tokenCounter.countTokens(c))));
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
package.json(1 hunks)src/cli/actions/defaultAction.ts(4 hunks)src/cli/actions/workers/defaultActionWorker.ts(1 hunks)src/core/file/fileProcess.ts(1 hunks)src/core/file/fileSearch.ts(3 hunks)src/core/file/globbyExecute.ts(0 hunks)src/core/file/workers/globbyWorker.ts(0 hunks)src/core/metrics/calculateGitDiffMetrics.ts(2 hunks)src/core/metrics/calculateGitLogMetrics.ts(2 hunks)src/core/metrics/calculateMetrics.ts(3 hunks)src/core/metrics/calculateOutputMetrics.ts(2 hunks)src/core/metrics/calculateSelectiveFileMetrics.ts(3 hunks)src/core/metrics/workers/fileMetricsWorker.ts(0 hunks)src/core/metrics/workers/gitDiffMetricsWorker.ts(0 hunks)src/core/metrics/workers/gitLogMetricsWorker.ts(0 hunks)src/core/metrics/workers/outputMetricsWorker.ts(0 hunks)src/core/metrics/workers/unifiedMetricsWorker.ts(1 hunks)src/core/security/securityCheck.ts(1 hunks)src/shared/errorHandle.ts(4 hunks)src/shared/logger.ts(1 hunks)src/shared/processConcurrency.ts(1 hunks)tests/cli/actions/defaultAction.test.ts(4 hunks)tests/cli/actions/defaultAction.tokenCountTree.test.ts(3 hunks)tests/core/file/fileSearch.test.ts(16 hunks)tests/core/metrics/calculateMetrics.test.ts(1 hunks)tests/core/metrics/calculateOutputMetrics.test.ts(10 hunks)tests/core/metrics/calculateSelectiveFileMetrics.test.ts(4 hunks)tests/integration-tests/packager.test.ts(1 hunks)tests/shared/processConcurrency.test.ts(1 hunks)
💤 Files with no reviewable changes (6)
- src/core/metrics/workers/fileMetricsWorker.ts
- src/core/metrics/workers/gitDiffMetricsWorker.ts
- src/core/metrics/workers/gitLogMetricsWorker.ts
- src/core/metrics/workers/outputMetricsWorker.ts
- src/core/file/workers/globbyWorker.ts
- src/core/file/globbyExecute.ts
🧰 Additional context used
🧬 Code graph analysis (15)
src/shared/logger.ts (1)
src/index.ts (1)
setLogLevel(38-38)
tests/core/file/fileSearch.test.ts (2)
src/core/file/fileSearch.ts (1)
searchFiles(96-249)src/index.ts (1)
searchFiles(11-11)
src/cli/actions/workers/defaultActionWorker.ts (5)
src/shared/logger.ts (3)
setLogLevelByWorkerData(99-129)logger(89-89)error(34-38)src/config/configSchema.ts (1)
RepomixConfigMerged(165-165)src/cli/cliSpinner.ts (1)
Spinner(9-70)src/shared/errorHandle.ts (1)
RepomixError(5-10)src/core/file/fileStdin.ts (1)
readFilePathsFromStdin(69-116)
tests/cli/actions/defaultAction.test.ts (3)
tests/testing/testUtils.ts (1)
createMockConfig(15-45)src/cli/types.ts (1)
CliOptions(4-60)src/cli/actions/defaultAction.ts (1)
buildCliConfig(96-267)
src/shared/processConcurrency.ts (1)
src/shared/logger.ts (1)
logger(89-89)
src/core/metrics/calculateGitLogMetrics.ts (5)
src/config/configSchema.ts (1)
RepomixConfigMerged(165-165)src/core/git/gitLogHandle.ts (1)
GitLogResult(21-24)src/shared/processConcurrency.ts (1)
TaskRunner(97-100)src/core/metrics/workers/unifiedMetricsWorker.ts (1)
UnifiedMetricsTask(41-41)src/core/metrics/workers/types.ts (1)
FileMetrics(1-5)
src/core/metrics/calculateSelectiveFileMetrics.ts (4)
src/shared/processConcurrency.ts (1)
TaskRunner(97-100)src/core/metrics/workers/unifiedMetricsWorker.ts (2)
UnifiedMetricsTask(41-41)task(43-106)src/core/metrics/workers/types.ts (1)
FileMetrics(1-5)src/shared/logger.ts (1)
logger(89-89)
src/core/metrics/calculateMetrics.ts (3)
src/shared/processConcurrency.ts (1)
initTaskRunner(102-108)src/core/metrics/workers/unifiedMetricsWorker.ts (1)
UnifiedMetricsTask(41-41)src/core/metrics/workers/types.ts (1)
FileMetrics(1-5)
src/core/metrics/calculateOutputMetrics.ts (3)
src/shared/processConcurrency.ts (1)
TaskRunner(97-100)src/core/metrics/workers/unifiedMetricsWorker.ts (1)
UnifiedMetricsTask(41-41)src/core/metrics/workers/types.ts (1)
FileMetrics(1-5)
src/cli/actions/defaultAction.ts (4)
src/shared/processConcurrency.ts (1)
initTaskRunner(102-108)src/cli/actions/workers/defaultActionWorker.ts (4)
DefaultActionTask(14-20)PingTask(22-26)DefaultActionWorkerResult(28-31)PingResult(33-36)src/cli/cliReport.ts (1)
reportResults(13-47)src/config/configSchema.ts (1)
RepomixConfigMerged(165-165)
src/core/metrics/calculateGitDiffMetrics.ts (5)
src/config/configSchema.ts (1)
RepomixConfigMerged(165-165)src/core/git/gitDiffHandle.ts (1)
GitDiffResult(7-10)src/shared/processConcurrency.ts (1)
TaskRunner(97-100)src/core/metrics/workers/unifiedMetricsWorker.ts (1)
UnifiedMetricsTask(41-41)src/core/metrics/workers/types.ts (1)
FileMetrics(1-5)
tests/core/metrics/calculateOutputMetrics.test.ts (1)
src/core/metrics/workers/unifiedMetricsWorker.ts (2)
task(43-106)UnifiedMetricsTask(41-41)
src/shared/errorHandle.ts (1)
src/shared/logger.ts (1)
error(34-38)
src/core/metrics/workers/unifiedMetricsWorker.ts (3)
src/shared/logger.ts (2)
setLogLevelByWorkerData(99-129)error(34-38)src/core/metrics/workers/types.ts (1)
FileMetrics(1-5)src/core/metrics/tokenCounterFactory.ts (2)
getTokenCounter(12-19)freeTokenCounters(25-31)
tests/core/metrics/calculateSelectiveFileMetrics.test.ts (1)
src/core/metrics/workers/unifiedMetricsWorker.ts (2)
task(43-106)UnifiedMetricsTask(41-41)
🔇 Additional comments (31)
src/core/security/securityCheck.ts (1)
61-61: Runtime consistency across worker types.Good to see consistent adoption of
worker_threadsruntime across both file processing and security checking workers.package.json (1)
31-31: Source maps enabled for better debugging.Enabling source maps with
--enable-source-mapsis excellent for debugging TypeScript code in production, especially when dealing with worker threads where stack traces can be harder to follow.src/shared/logger.ts (1)
100-114: Robust environment variable validation.The implementation properly validates the log level from the environment variable before applying it. Good defensive programming with explicit validation against known log levels.
tests/integration-tests/packager.test.ts (1)
7-14: Integration test simplification improves test reliability.Removing the mock for
executeGlobbyInWorkerand using the actual globby implementation directly is a good change. This makes the integration tests more realistic by testing the actual file search behavior rather than a mocked worker.src/shared/errorHandle.ts (2)
22-34: Good move: duck-typed guards for cross-worker errors.Switching from
instanceofto structural checks avoids false negatives across process/thread boundaries.
78-93: LGTM: RepomixError duck-typing is precise and safe.Using
nameto distinguishRepomixErrorandRepomixConfigValidationErroris appropriate across serialization boundaries.tests/core/metrics/calculateMetrics.test.ts (1)
70-72: LGTM: verifies DI of taskRunner.The expectation guards against regressions in the unified runner path.
src/shared/processConcurrency.ts (1)
54-66: Good addition: pass minimal env to child_process workers.Propagating
REPOMIX_LOG_LEVEL,FORCE_COLOR, andTERMimproves parity and UX for subprocess workers.If we want parity for
worker_threadslogs too, we could also readREPOMIX_LOG_LEVELin worker init (already supported bysetLogLevelByWorkerData). No change required.tests/shared/processConcurrency.test.ts (1)
88-92: LGTM: asserts env propagation for child_process runtime.Matches the new Tinypool config surface.
src/core/file/fileSearch.ts (1)
216-224: LGTM: direct globby directory scan aligns with file scan options.Flags (
cwd,ignore,ignoreFiles,dot,followSymbolicLinks) are consistent between file and directory queries.tests/cli/actions/defaultAction.tokenCountTree.test.ts (1)
12-12: Consider stubbing readiness helpers from processConcurrency.If defaultAction begins calling waitForWorkerReady/ping helpers, this test may hang. Stub them to safe no-ops in the mock to future‑proof.
tests/core/file/fileSearch.test.ts (4)
23-25: Confirm globby export shape.Repository now imports { globby } from 'globby' and mocks it likewise. Ensure the installed globby version exposes a named globby export; if not, switch to default import/mocking to avoid brittle tests.
94-101: LGTM: directory/file dual-call simulation.Good split on onlyDirectories vs files to validate includeEmptyDirectories flow.
264-275: Solid assertion of globby options.Verifies key flags (cwd, ignore, ignoreFiles, onlyFiles, dot, followSymbolicLinks). Nice.
300-313: Subdir .gitignore behavior covered.Windows is excluded; path handling via path.join/dirname + minimatch is appropriate for the simulation.
src/core/metrics/calculateMetrics.ts (1)
100-103: Good: cleanup in finally.Ensures no worker leaks on success or error.
tests/core/metrics/calculateSelectiveFileMetrics.test.ts (1)
14-23: LGTM: task runner stub delegates to unified worker.Keeps behavior close to production path while remaining deterministic.
src/core/metrics/calculateGitLogMetrics.ts (2)
11-15: Deps-injected runner is the right move.Clean separation; return shape stays stable.
34-39: Graceful fallback with logging.Covers empty content and errors; consistent with unified worker’s 0-on-error behavior for gitLog.
src/core/metrics/calculateGitDiffMetrics.ts (2)
20-23: LGTM: early exit for empty diffsThe guard avoids unnecessary worker calls when both contents are empty. Good.
29-35: Consistent dependency injection for the unified workerThe shift to deps.taskRunner aligns with the new unified worker design and keeps cleanup ownership with the caller. Looks good.
src/cli/actions/defaultAction.ts (1)
51-56: Worker implements PingTask — no action requireddefaultActionWorker.ts contains an if ('ping' in task) branch that returns { ping: true, config: task.config } (lines 43–47).
src/cli/actions/workers/defaultActionWorker.ts (4)
10-13: LGTM: early log-level setup in worker.
Good placement before any logging.
22-36: Bun readiness handshake looks solid.
Ping task/result shape and the early-return branch keep init cheap.Also applies to: 42-48
107-111: LGTM: explicit teardown hook exported.
Keeps the worker contract consistent even if no cleanup is needed today.
72-83: Confirmpackcall signature with explicit filePaths.
This code calls pack([cwd], config, (message)=>spinner.update(message), {}, stdinResult.filePaths) (src/cli/actions/workers/defaultActionWorker.ts:72-83). Repo search produced no pack definition or other call sites, so I could not verify the exported signature — confirm pack accepts options then filePaths in this order and update the call or export if it differs.src/core/metrics/calculateSelectiveFileMetrics.ts (2)
15-16: Nice DI: metrics now use an injected TaskRunner.
Improves testability and unifies worker lifecycle.
24-33: Good use ofsatisfiesfor discriminated tasks.
Ensures task shape stays aligned with the unified worker.src/core/metrics/workers/unifiedMetricsWorker.ts (3)
7-10: LGTM: initialize log level at worker load.
Prevents noisy logs under worker_threads/child_process.
96-106: Sane error policy; special-cases gitLog to 0.
Good balance between resilience and signal.
113-116: LGTM: freeing token counters on teardown.
Prevents leaks across Tinypool worker reuse.
CI was failing because test jobs were trying to import compiled worker files from lib/ directory, but the build step was missing. Added npm run build step before running tests in all test jobs to ensure TypeScript files are compiled to JavaScript before tests run.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #851 +/- ##
==========================================
+ Coverage 87.27% 87.78% +0.50%
==========================================
Files 113 109 -4
Lines 6839 6827 -12
Branches 1408 1422 +14
==========================================
+ Hits 5969 5993 +24
+ Misses 870 834 -36 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Deploying repomix with
|
| Latest commit: |
1badf53
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://4e61af15.repomix.pages.dev |
| Branch Preview URL: | https://feat-process-wrap.repomix.pages.dev |
…rectly Replaced worker file imports with direct function calls in tests to improve coverage and remove CI build requirements. Key changes: - Extracted calculateUnifiedMetrics function from unifiedMetricsWorker for direct testing - Added taskRunner dependency injection to calculateMetrics function - Updated all metric tests to use mocked taskRunner instead of actual worker files - Removed build steps from CI test jobs (test, test-bun, test-coverage) This approach provides better test coverage since worker logic runs directly in the test process rather than being executed in separate worker threads. Tests now run faster without requiring TypeScript compilation.
… callers - Rename unifiedMetricsWorker to calculateMetricsWorker for clarity - Simplify worker from complex switch-case to pure token counting - Move business logic (FileMetrics building, git diff combining) to calling functions - Update all metric calculation functions to use new simple worker interface - Improve test coverage by enabling direct worker function mocking - Maintain separation of concerns with focused worker responsibility This change eliminates historical naming confusion and creates a cleaner architecture where workers handle only token counting while callers handle metric aggregation and business logic.
ed28a3c to
e1be3df
Compare
- Add tests for defaultActionWorker.ts covering ping, directory processing, stdin handling, error cases, and path resolution - Add tests for calculateGitDiffMetrics.ts covering config validation, diff processing, error handling, and encoding - Add tests for calculateGitLogMetrics.ts covering log processing, error scenarios, and edge cases - All test files follow project conventions with proper mocking and type safety - Total of 46 new tests ensuring robust coverage of worker functionality These tests improve code coverage and provide safety net for future changes to the metrics calculation and worker system components.
- Import path module and use path.resolve() for all path assertions - Fix failing tests on Windows CI by handling platform-specific path separators - Ensure tests work correctly on both Unix and Windows systems - Maintain test functionality while improving cross-platform compatibility This resolves CI failures on Windows builds while preserving test coverage.
…nWorker tests - Fix stdin processing tests to expect raw cwd path instead of resolved path - Aligns test expectations with actual implementation where stdin uses [cwd] directly - Resolves Windows CI failures where path.resolve() was incorrectly applied to stdin cases - Maintains correct expectations for directory processing vs stdin processing paths This fixes the remaining Windows CI test failures while preserving test accuracy.
Address PR review feedback: - Fix worker path to use relative path instead of lib directory - Add proper function overloads for defaultActionWorker - Remove unsafe type assertions in worker code - Improve error handling with optional stack property - Extract log level validation logic to reduce duplication - Add NaN check for environment variable parsing All tests pass and linting issues resolved.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-executed refactoring to optimize worker performance and improve the overall architecture. The consolidation of metrics calculation into a unified worker pool is a great enhancement for performance and maintainability. Moving the main action to a child_process for better isolation and improving error handling across process boundaries are also excellent changes. The code quality is high, and the test suite has been updated thoroughly to reflect the new architecture. I have a couple of minor suggestions to further improve maintainability and error information.
- Remove unnecessary config parameter from PingTask interface for better performance - Improve error serialization using util.inspect with safety limits - Update function signatures to match simplified ping interface - Add comprehensive error details formatting with depth and length limits These changes address PR review comments and improve worker efficiency.
This PR optimizes the worker system performance by consolidating worker lifecycle management and improving concurrency handling.
Summary
Key Changes
Unified Worker System (commits 76dcbce, 78b25b8, 681e377)
Main Process Optimization (commit bd88340)
Child Process Integration (commit 8a88f45)
Bun Compatibility Fix (commit 8efeaf9)
Performance Impact
Checklist
npm run testnpm run lint