feat(worker): Add unified worker entry point for bundling support #1056

Merged
yamadashy merged 13 commits into main from feat/unified-worker-bundling
Jan 2, 2026

@yamadashy
Owner

Summary

This PR adds a unified worker entry point that enables full bundling support for the website server, improving Cloud Run cold start times.

Key Changes

  • Unified Worker (src/shared/unifiedWorker.ts): Single entry point for all worker types that dynamically routes tasks to appropriate handlers based on task structure inference
  • Bundled Environment Support: Environment variables (REPOMIX_WORKER_PATH, REPOMIX_WASM_DIR) enable bundled environments to override default paths
  • Website Server Integration: Server can now be bundled as a single file while maintaining worker functionality
  • Backward Compatible: Normal CLI usage is completely unaffected - new code paths only activate when environment variables are set

How It Works

Normal CLI:           processConcurrency.ts → individual worker files
Bundled Environment:  processConcurrency.ts → unifiedWorker.ts → dynamic import based on task
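A minimal sketch of the path selection described above, assuming the worker file layout shown in `WORKER_FILES` and a `lib` output directory. Both are hypothetical, as are the `resolveWorkerLaunchConfig` name and its injected `env` parameter. This is not the actual processConcurrency.ts code. Only the environment variable names come from this PR.

```typescript
// Hypothetical sketch of worker path selection; not the actual processConcurrency.ts code.
type WorkerType = 'fileCollect' | 'fileProcess' | 'securityCheck' | 'calculateMetrics' | 'defaultAction';

// Assumed layout of the individual worker files (illustrative paths only).
const WORKER_FILES: Record<WorkerType, string> = {
  fileCollect: 'core/file/workers/fileCollectWorker.js',
  fileProcess: 'core/file/workers/fileProcessWorker.js',
  securityCheck: 'core/security/workers/securityCheckWorker.js',
  calculateMetrics: 'core/metrics/workers/calculateMetricsWorker.js',
  defaultAction: 'cli/actions/workers/defaultActionWorker.js',
};

interface WorkerLaunchConfig {
  filename: string;
  workerData: { workerType: WorkerType };
  env: Record<string, string>;
}

const resolveWorkerLaunchConfig = (
  workerType: WorkerType,
  env: Record<string, string | undefined>,
  distRoot = 'lib', // hypothetical build output directory
): WorkerLaunchConfig => {
  // Bundled environment: one file (e.g. the bundled server) serves as every worker.
  // Normal CLI: each worker type maps to its own file.
  const bundledWorkerPath = env.REPOMIX_WORKER_PATH;
  const filename = bundledWorkerPath ?? `${distRoot}/${WORKER_FILES[workerType]}`;
  return {
    filename,
    // The worker type is passed both in workerData (worker_threads) and as an
    // environment variable (child_process), as the PR description outlines.
    workerData: { workerType },
    env: { REPOMIX_WORKER_TYPE: workerType },
  };
};
```

Because the override applies only when `REPOMIX_WORKER_PATH` is set, a plain CLI run keeps resolving to per-type worker files.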

Files Changed

File                                  Purpose
src/shared/unifiedWorker.ts           New unified worker dispatcher
src/shared/processConcurrency.ts      Worker path selection logic
src/core/treeSitter/loadLanguage.ts   Custom WASM path support
src/index.ts                          Export unified worker for bundled environments
website/server/src/index.ts           Re-export worker handler for Tinypool

Testing

  • Added comprehensive tests for unifiedWorker.ts
  • All 1059 existing tests pass
  • Lint passes

Checklist

  • Run npm run test
  • Run npm run lint

Add a unified worker entry point that enables full bundling support by
allowing bundled files to spawn workers using themselves. This is a
prerequisite for bundling the website server to improve Cloud Run cold
start times.

Changes:
- Add src/shared/unifiedWorker.ts as single entry point for all workers
- Support both worker_threads and child_process runtimes
- Add REPOMIX_WORKER_TYPE env var for child_process worker type detection
- Add REPOMIX_WORKER_PATH env var for bundled environment worker path
- Add REPOMIX_WASM_DIR env var for WASM file location override
- Update processConcurrency.ts to use unified worker path
- Add debug logging (REPOMIX_DEBUG_WORKER=1) for worker troubleshooting
- Export unified worker handler from main index.ts

Note: This is work in progress. There's a known issue with child_process
runtime where nested worker pools (created inside a worker) may receive
incorrect REPOMIX_WORKER_TYPE environment variable, causing task routing
issues. Investigation ongoing.

Modify website server entry point to support being used as both server
and worker entry in bundled environments:

- Re-export unified worker handler from repomix for Tinypool
- Add isTinypoolWorker() check to skip server startup when running as worker
- Wrap server initialization in conditional block

This enables esbuild bundling of the server while maintaining worker
functionality for Cloud Run cold start optimization.

Fix regression where fileCollect tasks were incorrectly routed to
defaultActionWorker due to REPOMIX_WORKER_TYPE environment variable
inheritance in child_process mode.

Changes:
- Add getWorkerPath() that returns individual worker file paths
- Only use unified worker when REPOMIX_WORKER_PATH is explicitly set
- Move WorkerType definition to processConcurrency.ts to avoid circular import

This ensures the regular CLI works correctly while still supporting
bundled environments when REPOMIX_WORKER_PATH is set.

Fix issue where Tinypool reuses child processes across different worker
pools in bundled environments, causing tasks to be routed to incorrect
handlers.

Changes:
- Add inferWorkerTypeFromTask() to determine worker type from task structure
- Add getWorkerTypeFromWorkerData() to handle Tinypool's array workerData format
- Cache handlers by worker type instead of single loadedHandler
- Dynamically select handler based on inferred or configured worker type

This enables bundled website server to correctly handle all worker types
(fileCollect, fileProcess, securityCheck, calculateMetrics, defaultAction)
even when child processes are reused.
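The commit above mentions Tinypool's array workerData format but does not describe it. Below is a minimal sketch of a `getWorkerTypeFromWorkerData()` helper. It assumes only that workerData is either a plain `{ workerType }` object or an array that may contain that object alongside Tinypool-internal entries. The array layout is an assumption about Tinypool internals, and the `isWorkerType` helper is hypothetical.

```typescript
type WorkerType = 'fileCollect' | 'fileProcess' | 'securityCheck' | 'calculateMetrics' | 'defaultAction';

const WORKER_TYPES: readonly WorkerType[] = [
  'fileCollect',
  'fileProcess',
  'securityCheck',
  'calculateMetrics',
  'defaultAction',
];

// Hypothetical type guard for known worker type strings.
const isWorkerType = (value: unknown): value is WorkerType =>
  typeof value === 'string' && (WORKER_TYPES as readonly string[]).includes(value);

// Accepts either { workerType } or an array whose elements may include such an
// object (assumed Tinypool wrapping). Returns undefined when nothing valid is found.
const getWorkerTypeFromWorkerData = (workerData: unknown): WorkerType | undefined => {
  const candidates = Array.isArray(workerData) ? workerData : [workerData];
  for (const candidate of candidates) {
    if (candidate !== null && typeof candidate === 'object' && 'workerType' in candidate) {
      const { workerType } = candidate as { workerType: unknown };
      if (isWorkerType(workerType)) {
        return workerType;
      }
    }
  }
  return undefined;
};
```

Because a reused child process may carry workerData from a different pool, this value is best treated as a fallback behind the type inferred from the task itself.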
Remove code that was added for debugging during development:
- Remove unused isTinypoolWorker function from unifiedWorker.ts
- Remove REPOMIX_DEBUG_WORKER logging from unifiedWorker.ts
- Remove debug logging from defaultActionWorker.ts
- Remove unused getUnifiedWorkerPath export
- Update tests to use workerType instead of workerPath
- Consolidate WorkerType definition to unifiedWorker.ts
- Fix inferWorkerTypeFromTask order: check calculateMetrics before securityCheck
- Simplify cliOptions handling in defaultActionWorker

Based on multi-agent code review:
- Fix fileProcess inference: check for rawFile instead of filePath/content
- Fix calculateMetrics inference: check for content/encoding (path is optional)
- Fix securityCheck inference: add type field check for specificity
- Remove unnecessary type assertion in defaultActionWorker
- Add comprehensive tests for unifiedWorker.ts covering task inference
  and worker termination cleanup
- Unify onWorkerTermination to async signature across all worker files
  for consistency (fileCollect, securityCheck, calculateMetrics)
Ensure all onWorkerTermination exports have consistent
`: Promise<void>` type annotations for better type safety.
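The inference rules from the review-fix commit above can be sketched as a single function. This is an illustration, not the code in unifiedWorker.ts. The defaultAction rule assumes the required fields of DefaultActionTask (directories, cwd, config). fileCollect has no rule because its task shape is not described in this PR.

```typescript
type WorkerType = 'fileCollect' | 'fileProcess' | 'securityCheck' | 'calculateMetrics' | 'defaultAction';

// Structural inference following the rules listed in the commit messages above.
// Order matters: calculateMetrics is checked before securityCheck because both carry `content`.
const inferWorkerTypeFromTask = (task: unknown): WorkerType | undefined => {
  if (task === null || typeof task !== 'object' || Array.isArray(task)) {
    return undefined;
  }
  const fields = task as Record<string, unknown>;
  const has = (key: string): boolean => key in fields;

  if (has('rawFile')) return 'fileProcess';
  if (has('content') && has('encoding')) return 'calculateMetrics';
  if (has('filePath') && has('content') && has('type')) return 'securityCheck';
  // Assumption: defaultAction tasks are recognized by DefaultActionTask's required fields.
  if (has('directories') && has('cwd') && has('config')) return 'defaultAction';
  // No fileCollect rule: its task shape is not described in the PR, so the caller
  // is expected to fall back to the configured worker type.
  return undefined;
};
```

A caller would typically combine both sources, for example `inferWorkerTypeFromTask(task) ?? configuredWorkerType`, so that inference wins when a reused process receives a task from a different pool.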
@coderabbitai
Contributor

coderabbitai bot commented Dec 31, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

The PR transitions the worker system from file-path-based identification (workerPath) to type-based identification (workerType), introduces a unified worker dispatcher (unifiedWorker.ts), makes all worker termination hooks async, and adds support for bundled worker environments with configurable WASM paths and optional CLI options.

Changes

Cohort / File(s) Summary
Worker type transition
src/cli/actions/defaultAction.ts, src/core/file/fileCollect.ts, src/core/file/fileProcess.ts, src/core/metrics/calculateMetrics.ts, src/core/security/securityCheck.ts
Changed task runner initialization from workerPath (file path) to workerType (string literal like 'defaultAction', 'fileCollect', 'securityCheck', 'calculateMetrics'); no behavioral change to task flow.
Worker termination async conversion
src/cli/actions/workers/defaultActionWorker.ts, src/core/file/workers/fileCollectWorker.ts, src/core/file/workers/fileProcessWorker.ts, src/core/metrics/workers/calculateMetricsWorker.ts, src/core/security/workers/securityCheckWorker.ts
Updated onWorkerTermination signatures from synchronous to async, returning Promise<void>; internal bodies unchanged.
Worker infrastructure & dispatch
src/shared/processConcurrency.ts, src/shared/unifiedWorker.ts
Replaced WorkerOptions.workerPath: string with workerType: WorkerType; introduced new unified worker entry point with task-aware type inference, dynamic handler loading (cached per-type), and global cleanup hook; added internal getWorkerPath() helper to resolve file paths from type; passed workerType via workerData and environment variable.
Spinner & CLI environment support
src/cli/cliSpinner.ts, src/cli/actions/workers/defaultActionWorker.ts
Made the Spinner constructor accept an optional cliOptions parameter with safe fallback defaults; added input validation for the task object and a safeCliOptions fallback variable in the worker.
WASM path configuration
src/core/treeSitter/loadLanguage.ts
Introduced setWasmBasePath(basePath) public setter and internal path resolution logic supporting custom WASM base path (with REPOMIX_WASM_DIR environment variable fallback); updated getWasmPath() to prefer custom path over require.resolve.
Public API exports
src/index.ts, website/server/src/index.ts
Added exports for setWasmBasePath, unifiedWorkerHandler (as default), unifiedWorkerTermination, and WorkerType type; added Tinypool bundling support with conditional server initialization based on worker mode detection.
Test updates
tests/cli/actions/defaultAction.test.ts, tests/core/metrics/calculateGitDiffMetrics.test.ts, tests/core/metrics/calculateGitLogMetrics.test.ts, tests/core/metrics/calculateOutputMetrics.test.ts, tests/core/metrics/calculateSelectiveFileMetrics.test.ts, tests/shared/processConcurrency.test.ts, tests/shared/unifiedWorker.test.ts
Updated test expectations and mocks to use workerType instead of workerPath; added new test suite for unified worker dispatch, type inference, and cleanup hooks.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Task/Client
    participant UW as UnifiedWorker
    participant Disp as Type Dispatcher
    participant Handler as Per-Type Handler<br/>(cached)
    participant SpecWkr as Specific Worker<br/>(fileCollect/fileProcess/etc.)
    
    rect rgb(200, 220, 255)
        Note over Client,SpecWkr: Old Flow (workerPath-based)
        Client->>Client: Task with workerPath:<br/>'./path/to/fileCollectWorker.js'
        Client->>SpecWkr: Load & execute worker
        SpecWkr->>SpecWkr: Process task
    end
    
    rect rgb(220, 255, 220)
        Note over Client,SpecWkr: New Flow (workerType-based via UnifiedWorker)
        Client->>UW: Task with workerType<br/>from env/workerData
        UW->>Disp: inferWorkerTypeFromTask(task)
        Disp-->>UW: Determined worker type<br/>(e.g., 'fileCollect')
        UW->>Handler: Dynamic import cached<br/>per workerType
        Handler->>SpecWkr: Handler delegates<br/>to specific worker logic
        SpecWkr->>SpecWkr: Process task payload
        SpecWkr-->>Handler: Task result
        Handler-->>UW: Return result
        UW-->>Client: Task output
    end
    
    rect rgb(255, 240, 200)
        Note over UW,SpecWkr: Cleanup (onWorkerTermination)
        Client->>UW: Termination signal
        UW->>Handler: onWorkerTermination()
        Handler->>SpecWkr: Per-type cleanup hook
        SpecWkr->>SpecWkr: Cleanup (e.g., free<br/>token counters)
        SpecWkr-->>Handler: Cleanup complete
        Handler-->>UW: Promise<void> resolved
        UW->>UW: Clear handler cache
    end
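The dispatch and cleanup flow in the diagram can be sketched as a small factory. The loader map, the `WorkerModule` shape, and the `createUnifiedWorker` name are hypothetical stand-ins for the dynamic imports in unifiedWorker.ts. The type resolver is injected so that any inference strategy can be plugged in.

```typescript
type WorkerType = 'fileCollect' | 'fileProcess' | 'securityCheck' | 'calculateMetrics' | 'defaultAction';

// Assumed shape of each per-type worker module: a default task handler plus an
// optional async termination hook.
interface WorkerModule {
  default: (task: unknown) => Promise<unknown>;
  onWorkerTermination?: () => Promise<void>;
}

type ModuleLoader = () => Promise<WorkerModule>;

// In a real worker the loaders would be dynamic import() calls of the worker files.
const createUnifiedWorker = (
  loaders: Record<WorkerType, ModuleLoader>,
  resolveType: (task: unknown) => WorkerType | undefined,
) => {
  // One cached module per worker type, so a reused process can serve several pools.
  const cache = new Map<WorkerType, Promise<WorkerModule>>();

  const load = (type: WorkerType): Promise<WorkerModule> => {
    let pending = cache.get(type);
    if (!pending) {
      pending = loaders[type]();
      cache.set(type, pending);
    }
    return pending;
  };

  const handler = async (task: unknown): Promise<unknown> => {
    const type = resolveType(task);
    if (!type) {
      throw new Error('Unable to determine worker type for task');
    }
    const mod = await load(type);
    return mod.default(task);
  };

  // Runs every loaded module's cleanup hook, then clears the cache.
  const onWorkerTermination = async (): Promise<void> => {
    const modules = await Promise.all(cache.values());
    await Promise.all(modules.map((mod) => mod.onWorkerTermination?.()));
    cache.clear();
  };

  return { handler, onWorkerTermination };
};
```

Caching the import promise rather than the resolved module means that two concurrent tasks of the same type trigger only one import.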

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding a unified worker entry point to enable bundling support, which is the primary objective of this PR.
Description check ✅ Passed The PR description comprehensively covers the changes with a clear summary, key changes section, implementation details, files changed, and testing status. Both checklist items are completed.

@gemini-code-assist
Contributor

Summary of Changes

Hello @yamadashy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's bundling capabilities, particularly for the website server, by introducing a unified worker entry point. This change aims to reduce Cloud Run cold start times by allowing all worker logic to be consolidated into a single bundled file. It refactors how worker tasks are dispatched, moving from explicit worker file paths to a type-based system that dynamically loads the correct handler. Additionally, it provides flexibility for WASM file locations in bundled contexts and maintains full backward compatibility for standard CLI operations.

Highlights

  • Unified Worker Entry Point: Introduced src/shared/unifiedWorker.ts to act as a single entry point for all worker types, dynamically routing tasks to specific handlers based on their structure.
  • Bundling Support: Enabled full bundling for the website server by allowing workers to be spawned from a single bundled file, improving cold start times, especially for Cloud Run.
  • Configurable WASM Paths: Added support for custom WebAssembly (WASM) file paths via REPOMIX_WASM_DIR environment variable or setWasmBasePath() for bundled environments.
  • Backward Compatibility: Ensured that existing CLI usage remains unaffected, with new bundled worker logic activating only when specific environment variables are set.


Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@codecov

codecov bot commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 84.84848% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.24%. Comparing base (46857e5) to head (534b238).
⚠️ Report is 14 commits behind head on main.

Files with missing lines                          Patch %   Lines
src/shared/unifiedWorker.ts                       85.71%    15 Missing ⚠️
src/shared/processConcurrency.ts                  83.33%    4 Missing ⚠️
src/core/treeSitter/loadLanguage.ts               80.00%    3 Missing ⚠️
src/cli/actions/workers/defaultActionWorker.ts    80.00%    2 Missing ⚠️
src/core/metrics/calculateMetrics.ts              0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1056      +/-   ##
==========================================
- Coverage   90.34%   90.24%   -0.11%     
==========================================
  Files         123      124       +1     
  Lines        9457     9605     +148     
  Branches     1709     1754      +45     
==========================================
+ Hits         8544     8668     +124     
- Misses        913      937      +24     


@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Dec 31, 2025

Deploying repomix with  Cloudflare Pages  Cloudflare Pages

Latest commit: 534b238
Status: ✅  Deploy successful!
Preview URL: https://de73bc30.repomix.pages.dev
Branch Preview URL: https://feat-unified-worker-bundling.repomix.pages.dev

@claude
Contributor

claude bot commented Dec 31, 2025

PR Review: feat(worker): Add unified worker entry point for bundling support

Overall Assessment

This is a well-designed PR that solves a real problem - enabling bundling support for the website server to improve Cloud Run cold start times. The architecture is sound and backward compatible.


✅ Strengths

  1. Clean abstraction: The workerType abstraction replaces hardcoded worker paths, making the code more maintainable
  2. Backward compatibility: Normal CLI usage is unaffected - new paths only activate when env variables are set
  3. Comprehensive tests: Good test coverage for the unified worker inference logic
  4. Defensive coding: Added validation and safe defaults for cliOptions in bundled environments

⚠️ Issues & Recommendations

1. Task Inference Order Could Cause Subtle Bugs (Medium Priority)

In unifiedWorker.ts:108-116, the inference logic checks calculateMetrics before securityCheck:

// calculateMetrics: has content, encoding (must check before securityCheck)
if ('content' in taskObj && 'encoding' in taskObj) {
  return 'calculateMetrics';
}

// securityCheck: has filePath, content, type
if ('filePath' in taskObj && 'content' in taskObj && 'type' in taskObj) {
  return 'securityCheck';
}

The comment says "must check before securityCheck", but this order dependency is fragile. If a future task type happens to have content and encoding fields, it could be misclassified.

Recommendation: Consider adding a discriminator field (e.g., taskType) to each task instead of relying on structural inference. This would be more explicit and less error-prone.
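Below is a minimal sketch of the discriminator approach. It uses a hypothetical `taskType` field and simplified task shapes. The field sets are modeled on the inference checks quoted above, the structure of `rawFile` is assumed, and `routeTask` is illustrative only.

```typescript
// Hypothetical: each task carries an explicit `taskType` discriminator.
interface FileProcessTask {
  taskType: 'fileProcess';
  rawFile: { path: string; content: string };
}
interface CalculateMetricsTask {
  taskType: 'calculateMetrics';
  content: string;
  encoding: string;
  path?: string;
}
interface SecurityCheckTask {
  taskType: 'securityCheck';
  filePath: string;
  content: string;
  type: string;
}

type WorkerTask = FileProcessTask | CalculateMetricsTask | SecurityCheckTask;

// Routing switches on the discriminator, so field overlap no longer matters.
const routeTask = (task: WorkerTask): string => {
  switch (task.taskType) {
    case 'fileProcess':
      return `process ${task.rawFile.path}`;
    case 'calculateMetrics':
      return `metrics (${task.encoding})`;
    case 'securityCheck':
      return `security ${task.filePath}`;
    default: {
      // Exhaustiveness check: a new task type without a case fails to compile.
      const unreachable: never = task;
      throw new Error(`Unknown task: ${JSON.stringify(unreachable)}`);
    }
  }
};
```

With an explicit tag, the order of the checks no longer matters. The `never` assignment makes the compiler flag any task type that has been added to the union but not handled.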

2. Module-level Mutable State in loadLanguage.ts (Low Priority)
let customWasmBasePath: string | null = null;

This module-level mutable state could cause issues if the library is used in multiple contexts. However, since it's only meant for bundled environments, this is acceptable.

Recommendation: Add a note to the JSDoc that setWasmBasePath() affects all subsequent calls globally.
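Below is a sketch of what the suggested JSDoc note might look like, together with the resolution order described in the walkthrough: explicit setter first, then REPOMIX_WASM_DIR, then the default resolver. The names `setWasmBasePath` and `REPOMIX_WASM_DIR` come from the PR. The `getWasmPath` signature, with its injected `env` and `defaultResolve` parameters, is hypothetical and chosen so the sketch runs without the real package layout.

```typescript
import { join } from 'node:path';

let customWasmBasePath: string | null = null;

/**
 * Overrides the directory from which tree-sitter WASM files are loaded.
 *
 * Note: this is module-level state. The override applies globally to every
 * subsequent WASM lookup in the process, not only to the caller's context.
 * Pass null to restore the default resolution.
 */
const setWasmBasePath = (basePath: string | null): void => {
  customWasmBasePath = basePath;
};

// Resolution order: explicit setter, then REPOMIX_WASM_DIR, then the default
// resolver (require.resolve in the real code; injected here for testability).
const getWasmPath = (
  fileName: string,
  env: Record<string, string | undefined>,
  defaultResolve: (fileName: string) => string,
): string => {
  const baseDir = customWasmBasePath ?? env.REPOMIX_WASM_DIR;
  return baseDir ? join(baseDir, fileName) : defaultResolve(fileName);
};
```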

3. Coverage Gaps (For Information)

Codecov reports 84.85% patch coverage with 25 lines missing. Key areas:

  • unifiedWorker.ts: 15 lines uncovered (likely error paths and edge cases)
  • processConcurrency.ts: 4 lines uncovered (bundled environment paths)
  • loadLanguage.ts: 3 lines uncovered (custom WASM path code)

These are mostly environment-specific code paths that are difficult to test in isolation. The test coverage is acceptable given the nature of the changes.


🔍 Premortem Analysis

Failure scenario | Risk | Mitigation
Task inference misclassifies a task | Medium | Covered by the explicit property checks, but adding tests for ambiguous task structures would help
Environment variables not propagated in child_process | Low | Already handled via REPOMIX_WORKER_TYPE env var
WASM files not found in bundled env | Medium | Error message is clear, but docs should specify expected file layout
Tinypool worker reuse causes wrong handler | Low | Task inference takes precedence over workerData, mitigating this

📋 Minor Suggestions

  1. Type safety: The biome-ignore for noExplicitAny is reasonable given the varying worker signatures, but consider using a generic or discriminated union in the future

  2. Documentation: Consider adding a brief section to CLAUDE.md or README about bundled environment setup if this feature is intended for external use


Verdict

Approve - The PR is well-implemented with good test coverage. The design is clean and the implementation is backward compatible. The minor issues noted above don't block merging.


🤖 Reviewed by Claude

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR introduces a unified worker entry point to support bundling, which is a great improvement for deployment scenarios like Cloud Run. The implementation is well-executed, using a dynamic dispatcher in unifiedWorker.ts that correctly routes tasks to their respective handlers. The changes to support custom WASM paths and the fallback mechanism for inferring worker types show great attention to detail. The refactoring is applied consistently across the codebase, and the new tests for the unified worker are comprehensive.

I have one suggestion regarding the website/server/src/index.ts file to maintain testability by ensuring the Hono app instance is always exported.

Comment on lines +13 to +72
+// Re-export unified worker for bundled environment
+// When this file is used as a Tinypool worker, it needs to export the handler
+export { unifiedWorkerHandler as default, unifiedWorkerTermination as onWorkerTermination } from 'repomix';

-// Log server metrics on startup
-logInfo('Server starting', {
-  metrics: {
-    processConcurrency: getProcessConcurrency(),
-  },
-});
+// Check if running as a Tinypool worker (bundled environment)
+// In bundled mode, this file is used both as server entry and worker entry
+const isTinypoolWorker = (): boolean => {
+  const tinypoolState = (process as NodeJS.Process & { __tinypool_state__?: { isTinypoolWorker?: boolean } })
+    .__tinypool_state__;
+  return tinypoolState?.isTinypoolWorker ?? false;
+};

+// Skip server initialization if running as a Tinypool worker
+if (!isTinypoolWorker()) {
+  const API_TIMEOUT_MS = 35_000;

-// Log initial memory usage
-logMemoryUsage('Server startup', {
-  processConcurrency: getProcessConcurrency(),
-});
+  // Log server metrics on startup
+  logInfo('Server starting', {
+    metrics: {
+      processConcurrency: getProcessConcurrency(),
+    },
+  });

-const app = new Hono();
+  // Log initial memory usage
+  logMemoryUsage('Server startup', {
+    processConcurrency: getProcessConcurrency(),
+  });

-// Configure CORS
-app.use('/*', corsMiddleware);
+  const app = new Hono();

-// Enable compression
-app.use(compress());
+  // Configure CORS
+  app.use('/*', corsMiddleware);

-// Set timeout for API routes
-app.use('/api', timeout(API_TIMEOUT_MS));
+  // Enable compression
+  app.use(compress());

-// Setup custom logger
-app.use('*', cloudLoggerMiddleware());
+  // Set timeout for API routes
+  app.use('/api', timeout(API_TIMEOUT_MS));

-// Apply rate limiting to API routes
-app.use('/api/*', rateLimitMiddleware());
+  // Setup custom logger
+  app.use('*', cloudLoggerMiddleware());

-// Health check endpoint
-app.get('/health', (c) => c.text('OK'));
+  // Apply rate limiting to API routes
+  app.use('/api/*', rateLimitMiddleware());

-// Main packing endpoint
-app.post('/api/pack', bodyLimitMiddleware, packAction);
+  // Health check endpoint
+  app.get('/health', (c) => c.text('OK'));

-// Start server
-const port = process.env.PORT ? Number.parseInt(process.env.PORT, 10) : 3000;
-logInfo(`Server starting on port ${port}`);
+  // Main packing endpoint
+  app.post('/api/pack', bodyLimitMiddleware, packAction);

-serve({
-  fetch: app.fetch,
-  port,
-});
+  // Start server
+  const port = process.env.PORT ? Number.parseInt(process.env.PORT, 10) : 3000;
+  logInfo(`Server starting on port ${port}`);

-// Export app for testing
-export default app;
+  serve({
+    fetch: app.fetch,
+    port,
+  });
+}

medium

The recent changes have removed the export of the app instance, which was previously available for testing purposes. While the logic to differentiate between server and worker modes is correct, removing this export can break testing setups that rely on importing the app instance directly.

To maintain testability, I suggest refactoring to ensure the app instance is always created and exported, while the server-specific logic (like starting the server with serve) remains conditional.

import { serve } from '@hono/node-server';
import { Hono } from 'hono';
import { compress } from 'hono/compress';
import { timeout } from 'hono/timeout';
import { packAction } from './actions/packAction.js';
import { bodyLimitMiddleware } from './middlewares/bodyLimit.js';
import { cloudLoggerMiddleware } from './middlewares/cloudLogger.js';
import { corsMiddleware } from './middlewares/cors.js';
import { rateLimitMiddleware } from './middlewares/rateLimit.js';
import { logInfo, logMemoryUsage } from './utils/logger.js';
import { getProcessConcurrency } from './utils/processConcurrency.js';

// Re-export unified worker for bundled environment
// When this file is used as a Tinypool worker, it needs to export the handler
export { unifiedWorkerHandler as default, unifiedWorkerTermination as onWorkerTermination } from 'repomix';

// Check if running as a Tinypool worker (bundled environment)
// In bundled mode, this file is used both as server entry and worker entry
const isTinypoolWorker = (): boolean => {
  const tinypoolState = (process as NodeJS.Process & { __tinypool_state__?: { isTinypoolWorker?: boolean } })
    .__tinypool_state__;
  return tinypoolState?.isTinypoolWorker ?? false;
};

const app = new Hono();

// Only configure and run the server if not in worker mode.
if (!isTinypoolWorker()) {
  const API_TIMEOUT_MS = 35_000;

  // Log server metrics on startup
  logInfo('Server starting', {
    metrics: {
      processConcurrency: getProcessConcurrency(),
    },
  });

  // Log initial memory usage
  logMemoryUsage('Server startup', {
    processConcurrency: getProcessConcurrency(),
  });

  // Configure CORS
  app.use('/*', corsMiddleware);

  // Enable compression
  app.use(compress());

  // Set timeout for API routes
  app.use('/api', timeout(API_TIMEOUT_MS));

  // Setup custom logger
  app.use('*', cloudLoggerMiddleware());

  // Apply rate limiting to API routes
  app.use('/api/*', rateLimitMiddleware());

  // Health check endpoint
  app.get('/health', (c) => c.text('OK'));

  // Main packing endpoint
  app.post('/api/pack', bodyLimitMiddleware, packAction);

  // Start server
  const port = process.env.PORT ? Number.parseInt(process.env.PORT, 10) : 3000;
  logInfo(`Server starting on port ${port}`);

  serve({
    fetch: app.fetch,
    port,
  });
}

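// Export app for testing purposes. This must be a named export: the default
// export is already the re-exported unifiedWorkerHandler above, and a module
// cannot declare two default exports.
export { app };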

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cli/actions/workers/defaultActionWorker.ts (1)

12-17: Type definition inconsistency: cliOptions should be optional.

The DefaultActionTask interface declares cliOptions: CliOptions (not optional), but line 59 treats it as potentially undefined with cliOptions ?? {}. This creates a type system mismatch where TypeScript won't catch cases where cliOptions is actually undefined.

🔎 Proposed fix
 export interface DefaultActionTask {
   directories: string[];
   cwd: string;
   config: RepomixConfigMerged;
-  cliOptions: CliOptions;
+  cliOptions?: CliOptions;
   stdinFilePaths?: string[];
 }
🧹 Nitpick comments (2)
src/cli/actions/workers/defaultActionWorker.ts (1)

46-56: Consider consistency with other worker validation patterns.

The task validation logic added here (checking for object type and array type) is not present in other workers like fileProcessWorker.ts or fileCollectWorker.ts. While defensive validation is beneficial, consider whether this pattern should be applied consistently across all workers or if it's specifically needed for this worker due to the child_process runtime.
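One way to address both of the comments above is a shared guard that every worker calls first, paired with an optional cliOptions read that falls back to an empty object. The `assertTaskObject` and `readCliOptions` helpers and the `CliOptionsLike` shape are hypothetical and not part of this PR.

```typescript
// Hypothetical subset of CLI options, for illustration only.
interface CliOptionsLike {
  verbose?: boolean;
  quiet?: boolean;
}

// Shared guard each worker could call before reading its task, so the
// defensive check is applied uniformly rather than only in defaultActionWorker.
function assertTaskObject(task: unknown, workerName: string): asserts task is Record<string, unknown> {
  if (task === null || typeof task !== 'object' || Array.isArray(task)) {
    const received = task === null ? 'null' : Array.isArray(task) ? 'array' : typeof task;
    throw new TypeError(`${workerName}: expected a task object, received ${received}`);
  }
}

// With cliOptions optional, a missing value falls back to an empty options object.
const readCliOptions = (task: unknown): CliOptionsLike => {
  assertTaskObject(task, 'defaultActionWorker');
  const { cliOptions } = task;
  if (cliOptions === undefined || cliOptions === null) {
    return {};
  }
  if (typeof cliOptions !== 'object') {
    throw new TypeError('defaultActionWorker: cliOptions must be an object when provided');
  }
  return cliOptions as CliOptionsLike;
};
```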

website/server/src/index.ts (1)

17-23: Consider adding a comment about Tinypool internal API usage.

The isTinypoolWorker function accesses Tinypool's internal __tinypool_state__ property. While this works, it could break if Tinypool changes its internals. The safe fallback (?? false) is good defensive coding.

🔎 Suggested documentation
 // Check if running as a Tinypool worker (bundled environment)
 // In bundled mode, this file is used both as server entry and worker entry
+// Note: Uses Tinypool's internal __tinypool_state__ property; may need updates if Tinypool changes
 const isTinypoolWorker = (): boolean => {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 46857e5 and 6323e4c.

📒 Files selected for processing (23)
  • src/cli/actions/defaultAction.ts
  • src/cli/actions/workers/defaultActionWorker.ts
  • src/cli/cliSpinner.ts
  • src/core/file/fileCollect.ts
  • src/core/file/fileProcess.ts
  • src/core/file/workers/fileCollectWorker.ts
  • src/core/file/workers/fileProcessWorker.ts
  • src/core/metrics/calculateMetrics.ts
  • src/core/metrics/workers/calculateMetricsWorker.ts
  • src/core/security/securityCheck.ts
  • src/core/security/workers/securityCheckWorker.ts
  • src/core/treeSitter/loadLanguage.ts
  • src/index.ts
  • src/shared/processConcurrency.ts
  • src/shared/unifiedWorker.ts
  • tests/cli/actions/defaultAction.test.ts
  • tests/core/metrics/calculateGitDiffMetrics.test.ts
  • tests/core/metrics/calculateGitLogMetrics.test.ts
  • tests/core/metrics/calculateOutputMetrics.test.ts
  • tests/core/metrics/calculateSelectiveFileMetrics.test.ts
  • tests/shared/processConcurrency.test.ts
  • tests/shared/unifiedWorker.test.ts
  • website/server/src/index.ts
🧰 Additional context used
🧬 Code graph analysis (10)
src/core/file/workers/fileCollectWorker.ts (4)
src/core/file/workers/fileProcessWorker.ts (1)
  • onWorkerTermination (25-27)
src/core/metrics/workers/calculateMetricsWorker.ts (1)
  • onWorkerTermination (48-50)
src/core/security/workers/securityCheckWorker.ts (1)
  • onWorkerTermination (88-90)
src/index.ts (1)
  • onWorkerTermination (66-66)
src/core/file/workers/fileProcessWorker.ts (3)
src/cli/actions/workers/defaultActionWorker.ts (1)
  • onWorkerTermination (119-122)
src/core/file/workers/fileCollectWorker.ts (1)
  • onWorkerTermination (54-56)
src/core/security/workers/securityCheckWorker.ts (1)
  • onWorkerTermination (88-90)
src/core/treeSitter/loadLanguage.ts (1)
src/index.ts (1)
  • setWasmBasePath (27-27)
src/cli/cliSpinner.ts (1)
src/cli/types.ts (1)
  • CliOptions (4-72)
src/core/security/workers/securityCheckWorker.ts (3)
src/core/file/workers/fileProcessWorker.ts (1)
  • onWorkerTermination (25-27)
src/core/file/workers/fileCollectWorker.ts (1)
  • onWorkerTermination (54-56)
src/core/metrics/workers/calculateMetricsWorker.ts (1)
  • onWorkerTermination (48-50)
tests/shared/processConcurrency.test.ts (1)
src/shared/processConcurrency.ts (2)
  • createWorkerPool (66-113)
  • initTaskRunner (141-147)
src/core/metrics/workers/calculateMetricsWorker.ts (2)
src/core/file/workers/fileCollectWorker.ts (1)
  • onWorkerTermination (54-56)
src/core/security/workers/securityCheckWorker.ts (1)
  • onWorkerTermination (88-90)
website/server/src/index.ts (7)
website/server/src/utils/logger.ts (2)
  • logInfo (34-39)
  • logMemoryUsage (64-72)
src/shared/processConcurrency.ts (1)
  • getProcessConcurrency (48-50)
website/server/src/middlewares/cors.ts (1)
  • corsMiddleware (3-21)
website/server/src/middlewares/cloudLogger.ts (1)
  • cloudLoggerMiddleware (20-98)
website/server/src/middlewares/rateLimit.ts (1)
  • rateLimitMiddleware (6-20)
website/server/src/middlewares/bodyLimit.ts (1)
  • bodyLimitMiddleware (5-12)
website/server/src/actions/packAction.ts (1)
  • packAction (72-159)
tests/shared/unifiedWorker.test.ts (3)
src/core/metrics/workers/calculateMetricsWorker.ts (2)
  • task (43-45)
  • onWorkerTermination (48-50)
src/core/file/workers/fileCollectWorker.ts (1)
  • onWorkerTermination (54-56)
src/core/security/workers/securityCheckWorker.ts (1)
  • onWorkerTermination (88-90)
src/shared/processConcurrency.ts (2)
src/shared/unifiedWorker.ts (1)
  • WorkerType (15-15)
src/index.ts (1)
  • WorkerType (67-67)
🪛 GitHub Actions: autofix.ci
src/index.ts

[error] 15-15: Module 'repomix' has no exported member 'unifiedWorkerHandler'.

website/server/src/index.ts

[error] 15-15: Module 'repomix' has no exported member 'unifiedWorkerHandler'.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Test (ubuntu-latest, 20.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Build and run (ubuntu-latest, 22.x)
  • GitHub Check: Build and run (windows-latest, 25.x)
  • GitHub Check: Test (windows-latest, 25.x)
  • GitHub Check: Test (macos-latest, 24.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Build and run (macos-latest, 20.x)
  • GitHub Check: Build and run (macos-latest, 25.x)
  • GitHub Check: Test (windows-latest, 24.x)
  • GitHub Check: Build and run (windows-latest, 24.x)
  • GitHub Check: claude-review
  • GitHub Check: Build and run with Bun (windows-latest, latest)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (33)
src/core/file/workers/fileCollectWorker.ts (1)

54-56: LGTM! Termination signature standardized.

The conversion of onWorkerTermination to an async function returning Promise<void> aligns with the unified worker termination pattern applied across all workers in this PR.

src/core/security/workers/securityCheckWorker.ts (1)

88-90: LGTM! Termination signature standardized.

The async termination hook signature is consistent with other workers and enables unified termination handling.

src/core/treeSitter/loadLanguage.ts (2)

8-28: Well-designed WASM path configuration for bundled environments.

The custom WASM base path configuration with environment variable fallback provides flexibility for bundled deployments while maintaining backward compatibility with standard node_modules resolution.


44-54: LGTM! Clean path resolution with proper fallback.

The WASM path resolution correctly prioritizes custom paths for bundled environments while falling back to standard require.resolve for node_modules.
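Here is a rough sketch of the resolution order described above: programmatic base path first, then REPOMIX_WASM_DIR, then the node_modules lookup. The function bodies are assumptions, not the code in loadLanguage.ts. The node_modules resolver is passed in as a parameter so the example runs without the real grammar packages installed.

```typescript
import * as path from "node:path";

// Hypothetical module-level override, mirroring the exported setWasmBasePath().
let customWasmBasePath: string | undefined;

export function setWasmBasePath(dir: string | undefined): void {
  customWasmBasePath = dir;
}

/**
 * Resolve a Tree-sitter grammar file. Order assumed from the review:
 * 1. a base path set programmatically,
 * 2. the REPOMIX_WASM_DIR environment variable,
 * 3. the normal node_modules lookup (injected so the sketch stays testable).
 */
export function getWasmPath(
  fileName: string,
  resolveFromNodeModules: (fileName: string) => string,
  env: Record<string, string | undefined> = process.env,
): string {
  const baseDir = customWasmBasePath ?? env.REPOMIX_WASM_DIR;
  if (baseDir) {
    return path.join(baseDir, fileName);
  }
  return resolveFromNodeModules(fileName);
}
```

In the real module, the last step would presumably call require.resolve, or createRequire(import.meta.url).resolve in ESM. Injecting it here is only a testing convenience.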

src/core/metrics/calculateMetrics.ts (1)

42-48: LGTM! Worker identification migrated to type-based approach.

The migration from workerPath to workerType: 'calculateMetrics' is consistent with the unified worker pattern.

src/cli/actions/defaultAction.ts (1)

108-112: LGTM! Worker identification migrated to type-based approach.

The migration to workerType: 'defaultAction' is consistent with the unified worker pattern. Note that this worker uses child_process runtime rather than worker_threads, which aligns with the defaultAction worker's requirements.

src/core/metrics/workers/calculateMetricsWorker.ts (1)

48-50: LGTM! Termination signature standardized.

The async termination hook signature is consistent with other workers. The freeTokenCounters() call is synchronous based on the implementation pattern.

tests/core/metrics/calculateSelectiveFileMetrics.test.ts (2)

39-40: LGTM! Test updated to reflect new workerType API.

The test correctly uses workerType: 'calculateMetrics' instead of the previous workerPath approach, aligning with the unified worker dispatch mechanism.


60-61: LGTM! Consistent test update.

Both test cases now use the workerType approach consistently.

src/core/security/securityCheck.ts (1)

58-62: LGTM! Worker identification migrated to type-based approach.

The switch from workerPath to workerType: 'securityCheck' aligns with the unified worker dispatch mechanism. The worker type is properly registered in src/shared/unifiedWorker.ts (WorkerType union and handler dispatch) and correctly mapped in src/shared/processConcurrency.ts.

src/core/file/fileCollect.ts (1)

28-28: LGTM! Clean migration to workerType-based configuration.

The change from workerPath to workerType: 'fileCollect' aligns with the unified worker dispatch pattern introduced in this PR.

src/core/file/workers/fileProcessWorker.ts (1)

25-27: LGTM! Consistent async termination signature.

The updated onWorkerTermination signature aligns with the unified worker termination pattern used across all workers in this PR.

tests/cli/actions/defaultAction.test.ts (1)

150-154: LGTM! Test correctly reflects the new workerType-based API.

The test expectation now validates workerType: 'defaultAction' instead of checking workerPath, which is consistent with the API changes across the codebase.

tests/core/metrics/calculateGitLogMetrics.test.ts (1)

70-74: LGTM! Consistent test setup with workerType-based configuration.

The mock task runner initialization correctly uses workerType: 'calculateMetrics', aligning with the unified worker dispatch pattern.

tests/core/metrics/calculateGitDiffMetrics.test.ts (1)

70-74: LGTM! Consistent workerType-based test configuration.

The test setup correctly mirrors the workerType-based initialization pattern used throughout the codebase.

src/core/file/fileProcess.ts (1)

26-26: LGTM! Clean migration to type-based worker selection.

The change to workerType: 'fileProcess' is consistent with the unified worker dispatch pattern.

src/cli/actions/workers/defaultActionWorker.ts (1)

59-63: LGTM! Safe defaults and async termination.

The introduction of safeCliOptions appropriately handles potentially undefined cliOptions in bundled environments, and the async onWorkerTermination signature is consistent with other workers.

Also applies to: 69-70, 119-122

src/cli/cliSpinner.ts (1)

15-20: LGTM! Appropriate optional parameter with safe defaults.

Making cliOptions optional and using optional chaining provides safe operation in bundled worker environments where options may be undefined. The fallback to false ensures consistent behavior.
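Below is a minimal sketch of the defensive pattern these two comments describe. It uses a hypothetical CliOptionsSketch with quiet and verbose flags in place of the real CliOptions. The optional logDebug callback is also hypothetical; it marks where a diagnostic could be emitted when the fallback is taken.

```typescript
// Hypothetical subset of CliOptions; the real interface has many more fields.
interface CliOptionsSketch {
  quiet?: boolean;
  verbose?: boolean;
}

/** Replace a missing cliOptions payload with {} and optionally record that it happened. */
export function toSafeCliOptions(
  cliOptions: CliOptionsSketch | undefined | null,
  logDebug: (message: string) => void = () => {},
): CliOptionsSketch {
  if (cliOptions == null) {
    logDebug("cliOptions missing from worker task; using {}");
    return {};
  }
  return cliOptions;
}

/** Read a flag defensively: optional chaining plus an explicit false fallback. */
export function isQuiet(cliOptions?: CliOptionsSketch): boolean {
  return cliOptions?.quiet ?? false;
}
```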

tests/core/metrics/calculateOutputMetrics.test.ts (1)

27-27: LGTM!

The test mocks are correctly updated to use workerType: 'calculateMetrics' instead of workerPath, aligning with the new WorkerOptions interface. The runtime: 'worker_threads' addition is consistent across all test cases.

Also applies to: 38-38, 62-62, 74-74, 85-85, 113-113, 138-138, 164-168

src/shared/processConcurrency.ts (2)

17-43: LGTM - well-structured worker path resolution.

The getWorkerPath function cleanly separates bundled vs. non-bundled environments. The exhaustive switch with a default throw ensures compile-time and runtime safety for unknown worker types.
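The sketch below shows what such a function might look like, to make the bundled/non-bundled split concrete. The five worker type names come from this PR. The per-type relative paths are hypothetical placeholders, and the environment is injected so the logic can be exercised without touching process.env.

```typescript
type WorkerType = "fileCollect" | "fileProcess" | "securityCheck" | "calculateMetrics" | "defaultAction";

export function getWorkerPath(
  workerType: WorkerType,
  env: Record<string, string | undefined> = process.env,
): string {
  // Bundled mode: every worker type is served by the single unified entry point.
  const unifiedWorkerPath = env.REPOMIX_WORKER_PATH;
  if (unifiedWorkerPath) {
    return unifiedWorkerPath;
  }
  // Normal CLI: one file per worker type (hypothetical relative paths).
  switch (workerType) {
    case "fileCollect":
      return "../core/file/workers/fileCollectWorker.js";
    case "fileProcess":
      return "../core/file/workers/fileProcessWorker.js";
    case "securityCheck":
      return "../core/security/workers/securityCheckWorker.js";
    case "calculateMetrics":
      return "../core/metrics/workers/calculateMetricsWorker.js";
    case "defaultAction":
      return "../cli/actions/workers/defaultActionWorker.js";
    default: {
      // Compile time: adding a WorkerType without a case makes this assignment fail.
      const unreachable: never = workerType;
      // Runtime: an unexpected string that bypassed the types still fails loudly.
      throw new Error(`Unknown worker type: ${String(unreachable)}`);
    }
  }
}
```

The never assignment in the default branch provides the compile-time half of the guarantee. The throw covers values that bypass the type system.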


66-105: LGTM - consistent workerType propagation.

The createWorkerPool function correctly propagates workerType through both workerData (for worker_threads) and environment variables (for child_process), ensuring the unified worker can determine the appropriate handler in both runtime modes.
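Here is a sketch of the propagation described above. It assumes the pool is configured with a plain options object. The field names (filename, runtime, workerData, env) follow the review's wording and are not a verified Tinypool signature. buildPoolOptions itself is hypothetical.

```typescript
type WorkerType = "fileCollect" | "fileProcess" | "securityCheck" | "calculateMetrics" | "defaultAction";
type Runtime = "worker_threads" | "child_process";

// Plain options object shaped after the review's description, not a library type.
interface PoolOptionsSketch {
  filename: string;
  runtime: Runtime;
  workerData: { workerType: WorkerType };
  env: Record<string, string | undefined>;
}

export function buildPoolOptions(
  workerType: WorkerType,
  runtime: Runtime,
  filename: string,
  baseEnv: Record<string, string | undefined> = process.env,
): PoolOptionsSketch {
  return {
    filename,
    runtime,
    // Visible as workerData inside a worker_threads worker.
    workerData: { workerType },
    // Visible as process.env inside a child_process worker.
    env: { ...baseEnv, REPOMIX_WORKER_TYPE: workerType },
  };
}
```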

tests/shared/unifiedWorker.test.ts (3)

1-36: LGTM - comprehensive mock setup.

The test setup correctly mocks all worker modules and worker_threads, enabling isolated testing of the unified worker's dispatch and inference logic. Using vi.resetModules() in beforeEach ensures the handler cache is cleared between tests.


38-136: LGTM - thorough inference testing.

The tests cover all WorkerType inference paths and validate error handling for invalid task structures. Good coverage of edge cases including null and non-object tasks.


138-170: LGTM - termination flow properly validated.

The tests correctly verify that onWorkerTermination triggers cleanup on cached handlers and clears the cache, ensuring fresh handler loading on subsequent tasks.

src/index.ts (1)

27-27: LGTM - useful export for bundled environments.

Exporting setWasmBasePath enables consumers to configure Tree-sitter WASM file locations, which is essential for bundled deployments.

tests/shared/processConcurrency.test.ts (2)

74-113: LGTM - createWorkerPool tests correctly updated.

The tests properly validate that:

  1. workerType is passed to createWorkerPool
  2. Tinypool is initialized with the correct filename derived from workerType
  3. workerData includes workerType for worker identification

128-150: LGTM - initTaskRunner tests properly updated.

Tests correctly verify that initTaskRunner passes workerType through to createWorkerPool and that the resulting workerData contains the correct workerType value.

src/shared/unifiedWorker.ts (5)

1-23: LGTM - clean module structure and type definitions.

The documentation clearly explains the module's purpose. The WorkerType union type provides type safety, and the handler cache using Map is appropriate for this use case.


30-74: LGTM - efficient handler loading with caching.

The loadWorkerHandler function correctly:

  1. Checks cache before dynamic imports
  2. Uses exhaustive switch with typed cases
  3. Caches both handler and cleanup functions
  4. Throws on unknown worker types for safety
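The cache-then-import flow can be sketched as follows. This sketch makes one deliberate substitution: the real code uses an exhaustive switch over dynamic import() calls, but here a hypothetical loaders table is injected so the example runs without the real worker modules. In production, each loader would be something like () => import("./path/to/worker.js").

```typescript
type WorkerType = "fileCollect" | "fileProcess" | "securityCheck" | "calculateMetrics" | "defaultAction";
type Handler = (task: unknown) => unknown;
type Cleanup = () => void | Promise<void>;

// Shape assumed for each worker module: a default task handler plus an optional termination hook.
interface WorkerModule {
  default: Handler;
  onWorkerTermination?: Cleanup;
}

interface CachedHandler {
  handler: Handler;
  cleanup?: Cleanup;
}

// Bounded by the number of worker types (five in this PR).
const handlerCache = new Map<WorkerType, CachedHandler>();

export async function loadWorkerHandler(
  workerType: WorkerType,
  loaders: Record<WorkerType, () => Promise<WorkerModule>>,
): Promise<CachedHandler> {
  // 1. Reuse a previously loaded handler.
  const cached = handlerCache.get(workerType);
  if (cached) {
    return cached;
  }
  // 2. Guard against strings that slipped past the type system.
  const loader = loaders[workerType];
  if (!loader) {
    throw new Error(`Unknown worker type: ${workerType}`);
  }
  // 3. Import once, then cache the handler together with its cleanup hook.
  const workerModule = await loader();
  const entry: CachedHandler = { handler: workerModule.default, cleanup: workerModule.onWorkerTermination };
  handlerCache.set(workerType, entry);
  return entry;
}
```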

81-119: Good inference logic with important ordering consideration.

The inference function correctly distinguishes worker types using unique property combinations. The comment at line 108 about checking calculateMetrics before securityCheck is critical since both have content but differ in other required properties (encoding vs type).
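Below is a sketch of property-based inference using hypothetical task shapes. Only content, encoding, and type come from the review. The fields used for defaultAction, fileCollect, and fileProcess are illustrative placeholders, so this shows the technique rather than the actual task interfaces.

```typescript
type WorkerType = "fileCollect" | "fileProcess" | "securityCheck" | "calculateMetrics" | "defaultAction";

export function inferWorkerTypeFromTask(task: unknown): WorkerType | null {
  if (task === null || typeof task !== "object") {
    return null;
  }
  const obj: object = task;
  const has = (key: string): boolean => key in obj;
  // Hypothetical distinguishing fields for these three types.
  if (has("directories") && has("cwd")) return "defaultAction";
  if (has("rootDir") && has("filePath")) return "fileCollect";
  if (has("rawFile")) return "fileProcess";
  // Both remaining types carry `content`; calculateMetrics is checked first, as the review notes.
  if (has("content") && has("encoding")) return "calculateMetrics";
  if (has("content") && has("type")) return "securityCheck";
  return null;
}
```

A task carrying content, encoding, and type resolves to calculateMetrics only because that check comes first. This is the ordering dependency that later reviews flag as fragile if task interfaces change.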


155-176: LGTM - robust task routing with inference fallback.

The default handler correctly prioritizes task-based inference over workerData/env configuration, which is essential for bundled environments where Tinypool may reuse child processes across different worker pools. The descriptive error message helps with debugging.
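The precedence can be sketched as a small resolver: task inference first, then workerData, then the REPOMIX_WORKER_TYPE environment value, and finally an error. The inference function is injected, and resolveWorkerType and isWorkerType are hypothetical names.

```typescript
const WORKER_TYPES = ["fileCollect", "fileProcess", "securityCheck", "calculateMetrics", "defaultAction"] as const;
type WorkerType = (typeof WORKER_TYPES)[number];

function isWorkerType(value: unknown): value is WorkerType {
  return typeof value === "string" && (WORKER_TYPES as readonly string[]).includes(value);
}

export function resolveWorkerType(
  task: unknown,
  inferFromTask: (task: unknown) => WorkerType | null,
  workerDataType?: unknown,
  envType?: string,
): WorkerType {
  // 1. Task structure wins: a reused child process may carry another pool's configuration.
  const inferred = inferFromTask(task);
  if (inferred) {
    return inferred;
  }
  // 2. workerData (worker_threads), then 3. REPOMIX_WORKER_TYPE (child_process).
  if (isWorkerType(workerDataType)) {
    return workerDataType;
  }
  if (isWorkerType(envType)) {
    return envType;
  }
  throw new Error("Cannot determine worker type from workerData, env, or task structure");
}
```

Checking the task first matters because, as noted above, Tinypool may reuse a child process whose environment was set for a different pool.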


182-189: LGTM - proper cleanup with cache clearing.

The onWorkerTermination function correctly iterates through cached handlers, calls their cleanup functions (awaiting async cleanups), and clears the cache to ensure fresh state on next use.
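Here is a minimal sketch of the termination flow. The cache is passed in as a parameter here, whereas the real module keeps it at module level. Cleanups run sequentially, which is one reasonable reading of "awaiting async cleanups"; running them with Promise.all would also fit the description.

```typescript
type Cleanup = () => void | Promise<void>;

interface CachedHandler {
  handler: (task: unknown) => unknown;
  cleanup?: Cleanup;
}

export async function onWorkerTermination(cache: Map<string, CachedHandler>): Promise<void> {
  for (const { cleanup } of cache.values()) {
    if (cleanup) {
      // await works for both sync (void) and async (Promise<void>) cleanups.
      await cleanup();
    }
  }
  // Drop every cached handler so the next task triggers a fresh import.
  cache.clear();
}
```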

website/server/src/index.ts (1)

26-72: LGTM - clean server/worker dual-mode pattern.

Wrapping the entire server initialization in the !isTinypoolWorker() guard is an effective pattern for files that serve as both server entry points and Tinypool worker modules in bundled environments.
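The sketch below shows one way such a guard might look. The review does not show how isTinypoolWorker() is implemented, so this detection is a hypothetical heuristic. It treats the module as a worker if it is not on the main thread (worker_threads runtime) or if it has an IPC channel, which Node sets up for children created with child_process.fork(). Whether Tinypool's child_process runtime uses fork() is an assumption here.

```typescript
import { isMainThread } from "node:worker_threads";

interface RuntimeContext {
  isMainThread: boolean;
  hasIpcChannel: boolean;
}

// Hypothetical heuristic, not the project's isTinypoolWorker() implementation.
export function isWorkerContext(ctx: RuntimeContext): boolean {
  // worker_threads runtime: the module is evaluated off the main thread.
  // child_process runtime: assumed to be a fork()ed child with an IPC channel.
  return !ctx.isMainThread || ctx.hasIpcChannel;
}

export function isTinypoolWorker(): boolean {
  return isWorkerContext({
    isMainThread,
    hasIpcChannel: typeof process.send === "function",
  });
}

// Only the server process starts listening; worker copies just expose the handler.
export function startUnlessWorker(startServer: () => void): void {
  if (!isTinypoolWorker()) {
    startServer();
  }
}
```

This heuristic has a caveat: a server launched by a supervisor that itself uses fork() would also have process.send defined and would never start. A dedicated environment flag may be more robust.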

This allows CI to pass by using the PR branch which has the new exports.
Should be changed back to #main after merging.

claude bot commented Dec 31, 2025

Code Review Summary

This PR introduces a unified worker entry point that enables full bundling support for the website server, targeting Cloud Run cold start optimization. The implementation is well-structured and maintains backward compatibility.

Overall Assessment: ✅ Approve

The PR demonstrates solid engineering with careful attention to edge cases and backward compatibility. The architecture cleanly separates bundled vs. non-bundled environments through environment variable detection.


Strengths

  1. Clean architecture: The unified worker dispatcher pattern with task-type inference is elegant and handles the Tinypool child process reuse scenario well
  2. Backward compatibility: Normal CLI usage is completely unaffected - new code paths only activate with REPOMIX_WORKER_PATH
  3. Good test coverage: Comprehensive tests for the new unifiedWorker.ts covering all task inference paths
  4. Type safety: Proper TypeScript types for WorkerType union and consistent async signatures

Potential Issues & Recommendations

1. Website server package.json - Temporary branch reference

File: website/server/package.json:19

The repomix dependency currently points to the PR branch. This is explicitly noted in the commit message as temporary. Reminder to change back to main after merging.

2. Task inference order sensitivity

File: src/shared/unifiedWorker.ts:81-118

The inferWorkerTypeFromTask function relies on checking specific field combinations in a particular order. The ordering is correct (e.g., calculateMetrics before securityCheck since both have content), but this coupling to task structure could be fragile if task interfaces change.

Recommendation: Consider adding a comment documenting why the order matters, or add a discriminator field to tasks for more robust routing in future iterations.

3. Handler cache lifetime

File: src/shared/unifiedWorker.ts:23

The handlerCache is a module-level Map that persists across the worker lifetime. This is intentional for performance. The cache is properly cleared in onWorkerTermination. This is fine as-is.


Premortem Analysis

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Task structure changes break inference | Low | High | Document field dependencies; consider adding tests that fail if task interfaces change |
| WASM path resolution in bundled env | Medium | Medium | setWasmBasePath API and REPOMIX_WASM_DIR env var provide flexibility |
| Child process reuse causing handler mismatch | Low | High | Already mitigated by task inference taking priority over env variable |
| Memory leak from handler cache | Very Low | Low | Cache is bounded by the WorkerType union (5 entries max) |

Minor Suggestions

  1. Optional: In defaultActionWorker.ts:59, the fallback cliOptions ?? {} is good defensive coding. Consider adding a debug log when this fallback is used to help diagnose issues in bundled environments.

  2. Documentation: The PR description is excellent. Consider adding a section to CONTRIBUTING.md or README about the bundled environment setup for future reference.


Test Verification

The test changes appropriately update from workerPath to workerType in mock expectations. The new unifiedWorker.test.ts provides good coverage of:

  • Task inference for all 5 worker types
  • Ping task handling
  • Error cases (null, non-object, unrecognizable tasks)
  • Worker termination cleanup

LGTM 👍 Ready to merge after ensuring the website server package.json is updated post-merge.


🤖 Generated with Claude Code

Build and link local repomix package before running website server lint.
This ensures the latest exports are available during type checking.

claude bot commented Dec 31, 2025

Code Review: PR #1056 - feat(worker): Add unified worker entry point for bundling support

Summary

This PR introduces a unified worker entry point to enable full bundling support for the website server, which should improve Cloud Run cold start times. The implementation is well-structured and maintains backward compatibility.

Overall Assessment: ✅ Approve with Minor Suggestions

The PR is well-designed with clear separation of concerns. The approach of inferring worker types from task structure is clever for handling Tinypool's process reuse in bundled environments.


✅ Strengths
  1. Backward Compatibility: Normal CLI usage is completely unaffected - the new unified worker path only activates when REPOMIX_WORKER_PATH is set.

  2. Clean Architecture: The transition from workerPath to workerType is a good abstraction that simplifies the API and centralizes path resolution logic.

  3. Handler Caching: The handlerCache in unifiedWorker.ts prevents redundant dynamic imports, which is important for performance.

  4. Comprehensive Testing: The new unifiedWorker.test.ts covers all worker type inference scenarios and cleanup behavior.

  5. Defensive Coding: Good validation in defaultActionWorker.ts with safeCliOptions fallback for bundled environments.

  6. Consistent Async Signatures: Unifying all onWorkerTermination exports to async (): Promise<void> improves consistency.

⚠️ Minor Concerns & Suggestions

1. Task Inference Order Sensitivity (Low Risk)

src/shared/unifiedWorker.ts:81-118

The task inference logic depends on checking fields in a specific order. While the current ordering appears correct, a task with overlapping fields could theoretically be misrouted.

Suggestion: Consider adding a discriminant field (e.g., taskType: 'fileCollect') to task objects for more explicit routing, though the current inference approach is acceptable for now.
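Here is a sketch of the discriminant idea, using three of the five worker types for brevity. The taskType field is the suggestion above; every other payload field is hypothetical. With a discriminated union, the compiler narrows each branch and the order of checks no longer matters.

```typescript
// Hypothetical payloads: only the `taskType` discriminant comes from the review suggestion.
type UnifiedTask =
  | { taskType: "fileCollect"; filePath: string }
  | { taskType: "securityCheck"; content: string; type: string }
  | { taskType: "calculateMetrics"; content: string; encoding: string };

export function describeRoute(task: UnifiedTask): string {
  switch (task.taskType) {
    case "fileCollect":
      // Narrowed: filePath is only available on this variant.
      return `fileCollect:${task.filePath}`;
    case "securityCheck":
      return `securityCheck:${task.type}`;
    case "calculateMetrics":
      return `calculateMetrics:${task.encoding}`;
    default: {
      // Adding a variant without a case turns this into a compile error.
      const unreachable: never = task;
      throw new Error(`Unhandled task: ${JSON.stringify(unreachable)}`);
    }
  }
}
```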

2. CI Workflow Uses npm link

.github/workflows/ci.yml:89-98

The CI now builds and links the local repomix package before running website server lint. This adds build time to CI runs. While this is necessary for type checking the new exports, consider whether it could be optimized in the future.

3. Environment Variable Naming

The environment variables (REPOMIX_WORKER_PATH, REPOMIX_WASM_DIR, REPOMIX_WORKER_TYPE) are well-named but undocumented. Consider adding documentation for users who want to use bundled environments.

🔍 Potential Failure Scenarios (Premortem Analysis)

| Scenario | Risk Level | Mitigation |
|---|---|---|
| Task structure changes break inference | Medium | Tests cover current task structures; changes would fail tests |
| Handler cache memory leak | Low | Cache is cleared in onWorkerTermination; size is bounded by 5 worker types |
| WASM path misconfiguration in bundled env | Low | fs.access check provides clear error message with path |
| Circular import with WorkerType | Low | Already resolved by re-exporting from processConcurrency.ts |
| Tinypool process reuse edge cases | Medium | Task inference prioritized over env/workerData provides resilience |

📝 Code Quality Notes
  • Code follows Airbnb style guide ✅
  • No security vulnerabilities detected ✅
  • Proper error handling with descriptive messages ✅
  • The biome-ignore comment for any types is appropriate given the varying worker signatures ✅
  • Good use of TypeScript discriminated unions and type guards ✅

Questions for Author

  1. Post-merge action: The commit message mentions "Should be changed back to #main after merging" for the website server dependency. Is there a follow-up task tracked for this?

  2. Performance testing: Have you measured the actual cold start improvement in Cloud Run with the bundled server?


🤖 Generated with Claude Code

- Update Dockerfile for bundled production mode with esbuild
- Copy WASM files and tinypool for runtime
- Set environment variables for bundled worker path
- Update compose.yml to disable bundled mode for local development

claude bot commented Jan 1, 2026

PR Review: feat(worker): Add unified worker entry point for bundling support

This PR introduces a unified worker entry point system to enable full bundling support for the website server, primarily aimed at improving Cloud Run cold start times. The implementation is well-designed with good backward compatibility considerations.

Overall Assessment

Recommendation: Approve with minor suggestions

The PR is well-structured and the implementation shows careful thought about the challenges of bundling workers. The approach of using task-structure inference to handle Tinypool's child process reuse is clever and practical.


Code Quality & Best Practices


Strengths:

  • Clean separation between bundled and non-bundled environments via environment variables
  • Good use of caching in loadWorkerHandler() to avoid repeated dynamic imports
  • Comprehensive test coverage for the new unifiedWorker.ts
  • Proper async signatures for all onWorkerTermination hooks for consistency

Minor observations:

  • The inferWorkerTypeFromTask() function is well-documented with clear ordering of checks (e.g., calculateMetrics before securityCheck)
  • Type safety is maintained through proper TypeScript types

Potential Issues & Edge Cases

  1. Task Inference Order Sensitivity: The inferWorkerTypeFromTask() function in src/shared/unifiedWorker.ts:81-119 relies on a specific order of checks. If a task happens to have properties matching multiple worker types, the first match wins. This is currently handled correctly, but future task structures should be mindful of this ordering.

  2. WASM Path Validation: In loadLanguage.ts:56-60, the WASM path is validated with fs.access(), which is good: if REPOMIX_WASM_DIR is set but the directory doesn't exist, users get a clear error message.

  3. CI Workflow: The change to use npm link repomix in CI (ci.yml:89-98) is a practical solution for type checking the new exports before merging. As the commit message notes, the website server's repomix dependency should be changed back to #main after merging.


Performance Considerations

Details

Positive:

  • Handler caching in unifiedWorker.ts prevents repeated dynamic imports
  • The unified worker approach should reduce cold start times in Cloud Run by eliminating multiple file resolution costs
  • Task inference is lightweight (object property checks)

Note:

  • The inferWorkerTypeFromTask() function is called for every task in bundled environments. The current implementation is efficient with simple property checks.

Security Considerations


Reviewed areas:

  • Environment variables (REPOMIX_WORKER_PATH, REPOMIX_WASM_DIR, REPOMIX_WORKER_TYPE) are used appropriately for configuration
  • The defaultActionWorker.ts now validates task structure before processing (lines 47-56), which is a good defensive practice
  • No security vulnerabilities identified

Test Coverage


The test coverage is good:

  • tests/shared/unifiedWorker.test.ts: Comprehensive tests for task inference logic
  • Tests cover all 5 worker types plus error cases (null, non-object, unrecognizable)
  • onWorkerTermination cleanup is tested

Consider adding:

  • Tests for the getWorkerTypeFromWorkerData() function with array format (Tinypool child_process mode)
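Such tests need a function to target. Here is a hypothetical sketch of what getWorkerTypeFromWorkerData() might do, assuming workerData arrives either as a plain object or, in child_process mode, as an array in which one element carries workerType. The actual array layout Tinypool uses is not shown in this thread.

```typescript
const WORKER_TYPES = ["fileCollect", "fileProcess", "securityCheck", "calculateMetrics", "defaultAction"] as const;
type WorkerType = (typeof WORKER_TYPES)[number];

function isWorkerType(value: unknown): value is WorkerType {
  return typeof value === "string" && (WORKER_TYPES as readonly string[]).includes(value);
}

export function getWorkerTypeFromWorkerData(workerData: unknown): WorkerType | undefined {
  // Accept either a bare object or an array (assumed child_process layout).
  const candidates: unknown[] = Array.isArray(workerData) ? workerData : [workerData];
  for (const item of candidates) {
    if (typeof item === "object" && item !== null && "workerType" in item) {
      const value = (item as { workerType: unknown }).workerType;
      // Ignore values that are not a known worker type.
      if (isWorkerType(value)) {
        return value;
      }
    }
  }
  return undefined;
}
```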

Premortem Analysis: Potential Failure Scenarios

| Scenario | Risk | Mitigation |
|---|---|---|
| WASM files missing in bundled environment | Medium | Clear error message in loadLanguage.ts with path info |
| Task structure changes in future | Low | Task inference order is documented; tests would catch regressions |
| Environment variable not propagated to workers | Medium | Already handled with both workerData and env passing in processConcurrency.ts |
| Memory leak from handler cache | Low | Cache is cleared in onWorkerTermination() |
| CI breaks after PR merge (npm link issue) | Low | Commit message notes to revert to #main after merge |

Suggested Improvements (Non-blocking)

  1. Consider adding a reset function for setWasmBasePath() for testing purposes (e.g., resetWasmBasePath()), though the current approach works fine.

  2. Document the environment variables in README or CLAUDE.md for future reference:

    • REPOMIX_WORKER_PATH: Path to unified worker for bundled environments
    • REPOMIX_WASM_DIR: Custom WASM directory for bundled environments
    • REPOMIX_WORKER_TYPE: Internal worker type for child_process mode

Summary

This is a solid PR that enables important bundling functionality for Cloud Run deployment optimization. The implementation:

  • Maintains backward compatibility for CLI users
  • Handles the complexity of Tinypool's child process reuse elegantly
  • Includes comprehensive tests
  • Is well-documented in code comments

Changes look good to merge! 🚀

Use specific commit hash in package-lock.json for reproducibility.

claude bot commented Jan 1, 2026

Code Review: feat(worker): Add unified worker entry point for bundling support

This is a well-architected PR that enables full bundling support for the website server, which should significantly improve Cloud Run cold start times. The implementation is clean and maintains backward compatibility.

Overall Assessment: ✅ Approved with minor suggestions


Strengths

  1. Clean Architecture: The unified worker pattern with task-based type inference is elegant and solves the child process reuse problem in bundled environments
  2. Backward Compatibility: The implementation only activates bundled paths when REPOMIX_WORKER_PATH is set, ensuring normal CLI usage is unaffected
  3. Comprehensive Testing: Good test coverage for the new unified worker with various task inference scenarios
  4. Proper Cleanup: The onWorkerTermination pattern is consistently applied across all workers with async signatures

Code Quality Review


unifiedWorker.ts (src/shared/unifiedWorker.ts)

  • Handler caching: Good use of Map for caching loaded handlers by worker type
  • Task inference ordering: The comment on line 108 correctly notes that calculateMetrics must be checked before securityCheck due to overlapping fields

processConcurrency.ts (src/shared/processConcurrency.ts)

  • Clean separation: getWorkerPath() function cleanly abstracts the path resolution logic
  • Environment variable passing: Properly passes REPOMIX_WORKER_TYPE in env for child_process runtime

loadLanguage.ts (src/core/treeSitter/loadLanguage.ts)

  • Configurable WASM paths: Good support for both programmatic (setWasmBasePath) and environment variable (REPOMIX_WASM_DIR) configuration

website/server/src/index.ts

  • Dual-mode entry point: Clever use of isTinypoolWorker() to skip server initialization when running as a worker

Minor Suggestions

  1. Type safety in inferWorkerTypeFromTask: Consider adding TypeScript narrowing guards to avoid repeated type assertions

  2. Error message consistency: In unifiedWorker.ts:170, the error message could include the actual task structure for debugging:

    throw new Error(`Cannot determine worker type from workerData, env, or task structure: ${JSON.stringify(task)}`);
  3. Documentation for environment variables: Consider documenting the new environment variables (REPOMIX_WORKER_PATH, REPOMIX_WASM_DIR, REPOMIX_WORKER_TYPE) in the README or a configuration guide


Premortem Analysis

Potential Failure Scenarios & Mitigations

| Scenario | Risk | Mitigation |
|---|---|---|
| Task structure changes | If task structures change in the future, inferWorkerTypeFromTask may fail to route correctly | The explicit field checks provide clear points to update; consider adding a __workerType field to tasks for explicit routing |
| WASM files missing in bundled env | If WASM files aren't properly copied during build | Dockerfile correctly copies WASM files; runtime check in getWasmPath throws descriptive error |
| Circular imports | unifiedWorker imports all workers, which might cause issues | Dynamic imports mitigate this risk; modules are only loaded when needed |
| Memory leaks in handler cache | Handlers cached indefinitely | onWorkerTermination properly clears the cache |
| Tinypool version incompatibility | Different Tinypool versions may handle workerData differently | The getWorkerTypeFromWorkerData handles both array and object formats |

Security Considerations

  • ✅ No new user input vectors introduced
  • ✅ Environment variables used for configuration only, not for code execution paths
  • ✅ WASM file paths are validated before loading

Test Coverage

  • ✅ Task inference tests cover all worker types
  • ✅ Null/invalid task handling tested
  • ✅ Worker termination cleanup tested
  • ⚠️ Consider adding integration tests for the bundled environment flow

Checklist Verification

  • Tests pass (1059 tests mentioned in PR)
  • Lint passes
  • Backward compatible

Note: Remember to change the repomix dependency in website/server/package.json back to #main after merging (as noted in commit message).


🤖 Review by Claude

yamadashy merged commit 1972fb1 into main on Jan 2, 2026 (54 checks passed)
yamadashy deleted the feat/unified-worker-bundling branch on January 2, 2026 13:22
