Skip to content

feat(core): Implement worker memory management with Tinypool teardown and reorganize memory testing tools#817

Merged
yamadashy merged 14 commits intomainfrom
feat/memory-check
Sep 1, 2025
Merged

feat(core): Implement worker memory management with Tinypool teardown and reorganize memory testing tools#817
yamadashy merged 14 commits intomainfrom
feat/memory-check

Conversation

@yamadashy
Copy link
Owner

@yamadashy yamadashy commented Aug 31, 2025

Implement comprehensive worker memory management system with proper resource cleanup and reorganize memory testing infrastructure for better maintainability and debugging capabilities.

Summary of Changes

🔧 Worker Memory Management System

  • Tinypool Teardown Integration: Leveraged Tinypool's teardown functionality for unified worker termination handling
  • Universal Cleanup Functions: Added onWorkerTermination export to all 8 worker files for consistent resource cleanup
  • Required Runtime Parameter: Made runtime parameter mandatory in WorkerOptions to enforce explicit memory management decisions
  • Strategic Runtime Selection: Optimized worker runtime assignments based on memory risk assessment

🧹 Resource Cleanup Improvements

  • Tree-sitter WASM Cleanup: Proper disposal of Tree-sitter parsers and WASM resources in fileProcessWorker
  • Token Counter Management: Systematic cleanup of tiktoken counters in all metrics workers
  • Language Parser Disposal: Enhanced LanguageParser.dispose() with debug logging for parser cleanup
  • Singleton Cleanup: Added cleanupLanguageParser() function for proper singleton resource management

📁 Memory Testing Infrastructure Reorganization

  • Directory Restructure: Moved memory testing tools from benchmarks/memory/ to scripts/memory/
  • Enhanced Visualization: Added ASCII chart visualization with asciichart dependency for real-time memory monitoring
  • Improved Test Output: Better formatted test results with aligned columns and progress indicators
  • CI/CD Updates: Updated GitHub Actions workflows to reflect new directory structure

⚡ Worker Runtime Optimization

// Strategic runtime assignments based on memory characteristics
- fileCollectWorker: 'worker_threads'     // Low memory risk, high frequency
- fileProcessWorker: 'child_process'      // Tree-sitter WASM, high leak risk  
- globbyWorker: 'child_process'           // Memory isolation for glob ops
- securityCheckWorker: 'child_process'    // Security isolation required
- All metrics workers: 'child_process'    // Token counter cleanup needed

🎯 Problem Solved

  • Memory Leaks: Prevents WASM resource leaks during long-running processes
  • Resource Cleanup: Ensures proper cleanup on worker termination
  • Developer Experience: Better organized and enhanced memory testing tools
  • Process Stability: Improved reliability through appropriate runtime isolation

🔍 Technical Implementation

  • 8 Workers Enhanced: All worker files now export onWorkerTermination functions
  • Cleanup Strategies: Different cleanup approaches based on worker resource usage
  • Debug Logging: Added comprehensive logging for cleanup process visibility
  • Test Enhancement: Memory test script with graph visualization and leak detection

✅ Validation

  • Memory leak prevention verified through enhanced testing tools
  • Worker cleanup functionality tested across all worker types
  • CI/CD pipeline updated and verified with new directory structure
  • Runtime configurations validated for optimal memory/performance balance

Checklist

  • Run npm run test
  • Run npm run lint
  • Verified all worker cleanup functions
  • Tested memory visualization tools
  • Updated GitHub Actions workflows
  • Validated runtime configurations

Copilot AI review requested due to automatic review settings August 31, 2025 10:57
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 31, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds ASCII graphing to memory benchmark via new --graph flag. Refactors worker runtime selection across multiple modules, making runtime a required option and configuring Tinypool teardown via exported onWorkerTermination hooks. Introduces explicit cleanup for Tree-sitter parser. Updates various task runners to use worker_threads or child_process as specified.

Changes

Cohort / File(s) Summary
Memory benchmark: CLI, graphs, config
benchmarks/memory/.gitignore, benchmarks/memory/package.json, benchmarks/memory/src/memory-test.ts, benchmarks/memory/src/types.ts
Adds asciichart dependency and --graph flag; renders ASCII memory graphs; improves log formatting; updates TestConfig.options by removing style and introducing compress; gitignore tweak.
Concurrency API and runtime selection
src/shared/processConcurrency.ts
WorkerOptions.runtime made required; default destructuring retained; Tinypool initialized with teardown: 'onWorkerTermination'.
Task runner runtime at call sites
src/core/file/fileCollect.ts, src/core/file/fileProcess.ts, src/core/file/globbyExecute.ts, src/core/metrics/calculateGitDiffMetrics.ts, src/core/metrics/calculateGitLogMetrics.ts, src/core/metrics/calculateOutputMetrics.ts, src/core/metrics/calculateSelectiveFileMetrics.ts, src/core/security/securityCheck.ts
Adds explicit runtime selection per task runner: mostly worker_threads (low leak risk) and child_process for file processing (high memory/leak risk note).
Worker teardown hooks
src/core/file/workers/fileCollectWorker.ts, src/core/file/workers/fileProcessWorker.ts, src/core/file/workers/globbyWorker.ts, src/core/metrics/workers/fileMetricsWorker.ts, src/core/metrics/workers/gitDiffMetricsWorker.ts, src/core/metrics/workers/gitLogMetricsWorker.ts, src/core/metrics/workers/outputMetricsWorker.ts, src/core/security/workers/securityCheckWorker.ts
Exposes exported onWorkerTermination hooks; replaces process exit listeners in metrics workers with explicit teardown; fileProcessWorker calls cleanupLanguageParser.
Tree-sitter lifecycle
src/core/treeSitter/languageParser.ts, src/core/treeSitter/parseFile.ts
Adds lang to LanguageResources and logs dispose; exports cleanupLanguageParser() to dispose singleton and log deletion.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as User/CLI
  participant Mem as memory-test.ts
  participant Sampler as Memory Sampler
  participant Chart as AsciiChart

  CLI->>Mem: run memory test [--graph]
  Mem->>Sampler: start sampling heapUsed/rss
  loop iterations
    Sampler-->>Mem: memory snapshot
    Mem->>Mem: analyzeMemoryTrends()
    alt --graph enabled (during run)
      Mem->>Chart: displayMemoryGraphs(last 40 points)
    end
  end
  Mem->>Sampler: stop sampling
  alt --graph enabled (final)
    Mem->>Chart: display "Complete Memory Usage Timeline"
    Mem->>Chart: displayMemoryGraphs(full history)
  end
  Mem-->>CLI: exit
Loading
sequenceDiagram
  autonumber
  participant Caller as Module (e.g., fileProcess.ts)
  participant PC as initTaskRunner (processConcurrency)
  participant Pool as Tinypool
  participant Worker as Worker Module
  participant Hook as onWorkerTermination()

  Caller->>PC: initTaskRunner({workerPath, numOfTasks, runtime})
  PC->>Pool: new Tinypool({filename: workerPath, teardown: 'onWorkerTermination'})
  Caller->>Pool: run tasks
  Pool->>Worker: execute task
  Note over Worker: Normal processing
  Caller->>Pool: destroy
  Pool->>Worker: invoke teardown hook
  Worker->>Hook: onWorkerTermination()
  Note over Worker: e.g., freeTokenCounters(), cleanupLanguageParser()
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/memory-check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@claude
Copy link
Contributor

claude bot commented Aug 31, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @yamadashy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

このプルリクエストは、Repomixのワーカー処理におけるメモリ使用量と処理速度のバランスを最適化することを目的としています。具体的には、タスクの性質に応じて適切なワーカーランタイムを選択し、リソースの確実なクリーンアップメカニズムを導入することで、システムの安定性と効率性を大幅に向上させました。また、メモリベンチマークツールも強化され、より詳細な分析が可能になっています。

Highlights

  • Worker Pool設定の最適化: Repomixのworker処理において、メモリリークリスクに基づいて適切なランタイム(高リスクなTree-sitter処理にはchild_process、低リスクなmetrics計算などにはworker_threads)を明示的に選択するように変更しました。これにより、WorkerOptionsruntimeパラメータが必須となりました。
  • メモリクリーンアップの改善: Tinypoolのteardown機能を活用し、すべてのworkerファイルでonWorkerTermination関数をエクスポートすることで、統一的なworker終了処理を実装しました。これにより、Tree-sitterのWASMリソースやトークンカウンターが適切に解放され、デバッグログも追加されました。
  • コードベースの整理とパフォーマンス向上: 重複コードの削除、LanguageParserdeleteAllParsersメソッドをdisposeに統合するなど、コードベースのメンテナンス性を向上させました。これらの変更により、メモリ効率、処理速度、およびシステムの安定性が向上しています。
  • メモリベンチマークツールの強化: メモリベンチマークツールにasciichartを導入し、リアルタイムでASCIIグラフを表示する--graphオプションを追加しました。これにより、メモリ使用量のトレンドを視覚的に確認できるようになり、デバッグと分析が容易になりました。
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes memory usage and processing speed balance in Repomix worker processing by implementing proper cleanup of Tree-sitter WASM resources and token counters, while strategically choosing worker runtimes based on memory leak risk.

  • Worker pool configuration optimized with explicit runtime selection based on memory leak risk
  • Unified memory cleanup implementation using Tinypool's teardown functionality
  • Enhanced memory benchmark testing with ASCII graph visualization capabilities

Reviewed Changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/shared/processConcurrency.ts Made runtime parameter mandatory and added teardown configuration
src/core/treeSitter/parseFile.ts Added cleanup function for language parser singleton
src/core/treeSitter/languageParser.ts Enhanced dispose method with debug logging
src/core/security/workers/securityCheckWorker.ts Added onWorkerTermination export for consistency
src/core/metrics/workers/*.ts Replaced process.on('exit') with onWorkerTermination exports
src/core/file/workers/*.ts Added onWorkerTermination exports and cleanup calls
src/core//calculate.ts Added explicit runtime selection with comments
benchmarks/memory/* Enhanced memory testing with graph visualization and improved configuration
Files not reviewed (1)
  • benchmarks/memory/package-lock.json: Language not supported

public async dispose(): Promise<void> {
for (const resources of this.loadedResources.values()) {
resources.parser.delete();
logger.debug('Deleted parser for language: ', resources.lang);
Copy link

Copilot AI Aug 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The log message should use consistent formatting. Consider using template literals or consistent spacing around the colon.

Suggested change
logger.debug('Deleted parser for language: ', resources.lang);
logger.debug(`Deleted parser for language: ${resources.lang}`);

Copilot uses AI. Check for mistakes.
@yamadashy yamadashy changed the title feat(core): メモリ使用量と速度の良いバランスをとった構成に feat(core): Optimize worker pool configuration for balanced memory usage and performance Aug 31, 2025
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Aug 31, 2025

Deploying repomix with  Cloudflare Pages  Cloudflare Pages

Latest commit: 0ac770d
Status: ✅  Deploy successful!
Preview URL: https://9dfda8be.repomix.pages.dev
Branch Preview URL: https://feat-memory-check.repomix.pages.dev

View logs

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

このプルリクエストは、worker処理におけるメモリ使用量と処理速度のバランスを最適化するための重要なリファクタリングですね。child_processworker_threadsをリスクに応じて使い分ける方針や、tinypoolteardown機能を利用した統一的なクリーンアップ機構の導入は、メモリリークの防止とパフォーマンス向上に大きく貢献する素晴らしい改善です。コード全体で一貫したクリーンアップパターン (onWorkerTermination) が適用されており、堅牢性が向上していると感じます。
いくつか軽微な改善点として、冗長なクリーンアップ処理の削除や、可読性向上のためのマジックナンバーの定数化についてコメントしました。全体として、アプリケーションの安定性とパフォーマンスを向上させる質の高い変更だと思います。

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/core/security/workers/securityCheckWorker.ts (1)

51-56: Fix incorrect file extension detection (breaks Makefile, Dockerfile, etc.)

filePath.split('.') yields “Makefile” instead of an empty extension. Use path.extname().slice(1).

Apply within this range:

-      ext: filePath.split('.').pop() || '',
+      // Use reliable extension extraction
+      ext: path.extname(filePath).slice(1),

Add at top of the file (outside this range):

import path from 'node:path';
src/core/treeSitter/languageParser.ts (1)

98-105: Missing Query.delete() can leak WASM memory; also fix debug log formatting.

Delete the Parser.Query objects during dispose and format the log as a single string.

   public async dispose(): Promise<void> {
     for (const resources of this.loadedResources.values()) {
-      resources.parser.delete();
-      logger.debug('Deleted parser for language: ', resources.lang);
+      // Free query first, then parser to release WASM memory fully
+      try { resources.query.delete(); } catch {}
+      try { resources.parser.delete(); } catch {}
+      logger.debug(`Deleted parser for language: ${resources.lang}`);
     }
     this.loadedResources.clear();
     this.initialized = false;
   }
benchmarks/memory/package.json (1)

8-15: Replace all node --run usages or bump Node engines to ≥22.0.0
Either replace every node --run <script> with npm run <script> in all package.json scripts (root and benchmarks), or update each "engines".node to ">=22.0.0" so that the --run flag is supported.

benchmarks/memory/src/memory-test.ts (1)

60-62: Help text/examples don’t match actual defaults (code uses 100 basic iterations).

Docs say 50 (and example comments say 50) but code defaults to 100 for basic runs.

Align help and examples:

-  iterations     Number of test iterations (default: 50 for basic, 200 for --full)
+  iterations     Number of test iterations (default: 100 for basic, 200 for --full)
...
-  node memory-test.ts                    # Quick CI test (50 iterations)
+  node memory-test.ts                    # Quick CI test (100 iterations)

Alternatively, if 50 is intended, change the code default at Line 31:

-const iterations = Number(numericArgs[0]) || (flags.full ? 200 : 100);
+const iterations = Number(numericArgs[0]) || (flags.full ? 200 : 50);

Also applies to: 71-75

🧹 Nitpick comments (10)
src/shared/processConcurrency.ts (1)

34-36: Remove defaulting runtime to 'child_process' to honor “required” contract

WorkerOptions.runtime is now required; defaulting to 'child_process' weakens that guarantee and diverges from Tinypool’s default of 'worker_threads'. Prefer no default and fail fast if omitted.

-  const { numOfTasks, workerPath, runtime = 'child_process' } = options;
+  const { numOfTasks, workerPath, runtime } = options;
+  // (Optional) Runtime guard for JS callsites:
+  if (!runtime) throw new Error('WorkerOptions.runtime is required');

Reference: Tinypool’s default runtime is worker_threads. (npmjs.com, bestofjs.org)

src/core/metrics/workers/gitLogMetricsWorker.ts (1)

34-36: Avoid duplicate cleanup; rely on teardown for performance.

You free token counters in finally and again in onWorkerTermination(). Keeping only the teardown avoids repeated alloc/free across tasks on a hot worker.

   } catch (error) {
     logger.error('Error calculating git log token count:', error);
     return 0;
-  } finally {
-    // Clean up token counters to free memory
-    freeTokenCounters();
-  }
+  }

Also applies to: 39-42

src/core/file/workers/fileProcessWorker.ts (1)

24-27: Harden teardown and make return type explicit.

Add an explicit return type and guard against unexpected dispose errors so teardown never throws.

-export const onWorkerTermination = () => {
-  cleanupLanguageParser();
-};
+export const onWorkerTermination = (): void => {
+  try {
+    cleanupLanguageParser();
+  } catch {
+    // best-effort cleanup; ignore to avoid teardown failures
+  }
+};
src/core/metrics/calculateOutputMetrics.ts (1)

22-24: Runtime specified: consistent with metrics workers.

Looks good. Consider resourceLimits in Worker options if initTaskRunner plumbs them through (heap caps for pathological inputs).

-    runtime: 'worker_threads',
+    runtime: 'worker_threads',
+    // Optional: pass through if supported by initTaskRunner/WorkerOptions
+    // resourceLimits: { maxOldGenerationSizeMb: 512 },
benchmarks/memory/package.json (1)

17-18: Move @types dependency to devDependencies.

@types/asciichart is compile-time only and shouldn’t ship in production deps.

Apply:

   "dependencies": {
-    "@types/asciichart": "^1.5.8",
     "asciichart": "^1.5.25",
     "repomix": "file:../.."
   },
   "devDependencies": {
     "@types/node": "^22.16.5",
+    "@types/asciichart": "^1.5.8",
     "typescript": "^5.3.0"
   },
benchmarks/memory/src/memory-test.ts (5)

36-36: Double-check GC cadence change (20 in basic mode).

Forcing GC more frequently in basic mode can skew measurements and hide leaks. Consider aligning or reducing forced GC in basic runs.

Suggested tweak:

-const FORCE_GC_INTERVAL = flags.full ? 50 : 20;
+const FORCE_GC_INTERVAL = flags.full ? 50 : 50;

98-99: Gate the “Forced garbage collection” log to reduce noise.

It prints every GC interval and can drown signal.

-    global.gc();
-    console.log('🗑️  Forced garbage collection');
+    global.gc();
+    if (flags.full) console.log('🗑️  Forced garbage collection');

142-168: Consider a single combined graph with colors for easier comparison.

Two separate charts make visual correlation harder; asciichart supports multi-series.

-  console.log('\n📈 Memory Usage Graphs:');
-
-  console.log('\n🔸 Heap Usage (MB):');
-  console.log(
-    asciichart.plot(heapData, {
-      height: 8,
-      format: (x: number) => x.toFixed(1),
-    }),
-  );
-
-  console.log('\n🔹 RSS Usage (MB):');
-  console.log(
-    asciichart.plot(rssData, {
-      height: 8,
-      format: (x: number) => x.toFixed(1),
-    }),
-  );
+  console.log('\n📈 Memory Usage (Heap vs RSS, MB):');
+  const series = [heapData, rssData];
+  const all = heapData.concat(rssData);
+  console.log(
+    asciichart.plot(series, {
+      height: 8,
+      colors: [asciichart.yellow, asciichart.blue],
+      ymin: Math.min(...all),
+      ymax: Math.max(...all),
+      format: (x: number) => x.toFixed(1),
+    }),
+  );
+  console.log('Legend: 🔸 Heap=yellow, 🔹 RSS=blue');

169-182: Avoid divide-by-zero in trend percentages.

If initial averages were ever zero, growth prints Infinity.

-  const heapGrowth = ((avgRecentHeap - avgInitialHeap) / avgInitialHeap) * 100;
-  const rssGrowth = ((avgRecentRSS - avgInitialRSS) / avgInitialRSS) * 100;
+  const heapGrowth = avgInitialHeap > 0 ? ((avgRecentHeap - avgInitialHeap) / avgInitialHeap) * 100 : 0;
+  const rssGrowth  = avgInitialRSS  > 0 ? ((avgRecentRSS  - avgInitialRSS)  / avgInitialRSS)  * 100 : 0;

12-12: Use default import for asciichart
esModuleInterop is enabled in benchmarks/memory/tsconfig.json, so a default import is safe.

-import * as asciichart from 'asciichart';
+import asciichart from 'asciichart';
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8f7a7d9 and 8df7d8a.

⛔ Files ignored due to path filters (1)
  • benchmarks/memory/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (23)
  • benchmarks/memory/.gitignore (2 hunks)
  • benchmarks/memory/package.json (1 hunks)
  • benchmarks/memory/src/memory-test.ts (11 hunks)
  • benchmarks/memory/src/types.ts (0 hunks)
  • src/core/file/fileCollect.ts (1 hunks)
  • src/core/file/fileProcess.ts (1 hunks)
  • src/core/file/globbyExecute.ts (1 hunks)
  • src/core/file/workers/fileCollectWorker.ts (1 hunks)
  • src/core/file/workers/fileProcessWorker.ts (2 hunks)
  • src/core/file/workers/globbyWorker.ts (1 hunks)
  • src/core/metrics/calculateGitDiffMetrics.ts (1 hunks)
  • src/core/metrics/calculateGitLogMetrics.ts (1 hunks)
  • src/core/metrics/calculateOutputMetrics.ts (1 hunks)
  • src/core/metrics/calculateSelectiveFileMetrics.ts (1 hunks)
  • src/core/metrics/workers/fileMetricsWorker.ts (1 hunks)
  • src/core/metrics/workers/gitDiffMetricsWorker.ts (1 hunks)
  • src/core/metrics/workers/gitLogMetricsWorker.ts (1 hunks)
  • src/core/metrics/workers/outputMetricsWorker.ts (1 hunks)
  • src/core/security/securityCheck.ts (1 hunks)
  • src/core/security/workers/securityCheckWorker.ts (1 hunks)
  • src/core/treeSitter/languageParser.ts (3 hunks)
  • src/core/treeSitter/parseFile.ts (1 hunks)
  • src/shared/processConcurrency.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • benchmarks/memory/src/types.ts
🧰 Additional context used
🧬 Code graph analysis (9)
src/core/security/workers/securityCheckWorker.ts (7)
src/core/file/workers/fileCollectWorker.ts (1)
  • onWorkerTermination (54-56)
src/core/file/workers/fileProcessWorker.ts (1)
  • onWorkerTermination (25-27)
src/core/file/workers/globbyWorker.ts (1)
  • onWorkerTermination (13-15)
src/core/metrics/workers/fileMetricsWorker.ts (1)
  • onWorkerTermination (41-43)
src/core/metrics/workers/gitDiffMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/gitLogMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/outputMetricsWorker.ts (1)
  • onWorkerTermination (29-31)
src/core/file/workers/fileCollectWorker.ts (7)
src/core/file/workers/fileProcessWorker.ts (1)
  • onWorkerTermination (25-27)
src/core/file/workers/globbyWorker.ts (1)
  • onWorkerTermination (13-15)
src/core/metrics/workers/fileMetricsWorker.ts (1)
  • onWorkerTermination (41-43)
src/core/metrics/workers/gitDiffMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/gitLogMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/outputMetricsWorker.ts (1)
  • onWorkerTermination (29-31)
src/core/security/workers/securityCheckWorker.ts (1)
  • onWorkerTermination (88-90)
src/core/file/workers/fileProcessWorker.ts (1)
src/core/treeSitter/parseFile.ts (1)
  • cleanupLanguageParser (93-98)
src/core/file/workers/globbyWorker.ts (7)
src/core/file/workers/fileCollectWorker.ts (1)
  • onWorkerTermination (54-56)
src/core/file/workers/fileProcessWorker.ts (1)
  • onWorkerTermination (25-27)
src/core/metrics/workers/fileMetricsWorker.ts (1)
  • onWorkerTermination (41-43)
src/core/metrics/workers/gitDiffMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/gitLogMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/outputMetricsWorker.ts (1)
  • onWorkerTermination (29-31)
src/core/security/workers/securityCheckWorker.ts (1)
  • onWorkerTermination (88-90)
src/core/treeSitter/languageParser.ts (1)
src/core/treeSitter/lang2Query.ts (1)
  • SupportedLang (35-35)
src/core/metrics/workers/outputMetricsWorker.ts (8)
src/core/file/workers/fileCollectWorker.ts (1)
  • onWorkerTermination (54-56)
src/core/file/workers/fileProcessWorker.ts (1)
  • onWorkerTermination (25-27)
src/core/file/workers/globbyWorker.ts (1)
  • onWorkerTermination (13-15)
src/core/metrics/workers/fileMetricsWorker.ts (1)
  • onWorkerTermination (41-43)
src/core/metrics/workers/gitDiffMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/gitLogMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/security/workers/securityCheckWorker.ts (1)
  • onWorkerTermination (88-90)
src/core/metrics/tokenCounterFactory.ts (1)
  • freeTokenCounters (25-31)
src/core/metrics/workers/fileMetricsWorker.ts (8)
src/core/file/workers/fileCollectWorker.ts (1)
  • onWorkerTermination (54-56)
src/core/file/workers/fileProcessWorker.ts (1)
  • onWorkerTermination (25-27)
src/core/file/workers/globbyWorker.ts (1)
  • onWorkerTermination (13-15)
src/core/metrics/workers/gitDiffMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/gitLogMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/outputMetricsWorker.ts (1)
  • onWorkerTermination (29-31)
src/core/security/workers/securityCheckWorker.ts (1)
  • onWorkerTermination (88-90)
src/core/metrics/tokenCounterFactory.ts (1)
  • freeTokenCounters (25-31)
src/core/metrics/workers/gitDiffMetricsWorker.ts (8)
src/core/file/workers/fileCollectWorker.ts (1)
  • onWorkerTermination (54-56)
src/core/file/workers/fileProcessWorker.ts (1)
  • onWorkerTermination (25-27)
src/core/file/workers/globbyWorker.ts (1)
  • onWorkerTermination (13-15)
src/core/metrics/workers/fileMetricsWorker.ts (1)
  • onWorkerTermination (41-43)
src/core/metrics/workers/gitLogMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/outputMetricsWorker.ts (1)
  • onWorkerTermination (29-31)
src/core/security/workers/securityCheckWorker.ts (1)
  • onWorkerTermination (88-90)
src/core/metrics/tokenCounterFactory.ts (1)
  • freeTokenCounters (25-31)
src/core/metrics/workers/gitLogMetricsWorker.ts (4)
src/core/metrics/workers/fileMetricsWorker.ts (1)
  • onWorkerTermination (41-43)
src/core/metrics/workers/gitDiffMetricsWorker.ts (1)
  • onWorkerTermination (40-42)
src/core/metrics/workers/outputMetricsWorker.ts (1)
  • onWorkerTermination (29-31)
src/core/metrics/tokenCounterFactory.ts (1)
  • freeTokenCounters (25-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and run (windows-latest, 24.x)
  • GitHub Check: Build and run (windows-latest, 21.x)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (21)
src/core/file/workers/globbyWorker.ts (1)

12-15: Teardown hook export: LGTM

Named export matches the pool’s teardown contract; no cleanup needed here.

src/core/file/fileCollect.ts (1)

29-31: Runtime choice comment aligned with intent

Keeping file collection on worker_threads for low leak risk is consistent with the new policy.

src/core/security/workers/securityCheckWorker.ts (1)

87-90: Teardown hook export: LGTM

No-op is fine; this worker doesn’t hold long-lived resources.

src/core/file/workers/fileCollectWorker.ts (1)

53-56: Teardown hook export: LGTM

Consistent with the pool’s teardown callback; no resources to release here.

src/shared/processConcurrency.ts (1)

50-51: Teardown wiring verified across all workers
All worker modules export onWorkerTermination, matching the teardown: 'onWorkerTermination' setting in processConcurrency.ts. No missing exports detected.

src/core/metrics/workers/fileMetricsWorker.ts (1)

40-43: All worker files export onWorkerTermination and Tinypool is wired to call it—no further action.

src/core/metrics/workers/outputMetricsWorker.ts (1)

28-31: Teardown hook added correctly.

Consistent with other metrics workers; no issues.

src/core/security/securityCheck.ts (1)

61-63: Explicit runtime selection is appropriate here.

runtime: 'worker_threads' for low leak risk work matches the PR’s runtime strategy.

src/core/treeSitter/languageParser.ts (1)

12-16: Adding lang into LanguageResources is a good improvement.

Helps observability and structured logging.

Also applies to: 34-39

src/core/metrics/workers/gitDiffMetricsWorker.ts (1)

39-42: Approved: teardown hook correctly wired
The teardown: 'onWorkerTermination' option in createWorkerPool ensures the exported onWorkerTermination is invoked during pool teardown, freeing token counters as intended.

src/core/file/globbyExecute.ts (1)

19-21: Correct runtime choice for low-risk task; cleanup in finally is solid.

Using worker_threads here is appropriate; paired with taskRunner.cleanup() ensures no leaks.

src/core/metrics/calculateSelectiveFileMetrics.ts (1)

29-31: Pool sizing logic confirmed
maxThreads = Math.min(getProcessConcurrency(), Math.ceil(numOfTasks / TASKS_PER_THREAD)) with a floor of 1 guarantees the pool never exceeds available cores and scales one thread per 100 tasks.

src/core/file/fileProcess.ts (1)

27-29: Verify child_process concurrency cap
I didn’t see any poolSize, maxWorkers, or similar limit in src/shared/processConcurrency.ts for the Tinypool instance—please confirm you’re passing a sane maxThreads (or equivalent) option to bound concurrent child processes on large repos.

src/core/metrics/calculateGitLogMetrics.ts (1)

34-36: Runtime explicitly set to worker_threads — consistent with metrics path.

Matches the low-risk classification and teardown model.

src/core/file/workers/fileProcessWorker.ts (1)

3-3: Import looks correct and aligned with ESM build paths (.js).

benchmarks/memory/.gitignore (2)

10-10: Good ignore for test artifact.

Adding test-output.txt prevents accidental commits of local runs.


24-24: Minor formatting fix looks good.

Thumbs.db entry with a trailing newline is fine.

benchmarks/memory/src/memory-test.ts (4)

117-129: Aligned logging looks good.

The fixed-width fields make trend scanning easier.


193-195: Nice touch showing graphs during trend analysis.

Keeps long runs informative without waiting for the end.


305-310: Good conditional final graph.

Printing the complete timeline only when requested avoids noisy CI logs.


355-361: Features summary reads well.

Graph Display is correctly listed only when enabled.

@codecov
Copy link

codecov bot commented Aug 31, 2025

Codecov Report

❌ Patch coverage is 30.00000% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.36%. Comparing base (a693ee7) to head (0ac770d).
⚠️ Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
src/core/treeSitter/parseFile.ts 8.33% 11 Missing ⚠️
src/core/metrics/workers/gitLogMetricsWorker.ts 0.00% 3 Missing ⚠️
src/core/file/workers/fileProcessWorker.ts 33.33% 2 Missing ⚠️
src/core/file/workers/globbyWorker.ts 0.00% 2 Missing ⚠️
src/core/metrics/workers/gitDiffMetricsWorker.ts 0.00% 2 Missing ⚠️
src/core/file/globbyExecute.ts 0.00% 1 Missing ⚠️
src/core/file/workers/fileCollectWorker.ts 50.00% 1 Missing ⚠️
src/core/metrics/calculateGitDiffMetrics.ts 0.00% 1 Missing ⚠️
src/core/metrics/calculateGitLogMetrics.ts 0.00% 1 Missing ⚠️
src/core/metrics/workers/fileMetricsWorker.ts 50.00% 1 Missing ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #817      +/-   ##
==========================================
- Coverage   87.59%   87.36%   -0.24%     
==========================================
  Files         113      113              
  Lines        6604     6626      +22     
  Branches     1372     1372              
==========================================
+ Hits         5785     5789       +4     
- Misses        819      837      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Add asciichart dependency for real-time memory visualization
- Implement displayMemoryGraphs() function with heap/RSS graphs
- Add --graph/-g flag to enable graph display
- Improve output alignment with fixed-width formatting
- Show final complete timeline after test completion
- Temporarily disable force GC for cleaner graph visualization
Add comprehensive worker termination cleanup to prevent memory leaks from WASM resources and token counters in worker threads.

- Add deleteAllParsers method to LanguageParser for WASM resource cleanup
- Create shared onWorkerTermination utility for consistent cleanup patterns
- Refactor all workers to use centralized cleanup approach
- Add cleanupLanguageParser function for Tree-sitter resource management
- Handle both parentPort close and process exit events for robust cleanup

This addresses memory leak issues in worker threads by ensuring proper disposal of:
- Tree-sitter WASM parsers and language resources
- Tiktoken encoding resources
- Other worker-specific resources

Also fixes linting issues in memory benchmark tool by removing unnecessary type assertions.
…lity

Replace custom worker event listeners with Tinypool's built-in teardown mechanism for cleaner and more reliable resource cleanup.

- Export onWorkerTermination function from all 8 worker files for Tinypool teardown
- Configure teardown: 'onWorkerTermination' in createWorkerPool
- Remove shared/workerCleanup.ts utility (no longer needed)
- Maintain same cleanup behavior: Tree-sitter WASM resources and token counters
- Add onWorkerTermination to workers that don't need cleanup for consistency

This leverages Tinypool's native worker lifecycle management instead of manual event handling, improving reliability and reducing complexity.

Workers updated:
- fileProcessWorker.ts: Tree-sitter parser cleanup
- fileMetricsWorker.ts: Token counter cleanup
- gitDiffMetricsWorker.ts: Token counter cleanup
- gitLogMetricsWorker.ts: Token counter cleanup
- outputMetricsWorker.ts: Token counter cleanup
- fileCollectWorker.ts: No cleanup needed
- globbyWorker.ts: No cleanup needed
- securityCheckWorker.ts: No cleanup needed
Add debug logging to track when Tree-sitter parsers are being deleted during worker cleanup for better observability of memory management.

- Add logger import to languageParser.ts
- Include language name in LanguageResources interface for logging
- Log individual parser deletions in deleteAllParsers method
- Log language parser singleton cleanup in cleanupLanguageParser function

This helps with debugging memory cleanup issues and verifying that WASM resources are being properly disposed of during worker termination.
Refine worker pool initialization to better manage memory leak risks and consolidate cleanup methods.

- Make runtime parameter required in WorkerOptions for explicit memory management decisions
- Configure runtime based on memory leak risk assessment:
  - worker_threads: Low risk workers (metrics, security checks, file collection)
  - child_process: High risk workers (file processing with Tree-sitter, globby operations)
- Consolidate deleteAllParsers into existing dispose method in LanguageParser
- Update cleanupLanguageParser to use dispose method for consistency
- Add runtime comments to clarify memory leak risk levels

This provides explicit control over worker isolation strategies based on memory characteristics of each operation, improving overall memory stability.
…lection

Adjust worker runtime configuration to use child_process for all potentially risky operations, prioritizing stability and isolation over performance.

- Change token-related workers to child_process for better memory isolation:
  - calculateGitDiffMetrics: child_process (was worker_threads)
  - calculateGitLogMetrics: child_process (was worker_threads)
  - calculateOutputMetrics: child_process (was worker_threads)
  - calculateSelectiveFileMetrics: child_process (was worker_threads)
- Keep file collection and globby operations as worker_threads (lower risk)
- Remove redundant memory leak risk comments for cleaner code
- Fix test cases to include required runtime parameter and teardown property
- Reorder imports in languageParser.ts for consistency

This conservative approach ensures maximum stability by isolating all token counting operations in separate processes, preventing potential memory leaks from affecting the main process.
Fix critical issues identified in code review:

- Make cleanupLanguageParser async and properly handle dispose()
  - Add try-catch-finally block with error handling
  - Set languageParserSingleton to null after disposal to prevent memory retention
  - Update onWorkerTermination in fileProcessWorker to await async cleanup

- Improve debug logging consistency
  - Use template literal formatting in languageParser debug messages

- Remove redundant cleanup in gitLogMetricsWorker
  - Remove finally block that duplicated onWorkerTermination cleanup
  - Consolidate resource cleanup to single termination handler for better performance

These changes ensure proper async handling, prevent memory retention, and eliminate redundant cleanup calls as suggested by code review bots.
github-actions bot and others added 4 commits September 1, 2025 13:40
Reverted worker runtime configuration from worker_threads back to child_process
across all worker implementations (fileCollect, globbyExecute, securityCheck)
to maintain consistency and avoid potential memory management issues.
Relocated memory testing tools from benchmarks/memory/ to scripts/memory/
to better align with project structure conventions. Updated GitHub Actions
workflow paths accordingly to maintain CI functionality.
Changed fileCollect worker runtime from child_process back to worker_threads
for better performance and memory efficiency in file collection operations.
@yamadashy yamadashy changed the title feat(core): Optimize worker pool configuration for balanced memory usage and performance feat(core): Implement worker memory management with Tinypool teardown and reorganize memory testing tools Sep 1, 2025
@yamadashy
Copy link
Owner Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a robust worker memory management system by leveraging Tinypool's teardown functionality and implementing universal cleanup functions across all workers. The reorganization of memory testing tools, including the addition of ASCII chart visualization, is a great improvement for debugging and maintainability. The strategic selection of worker runtimes based on memory risk is a thoughtful optimization.

My review includes a couple of suggestions to improve the clarity of the new memory test script and to remove a small redundancy in the worker pool creation logic. Overall, these are excellent changes that enhance the stability and developer experience of the project.

…script

- Extract graph display constants (MIN_POINTS_FOR_GRAPH=5, GRAPH_DATA_POINTS=40, GRAPH_HEIGHT=8)
- Make displayMemoryGraphs() more flexible by accepting history data as parameter
- Improve code maintainability and readability as suggested in PR review

Addresses PR review feedback from gemini-code-assist bot.
…ault

- Remove --graph/-g flag as graphs are now always enabled
- Update help text and feature display accordingly
- Simplify displayMemoryGraphs() condition check
- Graphs will now always show when sufficient data points are available

Makes memory visualization always available for better debugging experience.
@yamadashy yamadashy merged commit 37c62a2 into main Sep 1, 2025
96 of 105 checks passed
@yamadashy yamadashy deleted the feat/memory-check branch September 1, 2025 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants