perf(core): Reduce output token counting IPC overhead#1438

Closed
yamadashy wants to merge 1 commit into main from perf/output-token-ipc-optimization

Conversation

@yamadashy (Owner) commented Apr 9, 2026

Summary

  • Replace hardcoded TARGET_CHARS_PER_CHUNK=200K with CPU-core-based chunking via getProcessConcurrency()
  • Skip expensive process.memoryUsage() calls when log level is below DEBUG

Cherry-picked from 5de897c (PR #1428)
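The chunking change described above can be sketched in isolation. This is an illustrative stand-in, not the project's code: getProcessConcurrency() here is modeled on Node's CPU count, and splitIntoChunks is a hypothetical helper name.

```typescript
import * as os from 'node:os';

// Assumed stand-in for the project's getProcessConcurrency() helper.
const getProcessConcurrency = (): number => os.cpus().length;

// CPU-core-based chunking: one chunk per available core instead of a
// fixed 200K-character target.
const splitIntoChunks = (content: string): string[] => {
  const numChunks = Math.max(1, getProcessConcurrency());
  const chunkSize = Math.ceil(content.length / numChunks);
  const chunks: string[] = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    chunks.push(content.slice(i, i + chunkSize));
  }
  return chunks;
};
```

Because chunkSize is rounded up, the loop produces at most numChunks slices and the slices always reassemble to the original string.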

Test plan

  • All tests passing
  • Build clean


Commit message:

Replace hardcoded TARGET_CHARS_PER_CHUNK=200K with CPU-core-based chunking via getProcessConcurrency(). Skip expensive process.memoryUsage() calls when log level is below DEBUG.

Cherry-picked from 5de897c (PR #1428)

Co-Authored-By: Claude <noreply@anthropic.com>
coderabbitai bot (Contributor) commented Apr 9, 2026

📝 Walkthrough

The changes introduce dynamic parallel chunking based on process concurrency in token counting metrics, replace fixed-size chunk calculation with CPU-aware sizing, and add log-level gating to reduce overhead in memory utilities when detailed logging is disabled.

Changes

  • Dynamic Parallel Chunking (src/core/metrics/calculateOutputMetrics.ts): Replaced the fixed target size (200,000 characters) with a dynamic chunk calculation derived from getProcessConcurrency(). The chunk count is max(1, getProcessConcurrency()) and the chunk size is ceil(content.length / numChunks), preserving the existing parallel processing flow.
  • Memory Logging Optimization (src/shared/memoryUtils.ts): Added log-level gating to logMemoryUsage and withMemoryLogging that early-exits unless the log level is at least DEBUG, avoiding unnecessary memory capture and logging overhead when detailed logging is disabled.
  • Test Updates (tests/core/metrics/calculateOutputMetrics.test.ts): Updated test assertions to derive the expected chunk count from getProcessConcurrency() instead of hardcoded values, relaxed chunk size assertions to check for a roughly equal distribution rather than exact per-chunk sizes, and tightened the parallel execution verification.
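The log-level gating summarized above can be sketched as follows. The enum values and function body here are illustrative assumptions, not the project's actual memoryUtils API; the function returns a boolean only so the gating is observable.

```typescript
// Illustrative log levels; higher values mean more verbose output.
enum LogLevel { Error = 0, Warn = 1, Info = 2, Debug = 3 }

let currentLogLevel: LogLevel = LogLevel.Info;

// Returns true only when it actually captured and logged memory stats.
function logMemoryUsage(label: string): boolean {
  // Early exit before the process.memoryUsage() syscall when DEBUG is off.
  if (currentLogLevel < LogLevel.Debug) return false;
  const { heapUsed, rss } = process.memoryUsage();
  console.debug(`${label}: heapUsed=${heapUsed}B rss=${rss}B`);
  return true;
}
```

At the default INFO level the function returns before touching process.memoryUsage(), so the syscall and string-formatting cost disappear entirely.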

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


🚥 Pre-merge checks | ✅ 3 passed
  • Title check: ✅ Passed. The PR title 'perf(core): Reduce output token counting IPC overhead' directly and accurately summarizes the main performance objective: replacing hardcoded chunking with CPU-aware concurrency and skipping expensive memory operations.
  • Description check: ✅ Passed. The description includes a clear summary of the two main changes and a test plan with verification checkmarks, and follows the repository's template structure with summary and test plan sections.
  • Docstring coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.


@cloudflare-workers-and-pages

Deploying repomix with Cloudflare Pages

Latest commit: 7400698
Status: ✅  Deploy successful!
Preview URL: https://74ed3363.repomix.pages.dev
Branch Preview URL: https://perf-output-token-ipc-optimi.repomix.pages.dev


gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request optimizes parallel metric calculation by scaling the number of chunks with available CPU cores and introduces log-level checks to memory utility functions to reduce overhead. Review feedback suggests refining the chunking logic to maintain a minimum chunk size of 200KB, preventing excessive IPC overhead on high-core systems, and updating the associated tests to reflect this change.

Comment on lines +26 to +27:

```typescript
const numChunks = Math.max(1, getProcessConcurrency());
const chunkSize = Math.ceil(content.length / numChunks);
```


Severity: medium

The current implementation may lead to excessive IPC overhead on systems with many CPU cores. For example, on a 64-core machine, a 1.1MB file would be split into 64 small chunks (~17KB each), resulting in 64 postMessage calls.

To minimize IPC overhead while still saturating available cores, it is better to ensure chunks don't fall below a reasonable size. Using the previously established 200KB "sweet spot" as a minimum chunk size (by capping the number of chunks) ensures that parallelization benefits aren't negated by message serialization costs.

Suggested change:

```diff
-const numChunks = Math.max(1, getProcessConcurrency());
-const chunkSize = Math.ceil(content.length / numChunks);
+const numChunks = Math.min(getProcessConcurrency(), Math.ceil(content.length / 200_000));
+const chunkSize = Math.ceil(content.length / numChunks);
```
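The arithmetic behind this cap can be checked in isolation. A standalone sketch, not the project's code; cappedChunkCount is a hypothetical helper name:

```typescript
const TARGET_CHARS_PER_CHUNK = 200_000;

// Chunk count capped by both the core count and the 200K minimum
// chunk size, never dropping below one chunk.
const cappedChunkCount = (contentLength: number, concurrency: number): number =>
  Math.max(1, Math.min(concurrency, Math.ceil(contentLength / TARGET_CHARS_PER_CHUNK)));
```

On a 64-core machine a 1.1MB input yields 6 chunks of roughly 183KB rather than 64 chunks of roughly 17KB, while an 8-core machine packing 2MB still uses all 8 cores.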

Comment on lines +121 to +122:

```typescript
const expectedChunks = getProcessConcurrency();
expect(chunksProcessed).toBe(expectedChunks); // Should match number of CPU cores
```


Severity: medium

If the chunking logic in calculateOutputMetrics.ts is updated to cap the number of chunks based on a minimum size, this test expectation should be updated accordingly to reflect the actual number of chunks produced.

Suggested change:

```diff
-const expectedChunks = getProcessConcurrency();
-expect(chunksProcessed).toBe(expectedChunks); // Should match number of CPU cores
+const expectedChunks = Math.min(getProcessConcurrency(), Math.ceil(content.length / 200_000));
+expect(chunksProcessed).toBe(expectedChunks);
```


Comment on the following lines:

```typescript
// With TARGET_CHARS_PER_CHUNK=200_000, 1.1M character content should produce 6 chunks
// Check that chunks are roughly equal in size
const expectedChunks = getProcessConcurrency();
```


Severity: medium

This expectation should also be updated to match the improved chunking logic that accounts for a minimum chunk size.

Suggested change:

```diff
-const expectedChunks = getProcessConcurrency();
+const expectedChunks = Math.min(getProcessConcurrency(), Math.ceil(content.length / 200_000));
```

coderabbitai bot (Contributor) left a comment


🧹 Nitpick comments (1)
tests/core/metrics/calculateOutputMetrics.test.ts (1)

10-10: Stabilize these tests by mocking process concurrency.

These assertions currently depend on host CPU concurrency, which can vary across CI environments. Consider pinning getProcessConcurrency() to a fixed test value for deterministic behavior.

Proposed refactor:

```diff
-import { getProcessConcurrency, type WorkerOptions } from '../../../src/shared/processConcurrency.js';
+import * as processConcurrency from '../../../src/shared/processConcurrency.js';
+import type { WorkerOptions } from '../../../src/shared/processConcurrency.js';

 vi.mock('../../../src/shared/logger');
+const MOCK_PROCESS_CONCURRENCY = 4;
+vi.spyOn(processConcurrency, 'getProcessConcurrency').mockReturnValue(MOCK_PROCESS_CONCURRENCY);

 ...
-    const expectedChunks = getProcessConcurrency();
+    const expectedChunks = MOCK_PROCESS_CONCURRENCY;
 ...
-    const expectedChunks = getProcessConcurrency();
+    const expectedChunks = MOCK_PROCESS_CONCURRENCY;
```

Also applies to: 121-124, 177-184
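Outside Vitest, the same determinism can be sketched with a hand-rolled swap-and-restore around a mutable indirection. Names here are illustrative; this mirrors what vi.spyOn(...).mockReturnValue(4) achieves, including restoring the original to avoid cross-test pollution.

```typescript
// Pretend this is the real, host-dependent concurrency source.
let getProcessConcurrency: () => number = () => 8;

// Install a fixed value, run the test body, then restore the original
// implementation even if the body throws.
function withMockedConcurrency<T>(value: number, body: () => T): T {
  const original = getProcessConcurrency;
  getProcessConcurrency = () => value;
  try {
    return body();
  } finally {
    getProcessConcurrency = original;
  }
}
```

With this seam, an expected chunk count derived inside withMockedConcurrency(4, ...) is stable on any CI host, whether it has 1 core or 64.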


ℹ️ Review info
⚙️ Run configuration
  • Configuration used: .coderabbit.yaml
  • Review profile: CHILL
  • Plan: Pro
  • Run ID: f27f6d0d-c3aa-45bb-9796-7699cebf2a4f

📥 Commits

Reviewing files that changed from the base of the PR and between eafa70a and 7400698.

📒 Files selected for processing (3)
  • src/core/metrics/calculateOutputMetrics.ts
  • src/shared/memoryUtils.ts
  • tests/core/metrics/calculateOutputMetrics.test.ts

devin-ai-integration bot (Contributor) left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


github-actions bot (Contributor) commented Apr 9, 2026

⚡ Performance Benchmark

Latest commit: 7400698 perf(core): Reduce output token counting IPC overhead
Status: ✅ Benchmark complete!
Ubuntu: 1.40s (±0.02s) → 1.40s (±0.03s) · +0.00s (+0.0%)
macOS: 1.17s (±0.24s) → 1.34s (±0.25s) · +0.17s (+14.7%)
Windows: 1.91s (±0.40s) → 1.93s (±0.14s) · +0.02s (+1.1%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs (30 on macOS), median ± IQR
  • Workflow run

codecov bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.74%. Comparing base (eafa70a) to head (7400698).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #1438      +/-   ##
==========================================
- Coverage   87.32%   86.74%   -0.58%
==========================================
  Files         117      117
  Lines        4426     4430       +4
  Branches     1022     1024       +2
==========================================
- Hits         3865     3843      -22
- Misses        561      587      +26
```

☔ View full report in Codecov by Sentry.

claude bot (Contributor) commented Apr 9, 2026

Code Review — perf(core): Reduce output token counting IPC overhead

Good optimization direction overall. The memory logging guard is clean and correct. The chunking strategy change has a design issue worth discussing before merge.

High-core regression inverts the stated goal

The PR aims to reduce IPC overhead by creating fewer chunks, but on high-core machines (32–64+ cores, common in CI/cloud), the new approach creates more chunks than the old 200K strategy for content near the 1MB threshold:

Content size | Old (200K chunks) | New (64-core machine)
2MB          | 10 chunks         | 64 chunks
4MB          | 20 chunks         | 64 chunks
8MB          | 40 chunks         | 64 chunks
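The rows above follow directly from the two formulas. A quick standalone check, assuming a 64-core host for the new strategy:

```typescript
// Old strategy: fixed 200K-character target per chunk.
const oldChunks = (size: number): number => Math.ceil(size / 200_000);

// New strategy: one chunk per core, regardless of content size.
const newChunks = (cores: number): number => Math.max(1, cores);
```

For any content at or above the 1MB parallelization threshold on a 64-core host, newChunks stays pinned at 64 while oldChunks grows with size, so the crossover where the new strategy produces more chunks sits at 12.8MB.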

On a typical 8-core laptop, 2MB → 8 chunks vs old 10 — slight improvement. But the regression on high-core environments is significant and contradicts the PR title.

Suggested fix: Use core count as a cap rather than a mandate, preserving the empirically-benchmarked 200K minimum chunk size:

```typescript
const TARGET_CHARS_PER_CHUNK = 200_000;
const numChunks = Math.min(
  Math.max(1, getProcessConcurrency()),
  Math.ceil(content.length / TARGET_CHARS_PER_CHUNK)
);
```

This gives fewer IPC round-trips on typical hardware while preventing over-chunking on high-core machines.

Tests depend on runtime CPU count

getProcessConcurrency() is called live (not mocked) in tests, making assertions non-deterministic across environments. On a 1-core CI container, expectedChunks = 1 collapses the parallel test into a degenerate single-chunk case that doesn't exercise chunking logic at all. CodeRabbit flagged this too — mocking to a fixed value (e.g., 4) would make tests reliable.

Minor observations
  • logMemoryDifference has no guard: safe today because it's only called from within the guarded withMemoryLogging, but inconsistent with its sibling functions. Consider adding a guard for consistency.
  • Chunk equality assertion is very loose: expect(maxDiff).toBeLessThan(Math.ceil(content.length / expectedChunks)) will always pass by construction. A tighter bound such as toBeLessThanOrEqual(1) would be more meaningful.
  • Memory logging guard is correct: process.memoryUsage() is a syscall, so skipping it at INFO level is a clean win.
  • Pre-existing: BPE tokens spanning chunk boundaries. Splitting a string and tokenizing the chunks independently can inflate the total token count, since BPE tokens can span chunk boundaries. Not introduced by this PR, but worth noting as a known limitation.
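The "always passes" point about the loose assertion can be verified directly: with ceil-based splitting, every chunk except possibly the last has exactly chunkSize characters, so the max/min length difference is always strictly below chunkSize. A standalone sketch (split is a hypothetical helper mirroring the chunking logic):

```typescript
const split = (content: string, numChunks: number): string[] => {
  const chunkSize = Math.ceil(content.length / numChunks);
  const chunks: string[] = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    chunks.push(content.slice(i, i + chunkSize));
  }
  return chunks;
};

const lengths = split('x'.repeat(10), 4).map((c) => c.length); // 3, 3, 3, 1
const maxDiff = Math.max(...lengths) - Math.min(...lengths);   // 2
// maxDiff (2) < chunkSize (3) holds by construction: only the final
// chunk can be short, and never by chunkSize or more. The loose
// assertion therefore cannot fail and does not test equal distribution.
```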

🤖 Generated with Claude Code

@yamadashy yamadashy closed this Apr 11, 2026