perf(core): Reduce output token counting IPC overhead#1438
Conversation
📝 Walkthrough

The changes introduce dynamic parallel chunking based on process concurrency in token counting metrics, replace fixed-size chunk calculation with CPU-aware sizing, and add log-level gating to reduce overhead in memory utilities when detailed logging is disabled.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Deploying repomix with Cloudflare Pages

| | |
|---|---|
| Latest commit | 7400698 |
| Status | ✅ Deploy successful! |
| Preview URL | https://74ed3363.repomix.pages.dev |
| Branch Preview URL | https://perf-output-token-ipc-optimi.repomix.pages.dev |
Code Review
This pull request optimizes parallel metric calculation by scaling the number of chunks with available CPU cores and introduces log-level checks to memory utility functions to reduce overhead. Review feedback suggests refining the chunking logic to maintain a minimum chunk size of 200KB, preventing excessive IPC overhead on high-core systems, and updating the associated tests to reflect this change.
```ts
const numChunks = Math.max(1, getProcessConcurrency());
const chunkSize = Math.ceil(content.length / numChunks);
```
The current implementation may lead to excessive IPC overhead on systems with many CPU cores. For example, on a 64-core machine, a 1.1MB file would be split into 64 small chunks (~17KB each), resulting in 64 postMessage calls.
To minimize IPC overhead while still saturating available cores, it is better to ensure chunks don't fall below a reasonable size. Using the previously established 200KB "sweet spot" as a minimum chunk size (by capping the number of chunks) ensures that parallelization benefits aren't negated by message serialization costs.
Suggested change:

```diff
-const numChunks = Math.max(1, getProcessConcurrency());
-const chunkSize = Math.ceil(content.length / numChunks);
+const numChunks = Math.min(getProcessConcurrency(), Math.ceil(content.length / 200_000));
+const chunkSize = Math.ceil(content.length / numChunks);
```
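To illustrate the suggested cap, here is a minimal, self-contained sketch of the proposed formula (`cores` standing in for `getProcessConcurrency()`; the 200_000 constant is the review's "sweet spot"):

```typescript
const TARGET_CHARS_PER_CHUNK = 200_000;

// Proposed capped chunk count: at most one chunk per core, but never so many
// chunks that each falls below ~200K characters.
function cappedNumChunks(contentLength: number, cores: number): number {
  return Math.max(1, Math.min(cores, Math.ceil(contentLength / TARGET_CHARS_PER_CHUNK)));
}

console.log(cappedNumChunks(1_100_000, 64)); // 6 chunks (~183K each), not 64 tiny ones
console.log(cappedNumChunks(1_100_000, 8));  // 6 — the cap binds before the core count does
console.log(cappedNumChunks(100_000, 64));   // 1 — small content stays single-chunk
```

This keeps each `postMessage` payload large enough that serialization cost stays small relative to tokenization work.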
```ts
const expectedChunks = getProcessConcurrency();
expect(chunksProcessed).toBe(expectedChunks); // Should match number of CPU cores
```
If the chunking logic in calculateOutputMetrics.ts is updated to cap the number of chunks based on a minimum size, this test expectation should be updated accordingly to reflect the actual number of chunks produced.
Suggested change:

```diff
-const expectedChunks = getProcessConcurrency();
-expect(chunksProcessed).toBe(expectedChunks); // Should match number of CPU cores
+const expectedChunks = Math.min(getProcessConcurrency(), Math.ceil(content.length / 200_000));
+expect(chunksProcessed).toBe(expectedChunks);
```
```ts
// With TARGET_CHARS_PER_CHUNK=200_000, 1.1M character content should produce 6 chunks
// Check that chunks are roughly equal in size
const expectedChunks = getProcessConcurrency();
```
🧹 Nitpick comments (1)

tests/core/metrics/calculateOutputMetrics.test.ts (1)

Line 10: Stabilize these tests by mocking process concurrency. These assertions currently depend on host CPU concurrency, which can vary across CI environments. Consider pinning `getProcessConcurrency()` to a fixed test value for deterministic behavior.

Proposed refactor:

```diff
-import { getProcessConcurrency, type WorkerOptions } from '../../../src/shared/processConcurrency.js';
+import * as processConcurrency from '../../../src/shared/processConcurrency.js';
+import type { WorkerOptions } from '../../../src/shared/processConcurrency.js';

 vi.mock('../../../src/shared/logger');
+const MOCK_PROCESS_CONCURRENCY = 4;
+vi.spyOn(processConcurrency, 'getProcessConcurrency').mockReturnValue(MOCK_PROCESS_CONCURRENCY);
 ...
-const expectedChunks = getProcessConcurrency();
+const expectedChunks = MOCK_PROCESS_CONCURRENCY;
 ...
-const expectedChunks = getProcessConcurrency();
+const expectedChunks = MOCK_PROCESS_CONCURRENCY;
```

Also applies to: 121-124, 177-184
🤖 Prompt for AI Agents

> Verify each finding against the current code and only fix it if needed. In `tests/core/metrics/calculateOutputMetrics.test.ts` at line 10: the tests are unstable because they rely on the real `getProcessConcurrency()` value. Fix by mocking `getProcessConcurrency` from `'../../../src/shared/processConcurrency.js'` to return a fixed deterministic number (e.g. 4) for the duration of these tests (use `jest.spyOn` or your test runner's module-mocking API before importing the code under test), update any places that derive expected values from concurrency, and restore the original implementation after the tests to avoid cross-test pollution.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f27f6d0d-c3aa-45bb-9796-7699cebf2a4f
📒 Files selected for processing (3)

- src/core/metrics/calculateOutputMetrics.ts
- src/shared/memoryUtils.ts
- tests/core/metrics/calculateOutputMetrics.test.ts
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main    #1438    +/-   ##
========================================
- Coverage   87.32%   86.74%   -0.58%
========================================
  Files         117      117
  Lines        4426     4430       +4
  Branches     1022     1024       +2
========================================
- Hits         3865     3843      -22
- Misses        561      587      +26
```

☔ View full report in Codecov by Sentry.
Code Review
| Content size | Old (200K chunks) | New (64-core machine) |
|---|---|---|
| 2MB | 10 chunks | 64 chunks |
| 4MB | 20 chunks | 64 chunks |
| 8MB | 40 chunks | 64 chunks |
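The table's arithmetic can be reproduced with a quick sketch (sizes in characters; the old scheme divides by a fixed 200K, the new one pins chunk count to the core count):

```typescript
// Old sizing: fixed ~200K-character chunks, so chunk count grows with content.
const oldChunks = (len: number) => Math.ceil(len / 200_000);

// New sizing (as merged in this PR): one chunk per available core,
// regardless of content size.
const newChunks = (cores: number) => Math.max(1, cores);

console.log(oldChunks(2_000_000)); // 10
console.log(oldChunks(8_000_000)); // 40
console.log(newChunks(64));        // 64 on a 64-core machine
console.log(newChunks(8));         // 8 on a typical laptop
```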
On a typical 8-core laptop, 2MB → 8 chunks vs old 10 — slight improvement. But the regression on high-core environments is significant and contradicts the PR title.
Suggested fix: Use core count as a cap rather than a mandate, preserving the empirically-benchmarked 200K minimum chunk size:
```ts
const TARGET_CHARS_PER_CHUNK = 200_000;
const numChunks = Math.min(
  Math.max(1, getProcessConcurrency()),
  Math.ceil(content.length / TARGET_CHARS_PER_CHUNK)
);
```

This gives fewer IPC round-trips on typical hardware while preventing over-chunking on high-core machines.
Tests depend on runtime CPU count
getProcessConcurrency() is called live (not mocked) in tests, making assertions non-deterministic across environments. On a 1-core CI container, expectedChunks = 1 collapses the parallel test into a degenerate single-chunk case that doesn't exercise chunking logic at all. CodeRabbit flagged this too — mocking to a fixed value (e.g., 4) would make tests reliable.
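A lightweight alternative to module mocking, sketched here as an illustrative pattern rather than the repo's actual structure, is to make the concurrency source injectable with a default:

```typescript
import { availableParallelism } from "node:os";

// Hypothetical sketch: accept concurrency as an injectable parameter so tests
// can pin it, while production callers fall back to the real OS value.
function computeChunking(
  contentLength: number,
  concurrency: number = availableParallelism(),
): { numChunks: number; chunkSize: number } {
  const numChunks = Math.max(1, concurrency);
  return { numChunks, chunkSize: Math.ceil(contentLength / numChunks) };
}

// Production: uses the real core count. Tests: pass a fixed value.
console.log(computeChunking(1_100_000, 4)); // { numChunks: 4, chunkSize: 275000 }
```

Default-parameter injection keeps test determinism without spying on module internals, at the cost of a slightly wider function signature.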
Minor observations
- `logMemoryDifference` has no guard — safe today because it's only called from within the guarded `withMemoryLogging`, but inconsistent with the sibling functions. Consider adding a guard for consistency.
- Chunk equality assertion is very loose — `expect(maxDiff).toBeLessThan(Math.ceil(content.length / expectedChunks))` will always pass by construction. A tighter bound like `toBeLessThanOrEqual(1)` would be more meaningful.
- Memory logging guard is correct — `process.memoryUsage()` is a syscall; skipping it at INFO level is a clean win.
- Pre-existing: BPE tokens spanning chunk boundaries — splitting a string and tokenizing chunks independently inflates the total token count, since BPE tokens can span boundaries. Not introduced by this PR, but worth noting as a known limitation.
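The gating pattern praised above can be sketched as follows (function and level names are assumptions for illustration, not the actual memoryUtils API):

```typescript
// Minimal sketch of log-level gating: the process.memoryUsage() syscall is
// skipped entirely unless debug-level logging is enabled.
const LOG_LEVELS = { info: 0, debug: 1 } as const;
type LogLevel = keyof typeof LOG_LEVELS;

function memorySnapshot(level: LogLevel): string | undefined {
  if (LOG_LEVELS[level] < LOG_LEVELS.debug) {
    return undefined; // gate: no syscall, no string formatting at info level
  }
  const { heapUsed } = process.memoryUsage();
  return `heapUsed=${(heapUsed / 1024 / 1024).toFixed(1)}MB`;
}

console.log(memorySnapshot("info"));  // undefined — syscall skipped
console.log(memorySnapshot("debug")); // e.g. "heapUsed=12.3MB"
```

The check costs one comparison on the hot path, versus a syscall plus string formatting on every call when unguarded.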
🤖 Generated with Claude Code
Summary
- Scale the number of parallel chunks with `getProcessConcurrency()`
- Skip `process.memoryUsage()` calls when log level is below DEBUG

Cherry-picked from 5de897c (PR #1428)
Test plan