ci(perf-benchmark): Enable GitHub autolink for commit SHAs in benchmark comments #1352
Replace tiktoken (WASM-based) with gpt-tokenizer (pure JS) for token
counting, eliminating ~200ms WASM initialization overhead while keeping
the existing worker pool infrastructure for parallel processing.
Changes:
- Swap tiktoken dependency for gpt-tokenizer in package.json
- Rewrite TokenCounter to use gpt-tokenizer's async dynamic import
with lazy-loaded encoding modules cached at module level
- Add TOKEN_ENCODINGS constant with Zod enum validation in config
schema, replacing unsafe type assertion
- Use { disallowedSpecial: new Set() } to match tiktoken's
encode(content, [], []) behavior (treat all text as plain text)
- Add p50k_edit encoding for backward compatibility
- Update worker to handle async getTokenCounter initialization
- Rewrite tests to use real gpt-tokenizer with exact token counts
The worker pool, parallel chunk processing, and pre-warming
infrastructure are preserved — only the underlying tokenizer changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
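The lazy-load-and-cache flow described in the bullets above can be sketched roughly as follows. This is a sketch, not the actual Repomix code: the `EncodingModule` shape and the `disallowedSpecial` option mirror the commit message, and the loader is injected so the snippet is self-contained (in the real code it would be a dynamic import of the relevant gpt-tokenizer encoding module).

```typescript
// Sketch of lazy, module-level caching of per-encoding token counters.
type CountTokensFn = (text: string) => number;

interface EncodingModule {
  countTokens: (text: string, opts?: { disallowedSpecial?: Set<string> }) => number;
}

// Cached at module level so each encoding is loaded at most once.
const encodingModules = new Map<string, CountTokensFn>();

async function loadEncoding(
  encoding: string,
  loader: (name: string) => Promise<EncodingModule>,
): Promise<CountTokensFn> {
  const cached = encodingModules.get(encoding);
  if (cached) return cached; // already loaded: skip the dynamic import
  const mod = await loader(encoding);
  // Empty disallowedSpecial set: treat all text (e.g. "<|endoftext|>") as
  // plain text, matching tiktoken's encode(content, [], []) behavior.
  const countTokens: CountTokensFn = (text) =>
    mod.countTokens(text, { disallowedSpecial: new Set() });
  encodingModules.set(encoding, countTokens);
  return countTokens;
}
```

A `TokenCounter.init()` as described above would call something like `loadEncoding(this.encoding, …)` once and store the returned function for synchronous `countTokens` calls afterwards.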
📝 Walkthrough

The pull request replaces the tiktoken (WASM) tokenizer with gpt-tokenizer (pure JS) for token counting.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Code Review
This pull request replaces the tiktoken library with gpt-tokenizer, moving token counting to a pure JavaScript implementation. Key modifications include an asynchronous initialization pattern for the TokenCounter class to support dynamic loading of encoding modules, stricter encoding validation in the configuration schema, and refactoring of worker threads and test suites to accommodate the new asynchronous flow. I have no feedback to provide.
Actionable comments posted: 1
🧹 Nitpick comments (5)
src/core/metrics/TokenCounter.ts (1)

19-19: Narrow the cache key type to `TokenEncoding` for stronger invariants. Line 19 currently allows arbitrary string keys; using `Map<TokenEncoding, CountTokensFn>` keeps cache typing aligned with the public encoding contract.

♻️ Suggested diff

```diff
-const encodingModules = new Map<string, CountTokensFn>();
+const encodingModules = new Map<TokenEncoding, CountTokensFn>();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/metrics/TokenCounter.ts` at line 19: the cache key for encodingModules is too wide (string); change its declaration from Map<string, CountTokensFn> to Map<TokenEncoding, CountTokensFn> to enforce the public encoding contract and stronger typing. Update TokenCounter.ts to import or reference the TokenEncoding type and adjust any places that set or get from encodingModules to use TokenEncoding values (or cast explicitly where unavoidable) so the Map keys and CountTokensFn usages remain type-safe.

src/config/configSchema.ts (1)
2-2: Decouple the config schema from the `TokenCounter` runtime module. Importing constants from `src/core/metrics/TokenCounter.ts` (Line 2) couples config parsing to a metrics implementation file. Consider moving `TOKEN_ENCODINGS` into a small shared constants module (e.g., `src/core/metrics/tokenEncodings.ts`) and importing it from both places.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/configSchema.ts` at line 2: move the TOKEN_ENCODINGS constant out of the TokenCounter runtime module into a small shared constants module and update imports so the config schema and TokenCounter both import from that new module: create a new tokenEncodings module that exports TOKEN_ENCODINGS, replace the import in configSchema.ts (currently importing TOKEN_ENCODINGS from TokenCounter) with an import from the new tokenEncodings module, and update TokenCounter to import TOKEN_ENCODINGS from the same new module to remove the runtime coupling.

src/core/metrics/tokenCounterFactory.ts (1)
11-17: Avoid duplicate initialization under concurrent first access. Because the cache is populated only after the `await` (Line 16), parallel calls can create multiple `TokenCounter` instances for the same encoding. Cache before awaiting and roll back on init failure.

♻️ Suggested diff

```diff
 export const getTokenCounter = async (encoding: TokenEncoding): Promise<TokenCounter> => {
   let tokenCounter = tokenCounters.get(encoding);
   if (!tokenCounter) {
     tokenCounter = new TokenCounter(encoding);
-    await tokenCounter.init();
-    tokenCounters.set(encoding, tokenCounter);
+    tokenCounters.set(encoding, tokenCounter);
+    try {
+      await tokenCounter.init();
+    } catch (error) {
+      tokenCounters.delete(encoding);
+      throw error;
+    }
   }
   return tokenCounter;
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/metrics/tokenCounterFactory.ts` around lines 11-17: getTokenCounter currently sets tokenCounters only after awaiting tokenCounter.init, so concurrent callers can create duplicate TokenCounter instances; fix by storing the newly created TokenCounter into tokenCounters immediately after construction (tokenCounters.set(encoding, tokenCounter)), then await tokenCounter.init(), and if init throws remove the entry from tokenCounters (tokenCounters.delete(encoding)) and rethrow the error; keep the early-return path (check tokenCounters.get(encoding)) so callers get the cached instance.

tests/core/metrics/calculateMetrics.test.ts (1)
9-12: Avoid duplicating `TOKEN_ENCODINGS` in the mock. This can drift from the real export and make tests pass with stale values. Prefer importing the actual module and overriding only `TokenCounter`.

♻️ Suggested refactor

```diff
-vi.mock('../../../src/core/metrics/TokenCounter.js', () => {
-  return {
-    TOKEN_ENCODINGS: ['o200k_base', 'cl100k_base', 'p50k_base', 'p50k_edit', 'r50k_base'],
-    TokenCounter: vi.fn().mockImplementation(() => ({
-      countTokens: vi.fn().mockReturnValue(10),
-      free: vi.fn(),
-    })),
-  };
-});
+vi.mock('../../../src/core/metrics/TokenCounter.js', async () => {
+  const actual = await vi.importActual<typeof import('../../../src/core/metrics/TokenCounter.js')>(
+    '../../../src/core/metrics/TokenCounter.js',
+  );
+  return {
+    ...actual,
+    TokenCounter: vi.fn().mockImplementation(() => ({
+      countTokens: vi.fn().mockReturnValue(10),
+      free: vi.fn(),
+    })),
+  };
+});
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/core/metrics/calculateMetrics.test.ts` around lines 9-12: the mock for TokenCounter re-defines TOKEN_ENCODINGS and can drift from the real export; instead, import the real module and replace only TokenCounter in the mock so TOKEN_ENCODINGS stays authoritative. Update the vi.mock for '../../../src/core/metrics/TokenCounter.js' to import the real module (preserving TOKEN_ENCODINGS) and return an object that spreads the real exports but overrides TokenCounter with the vi.fn() implementation, keeping the rest of the module intact.

tests/core/metrics/TokenCounter.test.ts (1)
73-77: This assertion is too weak for the behavior being described. `toBeGreaterThan(0)` will pass for many unrelated cases and doesn't prove the special sequence is treated as plain text.

✅ Stronger assertion

```diff
 test('should handle special token sequences as plain text', () => {
-  // gpt-tokenizer should treat <|endoftext|> as ordinary text, not a control token
-  const count = tokenCounter.countTokens('Hello <|endoftext|> world');
-  expect(count).toBeGreaterThan(0);
+  const withSequence = tokenCounter.countTokens('Hello <|endoftext|> world');
+  const withoutSequence = tokenCounter.countTokens('Hello world');
+  expect(withSequence).toBeGreaterThan(withoutSequence);
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/core/metrics/TokenCounter.test.ts` around lines 73-77: the test "should handle special token sequences as plain text" uses a weak assertion; replace expect(count).toBeGreaterThan(0) with a precise comparison showing the tokenizer treats "<|endoftext|>" as ordinary text: assert that tokenCounter.countTokens('Hello <|endoftext|> world') is greater than tokenCounter.countTokens('Hello world'), so the test explicitly verifies the special sequence contributes tokens as plain text rather than being dropped as a control token.
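As an aside to the `configSchema.ts` comment above, the suggested shared constants module might look roughly like this. This is a sketch: the file name `src/core/metrics/tokenEncodings.ts` comes from the review comment, the encoding list is taken from the mock shown in the `calculateMetrics.test.ts` comment, and the `isTokenEncoding` guard is an added illustration, not part of the PR.

```typescript
// Hypothetical shared module (e.g. src/core/metrics/tokenEncodings.ts) so that
// both configSchema.ts and TokenCounter.ts import from one dependency-free place.
export const TOKEN_ENCODINGS = ['o200k_base', 'cl100k_base', 'p50k_base', 'p50k_edit', 'r50k_base'] as const;
export type TokenEncoding = (typeof TOKEN_ENCODINGS)[number];

// Illustrative type guard for narrowing raw strings (not part of the PR).
export function isTokenEncoding(value: string): value is TokenEncoding {
  return (TOKEN_ENCODINGS as readonly string[]).includes(value);
}
```

`configSchema.ts` could then keep its enum validation built from the same constant (e.g. Zod's `z.enum(TOKEN_ENCODINGS)`, assuming that is how the schema validates encodings) while `TokenCounter.ts` imports the identical list, removing the runtime coupling the comment describes.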
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1aa8931d-07a8-46ea-a2f3-cb5a5377b40c
⛔ Files ignored due to path filters (1)

`package-lock.json` is excluded by `!**/package-lock.json`

📒 Files selected for processing (11)

- package.json
- src/config/configSchema.ts
- src/core/metrics/TokenCounter.ts
- src/core/metrics/calculateOutputMetrics.ts
- src/core/metrics/calculateSelectiveFileMetrics.ts
- src/core/metrics/tokenCounterFactory.ts
- src/core/metrics/workers/calculateMetricsWorker.ts
- tests/core/metrics/TokenCounter.test.ts
- tests/core/metrics/calculateMetrics.test.ts
- tests/core/metrics/diffTokenCount.test.ts
- tests/core/packager.test.ts
```ts
test('should return 0 for errors when not initialized', () => {
  const uninitCounter = new TokenCounter('o200k_base');
  // Not calling init() - should throw
  expect(() => uninitCounter.countTokens('test')).toThrow('TokenCounter not initialized');
});
```
Test name contradicts the assertion.
At Line 79 the title says “return 0”, but the test asserts a throw. Rename for accuracy.
✏️ Suggested rename

```diff
-test('should return 0 for errors when not initialized', () => {
+test('should throw when counting tokens before initialization', () => {
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-test('should return 0 for errors when not initialized', () => {
+test('should throw when counting tokens before initialization', () => {
   const uninitCounter = new TokenCounter('o200k_base');
   // Not calling init() - should throw
   expect(() => uninitCounter.countTokens('test')).toThrow('TokenCounter not initialized');
 });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/core/metrics/TokenCounter.test.ts` around lines 79 - 83, The test title
is misleading: change the description string in the test containing new
TokenCounter('o200k_base') and the expect(() =>
uninitCounter.countTokens('test')).toThrow('TokenCounter not initialized') so it
accurately describes the behavior (e.g., "should throw when not initialized" or
"throws 'TokenCounter not initialized' when init not called"); locate the test
in TokenCounter.test.ts and update only the test name while keeping the existing
TokenCounter, countTokens and init-related assertions unchanged.
Code Review PR 1352

Overall: The tiktoken to gpt-tokenizer migration is well-executed. The code is clean, type-safe, and the caching strategy is sound. A few items worth addressing below.

PR Metadata Mismatch (Required)

The PR title says "ci(perf-benchmark): Enable GitHub autolink for commit SHAs" but the actual changes replace tiktoken with gpt-tokenizer, a significant dependency swap. The commit message "perf(core): Replace tiktoken WASM with gpt-tokenizer (pure JS)" is accurate, but the PR title/description should be updated to match.

Noteworthy Items (Recommended)

What Looks Good

Performance Note

Pure JS will likely be slightly slower per token than WASM, but for Repomix's use case the faster-startup trade-off is reasonable. The CI benchmark should confirm there is no significant regression.

Generated with Claude Code
Remove `<code>` HTML tags wrapping commit SHAs in perf benchmark PR comments so that GitHub's autolink feature can recognize and link them automatically.

Checklist

- npm run test
- npm run lint
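For illustration, the effect of the change can be sketched as below. The helper name and comment shape are hypothetical, not the actual benchmark script; the premise (from this PR) is that GitHub autolinks bare commit SHAs in PR comment bodies but not SHAs wrapped in `<code>` tags.

```typescript
// Hypothetical formatter for a benchmark comment line.
function formatShaLine(label: string, sha: string): string {
  // Before this change the SHA was emitted as `<code>${sha}</code>`, which
  // suppresses GitHub's autolinking; emitting the bare SHA lets GitHub
  // render it as a link to the commit.
  return `${label}: ${sha}`;
}

console.log(formatShaLine('Base', '0123456789abcdef0123456789abcdef01234567'));
```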