
ci(perf-benchmark): Enable GitHub autolink for commit SHAs in benchmark comments#1352

Closed
yamadashy wants to merge 1 commit into `main` from `ci/bench-comment-autolink-commit-sha`

Conversation

Owner

@yamadashy yamadashy commented Mar 29, 2026

Remove `<code>` HTML tags wrapping commit SHAs in perf benchmark PR comments so that GitHub's autolink feature can recognize and link them automatically.

Checklist

  • Run npm run test
  • Run npm run lint


Replace tiktoken (WASM-based) with gpt-tokenizer (pure JS) for token
counting, eliminating ~200ms WASM initialization overhead while keeping
the existing worker pool infrastructure for parallel processing.

Changes:
- Swap tiktoken dependency for gpt-tokenizer in package.json
- Rewrite TokenCounter to use gpt-tokenizer's async dynamic import
  with lazy-loaded encoding modules cached at module level
- Add TOKEN_ENCODINGS constant with Zod enum validation in config
  schema, replacing unsafe type assertion
- Use { disallowedSpecial: new Set() } to match tiktoken's
  encode(content, [], []) behavior (treat all text as plain text)
- Add p50k_edit encoding for backward compatibility
- Update worker to handle async getTokenCounter initialization
- Rewrite tests to use real gpt-tokenizer with exact token counts

The worker pool, parallel chunk processing, and pre-warming
infrastructure are preserved — only the underlying tokenizer changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
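The async initialization and module-level encoding cache described in the commit message above can be sketched as follows. This is a hypothetical reconstruction, not the PR's actual code: the real `init()` dynamically imports a gpt-tokenizer encoding module, which is stubbed here with a whitespace splitter so the sketch stays self-contained and runnable.

```typescript
// Hypothetical sketch of the async-init TokenCounter pattern. The real
// implementation loads gpt-tokenizer encoding modules via dynamic import;
// a whitespace splitter stands in for the tokenizer here.
type CountTokensFn = (text: string) => number;

// Module-level cache: each encoding's counting function is loaded once.
const encodingModules = new Map<string, CountTokensFn>();

class TokenCounter {
  private countFn: CountTokensFn | null = null;

  constructor(private readonly encoding: string) {}

  async init(): Promise<void> {
    let fn = encodingModules.get(this.encoding);
    if (!fn) {
      // Stand-in for `await import('gpt-tokenizer/encoding/<name>')`.
      fn = (text: string) => text.split(/\s+/).filter(Boolean).length;
      encodingModules.set(this.encoding, fn);
    }
    this.countFn = fn;
  }

  countTokens(text: string): number {
    if (!this.countFn) throw new Error('TokenCounter not initialized');
    return this.countFn(text);
  }

  // Kept as a no-op for API compatibility with the tiktoken-based version.
  free(): void {}
}
```

Callers must `await counter.init()` before `countTokens()`, which is why the factory and worker code in this PR became async.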
Contributor

github-actions bot commented Mar 29, 2026

⚡ Performance Benchmark

Latest commit: bb67792 perf(core): Replace tiktoken WASM with gpt-tokenizer (pure JS)
Status: ✅ Benchmark complete!
Ubuntu: 2.14s (±0.01s) → 2.05s (±0.04s) · -0.09s (-4.0%)
macOS: 1.03s (±0.03s) → 1.00s (±0.05s) · -0.03s (-2.6%)
Windows: 2.28s (±0.03s) → 2.24s (±0.04s) · -0.04s (-1.9%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs / 30 on macOS (median ± IQR)
  • Workflow run

Contributor

coderabbitai bot commented Mar 29, 2026

📝 Walkthrough

Walkthrough

The pull request replaces the tiktoken tokenizer library with gpt-tokenizer across the codebase. Changes include updating dependencies, refactoring TokenCounter to support async initialization with the new library, introducing a TokenEncoding type to replace TiktokenEncoding, and updating all related type signatures, factory methods, and tests to accommodate the new tokenizer backend.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Dependency Update**<br>`package.json` | Replaced tiktoken (`^1.0.22`) with gpt-tokenizer (`^3.4.0`) in dependencies. |
| **TokenCounter Core Refactoring**<br>`src/core/metrics/TokenCounter.ts` | Replaced tiktoken implementation with gpt-tokenizer; added async `init()` method for lazy encoding loading, exported `TOKEN_ENCODINGS` constant and `TokenEncoding` type, converted `free()` to a no-op, and updated runtime token counting via the new library's API. |
| **Configuration & Schema**<br>`src/config/configSchema.ts` | Updated `tokenCount.encoding` validation to use the `TOKEN_ENCODINGS` enum instead of a plain string, replacing the `TiktokenEncoding` import with a reference to `TokenEncoding`. |
| **Type Signature Updates**<br>`src/core/metrics/calculateOutputMetrics.ts`, `src/core/metrics/calculateSelectiveFileMetrics.ts`, `src/core/metrics/workers/calculateMetricsWorker.ts` | Changed encoding parameter types from `TiktokenEncoding` to `TokenEncoding` across function signatures and task interfaces. |
| **Factory Pattern Update**<br>`src/core/metrics/tokenCounterFactory.ts` | Converted `getTokenCounter` from synchronous to async, adding `await tokenCounter.init()` on token counter creation and updating the cache key type to `TokenEncoding`. |
| **Test Updates**<br>`tests/core/metrics/TokenCounter.test.ts`, `tests/core/metrics/calculateMetrics.test.ts`, `tests/core/metrics/diffTokenCount.test.ts`, `tests/core/packager.test.ts` | Removed tiktoken mocks and added real `TokenCounter` initialization with `init()` in unit tests; added `TOKEN_ENCODINGS` export to all `TokenCounter` module mocks; replaced mocked token counts with concrete expected values from the actual gpt-tokenizer. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The PR title describes removing HTML tags for commit SHA autolinking in benchmark comments, but the actual changes involve replacing the tiktoken dependency with gpt-tokenizer, which is unrelated. | Update the PR title to reflect the actual changes: replacing tiktoken with gpt-tokenizer and making TokenCounter async-initialized. The current title describes a different, unrelated feature. |
| Description check | ⚠️ Warning | The PR description discusses removing HTML code tags for GitHub autolinking of commit SHAs in benchmark comments, but the actual changes show a major dependency migration from tiktoken to gpt-tokenizer with significant API changes. | Update the PR description to accurately explain the actual changes: dependency migration, new TokenEncoding types, async TokenCounter initialization, and related test updates. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request replaces the tiktoken library with gpt-tokenizer, transitioning the token counting implementation to a pure JavaScript solution. Key modifications include the introduction of an asynchronous initialization pattern for the TokenCounter class to support dynamic loading of encoding modules, updates to the configuration schema for stricter encoding validation, and necessary refactoring of worker threads and test suites to accommodate the new asynchronous flow. I have no feedback to provide.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
src/core/metrics/TokenCounter.ts (1)

19-19: Narrow cache key type to TokenEncoding for stronger invariants.

Line 19 currently allows arbitrary string keys; using Map<TokenEncoding, CountTokensFn> keeps cache typing aligned with the public encoding contract.

♻️ Suggested diff
-const encodingModules = new Map<string, CountTokensFn>();
+const encodingModules = new Map<TokenEncoding, CountTokensFn>();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/metrics/TokenCounter.ts` at line 19, The cache key for
encodingModules is too wide (string); change its declaration from Map<string,
CountTokensFn> to Map<TokenEncoding, CountTokensFn> to enforce the public
encoding contract and stronger typing. Update TokenCounter.ts to import or
reference the TokenEncoding type and adjust any places that set or get from
encodingModules to use TokenEncoding values (or cast explicitly where
unavoidable) so the Map keys and CountTokensFn usages remain type-safe.
src/config/configSchema.ts (1)

2-2: Decouple config schema from TokenCounter runtime module.

Importing constants from src/core/metrics/TokenCounter.ts (Line 2) couples config parsing to a metrics implementation file. Consider moving TOKEN_ENCODINGS into a small shared constants module (e.g., src/core/metrics/tokenEncodings.ts) and importing from both places.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/configSchema.ts` at line 2, Move the TOKEN_ENCODINGS constant out
of the TokenCounter runtime module into a small shared constants module and
update imports so config schema and TokenCounter both import from that new
module: create a new tokenEncodings module that exports TOKEN_ENCODINGS, replace
the import in configSchema.ts (currently importing TOKEN_ENCODINGS from
TokenCounter) to import from the new tokenEncodings module, and update
TokenCounter to import TOKEN_ENCODINGS from the same new module to remove the
runtime coupling.
src/core/metrics/tokenCounterFactory.ts (1)

11-17: Avoid duplicate initialization under concurrent first access.

Because the cache is populated only after await (Line 16), parallel calls can create multiple TokenCounter instances for the same encoding. Cache before awaiting and roll back on init failure.

♻️ Suggested diff
 export const getTokenCounter = async (encoding: TokenEncoding): Promise<TokenCounter> => {
   let tokenCounter = tokenCounters.get(encoding);
   if (!tokenCounter) {
     tokenCounter = new TokenCounter(encoding);
-    await tokenCounter.init();
-    tokenCounters.set(encoding, tokenCounter);
+    tokenCounters.set(encoding, tokenCounter);
+    try {
+      await tokenCounter.init();
+    } catch (error) {
+      tokenCounters.delete(encoding);
+      throw error;
+    }
   }
   return tokenCounter;
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/metrics/tokenCounterFactory.ts` around lines 11 - 17,
getTokenCounter currently sets tokenCounters only after awaiting
tokenCounter.init, so concurrent callers can create duplicate TokenCounter
instances; fix by storing the newly created TokenCounter into tokenCounters
immediately after construction (tokenCounters.set(encoding, tokenCounter)), then
await tokenCounter.init(), and if init throws remove the entry from
tokenCounters (tokenCounters.delete(encoding)) and rethrow the error; keep the
early-return path (check tokenCounters.get(encoding)) so callers get the cached
instance and ensure TokenCounter.init is referenced for rollback on failure.
tests/core/metrics/calculateMetrics.test.ts (1)

9-12: Avoid duplicating TOKEN_ENCODINGS in the mock.

This can drift from the real export and make tests pass with stale values. Prefer importing the actual module and overriding only TokenCounter.

♻️ Suggested refactor
-vi.mock('../../../src/core/metrics/TokenCounter.js', () => {
-  return {
-    TOKEN_ENCODINGS: ['o200k_base', 'cl100k_base', 'p50k_base', 'p50k_edit', 'r50k_base'],
-    TokenCounter: vi.fn().mockImplementation(() => ({
-      countTokens: vi.fn().mockReturnValue(10),
-      free: vi.fn(),
-    })),
-  };
-});
+vi.mock('../../../src/core/metrics/TokenCounter.js', async () => {
+  const actual = await vi.importActual<typeof import('../../../src/core/metrics/TokenCounter.js')>(
+    '../../../src/core/metrics/TokenCounter.js',
+  );
+  return {
+    ...actual,
+    TokenCounter: vi.fn().mockImplementation(() => ({
+      countTokens: vi.fn().mockReturnValue(10),
+      free: vi.fn(),
+    })),
+  };
+});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/metrics/calculateMetrics.test.ts` around lines 9 - 12, The mock
for TokenCounter is re-defining TOKEN_ENCODINGS and can drift from the real
export; instead import the real module and only replace TokenCounter in the mock
so TOKEN_ENCODINGS stays authoritative. Update the vi.mock for
'../../../src/core/metrics/TokenCounter.js' to require/import the real module
(to preserve TOKEN_ENCODINGS) and return an object that spreads the real exports
but overrides TokenCounter with the vi.fn() implementation, keeping the rest of
the module intact.
tests/core/metrics/TokenCounter.test.ts (1)

73-77: This assertion is too weak for the behavior being described.

toBeGreaterThan(0) will pass for many unrelated cases and doesn’t prove the special sequence is treated as plain text.

✅ Stronger assertion
 test('should handle special token sequences as plain text', () => {
-  // gpt-tokenizer should treat <|endoftext|> as ordinary text, not a control token
-  const count = tokenCounter.countTokens('Hello <|endoftext|> world');
-  expect(count).toBeGreaterThan(0);
+  const withSequence = tokenCounter.countTokens('Hello <|endoftext|> world');
+  const withoutSequence = tokenCounter.countTokens('Hello  world');
+  expect(withSequence).toBeGreaterThan(withoutSequence);
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/metrics/TokenCounter.test.ts` around lines 73 - 77, The test
"should handle special token sequences as plain text" uses a weak assertion;
replace the expect(count).toBeGreaterThan(0) with a precise assertion that the
tokenizer treats "<|endoftext|>" as ordinary text by comparing counts: compute
count = tokenCounter.countTokens('Hello <|endoftext|> world') and assert it
equals tokenCounter.countTokens('Hello  world') +
tokenCounter.countTokens('<|endoftext|>') (use the existing
tokenCounter.countTokens helper to get those parts) so the test explicitly
verifies the special sequence is tokenized as plain text.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1aa8931d-07a8-46ea-a2f3-cb5a5377b40c

📥 Commits

Reviewing files that changed from the base of the PR and between b21a85f and bb67792.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (11)
  • package.json
  • src/config/configSchema.ts
  • src/core/metrics/TokenCounter.ts
  • src/core/metrics/calculateOutputMetrics.ts
  • src/core/metrics/calculateSelectiveFileMetrics.ts
  • src/core/metrics/tokenCounterFactory.ts
  • src/core/metrics/workers/calculateMetricsWorker.ts
  • tests/core/metrics/TokenCounter.test.ts
  • tests/core/metrics/calculateMetrics.test.ts
  • tests/core/metrics/diffTokenCount.test.ts
  • tests/core/packager.test.ts

Comment on lines +79 to 83
test('should return 0 for errors when not initialized', () => {
const uninitCounter = new TokenCounter('o200k_base');
// Not calling init() - should throw
expect(() => uninitCounter.countTokens('test')).toThrow('TokenCounter not initialized');
});
Contributor


⚠️ Potential issue | 🟡 Minor

Test name contradicts the assertion.

At Line 79 the title says “return 0”, but the test asserts a throw. Rename for accuracy.

✏️ Suggested rename
-test('should return 0 for errors when not initialized', () => {
+test('should throw when counting tokens before initialization', () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/metrics/TokenCounter.test.ts` around lines 79 - 83, The test title
is misleading: change the description string in the test containing new
TokenCounter('o200k_base') and the expect(() =>
uninitCounter.countTokens('test')).toThrow('TokenCounter not initialized') so it
accurately describes the behavior (e.g., "should throw when not initialized" or
"throws 'TokenCounter not initialized' when init not called"); locate the test
in TokenCounter.test.ts and update only the test name while keeping the existing
TokenCounter, countTokens and init-related assertions unchanged.

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


@yamadashy yamadashy closed this Mar 29, 2026
Contributor

claude bot commented Mar 29, 2026

Code Review: PR #1352

Overall: The tiktoken to gpt-tokenizer migration is well-executed. The code is clean, type-safe, and the caching strategy is sound. A few items worth addressing below.

PR Metadata Mismatch (Required)

The PR title says "ci(perf-benchmark): Enable GitHub autolink for commit SHAs in benchmark comments", but the actual changes replace tiktoken with gpt-tokenizer, a significant dependency swap. The commit message `perf(core): Replace tiktoken WASM with gpt-tokenizer (pure JS)` is accurate, but the PR title and description should be updated to match.

Noteworthy Items (Recommended)

  1. Potential race condition in loadEncoding - In src/core/metrics/TokenCounter.ts:21-38, if two concurrent calls invoke loadEncoding with the same encoding before the first resolves, both will perform the dynamic import. Not a correctness bug but wastes resources. Fix by caching the Promise itself. Same applies to getTokenCounter in tokenCounterFactory.ts.

  2. Hardcoded token counts in tests are version-fragile - In tests/core/metrics/TokenCounter.test.ts, exact token count assertions are tied to the specific gpt-tokenizer version. Consider range assertions for most cases.

  3. Lost test coverage for error handling paths - The old tests covered the catch block in countTokens (with and without filePath). The new tests removed these. Since the error-handling code still exists (lines 63-78), it should still be tested.

  4. Type assertion on dynamic import - src/core/metrics/TokenCounter.ts:31 uses mod.countTokens as CountTokensFn. If gpt-tokenizer changes its API, this breaks silently at runtime. Consider adding a runtime check.
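The promise-caching fix suggested in item 1 can be sketched like this. The names (`loadCounter`, the timeout stand-in for the dynamic import) are hypothetical, not the PR's code; the point is to cache the in-flight promise rather than the resolved value, so concurrent first calls share a single initialization.

```typescript
// Sketch of promise-level caching to avoid duplicate initialization under
// concurrent first access. A timeout stands in for the real
// `await import('gpt-tokenizer/encoding/...')`.
type CountTokensFn = (text: string) => number;
const initPromises = new Map<string, Promise<CountTokensFn>>();

function loadCounter(encoding: string): Promise<CountTokensFn> {
  let promise = initPromises.get(encoding);
  if (!promise) {
    promise = (async () => {
      // Stand-in for the async dynamic import of an encoding module.
      await new Promise((resolve) => setTimeout(resolve, 10));
      return (text: string) => text.split(/\s+/).filter(Boolean).length;
    })();
    // Cache the in-flight promise immediately so parallel callers share it.
    initPromises.set(encoding, promise);
    // Roll back on failure so a later call can retry initialization.
    promise.catch(() => initPromises.delete(encoding));
  }
  return promise;
}
```

Because the promise is stored before any `await`, two callers racing on the same encoding both receive the same promise and only one import runs.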

What Looks Good

  • Type safety improvement: z.enum(TOKEN_ENCODINGS) replaces unsafe TiktokenEncoding type assertion
  • Caching strategy: Two-level caching suits the worker pool architecture
  • API compatibility: free() kept as no-op
  • Test rewrite: Real tokenizer instead of mocks gives higher confidence
  • Clean migration: All tiktoken references removed
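The "type safety improvement" bullet above refers to deriving the encoding type from a const-asserted tuple. A minimal dependency-free sketch follows; the actual code validates with Zod's `z.enum(TOKEN_ENCODINGS)`, and the hand-rolled guard here is only illustrative (the encoding list mirrors the test mock quoted earlier in this PR).

```typescript
// Const-asserted tuple of supported encodings (values taken from the
// test mock in this PR; the real TOKEN_ENCODINGS export may differ).
const TOKEN_ENCODINGS = ['o200k_base', 'cl100k_base', 'p50k_base', 'p50k_edit', 'r50k_base'] as const;

// The type is derived from the tuple, so the compile-time type and the
// runtime list cannot drift apart.
type TokenEncoding = (typeof TOKEN_ENCODINGS)[number];

// Example: assignments are checked against the tuple's literal members.
const defaultEncoding: TokenEncoding = 'o200k_base';

// Hand-rolled narrowing guard standing in for Zod's z.enum(TOKEN_ENCODINGS).
function isTokenEncoding(value: string): value is TokenEncoding {
  return (TOKEN_ENCODINGS as readonly string[]).includes(value);
}
```

This is what replaces the unsafe `as TiktokenEncoding`-style assertion: unknown strings are rejected at parse time instead of being trusted.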

Performance Note

Pure JS will likely be slightly slower per token than WASM, but for Repomix's use case the faster-startup trade-off is reasonable. The CI benchmark should confirm there is no significant regression.

Generated with Claude Code

@yamadashy yamadashy deleted the ci/bench-comment-autolink-commit-sha branch March 29, 2026 08:00
