
fix(metrics): Use resolveEncodingAsync for bundler-compatible token counting #1417

Merged
yamadashy merged 3 commits into main from fix/website-gpt-tokenizer-bundle
Apr 6, 2026

Conversation

@yamadashy
Owner

@yamadashy yamadashy commented Apr 6, 2026

Fix remote repository processing failure on the website caused by gpt-tokenizer's dynamic import not being resolved by the bundler (rolldown).

The original code used a template-literal dynamic import (gpt-tokenizer/encoding/${encodingName}) which rolldown cannot statically analyze. This caused a runtime error:

Cannot find package 'gpt-tokenizer' imported from /app/dist-bundled/shared-CGbgfXgU.mjs
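
For illustration, the failing shape was roughly the following (a hypothetical sketch, not the exact Repomix source; loadEncoding is a made-up name):

```typescript
// Hypothetical sketch of the problematic pattern. Because the import
// specifier is assembled at runtime, rolldown cannot enumerate the candidate
// modules at build time, and the import is left unresolved in the bundle.
async function loadEncoding(encodingName: string): Promise<unknown> {
  return import(`gpt-tokenizer/encoding/${encodingName}`);
}
```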

Solution

Replace the dynamic template-literal import with gpt-tokenizer's own resolveEncodingAsync + GptEncoding.getEncodingApi(). Since resolveEncodingAsync uses static import paths internally, rolldown can resolve and bundle the BPE data correctly.
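
As a sketch of the resulting shape (the gpt-tokenizer calls are replaced by stand-ins so the snippet is self-contained; the real signatures are defined by gpt-tokenizer, and the caching mirrors the existing encodingModules Map described in review):

```typescript
// Self-contained sketch of the pattern described above, with stand-ins for
// gpt-tokenizer's resolveEncodingAsync and GptEncoding.getEncodingApi.
type TokenEncoding = 'o200k_base' | 'cl100k_base';

// Stand-in for resolveEncodingAsync: loads the BPE rank data for an encoding.
async function resolveEncodingAsync(name: TokenEncoding): Promise<Map<string, number>> {
  return new Map<string, number>([[name, 0]]); // placeholder rank data
}

// Stand-in mirroring the shape of GptEncoding and its getEncodingApi factory.
class Encoding {
  constructor(private bpeRanks: Map<string, number>) {}
  static getEncodingApi(name: TokenEncoding, getRanks: () => Map<string, number>): Encoding {
    // The callback returns already-resolved ranks, as in the PR.
    return new Encoding(getRanks());
  }
  countTokens(text: string): number {
    // Toy "tokenizer": counts whitespace-separated chunks.
    return text.split(/\s+/).filter(Boolean).length;
  }
}

// One encoder per encoding, created lazily and cached.
const encodingModules = new Map<TokenEncoding, (text: string) => number>();

async function getTokenCounter(name: TokenEncoding): Promise<(text: string) => number> {
  let countFn = encodingModules.get(name);
  if (!countFn) {
    const bpeRanks = await resolveEncodingAsync(name); // static import paths internally
    const encoder = Encoding.getEncodingApi(name, () => bpeRanks);
    // .bind keeps `this` pointing at the encoder when countFn is passed around.
    countFn = encoder.countTokens.bind(encoder);
    encodingModules.set(name, countFn);
  }
  return countFn;
}
```

The key point is that all import specifiers are now static strings, so the bundler can resolve and include the BPE data.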

Changes

  • src/core/metrics/TokenCounter.ts: Use resolveEncodingAsync to load BPE rank data and GptEncoding.getEncodingApi() to create encoder instances, instead of dynamic template-literal imports

Checklist

  • Run npm run test
  • Run npm run lint

@github-actions
Contributor

github-actions bot commented Apr 6, 2026

⚡ Performance Benchmark

Latest commit: 8bcd93a refactor(metrics): Delegate encoding resolution to gpt-tokenizer's resolveEncodingAsync
Status: ✅ Benchmark complete!
Ubuntu: 1.41s (±0.02s) → 1.41s (±0.03s) · -0.01s (-0.4%)
macOS: 0.85s (±0.06s) → 0.85s (±0.02s) · +0.00s (+0.0%)
Windows: 1.81s (±0.02s) → 1.80s (±0.03s) · -0.01s (-0.3%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs / 30 on macOS (median ± IQR)
  • Workflow run
History

37d6572 refactor(metrics): Delegate encoding resolution to gpt-tokenizer's resolveEncodingAsync

Ubuntu: 1.47s (±0.03s) → 1.49s (±0.03s) · +0.01s (+0.8%)
macOS: 1.04s (±0.27s) → 0.95s (±0.33s) · -0.09s (-8.4%)
Windows: 1.89s (±0.04s) → 1.89s (±0.03s) · +0.00s (+0.1%)

09b8398 refactor(metrics): Delegate encoding resolution to gpt-tokenizer's resolveEncodingAsync

Ubuntu: 1.50s (±0.05s) → 1.49s (±0.04s) · -0.01s (-0.9%)
macOS: 1.22s (±0.19s) → 1.21s (±0.20s) · -0.02s (-1.3%)
Windows: 1.80s (±0.03s) → 1.80s (±0.04s) · +0.01s (+0.3%)

ac29951 refactor(metrics): Use static import map for gpt-tokenizer encodings

Ubuntu: 1.46s (±0.03s) → 1.46s (±0.03s) · +0.00s (+0.1%)
macOS: 1.38s (±0.14s) → 1.37s (±0.25s) · -0.00s (-0.3%)
Windows: 1.83s (±0.03s) → 1.83s (±0.03s) · +0.01s (+0.3%)

12b409f fix(website): Add gpt-tokenizer to external deps for server bundle

Ubuntu: 1.46s (±0.03s) → 1.47s (±0.03s) · +0.00s (+0.2%)
macOS: 0.87s (±0.05s) → 0.85s (±0.03s) · -0.02s (-1.8%)
Windows: 1.82s (±0.06s) → 1.80s (±0.03s) · -0.02s (-1.0%)

@coderabbitai
Contributor

coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

Refactored the encoding module loading mechanism in TokenCounter by replacing dynamic template-literal imports with a static encodingImportMap that explicitly maps TokenEncoding values to gpt-tokenizer module imports.

Changes

  • Token Encoding Loading Refactor (src/core/metrics/TokenCounter.ts): Replaced dynamic template-literal imports with a static encodingImportMap for explicit module imports. Control flow and caching logic remain unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Title check: ✅ Passed. The title accurately describes the main change: replacing dynamic imports with a bundler-compatible solution using resolveEncodingAsync for token counting.
  • Description check: ✅ Passed. The description provides clear context about the bundler issue, explains the solution approach, documents the specific file changes, and includes completed checklist items.


@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Apr 6, 2026

Deploying repomix with Cloudflare Pages

Latest commit: 37d6572
Status: ✅  Deploy successful!
Preview URL: https://909ccc58.repomix.pages.dev
Branch Preview URL: https://fix-website-gpt-tokenizer-bu.repomix.pages.dev

View logs

gemini-code-assist[bot]

This comment was marked as outdated.

@codecov

codecov bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.26%. Comparing base (fb24e64) to head (8bcd93a).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1417   +/-   ##
=======================================
  Coverage   87.26%   87.26%           
=======================================
  Files         117      117           
  Lines        4420     4421    +1     
  Branches     1021     1021           
=======================================
+ Hits         3857     3858    +1     
  Misses        563      563           

☔ View full report in Codecov by Sentry.

@claude

This comment has been minimized.

@claude
Contributor

claude bot commented Apr 6, 2026

Code Review — Claude

Verdict: Approve

This is a clean, well-evolved fix. The final approach — using resolveEncodingAsync + GptEncoding.getEncodingApi — is the best of the three iterations because it eliminates the bundling problem at its root rather than working around it with externals/Dockerfile changes.

Note: The previous Claude review on this PR was based on an earlier iteration that modified bundle.mjs and Dockerfile. Those changes were superseded — the final diff only touches TokenCounter.ts. This review reflects the current state.

Summary

  • Code Quality: Clean and readable. The .bind(encoder) is correct — it preserves the this context for countTokens. Comments accurately explain the bundler motivation.
  • Security: Improved. Replaces a template-literal dynamic import (potential path traversal vector, though mitigated by the TokenEncoding type) with static imports. No new attack surface.
  • Performance: No regression (CI benchmarks confirm ±0.3%). The GptEncoding instance is created once per encoding and cached via the existing encodingModules Map. Top-level imports of GptEncoding and resolveEncodingAsync are lightweight (just class/function references).
  • Test Coverage: All modified lines covered per Codecov. Adequate for this change.
  • Conventions: Commit messages follow Conventional Commits. Code style is consistent.

Observations (non-blocking)
  • PR title/description mismatch: The title says "Add gpt-tokenizer to external deps for server bundle" and the description references scripts/bundle.mjs and Dockerfile changes, but the final diff only modifies src/core/metrics/TokenCounter.ts. Consider updating the title to something like fix(metrics): Replace dynamic import with resolveEncodingAsync for bundler compatibility and updating the description to match.
  • resolveEncodingAsync is a public but less-commonly-used API from gpt-tokenizer. It's exported from gpt-tokenizer/resolveEncodingAsync in the package's exports map, so it's a stable public API — not an internal. Same for GptEncoding.getEncodingApi. This is a safe dependency.
  • No remaining template-literal dynamic imports in src/ — this was the last one, which is great for bundler compatibility going forward.
  • Gemini's suggestion about the regex /^gpt-tokenizer/ is now moot since that change was reverted in the final iteration.

LGTM — ship it! 🚢


Reviewed with Claude Code

yamadashy force-pushed the fix/website-gpt-tokenizer-bundle branch from 09b8398 to 37d6572 on April 6, 2026 09:32
yamadashy and others added 3 commits April 6, 2026 18:33
gpt-tokenizer uses dynamic imports with template literals
(gpt-tokenizer/encoding/${encodingName}) which cannot be resolved at
bundle time by rolldown. The package was missing from both the external
list in the bundle config and the Dockerfile runtime image, causing
"Cannot find package 'gpt-tokenizer'" errors during remote repository
processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace dynamic template literal import (`gpt-tokenizer/encoding/${name}`)
with an explicit static import map so that rolldown can resolve and bundle
gpt-tokenizer directly. This eliminates the need to treat gpt-tokenizer as
an external dependency in the server bundle and Dockerfile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor(metrics): Delegate encoding resolution to gpt-tokenizer's resolveEncodingAsync

Instead of maintaining a static import map of all 5 encodings in Repomix,
use gpt-tokenizer's resolveEncodingAsync which already handles the
encoding-to-BPE-data mapping with static import paths internally.
This removes the need to update Repomix when gpt-tokenizer adds encodings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yamadashy force-pushed the fix/website-gpt-tokenizer-bundle branch from 37d6572 to 8bcd93a on April 6, 2026 09:33
yamadashy changed the title from "fix(website): Add gpt-tokenizer to external deps for server bundle" to "fix(metrics): Use resolveEncodingAsync for bundler-compatible token counting" on Apr 6, 2026
@claude
Contributor

claude bot commented Apr 6, 2026

Code Review — Claude

Verdict: Approve

Clean, minimal fix that solves the rolldown bundler incompatibility at its root. The final approach — using resolveEncodingAsync + GptEncoding.getEncodingApi() — is the best of the three iterations, eliminating the need for workarounds in bundle config or Dockerfile.

Summary

  • Code Quality: Excellent. Clear variable names (bpeRanks, encoder, countFn), accurate comments explaining the "why", correct .bind(encoder) for this context preservation.
  • Security: Improved. Removes a template-literal dynamic import (gpt-tokenizer/encoding/${encodingName}) — even though encodingName was already type-constrained, static imports are strictly better here.
  • Performance: No regression. CI benchmarks confirm ±0.3%. Encoding is loaded once per TokenEncoding value and cached in the encodingModules Map. The .bind() overhead is negligible.
  • Test Coverage: All modified lines covered (Codecov: 87.26%, unchanged). Tests cover init, countTokens, error handling, multiple encodings (o200k_base, cl100k_base).
  • Conventions: PR title follows type(scope): Description. Code style is consistent with the rest of the codebase.

No issues found. Ship it!

Detailed observations (non-blocking)
  • Gemini's earlier feedback about regex patterns in bundle.mjs and Dockerfile changes is now moot — those files were correctly removed in the final iteration.
  • resolveEncodingAsync and GptEncoding.getEncodingApi are both stable public exports from gpt-tokenizer's exports map — safe to depend on.
  • The () => bpeRanks callback pattern in getEncodingApi is a factory that returns the already-resolved BPE ranks, which is correct since they were awaited on the previous line.
  • No remaining template-literal dynamic imports in src/ after this change.

Reviewed with Claude Code

@yamadashy yamadashy merged commit 7b9e3d4 into main Apr 6, 2026
58 of 59 checks passed
@yamadashy yamadashy deleted the fix/website-gpt-tokenizer-bundle branch April 6, 2026 09:40