
fix(metrics): Use resolveEncodingAsync for bundler-compatible token counting #1417

Merged
yamadashy merged 3 commits into main from fix/website-gpt-tokenizer-bundle
Apr 6, 2026

Conversation

@yamadashy
Owner

@yamadashy yamadashy commented Apr 6, 2026

Fix remote repository processing failure on the website caused by gpt-tokenizer's dynamic import not being resolved by the bundler (rolldown).

The original code used a template-literal dynamic import (gpt-tokenizer/encoding/${encodingName}) which rolldown cannot statically analyze. This caused a runtime error:

Cannot find package 'gpt-tokenizer' imported from /app/dist-bundled/shared-CGbgfXgU.mjs
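
For illustration, the failing shape was roughly the following (a hypothetical sketch, not the exact Repomix source; loadEncoding is a made-up name):

```typescript
// Hypothetical sketch of the problematic pattern. Because the import
// specifier is assembled at runtime, rolldown cannot enumerate the candidate
// modules at build time, and the import is left unresolved in the bundle.
async function loadEncoding(encodingName: string): Promise<unknown> {
  return import(`gpt-tokenizer/encoding/${encodingName}`);
}
```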

Solution

Replace the dynamic template-literal import with gpt-tokenizer's own resolveEncodingAsync + GptEncoding.getEncodingApi(). Since resolveEncodingAsync uses static import paths internally, rolldown can resolve and bundle the BPE data correctly.
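
As a sketch of the resulting shape (the gpt-tokenizer calls are replaced by stand-ins so the snippet is self-contained; the real signatures are defined by gpt-tokenizer, and the caching mirrors the existing encodingModules Map described in review):

```typescript
// Self-contained sketch of the pattern described above, with stand-ins for
// gpt-tokenizer's resolveEncodingAsync and GptEncoding.getEncodingApi.
type TokenEncoding = 'o200k_base' | 'cl100k_base';

// Stand-in for resolveEncodingAsync: loads the BPE rank data for an encoding.
async function resolveEncodingAsync(name: TokenEncoding): Promise<Map<string, number>> {
  return new Map<string, number>([[name, 0]]); // placeholder rank data
}

// Stand-in mirroring the shape of GptEncoding and its getEncodingApi factory.
class Encoding {
  constructor(private bpeRanks: Map<string, number>) {}
  static getEncodingApi(name: TokenEncoding, getRanks: () => Map<string, number>): Encoding {
    // The callback returns already-resolved ranks, as in the PR.
    return new Encoding(getRanks());
  }
  countTokens(text: string): number {
    // Toy "tokenizer": counts whitespace-separated chunks.
    return text.split(/\s+/).filter(Boolean).length;
  }
}

// One encoder per encoding, created lazily and cached.
const encodingModules = new Map<TokenEncoding, (text: string) => number>();

async function getTokenCounter(name: TokenEncoding): Promise<(text: string) => number> {
  let countFn = encodingModules.get(name);
  if (!countFn) {
    const bpeRanks = await resolveEncodingAsync(name); // static import paths internally
    const encoder = Encoding.getEncodingApi(name, () => bpeRanks);
    // .bind keeps `this` pointing at the encoder when countFn is passed around.
    countFn = encoder.countTokens.bind(encoder);
    encodingModules.set(name, countFn);
  }
  return countFn;
}
```

The key point is that all import specifiers are now static strings, so the bundler can resolve and include the BPE data.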

Changes

  • src/core/metrics/TokenCounter.ts: Use resolveEncodingAsync to load BPE rank data and GptEncoding.getEncodingApi() to create encoder instances, instead of dynamic template-literal imports

Checklist

  • Run npm run test
  • Run npm run lint

@github-actions
Contributor

github-actions bot commented Apr 6, 2026

⚡ Performance Benchmark

Latest commit: 8bcd93a refactor(metrics): Delegate encoding resolution to gpt-tokenizer's resolveEncodingAsync
Status: ✅ Benchmark complete!
Ubuntu: 1.41s (±0.02s) → 1.41s (±0.03s) · -0.01s (-0.4%)
macOS: 0.85s (±0.06s) → 0.85s (±0.02s) · +0.00s (+0.0%)
Windows: 1.81s (±0.02s) → 1.80s (±0.03s) · -0.01s (-0.3%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs / 30 on macOS (median ± IQR)
  • Workflow run
History

37d6572 refactor(metrics): Delegate encoding resolution to gpt-tokenizer's resolveEncodingAsync

Ubuntu: 1.47s (±0.03s) → 1.49s (±0.03s) · +0.01s (+0.8%)
macOS: 1.04s (±0.27s) → 0.95s (±0.33s) · -0.09s (-8.4%)
Windows: 1.89s (±0.04s) → 1.89s (±0.03s) · +0.00s (+0.1%)

09b8398 refactor(metrics): Delegate encoding resolution to gpt-tokenizer's resolveEncodingAsync

Ubuntu: 1.50s (±0.05s) → 1.49s (±0.04s) · -0.01s (-0.9%)
macOS: 1.22s (±0.19s) → 1.21s (±0.20s) · -0.02s (-1.3%)
Windows: 1.80s (±0.03s) → 1.80s (±0.04s) · +0.01s (+0.3%)

ac29951 refactor(metrics): Use static import map for gpt-tokenizer encodings

Ubuntu: 1.46s (±0.03s) → 1.46s (±0.03s) · +0.00s (+0.1%)
macOS: 1.38s (±0.14s) → 1.37s (±0.25s) · -0.00s (-0.3%)
Windows: 1.83s (±0.03s) → 1.83s (±0.03s) · +0.01s (+0.3%)

12b409f fix(website): Add gpt-tokenizer to external deps for server bundle

Ubuntu: 1.46s (±0.03s) → 1.47s (±0.03s) · +0.00s (+0.2%)
macOS: 0.87s (±0.05s) → 0.85s (±0.03s) · -0.02s (-1.8%)
Windows: 1.82s (±0.06s) → 1.80s (±0.03s) · -0.02s (-1.0%)

@coderabbitai
Contributor

coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

Refactored the encoding module loading mechanism in TokenCounter by replacing dynamic template-literal imports with a static encodingImportMap that explicitly maps TokenEncoding values to gpt-tokenizer module imports.

Changes

  • Token Encoding Loading Refactor (src/core/metrics/TokenCounter.ts): Replaced dynamic template-literal imports with a static encodingImportMap for explicit module imports. Control flow and caching logic remain unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Title check: ✅ Passed. The title accurately describes the main change: replacing dynamic imports with a bundler-compatible solution using resolveEncodingAsync for token counting.
  • Description check: ✅ Passed. The description provides clear context about the bundler issue, explains the solution approach, documents the specific file changes, and includes completed checklist items.


@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Apr 6, 2026

Deploying repomix with Cloudflare Pages

Latest commit: 37d6572
Status: ✅  Deploy successful!
Preview URL: https://909ccc58.repomix.pages.dev
Branch Preview URL: https://fix-website-gpt-tokenizer-bu.repomix.pages.dev

View logs

gemini-code-assist[bot]

This comment was marked as outdated.

@codecov

codecov bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.26%. Comparing base (fb24e64) to head (8bcd93a).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1417   +/-   ##
=======================================
  Coverage   87.26%   87.26%           
=======================================
  Files         117      117           
  Lines        4420     4421    +1     
  Branches     1021     1021           
=======================================
+ Hits         3857     3858    +1     
  Misses        563      563           

☔ View full report in Codecov by Sentry.

@claude

This comment has been minimized.

@claude
Contributor

claude bot commented Apr 6, 2026

Code Review — Claude

Verdict: Approve

This is a clean, well-evolved fix. The final approach — using resolveEncodingAsync + GptEncoding.getEncodingApi — is the best of the three iterations because it eliminates the bundling problem at its root rather than working around it with externals/Dockerfile changes.

Note: The previous Claude review on this PR was based on an earlier iteration that modified bundle.mjs and Dockerfile. Those changes were superseded — the final diff only touches TokenCounter.ts. This review reflects the current state.

Summary

  • Code Quality: Clean and readable. The .bind(encoder) is correct — it preserves the this context for countTokens. Comments accurately explain the bundler motivation.
  • Security: Improved. Replaces a template-literal dynamic import (potential path traversal vector, though mitigated by the TokenEncoding type) with static imports. No new attack surface.
  • Performance: No regression (CI benchmarks confirm ±0.3%). The GptEncoding instance is created once per encoding and cached via the existing encodingModules Map. Top-level imports of GptEncoding and resolveEncodingAsync are lightweight (just class/function references).
  • Test Coverage: All modified lines covered per Codecov. Adequate for this change.
  • Conventions: Commit messages follow Conventional Commits. Code style is consistent.

Observations (non-blocking)
  • PR title/description mismatch: The title says "Add gpt-tokenizer to external deps for server bundle" and the description references scripts/bundle.mjs and Dockerfile changes, but the final diff only modifies src/core/metrics/TokenCounter.ts. Consider updating the title to something like fix(metrics): Replace dynamic import with resolveEncodingAsync for bundler compatibility and updating the description to match.
  • resolveEncodingAsync is a public but less-commonly-used API from gpt-tokenizer. It's exported from gpt-tokenizer/resolveEncodingAsync in the package's exports map, so it's a stable public API — not an internal. Same for GptEncoding.getEncodingApi. This is a safe dependency.
  • No remaining template-literal dynamic imports in src/ — this was the last one, which is great for bundler compatibility going forward.
  • Gemini's suggestion about the regex /^gpt-tokenizer/ is now moot since that change was reverted in the final iteration.

LGTM — ship it! 🚢


Reviewed with Claude Code

yamadashy force-pushed the fix/website-gpt-tokenizer-bundle branch from 09b8398 to 37d6572 on April 6, 2026 09:32
yamadashy and others added 3 commits April 6, 2026 18:33
gpt-tokenizer uses dynamic imports with template literals
(gpt-tokenizer/encoding/${encodingName}) which cannot be resolved at
bundle time by rolldown. The package was missing from both the external
list in the bundle config and the Dockerfile runtime image, causing
"Cannot find package 'gpt-tokenizer'" errors during remote repository
processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace dynamic template literal import (`gpt-tokenizer/encoding/${name}`)
with an explicit static import map so that rolldown can resolve and bundle
gpt-tokenizer directly. This eliminates the need to treat gpt-tokenizer as
an external dependency in the server bundle and Dockerfile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor(metrics): Delegate encoding resolution to gpt-tokenizer's resolveEncodingAsync

Instead of maintaining a static import map of all 5 encodings in Repomix,
use gpt-tokenizer's resolveEncodingAsync which already handles the
encoding-to-BPE-data mapping with static import paths internally.
This removes the need to update Repomix when gpt-tokenizer adds encodings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yamadashy force-pushed the fix/website-gpt-tokenizer-bundle branch from 37d6572 to 8bcd93a on April 6, 2026 09:33
yamadashy changed the title from "fix(website): Add gpt-tokenizer to external deps for server bundle" to "fix(metrics): Use resolveEncodingAsync for bundler-compatible token counting" on Apr 6, 2026
@claude
Contributor

claude bot commented Apr 6, 2026

Code Review — Claude

Verdict: Approve

Clean, minimal fix that solves the rolldown bundler incompatibility at its root. The final approach — using resolveEncodingAsync + GptEncoding.getEncodingApi() — is the best of the three iterations, eliminating the need for workarounds in bundle config or Dockerfile.

Summary

  • Code Quality: Excellent. Clear variable names (bpeRanks, encoder, countFn), accurate comments explaining the "why", correct .bind(encoder) for this context preservation.
  • Security: Improved. Removes a template-literal dynamic import (gpt-tokenizer/encoding/${encodingName}) — even though encodingName was already type-constrained, static imports are strictly better here.
  • Performance: No regression. CI benchmarks confirm ±0.3%. Encoding is loaded once per TokenEncoding value and cached in the encodingModules Map. The .bind() overhead is negligible.
  • Test Coverage: All modified lines covered (Codecov: 87.26%, unchanged). Tests cover init, countTokens, error handling, multiple encodings (o200k_base, cl100k_base).
  • Conventions: PR title follows type(scope): Description. Code style is consistent with the rest of the codebase.

No issues found. Ship it!

Detailed observations (non-blocking)
  • Gemini's earlier feedback about regex patterns in bundle.mjs and Dockerfile changes is now moot — those files were correctly removed in the final iteration.
  • resolveEncodingAsync and GptEncoding.getEncodingApi are both stable public exports from gpt-tokenizer's exports map — safe to depend on.
  • The () => bpeRanks callback pattern in getEncodingApi is a factory that returns the already-resolved BPE ranks, which is correct since they were awaited on the previous line.
  • No remaining template-literal dynamic imports in src/ after this change.

Reviewed with Claude Code

@yamadashy yamadashy merged commit 7b9e3d4 into main Apr 6, 2026
58 of 59 checks passed
@yamadashy yamadashy deleted the fix/website-gpt-tokenizer-bundle branch April 6, 2026 09:40