docs(core): Replace tiktoken references with gpt-tokenizer by yamadashy · Pull Request #1413 · yamadashy/repomix

yamadashy · 2026-04-06T06:14:58Z

Update all documentation, build configuration, and source comments to reflect the completed migration from tiktoken to gpt-tokenizer.

Changes

Documentation (README + 15 language variants):

Update the tokenCount.encoding description: replace tiktoken-specific wording with "OpenAI-compatible tokenization" and link to gpt-tokenizer
Remove tiktoken from the bundling external dependencies list: gpt-tokenizer is pure JS, so it no longer needs to be external

Build infrastructure:

Remove tiktoken COPY line from website/server/Dockerfile
Remove tiktoken from external array in website/server/scripts/bundle.mjs

Source code:

Simplify comments in TokenCounter.ts — remove "old tiktoken behavior" migration artifacts

Lock files:

Regenerate scripts/memory/package-lock.json to remove stale tiktoken reference
website/server/package-lock.json still references tiktoken via the GitHub-linked repomix dependency — will resolve automatically after this merges and npm install is re-run
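
The stale-reference check described under "Lock files" can be sketched as a small scan over a parsed package-lock.json. This is a minimal sketch and not part of the PR: `findLockReferences` is a hypothetical helper, the sample data is invented, and it assumes the lockfile v2/v3 layout in which installed packages are keyed by paths such as `node_modules/tiktoken` under `packages`.

```typescript
// Hypothetical helper (not from the PR): list lockfile entries that still
// mention a package, either as an installed path or as a declared dependency.
interface LockEntry {
  dependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
}

interface PackageLock {
  packages?: Record<string, LockEntry>;
}

function findLockReferences(lock: PackageLock, name: string): string[] {
  const hits: string[] = [];
  for (const [path, entry] of Object.entries(lock.packages ?? {})) {
    // Installed copies are keyed like "node_modules/<name>" (possibly nested).
    if (path === `node_modules/${name}` || path.endsWith(`/node_modules/${name}`)) {
      hits.push(path);
      continue;
    }
    const deps = { ...entry.dependencies, ...entry.optionalDependencies };
    if (name in deps) {
      // Record which entry declares the dependency; "" is the root project.
      hits.push(`${path || "(root)"} -> ${name}`);
    }
  }
  return hits;
}

// Illustrative data only, not the real lock file or real version ranges.
const sampleLock: PackageLock = {
  packages: {
    "": { dependencies: { repomix: "*" } },
    "node_modules/repomix": { dependencies: { tiktoken: "^1.0.0" } },
    "node_modules/tiktoken": {},
  },
};

console.log(findLockReferences(sampleLock, "tiktoken"));
```

An empty result for a given package name would suggest the lock file no longer references it, under the layout assumption stated above.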

Checklist

Run npm run test
Run npm run lint

🤖 Generated with Claude Code

coderabbitai · 2026-04-06T06:15:12Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 67124ae2-3868-4f5a-85ee-c031b76b4d40

Walkthrough

This pull request replaces all references to OpenAI's tiktoken library with gpt-tokenizer across documentation and bundling configuration. It updates configuration guides in multiple languages, removes tiktoken from external dependency lists, updates source code comments, and modifies the Dockerfile and bundle script so that tiktoken is no longer copied into the image or listed as external, since its pure-JS replacement gpt-tokenizer can be bundled.

Changes

Cohort / File(s)	Summary
Documentation - Configuration Guides `README.md`, `website/client/src/*/guide/configuration.md`	Updated `tokenCount.encoding` descriptions to reference OpenAI-compatible tokenization and `gpt-tokenizer` instead of OpenAI's `tiktoken`. Removed tiktoken links and references across English, German, Spanish, French, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese (BR), Russian, Turkish, Vietnamese, Simplified Chinese, and Traditional Chinese documentation.
Documentation - Bundling Guides `website/client/src/*/guide/development/using-repomix-as-a-library.md`	Removed `tiktoken` from external dependencies lists across all 15 language versions, leaving only `tinypool` as an unbundleable external dependency.
Source Code Comments `src/core/metrics/TokenCounter.ts`	Updated inline documentation to reference OpenAI-compatible encodings rather than tiktoken-specific behavior; no logic or control flow changes.
Bundling & Runtime Configuration `website/server/Dockerfile`, `website/server/scripts/bundle.mjs`	Removed the `tiktoken` copy instruction from the Docker runtime image build and removed `tiktoken` from Rolldown's external dependencies list; its replacement, `gpt-tokenizer`, is pure JS and is bundled.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs

PR #1065: Addresses the opposite concern—handling how tiktoken maintains external dependency status and WASM provisioning at runtime.
PR #1350: Contains related gpt-tokenizer implementation changes that complement this documentation and bundling migration.
PR #1075: Originally introduced tiktoken as an external dependency in bundling documentation and configuration; this PR reverses that decision by removing it.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: replacing tiktoken references with gpt-tokenizer across documentation and configuration files.
Description check	✅ Passed	The description covers all required changes comprehensively, includes a well-organized checklist of modifications across documentation, build infrastructure, and source code, and confirms both npm test and lint were run.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

cloudflare-workers-and-pages · 2026-04-06T06:16:20Z

Deploying repomix with Cloudflare Pages

Latest commit:	`8d8aefc`
Status:	⚡️ Build in progress...

View logs

gemini-code-assist

Code Review

This pull request completes the migration from tiktoken to gpt-tokenizer for token counting by cleaning up the remaining references. Key changes include regenerating package-lock.json, simplifying the comments in TokenCounter.ts (no logic changes), and updating documentation across all supported languages. Additionally, tiktoken has been removed from the build's external dependencies and Docker configuration since it is no longer required at runtime. I have no feedback to provide.

github-actions · 2026-04-06T06:17:30Z

⚡ Performance Benchmark

Latest commit:	`8d8aefc` chore(server): Update website server package-lock.json
Status:	✅ Benchmark complete!
Ubuntu:	1.50s (±0.04s) → 1.50s (±0.05s) · +0.00s (+0.0%)
macOS:	1.21s (±0.36s) → 1.22s (±0.28s) · +0.01s (+1.0%)
Windows:	1.80s (±0.13s) → 1.83s (±0.12s) · +0.03s (+1.8%)

Details

Packing the repomix repository with node bin/repomix.cjs
Warmup: 2 runs (discarded), interleaved execution
Measurement: 20 runs (30 on macOS), reported as median ± IQR
Workflow run
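
The "median ± IQR" summary and the percentage deltas above can be illustrated with a short sketch. It is not the benchmark workflow's actual code: the function names are hypothetical, the quartile method (linear interpolation) is an assumption, and the timings are made up rather than taken from this PR.

```typescript
// Sketch only: summarise benchmark samples as median and IQR, then compare two runs.
function quantile(sorted: number[], q: number): number {
  // Linear interpolation between the two nearest ranks (assumed method).
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function summarize(samples: number[]): { median: number; iqr: number } {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  return {
    median: quantile(sorted, 0.5),
    // Interquartile range: spread of the middle half of the samples.
    iqr: quantile(sorted, 0.75) - quantile(sorted, 0.25),
  };
}

function percentChange(base: number, head: number): number {
  return ((head - base) / base) * 100;
}

// Made-up timings in seconds, not the measurements from this PR.
const baseRun = summarize([1.48, 1.5, 1.51, 1.52, 1.49]);
const headRun = summarize([1.5, 1.53, 1.51, 1.52, 1.54]);
console.log(baseRun, headRun, percentChange(baseRun.median, headRun.median).toFixed(1) + "%");
```

Median and IQR are less sensitive to a single slow run than mean and standard deviation, which is a plausible reason to report them for noisy CI timings.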

History

54597b1 chore(server): Update website server package-lock.json

Ubuntu:	1.55s (±0.04s) → 1.54s (±0.03s) · -0.01s (-0.7%)
macOS:	0.97s (±0.07s) → 0.98s (±0.13s) · +0.01s (+1.2%)
Windows:	1.80s (±0.05s) → 1.81s (±0.04s) · +0.00s (+0.1%)

493035c docs(core): Replace tiktoken references with gpt-tokenizer

Ubuntu:	1.50s (±0.03s) → 1.50s (±0.03s) · +0.00s (+0.2%)
macOS:	1.04s (±0.13s) → 1.05s (±0.16s) · +0.01s (+0.5%)
Windows:	1.86s (±0.16s) → 1.82s (±0.12s) · -0.05s (-2.5%)

codecov · 2026-04-06T06:18:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.26%. Comparing base (ffe6770) to head (8d8aefc).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1413   +/-   ##
=======================================
  Coverage   87.26%   87.26%           
=======================================
  Files         117      117           
  Lines        4420     4420           
  Branches     1021     1021           
=======================================
  Hits         3857     3857           
  Misses        563      563
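
As a quick sanity check, the headline percentage can be recomputed from the Hits, Misses, and Lines rows of the diff. This is a sketch that assumes coverage here is simply hits divided by coverable lines; the variable names are illustrative.

```typescript
// Figures copied from the Codecov diff above.
const hits = 3857;
const misses = 563;
const lines = 4420;

// Every coverable line is counted as either a hit or a miss.
const consistent = hits + misses === lines;

// Coverage is the share of coverable lines that were hit.
const coveragePct = (hits / lines) * 100;

console.log(consistent, coveragePct.toFixed(2) + "%"); // true 87.26%
```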

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Update all documentation, build configuration, and source comments to reflect the completed migration from tiktoken to gpt-tokenizer.

- Update tokenCount.encoding description to say "OpenAI-compatible tokenization" with gpt-tokenizer link (README + 15 language docs)
- Remove tiktoken from bundling external deps list since gpt-tokenizer is pure JS (README + 15 language docs)
- Remove tiktoken COPY from Dockerfile and external from bundle.mjs
- Simplify source code comments removing migration artifacts
- Regenerate scripts/memory/package-lock.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-04-06T06:38:56Z

Code Review — Claude

Overall: Approve — This is a clean, thorough documentation and config cleanup PR. No logic changes, no regressions, safe to merge.

Findings

Verify gpt-tokenizer GitHub URL (all 16 doc files)

All documentation links point to https://github.com/nicolo-ribaudo/gpt-tokenizer. The canonical npm package gpt-tokenizer (v3.4.0) is historically maintained at https://github.com/niieani/gpt-tokenizer by niieani (Bazyli Brzóska). nicolo-ribaudo is primarily known as a Babel/TC39 contributor. Please verify this URL is correct — if it's a recent repo transfer, it's fine; if not, all 16 doc files need updating.

Detailed review notes

Code Quality — The TokenCounter.ts comment cleanup is well done. Removing the "old tiktoken behavior" migration artifacts is appropriate now that the migration is complete. The free() no-op comment and PLAIN_TEXT_OPTIONS explanation remain clear and accurate.

Security — No concerns. Removing tiktoken (native/WASM addon) from the Docker image and bundle externals actually reduces the supply-chain attack surface. The dynamic import path in TokenCounter.ts is constrained to the TOKEN_ENCODINGS const tuple — no path traversal risk.

Performance — The benchmark results confirm no regression. Bundling gpt-tokenizer inline (pure JS) instead of keeping it external is correct and should slightly reduce Docker image size vs. copying tiktoken's WASM binaries. Lazy loading via dynamic import is preserved.

Test Coverage — No new tests needed. Only comments and config changed; Codecov confirms all modified coverable lines are covered (87.26% project coverage).

Conventions — Commit messages follow Conventional Commits correctly. Documentation wording is consistent across the README and all 15 language variants.

Completeness — All tiktoken references in source, docs, and build config have been removed. The only remaining tiktoken mention is in historical release notes (.github/releases/), which is correct.
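
The Security note above relies on the encoding name being limited to a const tuple before it reaches a dynamic import. A minimal sketch of that pattern follows. The tuple contents, `isTokenEncoding`, and `encodingModulePath` are assumptions for illustration; the real TOKEN_ENCODINGS list and import path in TokenCounter.ts may differ.

```typescript
// Assumed list for illustration; the real TOKEN_ENCODINGS tuple lives in
// TokenCounter.ts and may contain different entries.
const TOKEN_ENCODINGS = ["o200k_base", "cl100k_base", "p50k_base", "r50k_base"] as const;
type TokenEncoding = (typeof TOKEN_ENCODINGS)[number];

// Type guard: only exact members of the tuple pass.
function isTokenEncoding(value: string): value is TokenEncoding {
  return (TOKEN_ENCODINGS as readonly string[]).includes(value);
}

// Hypothetical helper: build the module specifier only after validation, so
// arbitrary input can never add extra path segments to the import.
function encodingModulePath(value: string): string {
  if (!isTokenEncoding(value)) {
    throw new Error(`Unsupported token encoding: ${value}`);
  }
  return `gpt-tokenizer/encoding/${value}`;
}

console.log(encodingModulePath("o200k_base")); // gpt-tokenizer/encoding/o200k_base
```

Because the check is an exact membership test rather than a pattern match, a value such as `../../etc/passwd` is rejected before any module specifier is built.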

🤖 Generated with Claude Code

claude · 2026-04-06T06:40:07Z

Review — Claude

Verdict: Looks good ✅

Clean documentation and build config follow-up to the tiktoken → gpt-tokenizer migration (#1350). All 36 changed files are consistent and well-scoped.

Highlights

No logic changes — only comments, docs, and build config touched
All 15 language variants updated consistently
Build pipeline correctly simplified — tiktoken WASM no longer needs special handling in Dockerfile or bundle externals
Benchmarks show no regression across all platforms
Conventions fully followed (commit messages, PR checklist)
No stale tiktoken references remain — verified only historical release notes (v0.1.39) still mention it, which is correct

Detailed review notes

Code Quality: Comment simplifications in TokenCounter.ts are accurate. Removing "old tiktoken behavior" migration artifacts is appropriate cleanup.

Security: No concerns. Removing the tiktoken WASM COPY from Dockerfile reduces attack surface marginally.

Performance: Bundling gpt-tokenizer (pure JS) instead of externalizing tiktoken (WASM) is neutral to positive. CI benchmarks confirm.

Test Coverage: No new logic paths — 87.26% coverage maintained, all modified coverable lines covered.

Conventions: Both commits follow type(scope): Description format. PR body is well-organized with checked test/lint boxes.
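
The `type(scope): Description` shape mentioned in the Conventions note can be checked mechanically. The sketch below is an assumption about what such a check might look like, not a rule taken from the repository: `parseCommitTitle` is hypothetical, and the pattern is stricter than the Conventional Commits specification, which also allows an omitted scope.

```typescript
// Assumed pattern: lowercase type, parenthesised scope, capitalised description.
const COMMIT_PATTERN = /^([a-z]+)\(([a-z0-9-]+)\): ([A-Z].*)$/;

interface CommitTitle {
  type: string;
  scope: string;
  description: string;
}

// Returns the parsed parts, or null when the title does not match the shape.
function parseCommitTitle(title: string): CommitTitle | null {
  const match = COMMIT_PATTERN.exec(title);
  if (match === null) return null;
  const [, type, scope, description] = match;
  return { type, scope, description };
}

// The two commit titles that appear in this PR.
const titles = [
  "docs(core): Replace tiktoken references with gpt-tokenizer",
  "chore(server): Update website server package-lock.json",
];

for (const title of titles) {
  console.log(parseCommitTitle(title));
}
```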

🤖 Generated with Claude Code

gemini-code-assist bot reviewed Apr 6, 2026

coderabbitai bot approved these changes Apr 6, 2026

devin-ai-integration bot reviewed Apr 6, 2026

yamadashy and others added 2 commits April 6, 2026 15:37

chore(server): Update website server package-lock.json

8d8aefc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

yamadashy force-pushed the docs/remove-tiktoken-references branch from 54597b1 to 8d8aefc on April 6, 2026 06:37

yamadashy merged commit 01f5c1a into main Apr 6, 2026
54 of 55 checks passed

yamadashy deleted the docs/remove-tiktoken-references branch April 6, 2026 06:37
