docs(core): Replace tiktoken references with gpt-tokenizer#1413
Conversation
📝 Walkthrough: This pull request replaces all references to OpenAI's tiktoken with gpt-tokenizer.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request migrates the project from tiktoken to gpt-tokenizer for token counting. Key changes include updating dependencies in package-lock.json, revising the TokenCounter logic and comments, and updating documentation across all supported languages. Additionally, tiktoken has been removed from the build's external dependencies and Docker configuration since it is no longer required at runtime. I have no feedback to provide.
⚡ Performance Benchmark
History:
- 54597b1 chore(server): Update website server package-lock.json
- 493035c docs(core): Replace tiktoken references with gpt-tokenizer
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main    #1413   +/- ##
=======================================
  Coverage   87.26%   87.26%
=======================================
  Files         117      117
  Lines        4420     4420
  Branches     1021     1021
=======================================
  Hits         3857     3857
  Misses        563      563
```
Update all documentation, build configuration, and source comments to reflect the completed migration from tiktoken to gpt-tokenizer.

- Update tokenCount.encoding description to say "OpenAI-compatible tokenization" with gpt-tokenizer link (README + 15 language docs)
- Remove tiktoken from bundling external deps list since gpt-tokenizer is pure JS (README + 15 language docs)
- Remove tiktoken COPY from Dockerfile and external from bundle.mjs
- Simplify source code comments removing migration artifacts
- Regenerate scripts/memory/package-lock.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 54597b1 to 8d8aefc.
Code Review — Claude

Overall: Approve — This is a clean, thorough documentation and config cleanup PR. No logic changes, no regressions, safe to merge.

Findings

- Verify gpt-tokenizer GitHub URL (all 16 doc files): all documentation links point to …

Detailed review notes

- Code Quality — The …
- Security — No concerns. Removing tiktoken (native/WASM addon) from the Docker image and bundle externals actually reduces the supply-chain attack surface. The dynamic import path in …
- Performance — The benchmark results confirm no regression. Bundling gpt-tokenizer inline (pure JS) instead of keeping it external is correct and should slightly reduce Docker image size vs. copying tiktoken's WASM binaries. Lazy loading via dynamic import is preserved.
- Test Coverage — No new tests needed. Only comments and config changed; Codecov confirms all modified coverable lines are covered (87.26% project coverage).
- Conventions — Commit messages follow Conventional Commits correctly. Documentation wording is consistent across all 16 language variants.
- Completeness — All tiktoken references in source, docs, and build config have been removed. The only remaining tiktoken mention is in historical release notes (…).

🤖 Generated with Claude Code
Review — Claude

Verdict: Looks good ✅ — Clean documentation and build config follow-up to the tiktoken → gpt-tokenizer migration (#1350). All 36 changed files are consistent and well-scoped.

Detailed review notes

- Code Quality: Comment simplifications in …
- Security: No concerns. Removing the tiktoken WASM COPY from Dockerfile reduces attack surface marginally.
- Performance: Bundling gpt-tokenizer (pure JS) instead of externalizing tiktoken (WASM) is neutral to positive. CI benchmarks confirm.
- Test Coverage: No new logic paths — 87.26% coverage maintained, all modified coverable lines covered.
- Conventions: Both commits follow …

🤖 Generated with Claude Code
Update all documentation, build configuration, and source comments to reflect the completed migration from tiktoken to gpt-tokenizer.
Changes
Documentation (README + 15 language variants):
- `tokenCount.encoding` description: replaced tiktoken-specific wording with "OpenAI-compatible tokenization" and linked to gpt-tokenizer
- Removed `tiktoken` from the bundling external dependencies list — gpt-tokenizer is pure JS, no longer needs to be external

Build infrastructure:

- Removed the `tiktoken` COPY line from `website/server/Dockerfile`
- Removed `tiktoken` from the external array in `website/server/scripts/bundle.mjs`

Source code:

- `TokenCounter.ts` — remove "old tiktoken behavior" migration artifacts

Lock files:

- Regenerated `scripts/memory/package-lock.json` to remove the stale tiktoken reference
- `website/server/package-lock.json` still references tiktoken via the GitHub-linked repomix dependency — will resolve automatically after this merges and `npm install` is re-run

Checklist

- `npm run test`
- `npm run lint`

🤖 Generated with Claude Code
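With tiktoken gone, the esbuild external list no longer needs a tokenizer entry. A hedged sketch of what a bundle script like `website/server/scripts/bundle.mjs` might look like after the change; the entry point, output path, and remaining options are assumptions, not the repository's actual configuration:

```javascript
// Hypothetical bundle script after the migration. Since gpt-tokenizer is
// pure JavaScript, esbuild can inline it; only native/WASM addons would
// still need to be listed in `external`. All paths and options here are
// illustrative assumptions.
import { build } from 'esbuild';

await build({
  entryPoints: ['src/index.ts'], // assumed entry point
  outfile: 'dist/index.mjs',     // assumed output path
  bundle: true,
  platform: 'node',
  format: 'esm',
  // Before this PR a tokenizer with WASM bindings had to stay external
  // (external: ['tiktoken']); with gpt-tokenizer nothing is required here.
  external: [],
});
```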