refactor(metrics): Replace tiktoken with gpt-tokenizer#1245

Closed
yamadashy wants to merge 8 commits into main from refactor/replace-tiktoken-with-gpt-tokenizer

Conversation

@yamadashy (Owner) commented Mar 19, 2026

Replace the WASM-based tiktoken library with gpt-tokenizer, a pure JavaScript BPE tokenizer implementation. This eliminates the native/WASM binary dependency while maintaining identical token count results across all encodings.

Changes

  • gpt-tokenizer added as production dependency, tiktoken fully removed
  • New TokenEncoding type replaces TiktokenEncoding from tiktoken
  • TokenCounter changed to async factory (static async create()) using resolveEncodingAsync to dynamically import only the needed BPE encoding data (~2.2MB for o200k_base) instead of all encodings (~4.1MB)
  • tokenCounterFactory / calculateMetricsWorker updated for async initialization
  • Encoding validation added to config schema (z.enum() instead of unchecked string cast)
  • Updated website/server Dockerfile and bundle script for gpt-tokenizer
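
The async-factory change can be sketched as follows. This is a minimal illustration of the pattern, not the PR's actual code: `loadEncoding` is a toy stand-in for gpt-tokenizer's `resolveEncodingAsync` dynamic import, and the tokenizer logic is deliberately fake.

```typescript
type TokenEncoding = 'o200k_base' | 'cl100k_base';

interface Encoding {
  encode(text: string): number[];
}

// Hypothetical lazy loader: in the real PR this dynamically imports only the
// requested BPE encoding data instead of all encodings.
async function loadEncoding(name: TokenEncoding): Promise<Encoding> {
  return {
    // Toy tokenizer: one "token" per whitespace-separated word.
    encode: (text: string) => text.split(/\s+/).filter(Boolean).map((_, i) => i),
  };
}

class TokenCounter {
  // Private constructor forces creation through the async factory.
  private constructor(private readonly encoding: Encoding) {}

  static async create(name: TokenEncoding): Promise<TokenCounter> {
    const encoding = await loadEncoding(name);
    return new TokenCounter(encoding);
  }

  countTokens(content: string): number {
    return this.encoding.encode(content).length;
  }
}
```

Callers such as the token counter factory then `await TokenCounter.create(...)` once per encoding instead of constructing synchronously.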

End-to-End Benchmark (full repository)

Environment | tiktoken (main) | gpt-tokenizer (this PR) | Diff
Linux x86 (median, 7 runs) | 7903 ms | 4186 ms | -47%
macOS M2 (mean ± σ, hyperfine, 10 runs) | 1.353 s ± 0.069 s | 1.355 s ± 0.045 s | same (1.00x)

Substantially faster on Linux (CI, Cloud Run); on par on macOS (local CLI).

Encoding Compatibility

Encoding | tiktoken | gpt-tokenizer | Notes
o200k_base | yes | yes | Default; GPT-4o / o1 / o3
cl100k_base | yes | yes | GPT-4, GPT-3.5-turbo
p50k_base | yes | yes | Legacy
p50k_edit | yes | yes | Legacy
r50k_base | yes | yes | Legacy
o200k_harmony | no | yes | Added: open-weight models
gpt2 | yes | no | Dropped: very old, not used by modern LLMs

The only encoding lost is gpt2 (GPT-2 era), which is not used by any current model.

Benefits

  • No WASM/native binary dependency - simpler build and deployment
  • No explicit resource cleanup (free()) needed
  • Lazy loading of BPE data via dynamic import (only loads requested encoding)
  • Runtime validation of encoding names in config
  • Significant speedup on Linux x86 (Cloud Run / CI)
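
The runtime-validation point can be illustrated without zod. The PR itself uses `z.enum(tokenEncodings)`; the following dependency-free sketch shows the same idea of deriving the type from a single runtime list so the two cannot drift apart.

```typescript
// The encoding list is the single source of truth; the union type is derived
// from it, so the compile-time type and the runtime check stay in sync.
const tokenEncodings = [
  'o200k_base',
  'o200k_harmony',
  'cl100k_base',
  'p50k_base',
  'p50k_edit',
  'r50k_base',
] as const;

type TokenEncoding = (typeof tokenEncodings)[number];

// Narrow an untrusted config value to TokenEncoding, or fail with a clear message.
function parseTokenEncoding(value: string): TokenEncoding {
  if ((tokenEncodings as readonly string[]).includes(value)) {
    return value as TokenEncoding;
  }
  throw new Error(
    `Invalid token encoding "${value}". Expected one of: ${tokenEncodings.join(', ')}`,
  );
}
```

With zod, the same check collapses to `z.enum(tokenEncodings)` inside the config schema.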

Checklist

  • Run npm run test
  • Run npm run lint


@codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 87.24%. Comparing base (ad7abc9) to head (37f632b).
⚠️ Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
src/core/metrics/tokenCounterFactory.ts | 87.50% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1245      +/-   ##
==========================================
+ Coverage   87.18%   87.24%   +0.05%     
==========================================
  Files         115      116       +1     
  Lines        4324     4328       +4     
  Branches     1002     1004       +2     
==========================================
+ Hits         3770     3776       +6     
+ Misses        554      552       -2     

☔ View full report in Codecov by Sentry.


@cloudflare-workers-and-pages bot commented Mar 19, 2026

Deploying repomix with Cloudflare Pages

Latest commit: 37f632b
Status: ✅  Deploy successful!
Preview URL: https://92d7f332.repomix.pages.dev
Branch Preview URL: https://refactor-replace-tiktoken-wi.repomix.pages.dev


@devin-ai-integration bot (Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.



@claude bot (Contributor) commented Mar 20, 2026

PR Review: refactor(metrics): Replace tiktoken with gpt-tokenizer

Overall this is a well-executed migration with impressive benchmark results. The pre-built encoding sharing via structured clone is a clever optimization. A few items to consider:

Fragility of Internal API Access

The encodingCache.ts module accesses gpt-tokenizer internals via type assertions (bytePairEncodingCoreProcessor, bytePairRankDecoder, etc.). This couples tightly to implementation details that can change without notice in a minor/patch release.

Details

Both preBuildEncodingData and restoreEncodingFromData rely on internal property names like bytePairEncodingCoreProcessor, bytePairStringRankEncoder, mergeCacheSize, etc. These are not part of gpt-tokenizer's public API.

Mitigations to consider:

  • Pin gpt-tokenizer to an exact version (e.g., "3.4.0" instead of "^3.4.0") to prevent silent breakage on update
  • Add a smoke test that validates the pre-build/restore round-trip produces correct token counts (not just mocked)
  • Add a comment documenting which gpt-tokenizer version these internals were verified against
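
The suggested non-mocked round-trip check can be sketched as a golden-count test. Everything here is illustrative: `countTokens` is a toy stand-in for the real gpt-tokenizer-backed counter, and the golden table holds counts recorded from the stand-in, not real tiktoken outputs. A real test would pin counts produced by the actual tokenizer once and assert they never change.

```typescript
// Toy stand-in tokenizer: one token per whitespace-separated word. A real
// test would call the gpt-tokenizer-backed TokenCounter here instead.
function countTokens(content: string): number {
  return content.split(/\s+/).filter(Boolean).length;
}

// Golden table: input → expected token count (recorded once, then pinned).
const goldenCounts: ReadonlyArray<[string, number]> = [
  ['hello world', 2],
  ['const x = 1;', 4],
  ['', 0],
];

// Returns a list of human-readable failures; empty means all counts match.
function runGoldenTest(): string[] {
  const failures: string[] = [];
  for (const [input, expected] of goldenCounts) {
    const actual = countTokens(input);
    if (actual !== expected) {
      failures.push(`"${input}": expected ${expected}, got ${actual}`);
    }
  }
  return failures;
}
```

Such a test would catch silent breakage if a dependency update changed tokenization behavior.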

TokenEncoding Type Maintenance

The hand-maintained TokenEncoding union type in tokenEncoding.ts could drift from what gpt-tokenizer actually supports.

Details

If gpt-tokenizer adds or removes an encoding, this type won't reflect it. Consider deriving it from the library if possible, or adding a comment noting it must be kept in sync.

Also, o200k_harmony is listed — verify this encoding is actually supported by gpt-tokenizer v3.4.0.

extraWorkerData Spreading Behavior

In processConcurrency.ts:88, extraWorkerData is spread into workerData with ...extraWorkerData. If extraWorkerData contained a workerType or logLevel key, it would silently override the required fields.

Details

The spread is currently:

workerData: {
  workerType,
  logLevel: logger.getLogLevel(),
  ...extraWorkerData,
},

Consider spreading extraWorkerData first, or validating that it doesn't conflict:

workerData: {
  ...extraWorkerData,
  workerType,
  logLevel: logger.getLogLevel(),
},

This is low risk since the caller is internal, but defensive ordering is cheap.

freeTokenCounters Is Now Misleading

The function logs "Freed TokenCounter resources" but does nothing (since free() is a no-op). Consider simplifying to just tokenCounters.clear() without the loop/log, or updating the log message.

bytePairRankDecoder Typed as unknown

In the EncodingData interface, bytePairRankDecoder is typed unknown while all other fields have specific types. If the actual type is known, specifying it would improve type safety and catch serialization issues earlier.

CI Status

Most checks are passing. Some Windows and macOS tests are still pending — worth confirming those pass before merge, especially since this removes a native/WASM dependency which could have platform-specific behavior.

Premortem Analysis

Scenario | Risk | Mitigation
gpt-tokenizer patch release changes internal property names | High — silent breakage in pre-build path | Pin exact version; add integration test for round-trip
Structured clone fails for RegExp or Map in certain Node versions | Low — structured clone supports these since Node 17+ | Already require Node 20+
child_process runtime doesn't receive workerData | Medium — pre-built data silently unused | Already handled: falls back to slow path. But no warning is logged when preBuiltEncodingData is undefined in worker_threads mode
Token counts diverge from tiktoken for edge cases | Low — benchmarks show 100% match | Consider keeping a small set of golden-file token count tests

Summary

Strong PR — the performance wins are substantial and well-documented. The main concern is the fragility of reaching into gpt-tokenizer internals for the pre-build optimization. Pinning the dependency version and adding a round-trip integration test would significantly reduce the risk of silent breakage.

@claude bot (Contributor) commented Mar 20, 2026

PR Review: refactor(metrics): Replace tiktoken with gpt-tokenizer (Updated)

This is a clean, well-executed migration. The final state after the 5-commit iteration — removing the pre-built encoding cache and adding z.enum() validation — lands in a good place. The code is simpler and more maintainable than the intermediate states.

What looks good

  • z.enum(tokenEncodings) in configSchema.ts — addresses the earlier CodeRabbit concern about unchecked string casts. Runtime validation now catches invalid encoding names at config parse time.
  • Simplified freeTokenCounters — correctly reduced to just tokenCounters.clear() with an accurate log message. No misleading "freed resources" language.
  • Removal of encodingCache.ts — good call removing the internal API access to gpt-tokenizer. The fragility wasn't worth the ~120ms improvement given Tinypool reuses workers.
  • Tests updated properly — mocks correctly target the new gpt-tokenizer/GptEncoding and gpt-tokenizer/resolveEncoding subpath imports.

Items to verify

1. o200k_harmony encoding validity

The tokenEncodings array in tokenEncoding.ts includes o200k_harmony. This encoding isn't widely documented. Please verify it's actually supported by gpt-tokenizer@3.4.0 — if it's not, users who specify it in their config will pass Zod validation but hit a runtime error in GptEncoding.getEncodingApi().

A quick smoke test (await TokenCounter.create('o200k_harmony')) would confirm.

2. Base schema still accepts any string for encoding

The repomixConfigBaseSchema (line 70-74) still uses z.string().optional() for tokenCount.encoding, while repomixConfigDefaultSchema uses z.enum(tokenEncodings). This is likely intentional (loose validation for file configs, strict for defaults), but it means an invalid encoding in repomix.config.json could pass repomixConfigBaseSchema validation but fail when merged with defaults. Worth confirming this is the desired behavior or if the base schema should also validate.

3. new Set() created on every encode call

In TokenCounter.ts:28, { disallowedSpecial: new Set() } allocates a new Set on every countTokens call. For high-throughput paths (many files), this creates unnecessary GC pressure. Consider hoisting it to a module-level constant:

const ENCODE_OPTIONS = { disallowedSpecial: new Set<string>() };
// ...
return this.encoding.encode(content, ENCODE_OPTIONS).length;

Minor optimization, but free and consistent with the performance focus of this PR.

4. Dockerfile comment still says "WASM files"

website/server/Dockerfile:31 comment says "Copy bundled server and WASM files" but tiktoken WASM is no longer used. The comment on line 35 was correctly updated, but line 31 still references WASM.

Premortem

Scenario | Risk | Notes
gpt-tokenizer major version bump changes GptEncoding.getEncodingApi signature | Medium | Standard semver risk. ^3.4.0 range is fine.
o200k_harmony not supported → runtime crash | Medium | Only if a user explicitly configures it. Verify once.
Token counts diverge from tiktoken on edge cases not covered by benchmarks | Low | Benchmarks show 100% match. Consider a golden-file test with a few known inputs if paranoia warrants it.

Summary

Clean migration with significant performance wins. The main open question is verifying o200k_harmony support. Everything else looks solid and ready to merge.


Review by Claude

@claude bot (Contributor) commented Mar 20, 2026

PR Review: refactor(metrics): Replace tiktoken with gpt-tokenizer (Review #3)

Great iteration — the final state is clean and well-structured. Previous concerns about encodingCache.ts internal API fragility, extraWorkerData spreading, and misleading log messages have all been addressed. Here's what remains:

Remaining items from prior reviews

1. o200k_harmony — please verify it works

tokenEncoding.ts lists o200k_harmony and it passes z.enum() validation, but if gpt-tokenizer@3.4.0 doesn't actually support it, users who configure it will get a runtime error from GptEncoding.getEncodingApi(). A quick manual test (await TokenCounter.create('o200k_harmony')) would confirm.

2. Base schema accepts any string for encoding

repomixConfigBaseSchema (line 70-72) uses z.string().optional() for tokenCount.encoding, while repomixConfigDefaultSchema uses z.enum(tokenEncodings). This means an invalid encoding like "banana" passes the base schema but fails when merged. This is likely intentional (loose file parsing, strict defaults), but worth confirming the merge logic handles the mismatch gracefully with a clear error message.

3. new Set() allocated on every countTokens call

In TokenCounter.ts:28, { disallowedSpecial: new Set() } creates a new Set per call. For repos with thousands of files, this adds unnecessary GC pressure. Trivial fix — hoist to a module-level constant:

const ENCODE_OPTIONS = { disallowedSpecial: new Set<string>() } as const;

4. Dockerfile line 30 still references WASM

website/server/Dockerfile:30 comment says "Copy bundled server and WASM files" — the tiktoken WASM reference is stale. Line 35's comment was correctly updated.

New observation

5. REPOMIX_WASM_DIR env var in Dockerfile

website/server/Dockerfile:47 still sets REPOMIX_WASM_DIR=/app/dist-bundled/wasm. If this was only used by tiktoken, it's now dead config. If it's used by tree-sitter WASM, ignore this — but the name could be clarified.

Summary

This is a solid migration with clear performance wins and good test coverage. The code is simpler and more maintainable than the tiktoken version. The items above are all minor — verifying o200k_harmony support is the most important one before merge.


Review by Claude

@yamadashy (Owner, Author) commented:

Verification results on a MacBook Pro M2

Running repomix locally (MacBook Pro M2) confirmed that the gpt-tokenizer version is actually slower.

Metric | tiktoken (main) | gpt-tokenizer (this PR)
repomix run time | 1.5–1.7 s | 1.8–2.0 s
TokenCounter init | ~200 ms | ~100 ms

TokenCounter initialization is faster, but overall run time is roughly 200–300 ms slower.

Suspected cause

In the CI environment (Linux x86), gpt-tokenizer (pure JS) was faster than tiktoken (WASM/Rust), but on Apple Silicon the result reverses. Likely reasons:

  • tiktoken's WASM is compiled from Rust and runs very efficiently on the M2's high-performance WASM execution engine
  • a pure-JS BPE implementation depends on V8's JIT and cannot reach near-native WASM speed

Regressing on users' local machines would defeat the purpose, so I'm closing this PR and proceeding with keeping tiktoken (#1243).

@claude bot (Contributor) commented Mar 20, 2026

PR Review: refactor(metrics): Replace tiktoken with gpt-tokenizer (Final)

Assessment

The code changes themselves are well-executed — clean migration, proper async factory pattern, good test updates, and the z.enum() validation improvement is a genuine win. However, the maintainer's real-world benchmarking on Apple Silicon showing 200-300ms regression is the decisive factor here.

Agreement with closure decision

The CI benchmark results (4.8-14.8x faster) were misleading because they measured single-thread encoding throughput on Linux x86, not end-to-end wall time on the platforms users actually run Repomix on. tiktoken's Rust-compiled WASM benefits significantly from Apple Silicon's WASM execution engine, which pure JS BPE can't match despite V8 JIT.

Worth preserving from this PR

One improvement from this PR that could be backported to the tiktoken-based code:

  • z.enum() validation for encoding — the current main branch still uses .transform((val) => val as TiktokenEncoding) which is an unchecked cast. Adding runtime validation with z.enum() against known tiktoken encodings would catch invalid config values early. This could be a small standalone PR.

Review by Claude


@claude bot (Contributor) commented Mar 20, 2026

PR Review: refactor(metrics): Replace tiktoken with gpt-tokenizer

Status: Agree with closure decision

The maintainer's Apple Silicon benchmarks (200-300ms regression) are the decisive factor. CI benchmarks on Linux x86 were misleading for real-world usage — tiktoken's Rust-compiled WASM runs very efficiently on Apple Silicon's WASM engine, outperforming pure JS BPE despite V8 JIT.

Code quality (for the record)

The implementation is well-executed: clean async factory pattern, proper z.enum() validation, correctly updated tests and Dockerfile. No bugs or security concerns.

Backport suggestion

The z.enum(tokenEncodings) validation replacing the unchecked .transform((val) => val as TiktokenEncoding) cast is a genuine improvement worth backporting to the tiktoken codebase as a standalone PR.


Review by Claude

@devin-ai-integration bot (Contributor) left a comment

Devin Review found 1 new potential issue.

View 8 additional findings in Devin Review.

"fast-xml-parser": "^5.4.1",
"git-url-parse": "^16.1.0",
"globby": "^16.1.1",
"gpt-tokenizer": "^3.4.0",
@devin-ai-integration bot (Contributor) commented:
🔴 README.md not updated after tiktoken → gpt-tokenizer migration (CONTRIBUTING.md violation)

CONTRIBUTING.md requires: "You have updated relevant documentation (especially README.md) if you've added or changed functionality." This PR replaces tiktoken with gpt-tokenizer but does not update the README.md, which still contains two now-incorrect references to tiktoken:

  • README.md:1360 describes tokenCount.encoding as using "OpenAI's tiktoken tokenizer" and links to tiktoken's GitHub/model.py.
  • README.md:1791 lists tiktoken as an external bundling dependency that "Loads WASM files dynamically at runtime" — but gpt-tokenizer is pure JavaScript and does not use WASM.

Both references are factually incorrect after this change and will mislead users.

Prompt for agents
Update README.md in two places to reflect the migration from tiktoken to gpt-tokenizer:

1. README.md line 1360: Change the tokenCount.encoding description from referencing tiktoken to referencing gpt-tokenizer. Replace the tiktoken links with appropriate gpt-tokenizer references. For example: "Token count encoding (e.g., o200k_base for GPT-4o, cl100k_base for GPT-4/3.5)."

2. README.md line 1791: Change the external bundling dependency from "tiktoken - Loads WASM files dynamically at runtime" to "gpt-tokenizer - Loads encoding data files at runtime" (since gpt-tokenizer is pure JS, not WASM-based).

yamadashy and others added 8 commits March 21, 2026 01:03
…ting

Replace the WASM-based tiktoken library with gpt-tokenizer, a pure JavaScript
BPE tokenizer implementation. This eliminates the native/WASM binary dependency
while maintaining identical token count results across all encodings.

Key changes:
- Replace tiktoken with gpt-tokenizer in production dependencies
- Move tiktoken to devDependencies (retained for benchmark comparison)
- Introduce TokenEncoding type to replace TiktokenEncoding from tiktoken
- Simplify TokenCounter by removing explicit free() resource management
  (gpt-tokenizer uses standard JS garbage collection)
- Add benchmark script (npm run benchmark-tokenizer) comparing both libraries

Benchmark results show gpt-tokenizer is 4.8-14.8x faster for encoding and
2.5x faster for initialization, with 100% token count consistency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-build the BPE encoder Map (200K entries, ~60-90ms) once in the main
thread and pass it to workers via workerData structured clone. Workers
restore the encoding instance by directly assigning the pre-built data,
bypassing the expensive BytePairEncodingCore constructor entirely.

- Add encodingCache.ts with preBuildEncodingData/restoreEncodingFromData
- Add extraWorkerData support to WorkerOptions/createWorkerPool
- calculateMetrics pre-builds encoding before creating the worker pool
- Workers detect and use pre-built data from workerData when available
- Add worker pool benchmark script comparing all three approaches

Worker init: 63ms → 0.037ms (~1700x faster)
E2E wall time: 20% faster than tiktoken WASM, 27% faster than scratch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pendency

The benchmarks served their purpose for the migration decision.
Remove them along with the tiktoken devDependency to keep
the install lighter (no more WASM binary download).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eferences

- Fix pre-built encoding data not being read in workers: Tinypool wraps
  workerData as [tinypoolPrivateData, userWorkerData], so access via
  workerData[1] is required (matching the pattern in setLogLevelByWorkerData)
- Update website/server/Dockerfile to copy gpt-tokenizer instead of tiktoken
- Update website/server/scripts/bundle.mjs external dependency list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… validation

Remove the encodingCache.ts module that accessed gpt-tokenizer internal
APIs to pre-build and share encoding data across workers. The complexity
and fragility of depending on private properties is not worth the ~120ms
E2E improvement — workers are pooled by Tinypool and pay init cost once.

Also:
- Add z.enum() validation for encoding names in configSchema (replaces
  unchecked string cast that could cause runtime errors)
- Export tokenEncodings array from tokenEncoding.ts for schema reuse
- Simplify freeTokenCounters to just clear the cache
- Remove extraWorkerData from WorkerOptions (no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch from resolveEncoding (synchronous, imports all 4 encoding data
files ~4.1MB) to resolveEncodingAsync (dynamic import, loads only the
requested encoding). This avoids parsing unused BPE data on every
worker thread.

- TokenCounter: change constructor to private, add static async create()
  factory that uses resolveEncodingAsync for lazy loading
- tokenCounterFactory: make getTokenCounter async
- calculateMetricsWorker: await getTokenCounter()
- Update tests for async factory pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a double-check after await to prevent duplicate TokenCounter
creation if two concurrent calls interleave at the await point.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yamadashy force-pushed the refactor/replace-tiktoken-with-gpt-tokenizer branch from 62c7261 to 37f632b on March 20, 2026 16:03
@github-actions bot (Contributor) commented Mar 20, 2026

⚡ Performance Benchmark

Packing the repomix repository with node bin/repomix.cjs

Runner | PR | main | Diff
Ubuntu | 2.42s (±0.03s) | 2.55s (±0.02s) | -0.13s (-5.2%)
macOS | 1.94s (±0.14s) | 1.85s (±0.12s) | +0.09s (+4.9%)
Windows | 2.95s (±0.05s) | 3.03s (±0.06s) | -0.08s (-2.7%)
Details
  • Warmup: 2 runs (discarded)
  • Measurement: 10 runs (median)
  • ±: IQR (interquartile range) — the spread of the middle 50% of measurements
  • Workflow run

@claude bot (Contributor) commented Mar 20, 2026

PR Review: refactor(metrics): Replace tiktoken with gpt-tokenizer

Overall this is a clean, well-executed refactoring. The migration from tiktoken (WASM) to gpt-tokenizer (pure JS) reduces build complexity and eliminates native/WASM binary management. Up to 47 percent faster on Linux x86 with parity on macOS.

Highlights: Async factory pattern with lazy BPE loading, z.enum config validation replacing unsafe cast, thorough test updates, proper Dockerfile and bundle config changes.

Issues:

1. Race condition in tokenCounterFactory.ts - Multiple concurrent calls independently call TokenCounter.create(). Consider a Promise-based cache to deduplicate in-flight creation.

2. Base config schema (pre-existing) - tokenCount.encoding is z.string().optional() in base schema; consider z.enum for earlier validation.

3. Minor: free() no-op - Since TokenCounter is internal, the no-op could be removed entirely.
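
The Promise-based cache suggested in item 1 can be sketched as follows. The names mirror the PR's tokenCounterFactory, but this is an illustrative shape under assumed types, not the actual code; the key idea is caching the in-flight Promise rather than the resolved counter, so concurrent callers share one creation.

```typescript
type Encoding = string;

class TokenCounter {
  // Counts how many times create() ran, to demonstrate deduplication.
  static created = 0;

  private constructor(readonly encoding: Encoding) {}

  static async create(encoding: Encoding): Promise<TokenCounter> {
    TokenCounter.created++;
    // Simulate the async BPE-data load with a microtask-yielding delay.
    await new Promise((resolve) => setTimeout(resolve, 0));
    return new TokenCounter(encoding);
  }
}

const counterPromises = new Map<Encoding, Promise<TokenCounter>>();

function getTokenCounter(encoding: Encoding): Promise<TokenCounter> {
  // Store the Promise, not the resolved value: calls that arrive before the
  // first create() settles all receive the same in-flight Promise, so no
  // double-check after the await point is needed.
  let promise = counterPromises.get(encoding);
  if (!promise) {
    promise = TokenCounter.create(encoding);
    counterPromises.set(encoding, promise);
  }
  return promise;
}
```

Compared with the check-await-recheck approach, this removes the interleaving window entirely because the cache is populated synchronously before any await.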

Premortem: Low risk. Main concern is concurrent init (item 1). Dropping gpt2 unlikely to affect users. Dynamic imports properly externalized.

Verdict: Approve - Solid refactoring with good benchmarks and clean API design. Race condition in factory is the only actionable item.

@yamadashy yamadashy closed this Mar 20, 2026