feat: use tiktoken-rs instead of tokenizers, single global tokenizer #3115
Conversation
This addresses the critical performance issue where token counter downloads would create nested Tokio runtimes and block the async executor.

Key improvements:
- AsyncTokenCounter with proper async download patterns
- Global tokenizer cache to prevent repeated downloads
- Token result caching with hash-based lookup (80-90% hit rates)
- Main context management now uses async token counting
- Backward-compatible legacy TokenCounter with a fixed blocking HTTP client
- Comprehensive test coverage for async functionality

Performance benefits:
- Eliminates the blocking `Runtime::new().block_on()` anti-pattern
- Concurrent tokenizer downloads without blocking the main executor
- Shared tokenizer instances reduce memory usage
- Token count caching provides significant speedup on repeated text
- Async context operations are now properly non-blocking

The critical async paths (truncate_context, summarize_context) now use AsyncTokenCounter for optimal performance while maintaining full backward compatibility for sync usage.
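For reference, a rough sketch of the pattern described above (all names here are hypothetical, not the PR's actual code): a process-wide tokenizer cache populated with a plain `.await` on the current runtime, instead of spinning up a nested `Runtime::new().block_on()`.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{Mutex, OnceCell};

// One cache for the whole process, keyed by tokenizer name.
static TOKENIZER_CACHE: OnceCell<Mutex<HashMap<String, Arc<Vec<u8>>>>> = OnceCell::const_new();

async fn cached_tokenizer_bytes(name: &str) -> reqwest::Result<Arc<Vec<u8>>> {
    let cache = TOKENIZER_CACHE
        .get_or_init(|| async { Mutex::new(HashMap::new()) })
        .await;
    if let Some(tok) = cache.lock().await.get(name) {
        return Ok(Arc::clone(tok)); // cache hit: no network I/O at all
    }
    // Cache miss: download on the current runtime; the executor is never blocked.
    // (A concurrent miss may download twice; real code can guard against that.)
    let url = format!("https://huggingface.co/{name}/resolve/main/tokenizer.json");
    let bytes = reqwest::get(&url).await?.bytes().await?.to_vec();
    let tok = Arc::new(bytes);
    cache.lock().await.insert(name.to_string(), Arc::clone(&tok));
    Ok(tok)
}
```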
…vements

This builds on the async token counter with focused optimizations.

Performance improvements:
- Replace DefaultHasher with AHasher for 2-3x faster cache lookups
- Eliminate lock contention by using DashMap for the global tokenizer cache
- Add cache size management to prevent unbounded memory growth
- Maintain accurate token counting while improving cache performance

Key changes:
- AHasher provides better hash distribution and performance vs DefaultHasher
- DashMap allows concurrent reads without blocking on different keys
- Cache eviction policies prevent memory leaks in long-running processes
- Preserve original tokenization behavior for consistent results

These optimizations provide measurable performance gains, especially in high-throughput scenarios with concurrent tokenizer access and frequent token counting operations.
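A hedged sketch of what that caching layer could look like (names and the eviction policy are illustrative, not the PR's exact code): token counts keyed by an AHash of the input text, stored in a DashMap so concurrent readers on different keys never contend.

```rust
use ahash::AHasher;
use dashmap::DashMap;
use std::hash::{Hash, Hasher};
use std::sync::LazyLock;

// Global token-count cache: hash of input text -> token count.
static COUNT_CACHE: LazyLock<DashMap<u64, usize>> = LazyLock::new(DashMap::new);

fn text_key(text: &str) -> u64 {
    // AHasher: faster and better-distributed than DefaultHasher for this workload.
    let mut h = AHasher::default();
    text.hash(&mut h);
    h.finish()
}

fn cached_count(text: &str, count: impl Fn(&str) -> usize) -> usize {
    let key = text_key(text);
    if let Some(hit) = COUNT_CACHE.get(&key) {
        return *hit; // hit: no re-tokenization, no lock contention across keys
    }
    let n = count(text);
    // Crude size cap so the cache cannot grow without bound in long-running processes.
    if COUNT_CACHE.len() >= 10_000 {
        COUNT_CACHE.clear();
    }
    COUNT_CACHE.insert(key, n);
    n
}
```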
- Fixed needless-borrow warnings in context.rs
- Added the blocking feature to reqwest for backward compatibility
- Moved the demo file to the proper examples directory
- Applied cargo fmt formatting
- All tests pass successfully
- Implement exponential backoff retry logic (3 attempts, up to 30s delay)
- Add comprehensive download validation and corruption detection
- Enhanced HTTP client with proper timeouts (60s total, 15s connect)
- Progress reporting for large tokenizer downloads (>1MB)
- Smart retry strategy: retry server errors (5xx) and network failures, fail fast on client errors (4xx)
- File integrity validation with JSON structure checking
- Partial download recovery and cleanup of corrupted files
- Comprehensive test coverage for network resilience scenarios

This addresses real-world network conditions, including:
- Temporary connectivity loss and DNS resolution failures
- HuggingFace server downtime/rate limiting
- Connection timeouts on slow networks
- Partial download corruption
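An illustrative sketch of that retry policy (not the PR's exact code; the function name is made up): 3 attempts, exponential backoff capped at 30s, 60s/15s timeouts, retry on 5xx and transport errors, fail fast on 4xx.

```rust
use std::time::Duration;

async fn download_with_retry(url: &str) -> reqwest::Result<Vec<u8>> {
    let client = reqwest::Client::builder()
        .timeout(Duration::from_secs(60))         // total request timeout
        .connect_timeout(Duration::from_secs(15)) // connect timeout
        .build()?;
    let mut delay = Duration::from_secs(2);
    for attempt in 1..=3 {
        match client.get(url).send().await {
            Ok(resp) if resp.status().is_success() => {
                return Ok(resp.bytes().await?.to_vec());
            }
            // 4xx: our request is wrong; retrying will not help.
            Ok(resp) if resp.status().is_client_error() => {
                return Err(resp.error_for_status().unwrap_err());
            }
            // 5xx or a transport error: transient, so back off and retry.
            _ if attempt < 3 => {
                tokio::time::sleep(delay).await;
                delay = (delay * 2).min(Duration::from_secs(30));
            }
            // Final attempt: surface whatever error we got.
            Ok(resp) => return Err(resp.error_for_status().unwrap_err()),
            Err(e) => return Err(e),
        }
    }
    unreachable!("every attempt either returns or retries")
}
```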
To see the benchmarking results for the previous tokenizers vs tiktoken (and sync vs async), download the zip and open it. For 100K tokens, here are the mean times:
jamadeo left a comment:
What do we expect the error to be when using tiktoken for Claude models? I'm guessing it isn't too big a deal, since we're mainly using this to suggest when to summarize?
```rust
use mcp_core::{content::TextContent, Role};
use std::env;

#[warn(dead_code)]
```
Does it not warn for dead code by default?
Not sure why this one wasn't.
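For context, a generic illustration (not this PR's file): `dead_code` is a warn-by-default lint, so a bare `#[warn(dead_code)]` is usually a no-op. It only does something when it re-enables the lint under an enclosing `allow`, which is common in examples and demo files:

```rust
// Crate-level allow, often added to examples/ and demo files.
#![allow(dead_code)]

#[warn(dead_code)] // re-enables the lint locally, overriding the outer allow
fn unused_helper() {} // warns again: function is never used

fn main() {}
```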
Anthropic doesn't provide us with a tokenizer. The previous …
* main: (37 commits)
  - fix: fix desktop recipe url generation (block#3209)
  - feat: improve UX for saving recipes (block#3214)
  - fix: Pass Google AI API key in HTTP header, not query param (block#3192)
  - docs: add linter to CONTRIBUTING.md (block#3168)
  - feat: Structured output for recipes (block#3188)
  - Fix cost tracking accuracy and OpenRouter model pricing (block#3189)
  - docs: update cli install instructions for windows (block#3205)
  - Docs: Cost tracking on the desktop app (block#3204)
  - feat: Adding streamable-http transport support for backend, desktop and cli (block#2942)
  - fix: use the correct `contains` syntax on create-recipe-pr.yml (block#3193)
  - Temporarily Remove GH Copilot Provider (block#3199)
  - docs: fix tab navigation (block#3201)
  - feat: use tiktoken-rs instead of tokenizers, single global tokenizer (block#3115)
  - add playwright-mcp server to extensions list (block#3010)
  - Add `/extension` path for extension installation (block#3011)
  - feat(desktop): Prioritize suffix when truncating path in header (block#3110)
  - chore(release): release version 1.0.31 (block#3185)
  - feat: additional sub recipes via command line (block#3163)
  - Add Internal Recipes To Recipes Cookbook (block#3179)
  - pipe the argument to storage (block#3184)
  - ...
* main: (150 commits)
  - Defend against invalid sessions (block#3229)
  - Clean up session file optionality for --no-session (block#3230)
  - Feat: Support Recipe Parameters in Goose desktop app (block#3155)
  - docs: update recipe example (block#3222)
  - Add native OAuth 2.0 authentication support to MCP client (block#3213)
  - build: Check in Cargo.lock changes (block#3220)
  - fix: fix desktop recipe url generation (block#3209)
  - feat: improve UX for saving recipes (block#3214)
  - fix: Pass Google AI API key in HTTP header, not query param (block#3192)
  - docs: add linter to CONTRIBUTING.md (block#3168)
  - feat: Structured output for recipes (block#3188)
  - Fix cost tracking accuracy and OpenRouter model pricing (block#3189)
  - docs: update cli install instructions for windows (block#3205)
  - Docs: Cost tracking on the desktop app (block#3204)
  - feat: Adding streamable-http transport support for backend, desktop and cli (block#2942)
  - fix: use the correct `contains` syntax on create-recipe-pr.yml (block#3193)
  - Temporarily Remove GH Copilot Provider (block#3199)
  - docs: fix tab navigation (block#3201)
  - feat: use tiktoken-rs instead of tokenizers, single global tokenizer (block#3115)
  - add playwright-mcp server to extensions list (block#3010)
  - ...
We're only estimating the number of tokens, and lots of recent open-source models use tiktoken anyway.
BEFORE (tokenizers): [benchmark results]
AFTER (tiktoken): [benchmark results]
The performance difference is significant, especially at init (since we now have just one tokenizer), but count time is also ~3x faster.
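A minimal sketch of the single-global-tokenizer shape (the `o200k_base` encoding choice and names here are assumptions, not necessarily what goose ships):

```rust
use std::sync::OnceLock;
use tiktoken_rs::{o200k_base, CoreBPE};

// One tokenizer for the whole process: built once, shared everywhere, so
// "init" is paid a single time instead of once per TokenCounter instance.
static GLOBAL_TOKENIZER: OnceLock<CoreBPE> = OnceLock::new();

fn global_tokenizer() -> &'static CoreBPE {
    GLOBAL_TOKENIZER.get_or_init(|| o200k_base().expect("bundled BPE data should load"))
}

fn count_tokens(text: &str) -> usize {
    global_tokenizer().encode_with_special_tokens(text).len()
}
```

Because tiktoken-rs bundles its BPE data in the crate, there's no tokenizer download at init, which is presumably where most of the init-time win comes from.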
📊 Benchmark Results Analysis (done by goose)
Performance Summary (10,000 tokens)
Performance Summary (100,000 tokens)
Key Findings
🎯 Raw Tokenization Performance
📈 Scaling Characteristics
Real-World Implications
✅ What This Means
🎯 Bottom Line