fix: tiktoken cache nonroot offline#23498
Conversation
Made-with: Cursor
Greptile Summary

This PR fixes a regression introduced in #19449 where non-root Docker images with `--network=none` could fail on startup because tiktoken tried to download the `cl100k_base` vocab instead of using the bundled copy.

Key changes:
Confidence Score: 5/5
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/default_encoding.py | Removes the LITELLM_NON_ROOT /tmp fallback; always defaults TIKTOKEN_CACHE_DIR to the bundled tokenizers directory unless CUSTOM_TIKTOKEN_CACHE_DIR overrides it. Logic is clean and offline-safe. |
| tests/test_default_encoding_non_root.py | Tests rewritten to reload the module for real behavior verification. The custom-cache test correctly patches tiktoken.get_encoding to stay offline-safe. The first test calls real tiktoken.get_encoding without a mock — intentional but environment-sensitive. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[default_encoding.py module loads] --> B[Resolve bundled tokenizers dir via importlib.resources]
    B --> C{CUSTOM_TIKTOKEN_CACHE_DIR set?}
    C -- Yes --> D[os.makedirs custom_cache_dir, exist_ok=True]
    D --> E[cache_dir = custom_cache_dir]
    C -- No --> F[cache_dir = bundled tokenizers dir]
    E --> G[os.environ TIKTOKEN_CACHE_DIR = cache_dir]
    F --> G
    G --> H[import tiktoken]
    H --> I[tiktoken.get_encoding cl100k_base with retry logic]
    I --> J[encoding = tiktoken Encoding object]
    style F fill:#d4edda,color:#000
    style D fill:#fff3cd,color:#000
```
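The selection logic in the flowchart can be sketched as follows. This is a hedged reconstruction based on the summary above, not the actual `default_encoding.py` source; the bundled directory is passed in as a parameter here instead of being resolved via `importlib.resources`:

```python
import os

def resolve_tiktoken_cache_dir(bundled_tokenizers_dir: str) -> str:
    """Pick the tiktoken cache dir: custom override if set, else bundled."""
    custom_cache_dir = os.environ.get("CUSTOM_TIKTOKEN_CACHE_DIR")
    if custom_cache_dir:
        # Ensure the custom directory exists before tiktoken touches it
        os.makedirs(custom_cache_dir, exist_ok=True)
        cache_dir = custom_cache_dir
    else:
        # Offline-safe default: the vocab files shipped with the package
        cache_dir = bundled_tokenizers_dir
    os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir
    return cache_dir
```

Because `TIKTOKEN_CACHE_DIR` is exported before `tiktoken` is imported, `tiktoken.get_encoding("cl100k_base")` finds the vocab on disk and never attempts a network fetch.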
Last reviewed commit: a6f8389
```python
with patch(
    "litellm.litellm_core_utils.default_encoding.tiktoken.get_encoding",
    return_value=MagicMock(),
):
    _reload_default_encoding(
        monkeypatch, CUSTOM_TIKTOKEN_CACHE_DIR=str(custom_dir)
    )
```
Module state not restored after mocked reload
After test_custom_tiktoken_cache_dir_override completes and the with patch(...) block exits, default_encoding.encoding is left as a MagicMock() in sys.modules. The monkeypatch fixture restores the environment variables, but it does not reload the module back to a clean state.
litellm/_lazy_imports.py calls from litellm.litellm_core_utils.default_encoding import encoding lazily inside _get_default_encoding(), which executes the import at call time. If any test that runs after this one (in the same process) invokes _get_default_encoding() before _reload_default_encoding is called again, it will receive a MagicMock instead of a real tiktoken Encoding.
To be safe, consider adding a module teardown reload at the end of the test (after the assertions) so the module is restored to a clean state regardless of test execution order:
```python
# After assertions — restore module to a clean, real-encoding state
monkeypatch.delenv("TIKTOKEN_CACHE_DIR", raising=False)
importlib.reload(default_encoding)
```
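To see why the reviewer insists on `importlib.reload` rather than relying on `monkeypatch` alone: restoring environment variables does not undo attributes that were replaced on a module object. A generic, self-contained demonstration using a stdlib module (no litellm involved):

```python
import importlib
import json

# Simulate a mock leaking into module state, as described in the comment above
json.dumps = lambda obj: "mocked"
assert json.dumps({"a": 1}) == "mocked"  # any later caller now sees the mock

# Reloading re-executes the module, restoring the real attribute
importlib.reload(json)
assert json.dumps({"a": 1}) == '{"a": 1}'
```

The same mechanism restores `default_encoding.encoding` to a real tiktoken `Encoding` after the `MagicMock` is left behind by the patched reload.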
```diff
 def test_default_encoding_uses_bundled_tokenizers_by_default(monkeypatch):
     """
     TIKTOKEN_CACHE_DIR should point at the bundled tokenizers directory
     when no CUSTOM_TIKTOKEN_CACHE_DIR is set, even in non-root environments.
     """
     _reload_default_encoding(monkeypatch, LITELLM_NON_ROOT="true")

-    assert filename == "/tmp/tiktoken_cache"
+    assert "TIKTOKEN_CACHE_DIR" in os.environ
+    cache_dir = os.environ["TIKTOKEN_CACHE_DIR"]
+    assert "tokenizers" in cache_dir
```
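Both tests drive the module through a `_reload_default_encoding` helper whose source is not shown in this thread. A plausible sketch of the pattern it implements (hypothetical names, written generically against `os.environ` and `importlib.reload` so it runs without litellm):

```python
import importlib
import os

def reload_module_with_env(module, clear=(), **env):
    """Clear the given env vars, set the requested ones, then reload
    `module` so its import-time logic re-runs under the new environment."""
    for key in clear:
        os.environ.pop(key, None)
    for key, value in env.items():
        os.environ[key] = value
    return importlib.reload(module)
```

In the real helper, `monkeypatch.setenv` / `monkeypatch.delenv` would be used instead of mutating `os.environ` directly, so pytest restores the environment automatically at test teardown.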
No module-state cleanup after first test
test_default_encoding_uses_bundled_tokenizers_by_default calls _reload_default_encoding, which reloads the module and sets default_encoding.encoding to a freshly-loaded tiktoken Encoding. There is no cleanup reload at the end of this test function to restore the module to its original state (unlike test_custom_tiktoken_cache_dir_override, which explicitly reloads at lines 55–57).
In practice this is low-risk because the encoding is still a real tiktoken Encoding object loaded from the bundled tokenizers, so downstream consumers of default_encoding.encoding won't receive a broken value. However, if a subsequent test calls _get_default_encoding() in _lazy_imports.py before any other reload, it will use the already-cached _default_encoding (set when litellm was first imported) and won't be affected. The concern is if default_encoding.encoding is ever accessed directly (not via the lazy cache), it may differ from the original import-time state.
For symmetry and defensive consistency, consider adding a cleanup reload at the end of this test, as done in test_custom_tiktoken_cache_dir_override:
```python
    assert "tokenizers" in cache_dir

    # Restore module to original state
    monkeypatch.delenv("TIKTOKEN_CACHE_DIR", raising=False)
    monkeypatch.delenv("CUSTOM_TIKTOKEN_CACHE_DIR", raising=False)
    importlib.reload(default_encoding)
```

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
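The `CUSTOM_TIKTOKEN_CACHE_DIR` override that `test_custom_tiktoken_cache_dir_override` exercises boils down to two operations. A hedged stand-alone sketch (hypothetical function name, not the litellm source):

```python
import os

def apply_custom_cache_dir(custom_cache_dir: str) -> str:
    """Create the custom cache dir if needed and point tiktoken at it."""
    # exist_ok=True makes this idempotent: safe across repeated module reloads
    os.makedirs(custom_cache_dir, exist_ok=True)
    os.environ["TIKTOKEN_CACHE_DIR"] = custom_cache_dir
    return custom_cache_dir
```

This is why the test can assert `os.path.isdir(TIKTOKEN_CACHE_DIR)` immediately after the reload: the directory is created eagerly, before any `get_encoding` call.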
Merge from main (#23663), which included the following commits:

* fix: forward extra_headers to HuggingFace embedding calls (#23525) (fixes #23502)
* fix: add getPopupContainer to Select components in fallback modal to fix z-index issue (#23516) (fixes #17895)
* fix: apply _remove_scope_from_cache_control (added in PR #22867 for Bedrock and Azure AI) to VertexAIPartnerModelsAnthropicMessagesConfig, moving it up to the base AnthropicMessagesConfig so all providers inherit it (#23183)
* fix: auto-fill reasoning_content for moonshot kimi reasoning models in multi-turn tool calling (#23580)
* Handle response.failed, response.incomplete, and response.cancelled terminal events in background streaming (#23492)
* fix(security): bump tar to 7.5.11 and tornado to 6.5.5 (#23602)
* [Feat] Ishaan main merge branch (#23596): respect s3_region_name for Bedrock batch file uploads (#23569); fix _filter_headers_for_aws_signature for Bedrock KB (#23571)
* feat(router): tag_regex routing, route by User-Agent regex without per-developer tag config (#23594)
* fix: tiktoken cache nonroot offline (#23498): restore offline tiktoken cache for non-root envs, mkdir for custom tiktoken cache dir, offline-safe tests
* fix: normalize content_filtered finish_reason to the OpenAI-compatible content_filter (#23564)
* fix: merge annotations from all streaming chunks in stream_chunk_builder (#23647) (fixes #23185)
Summary
This PR fixes a regression in tiktoken cache handling for non-root + offline environments and aligns both eager and lazy loading with the same offline-safe behavior.
Today, non-root Docker images with `--network=none` can fail on startup because tiktoken attempts to download the `cl100k_base` vocab into an empty cache directory, instead of using the pre-bundled vocab that LiteLLM ships.

Background
LiteLLM bundles the `cl100k_base` vocab inside the package's `tokenizers` directory. Historically, `default_encoding.py` pointed `TIKTOKEN_CACHE_DIR` at that bundled directory. This guaranteed that tiktoken read the local vocab and never needed network access at startup.
PR #19449 changed this for non-root images: when `LITELLM_NON_ROOT` was set, it redirected the cache to `/tmp/tiktoken_cache`. This solved “can’t write into read-only site-packages”, but it also introduced a regression:

- `/tmp/tiktoken_cache` starts empty.
- In `--network=none` environments, tiktoken cannot download the vocab into that empty dir → startup fails.
- Startup only succeeds if `CUSTOM_TIKTOKEN_CACHE_DIR` is manually pointed back at the bundled dir.

Separately, PR #19774 fixed lazy loading by making `__getattr__` use `_get_default_encoding()`, which assumes that `default_encoding.py` has correctly set `TIKTOKEN_CACHE_DIR` to a local cache. That PR does not address the `/tmp` behavior itself; it just ensures lazy loading respects whatever `default_encoding.py` configured.

Fix
This PR simplifies and hardens the cache selection logic in `default_encoding.py`. Key points:

- Default (no env): `TIKTOKEN_CACHE_DIR` → bundled `tokenizers` dir.
- When `CUSTOM_TIKTOKEN_CACHE_DIR` is set: `os.makedirs(custom_cache_dir, exist_ok=True)` so the directory always exists, then `TIKTOKEN_CACHE_DIR` → that custom path.
- Removed logic: no longer checks `LITELLM_NON_ROOT` or redirects to `/tmp/tiktoken_cache` by default.

Why this approach
- `--network=none` uses the bundled vocab again; no network fetch is attempted.
- The bundled directory is only read, so nothing needs to be written into read-only `site-packages`.
- `_get_default_encoding()` (and PR #19774) already rely on `default_encoding.py` to set `TIKTOKEN_CACHE_DIR` to a local cache.
- The fix stays contained in one module (`default_encoding.py`).

Tests
Updated tests in `tests/test_default_encoding_non_root.py`:

- `test_default_encoding_uses_bundled_tokenizers_by_default`
  - Sets `LITELLM_NON_ROOT="true"` with no `CUSTOM_TIKTOKEN_CACHE_DIR`.
  - Asserts `default_encoding.TIKTOKEN_CACHE_DIR` is set.
  - Asserts the cache dir contains `"tokenizers"` (i.e., points at the bundled directory).
- `test_custom_tiktoken_cache_dir_override`
  - Uses `tmp_path / "tiktoken_cache"` as the custom directory.
  - Sets `CUSTOM_TIKTOKEN_CACHE_DIR` to that path.
  - Asserts `default_encoding.TIKTOKEN_CACHE_DIR == custom_dir`.
  - Asserts `os.path.isdir(TIKTOKEN_CACHE_DIR)` → the directory was created.

These tests verify both:
- The default offline-safe behavior (no `/tmp` redirect).
- The explicit override via `CUSTOM_TIKTOKEN_CACHE_DIR`.

Impact
- Non-root + offline images no longer depend on an empty `/tmp` cache.
- `TIKTOKEN_CACHE_DIR` is now consistently initialized to a local cache before any `get_encoding` call.
- Keeps the escape hatch (`CUSTOM_TIKTOKEN_CACHE_DIR`) for advanced setups, while keeping the default path simple and offline-safe.

Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added at least 1 test in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`
- I have asked for review from `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix