fix: don't close HTTP/SDK clients on LLMClientCache eviction#22925
ishaan-jaff merged 5 commits into main
Conversation
Removing the _remove_key override that eagerly called aclose()/close() on evicted clients. Evicted clients may still be held by in-flight streaming requests; closing them causes: RuntimeError: Cannot send a request, as the client has been closed. This is a regression from commit fb72979. Clients that are no longer referenced will be garbage-collected naturally. Explicit shutdown cleanup happens via close_litellm_async_clients(). Fixes production crashes after the 1-hour cache TTL expires.
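The failure mode is easy to reproduce in miniature. Below is a sketch (not LiteLLM's actual code) using a hypothetical `FakeClient` standing in for an httpx/SDK client, showing why an eager close on eviction breaks an in-flight holder while simply dropping the cache's reference does not:

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an httpx/OpenAI SDK client."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True

    async def send(self):
        if self.closed:
            raise RuntimeError(
                "Cannot send a request, as the client has been closed."
            )
        return "ok"


async def demo():
    cache = {}

    # Regression behavior: eviction eagerly closes the client.
    cache["k"] = FakeClient()
    in_flight = cache["k"]          # an in-flight request holds a reference
    await cache.pop("k").aclose()   # eviction + close
    try:
        await in_flight.send()
        error = None
    except RuntimeError as exc:
        error = str(exc)

    # Fixed behavior: eviction only drops the cache's reference.
    cache["k"] = FakeClient()
    in_flight2 = cache["k"]
    del cache["k"]                  # refcount > 0, client stays open
    result = await in_flight2.send()
    return error, result


error, result = asyncio.run(demo())
print(error, result)
```

The second half mirrors the fix: once the cache entry is deleted without closing, the in-flight holder's reference keeps the client alive and usable until the request completes.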
Flip the assertions: evicted clients must NOT be closed.
- Replace `test_remove_key_closes_async_client` → `test_remove_key_does_not_close_async_client`, and equivalents for the sync/eviction paths.
- Add `test_remove_key_removes_plain_values` for non-client cache entries.
- Remove `test_background_tasks_cleaned_up_after_completion` (no more `_background_tasks`).
- Remove the `test_remove_key_no_event_loop` variant that depended on the old behavior.
Add two new e2e tests using real `AsyncOpenAI` clients:
- `test_evicted_openai_sdk_client_stays_usable`: verifies size-based eviction doesn't close the client
- `test_ttl_expired_openai_sdk_client_stays_usable`: verifies TTL-expiry eviction doesn't close the client

Both tests sleep after eviction so any `create_task()`-based close would have time to run, making the regression detectable. Also expand the module docstring to explain why the sleep is required.
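The sleep matters because a buggy eviction path would schedule the close via `create_task()`, and a scheduled task only runs once the event loop regains control. A sketch (again with a hypothetical `FakeClient`) showing why an assertion placed before a yield point would miss the bug:

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an async SDK client."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


async def demo():
    client = FakeClient()

    # A buggy eviction path schedules the close rather than awaiting it:
    asyncio.create_task(client.aclose())

    closed_before_sleep = client.closed   # task hasn't run yet: still False
    await asyncio.sleep(0.15)             # yield so the scheduled close can run
    closed_after_sleep = client.closed    # now the bug is observable
    return closed_before_sleep, closed_after_sleep


before, after = asyncio.run(demo())
print(before, after)  # False True
```

Without the `await asyncio.sleep(...)`, the first assertion would pass even in the broken version, which is exactly the false negative the tests are designed to avoid.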
Greptile Summary

This PR fixes a production regression where streaming requests would crash with `RuntimeError: Cannot send a request, as the client has been closed` after cache eviction.

Changes:
Confidence Score: 5/5
| Filename | Overview |
|---|---|
| litellm/caching/llm_caching_handler.py | Removes the _remove_key override that eagerly closed clients on eviction, and the _background_tasks class-level set. The class now simply inherits eviction behaviour from InMemoryCache, letting clients be GC'd naturally. The asyncio import is still required for update_cache_key_with_event_loop. |
| tests/test_litellm/caching/test_llm_caching_handler.py | Unit tests correctly inverted to assert clients are NOT closed on eviction. Old tests that asserted client.closed is True are replaced with client.closed is False. The test_remove_key_no_event_loop and test_remove_key_removes_plain_values tests provide good additional coverage. The _background_tasks cleanup test is correctly removed since that set no longer exists. |
| tests/test_litellm/caching/test_llm_client_cache_e2e.py | Adds two new tests using real AsyncOpenAI client objects (not mocks) that cover the exact production scenario. Tests include asyncio.sleep(0.15) to catch any regression that schedules close via create_task(). No real network calls are made — clients are instantiated but no API requests are sent. Accesses the private client._client attribute which could be fragile if the OpenAI SDK changes its internals. |
| AGENTS.md | Adds a clear rule (#9) prohibiting close()/aclose()/create_task(close_fn()) in LLMClientCache._remove_key() or any cache eviction path, with rationale and incident history link. |
| CLAUDE.md | Adds an "HTTP Client Cache Safety" section documenting the invariant. Concise and correctly placed under the architecture notes. |
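As a rough sketch of the invariant the table describes (simplified classes, not LiteLLM's actual implementation): eviction only drops the cache's reference, while an explicit shutdown helper, analogous to `close_litellm_async_clients()`, walks the cache and closes whatever is still there:

```python
import asyncio


class InMemoryCacheSketch:
    """Simplified stand-in for litellm's InMemoryCache."""

    def __init__(self):
        self.cache_dict = {}

    def _remove_key(self, key):
        # Eviction drops the reference only; no close() here.
        self.cache_dict.pop(key, None)


class LLMClientCacheSketch(InMemoryCacheSketch):
    # The fix: no _remove_key override, so eviction is inherited as-is
    # and evicted clients are garbage-collected once unreferenced.
    pass


class FakeClient:
    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


async def close_clients(cache):
    """Explicit shutdown cleanup, analogous to close_litellm_async_clients()."""
    for value in list(cache.cache_dict.values()):
        aclose = getattr(value, "aclose", None)
        if aclose is not None:
            await aclose()
    cache.cache_dict.clear()


cache = LLMClientCacheSketch()
held = FakeClient()
cache.cache_dict["k"] = held
cache._remove_key("k")              # eviction: client stays open
evicted_still_open = not held.closed

shutdown_client = FakeClient()
cache.cache_dict["k2"] = shutdown_client
asyncio.run(close_clients(cache))   # shutdown: remaining clients are closed
print(evicted_still_open, shutdown_client.closed, len(cache.cache_dict))
```

The split of responsibilities is the point: the eviction path never closes, and the only code that closes clients runs at explicit shutdown, when no requests can still be in flight.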
Sequence Diagram
```mermaid
sequenceDiagram
    participant Req as In-flight Request
    participant Cache as LLMClientCache
    participant Client as HTTP/SDK Client
    participant GC as Python GC
    Note over Cache: TTL expires (1 hour)
    Req->>Client: Holds reference, streaming...
    Cache->>Cache: evict_cache() / _remove_key()
    rect rgb(255, 200, 200)
        Note over Cache,Client: BEFORE (regression in fb72979432)
        Cache--xClient: close() / create_task(aclose())
        Req--xClient: RuntimeError: Cannot send a request,\nas the client has been closed
    end
    rect rgb(200, 255, 200)
        Note over Cache,Client: AFTER (this PR)
        Cache->>Cache: Remove key from cache_dict/ttl_dict only
        Note over Client: Client stays open, ref count > 0
        Req->>Client: Continues streaming successfully
        Req->>Client: Eventually releases reference
        Client->>GC: Garbage collected when refcount = 0
        Note over Cache: Shutdown cleanup via close_litellm_async_clients()
    end
```
Last reviewed commit: aa8cdbc
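The "garbage collected when refcount = 0" step in the diagram can be observed directly with a `weakref` (using a hypothetical `FakeClient` again):

```python
import gc
import weakref


class FakeClient:
    """Hypothetical stand-in; weakref needs a user-defined class."""


cache = {}
cache["k"] = FakeClient()
ref = weakref.ref(cache["k"])   # watch the client's lifetime

holder = cache["k"]             # the in-flight request's reference
del cache["k"]                  # eviction: cache drops its reference
alive_after_eviction = ref() is not None

del holder                      # request finishes, last reference gone
gc.collect()                    # CPython frees on refcount 0; collect for safety
alive_after_release = ref() is not None
print(alive_after_eviction, alive_after_release)  # True False
```

This is the whole lifecycle argument: eviction alone cannot kill a client that a request still holds, and once the last holder releases it, Python reclaims it without any explicit close from the cache.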
```python
await asyncio.sleep(0.15)

# The SDK client's internal httpx client must still be open
assert not client._client.is_closed, (
```
Accessing private _client attribute
`client._client` is a private implementation detail of the openai SDK. If the SDK ever renames or restructures this attribute, this assertion will raise an `AttributeError` rather than failing with a clear test message. Consider using a public `is_closed` accessor if the SDK exposes one, or wrapping the access in a `getattr` guard with a fallback:
```python
internal_client = getattr(client, "_client", None)
assert internal_client is not None, "AsyncOpenAI no longer exposes ._client"
assert not internal_client.is_closed, (
    "AsyncOpenAI client was closed on cache eviction — this causes "
    "'Cannot send a request, as the client has been closed' in production"
)
```

The same concern applies to `client._client.is_closed` on line 123 in `test_ttl_expired_openai_sdk_client_stays_usable`.
Relevant issues
Fixes streaming requests crashing with `RuntimeError: Cannot send a request, as the client has been closed.` after ~1 hour in production. Regression from commit fb72979432, which re-introduced client closing on cache eviction.

Pre-Submission checklist
- I have added a test in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
✅ Test
📖 Documentation
Changes
`LLMClientCache._remove_key()` was closing httpx/OpenAI SDK clients when they were evicted from the in-memory cache (after the 1-hour TTL). If a streaming request was still holding a reference to that client, it'd crash with the error above.

Fix: remove the `_remove_key` override entirely. Evicted clients are no longer closed; they get garbage-collected when no more references exist. Explicit shutdown cleanup still happens via `close_litellm_async_clients()`.

Tests:
- Inverted `test_llm_caching_handler.py` to assert clients are NOT closed on eviction
- Added `test_evicted_openai_sdk_client_stays_usable` and `test_ttl_expired_openai_sdk_client_stays_usable` in the e2e test file; both use real `AsyncOpenAI` clients and sleep after eviction so any `create_task(close_fn())` regression would be caught
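The TTL scenario these tests cover can be sketched with a minimal TTL cache (hypothetical, not litellm's API); the key assertion is that the client evicted on expiry is still open after a post-eviction sleep window:

```python
import asyncio
import time


class TTLCache:
    """Minimal TTL cache sketch (hypothetical, not litellm's API)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None
        value, expiry = item
        if time.monotonic() >= expiry:
            del self.store[key]   # evict: drop the reference only, no close
            return None
        return value


class FakeClient:
    def __init__(self):
        self.closed = False


async def demo():
    cache = TTLCache(ttl=0.05)
    client = FakeClient()
    cache.set("k", client)
    await asyncio.sleep(0.1)      # let the TTL expire
    evicted = cache.get("k")      # returns None and evicts the entry
    await asyncio.sleep(0.15)     # window for any scheduled close to run
    return evicted, client.closed


evicted, closed = asyncio.run(demo())
print(evicted, closed)  # None False
```

The real e2e tests follow the same shape but with an actual `AsyncOpenAI` instance in place of `FakeClient`, asserting on its internal httpx client's `is_closed` state.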