
fix: don't close HTTP/SDK clients on LLMClientCache eviction#22925

Merged
ishaan-jaff merged 5 commits into main from fix/llm-client-cache-dont-close-on-eviction
Mar 5, 2026

Conversation

@ishaan-jaff (Contributor)

Relevant issues

Fixes streaming requests crashing with RuntimeError: Cannot send a request, as the client has been closed. after ~1 hour in production. Regression from commit fb72979432, which re-introduced client closing on cache eviction.

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

CI (LiteLLM team)

  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix
✅ Test
📖 Documentation

Changes

LLMClientCache._remove_key() was closing httpx/OpenAI SDK clients when they were evicted from the in-memory cache (after the 1-hour TTL). If a streaming request was still holding a reference to such a client, it would crash with the error above.

Fix: remove the _remove_key override entirely. Evicted clients are no longer closed — they get garbage-collected when no more references exist. Explicit shutdown cleanup still happens via close_litellm_async_clients().
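The fix can be illustrated with a minimal, self-contained sketch. The classes below are simplified stand-ins, not LiteLLM's actual code: the point is that eviction should only drop the cache's reference, leaving the client alive for any in-flight holder.

```python
# Minimal sketch with hypothetical stand-in classes (not LiteLLM's code)
# showing why eviction must only drop the cache's reference.

class FakeClient:
    """Stand-in for an httpx/OpenAI SDK client."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class InMemoryCache:
    def __init__(self):
        self.cache_dict = {}

    def set_cache(self, key, value):
        self.cache_dict[key] = value

    def _remove_key(self, key):
        # Correct eviction: drop the reference only. A buggy override
        # would also call value.close() here, breaking in-flight holders.
        self.cache_dict.pop(key, None)


cache = InMemoryCache()
client = FakeClient()
cache.set_cache("k", client)

in_flight = client       # a streaming request still holds the client
cache._remove_key("k")   # TTL/size eviction fires

assert not in_flight.closed  # client remains usable after eviction
```

Once `in_flight` (and any other holder) releases the reference, Python's garbage collector reclaims the client without any explicit close on the eviction path.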

Tests:

  • Updated existing unit tests in test_llm_caching_handler.py to assert clients are NOT closed on eviction
  • Added test_evicted_openai_sdk_client_stays_usable and test_ttl_expired_openai_sdk_client_stays_usable in the e2e test file — both use real AsyncOpenAI clients and sleep after eviction so any create_task(close_fn()) regression would be caught
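The regression-catching pattern those e2e tests rely on can be sketched as follows, using a simplified stand-in client rather than a real AsyncOpenAI instance: the sleep after eviction gives any wrongly scheduled create_task(close_fn()) time to run, so the final assertion would fail under the buggy behavior.

```python
# Hedged sketch of the e2e test pattern; FakeAsyncClient is hypothetical,
# the real tests use AsyncOpenAI.
import asyncio


class FakeAsyncClient:
    """Stand-in for AsyncOpenAI; tracks whether it was closed."""
    def __init__(self):
        self.is_closed = False

    async def aclose(self):
        self.is_closed = True


async def evict_and_check():
    cache = {}
    client = FakeAsyncClient()
    cache["client"] = client

    # Simulate eviction. A regressed implementation might schedule
    # asyncio.create_task(client.aclose()) here; the correct one
    # only drops the reference.
    cache.pop("client")

    # Sleep so any close task scheduled by a buggy eviction would have
    # run by now, making the regression observable in the assertion.
    await asyncio.sleep(0.15)
    assert not client.is_closed
    return client


client = asyncio.run(evict_and_check())
```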

Removing the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:

  RuntimeError: Cannot send a request, as the client has been closed.

This is a regression from commit fb72979. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().

Fixes production crashes after the 1-hour cache TTL expires.

Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.

Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.

Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
  doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
  eviction doesn't close the client

Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.

Also expand the module docstring to explain why the sleep is required.

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

litellm: Ready (Preview, Comment), updated Mar 5, 2026 7:59pm (UTC)



greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR fixes a production regression where streaming requests would crash with RuntimeError: Cannot send a request, as the client has been closed. after roughly one hour, caused by LLMClientCache._remove_key() eagerly closing HTTP/SDK clients during cache eviction even when in-flight requests still held references to them.

Changes:

  • litellm/caching/llm_caching_handler.py: Removes the _remove_key override and _background_tasks set entirely. Evicted clients are now left open and garbage-collected naturally when no more references exist. Explicit cleanup at shutdown is delegated to close_litellm_async_clients().
  • tests/test_litellm/caching/test_llm_caching_handler.py: All unit tests inverted to assert client.closed is False after eviction; removes the now-irrelevant _background_tasks cleanup test; adds test_remove_key_removes_plain_values for non-client values.
  • tests/test_litellm/caching/test_llm_client_cache_e2e.py: Adds two new regression guards using real AsyncOpenAI clients, with deliberate asyncio.sleep(0.15) after eviction to catch any create_task(close_fn()) regression that runs asynchronously. No real network calls are made.
  • AGENTS.md / CLAUDE.md: Documents the invariant that cache eviction paths must never close clients.

Confidence Score: 5/5

  • This PR is safe to merge — it removes a clearly harmful code path with targeted, well-tested changes and no functional regressions.
  • The fix is minimal and surgical: a problematic override is deleted entirely rather than patched, eliminating the root cause. The new behaviour (let GC handle unused clients) is the conventional Python approach. Tests are comprehensive — unit tests cover the _remove_key path directly and the e2e tests include the asyncio.sleep needed to catch any async create_task regression. Documentation is updated to prevent the pattern from being re-introduced. The only minor concern is the use of the private _client attribute in two e2e tests, which is a fragility risk but not a correctness issue.
  • No files require special attention — the only minor issue is the client._client private attribute access in test_llm_client_cache_e2e.py lines 94 and 123.

Important Files Changed

  • litellm/caching/llm_caching_handler.py: Removes the _remove_key override that eagerly closed clients on eviction, and the _background_tasks class-level set. The class now simply inherits eviction behaviour from InMemoryCache, letting clients be GC'd naturally. The asyncio import is still required for update_cache_key_with_event_loop.
  • tests/test_litellm/caching/test_llm_caching_handler.py: Unit tests correctly inverted to assert clients are NOT closed on eviction. Old tests that asserted client.closed is True are replaced with client.closed is False. The test_remove_key_no_event_loop and test_remove_key_removes_plain_values tests provide good additional coverage. The _background_tasks cleanup test is correctly removed since that set no longer exists.
  • tests/test_litellm/caching/test_llm_client_cache_e2e.py: Adds two new tests using real AsyncOpenAI client objects (not mocks) that cover the exact production scenario. Tests include asyncio.sleep(0.15) to catch any regression that schedules close via create_task(). No real network calls are made: clients are instantiated but no API requests are sent. Accesses the private client._client attribute, which could be fragile if the OpenAI SDK changes its internals.
  • AGENTS.md: Adds a clear rule (#9) prohibiting close()/aclose()/create_task(close_fn()) in LLMClientCache._remove_key() or any cache eviction path, with rationale and incident history link.
  • CLAUDE.md: Adds an "HTTP Client Cache Safety" section documenting the invariant. Concise and correctly placed under the architecture notes.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Req as In-flight Request
    participant Cache as LLMClientCache
    participant Client as HTTP/SDK Client
    participant GC as Python GC

    Note over Cache: TTL expires (1 hour)
    Req->>Client: Holds reference, streaming...
    Cache->>Cache: evict_cache() / _remove_key()

    rect rgb(255, 200, 200)
        Note over Cache,Client: BEFORE (regression in fb72979432)
        Cache--xClient: close() / create_task(aclose())
        Req--xClient: RuntimeError: Cannot send a request,\nas the client has been closed
    end

    rect rgb(200, 255, 200)
        Note over Cache,Client: AFTER (this PR)
        Cache->>Cache: Remove key from cache_dict/ttl_dict only
        Note over Client: Client stays open, ref count > 0
        Req->>Client: Continues streaming successfully
        Req->>Client: Eventually releases reference
        Client->>GC: Garbage collected when refcount = 0
        Note over Cache: Shutdown cleanup via close_litellm_async_clients()
    end
```

Last reviewed commit: aa8cdbc

```python
await asyncio.sleep(0.15)

# The SDK client's internal httpx client must still be open
assert not client._client.is_closed, (
```

Accessing private _client attribute

client._client is a private implementation detail of the openai SDK. If the SDK ever renames or restructures this attribute, this assertion will raise an AttributeError rather than failing with a clear test message. Consider using a public is_closed accessor if the SDK exposes one, or guarding the private access with a fallback:

```python
internal_client = getattr(client, "_client", None)
assert internal_client is not None, "AsyncOpenAI no longer exposes ._client"
assert not internal_client.is_closed, (
    "AsyncOpenAI client was closed on cache eviction — this causes "
    "'Cannot send a request, as the client has been closed' in production"
)
```

Same concern applies to client._client.is_closed on line 123 in test_ttl_expired_openai_sdk_client_stays_usable.

@ishaan-jaff merged commit 503eb2f into main on Mar 5, 2026
30 of 41 checks passed