
fix: don't close HTTP/SDK clients on LLMClientCache eviction#22925

Merged
ishaan-jaff merged 5 commits into main from fix/llm-client-cache-dont-close-on-eviction
Mar 5, 2026

Conversation

@ishaan-jaff (Contributor)

Relevant issues

Fixes streaming requests crashing with RuntimeError: Cannot send a request, as the client has been closed. after ~1 hour in production. Regression from commit fb72979432, which re-introduced client closing on cache eviction.

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

CI (LiteLLM team)

  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix
✅ Test
📖 Documentation

Changes

LLMClientCache._remove_key() was closing httpx/OpenAI SDK clients when they were evicted from the in-memory cache (after the 1-hour TTL). If a streaming request was still holding a reference to such a client, it would crash with the error above.

Fix: remove the _remove_key override entirely. Evicted clients are no longer closed — they get garbage-collected when no more references exist. Explicit shutdown cleanup still happens via close_litellm_async_clients().
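The fix can be illustrated with a minimal, self-contained sketch. The classes below are simplified stand-ins, not LiteLLM's actual code: the point is that eviction should only drop the cache's reference, leaving the client alive for any in-flight holder.

```python
# Minimal sketch with hypothetical stand-in classes (not LiteLLM's code)
# showing why eviction must only drop the cache's reference.

class FakeClient:
    """Stand-in for an httpx/OpenAI SDK client."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class InMemoryCache:
    def __init__(self):
        self.cache_dict = {}

    def set_cache(self, key, value):
        self.cache_dict[key] = value

    def _remove_key(self, key):
        # Correct eviction: drop the reference only. A buggy override
        # would also call value.close() here, breaking in-flight holders.
        self.cache_dict.pop(key, None)


cache = InMemoryCache()
client = FakeClient()
cache.set_cache("k", client)

in_flight = client       # a streaming request still holds the client
cache._remove_key("k")   # TTL/size eviction fires

assert not in_flight.closed  # client remains usable after eviction
```

Once `in_flight` (and any other holder) releases the reference, Python's garbage collector reclaims the client without any explicit close on the eviction path.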

Tests:

  • Updated existing unit tests in test_llm_caching_handler.py to assert clients are NOT closed on eviction
  • Added test_evicted_openai_sdk_client_stays_usable and test_ttl_expired_openai_sdk_client_stays_usable in the e2e test file — both use real AsyncOpenAI clients and sleep after eviction so any create_task(close_fn()) regression would be caught
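The regression-catching pattern those e2e tests rely on can be sketched as follows, using a simplified stand-in client rather than a real AsyncOpenAI instance: the sleep after eviction gives any wrongly scheduled create_task(close_fn()) time to run, so the final assertion would fail under the buggy behavior.

```python
# Hedged sketch of the e2e test pattern; FakeAsyncClient is hypothetical,
# the real tests use AsyncOpenAI.
import asyncio


class FakeAsyncClient:
    """Stand-in for AsyncOpenAI; tracks whether it was closed."""
    def __init__(self):
        self.is_closed = False

    async def aclose(self):
        self.is_closed = True


async def evict_and_check():
    cache = {}
    client = FakeAsyncClient()
    cache["client"] = client

    # Simulate eviction. A regressed implementation might schedule
    # asyncio.create_task(client.aclose()) here; the correct one
    # only drops the reference.
    cache.pop("client")

    # Sleep so any close task scheduled by a buggy eviction would have
    # run by now, making the regression observable in the assertion.
    await asyncio.sleep(0.15)
    assert not client.is_closed
    return client


client = asyncio.run(evict_and_check())
```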

Removing the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:

  RuntimeError: Cannot send a request, as the client has been closed.

This is a regression from commit fb72979. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().

Fixes production crashes after the 1-hour cache TTL expires.

Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.

Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.

Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
  doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
  eviction doesn't close the client

Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.

Also expand the module docstring to explain why the sleep is required.

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

litellm: Ready (Preview, Comment), updated Mar 5, 2026 7:59pm (UTC)



greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR fixes a production regression where streaming requests would crash with RuntimeError: Cannot send a request, as the client has been closed. after roughly one hour, caused by LLMClientCache._remove_key() eagerly closing HTTP/SDK clients during cache eviction even when in-flight requests still held references to them.

Changes:

  • litellm/caching/llm_caching_handler.py: Removes the _remove_key override and _background_tasks set entirely. Evicted clients are now left open and garbage-collected naturally when no more references exist. Explicit cleanup at shutdown is delegated to close_litellm_async_clients().
  • tests/test_litellm/caching/test_llm_caching_handler.py: All unit tests inverted to assert client.closed is False after eviction; removes the now-irrelevant _background_tasks cleanup test; adds test_remove_key_removes_plain_values for non-client values.
  • tests/test_litellm/caching/test_llm_client_cache_e2e.py: Adds two new regression guards using real AsyncOpenAI clients, with deliberate asyncio.sleep(0.15) after eviction to catch any create_task(close_fn()) regression that runs asynchronously. No real network calls are made.
  • AGENTS.md / CLAUDE.md: Documents the invariant that cache eviction paths must never close clients.

Confidence Score: 5/5

  • This PR is safe to merge — it removes a clearly harmful code path with targeted, well-tested changes and no functional regressions.
  • The fix is minimal and surgical: a problematic override is deleted entirely rather than patched, eliminating the root cause. The new behaviour (let GC handle unused clients) is the conventional Python approach. Tests are comprehensive — unit tests cover the _remove_key path directly and the e2e tests include the asyncio.sleep needed to catch any async create_task regression. Documentation is updated to prevent the pattern from being re-introduced. The only minor concern is the use of the private _client attribute in two e2e tests, which is a fragility risk but not a correctness issue.
  • No files require special attention — the only minor issue is the client._client private attribute access in test_llm_client_cache_e2e.py lines 94 and 123.

Important Files Changed

  • litellm/caching/llm_caching_handler.py: Removes the _remove_key override that eagerly closed clients on eviction, and the _background_tasks class-level set. The class now simply inherits eviction behaviour from InMemoryCache, letting clients be GC'd naturally. The asyncio import is still required for update_cache_key_with_event_loop.
  • tests/test_litellm/caching/test_llm_caching_handler.py: Unit tests correctly inverted to assert clients are NOT closed on eviction. Old tests that asserted client.closed is True are replaced with client.closed is False. The test_remove_key_no_event_loop and test_remove_key_removes_plain_values tests provide good additional coverage. The _background_tasks cleanup test is correctly removed since that set no longer exists.
  • tests/test_litellm/caching/test_llm_client_cache_e2e.py: Adds two new tests using real AsyncOpenAI client objects (not mocks) that cover the exact production scenario. Tests include asyncio.sleep(0.15) to catch any regression that schedules close via create_task(). No real network calls are made: clients are instantiated but no API requests are sent. Accesses the private client._client attribute, which could be fragile if the OpenAI SDK changes its internals.
  • AGENTS.md: Adds a clear rule (#9) prohibiting close()/aclose()/create_task(close_fn()) in LLMClientCache._remove_key() or any cache eviction path, with rationale and incident history link.
  • CLAUDE.md: Adds an "HTTP Client Cache Safety" section documenting the invariant. Concise and correctly placed under the architecture notes.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Req as In-flight Request
    participant Cache as LLMClientCache
    participant Client as HTTP/SDK Client
    participant GC as Python GC

    Note over Cache: TTL expires (1 hour)
    Req->>Client: Holds reference, streaming...
    Cache->>Cache: evict_cache() / _remove_key()

    rect rgb(255, 200, 200)
        Note over Cache,Client: BEFORE (regression in fb72979432)
        Cache--xClient: close() / create_task(aclose())
        Req--xClient: RuntimeError: Cannot send a request,\nas the client has been closed
    end

    rect rgb(200, 255, 200)
        Note over Cache,Client: AFTER (this PR)
        Cache->>Cache: Remove key from cache_dict/ttl_dict only
        Note over Client: Client stays open, ref count > 0
        Req->>Client: Continues streaming successfully
        Req->>Client: Eventually releases reference
        Client->>GC: Garbage collected when refcount = 0
        Note over Cache: Shutdown cleanup via close_litellm_async_clients()
    end
```

Last reviewed commit: aa8cdbc

```python
await asyncio.sleep(0.15)

# The SDK client's internal httpx client must still be open
assert not client._client.is_closed, (
```

Accessing private _client attribute

client._client is a private implementation detail of the openai SDK. If the SDK ever renames or restructures this attribute, this assertion will raise an AttributeError rather than failing with a clear test message. Consider using a public is_closed accessor if the SDK exposes one, or guarding the private access with a fallback:

```python
internal_client = getattr(client, "_client", None)
assert internal_client is not None, "AsyncOpenAI no longer exposes ._client"
assert not internal_client.is_closed, (
    "AsyncOpenAI client was closed on cache eviction — this causes "
    "'Cannot send a request, as the client has been closed' in production"
)
```

Same concern applies to client._client.is_closed on line 123 in test_ttl_expired_openai_sdk_client_stays_usable.

@ishaan-jaff merged commit 503eb2f into main on Mar 5, 2026
30 of 41 checks passed