fix: HTTP client memory leaks in Presidio, OpenAI, and Gemini #19190
Merged
krrishdholakia merged 7 commits into BerriAI:litellm_staging_01_20_2026 on Jan 20, 2026
Conversation
Member
@rsp2k is this ready for review? i see it marked as draft
Force-pushed from 043abf4 to 58e82f1
AlexsanderHamir approved these changes on Jan 17, 2026
Fixes multiple memory leak issues reported in BerriAI#14540 and related tickets:

**Presidio Guardrail Fix (BerriAI#14540)**
- Problem: Every guardrail check created a new aiohttp.ClientSession
- Impact: High-traffic proxies accumulated thousands of unclosed sessions
- Solution: Share a single session across all guardrail checks
  - Added `self._http_session` instance variable
  - Lazy session creation via `_get_http_session()`
  - Proper cleanup via `_close_http_session()` and `__del__()`
- Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py

**OpenAI HTTP Client Caching (BerriAI#14540)**
- Problem: `_get_async_http_client()` created a new httpx.AsyncClient on each call
- Impact: OpenAI/Azure completions bypassed the client caching system
- Solution: Route through `get_async_httpx_client()` for TTL-based caching
  - Caches clients by provider and SSL config
  - Falls back to direct creation if caching fails
  - Applied to both async and sync client methods
- Files: litellm/llms/openai/common_utils.py

**Test Script**
- Added validation script to demonstrate the fixes
- Counts file descriptors and unclosed session objects
- Files: test_oom_fixes.py

Related issues: BerriAI#14384, BerriAI#13251, BerriAI#12443
…nt creation

Fixes two high-impact memory leaks:

1. Presidio Guardrail Session Leak (issue BerriAI#14540)
   - Problem: Created a new aiohttp.ClientSession on every guardrail check
   - Impact: Runs on EVERY proxy request when PII masking is enabled
   - Fix: Shared session pattern with lifecycle management
   - Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py

2. OpenAI HTTP Client Cache Bypass (issue BerriAI#14540)
   - Problem: _get_async_http_client() created a new httpx.AsyncClient, bypassing the TTL cache
   - Impact: Every completion created a new client with its own connection pool
   - Fix: Route through get_async_httpx_client() for proper caching
   - Critical: Include SSL config in the cache key for correctness
   - Files: litellm/llms/openai/common_utils.py

Validation:
- Presidio: 100 requests → 0 new sessions (was 100)
- OpenAI: 100 calls → 1 unique client (was 100)
- test_oom_fixes.py: Automated validation script
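The TTL caching with SSL config in the cache key can be sketched like this. All names here (`get_cached_client`, `_client_cache`, `_CACHE_TTL_SECONDS`) are illustrative assumptions; the real fix routes through litellm's `get_async_httpx_client()`, whose internals may differ.

```python
import time

_client_cache = {}          # (provider, ssl_verify) -> (client, created_at)
_CACHE_TTL_SECONDS = 3600.0


def get_cached_client(provider, ssl_verify, factory):
    """Return a cached client for (provider, ssl config), creating one on miss."""
    # SSL config must be part of the key: a client cached with verification
    # disabled must never be returned to a caller that expects verification.
    key = (provider, ssl_verify)
    entry = _client_cache.get(key)
    now = time.monotonic()
    if entry is not None and now - entry[1] < _CACHE_TTL_SECONDS:
        return entry[0]  # cache hit within TTL: reuse the connection pool
    client = factory()  # real code would build an httpx.AsyncClient here
    _client_cache[key] = (client, now)
    return client
```

With this shape, 100 completion calls with the same provider and SSL settings share one client (and one connection pool), which is the "100 calls → 1 unique client" behavior described above.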
Fixes persistent "Unclosed client session" warnings when using Gemini models.

Root Causes:
1. Broken atexit cleanup - get_event_loop() fails at exit time
2. On-demand session creation without reliable cleanup

Changes:
1. Fixed atexit Cleanup (async_client_cleanup.py)
   - OLD: Used get_event_loop(), which fails when the loop is closed
   - NEW: Always create a fresh event loop at exit time
   - Ensures cleanup runs successfully even when the main loop is closed
2. Added __del__ Cleanup (aiohttp_handler.py)
   - Defense-in-depth: cleanup on garbage collection
   - Handles abnormal termination cases
   - Similar pattern to the Presidio guardrail fix
3. Enhanced Cleanup Scope (async_client_cleanup.py)
   - Now closes the global base_llm_aiohttp_handler instance
   - Previously only checked the cache, missing the module-level handler

Validation:
- Test 1: __del__ cleanup → 0 sessions leaked ✓
- Test 2: atexit cleanup → 0 sessions leaked ✓
- test_gemini_session_leak.py: Automated validation

Related: BerriAI#14540 (broader OOM issue tracking)
MyPy was failing because the llm_provider parameter expects Union[LlmProviders, httpxSpecialProvider], not a string. Changed the string "openai" to the LlmProviders.OPENAI enum value.
- Move test_oom_fixes.py to tests/test_litellm/llms/
- Move test_gemini_session_leak.py to tests/test_litellm/llms/custom_httpx/
- Fix pytest warning: use pytest.skip() instead of `return True`

This ensures CI actually runs our OOM fix validation tests.
…sion creation

- Make _get_http_session() async with asyncio.Lock protection
- Prevents multiple concurrent requests from creating orphaned sessions
- Add concurrent load test (50 parallel requests) to validate the fix
- Test confirms only 1 session is created under concurrent load

Critical fix: The previous implementation had a race condition where concurrent guardrail checks could create multiple sessions, defeating the shared session pattern and causing memory leaks.
Force-pushed from 411e3ee to 84234a7
Move asyncio.Lock creation from lazy initialization in _get_http_session() to __init__. The previous lazy init had a race condition: concurrent coroutines could both see _session_lock as None, both create locks, and end up with different lock instances, defeating the synchronization. asyncio.Lock() can be safely created without an event loop; it only requires one when awaited.
Contributor (Author)
Yes, ready for review! Just pushed a fix for a race condition in the Presidio session lock initialization - the lock was being created lazily, which could cause concurrent coroutines to end up with different lock instances.
Merged commit 58c8c2b into BerriAI:litellm_staging_01_20_2026
5 of 7 checks passed
This was referenced Feb 13, 2026
Fix: HTTP Client Memory Leaks (Issues #14540, #12443)
Fixes three high-impact memory leaks in LiteLLM's HTTP client lifecycle management.
Issues Addressed
Changes
1. Presidio Guardrail Session Leak (CRITICAL)

Impact: Every guardrail check created a new aiohttp.ClientSession
Fix: Shared session via _get_http_session(), with __del__ cleanup for safety
Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py

2. OpenAI Client Caching Bypass

Impact: Every completion created a new client, bypassing LiteLLM's TTL cache
Fix: Route through get_async_httpx_client() for proper caching
Files: litellm/llms/openai/common_utils.py

3. Gemini aiohttp Session Leak (#12443)

Impact: Persistent "Unclosed client session" warnings
Fix: atexit cleanup now uses asyncio.new_event_loop() (was failing with get_event_loop()); added __del__ cleanup to BaseLLMAIOHTTPHandler for defense-in-depth; cleanup also closes the global base_llm_aiohttp_handler instance
Files: litellm/llms/custom_httpx/async_client_cleanup.py, litellm/llms/custom_httpx/aiohttp_handler.py

Validation
All fixes validated with automated tests:
- test_oom_fixes.py - Presidio + OpenAI validation (2/2 tests passing)
- test_gemini_session_leak.py - Gemini cleanup validation (3/3 tests passing)

Run tests:
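The kind of leak measurement these validation scripts perform (counting unclosed session objects) can be sketched in the spirit of test_oom_fixes.py. This is not the script's actual code: `Session` is a stand-in for aiohttp.ClientSession, and both handler functions are hypothetical, contrasting the leaky anti-pattern with the shared-session fix.

```python
import gc


class Session:
    """Stand-in for aiohttp.ClientSession for the counting demo."""


def count_live_sessions():
    """Count Session objects still reachable, after a full GC pass."""
    gc.collect()
    return sum(1 for obj in gc.get_objects() if isinstance(obj, Session))


def handle_requests_leaky(n):
    # Anti-pattern: a new session per request, none ever closed.
    return [Session() for _ in range(n)]


def handle_requests_shared(n):
    # Fixed pattern: one session reused for every request.
    session = Session()
    for _ in range(n):
        pass  # each request would use `session` here
    return [session]
```

Comparing the live-object count before and after a burst of N requests makes the leak visible as a delta of N for the leaky version versus 1 for the shared version.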
Context
This PR responds to @ishaan-jaff's request for collaboration on broader OOM issues in this comment.
The fixes follow a pattern of ensuring HTTP clients are managed through LiteLLM's centralized lifecycle system (LLMClientCache with TTL) rather than being created ad hoc per request.

Root Cause Pattern

All three leaks shared a common anti-pattern: code instantiated httpx.AsyncClient() or aiohttp.ClientSession() directly instead of going through litellm/llms/custom_httpx/http_handler.py.

Next Steps
Happy to continue working on remaining OOM issues if helpful:
cc @ishaan-jaff