
fix: HTTP client memory leaks in Presidio, OpenAI, and Gemini#19190

Merged
krrishdholakia merged 7 commits into BerriAI:litellm_staging_01_20_2026 from rsp2k:fix/oom-http-client-leaks on Jan 20, 2026
Conversation


@rsp2k rsp2k commented Jan 16, 2026

Fix: HTTP Client Memory Leaks (Issues #14540, #12443)

Fixes three high-impact memory leaks in LiteLLM's HTTP client lifecycle management.

Issues Addressed

  • #14540 - HTTP client memory leaks / OOM (broader tracking issue)
  • #12443 - Gemini "Unclosed client session" warnings

Changes

1. Presidio Guardrail Session Leak (CRITICAL)

Impact: Every guardrail check created a new aiohttp.ClientSession

  • Added shared session pattern with _get_http_session()
  • Added __del__ cleanup for safety
  • Scope: Runs on EVERY proxy request when PII masking is enabled

Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py

2. OpenAI Client Caching Bypass

Impact: Every completion created a new client, bypassing LiteLLM's TTL cache

  • Route through get_async_httpx_client() for proper caching
  • Critical: Include SSL config in cache key (prevents different SSL configs sharing same client)
  • Added specific exception handling with debug logging

Files: litellm/llms/openai/common_utils.py
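The cache-key point can be illustrated with a minimal sketch. `get_cached_client` and `StubClient` here are hypothetical stand-ins for LiteLLM's `get_async_httpx_client()` and `httpx.AsyncClient`; what matters is that the SSL configuration is part of the key, so callers with different SSL settings never share a client.

```python
# Hypothetical sketch: cache clients keyed by provider *and* SSL config.
_client_cache = {}


class StubClient:
    """Stand-in for httpx.AsyncClient so this sketch runs without httpx."""

    def __init__(self, ssl_verify):
        self.ssl_verify = ssl_verify


def get_cached_client(provider, ssl_verify=True):
    # SSL settings must be part of the cache key: otherwise a caller that
    # disables verification could be handed a client built with it enabled.
    key = (provider, ssl_verify)
    if key not in _client_cache:
        _client_cache[key] = StubClient(ssl_verify)
    return _client_cache[key]


a = get_cached_client("openai", ssl_verify=True)
b = get_cached_client("openai", ssl_verify=True)
c = get_cached_client("openai", ssl_verify=False)
assert a is b       # repeated completions reuse one client
assert a is not c   # a different SSL config gets its own client
```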

3. Gemini aiohttp Session Leak (#12443)

Impact: Persistent "Unclosed client session" warnings

  • Fixed atexit cleanup to use asyncio.new_event_loop() (was failing with get_event_loop())
  • Added __del__ cleanup to BaseLLMAIOHTTPHandler for defense-in-depth
  • Close global base_llm_aiohttp_handler instance

Files:

  • litellm/llms/custom_httpx/async_client_cleanup.py
  • litellm/llms/custom_httpx/aiohttp_handler.py

Validation

All fixes validated with automated tests:

  • test_oom_fixes.py - Presidio + OpenAI validation (2/2 tests passing)
  • test_gemini_session_leak.py - Gemini cleanup validation (3/3 tests passing)

Run tests:

poetry run pytest test_oom_fixes.py -v
poetry run pytest test_gemini_session_leak.py -v

Context

This PR responds to @ishaan-jaff's request for collaboration on broader OOM issues in this comment.

The fixes follow a pattern of ensuring HTTP clients are managed through LiteLLM's centralized lifecycle system (LLMClientCache with TTL) rather than being created ad-hoc per request.

Root Cause Pattern

All three leaks shared a common anti-pattern:

  • Code created httpx.AsyncClient() or aiohttp.ClientSession() directly
  • Bypassed LiteLLM's caching infrastructure in litellm/llms/custom_httpx/http_handler.py
  • Resources accumulated without proper cleanup
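The centralized alternative to that anti-pattern is a TTL-keyed client cache. The sketch below is a hypothetical miniature of the idea behind LiteLLM's LLMClientCache (names and structure are assumptions, not the real class): ad-hoc construction is replaced by a single `get_or_create` path, so repeated requests reuse one client and stale entries are rebuilt rather than accumulating.

```python
import time


class TTLClientCache:
    """Hypothetical miniature of a TTL-based client cache."""

    def __init__(self, ttl_seconds=3600.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (client, expiry timestamp)

    def get_or_create(self, key, factory):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # live entry: reuse, don't construct
        client = factory()
        self._store[key] = (client, now + self.ttl)
        return client


cache = TTLClientCache(ttl_seconds=0.1)
c1 = cache.get_or_create("openai", object)
c2 = cache.get_or_create("openai", object)
assert c1 is c2          # within the TTL, one shared client
time.sleep(0.15)
c3 = cache.get_or_create("openai", object)
assert c3 is not c1      # expired entry is replaced, not accumulated
```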

Next Steps

Happy to continue working on the remaining OOM issues if helpful.

cc @ishaan-jaff


vercel bot commented Jan 16, 2026

The latest updates on your projects:

Project: litellm · Deployment: Error · Review: Error · Updated (UTC): Jan 18, 2026, 11:07pm


krrishdholakia (Member) commented:

@rsp2k is this ready for review? i see it marked as draft

rsp2k added 6 commits January 17, 2026 19:03
Fixes multiple memory leak issues reported in BerriAI#14540 and related tickets:

**Presidio Guardrail Fix (BerriAI#14540)**
- Problem: Every guardrail check created a new aiohttp.ClientSession
- Impact: High-traffic proxies accumulated thousands of unclosed sessions
- Solution: Share a single session across all guardrail checks
  - Added `self._http_session` instance variable
  - Lazy session creation via `_get_http_session()`
  - Proper cleanup via `_close_http_session()` and `__del__()`
- Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py

**OpenAI HTTP Client Caching (BerriAI#14540)**
- Problem: `_get_async_http_client()` created new httpx.AsyncClient on each call
- Impact: OpenAI/Azure completions bypassed client caching system
- Solution: Route through `get_async_httpx_client()` for TTL-based caching
  - Caches clients by provider and SSL config
  - Fallback to direct creation if caching fails
  - Applied to both async and sync client methods
- Files: litellm/llms/openai/common_utils.py

**Test Script**
- Added validation script to demonstrate fixes
- Counts file descriptors and unclosed session objects
- Files: test_oom_fixes.py

Related issues: BerriAI#14384, BerriAI#13251, BerriAI#12443
…nt creation

Fixes two high-impact memory leaks:

1. Presidio Guardrail Session Leak (issue BerriAI#14540)
   - Problem: Created new aiohttp.ClientSession on every guardrail check
   - Impact: Runs on EVERY proxy request when PII masking enabled
   - Fix: Shared session pattern with lifecycle management
   - Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py

2. OpenAI HTTP Client Cache Bypass (issue BerriAI#14540)
   - Problem: _get_async_http_client() created new httpx.AsyncClient, bypassing TTL cache
   - Impact: Every completion created new client with own connection pool
   - Fix: Route through get_async_httpx_client() for proper caching
   - Critical: Include SSL config in cache key for correctness
   - Files: litellm/llms/openai/common_utils.py

Validation:
- Presidio: 100 requests → 0 new sessions (was 100)
- OpenAI: 100 calls → 1 unique client (was 100)
- test_oom_fixes.py: Automated validation script
Fixes persistent "Unclosed client session" warnings when using Gemini models.

Root Causes:
1. Broken atexit cleanup - get_event_loop() fails at exit time
2. On-demand session creation without reliable cleanup

Changes:

1. Fixed atexit Cleanup (async_client_cleanup.py)
   - OLD: Used get_event_loop() which fails when loop is closed
   - NEW: Always create fresh event loop at exit time
   - Ensures cleanup runs successfully even when main loop is closed

2. Added __del__ Cleanup (aiohttp_handler.py)
   - Defense-in-depth: cleanup on garbage collection
   - Handles abnormal termination cases
   - Similar pattern to Presidio guardrail fix

3. Enhanced Cleanup Scope (async_client_cleanup.py)
   - Now closes global base_llm_aiohttp_handler instance
   - Previously only checked cache, missed module-level handler

Validation:
- Test 1: __del__ cleanup → 0 sessions leaked ✓
- Test 2: atexit cleanup → 0 sessions leaked ✓
- test_gemini_session_leak.py: Automated validation

Related: BerriAI#14540 (broader OOM issue tracking)
MyPy was failing because llm_provider parameter expects Union[LlmProviders, httpxSpecialProvider], not a string.

Changed from string "openai" to LlmProviders.OPENAI enum value.
- Move test_oom_fixes.py to tests/test_litellm/llms/
- Move test_gemini_session_leak.py to tests/test_litellm/llms/custom_httpx/
- Fix pytest warning: use pytest.skip() instead of return True

This ensures CI actually runs our OOM fix validation tests.
…sion creation

- Make _get_http_session() async with asyncio.Lock protection
- Prevents multiple concurrent requests from creating orphaned sessions
- Add concurrent load test (50 parallel requests) to validate fix
- Test confirms only 1 session created under concurrent load

Critical fix: Previous implementation had race condition where
concurrent guardrail checks could create multiple sessions,
defeating the shared session pattern and causing memory leaks.
Move asyncio.Lock creation from lazy initialization in _get_http_session()
to __init__. The previous lazy init had a race condition where concurrent
coroutines could both see _session_lock as None, both create locks, and
end up with different lock instances - defeating the synchronization.

asyncio.Lock() can be safely created without an event loop; it only
requires one when awaited.
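The lock-initialization race described in this commit can be reproduced in a standalone sketch (stand-in classes, not the actual Presidio code). The lazy holder suspends between the `None` check and the assignment, so every waiting coroutine builds its own lock; the eager holder constructs the lock in `__init__`, which is safe because `asyncio.Lock()` needs a running loop only when awaited.

```python
import asyncio


class LazyLockHolder:
    """The racy pattern: the lock is created on first use."""

    def __init__(self):
        self._lock = None

    async def get_lock(self):
        if self._lock is None:
            # A suspension point between check and assignment widens the
            # race window: every waiting coroutine still sees None here.
            await asyncio.sleep(0)
            self._lock = asyncio.Lock()
        return self._lock


class EagerLockHolder:
    """The fix: create the lock in __init__, before any concurrency starts."""

    def __init__(self):
        self._lock = asyncio.Lock()

    async def get_lock(self):
        return self._lock


async def main():
    lazy = LazyLockHolder()
    lazy_locks = await asyncio.gather(*[lazy.get_lock() for _ in range(10)])
    # Multiple distinct lock objects: the "shared" lock never synchronized.
    assert len({id(lock) for lock in lazy_locks}) > 1

    eager = EagerLockHolder()
    eager_locks = await asyncio.gather(*[eager.get_lock() for _ in range(10)])
    assert len({id(lock) for lock in eager_locks}) == 1


asyncio.run(main())
```

With distinct lock instances, two coroutines can each hold "the" lock at once, which is exactly how the earlier lazy-init version could still create orphaned sessions under concurrent load.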

rsp2k commented Jan 18, 2026

Yes, ready for review! Just pushed a fix for a race condition in the Presidio session lock initialization - the lock was being created lazily which could cause concurrent coroutines to end up with different lock instances.

