
fix: close streaming connections to prevent connection pool exhaustion#21213

Merged
ryan-crabbe merged 12 commits into main from litellm_fix_streaming_connection_pool_leak
Feb 21, 2026

Conversation

@ryan-crabbe
Collaborator

@ryan-crabbe ryan-crabbe commented Feb 14, 2026

Relevant issues

Fixes #9178 — Connection to backing provider leaking when source request cancels
Fixes #19549 — TCP connections starvation issue on 1.80.10

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50–55 passing tests: main is stable with minor issues.
  • 45–49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.

Type

🐛 Bug Fix
✅ Test

Changes

The proxy's CustomStreamWrapper is the universal streaming adapter litellm uses to wrap provider-specific streams and normalize them into a single response format.

When we create a CustomStreamWrapper, we pass in the raw provider stream. That raw stream object holds a reference to the HTTP connection, and we process chunks as they arrive.

When a client disconnected mid-stream, the client → proxy connection closed, but the proxy → provider connection was never released back to the pool. Over time this filled the connection pool and caused requests to hang.

The problem is threefold (after a lot of digging):

  1. Closing the stream didn't actually close the connection. The aiohttp transport built the httpx.Response with content= instead of stream=. This silently replaced the real stream with a wrapper whose aclose() did nothing, so the underlying HTTP connection was never returned to the pool. This leaked on every streaming request, even ones that completed successfully.

  2. Nobody was calling close in the first place. Even if aclose() had worked, neither the proxy's async_data_generator nor the router's stream_with_fallbacks had finally blocks to call it. When a client disconnected, the response was just abandoned without cleanup. Clients that didn't disconnect were fine and cleaned up correctly. (Also, these finally blocks wouldn't run on their own because cancellation tore down the whole task, so we had to use anyio to shield cancellation so cleanup could complete.)

  3. Disconnect detection was broken on Uvicorn 0.28–0.32, which claimed to support ASGI spec 2.4 but didn't actually implement disconnect signaling. Starlette saw the advertised spec version and skipped its own fallback disconnect detection. This has since been fixed: Uvicorn 0.32.1+ reports ASGI spec 2.3.
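The content= vs stream= failure mode in point 1 can be modeled with a toy sketch. ProviderStream and ToyResponse below are illustrative stand-ins, not the real litellm or httpx classes; the point is only that passing pre-read bytes severs the link to the object whose aclose() releases the connection:

```python
import asyncio

class ProviderStream:
    """Stands in for AiohttpResponseStream; aclose() returns the connection to the pool."""
    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True

class ToyResponse:
    """Toy model of the httpx.Response wiring described above (not the real API)."""
    def __init__(self, *, content=None, stream=None):
        self._stream = stream  # only a live stream object can be closed later

    async def aclose(self):
        if self._stream is not None:
            await self._stream.aclose()

provider = ProviderStream()

# Buggy wiring: materialized bytes, no stream reference -> aclose() is a no-op.
asyncio.run(ToyResponse(content=b"data").aclose())
print(provider.closed)  # False: the connection is never released

# Fixed wiring: aclose() propagates to the provider stream.
asyncio.run(ToyResponse(stream=provider).aclose())
print(provider.closed)  # True: the connection is released back to the pool
```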

- Add aclose() to CustomStreamWrapper to delegate to underlying stream
- Add finally block in async_data_generator to release HTTP connections
- Thread shared_session through async_streaming to reuse connection pool
- Set finite default timeout (600s) in _get_openai_client
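The finally-block pattern from the bullets above can be sketched as follows. FakeStream and data_generator are hypothetical stand-ins; the real code additionally wraps the close in anyio.CancelScope(shield=True), while this dependency-free sketch uses plain asyncio:

```python
import asyncio
import contextlib

class FakeStream:
    """Illustrative stand-in for a provider stream held by CustomStreamWrapper."""
    def __init__(self):
        self.closed = False

    async def chunks(self):
        for i in range(100):
            yield f"chunk-{i}"
            await asyncio.sleep(0)

    async def aclose(self):
        self.closed = True  # models releasing the HTTP connection

async def data_generator(stream):
    # Same shape as the finally block added to async_data_generator:
    # cleanup runs whether the consumer finishes or bails out early.
    try:
        async for chunk in stream.chunks():
            yield chunk
    finally:
        with contextlib.suppress(Exception):
            await stream.aclose()

async def main():
    stream = FakeStream()
    gen = data_generator(stream)
    await gen.__anext__()  # consume one chunk...
    await gen.aclose()     # ...then the client disconnects
    return stream.closed

print(asyncio.run(main()))  # True: stream released despite early exit
```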
@vercel

vercel bot commented Feb 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 21, 2026 8:46pm


@greptile-apps
Contributor

greptile-apps bot commented Feb 14, 2026

Greptile Summary

This PR fixes a critical connection pool exhaustion bug in streaming responses. The fix addresses three root causes:

Key Changes:

  • Transport layer fix: Changed content= to stream= in aiohttp_transport.py so aclose() properly propagates to the underlying stream, releasing HTTP connections back to the pool
  • Cleanup infrastructure: Added aclose() methods with cancellation shielding (anyio.CancelScope(shield=True)) at three layers: CustomStreamWrapper, async_data_generator, and stream_with_fallbacks
  • Idempotency: The aclose() implementation sets self.completion_stream = None after closing to prevent double-close issues
  • Version constraint: Updated uvicorn to >=0.32.1,<1.0.0 to ensure proper ASGI disconnect signaling (versions 0.28-0.32 had broken disconnect detection)

The fix ensures HTTP connections are properly released in all scenarios: client disconnects, normal completion, mid-stream errors, and task cancellation. Comprehensive test coverage validates all cleanup paths.

Confidence Score: 5/5

  • This PR is safe to merge - it fixes a critical production bug with well-tested infrastructure changes
  • Score of 5 reflects: (1) the fix directly addresses a well-documented connection pool leak, (2) changes follow defensive coding patterns with exception handling and idempotency guards, (3) comprehensive test coverage including edge cases like cancellation, (4) previous Greptile review comments have been addressed in follow-up commits, (5) the PR scope is focused and isolated to the streaming cleanup issue
  • No files require special attention - all changes are well-structured and thoroughly tested

Important Files Changed

Filename Overview
litellm/llms/custom_httpx/aiohttp_transport.py Fixed critical bug by changing content= to stream= so aclose() properly releases HTTP connections
litellm/litellm_core_utils/streaming_handler.py Added idempotent aclose() method with cancellation shielding to properly cleanup streams
litellm/proxy/proxy_server.py Added finally block to async_data_generator to close response streams with cancellation shielding
litellm/router.py Added finally block to stream_with_fallbacks to cleanup both primary and fallback response streams
pyproject.toml Updated uvicorn constraint to >=0.32.1,<1.0.0 to ensure proper ASGI disconnect signaling
tests/test_litellm/test_streaming_connection_cleanup.py Comprehensive test coverage for transport layer, CustomStreamWrapper, and router cleanup behavior

Sequence Diagram

sequenceDiagram
    participant Client
    participant Proxy as proxy_server.py<br/>async_data_generator
    participant Router as router.py<br/>stream_with_fallbacks
    participant Wrapper as CustomStreamWrapper
    participant Transport as AiohttpTransport
    participant Provider as LLM Provider

    Client->>Proxy: Stream request
    Proxy->>Router: Forward to router
    Router->>Wrapper: Wrap provider stream
    Wrapper->>Transport: Get httpx.Response
    Transport->>Provider: HTTP request
    Provider-->>Transport: Streaming response
    Transport-->>Wrapper: stream= (not content=)
    
    alt Client disconnects mid-stream
        Client-xProxy: Disconnect
        Note over Proxy: finally block executes
        Proxy->>Router: Close generator
        Note over Router: finally block executes
        Router->>Wrapper: await aclose()
        Note over Wrapper: Shield from cancellation
        Wrapper->>Transport: Close stream
        Transport->>Provider: Release connection to pool
    else Normal completion
        Provider-->>Wrapper: All chunks
        Wrapper-->>Router: Stream chunks
        Router-->>Proxy: Stream chunks
        Proxy-->>Client: SSE events
        Note over Proxy,Router: finally blocks still execute
        Proxy->>Router: Cleanup
        Router->>Wrapper: await aclose()
        Wrapper->>Transport: Close stream
        Transport->>Provider: Release connection to pool
    end

Last reviewed commit: 28c7cc6


@greptile-apps greptile-apps bot left a comment


5 files reviewed, 1 comment


api_base: Optional[str] = None,
api_version: Optional[str] = None,
timeout: Union[float, httpx.Timeout] = httpx.Timeout(None),
timeout: Union[float, httpx.Timeout] = httpx.Timeout(timeout=600.0, connect=5.0),


Default timeout change is a behavioral risk

Changing the default from httpx.Timeout(None) (no timeout) to httpx.Timeout(timeout=600.0, connect=5.0) affects any caller of _get_openai_client that doesn't explicitly pass a timeout parameter. Specifically, the moderations API call in litellm/main.py:6150 does not pass a timeout and will now inherit this 600s default instead of no timeout.

While 600s is generous and matches common defaults, this is a silent behavioral change that could cause timeout errors for long-running moderation calls that previously had no timeout. Consider whether this change should be scoped more narrowly (e.g., only in the streaming path) or if it's intentionally applied globally.
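The caller-side effect of this default change can be sketched as follows. Timeout models httpx.Timeout and get_client stands in for _get_openai_client; neither is the real litellm code:

```python
class Timeout:
    """Toy model of httpx.Timeout: timeout=None means wait forever."""
    def __init__(self, timeout=None, connect=None):
        self.timeout = timeout
        self.connect = connect

def get_client(timeout=Timeout(timeout=600.0, connect=5.0)):
    # Callers that omit `timeout` now inherit the finite default.
    return {"timeout": timeout}

# Default changed: previously the default was Timeout(None) (no timeout).
assert get_client()["timeout"].timeout == 600.0

# Long-running paths (e.g. the moderations call) would have to opt out
# explicitly to keep the old unbounded behavior.
assert get_client(timeout=Timeout(None))["timeout"].timeout is None
print("ok")
```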

@ryan-crabbe ryan-crabbe marked this pull request as draft February 14, 2026 20:04
Address three additional root causes that prevented connection cleanup
on client disconnect: Starlette/Uvicorn disconnect detection gap,
content= vs stream= no-op wrapper in aiohttp transport, and anyio
CancelledError interrupting cleanup awaits.
@ryan-crabbe
Collaborator Author

@greptileai review this please

@greptile-apps
Contributor

greptile-apps bot commented Feb 14, 2026

Greptile Overview

Greptile Summary

This PR fixes a connection pool exhaustion bug in the proxy's streaming path. When clients disconnected mid-stream, the upstream HTTP connections (proxy → LLM provider) were never closed, eventually filling the connection pool and blocking new requests.

Key changes:

  • aiohttp_transport.py: Fixes a critical bug where content= was used instead of stream= in httpx.Response(), which prevented aclose() from propagating to the underlying AiohttpResponseStream — the root cause of connections never being released
  • streaming_handler.py: Adds aclose() to CustomStreamWrapper with anyio.CancelScope(shield=True) so cleanup completes even under task cancellation
  • proxy_server.py: Adds a finally block to async_data_generator to call response.aclose() when the generator exits (normal or early)
  • router.py: Adds a finally block in stream_with_fallbacks to close the model response stream on generator exit
  • common_request_processing.py: Monkey-patches StreamingResponse.__call__ to restore client-disconnect detection lost in Starlette >= 0.45.3 (Uvicorn doesn't implement ASGI 2.4 disconnect signaling)
  • openai.py: Passes shared_session to the async streaming path so the shared connection pool is used instead of a fallback pool; changes default timeout from None to 600s

Concerns:

  • The StreamingResponse monkey-patch depends on starlette._utils.collapse_excgroups, a private API that could break on Starlette upgrades
  • The default timeout change from None → 600s in _get_openai_client is a global behavioral change (already noted in prior review)

Confidence Score: 3/5

  • This PR fixes a real connection pool leak but introduces a global monkey-patch on StreamingResponse using a private Starlette API, which adds fragility risk on upgrades.
  • Score of 3 reflects: (1) the core fix (content= → stream=, aclose propagation, finally blocks) is correct and well-tested; (2) however, the global monkey-patch of StreamingResponse.__call__ using private Starlette internals (starlette._utils.collapse_excgroups) is a maintenance risk; (3) the default timeout change from None to 600s is a silent behavioral change affecting all callers; (4) comprehensive test coverage with 10+ regression tests is a strong positive.
  • Pay close attention to litellm/proxy/common_request_processing.py (global monkey-patch with private API dependency) and litellm/llms/openai/openai.py (global timeout default change).

Important Files Changed

Filename Overview
litellm/litellm_core_utils/streaming_handler.py Adds aclose() to CustomStreamWrapper for proper stream cleanup with anyio cancellation shielding. Correct approach, minor concern about non-idempotent close and inline import.
litellm/llms/custom_httpx/aiohttp_transport.py Critical fix: Changes content= to stream= in httpx.Response constructor so that aclose() properly propagates to AiohttpResponseStream, enabling connection pool release.
litellm/llms/openai/openai.py Passes shared_session to the async streaming path and changes default timeout from None to 600s. The timeout change is a global behavioral shift already flagged in prior review threads.
litellm/proxy/common_request_processing.py Monkey-patches StreamingResponse.__call__ globally at import time to restore disconnect detection. Uses private Starlette API (starlette._utils.collapse_excgroups) which may break on upgrades.
litellm/proxy/proxy_server.py Adds a finally block to async_data_generator that calls response.aclose() to release HTTP connections on generator exit. Clean and correct.
litellm/router.py Adds finally block in stream_with_fallbacks to close model_response on generator exit, with anyio cancellation shielding. Minor import ordering issue with anyio.
tests/test_litellm/test_streaming_connection_cleanup.py Comprehensive regression test suite covering: transport stream parameter, aclose propagation, fallback-to-close, cancellation shielding, router stream cleanup, and monkey-patch verification. All mock-based.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Proxy as Proxy (StreamingResponse)
    participant Generator as async_data_generator
    participant Router as Router.stream_with_fallbacks
    participant CSW as CustomStreamWrapper
    participant Transport as AiohttpTransport
    participant Provider as LLM Provider

    Client->>Proxy: POST /chat/completions (stream=true)
    Proxy->>Generator: Start streaming
    Generator->>Router: iterate response
    Router->>CSW: iterate chunks
    CSW->>Transport: iterate stream (stream= param)
    Transport->>Provider: HTTP connection (shared_session)
    Provider-->>Transport: SSE chunks
    Transport-->>CSW: bytes
    CSW-->>Router: ModelResponseStream
    Router-->>Generator: chunk
    Generator-->>Proxy: data: {...}
    Proxy-->>Client: SSE chunk

    Note over Client: Client disconnects mid-stream

    Proxy->>Proxy: _disconnect_aware_call detects disconnect
    Proxy->>Generator: cancel task group
    Generator->>Generator: finally: response.aclose()
    Generator->>CSW: aclose() [shielded]
    CSW->>Transport: aclose()
    Transport->>Provider: Release HTTP connection
    Note over Transport,Provider: Connection returned to pool

Last reviewed commit: 9f09b02


@greptile-apps greptile-apps bot left a comment


9 files reviewed, 4 comments


import orjson
from fastapi import HTTPException, Request, status
from fastapi.responses import JSONResponse, Response, StreamingResponse
from starlette._utils import collapse_excgroups


Reliance on private Starlette API

starlette._utils.collapse_excgroups is a private (underscore-prefixed) module. Private APIs can change or be removed in any Starlette release without notice, which could break this monkey-patch silently on upgrade. Consider either:

  1. Vendoring the collapse_excgroups implementation (it's typically a small context manager for flattening ExceptionGroups)
  2. Adding a version guard or try/except fallback so the proxy degrades gracefully if this import fails
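Option 2 might look like the sketch below: degrade gracefully if the private import disappears. The except-branch body is a vendored approximation of what collapse_excgroups does (unwrap single-exception ExceptionGroups), not Starlette's actual code:

```python
import contextlib

try:
    from starlette._utils import collapse_excgroups  # private API
except ImportError:
    @contextlib.contextmanager
    def collapse_excgroups():
        # Fallback: unwrap single-exception ExceptionGroups so callers
        # see the original error type instead of the group wrapper.
        try:
            yield
        except BaseException as exc:
            while isinstance(exc, BaseExceptionGroup) and len(exc.exceptions) == 1:
                exc = exc.exceptions[0]
            raise exc

with collapse_excgroups():
    pass  # the patched StreamingResponse.__call__ body would run here
print(callable(collapse_excgroups))  # True
```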

Comment on lines +158 to +174
async def aclose(self):
    if self.completion_stream is not None:
        # Shield from anyio cancellation so cleanup awaits can complete.
        # Without this, CancelledError is thrown into every await during
        # task group cancellation, preventing HTTP connection release.
        import anyio

        with anyio.CancelScope(shield=True):
            try:
                if hasattr(self.completion_stream, "aclose"):
                    await self.completion_stream.aclose()
                elif hasattr(self.completion_stream, "close"):
                    result = self.completion_stream.close()
                    if result is not None:
                        await result
            except BaseException:
                pass


aclose() is not idempotent — may double-close the stream

aclose() does not set self.completion_stream = None after closing, so calling aclose() twice will attempt to close the underlying stream twice. This can happen because aclose() is called from multiple locations (e.g., async_data_generator's finally block AND stream_with_fallbacks' finally block). While the broad except BaseException: pass prevents errors from propagating, it would be cleaner to guard against double-close:

Suggested change

async def aclose(self):
    if self.completion_stream is not None:
        # Shield from anyio cancellation so cleanup awaits can complete.
        # Without this, CancelledError is thrown into every await during
        # task group cancellation, preventing HTTP connection release.
        import anyio

        with anyio.CancelScope(shield=True):
            stream = self.completion_stream
            self.completion_stream = None
            try:
                if hasattr(stream, "aclose"):
                    await stream.aclose()
                elif hasattr(stream, "close"):
                    result = stream.close()
                    if result is not None:
                        await result
            except BaseException:
                pass

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines 15 to 18
import json

import anyio
import logging


Import ordering: anyio mixed with stdlib imports

anyio is a third-party package but is inserted between stdlib imports (json and logging). Per convention, third-party imports should be separated from stdlib imports with a blank line.

Suggested change

import json
import logging
And add import anyio in the third-party imports section (after the blank line following stdlib imports).

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

# Shield from anyio cancellation so cleanup awaits can complete.
# Without this, CancelledError is thrown into every await during
# task group cancellation, preventing HTTP connection release.
import anyio


Inline import of anyio inside method

Per the CLAUDE.md style guide: "Avoid imports within methods — place all imports at the top of the file (module-level)." If anyio is intentionally deferred here because streaming_handler.py is in the core SDK and anyio is only a transitive dependency, consider adding a comment explaining why, or move it to the top of the file since httpx (which already depends on anyio) is already a top-level import.

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Uvicorn 0.31.x falsely advertised ASGI spec_version "2.4" without
implementing send() raising OSError on disconnect. Starlette trusted
this and skipped its disconnect listener, causing generators to run
forever. Uvicorn 0.32.1 corrected this to "2.3", restoring native
disconnect detection. The monkey-patch is no longer needed.

Also adds fallback_response cleanup in stream_with_fallbacks and
moves inline import anyio to module level in streaming_handler.
@ryan-crabbe
Collaborator Author

@greptileai please review

@greptile-apps
Contributor

greptile-apps bot commented Feb 15, 2026

Greptile Overview

Greptile Summary

This PR fixes streaming connection pool exhaustion by ensuring HTTP connections are properly released when clients disconnect mid-stream. The fix touches four areas: (1) adds aclose() to CustomStreamWrapper with anyio cancellation shielding, (2) adds finally blocks in async_data_generator and stream_with_fallbacks to close streams on exit, (3) passes shared_session through to the streaming path so it uses the shared connection pool, and (4) changes content= to stream= in the aiohttp transport to preserve lazy streaming semantics.

  • Core fix: CustomStreamWrapper.aclose() delegates to the underlying stream's aclose()/close() with anyio.CancelScope(shield=True) so cleanup completes even during task cancellation
  • Proxy layer: async_data_generator finally block ensures response.aclose() is called on client disconnect, normal completion, and error paths
  • Router layer: stream_with_fallbacks finally block closes both model_response and fallback_response streams
  • Connection pool sharing: shared_session is now passed through to astreaming(), preventing use of a fallback connection pool
  • aiohttp transport: Changed from content= to stream= in httpx.Response constructor so aclose() propagates correctly to AiohttpResponseStream
  • Dependency: uvicorn bumped from ^0.31.1 to >=0.32.1 to fix ASGI spec_version mismatch that prevented Starlette's disconnect detection
  • Idempotency concern: CustomStreamWrapper.aclose() does not set completion_stream = None after closing, creating a risk of double-close on the underlying HTTP connection when multiple cleanup paths converge
  • Good test coverage: 10+ regression tests covering early exit, normal completion, mid-stream error, cancellation shielding, and fallback cleanup — all mock-based with no real network calls
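The mock-based regression style described in the last bullet, combined with the idempotency concern above it, might look like the sketch below. Wrapper is a hypothetical stand-in (not the real CustomStreamWrapper) that includes the recommended guard of clearing the reference before closing:

```python
import asyncio
from unittest.mock import AsyncMock

class Wrapper:
    """Stand-in for CustomStreamWrapper with an idempotency guard:
    the stream reference is cleared before the close is awaited."""
    def __init__(self, completion_stream):
        self.completion_stream = completion_stream

    async def aclose(self):
        stream, self.completion_stream = self.completion_stream, None
        if stream is not None and hasattr(stream, "aclose"):
            await stream.aclose()

async def main():
    stream = AsyncMock()
    wrapper = Wrapper(stream)
    await wrapper.aclose()
    await wrapper.aclose()  # converging cleanup paths: must be a no-op
    stream.aclose.assert_awaited_once()
    return stream.aclose.await_count

print(asyncio.run(main()))  # 1
```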

Confidence Score: 4/5

  • This PR is safe to merge with one minor fix recommended for idempotency in aclose().
  • The PR addresses a real and critical connection pool exhaustion bug with a well-structured, multi-layer fix. The content= to stream= change is correct, shared_session passthrough is straightforward, and the finally block cleanup is the right pattern. The anyio cancellation shielding is necessary and properly implemented. The one concern is the missing idempotency guard in CustomStreamWrapper.aclose() which could lead to double-closing underlying HTTP connections when multiple cleanup paths converge, though the broad exception handlers prevent crashes. Test coverage is thorough with 10+ mock-based regression tests.
  • litellm/litellm_core_utils/streaming_handler.py needs an idempotency guard in aclose() to prevent double-closing the underlying stream. pyproject.toml uvicorn constraint has no upper bound.

Important Files Changed

Filename Overview
litellm/litellm_core_utils/streaming_handler.py Adds aclose() method to CustomStreamWrapper with anyio cancellation shielding and fallback from aclose() to close(). Missing idempotency guard (should set completion_stream = None after closing).
litellm/llms/custom_httpx/aiohttp_transport.py Changed content= to stream= in httpx.Response constructor, correctly preserving lazy streaming semantics and enabling aclose() propagation to AiohttpResponseStream.
litellm/llms/openai/openai.py Passes shared_session through to astreaming() so the streaming path uses the shared connection pool instead of creating a fallback one.
litellm/proxy/proxy_server.py Adds finally block in async_data_generator to call response.aclose(), ensuring HTTP connections are released back to the pool on client disconnect or normal completion.
litellm/router.py Adds finally block in stream_with_fallbacks to close both model_response and fallback_response with anyio cancellation shielding. Import ordering issue with anyio mixed into stdlib imports.
pyproject.toml Bumps uvicorn from ^0.31.1 to >=0.32.1. The >= constraint has no upper bound, which could allow future breaking changes in uvicorn 1.0+.
tests/test_litellm/litellm_core_utils/test_streaming_handler.py Adds three well-structured unit tests for CustomStreamWrapper.aclose() covering normal, no-aclose, and None-stream cases.
tests/test_litellm/proxy/test_proxy_server.py Adds three proxy-level tests verifying async_data_generator calls response.aclose() on early exit, normal completion, and mid-stream error. All mock-based, no real network calls.
tests/test_litellm/test_streaming_connection_cleanup.py New comprehensive regression test file covering aiohttp transport, CustomStreamWrapper.aclose(), cancellation shielding, and router stream_with_fallbacks cleanup. All mock-based.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Proxy as Proxy (async_data_generator)
    participant Router as Router (stream_with_fallbacks)
    participant CSW as CustomStreamWrapper
    participant Provider as Provider Stream (httpx/aiohttp)

    Client->>Proxy: POST /chat/completions (stream=true)
    Proxy->>Router: iterate model_response
    Router->>CSW: async for chunk in model_response
    CSW->>Provider: read chunks from HTTP stream
    Provider-->>CSW: chunk data
    CSW-->>Router: yield chunk
    Router-->>Proxy: yield chunk
    Proxy-->>Client: SSE data chunk

    Note over Client: Client disconnects mid-stream

    Client--xProxy: connection closed
    Proxy->>Proxy: finally: response.aclose()
    Proxy->>CSW: aclose() [FallbackStreamWrapper]
    CSW->>Router: generator.aclose() triggers finally
    Router->>Router: finally: anyio.CancelScope(shield=True)
    Router->>CSW: model_response.aclose()
    CSW->>CSW: anyio.CancelScope(shield=True)
    CSW->>Provider: completion_stream.aclose()
    Provider->>Provider: Release HTTP connection back to pool
    Note over Provider: Connection available for reuse

Last reviewed commit: f5e3606


@greptile-apps greptile-apps bot left a comment


9 files reviewed, 2 comments


Comment on lines +159 to +176
async def aclose(self):
    if self.completion_stream is not None:
        # Shield from anyio cancellation so cleanup awaits can complete.
        # Without this, CancelledError is thrown into every await during
        # task group cancellation, preventing HTTP connection release.
        with anyio.CancelScope(shield=True):
            try:
                if hasattr(self.completion_stream, "aclose"):
                    await self.completion_stream.aclose()
                elif hasattr(self.completion_stream, "close"):
                    result = self.completion_stream.close()
                    if result is not None:
                        await result
            except BaseException as e:
                verbose_logger.debug(
                    "CustomStreamWrapper.aclose: error closing completion_stream: %s",
                    e,
                )


aclose() is not idempotent — may double-close the underlying stream

aclose() does not set self.completion_stream = None after closing, so calling it twice will attempt to close the underlying stream twice. This can happen because aclose() is called from multiple code paths:

  1. async_data_generator's finally block calls response.aclose() (where response is a FallbackStreamWrapper inheriting this method)
  2. stream_with_fallbacks's finally block calls model_response.aclose() (triggered when the FallbackStreamWrapper's inner generator is closed)

While the except BaseException prevents crashes, double-closing AiohttpResponseStream calls __aexit__ twice on the aiohttp response, which can corrupt connection pool state. Adding a guard would make this robust:

Suggested change

async def aclose(self):
    if self.completion_stream is not None:
        stream_to_close = self.completion_stream
        self.completion_stream = None
        # Shield from anyio cancellation so cleanup awaits can complete.
        # Without this, CancelledError is thrown into every await during
        # task group cancellation, preventing HTTP connection release.
        with anyio.CancelScope(shield=True):
            try:
                if hasattr(stream_to_close, "aclose"):
                    await stream_to_close.aclose()
                elif hasattr(stream_to_close, "close"):
                    result = stream_to_close.close()
                    if result is not None:
                        await result
            except BaseException as e:
                verbose_logger.debug(
                    "CustomStreamWrapper.aclose: error closing completion_stream: %s",
                    e,
                )

pyproject.toml Outdated
numpydoc = {version = "*", optional = true} # used in utils.py

uvicorn = {version = "^0.31.1", optional = true}
uvicorn = {version = ">=0.32.1", optional = true}


Unbounded uvicorn version constraint

The constraint >=0.32.1 has no upper bound, meaning it allows any future version including a hypothetical uvicorn 1.0+ with breaking changes. Other dependencies in this file use more conservative constraints (e.g., fastapi = ">=0.120.1" is similarly unbounded, but gunicorn = "^23.0.0" and pydantic = "^2.5.0" are capped). Consider adding an upper bound for consistency and safety:

Suggested change
uvicorn = {version = ">=0.32.1", optional = true}
uvicorn = {version = ">=0.32.1,<1.0.0", optional = true}

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@ryan-crabbe
Collaborator Author

fixed @greptileai re review


@greptile-apps greptile-apps bot left a comment


9 files reviewed, 1 comment


Comment on lines +5145 to +5155
finally:
    # Close the response stream to release the underlying HTTP connection
    # back to the connection pool. This prevents pool exhaustion when
    # clients disconnect mid-stream.
    if hasattr(response, "aclose"):
        try:
            await response.aclose()
        except Exception as e:
            verbose_proxy_logger.debug(
                "async_data_generator: error closing response stream: %s", e
            )


Missing cancellation shielding in cleanup

The other two cleanup sites (CustomStreamWrapper.aclose() at streaming_handler.py:166 and stream_with_fallbacks at router.py:1603) both use anyio.CancelScope(shield=True) to protect the aclose() from being interrupted by task cancellation. This finally block does not shield its await response.aclose() call.

When a client disconnects, ASGI servers may cancel the task, injecting CancelledError into any pending await. Without shielding, response.aclose() could itself be cancelled before completing, defeating the purpose of the cleanup.

Suggested change
- finally:
-     # Close the response stream to release the underlying HTTP connection
-     # back to the connection pool. This prevents pool exhaustion when
-     # clients disconnect mid-stream.
-     if hasattr(response, "aclose"):
-         try:
-             await response.aclose()
-         except Exception as e:
-             verbose_proxy_logger.debug(
-                 "async_data_generator: error closing response stream: %s", e
-             )
+ finally:
+     # Close the response stream to release the underlying HTTP connection
+     # back to the connection pool. This prevents pool exhaustion when
+     # clients disconnect mid-stream.
+     # Shield from cancellation so the close awaits can complete.
+     import anyio
+     with anyio.CancelScope(shield=True):
+         if hasattr(response, "aclose"):
+             try:
+                 await response.aclose()
+             except Exception as e:
+                 verbose_proxy_logger.debug(
+                     "async_data_generator: error closing response stream: %s", e
+                 )
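
To see why the shield matters, here is a stdlib-only sketch of the failure mode. It uses `asyncio.shield` as a stand-in for `anyio.CancelScope(shield=True)` (the PR itself uses anyio), and `fake_aclose` is a hypothetical coroutine standing in for `await response.aclose()`. The driver cancels the task twice — once to interrupt the stream, and again while cleanup is running — and only the shielded close completes:

```python
import asyncio

closed = {"unshielded": False, "shielded": False}

async def fake_aclose(key: str) -> None:
    # Stand-in for `await response.aclose()`: releasing the connection
    # needs at least one trip through the event loop to complete.
    await asyncio.sleep(0.1)
    closed[key] = True

async def stream(key: str, shield: bool) -> None:
    try:
        await asyncio.sleep(10)  # simulate waiting on the next provider chunk
    finally:
        if shield:
            # Shielded: the inner close keeps running even if this await is
            # itself cancelled (analogous to anyio.CancelScope(shield=True)).
            await asyncio.shield(fake_aclose(key))
        else:
            # Unshielded: a second cancellation lands on this await and the
            # close never finishes -- the connection leaks.
            await fake_aclose(key)

async def main() -> None:
    for key, shield in (("unshielded", False), ("shielded", True)):
        task = asyncio.create_task(stream(key, shield))
        await asyncio.sleep(0.05)   # stream is mid-flight
        task.cancel()               # client disconnects
        await asyncio.sleep(0.01)   # task enters its finally block
        task.cancel()               # server cancels again during cleanup
        try:
            await task
        except asyncio.CancelledError:
            pass
        await asyncio.sleep(0.2)    # give a shielded close time to finish
    print(closed)  # {'unshielded': False, 'shielded': True}

asyncio.run(main())
```

One difference worth noting: asyncio cancellation is edge-triggered (each `cancel()` is delivered once), while anyio's is level-triggered — inside a cancelled scope, every subsequent `await` re-raises, so in anyio-based servers an unshielded `aclose()` in a `finally` block fails even without a second explicit cancel.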

@ryan-crabbe
Collaborator Author

fixed @greptileai re review


@greptile-apps greptile-apps bot left a comment


9 files reviewed, 1 comment


Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@ryan-crabbe
Collaborator Author

@greptileai please review this


@greptile-apps greptile-apps bot left a comment


9 files reviewed, 1 comment


# back to the connection pool. This prevents pool exhaustion when
# clients disconnect mid-stream.
# Shield from cancellation so the close awaits can complete.
import anyio

Inline import violates style guide

Per CLAUDE.md: "Avoid imports within methods — place all imports at the top of the file (module-level)." The import anyio should be moved to the top-level imports section of proxy_server.py alongside the other third-party imports (around line 33). The same import is already correctly placed at module-level in both streaming_handler.py and router.py.

Suggested change
- import anyio
+ import anyio  # TODO: move to top-level imports per CLAUDE.md style guide

Context Used: Context from dashboard - CLAUDE.md (source)


@ryan-crabbe ryan-crabbe marked this pull request as ready for review February 18, 2026 23:51

@greptile-apps greptile-apps bot left a comment


9 files reviewed, no comments


Resolve conflict in test_proxy_server.py: keep both async_data_generator
cleanup tests and store_model_in_db DB config override tests.
@ryan-crabbe ryan-crabbe merged commit 4c393de into main Feb 21, 2026
12 of 34 checks passed
@ryan-crabbe ryan-crabbe deleted the litellm_fix_streaming_connection_pool_leak branch February 21, 2026 20:45
ryan-crabbe added a commit that referenced this pull request Mar 4, 2026
Rewrite blog for accuracy against PR #21213, add interactive diagrams,
fix technical inaccuracies, and improve readability.
@ryan-crabbe ryan-crabbe mentioned this pull request Mar 4, 2026
7 tasks


Development

Successfully merging this pull request may close these issues.

[Bug]: TCP connections starvation issue on 1.80.10
[Bug]: Connection to backing provider leaking when the source request cancels

1 participant