
separate streaming clients for avoiding read timeouts#2832

Merged
akshaydeo merged 2 commits into main from
04-19-separate_streaming_clients_for_avoiding_read_timeouts
Apr 20, 2026

Conversation

@akshaydeo
Contributor

Summary

Briefly explain the purpose of this PR and the problem it solves.

Changes

  • What was changed and why
  • Any notable design decisions or trade-offs

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (React)
  • Docs

How to test

Describe the steps to validate this change. Include commands and expected outcomes.

# Core/Transports
go version
go test ./...

# UI
cd ui
pnpm i || npm i
pnpm test || npm test
pnpm build || npm run build

If adding new configs or environment variables, document them here.

Screenshots/Recordings

If UI changes, add before/after screenshots or short clips.

Breaking changes

  • Yes
  • No

If yes, describe impact and migration instructions.

Related issues

Link related issues and discussions. Example: Closes #123

Security considerations

Note any security implications (auth, secrets, PII, sandboxing, etc.).

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4cfb0662-21b7-4473-8aad-0f7b908562f5

📥 Commits

Reviewing files that changed from the base of the PR and between b9d2397 and 72c7bd0.

⛔ Files ignored due to path filters (1)
  • core/go.sum is excluded by !**/*.sum
📒 Files selected for processing (39)
  • AGENTS.md
  • core/internal/llmtests/chat_completion_stream.go
  • core/internal/llmtests/response_validation.go
  • core/providers/anthropic/anthropic.go
  • core/providers/anthropic/chat_test.go
  • core/providers/azure/azure.go
  • core/providers/bedrock/bedrock.go
  • core/providers/bedrock/transport_test.go
  • core/providers/cerebras/cerebras.go
  • core/providers/cohere/cohere.go
  • core/providers/elevenlabs/elevenlabs.go
  • core/providers/fireworks/fireworks.go
  • core/providers/gemini/gemini.go
  • core/providers/gemini/list_models_single_payload_test.go
  • core/providers/groq/groq.go
  • core/providers/huggingface/huggingface.go
  • core/providers/mistral/custom_provider_test.go
  • core/providers/mistral/mistral.go
  • core/providers/mistral/transcription_test.go
  • core/providers/nebius/nebius.go
  • core/providers/ollama/ollama.go
  • core/providers/openai/openai.go
  • core/providers/openrouter/openrouter.go
  • core/providers/parasail/parasail.go
  • core/providers/perplexity/perplexity.go
  • core/providers/replicate/replicate.go
  • core/providers/sgl/sgl.go
  • core/providers/utils/decompression_test.go
  • core/providers/utils/large_response.go
  • core/providers/utils/make_request_test.go
  • core/providers/utils/streaming_client_test.go
  • core/providers/utils/utils.go
  • core/providers/utils/utils_test.go
  • core/providers/vertex/models.go
  • core/providers/vertex/rerank.go
  • core/providers/vertex/vertex.go
  • core/providers/vllm/vllm.go
  • core/providers/xai/xai.go
  • framework/configstore/tables/team.go
✅ Files skipped from review due to trivial changes (8)
  • core/providers/vertex/models.go
  • core/internal/llmtests/chat_completion_stream.go
  • core/providers/utils/make_request_test.go
  • core/providers/utils/utils_test.go
  • framework/configstore/tables/team.go
  • core/providers/mistral/custom_provider_test.go
  • core/providers/huggingface/huggingface.go
  • core/providers/utils/streaming_client_test.go
🚧 Files skipped from review as they are similar to previous changes (22)
  • core/providers/vertex/rerank.go
  • core/providers/nebius/nebius.go
  • core/providers/utils/large_response.go
  • core/providers/bedrock/transport_test.go
  • core/internal/llmtests/response_validation.go
  • core/providers/cerebras/cerebras.go
  • core/providers/ollama/ollama.go
  • core/providers/sgl/sgl.go
  • core/providers/elevenlabs/elevenlabs.go
  • core/providers/groq/groq.go
  • core/providers/xai/xai.go
  • core/providers/gemini/list_models_single_payload_test.go
  • core/providers/mistral/transcription_test.go
  • core/providers/cohere/cohere.go
  • core/providers/anthropic/chat_test.go
  • core/providers/openrouter/openrouter.go
  • core/providers/utils/utils.go
  • core/providers/perplexity/perplexity.go
  • AGENTS.md
  • core/providers/replicate/replicate.go
  • core/providers/vllm/vllm.go
  • core/providers/openai/openai.go

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Improved streaming stability by splitting streaming vs. unary network paths to avoid premature stream termination.
  • New Features

    • Per-provider, streaming-optimized HTTP client and a new default rerank model for Vertex.
  • Documentation

    • Expanded streaming vs. non-streaming timeout guidance and updated provider test-harness instructions.
  • Tests

    • Added end-to-end streaming client tests and updated tests/validation to align with dispatcher-populated metadata and stream-specific checks.

Walkthrough

Multiple providers now include a dedicated streaming HTTP client (fasthttp or net/http) built via providerUtils; streaming endpoints use this client while unary requests keep the original timeout-bounded client. Utilities, tests, Vertex rerank default, and several test assertions were updated accordingly.

Changes

• Provider streaming client additions
  Files: core/providers/.../anthropic.go, .../azure.go, .../cohere.go, .../elevenlabs.go, .../gemini.go, .../openai.go, .../replicate.go, .../vertex.go, .../cerebras.go, .../fireworks.go, .../groq.go, .../huggingface.go, .../mistral.go, .../nebius.go, .../ollama.go, .../openrouter.go, .../parasail.go, .../perplexity.go, .../sgl.go, .../vllm.go, .../xai.go, core/providers/bedrock/bedrock.go
  Added a provider-level streamingClient field (usually *fasthttp.Client; Bedrock uses *http.Client) initialized via providerUtils.BuildStreamingClient / BuildStreamingHTTPClient. Streaming entrypoints now use provider.streamingClient; unary paths continue to use provider.client.
• Removed per-request streaming clones
  Files: core/providers/gemini/gemini.go, .../replicate.go, .../bedrock.go, .../elevenlabs.go
  Eliminated ad-hoc per-request client cloning; providers now reuse the prebuilt provider-level streaming client.
• Streaming wiring and passthrough changes
  Files: various core/providers/.../*Stream, passthrough handlers, image/speech streams
  Streaming handlers updated to accept/use provider.streamingClient or call .Do on it so streams are not subject to unary ReadTimeouts; passthrough streaming uses provider.streamingClient for PrepareResponseStreaming.
• Streaming utilities and tests
  Files: core/providers/utils/utils.go, core/providers/utils/streaming_client_test.go, core/providers/utils/decompression_test.go, core/providers/utils/make_request_test.go, core/providers/utils/utils_test.go, core/providers/utils/large_response.go
  Added BuildStreamingClient(*fasthttp.Client) and BuildStreamingHTTPClient(*http.Client); changed CloneFastHTTPClientConfig to stop copying RetryIf. BuildLargeResponseClient now zeroes timeouts for large downloads. Added tests verifying streaming-client behavior and base-client non-mutation; minor test adjustments.
• Tests and harness updates
  Files: core/providers/bedrock/transport_test.go, core/providers/anthropic/chat_test.go, core/providers/gemini/list_models_single_payload_test.go, core/providers/mistral/*_test.go, core/internal/llmtests/*, core/providers/utils/*_test.go
  Adjusted tests to use the shared redirect transport in Bedrock tests, moved assertions to dispatcher-populated ExtraFields, added per-chunk stream validations, and adapted tests to new context capture keys and streaming-client semantics.
• Vertex rerank default
  Files: core/providers/vertex/models.go, core/providers/vertex/rerank.go
  Added vertexDefaultRerankModel = "semantic-ranker-default@latest" and ensured ToVertexRankRequest always sets rankRequest.Model, defaulting to the new constant when the input is empty.
• Type modernization
  Files: framework/configstore/tables/team.go
  Changed JSON-parsed fields from map[string]interface{} to map[string]any.
• Docs
  Files: AGENTS.md
  Documented the per-provider streaming-client pattern, unary vs streaming timeout semantics, Bedrock HTTP-client guidance, and test guidance (make test-core and flags).

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Provider as ProviderInstance
  participant StreamClient as StreamingClient
  participant Remote as ExternalAPI

  Client->>Provider: Start streaming request
  Provider->>StreamClient: Use provider.streamingClient (pass to handler / call .Do)
  StreamClient->>Remote: Open long-lived HTTP stream (ReadTimeout=0)
  Remote-->>StreamClient: Streamed chunks/events
  StreamClient-->>Provider: Forward chunks (idle-reader enforces per-chunk idle)
  Provider-->>Client: Relay SSE/chunked events

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇
I split my client into two,
One to wait, one to pursue—
Streams run long with idle care,
Answers tidy, chunks to share.
Hop—deliver, swift and true!

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

• Description check (⚠️ Warning): The PR description is entirely template placeholder text with no substantive content. The Summary, changes list, type selection, affected areas, test steps, breaking-changes clarification, and all other sections remain unfilled. Resolution: fill in the Summary explaining the streaming-client separation issue, detail the Changes across all providers and utility files, select the appropriate type checkboxes (Refactor/Feature/Bug fix), mark the affected areas (Core/Transports/Providers), and document the testing approach.
• Linked Issues check (⚠️ Warning): The linked issue #123 (Files API Support) is unrelated to this PR's streaming-client separation. The PR introduces separate clients to avoid read timeouts across all providers, while the issue requests Files API support for advanced workflows such as RAG. Resolution: remove the unrelated #123 link, or clarify whether the PR intentionally addresses multiple objectives, and link only relevant issues if any exist.
• Out of Scope Changes check (❓ Inconclusive): The PR's numerous file changes (24+ files across providers and utilities) align with the streaming-client separation objective, but without a completed description it is unclear whether all changes (including test assertions, documentation, and model constants) are intentional in-scope modifications. Resolution: describe the scope of the streaming-client changes, explain the rationale for the test-assertion removals and modifications, and confirm whether the documentation updates and constants are part of the intended refactoring.
✅ Passed checks (2 passed)
• Title check (✅ Passed): The PR title "separate streaming clients for avoiding read timeouts" clearly describes the main change across all provider files: introducing dedicated streaming HTTP clients to prevent read-timeout-induced stream termination.
• Docstring Coverage (✅ Passed): Docstring coverage is 83.33%, above the required 80.00% threshold.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"



@akshaydeo akshaydeo mentioned this pull request Apr 19, 2026
18 tasks
@akshaydeo akshaydeo marked this pull request as ready for review April 19, 2026 03:01
@greptile-apps
Contributor

greptile-apps Bot commented Apr 19, 2026

Confidence Score: 5/5

Safe to merge — the change is mechanical, consistently applied, and covered by new tests.

No P0/P1 issues found. Pattern applied identically across all 18+ providers, new builder functions are well-tested, Bedrock transport tests updated to wire streamingClient to the mock server, and the Gemini local helper that only set StreamResponseBody (leaving timeouts intact) is properly replaced.

No files require special attention.

Important Files Changed

• core/providers/utils/utils.go: Adds BuildStreamingClient (fasthttp) and BuildStreamingHTTPClient (net/http), which clone the base config, zero ReadTimeout/WriteTimeout/MaxConnDuration, and set StreamResponseBody=true. Also removes the deprecated RetryIf from CloneFastHTTPClientConfig.
• core/providers/utils/streaming_client_test.go: New test file with 6 tests covering timeout zeroing, base-client immutability, long-stream survival (fasthttp and net/http), and the nil-base guard for BuildStreamingHTTPClient.
• core/providers/gemini/gemini.go: Removes the local buildStreamingResponseClient helper (which only set StreamResponseBody without zeroing timeouts) and replaces all usages with the centralized BuildStreamingClient.
• core/providers/bedrock/bedrock.go: Uses BuildStreamingHTTPClient (net/http variant); makeStreamingRequest now routes through streamingClient.
• core/providers/openai/openai.go: All 8 streaming methods switched to streamingClient.
• core/providers/vertex/rerank.go: ToVertexRankRequest now defaults to vertexDefaultRerankModel when the model is empty.
• core/providers/utils/large_response.go: BuildLargeResponseClient now also zeros timeouts, consistent with the new streaming-client pattern.
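The Vertex rerank defaulting mentioned in the file overview reduces to a small guard. The following is a hypothetical sketch; the real ToVertexRankRequest operates on the full Vertex rank request type rather than a bare string.

```go
package main

import "fmt"

// Constant name and value are taken from the review summary.
const vertexDefaultRerankModel = "semantic-ranker-default@latest"

// rankRequest is a stand-in for the real Vertex rank request type.
type rankRequest struct {
	Model string
}

// toVertexRankRequest always populates Model, falling back to the
// default constant when the caller supplies an empty model name.
func toVertexRankRequest(model string) rankRequest {
	if model == "" {
		model = vertexDefaultRerankModel
	}
	return rankRequest{Model: model}
}

func main() {
	fmt.Println(toVertexRankRequest("").Model) // prints "semantic-ranker-default@latest"
	fmt.Println(toVertexRankRequest("my-ranker").Model)
}
```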

Reviews (6). Last reviewed commit: "separate streaming clients for avoiding ..."

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/providers/openai/openai.go (1)

6908-6939: ⚠️ Potential issue | 🟠 Major

Route passthrough stream URLs through the OpenAI URL helper.

PassthroughStream still concatenates BaseURL + "/v1" + path directly, so ChatGPTOAuth normalization is skipped. With ChatGPTOAuth enabled, streamed passthrough calls will still hit the wrong upstream even after moving them onto streamingClient.

Based on learnings: In core/providers/openai, when ChatGPTOAuth is enabled, ensure all OpenAI request routes to any "/v1/..." path build their URLs using OpenAIProvider.buildRequestURL(...) or OpenAIProvider.buildFullURL(...). These helpers must perform the normalization that strips the "/v1" prefix (via chatGPTOAuthPath(...)) and maps requests to the appropriate chatgpt.com backend route (chatgpt.com/backend-api/codex). Do not manually construct URLs by concatenating provider.networkConfig.BaseURL + "/v1/...". For Passthrough and PassthroughStream specifically, call buildFullURL("/v1"+path) so the same normalization logic runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/openai/openai.go` around lines 6908 - 6939, The
PassthroughStream code is manually constructing URLs via
provider.networkConfig.BaseURL + "/v1" + path which bypasses ChatGPTOAuth
normalization; update PassthroughStream to call the OpenAIProvider URL helper
(e.g., use buildFullURL("/v1"+path) or buildRequestURL as appropriate) to
construct the url so chatGPTOAuthPath()/ChatGPTOAuth normalization and mapping
to chatgpt.com backend are applied; ensure you remove the direct concatenation
and keep the rest of the request setup (headers, body, streaming client) intact
and use the resulting URL when calling fasthttpReq.SetRequestURI.
🧹 Nitpick comments (1)
core/providers/elevenlabs/elevenlabs.go (1)

351-353: Start the stream latency clock after the handshake.

startTime is still captured before Do, so the final done chunk latency includes request/auth/handshake time instead of post-handshake streaming time. If you want this metric to stay consistent with the other provider streaming handlers, move that timestamp initialization until after the stream is established.

Based on learnings: In Go streaming handlers under core/providers (e.g., core/providers/openai/openai.go), initialize the streaming startTime inside the streaming goroutine after client.Do and the stream is set up, so the final-chunk latency reflects post-handshake streaming time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/elevenlabs/elevenlabs.go` around lines 351 - 353, Move the
startTime timestamp so it measures only post-handshake streaming: remove the
early startTime := time.Now() set before calling
provider.streamingClient.Do(req, resp) and instead initialize startTime inside
the streaming goroutine immediately after provider.streamingClient.Do returns
successfully and the stream is established (the same goroutine that reads chunks
and emits the final "done" latency). Update references to startTime in the
final-chunk latency calculation accordingly so the metric excludes
request/auth/handshake time.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/providers/anthropic/anthropic.go`:
- Around line 27-28: BuildStreamingClient currently clears ReadTimeout on
streamingClient leaving no bound between connection establishment and receipt of
response headers; modify BuildStreamingClient (and similar logic at the other
referenced block) to preserve a short header/first-byte timeout: set a small
ReadTimeout (e.g. HeaderTimeout) on the streamingClient used for the initial
request/handshake or wrap activeClient.Do calls with a context/deadline that
enforces a first-byte timeout, then switch the connection to the existing
NewIdleTimeoutReader behavior for unbounded chunk reads; reference symbols:
BuildStreamingClient, streamingClient, ReadTimeout, ConfigureDialer,
activeClient.Do, and NewIdleTimeoutReader when implementing the timeout for the
header/first-byte phase.

In `@core/providers/azure/azure.go`:
- Around line 188-192: SpeechStream is still using the unary HTTP path (calling
provider.client.Do(...)) which bypasses the new streamingClient and can hit the
unary ReadTimeout; update SpeechStream to use the AzureProvider.streamingClient
(the client created by providerUtils.BuildStreamingClient and stored on
AzureProvider) instead of provider.client.Do so all streaming calls use the
streaming client with proper timeouts and avoid ReadTimeouts; locate the
SpeechStream implementation and replace calls to provider.client.Do and any
direct HTTP request construction with the streamingClient's streaming request
method/streaming API surface on AzureProvider so the streaming path consistently
uses the new client.

In `@core/providers/vertex/vertex.go`:
- Around line 102-106: Several large-response code paths still call
PrepareResponseStreaming with the unary provider.client (causing timeout
issues); update ChatCompletion, Responses, Embedding, Rerank, ImageGeneration,
ImageEdit, and CountTokens to check
ctx.Value(schemas.BifrostContextKeyLargeResponseMode) and pass the
provider.streamingClient instead of provider.client into
PrepareResponseStreaming (or otherwise route large-response reads through
streamingClient). Modify the methods in VertexProvider where
PrepareResponseStreaming(ctx, provider.client, resp) is invoked to use the ctx
flag and provider.streamingClient so large-response streaming bypasses the
timeout-bound unary client.

---

Outside diff comments:
In `@core/providers/openai/openai.go`:
- Around line 6908-6939: The PassthroughStream code is manually constructing
URLs via provider.networkConfig.BaseURL + "/v1" + path which bypasses
ChatGPTOAuth normalization; update PassthroughStream to call the OpenAIProvider
URL helper (e.g., use buildFullURL("/v1"+path) or buildRequestURL as
appropriate) to construct the url so chatGPTOAuthPath()/ChatGPTOAuth
normalization and mapping to chatgpt.com backend are applied; ensure you remove
the direct concatenation and keep the rest of the request setup (headers, body,
streaming client) intact and use the resulting URL when calling
fasthttpReq.SetRequestURI.

---

Nitpick comments:
In `@core/providers/elevenlabs/elevenlabs.go`:
- Around line 351-353: Move the startTime timestamp so it measures only
post-handshake streaming: remove the early startTime := time.Now() set before
calling provider.streamingClient.Do(req, resp) and instead initialize startTime
inside the streaming goroutine immediately after provider.streamingClient.Do
returns successfully and the stream is established (the same goroutine that
reads chunks and emits the final "done" latency). Update references to startTime
in the final-chunk latency calculation accordingly so the metric excludes
request/auth/handshake time.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7a4448a1-ec4e-4c53-99ba-0e03c55eb8be

📥 Commits

Reviewing files that changed from the base of the PR and between 98ebb2f and 8e9c116.

📒 Files selected for processing (11)
  • core/providers/anthropic/anthropic.go
  • core/providers/azure/azure.go
  • core/providers/bedrock/bedrock.go
  • core/providers/cohere/cohere.go
  • core/providers/elevenlabs/elevenlabs.go
  • core/providers/gemini/gemini.go
  • core/providers/openai/openai.go
  • core/providers/replicate/replicate.go
  • core/providers/utils/utils.go
  • core/providers/vertex/vertex.go
  • framework/configstore/tables/team.go

Comment threads: core/providers/anthropic/anthropic.go, core/providers/vertex/vertex.go
@akshaydeo akshaydeo force-pushed the 04-19-separate_streaming_clients_for_avoiding_read_timeouts branch from 8e9c116 to 772cb92 on April 19, 2026 03:41
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/providers/openai/openai.go (1)

6908-6939: ⚠️ Potential issue | 🟠 Major

Use the OpenAI URL helper in PassthroughStream.

Lines 6912-6915 still rebuild the URL from provider.networkConfig.BaseURL directly, so ChatGPTOAuth normalization is skipped for streaming passthrough requests. That leaves this newly-switched streaming path pointed at the wrong upstream route whenever /v1/... needs remapping.

🔧 Proposed fix
 	path := req.Path
 	if after, ok := strings.CutPrefix(path, "/v1"); ok {
 		path = after
 	}
-	url := provider.networkConfig.BaseURL + "/v1" + path
+	url := provider.buildFullURL("/v1" + path)
 	if req.RawQuery != "" {
 		url += "?" + req.RawQuery
 	}
Based on learnings: In core/providers/openai, when ChatGPTOAuth is enabled, all `/v1/...` routes — especially `Passthrough` and `PassthroughStream` — must be built via `buildRequestURL(...)` or `buildFullURL(...)` so the `/v1` prefix is normalized correctly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/openai/openai.go` around lines 6908 - 6939, PassthroughStream
currently reassembles the upstream URL by concatenating
provider.networkConfig.BaseURL + "/v1" + path which skips ChatGPTOAuth
normalization; change PassthroughStream to build the target URL via the existing
helper (e.g., call buildRequestURL(...) or buildFullURL(...) used elsewhere)
instead of manual concatenation so `/v1/...` routes are normalized for
ChatGPTOAuth; update the code that sets url (lines using
provider.networkConfig.BaseURL + "/v1" + path and query handling) to use the
helper and keep the rest of the request flow (SetRequestURI, headers, SetBody,
PrepareResponseStreaming) unchanged.
♻️ Duplicate comments (2)
core/providers/vertex/vertex.go (1)

567-568: ⚠️ Potential issue | 🟠 Major

Large-response branches still route through the unary client.

These callsites still seed PrepareResponseStreaming with provider.client. When large-response mode switches them onto streamed-body reads, they keep the unary client's 30s ReadTimeout, so the timeout regression remains for non-*Stream flows too. These should all use provider.streamingClient instead.

Representative fix
- activeClient := providerUtils.PrepareResponseStreaming(ctx, provider.client, resp)
+ activeClient := providerUtils.PrepareResponseStreaming(ctx, provider.streamingClient, resp)

Based on learnings: "Maintain and rely on the ctx.Value(schemas.BifrostContextKeyLargeResponseMode) flag in the Azure provider and across all Bifrost providers to detect large-response streaming mode."

Also applies to: 1001-1002, 1146-1147, 1439-1440, 1584-1585, 1821-1822, 2042-2043, 2672-2673

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/vertex/vertex.go` around lines 567 - 568, The
PrepareResponseStreaming call is being passed provider.client so large-response
flows still use the unary client's 30s ReadTimeout; change the first argument to
provider.streamingClient when constructing activeClient (i.e., replace
provider.client with provider.streamingClient in the PrepareResponseStreaming
call) so MakeRequestWithContext uses a streaming-capable client; ensure the same
swap is applied to all equivalent callsites that call PrepareResponseStreaming
followed by MakeRequestWithContext (they rely on
ctx.Value(schemas.BifrostContextKeyLargeResponseMode) to select large-response
mode).
core/providers/azure/azure.go (1)

1001-1002: ⚠️ Potential issue | 🟠 Major

SpeechStream still bypasses the streaming client.

Line 1002 still calls provider.client.Do(...), so Azure speech streaming can still hit the unary ReadTimeout even though the other streaming entrypoints were moved over.

Suggested follow-up
-	requestErr := provider.client.Do(req, resp)
+	requestErr := provider.streamingClient.Do(req, resp)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/azure/azure.go` around lines 1001 - 1002, The SpeechStream
path is still using the unary HTTP client call provider.client.Do(req, resp)
which bypasses the streaming client and can trigger the ReadTimeout; update the
SpeechStream implementation to use the streaming client's request API (the same
client used by other streaming entry points) instead of provider.client.Do,
ensure the code invokes the streaming client's Do/Stream method consistent with
other streaming handlers, and remove the unary ReadTimeout branch for
SpeechStream so SpeechStream routes through the streaming client (preserve
request/resp handling and error propagation).
🧹 Nitpick comments (3)
core/providers/mistral/transcription_test.go (1)

1599-1609: Drop the dead lastResponse placeholder.

After removing the metadata assertions, _ = lastResponse only preserves unused state. It would be cleaner to remove the variable and its assignment.

Suggested cleanup
-	var lastResponse *schemas.BifrostTranscriptionStreamResponse
-
 	for streamResp := range streamChan {
 		if streamResp.BifrostError != nil {
 			t.Logf("⚠️ Stream error (may be expected for minimal audio): %v", streamResp.BifrostError.Error.Message)
 			return
 		}
 
 		if streamResp.BifrostTranscriptionStreamResponse != nil {
 			chunkCount++
-			lastResponse = streamResp.BifrostTranscriptionStreamResponse
 
 			if streamResp.BifrostTranscriptionStreamResponse.Delta != nil {
 				allText += *streamResp.BifrostTranscriptionStreamResponse.Delta
 			}
@@
-	_ = lastResponse

Also applies to: 1629-1629

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/mistral/transcription_test.go` around lines 1599 - 1609,
Remove the dead local variable and assignment for lastResponse: delete the
declaration "var lastResponse *schemas.BifrostTranscriptionStreamResponse" and
the assignment "lastResponse = streamResp.BifrostTranscriptionStreamResponse"
inside the loop that ranges over streamChan where streamResp is handled; keep
chunkCount increment and existing error/log handling for streamResp.BifrostError
and streamResp.BifrostTranscriptionStreamResponse.
AGENTS.md (1)

300-302: Clarify that context deadlines and large-response mode are part of this contract too.

This reads as if provider.streamingClient only matters for *Stream helpers and that NewIdleTimeoutReader is the sole guard. In this codebase, provider-level context cancellation is also part of the streaming timeout story, and large-response branches that flip into StreamBody reads need the same client split. Tightening this wording here will prevent future providers from reintroducing the timeout bug through non-*Stream paths.

Based on learnings: "the first-byte/header timeout concern is handled at each provider level ... via SetupStreamCancellation" and "Maintain and rely on the ctx.Value(schemas.BifrostContextKeyLargeResponseMode) flag ... to detect large-response streaming mode."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 300 - 302, The docs incorrectly imply
provider.streamingClient only matters for *Stream helpers; update the text to
state that provider.streamingClient must be used wherever a streaming-mode HTTP
client is required — including non-*Stream code paths that switch into
StreamBody/large-response handling — so context deadlines and large-response
mode are honored; explicitly mention passing provider.streamingClient to every
Handle*Streaming / Handle*StreamRequest and to any direct Do calls inside
*Stream methods or branches that detect
ctx.Value(schemas.BifrostContextKeyLargeResponseMode) and flip into streaming,
note that per-chunk timeouts (NewIdleTimeoutReader) and provider-level
cancellation via SetupStreamCancellation both participate in stream lifecycle,
and recommend deriving the streaming client with
providerUtils.BuildStreamingHTTPClient(client) to clear Client.Timeout while
reusing Transport.
core/providers/elevenlabs/elevenlabs.go (1)

352-353: Start the stream latency clock after the handshake.

Line 352 starts startTime before Do, so Line 467 reports connection/header wait plus stream time. This provider’s streaming handlers normally measure final latency from after the stream is established.

Suggested adjustment
-	startTime := time.Now()
 	err := provider.streamingClient.Do(req, resp)
 	if err != nil {
 		defer providerUtils.ReleaseStreamingResponse(resp)
@@
-		chunkIndex := -1
-		lastChunkTime := time.Now()
+		chunkIndex := -1
+		startTime := time.Now()
+		lastChunkTime := startTime

Based on learnings, streaming startTime should be initialized after client.Do once the stream is established so final-chunk latency reflects streaming time rather than handshake time.

Also applies to: 415-417, 461-468

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/elevenlabs/elevenlabs.go` around lines 352 - 353, The stream
latency clock is started before the HTTP handshake; move the startTime :=
time.Now() so it is set after provider.streamingClient.Do(req, resp) has
successfully returned (i.e., once the stream/handshake is established) and
before you begin measuring/consuming streaming chunks or recording the
final-chunk latency; update the same pattern in the other occurrences that use
startTime around provider.streamingClient.Do to ensure latency measures only the
streaming phase.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/providers/utils/decompression_test.go`:
- Around line 501-505: The test in the "panic_nonnil" subtest calls safeReset
with a panic of "" but the assertion message says "expected false for nil panic"
which is misleading; update the t.Fatal message in that subtest to reflect the
actual case (e.g., "expected false for empty-string panic" or "expected false
for non-nil panic") so the failure text matches the panic input and points to
safeReset correctly.

In `@core/providers/utils/streaming_client_test.go`:
- Around line 116-127: The test reads lines with scanner :=
bufio.NewScanner(resp.BodyStream()) and currently slices scanner.Text()[:5],
which can panic for short lines; change the prefix check to use a safe check
(e.g., verify len(line) >= 5 before slicing or use strings.HasPrefix(line,
"data:")) inside the loop that increments got, and after the loop assert
scanner.Err() (fail the test with t.Fatalf/t.Errorf if non-nil) so any stream
read errors are surfaced; apply the same two changes to the other scan loop
referenced (lines 204-213).

---

Outside diff comments:
In `@core/providers/openai/openai.go`:
- Around line 6908-6939: PassthroughStream currently reassembles the upstream
URL by concatenating provider.networkConfig.BaseURL + "/v1" + path which skips
ChatGPTOAuth normalization; change PassthroughStream to build the target URL via
the existing helper (e.g., call buildRequestURL(...) or buildFullURL(...) used
elsewhere) instead of manual concatenation so `/v1/...` routes are normalized
for ChatGPTOAuth; update the code that sets url (lines using
provider.networkConfig.BaseURL + "/v1" + path and query handling) to use the
helper and keep the rest of the request flow (SetRequestURI, headers, SetBody,
PrepareResponseStreaming) unchanged.

---

Duplicate comments:
In `@core/providers/azure/azure.go`:
- Around line 1001-1002: The SpeechStream path is still using the unary HTTP
client call provider.client.Do(req, resp) which bypasses the streaming client
and can trigger the ReadTimeout; update the SpeechStream implementation to use
the streaming client's request API (the same client used by other streaming
entry points) instead of provider.client.Do, ensure the code invokes the
streaming client's Do/Stream method consistent with other streaming handlers,
and remove the unary ReadTimeout branch for SpeechStream so SpeechStream routes
through the streaming client (preserve request/resp handling and error
propagation).

In `@core/providers/vertex/vertex.go`:
- Around line 567-568: The PrepareResponseStreaming call is being passed
provider.client so large-response flows still use the unary client's 30s
ReadTimeout; change the first argument to provider.streamingClient when
constructing activeClient (i.e., replace provider.client with
provider.streamingClient in the PrepareResponseStreaming call) so
MakeRequestWithContext uses a streaming-capable client; ensure the same swap is
applied to all equivalent callsites that call PrepareResponseStreaming followed
by MakeRequestWithContext (they rely on
ctx.Value(schemas.BifrostContextKeyLargeResponseMode) to select large-response
mode).

---

Nitpick comments:
In `@AGENTS.md`:
- Around line 300-302: The docs incorrectly imply provider.streamingClient only
matters for *Stream helpers; update the text to state that
provider.streamingClient must be used wherever a streaming-mode HTTP client is
required — including non-*Stream code paths that switch into
StreamBody/large-response handling — so context deadlines and large-response
mode are honored; explicitly mention passing provider.streamingClient to every
Handle*Streaming / Handle*StreamRequest and to any direct Do calls inside
*Stream methods or branches that detect
ctx.Value(schemas.BifrostContextKeyLargeResponseMode) and flip into streaming,
note that per-chunk timeouts (NewIdleTimeoutReader) and provider-level
cancellation via SetupStreamCancellation both participate in stream lifecycle,
and recommend deriving the streaming client with
providerUtils.BuildStreamingHTTPClient(client) to clear Client.Timeout while
reusing Transport.

In `@core/providers/elevenlabs/elevenlabs.go`:
- Around line 352-353: The stream latency clock is started before the HTTP
handshake; move the startTime := time.Now() so it is set after
provider.streamingClient.Do(req, resp) has successfully returned (i.e., once the
stream/handshake is established) and before you begin measuring/consuming
streaming chunks or recording the final-chunk latency; update the same pattern
in the other occurrences that use startTime around provider.streamingClient.Do
to ensure latency measures only the streaming phase.

In `@core/providers/mistral/transcription_test.go`:
- Around line 1599-1609: Remove the dead local variable and assignment for
lastResponse: delete the declaration "var lastResponse
*schemas.BifrostTranscriptionStreamResponse" and the assignment "lastResponse =
streamResp.BifrostTranscriptionStreamResponse" inside the loop that ranges over
streamChan where streamResp is handled; keep chunkCount increment and existing
error/log handling for streamResp.BifrostError and
streamResp.BifrostTranscriptionStreamResponse.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 08f5fe00-12d5-4990-bd4c-c050d55a878e

📥 Commits

Reviewing files that changed from the base of the PR and between 8e9c116 and 772cb92.

📒 Files selected for processing (36)
  • AGENTS.md
  • core/providers/anthropic/anthropic.go
  • core/providers/anthropic/chat_test.go
  • core/providers/azure/azure.go
  • core/providers/bedrock/bedrock.go
  • core/providers/bedrock/transport_test.go
  • core/providers/cerebras/cerebras.go
  • core/providers/cohere/cohere.go
  • core/providers/elevenlabs/elevenlabs.go
  • core/providers/fireworks/fireworks.go
  • core/providers/gemini/gemini.go
  • core/providers/gemini/list_models_single_payload_test.go
  • core/providers/groq/groq.go
  • core/providers/huggingface/huggingface.go
  • core/providers/mistral/custom_provider_test.go
  • core/providers/mistral/mistral.go
  • core/providers/mistral/transcription_test.go
  • core/providers/nebius/nebius.go
  • core/providers/ollama/ollama.go
  • core/providers/openai/openai.go
  • core/providers/openrouter/openrouter.go
  • core/providers/parasail/parasail.go
  • core/providers/perplexity/perplexity.go
  • core/providers/replicate/replicate.go
  • core/providers/sgl/sgl.go
  • core/providers/utils/decompression_test.go
  • core/providers/utils/make_request_test.go
  • core/providers/utils/streaming_client_test.go
  • core/providers/utils/utils.go
  • core/providers/utils/utils_test.go
  • core/providers/vertex/models.go
  • core/providers/vertex/rerank.go
  • core/providers/vertex/vertex.go
  • core/providers/vllm/vllm.go
  • core/providers/xai/xai.go
  • framework/configstore/tables/team.go
✅ Files skipped from review due to trivial changes (3)
  • core/providers/vertex/models.go
  • core/providers/utils/make_request_test.go
  • framework/configstore/tables/team.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • core/providers/bedrock/bedrock.go
  • core/providers/cohere/cohere.go
  • core/providers/gemini/gemini.go
  • core/providers/anthropic/anthropic.go
  • core/providers/utils/utils.go
  • core/providers/replicate/replicate.go

@akshaydeo force-pushed the 04-19-separate_streaming_clients_for_avoiding_read_timeouts branch from 772cb92 to 2a64612 on April 19, 2026 04:14

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (2)
core/providers/vertex/vertex.go (1)

567-567: ⚠️ Potential issue | 🟠 Major

Large-response paths still use the unary client.

PrepareResponseStreaming(...) in non-stream methods still receives provider.client (Line 567, Line 1001, Line 1146, Line 1439, Line 1584, Line 1821, Line 2042, Line 2672). That keeps large-response reads on timeout-bounded transport and can reintroduce read-timeout failures in the exact mode this PR is addressing.

Representative fix
- activeClient := providerUtils.PrepareResponseStreaming(ctx, provider.client, resp)
+ activeClient := providerUtils.PrepareResponseStreaming(ctx, provider.streamingClient, resp)

Apply this replacement in:

  • ChatCompletion
  • Responses (both Anthropic and Gemini branches)
  • Embedding
  • Rerank
  • ImageGeneration
  • ImageEdit
  • CountTokens

Based on learnings: Maintain and rely on the ctx.Value(schemas.BifrostContextKeyLargeResponseMode) flag in providers to detect large-response streaming mode.

Also applies to: 1001-1001, 1146-1146, 1439-1439, 1584-1584, 1821-1821, 2042-2042, 2672-2672

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/vertex/vertex.go` at line 567, The code calls
PrepareResponseStreaming(ctx, provider.client, resp) which keeps using the unary
(timeout-bounded) client for large-response paths; change these calls to pass
the provider's streaming-capable client when large-response streaming is active
(use the ctx.Value(schemas.BifrostContextKeyLargeResponseMode) flag to detect
mode). For each affected method (ChatCompletion, Responses — Anthropic & Gemini
branches, Embedding, Rerank, ImageGeneration, ImageEdit, CountTokens) replace
the second argument provider.client with the provider's streaming client field
(e.g., provider.streamingClient or whatever streaming client field exists on the
provider) when the large-response flag is set so PrepareResponseStreaming uses a
streaming transport instead of the unary provider.client.
core/providers/azure/azure.go (1)

38-41: ⚠️ Potential issue | 🟠 Major

SpeechStream still bypasses the new streaming client.

Adding streamingClient here is the right split, but Azure speech streaming still calls provider.client.Do(...) on Line 1002. That path can still hit the unary ReadTimeout, so the timeout fix is still incomplete.

Suggested follow-up
-	requestErr := provider.client.Do(req, resp)
+	requestErr := provider.streamingClient.Do(req, resp)

As per coding guidelines "After ConfigureProxy/ConfigureDialer/ConfigureTLS, build a sibling streamingClient using providerUtils.BuildStreamingClient(client) which zeros ReadTimeout/WriteTimeout/MaxConnDuration so streams aren't killed by fasthttp's whole-response deadline."

Also applies to: 188-192

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/azure/azure.go` around lines 38 - 41, The streaming path still
uses provider.client (e.g., SpeechStream calls provider.client.Do) so streams
can hit the unary ReadTimeout; create a sibling provider.streamingClient after
ConfigureProxy/ConfigureDialer/ConfigureTLS by calling
providerUtils.BuildStreamingClient(client) (which zeroes
ReadTimeout/WriteTimeout/MaxConnDuration), assign it to
provider.streamingClient, and change all streaming usages (notably SpeechStream)
to call provider.streamingClient.Do instead of provider.client.Do; also ensure
the same fix is applied to the other streaming call sites referenced around the
earlier network config block (the spots that currently use provider.client for
streaming).
🧹 Nitpick comments (2)
core/providers/azure/azure.go (1)

2803-2904: Move passthrough stream timing to after the handshake.

Line 2806 starts latency measurement before activeClient.Do(...), so the final passthrough latency includes connect/header wait instead of just stream duration. Start the timer after Do succeeds and the stream setup is complete.

Possible cleanup
-	startTime := time.Now()
-
 	if err := activeClient.Do(fasthttpReq, resp); err != nil {
 		providerUtils.ReleaseStreamingResponse(resp)
 		if errors.Is(err, context.Canceled) {
 			return nil, &schemas.BifrostError{
@@
 	go func() {
 		defer providerUtils.EnsureStreamFinalizerCalled(ctx)
 		defer func() {
@@
 		defer providerUtils.ReleaseStreamingResponse(resp)
 		defer stopIdleTimeout()
 		defer stopCancellation()
+		startTime := time.Now()
 
 		buf := make([]byte, 4096)

Based on learnings "initialize the streaming startTime inside the streaming goroutine after client.Do and the stream is set up, so the final-chunk latency reflects post-handshake streaming time."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/azure/azure.go` around lines 2803 - 2904, The latency timer is
started too early (startTime := time.Now() is set before activeClient.Do
finishes); move the startTime initialization into the streaming goroutine after
the stream setup succeeds (i.e., after rawBodyStream != nil and after creating
bodyStream via providerUtils.NewIdleTimeoutReader and calling
providerUtils.SetupStreamCancellation) so that extraFields.Latency =
time.Since(startTime).Milliseconds() measures only the streaming duration;
remove the earlier startTime variable and ensure the goroutine defines startTime
:= time.Now() just before the read loop (before buf := make([]byte, 4096)) so
postHookRunner and final chunk latency reflect post-handshake streaming time.
AGENTS.md (1)

300-300: Describe the unary timeout as configurable, not fixed at 30s.

client does not always run with a hardcoded 30-second ReadTimeout; provider constructors derive that timeout from config. Tightening this wording would keep the doc aligned with the actual constructor pattern and avoid nudging future providers toward a magic number.

📝 Suggested wording
-**Streaming vs unary client:** Every provider holds two clients — `client` for unary requests (`ReadTimeout=30s` bounds the whole response) and `streamingClient` for SSE / EventStream / chunked paths (`ReadTimeout=0`; the per-chunk `NewIdleTimeoutReader` is the only governor). Pass `provider.streamingClient` to every `Handle*Streaming` / `Handle*StreamRequest` helper and to direct `Do` calls inside `*Stream` methods. For new providers, apply the same pattern — missing the switch means streams get killed at 30s.
+**Streaming vs unary client:** Every provider holds two clients — `client` for unary requests (the configured `ReadTimeout` / default request timeout bounds the whole response) and `streamingClient` for SSE / EventStream / chunked paths (`ReadTimeout=0`; the per-chunk `NewIdleTimeoutReader` governs stream idleness). Pass `provider.streamingClient` to every `Handle*Streaming` / `Handle*StreamRequest` helper and to direct `Do` calls inside `*Stream` methods. For new providers, apply the same pattern — missing the switch means streams inherit the unary whole-response timeout.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` at line 300, The documentation incorrectly states that the unary
`client` uses a fixed 30s ReadTimeout; update the sentence to say the unary
`client`'s ReadTimeout is configurable via provider constructors (derived from
config) rather than a hardcoded 30s, while keeping the guidance to pass
provider.streamingClient to Handle*Streaming / Handle*StreamRequest helpers and
to Do calls inside *Stream methods; reference the symbols `client`,
`streamingClient`, `ReadTimeout`, `NewIdleTimeoutReader`,
`Handle*Streaming`/`Handle*StreamRequest`, and `*Stream` so readers know which
parts of the codebase this applies to.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@core/providers/azure/azure.go`:
- Around line 38-41: The streaming path still uses provider.client (e.g.,
SpeechStream calls provider.client.Do) so streams can hit the unary ReadTimeout;
create a sibling provider.streamingClient after
ConfigureProxy/ConfigureDialer/ConfigureTLS by calling
providerUtils.BuildStreamingClient(client) (which zeroes
ReadTimeout/WriteTimeout/MaxConnDuration), assign it to
provider.streamingClient, and change all streaming usages (notably SpeechStream)
to call provider.streamingClient.Do instead of provider.client.Do; also ensure
the same fix is applied to the other streaming call sites referenced around the
earlier network config block (the spots that currently use provider.client for
streaming).

In `@core/providers/vertex/vertex.go`:
- Line 567: The code calls PrepareResponseStreaming(ctx, provider.client, resp)
which keeps using the unary (timeout-bounded) client for large-response paths;
change these calls to pass the provider's streaming-capable client when
large-response streaming is active (use the
ctx.Value(schemas.BifrostContextKeyLargeResponseMode) flag to detect mode). For
each affected method (ChatCompletion, Responses — Anthropic & Gemini branches,
Embedding, Rerank, ImageGeneration, ImageEdit, CountTokens) replace the second
argument provider.client with the provider's streaming client field (e.g.,
provider.streamingClient or whatever streaming client field exists on the
provider) when the large-response flag is set so PrepareResponseStreaming uses a
streaming transport instead of the unary provider.client.

---

Nitpick comments:
In `@AGENTS.md`:
- Line 300: The documentation incorrectly states that the unary `client` uses a
fixed 30s ReadTimeout; update the sentence to say the unary `client`'s
ReadTimeout is configurable via provider constructors (derived from config)
rather than a hardcoded 30s, while keeping the guidance to pass
provider.streamingClient to Handle*Streaming / Handle*StreamRequest helpers and
to Do calls inside *Stream methods; reference the symbols `client`,
`streamingClient`, `ReadTimeout`, `NewIdleTimeoutReader`,
`Handle*Streaming`/`Handle*StreamRequest`, and `*Stream` so readers know which
parts of the codebase this applies to.

In `@core/providers/azure/azure.go`:
- Around line 2803-2904: The latency timer is started too early (startTime :=
time.Now() is set before activeClient.Do finishes); move the startTime
initialization into the streaming goroutine after the stream setup succeeds
(i.e., after rawBodyStream != nil and after creating bodyStream via
providerUtils.NewIdleTimeoutReader and calling
providerUtils.SetupStreamCancellation) so that extraFields.Latency =
time.Since(startTime).Milliseconds() measures only the streaming duration;
remove the earlier startTime variable and ensure the goroutine defines startTime
:= time.Now() just before the read loop (before buf := make([]byte, 4096)) so
postHookRunner and final chunk latency reflect post-handshake streaming time.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9306b505-0185-4739-af83-a4ca01b46c54

📥 Commits

Reviewing files that changed from the base of the PR and between 772cb92 and 2a64612.

⛔ Files ignored due to path filters (1)
  • core/go.sum is excluded by !**/*.sum
📒 Files selected for processing (38)
  • AGENTS.md
  • core/internal/llmtests/chat_completion_stream.go
  • core/internal/llmtests/response_validation.go
  • core/providers/anthropic/anthropic.go
  • core/providers/anthropic/chat_test.go
  • core/providers/azure/azure.go
  • core/providers/bedrock/bedrock.go
  • core/providers/bedrock/transport_test.go
  • core/providers/cerebras/cerebras.go
  • core/providers/cohere/cohere.go
  • core/providers/elevenlabs/elevenlabs.go
  • core/providers/fireworks/fireworks.go
  • core/providers/gemini/gemini.go
  • core/providers/gemini/list_models_single_payload_test.go
  • core/providers/groq/groq.go
  • core/providers/huggingface/huggingface.go
  • core/providers/mistral/custom_provider_test.go
  • core/providers/mistral/mistral.go
  • core/providers/mistral/transcription_test.go
  • core/providers/nebius/nebius.go
  • core/providers/ollama/ollama.go
  • core/providers/openai/openai.go
  • core/providers/openrouter/openrouter.go
  • core/providers/parasail/parasail.go
  • core/providers/perplexity/perplexity.go
  • core/providers/replicate/replicate.go
  • core/providers/sgl/sgl.go
  • core/providers/utils/decompression_test.go
  • core/providers/utils/make_request_test.go
  • core/providers/utils/streaming_client_test.go
  • core/providers/utils/utils.go
  • core/providers/utils/utils_test.go
  • core/providers/vertex/models.go
  • core/providers/vertex/rerank.go
  • core/providers/vertex/vertex.go
  • core/providers/vllm/vllm.go
  • core/providers/xai/xai.go
  • framework/configstore/tables/team.go
✅ Files skipped from review due to trivial changes (5)
  • core/providers/utils/make_request_test.go
  • core/providers/vertex/models.go
  • core/providers/mistral/custom_provider_test.go
  • framework/configstore/tables/team.go
  • core/providers/utils/streaming_client_test.go
🚧 Files skipped from review as they are similar to previous changes (18)
  • core/providers/vertex/rerank.go
  • core/providers/utils/decompression_test.go
  • core/providers/gemini/list_models_single_payload_test.go
  • core/providers/bedrock/transport_test.go
  • core/providers/utils/utils_test.go
  • core/providers/sgl/sgl.go
  • core/providers/openrouter/openrouter.go
  • core/providers/groq/groq.go
  • core/providers/ollama/ollama.go
  • core/providers/vllm/vllm.go
  • core/providers/huggingface/huggingface.go
  • core/providers/cohere/cohere.go
  • core/providers/fireworks/fireworks.go
  • core/providers/anthropic/chat_test.go
  • core/providers/nebius/nebius.go
  • core/providers/xai/xai.go
  • core/providers/mistral/mistral.go
  • core/providers/openai/openai.go

@akshaydeo force-pushed the 04-19-separate_streaming_clients_for_avoiding_read_timeouts branch from 2a64612 to 2bd0e7e on April 19, 2026 04:32

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
core/providers/azure/azure.go (1)

188-192: ⚠️ Potential issue | 🟠 Major

SpeechStream still bypasses the new streaming client.

provider.streamingClient is created here, but SpeechStream still calls provider.client.Do(...) on Line 1002. That audio SSE path can still hit the unary ReadTimeout, so this timeout fix still misses Azure speech streaming.

Representative fix
-	requestErr := provider.client.Do(req, resp)
+	requestErr := provider.streamingClient.Do(req, resp)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/azure/azure.go` around lines 188 - 192, The AzureProvider
constructs a streamingClient but SpeechStream still uses provider.client.Do
(bypassing the streaming client) which allows the unary ReadTimeout to affect
audio SSE; update SpeechStream to use provider.streamingClient for its HTTP
requests instead of provider.client.Do so the streaming path uses the streaming
client built by providerUtils.BuildStreamingClient; ensure any call sites in
SpeechStream that reference provider.client or client.Do are replaced to call
the equivalent method on provider.streamingClient and preserve headers/context
handling and error handling.
core/providers/vertex/vertex.go (1)

102-106: ⚠️ Potential issue | 🟠 Major

Large-response paths still use client.

Adding streamingClient here fixes the explicit streaming entrypoints, but ChatCompletion, Responses, Embedding, Rerank, ImageGeneration, ImageEdit, and CountTokens still call PrepareResponseStreaming(ctx, provider.client, resp) on their large-response paths. Those branches can still hit the unary ReadTimeout, so the timeout fix is still incomplete.

Representative fix
-	activeClient := providerUtils.PrepareResponseStreaming(ctx, provider.client, resp)
+	activeClient := provider.client
+	if isLargeResp, _ := ctx.Value(schemas.BifrostContextKeyLargeResponseMode).(bool); isLargeResp {
+		activeClient = provider.streamingClient
+	}
+	activeClient = providerUtils.PrepareResponseStreaming(ctx, activeClient, resp)

Based on learnings: Maintain and rely on the ctx.Value(schemas.BifrostContextKeyLargeResponseMode) flag in the Azure provider and across all Bifrost providers to detect large-response streaming mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/vertex/vertex.go` around lines 102 - 106, Several
large-response branches still pass provider.client into PrepareResponseStreaming
causing unary ReadTimeouts; update these paths (the large-response branches
inside ChatCompletion, Responses, Embedding, Rerank, ImageGeneration, ImageEdit,
and CountTokens) to use the VertexProvider.streamingClient instead of
provider.client when ctx.Value(schemas.BifrostContextKeyLargeResponseMode)
indicates large-response streaming mode. Locate calls to
PrepareResponseStreaming(ctx, provider.client, resp) and replace the client
argument with streamingClient (or provider.streamingClient) conditioned on the
BifrostContextKeyLargeResponseMode flag so the streaming entrypoint is used
consistently across all providers and functions.
🧹 Nitpick comments (1)
core/providers/gemini/gemini.go (1)

4127-4151: Start the passthrough stream latency clock after the handshake.

startTime is captured before Do, so the final latency here includes connection/setup time rather than just streaming time. The other provider stream paths in this file already measure from after the stream is established.

♻️ Suggested adjustment
-	startTime := time.Now()
-
 	fasthttpReq := fasthttp.AcquireRequest()
 	resp := fasthttp.AcquireResponse()
 	resp.StreamBody = true
@@
 	ch := make(chan *schemas.BifrostStreamChunk, schemas.DefaultStreamBufferSize)
 	go func() {
+		startTime := time.Now()
 		defer providerUtils.EnsureStreamFinalizerCalled(ctx)
Based on learnings, initialize the streaming startTime inside the streaming goroutine after `client.Do` and the stream is set up, so the final-chunk latency reflects post-handshake streaming time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/gemini/gemini.go` around lines 4127 - 4151, The startTime is
being recorded before the HTTP handshake so passthrough stream latency includes
connection/setup time; move the startTime initialization into the streaming
goroutine after providerUtils.PrepareResponseStreaming returns and after
activeClient.Do(fasthttpReq, resp) succeeds (i.e., once the stream is
established) so that the latency measured by startTime reflects only
post-handshake streaming time; update references to startTime in the streaming
path accordingly (look for startTime, activeClient,
providerUtils.PrepareResponseStreaming, and the activeClient.Do call).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/providers/azure/azure.go`:
- Around line 188-192: completeRequest currently always calls
PrepareResponseStreaming(ctx, provider.client, resp) which causes large-response
requests to use the unary client (30s ReadTimeout); update completeRequest to
detect the large-response flag via
ctx.Value(schemas.BifrostContextKeyLargeResponseMode) and, when set, call
PrepareResponseStreaming(ctx, provider.streamingClient, resp) instead of
provider.client; ensure references to provider.client in completeRequest (and
any callers like TextCompletion, ChatCompletion, Responses, Embedding that rely
on completeRequest) are not forcing unary usage so large-response streaming
flows through provider.streamingClient.

---

Duplicate comments:
In `@core/providers/azure/azure.go`:
- Around line 188-192: The AzureProvider constructs a streamingClient but
SpeechStream still uses provider.client.Do (bypassing the streaming client)
which allows the unary ReadTimeout to affect audio SSE; update SpeechStream to
use provider.streamingClient for its HTTP requests instead of provider.client.Do
so the streaming path uses the streaming client built by
providerUtils.BuildStreamingClient; ensure any call sites in SpeechStream that
reference provider.client or client.Do are replaced to call the equivalent
method on provider.streamingClient and preserve headers/context handling and
error handling.

In `@core/providers/vertex/vertex.go`:
- Around line 102-106: Several large-response branches still pass
provider.client into PrepareResponseStreaming causing unary ReadTimeouts; update
these paths (the large-response branches inside ChatCompletion, Responses,
Embedding, Rerank, ImageGeneration, ImageEdit, and CountTokens) to use the
VertexProvider.streamingClient instead of provider.client when
ctx.Value(schemas.BifrostContextKeyLargeResponseMode) indicates large-response
streaming mode. Locate calls to PrepareResponseStreaming(ctx, provider.client,
resp) and replace the client argument with streamingClient (or
provider.streamingClient) conditioned on the BifrostContextKeyLargeResponseMode
flag so the streaming entrypoint is used consistently across all providers and
functions.

---

Nitpick comments:
In `@core/providers/gemini/gemini.go`:
- Around line 4127-4151: The startTime is being recorded before the HTTP
handshake so passthrough stream latency includes connection/setup time; move the
startTime initialization into the streaming goroutine after
providerUtils.PrepareResponseStreaming returns and after
activeClient.Do(fasthttpReq, resp) succeeds (i.e., once the stream is
established) so that the latency measured by startTime reflects only
post-handshake streaming time; update references to startTime in the streaming
path accordingly (look for startTime, activeClient,
providerUtils.PrepareResponseStreaming, and the activeClient.Do call).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 57cede38-ed01-4530-bd03-e3fbf08091f4

📥 Commits

Reviewing files that changed from the base of the PR and between 2a64612 and 2bd0e7e.

⛔ Files ignored due to path filters (1)
  • core/go.sum is excluded by !**/*.sum
📒 Files selected for processing (38)
  • AGENTS.md
  • core/internal/llmtests/chat_completion_stream.go
  • core/internal/llmtests/response_validation.go
  • core/providers/anthropic/anthropic.go
  • core/providers/anthropic/chat_test.go
  • core/providers/azure/azure.go
  • core/providers/bedrock/bedrock.go
  • core/providers/bedrock/transport_test.go
  • core/providers/cerebras/cerebras.go
  • core/providers/cohere/cohere.go
  • core/providers/elevenlabs/elevenlabs.go
  • core/providers/fireworks/fireworks.go
  • core/providers/gemini/gemini.go
  • core/providers/gemini/list_models_single_payload_test.go
  • core/providers/groq/groq.go
  • core/providers/huggingface/huggingface.go
  • core/providers/mistral/custom_provider_test.go
  • core/providers/mistral/mistral.go
  • core/providers/mistral/transcription_test.go
  • core/providers/nebius/nebius.go
  • core/providers/ollama/ollama.go
  • core/providers/openai/openai.go
  • core/providers/openrouter/openrouter.go
  • core/providers/parasail/parasail.go
  • core/providers/perplexity/perplexity.go
  • core/providers/replicate/replicate.go
  • core/providers/sgl/sgl.go
  • core/providers/utils/decompression_test.go
  • core/providers/utils/make_request_test.go
  • core/providers/utils/streaming_client_test.go
  • core/providers/utils/utils.go
  • core/providers/utils/utils_test.go
  • core/providers/vertex/models.go
  • core/providers/vertex/rerank.go
  • core/providers/vertex/vertex.go
  • core/providers/vllm/vllm.go
  • core/providers/xai/xai.go
  • framework/configstore/tables/team.go
✅ Files skipped from review due to trivial changes (4)
  • core/providers/utils/make_request_test.go
  • core/providers/vertex/models.go
  • core/providers/utils/utils_test.go
  • core/providers/utils/streaming_client_test.go
🚧 Files skipped from review as they are similar to previous changes (22)
  • core/providers/vertex/rerank.go
  • core/providers/gemini/list_models_single_payload_test.go
  • core/providers/mistral/custom_provider_test.go
  • core/providers/nebius/nebius.go
  • core/providers/openrouter/openrouter.go
  • core/providers/utils/decompression_test.go
  • core/internal/llmtests/response_validation.go
  • core/providers/cerebras/cerebras.go
  • framework/configstore/tables/team.go
  • core/providers/groq/groq.go
  • core/providers/sgl/sgl.go
  • core/providers/cohere/cohere.go
  • core/providers/anthropic/chat_test.go
  • core/providers/vllm/vllm.go
  • core/providers/ollama/ollama.go
  • core/providers/parasail/parasail.go
  • core/providers/huggingface/huggingface.go
  • core/providers/xai/xai.go
  • AGENTS.md
  • core/providers/perplexity/perplexity.go
  • core/providers/elevenlabs/elevenlabs.go
  • core/providers/utils/utils.go

Comment thread core/providers/azure/azure.go
@akshaydeo akshaydeo force-pushed the 04-19-separate_streaming_clients_for_avoiding_read_timeouts branch from 2bd0e7e to b9d2397 Compare April 19, 2026 04:49
coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 19, 2026
@akshaydeo akshaydeo mentioned this pull request Apr 19, 2026
18 tasks
@akshaydeo akshaydeo force-pushed the 04-19-separate_streaming_clients_for_avoiding_read_timeouts branch from b9d2397 to 72c7bd0 Compare April 19, 2026 19:45
@akshaydeo akshaydeo force-pushed the 04-19-separate_conenction_for_migrations branch from 98ebb2f to f0e0659 Compare April 19, 2026 19:45
@akshaydeo akshaydeo mentioned this pull request Apr 19, 2026
18 tasks
Contributor Author

akshaydeo commented Apr 20, 2026

Merge activity

  • Apr 20, 6:39 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Apr 20, 6:40 AM UTC: @akshaydeo merged this pull request with Graphite.

@akshaydeo akshaydeo changed the base branch from 04-19-separate_conenction_for_migrations to graphite-base/2832 April 20, 2026 06:39
@akshaydeo akshaydeo changed the base branch from graphite-base/2832 to main April 20, 2026 06:40
@akshaydeo akshaydeo dismissed coderabbitai[bot]’s stale review April 20, 2026 06:40

The base branch was changed.

@akshaydeo akshaydeo merged commit c034101 into main Apr 20, 2026
7 of 14 checks passed
@akshaydeo akshaydeo deleted the 04-19-separate_streaming_clients_for_avoiding_read_timeouts branch April 20, 2026 06:40
dominictayloruk pushed a commit to dominictayloruk/bifrost that referenced this pull request Apr 21, 2026


Development

Successfully merging this pull request may close these issues.

Files API Support

2 participants