fix: add idle timeout to upstream WS ReadMessage (#2998)#3013
fix: add idle timeout to upstream WS ReadMessage (#2998)#3013thiscantbeserious wants to merge 1 commit intomaximhq:mainfrom
Conversation
Before each upstream.ReadMessage() call in tryNativeWSUpstream, set a per-read deadline sourced from NetworkConfig.StreamIdleTimeoutInSeconds (default 60s). Clear the deadline immediately after each successful read so long-running streams that send a frame every <60s are never interrupted. On net.Error.Timeout(), discard the upstream connection, write a 504 upstream_timeout WS error frame to the client, and return cleanly. This prevents goroutine pile-up when an upstream accepts the WS upgrade and then goes silent. Closes maximhq#2998 Extracted from maximhq#2775 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Superseded by #3018 (merged 2026-04-24), which bundles the fix for this issue and several other native WS reliability bugs. Closing this draft as redundant. |
Summary
Fixes #2998. When a native WS upstream accepts the upgrade and then sends no frames,
upstream.ReadMessage()blocked indefinitely. This change adds a per-read idle deadline so the goroutine recovers and the client receives a 504 rather than a silent hang.Changes
wsUpstreamIdleTimeoutconstant (60s, matchingDefaultStreamIdleTimeoutInSeconds).upstreamWSIdleTimeout()method that readsNetworkConfig.StreamIdleTimeoutInSecondsfrom the provider config, falling back to the constant when no override is set.upstream.ReadMessage()intryNativeWSUpstream, setSetReadDeadline(now + idleTimeout). Clear the deadline on successful read so active streams that send a frame periodically are not interrupted.net.Error.Timeout(), discard the upstream from the pool, write a 504 upstream_timeout WS error frame to the client, return cleanly.Type of change
Affected areas
How to test
Manual, start a mock WS upstream that upgrades and stays silent, fire a request through Bifrost. Before this fix the request hangs indefinitely. After this fix it returns a 504 within
stream_idle_timeout_in_seconds.Tests added cover the stalling-server timeout, periodic-frame keepalive, and one-then-silent cases.
Breaking changes
Related issues
Closes #2998. Extracted from #2775, the PR branch contains this fix as a side effect of the OAuth feature work and it has been pulled out here as a focused change.
Checklist