Skip to content

fix: include upstream status and body in WS pool dial error (#2994)#3012

Closed
thiscantbeserious wants to merge 1 commit intomaximhq:mainfrom
thiscantbeserious:fix/2994-ws-pool-dial-error-body
Closed

fix: include upstream status and body in WS pool dial error (#2994)#3012
thiscantbeserious wants to merge 1 commit intomaximhq:mainfrom
thiscantbeserious:fix/2994-ws-pool-dial-error-body

Conversation

@thiscantbeserious
Copy link
Copy Markdown

Summary

Fixes #2994. When the gorilla WS dial rejects the upgrade, the upstream *http.Response carries the status and body needed to diagnose the failure (auth failures, proxy errors, 4xx from the backend). The current code discards the response and returns only the opaque websocket: bad handshake error, leaving operators with no signal in logs.

Changes

  • In (*Pool).dial, capture resp from websocket.DefaultDialer.Dial.
  • When resp is non-nil on error, read up to 512 bytes via io.LimitReader, close the body, and wrap the error with (upstream status N: snippet).
  • Falls back gracefully when resp is nil (no upstream reached).

Type of change

  • Bug fix

Affected areas

  • Transports (HTTP)

How to test

go test -cover github.com/maximhq/bifrost/transports/bifrost-http/websocket

Coverage moved 59.7% to 62.6%. Four new tests cover: dial error with upstream status+body, nil-response fallback, body truncation at 512 bytes, response body closed.

Breaking changes

  • No

Related issues

Closes #2994. Extracted from #2775, the PR branch contains this fix as a side effect of the OAuth feature work and it has been pulled out here as a focused change.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added tests
  • Build passes

…2994)

The pool's dial() was discarding the *http.Response returned by gorilla's
Dial, so every WS handshake failure logged the same opaque
"websocket: bad handshake" with no upstream context.

Capture resp, read up to 512 bytes with io.LimitReader, close the body,
and include resp.StatusCode plus the snippet in the wrapped error.

New format: "failed to dial upstream websocket <url> (upstream status <N>: <body>): <err>"

Adds four table-driven tests covering: status+body in error, nil-response
fallback (no panic), body truncation at 512 bytes, and body-closed
(no resource leak).

Closes maximhq#2994
Extracted from maximhq#2775

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 25da79e6-65f7-4fa1-b958-b6a3e4428dcd

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

thiscantbeserious added a commit to thiscantbeserious/bifrost that referenced this pull request Apr 24, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thiscantbeserious
Copy link
Copy Markdown
Author

Superseded by #3018 (merged 2026-04-24), which bundles the fix for this issue and several other native WS reliability bugs. Closing this draft as redundant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: WebSocket pool dial error discards upstream HTTP status and body

1 participant