Skip to content

fix: finalize native WS log on parse failure and clean close#3016

Closed
thiscantbeserious wants to merge 1 commit intomaximhq:mainfrom
thiscantbeserious:fix/2997-ws-log-finalize
Closed

fix: finalize native WS log on parse failure and clean close#3016
thiscantbeserious wants to merge 1 commit intomaximhq:mainfrom
thiscantbeserious:fix/2997-ws-log-finalize

Conversation

@thiscantbeserious
Copy link
Copy Markdown

Summary

Every native-WS request that exchanges frames with an upstream must produce exactly one log row. Two independent code paths in tryNativeWSUpstream broke that invariant, leaving log entries stuck in "processing" until the 5-minute TTL sweeper silently evicted them.

Path A - parse failure: parseUpstreamWSEvent does a full sonic.Unmarshal into BifrostResponsesStreamResponse and returns nil on any error. A frame that is valid JSON but contains nested fields the struct cannot accept (observed with large ~57 KB response.completed frames from chatgpt.com/backend-api/codex) causes streamResp to stay nil. isTerminal never becomes true, StreamEndIndicator is never set, and PostHookRunner is never called with a terminal response, so the logging plugin never writes the DB row.

Path B - clean upstream close: when ReadMessage errors after at least one frame has been forwarded but no terminal post-hook has fired yet, the read loop wrote a 502 to the client and returned without calling PostHookRunner. The in-memory pending log entry was later silently evicted by the TTL sweeper with no DB trace.

Changes

  • Add extractStreamEventType(data []byte) - performs a minimal single-field unmarshal to read only the type key as a fallback when the full struct parse fails. Returns an empty string on malformed JSON or missing type.
  • Add synthesizeTerminalStreamResponse(provider, model, eventType) - builds a minimal BifrostResponsesStreamResponse with ExtraFields populated so downstream plugins (logging, tracing) can identify the request type, provider, and model.
  • In tryNativeWSUpstream: add terminalPostHookFired guard. When the upstream WS closes (cleanly or with an error) after forwarding content but before a terminal post-hook fired, synthesize a response.completed event, set StreamEndIndicator=true, and call PostHookRunner once so the logging plugin writes the DB row. Clean closes (codes 1000, 1001, 1005) do not write a 502 to the client; abnormal closes still do.
  • When the full struct parse fails on an in-flight frame, fall back to extractStreamEventType. If the raw type is a recognized terminal type, synthesize a minimal struct and fire the terminal post-hook so the log is finalized.
  • Unit tests for extractStreamEventType (valid terminal, valid non-terminal, malformed JSON, missing type field, unknown nested fields), synthesizeTerminalStreamResponse field population, and isTerminalStreamType coverage.

Type

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Other

Affected areas

  • Transports (HTTP) - transports/bifrost-http/handlers/wsresponses.go

How to test

Run the unit tests directly:

go test ./transports/bifrost-http/handlers/... -run "TestExtractStreamEventType|TestSynthesizeTerminalStreamResponse|TestIsTerminalStreamType" -v
go test ./transports/bifrost-http/handlers/... -cover
go vet ./transports/bifrost-http/...

Manual reproduction for Path B (clean close without terminal event):

cat > /tmp/ws_mock_clean_close.py <<'PY'
import asyncio, json
import websockets

async def handler(ws):
    await ws.send(json.dumps({"type":"response.created","response":{"id":"resp_mock_1","status":"in_progress"},"sequence_number":0}))
    await asyncio.sleep(0.5)
    await ws.close(code=1000, reason="clean close")

async def main():
    async with websockets.serve(handler, "0.0.0.0", 9998):
        await asyncio.Future()

asyncio.run(main())
PY
python3 /tmp/ws_mock_clean_close.py &

sqlite3 bifrost-data/logs.db "SELECT COUNT(*) FROM logs"

printf '{"type":"response.create","model":"openai/gpt-4o","input":[{"type":"message","role":"user","content":[{"type":"input_text","text":"hi"}]}]}\n' \
  | websocat -n1 ws://localhost:8080/openai/v1/responses -H='Authorization: Bearer sk-fake-key'

sleep 2
sqlite3 bifrost-data/logs.db "SELECT COUNT(*) FROM logs"

Before this fix: count unchanged (no row written). After: count increases by one with status success.

Breaking changes

No.

Related

Closes #2997

This bug was discovered while working on PR #2775 in thiscantbeserious/bifrost. That branch contains a potential fix as a side-effect of the feature work. The fix has been extracted here as a focused standalone change.

…#2997)

Two independent paths in tryNativeWSUpstream could leave a log entry
stuck in "processing" indefinitely:

Path A (parse failure): parseUpstreamWSEvent returns nil on any
unmarshal error. Large or shape-shifted frames (e.g. a 57 KB
response.completed with nested fields the struct cannot accept) would
never trigger the terminal post-hook, so StreamEndIndicator was never
set and the logging plugin never wrote the DB row.

Fix: add extractStreamEventType() to do a minimal type-field probe as
fallback when the full unmarshal fails. When the raw type is a terminal
event type, synthesizeTerminalStreamResponse() builds a minimal struct
that satisfies the post-hook so the log is finalized.

Path B (clean upstream close): when ReadMessage errors after content
has been forwarded but no terminal post-hook has fired, the read loop
used to write a 502 and return without calling PostHookRunner. Clean
closes (codes 1000, 1001, 1005) are now distinguished from abnormal
ones. Either way, a synthesized response.completed event is sent to
PostHookRunner with StreamEndIndicator=true so the log row is written.
Only abnormal closes send the 502 error to the client.

A terminalPostHookFired guard prevents double-firing when a terminal
event was already forwarded normally.

Adds unit tests for extractStreamEventType (valid terminal, valid
non-terminal, malformed JSON, missing type field, unknown nested
fields), synthesizeTerminalStreamResponse, and isTerminalStreamType.

Closes maximhq#2997
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3556c8fc-da9d-42e1-9e07-a644dd929687

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

thiscantbeserious added a commit to thiscantbeserious/bifrost that referenced this pull request Apr 24, 2026
…ed to maximhq#3016)

Remove the terminalPostHookFired guard, extractStreamEventType fallback,
synthesizeTerminalStreamResponse helper, and associated tests from the
feature branch. These were extracted into focused PR maximhq#3016 against
maximhq/bifrost main.

The remaining diagnostic frame logging, wsFramePreview, and diag
variables are preserved as they belong to a separate extraction.
@thiscantbeserious
Copy link
Copy Markdown
Author

Superseded by #3018 (merged 2026-04-24), which bundles the fix for this issue and several other native WS reliability bugs. Closing this draft as redundant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Native WebSocket upstream requests never produce a log row on clean close or unparseable terminal event

1 participant