fix: finalize native WS log on parse failure and clean close#3016
fix: finalize native WS log on parse failure and clean close#3016thiscantbeserious wants to merge 1 commit intomaximhq:mainfrom
Conversation
…#2997) Two independent paths in tryNativeWSUpstream could leave a log entry stuck in "processing" indefinitely: Path A (parse failure): parseUpstreamWSEvent returns nil on any unmarshal error. Large or shape-shifted frames (e.g. a 57 KB response.completed with nested fields the struct cannot accept) would never trigger the terminal post-hook, so StreamEndIndicator was never set and the logging plugin never wrote the DB row. Fix: add extractStreamEventType() to do a minimal type-field probe as fallback when the full unmarshal fails. When the raw type is a terminal event type, synthesizeTerminalStreamResponse() builds a minimal struct that satisfies the post-hook so the log is finalized. Path B (clean upstream close): when ReadMessage errors after content has been forwarded but no terminal post-hook has fired, the read loop used to write a 502 and return without calling PostHookRunner. Clean closes (codes 1000, 1001, 1005) are now distinguished from abnormal ones. Either way, a synthesized response.completed event is sent to PostHookRunner with StreamEndIndicator=true so the log row is written. Only abnormal closes send the 502 error to the client. A terminalPostHookFired guard prevents double-firing when a terminal event was already forwarded normally. Adds unit tests for extractStreamEventType (valid terminal, valid non-terminal, malformed JSON, missing type field, unknown nested fields), synthesizeTerminalStreamResponse, and isTerminalStreamType. Closes maximhq#2997
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…ed to maximhq#3016) Remove the terminalPostHookFired guard, extractStreamEventType fallback, synthesizeTerminalStreamResponse helper, and associated tests from the feature branch. These were extracted into focused PR maximhq#3016 against maximhq/bifrost main. The remaining diagnostic frame logging, wsFramePreview, and diag variables are preserved as they belong to a separate extraction.
|
Superseded by #3018 (merged 2026-04-24), which bundles the fix for this issue and several other native WS reliability bugs. Closing this draft as redundant. |
Summary
Every native-WS request that exchanges frames with an upstream must produce exactly one log row. Two independent code paths in
tryNativeWSUpstreambroke that invariant, leaving log entries stuck in "processing" until the 5-minute TTL sweeper silently evicted them.Path A - parse failure:
parseUpstreamWSEventdoes a fullsonic.UnmarshalintoBifrostResponsesStreamResponseand returnsnilon any error. A frame that is valid JSON but contains nested fields the struct cannot accept (observed with large ~57 KBresponse.completedframes fromchatgpt.com/backend-api/codex) causesstreamRespto stay nil.isTerminalnever becomes true,StreamEndIndicatoris never set, and PostHookRunner is never called with a terminal response, so the logging plugin never writes the DB row.Path B - clean upstream close: when
ReadMessageerrors after at least one frame has been forwarded but no terminal post-hook has fired yet, the read loop wrote a 502 to the client and returned without calling PostHookRunner. The in-memory pending log entry was later silently evicted by the TTL sweeper with no DB trace.Changes
extractStreamEventType(data []byte)- performs a minimal single-field unmarshal to read only thetypekey as a fallback when the full struct parse fails. Returns an empty string on malformed JSON or missingtype.synthesizeTerminalStreamResponse(provider, model, eventType)- builds a minimalBifrostResponsesStreamResponsewith ExtraFields populated so downstream plugins (logging, tracing) can identify the request type, provider, and model.tryNativeWSUpstream: addterminalPostHookFiredguard. When the upstream WS closes (cleanly or with an error) after forwarding content but before a terminal post-hook fired, synthesize aresponse.completedevent, setStreamEndIndicator=true, and call PostHookRunner once so the logging plugin writes the DB row. Clean closes (codes 1000, 1001, 1005) do not write a 502 to the client; abnormal closes still do.extractStreamEventType. If the raw type is a recognized terminal type, synthesize a minimal struct and fire the terminal post-hook so the log is finalized.extractStreamEventType(valid terminal, valid non-terminal, malformed JSON, missing type field, unknown nested fields),synthesizeTerminalStreamResponsefield population, andisTerminalStreamTypecoverage.Type
Affected areas
transports/bifrost-http/handlers/wsresponses.goHow to test
Run the unit tests directly:
Manual reproduction for Path B (clean close without terminal event):
Before this fix: count unchanged (no row written). After: count increases by one with status
success.Breaking changes
No.
Related
Closes #2997
This bug was discovered while working on PR #2775 in thiscantbeserious/bifrost. That branch contains a potential fix as a side-effect of the feature work. The fix has been extracted here as a focused standalone change.