fix(logging): flush orphan log row before TTL sweeper deletes pending entry#3022
fix(logging): flush orphan log row before TTL sweeper deletes pending entry#3022thiscantbeserious wants to merge 1 commit intomaximhq:mainfrom
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…-branch duplicate)
|
Superseded by #3018 (merged 2026-04-24), which bundles the fix for this issue and several other native WS reliability bugs. Closing this draft as redundant. |
Summary
When a request reaches
PreLLMHookbutPostLLMHooknever fires (client disconnect, context cancel, native-WS upstream stall, container restart mid-flight), thePendingLogDataentry is left inpendingLogsEntriesindefinitely. The TTL sweeper evicts it after 5 minutes with a baresync.Map.Deleteand no DB write. The audit row is permanently lost and operators see a request that appeared as "processing" in the UI simply vanish.Root cause:
cleanupStalePendingLogsinplugins/logging/writer.go(lines 178-196) contains no call tobuildInitialLogEntryorenqueueLogEntry. BothpendingLogsEntriesandpendingLogsToInjectare evicted silently.Changes
plugins/logging/writer.go(cleanupStalePendingLogs): before deleting each expiredpendingLogsEntriesentry, callbuildInitialLogEntryto construct a minimallogstore.Log, setStatusto"error"andErrorDetailsto a message describing the TTL eviction, callenqueueLogEntryto write the row to the DB, then delete the in-memory entry. AWarnlog is emitted per eviction so the event is visible in structured logs.plugins/logging/sweeper_test.go(new file): four unit tests covering stale-entry flush, fresh-entry survival, multi-entry flush, and stalependingLogsToInjecteviction.The
pendingLogsToInjectentries are not individually DB-flushed because they are keyed by traceID and lack a standalone requestID anchor. Their eviction remains a bare delete, consistent with the current design.Type of change
Affected areas
How to test
go build ./plugins/logging/... go test ./plugins/logging/... -cover -run TestSweeper -v go vet ./plugins/logging/...Expected output: all four
TestSweeper*tests pass. Coverage oncleanupStalePendingLogsis above 90%.To reproduce the original bug and verify the fix end-to-end, configure a stalling mock WS upstream, fire a request, disconnect before any response, wait past
pendingLogTTL(5 minutes) plus one sweeper tick (1 minute), and query thelogstable. With this fix applied, a row withstatus = "error"and anerror_detailsmessage containing "abandoned" appears. Without the fix, no row is written.Screenshots/Recordings
N/A (backend-only change, no UI impact)
Breaking changes
Related issues
Closes #3003
Related: this bug was discovered while working on PR #2775 in thiscantbeserious/bifrost. The feature branch (PR #2775) does not contain a prior fix for this specific issue. This PR extracts the standalone fix.
See also: #2997, #2999, #3001 (adjacent native-WS logging bugs that increase the frequency of orphaned entries but are separate root causes).
Security considerations
None. The fix writes an additional DB row on TTL eviction. No secrets or PII beyond what
buildInitialLogEntryalready captures fromPendingLogDataare included.Checklist
docs/contributing/README.mdand followed the guidelines