fix(logging): flush orphan log row before TTL sweeper deletes pending entry #3022

Closed
thiscantbeserious wants to merge 1 commit into maximhq:main from
thiscantbeserious:fix/3003-ttl-sweeper-flush-orphan

@thiscantbeserious

Summary

When a request reaches PreLLMHook but PostLLMHook never fires (client disconnect, context cancel, native-WS upstream stall, container restart mid-flight), the PendingLogData entry stays in pendingLogsEntries until the TTL sweeper evicts it after 5 minutes with a bare sync.Map.Delete and no DB write. The audit row is permanently lost, and operators see a request that appeared as "processing" in the UI simply vanish.

Root cause: cleanupStalePendingLogs in plugins/logging/writer.go (lines 178-196) contains no call to buildInitialLogEntry or enqueueLogEntry. Both pendingLogsEntries and pendingLogsToInject are evicted silently.

Changes

  • plugins/logging/writer.go (cleanupStalePendingLogs): before deleting each expired pendingLogsEntries entry, call buildInitialLogEntry to construct a minimal logstore.Log, set Status to "error" and ErrorDetails to a message describing the TTL eviction, call enqueueLogEntry to write the row to the DB, then delete the in-memory entry. A Warn log is emitted per eviction so the event is visible in structured logs.
  • plugins/logging/sweeper_test.go (new file): four unit tests covering stale-entry flush, fresh-entry survival, multi-entry flush, and stale pendingLogsToInject eviction.

The pendingLogsToInject entries are not individually DB-flushed because they are keyed by traceID and lack a standalone requestID anchor. Their eviction remains a bare delete, consistent with the current design.

Type of change

  • Bug fix

Affected areas

  • Plugins

How to test

go build ./plugins/logging/...
go test ./plugins/logging/... -cover -run TestSweeper -v
go vet ./plugins/logging/...

Expected output: all four TestSweeper* tests pass. Coverage on cleanupStalePendingLogs is above 90%.

To reproduce the original bug and verify the fix end-to-end, configure a stalling mock WS upstream, fire a request, disconnect before any response, wait past pendingLogTTL (5 minutes) plus one sweeper tick (1 minute), and query the logs table. With this fix applied, a row with status = "error" and an error_details message containing "abandoned" appears. Without the fix, no row is written.

Screenshots/Recordings

N/A (backend-only change, no UI impact)

Breaking changes

  • No

Related issues

Closes #3003

Related: this bug was discovered while working on PR #2775 in thiscantbeserious/bifrost. That feature branch does not contain a fix for this issue; this PR extracts the fix as a standalone change.

See also: #2997, #2999, #3001 (adjacent native-WS logging bugs that increase the frequency of orphaned entries but are separate root causes).

Security considerations

None. The fix writes an additional DB row on TTL eviction. No secrets or PII beyond what buildInitialLogEntry already captures from PendingLogData are included.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@thiscantbeserious
Author

Superseded by #3018 (merged 2026-04-24), which bundles the fix for this issue and several other native WS reliability bugs. Closing this draft as redundant.

Development

Successfully merging this pull request may close these issues.

[Bug]: TTL sweeper silently discards orphaned pending log entries with no DB row written