adds concurrent map fixes for plugin timer #2834
Conversation
📝 Walkthrough (Summary by CodeRabbit)
Walkthrough: `PluginPipeline` gains a `streamingMu sync.Mutex` guarding its streaming timing state across concurrent accessors.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant RequestWorker
    participant AttemptFinalizer as Finalizer (attempt-local / ctx-registered)
    participant PluginPipeline
    participant ProviderGoroutine as Provider
    RequestWorker->>PluginPipeline: allocate pipeline & postHookRunner (per attempt)
    RequestWorker->>AttemptFinalizer: register finalizer in ctx (sync.Once)
    RequestWorker->>ProviderGoroutine: start provider goroutine using pipeline
    ProviderGoroutine->>PluginPipeline: RunPostLLMHooks (increment chunkCount under streamingMu)
    ProviderGoroutine->>PluginPipeline: accumulatePluginTiming (lock streamingMu, update timings/order)
    alt end-of-stream or streaming error
        AttemptFinalizer->>PluginPipeline: FinalizeStreamingPostHookSpans (snapshot under streamingMu, unlock, create/end spans)
        AttemptFinalizer->>PluginPipeline: resetPluginPipeline (scrub under streamingMu, nil cross-request refs)
    else non-streaming or failed setup
        RequestWorker->>PluginPipeline: release pipeline immediately (attempt-local finalizer)
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
⚠️ Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4): `level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"`
Confidence Score: 5/5 — Safe to merge. All P0/P1 concerns from the previous review round are addressed; only a P2 consistency observation remains. The core race conditions are correctly fixed. No files require special attention.
Reviews (4): Last reviewed commit: "adds concurrent map fixes for plugin tim..."
Force-pushed 9f0feea to 2eec161
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/bifrost.go (1)
6025-6038: 🛠️ Refactor suggestion | 🟠 Major — Reset the remaining pooled references before `Put()`.

`resetPluginPipeline()` still leaves `llmPlugins`, `mcpPlugins`, `logger`, `tracer`, and the backing references inside `preHookErrors`/`postHookErrors` live on the pooled object. That keeps stale request/plugin state reachable across pool reuse.

♻️ Proposed fix
```diff
 func (p *PluginPipeline) resetPluginPipeline() {
 	p.executedPreHooks = 0
-	p.preHookErrors = p.preHookErrors[:0]
-	p.postHookErrors = p.postHookErrors[:0]
+	for i := range p.preHookErrors {
+		p.preHookErrors[i] = nil
+	}
+	p.preHookErrors = p.preHookErrors[:0]
+	for i := range p.postHookErrors {
+		p.postHookErrors[i] = nil
+	}
+	p.postHookErrors = p.postHookErrors[:0]
+	p.llmPlugins = nil
+	p.mcpPlugins = nil
+	p.logger = nil
+	p.tracer = nil
 	// Reset streaming timing accumulation under lock — the provider goroutine's
 	// deferred finalizer may still be iterating these fields when the pipeline
 	// is returned to the pool.
 	p.streamingMu.Lock()
 	p.chunkCount = 0
-	if p.postHookTimings != nil {
-		clear(p.postHookTimings)
-	}
-	p.postHookPluginOrder = p.postHookPluginOrder[:0]
+	p.postHookTimings = nil
+	p.postHookPluginOrder = nil
 	p.streamingMu.Unlock()
 }
```

As per coding guidelines, "Always reset all fields of pooled objects before calling pool.Put() to prevent stale data leaks".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/bifrost.go` around lines 6025 - 6038, resetPluginPipeline currently leaves references reachable on the pooled PluginPipeline (llmPlugins, mcpPlugins, logger, tracer and elements inside preHookErrors/postHookErrors); update resetPluginPipeline to nil out plugin slices and observers and to zero out slice elements before truncating the error slices — specifically, for PluginPipeline ensure p.llmPlugins = nil, p.mcpPlugins = nil, p.logger = nil, p.tracer = nil, and loop over p.preHookErrors and p.postHookErrors setting each element to nil before doing p.preHookErrors = p.preHookErrors[:0] and p.postHookErrors = p.postHookErrors[:0] (also consider setting p.postHookTimings = nil if appropriate) so no stale references remain when the object is returned to the pool.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/plugin_pipeline_race_test.go`:
- Around line 67-74: The test only exercises GetChunkCount readers but not the
writer path that increments the counter; add a concurrent writer goroutine that
drives the write path by repeatedly calling the method that increments the
counter (RunPostLLMHooks or whatever wrapper invokes chunkCount++ behind
streamingMu) on the same p instance while the readers run, so the read/write
synchronization on chunkCount protected by streamingMu is exercised; ensure the
writer loop runs for the same iterations and use wg to wait for it to finish.
---
Outside diff comments:
In `@core/bifrost.go`:
- Around line 6025-6038: resetPluginPipeline currently leaves references
reachable on the pooled PluginPipeline (llmPlugins, mcpPlugins, logger, tracer
and elements inside preHookErrors/postHookErrors); update resetPluginPipeline to
nil out plugin slices and observers and to zero out slice elements before
truncating the error slices — specifically, for PluginPipeline ensure
p.llmPlugins = nil, p.mcpPlugins = nil, p.logger = nil, p.tracer = nil, and loop
over p.preHookErrors and p.postHookErrors setting each element to nil before
doing p.preHookErrors = p.preHookErrors[:0] and p.postHookErrors =
p.postHookErrors[:0] (also consider setting p.postHookTimings = nil if
appropriate) so no stale references remain when the object is returned to the
pool.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: aceafe96-0f48-4429-a7d3-add177fa13bb
📒 Files selected for processing (2)
core/bifrost.go, core/plugin_pipeline_race_test.go
Actionable comments posted: 2
🧹 Nitpick comments (1)
core/plugin_pipeline_race_test.go (1)
34-74: This regression test never exercises the `chunkCount` write path.

`GetChunkCount()` is only meaningful here if something is concurrently writing `chunkCount`, and that write only happens in streaming `RunPostLLMHooks()`. This test stresses the timing accumulators, but it never drives the fourth changed path from the PR, so a regression around the new `chunkCount` lock can still pass. Please add one goroutine that calls `RunPostLLMHooks()` with `BifrostContextKeyStreamStartTime` set, so the reader at lines 67-74 overlaps with the actual writer instead of only `resetPluginPipeline()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/plugin_pipeline_race_test.go` around lines 34 - 74, the test never exercises the chunkCount write path; add a goroutine that calls RunPostLLMHooks with a context containing BifrostContextKeyStreamStartTime so the chunkCount writer runs concurrently with the GetChunkCount reader; create a context via context.WithValue(context.Background(), BifrostContextKeyStreamStartTime, time.Now()) (or similar) and repeatedly call p.RunPostLLMHooks(ctx) inside a loop (similar iteration count as other goroutines) so the reader at GetChunkCount() overlaps an actual writer rather than only resetPluginPipeline().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/bifrost.go`:
- Around line 6029-6038: The resetPluginPipeline() implementation on
PluginPipeline leaves cross-request references (llmPlugins, mcpPlugins, logger,
tracer, postHookTimings, postHookPluginOrder, etc.) attached before pool.Put(),
causing leaks; update resetPluginPipeline() to fully scrub the struct: under
p.streamingMu lock zero counters (chunkCount), clear and nil-out slices/maps
(llmPlugins, mcpPlugins, postHookPluginOrder, postHookTimings), and nil any
retained interfaces/refs (logger, tracer) so no external objects remain
referenced before returning the pipeline to the pool via pool.Put().
- Around line 5257-5275: executeRequestWithRetries is currently using a single
request-scoped finalizer (sync.Once / postHookRunner) for all stream retry
attempts, which allows the first failed attempt to consume the finalizer and
releasePluginPipeline(pipeline) leaving later retries using a returned pipeline;
fix by making the streaming finalizer attempt-scoped: register a new post-hook
finalizer (fresh sync.Once/postHookRunner) before each call to
handleProviderStreamRequest (or re-register it immediately after
CheckFirstStreamChunkForError triggers a retry) and ensure the branch that
invokes the finalizer uses the attempt-scoped value (or falls back to directly
calling releasePluginPipeline(pipeline) only when no attempt-scoped finalizer
exists), keeping executeRequestWithRetries, CheckFirstStreamChunkForError,
postHookRunner, handleProviderStreamRequest and releasePluginPipeline as the
reference points.
---
Nitpick comments:
In `@core/plugin_pipeline_race_test.go`:
- Around line 34-74: The test never exercises the chunkCount write path; add a
goroutine that calls RunPostLLMHooks with a context containing
BifrostContextKeyStreamStartTime so the chunkCount writer runs concurrently with
the GetChunkCount reader; create a context via
context.WithValue(context.Background(), BifrostContextKeyStreamStartTime,
time.Now()) (or similar) and repeatedly call p.RunPostLLMHooks(ctx) inside a
loop (similar iteration count as other goroutines) so the reader at
GetChunkCount() overlaps an actual writer rather than only
resetPluginPipeline().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 53850deb-c591-4a8d-8616-155020d20258
📒 Files selected for processing (3)
AGENTS.md, core/bifrost.go, core/plugin_pipeline_race_test.go
✅ Files skipped from review due to trivial changes (1)
- AGENTS.md
Force-pushed 2eec161 to c765e30
♻️ Duplicate comments (1)
core/plugin_pipeline_race_test.go (1)
67-74: ⚠️ Potential issue | 🟡 Minor — Exercise the `chunkCount++` writer path, not only readers.

Lines 67-74 currently drive only `GetChunkCount()` reads. The test still doesn't hit the streaming increment path (the `RunPostLLMHooks`-side `chunkCount++`) that was moved under `streamingMu`, so that read/write contention remains untested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/plugin_pipeline_race_test.go` around lines 67 - 74, the test currently only spawns readers calling p.GetChunkCount() and never exercises the writer path that increments chunkCount in RunPostLLMHooks; add a concurrent goroutine that calls RunPostLLMHooks (or triggers the code path that performs chunkCount++) for multiple iterations under the same waitgroup to create read/write contention with streamingMu, ensuring the streaming increment path (chunkCount++) is executed alongside GetChunkCount() readers; reference p.GetChunkCount and RunPostLLMHooks/streamingMu/chunkCount++ when locating where to add the writer goroutine.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@core/plugin_pipeline_race_test.go`:
- Around line 67-74: The test currently only spawns readers calling
p.GetChunkCount() and never exercises the writer path that increments chunkCount
in RunPostLLMHooks; add a concurrent goroutine that calls RunPostLLMHooks (or
triggers the code path that performs chunkCount++) for multiple iterations under
the same waitgroup to create read/write contention with streamingMu, ensuring
the streaming increment path (chunkCount++) is executed alongside
GetChunkCount() readers; reference p.GetChunkCount and
RunPostLLMHooks/streamingMu/chunkCount++ when locating where to add the writer
goroutine.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4fb7e75c-54d2-4b37-8c01-76fdb3dd5b5d
📒 Files selected for processing (3)
AGENTS.md, core/bifrost.go, core/plugin_pipeline_race_test.go
✅ Files skipped from review due to trivial changes (2)
- AGENTS.md
- core/bifrost.go
Force-pushed c765e30 to 0c085db
♻️ Duplicate comments (1)
core/plugin_pipeline_race_test.go (1)
67-74: ⚠️ Potential issue | 🟡 Minor — Add a concurrent chunk-count writer to complete this race test.

This currently exercises `GetChunkCount()` reads only. It does not drive the `chunkCount` write path (the `RunPostLLMHooks` increment), so read/write synchronization for `chunkCount` remains untested here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/plugin_pipeline_race_test.go` around lines 67 - 74, the test only spawns readers calling p.GetChunkCount() and never exercises the write path; add a concurrent writer goroutine that calls the method which increments chunkCount (RunPostLLMHooks on the same plugin instance p) inside the same iterations loop, protect goroutine lifecycle with wg.Add(1)/wg.Done(), and ensure it runs concurrently with the existing reader to exercise read/write synchronization on chunkCount.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@core/plugin_pipeline_race_test.go`:
- Around line 67-74: The test only spawns readers calling p.GetChunkCount() and
never exercises the write path; add a concurrent writer goroutine that calls the
method which increments chunkCount (RunPostLLMHooks on the same plugin instance
p) inside the same iterations loop, protect goroutine lifecycle with
wg.Add(1)/wg.Done(), and ensure it runs concurrently with the existing reader to
exercise read/write synchronization on chunkCount.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 30b40367-d731-48ba-9435-ea3e867e1f51
📒 Files selected for processing (3)
AGENTS.md, core/bifrost.go, core/plugin_pipeline_race_test.go
✅ Files skipped from review due to trivial changes (2)
- AGENTS.md
- core/bifrost.go
Summary
Fixes a `fatal error: concurrent map read and map write` panic in `PluginPipeline` during streaming requests. Per-chunk writers (`accumulatePluginTiming`) run in the provider goroutine while the end-of-stream finalizer (`FinalizeStreamingPostHookSpans`) and pool recycler (`resetPluginPipeline`) can run concurrently on different goroutines, causing unsynchronised access to the shared `postHookTimings` map and related fields.

Changes
- Adds a `streamingMu sync.Mutex` to `PluginPipeline` to guard `postHookTimings`, `postHookPluginOrder`, and `chunkCount` across all concurrent accessors (`accumulatePluginTiming`, `FinalizeStreamingPostHookSpans`, `resetPluginPipeline`, `GetChunkCount`, and the streaming chunk counter in `RunPostLLMHooks`).
- `FinalizeStreamingPostHookSpans` now snapshots the accumulator state under the lock and performs all tracer/span I/O outside the lock, avoiding stalling per-chunk writers on span operations.
- `requestWorker`: when a streaming request errors, the pipeline is now released through the `sync.Once`-wrapped finalizer already registered on the context, rather than being released directly alongside a potentially in-flight deferred finalizer call.
- Adds `core/plugin_pipeline_race_test.go`, which reproduces the original panic by hammering all four concurrent paths simultaneously; intended to be run with `-race`.

Type of change
Affected areas
How to test
Breaking changes
Security considerations
None — changes are limited to internal synchronisation of in-process goroutines with no impact on auth, secrets, or external interfaces.
Checklist
- Read `docs/contributing/README.md` and followed the guidelines