adds concurrent map fixes for plugin timer#2834

Merged
akshaydeo merged 1 commit into v1.3.x from 04-19-adds_concurrent_map_fixes_for_plugin_timer
Apr 19, 2026

@akshaydeo akshaydeo commented Apr 19, 2026

Summary

Fixes a fatal error: concurrent map read and map write panic in PluginPipeline during streaming requests. Per-chunk writers (accumulatePluginTiming) run in the provider goroutine while the end-of-stream finalizer (FinalizeStreamingPostHookSpans) and pool recycler (resetPluginPipeline) can run concurrently on different goroutines, causing unsynchronised access to the shared postHookTimings map and related fields.

Changes

  • Added streamingMu sync.Mutex to PluginPipeline to guard postHookTimings, postHookPluginOrder, and chunkCount across all concurrent accessors (accumulatePluginTiming, FinalizeStreamingPostHookSpans, resetPluginPipeline, GetChunkCount, and the streaming chunk counter in RunPostLLMHooks).
  • FinalizeStreamingPostHookSpans now snapshots the accumulator state under the lock and performs all tracer/span I/O outside the lock, avoiding stalling per-chunk writers on span operations.
  • Fixed a double-pool-release race in requestWorker: when a streaming request errors, the pipeline is now released through the sync.Once-wrapped finalizer already registered on the context, rather than being released directly alongside a potentially in-flight deferred finalizer call.
  • Added core/pluginpipelinerace_test.go, which reproduces the original panic by hammering all four concurrent paths simultaneously; intended to be run with -race.

Type of change

  • Bug fix

Affected areas

  • Core (Go)
  • Plugins

How to test

# Reproduce the original race and verify the fix
go test -race -run TestPluginPipelineStreamingRace ./core/...

# Full test suite
go test -race ./...

Breaking changes

  • No

Security considerations

None — changes are limited to internal synchronisation of in-process goroutines with no impact on auth, secrets, or external interfaces.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

coderabbitai Bot commented Apr 19, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Improved thread-safety and lifecycle handling for streaming to prevent race conditions and avoid leaking pipeline resources across retries.
  • Tests

    • Added a stress test to surface race conditions in streaming concurrency.
  • Documentation

    • Updated Go filename convention: disallow underscores except for _test.go; require lowercase concatenated multi-word names.

Walkthrough

PluginPipeline gains a streamingMu mutex to protect streaming post-hook state. Streaming pipeline/finalizer allocation was moved per-retry to avoid pooled-instance leaks; releases are now coordinated via ctx-registered finalizers. A race test was added and AGENTS.md forbids underscores in Go filenames (except _test.go).

Changes

Cohort / File(s) | Summary
Streaming Concurrency Synchronization (core/bifrost.go) | Add streamingMu sync.Mutex to PluginPipeline; guard chunkCount, postHookTimings, postHookPluginOrder, and tracer/logger access. Move pipeline/post-hook runner/finalizer creation into each streaming retry attempt; ensure release via ctx-registered finalizer (sync.Once) or immediate attempt-local release on setup failure. Update RunPostLLMHooks, accumulatePluginTiming, FinalizeStreamingPostHookSpans, resetPluginPipeline, and GetChunkCount to lock and snapshot state safely and scrub cross-request references.
Race Condition Test (core/pluginpipelinerace_test.go) | Add TestPluginPipelineStreamingRace to stress concurrent use of accumulatePluginTiming, FinalizeStreamingPostHookSpans, resetPluginPipeline, and GetChunkCount under the race detector.
Filename Convention (AGENTS.md) | Add rule disallowing underscores in Go source filenames except the trailing _test.go; require lowercase concatenated multi-word filenames (e.g., pluginpipeline.go).

Sequence Diagram(s)

sequenceDiagram
    participant RequestWorker
    participant AttemptFinalizer as Finalizer (attempt-local / ctx-registered)
    participant PluginPipeline
    participant ProviderGoroutine as Provider
    RequestWorker->>PluginPipeline: allocate pipeline & postHookRunner (per attempt)
    RequestWorker->>AttemptFinalizer: register finalizer in ctx (sync.Once)
    RequestWorker->>ProviderGoroutine: start provider goroutine using pipeline
    ProviderGoroutine->>PluginPipeline: RunPostLLMHooks (increment chunkCount under streamingMu)
    ProviderGoroutine->>PluginPipeline: accumulatePluginTiming (lock streamingMu, update timings/order)
    alt end-of-stream or streaming error
        AttemptFinalizer->>PluginPipeline: FinalizeStreamingPostHookSpans (snapshot under streamingMu, unlock, create/end spans)
        AttemptFinalizer->>PluginPipeline: resetPluginPipeline (scrub under streamingMu, nil cross-request refs)
    else non-streaming or failed setup
        RequestWorker->>PluginPipeline: release pipeline immediately (attempt-local finalizer)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I stitched a mutex where the packets play,
Chunks now count while goroutines sway,
Finalizers whisper when streams are done,
Maps and slices scrubbed, no races to shun,
Hopping on — concurrency saved the day.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title 'adds concurrent map fixes for plugin timer' directly summarizes the main change: fixing concurrent map access issues in plugin pipeline timing logic during streaming.
Description check | ✅ Passed | The PR description includes all required sections (Summary, Changes, Type of change, Affected areas, How to test, Breaking changes, Security considerations) and is comprehensive with specific technical details about the synchronization fix.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


akshaydeo commented Apr 19, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@akshaydeo akshaydeo marked this pull request as ready for review April 19, 2026 03:40

greptile-apps Bot commented Apr 19, 2026

Confidence Score: 5/5

Safe to merge — all P0/P1 concerns from the previous review round are addressed; only a P2 consistency observation remains.

The core race conditions are correctly fixed: postHookTimings, postHookPluginOrder, chunkCount, logger, and tracer are all guarded by streamingMu; FinalizeStreamingPostHookSpans snapshots under the lock and does I/O outside; double-pool-release is prevented by sync.Once. The only remaining finding is a P2 note about postHookErrors being mutated outside the lock, which is production-safe due to sync.Once sequencing but is an inconsistency worth documenting.

No files require special attention; core/bifrost.go contains the single P2 observation about postHookErrors.

Important Files Changed

Filename | Overview
core/bifrost.go | Adds streamingMu sync.Mutex to PluginPipeline guarding postHookTimings, postHookPluginOrder, chunkCount, logger, and tracer; snapshot-then-unlock pattern in FinalizeStreamingPostHookSpans; sync.Once-wrapped finalizer prevents double-release; postHookErrors is still mutated outside the lock in the streaming path
core/pluginpipelinerace_test.go | Correctly-named race test (pluginpipelinerace_test.go) that hammers all four concurrent paths; exercises the exact goroutine interleavings possible in production
AGENTS.md | Adds plugin_pipeline_race_test.go as an explicit negative example in the Go filename convention rule, reinforcing the no-underscores policy

Reviews (4): Last reviewed commit: "adds concurrent map fixes for plugin tim..." | Re-trigger Greptile

@akshaydeo akshaydeo force-pushed the 04-19-adds_concurrent_map_fixes_for_plugin_timer branch from 9f0feea to 2eec161 Compare April 19, 2026 03:43

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/bifrost.go (1)

6025-6038: 🛠️ Refactor suggestion | 🟠 Major

Reset the remaining pooled references before Put().

resetPluginPipeline() still leaves llmPlugins, mcpPlugins, logger, tracer, and the backing references inside preHookErrors / postHookErrors live on the pooled object. That keeps stale request/plugin state reachable across pool reuse.

♻️ Proposed fix
 func (p *PluginPipeline) resetPluginPipeline() {
 	p.executedPreHooks = 0
-	p.preHookErrors = p.preHookErrors[:0]
-	p.postHookErrors = p.postHookErrors[:0]
+	for i := range p.preHookErrors {
+		p.preHookErrors[i] = nil
+	}
+	p.preHookErrors = p.preHookErrors[:0]
+	for i := range p.postHookErrors {
+		p.postHookErrors[i] = nil
+	}
+	p.postHookErrors = p.postHookErrors[:0]
+	p.llmPlugins = nil
+	p.mcpPlugins = nil
+	p.logger = nil
+	p.tracer = nil
 	// Reset streaming timing accumulation under lock — the provider goroutine's
 	// deferred finalizer may still be iterating these fields when the pipeline
 	// is returned to the pool.
 	p.streamingMu.Lock()
 	p.chunkCount = 0
-	if p.postHookTimings != nil {
-		clear(p.postHookTimings)
-	}
-	p.postHookPluginOrder = p.postHookPluginOrder[:0]
+	p.postHookTimings = nil
+	p.postHookPluginOrder = nil
 	p.streamingMu.Unlock()
 }

As per coding guidelines, "Always reset all fields of pooled objects before calling pool.Put() to prevent stale data leaks".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/bifrost.go` around lines 6025 - 6038, resetPluginPipeline currently
leaves references reachable on the pooled PluginPipeline (llmPlugins,
mcpPlugins, logger, tracer and elements inside preHookErrors/postHookErrors);
update resetPluginPipeline to nil out plugin slices and observers and to zero
out slice elements before truncating the error slices — specifically, for
PluginPipeline ensure p.llmPlugins = nil, p.mcpPlugins = nil, p.logger = nil,
p.tracer = nil, and loop over p.preHookErrors and p.postHookErrors setting each
element to nil before doing p.preHookErrors = p.preHookErrors[:0] and
p.postHookErrors = p.postHookErrors[:0] (also consider setting p.postHookTimings
= nil if appropriate) so no stale references remain when the object is returned
to the pool.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/plugin_pipeline_race_test.go`:
- Around line 67-74: The test only exercises GetChunkCount readers but not the
writer path that increments the counter; add a concurrent writer goroutine that
drives the write path by repeatedly calling the method that increments the
counter (RunPostLLMHooks or whatever wrapper invokes chunkCount++ behind
streamingMu) on the same p instance while the readers run, so the read/write
synchronization on chunkCount protected by streamingMu is exercised; ensure the
writer loop runs for the same iterations and use wg to wait for it to finish.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aceafe96-0f48-4429-a7d3-add177fa13bb

📥 Commits

Reviewing files that changed from the base of the PR and between 3d46b87 and 9f0feea.

📒 Files selected for processing (2)
  • core/bifrost.go
  • core/plugin_pipeline_race_test.go

Comment thread core/pluginpipelinerace_test.go

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
core/pluginpipelinerace_test.go (1)

34-74: This regression test never exercises the chunkCount write path.

GetChunkCount() is only meaningful here if something is concurrently writing chunkCount, and that write only happens in streaming RunPostLLMHooks(). This test stresses the timing accumulators, but it never drives the fourth changed path from the PR, so a regression around the new chunkCount lock can still pass.

Please add one goroutine that calls RunPostLLMHooks() with BifrostContextKeyStreamStartTime set, so the reader at Lines 67-74 overlaps with the actual writer instead of only resetPluginPipeline().
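
The shape of the requested coverage, a writer running concurrently with the reader, can be sketched outside the test file as below. The chunkCounter type and its recordChunk method are hypothetical; the real write happens inside RunPostLLMHooks, whose signature does not appear in this thread, so it is not called here.

```go
package main

import (
	"fmt"
	"sync"
)

// chunkCounter is a hypothetical stand-in for the chunkCount field and its
// accessors on PluginPipeline.
type chunkCounter struct {
	mu    sync.Mutex
	count int // guarded by mu
}

// recordChunk plays the role of the streaming increment (writer path).
func (c *chunkCounter) recordChunk() {
	c.mu.Lock()
	c.count++
	c.mu.Unlock()
}

// get plays the role of GetChunkCount (reader path).
func (c *chunkCounter) get() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.count
}

// stress runs the given number of writers alongside one reader so that
// reads and writes genuinely overlap, then returns the final count.
func stress(c *chunkCounter, writers, iterations int) int {
	var wg sync.WaitGroup
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				c.recordChunk()
			}
		}()
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < iterations; i++ {
			_ = c.get()
		}
	}()
	wg.Wait()
	return c.get()
}

func main() {
	var c chunkCounter
	fmt.Println("final count:", stress(&c, 2, 1000))
}
```

Running a program like this with the -race flag lets the race detector flag the counter if either accessor stops taking the lock; without a writer goroutine, a reader-only loop has nothing to race against.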

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/pluginpipelinerace_test.go` around lines 34 - 74, The test never
exercises the chunkCount write path; add a goroutine that calls RunPostLLMHooks
with a context containing BifrostContextKeyStreamStartTime so the chunkCount
writer runs concurrently with the GetChunkCount reader; create a context via
context.WithValue(context.Background(), BifrostContextKeyStreamStartTime,
time.Now()) (or similar) and repeatedly call p.RunPostLLMHooks(ctx) inside a
loop (similar iteration count as other goroutines) so the reader at
GetChunkCount() overlaps an actual writer rather than only
resetPluginPipeline().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/bifrost.go`:
- Around line 6029-6038: The resetPluginPipeline() implementation on
PluginPipeline leaves cross-request references (llmPlugins, mcpPlugins, logger,
tracer, postHookTimings, postHookPluginOrder, etc.) attached before pool.Put(),
causing leaks; update resetPluginPipeline() to fully scrub the struct: under
p.streamingMu lock zero counters (chunkCount), clear and nil-out slices/maps
(llmPlugins, mcpPlugins, postHookPluginOrder, postHookTimings), and nil any
retained interfaces/refs (logger, tracer) so no external objects remain
referenced before returning the pipeline to the pool via pool.Put().
- Around line 5257-5275: executeRequestWithRetries is currently using a single
request-scoped finalizer (sync.Once / postHookRunner) for all stream retry
attempts, which allows the first failed attempt to consume the finalizer and
releasePluginPipeline(pipeline) leaving later retries using a returned pipeline;
fix by making the streaming finalizer attempt-scoped: register a new post-hook
finalizer (fresh sync.Once/postHookRunner) before each call to
handleProviderStreamRequest (or re-register it immediately after
CheckFirstStreamChunkForError triggers a retry) and ensure the branch that
invokes the finalizer uses the attempt-scoped value (or falls back to directly
calling releasePluginPipeline(pipeline) only when no attempt-scoped finalizer
exists), keeping executeRequestWithRetries, CheckFirstStreamChunkForError,
postHookRunner, handleProviderStreamRequest and releasePluginPipeline as the
reference points.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 53850deb-c591-4a8d-8616-155020d20258

📥 Commits

Reviewing files that changed from the base of the PR and between 9f0feea and 2eec161.

📒 Files selected for processing (3)
  • AGENTS.md
  • core/bifrost.go
  • core/pluginpipelinerace_test.go
✅ Files skipped from review due to trivial changes (1)
  • AGENTS.md

Comment thread core/bifrost.go Outdated
Comment thread core/bifrost.go
@akshaydeo akshaydeo force-pushed the 04-19-adds_concurrent_map_fixes_for_plugin_timer branch from 2eec161 to c765e30 Compare April 19, 2026 04:28

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
core/pluginpipelinerace_test.go (1)

67-74: ⚠️ Potential issue | 🟡 Minor

Exercise the chunkCount++ writer path, not only readers.

Line 67-Line 74 currently drives only GetChunkCount() reads. The test still doesn’t hit the streaming increment path (RunPostLLMHooks-side chunkCount++) that was moved under streamingMu, so that read/write contention remains untested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/pluginpipelinerace_test.go` around lines 67 - 74, The test currently
only spawns readers calling p.GetChunkCount() and never exercises the writer
path that increments chunkCount in RunPostLLMHooks; add a concurrent goroutine
that calls RunPostLLMHooks (or triggers the code path that performs
chunkCount++) for multiple iterations under the same waitgroup to create
read/write contention with streamingMu, ensuring the streaming increment path
(chunkCount++) is executed alongside GetChunkCount() readers; reference
p.GetChunkCount and RunPostLLMHooks/streamingMu/chunkCount++ when locating where
to add the writer goroutine.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4fb7e75c-54d2-4b37-8c01-76fdb3dd5b5d

📥 Commits

Reviewing files that changed from the base of the PR and between 2eec161 and c765e30.

📒 Files selected for processing (3)
  • AGENTS.md
  • core/bifrost.go
  • core/pluginpipelinerace_test.go
✅ Files skipped from review due to trivial changes (2)
  • AGENTS.md
  • core/bifrost.go

Comment thread core/bifrost.go
@akshaydeo akshaydeo force-pushed the 04-19-adds_concurrent_map_fixes_for_plugin_timer branch from c765e30 to 0c085db Compare April 19, 2026 04:43

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
core/pluginpipelinerace_test.go (1)

67-74: ⚠️ Potential issue | 🟡 Minor

Add a concurrent chunk-count writer to complete this race test.

This currently exercises GetChunkCount() reads only. It does not drive the chunkCount write path (RunPostLLMHooks increment), so read/write synchronization for chunkCount remains untested here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/pluginpipelinerace_test.go` around lines 67 - 74, The test only spawns
readers calling p.GetChunkCount() and never exercises the write path; add a
concurrent writer goroutine that calls the method which increments chunkCount
(RunPostLLMHooks on the same plugin instance p) inside the same iterations loop,
protect goroutine lifecycle with wg.Add(1)/wg.Done(), and ensure it runs
concurrently with the existing reader to exercise read/write synchronization on
chunkCount.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 30b40367-d731-48ba-9435-ea3e867e1f51

📥 Commits

Reviewing files that changed from the base of the PR and between c765e30 and 0c085db.

📒 Files selected for processing (3)
  • AGENTS.md
  • core/bifrost.go
  • core/pluginpipelinerace_test.go
✅ Files skipped from review due to trivial changes (2)
  • AGENTS.md
  • core/bifrost.go

akshaydeo commented Apr 19, 2026

Merge activity

  • Apr 19, 5:09 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Apr 19, 5:10 AM UTC: @akshaydeo merged this pull request with Graphite.

@akshaydeo akshaydeo merged commit baeeff3 into v1.3.x Apr 19, 2026
7 of 12 checks passed
@akshaydeo akshaydeo deleted the 04-19-adds_concurrent_map_fixes_for_plugin_timer branch April 19, 2026 05:10