[api] Fix memory leaks in TracerProvider.GetTracer API #4906
Codecov Report
@@ Coverage Diff @@
## main #4906 +/- ##
==========================================
+ Coverage 83.21% 83.51% +0.29%
==========================================
Files 295 295
Lines 12294 12324 +30
==========================================
+ Hits 10231 10292 +61
+ Misses 2063 2032 -31
{
    if (this.tracers == null)
    {
        // Note: We check here for a race with Dispose and return a
I believe we need to set this.tracers = null inside the same lock. Otherwise, a thread calling Dispose could still set this.tracers to null after this if check and before the new entry is added to the dictionary. In that case we would want to return a no-op tracer, but we would end up returning a valid one.
I just checked it a couple times. I think it is good! Could be I'm not seeing something though. Can you write out a flow for me that you think is flawed? Here are a couple flows I'm imagining.
Case where Dispose runs in the middle of the writer and gets the lock...
- Writer thread reads this.tracers on Line 58. It is valid so it begins its work.
- Dispose thread sets this.tracers to null.
- Dispose thread takes the lock.
- Reader thread misses the cache and tries to take the lock. It has to wait.
- Dispose thread finishes its clean up and releases the lock.
- Writer thread gets the lock. Now it checks this.tracers == null. This will be true now and it will return a no-op instance.
Case where Dispose runs in the middle of the writer and waits on the lock...
- Writer thread reads this.tracers on Line 58. It is valid so it begins its work.
- Reader thread misses the cache and takes the lock. Inside the lock it checks this.tracers == null which is false. It begins to do its work.
- Dispose thread sets this.tracers to null.
- Dispose thread tries to take the lock. It has to wait.
- Writer thread adds a new tracer to the cache and releases the lock. It doesn't care that this.tracers is now actually null because it is working on a local copy.
- Dispose thread gets the lock and makes all the tracers in the cache no-ops including the one that was just added.
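The two flows above can be sketched in a small, self-contained form. This is only an illustration of the pattern under discussion, not the code from the PR: `SketchTracer`, `SketchTracerProvider`, `IsNoOp`, and `Detach` are hypothetical names, and the cache is keyed by name alone for brevity.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for Tracer: it is a no-op once its source is detached.
public sealed class SketchTracer
{
    private ActivitySource? source;

    public SketchTracer(ActivitySource? source) => this.source = source;

    public bool IsNoOp => Volatile.Read(ref this.source) == null;

    // Detaches the source so the caller can dispose it.
    internal ActivitySource? Detach() => Interlocked.Exchange(ref this.source, null);
}

// Hypothetical stand-in for TracerProvider.
public sealed class SketchTracerProvider : IDisposable
{
    private ConcurrentDictionary<string, SketchTracer>? tracers = new();

    public SketchTracer GetTracer(string name)
    {
        // Work on a local copy; Dispose may null the field at any moment.
        var local = Volatile.Read(ref this.tracers);
        if (local == null)
        {
            return new SketchTracer(null); // already disposed
        }

        if (local.TryGetValue(name, out var cached))
        {
            return cached; // cache hit, no lock taken
        }

        lock (local)
        {
            // Re-check the field under the lock to catch a Dispose that got there first.
            if (Volatile.Read(ref this.tracers) == null)
            {
                return new SketchTracer(null);
            }

            // Another writer may have added the entry while this thread waited.
            return local.GetOrAdd(name, n => new SketchTracer(new ActivitySource(n)));
        }
    }

    public void Dispose()
    {
        // Null the field first (outside the lock), then lock the old cache to clean up.
        var local = Interlocked.Exchange(ref this.tracers, null);
        if (local == null)
        {
            return;
        }

        lock (local)
        {
            foreach (var pair in local)
            {
                pair.Value.Detach()?.Dispose();
            }

            local.Clear();
        }
    }
}
```

In the second flow, a writer that is already inside the lock when Dispose nulls the field still adds its tracer to the local copy, and Dispose then turns that tracer into a no-op once it acquires the lock.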
For case 2,
Writer thread adds a new tracer to the cache and releases the lock. It doesn't care that this.tracers is now actually null because it is working on a local copy.
I think this is more of a design choice. Yes, it doesn't care that this.tracers is now actually null but it could care about it 😄.
I was thinking we could offer a stronger guarantee that we would never return a Tracer when the TracerProvider is disposed or being disposed. We could avoid this limbo state where the Dispose method may or may not have marked the newly returned Tracer no-op when it's being used.
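One way to read the suggestion is to move the null assignment inside the lock, which needs a lock object that outlives the cache field. The sketch below assumes that reading; `StrictTracer`, `StrictTracerProvider`, and `gate` are hypothetical names and this is not code from the PR. It tightens only the cache-miss path: a cache hit is still served without the lock and can still race with Dispose.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for Tracer: it is a no-op once its source is detached.
public sealed class StrictTracer
{
    private ActivitySource? source;

    public StrictTracer(ActivitySource? source) => this.source = source;

    public bool IsNoOp => Volatile.Read(ref this.source) == null;

    internal ActivitySource? Detach() => Interlocked.Exchange(ref this.source, null);
}

// Hypothetical provider where Dispose nulls the cache while holding the lock.
public sealed class StrictTracerProvider : IDisposable
{
    // Dedicated lock object, because the cache field itself becomes null.
    private readonly object gate = new();
    private ConcurrentDictionary<string, StrictTracer>? tracers = new();

    public StrictTracer GetTracer(string name)
    {
        var local = Volatile.Read(ref this.tracers);
        if (local != null && local.TryGetValue(name, out var cached))
        {
            return cached; // lock-free hit; this path can still race with Dispose
        }

        lock (this.gate)
        {
            // Under the lock the field is either null (disposed) or fully usable;
            // Dispose cannot start nulling it while this thread holds the lock.
            var current = this.tracers;
            if (current == null)
            {
                return new StrictTracer(null);
            }

            return current.GetOrAdd(name, n => new StrictTracer(new ActivitySource(n)));
        }
    }

    public void Dispose()
    {
        lock (this.gate)
        {
            var current = this.tracers;
            if (current == null)
            {
                return; // already disposed
            }

            // The assignment happens inside the same lock writers use.
            Volatile.Write(ref this.tracers, null);

            foreach (var pair in current)
            {
                pair.Value.Detach()?.Dispose();
            }

            current.Clear();
        }
    }
}
```

With this shape, a writer that misses the cache either finishes before Dispose begins or observes null and returns a no-op. Extending the same guarantee to cache hits would mean taking the lock on every call, which is the kind of contention the reply below wants to avoid.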
I merged the PR because I think what's there will work well enough. I'll circle back to this comment when I have a sec to see if I can simplify it or clean it up in a way that doesn't introduce a bunch of contention.
utpilla left a comment
Left a non-blocking comment: #4906 (comment)
Changes
TracerProvider now maintains a cache of the Tracers it has issued. When disposed it will turn them into no-op instances and release their associated ActivitySources.
Details
Consider the following simple application:
Running that, we see memory grow with each iteration and never get released:
What's going on here?
Today we create a Tracer each time GetTracer is called which is handed its own ActivitySource. Creating spurious ActivitySources is dangerous because there is a static list of all active sources. Tracer does NOT implement IDisposable so users aren't given a chance to do this correctly.
After the cache introduced on this PR the graph looks like this:
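Independently of the memory graphs, the static list of active sources described under "What's going on here?" can be observed with `System.Diagnostics` alone. The sketch below relies on `ActivitySource.AddActivityListener` invoking the listener's `ShouldListenTo` callback for each source that is active at registration time. `ActiveSourceProbe` and its methods are hypothetical helpers written for this illustration, not part of OpenTelemetry or of this PR.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper that inspects which ActivitySources are still active.
public static class ActiveSourceProbe
{
    // Counts active sources with the given name. Registering a listener
    // invokes ShouldListenTo for every source that is active at that moment.
    public static int Count(string name)
    {
        int count = 0;
        using var listener = new ActivityListener
        {
            ShouldListenTo = source =>
            {
                if (source.Name == name)
                {
                    count++;
                }

                return false; // observe only, never subscribe
            },
        };

        ActivitySource.AddActivityListener(listener);
        return count;
    }

    // Creates sources the way a Tracer-per-call design would: never disposed.
    public static void CreateUndisposed(string name, int howMany)
    {
        for (int i = 0; i < howMany; i++)
        {
            _ = new ActivitySource(name);
        }
    }
}
```

Calling `ActiveSourceProbe.CreateUndisposed("leak-demo", 3)`, forcing a garbage collection, and then calling `ActiveSourceProbe.Count("leak-demo")` should still report three sources, because the static list keeps them reachable. Only `Dispose` removes a source from that list, and `Tracer` gives callers no way to call it.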
Merge requirement checklist
CHANGELOG.md files updated for non-trivial changes