
Conversation


@CodeBlanch CodeBlanch commented Sep 29, 2023

Changes

  • TracerProvider now maintains a cache of the Tracers it has issued. When disposed, it turns them into no-op instances and releases their associated ActivitySources.

Details

Consider the following simple application:

using var tracerProvider = Sdk.CreateTracerProviderBuilder().Build();

for (int i = 0; i < 2_500_000; i++)
{
    var tracer = tracerProvider.GetTracer("MyTracer");

    if (i % 500_000 == 0)
    {
        GC.Collect();
    }
}

Running that, we see memory grow with each iteration and never get released:

[screenshot: memory profiler graph showing allocations growing with each iteration]

What's going on here?

Today we create a new Tracer each time GetTracer is called, and each Tracer is handed its own ActivitySource. Creating spurious ActivitySources is dangerous because the runtime keeps a static list of all active sources, so each one stays rooted until it is disposed.

Tracer does NOT implement IDisposable, so users are never given a chance to release those sources themselves.
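A minimal sketch of the leak mechanism (hypothetical LeakySource/LeakyProvider names; the real registry lives inside System.Diagnostics.ActivitySource): each source adds itself to a static list on construction, so every GetTracer call pins one more object that only disposal could ever release.

```csharp
using System.Collections.Generic;

// Hypothetical sketch, not the actual SDK code: models why the old
// GetTracer behavior leaked. ActivitySource registers every instance in
// a process-wide static list so listeners can discover it; an instance
// only leaves that list when it is disposed.
public class LeakySource
{
    // Stand-in for the static registry inside ActivitySource.
    private static readonly List<LeakySource> ActiveSources = new();

    public LeakySource(string name)
    {
        this.Name = name;
        ActiveSources.Add(this); // rooted here until explicitly removed
    }

    public string Name { get; }

    public static int ActiveCount => ActiveSources.Count;
}

public class LeakyProvider
{
    // Pre-PR behavior: no cache, so every call allocates a new source,
    // and Tracer exposes no Dispose that could ever remove it again.
    public LeakySource GetTracer(string name) => new(name);
}
```

Calling GetTracer("MyTracer") in a loop against this sketch makes LeakySource.ActiveCount grow without bound, which mirrors the growing memory in the graph above.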

With the cache introduced in this PR, the graph looks like this:

[screenshot: memory profiler graph after the cache was added]

Merge requirement checklist

  • CONTRIBUTING guidelines followed (nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes


codecov bot commented Sep 29, 2023

Codecov Report

Merging #4906 (83e550b) into main (3e885c7) will increase coverage by 0.29%.
The diff coverage is 94.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4906      +/-   ##
==========================================
+ Coverage   83.21%   83.51%   +0.29%     
==========================================
  Files         295      295              
  Lines       12294    12324      +30     
==========================================
+ Hits        10231    10292      +61     
+ Misses       2063     2032      -31     
Flag Coverage Δ
unittests 83.51% <94.11%> (+0.29%) ⬆️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
src/OpenTelemetry.Api/Trace/Tracer.cs 92.68% <100.00%> (+0.18%) ⬆️
src/OpenTelemetry.Api/Trace/TracerProvider.cs 93.93% <93.33%> (-6.07%) ⬇️

... and 6 files with indirect coverage changes

@CodeBlanch CodeBlanch added the pkg:OpenTelemetry.Api Issues related to OpenTelemetry.Api NuGet package label Oct 2, 2023
@CodeBlanch CodeBlanch marked this pull request as ready for review October 3, 2023 23:27
@CodeBlanch CodeBlanch requested a review from a team October 3, 2023 23:27
{
    if (this.tracers == null)
    {
        // Note: We check here for a race with Dispose and return a
Contributor


I believe we need to set this.tracers = null inside the same lock. Otherwise we could still run into a situation where a thread calling Dispose sets this.tracers to null after this if check but before the new entry is added to the dictionary. We would want to return a no-op tracer in that case, but we would end up returning a valid tracer.

Member Author

@CodeBlanch CodeBlanch Oct 4, 2023


I just checked it a couple times. I think it is good! Could be I'm not seeing something though. Can you write out a flow for me that you think is flawed? Here are a couple flows I'm imagining.

Case where Dispose runs in the middle of the writer and gets the lock...

  • Writer thread reads the this.tracers on Line 58. It is valid so it begins its work.
  • Dispose thread sets this.tracers to null.
  • Dispose thread takes the lock.
  • Writer thread misses the cache and tries to take the lock. It has to wait.
  • Dispose thread finishes its clean up and releases the lock.
  • Writer thread gets the lock. Now it checks this.tracers == null. This will be true now and it will return a no-op instance.

Case where Dispose runs in the middle of the writer and waits on the lock...

  • Writer thread reads the this.tracers on Line 58. It is valid so it begins its work.
  • Writer thread misses the cache and takes the lock. Inside the lock it checks this.tracers == null, which is false. It begins to do its work.
  • Dispose thread sets this.tracers to null.
  • Dispose thread tries to take the lock. It has to wait.
  • Writer thread adds a new tracer to the cache and releases the lock. It doesn't care that this.tracers is now actually null because it is working on a local copy.
  • Dispose thread gets the lock and makes all the tracers in the cache no-ops including the one that was just added.
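The two flows above can be sketched as code. This is a simplified stand-in, not the actual PR diff (CachedTracer and the field names are assumed): GetTracer works on a snapshot of the cache field and re-checks it under the lock, while Dispose swaps the field to null and then sweeps every cached tracer under the same lock, so the tracer added in case 2 still ends up a no-op.

```csharp
using System.Collections.Generic;
using System.Threading;

// Simplified sketch of the pattern discussed above; names and structure
// are assumed, not copied from the PR.
public sealed class CachedTracerProvider
{
    private readonly object lockObject = new();
    private Dictionary<string, CachedTracer>? tracers = new();

    public CachedTracer GetTracer(string name)
    {
        var tracers = this.tracers; // the read the flows call "Line 58"
        if (tracers == null)
        {
            return CachedTracer.Noop; // provider already disposed
        }

        if (!tracers.TryGetValue(name, out var tracer))
        {
            lock (this.lockObject)
            {
                if (this.tracers == null)
                {
                    return CachedTracer.Noop; // Dispose won the race (case 1)
                }

                if (!tracers.TryGetValue(name, out tracer))
                {
                    tracer = new CachedTracer(name);
                    tracers[name] = tracer; // case 2: Dispose still sweeps this
                }
            }
        }

        return tracer;
    }

    public void Dispose()
    {
        var tracers = Interlocked.Exchange(ref this.tracers, null);
        if (tracers == null)
        {
            return; // already disposed
        }

        lock (this.lockObject)
        {
            foreach (var tracer in tracers.Values)
            {
                tracer.MakeNoop(); // the real code also releases the ActivitySource here
            }
        }
    }
}

public sealed class CachedTracer
{
    public static readonly CachedTracer Noop = CreateNoop();

    public CachedTracer(string name) => this.Name = name;

    public string Name { get; }

    public bool IsNoop { get; private set; }

    public void MakeNoop() => this.IsNoop = true;

    private static CachedTracer CreateNoop()
    {
        var tracer = new CachedTracer("noop");
        tracer.MakeNoop();
        return tracer;
    }
}
```

In this sketch, disposing the provider makes every previously returned tracer report IsNoop == true, and any later GetTracer call returns the shared Noop instance.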

Contributor


For case 2,

Writer thread adds a new tracer to the cache and releases the lock. It doesn't care that this.tracers is now actually null because it is working on a local copy.

I think this is more of design choice. Yes, it doesn't care that this.tracers is now actually null but it could care about it 😄.

I was thinking we could offer a stronger guarantee that we would never return a Tracer when TracerProvider is disposed or being disposed. We could avoid this limbo state where the Dispose method may or may not have marked the newly returned Tracer no-op while it's being used.
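The stronger guarantee could look something like this sketch (hypothetical StrictProvider; this is not code proposed in the PR, and plain objects stand in for tracers): Dispose clears the cache field inside the same lock that guards cache reads and writes, so GetTracer can never publish a live tracer once disposal has started. The cost is that every GetTracer call now takes the lock, which is the kind of contention the author wants to avoid.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the stricter design: the cache field is only
// read and cleared under the lock, so Dispose and GetTracer can never
// interleave in the "limbo" way described above. A real version would
// also no-op the swept tracers inside Dispose.
public sealed class StrictProvider
{
    public static readonly object NoopTracer = new();

    private readonly object gate = new();
    private Dictionary<string, object>? tracers = new();

    public object GetTracer(string name)
    {
        lock (this.gate)
        {
            if (this.tracers == null)
            {
                return NoopTracer; // disposed: never hand out a live tracer
            }

            if (!this.tracers.TryGetValue(name, out var tracer))
            {
                this.tracers[name] = tracer = new object();
            }

            return tracer;
        }
    }

    public void Dispose()
    {
        lock (this.gate)
        {
            this.tracers = null; // cleared under the lock, unlike the merged version
        }
    }
}
```

Because the null check and the cache insert happen under the same lock that Dispose takes, there is no window where a live tracer is returned after disposal has begun.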

Member Author


I merged the PR because I think what's there will work well enough. I'll circle back to this comment when I have a sec to see if I can simplify it or clean it up in a way that doesn't introduce a bunch of contention.

Contributor

@utpilla utpilla left a comment


Left a non-blocking comment: #4906 (comment)

@CodeBlanch CodeBlanch merged commit d4d5122 into open-telemetry:main Oct 5, 2023
@CodeBlanch CodeBlanch deleted the api-gettracer-memoryleak branch October 5, 2023 18:41
