Skip to content

sdk/trace: fix goroutine leak in ForceFlush on context cancel#6363

Closed
sipsma wants to merge 1 commit intoopen-telemetry:mainfrom
sipsma:fix-ctx-cancel-leak
Closed

sdk/trace: fix goroutine leak in ForceFlush on context cancel#6363
sipsma wants to merge 1 commit intoopen-telemetry:mainfrom
sipsma:fix-ctx-cancel-leak

Conversation

@sipsma
Copy link
Copy Markdown

@sipsma sipsma commented Feb 24, 2025

When batchSpanProcessor.ForceFlush was called and the context was canceled, the method's select statement returned the context err right away.

This also meant that the wait chan was no longer being read from, which meant that the goroutine attempting to write to it ended up blocking and leaking forever.

The fix is to just make the wait chan buffered with a size of 1. It can always be written to and will just be gc'd if the ForceFlush method has already exited.

When `batchSpanProcessor.ForceFlush` was called and the context was
canceled, the method's select statement returned the context err right
away.

This also meant that the `wait` chan was no longer being read from,
which meant that the goroutine attempting to write to it ended up
blocking and leaking forever.

The fix is to just make the `wait` chan buffered with a size of 1. It
can always be written to and will just be gc'd if the `ForceFlush`
method has already exited.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented Feb 24, 2025

CLA Signed


The committers listed above are authorized under a signed CLA.

@dmathieu
Copy link
Copy Markdown
Member

Could you add a test?

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.8%. Comparing base (8f4a5c6) to head (b2a07e4).
Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #6363   +/-   ##
=====================================
  Coverage   81.8%   81.8%           
=====================================
  Files        283     283           
  Lines      24900   24900           
=====================================
  Hits       20386   20386           
  Misses      4110    4110           
  Partials     404     404           

see 2 files with indirect coverage changes

@sipsma
Copy link
Copy Markdown
Author

sipsma commented Feb 24, 2025

Could you add a test?

@dmathieu happy to, but not sure yet how to go about it for this case of testing that the goroutine didn't leak. There's always something like dumping current goroutine stack traces, but felt extremely fiddly (if you disagree let me know, can always give it a shot).

The other possibility would be changing the actual code to make it more unit testable for this case; e.g. changing ForceFlush to just call an internal method like forceFlush that accepts arguments like the wait chan and could thus be unit tested. But then that's making the actual code more complicated for the sake of unit testing, which people have mixed feelings on.

Let me know what you think, open to any suggestions.

@dmathieu
Copy link
Copy Markdown
Member

The approach @pellared has taken to fix the leak in the simple batch processor looks like it could be reused here?
#6368

@pellared
Copy link
Copy Markdown
Member

pellared commented Feb 25, 2025

@sipsma, I assigned myself to the issue before you created the PR. See #6360

Closing per #6369. Feel free to review, this is also a contribution.

Thank you for your willingness to contribute.

@pellared pellared closed this Feb 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants