Skip to content

Solve panic due to concurrent access to ExportSpans#3058

Merged
tonistiigi merged 1 commit intomoby:masterfrom
gsaraf:fix_ExportSpans_panic
Dec 29, 2022
Merged

Solve panic due to concurrent access to ExportSpans#3058
tonistiigi merged 1 commit intomoby:masterfrom
gsaraf:fix_ExportSpans_panic

Conversation

@gsaraf
Copy link
Copy Markdown
Contributor

@gsaraf gsaraf commented Aug 23, 2022

Background:
After enabling Open Telemetry export to Jaeger using the JAEGER_TRACE env var, started seeing occasional panics across our fleet of buildkitd containers. A helpful pointer from the people at the OpenTelemetry Go repo indicated a possible concurrency problem. Wrapping the ExportSpans with a mutex solved the problem completely.

Issue: #3004

Testing: Before this fix, would get several panics a day. After, have gone for several weeks without any panics.
I've also run the suggested tests: ./hack/test integration gateway dockerfile.

Copy link
Copy Markdown
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is really correct.

If your claim that ExportSpans can only be called on a single thread is correct then this is not the only place where it gets called. It will get called by regular context traces from buildkitd (maybe container forwarding as well).

In that case the only way I would see is that all the exporter implementations https://github.com/moby/buildkit/blob/master/util/tracing/detect/detect.go#L78 would need to be wrapped with something that makes ExportSpans safe to call before this exporter is used to create a traceprovider.

Tbh, if the description is correct I don't really understand the logic of this API design. Exporter is supposed to be input to sdktrace.NewBatchSpanProcessor in the regular flow. But if everything breaks down if there are two spanprocessor/traceprovider instances with the same exporter I don't understand what is the point of defining an exporter interface at all.

@gsaraf
Copy link
Copy Markdown
Contributor Author

gsaraf commented Aug 24, 2022

I've mentioned this thread in the original issue in the opentelemetry SDK, hopefully they will respond.

Without knowing too much about the reason this architecture was chosen, there is a benefit to the exporter interface, as it allows for different implementations of the exporter. This is used even in buildkit, for creating a delegated exporter.

Going through all calls to ExportSpans in the codebase:

  • in util/tracing/detect/delegated/delegated.go the export is guarded by a mutex, though this may have been intended to guard the exporters list. Still safe.
  • in control/control.go I add the mutex in this PR.
  • in cmd/buildkitd/main.go maybe used for uploading traces from the client? I'm not 100% sure. Would potentially need to be guarded as well. I'm happy to add that change.

I see what you mean about detectExporter returning a safe exporter - that sounds fine as well. Do you prefer that change?

@tonistiigi
Copy link
Copy Markdown
Member

https://github.com/moby/buildkit/blob/master/util/tracing/detect/detect.go#L78 will return for example in this case the Jaeger exporter. Then this exporter is called for example in control where you added mutex. But it also goes to the https://github.com/moby/buildkit/blob/master/util/tracing/detect/detect.go#L99 and from there all the local traces will end up in the exporter. Even if the control path has a mutex and SpanProcessor implementation is single-threaded they still don't know about each other at all.

@gsaraf gsaraf force-pushed the fix_ExportSpans_panic branch from 1503aa6 to 8c7c731 Compare December 4, 2022 14:23
@gsaraf gsaraf requested a review from tonistiigi December 4, 2022 14:23
@gsaraf
Copy link
Copy Markdown
Contributor Author

gsaraf commented Dec 4, 2022

Sorry about the delay, I've changed the implementation to create a thread-safe exporter instance and use that instead. Let me know if you'd like me to change anything! :)

@gsaraf
Copy link
Copy Markdown
Contributor Author

gsaraf commented Dec 14, 2022

@tonistiigi - can you re-review? Thanks!

@gsaraf
Copy link
Copy Markdown
Contributor Author

gsaraf commented Dec 29, 2022

Hi @tonistiigi - can you re-review this please? I'd like to stop having to maintaining my fork :)

@tonistiigi
Copy link
Copy Markdown
Member

@gsaraf Could you rebase. GH doesn't show conflict but this code has actually changed in latest version.

Comment thread util/tracing/detect/threadsafe.go Outdated
Comment thread util/tracing/detect/threadsafe.go Outdated
Comment thread util/tracing/detect/threadsafe.go Outdated
@gsaraf gsaraf force-pushed the fix_ExportSpans_panic branch 2 times, most recently from 3cae6c6 to fe649ce Compare December 29, 2022 20:37
@gsaraf gsaraf requested a review from tonistiigi December 29, 2022 20:38
@gsaraf
Copy link
Copy Markdown
Contributor Author

gsaraf commented Dec 29, 2022

Thanks for the review! Rebased and made the requested changes.

Comment thread util/tracing/detect/detect.go Outdated
@gsaraf gsaraf force-pushed the fix_ExportSpans_panic branch from fe649ce to f81f662 Compare December 29, 2022 21:07
Comment thread util/tracing/detect/detect.go Outdated
Signed-off-by: Gahl Saraf <saraf.gahl@gmail.com>
@gsaraf gsaraf force-pushed the fix_ExportSpans_panic branch from f81f662 to afb01a7 Compare December 29, 2022 21:44
@tonistiigi tonistiigi merged commit fff089f into moby:master Dec 29, 2022
@tonistiigi tonistiigi mentioned this pull request Jan 6, 2023
emmanuelguerin pushed a commit to emmanuelguerin/buildkit that referenced this pull request Mar 31, 2025
vendor: github.com/moby/buildkit v0.20.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants