Occasional Segfault with LongCounter instrument #1632
Comments
@Ongy - can you share your code, or say whether you are generating events across multiple threads?
Not sure yet whether I want to open that code. I'll try to do a minimal reproduction sample so it's not a pain to compile/run either way. My side of the code is fully single-threaded. Events will always come from the main task, and outside the otlp code there's nothing that starts threads (or forks).
https://github.com/Ongy/otlp-example exhibits this crash.
@lalitb I reduced the linked example quite a bit. A somewhat expected note: commenting out https://github.com/Ongy/otlp-example/blob/main/conntrack/conntracker.cpp#L76 helps.
Yes, commenting that out would break the pipeline and no metrics would be exported. Your code looks like a pretty simple scenario, but I suspect some issue in sync-storage here. Will debug this shortly. Thanks for reporting the issue with a minimal reproduction.
OK, I am able to reproduce the issue consistently using the shared sample, anywhere after 500K to 2 million executions. I was suspecting sync-storage, but this seems to be different. Also, using std::mutex instead of common::SpinLockMutex doesn't fix the issue. Will investigate further.
Steps to reproduce
I have a setup with multiple metric instruments.
The OpenTelemetry SDK is configured to export through the Prometheus integration, with the view/integration setup largely like the example code.
The crash is nondeterministic and takes between a couple of seconds and over an hour to show up, with a counter called a couple hundred times each second.
I'll try to build a smaller example to exhibit the error as well.
What is the expected behavior?
No crashes.
What is the actual behavior?
Segfault in opentelemetry-cpp code.
Additional context
I've extracted a call stack and some more information:
Stepping through the debugger, I see that the `this` pointer is `0x0` in the `GetOrSetDefault` Aggregate callstack, and then the member `lock_` passed to the guard is at `0x38`, which is still in the zero page. What confuses me, though, is that in the frame one above, I can see that `this->attributes_hashmap_` has a non-zero value.
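
To illustrate why a member address of `0x38` is consistent with a null `this`, here is a minimal sketch. `FakeStorage` and `member_address` are hypothetical names made up for this example, and the layout is an assumption chosen so that `lock_` lands at offset `0x38`; it is not the SDK's real class layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the object whose `this` pointer is null.
// The real class layout is not shown in this issue; this one is chosen
// only so that `lock_` sits 0x38 bytes from the start of the object.
struct FakeStorage
{
  std::uint64_t leading_members[7];  // 7 * 8 = 56 = 0x38 bytes of earlier members
  std::uint32_t lock_;               // stand-in for the lock member
};

// A member's address is the object's address plus the member's offset.
static_assert(offsetof(FakeStorage, lock_) == 0x38, "lock_ is expected at offset 0x38");

// Address a debugger would report for `lock_` given a particular `this` value.
// Pure integer arithmetic, so no null pointer is ever dereferenced here.
std::uintptr_t member_address(std::uintptr_t this_value)
{
  return this_value + offsetof(FakeStorage, lock_);
}
```

With this assumed layout, `member_address(0)` yields `0x38`, which matches the address seen in the debugger: the guard receives `this + offsetof(lock_)`, and taking the lock then touches unmapped memory in the zero page.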