-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Linux by tracking context with ServiceContext #438
Conversation
|
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #438 +/- ##
==========================================
- Coverage 66.33% 66.04% -0.29%
==========================================
Files 331 335 +4
Lines 14104 14379 +275
==========================================
+ Hits 9356 9497 +141
- Misses 4748 4882 +134
☔ View full report in Codecov by Sentry. |
…erviceContext isn't supported
This would be extremely valuable to us and I am happy to help take this PR over the finish line if needed, are there any updates on the status right now? |
@semicoleon |
@@ -111,6 +111,40 @@ public extension Span { | |||
func setAttribute(key: SemanticAttributes, value: Bool) { | |||
return setAttribute(key: key.rawValue, value: AttributeValue.bool(value)) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fundamentally change the context by running actions in a block, meaning context association cannot be made using the current method on iOS. We'd prefer this to be in a separate #ifdef
for linux, to avoid users from mixing these APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow why this wouldn't work on iOS? Could you elaborate a bit?
@@ -130,4 +133,33 @@ public extension SpanBuilder { | |||
@discardableResult func setAttribute(key: String, value: Bool) -> Self { | |||
return setAttribute(key: key, value: AttributeValue.bool(value)) | |||
} | |||
|
|||
/// Builds and starts a span, setting it as active for the duration of the closure. The span is ended when when the closure exits (even if an error is thrown). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as above. Also, it appears these are not part of the OTEL API Spec, and probably shouldn't be added.
@@ -22,7 +22,7 @@ public class StablePeriodicMetricReaderSdk : StableMetricReader { | |||
scheduleTimer = DispatchSource.makeTimerSource(flags: DispatchSource.TimerFlags(), queue: scheduleQueue) | |||
|
|||
scheduleTimer.setEventHandler { [weak self] in | |||
autoreleasepool { | |||
maybeAutoreleasepool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this will effect things. We'd be more comfortable if this was also in an IFDEF
@@ -5,17 +5,17 @@ | |||
|
|||
import Foundation | |||
import OpenTelemetryApi | |||
|
|||
import Atomics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we removed atomics from the framework due to compatibility issues with XCFrameworks iirc.
@@ -15,7 +15,7 @@ class PropagatedSpan: Span { | |||
var context: SpanContext | |||
|
|||
func end() { | |||
OpenTelemetry.instance.contextProvider.removeContextForSpan(self) | |||
_ = OpenTelemetry.instance.contextProvider.tryRemoveContextForSpan(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change the name? this implies it might throw.
@@ -67,18 +70,42 @@ class ActivityContextManager: ContextManager { | |||
return (currentActivityId, activityState) | |||
} | |||
|
|||
func removeContextValue(forKey key: OpenTelemetryContextKeys, value: AnyObject) { | |||
let activityIdent = os_activity_get_identifier(OS_ACTIVITY_CURRENT, nil) | |||
public func removeContextValue(forKey key: OpenTelemetryContextKeys, value: AnyObject) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These shouldn't be public. The context should be managed automatically, this seems like implementation details leaking.
@@ -4,6 +4,7 @@ | |||
*/ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than changing the existing context manager, a separate one should be added for linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the bug this was addressing has been fixed upstream so this change can probably be discarded now
Greetings all, and thank you for an excellent product. We are using this library on both MacOS and Linux; clearly we have a vested interest in this PR. Would it help if we forked semicoleon:linux, and ran our system tests on that code base, or are you too far from having an acceptable MR? Thank you! |
@youngde811 If you were willing to pick up this PR, that would be greatly appreciated. There are some critical issue with this PR that I've enumerated that need to be addressed before we can move forward. |
Unfortunately I don't have time to come back to this at the moment. If someone picks it up, I'll do my best to answer any questions they have though. I don't believe |
Yes, I'd be happy to finish up this work. I could use any detail you might have on the issues stalling this PR (if there is any extra information); what to avoid that would violate project requirements; etc. I'll go back over the comments above and pick up as much knowledge as I can, then we'll go from there. Thanks! |
I’ll take it over, as I wrote just a bit ago. Thanks!
lasciate ogni speranza, voi ch'entrate
…On Wed, Sep 27, 2023 at 2:12 PM, Cole ***@***.***(mailto:On Wed, Sep 27, 2023 at 2:12 PM, Cole <<a href=)> wrote:
Unfortunately I don't have time to come back to this at the moment. If someone picks it up, I'll do my best to answer any questions they have though.
I don't believe #if os(Linux) will be viable for gating API changes. At least for my use case it's also important that the same code builds on macOS, and I imagine that's true for most other prospective users of Linux support in this library. That way why I aimed to move things towards an API that worked with the ServiceContext mode of context (closure based) as well as the os_activity manager.
—
Reply to this email directly, [view it on GitHub](#438 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/ABV3HLFZRT3HNQCKADKNDCLX4RUCTANCNFSM6AAAAAA2NOPLFE).
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
FYI, I have approval from leadership to take on this PR and get it done appropriately. I'll keep folks posted here. Cheers. |
Ok, I see some of what's going on here. So in ActivityContextManager.swift, the code is using a technique akin to a Foreign Function Interface (FFI) to gain access to the os_activity_create native symbol in its shared library. This is perfectly fine, IMHO, and should remain for MacOS users (we're deploying on both MacOS and Linux). I have a suggestion. I should start this work afresh, with an eye towards refactoring whatever is platform-specific (even if the Linux version is a no-op due to no OS support). No modules should be removed (it appears ResourceExtension was removed from the fork, yet our code uses this module as I'm sure many others do; builds on MacOS are unable to find that module, for example). Keep the port as restricted as possible while maintaining MacOS functionality and gaining Linux capability. I'd like to create a branch from main; work there; then submit a PR that respects the project requirements. What say you all? Thanks! |
I've done some research on the Activities subsystem for MacOS (and I guess iOS?), and see that it appears to be used for associating log messages to user events, via a number of library API functions. Within ActivityContextManager, it appears that nothing directly related to Activities is returned to upstream users of this class, at least by examining the opentelemetry-swift code base. I'm wondering if a Linux implementation of ActivityContextManager could use syslog (if installed) on a distribution. Would that be sufficient? It won't be a complementary implementation, as I'm unaware of a Linux library for associating user events to log messages. However, a Linux-based ActivityContextManager that employs syslog would at least produce something useful. Thoughts? |
Oh, and never mind about branching from your main. I decided to just fork this project. I'll make the Linux support changes, then submit a PR. Thoughts on my question regarding ActivityContextManager and syslog are appreciated. Cheers! |
We should move this discussion to another, more appropriate location. Where would that be ideal for everyone? |
The activity system is being used by ActivityContextManager as essentially a unique ID in a thread local. The activity system is integrated into other APIs like libdispatch such that sending work to another thread via a dispatch queue will cause the closure running on the queue to inherit the activity ID. That makes using activity IDs for tracking context really nice because you don't have to manually associate the dispatched closure with the context from the spawning thread. Unfortunately as far as I can tell there's really no equivalent system on Linux, which is why I opted for using a task local via ServiceContext. |
Agreed; I've found no Linux equivalent and I don't want to kludge something together. But that information really helps, as I had just figured why the activity library is being used. Thank you! |
Hi, yes what @semicoleon says is exactly why it is used, it is a bit hacky but it has worked really well in all kind of contexts. The problem with using task local is that it can work for manual instrumentation, but it makes really difficult doing autoinstrumentation or it gets into a nightmare mixing its usage with current os.activity implementation. For a more interactive discussion I think the otel-swift channel in the cncf slack is a good match: https://cloud-native.slack.com/archives/C01NCHR19SB |
Outstanding! Thank you; I’ll take a look. Cheers!
lasciate ogni speranza, voi ch'entrate
…On Thu, Sep 28, 2023 at 3:43 PM, Ignacio Bonafonte ***@***.***(mailto:On Thu, Sep 28, 2023 at 3:43 PM, Ignacio Bonafonte <<a href=)> wrote:
Hi, yes what ***@***.***(https://github.com/semicoleon) says is exactly why it is used, it is a bit hacky but it has worked really well in all kind of contexts. The problem with using task local is that it can work for manual instrumentation, but it makes really difficult doing autoinstrumentation or it gets into a nightmare mixing its usage with current os.activity implementation.
For a more interactive discussion I think the otel-swift channel in the cncf slack is a good match: https://cloud-native.slack.com/archives/C01NCHR19SB
—
Reply to this email directly, [view it on GitHub](#438 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/ABV3HLD7ZXZKTP53IDHNNOLX4XHP5ANCNFSM6AAAAAA2NOPLFE).
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
So, why can’t we just use Swift’s UUID support for these unique “thread local” identifiers? Each time UUID().uuidString is invoked, a new identifier is created. Right? Isn’t that why this project used the os.activity library? Using a UUID would make the code portable… |
A couple of years ago a thread local might have been sufficient for Linux support, but in a post concurrency world it's basically useless. A task can move between threads across await points, so the thread local value will simply be incorrect sometimes. That's why task locals exist. |
So again. Can’t we use Swift UUIDs? They’re unique, can be carried by a task, used as a key, etc. What am I missing? Thanks!
lasciate ogni speranza, voi ch'entrate
…On Thu, Sep 28, 2023 at 7:14 PM, Cole ***@***.***(mailto:On Thu, Sep 28, 2023 at 7:14 PM, Cole <<a href=)> wrote:
A couple of years ago a thread local might have been sufficient for Linux support, but in a post concurrency world it's basically useless.
A task can move between threads across await points, so the thread local value will simply be incorrect sometimes. That's why task locals exist.
—
Reply to this email directly, [view it on GitHub](#438 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/ABV3HLBEPUAEFUMJFJXX6JLX4YAEDANCNFSM6AAAAAA2NOPLFE).
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
How are you storing the UUID such that you can look up the correct UUID for a caller? Using a thread local doesn't really work with concurrency, as I mentioned. Using a task local requires changing the shape of the API as I do in this PR currently. |
I don’t have the code in front of me currently, but doesn’t the context manager use os.activity to get a unique ID and use that as a key into a map? Wouldn’t Swift’s UUID support accomplish the same thing? |
The activity API provides a way for the context manager to request the currently active identifier from the system without having any additional data passed in to the manager. That's the functionality that needs to be replaced, not generation of unique values. |
Hmm. Sounds to me as if we’re talking about the same thing. At the end of the day, we need metadata attached to a “task” that may be used when needed in the context manager. Alright, thanks. I’ll see where this leads. |
Morning folks. Thanks to input from @semicoleon and @nachoBonafonte, I did some research last night and comfortably grok os.activity; how we're employing it; and how we might mimic this for Linux. More prototyping, then I might have a PR for you folks late next week. I appreciate the assistance here; very smart team, and it's a pleasure to work with all of you. Thank you! |
Update: we have opentelemetry-swift running on Linux (ARM64), and should have AMD64 quickly. We decided to reject @semicoleon’s PR as being too extensive; rather, we wrote a small C library that emulates enough of os.activity to provide the needed functionality. Thus far the OTEL package is behaving correctly, on ARM64. The new C library will quickly support AMD64, and that will complete our fork. Ultimately, @semicoleon’s use of Tasks may prove the ideal solution, but at this stage that seems too much for a single PR, and as indicated violates the API. |
@youngde811 any news here? |
Yes, see my comment above from two weeks back. We’ve devised a solution that seems to work well on both arm64 and x86_64, by a) writing a small C library that offers functionality that mimics some of os.activity; and b) refactors some of the context structure in the Swift code. Disruptions to the codebase are minimal. We’re currently exercising the tracing support to ensure it operates correctly. |
Update on our Linux port. We now have a reliable implementation based on task local variables and the Swift withValue() construct. No OTEL API changes were made to existing classes/structs; rather, we have 1) a few necessary extensions in OpenTelemetryApi; and 2) a small wrapper library (not part of opentelemetry-swift) whose API consists of two principal convenience functions: withTracing() and withSpan() (both of which are basically closures). Peer and nested traces work properly; data is sent to Grafana/Tempo and looks correct. We have not yet tested remote spans, but I’ve no reason to believe they won’t work. Over the next several weeks we will be integrating the OTEL port, along with our support library, into one of our primary applications and will thoroughly exercise the port. Dunno if you folks are still interested in this fork; just post here if you are. Cheers. |
Update: Our Linux port runs very well on both Linux itself and MacOS. The approach taken is the use of Swift’s structured concurrency; in particular TaskLocal and withValue(). For our own needs, we also wrote a thin veneer library that offers two functions that abstract the underlying implementation: withTracing() and withSpan(). A bonus is we’ve eliminated the dependency on Apple’s poorly documented and OS specific activity library. We’re also in the process of getting rid of os.log dependencies. Finally, our changes a) do not break the API; b) are self contained and small in size; and c) are not visible to clients of the library. |
Oh, and if you folks would like us to submit a PR for our Linux port, please let me know. Cheers. |
The limitation of that approach is that there is no way to dynamically add instrumentation to existing code or auto-instrument external code, you must explicitly create the traces at the client code directly and cannot add it after. Similar to that approach there is https://github.com/apple/swift-distributed-tracing, which also has a otlp exporter I think. We may want to import their traces as we do with swift-metrics and might be a solution for Linux also. I also have concerns that it may be difficult to write the OpenTelemetry api on top of TaskLocal. Do you have a repository where we can check what changes are you doing and how user code is written? |
Hello @nachoBonafonte! Thanks for the reply. So, several thoughts.
Currently, the API we've written that fronts the OTEL library is in our private GitLab repository. However, since our goal is to contribute to getting your project running on Linux, I'm pretty certain we could make that project available. Until I've cleared that with leadership, here are two functions from our Observability library:
And here's an example of client usage:
Nested spans are supported, of course. So. What do you feel is the best way to proceed here? Some sort of requirements document for current and future multi-platform goals would help us tremendously, if such a thing is available. Let me know. Thanks! Oh, and this is awesome work, BTW. |
@nachoBonafonte By the way. When you wrote “add dynamic instrumentation”, are you speaking of code injection? The way VM level Java agents can? |
I am completing log simplification to eliminate any dependencies on os.log; should be just about tested. Once done, I’ll take a screenshot of our Grafana/Tempo data source after a load test and post it here. |
Closing in favor of #546 |
This PR adds support for Linux, and will close #38. Firstly this involved doing quite a lot of work on the package manifest(s) to get the dependencies that wouldn't build on Linux filtered out (If anyone has suggestions on better ways to structure the platform specific stuff for the Package.swift files I would love to hear them!). Second, I modified the
ContextManager
protocol to support the closure based context APIs required when usingServiceContext
, and created aServiceContextManager
type.ContextManager
s which can support arbitrary start and end locations can implement theManualContextManager
to opt in to that capability.Supporting the old style of context management is probably a requirement, but it's worth noting that it can never work with
ServiceContext
. That makes existence of thoes APIs a bit of a footgun for libraries which don't control which context manager they may run with. Regardless the current shape of the API in this PR is unquestionably a breaking change.LoggerSdk
was previously usingos_log
to notify users of a configuration issue. I changed this toprint
to support Linux, but that doesn't seem ideal. Would some sort of configurable error handler on theOpenTelemetry
instance make sense?I ended up needing to fix #420 to get tests to pass without even more changes.
ActivityContextManager
now ensures thevalue
passed to the remove method has been seen by the context manager before, and if not immediately returns. This requires storing the activity identifier along with the activity state so the manager isn't blindly removing the current activity.