OTEP: Define ResourceProvider #4316

tedsuo · 2024-12-02T20:48:19Z

This OTEP defines a ResourceProvider, to enable the updating of resources that have a lifespan shorter than the lifespan of the application instance.

This OTEP is currently a draft. The primary open question is how to handle active spans that are already running when a resource change is made. Please see the OTEP for details.

oteps/4316-resource-provider.md

jsuereth · 2024-12-03T15:32:04Z

oteps/4316-resource-provider.md

+
+## Open questions
+
+The primary open question – which must be resolved before this OTEP is accepted –


I assume pending acceptance of this OTEP we'd begin experimenting here?

This seems like the biggest point to address ASAP.

Why is this relevant at all? Spans are produced after the activity ends, and the resource snapshot is taken then. Identifying attributes need to remain stable anyway, so changes to descriptive attributes should be known by the backend, and the backend can provide adequate information where required.

We need to prototype extensively before accepting this OTEP. Like all OTEPs, it should not be accepted until we can link to working examples that illustrate how we plan to resolve these concerns.

bidetofevil · 2024-12-03T16:01:10Z

oteps/4316-resource-provider.md

+as search indexes and metric dimensions. For those situations, we only get to pick
+one value.
+
+The simplest implementation is for the BatchProcessor to listen for resource changes,


The naive implementation could lead to a degenerate case, for example on mobile, where something like a network connection quickly oscillates between connected and disconnected. This would effectively nerf batching, cutting a new batch several times a second (which is a big deal on mobile).

Whether this case needs to be handled well, or simply called out so that implementations can protect themselves from it, is something we should address. I don't think we need to solve this well at this point for us to move this forward.
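To make the degenerate case concrete, here is a minimal Go sketch of the naive strategy from the OTEP excerpt above, where the batch processor listens for resource changes and cuts a batch on every change. `Resource`, `NaiveBatcher`, `OnResourceChange`, and `SimulateOscillation` are hypothetical names invented for this illustration, not SDK APIs; `network.connection.type` is used only as an example of a frequently changing attribute.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified stand-in for an SDK resource.
type Resource struct {
	Version int
	Attrs   map[string]string
}

// Batch groups records that are exported under a single Resource.
type Batch struct {
	Resource *Resource
	Records  []string
}

// NaiveBatcher cuts a new batch every time the resource changes.
type NaiveBatcher struct {
	current *Batch
	Flushed []Batch
}

// OnResourceChange is the (hypothetical) listener registered with a ResourceProvider.
// Any non-empty open batch is flushed, and a new batch starts under the new resource.
func (b *NaiveBatcher) OnResourceChange(r *Resource) {
	if b.current != nil && len(b.current.Records) > 0 {
		b.Flushed = append(b.Flushed, *b.current)
	}
	b.current = &Batch{Resource: r}
}

// Add appends a record to the open batch. It assumes OnResourceChange has
// already been called once to open the first batch.
func (b *NaiveBatcher) Add(record string) {
	b.current.Records = append(b.current.Records, record)
}

// SimulateOscillation flips network.connection.type between "wifi" and
// "unavailable" `changes` times, emitting one record after each flip, and
// returns how many batches were flushed.
func SimulateOscillation(changes int) int {
	b := &NaiveBatcher{}
	states := []string{"wifi", "unavailable"}
	for i := 0; i < changes; i++ {
		b.OnResourceChange(&Resource{
			Version: i,
			Attrs:   map[string]string{"network.connection.type": states[i%2]},
		})
		b.Add(fmt.Sprintf("record-%d", i))
	}
	return len(b.Flushed)
}
```

With ten flips, nine single-record batches are flushed and the tenth stays open, so every connectivity change costs one export.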

I think what we should learn from this observation is that descriptive attributes cannot be sent with the data but must be sent through another channel instead. IIRC, there already is a proposal for such a channel.

Also, reaching this point, I no longer like the network example as it seems to show that it should not have been a resource attribute in the first place. I'd rather use service instances migrating between hardware or threads.

So, regarding fluctuations and descriptive attributes: I think we need to encourage (but not force) users not to put descriptive attributes in RESOURCE, but allow them in ENTITY. Additionally, we should prefer NOT to identify the source of telemetry with something that is volatile. These concerns should be addressed in how we model entity types within Semantic Conventions.

We discussed this a bit in person, but effectively, things that MAY be stable in a server-side context MAY NOT be stable (or may be less stable) in a mobile context. So the choice of which "entities" to use may need to be localized to an application/service deployment. OTel needs to be flexible enough to support both. What we should NOT do is prevent valid identities on mobile because they don't work on servers, or vice versa.

Yes, but let's be careful to not overcomplicate things with a lot of philosophy! We run into a lot of trouble and bike shedding when we lose track of the practical usage of this data.

Resources are labels applied to all of the data in a batch. Entities are groups of labels. EntityStates are a timeline of changes with precise timestamps as to when the change occurred.

From that perspective, we can ask: if a resource such as the networking stack thrashes, or even changes once in the middle of an operation, what label (or labels) is most helpful to put on that batch of data? And at what granularity should we be segmenting batches of data? The fine-grained history is in the EntityState stream. Anything in resources is going to be very coarse-grained. So we need to pick a coarse-graining strategy based on how we intend backends to make use of the resource labels as a practical matter, not based on a philosophical theory.

For example, perhaps some entities can be marked as "volatile" in a way that causes them to stack up in the same batch rather than trigger a flush.
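One way to read the "volatile" suggestion, sketched in Go under the assumption that entities carry a hypothetical `Volatile` flag (no such flag exists in the current entity data model): only changes to non-volatile entities cut a batch, while volatile changes accumulate in the open batch. Which resource label that batch finally carries is the open question raised above; this sketch arbitrarily uses last-writer-wins. All type and function names are illustrative.

```go
package main

// Entity is a hypothetical, simplified entity: a type, its attributes, and a
// flag marking it as volatile (expected to change frequently).
type Entity struct {
	Type     string
	Attrs    map[string]string
	Volatile bool
}

// Resource is modeled here as a set of entities keyed by entity type.
type Resource map[string]Entity

// Batch groups records exported under a single Resource label.
type Batch struct {
	Resource Resource
	Records  []string
}

func sameAttrs(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// stableChanged reports whether any non-volatile entity was added, removed,
// or modified between the two resources.
func stableChanged(old, next Resource) bool {
	for t, o := range old {
		if o.Volatile {
			continue
		}
		n, ok := next[t]
		if !ok || n.Volatile || !sameAttrs(o.Attrs, n.Attrs) {
			return true
		}
	}
	for t, n := range next {
		if n.Volatile {
			continue
		}
		if o, ok := old[t]; !ok || o.Volatile {
			return true
		}
	}
	return false
}

// VolatileAwareBatcher flushes only when a non-volatile entity changes.
type VolatileAwareBatcher struct {
	current Batch
	Flushed []Batch
}

func (b *VolatileAwareBatcher) OnResourceChange(next Resource) {
	if b.current.Resource != nil && len(b.current.Records) > 0 && stableChanged(b.current.Resource, next) {
		b.Flushed = append(b.Flushed, b.current)
		b.current = Batch{}
	}
	// Volatile-only changes keep the batch open; the label becomes the latest resource.
	b.current.Resource = next
}

func (b *VolatileAwareBatcher) Add(record string) {
	b.current.Records = append(b.current.Records, record)
}
```

Under this rule, a thrashing network entity no longer defeats batching, but the exported label only reflects the network state at the last change before the flush.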

MrAlias · 2024-12-03T16:28:12Z

oteps/4316-resource-provider.md

+This change should be fully backwards compatible, with one potential exception:
+fingerprinting. It is possible that an analysis tool which accepts OTLP may identify
+individual services by creating an identifier by hashing all of the resource attributes.


Does this mean this will be in a v2 of the SDK given this is backwards incompatible?

Yes I believe it would, especially if all of the details about provider setup are exposed (which they are in v1). In a world where users have a config file and a NewSDK constructor that encapsulates all of these details, I think it would break far fewer users. But it would still be a major version bump, I don't see how it couldn't be.

In general, it seems like switching to entities would change how resources are detected, and I imagine that alone would probably create a breaking change to SDK setup.

We had been going through contortions to try to avoid a version bump.

If we think it's easier to just bump the SDK version, we need to discuss that with SDK maintainers, but it would give us a lot of options in API design.

I think that Go may be in a special situation here.

Breaking the SDK is not great, it would be better if there is overlap and a grace period rather than a hard break.

For reference, this is the upgrade strategy we use, which depends on users being able to push to the latest version of the SDK. We don't want to create a situation where maintainers have to maintain two branches.
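The fingerprinting concern can be illustrated with a hypothetical backend that hashes every resource attribute into an identifier. This Go sketch (the `Fingerprint` and `FingerprintKeys` helpers are invented for illustration and do not describe any particular backend) shows why adding a mutable descriptive attribute such as `session.id` changes such an identifier, while hashing only an agreed set of identifying keys does not.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
)

// Fingerprint hashes all attributes. Keys are sorted so the result does not
// depend on Go's randomized map iteration order, and NUL separators keep
// {"ab": "c"} and {"a": "bc"} from hashing identically.
func Fingerprint(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(attrs[k]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// FingerprintKeys hashes only the listed keys (for example, identifying
// attributes), ignoring everything else on the resource.
func FingerprintKeys(attrs map[string]string, keys []string) string {
	subset := make(map[string]string, len(keys))
	for _, k := range keys {
		if v, ok := attrs[k]; ok {
			subset[k] = v
		}
	}
	return Fingerprint(subset)
}
```

A backend using the all-attributes variant would see a "new service" every time the session rotates, which is the compatibility break discussed in this thread.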

tigrannajaryan · 2024-12-03T16:51:17Z

I support the idea of allowing the Resources to be changed over time. We still need to make sure this complies with our stability guarantees and Resource spec.

We will need to specify how this will interact with the entities. The Entities SIG has a proposal about updating the Resource in a specific way and I want to make sure we don't introduce conflicting ideas that will be impossible to merge in the future.

feldentm-SAP

Overall, I appreciate the document. I left picky comments so that you can follow my perception when reading it top-down without knowing more details.
Things that need to be changed:

- structure and links to other documents need to be improved before releasing the document
- the document should provide a clear context at the top and stay in that context
- we might have to define elsewhere which kinds of changes are expected where
- I struggle with the pseudocode syntax, and the provided signatures seem to be incomplete; maybe adding an explicit void could help
- unless I got something completely wrong, the change is breaking, and we should spend more effort explaining why it is necessary and why it wasn't avoidable

oteps/4316-resource-provider.md

feldentm-SAP · 2024-12-04T11:29:34Z

oteps/4316-resource-provider.md

+
+// Whenever the SessionManager starts a new session
+// it updates the ResourceProvider with a new session id.
+sessionManager.OnChange(


Here, the `OnChange` function is provided by some third-party class, and `SetAttribute` is used to propagate changes. Is this intended?
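One possible reading of the wiring in the OTEP snippet, as a Go sketch: the third-party component only exposes a change callback, and a small piece of application glue forwards each new session id to the provider. `SessionManager`, `AttributeSetter`, `mapProvider`, and `BindSession` are hypothetical; the map-backed provider is single-threaded and merely stands in for a real, thread-safe ResourceProvider.

```go
package main

// SessionManager is a hypothetical third-party component that knows nothing
// about OpenTelemetry; it only notifies registered callbacks.
type SessionManager struct {
	callbacks []func(sessionID string)
}

func (m *SessionManager) OnChange(cb func(sessionID string)) {
	m.callbacks = append(m.callbacks, cb)
}

// StartSession simulates the manager rotating to a new session.
func (m *SessionManager) StartSession(id string) {
	for _, cb := range m.callbacks {
		cb(id)
	}
}

// AttributeSetter is the only capability the glue needs from a provider.
type AttributeSetter interface {
	SetAttribute(key, value string)
}

// mapProvider is a minimal, NOT thread-safe stand-in for a ResourceProvider.
type mapProvider struct {
	attrs map[string]string
}

func (p *mapProvider) SetAttribute(key, value string) {
	p.attrs[key] = value
}

// BindSession forwards every session change to the provider as session.id.
func BindSession(m *SessionManager, p AttributeSetter) {
	m.OnChange(func(id string) {
		p.SetAttribute("session.id", id)
	})
}
```

In this arrangement the session manager has no dependency on OpenTelemetry; whether the glue is meant to live in application code or in an instrumentation package is exactly what the question above asks.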

oteps/4316-resource-provider.md

feldentm-SAP · 2024-12-04T11:32:31Z

oteps/4316-resource-provider.md

+
+## Example Implementation
+
+Pseudocode examples for a possible Validator and ResourceProvider implementation. Attention is placed on making the ResourceProvider thread safe, without introducing any locking or synchronization overhead to `GetResource`, which is the only ResourceProvider method on the hot path for OpenTelemetry instrumentation.


Why is locking relevant here? It seems to be the current focus topic; I agree that it is important, but it isn't the most important topic, right? Using the API and migration should be the most important topics.

feldentm-SAP · 2024-12-04T11:34:04Z

oteps/4316-resource-provider.md

+
+```
+// Example of a thread-safe ResourceProvider
+class ResourceProvider{


I do not really understand the pseudocode syntax. It seems to be Go with classes and some strange extra syntax that I do not get. Also, an explicit `this` is really uncommon and only used in languages that require it due to bad language design.

I'll make this comment again - The syntax/language for pseudo-code isn't important here.

Are you able to understand what the goal of the interface is from the description and the example? If so, let's evaluate that, not choice of pseudo-code syntax.

If you have specific things you don't understand, list them so they can be addressed.

I would add that regardless of the pseudocode someone chooses to use, we request that the examples be 100% explicit and not assume that the reader knows any implicit details about a particular programming language.

feldentm-SAP · 2024-12-04T11:36:05Z

oteps/4316-resource-provider.md

+
+    // calling listeners inside of the lock ensures that the listeners do not fire
+    // out of order or get called simultaneously by multiple threads, but would
+    // also allow a poorly implemented listener to block the ResourceProvider.


Yes, plus the statement above is not really correct since the listener could just enqueue a task in a thread pool. I'm not sure why this part should be specified here. It isn't even required in languages/runtimes that do not have threads or commonly don't use them.

Yes, when it comes to language-specific examples, it's best to look at prototypes, not pseudocode. The pseudocode is helpful to show that there is at least one way to implement the proposed change in a multi-threaded language. It shouldn't be considered the only way or even the best way, but just something to allow us to discuss potential issues with the design that implementations may need to think about.
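As one illustration of the "at least one way" argument in a multi-threaded language, here is a Go sketch (Go 1.19+ for `atomic.Pointer`) in which `GetResource` is a single atomic load, while writers serialize on a mutex and invoke listeners while holding it, mirroring the trade-off in the quoted pseudocode comment. Method names follow the OTEP's wording, but the signatures are assumptions; each change publishes a fresh, immutable copy of the attributes.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Resource is treated as immutable once it has been published.
type Resource struct {
	attrs map[string]string
}

// Get returns the value of an attribute, if present.
func (r *Resource) Get(key string) (string, bool) {
	v, ok := r.attrs[key]
	return v, ok
}

// ResourceProvider: names follow the OTEP, signatures are assumed.
type ResourceProvider struct {
	current   atomic.Pointer[Resource] // read on the hot path without locking
	mu        sync.Mutex               // serializes writers and listener calls
	listeners []func(*Resource)
}

func NewResourceProvider(initial map[string]string) *ResourceProvider {
	p := &ResourceProvider{}
	p.current.Store(&Resource{attrs: copyAttrs(initial)})
	return p
}

func copyAttrs(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// GetResource is the hot-path read: one atomic load, no lock.
func (p *ResourceProvider) GetResource() *Resource {
	return p.current.Load()
}

// OnChange registers a listener that is called after every update.
func (p *ResourceProvider) OnChange(listener func(*Resource)) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.listeners = append(p.listeners, listener)
}

// SetAttribute copies the current attributes, applies the change, publishes a
// new Resource, and notifies listeners while still holding the lock, so they
// fire in order and never concurrently. A listener that calls SetAttribute
// would deadlock, and a slow listener delays other writers (never readers).
func (p *ResourceProvider) SetAttribute(key, value string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	attrs := copyAttrs(p.current.Load().attrs)
	attrs[key] = value
	next := &Resource{attrs: attrs}
	p.current.Store(next)
	for _, listener := range p.listeners {
		listener(next)
	}
}
```

Readers that already loaded the old pointer keep a consistent snapshot. As noted above, a listener could equally hand work off to a goroutine; this is only one of several valid designs.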

tigrannajaryan · 2024-12-05T16:43:19Z

We have this wording in a Stable spec doc:

> When used with distributed tracing, a resource can be associated with the TracerProvider when the TracerProvider is created. That association cannot be changed later.

How do we reconcile this OTEP with that last assertion? One possibility would be that a Resource directly associated with TracerProvider cannot be changed, but a Resource associated with the ResourceProvider can be replaced and thus the indirect association of the Resource with TracerProvider can change. I would want to confirm that this is an allowed modification of spec wording and we are not breaking our stability guarantees.
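A Go sketch of the reconciliation suggested above, under assumed names: the `TracerProvider` is bound to one `ResourceProvider` at construction and has no setter for it, while the `Resource` that provider returns may change. The span records the resource both at start and at end to show the two candidate snapshot points discussed in this thread (at end, as suggested earlier in the conversation, or at start); it does not settle which one an SDK should use, nor whether this reading satisfies the stability guarantees.

```go
package main

import "sync/atomic"

// Resource is a simplified, immutable attribute set.
type Resource struct {
	Attrs map[string]string
}

// ResourceProvider (hypothetical, minimal) holds the current Resource.
type ResourceProvider struct {
	current atomic.Pointer[Resource]
}

func (p *ResourceProvider) Get() *Resource  { return p.current.Load() }
func (p *ResourceProvider) Set(r *Resource) { p.current.Store(r) }

// TracerProvider's association with its ResourceProvider is fixed when it is
// created: there is deliberately no way to swap the provider afterwards.
type TracerProvider struct {
	resources *ResourceProvider
}

func NewTracerProvider(rp *ResourceProvider) *TracerProvider {
	return &TracerProvider{resources: rp}
}

// Span captures the resource at both candidate points for comparison.
type Span struct {
	tp            *TracerProvider
	Name          string
	StartResource *Resource
	EndResource   *Resource
}

func (tp *TracerProvider) StartSpan(name string) *Span {
	return &Span{tp: tp, Name: name, StartResource: tp.resources.Get()}
}

func (s *Span) End() {
	s.EndResource = s.tp.resources.Get()
}
```

If a session rotates while the span is running, the two snapshots disagree, which is the "active spans" question the OTEP leaves open.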

jzwc · 2024-12-07T15:58:14Z

Thanks for the comprehensive proposal.

Let me comment on the whole topic from the narrow perspective of mobile, especially real user monitoring (RUM) and analytics.

In my opinion, the proposal's analysis mixes together two (potentially more) different things: the truly immutable resources (at least on mobile, like the operating system, its version, hardware, etc.) and the mutable states of the device (connection QoS) and of the application (background/foreground, session, user, and their attributes).

My opinion is that just making resources mutable does not solve the apparent inadequacy of the open-telemetry model for the RUM and analytics realm.

The mutable state requires different handling than resources, on several levels. Not only does the state mutate, but it can also be different for different concurrently running bits of code. E.g., a background thread indeed is subject to the same mutable device state (online/offline), yet it can still run in a context of some state attributes in which it was spawned (e.g., session), although the respective app state on the main thread has already changed.

Thus, there seem to be at least three different “resources” realms:

- the classic immutable resources as defined in OTel (with potential mutability proposed here)
- mutable global state that applies to all running code of the application
- mutable state that is typically global but can define different (typically temporary) local context for concurrently running code

Thus it would be, in my opinion, beneficial to narrow the “let us make resources mutable“ proposal to the use cases where the seemingly immutable (by current definition) resources may mutate.

As for other use cases of mutable device and application state, a dedicated solution would serve them better. The dedicated channel for sending mutable state attributes, mentioned in a comment, seems like a sound solution to me. The local "sticky" state attributes could perhaps be provided in the affected signal's attributes to override the global state values.
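A minimal Go sketch of the "sticky" override idea, assuming state is a flat string map: sticky keys are snapshotted when concurrent work is spawned, and that snapshot overrides the global state when the signal's attributes are computed. `Snapshot` and `Effective` are hypothetical helpers, not OpenTelemetry APIs.

```go
package main

// Snapshot copies the listed "sticky" keys out of the global state at the
// moment concurrent work is spawned (for example, the session it belongs to).
func Snapshot(global map[string]string, sticky ...string) map[string]string {
	local := make(map[string]string, len(sticky))
	for _, k := range sticky {
		if v, ok := global[k]; ok {
			local[k] = v
		}
	}
	return local
}

// Effective merges current global state with a local snapshot; local values
// win, so the spawned work keeps its original context while still picking up
// global changes (such as connectivity) for non-sticky keys.
func Effective(global, local map[string]string) map[string]string {
	out := make(map[string]string, len(global)+len(local))
	for k, v := range global {
		out[k] = v
	}
	for k, v := range local {
		out[k] = v
	}
	return out
}
```

This matches the example above: a background thread keeps the session it was spawned in, but sees the device going offline.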

jsuereth · 2024-12-09T13:23:01Z

Responding to @jzwc

> Thus, there seem to be at least three different “resources” realms:
>
> - the classic immutable resources as defined in OTel (with potential mutability proposed here)
> - mutable global state that applies to all running code of the application
> - mutable state that is typically global but can define different (typically temporary) local context for concurrently running code

I'm not sure if you've seen the OTEP where we split apart Resource, but imagine Resource now as a composable set of Entities (bundles). Resource represents the context in which telemetry was generated, but I think the thing we learned from Mobile is that the context is not the SDK itself but something more dynamic, and we're trying to respond to that here. IMO - This proposal is about your (3) - mutable state that is typically global but can define different (typically temporary) local context for concurrently running code.

Imagine that the resource is composed of a set of bundles (entities). 90% of the entities are static for the lifetime of the SDK, e.g. OS, hardware, etc. A few change (like session). With this proposal and the previous OTEP, you'd just be swapping out the entities that change wholesale (e.g. here's a new session: replace the old one and all information about it wholesale).

> As for other use-cases of mutable device and applications state, a dedicated solution would serve them better.

This is also a thing with the Entities work, there would be a special "Entities" channel that could fire out state change events.

TL;DR; I agree on the surface that if you just viewed this proposal purely from resource perspective, I think we'd wind up mixing concerns as you suggest. If you layer in the Entities work, I think this should give us the tools we need to solve our problems.

tedsuo · 2024-12-12T05:10:47Z

Ok, big update! I've added Entities to the design. It might even make sense to rename this an EntityProvider instead of a ResourceProvider.

Regardless of the name, I'm positive that this design is not correct, as I'm not even sure which Entity document to be referring to at this point (I used OTEP 0256). But as Cunningham's Law states, "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."

Please let me know what is missing, I would love to pair with one of the entity sponsors to rewrite this and add further details.

tedsuo · 2024-12-18T22:44:38Z

Latest update:

- Added an `AddEntity` method to ResourceProvider
- Clarified that the resource object should be regenerated entirely when an entity changes
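A Go sketch of the two updates above, under assumed names (`EntityProvider`, `AddEntity`, and `GetResource` here are illustrative and simplified): adding an entity replaces any existing entity of the same type wholesale, and the resource is rebuilt from scratch afterwards rather than patched, so descriptive attributes of a replaced entity cannot linger. The merge order on key collisions (sorted by entity type, later wins) is an arbitrary choice for this sketch.

```go
package main

import "sort"

// Entity is a simplified entity with identifying and descriptive attributes.
type Entity struct {
	Type        string
	ID          map[string]string
	Description map[string]string
}

// Resource is the flattened attribute set derived from all entities.
type Resource struct {
	Attrs map[string]string
}

// EntityProvider keeps at most one entity per type.
type EntityProvider struct {
	entities map[string]Entity
	resource Resource
}

func NewEntityProvider() *EntityProvider {
	p := &EntityProvider{entities: map[string]Entity{}}
	p.resource = p.regenerate()
	return p
}

// AddEntity inserts the entity or replaces the existing entity of the same
// type wholesale, then rebuilds the resource from scratch.
func (p *EntityProvider) AddEntity(e Entity) {
	p.entities[e.Type] = e
	p.resource = p.regenerate()
}

func (p *EntityProvider) GetResource() Resource {
	return p.resource
}

// regenerate flattens all entities into a brand-new attribute map.
func (p *EntityProvider) regenerate() Resource {
	types := make([]string, 0, len(p.entities))
	for t := range p.entities {
		types = append(types, t)
	}
	sort.Strings(types) // deterministic merge order
	attrs := map[string]string{}
	for _, t := range types {
		e := p.entities[t]
		for k, v := range e.ID {
			attrs[k] = v
		}
		for k, v := range e.Description {
			attrs[k] = v
		}
	}
	return Resource{Attrs: attrs}
}
```

Patching the previous attribute map in place would have left `network.carrier.name` behind after switching from cellular to Wi-Fi; regenerating avoids that class of bug.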

The next big bugbear I would like to look at is how to handle the potential for thrashing resources, such as the networking stack on mobile rapidly changing.

One possible solution is to batch changes in a time window. Rather than having separate individual callbacks for EntityState or EntityDelete, there would be a single callback that receives a list of entity events along with the final computed resource. Adjusting the time window length makes a tradeoff between the maximum number of batch flushes that can be triggered and the accuracy of the resources on any particular batch. I don't know if that is a good solution. @bidetofevil what do you actually want here?

tedsuo added 2 commits December 2, 2024 12:43

OTEP: Define ResourceProvider

a830655

add pr number to filename

c8d5301

jsuereth reviewed Dec 3, 2024


bidetofevil reviewed Dec 3, 2024


MrAlias reviewed Dec 3, 2024


tedsuo added the OTEP OpenTelemetry Enhancement Proposal (OTEP) label Dec 3, 2024

feldentm-SAP suggested changes Dec 4, 2024


tigrannajaryan mentioned this pull request Dec 5, 2024

Adding resource attributes post-creation (e.g. via auto-discovery) #1298

Open

tedsuo added 6 commits December 11, 2024 16:54

formatting

a3cb621

Add entities to ResourceProvider

2fd655c

clarify locking

a1d4461

updated explanation

844ffa8

update example

cc9877f

reformat ResourceProvider description

3cd7641

tedsuo added 4 commits December 18, 2024 14:17

resource must be entirely regenerated

7e24b05

add Add Entity method

6e1d24d

remove example usage for now

3d01a5f

move entitystate creation out of callback loop

47d549e

service.id -> service.instance.id

b134f1b

MSNev mentioned this pull request Dec 20, 2024

Update enduser domain and add enduser.authentication.id open-telemetry/semantic-conventions#1456

Open
