-
Notifications
You must be signed in to change notification settings - Fork 975
OTEP: Context-scoped Attributes #4931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
4fd5f24
f7418da
ec73f97
853a883
4654c47
dee0a52
981f030
56321c4
fa7bd4c
5baf3bc
12a7167
efc02b4
bfa000e
6049b1c
0f6d3f6
799cec0
8ed7810
2cbcbbd
2b72887
c6cf28b
003f797
a53dd0d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,325 @@ | ||
| # Context-scoped attributes | ||
|
|
||
| Add Context-scoped telemetry attributes which typically apply to all signals | ||
| associated with a request as it crosses a single service. | ||
|
|
||
| Original author: [Christian Neumüller](https://github.com/Oberon00). | ||
|
|
||
| ## Motivation | ||
|
|
||
| This OTEP aims to address various related demands that have been brought up | ||
| in the past, where the scope (and lifetime) of resource attributes is too broad, | ||
| but the scope of span attributes is too narrow. For example, this happens where there is a | ||
| mismatch between the OpenTelemetry SDK’s (and thus TracerProvider’s, MeterProvider’s, LoggerProvider's) | ||
| process-wide initialization and the semantic scope of a (sub)service. | ||
|
|
||
| A typical use case is supporting multi tenancy, with tenant information existing | ||
| in the request itself (e.g. header key, access token, request parameter). This tenant information | ||
| could be then propagated via Context-scoped attributes. **Any** telemetry produced during the processing | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does it include profiles in the future?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I hope so! In Golang I'm aware of built-in runtime libraries that are meant for this sort of thing, like https://pkg.go.dev/runtime/trace#WithRegion. I think the ideal outcome would be that OTel profiling tools recognize OTel context-scoped attributes like this. |
||
| of the request would then be automatically associated with the respective request/tenant. | ||
| Other alternatives would be problematic: | ||
|
|
||
| * Having an SDK instance per tenant (in order to report tenant-specific information) can be prohibitive | ||
| as there could be hundreds or thousands of different tenants, and the processing/exporting pipeline | ||
| would be duplicated N-times. | ||
| * Custom processors for attaching tenant-specific information would work for spans and logs. However, | ||
| at the time of writing this OTEP there is no specified processor functionality for metrics. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This seems like a weak argument right?. Spec could be adding processor concept. Or Views could be modified to support pulling attributes from context. (Such capability existed in Metrics Views originally, but was removed for scope control)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes and no - the related effort and discussion took a whole year and it wasn't accepted/merged. I don't know about the Views path but it could take long time before it's actually discussed/accepted (not to mention stabilized).
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It does seem like the large argument here over automatic inclusion of metrics merits we take a look at it. IIRC - the original context-scoped attributes failed because of scope creep, so I understand wanting to keep this narrow. Maybe we can just focus the discussion on "What's the right mechanism for context scoped attributes to interact with metrics" - and be a little flexible in thinking for that one. |
||
| * In general, OpenTelemetry should offer out-of-the-box common functionality that is extensively | ||
| used, instead of asking users to write custom components recurrently. | ||
|
|
||
| A related usecase is posed in the issue | ||
| [open-telemetry/opentelemetry-specification#335](https://github.com/open-telemetry/opentelemetry-specification/issues/335) | ||
| “Consider adding a resource (semantic convention) to distinguish | ||
| HTTP applications (http.app attribute)”. If the opentelemetry-java agent | ||
| is injected in a JVM running a classical Java Application Server (such as tomcat), | ||
| there will be a single resource for the whole JVM. The application server can | ||
| host multiple independent applications. There is currently no way to distinguish | ||
| between these, other than with the generic HTTP attributes like | ||
|
jack-berg marked this conversation as resolved.
|
||
| `http.route` or `url.template`. | ||
| Logically, the app attribute would apply to all spans within the trace | ||
| as it crosses the app server, as shown in the diagram below: | ||
|
|
||
|  | ||
|
|
||
| This example shows two traces, with two "HTTP GET" root spans, one originating | ||
| from service `frontend` and another from `user-verification-batchjob`. | ||
| Each of these HTTP GET spans calls into a third service `my-teams-tomcat`. | ||
| That service hosts two distinct HTTP subservices, and as each of the traces crosses it, | ||
| all the spans have a respective `http.app` associated with it. When the `my-teams-tomcat` service | ||
| makes a call to another service `authservice`, that attribute does *not* apply | ||
| to the remote child spans. | ||
|
|
||
| A similar problem occurs with `faas.name` and `cloud.resource_id` | ||
| ([Function as a Service Resource semantic conventions](https://opentelemetry.io/docs/specs/semconv/resource/faas/)) | ||
| on Azure Functions. While both are defined as Resource attributes, an Azure | ||
| function app hosts multiple co-deployed functions with different names and IDs. | ||
| In PR [open-telemetry/opentelemetry-specification#2502](https://github.com/open-telemetry/opentelemetry-specification/pull/2502), | ||
| this was solved by allowing to set these Resource attributes on the FaaS | ||
| root span instead. This has the drawback that the context of which function a span | ||
| is executed in is lost in any child spans that may occur within the function (app). | ||
|
|
||
| ## Explanation | ||
|
|
||
| The Context-scoped attributes allows you to attach attributes to all telemetry | ||
| signals emitted within a Context. Context-scoped attributes are standard | ||
| attributes, which means you can use strings, integers, floating point numbers, | ||
| booleans, arrays, or complex types, just like for any telemetry item. | ||
|
|
||
| When Context-scoped attributes are enabled, they MUST be added to telemetry items | ||
| that were **emitted** while the Context containing the Context-scoped attributes | ||
| was active, or to telemetry items that had this Context explicitly set as parent. | ||
|
|
||
| Context-scoped attributes should be thought of equivalent to adding the attribute | ||
| directly to each single telemetry item it applies to. These attributes will then be | ||
| automatically available to **any** component down the pipeline, e.g. | ||
| to samplers and processors. | ||
|
|
||
| Context-scoped attributes MUST NOT be propagated cross-service, i.e. no context propagator | ||
| must be implemented for them. This is because they are meant to annotate (a subset of) | ||
| the telemetry items related of a single service (see also [the next section](#comp-baggage)). | ||
|
|
||
| Instrumentation libraries desiring to set Context-scoped attributes MUST offer this as an opt-in | ||
| behavior, which will be disabled by default (i.e. libraries will not set Context-scoped attributes). | ||
| Comprehensive documentation on the used attributes SHOULD be provided to end users. | ||
| This SHOULD include details on whether these attributes have high or low cardinality values. | ||
|
|
||
| SDKs MUST disable Context-scoped attributes support by default, in order to reduce the possibility | ||
| of unexpected side effects, as well as any performance hit due to additional allocations. The user | ||
| MUST explicitly enable this functionality on a per-signal basis: | ||
|
|
||
| ```yaml | ||
| tracer_provider: | ||
| context_scoped_attributes: true | ||
| logger_provider: | ||
| context_scoped_attributes: true | ||
| ``` | ||
|
|
||
| See [Open questions](#open-questions) on details on details regarding the implementation. | ||
|
|
||
| <a name="comp-baggage"></a> | ||
|
|
||
| ### Comparing Context-scoped attributes to Baggage | ||
|
|
||
| Context-scoped attributes and | ||
| [baggage](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/baggage/api.md) | ||
| have two major commonalities: | ||
|
|
||
| * Both are "bags of attributes" | ||
| * Both are propagated identically in-process (Baggage also lives on the Context). | ||
|
|
||
| However, the use cases are very different: Baggage is meant to transport app-level | ||
| data along the path of execution, while Context-scoped attributes are annotations | ||
| for telemetry in the same execution path. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. app-level data vs telemetry annotations - not sure of the differences between them.. In my view, the only difference between Baggage is that Baggage is out-of-proc+in-proc, whereas Context-scope Attributes is in-proc only. What kind of data users put inside it - totally upto them. They can put data to affect control flow, telemetry attributes, feature-flags etc. OTel don't put any restrictions.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Baggage only accepts strings. Also, if we were to go any Baggage way, we should offer automatic attachment of Baggage values at the SDK level, like the one defined by this OTEP.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would still love some kind of configuration or capability (can be outside the scope of this OTEP) where I can say "these baggage attributes should automatically promote to be context-scoped as well". I do believe we should NOT add bagggage to all signals "by default" but force users to opt-in to things they wish to trust. This is more a 'secure by design' philosophy of requiring users to opt-in to things from "untrusted" (i.e. not within the app) sources. |
||
|
|
||
| This also explains the functional & API differences: | ||
|
|
||
| * Most important point: Baggage is meant to be propagated across services; | ||
| in fact, this is the main use case of baggage. While a propagator for | ||
| Context-scoped attributes would be implementable, it would break the intended | ||
| meaning by extending the scope of attributes to called services and would also | ||
| raise many security concerns. | ||
| * Baggage only supports string values. | ||
| * **Related** Baggage (implicitly or explicitly linked to telemetry items) MAY not be available | ||
| to telemetry exporters (although e.g., a SpanProcessor could be used to change that), | ||
| while that's the whole purpose of Context-scoped attributes. | ||
|
|
||
| ### Comparing Context-scoped attributes to Instrumentation Scope Attributes | ||
|
|
||
| Context-scoped attributes and | ||
| [Instrumentation Scope attributes](https://github.com/open-telemetry/oteps/pull/201) | ||
| have these commonalities: | ||
|
|
||
| * Both have "scope" in the name. | ||
| * Both apply a set of telemetry attributes to a set of telemetry items | ||
| (those in their "scope"). | ||
|
|
||
| While Instrumentation Scope attributes apply to all items emitted by the same | ||
|
cijothomas marked this conversation as resolved.
|
||
| Tracer, Meter or LogEmitter (i.e., typically the same telemetry-implementing unit | ||
| of code), the scope of Context-scoped attributes is determined at runtime by the | ||
| Context, independently of the used Tracer, Meter or LogEmitter. | ||
|
|
||
| Moreover, Tracer, Meter and Logger instances typically have the same | ||
| life span as the application itself, whereas Context-scoped attributes | ||
| lifespan is tied to the Context itself. For example, a Context solely created | ||
| for an incoming request will stop existing once such request is done. | ||
|
|
||
| In practice, Instrumentation Scope and Context-scoped attributes will | ||
| often be completely orthogonal: | ||
| When the Context is flows in-process through a single | ||
| service, there will often be exactly one span per Tracer (e.g. one span from | ||
| the HTTP server instrumentation as the request arrives, and another one from | ||
| a HTTP client instrumentation, as a downstream service is called). | ||
|
|
||
| ## Internal details | ||
|
|
||
| This section explains the API and SDK level changes that are expected to support this. | ||
|
|
||
| ### API changes | ||
|
|
||
| The API would be extended with two operations to get and set Context-scoped | ||
| attributes: | ||
|
|
||
| * **Set Context Attributes**, e.g. `Context SetContextScopedAttributes(Context context, Attributes attributes)` | ||
| or `Context.setAttributes()`. | ||
| * **Get Context Attributes**, e.g. `Attributes GetContextScopedAttributes(Context context)` or | ||
| `Context.getAttributes()`. | ||
|
|
||
| `Set Context Attributes` takes a set of attributes and returns a new Context that contains | ||
| the specified attributes. | ||
|
|
||
| `Get Context Attributes` returns the context-scoped Attributes, or an empty set if none exists. | ||
|
|
||
| ### SDK changes | ||
|
|
||
| Upon telemetry items creation (e.g. Span, LogRecord), the SDK MUST get | ||
| the Context-scoped attributes of the logically associated Context, and | ||
| add them to the newly created telemetry item, skipping attributes with | ||
| keys already present in the telemetry item. This MUST be done before any | ||
| extension point is invoked, e.g. Sampler or Processor. | ||
|
|
||
| This will be an implementation-level change without any changes in the API-surface | ||
| of the SDK (i.e. it is not necessary to make Context-scoped attributes distinguishable | ||
| from “direct” telemetry attributes). | ||
|
|
||
| The snippet below shows what the expected behavior looks like: | ||
|
|
||
| ```java | ||
| public Span startSpan(...) { | ||
| // Implicit (currently active) or explicit parent. | ||
| Context context = getParentContext(); | ||
|
|
||
| // Attributes specified by the user upon creation time. | ||
| Attributes initialAttributes = getInitialAttributes(); | ||
|
|
||
| // Merge the Context attributes, which have lesser priority | ||
| // than the attributes explicitly specified at telemetry item | ||
| // creation. | ||
| getContextScopedAttributes(context).forEach((key, value) -> { | ||
| if (!initialAttributes.containsKey(key)) { | ||
| initialAttributes.put(key, value); | ||
| } | ||
| }); | ||
|
|
||
| // Pass the updated attributes to the sampling layer. | ||
| SamplingDecision samplingDecision = sampler.ShouldSample( | ||
| context, | ||
| initialAttributes, ...); | ||
|
|
||
| return new Span(initialAttributes, ...); | ||
| } | ||
| ``` | ||
|
|
||
| ## Trade-offs and mitigations | ||
|
|
||
| None at this moment. | ||
|
|
||
| ## Prior art and alternatives | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would be useful to mention similar concept that exists in logs (MDC in Java, .NET ILogger scopes) since they provide examples of API and we can also learn from years of usage and patterns people came up with. |
||
|
|
||
| ### AddBeforeEnd callback and AddFieldToTrace | ||
|
|
||
| There is the (slightly related) issue | ||
| [open-telemetry/opentelemetry-specification#1089](https://github.com/open-telemetry/opentelemetry-specification/issues/1089) | ||
| “Add BeforeEnd to have a callback where the span is still writeable” which seems | ||
| to be motivated by essentially the same problem, though the motivation is stated | ||
| more on a technical level: “What I tried to implement is a feature called trace field | ||
| in Honeycomb (reference: [AddFieldToTrace](https://godoc.org/github.com/honeycombio/beeline-go#AddFieldToTrace))”. | ||
| The aforementioned feature's documentation reads: "AddFieldToTrace adds the field to both the currently | ||
| active span and all other spans involved in this trace that occur within this process.". | ||
|
|
||
| ### Alternative: An implementation for traces on the Backend | ||
|
|
||
| A receiver of trace data (and only trace data! i.e. no other signals) has | ||
| information about the execution flow through the parent-child relationship of | ||
| spans. It could use that to propagate/copy attributes from the parent span to | ||
| children after receiving them. However, there are several problems/differences | ||
| in what is possible: | ||
|
|
||
| * The backend needs to determine which attributes only apply to a particular span, | ||
| and which apply to a whole subtrace. In absence of a dedicated API that allows | ||
| instrumentation libraries to add this information, it can only use pre-determined | ||
| rules (e.g. fixed list of attribute keys) for that. | ||
| * Spans are only sent (by usual SpanProcessor+Exporter pipelines) when they are ended, | ||
| meaning that the parent will usually arrive after the children. This complicates | ||
| processing, and can make additional buffering and/or reprocessing (e.g. repeated | ||
| on-read processing) necessary. | ||
| * To only propagate the attributes within a service, | ||
| [open-telemetry/oteps#182](https://github.com/open-telemetry/oteps/pull/182) | ||
| (sending the `IsRemote` property of the parent span Context in the OTLP protocol) | ||
| would be required. This would also require using OTLP, or another protocol | ||
| (is there any?) that transports this information. Also, the OTLP exporters & | ||
| language SDKs used in the participating services need to actually implement | ||
| the feature. In absence of that, the span kind (server/consumer as child of | ||
| client/producer) would be the only heuristic to try working around this. | ||
|
|
||
| Additionally, Context-scoped attributes would have the advantage of **being usable | ||
| for metrics and logs** as well. The backend could alternatively try to re-integrate | ||
| attributes from matched spans, if it's a single backend for all signals, but this | ||
| would probably become even more costly to implement. | ||
|
|
||
| ### Alternative: Implementing Context-scoped attributes on top of Baggage | ||
|
|
||
| See also: | ||
|
|
||
| * [Related GitHub discussion](https://github.com/open-telemetry/oteps/pull/207#pullrequestreview-1055913542) | ||
| * Section [Comparing Context-scoped attributes to Baggage](#comp-baggage) above | ||
|
|
||
| Instead of using a dedicated Context key and APIs, the Baggage APIs and | ||
| implementation could be extended to support the use cases Context-scoped | ||
| attributes are meant to solve. This would require: | ||
|
|
||
| * Having metadata on the Baggage entry to decide if it should be a "real Baggage" | ||
| or a telemetry attribute. This would decide whether the Baggage entry is added | ||
| to telemetry items, and whether it is (not) propagated. These could be separate | ||
| flags, potentially enabling more use cases. | ||
| * Changing Baggage entries to hold an | ||
| [Attribute](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.12.0/specification/common/README.md#attribute) | ||
| instead of a string as value. | ||
| * Defining and implementing an association between Baggage and telemetry items | ||
| (this point would be more or less the same as what is required for the proposed | ||
| OTEP, e.g. a span processor could be added that enumerates the baggage, filters | ||
| for attributes). | ||
|
|
||
| Using Baggage as a vehicle for implementing Context-scoped attributes seems like | ||
| a valid approach, the decision could be left up to the language. Whether this | ||
| is sensible could depend, for example, on future features that Baggage might | ||
| gain anyway (for example, a flag to stop propagation might be useful for | ||
| interoperation with OpenCensus), and on how much boilerplate code is required in | ||
| the language for implementing a new API vs. extending an existing one. | ||
|
|
||
| ### Alternative: Implementing Context-scoped attributes using a built-in Processor | ||
|
|
||
| An SDK built-in processor could be added in order to stamp telemetry items | ||
| with the context-scoped attribute. It needs specifically to be built-in in order to have access | ||
| to the internal key for getting the context-scoped attributes from `Context`. There are some issues | ||
| with this approach though: | ||
|
|
||
| * Samplers have their `ShouldSample` operation invoked _before_ a processor | ||
| can get their `OnStart` call. Thus we would need to add a `BeforeShouldSample` or `BeforeStart` | ||
| operations to the processor interface, which seems redundant. | ||
| * Processors would need to be configured per-signal, which is both redundant but could also | ||
| offer more granularity. | ||
|
|
||
| ## Open questions | ||
|
|
||
| * Should `Baggage` be included as part of this new process? The need to attach | ||
| `Baggage` values to individual telemetry items has been a request from users | ||
| in the past. This is even considered as part of one of the alternatives that | ||
| were considered. | ||
| * Configuration could be defined, at least initially, as a single boolean setting for EACH | ||
| signal, similar to how the entire SDK is enabled or disabled via the `disabled` configuration | ||
| option, e.g. enable this for traces and logs but disable it for metrics. | ||
| However, this OTEP leaves the topic open in case further options are needed, e.g. allow list or | ||
| deny list specific attributes on a per-signal basis. | ||
| * Support for profiling should be (eventually) added, but this has to be discussed | ||
| and confirmed with the profiling SIG. Potentially, this support could be disabled by default | ||
| if needed. | ||
|
|
||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should instrumentation libs ever set context scoped attributes? I can find use-cases for it (e.g. when doing complex logical operations, it's useful to stamp parent operation name on all transport requests under it), but it means that a single bad instrumentation can create a lot of noise and cardinality issues on metrics. If we never want instrumentations to use the API, then should it be SDK-only API?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we shouldn't allow instrumentation to set attributes in Context, at least initially. We could come up with guidelines on when do that and when it's a bad idea - this will also affect or depend on the defaults we decide per-signal. I have massaged the OTEP to say instrumentation SHOULD NOT use this feature by default, while adding this as an open question.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can understand us taking our time before having it by default, but I do think instrumentation should be able to do it - but always flag guarded. @carlosalberto Would it make sense to delegate that open question to some of our folks running new instrumentatoin libraries in Semconv? E.g. @lmolkova, @trask could represent HTTP or GenAI and figure out how they might interact with this feature. Similarly for other SIGs. |
||
| ## Future possibilities | ||
|
|
||
| With the changes implemented in this OTEP, we will hopefully unblock some | ||
| long-standing specification issues. See [Motivation](#motivation). | ||
|
|
||
| * Instrumentation libraries should not use this feature, i.e. use `Set Context Attributes`, | ||
| but there may be cases where this would be beneficial to the telemetry output. We should | ||
| write documentation and guidelines exploring the side effects and how they play along | ||
| the enabled/disabled defaults for different signals. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: this could be changed to a mermaid diagram |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think context-scoped attributes should not be added as metric attributes by default, given the impact of new attributes on cardinality. I would prefer if the default behavior was for the metrics SDK to treat them the same as default-disabled attributes provided to the Metrics API:
There are definitely cases where a user will specifically want the contextual attribute stored as a metric attribute by default. I would prefer if this was an optional parameter that could be passed when storing the attribute in context.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could add this with a flags attribute as a metric-only feature (as spans nor logs have such opt-in functionality, and hence a simple boolean wouldn't be so descriptive):
Is this something close to what you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, more or less. In Go I was thinking it would just use the option pattern (like most of our parameters do):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key question for me is about configuration: just how granular / expressive should the configuration tools be. More granular mean more verbose, but a tool capable of solving more problems. Less granular means easier to configure, but less capable.
The PR currently proposes the least granular option:
On the other end of the spectrum, we might have something like:
The upside is that there's a built in mechanism to account for metrics cardinality issues, and any other expressiveness use cases that arise. The downside is that the common case of "enable all context scoped attributes on all signals" is more verbose:
There's also a middle ground of providing a top level and signal specific config option:
Given that declarative config already makes you repeat things like OTLP exporter configuration for each signal, I lean towards the middle ground approach. I would bet that omitting the additional granularity / expressive now would be temporary and we would end up adding it later anyway. The middle ground strikes a nice balance, allowing for relatively terse config for the common case while still accommodating signal-specific config use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dashpole I am not sure metrics should be special-cased by default here.
The main value of this OTEP is a simple mental model: if an attribute is added to the active Context, it applies to telemetry produced in that Context.
Once metrics become the only exception, that model gets harder to explain and easier to misuse. Users will see the attribute on spans and logs and reasonably expect it to also be present on metrics from the same operation.
I agree metric cardinality is a real concern, but I don't think we should hard-code that concern into the default behavior. Cardinality is a consequence of which attributes the user chooses to put into context. That is already something users need to think about when they decide to use this API, just like they do today when adding metric attributes directly.
In other words, I think the right split is:
That keeps the feature understandable while still giving metrics users the guardrails they need.
Also, for the motivating examples in this PR,
http.appis exactly the kind of attribute that seems useful on metrics as well.Besides, this seems like the more extensible design. If we special-case metrics now, then any future signal with similar cost/cardinality concerns may need another exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dashpole
I am not against disabling some signals when adding attributes to the context. I support offering the option to disable signals.
The semantic model I mentioned is just a way to think about the default behavior.
Yes
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'm OK with the default being to add it to all signals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems quite risky to me. Metrics is designed for high-performance and by changing the default behavior to pull in attributes from Context, the performance will be significantly affected. (.NET and Rust implementation relies on incoming read-only-slice, and if we need to add attributes from context by default, then it forces constructing a new read-only-slice and copying incoming one to that, This will kill the performance and also trigger heap allocation likely).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cijothomas Theoretically this was always something you needed to do in metrics, because we prepared to attach baggage labels in the initial implementation. You also need context for trace-id in exemplars.
I hear the concern, but I think we need to find clever ways to solve this - e.g. creating an attributes data structure that optmises to avoid allocations but may not be elegant to interact with. I thoroughly believe we need to find a way to deal with this limitation over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cijothomas Lets separate a few concerns:
I've been mostly concerned in this thread about whether users will make the right choice about which signals to opt an attribute into.
I don't see a way around this. If a user wants to dynamically get attributes from context and attach them to metrics, then they will have to pay this cost. In Go, we would probably use pooling to try to mitigate this.
Note that the "default" behavior is still performant regardless, since users need to explicitly opt-in to use contextual attributes in their application. I think the question of defaults mostly is about ergonomics and expectations vs protecting users from themselves. Once a user has decided they want an attribute attached to metrics, they assume the risks and performance costs associated with it.