Add OpenTelemetry guidelines doc#312
Conversation
|
@open-telemetry/specs-approvers @open-telemetry/technical-committee please review. This is opinionated take on how Otel SDKs should identify themselves via OpAMP in a remotely managed system. |
| ## Identifying Attributes | ||
|
|
||
| Language instrumentation agents SHOULD copy all attributes from the | ||
| OpenTelemetry SDK `Resource` into `AgentDescription.identifying_attributes`. |
There was a problem hiding this comment.
I do not get the idea for a single exception here, while everything else follows a different pattern?
https://github.com/open-telemetry/opamp-spec/blob/main/specification.md#agentdescriptionidentifying_attributes
There was a problem hiding this comment.
why should a os.version be in the identifying_attributes list while here it's said otherwise:
https://github.com/open-telemetry/opamp-spec/blob/main/specification.md#agentdescriptionnon_identifying_attributes
There was a problem hiding this comment.
@tigrannajaryan can probably help clarify this, but I think the idea is that language agents and the Collector treat Resource and telemetry slightly differently. Notably, the Collector's Resource for its own internal/diagnostic telemetry is often different than the Resource(s) of the things it is handling telemetry for (like other agents or the operating system or whatever)...whereas the language agents use a Resource to identify the source of ALL telemetry it produces.
There was a problem hiding this comment.
In the end it's "2 pipes taking to the same place" anyway, so server can rearrange however it needs - but the change itself will be a breaking change if there is a need for one.
I just wanted to be sure that the base hypothesis about what the resource contains is same. At least in .NET it contains tons of general classifiers like os.version, host.arch, process.runtime.name.
@tigrannajaryan said that identifying_attributes should form a guid but in the end every extra component makes that guid stronger (besides being a very dynamic guid) 🤷♂️
There was a problem hiding this comment.
I do not get the idea for a single exception here, while everything else follows a different pattern? https://github.com/open-telemetry/opamp-spec/blob/main/specification.md#agentdescriptionidentifying_attributes
We should copy that section into this file too. Those are also OpenTelemetry guidelines, for the Collector.
There was a problem hiding this comment.
why should a
os.versionbe in theidentifying_attributeslist while here it's said otherwise: https://github.com/open-telemetry/opamp-spec/blob/main/specification.md#agentdescriptionnon_identifying_attributes
There is no contradiction here. os.version is not in Collector's Resource, thus we do not expect it to be in identifying_attributes. But since information about OS used it typically useful to agents the guidelines recommend including it as non_identifying_attributes.
There was a problem hiding this comment.
I just wanted to be sure that the base hypothesis about what the resource contains is same. At least in .NET it contains tons of general classifiers like os.version, host.arch, process.runtime.name.
This is the unfortunate result of how we are forced to put everything into Resource which is just a single bag of attributes. We simply have no way to differentiate attributes that are important for identification vs ones that are useful to have but are not required for identification. The new Entity concept solves this problem by having this clear distinction between identifying and non-identifying attributes, which mirror OpAMP concepts precisely.
For now I think it is fine for simplicity to say we include all Resource attributes.
RassK
left a comment
There was a problem hiding this comment.
#310 (comment)
I do not get the idea for a single exception here, while everything else follows a different pattern? https://github.com/open-telemetry/opamp-spec/blob/main/specification.md#agentdescriptionidentifying_attributesWe should copy that section into this file too. Those are also OpenTelemetry guidelines, for the Collector.
With this minor change, it's seems good to go.
|
I am going to keep this open for a few days to get more extensive feedback from the community. I will publicize in Slack. |
| This document provides guidance for language instrumentation agents that use | ||
| OpAMP. |
There was a problem hiding this comment.
(nit) Can we settle on using a single term? Previously we have
guidance for instrumentation runtimes
Personally, I am not a fan of using the "agent" term. I suggest "component" or "runtime" or "implementation" (I do not have any strong opinion). I think the sentence below is the only one that where using "agent" is fine (however, we could also use zero-code instrumentation to align with https://opentelemetry.io/)
Examples include auto-instrumentation language agents and similar
instrumentation runtimes built with OpenTelemetry SDKs.
What matters for me the most is having consistent terminology used in this PR.
There was a problem hiding this comment.
It's indeed a nit, and I don't feel that strongly about it. With whatever term you go with, you'll have people wondering about the other term. Using both helps to suggest that both are valid, but whatever, I get it....and I agree that "agent" is criminally overloaded. Not sure which term is the preferred term of art anymore honestly....
|
@dmitryax please take a look. |
| Language instrumentation agents SHOULD leave | ||
| `AgentDescription.non_identifying_attributes` empty. | ||
|
|
||
| ## Rationale |
There was a problem hiding this comment.
| ## Rationale | |
| ### Rationale |
Mostly a nit: not sure whether this section or the identifying attributes section would be better, but shouldn't this be nested under one of those two? I don't think this rationale necessarily applies to the whole document.
tigrannajaryan
left a comment
There was a problem hiding this comment.
All, I have been thinking about the suggestion @dmitryax made and I think it is a good one.
We can hard code the triplet (service.name, service.instance.id, service.namespace.name) to be the identifying attributes for Otel SDKs.
All other Resource attributes can go into non-identifying attributes. As Dmitrii said if you enable/disable detectors or the environment changes these other attributes can change, which is not desirable for identifying attributes.
This matches the 2 formal requirements OpAMP spec has for identifying attributes, quoting from the spec:
The Agent SHOULD also include these attributes in the Resource of its own telemetry.
Yes, these attributes will be in own telemetry (if SDK reports own telemetry).
The combination of identifying attributes SHOULD be sufficient to uniquely identify the Agent's own telemetry in the destination system to which the Agent sends its own telemetry.
True again, the uniqueness is guaranteed. We already guarantee that the triplet is globally unique.
This also nicely aligns conceptually with how the upcoming Entities think about identifying and non-identifying attributes. The triplet is precisely the identifying attributes of the Service Instance entity.
@breedx-splk I know we earlier agreed the full Resource is the right choice, but I think I was wrong, this is a better approach.
Blocking this to prevent merging for now until we discuss this option.
|
I should point out that @RassK suggested essentially the same thing over here in the original PR that I closed: #310 (comment) I honestly do not care much about this stuff, so I don't have a real opinion. I will change this PR to reflect the new guidance: |
|
Ok, I did the thing. I wonder if omitting |
Thanks. Yes, we can still use |
|
@andykellr @dashpole @jack-berg @RassK @Kielek this PR has changed substantially. Please take another look. |
| `Resource` attributes into `AgentDescription.identifying_attributes`: | ||
|
|
||
| - `service.name` | ||
| - `service.instance.id` |
There was a problem hiding this comment.
Is it intentional to be here?
If not provided manually, it is generated during startup process. At least in the OTel .NET Auto.
After the restart, it will provide new id.
I am pretty sure that it was common behavior in other technologies at the time of implementation.
There was a problem hiding this comment.
I didn't suggest it, but I understood it to be intentional, yes.
Without it, you're just left with service.name and service.namespace, which is too ambiguous and doesn't serve to uniquely identify an agent instance.
Kielek
left a comment
There was a problem hiding this comment.
It should be also fine, with some comment related to service.instance.id
|
Merging. If further refinements are needed, can be done in a future PR. |
|
Continuation which moves Otel Collector guidelines to the same doc: #318 |
Continuation of open-telemetry#312 This brings Otel Collector guidelines from specification.md to the opentelemetry-guidelines.md where we want all Otel-related guidelines to be.
Continuation of open-telemetry#312 This brings Otel Collector guidelines from specification.md to the opentelemetry-guidelines.md where we want all Otel-related guidelines to be.
This follows #310.
It creates a new document instead of polluting the general high level OpAMP spec with tedious details about actual behavior.