Should telemetry stability rely on schema transformation #3296

Closed
lmolkova opened this issue Mar 6, 2023 · 12 comments · Fixed by #3380
Labels: area:semantic-conventions, spec:miscellaneous

Comments

@lmolkova
Contributor

lmolkova commented Mar 6, 2023

The subject came up in semconv SIG meeting:

Assuming semconv becomes stable (e.g. HTTP), attribute renames are considered problematic. For example, alignment with ECS would rename most of the commonly used HTTP and network attributes causing dissatisfaction from users who don't use schema transformation (which is likely to be a majority of otel users).
Schema transformation is not implemented by default and implies that users leverage the collector or their vendor is fast enough to support schema changes keeping alerts and dashboards working.

Given these doubts, I suggest keeping the part of the versioning-and-stability.md doc that talks about schema transformation as experimental:

Changes to telemetry produced by OpenTelemetry instrumentation SHOULD avoid
breaking analysis tools, such as dashboards and alerts. To achieve this, while
allowing the evolution of telemetry and semantic conventions, OpenTelemetry
relies on the concept of
[Telemetry Schemas](schemas/README.md).
Changes to semantic conventions in this specification are allowed, provided that
the changes can be described by schema files. The following changes can be
currently described and are allowed:
- Renaming of span, metric, log and resource attributes.
- Renaming of metrics.
- Renaming of span events.
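
As a rough illustration of what a rename "described by a schema file" amounts to on the consumer side, here is a minimal Python sketch that applies a rename map to a span's attributes. The `rename_attributes` helper and the sample attribute names are assumptions made for illustration; they are not taken from any OpenTelemetry SDK or collector component.

```python
def rename_attributes(attributes, attribute_map):
    """Return a copy of `attributes` with keys renamed per `attribute_map`.

    Keys that do not appear in the map are kept unchanged.
    """
    return {attribute_map.get(key, key): value for key, value in attributes.items()}


# Hypothetical rename, in the spirit of the ECS alignment mentioned above.
attribute_map = {"http.method": "http.request.method"}
span_attributes = {"http.method": "GET", "http.route": "/users/{id}"}

renamed = rename_attributes(span_attributes, attribute_map)
print(renamed)
```

A dashboard or alert that still queries the old key only keeps working if something applies the inverse of this map before the data reaches it, which is the tooling gap this issue is about.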

@jack-berg
Member

My vote is that telemetry stability should not depend on schema transformation stability. While there is not yet a collector processor that can consume telemetry schemas to ensure compatibility, the concept seems reasonable and we appear to just be waiting on the emergence of contributors motivated to build the thing. Given our risk aversion to making breaking changes to the existing semantic conventions, it's not super surprising that building the thing hasn't been a priority.

If an individual is impacted by a change to a convention they can use a more general purpose collector processor to reshape the telemetry, or go and build a dedicated schema transformation processor.

@lmolkova
Contributor Author

lmolkova commented Mar 8, 2023

@jack-berg I agree in the long term.

In the short term, we don't have the tooling or confidence to allow attribute renames. The benefit of attribute renames also seems to be low - why would we do this after stability?

So, what I'm suggesting is to let telemetry stability definition become stable without schema transformation and without attribute/metrics/spans renames. They can come later when tooling and confidence catch up.

@jack-berg
Member

> So, what I'm suggesting is to let telemetry stability definition become stable without schema transformation and without attribute/metrics/spans renames. They can come later when tooling and confidence catch up.

So essentially stable conventions can add new attributes but not rename until tooling develops for schema transformation? That also seems reasonable and a good forcing function to encourage development of tooling.

@jsuereth
Contributor

jsuereth commented Mar 9, 2023

We actually can't add new attributes to metrics, FYI.

I have a PR to reduce this restriction to adding metrics that do not increase # of timeseries, but that's not through yet.

@lmolkova
Contributor Author

lmolkova commented Mar 25, 2023

Looked into existing tooling and perf overhead of schema transformation:

  • There is no existing schema transformation implementation. schemaprocessor in collector-contrib is a stub.
    Could not find anything existing on github that operates with OTel schema URLs. We have a schema file parser here, but afaik no transformation.

  • Checked perf overhead of explicit dummy manual transformation with transform processor for 4 messaging attributes from v1.16.0 -> 1.19.0. The impact is minimal (~2% in throughput). Full data and configs here.

I.e. it's very likely possible to do schema transformation efficiently, but until there is vendor-neutral tooling, we should not allow attribute renames.

(Did I miss something? Is there anyone in the community who does generic schema transformation?)

Related: open-telemetry/opentelemetry-collector-contrib#5036

@yurishkuro
Member

> So, what I'm suggesting is to let telemetry stability definition become stable without schema transformation and without attribute/metrics/spans renames. They can come later when tooling and confidence catch up.

This seems like a flawed argument. When something is considered stable with an established process for evolution, then removing the ability to rename stuff is a backwards-compatible change, but adding that ability is a breaking change.

@trask
Member

trask commented Apr 6, 2023

> with an established process for evolution

I think this is the problem we are facing right now, that since no one has implemented this, it's hard to say that it's an "established process for evolution"

@yurishkuro
Member

That's a technicality. My point is that after making this change there is no way back to allow renaming, even if tooling supports it, because the stability guarantee will preclude renaming.

@lmolkova
Contributor Author

lmolkova commented Apr 6, 2023

Making something less strict is in general not breaking. Stability is currently defined as a contract between instrumentation and consumer:

Semantic conventions define a contract between the signals that instrumentation
will provide and analysis tools that consumes the instrumentation (e.g.
dashboards, alerts, queries, etc.).
Changes to telemetry produced by OpenTelemetry instrumentation SHOULD avoid
breaking analysis tools, such as dashboards and alerts. To achieve this, while
allowing the evolution of telemetry and semantic conventions, OpenTelemetry
relies on the concept of

Arguably, the transformation layer that might exist in the future will transform telemetry of version X to the version the consumer wants without anyone noticing. If such a layer can't be added, then we should never allow renames (which is not the end of the world).
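
To make the idea of such a layer concrete, here is a minimal Python sketch that upgrades attributes from the producer's schema version to the version a consumer expects by replaying per-version rename maps in order. Everything in it is hypothetical: `SCHEMA_CHANGES`, `upgrade`, the version numbers, and the attribute names are placeholders, and it assumes the change list is sorted by ascending version. It only handles upgrades; converting to an older version would need the maps applied in reverse.

```python
# Hypothetical rename maps, keyed by the schema version that introduced them.
# Versions are tuples so they compare numerically; the list is sorted ascending.
SCHEMA_CHANGES = [
    ((1, 1, 0), {"old.name": "mid.name"}),
    ((1, 2, 0), {"mid.name": "new.name"}),
]


def upgrade(attributes, from_version, to_version, changes=SCHEMA_CHANGES):
    """Apply every rename introduced after `from_version`, up to `to_version`."""
    result = dict(attributes)
    for version, attribute_map in changes:
        if from_version < version <= to_version:
            # Rename matching keys; leave all other keys untouched.
            result = {attribute_map.get(k, k): v for k, v in result.items()}
    return result


# Producer emits version 1.0.0 data; consumer expects 1.2.0 names.
print(upgrade({"old.name": 1, "other": 2}, (1, 0, 0), (1, 2, 0)))
```

Because renames are replayed version by version, a key renamed twice ends up with its latest name, and data already at the target version passes through unchanged.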

arminru added the spec:miscellaneous label Apr 6, 2023
@tigrannajaryan
Member

tigrannajaryan commented Apr 6, 2023

> making something less strict is in general not breaking

@yurishkuro is correct.

Making something less strict on the producer end is breaking for a consumer that did not previously expect the less strict behavior of the producer.

We should not remove the notion of schema-aided transformations from the spec. The existence of the requirement in the spec ensures that consumers don't assume that attributes cannot be renamed. They can be renamed in the future, and consumers are warned that it is possible.

Note that this does not force us to use the renaming capability. We can easily place a moratorium on using it. However, the mere existence of the capability's description in the spec allows us to easily remove the moratorium and begin using the capability. Without the capability in the spec, you can no longer add it in the future: it would be a breaking change from the consumer's perspective.

@yurishkuro
Member

What @tigrannajaryan said. It is in our data contract as producer that we can rename attributes as long as we follow the established process with the schemas. The fact that consumers chose not to implement the schema mechanism is not our fault; we are not breaking the contract by exercising an existing right. Once we remove that right from the contract, we cannot add it back without breaking the contract. It would be like a shopping orders producer deciding "I feel like being less strict today so I won't include the shipping address with the orders".

@trask
Member

trask commented Apr 6, 2023

I think @tigrannajaryan's proposal sounds reasonable, to put a moratorium on relying on schema transformations in stable semantic conventions. We can lift the moratorium if/when we think it's ok for telemetry stability to rely on them.
