Allow all metric conventions to be either synchronous or asynchronous. #2458
922a501 to 9be3f8e
I would suggest that whether metric instruments are synchronous or asynchronous is always an implementation detail. You should be able to switch between these instrument forms for performance reasons, and temporality controls are available for exporters to control what the consumer receives. Moreover, for the OTLP exporter specifically, the supported temporality preferences will give consistent temporality output, even when the underlying implementation changes between synchronous and asynchronous instruments.
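To illustrate the point above, here is a minimal sketch in plain Python. It is a toy model, not the OpenTelemetry API: `SyncCounter`, `AsyncCounter`, and `to_delta` are hypothetical names introduced only for illustration. It shows one way a synchronous and an asynchronous counter could report the same cumulative values at collection time, and how an exporter-side temporality choice could derive deltas from either.

```python
from typing import Callable, List


class SyncCounter:
    """Toy synchronous counter: the instrumented code calls add() inline."""

    def __init__(self) -> None:
        self._total = 0

    def add(self, amount: int) -> None:
        if amount < 0:
            raise ValueError("a counter only accepts non-negative increments")
        self._total += amount

    def collect(self) -> int:
        # Report the cumulative sum of all increments so far.
        return self._total


class AsyncCounter:
    """Toy asynchronous counter: a callback is polled at collection time."""

    def __init__(self, callback: Callable[[], int]) -> None:
        self._callback = callback

    def collect(self) -> int:
        # The callback returns the current cumulative total.
        return self._callback()


def to_delta(cumulative_points: List[int]) -> List[int]:
    """Convert successive cumulative readings into per-interval deltas."""
    deltas, previous = [], 0
    for value in cumulative_points:
        deltas.append(value - previous)
        previous = value
    return deltas


# Hypothetical workload: amounts recorded in three collection intervals.
workload = [[10, 5], [7], [0, 3]]

sync_counter = SyncCounter()
external_total = {"bytes": 0}  # state owned by the monitored system
async_counter = AsyncCounter(lambda: external_total["bytes"])

sync_points, async_points = [], []
for interval in workload:
    for amount in interval:
        sync_counter.add(amount)           # synchronous path
        external_total["bytes"] += amount  # the system updates its own total
    sync_points.append(sync_counter.collect())
    async_points.append(async_counter.collect())
```

In this sketch both instrument forms yield the same cumulative series, so `to_delta` produces the same deltas whichever form the instrumentation chose.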
Thanks Trask, I merged it, and rebased to the latest origin/main.
4faebf4 to c28ba84
@fstab can you update the PR title and description to reflect the expanded scope?
I would like to clarify the "semantic convention" rationale for using the "instrument" type instead of the Aggregation type in the table. For OTLP consumers the instrument type is irrelevant (since someone can use a view to change it), so what is the correct way to define the "semantic convention"? Who is the user/consumer of these? Etc.
67eae26 to 5727162
@trask I updated the title and description, and I rebased on
Opinions here: The "user" is the one who decides to link in a standards-conventional library of code; the standards conventions tell them what kind of view they should expect to be able to write. The Aggregation is user-controllable from that point onward, with the conventional instrument telling you which Aggregation to expect by default (but not which temporality). I'm so pleased that this issue has come to a head, because the point of temporality is that it is non-semantic in nature. In other words, it's a performance and cost trade-off. I loved hearing this:
Sweet!!
@fstab thx! it will be squashed during merge
My basic view (biased as an instrumenter I'm sure) is that the semantic conventions are a set of requirements for compliant instrumentation. This also then defines guarantees for what OTLP consumers / backends can expect from compliant instrumentation under default views / span processors / log processors. If we want to make a stronger statement about metric aggregation type, we could list both instrument type and aggregation type in the semantic conventions to make it clear that if a user alters the aggregation type of a "standard" metric, it will no longer be considered a "standard" metric.
This is a good point. I'm still struggling with the terminology. We are currently using What we could do is use
The What do you think?
@fstab these are the available Aggregation Types (essentially what the SDK could potentially produce): Data Model is a superset of what the SDK could produce because it contains extra items (e.g. Summary) for compatibility and interoperability with other systems (e.g. Prometheus). Hope this helps!
Thanks a lot @reyang. My understanding is that if we specify the aggregation (left column) in the semantic conventions, then implementations could choose any of the compatible instrument types (right column).
It sounds like it should be fine for monitoring backends if we specify aggregations only, because that defines what monitoring backends can expect to be sent over the wire (except for temporality). That should be fine for implementers too, because they are free to choose any of the compatible Instrument types, as the instrument type would be an implementation detail. I'm happy to replace Instrument with Aggregation in the semantic conventions. What do you think?
Correction: @fstab, Asynchronous Gauge instruments are not compatible with Non-monotonic Sum--that's a semantic change. Otherwise, I support the direction you're heading.
I agree this needs to be fixed. One solution would be e.g. to list "Counter or Asynchronous Counter" everywhere. Maybe "(Asynchronous) Counter" conveys this without new terminology? We might want a way to refer to instruments and their semantic equivalents. Counters and Asynchronous Counters are semantic equivalents, and the same goes for UpDownCounter and Asynchronous UpDownCounter. There are no other semantic equivalencies between instruments.
Would a general guideline work, along the lines of: semantic conventions are written to use "Counter" and "UpDownCounter", but for each of these the asynchronous equivalent may be used?
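The proposed guideline can be sketched as a small lookup. This is only an illustration in plain Python: `SEMANTIC_EQUIVALENTS` and `is_compliant` are hypothetical names, not part of any OpenTelemetry library, and the table encodes only the two equivalences named in this comment.

```python
from typing import Dict, Set

# Hypothetical table: the instrument named in a semantic convention, mapped
# to the instrument forms an implementation may use in its place.
SEMANTIC_EQUIVALENTS: Dict[str, Set[str]] = {
    "Counter": {"Counter", "Asynchronous Counter"},
    "UpDownCounter": {"UpDownCounter", "Asynchronous UpDownCounter"},
}


def is_compliant(conventional: str, implemented: str) -> bool:
    """Return True if `implemented` may stand in for `conventional`.

    Instruments without a listed equivalent only match themselves.
    """
    allowed = SEMANTIC_EQUIVALENTS.get(conventional, {conventional})
    return implemented in allowed
```

Under these assumptions, `is_compliant("Counter", "Asynchronous Counter")` holds, while `is_compliant("UpDownCounter", "Asynchronous Gauge")` does not, matching the correction above that an Asynchronous Gauge is a semantic change.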
b82d948 to 87f4e80
I fixed the Markdownlint error by adding a \n to the end of
First, to get it out of the way: I support this PR as a JS maintainer. I think this allows quite a bit more freedom for instrumentation authors, and different libraries/runtimes may expose metrics in different ways, so allowing this flexibility seems like a good idea for the long haul. Now I'd like to address the disagreement I see here. There seems to be a fundamental disagreement about who the semantic conventions are for. Are they meant to describe what the instrumentation authors should do, or what the backend processors should expect to see? This question hasn't been that important until now because in tracing there was no mechanism like views that could cause those two to disagree. Now, with views, it is possible for the user to configure metrics such that they disagree with the semantic conventions. I think this is a question that we need to resolve and clearly document, because without it we are likely to see these disagreements again in the future, and, even worse, a backend may incorrectly process or display data based on the semantic conventions in particular configurations. I think it can be handled outside this PR, but I thought it was important to address.
Are you concerned about a potential change of aggregation or change of temporality? The SDK has temporality controls so I think this is not the concern you're having. If the concern is about change of aggregation, isn't that the whole point of a Views feature in our SDKs? Viewing a Histogram instrument using the SUM, COUNT, MIN, or MAX aggregation, for example--the user does this on purpose, so I don't see a problem.
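As a rough illustration of the Views point, the following plain-Python sketch applies different aggregations to the same recorded measurements. It is a toy model and not the SDK's View API: `AGGREGATIONS`, `apply_view`, and the sample durations are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry of aggregations a user might select in a view.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "count": len,
    "min": min,
    "max": max,
}


def apply_view(measurements: List[float], aggregation: str) -> float:
    """Aggregate raw histogram-style measurements with the chosen aggregation."""
    return AGGREGATIONS[aggregation](measurements)


# Hypothetical request durations (milliseconds) recorded by one instrument.
durations_ms = [12.0, 48.0, 7.0, 33.0]

results = {name: apply_view(durations_ms, name) for name in AGGREGATIONS}
```

Each entry in `results` is a different, deliberately chosen view of the same instrument; the instrumentation code that recorded `durations_ms` does not change.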
The way I understand it, the semantic conventions are meant to indicate one of four semantic categories. Any aggregation that is semantically compatible with the instrument category is meaningful. I believe our job as the authors of semantic conventions and a data model is to define what is meaningful, not what is useful. A simple backend that faithfully displays each kind of aggregation can make a useful presentation out of any data, which should convey meaning to a user provided that the aggregation was semantically compatible with the instrument dictated by semantic conventions. The user controls the View, and the user receives the intended meaning for whatever use they have. Does this answer your concerns @dyladan and @bogdandrutu?
I think we are in agreement, but @bogdandrutu's concern seems different from mine. It seems to me that @bogdandrutu feels that a backend should be able to interpret the semconv exactly to decide what data to display and how to display it. You seem to feel that the semconv describes what the data means, but not necessarily what its exact representation needs to be, which requires the backend to have some flexibility. FWIW I think you and I are interpreting the meaning of semantic conventions the same way. My point is simply that this needs to be agreed on by everyone or we're going to see these arguments again.
+1, I've created #2475 to track this issue and will raise it to the semantic convention SIG. This is a @fstab there is one CI job failing. Would you fix it? Thanks!
Signed-off-by: Fabian Stäber <[email protected]>
af98fae to 8b5d6a5
Fixed the
I support this and agree that sync vs async instruments are an implementation detail and the semantic convention doesn't need to specify it. One example that comes to mind is http.server.active_requests. This was listed as async in the semantic convention, but I can imagine many instrumentations use a synchronous instrument to record it depending on implementation.
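A minimal plain-Python sketch of how the http.server.active_requests example could be recorded either way. `ToyServer` and its members are hypothetical and do not come from any instrumentation library; `sync_active` stands in for a synchronous UpDownCounter and `observe_active` for an asynchronous callback.

```python
from typing import Set


class ToyServer:
    """Toy server that tracks in-flight requests in two equivalent ways."""

    def __init__(self) -> None:
        self.in_flight: Set[str] = set()
        # Stand-in for a synchronous UpDownCounter updated inline.
        self.sync_active = 0

    def start(self, request_id: str) -> None:
        self.in_flight.add(request_id)
        self.sync_active += 1  # synchronous: +1 when a request starts

    def finish(self, request_id: str) -> None:
        self.in_flight.remove(request_id)
        self.sync_active -= 1  # synchronous: -1 when a request ends

    def observe_active(self) -> int:
        # Stand-in for an asynchronous callback: read the current state
        # only when a collection happens.
        return len(self.in_flight)


server = ToyServer()
for request_id in ("a", "b", "c"):
    server.start(request_id)
server.finish("b")
```

After this sequence both paths report two active requests, so a consumer of the metric would not be able to tell which form the instrumentation used.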
There is a transient failure which was caused by the web.archive.org maintenance ... @fstab I think you've done your part, let's wait and I'll trigger the CI again once web.archive.org has recovered.
Changes
It should be an implementation detail whether metric instruments are synchronous or asynchronous.
As an example, for many Java runtime metrics there will be two ways to implement them: Using JFR events, or using JMX. JFR-based implementations will likely be synchronous, while JMX-based implementations will likely be asynchronous. The semantic conventions spec should be defined in a way to allow compatible implementations both ways.
This is generally true, not just for Java runtime metrics.
This PR removes "synchronous" and "asynchronous" from the specification of semantic conventions for metrics.