Proposal for sampling.priority #107
Conversation
I signed it
I don't understand what "priority" means in the context of this document, nor why it should be in the range 1-1000. Can you explain?
I am familiar with sampling "probability". How will a fixed-probability sampler encode its probability? How will an unequal-probability sampler encode its probability?
We could consider changing the name to make it clearer. Also, it was pointed out by @reyang that it does not follow the ABNF format for the key. As the trace passes service boundaries between components that use different sampling algorithms, how do we ensure that a trace sampled in by an upstream component is honored? The idea is that when the trace is initiated, the sampling priority is calculated with whatever algorithm, and then the other components in the call path use this value, together with their own sampling rate, to decide whether the trace should be recorded.
To try clearing up some confusion: apart from the 0-1000 instead of 0..1 range, when you talk about "priority", did you mean to use the word "probability" instead, @Kanwaldeep?
> ### Probability Sampler default behavior
>
> The default behavior of probability sampler is to calculate a stateless hash
Note that this is not yet agreed upon, although there are some PRs open, namely open-telemetry/opentelemetry-specification#331, which is several months old, and just yesterday we got open-telemetry/opentelemetry-specification#611.
Is this behavior even relevant to this OTEP? I think whichever algorithm is used by the sampler, as long as it can express the result via a "probability" value (not every sampler can, e.g. rate limiting), the actual calculation is not relevant.
text/trace/0107-sampling-priority.md (outdated):

> function of `trace-id` to make a sampling decision.
> It is suggested to extend the sampler behavior to expose a `tracestate` field
> called `sampling.priority` with the integer value `[0-1000]` that will indicate
indicate how? What does a value of 408 mean? 40.8% probability?
Yes. The max value needs to be agreed on. 0-1000 would work for most services (sampling down to 1 out of 1000). Extremely loaded services may need more, but they would likely opt out of propagating this flag to optimize performance.
An alternative approach is to go 1..100 and allow non-integers.
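To make the "408 means 40.8%" reading concrete, here is a small sketch; the helper names are mine, not from the OTEP, and the encoding direction shows where precision would be lost:

```python
# Hypothetical helpers (names are illustrative, not from the OTEP) that map
# an integer sampling.priority in [0, 1000] to a probability and back,
# showing both the 408 -> 40.8% reading and the rounding loss on encode.

def priority_to_probability(priority: int, max_priority: int = 1000) -> float:
    """Interpret an integer priority as a sampling probability in [0, 1]."""
    if not 0 <= priority <= max_priority:
        raise ValueError("priority out of range")
    return priority / max_priority

def probability_to_priority(probability: float, max_priority: int = 1000) -> int:
    """Encode a probability as an integer priority; precision is lost here."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability out of range")
    return round(probability * max_priority)
```

Note that 0.4083 and 0.408 both encode to 408, which is the information loss debated further down in this thread.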
Large-scale systems are especially in need of having one trace stand for many traces; thus I worry about artificially capping the sample rate at 1000.
text/trace/0107-sampling-priority.md (outdated):

> sampling priority of the current span.
>
> This will allow to align sampling algorithms between various components.
> Especially for the transition scenarios from the different SDKs.
Which "transition" scenarios? Switching from an old to a new version? Please clarify.
Yes. If different hashing algorithms were used for historical reasons, you can make components respect sampling.priority first and then change the algorithm underneath.
It may be expanded in a real spec. @Oberon00, does this note clarify things, or do you think more descriptive text is needed here? If it's very unclear, @Kanwaldeep, it's worth expanding a bit.
Agreed: the discussion in open-telemetry/opentelemetry-specification#611 has people split between using the "random bits" of the UUID from trace-id generation, vs. hashing the whole trace id again in case the trace id was not randomly generated.
@Oberon00 the name is coming from OpenTracing and is already used in the collector:
Please write about this in the prior art section. I can't find the connection of the word "priority" as defined in e.g. https://www.merriam-webster.com/dictionary/priority to the meaning I assume it has in this OTEP. But following the link behind your link to https://github.com/opentracing/specification/blob/11dd7f8/semantic_conventions.md#span-tags-table the definition of priority seems to be effectively a simple boolean hasPriority:
(OK, actually a tri-bool, with the three states absent or less than zero: default; zero: don't sample; greater than zero: please sample).
I can add the link to prior art as mentioned. Also, I don't think the tri-bool would work, as the proposal is to use sampling.priority to make the sampling decision if it is passed and falls within the adjusted sampling rate configured for the component. @SergeyKanzhelev let me know if this is what you are thinking as well. If so, I can work on updating the proposal.
@Oberon00 most implementations just set a value. Do you know of other prior art and names that could be used? Are you concerned with the prior art because some might have used an attribute with the same name differently before, without setting a probabilistic value?
I'm mostly concerned that you never define what your "priority" is in this PR. If you clearly define what it is, maybe we can then find a better name as a next step, or maybe we find that priority does fit the definition.
I've updated the proposal with the priority calculation.
I still do not understand why the word "priority" is being used. It seems misleading, and I also do not see why we would create a standard that encourages loss of information due to rounding. I can see two useful and non-lossy conventions we might adopt:
One is the inverse of the other. The reason why "adjusted_count" (i.e., the inverse probability) is my preference is that it has a natural interpretation. The adjusted count of a sampled item is the number of items it is representative for in the whole population. Items generated with a 40.8% probability sample have adjusted counts of 1/0.408 = 2.45098039216. When you see one of these items, you would count it as 2.45098039216 items. It is important not to lose precision and accuracy here because we would like to transform sampled span data into metric data. When we generate a metric count from a 40.8% sampled span, we should count precisely 2.45098039216 items.
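The adjusted-count interpretation sketched above can be written out as follows (an illustrative sketch, not code from any SDK; function names are mine):

```python
# Illustrative sketch of the "adjusted count" interpretation: a span
# sampled with probability p stands for 1/p spans in the population.

def adjusted_count(sampling_probability: float) -> float:
    """How many population items one sampled item represents."""
    if not 0.0 < sampling_probability <= 1.0:
        raise ValueError("sampling probability must be in (0, 1]")
    return 1.0 / sampling_probability

def estimated_total(sampled_probabilities: list) -> float:
    """Metric count derived from sampled spans: the sum of adjusted counts,
    rather than counting each sampled span as 1."""
    return sum(adjusted_count(p) for p in sampled_probabilities)
```

For the 40.8% example above, `adjusted_count(0.408)` is 2.45098039216..., which is exactly the weight each such span should contribute to a derived metric.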
I like probability over adjusted_count, with the range bounded between 0 and 1. However, I generally agree these terms are more accurate to the intention than priority.
I agree with sampling.probability instead of priority, with a floating point value in the range 0-1. My only concern is whether the name would be confusing with this open PR - open-telemetry/opentelemetry-specification#570
It looks like open-telemetry/opentelemetry-specification#570 and this proposal are the same, other than "sampling" vs "sample" as the label key prefix. Maybe we don't need this OTEP, just need to merge 570.
Except for the in-band propagation via tracestate, which seems pretty important.
OK. I read through this and feel a bit befuddled by the discussion about coordinating sampling. Can we agree that the in-band propagation of sampling probability is independent and orthogonal to the in-band propagation of the sampling policy that was used? I am interested in knowing that my parent was or was not selected for sampling and the (upper bound on) its probability. I am not interested in head sampling using fixed probability.

The reason I wrote "upper bound on" is that it is easy to imagine a sampling scheme combining head and tail sampling policies. We might use a 10% head sample to lower the overhead of tracing, and then decide which among those 10% of spans should be recorded in a tail sampler. The tail sampler will correctly compute the outcome probability and apply it as a
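The "upper bound" point can be made concrete with a small sketch (an assumption about how a combined head/tail scheme would compose, not something this OTEP specifies):

```python
# Assumption (not from the OTEP): when a head sampler keeps spans with
# probability p_head and a tail sampler then keeps a fraction p_tail of
# the survivors, the final probability of a recorded span is the product.
# This is why the head probability alone is only an upper bound.

def combined_probability(p_head: float, p_tail: float) -> float:
    """Effective sampling probability after both head and tail stages."""
    return p_head * p_tail

# A 10% head sample followed by a 25% tail decision records an effective
# 2.5% of spans, even though the propagated head probability says 10%.
```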
Yes, agreed that sampling probability is different from the upper bound of the sampling policy. This proposal is about having the first app/component in the distributed trace calculate the probability and propagate the value so the next app/component can honor it. This is to align apps using different algorithms for the Probability Sampler.
Could we make clear why a tracing system would prefer to propagate probability and then use it downstream, as opposed to propagating a decision (true/false) for the downstream system to honor? (I'm not sure I know the answer.)
The reason we need probability is that the components downstream are using different algorithms for sampling. So let's assume head component A started the trace and sampled in 60% of the traces, and the next component B has a sampling policy of 30% but uses a different sampling algorithm. In this scenario, how can we make sure that the 30% of traces sampled in by component B are a subset of the 60% of traces sampled in by component A?
This, and also systems that depend upon dynamically sampling traces to generate metrics rather than a separate metrics system, need to know how many spans one span represents when counting up spans, computing percentiles of latency, etc.
I do not see any disagreement with open-telemetry/opentelemetry-specification#570. The calculated probability of the span equals the configured probability of the sampler. If you have another interpretation, please clarify.
Yup, we're in vigorous agreement.
Note that we are discussing sampling in the metrics API here: #113. In that proposal, I've encouraged us to use the term "sample_count" to describe inverse probability. I'm willing to accept a "sample_probability" field instead, containing the inverse.
Definitely seems like I am missing something :( . The probability sampler calculates the hash function and checks if it is within the configured probability of the sampler, for example: `Sampled-in = hash(trace-id) / max_long < sampling-ratio`. So the output of `hash(trace-id) / max_long` is what we want to propagate. My understanding is that the sampling-ratio is the configured probability of the sampler.
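The decision rule described in this comment can be sketched as follows; the concrete hash function and constant are illustrative stand-ins, not from any SDK:

```python
# Minimal sketch of the stateless decision described above. The hash
# function and MAX_LONG are illustrative choices, not from any SDK.
import hashlib

MAX_LONG = 2 ** 64  # stand-in for "max_long" in the comment

def trace_hash(trace_id: str) -> float:
    """Deterministic value in [0, 1] derived from the trace id."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / MAX_LONG

def sampled_in(trace_id: str, sampling_ratio: float) -> bool:
    # Sampled-in = hash(trace-id) / max_long < sampling-ratio
    return trace_hash(trace_id) < sampling_ratio
```

Because the hash is stateless and deterministic, every component that applies the same function to the same trace id reaches the same decision; the disagreement in this thread is about what happens when components use *different* hash functions.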
The value of If you consider After the sampling decision is made, the value of
So if you only pass sampling_probability, say 0.6, then it's not clear to me how the following scenario can be handled. Let's assume head component A started the trace and sampled in 60% of the traces, and the next component B has a sampling policy of 30% but uses a different sampling algorithm. In this scenario, how can we make sure that the 30% of traces sampled in by component B are a subset of the 60% sampled in by component A? The idea of the proposal is that component B checks the passed probability and makes the decision. So if the passed value is 0.6, then component B would not sample it in. But if the output of the hash function in component A was 0.25 and that value was passed to component B, then B would sample this trace in, as it uses the probability calculated by A to make the decision.
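The subset property this comment is after can be shown with a tiny sketch (all names and values are illustrative): if component B compares A's propagated per-trace value against its own ratio instead of re-hashing, B's 30% is a subset of A's 60% by construction.

```python
# Sketch of the coordination scenario above (illustrative names/values).
# Component A computes h = hash(trace_id) / max_long once and propagates
# it (the "sampling.priority" of this proposal); component B compares the
# propagated h against its own ratio instead of re-hashing with a
# different algorithm.

def decide(h: float, sampling_ratio: float) -> bool:
    """Sample in when the propagated value falls under the ratio."""
    return h < sampling_ratio

propagated = (0.05, 0.25, 0.45, 0.75)  # example per-trace values from A
kept_by_a = {h for h in propagated if decide(h, 0.6)}  # A keeps 60%
kept_by_b = {h for h in propagated if decide(h, 0.3)}  # B keeps 30%
assert kept_by_b <= kept_by_a  # B's sample is a subset of A's
```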
Now I understand the meaning of "priority"; I apologize for taking so long to get to this point. There are a lot of ways to organize a sampling scheme. The ones I'm most familiar with allow the root of a trace to make the sampling decision, and all the children respect it; in this case, the children don't need to know the probability. Your example describes a coordinated sampling scheme, where it is important to know how the decision was made, for the purpose of coordination. Sending the proposed "priority" (hash(trace_id) / max_long) is one way to achieve that, but then all uses of this value have to follow this one scheme, right? I believe elsewhere @yurishkuro has described how Jaeger conveys this information as two attributes: (1) the sampling probability, (2) the sampling technique. In your example, we could transmit the
The recipient must know the scheme, and then can apply the same hash to compute the priority value that you're interested in. |
They could configure it not to use the passed-in sampling.priority. But yes, all the components use the output of the algorithm as calculated by the root span.
I will wait for @yurishkuro for more details on this. But passing the sample.scheme would require all the downstream components to implement the various schemes in order to calculate the probability. My 2 cents: passing sampling.priority seems much more efficient for this use case.
It looks like Jaeger uses another meaning for the word "priority", see here: #115 (comment). @yurishkuro there is a question for you above.
Not sure what the question is. Jaeger does not pass sampling parameters downstream. The decision is made once, represented by a single bit in the flags, and respected by all children. The
Thanks @yurishkuro, sorry it wasn't a proper question. I couldn't find information about @Kanwaldeep I wonder if this use of a
I was thinking sampling.probability.output, as this would make it clear. By making this a standard, we make it easier for teams to adopt the OTel SDK if they are already using other SDKs.
Based on discussion in the Sampling WG meeting, we are working on a PR for:
What are "plugins" here? You mean tracestate should be publicly accessible? I think it is.
👋 Hope all is well. Just popping in from otel-js to echo what @lizthegrey offered here: #107 (comment). This PR's approach, or something like otep-specification #570, would be tremendously helpful for vendor-specific exporters/systems which rely on the sampled traces/spans themselves to generate metrics (like hits/latency/etc.) rather than a separate metrics system. I won't pretend to have enough knowledge about the spec to say whether this info should live on
This is the crucial difference though. Samplers can already set normal attributes today. Is this enough for you? The key difference is that normal attributes are not propagated in-band, while traceState is (e.g. in the W3C tracestate HTTP header).
@Oberon00 In-band means within the same process, right? Basically, would that attribute be propagated from the parent to a non-root span? It seems like at the moment (at least in practice) that is not the case, and that is what PR #570 is trying to address, no? In-band (if I understand the term correctly) propagation is the main issue for the use case I have in mind, since the idea is that currently the default
No, it means in the same communication "band" that the process would also use without instrumentation to communicate with other processes. E.g. if a process normally makes an HTTP request to some other service, with in-band propagation, headers would be injected into that HTTP request. But span attributes (as opposed to trace state) are not propagated at all (not in-process, not cross-process, not in-band; they are sent out-of-band to some backend with the exporter, typically).
@Oberon00 ah ok, gotcha, thank you for the clarification. So, at the most basic level, what matters more for me at least is in-process propagation. This seems more reasonably accomplished by PR #570. Does that help clarify things? Hmm, TL;DR: I should've commented on PR #570 instead, but I felt the use case described in the comments here was worth highlighting.
Closing in favor of #135.
Closing as @lmolkova has started a new pull request for this proposal.