
Proposal for sampling.priority #107

Closed
wants to merge 7 commits into from

Conversation

Kanwaldeep

No description provided.

@Kanwaldeep
Author

I signed it

Contributor

@jmacd left a comment


I don't understand what "priority" means in the context of this document, nor why it should be in the range 1-1000. Can you explain?

I am familiar with sampling "probability". How will a fixed-probability sampler encode its probability? How will an unequal-probability sampler encode its probability?

@Kanwaldeep
Author

We could consider changing the name to make it clearer. Also, it was pointed out by @reyang that it is not following the ABNF format for the key. As the trace passes service boundaries between components that use different sampling algorithms, how do we ensure that a trace sampled in by the upstream component is honored? So as the trace is initiated, the sampling priority is calculated by whatever algorithm is in use, and then the other components in the call path use this value to decide whether the trace should be recorded based on their sampling rate.

@Oberon00
Member

To try clearing up some confusion: Apart from the 0-1000 instead of 0..1 range, when you talk about "priority", did you mean to use the word "probability" instead, @Kanwaldeep?


### Probability Sampler default behavior

The default behavior of probability sampler is to calculate a stateless hash
Member

@Oberon00 May 19, 2020


Note that this is not yet agreed upon, although there are some PRs open, namely open-telemetry/opentelemetry-specification#331 which is several months old and just yesterday we got open-telemetry/opentelemetry-specification#611

Member


Is this behavior even relevant to this OTEP? I think whichever algorithm is used by the sampler, as long as it can express the result via a "probability" value (not every sampler can, e.g. rate limiting), the actual calculation is not relevant.

function of `trace-id` to make a sampling decision.

It is suggested to extend the sampler behavior to expose a `tracestate` field
called `sampling.priority` with the integer value `[0-1000]` that will indicate
Member


indicate how? What does a value of 408 mean? 40.8% probability?

Member


yes. The max value needs to be agreed on. 0-1000 would work for most services (sampling down to 1 out of 1000). Extremely loaded services may need more, but they would likely opt out of propagating this flag to optimize performance.

Alternative approach is to go 1..100 and allow non-integers.

Member


Large scale systems are especially in need of having one trace stand for many traces; thus I worry about artificially capping the sample rate at 1000.

sampling priority of the current span.

This will allow sampling algorithms to be aligned between various components,
especially for transition scenarios from different SDKs.
Member


Which "transition" scenarios? Switching from an old to a new version? Please clarify.

Member


yes. If different hashing algos are used for historical reasons, you can make components respect sampling.priority first and then change the algo underneath.

Member


it may be expanded in a real spec. @Oberon00, does this note clarify things, or do you think more descriptive text is needed here? If it's very unclear, @Kanwaldeep, it's worth expanding a bit.

Member


Agreed: the discussion in open-telemetry/opentelemetry-specification#611 has people split between using the "random bits" of the UUID from trace-id generation vs. hashing the whole trace id again, in case the trace id was not randomly generated.

@Oberon00
Member

Oberon00 commented May 19, 2020

Please write about this in the prior art section. I can't find the connection of the word "priority" as defined in e.g. https://www.merriam-webster.com/dictionary/priority to the meaning I assume it has in this OTEP. But following the link behind your link to https://github.com/opentracing/specification/blob/11dd7f8/semantic_conventions.md#span-tags-table the definition of priority seems to be effectively a simple boolean hasPriority:

If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism.

(OK, actually a tri-bool, with the three states absent or less than zero: default; zero: don't sample; greater than zero: please sample).

@Kanwaldeep
Author

(OK, actually a tri-bool, with the three states absent or less than zero: default; zero: don't sample; greater than zero: please sample).

I can add the link to prior art as mentioned. Also, I don't think the tri-bool would work, as the proposal is to use sampling.priority to make the sampling decision if it is passed: sample in if the sampling priority is within the adjusted sampling rate configured for the component. @SergeyKanzhelev let me know if this is what you are thinking as well. If so, I can work on updating the proposal.

@SergeyKanzhelev
Member

@Oberon00 most implementations just set a value >0 to indicate a "sampled" decision. This proposal extends it to also propagate the probability of sampling. In other words, even though the decision was to DO sample, since a component really wants to sample, it can use this probability instead of a hash of the trace ID.

Do you know of other prior art and names that can be used? Or are you concerned that some might have used an attribute with the same name differently before, without setting a probabilistic value?

@Oberon00
Member

I'm mostly concerned that you never define what your "priority" is in this PR. If you clearly define what it is, maybe we can then find a better name as a next step, or maybe we find that priority does fit the definition.

@Kanwaldeep
Author

I'm mostly concerned that you never define what your "priority" is in this PR. If you clearly define what it is, maybe we can then find a better name as a next step, or maybe we find that priority does fit the definition.

I've updated the proposal with the priority calculation.

@jmacd
Contributor

jmacd commented May 28, 2020

I still do not understand why the word "priority" is being used. It seems misleading, and I also do not see why we would create a standard that encourages loss of information due to rounding.

I can see two useful and non-lossy conventions we might adopt:

  1. sampling.probability with a floating point value in the range 0-1
  2. sampling.adjusted_count with a floating point value in the range 0-infinity

One is the inverse of the other. The reason why "adjusted_count" (i.e., the inverse probability) is my preference is that it has a natural interpretation: the adjusted count of a sampled item is the number of items it is representative of in the whole population. Items generated with a 40.8% probability sample have adjusted counts of 1/0.408 = 2.45098039216. When you see one of these items, you would count it as 2.45098039216 items.

It is important not to lose precision and accuracy here because we would like to transform sampled span data into metric data. When we generate a metric count from a 40.8% sampled span, we should count precisely 2.45098039216 items.
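The adjusted-count arithmetic above can be sketched in a few lines of Python; `adjusted_count` is a hypothetical helper name for illustration, not part of any SDK:

```python
# Hypothetical helper: convert a sampling probability into an "adjusted
# count", i.e. the number of population items one sampled item represents.

def adjusted_count(sampling_probability: float) -> float:
    """Inverse of the sampling probability; requires 0 < p <= 1."""
    if not 0.0 < sampling_probability <= 1.0:
        raise ValueError("sampling probability must be in (0, 1]")
    return 1.0 / sampling_probability

# A span sampled with 40.8% probability stands for ~2.45 spans in the full
# population, so a metrics pipeline would add that weight instead of 1.
weight = adjusted_count(0.408)
assert abs(weight - 2.45098039216) < 1e-9
```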

@MikeGoldsmith
Member

I like probability over adjusted_count, given its bounded range between 0 and 1. Either way, I agree both terms are more accurate to the intention than priority.

@Kanwaldeep
Author

I agree with sampling.probability instead of priority with a floating point value in the range 0-1. My only concern is if the name would be confusing with this open PR - open-telemetry/opentelemetry-specification#570

@jmacd
Contributor

jmacd commented May 29, 2020

It looks like open-telemetry/opentelemetry-specification#570 and this proposal are the same, other than "sampling" vs "sample" as the label key prefix. Maybe we don't need this OTEP, just need to merge 570.

@Oberon00
Member

Except for the in-band propagation via tracestate, which seems pretty important.

@jmacd
Contributor

jmacd commented May 29, 2020

OK. I read through this and feel a bit befuddled by the discussion about coordinating sampling. Can we agree that the in-band propagation of sampling probability is independent and orthogonal to the in-band propagation of the sampling policy that was used? I am interested in knowing that my parent was or was-not selected for sampling and the (upper bound on) its probability. I am not interested in head sampling using fixed probability.

The reason I wrote "upper bound on" is that it is easy to imagine a sampling scheme combining head and tail sampling policies. We might use a 10% head sample to lower the overhead of tracing, and then decide which among those 10% of spans should be recorded in a tail sampler. The tail sampler will correctly compute the outcome probability and apply it as a sampling.probability attribute, but the probability is expected to be <= 10%. Therefore the value we propagate in-band is an upper-bound on the probability that the parent was recorded.
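A minimal sketch of that head/tail composition, with illustrative numbers (the function name and values are assumptions for illustration, not spec'd API):

```python
# Independent sampling stages multiply:
#   P(recorded) = P(head sampled) * P(tail sampled | head sampled).
# The in-band value can only carry the head probability, which is
# therefore an upper bound on the final recording probability.

def effective_probability(head_p: float, tail_p: float) -> float:
    return head_p * tail_p

head_p = 0.10   # 10% head sample, propagated in-band
tail_p = 0.50   # tail sampler keeps half of the surviving spans
final_p = effective_probability(head_p, tail_p)

assert abs(final_p - 0.05) < 1e-12
assert final_p <= head_p   # the propagated value is an upper bound
```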

@Kanwaldeep
Author

Yes, agreed that the sampling probability is different from the upper bound of the sampling policy.

This proposal is about having the first app/component in the distributed trace calculating the probability and propagating the value so the next app/component can honor it. This is to align apps using different algorithms for Probability Sampler.

@jmacd
Contributor

jmacd commented May 30, 2020

Could we make clear why a tracing system would prefer to propagate probability and then use it downstream, as opposed to propagating a decision (true/false) for the downstream system to honor? (I'm not sure I know the answer.)

@Kanwaldeep
Author

The reason we need probability is that the components downstream are using different algorithms for sampling. So let's assume head component A started the trace and sampled in 60% of the traces, and then the next component B has a sampling policy of 30% but is using a different sampling algorithm. In this scenario, how can we make sure that the 30% of the traces sampled in by component B are a subset of the 60% of traces sampled in by component A?

@lizthegrey
Member

The reason we need probability is that the components downstream are using different algorithms for sampling. So let's assume head component A started the trace and sampled in 60% of the traces, and then the next component B has a sampling policy of 30% but is using a different sampling algorithm. In this scenario, how can we make sure that the 30% of the traces sampled in by component B are a subset of the 60% of traces sampled in by component A?

This, and also systems that depend upon dynamically sampling traces to generate metrics rather than a separate metrics system, need to know how many spans one span represents when counting up spans, computing percentiles of latency, etc.

@jmacd
Contributor

jmacd commented Jun 9, 2020

I do not see any disagreement with open-telemetry/opentelemetry-specification#570.

The calculated probability of the span equals the configured probability of the sampler. If you have another interpretation, please clarify?

@lizthegrey
Member

I do not see any disagreement with open-telemetry/opentelemetry-specification#570.

The calculated probability of the span equals the configured probability of the sampler. If you have another interpretation, please clarify?

Yup, we're in vigorous agreement.

@jmacd
Contributor

jmacd commented Jun 9, 2020

Note that we are discussing sampling in the metrics API here: #113

In that proposal, I've encouraged us to use the term "sample_count" to describe inverse probability. I'm willing to accept a "sample_probability" field instead, containing the inverse.

@Kanwaldeep
Author

Kanwaldeep commented Jun 9, 2020

I do not see any disagreement with open-telemetry/opentelemetry-specification#570.

The calculated probability of the span equals the configured probability of the sampler. If you have another interpretation, please clarify?

Definitely seems like I am missing something :( . The probability sampler calculates the hash function and checks whether the result is within the configured probability of the sampler. For example: Sampled-in = hash(trace-id) / max_long < sampling-ratio. So the output of hash(trace-id) / max_long is what we want to propagate. My understanding is that the sampling-ratio is the configured probability of the sampler.
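That decision rule can be sketched as follows. The hash here (sha256) is just an illustrative stand-in, since the actual stateless hash function is implementation-defined, and the function names are hypothetical:

```python
import hashlib

def trace_score(trace_id: str) -> float:
    # Map the trace id to a deterministic score in [0, 1); this plays the
    # role of hash(trace-id) / max_long in the formula above.
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def sampled_in(trace_id: str, sampling_ratio: float) -> bool:
    # Sampled-in = hash(trace-id) / max_long < sampling-ratio
    return trace_score(trace_id) < sampling_ratio

trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"  # example W3C-style trace id
assert sampled_in(trace_id, 1.0)       # ratio 1.0 samples everything
assert not sampled_in(trace_id, 0.0)   # ratio 0.0 samples nothing
```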

@jmacd
Contributor

jmacd commented Jun 9, 2020

The value of hash(trace-id) / max_long is presumably a uniform random number in the range [0,1] (assuming a perfect hash function). The equation hash(trace-id) / max_long < sampling_probability will be true with probability sampling_probability. If you consider one span, the expected number of spans that will be recorded equals sampling_probability, so with 100% sampling you expect 1 output sample span per arriving span. With 50% sampling you expect 0.5 sample spans per arriving span.

If you consider 1 / sampling_probability arriving spans, the expected number of sample spans equals sampling_probability * 1 / sampling_probability == 1, and that is why when we see one sample span we want to know its sampling_probability (or the inverse).

After the sampling decision is made, the value of hash(trace-id) / max_long is not relevant; sampling_probability (or the inverse) is what matters to the downstream system.

@Kanwaldeep
Author

So if you only pass a sampling_probability of, say, 0.6, then it's not clear to me how the following scenario can be handled.

So let's assume head component A started the trace and sampled in 60% of the traces and then next component B has sampling policy of 30% but is using different sampling algorithm. In this scenario how can we make sure that the 30% of the traces sampled in by component B are subset of the 60% traces sampled in by component A.

The idea of the proposal is that Component B checks the passed probability and makes the decision. So if the passed value is 0.6, then Component B would not sample it in. But if the output of the hash function in component A was 0.25 and that value was passed to Component B, then it would sample this trace in, as it is using the probability calculated by A to make the decision.
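The coordination described above can be sketched like this; `decide` and the variable names are illustrative, not proposed API:

```python
# Component A (the root) computes a score once and propagates it, e.g. as
# the proposed sampling.priority tracestate entry. Every downstream
# component compares that same score against its own configured rate, so
# B's 30% is guaranteed to be a subset of A's 60%.

def decide(upstream_score: float, my_sampling_rate: float) -> bool:
    return upstream_score < my_sampling_rate

score_from_a = 0.25                # hash output computed once by component A
assert decide(score_from_a, 0.60)  # A (60%) samples the trace in
assert decide(score_from_a, 0.30)  # B (30%) keeps it: subset of A's set
assert not decide(0.45, 0.30)      # B drops scores >= its 30% rate
```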

@jmacd
Contributor

jmacd commented Jun 9, 2020

Now I understand the meaning of "priority", I apologize for taking so long to get to this point.

There are a lot of ways to organize a sampling scheme. The ones I'm most familiar with allow the root in a trace to make the sampling decision, and all the children will respect it. In this case, the children don't need to know the probability.

Your example describes a coordinated sampling scheme, where it is important to know how the decision was made, for the purpose of coordination. Sending the proposed "priority" (hash(trace_id) / max_long) is one way to achieve that, but then all uses of this value have to follow this one scheme, right?

I believe elsewhere @yurishkuro has described how Jaeger conveys this information as two attributes: (1) the sampling probability, (2) the sampling technique. In your example, we could transmit the sample.probability as a value in the range [0,1] along with a description of the coordination scheme you've used, which in this case means a descriptor for "consistent hashing of trace_id". For example:

sample.probability=0.5
sample.scheme=cityhash(traceid)

The recipient must know the scheme, and then can apply the same hash to compute the priority value that you're interested in.

@Kanwaldeep
Author

Your example describes a coordinated sampling scheme, where it is important to know how the decision was made, for the purpose of coordination. Sending the proposed "priority" (hash(trace_id) / max_long) is one way to achieve that, but then all uses of this value have to follow this one scheme, right?

They could be configured not to use the passed-in sampling.priority. But yes, all the components use the output of the algorithm as calculated by the root span.

@Kanwaldeep
Author

The recipient must know the scheme, and then can apply the same hash to compute the priority value that you're interested in.

I will wait for @yurishkuro for more details on this. But passing sample.scheme would require all the downstream components to implement support for the various schemes in order to calculate the probability. My 2 cents: passing sampling.priority seems much more efficient for this use case.

@jmacd
Contributor

jmacd commented Jun 11, 2020

It looks like Jaeger uses another meaning for the word "priority", see here: #115 (comment)

@yurishkuro there is a question for you above.

@yurishkuro
Member

Not sure what the question is. Jaeger does not pass sampling parameters downstream. The decision is made once, represented by a single bit in the flags, and respected by all children. The sampler.type and sampler.param tags are reported with the span out of band.

@jmacd
Contributor

jmacd commented Jun 12, 2020

Thanks @yurishkuro, sorry it wasn't a proper question. I couldn't find information about sampler.type and sampler.param.

@Kanwaldeep I wonder if this use of a tracestate attribute even needs to be standardized. The system you are describing will have to be aware of this on either end, right? At the very least, I believe the word "priority" has caused considerable confusion and I wonder if there might be a better choice.

@Kanwaldeep
Author

@Kanwaldeep I wonder if this use of a tracestate attribute even needs to be standardized. The system you are describing will have to be aware of this on either end, right? At the very least, I believe the word "priority" has caused considerable confusion and I wonder if there might be a better choice.

I was thinking sampling.probability.output, as this would make it clear. By making this a standard, we make it easier for teams to adopt the OTel SDK if they are already using other SDKs.

@Kanwaldeep
Author

Based on discussion in Sampling WG meeting we are working on PR for:

  • Sampler should be able to add/modify keys on tracestate.
  • Plugins to have access to tracestate. This way, new samplers are not blocked by having to agree on the meanings of common terms like “priority.”

@Oberon00
Member

Oberon00 commented Jul 6, 2020

What are "plugins" here? You mean tracestate should be publicly accessible? I think it is.

@ericmustin

👋 Hope all is well. Just popping in from otel-js to echo what @lizthegrey offered here #107 (comment). This PR's approach, or something like otep-specification #570 would be tremendously helpful for vendor specific exporters/systems which rely on the sampled traces/spans themselves to generate metrics (like hits/latency/etc) rather than a separate metrics system. I won't pretend to have enough knowledge about the Spec to say whether this info should live on span.attribute or span.context.traceState, but just that having enough information to infer a span's sample.count/weight/etc at the exporter level would be quite useful for many practical use cases.

@Oberon00
Member

I won't pretend to have enough knowledge about the Spec to say whether this info should live on span.attribute or span.context.traceState

This is the crucial difference though. Samplers can already set normal attributes today. Is this enough for you? The key difference is that normal attributes are not propagated in-band, while traceState is (e.g. in the W3C tracestate HTTP header).

@ericmustin

This is the crucial difference though. Samplers can already set normal attributes today. Is this enough for you? The key difference is that normal attributes are not propagated in-band, while traceState is (e.g. in the W3C tracestate HTTP header).

@Oberon00 In-band means within the same process, right? Basically, would that attribute be propagated from the parent to a non-root span? It seems like at the moment (at least in practice) that is not the case, and that is what PR #570 is trying to address, no?

In-band propagation (if I understand the term correctly) is the main issue for the use case I have in mind, since currently the default ProbabilitySampler leaves exporters blind to the sampling decision on any downstream spans they receive.

@Oberon00
Member

Oberon00 commented Jul 17, 2020

In-band means, within the same process

No, it means in the same communication "band" that the process would also use without instrumentation to communicate with other processes. E.g. if a process normally makes an HTTP request to some other service, with in-band propagation, headers would be injected into that HTTP request.

But span attributes (as opposed to trace state) are not propagated at all (not in-process, not cross-process, not in-band; they are sent out-of-band to some backend with the exporter typically).

@ericmustin

@Oberon00 ah ok, gotcha, thank you for the clarification. So, at the most basic level, what matters most for me is in-process propagation. This seems more reasonably accomplished by PR #570. Does that help clarify things? TL;DR I should've commented on PR #570 instead, but I felt the use case described in the comments here was worth highlighting.

@lmolkova
Contributor

Closing in favor of #135.
I've tried to address all comments there; sorry for the lost history of comments...

@Kanwaldeep
Author

Kanwaldeep commented Aug 21, 2020

Closing as @lmolkova has started a new pull request for this proposal.

9 participants