Skip to content
This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Non-power-of-two consistent tail probability sampling in TraceState #226

Closed

Conversation

@jmacd jmacd requested review from a team March 2, 2023 08:46
text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
@jmacd
Copy link
Contributor Author

jmacd commented Mar 3, 2023

@oertl Thank you. I accept your suggestions and wonder, would you be interested in drafting blocks of replacement text for your scheme? I'll be glad to work out the pseudocode snippets from there, but your words for "hex/binary threshold equals number of spans dropped out of 2^56", for explaining the 52-bits vs 56 bits issue, and the use of greater-or-equal, I take it, for filtering. I would be glad to then work on writing a spec to deprecate p-value in favor of t-value, where as you say the threshold "0" equals "00000000000000" indicating all spans with 7-bytes >= "00000000000000".

@oertl
Copy link
Contributor

oertl commented Mar 6, 2023

As alternative to the t-value, the p-value definition could be extended. For example, p:11.a3d could be interpreted as 11 leading zeros followed by a 1 and the bits defined by the extra hex code "a3d". Remaining bits are filled by ones. In this case, the value would be

00000000000    1  1010      0011      1101       111111111111111111111111111111111111     = 0x001a3dffffffff = 28853590294527 

11 leading        hex "a"   hex "3"   hex "d"    32 one bits  to fill up to 56 bits    
zeros

The resulting value corresponds be the number of kept spans out of 2^56 subtracted by 1.

The example above would correspond to a sampling probability of (28853590294527 + 1)/2^56 = 0.000400424

With this definition p:0, p:1, p:2, etc. would correspond to power of two sampling probabilities as already defined.

@jmacd
Copy link
Contributor Author

jmacd commented Mar 6, 2023

p:11.a3d

By the way, in my prototype I used the syntax a3d-11 where you've written 11.a3d (b/c I read this as "a3d shifted 11"). In this draft, to simplify, I proposed to encode the 11 zeros (i.e., 00000000000a3d), but I agree that encoding the zero count separately is more compact for most probabilities.

As for whether we overload how to parse p of not, that's a separate question-- it would break existing uses of p-value. That's why I prefer to create a new variable, but since p is marked experimental I'm open to this change.


This proposes to extend that specification with support for 56-bit
precision sampling probability. This is seen as particularly
important for implementation of probabilistic tail samplers (e.g., in
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to elaborate a bit more on the motivation for this requirement of higher precision sampling probability.

text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
@kalyanaj
Copy link

kalyanaj commented Mar 9, 2023

fer to create a new variable, but since p is marked experimental I'm open to this change

If p-value is not used currently in any implementations (or if they are okay with a breaking change since it is still experimental as you call out), yes it looks like extending p-value to also encode non-powers-of-two sampling probabilities is a good idea by @oertl.

Reasoning: Since p-value and t-value are conceptually for the same purpose (to encode sampling probabilities), ideally it will be good to just extend the p-value concept - it will make it a bit easier to understand for folks who are already familiar with the purpose of p-value, rather than trying to think of t-value as a new concept.

@PeterF778
Copy link

With respect to obsoleting the r-value, I reported a new issue 3307 today. If we agree that the issue is real (I'm not entirely sure), the solution I proposed would make consistent probability samplers work differently if there's no r-value.

@jmacd
Copy link
Contributor Author

jmacd commented Apr 21, 2023

This was discussed in yesterday's Sampling SIG. Since the initial feedback, I had come to the following rough idea to use all the bits of the TraceID, to consistently decide how to interpolate between powers of two.

Considering an 8-bit example with 75% sampling:

  • t-value indicates the trace ID at which drops take place
  • 75% sampling corresponds with t=0x40, (i.e., decimal 64) indicating to drop 64 out of 256 traces
  • t=0x40 corresponds with adjusted count of 4/3, calculated as 0x100 / (0x100 - 0x40)
  • t=0x40 also can be deterministically mapped to p=0 or p=1 using trace-id randomness to interpolate, in the example there are 192 out of 256 traces being accepted, they just need to be split into p=0 and p=1 to preserve the expected value of adjusted count. A similar/related formula is needed to interpolate by a head sampler, which I realize we never published in the specification (an omission, I think, see where I implemented this function in the head sampler)

cc @oertl @PeterF778

@oertl
Copy link
Contributor

oertl commented Apr 22, 2023

@jmacd

This was discussed in yesterday's Sampling SIG. Since the initial feedback, I had come to the following rough idea to use all the bits of the TraceID, to consistently decide how to interpolate between powers of two.

Unfortunately, after rethinking this proposal, I have concluded that it would not allow correct estimation of trace quantities (e.g. estimating the number of traces touching one service A and another service B) as described in my paper. It would only work for span estimates (e.g. estimating the number of spans of service A). The proposal violates the basic assumption that the choice of the sampling probability is independent of the shared randomness (trace ID or p-value).

@PeterF778
Copy link

Let's take one step back.

The Consistent Probability Sampling schema already allows consistent head and tail sampling if the sampling probability is a power of 2. Now we want to extend it so that consistency (head and tail) is preserved even for non-power-of-2 probabilities.

It is obvious, I think, that we need to use the random bits of trace-id somehow, if we want to get consistent sampling with any given probability, meaning multiple instances of (head or tail) samplers making the same sampling decisions wrt spans belonging to the same trace. From the past discussions, it looks like we have two categories of possible extensions:

A. Allow (approximation of) any non-power-of-2 sampling probabilities in TraceState.

B. Continue to restrict the sampling probabilities in TraceState to power-of-2 values.

and we can always have

C. Drop Consistent Probability Sampling as it has been proposed and replace it with something else.

Now we could try to summarize the benefits and disadvantages of these approaches.
The way I see it, the benefits of A are:

  • with sampling probabilities like 75% or 5%, the metric estimates (span-to-metric pipeline) are a bit better (a smaller expected error) than when switching between power-of-2 probabilities
  • makes some folks happy that they can specify any probability for a single span

while the potential issues are:

  • possible non-integer adjusted counts
  • subsequent sampling steps (head+tail, for example) for the same span force us to multiply probabilities - this puts pressure on providing more precision to represent probabilities, or entails loss of precision
  • need to deal with a potentially large number of sampling probabilities, complicating metric calculation
  • longer strings required to encode probability encoding in TraceState (if we care about such things)

Respectively, approach B is the reverse of that, with the drawbacks being:

  • slightly larger expected error for metric estimates if the requested sampling rate was not a power of 2

and the advantages are:

  • adjusted counts are full integer numbers
  • sets of spans/traces can be re-sampled multiple times using the same mechanism and still encode the resulting adjusted count accurately
  • small number of possible span probabilities make processing easier

With respect to C, it is a very wide open field, but let's keep in mind that for expressing sampling probabilities we do not need high precision, we need wide range of values. That's why the logarithmic scale of r-values and p-values is so powerful and efficient.

If we try to replace that with a linear model based on 56 random bits in trace-id, we quickly run out of bits.

@kalyanaj
Copy link

kalyanaj commented May 4, 2023

Thanks @PeterF778 for consolidating the tradeoffs - overall the summary of the tradeoffs makes sense to me.

I didn't quite understand the below two points - can you please clarify/elaborate?

sets of spans/traces can be re-sampled multiple times using the same mechanism and still encode the resulting adjusted count accurately

and this final point about running out of bits:

If we try to replace that with a linear model based on 56 random bits in trace-id, we quickly run out of bits.

@jmacd
Copy link
Contributor Author

jmacd commented May 4, 2023

03f693c Contains an update based on recent discussions.

@jmacd
Copy link
Contributor Author

jmacd commented May 11, 2023

@kentquirk new draft:
df6b1d0

@PeterF778
Copy link

A very nice and clean proposal!
A few questions though: it says it is extending the r-value and p-value proposal - the p-value will be used to calculate the final adjusted count, but is the r-value used anywhere?
And an old question remains: why tail sampling requires explicit non-power-of-two sampling probabilities as opposed to (deterministic) interpolation between two adjacent power-of-twos? What do we gain or lose with either approach?

text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
text/trace/0226-sampling-random-traceids.md Outdated Show resolved Hide resolved
user's intended sampling probability without floating point conversion
loss.

## Prior art and alternatives

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Towards the end, we may want to call out that one benefit of the r-value based randomness was that it could be used to get consistent sampling across multiple traces (e.g., all traces started within a time window by a participant) - it would be good to call out that it should be possible to support it in the future as a complement to the current proposal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to use arbitrary sampling probabilities, we should not use the current definition of the r-value. It makes no sense to have different discretizations for the r-value (powers of two) and for the t-value (56-bit values). Therefore, the r-value should rather be a 14-digit hex value that overrides the random bits of the trace ID, if present. This way we could also handle traces where the random flag is not set in the trace context. If the flag is not set and there is also no r-value, we could require consistent samplers to set the r-value by generating a 56-bit random value.

Sampler's threshold, the span passes through with the current
sampler's t-value, otherwise the span is discarded.

## Examples

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add two more examples that shows how consistent probability sampling can be achieved across multiple participants.

Example 1:

  • Upstream participant samples at 10% probability (ot=t:0.1 is sent as part of tracestate)
  • Downstream participant does parent-based sampling. It uses the sampled flag to make the decision, gets the t-value from the parent context and emits it as part of its context (ot=t:0.1 is sent as part of tracestate to further downstream participants)

Example 2:

  • Upstream participant samples at 10% probability (ot=t:0.1 is sent as part of tracestate)
  • Downstream participant samples at 5% probability - it calculates a threshold based on its sampling rate and compares with the traceID last 7 bytes to make the sampling decision (ot=t:20 is sent as part of tracestate).
  • Downstream participant does parent-based sampling (uses the sampled flag to make the decision, gets the t-value from the parent context and emits it as part of its context)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples sound good to me! Will do.

Traces whose least-significant 56 bits form an unsigned value less
than 7205759403792794.

## T-value encoding for adjusted counts

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be good to define the mutation rules and propagation rules for t-value. E.g., something on the lines of:

  • if a participant is doing parent-based sampling, it should propagate the t-value from its parent.
  • if a participant is doing consistent probability sampling using its own sampling rate, it should mutate the t-value to set the new adjusted count / sampling rate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite answering your question, but I've prototyped open-telemetry/opentelemetry-collector-contrib#22058 with a different sort of answer to your question.

In this case referring to span data records, where there are multiple collectors in a pipeline. The first collector may sample at 1/10; when a subsequent collector samples at 1/20, the t-value of the selected spans will be updated. If the subsequent collector samples at 1/2, however, it is being less selective than the first collector, so it should not modify the t-value. That is to say that t-value adjusted counts should not fall and t-valued probabilities should not rise.

See the logic here: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/22058/files#diff-33f10350e2875f926dd2be6fc4c6bb88cfd8043cf6ac6d100295cf654771d90dR210-R219

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a problem with such sampling behavior. Let's assume that the previous collector in chain sampled all traces with errors with probability 1, and all remaining traces with 1/100. If the next collector in chain is configured with 1/10, it will not touch the healthy traces, but will decimate the traces with errors. So any stratified sampling logic must be known and repeated by all collectors in the pipeline. Even if we prohibit stratified sampling, to set up a collector sampling probability in any meaningful way we have to know the minimum sampling probability of all the preceding collectors.

@PeterF778
Copy link

PeterF778 commented May 15, 2023

I would like to see how to provide algorithms for tail sampling based on the t-value, for the following use cases:

  1. (static sampling) Given a set of N1 spans, I want to re-sample it to arrive at approximately N2 spans.
  2. (dynamic sampling) Given a stream of incoming spans, I want to re-sample it so I get at most X spans/s (approximately).

The potential challenge I see is that the algorithms may require multiplying t-values and/or sorting the spans according to their t-value.

@oertl
Copy link
Contributor

oertl commented May 16, 2023

@PeterF778

I would like to see how to provide algorithms for tail sampling based on the t-value, for the following use cases:

  1. (static sampling) Given a set of N1 spans, I want to re-sample it to arrive at approximately N2 spans.

I believe collecting N2 spans with smallest random value can be done in O(N1) time and O(N2) space without sorting. The idea is to have a threshold T (initial value 1) and a buffer of size N2*(1 + eps) with some constant eps > 0 that stores all spans with random values (interpreted as values from [0,1)) smaller than T. If the buffer is full, the span with the (N2+1)-th largest random value is chosen using Quickselect . This random value defines the new threshold T which has an expected value of E(T) = 1/(1 + eps). All spans with random values >= T get dropped. The remaining N2 spans are kept and put into the first part of the buffer. This compaction step takes O(N2*(1+eps)) time. After that, the buffer has again space for N2*eps new spans. We fill that with the next spans in the data stream having a random value smaller than the threshold T. Since the expected value of the threshold is 1/(1 + eps), it takes N2*eps*(1+eps) spans on average to fill the buffer again, when the next compaction takes place. The new threshold will have an expected value of E(T) = 1/(1+eps)^2. And so on.

Therefore we need

O(N2*(1+eps)) time to process the first N2*(1 + eps) spans
O(N2*(1+eps)) time to process the next N2*eps*(1+eps) spans
O(N2*(1+eps)) time to process the fnext N2*eps*(1+eps)^2 spans
...

In summary, it takes O(K*N2*(1+eps)) time to process

N2*(1 + eps) + N2*eps*(1+eps) + N2*eps*(1+eps)^2 + ... + N2*eps*(1+eps)^(K-1) = N2*(1+eps)^K

spans. The average processing costs per span are therefore

O(K*N2*(1+eps) / N2*(1+eps)^K) = O(K / (1+eps)^(K-1)) = O(1)

which is constant. I am not 100% sure yet what t-value needs to be assigned to the surviving spans to get unbiased counts. Probably it is sufficent to replace the current t-value, if it is larger, by the threshold T. This Adaptive Threshold Sampling
paper might be helpful in this context.

  1. (dynamic sampling) Given a stream of incoming spans, I want to re-sample it so I get at most X spans/s (approximately).

If there was no previous sampling stage, we need to estimate the incoming rate from spans in the past (e.g., with exponential smoothing). Using this estimate and the desired rate, we can calculate a sampling probability that is used as the threshold. If there were previous sampling stages, the random values are no longer uniformly distributed and the actual distribution must be considered when choosing the threshold. For example, to achieve a 50% reduction, the threshold would need to be set to the median of all random values. To obtain a sampling probability of p, one would have to set the threshold to the p-quantile. It might be possible to use techniques like Frugal Streaming to incrementally adjust the threshold.

@PeterF778
Copy link

I'm not sure I understood your algorithm for static sampling correctly, @oertl. Correct me, if I'm wrong, but it looks like the selection of spans to survive is affected heavily by their t-value, rather than the source of randomness (trace-id), which introduces a bias.
Suppose that the original set of N1 spans contains a subset S of spans that have been already heavily sampled, so their t-values are very low - much lower than the spans not belonging to set S. In that case, if the size of set S is smaller than N2, all elements of set S will survive the re-sampling process. IMHO, an unbiased algorithm should sample spans from set S roughly with the same chances of survival as those from outside set S.

@oertl
Copy link
Contributor

oertl commented May 16, 2023

IMHO, an unbiased algorithm should sample spans from set S roughly with the same chances of survival as those from outside set S.

It depends on what is meant by bias. My understanding is the following: The expected adjusted count is equal to 1. This is ensured by setting the adjusted count to the inverse of the sampling probability. This should be the case with the algorithm described above.

@PeterF778, I think your concern is that the algorithm balances the sampling probabilities of the previous sampling stages. Depending on what you want to estimate, this sampling strategy may or may not be beneficial. I haven't checked, but I think the algorithm is similar to VarOpt sampling (see here), which minimizes variance when estimating arbitrary subset sums. This algorithm makes sense if you have a sampling stage that combines samples collected in earlier sampling stages with different probabilities for no good reason, e.g., due to unbalanced load or short-term load fluctuations.

The situation is different if you have already identified certain classes of spans (e.g., spans with errors) that should be sampled with a higher probability. In this case, you want to sample these spans more frequently than others by purpose. I think the correct term for this is stratified sampling, where weights are defined for different classes of spans and the sampling algorithm tries to sample them so that the ratios of the weights are reflected by the corresponding ratios of the t-values. To complicate matters further, different stages of sampling would define different weights for the same span. Early stages may not be able to assess the importance of a span while later sampling stages have a more holistic view of the trace and therefore might assign a different weight to a span. For example, if the child span has an error.

Anyway, discussion of these sampling strategies takes us a bit off topic. The same issues must be solved when using power-of-two sampling probabilities.

@PeterF778
Copy link

@PeterF778, I think your concern is that the algorithm balances the sampling probabilities of the previous sampling stages. Depending on what you want to estimate, this sampling strategy may or may not be beneficial.

Yes, I imagine "static sampling" to be applied to aging data, after one or many stratified sampling steps were performed .

Anyway, discussion of these sampling strategies takes us a bit off topic. The same issues must be solved when using power-of-two sampling probabilities.

If we want to compare the two competing consistent probability sampling mechanisms, we have to understand what they will entail in all processing stages. I believe the re-sampling algorithms will be very similar in principle, but there could be differences in complexity or accuracy of the results.

@oertl
Copy link
Contributor

oertl commented May 17, 2023

If we want to compare the two competing consistent probability sampling mechanisms, we have to understand what they will entail in all processing stages. I believe the re-sampling algorithms will be very similar in principle, but there could be differences in complexity or accuracy of the results.

Yes, this is true. For example, for "dynamic sampling" it is probably much simpler to estimate quantiles if there is a power-of-two discretization as you could simply aggregate into a histogram as there are only a small number of relevant values. However, any sampling stage is free to use just power-of-two sampling probabilities, if it is more efficient. Even, if spans come with non-power-of-two t-values, they could be easily downsampled to the next power of two in a first step.

@jmacd
Copy link
Contributor Author

jmacd commented May 25, 2023

New developments discussed in the Sampling SIG today.

We proposed "s-value" as a mechanism to encode the accumulation of independent non-consistent sampling stage adjusted counts. t-value and s-value would be separate fields, both included in tracestate for consistency. Existing vendor-specific sampling probabilities with unknown-and-presumed-independent sampling mechanisms will encode probability or adjusted count (as with t-value encoding) using tracestate s-value.

In the coming week I will resolve the conversations above. The plan is to draft a proposed change to https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md and document/justify the changes in this OTEP. Most of the existing specification will be re-used.

@jmacd
Copy link
Contributor Author

jmacd commented Jun 1, 2023

I've updated the document making "r-value" optional (and 56-bits) and making "s-value" for independent non-consistent sampling. I've incorporated some helpful feedback from @kalyanaj. Thanks.

TraceState field. It would be appropriate to name this field
`LogState`.

This proposal does makes r-value an optional 56-bit number as opposed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: This proposal makes...


To calculate the Sampling threshold, we began with an IEEE-754
standard double-precision floating point number. With 52-bits of
significand and a floating exponent, the probability value used to
Copy link
Contributor

@oertl oertl Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 52-bits of significand...

Double-precision floating-point values have a 52-bit mantissa but are able to represent 53-bit significands (except for subnormal values). See https://cs.stackexchange.com/a/152267/102560.

value in big-endian byte order.
2. The sampling probability (range `[0x1p-56, 1]`) is multipled by
1. When the r-value is present and parses as a 56-bit random value,
use it, otherwise bytes 10-16 of the TraceID are interpreted as a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth specifying whether this counts from 0 or 1, or, even better, including an annotated traceID here, just for clarity.

1. When the r-value is present and parses as a 56-bit random value,
use it, otherwise bytes 10-16 of the TraceID are interpreted as a
56-bit random value in big-endian byte order
2. The sampling probability (range `[0x1p-56, 1]`) is multiplied by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most readers will be unfamiliar with floating point hex notation (I was) and this is probably needlessly terse. One way to express it would be (0, 1], but that also might be too confusing. Perhaps "greater than 0 and less than or equal to 1" or even 0 < n <= 1?

Similarly below, I might say 2^56 rather than using the hex notation.


### 10% probability sampling twice

The tracestate value `ot=s:0.01` corresponds with 10% sampling by one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe expand this to show how the tracestate would be modified at each stage?

@kentquirk
Copy link
Member

@jmacd , @kalyanaj , @oertl , @PeterF778

As promised in the SIG meeting, I have implemented a proposal for how to calculate threshold, sampling rate, and sampling probability in several different languages (Go, JavaScript, and Python).

The proposal is simple -- here's the Go version:

func threshold(tValue float64) int64 {
	const k = 0x1p+56
	if tValue < 1.0 {
		return int64(k*tValue + 0.5)
	}
	return int64(k/tValue + 0.5)
}

func samplingRate(tValue float64) int64 {
	if tValue < 1.0 {
		return int64(1.0/tValue + 0.5)
	}
	return int64(tValue + 0.5)
}

func samplingProbability(tValue float64) float64 {
	if tValue < 1.0 {
		return tValue
	}
	return 1.0 / tValue
}

All 3 languages seem to deliver identical results in the first 2 functions.

I was unable to convince all 3 languages to format floating point values identically (I probably could have by installing various libraries but I wanted to use the standard libraries). But they seemed to be identical through the first 14 digits.

Thoughts?

@oertl
Copy link
Contributor

oertl commented Jun 6, 2023

I wonder if we should prefer a canonical lossless (e.g. hex-encoded and only values less than or equal to 1) representation of the t-value that does not depend on platform- or language-dependent behavior (like the r-value). This would make encoding and parsing much simpler and less costly. For scenarios with multiple sampling stages, the t-value needs to be parsed and encoded frequently, and therefore this should be done as efficient as possible.

@kentquirk
Copy link
Member

I understand your point, but I'm personally much more concerned about human usability. What I see from users in real sampling configurations are sampling rates like 1 in 10, 1 in 6, and 1 in 1000.

rate hex value < 1
10 0x1.999999999999ap-4
6 0x1.5555555555555p-03
1000 0x1.0624dd2f1a9fcp-10

All of those representations lose the user's intent, while 10, 6, and 1000 do not.

There is also the problem that as far as I have found, many languages do not support hex representation in floating point. Or if they do, they do so differently.

I strongly feel that we should bias toward comprehensible and easily implementable human representations.

@oertl
Copy link
Contributor

oertl commented Jun 7, 2023

I understand your point, but I'm personally much more concerned about human usability. What I see from users in real sampling configurations are sampling rates like 1 in 10, 1 in 6, and 1 in 1000.

I also understand your point. But the limited number of bits simply does not allow to sample exactly 1 out of 3 consistently. In my opinion, it is more important to forward the actual sampling probability applied, not the user's intent. If you know that all your sampling configurations are of kind "1 out of X", it is relatively easy to reconstruct X from the applied sampling probability to make the user believe that it really was sampled as originally intended. Maybe an additional flag indicating that the reported t-value comes from a "1 out of X" rule could be a compromise.

In large systems, sampling probabilities are typically automatically chosen (e.g. based on rate limits), and it is more important that the parsing/encoding overhead is small.

There is also the problem that as far as I have found, many languages do not support hex representation in floating point. Or if they do, they do so differently.

It is easy to specify a lossless hex representation for the t-value. It could be defined as the integer threshold value used when comparing with the 56 random bits of the trace ID (or the optional r-value). If the t-value is an integer, it is straightforward to find a platform/language-independent hex representation. This definition would also reduce floating point operations as the sampling decision is simply the result of comparing the t-value with the random bits.

@jmacd
Copy link
Contributor Author

jmacd commented Jun 15, 2023

@jmacd
Copy link
Contributor Author

jmacd commented Sep 6, 2023

This OTEP has been well-replaced by #235. Thanks @kentquirk!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants