Expose trace sampling controls in the public API #375

Closed
hausdorff wants to merge 6 commits into envoyproxy:master from hausdorff:trace-sampling

hausdorff commented Dec 29, 2017

I'm basically completely new to Envoy, so I am probably missing some stuff you'd want to see in a PR, just let me know and I'll fix it up.

  • Didn't see any direct tests on the API objects that seemed like this would need to be added to, and the contributing doc didn't mention writing any, so there are none here.
  • Docs seem to be generated by CI, so I didn't generate any new ones.
  • I consulted HTTP connection manager tracing docs and general tracing docs so I think this change is semantically right, but it is highly likely that I've misunderstood something about how they work and what they mean, so you'll want to pay close attention there.

Additional questions I think I've answered, but might be worth double-checking:

  • Are these mutually exclusive options, and do special precautions need to be taken as such? I think they're not, and I think it's sufficient to simply plumb these settings through to the runtime inside Envoy core.
  • Who is responsible for default values? I believe it should be Envoy core, since the runtime values already have defaults, and it doesn't look like we're bundling in any fancy default logic here anyway.

Envoy core has an open issue[1] to expose the target trace sample
percentages as part of the API. This is exposed as part of the HTTP
connection manager runtime, but is not yet exposed as part of the API.
(See the issue for more details on those controls.)

This commit will introduce the 3 main runtime controls as part of the
core API.

  1. Tracing#client_enabled, which specifies the target percentage of
     requests to be force-traced. (Meant to map to HTTP connection
     manager's runtime tracing.client_enabled setting.)
  2. Tracing#global_enabled, which specifies the target percentage of
     requests to be traced after all rules and checks have been applied.
     (Meant to map to HTTP connection manager's runtime
     tracing.global_enabled setting.)
  3. Tracing#random_sampling, which specifies the number of requests to
     be randomly traced. (Meant to map to HTTP connection manager's
     runtime tracing.random_sampling setting.)

[1]: envoyproxy/envoy#1813

Signed-off-by: Alex Clemmer <clemmer.alexander@gmail.com>

@mattklein123
Member

@hausdorff thanks a bunch for working on this.

I think you actually want to add these options here https://github.com/envoyproxy/data-plane-api/blob/master/api/filter/network/http_connection_manager.proto#L67 which is per HTTP connection manager config. If you look at how they will actually be used, the function is here: https://github.com/envoyproxy/envoy/blob/master/source/common/tracing/http_tracer_impl.cc#L43. This ends up being called from the connection manager and will require the connection manager config getting plumbed through.

@hausdorff
Author

@mattklein123 Ok cool, thanks for the quick review, I'll update this PR later today.

Following up from Matt Klein's comments that they should be customizable
on a per-conn-manager basis.

Signed-off-by: Alex Clemmer <clemmer.alexander@gmail.com>

Member

htuch left a comment

This looks great, thanks.

// Global target percentage of requests that will be force traced if the
// x-client-trace-id header is set. Must be an integer number between 0 and
// 100.
uint32 client_enabled = 3 [(validate.rules).uint32.lte = 100];

Member

You will need to use wrapped types, i.e. google.protobuf.UInt32Value, to obtain the defaulting behavior specified in https://www.envoyproxy.io/docs/envoy/latest/configuration/http_conn_man/runtime.html?highlight=sample. This is because proto3 can't distinguish between non-specified scalar values and their null values (0 in the case of uint32).

Author

hausdorff Dec 30, 2017

Ok, so just so we're all clear, you're saying they need to be nullable so that we can detect when they're not set when we plumb it through to the runtime? (EDIT: I now see that they're not nullable, but they're null-check-able. Heh.)

Makes sense. Fixed.

Member

Yep.
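
To illustrate the point about wrapped types: a plain proto3 `uint32` reads as 0 whether it was set to 0 or left unset, while a wrapper such as `google.protobuf.UInt32Value` can be absent. A minimal Python sketch, modeling the absent wrapper as `None`; the function names and the default of 100 are illustrative assumptions, not Envoy code:

```python
from typing import Optional

# Hypothetical runtime default (in percent), used only for this illustration.
RUNTIME_DEFAULT_CLIENT_ENABLED = 100


def resolve_plain(value: int) -> int:
    """Plain proto3 scalar: an unset field reads as 0, so 0 is ambiguous.

    There is no way to tell "unset" from "explicitly 0" here, which means
    a non-zero default can never be applied safely.
    """
    return value


def resolve_wrapped(value: Optional[int]) -> int:
    """Wrapped value: None models an absent UInt32Value message."""
    if value is None:
        # Field was not specified: fall back to the runtime default.
        return RUNTIME_DEFAULT_CLIENT_ENABLED
    # Field was specified, possibly as an explicit 0.
    return value
```

With the wrapped form, `resolve_wrapped(None)` falls back to the default while `resolve_wrapped(0)` keeps the explicit zero; the plain form cannot make that distinction.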


// Global target percentage of requests that will be randomly traced.
// Specified as ten-thousandths of a percent (i.e., in 0.01% increments),
// using integer numbers in the range 0-10000.

Member

This is a problem in runtime as well, but I can't help but feel that we should have a standard practice for specifying percentages in the Envoy proto API. In some places, we use double, other places uint32, with different degrees of granularity. Ideally we would have a Percent message type in the API that would also capture the lte constraint.

@mattklein123 @wora for comment and possible addition to STYLE.md. I don't think there is anything actionable for this PR @hausdorff.

Member

mattklein123 Dec 30, 2017

+1. I'm sad we didn't do this in the original v2 conversion. Oh well. Might as well start now. Can we just add a common Percent message per @htuch? I would probably just use a double and clamp it between 0.0 and 1.0. Internally, we can convert to integer as needed (e.g., a number out of 10,000 or 100,000 or whatever).

Author

I didn't want to cause waves by having the API not match up with the implementation, but I'd also prefer wrapping a double in a Percent message type. So let's do that.

Fixed.
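
A rough sketch of the conversion discussed in this thread (a double clamped to 0.0-1.0, turned into an integer out of 10,000 or 100,000). The function name and the truncating behavior are assumptions for illustration, not Envoy's implementation:

```python
def fraction_to_numerator(fraction: float, denominator: int = 10_000) -> int:
    """Clamp a fraction to [0.0, 1.0] and express it as an integer numerator.

    The result is "N out of `denominator`", e.g. 2500 out of 10,000 for 0.25.
    """
    clamped = min(max(fraction, 0.0), 1.0)
    # int() truncates toward zero, so the result is rounded down.
    return int(clamped * denominator)
```

Because the multiplication happens in binary floating point, truncation can land one step low for values that are not exactly representable; rounding to nearest instead of truncating would be an alternative design choice.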

// Global target percentage of requests that will be traced after all other
// checks have been applied (force tracing, sampling, etc.). Must be a
// number between 0 and
// 100.

Member

Author

Fixed.

repeated string request_headers_for_tags = 2;

// Global target percentage of requests that will be force traced if the
// x-client-trace-id header is set. Must be an integer number between 0 and

Member

The convention we have for headers in docs is *x-client-trace-id*.

Author

Fixed.

// created if the specified header name is present in the request's headers.
repeated string request_headers_for_tags = 2;

// Global target percentage of requests that will be force traced if the

Member

Author

Originally I didn't because this link went stale in the original issue, but we can add it if you like.

Signed-off-by: Alex Clemmer <clemmer.alexander@gmail.com>

@hausdorff
Author

@mattklein123, @htuch Ok, so we've created a new message type, Percentage. I'll follow up this PR with another one that transitions all V2 percentages to use this, as well as updating STYLE.md.

BTW I took a guess about how we want to handle the resolution mismatch of using a float here and an integral type in runtime by noting in the comments that the percentages have a resolution of 1% or 0.01%, rounded down (I believe this is probably "close enough" to being true even if it elides the details of floating point math). I think there's room for discussion here about whether we want the runtime to use something other than integral types, but I'm not well versed enough in how that code works to have specific opinions yet.

Member

htuch left a comment

Great, thanks for the Percent cleanup. Unfortunately, we can't go and change existing uses of Percent variants where APIs are frozen, since this will break backwards compatibility. So, we can only use this in new fields or some of the experimental APIs. See https://www.envoyproxy.io/docs/envoy/latest/configuration/overview/v2_overview#status for what is frozen/experimental.

api/base.proto Outdated
}

message Percent {
float value = 5 [(validate.rules).float = {gte: 0, lte: 100}];

Member

float value = 1 would be more minimal. One other thing; I think @mattklein123 was suggesting we make this between 0.0 and 1.0.

Author

ah, sorry, I got lazy. Will fix.

api/base.proto Outdated
google.protobuf.Struct config = 2;
}

message Percent {

Member

Can you add some simple comment for docs purposes (e.g. explaining purpose, valid ranges).

Author

Yep, fixed.

// <https://www.envoyproxy.io/docs/envoy/latest/configuration/http_conn_man/runtime.html?highlight=sample>`_.
Percent global_enabled = 4;

// Global target percentage of requests that will be randomly traced. Must be a number between 0

Member

The range comments here can be elided if we add them to Percent.

Author

Fixed.

// Global target percentage of requests that will be randomly traced. Must be a number between 0
// and 100; resolution is to the nearest 0.01%, rounded down. Defaults to 100. This variable is
// a direct analog for the variable of the same name in the `HTTP Connection Manager
// <https://www.envoyproxy.io/docs/envoy/latest/configuration/http_conn_man/runtime.html?highlight=sample>`_.

Member

You can do :ref: style internal links to avoid relying on possible stale external links, grep around for how :ref: is used to see how.

Author

Fixed... I think

Signed-off-by: Alex Clemmer <clemmer.alexander@gmail.com>

Member

htuch left a comment

Thanks! You'll need to run the fix_format script as the check_format is failing before we can merge.

Signed-off-by: Alex Clemmer <clemmer.alexander@gmail.com>

@hausdorff
Author

@htuch ah, sorry, I thought I read in the style guide we're wrapping at 100 columns.

repeated string request_headers_for_tags = 2;

// Global target percentage of requests that will be force traced if the *x-client-trace-id*
// header is set. Percent is resolved to the nearest 1% (rounded down). Defaults to 1 (i.e.,

Member

For new things in the v2 API, I think we have the chance to normalize the gradients which will be much less confusing to people. I would recommend doing the following:

  1. Documenting on the Percent message itself that the float value will be normalized to 0.01% increments. (Internally we will multiply by 10,000).
  2. Then we can remove the rounding documentation from each of these such that it's inferred that everything is 0.01% increments.

For legacy code in which a Percent object is not defined, we can use the legacy runtime dividend. For code in which Percent is defined, we can use 10,000 as the dividend. I would recommend we also deprecate the old dividends that are not 100 in runtime and then just delete them in the next release with a release note.

I realize this is more work, but it will make the future situation much easier to understand.

How does this sound?

Author

hausdorff Dec 31, 2017

(I'm pretty easy-going about landing the patches correctly even if it's slower.)

I am ok with having all Percent objects resolve at 0.01% increments as long as @htuch agrees. But while it's up in the air, I'm a bit confused about the semantics.

What's the reason we're using an integral type instead of a float-ish type to represent stuff like sampling percentage targets? Is it just to force people to think in target percentages that are perceptibly different over a few thousand requests? If so then representing it as a float just seems like a weird tool, since we're letting people express things we won't execute on. (EDIT: Especially since it's possible to have something weird like 1.0000000001 or something, which I suspect would fail the bounds check above.)

And if not, then I'm not sure an arbitrary truncation horizon is clearly better than just letting good old IEEE 754 "solve" the problem incidentally by the way it specifies float semantics.

Member

There is no good historical reason that the config didn't always use floats. It's just the way it was done. Internally in the code, dealing with integers is substantially faster when computing randomness and random chance hits, so we want to be able to convert any float config into integers for computation.

I actually don't care if we want to make Percent very clearly an integer between 0 and 10,000 and explain that it will be converted to 0.01% increments internally. I just figured most users would prefer to work with a float. I'm fine either way.

@wora might have some thoughts on this as well. I would prefer to wait for him to chime in before we merge this so it will likely need to wait till early next week.

Member

P.S., the reason 10,000 is used is that random result can be inferred from 2 bytes of entropy which we make use of in a few places (UUID stable sampling comes to mind).
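
One way to picture this: two bytes give 65,536 possible values, enough to pick a bucket in [0, 10,000). The sketch below is an assumption about the general approach, not Envoy's actual code; `is_sampled` and the choice of the first two UUID bytes are hypothetical.

```python
import uuid


def is_sampled(request_id: uuid.UUID, numerator: int, denominator: int = 10_000) -> bool:
    """Decide sampling from two bytes of a request UUID.

    The same UUID always maps to the same bucket, so every hop that sees
    the same request ID reaches the same decision ("stable" sampling).
    """
    # Interpret the first two bytes as an unsigned 16-bit integer (0..65535).
    two_bytes = int.from_bytes(request_id.bytes[:2], "big")
    # Fold into [0, denominator). Note: 65536 is not a multiple of 10,000,
    # so the low buckets are very slightly more likely than the high ones.
    bucket = two_bytes % denominator
    return bucket < numerator


if __name__ == "__main__":
    rid = uuid.uuid4()
    # 250 out of 10,000 corresponds to a 2.5% target.
    print(rid, is_sampled(rid, 250))
```

The comparison is pure integer arithmetic, which matches the stated motivation for converting any float configuration into integers before the per-request check.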

Author

Agreed. There should be no rush on API changes!

Contributor

In general, we should avoid using float unless it is data size critical, such as colors. JSON cannot represent float, so you end up with float<->double conversions everywhere, which causes data loss.

Envoy is basic infrastructure, so I don't feel Envoy should enforce a minimum ratio of 0.01%. Why is this something Envoy needs to decide? Can we leave the problem to the operators?

If we do enforce a 0.01% ratio, then we should change Percent to use integers, so we don't need to explain the rounding problem, because integers force rounding anyway.
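
The data-loss concern can be seen by pushing a value through a 32-bit float and back, which is roughly what happens when a `float` field is serialized to JSON and read back as a double. A small Python sketch using the standard `struct` module; `through_float32` is a name made up for this example:

```python
import struct


def through_float32(x: float) -> float:
    """Round-trip a Python float (a 64-bit double) through 32-bit storage."""
    # Pack as a little-endian IEEE 754 single, then unpack back to a double.
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    for value in (0.5, 0.1, 33.33):
        # Values that are not exactly representable in 32 bits come back changed.
        print(value, through_float32(value), through_float32(value) == value)
```

Values such as 0.5 survive the round trip exactly, while 0.1 does not, so a configuration written as 0.1 would no longer compare equal after passing through a `float` field.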

Member

How about changing Percent to RoundedPercent so it's more clear to the user? We can also switch to double and make the range 0.0-100.0 per @wora. Then I think there will be no confusion that we are going to round internally for performance reasons.

Author

We don't need, and will likely never need, the extra resolution, but it's also not a big deal to just use double. Let's go with double.

api/base.proto Outdated
// <envoy_api_field_filter.network.HttpConnectionManager.tracing>`).
message Percent {
// The percent, a float between 0 and 1.
float value = 1 [(validate.rules).float = {gte: 0, lte: 1}];

Member

I would also say that I don't feel very strongly about 0.0-1.0 if people prefer 0.0-100.0.

Author

I prefer ~0.0-~1.0, which I think most people would say is more intuitive.

I would consider replacing 1.0 with some other number (e.g., 100) if we were planning on users needing that extra resolution between numbers. I do not get the sense they will, though.

Contributor

If the message is called percent, the value range should be [0, 100]. That is how most people interpret English words.

Member

SGTM, I think per above we should switch to RoundedPercent, double, 0.0-100.0

@wora
Contributor

wora commented Jan 2, 2018

The current design probably makes sense for Envoy developers. I don't think it is suitable for operators who want to use the feature. The design enforces too many restrictions and terminology on operators. IMO, tracing is not a mission critical feature like authz, so we should let operators control whatever they want. There is no need for us to enforce too many rules, such as the percent rounding.

// same name in the :ref:`HTTP Connection Manager <config_http_conn_man_runtime>`.
Percent global_enabled = 4;

// Global target percentage of requests that will be randomly traced. Percent is resolved to the

Contributor

The concept of Global is strange here. If I run a server, that server can be a standalone server, or a server in a cluster, or a zone, or a region, or many regions. What does "Global" mean here? Does my server have to talk to a global service to do the tracing?

Author

Nope, it's per HTTP connection manager (as I understand it). Fixed.

Percent global_enabled = 4;

// Global target percentage of requests that will be randomly traced. Percent is resolved to the
// nearest 0.01%, rounded down. Defaults to 1 (i.e., 100%). This variable is a direct analog for

Contributor

We really should get out of the rounding business. If I am running a memcache service with millions of requests per second, why do I have to follow such policy?

Member

The API docs are telling you what Envoy itself is going to do. We need to explain to users of the API what the effect of setting a specific value will have, so somewhere we need to be clear about how fine grained percentages/ratios will be interpreted in practice.

// applied (force tracing, sampling, etc.). Percent is resolved to the nearest 1% (rounded
// down). Defaults to 1 (i.e., 100%). This variable is a direct analog for the variable of the
// same name in the :ref:`HTTP Connection Manager <config_http_conn_man_runtime>`.
Percent global_enabled = 4;

Contributor

We should not design an API based on another API unless the concept is well known, such as country_code. From reading this comment, I don't understand what "global enabled" means to me, either as a developer or an operator.

Member

Yup let's clean this up. Let's just call this sampled to go with client_sampled above.

Author

Fixed.

// Connection Manager <config_http_conn_man_runtime>`.
Percent client_enabled = 3;

// Global target percentage of requests that will be traced after all other checks have been

Contributor

The concept of "after all other checks" is questionable here. In a distributed system, we can not assume there is a fixed order of checks. It doesn't seem proper to assume tracing has the privilege of being the last decider.

Author

My understanding is that this is per-HTTP connection manager, which is not distributed, so I think we should just get rid of the "global" and call that out specifically.


// Global target percentage of requests that will be traced after all other checks have been
// applied (force tracing, sampling, etc.). Percent is resolved to the nearest 1% (rounded
// down). Defaults to 1 (i.e., 100%). This variable is a direct analog for the variable of the

Contributor

My personal lesson with infrastructure design is that everything should default to 0. Don't assume everyone wants my feature no matter how much I love it myself.

If I add a new feature to a system, I cannot touch all existing customers, so the new feature must be default off. Instead of arguing whether a new feature should be default on, we should have a hard policy, everything is default off, and don't spend time debating on it. We never know what users want to do.

Member

Although I agree that default 0 would be better, that's not how the current code works and we can't change it. C'est la vie.

// header is set. Percent is resolved to the nearest 1% (rounded down). Defaults to 1 (i.e.,
// 100%). This variable is a direct analog for the variable of the same name in the :ref:`HTTP
// Connection Manager <config_http_conn_man_runtime>`.
Percent client_enabled = 3;

Contributor

The field name is very vague. It is more like "client_enabled_sampling".

Member

Agreed. This was an unfortunate internal name choice. Let's fix this in the config. I would probably just do client_sampling personally.

Author

Fixed.

@hausdorff
Author

hausdorff commented Jan 2, 2018

Ok, let me try to summarize current open issues, so they don't get lost:

  1. Docs and naming need work. [Clear fix, I'll do it today.]
  2. Decide whether sampling should be 0%, not 100%.
  3. Decide whether to do rounding in percentages.

I'll fix (1). For (2), I agree with @mattklein123 that it's probably too late to change the sampling defaults, since they're baked into HTTP Connection Manager's runtime. I'll update the code and mark this as resolved since I sense this is not up for debate. :)

So I think the only remaining issue is (3). I agree there is a tension between how percentages are expressed and how we represent them for performance reasons. I think, on balance, if we're not going to actually use RoundedPercent (whose proposed semantics I understand to be "truncate at 1/10k") everywhere as a "unified" Percent type, the utility of having it around is somewhat diminished. So at this point I would be inclined to just go with the unsigned integer we had before. But I'm also like a 6/10 in terms of how strongly I believe this. @mattklein123, @htuch, if you feel stronger than that we can do it your way.

@mattklein123
Member

Personally, I still feel there is value in a common RoundedPercent type message in which we are clear that it is 0-100% in 0.01% increments. This will be useful in a variety of different places in which we do % based selection. I think it's really a shame today that we have some places in which we effectively round to 1% and other places where we round to 0.01%. This is a chance to do better at least for new things. Curious to hear what others/@htuch think.

@hausdorff
Author

hausdorff commented Jan 3, 2018

@mattklein123, @htuch, @wora I've resolved 2 of the 3 open issues in the summary.

For the third, in the new commit, I'm proposing that we have HundredthsRoundedPercent that goes from [0, 100] rounded down to the nearest 0.01%, backed by a double. My rationale, after hearing all the input, is that I believe auto-rounding is much less error prone than asking for a number between 1 and 10k. (And if you don't believe normal users can mess that up, look in my last comment, where I, an ex-math major, claim that we "truncate at 1/10k".)

I will admit to being a little peeved that we're using a double for the ones-rounded percentages, but then again, if I wanted to work on something pretty, I would still be working in math.

Signed-off-by: Alex Clemmer <clemmer.alexander@gmail.com>

@wora
Contributor

wora commented Jan 3, 2018 via email

// traced if the *x-client-trace-id* header is set. Defaults to 100%. This variable is a direct
// analog for the variable of the same name in the :ref:`HTTP Connection Manager
// <config_http_conn_man_runtime>`.
double client_sampling = 3 [(validate.rules).double = {gte: 0, lte: 100}];

Member

I think we should do the following:

  • Add a universal Percent message, which enforces a [0..100] range and is backed by double.
  • Express locally, at each field, what the actual effective interpretation will be. This interpretation might change over time (e.g. it might be rounded to nearest 1% today, in later Envoy releases it might be 0.1%). I agree with @wora that folks supplying the API data should be able to express their intents as fine grained as they wish, but at the same time, users need to know what the limits at the implementation level are.
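
A sketch of how these two bullets could fit together: one range-checked `Percent` backed by a double, with each field choosing its own effective resolution. The class, the `effective` helper, and the use of `Decimal` for rounding down are illustrative assumptions, not the API that was merged:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Percent:
    """A percentage in [0, 100], stored as a double (hypothetical model)."""

    value: float

    def __post_init__(self) -> None:
        # Mirrors the intent of a {gte: 0, lte: 100} validation rule.
        # NaN fails both comparisons and is therefore rejected as well.
        if not 0.0 <= self.value <= 100.0:
            raise ValueError(f"percent out of range: {self.value!r}")


def effective(percent: Percent, resolution: str) -> Decimal:
    """Round a Percent down to a multiple of `resolution` ("1", "0.01", ...).

    Going through the float's shortest decimal repr avoids surprises such as
    floor(0.29 * 100) evaluating to 28 in binary floating point.
    """
    step = Decimal(resolution)
    exact = Decimal(repr(percent.value))
    # Decimal floor division truncates; values are non-negative here.
    return (exact // step) * step
```

Under this model, `effective(Percent(12.345), "1")` and `effective(Percent(12.345), "0.01")` give 12 and 12.34 respectively, so the same configured value can be interpreted at whatever granularity a given field documents, and that granularity can change later without touching the message definition.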

// after all other checks have been applied (force tracing, sampling, etc.). Defaults to 100%.
// This variable is a direct analog for the variable of the same name in the :ref:`HTTP
// Connection Manager <config_http_conn_man_runtime>`.
double sampling = 4 [(validate.rules).double = {gte: 0, lte: 100}];

Member

Can you start a STYLE.md in root with a single bullet point for now, which captures the point that @wora made about preferring double over float for JSON conversion reasons? This is something we should now start to be more consistent about, and it's one of these missing proto3 style guidelines that would be good to codify. Thanks!

@wora
Contributor

wora commented Jan 3, 2018

Let's not put hundredth percent into the message name. We may allow finer grained control in the future. We just document it in the comments.

This is a subtle trade-off we often need to make: code vs comment. It is much easier to add features and update the comment. We should codify the spec only if there is a strong need for it. In this case, I think it is just an optimization, so we don't need to codify it IMO.

Member

htuch left a comment

Removing approval until changes made to make PR status overview easier to understand.

@brian-pane
Contributor

@hausdorff FYI, in case it's useful, there's now a Percent message type in api/base.proto (added as part of #417)

@htuch
Member

htuch commented Jan 23, 2018

@hausdorff friendly ping.

@htuch
Member

htuch commented Jan 30, 2018

@hausdorff I'm going to close this out due to inactivity, feel free to reopen when you want to resume.

htuch closed this Jan 30, 2018

@douglas-reid
Contributor

@htuch I was just about to comment that I was really hoping this could go in, as we need it in Istio. Is the only thing left to clean up the PR description? Shall I duplicate the current state with a clearer description in a separate PR? How can I best move this forward?

@htuch
Member

htuch commented Jan 30, 2018

Yeah, it's good to go if we can update it to use the new Percent message. Happy to reopen or take another PR that continues this.

6 participants