Expose trace sampling controls in the public API #375
Changes from 4 commits
2c4373d
33e1cea
2ebd52a
5f4defe
6938a37
bec964a
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -227,3 +227,11 @@ message TransportSocket { | |
| // See the supported transport socket implementations for further documentation. | ||
| google.protobuf.Struct config = 2; | ||
| } | ||
|
|
||
| // Percent, typically used to specify things like target sampling percentages among tracing requests | ||
| // (as in, e.g., :ref:`HTTP Connection Manager tracing | ||
| // <envoy_api_field_filter.network.HttpConnectionManager.tracing>`). | ||
| message Percent { | ||
| // The percent, a float between 0 and 1. | ||
| float value = 1 [(validate.rules).float = {gte: 0, lte: 1}]; | ||
|
Member
I would also say that I don't feel very strongly about 0.0-1.0 if people prefer 0.0-100.0.
Author
I prefer ~0.0-~1.0, which I think most people would say is more intuitive. I would consider replacing 1.0 with some other number (e.g., 100) if we were planning on users needing that extra resolution between numbers. I do not get the sense they will, though.
Contributor
If the message is called percent, the value range should be [0, 100]. That is how most people interpret English words.
Member
SGTM, I think per above we should switch to |
||
| } | ||
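
To make the range discussion above concrete, here is a minimal Python sketch of the two conventions under debate. The function names are hypothetical and are not part of this PR or of Envoy; the sketch only shows that the choice between [0, 1] and [0, 100] changes validation bounds and a factor of 100, not the information carried.

```python
def ratio_to_percent(ratio: float) -> float:
    """Convert a value in the [0, 1] convention to the [0, 100] convention."""
    # Mirrors the {gte: 0, lte: 1} validation rule shown in the diff.
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"ratio out of range [0, 1]: {ratio}")
    return ratio * 100.0


def percent_to_ratio(percent: float) -> float:
    """Convert a value in the [0, 100] convention to the [0, 1] convention."""
    # Hypothetical bounds for the alternative range proposed in review.
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"percent out of range [0, 100]: {percent}")
    return percent / 100.0
```

A value such as 25 is valid under one convention and rejected under the other, which is why the field documentation has to state the range explicitly.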
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -80,6 +80,23 @@ message HttpConnectionManager { | |
| // populate the tag name, and the header value is used to populate the tag value. The tag is | ||
| // created if the specified header name is present in the request's headers. | ||
| repeated string request_headers_for_tags = 2; | ||
|
|
||
| // Global target percentage of requests that will be force traced if the *x-client-trace-id* | ||
| // header is set. Percent is resolved to the nearest 1% (rounded down). Defaults to 1 (i.e., 100%). This | ||
| // variable is a direct analog for the variable of the same name in the :ref:`HTTP Connection | ||
| // Manager <config_http_conn_man_runtime>`. | ||
| Percent client_enabled = 3; | ||
|
Contributor
The field name is very vague. It is more like "client_enabled_sampling".
Member
Agreed. This was an unfortunate internal name choice. Let's fix this in the config. I would probably just do
Author
Fixed. |
||
|
|
||
| // Global target percentage of requests that will be traced after all other checks have been | ||
|
Contributor
The concept of "after all other checks" is questionable here. In a distributed system, we cannot assume there is a fixed order of checks. It doesn't seem proper to assume tracing has the privilege of being the last decider.
Author
My understanding is that this is per-HTTP connection manager, which is not distributed, so I think we should just get rid of the "global" and call that out specifically. |
||
| // applied (force tracing, sampling, etc.). Percent is resolved to the nearest 1% (rounded | ||
| // down). Defaults to 1 (i.e., 100%). This variable is a direct analog for the variable of the same name in | ||
| // the :ref:`HTTP Connection Manager <config_http_conn_man_runtime>`. | ||
| Percent global_enabled = 4; | ||
|
Contributor
We should not design an API based on another API unless the concept is well known, such as country_code. From reading this comment, I don't understand what "global enabled" means to me, either as a developer or an operator.
Member
Yup let's clean this up. Let's just call this
Author
Fixed. |
||
|
|
||
| // Global target percentage of requests that will be randomly traced. Percent is resolved to the | ||
|
Contributor
The concept of Global is strange here. If I run a server, that server can be a standalone server, or a server in a cluster, or a zone, or a region, or many regions. What does "Global" mean here? Does my server have to talk to a global service to do the tracing?
Author
Nope, it's per HTTP connection manager (as I understand it). Fixed. |
||
| // nearest 0.01%, rounded down. Defaults to 1 (i.e., 100%). This variable is a direct analog for | ||
|
Contributor
We really should get out of the rounding business. If I am running a memcache service with millions of requests per second, why do I have to follow such a policy?
Member
The API docs are telling you what Envoy itself is going to do. We need to explain to users of the API what effect setting a specific value will have, so somewhere we need to be clear about how fine-grained percentages/ratios will be interpreted in practice. |
||
| // the variable of the same name in the :ref:`HTTP Connection Manager <config_http_conn_man_runtime>`. | ||
| Percent random_sampling = 5; | ||
| } | ||
|
|
||
| // Presence of the object defines whether the connection manager | ||
|
|
||
Can you add a simple comment for docs purposes (e.g., explaining purpose, valid ranges)?
Yep, fixed.
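
As an illustration of the rounding behavior debated in the review comments above ("resolved to the nearest 1% (rounded down)" versus "nearest 0.01%, rounded down"), the following Python sketch shows one way a configured percentage could be resolved to a fixed number of buckets and then used for a sampling decision. This is an assumption-laden sketch, not Envoy's implementation: `resolve_percent` and `should_trace` are hypothetical names, the input is assumed to be in the [0, 100] range, and the modulo-based decision is only one plausible scheme.

```python
import math


def resolve_percent(percent: float, denominator: int) -> int:
    """Round a [0, 100] percentage down to a whole number of buckets.

    denominator=100 gives 1% resolution; denominator=10000 gives 0.01%.
    Returns the numerator n such that the effective rate is n / denominator.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"percent out of range [0, 100]: {percent}")
    # Flooring discards any precision finer than one bucket. Binary floating
    # point can place a value slightly below a bucket boundary, so a value
    # that looks exact in decimal may resolve one bucket lower than expected.
    return math.floor(percent * denominator / 100)


def should_trace(random_value: int, numerator: int, denominator: int) -> bool:
    """Hypothetical decision: trace when the random value lands in the first
    `numerator` of `denominator` buckets."""
    return random_value % denominator < numerator
```

Under these assumptions, a configured 12.345% resolves to 12 buckets out of 100 at 1% resolution and to 1234 buckets out of 10000 at 0.01% resolution. The extra digits are silently dropped in both cases, which is the behavior the documentation needs to spell out for users.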