thrift proxy: add upstream_rq_time histograms to cluster metrics #15884
zuercher merged 5 commits into envoyproxy:main
Signed-off-by: William Fu <wfu@pinterest.com>
rgs1 left a comment:
One quick drive by comment, will do a more in-depth pass a bit later
Signed-off-by: William Fu <wfu@pinterest.com>
rgs1 left a comment:
Just a few more nits and comments, otherwise LGTM.
The filter also outputs MessageType statistics in the upstream cluster's stat scope.
The filter is also responsible for cluster-level statistics derived from routed clusters.
nit: derived from upstream clusters?
upstream_resp_reply_error_(stat_name_set_->add("upstream_resp_error")),
upstream_resp_exception_(stat_name_set_->add("upstream_resp_exception")),
upstream_resp_invalid_type_(stat_name_set_->add("upstream_resp_invalid_type")),
upstream_rq_time_(stat_name_set_->add("upstream_rq_time")), passthrough_supported_(false) {}
I wonder if we should just add the thrift. prefix to every one of these stats here, to avoid concatenating the prefix and the stat name at runtime. (I see it done for some of the TLS code.)
I see both ways being done in the repo
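For readers following the thread, the two options being compared can be sketched roughly as below. This is a minimal standalone illustration, not Envoy's stats API: `joinStatName`, `RuntimeJoinedStat`, and `PrefixedStat` are hypothetical names, and plain `std::string` stands in for whatever stat-name type the real code uses.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not part of Envoy): joins two stat-name segments with '.'.
inline std::string joinStatName(std::string_view prefix, std::string_view name) {
  std::string out;
  out.reserve(prefix.size() + 1 + name.size());
  out.append(prefix.data(), prefix.size());
  out.push_back('.');
  out.append(name.data(), name.size());
  return out;
}

// Option A: keep the bare name and join it with the prefix on every lookup.
struct RuntimeJoinedStat {
  std::string prefix_;
  std::string name_;
  // Builds a fresh string on each call.
  std::string fullName() const { return joinStatName(prefix_, name_); }
};

// Option B: bake the prefix into the name once, at construction time.
struct PrefixedStat {
  PrefixedStat(std::string_view prefix, std::string_view name)
      : full_name_(joinStatName(prefix, name)) {}
  // Returns the precomputed name without further allocation.
  const std::string& fullName() const { return full_name_; }
  std::string full_name_;
};
```

Both options produce the same full name; they differ only in whether the join happens once or on every use, which is the trade-off raised in the comment above.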
UpstreamRequest Signed-off-by: William Fu <wfu@pinterest.com>
zuercher left a comment:
Looks good. Let's note these new stats in the release notes (in the new features section), and I think we can merge.
Signed-off-by: William Fu <wfu@pinterest.com>
Thanks for your help @zuercher!
…oyproxy#15884)
In envoyproxy#15668 we began supporting cluster level metrics for the thrift_proxy connection manager. These metrics were limited to messageType counters only; now that these are working OK we are comfortable moving to cluster level histograms for upstream_rq_time.
Risk Level: Low
Testing: New unit tests
Docs Changes: future pr
Release Notes: updated
Signed-off-by: William Fu <wfu@pinterest.com>
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Make things match with what envoyproxy#15884 does. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
These new histograms should be useful for tracking large requests and responses in deployments using the Thrift Proxy. This is a follow-up to #15884.
Risk Level: low
Testing: tests added & updated.
Docs Changes: yes, the new histograms and the details of their behavior have been documented.
Release Notes: added.
Platform Specific Features: n/a
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Signed-off-by: William Fu wfu@pinterest.com
Commit Message:
In #15668 we began supporting cluster level metrics for the thrift_proxy connection manager. These metrics were limited to messageType counters only; now that these are working OK we are comfortable moving to cluster level histograms for upstream_rq_time.
Additional Description:
One interesting design choice here that surfaced was the naming convention for thrift-related cluster metrics. Mirroring off HTTP as much as possible, we adopted the upstream_ prefix, but to avoid conflating with the HTTP counterpart (in the event that we start supporting multi-protocol clusters) we opted to add thrift in the StatName.
Happy to revise this decision if desired.
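To make the naming choice concrete, here is a rough standalone sketch of recording an upstream request time into a cluster-scoped histogram under a thrift-qualified name. It does not use Envoy's real stats classes: `ClusterScope` and `recordUpstreamRqTime` are hypothetical, the full name `thrift.upstream_rq_time` is assumed from the prefix discussion in the review, and milliseconds are an assumed unit.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a cluster's stat scope; not Envoy's Stats::Scope.
class ClusterScope {
public:
  // Appends one sample to the named histogram, creating it on first use.
  void recordHistogramValue(const std::string& name, uint64_t value) {
    histograms_[name].push_back(value);
  }

  // Returns the samples recorded under `name`, or an empty list if none exist.
  const std::vector<uint64_t>& samples(const std::string& name) const {
    static const std::vector<uint64_t> empty;
    const auto it = histograms_.find(name);
    return it == histograms_.end() ? empty : it->second;
  }

private:
  std::map<std::string, std::vector<uint64_t>> histograms_;
};

// Records the elapsed time between sending the upstream request and completing
// the response. The stat name and the millisecond unit are assumptions.
inline void recordUpstreamRqTime(ClusterScope& scope,
                                 std::chrono::steady_clock::time_point start,
                                 std::chrono::steady_clock::time_point end) {
  const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
  scope.recordHistogramValue("thrift.upstream_rq_time", static_cast<uint64_t>(ms.count()));
}
```

Because the thrift segment is part of the name, a sample recorded this way would not land in a histogram named plain `upstream_rq_time`, which is the separation from the HTTP counterpart that the description argues for.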
Risk Level: Low, #15668 was successful
Testing: New unit tests
Docs Changes: Planning to update docs