Skip to content

thrift proxy: add upstream_rq_time histograms to cluster metrics#15884

Merged
zuercher merged 5 commits intoenvoyproxy:mainfrom
williamsfu99:histograms
Apr 14, 2021
Merged

thrift proxy: add upstream_rq_time histograms to cluster metrics#15884
zuercher merged 5 commits intoenvoyproxy:mainfrom
williamsfu99:histograms

Conversation

@williamsfu99
Copy link
Contributor

Signed-off-by: William Fu wfu@pinterest.com

Commit Message:
In #15668 we began supporting cluster level metrics for the thrift_proxy connection manager. These metrics were limited to messageType counters only; now that these are working OK we are comfortable moving to cluster level histograms for upstream_rq_time.

Additional Description:
One interesting design choice here that surfaced was the naming convention for thrift-related cluster metrics. Mirroring off HTTP as much as possible, we adopted the upstream_ prefix, but to avoid conflating with the HTTP counterpart (in the event that we start supporting multi-protocol clusters) we opted to add thrift in the StatName.

Happy to revise this decision if desired.

Risk Level: Low, #15668 was successful
Testing: New unit tests
Docs Changes: Planning to update docs

Signed-off-by: William Fu <wfu@pinterest.com>
@williamsfu99 williamsfu99 requested a review from zuercher as a code owner April 8, 2021 08:08
Copy link
Member

@rgs1 rgs1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One quick drive by comment, will do a more in-depth pass a bit later

Signed-off-by: William Fu <wfu@pinterest.com>
Copy link
Member

@rgs1 rgs1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more nits and comments, otherwise LGTM.



The filter also outputs MessageType statistics in the upstream cluster's stat scope.
The filter is also responsible for cluster-level statistics derived from routed clusters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: derived from upstream clusters?

upstream_resp_reply_error_(stat_name_set_->add("upstream_resp_error")),
upstream_resp_exception_(stat_name_set_->add("upstream_resp_exception")),
upstream_resp_invalid_type_(stat_name_set_->add("upstream_resp_invalid_type")),
upstream_rq_time_(stat_name_set_->add("upstream_rq_time")), passthrough_supported_(false) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just add the thrift. prefix to every one of these stats here, to avoid the runtime concat of these two stats... [ I see it done for some of the TLS code... ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see both ways being done in the repo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, np then.

William Fu added 2 commits April 9, 2021 01:08
Signed-off-by: William Fu <wfu@pinterest.com>
UpstreamRequest

Signed-off-by: William Fu <wfu@pinterest.com>
rgs1
rgs1 previously approved these changes Apr 12, 2021
Copy link
Member

@rgs1 rgs1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

Over to @zuercher.

Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Let's note these new stats in the release nodes (in the new features section), and I think we can merge.

Signed-off-by: William Fu <wfu@pinterest.com>
@zuercher zuercher merged commit dffcf01 into envoyproxy:main Apr 14, 2021
@williamsfu99
Copy link
Contributor Author

Thanks for your help @zuercher !

douglas-reid pushed a commit to douglas-reid/envoy that referenced this pull request Apr 19, 2021
…oyproxy#15884)

In envoyproxy#15668 we began supporting cluster level metrics for the thrift_proxy connection manager. These metrics were limited to messageType counters only; now that these are working OK we are comfortable moving to cluster level histograms for upstream_rq_time.

Risk Level: Low
Testing: New unit tests
Docs Changes: future pr
Release Notes: updated
Signed-off-by: William Fu <wfu@pinterest.com>
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Monkeyanator pushed a commit to Monkeyanator/envoy that referenced this pull request Apr 20, 2021
…oyproxy#15884)

In envoyproxy#15668 we began supporting cluster level metrics for the thrift_proxy connection manager. These metrics were limited to messageType counters only; now that these are working OK we are comfortable moving to cluster level histograms for upstream_rq_time.

Risk Level: Low
Testing: New unit tests
Docs Changes: future pr
Release Notes: updated
Signed-off-by: William Fu <wfu@pinterest.com>
rgs1 pushed a commit to rgs1/envoy that referenced this pull request May 17, 2021
Make things match with what envoyproxy#15884 does.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
zuercher pushed a commit that referenced this pull request May 24, 2021
These new histograms should be useful for tracking large requests and responses
in deployments using the Thrift Proxy. This is a follow-up to #15884.

Risk Level: low
Testing: tests added & updated.
Docs Changes: yes, the new histograms and the details of their behavior has been documented.
Release Notes: added.
Platform Specific Features: n/a

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
These new histograms should be useful for tracking large requests and responses
in deployments using the Thrift Proxy. This is a follow-up to envoyproxy#15884.

Risk Level: low
Testing: tests added & updated.
Docs Changes: yes, the new histograms and the details of their behavior has been documented.
Release Notes: added.
Platform Specific Features: n/a

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants