rpc, server: add DRPC metrics server interceptor #153651
Conversation
Force-pushed aefbab8 to c02a61f
Force-pushed c02a61f to 59818af
aa-joshi
left a comment
@aa-joshi reviewed 3 of 10 files at r1, 1 of 4 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cthumuluru-crdb, @dhartunian, and @shubhamdhama)
-- commits line 4 at r3:
Let's add a brief description of the changes.
-- commits line 10 at r3:
Let's include a release note, as this change is going to affect customers as well.
pkg/server/serve_mode.go line 60 at r3 (raw file):
// metrics so the metrics are only exported to external sources such as
// /_status/vars and DataDog.
var serverRPCRequestMetricsEnabled = settings.RegisterBoolSetting(
I see that we are deprecating the existing cluster setting and introducing a new one. I'm not really sure about this change, as existing customers might be affected by it. Let's confirm the expectation around renaming/changing cluster settings.
pkg/rpc/metrics.go line 483 at r3 (raw file):
RpcMethodLabel: info.FullMethod,
RpcStatusCodeLabel: code.String(),
RpcServerLabel: "grpc",
I would suggest adding it as a constant.
pkg/rpc/metrics.go line 500 at r3 (raw file):
return func(
    ctx context.Context, req interface{},
Let's use `any`.
pkg/rpc/metrics.go line 503 at r3 (raw file):
    rpc string, handler drpcmux.UnaryHandler,
) (interface{}, error) {
Let's use `any`.
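The suggestion above is purely cosmetic: since Go 1.18, `any` is a predeclared alias for `interface{}`, so the two spellings are interchangeable in a handler signature. A minimal sketch (the `unaryHandler` type below is a hypothetical stand-in, not the PR's `drpcmux.UnaryHandler`):

```go
package main

import "fmt"

// unaryHandler is a hypothetical handler signature using `any`, the
// predeclared alias for `interface{}` introduced in Go 1.18.
type unaryHandler func(rpc string, req any) (any, error)

func main() {
	// A value of this type can be declared with either spelling; `any`
	// is simply the shorter, idiomatic form.
	var h unaryHandler = func(rpc string, req any) (any, error) {
		return "handled " + rpc, nil
	}
	resp, err := h("/Heartbeat/Ping", nil)
	fmt.Println(resp, err)
}
```

Because `any` is an alias rather than a distinct type, switching a signature from `interface{}` to `any` is not a breaking change for callers.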
pkg/rpc/metrics.go line 521 at r3 (raw file):
RpcMethodLabel: rpc,
RpcStatusCodeLabel: code.String(),
RpcServerLabel: "drpc",
Let's add it as a constant.
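The reviewer's point is to lift the repeated `"grpc"`/`"drpc"` label values into named constants rather than scattering string literals across both interceptors. A small sketch under that assumption (the constant and label names here are illustrative, not the ones ultimately used in the PR):

```go
package main

import "fmt"

// Hypothetical label-value constants, so the gRPC and DRPC interceptors
// share one definition instead of repeating string literals.
const (
	serverLabelGRPC = "grpc"
	serverLabelDRPC = "drpc"
)

// requestLabels builds the label set attached to a request-duration
// sample; both interceptors can call it with their server constant.
func requestLabels(method, code, server string) map[string]string {
	return map[string]string{
		"rpc_method":      method,
		"rpc_status_code": code,
		"rpc_server":      server,
	}
}

func main() {
	labels := requestLabels("/Heartbeat/Ping", "OK", serverLabelDRPC)
	fmt.Println(labels["rpc_server"])
}
```

A constant also gives the compiler a single place to catch typos, which matters when the same literal must match across metric registration and emission sites.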
Nukitt
left a comment
TFTR!!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aa-joshi, @cthumuluru-crdb, @dhartunian, and @shubhamdhama)
Previously, aa-joshi (Akshay Joshi) wrote…
Let's add a brief description of the changes.
Done.
Previously, aa-joshi (Akshay Joshi) wrote…
Let's include a release note, as this change is going to affect customers as well.
Done.
pkg/rpc/metrics.go line 483 at r3 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
I would suggest adding it as a constant.
Done.
pkg/rpc/metrics.go line 500 at r3 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
Let's use `any`.
Done.
pkg/rpc/metrics.go line 503 at r3 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
Let's use `any`.
Done.
pkg/rpc/metrics.go line 521 at r3 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
Let's add it as a constant.
Done.
pkg/server/serve_mode.go line 60 at r3 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
I see that we are deprecating the existing cluster setting and introducing a new one. I'm not really sure about this change, as existing customers might be affected by it. Let's confirm the expectation around renaming/changing cluster settings.
Not going ahead with the setting deprecation; instead, added an alias for it. Customers shouldn't face any issues now.
Force-pushed 71447ae to 4116db7
aa-joshi
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @cthumuluru-crdb, @dhartunian, @Nukitt, and @shubhamdhama)
Previously, Nukitt (Nukit) wrote…
Done.
Let's be explicit about the alias that we are introducing in the cluster settings.
pkg/server/serve_mode.go line 65 at r4 (raw file):
"enables the collection of rpc metrics",
false,
settings.WithName("server.grpc.request_metrics.enabled"),
Should it be `WithName` or `WithRetiredName`?
cthumuluru-crdb
left a comment
@aa-joshi, @dhartunian Existing customer dashboards may not account for this new dimension. Adding it could distort the signals customers see today. Should we instead introduce a new metric and keep it internal until DRPC is released? Do you have any recommendations on how we should proceed with this change?
Nukitt
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aa-joshi, @cthumuluru-crdb, @dhartunian, and @shubhamdhama)
Previously, aa-joshi (Akshay Joshi) wrote…
Let's be explicit about the alias that we are introducing in the cluster settings.
Done.
pkg/server/serve_mode.go line 65 at r4 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
Should it be `WithName` or `WithRetiredName`?
`WithRetiredName` looks more appropriate; will go with that.
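The mechanism under discussion is renaming a cluster setting while keeping its old key resolvable as a retired alias, in the spirit of `settings.WithRetiredName` mentioned above. A simplified, self-contained model of that idea (this toy registry is an assumption for illustration, not CockroachDB's actual `settings` package):

```go
package main

import "fmt"

// registry is a toy model of a settings registry that supports a
// retired-name alias: lookups by the old key resolve to the renamed
// setting, so existing customers are unaffected by the rename.
type registry struct {
	values  map[string]bool
	retired map[string]string // old key -> current key
}

// register records a setting under its current name and, optionally,
// maps a retired (old) name onto it.
func (r *registry) register(name string, def bool, retiredName string) {
	r.values[name] = def
	if retiredName != "" {
		r.retired[retiredName] = name
	}
}

// lookup resolves retired names to the current key before reading.
func (r *registry) lookup(name string) (bool, bool) {
	if current, ok := r.retired[name]; ok {
		name = current
	}
	v, ok := r.values[name]
	return v, ok
}

func main() {
	r := &registry{values: map[string]bool{}, retired: map[string]string{}}
	// Keys taken from the PR discussion: the new rpc-wide key, with the
	// old grpc-specific key kept as the retired alias.
	r.register("server.rpc.request_metrics.enabled", false,
		"server.grpc.request_metrics.enabled")
	v, ok := r.lookup("server.grpc.request_metrics.enabled")
	fmt.Println(v, ok)
}
```

The design point is that a retired alias preserves backward compatibility for operators who still reference the old key, while new documentation and tooling use only the new one.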
Force-pushed 4116db7 to 541bb05
Based on the conversation in this Slack thread, we evaluated how to expose DRPC server request metrics alongside the existing gRPC server interceptor that records durations in `rpc_server_request_duration_nanos`.
Force-pushed 541bb05 to fb02340
cthumuluru-crdb
left a comment
LGTM! I have a few minor comments. Can you please take care of those before merging the PR? Regarding adding guidance, feel free to create a ticket to track that separately.
// collection of gRPC and DRPC request duration metrics. This uses export only
// metrics so the metrics are only exported to external sources such as
// /_status/vars and DataDog.
var serverRPCRequestMetricsEnabled = settings.RegisterBoolSetting(
`server.rpc.request_metrics.enabled` is not a public setting. This change looks good to me. There is some discussion around cluster setting deprecation here. Can you follow up, add the guidance for deprecating and renaming cluster settings, and publish it, please?
OK, will update this doc accordingly.
Metrics change detected

This PR adds or updates one or more CRDB metrics. If you want these metrics to be exported by CRDB Cloud clusters to internal CRL Datadog and/or included in the customer metric export integration (Essential metrics for standard deployment, and Essential metrics for advanced deployment), refer to the Installation and Usage guide of the CLI tool that syncs the metric mappings in managed-service. Run this CLI tool after your CRDB PR is merged.

Note: Your metric will appear in internal CRL Datadog only after the managed-service PR merges and the new OTel configuration rolls out to at least one cluster running a CRDB build that includes this metric. Docs: cockroach-metric-sync. Questions: reach out to @obs-india-prs
Force-pushed fb02340 to c561d21
@cthumuluru-crdb Correct, I have the same understanding; hence I'm ignoring the bot's message.
TFTR! bors r=cthumuluru-crdb
Previously, there was no support for a metrics interceptor in
DRPC, so capturing DRPC metrics wasn't possible.
In this PR, we add metrics server interceptor support in DRPC and
reuse the `rpc_server_request_duration_nanos` metric
for both gRPC and DRPC, since the metric reflects server-side
duration agnostic of transport, and interceptor chains are not expected
to differ materially in practice. When differentiation is needed, it
can be correlated with other metadata. This way we can observe both
gRPC and DRPC latency through interceptors.
Since this metric is gated behind a cluster setting, we also had to
change the setting's key to
`server.rpc.request_metrics.enabled`
so that it relates to both RPC frameworks, and set an alias with
its retired name
`server.grpc.request_metrics.enabled`.

Epic: CRDB-49359
Fixes: #144373
Release note: None