Skip to content

Conversation

@Nukitt
Copy link
Contributor

@Nukitt Nukitt commented Sep 17, 2025

Previously, there was no support for metrics interceptor in
DRPC, and capturing drpc metrics wouldn't be possible.

In this PR, we add metrics server interceptor support in DRPC and
reuse the rpc_server_request_duration_nanos metric
for both gRPC and DRPC since the metric reflects server-side
duration agnostic of transport and interceptor chains are not expected
to differ materially in practice. When differentiation is needed, it
can be correlated with other metadata. This way we can observe both
gRPC and DRPC latency through interceptors.

Since this metric is behind a cluster setting, we also had to
change the key of the metric to server.rpc.request_metrics.enabled
so that it can be related to both the rpc framework and set an alias
as its retired name server.grpc.request_metrics.enabled.

Epic: CRDB-49359
Fixes: #144373
Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Nukitt Nukitt force-pushed the metricsServerInterceptor branch from aefbab8 to c02a61f Compare September 17, 2025 11:40
@Nukitt Nukitt marked this pull request as ready for review September 17, 2025 13:58
@Nukitt Nukitt requested review from a team as code owners September 17, 2025 13:58
@Nukitt Nukitt requested review from cthumuluru-crdb, dhartunian and shubhamdhama and removed request for a team September 17, 2025 13:58
@Nukitt Nukitt force-pushed the metricsServerInterceptor branch from c02a61f to 59818af Compare September 18, 2025 15:19
@Nukitt Nukitt requested a review from aa-joshi September 18, 2025 16:04
Copy link
Contributor

@aa-joshi aa-joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aa-joshi reviewed 3 of 10 files at r1, 1 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cthumuluru-crdb, @dhartunian, and @shubhamdhama)


-- commits line 4 at r3:
Let's add the brief about the changes.


-- commits line 10 at r3:
Let's include the release notes as this change is going to affect customer as well


pkg/server/serve_mode.go line 60 at r3 (raw file):

// metrics so the metrics are only exported to external sources such as
// /_status/vars and DataDog.
var serverRPCRequestMetricsEnabled = settings.RegisterBoolSetting(

I see that we are deprecating the existing cluster settings and introducing the new one. I am not really sure about this change as existing customer might get affected due to this. Let's confirm the expectation about renaming/changing the cluster settings.


pkg/rpc/metrics.go line 483 at r3 (raw file):

			RpcMethodLabel:     info.FullMethod,
			RpcStatusCodeLabel: code.String(),
			RpcServerLabel:     "grpc",

I would suggest to add it as constant


pkg/rpc/metrics.go line 500 at r3 (raw file):

	return func(
		ctx context.Context,
		req interface{},

let's use any.


pkg/rpc/metrics.go line 503 at r3 (raw file):

		rpc string,
		handler drpcmux.UnaryHandler,
	) (interface{}, error) {

let's use any.


pkg/rpc/metrics.go line 521 at r3 (raw file):

			RpcMethodLabel:     rpc,
			RpcStatusCodeLabel: code.String(),
			RpcServerLabel:     "drpc",

Let's add it as constant

Copy link
Contributor Author

@Nukitt Nukitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aa-joshi, @cthumuluru-crdb, @dhartunian, and @shubhamdhama)


-- commits line 4 at r3:

Previously, aa-joshi (Akshay Joshi) wrote…

Let's add the brief about the changes.

Done.


-- commits line 10 at r3:

Previously, aa-joshi (Akshay Joshi) wrote…

Let's include the release notes as this change is going to affect customer as well

Done.


pkg/rpc/metrics.go line 483 at r3 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

I would suggest to add it as constant

Done.


pkg/rpc/metrics.go line 500 at r3 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

let's use any.

Done.


pkg/rpc/metrics.go line 503 at r3 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

let's use any.

Done.


pkg/rpc/metrics.go line 521 at r3 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

Let's add it as constant

Done.


pkg/server/serve_mode.go line 60 at r3 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

I see that we are deprecating the existing cluster settings and introducing the new one. I am not really sure about this change as existing customer might get affected due to this. Let's confirm the expectation about renaming/changing the cluster settings.

Not going ahead with the setting deprecation, instead added an alias for it. Customer shouldn't face any issues now.

@Nukitt Nukitt force-pushed the metricsServerInterceptor branch 2 times, most recently from 71447ae to 4116db7 Compare September 22, 2025 05:53
Copy link
Contributor

@aa-joshi aa-joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cthumuluru-crdb, @dhartunian, @Nukitt, and @shubhamdhama)


-- commits line 10 at r3:

Previously, Nukitt (Nukit) wrote…

Done.

Let's be explicit about the alias that we are introducing in the cluster settings.


pkg/server/serve_mode.go line 65 at r4 (raw file):

	"enables the collection of rpc metrics",
	false,
	settings.WithName("server.grpc.request_metrics.enabled"),

Should it be WithName or WithRetiredName?

Copy link
Contributor

@cthumuluru-crdb cthumuluru-crdb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aa-joshi, @dhartunian Existing customer dashboards may not account for this new dimension. Adding it could distort the signals customers see today. Should we instead introduce a new metric and keep it internal until DRPC is released. Do you have any recommendations on how we should proceed with this change?

Copy link
Contributor Author

@Nukitt Nukitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aa-joshi, @cthumuluru-crdb, @dhartunian, and @shubhamdhama)


-- commits line 10 at r3:

Previously, aa-joshi (Akshay Joshi) wrote…

Let's be explicit about the alias that we are introducing in the cluster settings.

Done.


pkg/server/serve_mode.go line 65 at r4 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

Should it be WithName or WithRetiredName?

WithRetiredName looks more appropriate, will go with that.

@Nukitt
Copy link
Contributor Author

Nukitt commented Oct 30, 2025

Based on the conversation on this slack thread, we evaluated how to expose DRPC server request metrics alongside the existing gRPC server interceptor that records durations in rpc.server.request.duration.nanos (a non-essential metric implemented with Prometheus vector metrics that are being deprecated in favor of AggMetrics) and is disabled by default behind server.grpc.request_metrics.enabled, collecting for methods prefixed with /cockroach.server and /cockroach.ts when enabled.
Adding a DRPC request-metrics interceptor prompted three options: (1) reuse the existing metric, (2) add a distinguishing label (e.g., rpc_server) to the metric, or (3) create a new DRPC-specific metric.
We are now moving ahead with the first option to reuse the existing metric for both gRPC and DRPC since the metric reflects server-side duration agnostic of transport and interceptor chains are not expected to differ materially in practice.
When differentiation is needed, we can correlate with other metadata; since the metric is for internal use, renaming or meaning adjustments are not to be worried about.

@Nukitt Nukitt force-pushed the metricsServerInterceptor branch from 541bb05 to fb02340 Compare October 31, 2025 03:46
Copy link
Contributor

@cthumuluru-crdb cthumuluru-crdb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I've a few minor comments. Can you please take care of those before pushing the PR? Regarding adding guidance feel free to create a ticket to track that separately.

// collection of gRPC and DRPC request duration metrics. This uses export only
// metrics so the metrics are only exported to external sources such as
// /_status/vars and DataDog.
var serverRPCRequestMetricsEnabled = settings.RegisterBoolSetting(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server.rpc.request_metrics.enabled is not a public setting. This change looks good to me. There is some discussion around cluster setting deprecation here. Can you followup, add the guidance for deprecating and renaming cluster settings and publish it please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok will update this doc accordingly.

@blathers-crl
Copy link

blathers-crl bot commented Nov 2, 2025

Metrics change detected

This PR adds or updates one or more CRDB metrics. If you want these metrics to be exported by CRDB Cloud clusters to Internal CRL Datadog and/or included in the customer metric export integration (Essential metrics for standard deployment, and Essential metrics for advanced deployment), refer to this Installation and Usage guide of a CLI tool that syncs the metric mappings in managed-service. Run this CLI tool after your CRDB PR is merged.

  • The CLI opens a PR in managed-service with the required config changes.
  • Please track that PR and ensure it merges so your metrics become available to CRDB Cloud clusters.

Note: Your metric will appear in Internal CRL Datadog only after the managed-service PR merges and the new OTel configuration rolls out to at least one cluster running a CRDB build that includes this metric.

Docs: cockroach-metric-sync

Questions: reach out to @obs-india-prs

Previously, there was no support for metrics interceptor in
DRPC, and capturing drpc metrics wouldn't be possible.

In this PR, we add metrics server interceptor support in DRPC and
reuse the `rpc_server_request_duration_nanos` metric
for both gRPC and DRPC since the metric reflects server-side
duration agnostic of transport and interceptor chains are not expected
to differ materially in practice. When differentiation is needed, it
can be correlated with other metadata. This way we can observe both
gRPC and DRPC latency through interceptors.

Since this metric is behind a cluster setting, we also had to
change the key of the metric to `server.rpc.request_metrics.enabled`
so that it can be related to both the rpc framework and set an alias
as its retired name `server.grpc.request_metrics.enabled`.

Epic: CRDB-49359
Fixes: cockroachdb#144373
Release note: None
@Nukitt Nukitt force-pushed the metricsServerInterceptor branch from fb02340 to c561d21 Compare November 10, 2025 05:17
@cthumuluru-crdb
Copy link
Contributor

@Nukitt, re: comment here, I don't think we are adding any new metrics right? We’re planning to reuse the existing ones, so these metrics should already be exported to Cloud’s internal Datadog if they’re intended to be.

@Nukitt
Copy link
Contributor Author

Nukitt commented Nov 10, 2025

@Nukitt, re: comment here, I don't think we are adding any new metrics right? We’re planning to reuse the existing ones, so these metrics should already be exported to Cloud’s internal Datadog if they’re intended to be.

@cthumuluru-crdb correct, I have the same understanding hence ignoring the bot's message.

@Nukitt
Copy link
Contributor Author

Nukitt commented Nov 10, 2025

TFTR!

bors r=cthumuluru-crdb

@craig
Copy link
Contributor

craig bot commented Nov 10, 2025

@craig craig bot merged commit 7281dd9 into cockroachdb:master Nov 10, 2025
24 checks passed
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Nov 20, 2025
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Nov 21, 2025
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Nov 21, 2025
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Dec 1, 2025
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Dec 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

server: implement metrics, filtering rules on server state interceptors in drpc

4 participants