rpc: add BenchmarkGRPCPing #136558

Merged
merged 1 commit into from
Dec 10, 2024

@tbg tbg commented Dec 3, 2024

This benchmarks simple unary requests across a gRPC server with
CockroachDB-specific settings (snappy compression, etc).

The benchmarks show a problematic number of allocations, especially
at small payload sizes, which is likely a common case in CRDB.

We also see that it's highly beneficial to reuse RPC streams, i.e.
to use bidirectional streaming RPCs to implement unary request-response RPCs.
This is because gRPC implements unary RPCs using one-off streams, which
incurs high overhead. I would liken it to CockroachDB using DistSQL for point
lookups.
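
To make the stream-reuse idea concrete, here is a minimal sketch of a unary-looking call layered on one long-lived bidirectional stream. It is not the code from this PR: `pingStream`, `streamCaller`, and `echoStream` are hypothetical names, and the in-memory fake stands in for a real gRPC stream so that the example runs with only the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// pingStream is a hypothetical stand-in for a generated gRPC
// bidirectional-stream client; only Send and Recv are modeled.
type pingStream interface {
	Send(req []byte) error
	Recv() ([]byte, error)
}

// streamCaller exposes a unary-looking Call on top of one long-lived stream.
type streamCaller struct {
	mu     sync.Mutex // serializes request/response pairs on the stream
	stream pingStream
}

// Call sends one request and waits for its response. Holding the mutex keeps
// responses matched to requests, at the cost of one in-flight call per stream.
func (c *streamCaller) Call(req []byte) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err := c.stream.Send(req); err != nil {
		return nil, err
	}
	return c.stream.Recv()
}

// echoStream is an in-memory fake that echoes requests back in order.
type echoStream struct {
	pending [][]byte
}

func (s *echoStream) Send(req []byte) error {
	s.pending = append(s.pending, append([]byte(nil), req...))
	return nil
}

func (s *echoStream) Recv() ([]byte, error) {
	if len(s.pending) == 0 {
		return nil, errors.New("no pending request")
	}
	resp := s.pending[0]
	s.pending = s.pending[1:]
	return resp, nil
}

func main() {
	// The stream is created once and reused for every call.
	c := &streamCaller{stream: &echoStream{}}
	for _, msg := range []string{"ping-1", "ping-2"} {
		resp, err := c.Call([]byte(msg))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s\n", resp)
	}
}
```

The mutex is the simplest way to pair responses with requests; a real implementation would more likely keep a pool of streams so that concurrent callers do not queue behind each other.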

Results

./dev bench --timeout 1h --ignore-cache --stream-output --bench-mem ./pkg/rpc --filter BenchmarkGRPCPing --test-args '-test.benchtime=5s -test.cpu=8 -test.count 7 -test.timeout=1h' 2>&1 | tee -a bench.txt
benchstat -col /rpc bench.txt
                         │  UnaryUnary  │            StreamStream            │
                         │    sec/op    │   sec/op     vs base               │
GRPCPing/bytes=______1-8   117.23µ ± 6%   63.78µ ± 6%  -45.60% (p=0.001 n=7)
GRPCPing/bytes=____256-8   119.66µ ± 2%   69.45µ ± 4%  -41.96% (p=0.001 n=7)
GRPCPing/bytes=___1024-8   118.16µ ± 5%   65.98µ ± 2%  -44.16% (p=0.001 n=7)
GRPCPing/bytes=___2048-8   127.25µ ± 3%   79.78µ ± 6%  -37.31% (p=0.001 n=7)
GRPCPing/bytes=___4096-8   141.04µ ± 1%   86.34µ ± 4%  -38.78% (p=0.001 n=7)
GRPCPing/bytes=___8192-8    174.0µ ± 4%   117.8µ ± 3%  -32.31% (p=0.001 n=7)
GRPCPing/bytes=__16384-8    241.6µ ± 2%   180.7µ ± 5%  -25.21% (p=0.001 n=7)
GRPCPing/bytes=__32768-8    349.8µ ± 2%   296.5µ ± 2%  -15.25% (p=0.001 n=7)
GRPCPing/bytes=__65536-8    495.1µ ± 2%   428.3µ ± 5%  -13.49% (p=0.001 n=7)
GRPCPing/bytes=_262144-8    1.418m ± 4%   1.386m ± 3%        ~ (p=0.053 n=7)
GRPCPing/bytes=1048576-8    6.002m ± 6%   5.885m ± 2%        ~ (p=0.259 n=7)
geomean                     301.1µ        214.6µ       -28.74%

                         │  UnaryUnary  │             StreamStream             │
                         │     B/s      │      B/s       vs base               │
GRPCPing/bytes=______1-8   400.4Ki ± 5%    732.4Ki ± 5%  +82.93% (p=0.001 n=7)
GRPCPing/bytes=____256-8   4.463Mi ± 2%    7.687Mi ± 4%  +72.22% (p=0.001 n=7)
GRPCPing/bytes=___1024-8   16.92Mi ± 5%    30.30Mi ± 2%  +79.09% (p=0.001 n=7)
GRPCPing/bytes=___2048-8   31.06Mi ± 3%    49.54Mi ± 6%  +59.50% (p=0.001 n=7)
GRPCPing/bytes=___4096-8   55.71Mi ± 1%    91.02Mi ± 5%  +63.37% (p=0.001 n=7)
GRPCPing/bytes=___8192-8   90.06Mi ± 4%   133.04Mi ± 3%  +47.73% (p=0.001 n=7)
GRPCPing/bytes=__16384-8   129.5Mi ± 2%    173.2Mi ± 5%  +33.70% (p=0.001 n=7)
GRPCPing/bytes=__32768-8   178.8Mi ± 2%    211.0Mi ± 2%  +18.00% (p=0.001 n=7)
GRPCPing/bytes=__65536-8   252.6Mi ± 2%    292.0Mi ± 5%  +15.60% (p=0.001 n=7)
GRPCPing/bytes=_262144-8   352.6Mi ± 4%    360.9Mi ± 4%        ~ (p=0.053 n=7)
GRPCPing/bytes=1048576-8   333.2Mi ± 6%    339.9Mi ± 2%        ~ (p=0.259 n=7)
geomean                    48.06Mi         67.42Mi       +40.27%

                         │  UnaryUnary   │            StreamStream             │
                         │     B/op      │     B/op      vs base               │
GRPCPing/bytes=______1-8   14.992Ki ± 2%   3.633Ki ± 2%  -75.77% (p=0.001 n=7)
GRPCPing/bytes=____256-8   17.512Ki ± 3%   6.460Ki ± 1%  -63.11% (p=0.001 n=7)
GRPCPing/bytes=___1024-8    26.63Ki ± 1%   15.67Ki ± 1%  -41.18% (p=0.001 n=7)
GRPCPing/bytes=___2048-8    38.52Ki ± 1%   27.67Ki ± 1%  -28.17% (p=0.001 n=7)
GRPCPing/bytes=___4096-8    65.89Ki ± 1%   54.97Ki ± 0%  -16.57% (p=0.001 n=7)
GRPCPing/bytes=___8192-8    116.9Ki ± 1%   105.4Ki ± 1%   -9.88% (p=0.001 n=7)
GRPCPing/bytes=__16384-8    215.6Ki ± 0%   203.5Ki ± 1%   -5.61% (p=0.001 n=7)
GRPCPing/bytes=__32768-8    459.8Ki ± 1%   444.5Ki ± 1%   -3.33% (p=0.001 n=7)
GRPCPing/bytes=__65536-8    808.4Ki ± 1%   791.8Ki ± 1%   -2.05% (p=0.001 n=7)
GRPCPing/bytes=_262144-8    3.289Mi ± 0%   3.276Mi ± 0%   -0.40% (p=0.017 n=7)
GRPCPing/bytes=1048576-8    12.94Mi ± 0%   12.94Mi ± 0%        ~ (p=0.053 n=7)
geomean                     182.4Ki        130.5Ki       -28.42%

                         │ UnaryUnary  │           StreamStream            │
                         │  allocs/op  │ allocs/op   vs base               │
GRPCPing/bytes=______1-8   182.00 ± 0%   46.00 ± 0%  -74.73% (p=0.001 n=7)
GRPCPing/bytes=____256-8   182.00 ± 0%   46.00 ± 0%  -74.73% (p=0.001 n=7)
GRPCPing/bytes=___1024-8   182.00 ± 0%   47.00 ± 0%  -74.18% (p=0.001 n=7)
GRPCPing/bytes=___2048-8   183.00 ± 0%   48.00 ± 2%  -73.77% (p=0.001 n=7)
GRPCPing/bytes=___4096-8   183.00 ± 0%   48.00 ± 0%  -73.77% (p=0.001 n=7)
GRPCPing/bytes=___8192-8   183.00 ± 0%   50.00 ± 2%  -72.68% (p=0.001 n=7)
GRPCPing/bytes=__16384-8   194.00 ± 0%   55.00 ± 2%  -71.65% (p=0.001 n=7)
GRPCPing/bytes=__32768-8   209.00 ± 0%   72.00 ± 1%  -65.55% (p=0.001 n=7)
GRPCPing/bytes=__65536-8   216.00 ± 0%   80.00 ± 1%  -62.96% (p=0.001 n=7)
GRPCPing/bytes=_262144-8    275.0 ± 0%   141.0 ± 1%  -48.73% (p=0.001 n=7)
GRPCPing/bytes=1048576-8    487.0 ± 1%   350.0 ± 3%  -28.13% (p=0.001 n=7)
geomean                     214.1        69.37       -67.60%
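
For reading the `vs base` columns: benchstat reports the relative change of the second column against the first. The sketch below recomputes that figure for the `bytes=1` row of the sec/op table from the rounded medians printed above, so the result can differ from benchstat's in the last digit, since benchstat works from unrounded values. `relDelta` is a hypothetical helper name.

```go
package main

import "fmt"

// relDelta returns the relative change from base to other, in percent.
func relDelta(base, other float64) float64 {
	return (other - base) / base * 100
}

func main() {
	// Rounded medians for bytes=1 from the sec/op table, in microseconds.
	unary, stream := 117.23, 63.78
	fmt.Printf("%.2f%%\n", relDelta(unary, stream))
}
```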

Informs #134971.

@tbg tbg marked this pull request as ready for review December 3, 2024 08:54
@tbg tbg requested a review from a team as a code owner December 3, 2024 08:54
@cthumuluru-crdb cthumuluru-crdb left a comment

LGTM

@nvanbenschoten nvanbenschoten left a comment

:lgtm:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/rpc/context_test.go line 2272 at r1 (raw file):

					return anyresp, nil
				},
				SU: func(srv grpcutils.GRPCTest_StreamUnaryServer) error {

Does the benchmark use this one-directional stream?


pkg/rpc/context_test.go line 2302 at r1 (raw file):

			cliRPCCtx := newTestContext(uuid.MakeV4(), clock, maxOffset, stopper)
			cliRPCCtx.NodeID.Set(ctx, 2)
			cc, err := cliRPCCtx.grpcDialRaw(ctx, remoteAddr, DefaultClass)

I'm just now finding that the sysbench microbenchmarks exercise the loopbackTransport, which means that they bypass the TCP stack and disable snappy compression. Is that the case here as well?


pkg/rpc/context_test.go line 2342 at r1 (raw file):

					b.SetBytes(int64(req.Size() + resp.Size()))
					b.ResetTimer()
					defer b.StopTimer()

nit: is this needed? Is there some cleanup that this is guarding against measuring?

@tbg tbg left a comment

TFTRS!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cthumuluru-crdb and @nvanbenschoten)


pkg/rpc/context_test.go line 2272 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does the benchmark use this one-directional stream?

No, removing.


pkg/rpc/context_test.go line 2302 at r1 (raw file):

I'm just now finding that the sysbench microbenchmarks exercise the loopbackTransport

I assume you'll do something about that, right? I think I saw the effect of this when I looked at some profiles and the only marshaling/unmarshaling I could see was for the raft transport. Though I'm confused why this wouldn't have been elided as well: possibly it just happens to not call into a code path that has access to the local transport?

Either way, we don't hit that here. The loopback transport is only used when we are using an rpcContext to connect to its own AdvertiseAddr:

/pkg/rpc/context.go#L1351-L1354

	transport := tcpTransport
	if rpcCtx.ContextOptions.AdvertiseAddr == target && !rpcCtx.ClientOnly {
		// See the explanation on loopbackDialFn for an explanation about this.
		transport = loopbackTransport

but here we are starting a grpc server at a random unused port. I also verified this (unscientifically) with a printf.
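
As an illustration of why a server on a random port does not hit the loopback path, here is a minimal sketch under stated assumptions. `chooseTransport` is a hypothetical function that only mirrors the condition in the excerpt above, the string return values are placeholders for the real transport types, and `client.example:26257` is a made-up advertise address.

```go
package main

import (
	"fmt"
	"net"
)

// chooseTransport mirrors the quoted condition with hypothetical names:
// loopback is picked only when dialing one's own advertised address.
func chooseTransport(advertiseAddr, target string, clientOnly bool) string {
	if advertiseAddr == target && !clientOnly {
		return "loopback"
	}
	return "tcp"
}

func main() {
	// Port 0 asks the OS for an unused port, as a test server would.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	target := ln.Addr().String()

	// The client's advertise address differs from the server's address,
	// so the condition is false and the TCP transport is chosen.
	fmt.Println(chooseTransport("client.example:26257", target, false))

	// Dialing one's own advertised address selects the loopback transport.
	fmt.Println(chooseTransport(target, target, false))
}
```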


pkg/rpc/context_test.go line 2342 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: is this needed? Is there some cleanup that this is guarding against measuring?

No, this was left over from an earlier version of the benchmark. Removing.
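
For context on the timer calls, here is a minimal sketch of the pattern, assuming a trivial in-memory copy in place of the gRPC round trip. `benchEcho` and `runEcho` are hypothetical names. `b.ResetTimer` excludes setup from the measurement, while the `testing` package stops the timer itself once the benchmark function returns, so a deferred `b.StopTimer` only matters if expensive cleanup runs inside the function.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// benchEcho is a stand-in for the ping round trip: it copies the payload.
func benchEcho(b *testing.B, size int) {
	payload := bytes.Repeat([]byte("x"), size) // setup, excluded below
	buf := make([]byte, size)

	b.SetBytes(int64(2 * size)) // request plus response bytes per op
	b.ResetTimer()              // drop setup time from the measurement
	for i := 0; i < b.N; i++ {
		copy(buf, payload)
	}
	// No b.StopTimer here: nothing expensive runs after the loop.
}

// runEcho runs benchEcho outside of `go test` and returns the result.
func runEcho(size int) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) { benchEcho(b, size) })
}

func main() {
	for _, size := range []int{1, 1024} {
		fmt.Printf("bytes=%d: %s\n", size, runEcho(size))
	}
}
```

The `b.SetBytes` call is what lets the benchmark output include a throughput figure alongside the time per operation.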

@craig craig bot merged commit bb4d357 into cockroachdb:master Dec 10, 2024
21 of 22 checks passed