otelgrpc: optimize stats handler InPayload and OutPayload#8035
Conversation
This was inspired by open-telemetry#7186 and profiles I noticed in my app. Also thanks to @boekkooi-impossiblecloud for writing the benchmarks.

InPayload and OutPayload blocks would create a new attribute set for each message. This is particularly bad for streaming calls with thousands of messages. Creating the set once improves the speed and memory allocations of your benchmarks, as well as mine.

```
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
cpu: AMD Ryzen 9 7900X 12-Core Processor
                                           │  otelbase   │      otelcached       │
                                           │   sec/op    │   sec/op     vs base  │
ServerHandler_HandleRPC_Begin-24             42.30n ±  3%   42.58n ±  5%       ~ (p=0.710 n=7)
ServerHandler_HandleRPC_InPayload-24         463.1n ±  6%   351.2n ±  5%  -24.16% (p=0.001 n=7)
ServerHandler_HandleRPC_OutPayload-24        461.1n ±  2%   354.2n ±  9%  -23.18% (p=0.001 n=7)
ServerHandler_HandleRPC_OutTrailer-24        42.33n ±  2%   42.28n ±  5%       ~ (p=0.710 n=7)
ServerHandler_HandleRPC_OutHeader-24         231.7n ±  2%   230.7n ±  5%       ~ (p=0.929 n=7)
ServerHandler_HandleRPC_End-24               234.5n ±  6%   149.3n ±  1%  -36.33% (p=0.001 n=7)
ServerHandler_HandleRPC_Nil-24               41.91n ±  3%   41.79n ±  1%       ~ (p=0.594 n=7)
ServerHandler_HandleRPC_NoOp_Begin-24        42.59n ± 10%   42.30n ±  6%       ~ (p=0.318 n=7)
ServerHandler_HandleRPC_NoOp_InPayload-24   167.60n ±  6%   72.09n ±  8%  -56.99% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutPayload-24  170.40n ±  3%   71.80n ±  1%  -57.86% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutTrailer-24   42.30n ±  2%   42.19n ±  2%       ~ (p=0.318 n=7)
ServerHandler_HandleRPC_NoOp_OutHeader-24    44.60n ±  4%   43.85n ±  2%   -1.68% (p=0.038 n=7)
ServerHandler_HandleRPC_NoOp_End-24          230.1n ±  4%   147.7n ± 22%  -35.81% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_Nil-24          41.69n ±  3%   41.76n ±  2%       ~ (p=0.779 n=7)
NoInstrumentation-24                         985.8µ ± 11%   991.9µ ±  1%       ~ (p=0.902 n=7)
geomean                                      192.8n         156.1n        -19.01%

                                           │  otelbase   │      otelcached       │
                                           │    B/op     │    B/op      vs base  │
ServerHandler_HandleRPC_Begin-24              32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_InPayload-24          985.0 ± 0%    600.0 ± 0%  -39.09% (p=0.001 n=7)
ServerHandler_HandleRPC_OutPayload-24         985.0 ± 0%    600.0 ± 0%  -39.09% (p=0.001 n=7)
ServerHandler_HandleRPC_OutTrailer-24         32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_OutHeader-24          297.0 ± 0%    297.0 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_End-24                576.0 ± 0%    224.0 ± 0%  -61.11% (p=0.001 n=7)
ServerHandler_HandleRPC_Nil-24                32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_Begin-24         32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_InPayload-24    432.00 ± 0%    48.00 ± 0%  -88.89% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutPayload-24   432.00 ± 0%    48.00 ± 0%  -88.89% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutTrailer-24    32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_OutHeader-24     32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_End-24           576.0 ± 0%    224.0 ± 0%  -61.11% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_Nil-24           32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
NoInstrumentation-24                         2.805Mi ± 2%  2.796Mi ± 2%       ~ (p=0.259 n=7)
geomean                                       261.3         160.8       -38.44%
¹ all samples are equal

                                           │  otelbase   │      otelcached       │
                                           │  allocs/op  │ allocs/op    vs base  │
ServerHandler_HandleRPC_Begin-24             2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_InPayload-24         9.000 ± 0%    7.000 ± 0%   -22.22% (p=0.001 n=7)
ServerHandler_HandleRPC_OutPayload-24        9.000 ± 0%    7.000 ± 0%   -22.22% (p=0.001 n=7)
ServerHandler_HandleRPC_OutTrailer-24        2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_OutHeader-24         6.000 ± 0%    6.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_End-24               6.000 ± 0%    7.000 ± 0%   +16.67% (p=0.001 n=7)
ServerHandler_HandleRPC_Nil-24               2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_Begin-24        2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_InPayload-24    5.000 ± 0%    3.000 ± 0%   -40.00% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutPayload-24   5.000 ± 0%    3.000 ± 0%   -40.00% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutTrailer-24   2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_OutHeader-24    2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_End-24          6.000 ± 0%    7.000 ± 0%   +16.67% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_Nil-24          2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
NoInstrumentation-24                        1.342k ± 3%   1.349k ± 2%        ~ (p=0.519 n=7)
geomean                                      5.310         4.898         -7.75%
¹ all samples are equal
```
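The optimization described above can be sketched as follows. This is an illustrative stand-alone example, not the PR's actual diff: `attrSet` is a stand-in for `attribute.Set` from `go.opentelemetry.io/otel/attribute`, and `rpcInfo`, `handlePayloadOld`, and `handlePayloadNew` are hypothetical names. The point is only the pattern: build the canonical set once per RPC, not once per message.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// attrSet is a hypothetical stand-in for attribute.Set: an immutable,
// canonicalized set of key/value pairs that is cheap to reuse once built.
type attrSet struct{ encoded string }

var setsBuilt int // counts how many times the construction cost is paid

func newAttrSet(kvs map[string]string) attrSet {
	setsBuilt++
	keys := make([]string, 0, len(kvs))
	for k := range kvs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // canonical ordering, like attribute.NewSet
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s,", k, kvs[k])
	}
	return attrSet{encoded: b.String()}
}

// rpcInfo lives in the context for the lifetime of one RPC.
type rpcInfo struct {
	attrs map[string]string
	set   attrSet // cached set, built once when the RPC begins
}

// Before: every InPayload/OutPayload rebuilt the set from the slice.
func handlePayloadOld(ri *rpcInfo) attrSet { return newAttrSet(ri.attrs) }

// After: reuse the set cached at the start of the RPC.
func handlePayloadNew(ri *rpcInfo) attrSet { return ri.set }

func main() {
	ri := &rpcInfo{attrs: map[string]string{"rpc.method": "Recv", "rpc.service": "Echo"}}
	ri.set = newAttrSet(ri.attrs) // built once per RPC

	for i := 0; i < 1000; i++ {
		_ = handlePayloadNew(ri) // no per-message construction
	}
	fmt.Println("sets built:", setsBuilt) // prints "sets built: 1"
}
```

For a streaming call with N messages, the old path paid the set-construction cost N times; the new path pays it once.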
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@          Coverage Diff          @@
##           main   #8035   +/-   ##
=====================================
  Coverage   80.1%   80.1%
=====================================
  Files        190     190
  Lines      12260   12266    +6
=====================================
+ Hits        9825    9831    +6
  Misses      2073    2073
  Partials     362     362
```
I made a small change where I switched which branch in TagRPC allocates a slice, so that if an RPC is filtered out, it's now less likely to create an extra slice. It's in this commit: c7a2b2d
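The branch switch mentioned here can be illustrated with a simplified sketch; `tagRPCOld`, `tagRPCNew`, and this cut-down `gRPCContext` are hypothetical names, not the actual code in the commit. The idea is to move the slice allocation into the branch that actually instruments the RPC.

```go
package main

import "fmt"

// gRPCContext is a cut-down stand-in for the per-RPC state the handler keeps.
type gRPCContext struct{ metricAttrs []string }

// tagRPCOld allocates the attribute slice before checking the filter,
// so a filtered-out RPC still pays for an allocation it never uses.
func tagRPCOld(filtered bool, base []string) *gRPCContext {
	attrs := make([]string, len(base))
	copy(attrs, base)
	if filtered {
		return nil // the slice above was wasted work
	}
	return &gRPCContext{metricAttrs: attrs}
}

// tagRPCNew checks the filter first and only allocates on the
// instrumented path, so filtered calls allocate nothing.
func tagRPCNew(filtered bool, base []string) *gRPCContext {
	if filtered {
		return nil
	}
	attrs := make([]string, len(base))
	copy(attrs, base)
	return &gRPCContext{metricAttrs: attrs}
}

func main() {
	fmt.Println(tagRPCNew(true, []string{"rpc.service=Echo"})) // prints "<nil>"
	fmt.Println(tagRPCNew(false, []string{"rpc.service=Echo"}))
}
```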
I've been benchmarking this with my own code and it doesn't work quite as well as I had hoped. The problem is in the … Still trying to see if there's an easy solution.
Please take another look. I figured out why the extra allocations and slowness were not visible in your existing benchmarks and fixed this by using a somewhat real …

The simplest solution seems to be using the original …

Also, I figured out an easy way to further speed this up and remove more allocations by converting …
```go
	inMessages    int64
	outMessages   int64
	metricAttrs   []attribute.KeyValue
	metricAttrSet attribute.Set
```
Funny story: I spent the last couple of hours going through a profile from production, and came to this repo to suggest this exact optimisation!
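A rough sketch of how the fields in the diff above fit together; this is an assumption about usage, not the PR's actual code, and `attrSet`, `onInPayload`, and the `record` callback are hypothetical stand-ins. Per-message work reduces to an atomic counter bump plus a reuse of the cached set.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// attrSet is a hypothetical stand-in for attribute.Set.
type attrSet struct{ encoded string }

// gRPCContext mirrors the fields shown in the diff: per-RPC message
// counters plus an attribute set cached once for the RPC's lifetime.
type gRPCContext struct {
	inMessages    int64
	outMessages   int64
	metricAttrSet attrSet
}

// onInPayload runs once per received message; it only bumps an atomic
// counter and hands the cached set to the metric recorder, allocating
// nothing per message.
func (c *gRPCContext) onInPayload(record func(n int64, s attrSet)) {
	n := atomic.AddInt64(&c.inMessages, 1)
	record(n, c.metricAttrSet)
}

func main() {
	c := &gRPCContext{metricAttrSet: attrSet{encoded: "rpc.service=Echo"}}
	for i := 0; i < 3; i++ {
		c.onInPayload(func(n int64, s attrSet) {})
	}
	fmt.Println(c.inMessages) // prints 3
}
```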
…/grpc/otelgrpc (#10876) I just want my optimization from open-telemetry/opentelemetry-go-contrib#8035 Full release notes: https://github.com/open-telemetry/opentelemetry-go-contrib/releases/tag/v1.39.0