otelgrpc: optimize stats handler InPayload and OutPayload#8035
Conversation
This was inspired by open-telemetry#7186 and profiles I noticed in my app. Also thanks to @boekkooi-impossiblecloud for writing the benchmarks.

InPayload and OutPayload blocks would create a new attribute set for each message. This is particularly bad for streaming calls with thousands of messages. Creating the set once improves the speed and memory allocations of your benchmarks, as well as mine.

```
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
cpu: AMD Ryzen 9 7900X 12-Core Processor
                                           │  otelbase   │      otelcached       │
                                           │   sec/op    │   sec/op     vs base  │
ServerHandler_HandleRPC_Begin-24             42.30n ±  3%   42.58n ±  5%       ~ (p=0.710 n=7)
ServerHandler_HandleRPC_InPayload-24         463.1n ±  6%   351.2n ±  5%  -24.16% (p=0.001 n=7)
ServerHandler_HandleRPC_OutPayload-24        461.1n ±  2%   354.2n ±  9%  -23.18% (p=0.001 n=7)
ServerHandler_HandleRPC_OutTrailer-24        42.33n ±  2%   42.28n ±  5%       ~ (p=0.710 n=7)
ServerHandler_HandleRPC_OutHeader-24         231.7n ±  2%   230.7n ±  5%       ~ (p=0.929 n=7)
ServerHandler_HandleRPC_End-24               234.5n ±  6%   149.3n ±  1%  -36.33% (p=0.001 n=7)
ServerHandler_HandleRPC_Nil-24               41.91n ±  3%   41.79n ±  1%       ~ (p=0.594 n=7)
ServerHandler_HandleRPC_NoOp_Begin-24        42.59n ± 10%   42.30n ±  6%       ~ (p=0.318 n=7)
ServerHandler_HandleRPC_NoOp_InPayload-24   167.60n ±  6%   72.09n ±  8%  -56.99% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutPayload-24  170.40n ±  3%   71.80n ±  1%  -57.86% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutTrailer-24   42.30n ±  2%   42.19n ±  2%       ~ (p=0.318 n=7)
ServerHandler_HandleRPC_NoOp_OutHeader-24    44.60n ±  4%   43.85n ±  2%   -1.68% (p=0.038 n=7)
ServerHandler_HandleRPC_NoOp_End-24          230.1n ±  4%   147.7n ± 22%  -35.81% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_Nil-24          41.69n ±  3%   41.76n ±  2%       ~ (p=0.779 n=7)
NoInstrumentation-24                         985.8µ ± 11%   991.9µ ±  1%       ~ (p=0.902 n=7)
geomean                                      192.8n         156.1n        -19.01%

                                           │  otelbase   │      otelcached       │
                                           │    B/op     │    B/op      vs base  │
ServerHandler_HandleRPC_Begin-24              32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_InPayload-24          985.0 ± 0%    600.0 ± 0%  -39.09% (p=0.001 n=7)
ServerHandler_HandleRPC_OutPayload-24         985.0 ± 0%    600.0 ± 0%  -39.09% (p=0.001 n=7)
ServerHandler_HandleRPC_OutTrailer-24         32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_OutHeader-24          297.0 ± 0%    297.0 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_End-24                576.0 ± 0%    224.0 ± 0%  -61.11% (p=0.001 n=7)
ServerHandler_HandleRPC_Nil-24                32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_Begin-24         32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_InPayload-24    432.00 ± 0%    48.00 ± 0%  -88.89% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutPayload-24   432.00 ± 0%    48.00 ± 0%  -88.89% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutTrailer-24    32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_OutHeader-24     32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_End-24           576.0 ± 0%    224.0 ± 0%  -61.11% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_Nil-24           32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=7) ¹
NoInstrumentation-24                         2.805Mi ± 2%  2.796Mi ± 2%       ~ (p=0.259 n=7)
geomean                                       261.3         160.8       -38.44%
¹ all samples are equal

                                           │  otelbase   │      otelcached       │
                                           │  allocs/op  │ allocs/op    vs base  │
ServerHandler_HandleRPC_Begin-24             2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_InPayload-24         9.000 ± 0%    7.000 ± 0%   -22.22% (p=0.001 n=7)
ServerHandler_HandleRPC_OutPayload-24        9.000 ± 0%    7.000 ± 0%   -22.22% (p=0.001 n=7)
ServerHandler_HandleRPC_OutTrailer-24        2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_OutHeader-24         6.000 ± 0%    6.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_End-24               6.000 ± 0%    7.000 ± 0%   +16.67% (p=0.001 n=7)
ServerHandler_HandleRPC_Nil-24               2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_Begin-24        2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_InPayload-24    5.000 ± 0%    3.000 ± 0%   -40.00% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutPayload-24   5.000 ± 0%    3.000 ± 0%   -40.00% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutTrailer-24   2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_OutHeader-24    2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_End-24          6.000 ± 0%    7.000 ± 0%   +16.67% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_Nil-24          2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
NoInstrumentation-24                        1.342k ± 3%   1.349k ± 2%        ~ (p=0.519 n=7)
geomean                                      5.310         4.898         -7.75%
¹ all samples are equal
```
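The optimization described above can be sketched as follows. This is an illustrative stand-alone example, not the PR's actual diff: `attrSet` is a stand-in for `attribute.Set` from `go.opentelemetry.io/otel/attribute`, and `rpcInfo`, `handlePayloadOld`, and `handlePayloadNew` are hypothetical names. The point is only the pattern: build the canonical set once per RPC, not once per message.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// attrSet is a hypothetical stand-in for attribute.Set: an immutable,
// canonicalized set of key/value pairs that is cheap to reuse once built.
type attrSet struct{ encoded string }

var setsBuilt int // counts how many times the construction cost is paid

func newAttrSet(kvs map[string]string) attrSet {
	setsBuilt++
	keys := make([]string, 0, len(kvs))
	for k := range kvs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // canonical ordering, like attribute.NewSet
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s,", k, kvs[k])
	}
	return attrSet{encoded: b.String()}
}

// rpcInfo lives in the context for the lifetime of one RPC.
type rpcInfo struct {
	attrs map[string]string
	set   attrSet // cached set, built once when the RPC begins
}

// Before: every InPayload/OutPayload rebuilt the set from the slice.
func handlePayloadOld(ri *rpcInfo) attrSet { return newAttrSet(ri.attrs) }

// After: reuse the set cached at the start of the RPC.
func handlePayloadNew(ri *rpcInfo) attrSet { return ri.set }

func main() {
	ri := &rpcInfo{attrs: map[string]string{"rpc.method": "Recv", "rpc.service": "Echo"}}
	ri.set = newAttrSet(ri.attrs) // built once per RPC

	for i := 0; i < 1000; i++ {
		_ = handlePayloadNew(ri) // no per-message construction
	}
	fmt.Println("sets built:", setsBuilt) // prints "sets built: 1"
}
```

For a streaming call with N messages, the old path paid the set-construction cost N times; the new path pays it once.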
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@          Coverage Diff          @@
##           main   #8035   +/-   ##
=====================================
  Coverage   80.1%   80.1%
=====================================
  Files        190     190
  Lines      12260   12266    +6
=====================================
+ Hits        9825    9831    +6
  Misses      2073    2073
  Partials     362     362
```
I made a small change where I switched which branch in TagRPC allocates a slice, so that if an RPC is filtered out, it's now less likely to create an extra slice. It's in this commit: c7a2b2d
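The branch switch mentioned here can be illustrated with a simplified sketch; `tagRPCOld`, `tagRPCNew`, and this cut-down `gRPCContext` are hypothetical names, not the actual code in the commit. The idea is to move the slice allocation into the branch that actually instruments the RPC.

```go
package main

import "fmt"

// gRPCContext is a cut-down stand-in for the per-RPC state the handler keeps.
type gRPCContext struct{ metricAttrs []string }

// tagRPCOld allocates the attribute slice before checking the filter,
// so a filtered-out RPC still pays for an allocation it never uses.
func tagRPCOld(filtered bool, base []string) *gRPCContext {
	attrs := make([]string, len(base))
	copy(attrs, base)
	if filtered {
		return nil // the slice above was wasted work
	}
	return &gRPCContext{metricAttrs: attrs}
}

// tagRPCNew checks the filter first and only allocates on the
// instrumented path, so filtered calls allocate nothing.
func tagRPCNew(filtered bool, base []string) *gRPCContext {
	if filtered {
		return nil
	}
	attrs := make([]string, len(base))
	copy(attrs, base)
	return &gRPCContext{metricAttrs: attrs}
}

func main() {
	fmt.Println(tagRPCNew(true, []string{"rpc.service=Echo"})) // prints "<nil>"
	fmt.Println(tagRPCNew(false, []string{"rpc.service=Echo"}))
}
```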
I've been benchmarking this with my own code and it doesn't work quite as well as I had hoped. The problem is in the … Still trying to see if there's an easy solution.
Please take another look. I figured out why the extra allocations and slowness were not visible in your existing benchmarks and fixed this by using a somewhat real …

The simplest solution seems to be using the original …

Also, I figured out an easy way to further speed this up and remove more allocations by converting …
```go
	inMessages    int64
	outMessages   int64
	metricAttrs   []attribute.KeyValue
	metricAttrSet attribute.Set
```
Funny story: I spent the last couple of hours going through a profile from production, and came to this repo to suggest this exact optimisation!
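A rough sketch of how the fields in the diff above fit together; this is an assumption about usage, not the PR's actual code, and `attrSet`, `onInPayload`, and the `record` callback are hypothetical stand-ins. Per-message work reduces to an atomic counter bump plus a reuse of the cached set.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// attrSet is a hypothetical stand-in for attribute.Set.
type attrSet struct{ encoded string }

// gRPCContext mirrors the fields shown in the diff: per-RPC message
// counters plus an attribute set cached once for the RPC's lifetime.
type gRPCContext struct {
	inMessages    int64
	outMessages   int64
	metricAttrSet attrSet
}

// onInPayload runs once per received message; it only bumps an atomic
// counter and hands the cached set to the metric recorder, allocating
// nothing per message.
func (c *gRPCContext) onInPayload(record func(n int64, s attrSet)) {
	n := atomic.AddInt64(&c.inMessages, 1)
	record(n, c.metricAttrSet)
}

func main() {
	c := &gRPCContext{metricAttrSet: attrSet{encoded: "rpc.service=Echo"}}
	for i := 0; i < 3; i++ {
		c.onInPayload(func(n int64, s attrSet) {})
	}
	fmt.Println(c.inMessages) // prints 3
}
```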
…/grpc/otelgrpc (#10876) I just want my optimization from open-telemetry/opentelemetry-go-contrib#8035 Full release notes: https://github.com/open-telemetry/opentelemetry-go-contrib/releases/tag/v1.39.0