otelgrpc: optimize stats handler InPayload and OutPayload#8035

Merged
dmathieu merged 10 commits into open-telemetry:main from vanja-p:vanja-optimize-stats
Oct 29, 2025
Conversation

@vanja-p
Contributor

@vanja-p vanja-p commented Oct 18, 2025

This was inspired by open-telemetry#7186 and profiles I noticed in my app.  Also thanks to @boekkooi-impossiblecloud for writing the benchmarks.

The InPayload and OutPayload blocks would create a new attribute set for each message. This is particularly bad for streaming calls with thousands of messages. Creating the set once improves both the speed and the memory allocations in your benchmarks, as well as mine.

```
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
cpu: AMD Ryzen 9 7900X 12-Core Processor
                                           │   otelbase    │             otelcached              │
                                           │    sec/op     │    sec/op     vs base               │
ServerHandler_HandleRPC_Begin-24              42.30n ±  3%   42.58n ±  5%        ~ (p=0.710 n=7)
ServerHandler_HandleRPC_InPayload-24          463.1n ±  6%   351.2n ±  5%  -24.16% (p=0.001 n=7)
ServerHandler_HandleRPC_OutPayload-24         461.1n ±  2%   354.2n ±  9%  -23.18% (p=0.001 n=7)
ServerHandler_HandleRPC_OutTrailer-24         42.33n ±  2%   42.28n ±  5%        ~ (p=0.710 n=7)
ServerHandler_HandleRPC_OutHeader-24          231.7n ±  2%   230.7n ±  5%        ~ (p=0.929 n=7)
ServerHandler_HandleRPC_End-24                234.5n ±  6%   149.3n ±  1%  -36.33% (p=0.001 n=7)
ServerHandler_HandleRPC_Nil-24                41.91n ±  3%   41.79n ±  1%        ~ (p=0.594 n=7)
ServerHandler_HandleRPC_NoOp_Begin-24         42.59n ± 10%   42.30n ±  6%        ~ (p=0.318 n=7)
ServerHandler_HandleRPC_NoOp_InPayload-24    167.60n ±  6%   72.09n ±  8%  -56.99% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutPayload-24   170.40n ±  3%   71.80n ±  1%  -57.86% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutTrailer-24    42.30n ±  2%   42.19n ±  2%        ~ (p=0.318 n=7)
ServerHandler_HandleRPC_NoOp_OutHeader-24     44.60n ±  4%   43.85n ±  2%   -1.68% (p=0.038 n=7)
ServerHandler_HandleRPC_NoOp_End-24           230.1n ±  4%   147.7n ± 22%  -35.81% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_Nil-24           41.69n ±  3%   41.76n ±  2%        ~ (p=0.779 n=7)
NoInstrumentation-24                          985.8µ ± 11%   991.9µ ±  1%        ~ (p=0.902 n=7)
geomean                                       192.8n         156.1n        -19.01%

                                           │   otelbase   │              otelcached               │
                                           │     B/op     │     B/op      vs base                 │
ServerHandler_HandleRPC_Begin-24               32.00 ± 0%     32.00 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_InPayload-24           985.0 ± 0%     600.0 ± 0%  -39.09% (p=0.001 n=7)
ServerHandler_HandleRPC_OutPayload-24          985.0 ± 0%     600.0 ± 0%  -39.09% (p=0.001 n=7)
ServerHandler_HandleRPC_OutTrailer-24          32.00 ± 0%     32.00 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_OutHeader-24           297.0 ± 0%     297.0 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_End-24                 576.0 ± 0%     224.0 ± 0%  -61.11% (p=0.001 n=7)
ServerHandler_HandleRPC_Nil-24                 32.00 ± 0%     32.00 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_Begin-24          32.00 ± 0%     32.00 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_InPayload-24     432.00 ± 0%     48.00 ± 0%  -88.89% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutPayload-24    432.00 ± 0%     48.00 ± 0%  -88.89% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutTrailer-24     32.00 ± 0%     32.00 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_OutHeader-24      32.00 ± 0%     32.00 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_End-24            576.0 ± 0%     224.0 ± 0%  -61.11% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_Nil-24            32.00 ± 0%     32.00 ± 0%        ~ (p=1.000 n=7) ¹
NoInstrumentation-24                         2.805Mi ± 2%   2.796Mi ± 2%        ~ (p=0.259 n=7)
geomean                                        261.3          160.8       -38.44%
¹ all samples are equal

                                           │  otelbase   │              otelcached              │
                                           │  allocs/op  │  allocs/op   vs base                 │
ServerHandler_HandleRPC_Begin-24              2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_InPayload-24          9.000 ± 0%    7.000 ± 0%  -22.22% (p=0.001 n=7)
ServerHandler_HandleRPC_OutPayload-24         9.000 ± 0%    7.000 ± 0%  -22.22% (p=0.001 n=7)
ServerHandler_HandleRPC_OutTrailer-24         2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_OutHeader-24          6.000 ± 0%    6.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_End-24                6.000 ± 0%    7.000 ± 0%  +16.67% (p=0.001 n=7)
ServerHandler_HandleRPC_Nil-24                2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_Begin-24         2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_InPayload-24     5.000 ± 0%    3.000 ± 0%  -40.00% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutPayload-24    5.000 ± 0%    3.000 ± 0%  -40.00% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_OutTrailer-24    2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_OutHeader-24     2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
ServerHandler_HandleRPC_NoOp_End-24           6.000 ± 0%    7.000 ± 0%  +16.67% (p=0.001 n=7)
ServerHandler_HandleRPC_NoOp_Nil-24           2.000 ± 0%    2.000 ± 0%        ~ (p=1.000 n=7) ¹
NoInstrumentation-24                         1.342k ± 3%   1.349k ± 2%        ~ (p=0.519 n=7)
geomean                                       5.310         4.898        -7.75%
¹ all samples are equal
```
@vanja-p vanja-p requested review from a team and dashpole as code owners October 18, 2025 01:35
@linux-foundation-easycla

linux-foundation-easycla Bot commented Oct 18, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: dmathieu / name: Damien Mathieu (531b7d6)

@codecov

codecov Bot commented Oct 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.1%. Comparing base (203529f) to head (531b7d6).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files


```
@@          Coverage Diff          @@
##            main   #8035   +/-   ##
=====================================
  Coverage   80.1%   80.1%
=====================================
  Files        190     190
  Lines      12260   12266    +6
=====================================
+ Hits        9825    9831    +6
  Misses      2073    2073
  Partials     362     362
```
Files with missing lines Coverage Δ
...n/google.golang.org/grpc/otelgrpc/stats_handler.go 98.5% <100.0%> (+<0.1%) ⬆️

@flc1125 flc1125 added the Skip Changelog Allow PR to succeed without requiring an addition to the CHANGELOG label Oct 20, 2025
Member

@flc1125 flc1125 left a comment


There are some things we could adjust to use a pool, but I think that could be another PR.

@vanja-p
Contributor Author

vanja-p commented Oct 20, 2025

I made a small change where I switched which branch in TagRPC allocates a slice, so that if an RPC is filtered out, it's now less likely to create an extra slice. It's in this commit: c7a2b2d

@vanja-p
Contributor Author

vanja-p commented Oct 20, 2025

I've been benchmarking this with my own code and it doesn't work quite as well as I had hoped. The problem is in the case *stats.End: block. Passing multiple metric.RecordOptions with attributes causes metric.mergeSets to be called, and because this is done for 3 metrics (duration, in msgs, out msgs), it actually allocates more in some cases.

Still trying to see if there's an easy solution.

@vanja-p
Contributor Author

vanja-p commented Oct 20, 2025

Please take another look.

I figured out why the extra allocations and slowness were not visible in your existing benchmarks, and fixed this by using a somewhat real MeterProvider in the benchmarks. This clearly showed that my initial version was adding a bunch of allocations for *stats.End.

The simplest solution seems to be using the original metricAttrs slice in the *stats.End block. Appending one attribute to this and making a new set from it is faster than merging sets even once. This is a bit clunky since gRPCContext now has to keep both a slice and a set with the same values, but it seems ok.

Also, I figured out an easy way to further speed this up and remove more allocations by converting inSize and outSize to an int64Hist once when creating the handler instead of on every handleRPC call. I updated the benchmark results in the PR description.

@vanja-p vanja-p requested review from dashpole and flc1125 October 20, 2025 19:38
@dmathieu dmathieu merged commit 94ad722 into open-telemetry:main Oct 29, 2025
28 checks passed
```go
	inMessages    int64
	outMessages   int64
	metricAttrs   []attribute.KeyValue
	metricAttrSet attribute.Set
```

Funny story: I spent the last couple of hours going through a profile from production, and came to this repo to suggest this exact optimisation!

@MrAlias MrAlias mentioned this pull request Dec 8, 2025
@MrAlias MrAlias added this to the v1.39.0 milestone Dec 8, 2025
vanja-p added a commit to buildbuddy-io/buildbuddy that referenced this pull request Dec 10, 2025

Labels

Skip Changelog Allow PR to succeed without requiring an addition to the CHANGELOG
