trace: optimize id parsing and string functions #7157

Closed
bboreham wants to merge 4 commits into open-telemetry:main from bboreham:hex-parsing

@bboreham
Contributor

@bboreham bboreham commented Aug 8, 2025

Update of #6791, resolving merge conflicts and adding changelog.

I also proposed a couple more commits rearranging things to avoid duplication between 8-byte and 16-byte versions of the same code; I felt the shorter version is easier to review.

See also #7155, which adds/extends tests to add confidence.

jschaf and others added 4 commits August 8, 2025 12:52
With specialized routines, we can avoid the allocation of hex.DecodeString
since we know the structure of the IDs.

We can use `==` instead of bytes.Equal for arrays. From the Go [spec]:

> Array types are comparable if their array element types are comparable. Two
> array values are equal if their corresponding element values are equal. The
> elements are compared in ascending index order, and comparison stops as soon
> as two element values differ (or all elements have been compared).

[spec]: https://go.dev/ref/spec#Comparison_operators
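
To illustrate the point about `==`, here is a minimal sketch. The `TraceID` type and both helper functions are stand-ins written for this example, not the code from this PR; they only show that comparing array values directly gives the same answer as `bytes.Equal` on slices of them.

```go
package main

import (
	"bytes"
	"fmt"
)

// TraceID is a stand-in 16-byte array type used only for this illustration.
type TraceID [16]byte

// isValid reports whether id is non-zero by comparing array values with !=.
func isValid(id TraceID) bool {
	return id != TraceID{}
}

// isValidBytes performs the same check by slicing both arrays and
// calling bytes.Equal.
func isValidBytes(id TraceID) bool {
	var zero TraceID
	return !bytes.Equal(id[:], zero[:])
}

func main() {
	var zero TraceID
	id := TraceID{15: 1} // only the last byte is set

	fmt.Println(isValid(zero), isValidBytes(zero))
	fmt.Println(isValid(id), isValidBytes(id))
}
```

Both helpers agree on every input; the array form simply avoids building slice headers and the function call.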

To generate:
```sh
mkdir private
cd sdk
go test -run=xxxxMatchNothingxxxx -bench=. -count=10 go.opentelemetry.io/otel/sdk/trace -timeout=30m | tee ../private/base.txt
go test -run=xxxxMatchNothingxxxx -bench=. -count=10 go.opentelemetry.io/otel/sdk/trace -timeout=30m | tee ../private/new.txt
benchstat ../private/base.txt ../private/new.txt
```

Results:
```
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Apple M2 Max
                                              │ ../private/base.txt │          ../private/new.txt          │
                                              │       sec/op        │    sec/op      vs base                │
Truncate/Unlimited-12                                 0.2086n ±  1%   0.2140n ±  3%   +2.59% (p=0.017 n=10)
Truncate/Zero-12                                      0.3048n ±  0%   0.3070n ±  1%   +0.71% (p=0.014 n=10)
Truncate/Short-12                                     0.2083n ±  2%   0.2097n ±  1%        ~ (p=0.148 n=10)
Truncate/ASCII-12                                     0.6870n ±  0%   0.6855n ±  0%        ~ (p=0.493 n=10)
Truncate/ValidUTF-8-12                                 1.298n ±  0%    1.302n ±  1%   +0.31% (p=0.003 n=10)
Truncate/InvalidUTF-8-12                               9.457n ±  0%    9.420n ±  1%        ~ (p=0.529 n=10)
Truncate/MixedUTF-8-12                                 17.30n ±  1%    17.29n ±  0%        ~ (p=0.359 n=10)
RecordingSpanSetAttributes/WithLimit/false-12          2.055µ ±  1%    2.082µ ±  9%   +1.29% (p=0.014 n=10)
RecordingSpanSetAttributes/WithLimit/true-12           4.368µ ±  0%    4.364µ ±  0%   -0.08% (p=0.049 n=10)
SpanEnd-12                                             72.57n ± 17%    73.75n ± 16%        ~ (p=0.853 n=10)
TraceStart/with_a_simple_span-12                       320.1n ± 10%    314.6n ± 10%        ~ (p=0.165 n=10)
TraceStart/with_several_links-12                       432.7n ±  2%    429.4n ±  1%        ~ (p=0.063 n=10)
TraceStart/with_attributes-12                          477.3n ±  1%    468.1n ±  6%   -1.94% (p=0.005 n=10)
SpanLimits/AttributeValueLengthLimit-12                4.401µ ±  1%    4.439µ ±  2%        ~ (p=0.089 n=10)
SpanLimits/AttributeCountLimit-12                      4.125µ ±  1%    4.151µ ±  1%   +0.62% (p=0.014 n=10)
SpanLimits/EventCountLimit-12                          3.900µ ±  2%    3.935µ ±  1%   +0.88% (p=0.023 n=10)
SpanLimits/LinkCountLimit-12                           3.870µ ±  2%    3.901µ ±  1%        ~ (p=0.148 n=10)
SpanLimits/AttributePerEventCountLimit-12              4.212µ ±  1%    4.243µ ±  1%   +0.75% (p=0.008 n=10)
SpanLimits/AttributePerLinkCountLimit-12               4.200µ ±  1%    4.224µ ±  0%   +0.57% (p=0.041 n=10)
SpanSetAttributesOverCapacity-12                       1.661µ ±  1%    1.653µ ±  0%   -0.48% (p=0.049 n=10)
StartEndSpan/AlwaysSample-12                           317.9n ±  0%    316.5n ±  0%   -0.44% (p=0.007 n=10)
StartEndSpan/NeverSample-12                            152.3n ±  0%    152.0n ±  0%   -0.23% (p=0.005 n=10)
SpanWithAttributes_4/AlwaysSample-12                   527.2n ±  0%    532.4n ±  1%   +1.00% (p=0.000 n=10)
SpanWithAttributes_4/NeverSample-12                    240.6n ±  0%    241.6n ±  0%   +0.46% (p=0.000 n=10)
SpanWithAttributes_8/AlwaysSample-12                   704.5n ±  0%    718.3n ±  1%   +1.97% (p=0.000 n=10)
SpanWithAttributes_8/NeverSample-12                    325.0n ±  0%    327.2n ±  1%   +0.68% (p=0.000 n=10)
SpanWithAttributes_all/AlwaysSample-12                 576.8n ±  0%    584.7n ±  1%   +1.37% (p=0.000 n=10)
SpanWithAttributes_all/NeverSample-12                  264.6n ±  1%    263.3n ±  0%   -0.47% (p=0.045 n=10)
SpanWithAttributes_all_2x/AlwaysSample-12              818.6n ±  1%    834.9n ±  0%   +1.98% (p=0.000 n=10)
SpanWithAttributes_all_2x/NeverSample-12               378.3n ±  0%    382.9n ±  1%   +1.23% (p=0.000 n=10)
SpanWithEvents_4/AlwaysSample-12                       715.1n ±  1%    721.1n ±  0%   +0.83% (p=0.003 n=10)
SpanWithEvents_4/NeverSample-12                        156.1n ±  1%    155.1n ±  1%   -0.64% (p=0.002 n=10)
SpanWithEvents_8/AlwaysSample-12                       1.098µ ±  0%    1.104µ ±  0%   +0.55% (p=0.000 n=10)
SpanWithEvents_8/NeverSample-12                        158.8n ±  0%    158.6n ±  1%        ~ (p=0.288 n=10)
SpanWithEvents_WithStackTrace/AlwaysSample-12          438.8n ±  0%    438.5n ±  0%        ~ (p=0.868 n=10)
SpanWithEvents_WithStackTrace/NeverSample-12           168.2n ±  1%    167.4n ±  1%   -0.48% (p=0.014 n=10)
SpanWithEvents_WithTimestamp/AlwaysSample-12           430.6n ±  0%    432.9n ±  0%   +0.53% (p=0.001 n=10)
SpanWithEvents_WithTimestamp/NeverSample-12            193.7n ±  0%    190.1n ±  1%   -1.91% (p=0.000 n=10)
TraceID_DotString-12                                   42.37n ±  0%    24.80n ±  0%  -41.45% (p=0.000 n=10)
SpanID_DotString-12                                    31.30n ±  0%    17.22n ±  0%  -44.98% (p=0.000 n=10)
SpanProcessorOnEnd/batch:_10,_spans:_10-12             163.3n ±  0%    163.4n ±  0%        ~ (p=0.120 n=10)
SpanProcessorOnEnd/batch:_10,_spans:_100-12            1.639µ ±  0%    1.635µ ±  0%   -0.27% (p=0.000 n=10)
SpanProcessorOnEnd/batch:_100,_spans:_10-12            163.3n ±  0%    163.2n ±  0%        ~ (p=0.115 n=10)
SpanProcessorOnEnd/batch:_100,_spans:_100-12           1.636µ ±  0%    1.635µ ±  1%        ~ (p=0.509 n=10)
SpanProcessorVerboseLogging-12                         6.769µ ±  2%    6.600µ ±  2%   -2.49% (p=0.030 n=10)
geomean                                                221.4n          216.4n         -2.29%

                                              │ ../private/base.txt │          ../private/new.txt          │
                                              │        B/op         │     B/op      vs base                 │
Truncate/Unlimited-12                                  0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Zero-12                                       0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Short-12                                      0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ASCII-12                                      0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-12                                 0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-12                               16.00 ± 0%       16.00 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-12                                 32.00 ± 0%       32.00 ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/false-12        6.891Ki ± 0%     6.891Ki ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/true-12         7.023Ki ± 0%     7.023Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanEnd-12                                             0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_a_simple_span-12                       528.0 ± 0%       528.0 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_several_links-12                       704.0 ± 0%       704.0 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_attributes-12                          784.0 ± 0%       784.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributeValueLengthLimit-12              10.56Ki ± 0%     10.56Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributeCountLimit-12                    9.844Ki ± 0%     9.844Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/EventCountLimit-12                        9.422Ki ± 0%     9.422Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/LinkCountLimit-12                         9.031Ki ± 0%     9.031Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerEventCountLimit-12            10.47Ki ± 0%     10.47Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerLinkCountLimit-12             10.47Ki ± 0%     10.47Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanSetAttributesOverCapacity-12                       592.0 ± 0%       592.0 ± 0%       ~ (p=1.000 n=10) ¹
StartEndSpan/AlwaysSample-12                           528.0 ± 0%       528.0 ± 0%       ~ (p=1.000 n=10) ¹
StartEndSpan/NeverSample-12                            144.0 ± 0%       144.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/AlwaysSample-12                 1.016Ki ± 0%     1.016Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/NeverSample-12                    400.0 ± 0%       400.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/AlwaysSample-12                 1.516Ki ± 0%     1.516Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/NeverSample-12                    656.0 ± 0%       656.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/AlwaysSample-12               1.141Ki ± 0%     1.141Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/NeverSample-12                  464.0 ± 0%       464.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/AlwaysSample-12            1.891Ki ± 0%     1.891Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/NeverSample-12               848.0 ± 0%       848.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/AlwaysSample-12                     1.016Ki ± 0%     1.016Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/NeverSample-12                        144.0 ± 0%       144.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/AlwaysSample-12                     1.641Ki ± 0%     1.641Ki ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/NeverSample-12                        144.0 ± 0%       144.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/AlwaysSample-12          624.0 ± 0%       624.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/NeverSample-12           160.0 ± 0%       160.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/AlwaysSample-12           648.0 ± 0%       648.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/NeverSample-12            184.0 ± 0%       184.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_10-12             0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_100-12            0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_10-12            0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_100-12           0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorVerboseLogging-12                       9.547Ki ± 0%     9.547Ki ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                           ²                 +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                              │ ../private/base.txt │         ../private/new.txt         │
                                              │      allocs/op      │ allocs/op   vs base                 │
Truncate/Unlimited-12                                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Zero-12                                       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Short-12                                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ASCII-12                                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-12                                 0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-12                                 1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/false-12          3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/true-12           10.00 ± 0%     10.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanEnd-12                                             0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_a_simple_span-12                       2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_several_links-12                       3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_attributes-12                          4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributeValueLengthLimit-12                41.00 ± 0%     41.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributeCountLimit-12                      38.00 ± 0%     38.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/EventCountLimit-12                          35.00 ± 0%     35.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/LinkCountLimit-12                           35.00 ± 0%     35.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerEventCountLimit-12              38.00 ± 0%     38.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerLinkCountLimit-12               38.00 ± 0%     38.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanSetAttributesOverCapacity-12                       3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
StartEndSpan/AlwaysSample-12                           2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
StartEndSpan/NeverSample-12                            2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/AlwaysSample-12                   4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/NeverSample-12                    3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/AlwaysSample-12                   4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/NeverSample-12                    3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/AlwaysSample-12                 4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/NeverSample-12                  3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/AlwaysSample-12              4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/NeverSample-12               3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/AlwaysSample-12                       5.000 ± 0%     5.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/NeverSample-12                        2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/AlwaysSample-12                       6.000 ± 0%     6.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/NeverSample-12                        2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/AlwaysSample-12          4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/NeverSample-12           3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/AlwaysSample-12           5.000 ± 0%     5.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/NeverSample-12            4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_10-12             0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_100-12            0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_10-12            0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_100-12           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorVerboseLogging-12                         35.00 ± 0%     35.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                           ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```

```
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Apple M2 Max
                  │ ../private/base_hex.txt │       ../private/new_hex.txt        │
                  │         sec/op          │   sec/op     vs base                │
TraceIDFromHex-12               56.47n ± 0%   15.96n ± 0%  -71.74% (p=0.000 n=10)
SpanIDFromHex-12               34.680n ± 0%   8.742n ± 1%  -74.79% (p=0.000 n=10)
geomean                         44.26n        11.81n       -73.31%

                  │ ../private/base_hex.txt │         ../private/new_hex.txt          │
                  │          B/op           │    B/op     vs base                     │
TraceIDFromHex-12                16.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)
SpanIDFromHex-12                 8.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
geomean                          11.31                    ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                  │ ../private/base_hex.txt │         ../private/new_hex.txt          │
                  │        allocs/op        │ allocs/op   vs base                     │
TraceIDFromHex-12                1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
SpanIDFromHex-12                 1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
geomean                          1.000                    ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean
```

Issue: open-telemetry#6721
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This version runs ~5% slower.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham bboreham marked this pull request as draft August 8, 2025 14:02
@bboreham
Contributor Author

bboreham commented Aug 8, 2025

Changed to draft in view of the test failures.

@marcschaeferger
Contributor

I think this PR can be closed; see "What is not present from this PR" below. It looks like the functionality proposed in this PR has already been implemented via #6791.

After reviewing the current code on the main branch, I found that the core optimizations this PR describes on top of #6791 and #7155 (fast hex parsing, allocation-free formatting for TraceID / SpanID, the changelog entry, ...) are already present.

Fast hex parsing (reverse lookup table)

These loops already implement the optimized hex decoding logic with the reverse lookup table and invalid character checks.
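
For readers without the repository open, the technique can be sketched as follows. This is an illustration under stated assumptions, not the code on main: `SpanID`, `hexRev`, `errInvalidHex`, and `spanIDFromHex` are names chosen for this example, and the sketch assumes only lowercase hex digits are accepted.

```go
package main

import (
	"errors"
	"fmt"
)

// SpanID is a stand-in 8-byte ID type used only for this illustration.
type SpanID [8]byte

// errInvalidHex is a hypothetical error value for this sketch.
var errInvalidHex = errors.New("ID must be 16 lowercase hex characters")

// hexRev is the reverse lookup table: it maps an ASCII byte to its nibble
// value, or to 0xff when the byte is not a lowercase hex digit.
var hexRev = func() (t [256]byte) {
	for i := range t {
		t[i] = 0xff
	}
	for i := 0; i < 16; i++ {
		t["0123456789abcdef"[i]] = byte(i)
	}
	return t
}()

// spanIDFromHex decodes s directly into a fixed-size array, so no
// intermediate byte slice is needed.
func spanIDFromHex(s string) (SpanID, error) {
	var id SpanID
	if len(s) != 2*len(id) {
		return SpanID{}, errInvalidHex
	}
	for i := range id {
		hi, lo := hexRev[s[2*i]], hexRev[s[2*i+1]]
		if hi == 0xff || lo == 0xff {
			return SpanID{}, errInvalidHex // invalid character check
		}
		id[i] = hi<<4 | lo
	}
	return id, nil
}

func main() {
	id, err := spanIDFromHex("00f067aa0ba902b7")
	fmt.Printf("%x %v\n", id[:], err)

	_, err = spanIDFromHex("00F067AA0BA902B7") // uppercase is rejected here
	fmt.Println(err)
}
```

Because the length is known up front, the result lives in a fixed-size array rather than in a slice returned by `hex.DecodeString`. Any further validation a real implementation performs is out of scope for this sketch.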


Allocation-free formatting for TraceID / SpanID

Formatting is already implemented using fixed-size byte arrays and lookup-table encoding:
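
A minimal sketch of that formatting approach, again with names (`SpanID`, `hexDigits`) assumed for illustration rather than copied from main:

```go
package main

import "fmt"

// SpanID is a stand-in 8-byte ID type used only for this illustration.
type SpanID [8]byte

// hexDigits is the encoding lookup table: index by nibble, get the digit.
const hexDigits = "0123456789abcdef"

// String encodes the ID as 16 lowercase hex characters, writing into a
// fixed-size buffer instead of a dynamically sized slice.
func (s SpanID) String() string {
	var buf [2 * len(s)]byte
	for i, b := range s {
		buf[2*i] = hexDigits[b>>4]     // high nibble
		buf[2*i+1] = hexDigits[b&0x0f] // low nibble
	}
	return string(buf[:])
}

func main() {
	id := SpanID{0x00, 0xf0, 0x67, 0xaa, 0x0b, 0xa9, 0x02, 0xb7}
	fmt.Println(id.String())
}
```

The fixed-size buffer removes the intermediate slice; whether the final `string(buf[:])` conversion allocates depends on how the caller uses the result.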


What is not present from this PR

  • One difference compared to this draft is the refactor that extracts a hexToBin helper to reduce duplication between the 16-byte (TraceID) and 8-byte (SpanID) parsing logic.
  • There is currently no hexToBin helper in the repository, and the optimized parsing loop is duplicated between TraceIDFromHex and SpanIDFromHex.
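
For context, a shared helper of that kind might look roughly like the sketch below. The signature of `hexToBin` here is hypothetical (the draft's actual signature is not shown in this thread), and a small `nibble` function is used in place of the lookup table to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in ID types used only for this illustration.
type (
	TraceID [16]byte
	SpanID  [8]byte
)

// errInvalidHexID is a hypothetical error value for this sketch.
var errInvalidHexID = errors.New("hex ID has the wrong length or an invalid character")

// nibble returns the value of one lowercase hex digit and whether it was valid.
func nibble(c byte) (byte, bool) {
	switch {
	case c >= '0' && c <= '9':
		return c - '0', true
	case c >= 'a' && c <= 'f':
		return c - 'a' + 10, true
	}
	return 0, false
}

// hexToBin (hypothetical signature) decodes s into dst and requires
// len(s) == 2*len(dst), so one loop serves both ID sizes.
func hexToBin(dst []byte, s string) error {
	if len(s) != 2*len(dst) {
		return errInvalidHexID
	}
	for i := range dst {
		hi, ok1 := nibble(s[2*i])
		lo, ok2 := nibble(s[2*i+1])
		if !ok1 || !ok2 {
			return errInvalidHexID
		}
		dst[i] = hi<<4 | lo
	}
	return nil
}

// traceIDFromHex and spanIDFromHex are thin wrappers over the shared helper.
func traceIDFromHex(s string) (TraceID, error) {
	var id TraceID
	if err := hexToBin(id[:], s); err != nil {
		return TraceID{}, err
	}
	return id, nil
}

func spanIDFromHex(s string) (SpanID, error) {
	var id SpanID
	if err := hexToBin(id[:], s); err != nil {
		return SpanID{}, err
	}
	return id, nil
}

func main() {
	tid, err := traceIDFromHex("4bf92f3577b34da6a3ce929d0e0e4736")
	fmt.Printf("%x %v\n", tid[:], err)
	sid, err := spanIDFromHex("00f067aa0ba902b7")
	fmt.Printf("%x %v\n", sid[:], err)
}
```

The trade-off is the one noted in the commit message above: the shared version is shorter to review, while the duplicated loops over fixed-size arrays measured slightly faster.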

@marcschaeferger
Contributor

@dmathieu @pellared See the comment above; this PR can be closed.

@dmathieu dmathieu closed this Mar 18, 2026