trace: optimize id parsing and string functions #7157
bboreham wants to merge 4 commits into open-telemetry:main
With specialized routines, we can avoid the allocation of hex.DecodeString since we know the structure of the IDs.

We can use `==` instead of bytes.Equal for arrays. From the Go [spec]:

> Array types are comparable if their array element types are comparable. Two
> array values are equal if their corresponding element values are equal. The
> elements are compared in ascending index order, and comparison stops as soon
> as two element values differ (or all elements have been compared).

[spec]: https://go.dev/ref/spec#Comparison_operators

To generate:

```sh
mkdir private
cd sdk
go test -run=xxxxMatchNothingxxxx -bench=. -count=10 go.opentelemetry.io/otel/sdk/trace -timeout=30m | tee ../private/base.txt
go test -run=xxxxMatchNothingxxxx -bench=. -count=10 go.opentelemetry.io/otel/sdk/trace -timeout=30m | tee ../private/new.txt
benchstat ../private/base.txt ../private/new.txt
```

Results:

```
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Apple M2 Max
                                               │ ../private/base.txt │ ../private/new.txt │
                                               │ sec/op              │ sec/op         vs base │
Truncate/Unlimited-12                            0.2086n ± 1%          0.2140n ± 3%   +2.59% (p=0.017 n=10)
Truncate/Zero-12                                 0.3048n ± 0%          0.3070n ± 1%   +0.71% (p=0.014 n=10)
Truncate/Short-12                                0.2083n ± 2%          0.2097n ± 1%   ~ (p=0.148 n=10)
Truncate/ASCII-12                                0.6870n ± 0%          0.6855n ± 0%   ~ (p=0.493 n=10)
Truncate/ValidUTF-8-12                           1.298n ± 0%           1.302n ± 1%    +0.31% (p=0.003 n=10)
Truncate/InvalidUTF-8-12                         9.457n ± 0%           9.420n ± 1%    ~ (p=0.529 n=10)
Truncate/MixedUTF-8-12                           17.30n ± 1%           17.29n ± 0%    ~ (p=0.359 n=10)
RecordingSpanSetAttributes/WithLimit/false-12    2.055µ ± 1%           2.082µ ± 9%    +1.29% (p=0.014 n=10)
RecordingSpanSetAttributes/WithLimit/true-12     4.368µ ± 0%           4.364µ ± 0%    -0.08% (p=0.049 n=10)
SpanEnd-12                                       72.57n ± 17%          73.75n ± 16%   ~ (p=0.853 n=10)
TraceStart/with_a_simple_span-12                 320.1n ± 10%          314.6n ± 10%   ~ (p=0.165 n=10)
TraceStart/with_several_links-12                 432.7n ± 2%           429.4n ± 1%    ~ (p=0.063 n=10)
TraceStart/with_attributes-12                    477.3n ± 1%           468.1n ± 6%    -1.94% (p=0.005 n=10)
SpanLimits/AttributeValueLengthLimit-12          4.401µ ± 1%           4.439µ ± 2%    ~ (p=0.089 n=10)
SpanLimits/AttributeCountLimit-12                4.125µ ± 1%           4.151µ ± 1%    +0.62% (p=0.014 n=10)
SpanLimits/EventCountLimit-12                    3.900µ ± 2%           3.935µ ± 1%    +0.88% (p=0.023 n=10)
SpanLimits/LinkCountLimit-12                     3.870µ ± 2%           3.901µ ± 1%    ~ (p=0.148 n=10)
SpanLimits/AttributePerEventCountLimit-12        4.212µ ± 1%           4.243µ ± 1%    +0.75% (p=0.008 n=10)
SpanLimits/AttributePerLinkCountLimit-12         4.200µ ± 1%           4.224µ ± 0%    +0.57% (p=0.041 n=10)
SpanSetAttributesOverCapacity-12                 1.661µ ± 1%           1.653µ ± 0%    -0.48% (p=0.049 n=10)
StartEndSpan/AlwaysSample-12                     317.9n ± 0%           316.5n ± 0%    -0.44% (p=0.007 n=10)
StartEndSpan/NeverSample-12                      152.3n ± 0%           152.0n ± 0%    -0.23% (p=0.005 n=10)
SpanWithAttributes_4/AlwaysSample-12             527.2n ± 0%           532.4n ± 1%    +1.00% (p=0.000 n=10)
SpanWithAttributes_4/NeverSample-12              240.6n ± 0%           241.6n ± 0%    +0.46% (p=0.000 n=10)
SpanWithAttributes_8/AlwaysSample-12             704.5n ± 0%           718.3n ± 1%    +1.97% (p=0.000 n=10)
SpanWithAttributes_8/NeverSample-12              325.0n ± 0%           327.2n ± 1%    +0.68% (p=0.000 n=10)
SpanWithAttributes_all/AlwaysSample-12           576.8n ± 0%           584.7n ± 1%    +1.37% (p=0.000 n=10)
SpanWithAttributes_all/NeverSample-12            264.6n ± 1%           263.3n ± 0%    -0.47% (p=0.045 n=10)
SpanWithAttributes_all_2x/AlwaysSample-12        818.6n ± 1%           834.9n ± 0%    +1.98% (p=0.000 n=10)
SpanWithAttributes_all_2x/NeverSample-12         378.3n ± 0%           382.9n ± 1%    +1.23% (p=0.000 n=10)
SpanWithEvents_4/AlwaysSample-12                 715.1n ± 1%           721.1n ± 0%    +0.83% (p=0.003 n=10)
SpanWithEvents_4/NeverSample-12                  156.1n ± 1%           155.1n ± 1%    -0.64% (p=0.002 n=10)
SpanWithEvents_8/AlwaysSample-12                 1.098µ ± 0%           1.104µ ± 0%    +0.55% (p=0.000 n=10)
SpanWithEvents_8/NeverSample-12                  158.8n ± 0%           158.6n ± 1%    ~ (p=0.288 n=10)
SpanWithEvents_WithStackTrace/AlwaysSample-12    438.8n ± 0%           438.5n ± 0%    ~ (p=0.868 n=10)
SpanWithEvents_WithStackTrace/NeverSample-12     168.2n ± 1%           167.4n ± 1%    -0.48% (p=0.014 n=10)
SpanWithEvents_WithTimestamp/AlwaysSample-12     430.6n ± 0%           432.9n ± 0%    +0.53% (p=0.001 n=10)
SpanWithEvents_WithTimestamp/NeverSample-12      193.7n ± 0%           190.1n ± 1%    -1.91% (p=0.000 n=10)
TraceID_DotString-12                             42.37n ± 0%           24.80n ± 0%    -41.45% (p=0.000 n=10)
SpanID_DotString-12                              31.30n ± 0%           17.22n ± 0%    -44.98% (p=0.000 n=10)
SpanProcessorOnEnd/batch:_10,_spans:_10-12       163.3n ± 0%           163.4n ± 0%    ~ (p=0.120 n=10)
SpanProcessorOnEnd/batch:_10,_spans:_100-12      1.639µ ± 0%           1.635µ ± 0%    -0.27% (p=0.000 n=10)
SpanProcessorOnEnd/batch:_100,_spans:_10-12      163.3n ± 0%           163.2n ± 0%    ~ (p=0.115 n=10)
SpanProcessorOnEnd/batch:_100,_spans:_100-12     1.636µ ± 0%           1.635µ ± 1%    ~ (p=0.509 n=10)
SpanProcessorVerboseLogging-12                   6.769µ ± 2%           6.600µ ± 2%    -2.49% (p=0.030 n=10)
geomean                                          221.4n                216.4n         -2.29%

                                               │ ../private/base.txt │ ../private/new.txt │
                                               │ B/op                │ B/op           vs base │
Truncate/Unlimited-12                            0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/Zero-12                                 0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/Short-12                                0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/ASCII-12                                0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-12                           0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-12                         16.00 ± 0%            16.00 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-12                           32.00 ± 0%            32.00 ± 0%     ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/false-12    6.891Ki ± 0%          6.891Ki ± 0%   ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/true-12     7.023Ki ± 0%          7.023Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanEnd-12                                       0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
TraceStart/with_a_simple_span-12                 528.0 ± 0%            528.0 ± 0%     ~ (p=1.000 n=10) ¹
TraceStart/with_several_links-12                 704.0 ± 0%            704.0 ± 0%     ~ (p=1.000 n=10) ¹
TraceStart/with_attributes-12                    784.0 ± 0%            784.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanLimits/AttributeValueLengthLimit-12          10.56Ki ± 0%          10.56Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanLimits/AttributeCountLimit-12                9.844Ki ± 0%          9.844Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanLimits/EventCountLimit-12                    9.422Ki ± 0%          9.422Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanLimits/LinkCountLimit-12                     9.031Ki ± 0%          9.031Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerEventCountLimit-12        10.47Ki ± 0%          10.47Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerLinkCountLimit-12         10.47Ki ± 0%          10.47Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanSetAttributesOverCapacity-12                 592.0 ± 0%            592.0 ± 0%     ~ (p=1.000 n=10) ¹
StartEndSpan/AlwaysSample-12                     528.0 ± 0%            528.0 ± 0%     ~ (p=1.000 n=10) ¹
StartEndSpan/NeverSample-12                      144.0 ± 0%            144.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/AlwaysSample-12             1.016Ki ± 0%          1.016Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/NeverSample-12              400.0 ± 0%            400.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/AlwaysSample-12             1.516Ki ± 0%          1.516Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/NeverSample-12              656.0 ± 0%            656.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/AlwaysSample-12           1.141Ki ± 0%          1.141Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/NeverSample-12            464.0 ± 0%            464.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/AlwaysSample-12        1.891Ki ± 0%          1.891Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/NeverSample-12         848.0 ± 0%            848.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_4/AlwaysSample-12                 1.016Ki ± 0%          1.016Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanWithEvents_4/NeverSample-12                  144.0 ± 0%            144.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_8/AlwaysSample-12                 1.641Ki ± 0%          1.641Ki ± 0%   ~ (p=1.000 n=10) ¹
SpanWithEvents_8/NeverSample-12                  144.0 ± 0%            144.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/AlwaysSample-12    624.0 ± 0%            624.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/NeverSample-12     160.0 ± 0%            160.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/AlwaysSample-12     648.0 ± 0%            648.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/NeverSample-12      184.0 ± 0%            184.0 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_10-12       0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_100-12      0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_10-12      0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_100-12     0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorVerboseLogging-12                   9.547Ki ± 0%          9.547Ki ± 0%   ~ (p=1.000 n=10) ¹
geomean                                          ²                                    +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                               │ ../private/base.txt │ ../private/new.txt │
                                               │ allocs/op           │ allocs/op      vs base │
Truncate/Unlimited-12                            0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/Zero-12                                 0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/Short-12                                0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/ASCII-12                                0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-12                           0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-12                         1.000 ± 0%            1.000 ± 0%     ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-12                           1.000 ± 0%            1.000 ± 0%     ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/false-12    3.000 ± 0%            3.000 ± 0%     ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/true-12     10.00 ± 0%            10.00 ± 0%     ~ (p=1.000 n=10) ¹
SpanEnd-12                                       0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
TraceStart/with_a_simple_span-12                 2.000 ± 0%            2.000 ± 0%     ~ (p=1.000 n=10) ¹
TraceStart/with_several_links-12                 3.000 ± 0%            3.000 ± 0%     ~ (p=1.000 n=10) ¹
TraceStart/with_attributes-12                    4.000 ± 0%            4.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanLimits/AttributeValueLengthLimit-12          41.00 ± 0%            41.00 ± 0%     ~ (p=1.000 n=10) ¹
SpanLimits/AttributeCountLimit-12                38.00 ± 0%            38.00 ± 0%     ~ (p=1.000 n=10) ¹
SpanLimits/EventCountLimit-12                    35.00 ± 0%            35.00 ± 0%     ~ (p=1.000 n=10) ¹
SpanLimits/LinkCountLimit-12                     35.00 ± 0%            35.00 ± 0%     ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerEventCountLimit-12        38.00 ± 0%            38.00 ± 0%     ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerLinkCountLimit-12         38.00 ± 0%            38.00 ± 0%     ~ (p=1.000 n=10) ¹
SpanSetAttributesOverCapacity-12                 3.000 ± 0%            3.000 ± 0%     ~ (p=1.000 n=10) ¹
StartEndSpan/AlwaysSample-12                     2.000 ± 0%            2.000 ± 0%     ~ (p=1.000 n=10) ¹
StartEndSpan/NeverSample-12                      2.000 ± 0%            2.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/AlwaysSample-12             4.000 ± 0%            4.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/NeverSample-12              3.000 ± 0%            3.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/AlwaysSample-12             4.000 ± 0%            4.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/NeverSample-12              3.000 ± 0%            3.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/AlwaysSample-12           4.000 ± 0%            4.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/NeverSample-12            3.000 ± 0%            3.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/AlwaysSample-12        4.000 ± 0%            4.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/NeverSample-12         3.000 ± 0%            3.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_4/AlwaysSample-12                 5.000 ± 0%            5.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_4/NeverSample-12                  2.000 ± 0%            2.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_8/AlwaysSample-12                 6.000 ± 0%            6.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_8/NeverSample-12                  2.000 ± 0%            2.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/AlwaysSample-12    4.000 ± 0%            4.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/NeverSample-12     3.000 ± 0%            3.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/AlwaysSample-12     5.000 ± 0%            5.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/NeverSample-12      4.000 ± 0%            4.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_10-12       0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_100-12      0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_10-12      0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_100-12     0.000 ± 0%            0.000 ± 0%     ~ (p=1.000 n=10) ¹
SpanProcessorVerboseLogging-12                   35.00 ± 0%            35.00 ± 0%     ~ (p=1.000 n=10) ¹
geomean                                          ²                                    +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```

```
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Apple M2 Max
                    │ ../private/base_hex.txt │ ../private/new_hex.txt │
                    │ sec/op                  │ sec/op         vs base │
TraceIDFromHex-12     56.47n ± 0%               15.96n ± 0%    -71.74% (p=0.000 n=10)
SpanIDFromHex-12      34.680n ± 0%              8.742n ± 1%    -74.79% (p=0.000 n=10)
geomean               44.26n                    11.81n         -73.31%

                    │ ../private/base_hex.txt │ ../private/new_hex.txt │
                    │ B/op                    │ B/op           vs base │
TraceIDFromHex-12     16.00 ± 0%                0.00 ± 0%      -100.00% (p=0.000 n=10)
SpanIDFromHex-12      8.000 ± 0%                0.000 ± 0%     -100.00% (p=0.000 n=10)
geomean               11.31                                    ? ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                    │ ../private/base_hex.txt │ ../private/new_hex.txt │
                    │ allocs/op               │ allocs/op      vs base │
TraceIDFromHex-12     1.000 ± 0%                0.000 ± 0%     -100.00% (p=0.000 n=10)
SpanIDFromHex-12      1.000 ± 0%                0.000 ± 0%     -100.00% (p=0.000 n=10)
geomean               1.000                                    ? ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean
```

Issue: open-telemetry#6721
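For readers who want a concrete picture of the two ideas in the description (decoding a fixed-length hex ID without the intermediate slice that hex.DecodeString returns, and comparing arrays with `==`), here is a minimal sketch. It is not the code from this PR: the `traceID` type, `hexRev` table, `parseTraceID` function and `errInvalidHex` error are hypothetical names, and rejecting uppercase digits and all-zero IDs is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// traceID is a hypothetical stand-in for a 16-byte trace ID type.
type traceID [16]byte

// errInvalidHex is a hypothetical error value used only by this sketch.
var errInvalidHex = errors.New("invalid hex ID")

// hexRev maps an ASCII byte to its 4-bit value, or 0xff if the byte is not
// a lowercase hex digit (a reverse lookup table).
var hexRev = func() (t [256]byte) {
	for i := range t {
		t[i] = 0xff
	}
	for i := byte(0); i < 10; i++ {
		t['0'+i] = i
	}
	for i := byte(0); i < 6; i++ {
		t['a'+i] = 10 + i
	}
	return t
}()

// parseTraceID decodes exactly 32 lowercase hex characters straight into
// the fixed-size array, so no intermediate []byte is needed.
func parseTraceID(s string) (traceID, error) {
	var id traceID
	if len(s) != 2*len(id) {
		return traceID{}, errInvalidHex
	}
	for i := range id {
		hi, lo := hexRev[s[2*i]], hexRev[s[2*i+1]]
		if hi == 0xff || lo == 0xff {
			return traceID{}, errInvalidHex
		}
		id[i] = hi<<4 | lo
	}
	// Arrays are comparable, so == can replace bytes.Equal on slices.
	if id == (traceID{}) {
		return traceID{}, errInvalidHex
	}
	return id, nil
}

func main() {
	id, err := parseTraceID("0102030405060708090a0b0c0d0e0f10")
	fmt.Println(id[0], id[15], err) // prints "1 16 <nil>"
}
```

Because the length is checked up front and the destination is an array, the loop bounds are fixed and the result can be returned by value.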
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This version runs ~5% slower. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Changed to draft in view of the test failures.
I think this PR can be closed. After reviewing the current main branch, the core optimizations described in this PR as an addition to #6791 and #7155 (fast hex parsing and allocation-free formatting for TraceID / SpanID, changelog, ...) are already present.
Fast hex parsing (reverse lookup table)
These loops already implement the optimized hex decoding logic with the reverse lookup table and invalid character checks.
Allocation-free formatting for TraceID / SpanID
Formatting is already implemented using fixed-size byte arrays and lookup-table encoding:
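As a rough illustration of formatting with a fixed-size byte array and a lookup table, here is a minimal sketch. It is not the code on main: the `spanID` type and `hexDigits` constant are hypothetical names chosen for this example.

```go
package main

import "fmt"

// spanID is a hypothetical stand-in for an 8-byte span ID type.
type spanID [8]byte

// hexDigits is the encoding lookup table: index by a 4-bit value to get
// the corresponding lowercase hex character.
const hexDigits = "0123456789abcdef"

// String encodes the ID as 16 lowercase hex characters. The scratch buffer
// is a fixed-size array rather than a slice produced by a generic encoder.
func (id spanID) String() string {
	var buf [16]byte
	for i, b := range id {
		buf[2*i] = hexDigits[b>>4]
		buf[2*i+1] = hexDigits[b&0x0f]
	}
	return string(buf[:])
}

func main() {
	id := spanID{0x00, 0xf0, 0x67, 0xaa, 0x0b, 0xa9, 0x02, 0xb7}
	fmt.Println(id.String()) // prints "00f067aa0ba902b7"
}
```

The same loop works for a 16-byte ID with a 32-byte buffer; only the array sizes change.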
What is not present from this PR
Update of #6791, resolving merge conflicts and adding changelog.
I also proposed a couple more commits rearranging things to avoid duplication between 8-byte and 16-byte versions of the same code; I felt the shorter version is easier to review.
See also #7155, which adds/extends tests to add confidence.