optimize map allocation by leiwingqueen · Pull Request #44613 · open-telemetry/opentelemetry-collector-contrib

leiwingqueen · 2025-11-28T15:21:26Z

Description

Summary

When adding multiple key-value pairs to a pcommon.Map using PutStr(), the underlying slice undergoes frequent reallocations and copies, leading to significant memory overhead. In production workloads with high-cardinality metrics or logs, this becomes a major performance bottleneck.
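As a rough illustration of that cost, the sketch below appends entries to a plain Go slice and counts how many times append has to move to a new backing array, with and without reserving capacity first. It is stdlib-only. The kv type and the fill helper are hypothetical stand-ins for the entries behind a pcommon.Map, not the pdata implementation, and the exact number of reallocations depends on the Go version.

```go
package main

import "fmt"

// kv is a hypothetical stand-in for one key-value entry in a pcommon.Map;
// it is not the pdata type.
type kv struct {
	key, value string
}

// fill appends n entries to a slice created with the given initial capacity.
// It returns the slice and how many times append had to allocate a new
// backing array (and copy the existing entries), detected as a cap change.
func fill(n, initialCap int) ([]kv, int) {
	s := make([]kv, 0, initialCap)
	reallocs := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, kv{key: "k", value: "v"})
		if cap(s) != before {
			reallocs++
		}
	}
	return s, reallocs
}

func main() {
	for _, n := range []int{5, 20, 100, 500} {
		_, grown := fill(n, 0)    // no reservation: append grows the slice step by step
		_, presized := fill(n, n) // capacity reserved up front, like EnsureCapacity(n)
		fmt.Printf("n=%d reallocations: grown=%d presized=%d\n", n, grown, presized)
	}
}
```

The pre-sized run never reallocates, while the unreserved run reallocates several times on the way to n entries, copying all existing entries each time.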

Evidence

Memory profiling (pprof) shows that Map.EnsureCapacity accounts for 27.17% (0.94GB) of total memory allocation, even though it's never explicitly called in application code:

flat    flat%   sum%    cum     cum%
0.94GB  27.17%  60.05%  0.94GB  27.17%  go.opentelemetry.io/collector/pdata/pcommon.Map.EnsureCapacity
0.54GB  15.49%  75.55%  0.54GB  15.49%  go.opentelemetry.io/collector/pdata/pcommon.value.SetStr
0.41GB  11.82%  87.36%  0.42GB  12.17%  go.opentelemetry.io/collector/pdata/pcommon.Map.PutInt

Root Cause

The current PutStr implementation relies on Go's built-in append:

Link to tracking issue

Fixes #44612

Testing

old version benchmark

go test -bench BenchmarkExtractAttributes -benchmem -run Test -count=1
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14          4895226               229.4 ns/op           496 B/op          8 allocs/op
BenchmarkExtractAttributes/20-14          574508              2035 ns/op            4584 B/op         32 allocs/op
BenchmarkExtractAttributes/100-14                  66211             18260 ns/op           20264 B/op        118 allocs/op
BenchmarkExtractAttributes/500-14                   5136            221833 ns/op          123785 B/op        526 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver        5.761s

new version benchmark

go test -bench BenchmarkExtractAttributes -benchmem -run Test -count=1
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14          5377898               206.0 ns/op           464 B/op          7 allocs/op
BenchmarkExtractAttributes/20-14          678472              1702 ns/op            3016 B/op         27 allocs/op
BenchmarkExtractAttributes/100-14                  67134             17291 ns/op           14152 B/op        111 allocs/op
BenchmarkExtractAttributes/500-14                   5365            222215 ns/op          102697 B/op        517 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver        5.717s

Documentation

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>

dashpole · 2025-12-01T16:56:57Z

@leiwingqueen are you able to add a benchmark to demonstrate the improvement? I realize this is a fairly trivial optimization, but it would be nice to show that it is having the desired effect.

perebaj · 2025-12-04T09:42:54Z

Hey @leiwingqueen, one doubt about this PR: did you face memory issues using this component in a real environment, or are you just trying to find things to improve in the code base?

perebaj · 2025-12-04T11:35:39Z

Could you add the changelog file? You can generate it with make chlog-new and check that the file is well formatted with make chlog-validate.

dashpole · 2025-12-04T17:04:58Z

@leiwingqueen Good question. Is your profile from a benchmark, or from production data?


leiwingqueen · 2025-12-05T16:45:14Z

> @leiwingqueen Good question. Is your profile from a benchmark, or from production data?

@dashpole @perebaj My profile is from production data, but from a different component: I'm building a new component, cloned from this one, that supports the PRW v1 protocol, and I found that it causes excessive memory allocations.


leiwingqueen · 2025-12-11T16:00:39Z

> @leiwingqueen are you able to add a benchmark to demonstrate the improvement? I realize this is a fairly trivial optimization, but it would be nice to show that it is having the desired effect.

@dashpole Here are the benchmark results.

# run benchmark test command
go test -bench BenchmarkExtractAttributes -benchmem -run Test -count=1

Performance Benchmark Results

Overview

This PR optimizes memory allocation in extractAttributes by pre-allocating map capacity using EnsureCapacity().

Test Environment

CPU: Apple M4 Pro
OS: Darwin (macOS)
Arch: arm64
GOMAXPROCS: 14

Benchmark Results

Execution Time Comparison

Attributes	withEnsure	withoutEnsure	Improvement
5	202.1 ns/op	222.5 ns/op	9.2% faster
20	1714 ns/op	2022 ns/op	15.2% faster
100	17236 ns/op	18193 ns/op	5.3% faster
500	221446 ns/op	222545 ns/op	0.5% faster

Memory Allocation Comparison

Attributes	withEnsure	withoutEnsure	Memory Saved
5	464 B/op	496 B/op	6.5%
20	3016 B/op	4584 B/op	34.2%
100	14152 B/op	20264 B/op	30.2%
500	102697 B/op	123785 B/op	17.0%

Allocations Per Operation

Attributes	withEnsure	withoutEnsure	Reduction
5	7 allocs/op	8 allocs/op	1 fewer
20	27 allocs/op	32 allocs/op	5 fewer
100	111 allocs/op	118 allocs/op	7 fewer
500	517 allocs/op	526 allocs/op	9 fewer
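The allocs/op column is the average number of heap allocations per benchmark iteration. The same kind of number can be measured outside go test with testing.AllocsPerRun. The following is a minimal sketch that assumes a slice of hypothetical entry values is a fair stand-in for a pcommon.Map's storage; it models only the map's own growth, so it will not reproduce the totals above, which also include label values and other allocations in extractAttributes.

```go
package main

import (
	"fmt"
	"testing"
)

// entry is a hypothetical stand-in for a pcommon.Map key-value entry.
type entry struct {
	key, value string
}

// sink keeps each built slice reachable from a package-level variable so the
// compiler must heap-allocate it, as it would for a map returned to a caller.
var sink []entry

// build appends one entry per key, optionally reserving capacity first.
func build(keys []string, presize bool) []entry {
	var s []entry
	if presize {
		s = make([]entry, 0, len(keys))
	}
	for _, k := range keys {
		s = append(s, entry{key: k, value: k})
	}
	return s
}

// allocsPerOp reports the average number of heap allocations per build call,
// comparable to the allocs/op column printed by go test -benchmem.
func allocsPerOp(n int, presize bool) float64 {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("label_%d", i)
	}
	return testing.AllocsPerRun(100, func() {
		sink = build(keys, presize)
	})
}

func main() {
	for _, n := range []int{5, 20, 100, 500} {
		fmt.Printf("n=%d presized=%.0f grown=%.0f allocs/op\n",
			n, allocsPerOp(n, true), allocsPerOp(n, false))
	}
}
```

With capacity reserved, each call needs a single allocation for the backing array; without it, every growth step adds one more.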

Key Improvements

✅ Faster at every tested size, although the gain shrinks to 0.5% at 500 attributes
✅ Significant memory savings (up to 34.2% for 20 attributes)
✅ Reduced allocation count leads to lower GC pressure
✅ No trade-offs: time, bytes, and allocation count all improve

Conclusion

Pre-allocating map capacity with EnsureCapacity() provides measurable benefits:

Reduces memory allocations by avoiding repeated growth of the map's backing slice
Improves execution time, especially for medium-sized datasets (20-100 attributes)
Decreases GC pressure in high-throughput scenarios

This optimization is particularly valuable for the Prometheus remote write receiver, which handles high-volume metric ingestion.

Raw Benchmark Output

BenchmarkExtractAttributes/5_withEnsure-14               4987830               202.1 ns/op           464 B/op          7 allocs/op
BenchmarkExtractAttributes/5_withoutEnsure-14            5433693               222.5 ns/op           496 B/op          8 allocs/op
BenchmarkExtractAttributes/20_withEnsure-14               675800              1714 ns/op            3016 B/op         27 allocs/op
BenchmarkExtractAttributes/20_withoutEnsure-14            591427              2022 ns/op            4584 B/op         32 allocs/op
BenchmarkExtractAttributes/100_withEnsure-14               70404             17236 ns/op           14152 B/op        111 allocs/op
BenchmarkExtractAttributes/100_withoutEnsure-14            64551             18193 ns/op           20264 B/op        118 allocs/op
BenchmarkExtractAttributes/500_withEnsure-14                5326            221446 ns/op          102697 B/op        517 allocs/op
BenchmarkExtractAttributes/500_withoutEnsure-14             5348            222545 ns/op          123785 B/op        526 allocs/op

leiwingqueen · 2025-12-11T16:06:18Z

@perebaj @dashpole Could you please review this PR again? I've added benchmark results showing significant memory improvements (up to 34% fewer bytes allocated per operation).

dashpole · 2025-12-11T16:06:38Z

+
+// extractAttributesNoEnsure is a copy of extractAttributes without the EnsureCapacity call.
+// It intentionally avoids pre-sizing the returned pcommon.Map so we can compare allocations.
+func extractAttributesNoEnsure(ls labels.Labels) pcommon.Map {


Please remove this part, as we don't need to commit the old code. Usually, I do the following for performance improvements:

1. Add the benchmark in the first commit, run it against the current HEAD, and store the result in old.txt.
2. Make the optimization, re-run the benchmark, and store the result in new.txt.
3. Paste the output of benchstat old.txt new.txt into the PR description.

@dashpole Done.


dashpole

Thanks a bunch!

perebaj · 2025-12-12T22:14:42Z

-	for labelName, labelValue := range ls.Map() {
+	labelMap := ls.Map()
+	// job, instance and metric name will always become labels, but scope name and version may or may not be present
+	attrs.EnsureCapacity(len(labelMap) - 3)


Can we add a comment here explaining why 3? Otherwise it looks like a magic number...

Isn't that what is written above?

// job, instance and metric name will always become labels

At first glance, I couldn't tell from this comment that 3 refers to job, instance, and metric_name.

@perebaj Done.
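One way to avoid the magic number altogether is to derive the count from the label names themselves. The sketch below is hypothetical: reservedLabels, capacityHint, extractAttrs and the kv type are illustrative names, not the receiver's code, and it assumes, as the `- 3` in the diff implies, that job, instance and __name__ are not copied into the attribute map. Because optional labels such as the scope name and version may also be skipped, the hint is an upper bound, which is all a capacity reservation needs.

```go
package main

import "fmt"

// kv is a hypothetical stand-in for one pcommon.Map entry.
type kv struct {
	key, value string
}

// reservedLabels lists the labels assumed to be handled outside the
// attribute map. Its size replaces the bare 3 in len(labelMap) - 3.
var reservedLabels = map[string]bool{
	"job":      true,
	"instance": true,
	"__name__": true,
}

// capacityHint returns how many attribute slots to reserve. It is an upper
// bound (optional labels may be skipped too) and never goes below zero.
func capacityHint(labelMap map[string]string) int {
	if n := len(labelMap) - len(reservedLabels); n > 0 {
		return n
	}
	return 0
}

// extractAttrs reserves capacity once, then copies every non-reserved label.
func extractAttrs(labelMap map[string]string) []kv {
	attrs := make([]kv, 0, capacityHint(labelMap))
	for name, value := range labelMap {
		if reservedLabels[name] {
			continue
		}
		attrs = append(attrs, kv{key: name, value: value})
	}
	return attrs
}

func main() {
	labelMap := map[string]string{
		"__name__": "http_requests_total",
		"job":      "api",
		"instance": "10.0.0.1:9090",
		"method":   "GET",
		"code":     "200",
	}
	attrs := extractAttrs(labelMap)
	fmt.Println(len(attrs), cap(attrs))
}
```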

perebaj · 2025-12-12T22:15:48Z

+// makeLabels builds a labels.Labels with the given total number of labels.
+// It always includes "job", "instance", and the metric name label, plus (total-3) extra labels.
+func makeLabels(total int) labels.Labels {
+	if total < 3 {
+		total = 3
+	}


Why total-3? Is it because we subtract 3 in extractAttributes? If so, can we add a comment explaining why?
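A sketch of how such a benchmark input generator could document that relationship, with the number pulled into a named constant. The label type, reservedCount and this makeLabels are hypothetical stand-ins (the real helper builds a Prometheus labels.Labels); the point is that every generated set carries the three labels that extractAttributes subtracts, so total-3 of them become ordinary attributes.

```go
package main

import (
	"fmt"
	"strconv"
)

// label is a hypothetical stand-in for one Prometheus label.
type label struct {
	Name, Value string
}

// reservedCount is the number of labels every generated set carries
// (job, instance and __name__). They mirror the three labels that
// extractAttributes subtracts from its capacity hint, so a set of
// total labels yields total-reservedCount ordinary attributes.
const reservedCount = 3

// makeLabels returns exactly total labels (never fewer than reservedCount):
// the three reserved labels first, then synthetic extra_N labels.
func makeLabels(total int) []label {
	if total < reservedCount {
		total = reservedCount
	}
	ls := make([]label, 0, total)
	ls = append(ls,
		label{"__name__", "test_metric"},
		label{"instance", "localhost:9090"},
		label{"job", "bench"},
	)
	for i := 0; i < total-reservedCount; i++ {
		ls = append(ls, label{"extra_" + strconv.Itoa(i), "value_" + strconv.Itoa(i)})
	}
	return ls
}

func main() {
	fmt.Println(len(makeLabels(5)), len(makeLabels(1)))
}
```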

perebaj · 2025-12-12T22:16:41Z

+}
+
+func BenchmarkExtractAttributes(b *testing.B) {
+	sizes := []int{5, 20, 100, 500}


Can we add more sizes here? I'm wondering why the gains from this improvement are not linear...

I added more test cases here.

old version

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14        4900918       226.1 ns/op       496 B/op       8 allocs/op
BenchmarkExtractAttributes/20-14        569083        2118 ns/op      4584 B/op      32 allocs/op
BenchmarkExtractAttributes/100-14        65355       18193 ns/op     20264 B/op     118 allocs/op
BenchmarkExtractAttributes/500-14         5133      225575 ns/op    123785 B/op     526 allocs/op
BenchmarkExtractAttributes/1000-14        1273      930589 ns/op    246555 B/op    1032 allocs/op
BenchmarkExtractAttributes/2000-14         428     2729724 ns/op    549436 B/op    2043 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver        9.299s

new version

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14        5340733       211.5 ns/op       464 B/op       7 allocs/op
BenchmarkExtractAttributes/20-14        649693        1711 ns/op      3016 B/op      27 allocs/op
BenchmarkExtractAttributes/100-14        69399       17433 ns/op     14152 B/op     111 allocs/op
BenchmarkExtractAttributes/500-14         5322      220042 ns/op    102697 B/op     517 allocs/op
BenchmarkExtractAttributes/1000-14        1299      915396 ns/op    209082 B/op    1022 allocs/op
BenchmarkExtractAttributes/2000-14         444     2722205 ns/op    421851 B/op    2031 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver        9.052s

The non-linear memory savings (6.5% → 34% → 30% → 17% → 15% → 23%) may be due to Go's map growth strategy:
Exponential Growth: Go maps grow in powers of 2 (buckets: 1 → 2 → 4 → 8 → 16 → 32...)
Load Factor Trigger: Maps resize when load factor exceeds ~6.5, doubling the bucket count.

The 20 and 2000 attribute cases show the highest savings (34% and 23%) because they hit expansion thresholds.
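Worth noting: a pcommon.Map keeps its entries in a slice (the underlying slice mentioned in the Summary), so the growth that EnsureCapacity avoids is most likely Go's slice growth (roughly doubling for small slices, then growing by smaller factors, with each allocation rounded up to a size class) rather than bucket growth in Go's built-in map. The stdlib-only sketch below sums the capacity of every backing array that append allocates on the way to n entries, plus the slots left unused at the end; with capacity reserved up front those would be exactly n and 0. The kv type and growthCost are hypothetical names, the numbers depend on element size and Go version, and the benchmark totals above also include allocations this sketch does not model.

```go
package main

import "fmt"

// kv is a hypothetical stand-in for one pcommon.Map entry (two strings).
type kv struct {
	key, value string
}

// growthCost appends n zero-value entries to an empty slice. It returns the
// total capacity of all backing arrays that append allocated along the way,
// and the capacity still unused once all n entries are in place.
func growthCost(n int) (allocatedSlots, unused int) {
	var s []kv
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, kv{})
		if cap(s) != before {
			// A capacity change means a new backing array was allocated.
			allocatedSlots += cap(s)
		}
	}
	return allocatedSlots, cap(s) - len(s)
}

func main() {
	for _, n := range []int{5, 20, 100, 500, 1000, 2000} {
		slots, unused := growthCost(n)
		fmt.Printf("n=%4d allocated slots=%5d (%.2fx n), unused at end=%d\n",
			n, slots, float64(slots)/float64(n), unused)
	}
}
```

Because the unused tail depends on where n falls relative to the next growth step, the ratio of wasted to needed slots varies from size to size, which is one plausible source of the uneven savings.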

Uhmmmm. Many thanks for that!

It was exactly what dashpole shared with me in a private message.

perebaj · 2025-12-12T22:21:25Z

One doubt

You also mentioned that:

> Memory profiling (pprof) shows that Map.EnsureCapacity accounts for 27.17% (0.94GB) of total memory allocation, even though it's never explicitly called in application code:

Did you try building that code after your changes to check whether the memory profile improves?

leiwingqueen · 2025-12-13T01:56:17Z

> One doubt
>
> You also mentioned that:
>
> > Memory profiling (pprof) shows that Map.EnsureCapacity accounts for 27.17% (0.94GB) of total memory allocation, even though it's never explicitly called in application code:
>
> Did you try building that code after your changes to check whether the memory profile improves?

@perebaj Yes, validated. Total memory allocation decreased by approximately 30%
after the optimization. Note that my version also includes optimizations
to the PRW protocol parsing logic, made before the Map.EnsureCapacity fix.


ArthurSens

LGTM once the linting failure is addressed, but let's make sure @perebaj is also happy with the changes :)


otelbot · 2025-12-15T15:32:05Z

Thank you for your contribution @leiwingqueen! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.

optimize map allocation

e1d60d4

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>

leiwingqueen requested review from a team, ArthurSens and dashpole as code owners November 28, 2025 15:21

github-actions Bot assigned codeboten Nov 28, 2025

github-actions Bot added the receiver/prometheusremotewrite label Nov 28, 2025

github-actions Bot requested a review from perebaj November 28, 2025 15:21

dashpole reviewed Dec 1, 2025


Comment thread receiver/prometheusremotewritereceiver/receiver.go Outdated

change capacity init size

6bc9c01

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>

leiwingqueen added 2 commits December 11, 2025 23:00

add change log

0bcaf83

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>

add benchmark test

2aa795a

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>

dashpole reviewed Dec 11, 2025


fix benchmark test

c4e2e9b

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>

dashpole approved these changes Dec 11, 2025


perebaj reviewed Dec 12, 2025


comment change

24028f3

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>

ArthurSens approved these changes Dec 15, 2025


Comment thread receiver/prometheusremotewritereceiver/receiver_bench_test.go

leiwingqueen added 2 commits December 15, 2025 21:35

add copyright header

62031e3

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>

add more test case

ece1ad4

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>

perebaj approved these changes Dec 15, 2025


ArthurSens added the ready to merge Code review completed; ready to merge by maintainers label Dec 15, 2025

songy23 merged commit 5cf2b23 into open-telemetry:main Dec 15, 2025
205 checks passed

github-actions Bot added this to the next release milestone Dec 15, 2025


6 participants
