
optimize map allocation #44613

Merged
songy23 merged 8 commits into open-telemetry:main from leiwingqueen:prometheusremotewritereceiver-optimize-map-allocation
Dec 15, 2025

@leiwingqueen commented Nov 28, 2025

Contributor

Description

Summary

When adding multiple key-value pairs to a pcommon.Map using PutStr(), the underlying slice undergoes frequent reallocations and copies, leading to significant memory overhead. In production workloads with high cardinality metrics/logs, this becomes a major performance bottleneck.
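To illustrate the growth pattern described above, here is a minimal sketch built on a simplified, hypothetical slice-backed map. The names sliceMap, putStr and ensureCapacity are illustrative only and are not the pdata API. The sketch counts how often append has to move to a new backing array when 20 keys are added, once without reserving capacity and once after reserving it up front.

```go
package main

import "fmt"

// entry is a hypothetical key/value pair; it is not the pdata KeyValue type.
type entry struct {
	key, value string
}

// sliceMap is a simplified, hypothetical model of a slice-backed attribute map.
type sliceMap struct {
	entries []entry
	grows   int // reallocations triggered by putStr's append
}

// putStr updates an existing key in place or appends a new entry.
func (m *sliceMap) putStr(k, v string) {
	for i := range m.entries {
		if m.entries[i].key == k {
			m.entries[i].value = v
			return
		}
	}
	oldCap := cap(m.entries)
	m.entries = append(m.entries, entry{key: k, value: v})
	if cap(m.entries) != oldCap {
		m.grows++ // append had to allocate a larger array and copy into it
	}
}

// ensureCapacity reserves room for at least n entries with one allocation.
func (m *sliceMap) ensureCapacity(n int) {
	if n <= cap(m.entries) {
		return
	}
	grown := make([]entry, len(m.entries), n)
	copy(grown, m.entries)
	m.entries = grown
}

// fill adds n distinct keys to m.
func fill(m *sliceMap, n int) {
	for i := 0; i < n; i++ {
		m.putStr(fmt.Sprintf("label_%d", i), "value")
	}
}

func main() {
	var plain sliceMap
	fill(&plain, 20)

	var presized sliceMap
	presized.ensureCapacity(20)
	fill(&presized, 20)

	fmt.Println("reallocations without pre-sizing:", plain.grows)
	fmt.Println("reallocations after ensureCapacity:", presized.grows)
}
```

In the pre-sized case the single allocation happens up front in ensureCapacity, and the subsequent appends never reallocate.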

Evidence

Memory profiling (pprof) shows that Map.EnsureCapacity accounts for 27.17% (0.94GB) of total memory allocation, even though it's never explicitly called in application code:

flat    flat%   sum%    cum     cum%
0.94GB  27.17%  60.05%  0.94GB  27.17%  go.opentelemetry.io/collector/pdata/pcommon.Map.EnsureCapacity
0.54GB  15.49%  75.55%  0.54GB  15.49%  go.opentelemetry.io/collector/pdata/pcommon.value.SetStr
0.41GB  11.82%  87.36%  0.42GB  12.17%  go.opentelemetry.io/collector/pdata/pcommon.Map.PutInt
Root Cause

The current PutStr implementation relies on Go's built-in append:

Link to tracking issue

Fixes #44612

Testing

old version benchmark

go test -bench BenchmarkExtractAttributes -benchmem -run Test -count=1
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14          4895226               229.4 ns/op           496 B/op          8 allocs/op
BenchmarkExtractAttributes/20-14          574508              2035 ns/op            4584 B/op         32 allocs/op
BenchmarkExtractAttributes/100-14                  66211             18260 ns/op           20264 B/op        118 allocs/op
BenchmarkExtractAttributes/500-14                   5136            221833 ns/op          123785 B/op        526 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver        5.761s

new version benchmark

go test -bench BenchmarkExtractAttributes -benchmem -run Test -count=1
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14          5377898               206.0 ns/op           464 B/op          7 allocs/op
BenchmarkExtractAttributes/20-14          678472              1702 ns/op            3016 B/op         27 allocs/op
BenchmarkExtractAttributes/100-14                  67134             17291 ns/op           14152 B/op        111 allocs/op
BenchmarkExtractAttributes/500-14                   5365            222215 ns/op          102697 B/op        517 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver        5.717s

Documentation

Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>
Comment thread receiver/prometheusremotewritereceiver/receiver.go Outdated
@dashpole commented Dec 1, 2025

Contributor

@leiwingqueen are you able to add a benchmark to demonstrate the improvement? I realize this is a fairly trivial optimization, but it would be nice to show that it is having the desired effect.

@perebaj commented Dec 4, 2025

Contributor

Hey @leiwingqueen, one question about this PR: did you run into memory issues using this component in a real environment, or are you looking for things to improve in the codebase?

@perebaj commented Dec 4, 2025

Contributor

Could you add the changelog file? You can generate it using make chlog-new and validate that the file is well formatted using make chlog-validate.

@dashpole commented Dec 4, 2025

Contributor

@leiwingqueen Good question. Is your profile from a benchmark, or from production data?

@leiwingqueen

Contributor Author

@dashpole @perebaj My profile is from production data, but from a different component: I'm building a new component, cloned from this one, that supports the PRW v1 protocol, and I found that it causes excessive memory allocations.

@leiwingqueen

Contributor Author

@dashpole Here are the benchmark results.

# run benchmark test command
go test -bench BenchmarkExtractAttributes -benchmem -run Test -count=1

Performance Benchmark Results

Overview

This PR optimizes memory allocation in extractAttributes by pre-allocating map capacity using EnsureCapacity().

Test Environment

  • CPU: Apple M4 Pro
  • OS: Darwin (macOS)
  • Arch: arm64
  • GOMAXPROCS: 14

Benchmark Results

Execution Time Comparison

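| Attributes | withEnsure  | withoutEnsure | Improvement |
|-----------:|------------:|--------------:|------------:|
| 5          | 202.1 ns/op | 222.5 ns/op   | 9.2% faster |
| 20         | 1714 ns/op  | 2022 ns/op    | 15.2% faster |
| 100        | 17236 ns/op | 18193 ns/op   | 5.3% faster |
| 500        | 221446 ns/op | 222545 ns/op | 0.5% faster |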

Memory Allocation Comparison

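| Attributes | withEnsure  | withoutEnsure | Memory Saved |
|-----------:|------------:|--------------:|-------------:|
| 5          | 464 B/op    | 496 B/op      | 6.5%         |
| 20         | 3016 B/op   | 4584 B/op     | 34.2%        |
| 100        | 14152 B/op  | 20264 B/op    | 30.2%        |
| 500        | 102697 B/op | 123785 B/op   | 17.0%        |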

Allocations Per Operation

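| Attributes | withEnsure     | withoutEnsure  | Reduction |
|-----------:|---------------:|---------------:|----------:|
| 5          | 7 allocs/op    | 8 allocs/op    | 1 fewer   |
| 20         | 27 allocs/op   | 32 allocs/op   | 5 fewer   |
| 100        | 111 allocs/op  | 118 allocs/op  | 7 fewer   |
| 500        | 517 allocs/op  | 526 allocs/op  | 9 fewer   |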
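The percentages in these tables follow from (without − with) / without × 100. A small sketch that reproduces the Memory Saved column from the B/op values in the raw output (the helper name reduction is hypothetical):

```go
package main

import "fmt"

// reduction returns how much smaller with is than without, as a percentage of without.
func reduction(without, with float64) float64 {
	return (without - with) / without * 100
}

func main() {
	// B/op values copied from the raw benchmark output (withoutEnsure, withEnsure).
	rows := []struct {
		attrs         int
		without, with float64
	}{
		{5, 496, 464},
		{20, 4584, 3016},
		{100, 20264, 14152},
		{500, 123785, 102697},
	}
	for _, r := range rows {
		// Prints 6.5%, 34.2%, 30.2% and 17.0% respectively.
		fmt.Printf("%d attributes: %.1f%% less memory\n", r.attrs, reduction(r.without, r.with))
	}
}
```

The same formula gives the time column, for example (222.5 − 202.1) / 222.5 ≈ 9.2% for 5 attributes.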

Key Improvements

  • Faster execution in every scenario measured, although the 500-attribute gain (0.5%) is small enough to be run-to-run noise
  • Significant memory savings (up to 34.2% for 20 attributes)
  • Fewer allocations per operation, which lowers GC pressure
  • No trade-offs observed: every metric improves or stays roughly flat

Conclusion

Pre-allocating map capacity with EnsureCapacity() provides measurable benefits:

  • Reduces memory allocations by avoiding dynamic map growth
  • Improves execution time, especially for medium-sized datasets (20-100 attributes)
  • Decreases GC pressure in high-throughput scenarios

This optimization is particularly valuable for the Prometheus remote write receiver when it handles high-volume metric ingestion.

Raw Benchmark Output

BenchmarkExtractAttributes/5_withEnsure-14               4987830               202.1 ns/op           464 B/op          7 allocs/op
BenchmarkExtractAttributes/5_withoutEnsure-14            5433693               222.5 ns/op           496 B/op          8 allocs/op
BenchmarkExtractAttributes/20_withEnsure-14               675800              1714 ns/op            3016 B/op         27 allocs/op
BenchmarkExtractAttributes/20_withoutEnsure-14            591427              2022 ns/op            4584 B/op         32 allocs/op
BenchmarkExtractAttributes/100_withEnsure-14               70404             17236 ns/op           14152 B/op        111 allocs/op
BenchmarkExtractAttributes/100_withoutEnsure-14            64551             18193 ns/op           20264 B/op        118 allocs/op
BenchmarkExtractAttributes/500_withEnsure-14                5326            221446 ns/op          102697 B/op        517 allocs/op
BenchmarkExtractAttributes/500_withoutEnsure-14             5348            222545 ns/op          123785 B/op        526 allocs/op

@leiwingqueen

Contributor Author

@perebaj @dashpole Could you please review this PR again? I've added benchmark results showing significant memory improvements (up to a 34% reduction in bytes allocated per operation).


// extractAttributesNoEnsure is a copy of extractAttributes without the EnsureCapacity call.
// It intentionally avoids pre-sizing the returned pcommon.Map so we can compare allocations.
func extractAttributesNoEnsure(ls labels.Labels) pcommon.Map {

Contributor

Please remove this part, as we don't need to commit the old code. Usually, I do the following for performance improvements:

  • add the benchmark in the first commit. Run it against current HEAD, and store the result in old.txt.
  • make my optimization, and re-run the benchmark and store the result in new.txt
  • paste the output of benchstat old.txt new.txt into the PR description.

Contributor Author

@dashpole Done.

@dashpole left a comment

Contributor

Thanks a bunch!

for labelName, labelValue := range ls.Map() {
labelMap := ls.Map()
// job, instance and metric name will always become labels, but scope name and version may or may not be present
attrs.EnsureCapacity(len(labelMap) - 3)

Contributor

Can we add a comment here explaining why 3? Otherwise it looks like a magic number.

Member

Isn't that what is written above?

// job, instance and metric name will always become labels

Contributor

At first glance, I can't tell from this comment that 3 refers to job, instance, and metric_name.

Contributor Author

@perebaj Done.
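The reasoning behind the "- 3" can be sketched as follows, assuming (as the code comment above suggests) that job, instance and the metric name label are always present and are not copied into the attribute map. Everything here is hypothetical: a plain Go map stands in for labels.Labels, a slice of pairs stands in for pcommon.Map, and reservedLabels, attrsCapacity and extractAttrsSketch are illustrative names, not the receiver's code. Scope name and version labels are not modeled; if they were present and also excluded, the hint would overestimate by at most two slots.

```go
package main

import "fmt"

// reservedLabels is a hypothetical list of labels assumed to be handled
// outside the attribute map (job and instance for the resource, __name__ for
// the metric name). It illustrates the "- 3"; it is not the receiver's code.
var reservedLabels = map[string]bool{"job": true, "instance": true, "__name__": true}

// attrsCapacity returns len(labels) - 3, clamped at zero for tiny inputs.
func attrsCapacity(labels map[string]string) int {
	if n := len(labels) - len(reservedLabels); n > 0 {
		return n
	}
	return 0
}

// extractAttrsSketch copies every non-reserved label into a pre-sized slice.
func extractAttrsSketch(labels map[string]string) [][2]string {
	out := make([][2]string, 0, attrsCapacity(labels))
	for k, v := range labels {
		if reservedLabels[k] {
			continue
		}
		out = append(out, [2]string{k, v})
	}
	return out
}

func main() {
	labels := map[string]string{
		"job": "api", "instance": "host:9090", "__name__": "http_requests_total",
		"method": "GET", "code": "200",
	}
	attrs := extractAttrsSketch(labels)
	fmt.Println(len(attrs), cap(attrs)) // 2 2
}
```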

Comment on lines +10 to +15
// makeLabels builds a labels.Labels with the given total number of labels.
// It always includes "job", "instance", and the metric name label, plus (total-3) extra labels.
func makeLabels(total int) labels.Labels {
if total < 3 {
total = 3
}

Contributor

Why total-3? Is it because we subtract 3 in extractAttributes? If so, can we add a comment explaining why?

}

func BenchmarkExtractAttributes(b *testing.B) {
sizes := []int{5, 20, 100, 500}

Contributor

Can we add more sizes here? I'm wondering why the gains from this improvement are not linear.

Contributor Author

I added more test cases here.

  • old version
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14          4900918               226.1 ns/op           496 B/op          8 allocs/op
BenchmarkExtractAttributes/20-14          569083              2118 ns/op            4584 B/op         32 allocs/op
BenchmarkExtractAttributes/100-14                  65355             18193 ns/op           20264 B/op        118 allocs/op
BenchmarkExtractAttributes/500-14                   5133            225575 ns/op          123785 B/op        526 allocs/op
BenchmarkExtractAttributes/1000-14                  1273            930589 ns/op          246555 B/op       1032 allocs/op
BenchmarkExtractAttributes/2000-14                   428           2729724 ns/op          549436 B/op       2043 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver        9.299s
  • new version
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14          5340733               211.5 ns/op           464 B/op          7 allocs/op
BenchmarkExtractAttributes/20-14          649693              1711 ns/op            3016 B/op         27 allocs/op
BenchmarkExtractAttributes/100-14                  69399             17433 ns/op           14152 B/op        111 allocs/op
BenchmarkExtractAttributes/500-14                   5322            220042 ns/op          102697 B/op        517 allocs/op
BenchmarkExtractAttributes/1000-14                  1299            915396 ns/op          209082 B/op       1022 allocs/op
BenchmarkExtractAttributes/2000-14                   444           2722205 ns/op          421851 B/op       2031 allocs/op
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver        9.052s

The non-linear memory savings (6.5% → 34% → 30% → 17% → 15% → 23%) may be due to Go's map growth strategy:
  • Exponential growth: Go maps grow in powers of 2 (buckets: 1 → 2 → 4 → 8 → 16 → 32 ...).
  • Load-factor trigger: maps resize when the load factor exceeds ~6.5, doubling the bucket count.

The 20- and 2000-attribute cases likely show the largest savings (34% and 23%) because they hit expansion thresholds.

Contributor

Uhmmmm. Many thanks for that!

It was exactly what Dashpole shared with me in private.

@perebaj commented Dec 12, 2025

Contributor

One question.

You also mentioned that:

Memory profiling (pprof) shows that Map.EnsureCapacity accounts for 27.17% (0.94GB) of total memory allocation, even though it's never explicitly called in application code:

Did you try building that code with your changes to check whether the memory profile improves?

@leiwingqueen commented Dec 13, 2025

Contributor Author

@perebaj Yes, validated. Total memory allocation decreased by approximately 30% after the optimization. Note that my version also includes optimizations to the PRW protocol parsing logic made before the Map.EnsureCapacity fix.

@ArthurSens left a comment

Member

After addressing the linting failure it LGTM, but please let's make sure @perebaj is also happy with the changes :)

Comment thread receiver/prometheusremotewritereceiver/receiver_bench_test.go
@ArthurSens added the ready to merge label (Code review completed; ready to merge by maintainers) Dec 15, 2025
@songy23 merged commit 5cf2b23 into open-telemetry:main Dec 15, 2025
205 checks passed
@github-actions bot added this to the next release milestone Dec 15, 2025
@otelbot bot commented Dec 15, 2025

Contributor

Thank you for your contribution @leiwingqueen! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.


Labels

ready to merge (Code review completed; ready to merge by maintainers), receiver/prometheusremotewrite

Development

Successfully merging this pull request may close these issues.

[receiver/prometheusremotewritereceiver] Map.PutStr causes excessive memory allocations due to repeated slice expansions

6 participants