optimize map allocation#44613
Conversation
Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>
|
@leiwingqueen are you able to add a benchmark to demonstrate the improvement? I realize this is a fairly trivial optimization, but it would be nice to show that it is having the desired effect. |
|
Hey @leiwingqueen, one doubt about this PR: did you face memory issues using this component in a real environment, or are you just trying to find things to improve in the code base? |
|
could you add the changelog file? You can generate it using |
|
@leiwingqueen Good question. Is your profile from a benchmark, or from production data? |
Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>
@dashpole @perebaj My profile is from production data, but there's a difference with this component. I'm trying to create a new component cloned from this one that supports the PRW v1 protocol. However, I've found that it causes excessive memory allocations. |
Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>
Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>
@dashpole Here is benchmark result. # run benchmark test command
go test -bench BenchmarkExtractAttributes -benchmem -run Test -count=1Performance Benchmark ResultsOverviewThis PR optimizes memory allocation in Test Environment
Benchmark ResultsExecution Time Comparison
Memory Allocation Comparison
Allocations Per Operation
Key Improvements✅ Consistent performance gains across all test scenarios ConclusionPre-allocating map capacity with
This optimization is particularly valuable for Prometheus remote write receiver handling high-volume metric ingestion. Raw Benchmark Output |
|
|
||
| // extractAttributesNoEnsure is a copy of extractAttributes without the EnsureCapacity call. | ||
| // It intentionally avoids pre-sizing the returned pcommon.Map so we can compare allocations. | ||
| func extractAttributesNoEnsure(ls labels.Labels) pcommon.Map { |
There was a problem hiding this comment.
Please remove this part, as we don't need to commit the old code. Usually, I do the following for performance improvements:
- add the benchmark in the first commit. Run it against current HEAD, and store the result in old.txt.
- make my optimization, and re-run the benchmark and store the result in new.txt
- paste
benchstat old.txt new.txtin the PR description.
Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>
| for labelName, labelValue := range ls.Map() { | ||
| labelMap := ls.Map() | ||
| // job, instance and metric name will always become labels, but scope name and version may or may not be present | ||
| attrs.EnsureCapacity(len(labelMap) - 3) |
There was a problem hiding this comment.
Can we add a comment here explaining why 3? otherwise it seems that we just set a magic number...
There was a problem hiding this comment.
Isn't that what is written above?
// job, instance and metric name will always become labels
There was a problem hiding this comment.
at a first glance on this comment, i can't conclude that 3 is related to job, instance, and metric_name
| // makeLabels builds a labels.Labels with the given total number of labels. | ||
| // It always includes "job", "instance", and the metric name label, plus (total-3) extra labels. | ||
| func makeLabels(total int) labels.Labels { | ||
| if total < 3 { | ||
| total = 3 | ||
| } |
There was a problem hiding this comment.
Why total-3? it's because we set 3 on extractAttributes, if it was, can we add a comment explaining the why's?
| } | ||
|
|
||
| func BenchmarkExtractAttributes(b *testing.B) { | ||
| sizes := []int{5, 20, 100, 500} |
There was a problem hiding this comment.
Can we add more sizes here? I'm asking myself why the gains of this improvement are not linear...
There was a problem hiding this comment.
I added more test cases here.
- old version
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14 4900918 226.1 ns/op 496 B/op 8 allocs/op
BenchmarkExtractAttributes/20-14 569083 2118 ns/op 4584 B/op 32 allocs/op
BenchmarkExtractAttributes/100-14 65355 18193 ns/op 20264 B/op 118 allocs/op
BenchmarkExtractAttributes/500-14 5133 225575 ns/op 123785 B/op 526 allocs/op
BenchmarkExtractAttributes/1000-14 1273 930589 ns/op 246555 B/op 1032 allocs/op
BenchmarkExtractAttributes/2000-14 428 2729724 ns/op 549436 B/op 2043 allocs/op
PASS
ok github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver 9.299s- new version
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver
cpu: Apple M4 Pro
BenchmarkExtractAttributes/5-14 5340733 211.5 ns/op 464 B/op 7 allocs/op
BenchmarkExtractAttributes/20-14 649693 1711 ns/op 3016 B/op 27 allocs/op
BenchmarkExtractAttributes/100-14 69399 17433 ns/op 14152 B/op 111 allocs/op
BenchmarkExtractAttributes/500-14 5322 220042 ns/op 102697 B/op 517 allocs/op
BenchmarkExtractAttributes/1000-14 1299 915396 ns/op 209082 B/op 1022 allocs/op
BenchmarkExtractAttributes/2000-14 444 2722205 ns/op 421851 B/op 2031 allocs/op
PASS
ok github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver 9.052sThe non-linear memory savings (6.5% → 34% → 30% → 17% → 15% → 23%) maybe are due to Go's map growth strategy.
Exponential Growth: Go maps grow in powers of 2 (buckets: 1 → 2 → 4 → 8 → 16 → 32...)
Load Factor Trigger: Maps resize when load factor exceeds ~6.5, doubling the bucket count.
The 20 and 2000 attribute cases show the highest savings (34% and 23%) because they hit expansion thresholds.
There was a problem hiding this comment.
Uhmmmm. Many thanks for that!
It was exactly what Dashpole shared with me on pv
|
One doubt You also mentioned that:
Did try to build that code after your changes to validate if the memory profiling gets better? |
@perebaj Yes, validated. Total memory allocation decreased by approximately 30% |
Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>
ArthurSens
left a comment
There was a problem hiding this comment.
After addressing the linting failure it LGTM, but please let's make sure @perebaj is also happy with the changes :)
Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>
Signed-off-by: leiwingqueen <leiwingqueen@gmail.com>
|
Thank you for your contribution @leiwingqueen! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help. |
Description
Summary
When adding multiple key-value pairs to a pcommon.Map using PutStr(), the underlying slice undergoes frequent reallocations and copies, leading to significant memory overhead. In production workloads with high cardinality metrics/logs, this becomes a major performance bottleneck.
Evidence
Memory profiling (pprof) shows that Map.EnsureCapacity accounts for 27.17% (0.94GB) of total memory allocation, even though it's never explicitly called in application code:
Root Cause
The current PutStr implementation relies on Go's built-in append:
Link to tracking issue
Fixes #44612
Testing
old version benchmark
go test -bench BenchmarkExtractAttributes -benchmem -run Test -count=1 goos: darwin goarch: arm64 pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver cpu: Apple M4 Pro BenchmarkExtractAttributes/5-14 4895226 229.4 ns/op 496 B/op 8 allocs/op BenchmarkExtractAttributes/20-14 574508 2035 ns/op 4584 B/op 32 allocs/op BenchmarkExtractAttributes/100-14 66211 18260 ns/op 20264 B/op 118 allocs/op BenchmarkExtractAttributes/500-14 5136 221833 ns/op 123785 B/op 526 allocs/op PASS ok github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver 5.761snew version benchmark
go test -bench BenchmarkExtractAttributes -benchmem -run Test -count=1 goos: darwin goarch: arm64 pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver cpu: Apple M4 Pro BenchmarkExtractAttributes/5-14 5377898 206.0 ns/op 464 B/op 7 allocs/op BenchmarkExtractAttributes/20-14 678472 1702 ns/op 3016 B/op 27 allocs/op BenchmarkExtractAttributes/100-14 67134 17291 ns/op 14152 B/op 111 allocs/op BenchmarkExtractAttributes/500-14 5365 222215 ns/op 102697 B/op 517 allocs/op PASS ok github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver 5.717sDocumentation