Optimistic allocation for Count and Set metrics #308
Conversation
Force-pushed from 5b9f70b to 993d798
remeh left a comment:
Hey @opsengine,
Thank you for the contribution, it looks good, I'm merging this one in!
I'll do some internals validation as well and then work on releasing it in a new version of the lib.
We can only merge signed commits. Any chance you could force-push a signed commit instead?
@remeh thank you for merging. Can you please cut a new release?
Do you still need this?
@opsengine Sorry, validating internally is taking me longer than expected, but I'm on top of it this week. I'll start a new release once that's done. Your commit got in; please make sure to sign the next ones (I see you have an open draft PR 🙇!)
@opsengine It's now released as part of 5.8.2, thanks for the contribution.
This PR optimizes the `count()` and `set()` aggregator methods by moving the allocation of new metric objects outside of the critical section (write lock). Previously, the new metric structure was allocated while holding the write lock. This change instantiates the metric optimistically after releasing the read lock but before acquiring the write lock. Rationale: if the metric could not be found while holding the read lock, it is unlikely to appear within the next few nanoseconds, so it will have to be created anyway. Occasionally this assumption will be incorrect, and the effect will be one unnecessary allocation.
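The pattern described above can be sketched as follows. This is a simplified illustration, not datadog-go's actual code: the `metric`, `aggregator`, and `sample` names are hypothetical, and the atomic fast-path update stands in for the library's real metric types.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// metric is a hypothetical, simplified stand-in for the library's
// internal count/set metric structs.
type metric struct {
	value int64
}

// aggregator mimics the shape of the real aggregator: a map of
// metrics guarded by an RWMutex.
type aggregator struct {
	mu      sync.RWMutex
	metrics map[string]*metric
}

func newAggregator() *aggregator {
	return &aggregator{metrics: make(map[string]*metric)}
}

// sample demonstrates optimistic allocation: look up the metric under
// the read lock; if it is missing, allocate the new metric BEFORE
// taking the write lock, then re-check under the write lock in case
// another goroutine inserted it in the meantime.
func (a *aggregator) sample(name string, v int64) {
	a.mu.RLock()
	m, ok := a.metrics[name]
	a.mu.RUnlock()
	if ok {
		atomic.AddInt64(&m.value, v) // fast path: no write lock needed
		return
	}

	// Optimistic allocation: pay the allocation cost outside the
	// critical section, shortening the time the write lock is held.
	fresh := &metric{value: v}

	a.mu.Lock()
	if existing, ok := a.metrics[name]; ok {
		// Lost the race: another goroutine created the metric between
		// our RUnlock and Lock. `fresh` is one wasted allocation.
		a.mu.Unlock()
		atomic.AddInt64(&existing.value, v)
		return
	}
	a.metrics[name] = fresh
	a.mu.Unlock()
}

func main() {
	a := newAggregator()
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				a.sample("requests", 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&a.metrics["requests"].value)) // 4000
}
```

The double-check under the write lock keeps the map correct when two goroutines race to create the same metric; the only cost of losing the race is a single discarded allocation.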
Note that this optimization is already present in `gauge()`, but for some reason it is missing from the other two functions.

Changes
- `count()`: `newCountMetric` is now called before acquiring `a.countsM.Lock()`.
- `set()`: `newSetMetric` is now called before acquiring `a.setsM.Lock()`.

Motivation
Reduce lock contention under memory pressure: memory allocation is relatively expensive compared to map insertion. By moving `newCountMetric` and `newSetMetric` outside the write lock, we significantly shorten the critical section where the lock is held.

How we found this
The profiler revealed lock contention in this critical section while an application was under load. The application is using datadog-go v4.8.3, but the issue would affect the latest version as well.
Mutex Contention Time (profiler screenshot)

Block Time (profiler screenshot)

Goroutines (profiler screenshot)