
ConsolidateNudge: nudge without reducing capacity of underlying slice, to keep pointslicepool effective #1923

Merged
Dieterbe merged 1 commit into master from psp-stats-with-nudge-fix on Oct 15, 2020

Conversation

@Dieterbe (Contributor) commented on Oct 14, 2020

Note: this builds on top of #1922, review/merge that first.

Without this fix, you would commonly see that slices going into the pointslicepool have a cap that is one or two points less than what is needed on subsequent reads.

E.g. I added some debug lines and saw this:

metrictank-q1_1  | PSP.GetMin(2880) from github.com/grafana/metrictank/vendor/github.com/tinylib/msgp/msgp.Decode->github.com/grafana/metrictank/api/models.(*GetDataRespV1).DecodeMsg
metrictank-q1_1  | candidate has cap 2877 we want 2880
metrictank-q1_1  | PSP.GetMin(2880) from github.com/grafana/metrictank/api.(*Server).renderMetrics->github.com/grafana/metrictank/api.(*Server).executePlan
metrictank-q1_1  | candidate has cap 2877 we want 2880
metrictank-q1_1  | PSP.Put(2880) from github.com/grafana/metrictank/api.(*Server).renderMetrics->github.com/grafana/metrictank/expr.Plan.Clean
metrictank-q1_1  | PSP.Put(2877) from github.com/grafana/metrictank/api.(*Server).renderMetrics->github.com/grafana/metrictank/expr.Plan.Clean

This fix increases the efficacy of the pool.
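For readers less familiar with Go slice mechanics, here is a small standalone sketch of why this happens and one way to preserve capacity. It is not the actual ConsolidateNudge code; `Point`, `nudgeReslice` and `nudgeCopy` are hypothetical stand-ins, and 2880/2877 mirror the numbers in the logs above. Re-slicing away the leading points shrinks the cap that later reaches the pool, whereas shifting the points down with `copy` keeps the full backing array usable:

```go
package main

import "fmt"

// Point stands in for metrictank's schema.Point; assumed here for illustration.
type Point struct {
	Val float64
	Ts  uint32
}

// nudgeReslice drops the first n points by re-slicing. The result shares the
// backing array, but its capacity shrinks by n, so once it is Put back into
// the pool it can no longer serve a full-size Get.
func nudgeReslice(points []Point, n int) []Point {
	return points[n:]
}

// nudgeCopy drops the first n points by shifting the tail to the front and
// truncating the length, keeping the original capacity intact.
func nudgeCopy(points []Point, n int) []Point {
	kept := copy(points, points[n:])
	return points[:kept]
}

func main() {
	a := make([]Point, 2880)
	b := make([]Point, 2880)

	a = nudgeReslice(a, 3)
	b = nudgeCopy(b, 3)

	fmt.Println(len(a), cap(a)) // 2877 2877: a later full-size pool Get misses
	fmt.Println(len(b), cap(b)) // 2877 2880: the slice still serves a 2880-point Get
}
```

Either way the nudged data is the same; only the capacity that eventually gets Put back into the pool differs.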

Here is a comparison of two query nodes, q0 and q1.
q0 runs master + stats (#1922).
q1 runs the code from this branch.
As time advances, there are periods where nudging happens, which breaks down the efficacy of the pointslicepool on q0.
q1 doesn't have this problem: we see an increased hit rate, though sadly it doesn't translate into memory savings.

[screenshots: pointslicepool hit-rate comparison, q1 (this branch) vs q0 (master + stats)]

@shanson7 (Collaborator) left a comment


The changes look good to me, but I'm wondering how effective the PSP really is. It would be nice if we could show a definitive workload that is improved, in either memory usage or query execution speed, by use of the PSP. I expect a workload with many requests for points >= DefaultPointSliceSize might see an allocation rate improvement within GC cycles.
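As a side note on how such a comparison could be demonstrated: the benchmark below is a minimal sketch, not metrictank code; the `sync.Pool`-backed pool, the `Point` type and the `sink` variable are assumptions for illustration, and 2880 mirrors the slice size in the logs above.

```go
package pool_test

import (
	"sync"
	"testing"
)

type Point struct {
	Val float64
	Ts  uint32
}

var (
	// a stand-in pool: hands out *[]Point with cap 2880, reused across iterations
	psp  = sync.Pool{New: func() interface{} { s := make([]Point, 0, 2880); return &s }}
	sink []Point // keeps slices reachable so allocations are not optimized away
)

// BenchmarkNoPool allocates a fresh slice per simulated render request.
func BenchmarkNoPool(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		s := make([]Point, 0, 2880)
		s = append(s, Point{Val: 1, Ts: uint32(i)})
		sink = s
	}
}

// BenchmarkPool reuses slices through a sync.Pool, approximating the PSP.
func BenchmarkPool(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sp := psp.Get().(*[]Point)
		s := (*sp)[:0]
		s = append(s, Point{Val: 1, Ts: uint32(i)})
		sink = s
		*sp = s
		psp.Put(sp)
	}
}
```

Running it with `go test -bench=. -benchmem` reports allocs/op with and without reuse, which is the allocation-rate signal this comment is after.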

@Dieterbe (Contributor, Author) commented on Oct 15, 2020

I've been thinking of making the PSP configurable and providing a few options: "null", "traditional", "size-classes x/y/z", etc. We can then easily try different options on real workloads.

@robert-milan (Contributor) left a comment


I agree with the other comments. I would not expect to see memory savings, but probably a reduced allocation rate / GC improvements on heavy workloads.

without this fix, you would commonly see that slices going into the
pointslicepool would have a cap that is one or two points less than
what you need on subsequent reads. This fix increases efficacy of
the pool.
@Dieterbe force-pushed the psp-stats-with-nudge-fix branch from d8b5ee5 to e45c4ac on October 15, 2020, 21:24
@Dieterbe merged commit 2edab76 into master on Oct 15, 2020
@Dieterbe deleted the psp-stats-with-nudge-fix branch on October 15, 2020, 21:25
3 participants