groupByTags Performance improvements + fix setting consolidator per group #1165
Conversation
After deploying I found an issue with simply using the 'name' tag.
```go
for t, tag := range tagValues {
	tags[tag] = strconv.Itoa(t)
}
```
hmm, `tags` was never used and the compiler never complained? that's strange
It was "used" in that it was populated. It was never used after that though. Enough to satisfy the compiler!
ah yes, i've run into this a few times before :/
Before, we tested every combo of numseries 1/10/100/1000 and input series generation via 2x test.RandFloats10k / test.RandFloats10k, test.RandFloatsWithNulls10k / test.RandFloatsWithNulls10k, test.RandFloatsWithNulls10k. Why do we now benchmark fewer amounts of series and none of the RandFloatsWithNulls variants? I suppose the reason for the latter is that we don't want to conflate the performance of the aggregator with that of GBT, but if that's the case, why do we still test different numbers of datapoints per series? I suppose different numbers of points per series could be useful to see the effect of the data memory use on the metadata work that GBT is doing. But if anything, the number of input series seems like a parameter we should explore more, like we used to.
Yeah, at this point I was still trying to isolate the performance issues I was seeing with the aggregation. I think the two variables relevant to GBT are the number of input series and output series. Maybe also whether or not to include 'name'.
```diff
@@ -59,7 +62,7 @@ func (s *FuncGroupByTags) Exec(cache map[Req][]models.Series) ([]models.Series,
 	if !useName {
 		// if all series have the same name, name becomes one of our tags
 		for _, serie := range series {
-			thisName := strings.Split(serie.Target, ";")[0]
+			thisName := serie.Tags["name"]
```
right on.
it is probably not clear, but it looks like we do try to ensure the `name` tag is always set for any series (FuncGet sets it initially, and any other function that creates new series should set it as well, I think)
```go
for name, groupSeries := range groups {
	cons, queryCons := summarizeCons(series)
```
this seems incorrect. it's possible for input series to have different consolidators defined, or am I missing something?
That's true, but I think that's true of many of the aggregation functions, and it seems first come, first served.
In general, I think grouping here would be very awkward with mixed consolidators, and the user is likely doing something... odd.
actually the code in master also seems wrong.
seems to me the most correct way is for each group to do `cons, queryCons := summarizeCons(groupSeries)`
I think this is more correct, though in the case that a group is mixed, an arbitrary consolidator is still chosen. But when 'name' is one of the groupByTags, I could see this working better.
this PR needs some history rewriting.
I rebased to be (I believe) what you were asking for, though the commit order shown here is different from what I get with
Regarding the benchmarks, what do you think of: input series: 1, 10, 100, 10k, 100k. This would result in 15 benchmarks. So as to not overly test the series aggregations, I could use a small number of datapoints, like 100.
yes, but let's also include 1k.
For both input and output? That would bump the number of benchmarks to 21. Should be fine.
yeah, i guess.
Add another test function
Rework benchmarks
Remove re-allocation optimization
unlike usual, i didn't bother to reboot my laptop into benchmark mode, as you already posted pretty good results. but i wanted to run them anyway with the latest changes. my observations:
Use pre-parsed tag for name, only get the summarizeCons once.
Benchmarks for just this set of changes
Combined with #1164 and #1158: