This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Added aggregate function #1307

Closed
wants to merge 15 commits

Conversation

the-real-stiven
Contributor

Native implementation of the aggregate() Graphite function. (http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.aggregate)

Speed improvement (used mtcmptest):

---------- Native targets/aggregate.json Latencies ----------
Mean: 8.685615ms
50th percentile: 7.550184ms
95th percentile: 16.368781ms
99th percentile: 22.609535ms
Max: 33.192157ms
Success: 100%
Errors: []

---------- Graphite (Python) targets/aggregate.json Latencies ----------
Mean: 27.983191ms
50th percentile: 26.544092ms
95th percentile: 40.984074ms
99th percentile: 56.065418ms
Max: 92.55713ms

Success: 100%
Errors: []

---------- Speed Improvement ----------
Mean: x3.2
50th percentile: x3.5
95th percentile: x2.5
99th percentile: x2.5
Max: x2.8

@DanCech
Contributor

DanCech commented May 11, 2019

Nice work!

@DanCech
Contributor

DanCech commented May 11, 2019

There are a few things I see that this doesn't (yet) account for:

  1. Graphite handles the case where the series being aggregated have different intervals, which it does by running the seriesList through normalize

  2. When computing last, graphite returns the last non-null value for each interval; this implementation always returns the value of the last series in the list.

  3. Graphite supports an xFilesFactor parameter that specifies the proportion of values for each interval that can be null before the aggregated value for that interval should be considered null.

https://github.com/graphite-project/graphite-web/blob/master/webapp/graphite/render/functions.py#L294

@the-real-stiven
Contributor Author

Hi Dan,

  1. I believe the series get normalized at fetch time (point 3 of consolidateBy https://github.com/grafana/metrictank/blob/master/devdocs/expr.md)
  2. fixed
  3. xFilesFactor is currently not supported by any MT functions (see https://github.com/grafana/metrictank/blob/master/docs/graphite.md). This would have to be a whole other feature.

@DanCech
Contributor

DanCech commented May 13, 2019

  1. Yeah, so right now it mostly happens in alignRequests, though I'm not sure whether it would work properly if we do something like grouping raw series with summarized series, etc. The problem is that if series don't have the same number of points, we can easily try to access nonexistent indexes. I'm also not sure how we handle series with different start times...
  2. Cool, though we do have the same issues as above.
  3. Ok, seems like we should have a separate field to track whether a function is 100% graphite-compatible and, if so, with which graphite version (e.g. without xff support this could be 100% compatible with the various functions that were aliased to aggregate before 1.1.0, but it can't be 100% compatible with aggregate itself, since the initial implementation supported xff).

@shanson7
Collaborator

To Dan's point, if you did something like:

target=sum(summarize(seriesByTag('name=series1'), '10min', 'sum', false), seriesByTag('name=series1'))

You get PANIC: runtime error: index out of range from MT. So, this corner case already exists.
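For illustration, here is a minimal standalone sketch (not metrictank's actual code; the point/series types below are simplified stand-ins for schema.Point and models.Series) of why mixing a summarized series with a raw one can blow up: a cross-series loop that indexes every series by the positions of the first series runs off the end of the shorter one.

package main

import "fmt"

// simplified stand-ins for metrictank's schema.Point and models.Series
type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// naiveSum indexes every series by the positions of the first series,
// which panics when another series has fewer points (e.g. a summarized one).
func naiveSum(in []series) []point {
	out := make([]point, 0, len(in[0].Datapoints))
	for i, p := range in[0].Datapoints {
		sum := 0.0
		for _, s := range in {
			sum += s.Datapoints[i].Val // index out of range if s is shorter
		}
		out = append(out, point{Val: sum, Ts: p.Ts})
	}
	return out
}

func main() {
	raw := series{Datapoints: []point{{1, 10}, {2, 20}, {3, 30}}}
	rollup := series{Datapoints: []point{{6, 10}}} // e.g. a 10min summarize() output
	fmt.Println(naiveSum([]series{raw, rollup}))   // PANIC: runtime error: index out of range
}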

@Dieterbe
Contributor

Dieterbe commented May 13, 2019

re 1, i think this is a separate bug and outside the scope of this function

re 3:

Ok, seems like we should have a separate field to track whether a function is 100% graphite-compatible and, if so, with which graphite version (e.g. without xff support this could be 100% compatible with the various functions that were aliased to aggregate before 1.1.0, but it can't be 100% compatible with aggregate itself, since the initial implementation supported xff).

i've thought about something similar, and that is to just target an older version, so that we can still have "stable" mean graphite-compatible, albeit with an older - specific - version of graphite. maybe our target shouldn't be "current graphite" but "graphite as it was xx months ago"; this may allow us to not implement all of the functions, if some of them have been implemented very recently.

However, let's try to avoid this for now and just see how far we get. With xFilesFactor in particular, it seems we can just add it fairly easily and be done with it.

@the-real-stiven
Contributor Author

ok, I added support for xff through a function parameter. I also made aggregate more usable by other functions, so that it respects an xff set through settings (which still needs to be implemented) and sets the Target/QueryPatt correctly

case "last", "current":
return crossSeriesLast
case "count":
return crossSeriesCount
Contributor

good additions. they weren't documented in graphite either so i filed a pr for graphite docs.
https://github.com/graphite-project/graphite-web/pull/2451/files
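As a point of reference, a hedged sketch of the per-timestamp semantics of last and count (simplified stand-in types and a hypothetical lastAndCount helper, not the PR's crossSeriesLast/crossSeriesCount code): per DanCech's earlier note, last should return the last non-null value across the inputs for each timestamp, and count should return how many non-null values there are, assuming the inputs are already normalized to the same length and timestamps.

package sketch

import "math"

// simplified stand-ins for schema.Point / models.Series
type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// lastAndCount: for each timestamp position, "last" is the last non-null value
// across the series (not simply the last series' value) and "count" is the
// number of non-null values at that position.
func lastAndCount(in []series) (last, count []point) {
	for i, p := range in[0].Datapoints {
		l := math.NaN()
		n := 0.0
		for _, s := range in {
			if v := s.Datapoints[i].Val; !math.IsNaN(v) {
				l = v
				n++
			}
		}
		last = append(last, point{Val: l, Ts: p.Ts})
		count = append(count, point{Val: n, Ts: p.Ts})
	}
	return last, count
}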

Contributor

seems we are missing avg_zero

Contributor Author

added avg_zero!

@Dieterbe
Contributor

a few remarks regarding xFilesFactor:

  1. i didn't dig into our expr parsing code but i presume the ArgFloat for xFilesFactor will default to 0? so both for aggregate as well as the concrete ones like average(). is this a problem or am i wrong?
  2. instead of first aggregating all aggregates, and then changing some of them to NaN per xFF, why don't we add xFF support to all aggregate functions directly? seems this would result in cleaner/simpler, more performant code. we just have to make sure that for invocations that don't support xFF (such as calling average() through the graphite api) it behaves properly

@the-real-stiven
Contributor Author

the-real-stiven commented May 27, 2019

  1. Yes, xff defaults to 0, which is effectively the same as not having the parameter (matching graphite's behavior): it translates to "at least 0% of the values have to be non-None".
  2. The reasoning was that xff checks the % of None values at any given timestamp across all series (i.e. cross-series). However, some of the functions iterate over the series rather than over the datapoints (crossSeriesMin, crossSeriesMax, crossSeriesSum, etc.), so implementing it in every function would result in lots of duplicated code and some messy pre-processing. When aggregate(series, "average") is called, it supports xff. When average() is called, xff defaults to 0 (or, in the future, whatever is set in settings), which is the graphite behavior (https://github.com/graphite-project/graphite-web/blob/fe1f0d130696ed7be63b794a594bf6ea5f567b33/webapp/graphite/render/functions.py#L332 and https://github.com/graphite-project/graphite-web/blob/fe1f0d130696ed7be63b794a594bf6ea5f567b33/webapp/graphite/render/functions.py#L210). Because of that, it seems cleanest to just null out unwanted values after the processing, in accordance with the set xff (a rough sketch of this post-processing approach follows below).
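A rough sketch of that post-processing approach (simplified stand-in types and a hypothetical applyXff helper, not the PR's exact code): run whichever crossSeriesX aggregator was requested first, then make one pass over the timestamps, count the non-null inputs per position, and null out the aggregated value where the non-null fraction falls below xFilesFactor.

package sketch

import "math"

// simplified stand-ins for schema.Point / models.Series
type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// applyXff nulls out aggregated points whose fraction of non-null input values
// (across the original series, at the same position) is below xFilesFactor.
// It runs after the cross-series aggregation, as a separate pass.
func applyXff(in []series, agg []point, xFilesFactor float64) {
	for i := range agg {
		nonNull := 0
		for _, s := range in {
			if !math.IsNaN(s.Datapoints[i].Val) {
				nonNull++
			}
		}
		if float64(nonNull)/float64(len(in)) < xFilesFactor {
			agg[i].Val = math.NaN()
		}
	}
}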

@Dieterbe
Contributor

The reasoning was that xff checks the % of None values at any given timestamp across all series (i.e. cross-series). However, some of the functions iterate over the series rather than over the datapoints (crossSeriesMin, crossSeriesMax, crossSeriesSum, etc.), so implementing it in every function would result in lots of duplicated code and some messy pre-processing

I don't understand. you're saying that xff is a check across multiple datapoints from several series at the same timestamp, and i looked at the implementation of one of the mentioned functions - crossSeriesMin - and that is exactly how it aggregates: across series, in buckets per timestamp.

Why can't we add xFF support to all cross series aggregation functions? i don't see how this would lead to duplicated code or messy pre-processing.
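For comparison, a sketch of the inline variant being suggested here (again simplified stand-in types and a hypothetical minWithXff, not metrictank's refactored crossSeriesMin): the per-timestamp bucket loop that a min aggregator already performs can count non-null values as it goes and apply the xFilesFactor check in the same pass, avoiding a second walk over the data.

package sketch

import "math"

type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// minWithXff aggregates min per timestamp bucket and applies the xff check
// inline in the same pass, rather than as a separate post-processing step.
func minWithXff(in []series, xFilesFactor float64) []point {
	out := make([]point, 0, len(in[0].Datapoints))
	for i, p := range in[0].Datapoints {
		min := math.NaN()
		nonNull := 0
		for _, s := range in {
			v := s.Datapoints[i].Val
			if math.IsNaN(v) {
				continue
			}
			nonNull++
			if math.IsNaN(min) || v < min {
				min = v
			}
		}
		if float64(nonNull)/float64(len(in)) < xFilesFactor {
			min = math.NaN()
		}
		out = append(out, point{Val: min, Ts: p.Ts})
	}
	return out
}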

@Dieterbe
Contributor

Stiven Deleur [9:21 AM]
hey, just wanted to follow up on the aggregate PR discussion
So, to clarify, are you suggesting to add the xff check into each crossSeriesX function instead of having it in aggregate()?

dieter [9:22 AM]
yes

Stiven Deleur [9:24 AM]
why not just abstract it away from those functions since those functions are meant for handling actual aggregations and not for processing additional parameters such as xff?

dieter [9:25 AM]
because xff is a variable that ties in very closely to the aggregation logic, it controls how each aggregation should work. it's not some separate concern

Stiven Deleur [9:27 AM]
my reasoning was that xff works exactly the same for each aggregation function (if there are too many nulls at a certain timestamp then the value becomes null) so it would be a bit redundant to add that logic to every function

dieter [9:29 AM]
i understand, but i care more about performance than about DRY
so the suggestion to move xff logic into the crossSeries functions, and not as separate processing step, is with the goal of increasing performance.

Stiven Deleur [9:30 AM]
got it ok
I’ll see how I can do that with minimal additional allocations

dieter [9:32 AM]
well, we could also just leave this for later. maybe it's better you spend your limited time on doing more functions, rather than optimizing code

Stiven Deleur [9:34 AM]
I think what I’ll do is change the crossSeriesX functions to only aggregate at a certain timestamp and then add a wrapper function that loops over the whole series and checks xff before aggregating

should kill two birds with one stone in terms of performance and DRY
I’d rather get this right since a lot of the functions will be using aggregate!

dieter [9:36 AM]
the current crossSeries functions were recently refactored for performance
see #1164
so i would be very cautious about splitting them up. instead we could just introduce the xff checks inline in the functions
either that or just leave things as is
actually, let's just leave it the way you have it now
if performance becomes an issue, we can look at it again

Stiven Deleur [9:39 AM]
So I think the way he did that was by looping over the series instead of each timestamp
but I need to iterate over each timestamp to figure out if there are enough nulls

dieter [9:40 AM]
yes, so let's leave what you have

Stiven Deleur [9:40 AM]
ok sounds good!
(that’s what I meant by unnecessary preprocessing)
In that case is the PR good to go?

dieter [9:44 AM]
i think so, but i need to look at it a bit more. i haven't reviewed in depth yet, and i'm also not sure how in-depth i should review. when it comes to things like setting the right tags, and the behavior of xff and of each new processing function you introduce, i can probably assume you followed graphite quite literally and did it correctly

Stiven Deleur [9:45 AM]
Yes, I follow graphite as closely as I can and just adjust things that I see could be optimized or just don’t work in go
Plus, I run the comparison tests for a bunch of queries to make sure that tags and basic processing are the same
But do feel free to look into it further, I definitely don’t want to introduce any bugs!
Plus, it’s always weird replicating some of graphite’s behavior that doesn’t quite make sense (like with INF in the other PR)

Stiven Deleur [9:55 AM]
Let me actually go ahead and run some benchmark tests tomorrow on this PR to make sure that xff is not hurting the performance too much. I’ll comment with the diff once I do that

return xff(nonNull, len(in), xFilesFactor)
}

func xff(nonNull int, total int, xFilesFactor float64) bool {
Contributor

please add a comment for both of these functions to explain the purpose
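For instance, the kind of comment being asked for might look like this (a sketch following graphite's definition of xFilesFactor; it mirrors the signature visible in the diff, but the body may differ from the PR's actual implementation):

// xff reports whether a bucket that has nonNull non-null values out of total
// values satisfies the xFilesFactor requirement, i.e. whether the fraction of
// non-null values is at least xFilesFactor. When it returns false, the caller
// should treat the aggregated value for that bucket as null.
func xff(nonNull int, total int, xFilesFactor float64) bool {
	if total == 0 {
		return false
	}
	return float64(nonNull)/float64(total) >= xFilesFactor
}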

@@ -64,6 +70,28 @@ func crossSeriesAvg(in []models.Series, out *[]schema.Point) {
}
}

func crossSeriesAvgZero(in []models.Series, out *[]schema.Point) {
Contributor

please add a comment explaining this function. how does it differ from avg?
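For context, graphite's avg_zero treats null values as zero and divides by the total number of series, whereas avg divides the sum of non-null values by the count of non-null values only. A hedged sketch with simplified stand-in types (not the PR's exact crossSeriesAvgZero code):

package sketch

import "math"

type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// avgZero: per timestamp, nulls count as 0 and the divisor is the total number
// of series; avg, by contrast, divides by the number of non-null values only.
func avgZero(in []series) []point {
	out := make([]point, 0, len(in[0].Datapoints))
	for i, p := range in[0].Datapoints {
		sum := 0.0
		for _, s := range in {
			if v := s.Datapoints[i].Val; !math.IsNaN(v) {
				sum += v
			}
		}
		out = append(out, point{Val: sum / float64(len(in)), Ts: p.Ts})
	}
	return out
}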

@stale

stale bot commented Apr 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Apr 4, 2020
@agao48 mentioned this pull request Apr 6, 2020
stale bot closed this Apr 11, 2020