This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Added aggregate function #1307

Closed
wants to merge 15 commits

Conversation

the-real-stiven
Contributor

Native implementation of the aggregate() Graphite function. (http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.aggregate)

Speed improvement (used mtcmptest):

---------- Native targets/aggregate.json Latencies ----------
Mean: 8.685615ms
50th percentile: 7.550184ms
95th percentile: 16.368781ms
99th percentile: 22.609535ms
Max: 33.192157ms
Success: 100%
Errors: []

---------- Graphite (Python) targets/aggregate.json Latencies ----------
Mean: 27.983191ms
50th percentile: 26.544092ms
95th percentile: 40.984074ms
99th percentile: 56.065418ms
Max: 92.55713ms

Success: 100%
Errors: []

---------- Speed Improvement ----------
Mean: x3.2
50th percentile: x3.5
95th percentile: x2.5
99th percentile: x2.5
Max: x2.8

@DanCech
Contributor

DanCech commented May 11, 2019

Nice work!

@DanCech
Contributor

DanCech commented May 11, 2019

There are a few things I see that this doesn't (yet) account for:

  1. Graphite handles the case where the series being aggregated have different intervals, which it does by running the seriesList through normalize

  2. When computing last, graphite returns the last non-null value for each interval; this implementation always returns the value of the last series in the list.

  3. Graphite supports an xFilesFactor parameter that specifies the proportion of values for each interval that can be null before the aggregated value for that interval should be considered null.

https://github.com/graphite-project/graphite-web/blob/master/webapp/graphite/render/functions.py#L294

@the-real-stiven
Contributor Author

Hi Dan,

  1. I believe the series get normalized at fetch time (point 3 of consolidateBy https://github.com/grafana/metrictank/blob/master/devdocs/expr.md)
  2. fixed
  3. xFilesFactor is currently not supported by any MT functions (see https://github.com/grafana/metrictank/blob/master/docs/graphite.md). This would have to be a whole other feature.

@DanCech
Contributor

DanCech commented May 13, 2019

  1. Yeah, so right now it mostly happens in alignRequests, though I'm not sure whether it would work properly if we do something like grouping raw series with summarized series, etc. The problem is that if series don't have the same number of points, we can easily try to access nonexistent indexes. I'm also not sure how we handle series with different start times...
  2. Cool, though we do have the same issues as above.
  3. Ok, seems like we should have a separate field to track whether a function is 100% graphite-compatible and, if so, with which graphite version (e.g. without xff support this could be 100% compatible with the various functions that were aliased to aggregate before 1.1.0, but it can't be 100% compatible with aggregate itself, since the initial implementation supported xff).

@shanson7
Collaborator

To Dan's point, if you did something like:

target=sum(summarize(seriesByTag('name=series1'), '10min', 'sum', false), seriesByTag('name=series1'))

You get PANIC: runtime error: index out of range from MT. So, this corner case already exists.
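For illustration, here is a minimal standalone sketch (not metrictank's actual code; the point/series types below are simplified stand-ins for schema.Point and models.Series) of why mixing a summarized series with a raw one can blow up: a cross-series loop that indexes every series by the positions of the first series runs off the end of the shorter one.

package main

import "fmt"

// simplified stand-ins for metrictank's schema.Point and models.Series
type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// naiveSum indexes every series by the positions of the first series,
// which panics when another series has fewer points (e.g. a summarized one).
func naiveSum(in []series) []point {
	out := make([]point, 0, len(in[0].Datapoints))
	for i, p := range in[0].Datapoints {
		sum := 0.0
		for _, s := range in {
			sum += s.Datapoints[i].Val // index out of range if s is shorter
		}
		out = append(out, point{Val: sum, Ts: p.Ts})
	}
	return out
}

func main() {
	raw := series{Datapoints: []point{{1, 10}, {2, 20}, {3, 30}}}
	rollup := series{Datapoints: []point{{6, 10}}} // e.g. a 10min summarize() output
	fmt.Println(naiveSum([]series{raw, rollup}))   // PANIC: runtime error: index out of range
}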

@Dieterbe
Contributor

Dieterbe commented May 13, 2019

re 1, i think this is a separate bug and outside the scope of this function

re 3:

Ok, seems like we should have a separate field to track whether a function is 100% graphite-compatible and, if so, with which graphite version (e.g. without xff support this could be 100% compatible with the various functions that were aliased to aggregate before 1.1.0, but it can't be 100% compatible with aggregate itself, since the initial implementation supported xff).

i've thought about something similar, and that is to just target an older version, so that we can still have "stable" mean graphite-compatible, albeit with an older - specific - version of graphite. maybe our target shouldn't be "current graphite" but "graphite as it was xx months ago"; this may allow us to not implement all of the functions, if some of them have been implemented very recently.

However, let's try to avoid this for now and just see how far we get. With xFilesFactor in particular, it seems we can just add it fairly easily and be done with it.

@the-real-stiven
Contributor Author

ok, I added support for xff through a function parameter. I also made aggregate more usable by other functions, so that it respects an xff set through settings (which still needs to be implemented) and sets the Target/QueryPatt correctly

case "last", "current":
return crossSeriesLast
case "count":
return crossSeriesCount
Contributor

good additions. they weren't documented in graphite either so i filed a pr for graphite docs.
https://github.com/graphite-project/graphite-web/pull/2451/files
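As a point of reference, a hedged sketch of the per-timestamp semantics of last and count (simplified stand-in types and a hypothetical lastAndCount helper, not the PR's crossSeriesLast/crossSeriesCount code): per DanCech's earlier note, last should return the last non-null value across the inputs for each timestamp, and count should return how many non-null values there are, assuming the inputs are already normalized to the same length and timestamps.

package sketch

import "math"

// simplified stand-ins for schema.Point / models.Series
type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// lastAndCount: for each timestamp position, "last" is the last non-null value
// across the series (not simply the last series' value) and "count" is the
// number of non-null values at that position.
func lastAndCount(in []series) (last, count []point) {
	for i, p := range in[0].Datapoints {
		l := math.NaN()
		n := 0.0
		for _, s := range in {
			if v := s.Datapoints[i].Val; !math.IsNaN(v) {
				l = v
				n++
			}
		}
		last = append(last, point{Val: l, Ts: p.Ts})
		count = append(count, point{Val: n, Ts: p.Ts})
	}
	return last, count
}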

Contributor

seems we are missing avg_zero

Contributor Author

added avg_zero!

@Dieterbe
Contributor

a few remarks regarding xFilesFactor:

  1. i didn't dig into our expr parsing code but i presume the ArgFloat for xFilesFactor will default to 0? so both for aggregate as well as the concrete ones like average(). is this a problem or am i wrong?
  2. instead of first aggregating all aggregates, and then changing some of them to NaN per xFF, why don't we add xFF support to all aggregate functions directly? seems this would result in cleaner/simpler, more performant code. we just have to make sure that for invocations that don't support xFF (such as calling average() through the graphite api) it behaves properly

@the-real-stiven
Contributor Author

the-real-stiven commented May 27, 2019

  1. Yes, xff defaults to 0, which is effectively the same as not having the parameter (matching graphite's behavior): it translates to "at least 0% of the values have to be non-None".
  2. The reasoning was that xff checks the % of None values at any given timestamp across all series (i.e. cross-series). However, some of the functions iterate over the series rather than over the datapoints (crossSeriesMin, crossSeriesMax, crossSeriesSum, etc.), so implementing it in every function would result in lots of duplicated code and some messy pre-processing. When aggregate(series, "average") is called, it supports xff. When average() is called, xff defaults to 0 (or, in the future, whatever is set in settings), which is the graphite behavior (https://github.com/graphite-project/graphite-web/blob/fe1f0d130696ed7be63b794a594bf6ea5f567b33/webapp/graphite/render/functions.py#L332 and https://github.com/graphite-project/graphite-web/blob/fe1f0d130696ed7be63b794a594bf6ea5f567b33/webapp/graphite/render/functions.py#L210). Because of that, it seems cleanest to just null out unwanted values after the processing, in accordance with the set xff (a rough sketch of this post-processing approach follows below).
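A rough sketch of that post-processing approach (simplified stand-in types and a hypothetical applyXff helper, not the PR's exact code): run whichever crossSeriesX aggregator was requested first, then make one pass over the timestamps, count the non-null inputs per position, and null out the aggregated value where the non-null fraction falls below xFilesFactor.

package sketch

import "math"

// simplified stand-ins for schema.Point / models.Series
type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// applyXff nulls out aggregated points whose fraction of non-null input values
// (across the original series, at the same position) is below xFilesFactor.
// It runs after the cross-series aggregation, as a separate pass.
func applyXff(in []series, agg []point, xFilesFactor float64) {
	for i := range agg {
		nonNull := 0
		for _, s := range in {
			if !math.IsNaN(s.Datapoints[i].Val) {
				nonNull++
			}
		}
		if float64(nonNull)/float64(len(in)) < xFilesFactor {
			agg[i].Val = math.NaN()
		}
	}
}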

@Dieterbe
Contributor

The reasoning was that xff checks the % of None values at any given timestamp across all series (i.e. cross-series). However, some of the functions iterate over the series rather than over the datapoints (crossSeriesMin, crossSeriesMax, crossSeriesSum, etc.), so implementing it in every function would result in lots of duplicated code and some messy pre-processing

I don't understand. you're saying that xff is a check across multiple datapoints from several series at the same timestamp, and i looked at the implementation of one of the mentioned functions - crossSeriesMin - and that is exactly how it aggregates: across series, in buckets per timestamp.

Why can't we add xFF support to all cross series aggregation functions? i don't see how this would lead to duplicated code or messy pre-processing.
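For comparison, a sketch of the inline variant being suggested here (again simplified stand-in types and a hypothetical minWithXff, not metrictank's refactored crossSeriesMin): the per-timestamp bucket loop that a min aggregator already performs can count non-null values as it goes and apply the xFilesFactor check in the same pass, avoiding a second walk over the data.

package sketch

import "math"

type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// minWithXff aggregates min per timestamp bucket and applies the xff check
// inline in the same pass, rather than as a separate post-processing step.
func minWithXff(in []series, xFilesFactor float64) []point {
	out := make([]point, 0, len(in[0].Datapoints))
	for i, p := range in[0].Datapoints {
		min := math.NaN()
		nonNull := 0
		for _, s := range in {
			v := s.Datapoints[i].Val
			if math.IsNaN(v) {
				continue
			}
			nonNull++
			if math.IsNaN(min) || v < min {
				min = v
			}
		}
		if float64(nonNull)/float64(len(in)) < xFilesFactor {
			min = math.NaN()
		}
		out = append(out, point{Val: min, Ts: p.Ts})
	}
	return out
}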

@Dieterbe
Contributor

Stiven Deleur [9:21 AM]
hey, just wanted to follow up on the aggregate PR discussion
So, to clarify, are you suggesting to add the xff check into each crossSeriesX function instead of having it in aggregate()?

dieter [9:22 AM]
yes

Stiven Deleur [9:24 AM]
why not just abstract it away from those functions since those functions are meant for handling actual aggregations and not for processing additional parameters such as xff?

dieter [9:25 AM]
because xff is a variable that ties in very closely to the aggregation logic, it controls how each aggregation should work. it's not some separate concern

Stiven Deleur [9:27 AM]
my reasoning was that xff works exactly the same for each aggregation function (if there are too many nulls at a certain timestamp then the value becomes null) so it would be a bit redundant to add that logic to every function

dieter [9:29 AM]
i understand, but i care more about performance than about DRY
so the suggestion to move xff logic into the crossSeries functions, and not as separate processing step, is with the goal of increasing performance.

Stiven Deleur [9:30 AM]
got it ok
I’ll see how I can do that with minimal additional allocations

dieter [9:32 AM]
well, we could also just leave this for later. maybe it's better you spend your limited time on doing more functions, rather than optimizing code

Stiven Deleur [9:34 AM]
I think what I’ll do is change the crossSeriesX functions to only aggregate at a certain timestamp and then add a wrapper function that loops over the whole series and checks xff before aggregating

should kill two birds with one stone in terms of performance and DRY
I’d rather get this right since a lot of the functions will be using aggregate!

dieter [9:36 AM]
the current crossSeries functions were recently refactored for performance
see #1164
so i would be very cautious about splitting them up. instead we could just introduce the xff checks inline in the functions
either that or just leave things as is
actually, let's just leave it the way you have it now
if performance becomes an issue, we can look at it again

Stiven Deleur [9:39 AM]
So I think the way he did that was by looping over the series instead of each timestamp
but I need to iterate over each timestamp to figure out if there are enough nulls

dieter [9:40 AM]
yes, so let's leave what you have

Stiven Deleur [9:40 AM]
ok sounds good!
(that’s what I meant by unnecessary preprocessing)
In that case is the PR good to go?

dieter [9:44 AM]
i think so, but i need to look at it a bit more. i haven't reviewed in depth yet, and i'm also not sure how in-depth i should review. when it comes to things like setting the right tags, and the behavior of xff and of each new processing function you introduce, i can probably assume you followed graphite quite literally and did it correctly

Stiven Deleur [9:45 AM]
Yes, I follow graphite as closely as I can and just adjust things that I see could be optimized or just don’t work in go
Plus, I run the comparison tests for a bunch of queries to make sure that tags and basic processing are the same
But do feel free to look into it further, I definitely don’t want to introduce any bugs!
Plus, it’s always weird replicating some of graphite’s behavior that doesn’t quite make sense (like with INF in the other PR)

Stiven Deleur [9:55 AM]
Let me actually go ahead and run some benchmark tests tomorrow on this PR to make sure that xff is not hurting the performance too much. I’ll comment with the diff once I do that

return xff(nonNull, len(in), xFilesFactor)
}

func xff(nonNull int, total int, xFilesFactor float64) bool {
Contributor

please add a comment for both of these functions to explain the purpose
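For instance, the kind of comment being asked for might look like this (a sketch following graphite's definition of xFilesFactor; it mirrors the signature visible in the diff, but the body may differ from the PR's actual implementation):

// xff reports whether a bucket that has nonNull non-null values out of total
// values satisfies the xFilesFactor requirement, i.e. whether the fraction of
// non-null values is at least xFilesFactor. When it returns false, the caller
// should treat the aggregated value for that bucket as null.
func xff(nonNull int, total int, xFilesFactor float64) bool {
	if total == 0 {
		return false
	}
	return float64(nonNull)/float64(total) >= xFilesFactor
}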

@@ -64,6 +70,28 @@ func crossSeriesAvg(in []models.Series, out *[]schema.Point) {
}
}

func crossSeriesAvgZero(in []models.Series, out *[]schema.Point) {
Contributor

please add a comment explaining this function. how does it differ from avg?
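For context, graphite's avg_zero treats null values as zero and divides by the total number of series, whereas avg divides the sum of non-null values by the count of non-null values only. A hedged sketch with simplified stand-in types (not the PR's exact crossSeriesAvgZero code):

package sketch

import "math"

type point struct {
	Val float64
	Ts  uint32
}
type series struct {
	Datapoints []point
}

// avgZero: per timestamp, nulls count as 0 and the divisor is the total number
// of series; avg, by contrast, divides by the number of non-null values only.
func avgZero(in []series) []point {
	out := make([]point, 0, len(in[0].Datapoints))
	for i, p := range in[0].Datapoints {
		sum := 0.0
		for _, s := range in {
			if v := s.Datapoints[i].Val; !math.IsNaN(v) {
				sum += v
			}
		}
		out = append(out, point{Val: sum / float64(len(in)), Ts: p.Ts})
	}
	return out
}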

@stale

stale bot commented Apr 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Apr 4, 2020
@agao48 mentioned this pull request Apr 6, 2020
stale bot closed this Apr 11, 2020