-
Notifications
You must be signed in to change notification settings - Fork 137
Aggregation support for InfluxDBStore. #136
Conversation
### Features | ||
|
||
- [#99](https://github.com/sourcegraph/appdash/pull/99): New store engine backed by [InfluxDB](https://github.com/influxdata/influxdb). | ||
- [#110](https://github.com/sourcegraph/appdash/pull/110): Implementation of the OpenTracing API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use perhaps a better format for changelog; in particular one that is date-based in case Release Notes
and Features
sections become lengthy in the future.
Personal preference also tells me to shy away from version numbers somewhat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 97111e6
(Previous PR by @chris-ramon (I rebased against master) at #127) I've given an in-depth overview of why removing |
}) | ||
if err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a rather fatal issue here. InfluxDBStore.Traces
only returns e.g. 10 traces at a time due to pagination. So it right now returns "traces in the last 72 hr, maximum 10" but ideally the dashboard shows us N slowest traces over the past 72hr (all traces, not just 10).
I think this also makes the reported numbers on the dashboard less accurate, because it would be the average time of 10 traces, instead of the average time of all traces in the selected timeline.
I tried lifting the limit on InfluxDBStore.Traces
so it returns all traces, but if there are many the query becomes extremely slow (obviously). We will need to use InfluxDB's average etc features instead.
I've looked into using InfluxDB's calculation features like There is no concept of when a trace 'ends' -- we assume that data can be constantly put into the But, if we look at why we need 'trace time' in the first place (for the dashboard), we can reconsider the problem from the top-down: Is displaying 'trace name' and 'trace time' the most valuable information from the Dashboard? Would it be more valuable to just display operations/spans on the dashboard instead? This would mean..
Aside from the one con, which means we would need to sampling and/or a different measurement/table for dashboard data to be sampled at a different rate, this has a lot of cool implications, and could provide much more valuable insight to users on the dashboard who are asking the question "what does the system look like overall?" There is one implementation complexity not mentioned here, which is that we don't know a spans name and duration at exactly the same time. Luckily, the work @chris-ramon is doing with |
This change fixes the InfluxDBStore aggregation support (previously, it would calculate only the average of the selected timerange "up to 10 traces" which produced counter-intuitive results). Instead, to make use of InfluxDB in a more consistent mannor switch the Dashboard to a span-based display rather than a trace-based display. This means users will see individual operations instead of a summarization of all operations within traces. For more details about the motivation for this change, see #136 (comment)
…ering now `/dashboard` timeline filter works for InfluxDBStore
…span ids now `/traces?show=span_ids...` works for `InfluxDBStore`
- Make it date-based such that the two columns will not become too lengthy in the future. - Shy away from version numbers for now.
This change fixes the InfluxDBStore aggregation support (previously, it would calculate only the average of the selected timerange "up to 10 traces" which produced counter-intuitive results). Instead, to make use of InfluxDB in a more consistent mannor switch the Dashboard to a span-based display rather than a trace-based display. This means users will see individual operations instead of a summarization of all operations within traces. For more details about the motivation for this change, see #136 (comment)
The user interface tries to filter based on _trace_ IDs whereas here we tried to filter based on _span IDs_, this caused the filter to not show correct information.
1005858
to
8b57bf4
Compare
stddev can be nil when there are <= 1 entries and a standard deviation cannot be calculated at all.
@chris-ramon can you review my follow-up commits on this branch? |
mean, min, max, stddev, count := v[1], v[2], v[3], v[4], v[5] | ||
results[i] = &AggregatedResult{ | ||
RootSpanName: row.Tags["name"], | ||
Average: time.Duration(mustJSONFloat64(mean) * float64(time.Second)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we replace float64(time.Second)
with a variable instead? declared & assigned around line 209, to avoid repetitive equal calculations.
Hi @slimsag! great work on improving the aggregation support for the |
…fixed integer This is more resiliant to change / future-proof.
Details
/dashboard
&/aggregate
endpoints.appdash.AggregateEvent.SlowestRawQuery() string
appdash.Trace.TimespanEvent() (TimespanEvent, error)
appdash.Traces() ([]*Trace, error)
--->Traces(opts TracesOpts) ([]*Trace, error)
TracesOpts
.AggregateStore
in favor ofInfluxDBStore
.Closes #137