Adds InfluxDB store. #99

chris-ramon · 2016-01-22T21:52:52Z

Details

Issue: #98

Notes to consider:

InfluxDB sends anonymous data to m.influxdb.com, see:
- https://docs.influxdata.com/influxdb/v0.10/administration/config/#reporting-disabled-false
- https://github.com/influxdata/influxdb/blob/master/etc/config.sample.toml#L3
The InfluxDB version being used is v0.10:

Clustering, replication, and high-availability are in a beta state.
The query engine is not optimized for the new TSM engine. A significant refactor of the query engine is in progress targeted for release in version 0.10.1. source

Key Concepts

Measurement

The measurement acts as a container for tags, fields, and the time column, and the measurement name is the description of the data that are stored in the associated fields. Measurement names are strings, and, for any SQL users out there, a measurement is conceptually similar to a table.

Tags

Tags are optional. You don’t need to have tags in your data structure, but it’s generally a good idea to make use of them because, unlike fields, tags are indexed. This means that queries on tags are faster and that tags are ideal for storing commonly-queried metadata.

Fields

Fields are a required piece of InfluxDB’s data structure - you cannot have data in InfluxDB without fields. It’s also important to note that fields are not indexed. Queries that use field values as filters must scan all values that match the other conditions in the query. As a result, those queries are not performant relative to queries on tags (see Tags above). In general, fields should not contain commonly-queried metadata.

Retention Policy

The part of InfluxDB’s data structure that describes how long InfluxDB keeps data (duration) and how many copies of those data are stored in the cluster (replication factor). RPs are unique per database and, along with the measurement and tag set, define a series. When you create a database, InfluxDB automatically creates a retention policy called default with an infinite duration and a replication factor set to the number of nodes in the cluster. See Database Management for retention policy management.

Continuous Queries

A CQ is an InfluxQL query that the system runs automatically and periodically within a database. InfluxDB stores the results of the CQ in a specified measurement. CQs require a function in the SELECT clause and must include a GROUP BY time() clause.

emidoots · 2016-01-28T02:41:54Z

I haven't had a chance to review the entire update yet -- but this is extremely awesome progress! :) Keep up the great work!

Just responding to a few key points, and I'll have more feedback tomorrow.

TODO: decide which Point.Precision should we use here.

Looking at the doc you linked to, it appears to go with ms by default -- which should be precise enough for most (all?) use cases we and others will have. I think using the default is probably fine.

Would it be easy for users to configure this to something else if they do want to change it?

Span.ID.Trace, Span.ID.Span & Span.ID.Parent are saved as tags (InfluxDB indexes tags)

We should not have Span.ID.Span or Span.ID.Parent be tags, they should be fields. Nobody will ever want to query by these, only by Span.ID.Trace -- so the indexing InfluxDB would do would be wasteful I think.

chris-ramon · 2016-01-28T16:50:53Z

Thanks for taking the time to review this one @slimsag! 👍

Looking at the doc you linked to, it appears to go with ms by default -- which should be precise enough for most (all?) use cases we and others will have. I think using the default is probably fine.

Would it be easy for users to configure this to something else if they do want to change it?

Good call, I agree we should let appdash users provide an optional config struct. We could add a new PointConfig param to NewInfluxDBStore:

func NewInfluxDBStore(c *influxDBServer.Config, bi *influxDBServer.BuildInfo, p PointConfig) (*InfluxDBStore, error) {
  // ...
}

type PointPrecision string
type PointConfig struct {
  Precision PointPrecision
}

Perhaps we should wrap NewInfluxDBStore params within a struct called InfluxDBStoreConfig:

func NewInfluxDBStore(config InfluxDBStoreConfig) (*InfluxDBStore, error) {
  // ...
}

type InfluxDBStoreConfig struct {
  ServerConfig *influxDBServer.Config
  ServerBuildInfo *influxDBServer.BuildInfo
  PointConfig PointConfig
}

We should not have Span.ID.Span or Span.ID.Parent be tags, they should be fields. Nobody will ever want to query by these, only by Span.ID.Trace -- so the indexing InfluxDB would do would be wasteful I think.

Yes, we can keep only Span.ID.Trace as a tag. The current InfluxDBStore implementation relies on Span.ID.Parent to decide whether a span is the root: if the value is empty it must be the root Span, otherwise it is a child Span. We can improve this by saving a new field named is_root.
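A minimal sketch of that layout, assuming a hypothetical `SpanID` struct with uint64 IDs in place of appdash's own ID types: trace_id is the only tag, span_id and parent_id become fields, and is_root is stored explicitly. Treating a zero parent as "no parent" and formatting IDs as 16 hex digits are assumptions made for the example.

```go
package main

import "fmt"

// SpanID is a hypothetical stand-in for appdash's span ID (trace, span, parent).
type SpanID struct {
	Trace, Span, Parent uint64
}

// hexID renders an ID as 16 hex digits; this formatting is an assumption.
func hexID(id uint64) string { return fmt.Sprintf("%016x", id) }

// spanTagsAndFields keeps trace_id as the only (indexed) tag. span_id and
// parent_id become unindexed fields, and is_root marks spans with no parent,
// so root detection no longer depends on inspecting parent_id.
func spanTagsAndFields(id SpanID) (map[string]string, map[string]interface{}) {
	tags := map[string]string{
		"trace_id": hexID(id.Trace),
	}
	fields := map[string]interface{}{
		"span_id":   hexID(id.Span),
		"parent_id": hexID(id.Parent),
		"is_root":   id.Parent == 0,
	}
	return tags, fields
}

func main() {
	tags, fields := spanTagsAndFields(SpanID{Trace: 1, Span: 2})
	fmt.Println(tags["trace_id"], fields["is_root"]) // 0000000000000001 true
}
```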

Regarding removing Span.ID.Span, I've found this method: func (t *Trace) FindSpan(spanID ID) *Trace
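For context on what a lookup like FindSpan involves, here is a depth-first search over a cut-down trace tree. This is a sketch under assumptions, not appdash's implementation: the `Trace` type here holds only a uint64 span ID and its sub-traces, whereas the real method takes appdash's `ID` type.

```go
package main

import "fmt"

// Trace is a hypothetical, cut-down trace tree: one span ID plus sub-traces.
type Trace struct {
	SpanID uint64
	Sub    []*Trace
}

// FindSpan searches t and its sub-traces depth-first and returns the
// (sub-)trace whose span ID matches, or nil if there is none.
func (t *Trace) FindSpan(spanID uint64) *Trace {
	if t == nil {
		return nil
	}
	if t.SpanID == spanID {
		return t
	}
	for _, sub := range t.Sub {
		if found := sub.FindSpan(spanID); found != nil {
			return found
		}
	}
	return nil
}

func main() {
	root := &Trace{SpanID: 1, Sub: []*Trace{
		{SpanID: 2, Sub: []*Trace{{SpanID: 3}}},
		{SpanID: 4},
	}}
	fmt.Println(root.FindSpan(3) != nil, root.FindSpan(5) == nil) // true true
}
```

Because the search needs each node's span ID, a lookup of this kind still requires span_id to be stored somewhere, even if it is no longer a tag.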

emidoots · 2016-01-30T15:33:30Z

influxdb_store.go

+	tags := make(map[string]string, 3)
+	tags["trace_id"] = id.Trace.String()
+	tags["span_id"] = id.Span.String()
+	tags["parent_id"] = id.Parent.String()


Can be written more clearly as just:

tags := map[string]string{
	"trace_id":  id.Trace.String(),
	"span_id":   id.Span.String(),
	"parent_id": id.Parent.String(),
}

Def agree, we should use map literals here - fixed on: 815bd39

emidoots · 2016-01-30T15:41:23Z

Perhaps we should wrap NewInfluxDBStore params within a struct called InfluxDBStoreConfig

I like this idea a lot!

emidoots · 2016-01-30T15:46:25Z

Thanks for the hard work on this @chris-ramon ! I left some comments inline, and everything else you have said I agree with.

Once you're satisfied with the state of this PR for having MemoryStore-like functionality, we should:

- Make the Queryer interface and the frontend support pagination.
- Start investigating how we can add AggregateStore-like functionality (N slowest traces, full trace times for the last ~72 hours, plus the last 20000 traces for developer inspection).

Happy to chat with you more on the details about these, just wanted to provide an overview of where to go from here. I've tried out your PR locally and it's great! You've made some awesome progress here!

emidoots · 2016-01-30T15:47:47Z

influxdb_store.go

+	var isRootSpan bool
+	// Iterate over series(spans) to create trace children's & set trace fields.
+	for _, s := range result.Series {
+		span, err := newSpanFromRow(&s)


If possible, could you adjust the style a bit here (and in other places too)? Prefer a blank newline after comments, like this:

trace := &Trace{}

// GROUP BY * -> meaning group by all tags (trace_id, span_id & parent_id);
// grouping by all tags includes those tags and their values in the query response.
q := fmt.Sprintf("SELECT * FROM spans WHERE trace_id='%s' GROUP BY *", id)
result, err := in.executeOneQuery(q)
if err != nil {
	return nil, err
}

// result.Series -> a slice containing all the spans.
if len(result.Series) == 0 {
	return nil, errors.New("trace not found")
}

var isRootSpan bool

// Iterate over series (spans) to create trace children & set trace fields.
for _, s := range result.Series {
	span, err := newSpanFromRow(&s)
	// ...
}

good call, def improves readability - fixed on: 11e791d

emidoots · 2016-02-03T05:03:55Z

influxdb_store.go

+
+	// trace_id, span_id & parent_id are set as tags
+	// because InfluxDB tags are indexed & those values
+	// are uselater on queries.


typo here. s/are uselater/are used later/g

good catch! - fixed on 5fc2a65

dmitshur · 2016-02-16T03:47:51Z

BTW, I don't think 80 char width should be considered a mandatory strict limit.

Rob Pike said it himself.

https://twitter.com/rob_pike/status/563798709868056576
https://twitter.com/rob_pike/status/563801489190043648
golang/go@a625b91

Most of my code falls within 40-140 characters wide. Sometimes longer when the stuff on the right side is not important.

chris-ramon · 2016-02-16T18:17:22Z

Good call @shurcooL. I reverted d589fcb, which strictly followed an 80-char-wide code style, and kept the style change that @slimsag pointed out above.

emidoots · 2016-02-16T18:25:07Z

Looks great, thanks! @chris-ramon & @shurcooL :)

- updates to reuse code from examples/cmd/webapp: rename pkg to webapp so we can import it
- manually install deps: renames influxdb pkg name - see: github.com/influxdata/influxdb/issues/5388
- updates to correct paths
- adds Collect & Trace implementation
- cleans up influxdb example
- adds Traces implementation & cleans up InfluxDBStore
- fixes naming clash
- updates to more consistent func names
- improvements on Traces implementation: now two queries are executed, one for root spans and another for children spans
- use map literals instead for readability
- use default point precision 'ms' & set UTC time
- typo
- updates NewInfluxDBStore param signature, using a struct instead for consistency
- improves code style
- improves strategy for replacing existing spans on DB
- adds InfluxDBStore.findSpanPoint and removes InfluxDBStore.removeSpanIfExists since it is not needed anymore
- improves root span checking
- fields might contain empty values, so better to start the annotations slice from zero size
- temp fix for frontend hanging when seeing trace detail page
- typo
- Revert "temp fix for frontend hanging": lasting fix on 7a77805. This reverts commit 38edc7b.
- updates to preserve existing span fields: do not replace existing annotations saved on DB, just append new ones
- use ID's method instead of its implementation: we might want to move zeroID to `id.go`
- set all other fields different than Name too
- updates to correct error text
- improvements on span annotations updating
- handles potential closing errors: if so we should return it
- captures potential closing error and logs it
- adds trace pagination related TODO
- updates to handle multiple row values & update docs, since we already improved the strategy to remove the existing span and then save the new one; now we just append new annotations to the existing span
- docs improvements on InfluxDBStore.Collect method
- adds missing whitespace
- adds support to save `schemas` field to spans measurement to keep track of which schemas were saved by `Collect(...)`
- Revert "adds empty time value validation": This reverts commit 7a77805. Reverting since it is not required anymore to prevent UI breaking; there's a workaround introduced with 6d10ff7.
- adds sorting related improvements
- improves comments for `InfluxDBStore`
- updates influxdb related paths; fixes introduced on v0.10, therefore no changes on travis related to influxdb import path issues are required - see: influxdata/influxdb#5617
- adds support for auth to `InfluxDBStore.server`
- typo and fit comments into 80-char width
- updates to keep 80-chars code width limit
- Revert "updates to keep 80-chars code width limit": This reverts commit d589fcb.
- adds mode(test, release) support for InfluxDBStore: test mode for running tests & release mode as default
- adds InfluxDBStore tests: tests for Collect & Trace methods
- removes httptrace dependency to avoid cyclic dependencies
- adds test for InfluxDBStore.Traces()
- improvements on comments, unnecessary code & codestyle
- adds default retention policy support, which is used to tell the appdash database how long to preserve data before deleting it
- improves comments readability & adds a low priority TODO
- support to add sub-traces to their trace parent
- clean-up TestInfluxDBStore & adds TestFindTraceParent
- code readability improvements

emidoots · 2016-03-09T18:02:07Z

LGTM

chris-ramon force-pushed the influxdb-store branch 4 times, most recently from 86d724d to 71fae14 on January 27, 2016 23:07

chris-ramon force-pushed the influxdb-store branch from 9bcc52c to ed9de95 on January 29, 2016 21:38

emidoots reviewed Jan 30, 2016

emidoots mentioned this pull request Feb 2, 2016

change import path github.com/influxdb/influxdb to github.com/influxdata/influxdb influxdata/influxdb#5388

Closed

emidoots reviewed Feb 3, 2016

adds empty time value validation

7a77805

also frontend won't crash due to trying to use empty time values

chris-ramon mentioned this pull request Feb 3, 2016

Improvements on D3 Timeline data. #108

Merged

chris-ramon added 9 commits February 3, 2016 15:33

adds initial support for influxdb store

ad72e8d

also adds an influxdb webapp example

updates to reuse code from examples/cmd/webapp

255c2f3

rename pkg to webapp so we can import it

manually install deps

6d805b2

renames influxdb pkg name - see: github.com/influxdata/influxdb/issues/5388

updates to correct paths

4e1dd54

adds Collect & Trace implementation

8770314

cleans up influxdb example

ca50fa5

adds Traces implementation & cleanups InfluxDBStore

7d6fb1b

fixes naming clash

92361fe

updates to more consistent func names

750cbb1

Revert "updates to keep 80-chars code width limit"

745ea49

This reverts commit d589fcb.

chris-ramon added 4 commits February 18, 2016 20:22

adds mode(test, release) support for InfluxDBStore

77130ee

test mode for running tests & release mode as default.

adds InfluxDBStore tests

5db3789

tests for Collect & Trace methods

removes httptrace dependency to avoid cyclic dependencies

49f63e5

adds test for InfluxDBStore.Traces()

2d762fe

chris-ramon force-pushed the influxdb-store branch from 115eabf to 74b6775 on February 23, 2016 00:40

improvements on comments, unnecessary code & codestyle

f056451

chris-ramon force-pushed the influxdb-store branch from 74b6775 to f056451 on February 23, 2016 00:59

chris-ramon added 2 commits February 24, 2016 00:36

adds default retention policy support

4a4b359

which is used to tell the appdash database how long to preserve data before deleting it

improves comments readability & adds a low priority TODO

62b5c69

chris-ramon force-pushed the influxdb-store branch from ee35fd8 to 62b5c69 on February 25, 2016 02:37

chris-ramon added 2 commits March 2, 2016 12:00

support to add sub-traces to their trace parent

a7fb78f

clean-up TestInfluxDBStore & adds TestFindTraceParent

d4354ee

chris-ramon force-pushed the influxdb-store branch from a3b873f to d4354ee on March 2, 2016 17:06

chris-ramon added 2 commits March 2, 2016 12:46

code readability improvements

7024adb

disables reporting to m.influxdb.com

ef7a19a

chris-ramon force-pushed the influxdb-store branch from 0a0dedd to ef7a19a on March 3, 2016 04:03

chris-ramon added a commit to chris-ramon/appdash that referenced this pull request Mar 6, 2016

PR sourcegraph#99

32bba44

chris-ramon mentioned this pull request Mar 7, 2016

Adds InfluxDBStore benchmarks. #114

Merged

emidoots pushed a commit that referenced this pull request Mar 9, 2016

Merge pull request #99 from chris-ramon/influxdb-store

7be3bd9

Add InfluxDB storage backend.

emidoots merged commit 7be3bd9 into sourcegraph:master Mar 9, 2016

emidoots mentioned this pull request Mar 9, 2016

Explore using InfluxDB as a storage backend #98

Closed

This was referenced Mar 9, 2016

updates to keep influxdb metrics reporting enabled #115

Merged

Adds emptiness time checking. #116

Merged

codefromthecrypt mentioned this pull request Jun 28, 2017

Storage support using InfluxDB interesting? openzipkin/zipkin#1628

Open
