
InfluxDBStore: various improvements (+lower memory usage) #171

Merged
merged 6 commits into master from sg/influx-mem
May 27, 2016

Conversation

emidoots (Member)

  • Fix an index out of bounds panic when viewing the /traces page.
  • Use a less memory intensive DB schema (users will need to rm -rf ~/.influxdb to remove the old DB).
    • In our production application, this change shows a decrease from 40+ GB (and OOM panics) to ~2.4 GB of RAM usage.
  • Upgrade to InfluxDB client v2 (fixes InfluxDBStore: use InfluxDB client v2 #140).

emidoots added 4 commits May 16, 2016 23:49
Seen when visiting the `/traces` page when a trace has no child spans.
Prior to this change we had four tags in our `spans` measurement:

- `name` which is generally low-cardinality (maps 1:1 with span names).
- `span_id` which is 100% unique.
- `trace_id` and `parent_id` which are generally unique, but not 100%.

Consider a hypothetical situation with 100 unique `name` tag values, and N=50,000 unique
`span_id`, `trace_id`, and `parent_id` tag values, where N is the number of data points
(maps 1:1 with Appdash spans). We can then calculate our total series cardinality via the
method described at https://docs.influxdata.com/influxdb/v0.13/concepts/glossary/#series-cardinality:

```
100 (name) * 50,000 (span_id) * 50,000 (trace_id) * 50,000 (parent_id)

== 12,500,000,000,000,000
```

That is for a dataset of only 50,000 spans! Such Very High Cardinality™ in
fact causes much higher RAM usage than is desirable. Quoting https://docs.influxdata.com/influxdb/v0.13/concepts/schema_and_data_layout/#discouraged-schema-design:

> Tags that specify highly variable information like UUIDs, hashes, and random
> strings can increase your series cardinality to uncomfortable levels. If you
> need that information in your database, consider storing the high-cardinality
> data as a field rather than a tag (note that query performance will be slower).
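As a rough sketch of what that advice looks like in practice with the InfluxDB Go client v2 (the database name `appdash` and the ID values are hypothetical placeholders; the `spans` measurement name matches the one above), only `name` stays a tag while the IDs move into fields:

```go
package main

import (
	"log"
	"time"

	client "github.com/influxdata/influxdb/client/v2"
)

func main() {
	c, err := client.NewHTTPClient(client.HTTPConfig{Addr: "http://localhost:8086"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	bp, err := client.NewBatchPoints(client.BatchPointsConfig{
		Database:  "appdash", // hypothetical database name
		Precision: "n",
	})
	if err != nil {
		log.Fatal(err)
	}

	// Only `name` remains a tag (low cardinality); the unique IDs are
	// stored as fields so they no longer multiply series cardinality.
	tags := map[string]string{"name": "HTTP GET /endpoint"}
	fields := map[string]interface{}{
		"trace_id":  "7be2a3",  // placeholder ID values
		"span_id":   "d1f3c4",
		"parent_id": "0a91b2",
	}
	pt, err := client.NewPoint("spans", tags, fields, time.Now())
	if err != nil {
		log.Fatal(err)
	}
	bp.AddPoint(pt)

	if err := c.Write(bp); err != nil {
		log.Fatal(err)
	}
}
```

Because fields are not indexed, series cardinality is now driven by `name` alone, which is what brings the RAM usage down.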

This change does mean that trace lookup times will be a linear scan, but trace
lookups are far less common than writes in general (and this decreases RAM usage
by a factor of almost 20x in production systems).

If it is found that trace lookup times are not great after this change with a full
72 hours' worth of data, we can consider using a lower-cardinality `trace_id`-based tag
(e.g. the first two bytes of that string) in order to reduce the linear scan time
significantly. It's not clear yet whether or not this optimization is needed.
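Should that become necessary, a minimal sketch of such a helper (hypothetical, not part of this change; it assumes hex-encoded trace IDs):

```go
package appdash // hypothetical placement

// traceBucket derives a low-cardinality tag value from a trace ID by
// taking its first two characters. For hex-encoded IDs this yields at
// most 256 distinct values, so a trace lookup only has to scan roughly
// 1/256th of the data instead of all of it.
func traceBucket(traceID string) string {
	if len(traceID) < 2 {
		return traceID
	}
	return traceID[:2]
}
```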

To take advantage of this new schema, users will need to `rm -rf ~/.influxdb` to
remove the old database.
@keegancsmith (Member)

I don't know if I have asked this before, but is influxdb appropriate for the appdash use case? We aren't really storing timeseries data.

But otherwise LGTM

@@ -569,8 +568,8 @@ func (in *InfluxDBStore) init(server *influxDBServer.Server) error {
in.server = server
// TODO: Upgrade to client v2, see: github.com/influxdata/influxdb/blob/master/client/v2/client.go
@chris-ramon (Contributor) May 25, 2016

Kudos on upgrading to v2 client @slimsag! - we might want to remove this TODO.

emidoots (Member, Author)

Ah good point, I missed that. Removed in ccf0b7a

chris-ramon (Contributor) commented May 25, 2016

> I don't know if I have asked this before, but is influxdb appropriate for the appdash use case? We aren't really storing timeseries data.

Hi @keegancsmith, here's an issue where @slimsag wrote about the motivations for replacing AggregateStore with InfluxDBStore; you might want to take a look.

In addition to that, I'd like to add the following points on why it can be a good use case for appdash (see the query sketch after this list):

  • Optimized for high-throughput scenarios (e.g. a single node can ingest a billion values per day).
    • Thousands of span writes per second.
  • Complex queries via its integrated query language:
    • Better span info summaries on appdash's dashboard.
  • Retention Policies & Continuous Queries:
    • Downsampling & aggregation of spans.
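To illustrate the query point, a hedged sketch using the client v2 query API (the database name `appdash` is a hypothetical placeholder; the measurement and field names follow the schema change above):

```go
package main

import (
	"fmt"
	"log"

	client "github.com/influxdata/influxdb/client/v2"
)

func main() {
	c, err := client.NewHTTPClient(client.HTTPConfig{Addr: "http://localhost:8086"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Count spans per name over the last hour, the kind of summary the
	// appdash dashboard could render. GROUP BY works on the low-cardinality
	// `name` tag; `trace_id` is a field under the new schema.
	q := client.NewQuery(
		`SELECT count("trace_id") FROM "spans" WHERE time > now() - 1h GROUP BY "name"`,
		"appdash", // hypothetical database name
		"n",
	)
	resp, err := c.Query(q)
	if err != nil {
		log.Fatal(err)
	}
	if resp.Error() != nil {
		log.Fatal(resp.Error())
	}
	for _, result := range resp.Results {
		for _, row := range result.Series {
			fmt.Println(row.Tags["name"], row.Values)
		}
	}
}
```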

@keegancsmith (Member)

@chris-ramon cool thanks for the response and comments. Wasn't saying we made the wrong choice, was just looking for motivation :)

The points you mentioned are great, and I'd love to learn more about it. Would you mind a few emails or VC to sate my curiosity on this?

chris-ramon (Contributor) commented May 25, 2016

@keegancsmith sounds good :) - I'll share some detailed info expanding on the points mentioned above.

@emidoots emidoots merged commit 464b11a into master May 27, 2016
@emidoots emidoots deleted the sg/influx-mem branch May 27, 2016 06:07