
InfluxDBStore: various improvements (+lower memory usage) #171

Merged
merged 6 commits into master from sg/influx-mem
May 27, 2016

Conversation

emidoots (Member)

  • Fix an index out of bounds panic when viewing the /traces page.
  • Use a less memory intensive DB schema (users will need to rm -rf ~/.influxdb to remove the old DB).
    • In our production application, this change shows a decrease from 40+ GB (and OOM panics) to ~2.4 GB of RAM usage.
  • Upgrade to InfluxDB client v2 (fixes InfluxDBStore: use InfluxDB client v2 #140).

emidoots added 4 commits May 16, 2016 23:49
Seen when visiting the `/traces` page when a trace has no child spans.
Prior to this change we had four tags in our `spans` measurement:

- `name` which is generally low-cardinality (maps 1:1 with span names).
- `span_id` which is 100% unique.
- `trace_id` and `parent_id` which are generally unique, but not 100%.

Consider a hypothetical situation with 100 unique `name` tag values, and N=50,000 unique
`span_id`, `trace_id`, and `parent_id` tag values, where N is the number of data points
(maps 1:1 with Appdash spans). We can then calculate our total series cardinality via the
method described at https://docs.influxdata.com/influxdb/v0.13/concepts/glossary/#series-cardinality:

```
100 (name) * 50,000 (span_id) * 50,000 (trace_id) * 50,000 (parent_id)

== 12,500,000,000,000,000
```

That is for a dataset of only 50,000 spans! Such Very High Cardinality™ in
fact causes much higher RAM usage than is desirable. Quoting https://docs.influxdata.com/influxdb/v0.13/concepts/schema_and_data_layout/#discouraged-schema-design:

> Tags that specify highly variable information like UUIDs, hashes, and random
> strings can increase your series cardinality to uncomfortable levels. If you
> need that information in your database, consider storing the high-cardinality
> data as a field rather than a tag (note that query performance will be slower).
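As a rough sketch of what that advice looks like in practice with the InfluxDB Go client v2 (the database name `appdash` and the ID values are hypothetical placeholders; the `spans` measurement name matches the one above), only `name` stays a tag while the IDs move into fields:

```go
package main

import (
	"log"
	"time"

	client "github.com/influxdata/influxdb/client/v2"
)

func main() {
	c, err := client.NewHTTPClient(client.HTTPConfig{Addr: "http://localhost:8086"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	bp, err := client.NewBatchPoints(client.BatchPointsConfig{
		Database:  "appdash", // hypothetical database name
		Precision: "n",
	})
	if err != nil {
		log.Fatal(err)
	}

	// Only `name` remains a tag (low cardinality); the unique IDs are
	// stored as fields so they no longer multiply series cardinality.
	tags := map[string]string{"name": "HTTP GET /endpoint"}
	fields := map[string]interface{}{
		"trace_id":  "7be2a3",  // placeholder ID values
		"span_id":   "d1f3c4",
		"parent_id": "0a91b2",
	}
	pt, err := client.NewPoint("spans", tags, fields, time.Now())
	if err != nil {
		log.Fatal(err)
	}
	bp.AddPoint(pt)

	if err := c.Write(bp); err != nil {
		log.Fatal(err)
	}
}
```

Because fields are not indexed, series cardinality is now driven by `name` alone, which is what brings the RAM usage down.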

This change does mean that trace lookup times will be a linear scan, but trace
lookups are far less common than writes in general (and this decreases RAM usage
by a factor of almost 20x in production systems).

If it is found that trace lookup times are not great after this change with a full
72 hours' worth of data, we can consider using a lower-cardinality `trace_id`-based tag
(e.g. the first two bytes of that string) in order to reduce the linear scan time
significantly. It's not clear yet whether or not this optimization is needed.
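Should that become necessary, a minimal sketch of such a helper (hypothetical, not part of this change; it assumes hex-encoded trace IDs):

```go
package appdash // hypothetical placement

// traceBucket derives a low-cardinality tag value from a trace ID by
// taking its first two characters. For hex-encoded IDs this yields at
// most 256 distinct values, so a trace lookup only has to scan roughly
// 1/256th of the data instead of all of it.
func traceBucket(traceID string) string {
	if len(traceID) < 2 {
		return traceID
	}
	return traceID[:2]
}
```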

To take advantage of this new schema, users will need to `rm -rf ~/.influxdb` to
remove the old database.
@keegancsmith (Member)

I don't know if I have asked this before, but is influxdb appropriate for the appdash use case? We aren't really storing timeseries data.

But otherwise LGTM

@@ -569,8 +568,8 @@ func (in *InfluxDBStore) init(server *influxDBServer.Server) error {
in.server = server
// TODO: Upgrade to client v2, see: github.com/influxdata/influxdb/blob/master/client/v2/client.go
@chris-ramon (Contributor) May 25, 2016

Kudos on upgrading to v2 client @slimsag! - we might want to remove this TODO.

emidoots (Member, Author)

Ah good point, I missed that. Removed in ccf0b7a

chris-ramon (Contributor) commented May 25, 2016

> I don't know if I have asked this before, but is influxdb appropriate for the appdash use case? We aren't really storing timeseries data.

Hi @keegancsmith, here's an issue where @slimsag wrote about the motivations for replacing AggregateStore with InfluxDBStore; you might want to take a look.

In addition to that, I'd like to add the following points on why it can be a good use case for appdash (see the query sketch after this list):

  • Optimized for high-throughput scenarios (e.g. a single node can ingest a billion values per day).
    • Thousands of span writes per second.
  • Complex queries via its integrated query language:
    • Better span info summaries on appdash's dashboard.
  • Retention Policies & Continuous Queries:
    • Downsampling & aggregation of spans.
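To illustrate the query point, a hedged sketch using the client v2 query API (the database name `appdash` is a hypothetical placeholder; the measurement and field names follow the schema change above):

```go
package main

import (
	"fmt"
	"log"

	client "github.com/influxdata/influxdb/client/v2"
)

func main() {
	c, err := client.NewHTTPClient(client.HTTPConfig{Addr: "http://localhost:8086"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Count spans per name over the last hour, the kind of summary the
	// appdash dashboard could render. GROUP BY works on the low-cardinality
	// `name` tag; `trace_id` is a field under the new schema.
	q := client.NewQuery(
		`SELECT count("trace_id") FROM "spans" WHERE time > now() - 1h GROUP BY "name"`,
		"appdash", // hypothetical database name
		"n",
	)
	resp, err := c.Query(q)
	if err != nil {
		log.Fatal(err)
	}
	if resp.Error() != nil {
		log.Fatal(resp.Error())
	}
	for _, result := range resp.Results {
		for _, row := range result.Series {
			fmt.Println(row.Tags["name"], row.Values)
		}
	}
}
```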

@keegancsmith (Member)

@chris-ramon cool thanks for the response and comments. Wasn't saying we made the wrong choice, was just looking for motivation :)

The points you mentioned are great, and I'd love to learn more about it. Would you mind a few emails or VC to sate my curiosity on this?

chris-ramon (Contributor) commented May 25, 2016

@keegancsmith sounds good :) - I'll share some detailed info expanding on the points mentioned above.

@emidoots emidoots merged commit 464b11a into master May 27, 2016
@emidoots emidoots deleted the sg/influx-mem branch May 27, 2016 06:07