InfluxDB as trace storage backend #272

Closed
yurishkuro opened this issue Jul 15, 2017 · 21 comments

Comments

@yurishkuro
Member

yurishkuro commented Jul 15, 2017

Meta-issue on storage backends: #638

There is some work happening here: openzipkin/zipkin#1628

My interest at this time is what features such an implementation could provide, i.e.

  • what would be write throughput per node with RF=2
  • could the backend support indexing of arbitrary tags / log fields, or do they need to be pre-defined
  • what is the write amplification or perf impact as a function of # of tags/fields per span
    • in Cassandra backend every tag is an extra write
    • in ES it's extra indexing time on the server
  • will the backend support correct server-side joins and LIMIT (broken with Cassandra today)
  • how search with multiple tags would be handled
    • in Cassandra it's an AND across different spans from the same service name (weird)
    • in ES it's an AND across tags from the same span only (index document is one span)
  • could the backend support latency aggregates out of the box (by service/endpoint)? This one is something I'd expect InfluxDB to be able to do easily, since it's fundamentally a TS db

cc @gianarb @goller

@xjerod

xjerod commented Jul 16, 2017

From personal experience, I've easily written 6 million points per minute to a single node with no issue, using the recommended 5000 points per request. Batching is key, though: as your batches get smaller, write performance drops drastically.
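A minimal sketch of the batching described above, assuming points are already serialized as line-protocol strings. The `batched` helper, the sample measurement `span`, and its tag and field names are hypothetical; only the batch size of 5000 comes from the comment.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(points: Iterable[str], size: int = 5000) -> Iterator[List[str]]:
    """Yield lists of at most `size` points; the last batch may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(points)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# 12,000 hypothetical points are split into batches of 5000, 5000 and 2000.
points = [f"span,service=frontend duration_ns={i}i" for i in range(12_000)]
batch_sizes = [len(batch) for batch in batched(points)]
```

Each yielded batch would then go out as one write request, so the number of requests shrinks as the batch size grows.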

Arbitrary tags / log fields do not need to be predefined. However, a field cannot have mixed types: once you set fieldA=int64, fieldA always has to be an int64.
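One way a writer could catch a type conflict before it reaches the database is a small client-side guard. This is only a sketch: `FieldTypeRegistry` is a hypothetical helper, not part of any InfluxDB client, and it tracks types per field key only.

```python
from typing import Dict, Union

FieldValue = Union[bool, int, float, str]


class FieldTypeRegistry:
    """Remembers the first type seen for each field key and rejects changes."""

    def __init__(self) -> None:
        self._types: Dict[str, type] = {}

    def check(self, fields: Dict[str, FieldValue]) -> None:
        # Validate every field first so a rejected point leaves the registry unchanged.
        for key, value in fields.items():
            seen = self._types.get(key)
            if seen is not None and seen is not type(value):
                raise TypeError(
                    f"field {key!r} was first written as {seen.__name__}, "
                    f"got {type(value).__name__}"
                )
        # All fields are consistent: record any keys seen for the first time.
        for key, value in fields.items():
            self._types.setdefault(key, type(value))


registry = FieldTypeRegistry()
registry.check({"duration_ns": 1500})  # the first write fixes the type as int
try:
    registry.check({"duration_ns": "1.5ms"})  # a string for the same key is refused
    conflict = None
except TypeError as exc:
    conflict = str(exc)
```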

For indexing, tags are always indexed and fields are never indexed. This means that tag cardinality is a big issue, since Influx creates an in-memory index for all tags (this might be okay with their new TSI). It also means that any query looking for a specific field value causes a scan of the data. This is usually okay since you're generally querying by time span, but it is something to keep in mind.

Aggregations can be easily implemented with their built-in aggregation functions and a GROUP BY on service and endpoint.
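As an illustration, such an aggregate could be expressed as an InfluxQL query string. The sketch below only builds the string; the measurement `span`, the field `duration_ns`, and the tags `service` and `endpoint` are assumed names, not an agreed schema.

```python
import re
from typing import Sequence


def _ident(name: str) -> str:
    """Double-quote an identifier; names needing escapes are refused in this sketch."""
    if '"' in name or "\\" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f'"{name}"'


def latency_query(
    measurement: str = "span",
    field: str = "duration_ns",
    group_by: Sequence[str] = ("service", "endpoint"),
    window: str = "1h",
) -> str:
    """Build a mean-latency query grouped by the given tags over a recent window."""
    if not re.fullmatch(r"\d+(ns|u|ms|s|m|h|d|w)", window):
        raise ValueError(f"unsupported duration: {window!r}")
    groups = ", ".join(_ident(tag) for tag in group_by)
    return (
        f"SELECT MEAN({_ident(field)}) FROM {_ident(measurement)} "
        f"WHERE time > now() - {window} GROUP BY {groups}"
    )


query = latency_query()
```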

@goller

goller commented Jul 27, 2017

Hi @yurishkuro, we'd like to contribute influxdb as a trace support backend. Currently, we are getting experience with writing spans with telegraf into InfluxDB running with the new TSI engine.

@jrbury is absolutely correct on all points. The TSI engine is built to handle much higher cardinality. Here is how we define cardinality: https://docs.influxdata.com/influxdb/v1.3/concepts/glossary/#series-cardinality

I believe that the trace id will dominate the cardinality.
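To make the effect concrete, here is a simplified sketch that counts distinct measurement and tag-set pairs for the same spans, with and without the trace id stored as a tag. The linked glossary remains the authoritative definition; the measurement name, tag names and sample data are made up.

```python
from typing import Dict, Iterable, Tuple


def series_cardinality(points: Iterable[Tuple[str, Dict[str, str]]]) -> int:
    """Count distinct (measurement, tag set) combinations."""
    return len({(measurement, frozenset(tags.items())) for measurement, tags in points})


services = ["frontend", "cart", "payments"]
# 300 hypothetical spans, each belonging to its own trace.
spans = [{"service": services[i % 3], "trace_id": f"{i:016x}"} for i in range(300)]

# Trace id stored as a tag: every span creates a new series.
with_trace_tag = series_cardinality(("span", tags) for tags in spans)
# Trace id kept out of the tag set: one series per service.
without_trace_tag = series_cardinality(
    ("span", {"service": tags["service"]}) for tags in spans
)
```

In this toy data set the first count grows with the number of traces, while the second stays at the number of services.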

Regarding your other questions:

will the backend support correct server-side joins and LIMIT (broken with Cassandra today)

Influx does not have server-side joins per se, but it is able to group by any number of tags. Additionally, Influx has several meta queries, using the SHOW keyword, that return information about tag sets. The SELECT and SHOW queries both support LIMIT.

how search with multiple tags would be handled

Multiple tags can be handled with a WHERE clause. The WHERE clause would not need to be restricted to a single service name or a single span. I believe it should "just work."
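A hedged sketch of what such a multi-tag search could look like as an InfluxQL string, combined with LIMIT. The measurement `span` and the tag keys in the example are assumptions; values that would need escaping are simply refused rather than escaped.

```python
import re
from typing import Dict


def _ident(name: str) -> str:
    """Double-quote an identifier (measurement or tag key)."""
    if '"' in name or "\\" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f'"{name}"'


def _literal(value: str) -> str:
    """Single-quote a tag value."""
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported tag value: {value!r}")
    return f"'{value}'"


def search_query(
    tags: Dict[str, str],
    measurement: str = "span",
    window: str = "1h",
    limit: int = 20,
) -> str:
    """AND together one equality condition per tag, plus a time bound and a LIMIT."""
    if not re.fullmatch(r"\d+(ns|u|ms|s|m|h|d|w)", window):
        raise ValueError(f"unsupported duration: {window!r}")
    conditions = [f"{_ident(k)} = {_literal(v)}" for k, v in sorted(tags.items())]
    conditions.append(f"time > now() - {window}")
    where = " AND ".join(conditions)
    return f"SELECT * FROM {_ident(measurement)} WHERE {where} LIMIT {int(limit)}"


query = search_query({"http.status_code": "500", "error": "true"})
```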

could the backend support latency aggregates out of the box (by service/endpoint)? This one is something I'd expect InfluxDB to be able to do easily, since it's fundamentally a TS db

Yes, I believe this should be in our wheelhouse for sure.

So, what do you think of us trying our hand at implementing the store?

@yurishkuro
Member Author

yurishkuro commented Jul 27, 2017

As if I could try to stop you!

Seriously though, if you have the cycles and the desire to do this, then by all means. I recommend doing it in some other repo so that you don't have to go through our code reviews until you have a working proof of concept and run some integration and stress tests. Note that we have some integration tests that (in theory) should work across different storage backends - ./plugin/storage/integration/...

@yurishkuro
Member Author

@goller just saw this: https://github.com/influxdata/jaeger. Just curious - why are you going after zipkin's nomenclature ("binary annotations", etc.) instead of OpenTracing, given that you're already operating on Jaeger's domain model? It seems like extra work. Note that Jaeger backend can both produce and consume Zipkin model if necessary.

@goller

goller commented Aug 22, 2017 via email

@goller

goller commented Aug 22, 2017

@yurishkuro To better understand zipkin's model, we implemented a telegraf plugin here: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/zipkin

Our goal is to support OpenTracing for sure, but, we figured we would support zipkin's data model to store into influxdb via telegraf. That way both jaeger and zipkin could read data from it.

Do you think it would be better for the collection of spans to be stored using the OpenTracing naming?

@codefromthecrypt

codefromthecrypt commented Aug 22, 2017 via email

@yurishkuro
Member Author

@goller zipkin model does not support all features of OpenTracing, such as KV-logs and span references. Because of that the transformation from Jaeger to Zipkin data model can be lossy. If you're implementing Jaeger backend with InfluxDB, it seems to make more sense for that backend to use Jaeger data model and not be lossy.

@yurishkuro
Member Author

@goller btw jaeger-collector can accept Zipkin spans in various formats at :9411/api/v1/spans. It converts them to Jaeger internal data model that SpanWriter/SpanReader are operating on.
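For illustration, a Zipkin v1 style JSON span could be prepared for that endpoint roughly as follows. This assumes JSON is one of the accepted formats and that the collector runs on `localhost`; the span contents are invented, and the request is only built, not sent.

```python
import json
import urllib.request
from typing import Any, Dict, List


def zipkin_v1_request(
    spans: List[Dict[str, Any]], host: str = "localhost"
) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a JSON list of Zipkin v1 spans."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:9411/api/v1/spans",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server-side span; timestamps and durations are in microseconds.
endpoint = {"serviceName": "frontend", "ipv4": "127.0.0.1", "port": 8080}
span = {
    "traceId": "463ac35c9f6413ad",
    "id": "463ac35c9f6413ad",
    "name": "get /cart",
    "timestamp": 1500000000000000,
    "duration": 1500,
    "annotations": [
        {"timestamp": 1500000000000000, "value": "sr", "endpoint": endpoint},
        {"timestamp": 1500000000001500, "value": "ss", "endpoint": endpoint},
    ],
    "binaryAnnotations": [
        {"key": "http.status_code", "value": "200", "endpoint": endpoint}
    ],
}
request = zipkin_v1_request([span])
# urllib.request.urlopen(request) would send it; omitted so the sketch runs without a collector.
```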

@codefromthecrypt

codefromthecrypt commented Aug 22, 2017 via email

@jacobmarble
Contributor

FYI active work on this issue:
https://github.com/influxdata/jaeger/tree/influxdb

This branch currently works with InfluxDB 2.0 alpha, but I won't open a PR until we've used it ourselves for a while.

@yurishkuro
Copy link
Member Author

The plugin framework issue: #422

@jpkrohling mentioned this issue Feb 26, 2019
@jacobmarble
Contributor

FYI we have moved our active work to a new repo, which uses the gRPC framework:
https://github.com/influxdata/jaeger-influxdb

@juanpabloaj

@jacobmarble is the repo available? I got 404

@jacobmarble
Contributor

jacobmarble commented May 15, 2019 via email

@MattBoatman

@jacobmarble Did https://github.com/influxdata/jaeger-influxdb get moved to another location? The link from the docs 404s

@jpkrohling
Contributor

Looks like the repo is available :-)

@jacobmarble
Contributor

@MattBoatman I'm not sure why you got a 404.

Related: that repository will be archived in the next few months, as its replacement stabilizes. A new InfluxDB storage engine is in development, which handles traces much better than the current engine. This new Jaeger plugin is designed around a schema which is friendly to both OpenTelemetry and the new storage engine:
https://github.com/influxdata/influxdb-observability/tree/main/jaeger-query-plugin

@MattBoatman

@jacobmarble I emailed the influx team and they restored the repo ;)
Good to know, I was just following the links from the jaeger docs

@yurishkuro removed the "help wanted" label Aug 26, 2022
@jkowall
Contributor

jkowall commented Nov 4, 2022

There is a newer version of this which works with IOx, the new engine:
https://github.com/influxdata/influxdb-observability/tree/main/jaeger-query-plugin
The older repo is only for v1 and v2 of InfluxDB.
