
Support archiving traces with ES storage #818

Closed
nziebart opened this issue May 15, 2018 · 18 comments · Fixed by #1197
Labels: enhancement, help wanted (Features that maintainers are willing to accept but do not have cycles to implement), storage/elasticsearch

Comments

@nziebart

nziebart commented May 15, 2018

Requirement - what kind of business use case are you trying to solve?

We use the ES backend, and would like to be able to archive traces

Problem - what in Jaeger blocks you from solving the requirement?

Archiving is only supported by the Cassandra storage plugin

Proposal - what do you suggest to solve the problem or improve the existing situation?

I briefly looked at the implementation of archiving. It looks like most of the logic is built on things that already exist, and it would be fairly straightforward to add an ES implementation. Two options come to mind:

  1. Allow configuring a second ES cluster for archiving (the same way we do for Cassandra)
  2. Use a separate index for archive traces. The esCleaner.py script would need to ignore this index.

Any open questions to address

It seems like we'd need a different way of dividing up the indexes to support this if we go with option (1) above. Currently we create an index for each day, which probably doesn't make sense for archiving. Option (2) above would solve this inherently.
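A minimal sketch of the naming difference between the two options. The `jaeger-span` prefix and date format match the examples later in this thread; the helper function itself is hypothetical, for illustration only:

```python
from datetime import date

def span_index_name(day: date, archive: bool = False, prefix: str = "") -> str:
    """Option (1) keeps daily indices; option (2) uses one fixed archive
    index that esCleaner.py would have to skip. Naming is illustrative,
    not the actual plugin behavior."""
    base = f"{prefix}jaeger-span"
    if archive:
        return f"{base}-archive"      # single index, no date suffix
    return f"{base}-{day:%Y-%m-%d}"   # one index per day
```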

@yurishkuro
Member

I think this needs both 1 and 2 simultaneously. Probably requires #799 and #628 to be implemented first (per cluster), and then add ability to specify archiving cluster.

@yurishkuro yurishkuro added enhancement help wanted Features that maintainers are willing to accept but do not have cycles to implement labels May 15, 2018
@nziebart
Author

Is there a specific motivation for wanting a separate cluster for archiving, rather than just a separate keyspace/index?

@yurishkuro
Member

It doesn't have to be a separate cluster, but it doesn't have to NOT be one either. The configuration is such that archive storage inherits most settings from the primary storage, and you can override some of them.

@pavolloffay
Member

I am looking into this

@pavolloffay
Member

@yurishkuro Are the archived traces shown on the /search page? It seems that an archived trace can only be accessed directly via the /trace/id endpoint.

@pavolloffay
Member

If the above is true we don't want to create an index for service names, therefore we will have to make changes to the span writer and reader. I also assume that we will create only one archive index per deployment.

To be able to support multiple tenants the index prefix will be just put in front of the archive index e.g. tenant:jaeger-span-archive or tenant:jaeger-span-1970-01-01 :)
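The tenant prefixing described above can be sketched with a hypothetical helper; the `:` separator is taken from the example in this comment and is illustrative, not a documented convention:

```python
def with_tenant(tenant: str, index: str) -> str:
    """Prepend a tenant prefix to an index name, e.g.
    "tenant:jaeger-span-archive" (separator choice illustrative)."""
    return f"{tenant}:{index}" if tenant else index
```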

@yurishkuro
Member

> @yurishkuro Are the archived traces shown on the /search page? It seems that an archived trace can only be accessed directly via the /trace/id endpoint.

Yes, it only works for direct lookups by ID. It's primarily built to support long-lived hyperlinks that people can put in tickets, postmortem docs, etc.

@pavolloffay
Member

We should also think about retention for the archive index. We could do it per time period as proposed in #628 (day, month, year), or just allow using a different index name, e.g. `jaeger-span-archive-2`. It might also be doable with a prefix.

Note that deleting individual documents from an index is not the expected practice in ES; data is normally removed by dropping whole indices.

@pavolloffay
Member

^^ cc @jaegertracing/elasticsearch

@masteinhauser
Member

Just as a counter-point, we would not use the native Jaeger archive at all.

Instead, we currently rely on our own elasticsearch-curator configuration to route indices from "hot nodes" to "warm nodes". Specifically in our Kube+AWS deployment, this means moving from Elasticsearch nodes sized as r4.4xlarge with gp2 EBS volumes to r4.xlarge nodes with st1 EBS volumes.

This might be a better recommendation for the Jaeger project, even though it adds complexity to the Elasticsearch deployment and the surrounding tooling itself.

An older blog post, though still largely relevant in Elasticsearch 6.x, which further explains this architecture: "Hot-Warm" Architecture in Elasticsearch 5.x

I guess, specifically, this feels like possibly the wrong way to fix a performance limitation/regression in the Elasticsearch storage backend. If the query is time-bounded, Elasticsearch will "automatically" optimize which indices need to be scanned instead of walking the entire available data set of spans.

@pavolloffay
Member

This is all new to me; I will have to experiment and do some reading. But it seems we could have one archive index (or rather an alias). This alias would point to one write index (archive-3) and several read indices (archive-1, archive-2). The write index would be rolled over (based on conditions: shards? time?) and moved to the read indices. I am not sure if the rolled-over index can be automatically assigned to another alias.

@pavolloffay
Member

pavolloffay commented Nov 26, 2018

After playing with the rollover API, here is my proposal for how we could go forward and use it for the archive index. If it works well, we could start experimenting with using it for the main indices as well.

First a brief explanation of rollover API:

  • It's an API that rolls over to a new index when the old one matches any of the given conditions: age, number of documents, or size.
  • The API has to be called explicitly; rollover does not happen automatically once it's set up.
  • The API returns the name of the new index.
  • The name of the new index can be specified, e.g. name-%counter or name-%date-%counter.
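For illustration, a request body for the `_rollover` endpoint could carry conditions like these. The specific values are placeholders, not recommendations:

```python
# Body for POST /<write-alias>/_rollover; the index rolls over as soon
# as ANY listed condition is met. Values here are placeholders only.
rollover_conditions = {
    "conditions": {
        "max_age": "7d",          # roll over after a week...
        "max_docs": 50_000_000,   # ...or after 50M documents, whichever comes first
    }
}
```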

Great news is that ES >= 6.4.4 supports is_write_index (https://www.elastic.co/guide/en/elasticsearch/reference/6.4/indices-aliases.html#aliases-write-index, https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html#indices-rollover-is-write-index), which allows using a single alias for writes across multiple indices. The write index in the alias has is_write_index:true. On rollover, the old index stays in the alias as read-only. This simplifies readers and external tooling, since the old index no longer has to be added to a separate read alias.
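A sketch of the corresponding aliases-API payload. The helper function and index names are hypothetical; it only builds the request body described above, it does not talk to a cluster:

```python
def write_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Aliases-API body that keeps the old index readable under the same
    alias while pointing writes at the new one (ES >= 6.4 is_write_index)."""
    return {
        "actions": [
            {"add": {"index": old_index, "alias": alias, "is_write_index": False}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }
```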

My proposal is to allow using two archive indices: one for writes and one for reads. By default the read index would be the same as the write index. This satisfies simple deployments with no extra configuration; more complex deployments would have to create the archive aliases (or a single alias if ES 6 is used) before deploying, and use a cron job with rollover.

cc @jaegertracing/elasticsearch any feedback is welcome.

The last thing we have to figure out is how to call rollover. We could use curator with a cron job. The curator would call rollover, parse the response, and put the old index (if ES 5) into the read alias (I am not sure if the rollover operation in curator returns an object with index info).
```json
{
  "old_index": "jaeger-span-archive-000001",
  "new_index": "jaeger-span-archive-000002",
  "rolled_over": true,
  "dry_run": false,
  "acknowledged": true,
  "shards_acknowledged": true,
  "conditions": {"[max_age: 1s]": true, "[max_docs: 1]": false}
}
```
I am linking an issue regarding automatic rollover API elastic/elasticsearch#26092.
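The response above already contains what such a cron job needs; a sketch of parsing it, using the exact response quoted in the previous comment:

```python
import json

# The exact rollover response quoted above.
raw = ('{"old_index":"jaeger-span-archive-000001",'
       '"new_index":"jaeger-span-archive-000002",'
       '"rolled_over":true,"dry_run":false,"acknowledged":true,'
       '"shards_acknowledged":true,'
       '"conditions":{"[max_age: 1s]":true,"[max_docs: 1]":false}}')

response = json.loads(raw)
# Only touch aliases when a rollover actually happened.
old_index = response["old_index"] if response["rolled_over"] else None
```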

@yurishkuro
Member

NB: are you only thinking of using this for archives? It seems useful for the main storage as well, since we're currently issuing queries over multiple indices, rather than a single alias, which would simplify the code & configuration.

@pavolloffay
Member

My plan is to start with the archive index and then add option for the main storage.

@yurishkuro
Member

That's fair, although in Cassandra there is no difference between the main and archive storage implementations, just the configuration, so you would still need to make changes to the ES storage impl. Are you thinking a fork or a feature flag?

@pavolloffay
Member

A good question. I think a feature flag, since there will be a lot of similarities. Only the functions that derive index names should be different; maybe we could resolve that function in the constructor.

My main blocker here is how to get old index name after rollover and put it into the read alias. I will have to play with the curator.

@zdicesare
Contributor

@pavolloffay Would archiving traces only be supported with ES >= 6.4.4?

@pavolloffay
Member

No, it will be supported for > 5.x. 6.4.4 would just leverage is_write_index to have a single alias for both writes and reads.
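A trivial sketch of that version gate; the function name is hypothetical, and the threshold is taken from the comment above:

```python
def single_alias_supported(es_version: tuple) -> bool:
    """True when one alias can serve both reads and writes via
    is_write_index (ES >= 6.4.4 per the discussion above); older
    versions need separate read and write aliases."""
    return es_version >= (6, 4, 4)
```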

@ghost ghost removed the review label Jan 22, 2019
5 participants