
Support archiving traces with ES storage #818

Closed
nziebart opened this issue May 15, 2018 · 18 comments · Fixed by #1197
Labels: enhancement, help wanted (Features that maintainers are willing to accept but do not have cycles to implement), storage/elasticsearch

Comments

@nziebart

nziebart commented May 15, 2018

Requirement - what kind of business use case are you trying to solve?

We use the ES backend, and would like to be able to archive traces

Problem - what in Jaeger blocks you from solving the requirement?

Archiving is only supported by the Cassandra storage plugin

Proposal - what do you suggest to solve the problem or improve the existing situation?

I briefly looked at the implementation of archiving. It looks like most of the logic is built on things that already exist, and it would be fairly straightforward to add an ES implementation. Two options come to mind:

  1. Allow configuring a second ES cluster for archiving (the same way we do for Cassandra)
  2. Use a separate index for archive traces. The esCleaner.py script would need to ignore this index.

Any open questions to address

It seems like we'd need a different way of dividing up the indexes to support this if we go with option (1) above. Currently we create an index for each day, which probably doesn't make sense for archiving. Option (2) above would solve this inherently.
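A minimal sketch of the naming difference between the two options. The `jaeger-span` prefix and date format match the examples later in this thread; the helper function itself is hypothetical, for illustration only:

```python
from datetime import date

def span_index_name(day: date, archive: bool = False, prefix: str = "") -> str:
    """Option (1) keeps daily indices; option (2) uses one fixed archive
    index that esCleaner.py would have to skip. Naming is illustrative,
    not the actual plugin behavior."""
    base = f"{prefix}jaeger-span"
    if archive:
        return f"{base}-archive"      # single index, no date suffix
    return f"{base}-{day:%Y-%m-%d}"   # one index per day
```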

@yurishkuro
Member

I think this needs both 1 and 2 simultaneously. Probably requires #799 and #628 to be implemented first (per cluster), and then add ability to specify archiving cluster.

@yurishkuro yurishkuro added enhancement help wanted Features that maintainers are willing to accept but do not have cycles to implement labels May 15, 2018
@nziebart
Author

Is there a specific motivation for wanting a separate cluster for archiving, rather than just a separate keyspace/index?

@yurishkuro
Member

It doesn't have to be a separate cluster, but it doesn't have to NOT be one either. The configuration is such that archive storage inherits most settings from the primary storage, and you can override some of them.

@pavolloffay
Member

I am looking into this

@pavolloffay
Member

@yurishkuro Are the archived traces shown on the /search page? It seems that an archived trace can only be accessed directly via the /trace/id endpoint.

@pavolloffay
Member

If the above is true we don't want to create an index for service names, therefore we will have to make changes to the span writer and reader. I also assume that we will create only one archive index per deployment.

To be able to support multiple tenants the index prefix will be just put in front of the archive index e.g. tenant:jaeger-span-archive or tenant:jaeger-span-1970-01-01 :)
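The tenant prefixing described above can be sketched with a hypothetical helper; the `:` separator is taken from the example in this comment and is illustrative, not a documented convention:

```python
def with_tenant(tenant: str, index: str) -> str:
    """Prepend a tenant prefix to an index name, e.g.
    "tenant:jaeger-span-archive" (separator choice illustrative)."""
    return f"{tenant}:{index}" if tenant else index
```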

@yurishkuro
Member

> @yurishkuro Are the archived traces shown on the /search page? It seems that an archived trace can only be accessed directly via the /trace/id endpoint.

Yes, it only works for direct lookups by ID. It's primarily built to support long-lived hyperlinks that people can put in tickets, postmortem docs, etc.

@pavolloffay
Member

We should also think about retention for the archive index. We could do it per time period as proposed in #628 (day, month, year), or just allow using a different index name, e.g. `jaeger-span-archive-2`. It might also be doable with a prefix.

Note that deleting individual documents from an index is not the expected practice in ES; data is normally removed by dropping whole indices.

@pavolloffay
Member

^^ cc @jaegertracing/elasticsearch

@masteinhauser
Member

Just as a counter-point, we would not use the native Jaeger archive at all.

Instead, we currently rely on our own elasticsearch-curator configuration to route indices from "hot nodes" to "warm nodes". Specifically in our Kube+AWS deployment, this means moving from Elasticsearch nodes sized as r4.4xlarge with gp2 EBS volumes to r4.xlarge nodes with st1 EBS volumes.

This might be a better recommendation for the Jaeger project, even though it adds complexity to the Elasticsearch deployment and the surrounding tooling itself.

An older blog post, though still largely relevant in Elasticsearch 6.x, which further explains this architecture: "Hot-Warm" Architecture in Elasticsearch 5.x

I guess, specifically, this feels like possibly the wrong way to fix a performance limitation/regression in the Elasticsearch storage backend. If the query is time-bounded, Elasticsearch will "automatically" optimize which indices need to be scanned instead of walking the entire available data set of spans.

@pavolloffay
Member

This is all new to me; I will have to experiment and do some reading. But it seems we could have one archive index (or rather an alias). This alias would point to one write index (archive-3) and several read indices (archive-1, archive-2). The write index would be rolled over (based on conditions: shards? time?) and moved to the read indices. I am not sure if the rolled-over index can be automatically assigned to another alias.

@pavolloffay
Member

pavolloffay commented Nov 26, 2018

After playing with the rollover API, here is my proposal for how we could go forward and use it for the archive index. If it works well, we could start experimenting with using it for the main indices as well.

First a brief explanation of rollover API:

  • It's an API that rolls over to a new index when the old one matches any of the given conditions: age, number of documents, or size.
  • The API has to be called explicitly; rollover does not happen automatically once it's set up.
  • The API returns the name of the new index.
  • The name of the new index can be specified, e.g. name-%counter or name-%date-%counter.
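For illustration, a request body for the `_rollover` endpoint could carry conditions like these. The specific values are placeholders, not recommendations:

```python
# Body for POST /<write-alias>/_rollover; the index rolls over as soon
# as ANY listed condition is met. Values here are placeholders only.
rollover_conditions = {
    "conditions": {
        "max_age": "7d",          # roll over after a week...
        "max_docs": 50_000_000,   # ...or after 50M documents, whichever comes first
    }
}
```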

Great news is that ES >= 6.4.4 supports is_write_index (https://www.elastic.co/guide/en/elasticsearch/reference/6.4/indices-aliases.html#aliases-write-index, https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html#indices-rollover-is-write-index), which allows using a single alias for writes across multiple indices. The write index in the alias has is_write_index:true. On rollover, the old index stays in the alias as read-only. This simplifies readers and external tooling, since the old index no longer has to be added to a separate read alias.
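A sketch of the corresponding aliases-API payload. The helper function and index names are hypothetical; it only builds the request body described above, it does not talk to a cluster:

```python
def write_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Aliases-API body that keeps the old index readable under the same
    alias while pointing writes at the new one (ES >= 6.4 is_write_index)."""
    return {
        "actions": [
            {"add": {"index": old_index, "alias": alias, "is_write_index": False}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }
```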

My proposal is to allow using two archive indices: one for writes and one for reads. By default the read index would be the same as the write index. This satisfies simple deployments with no extra configuration; more complex deployments would have to create the archive aliases (or a single alias if ES 6 is used) before deploying, and use a cron job with rollover.

cc @jaegertracing/elasticsearch any feedback is welcome.

The last thing we have to figure out is how to call rollover. We could use curator with a cron job. The curator would call rollover, parse the response, and put the old index (if ES 5) into the read alias (I am not sure if the rollover operation in curator returns an object with index info).
```json
{
  "old_index": "jaeger-span-archive-000001",
  "new_index": "jaeger-span-archive-000002",
  "rolled_over": true,
  "dry_run": false,
  "acknowledged": true,
  "shards_acknowledged": true,
  "conditions": {"[max_age: 1s]": true, "[max_docs: 1]": false}
}
```
I am linking an issue regarding automatic rollover API elastic/elasticsearch#26092.
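The response above already contains what such a cron job needs; a sketch of parsing it, using the exact response quoted in the previous comment:

```python
import json

# The exact rollover response quoted above.
raw = ('{"old_index":"jaeger-span-archive-000001",'
       '"new_index":"jaeger-span-archive-000002",'
       '"rolled_over":true,"dry_run":false,"acknowledged":true,'
       '"shards_acknowledged":true,'
       '"conditions":{"[max_age: 1s]":true,"[max_docs: 1]":false}}')

response = json.loads(raw)
# Only touch aliases when a rollover actually happened.
old_index = response["old_index"] if response["rolled_over"] else None
```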

@yurishkuro
Member

NB: are you only thinking of using this for archives? It seems useful for the main storage as well, since we're currently issuing queries over multiple indices, rather than a single alias, which would simplify the code & configuration.

@pavolloffay
Member

My plan is to start with the archive index and then add option for the main storage.

@yurishkuro
Member

That's fair, although in Cassandra there is no difference between the main and archive storage implementations, just the configuration, so you would still need to make changes to the ES storage impl. Are you thinking a fork or a feature flag?

@pavolloffay
Member

A good question. I think a feature flag, since there will be a lot of similarities. Only the functions that derive index names should be different; maybe we could resolve that function in the constructor.

My main blocker here is how to get old index name after rollover and put it into the read alias. I will have to play with the curator.

@zdicesare
Contributor

@pavolloffay Would archiving traces only be supported with ES >= 6.4.4?

@pavolloffay
Member

No, it will be supported for > 5.x. 6.4.4 would just leverage is_write_index to have a single alias for both writes and reads.
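A trivial sketch of that version gate; the function name is hypothetical, and the threshold is taken from the comment above:

```python
def single_alias_supported(es_version: tuple) -> bool:
    """True when one alias can serve both reads and writes via
    is_write_index (ES >= 6.4.4 per the discussion above); older
    versions need separate read and write aliases."""
    return es_version >= (6, 4, 4)
```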

@ghost ghost removed the review label Jan 22, 2019
5 participants