Support archive traces for ES storage #1197

pavolloffay · 2018-11-16T14:37:51Z

Resolves #818
Related to #1242

This adds the implementation of archive storage for ES. There are two possible configurations how to use this feature:

the default - it will store and read archived traces from jaeger-archive-span index.
using rollover API - it writers spans to an alias jaeger-span-archive-write and reads from an alias jaeger-span-archive-read. This deployment requires an external component that performs index managent: creating index, creating alias, callig rollover API and managing max-span-age by removing indices from read alias. This component is esRollover.py available as jaeger-es-rollover docker image.

As the default index name does not collide with aliases used by rollover users can seameasly migrate from default deployment to rollover by putting jaeger-archive-span index into the read alias.

Default use case

It does not require any configuration just deploying jaeger.

Rollover use case

It requires to create index jaeger-span-archive-000001 and aliases jaeger-span-archive-read, jaeger-span-archive-write.

To enable rollover the following switch has to be enabled

--es-archive.use-aliases                                    Use read and write aliases for indices. Use this option with Elasticsearch rollover API. It requires an external component to create aliases before startup and then performing its management. Note that es-archive.max-span-age is not taken into the account and has to be substituted by external component managing read alias. (default false)

Initialization

Initialize - create index and aliases

ARCHIVE=true UNIT=seconds python3 esRollover.py init localhost:9200

Rollover

Now we can start jaeger and archive traces. The next step is to rollover to new write index:

ARCHIVE=true CONDITIONS='{"max_age": "1s"}'  python3 esRollover.py  rollover localhost:9200

I am using max age 1s to trigger the rollover immmediately. This step can be repeated multiple times, The read alias should always point to all created indices, whereas write alias only to the last one.

Managing max-span-age and delete old indices

The last step is to remove old indices from read alias - to mimic max-span-age.

ARCHIVE=true  UNIT=minutes UNIT_COUNT=2  python3 esRollover.py  lookback localhost:9200

We can also delete the old indices entirely by running es-index-cleaner

ARCHIVE=true python3 esCleaner.py 1 localhost:9200

Note that it supports only days at unit.

Other

Indices: http://localhost:9200/_cat/indices
Aliases: http://localhost:9200/_cat/aliases

UI config to enable archive tab: --query.ui-config ui-config.json

{
    "archiveEnabled": true
}

ES 6.4.4 supports is_write_index - it means that a single alias can be used for reads and writes (it points to multiple indices from which only one is write). We can add support later by using a different flag.

The following command can be used to rollover and put the new index to read alias.

curl -ivX PUT -H "Content-Type: application/json" localhost:9200/jaeger-span-archive-000001 -d '{
  "aliases": {
    "jaeger-span-archive-write": {"is_write_index": true},
    "jaeger-span-archive-read": {}
  }
}'

Note that all actions provided by esRollover.py can be written declarativelly as yml files for curator. Here is a working examlple https://gist.github.com/pavolloffay/5ce3f59f0967aadd5b019be411121c6c. The problem is to support index prefix. Yaml approach is quite nice but does not satifly our requirements if we want to have some fields configurable.

codecov · 2018-11-16T14:46:53Z

Codecov Report

Merging #1197 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1197   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files         161     162    +1     
  Lines        7210    7263   +53     
======================================
+ Hits         7210    7263   +53

Impacted Files	Coverage Δ
plugin/storage/es/options.go	`100% <100%> (ø)`	⬆️
plugin/storage/es/spanstore/reader.go	`100% <100%> (ø)`	⬆️
plugin/storage/es/factory.go	`100% <100%> (ø)`	⬆️
plugin/storage/es/spanstore/writer.go	`100% <100%> (ø)`	⬆️
plugin/storage/es/spanstore/index_utils.go	`100% <100%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa30ce4...9832bef. Read the comment docs.

plugin/storage/es/factory.go

plugin/storage/es/spanstore/writer.go

pavolloffay · 2018-12-05T23:17:47Z

This is ready for review @jaegertracing/elasticsearch @yurishkuro @objectiser

Please read the first comment. We could also automatically default to write and read aliases and do not support no ttl use case.

The question is also who should create aliases. We could embed it in Jaeger, although I am not sure about that approach.

plugin/storage/es/spanstore/index_utils.go

yurishkuro

@pavolloffay my one high level concern is this: the Cassandra storage (reader/writer) have no knowledge of whether they are operating on the main or archive keyspace, which results in a lot less code and special cases. Do you think the same is possible to achieve with ES, i.e. by parameterizing the reader/writer? I realize that because ES has no notion of the key spaces, there needs to be some variation, but couldn't it be achieved by simply passing a few config params like different index name patterns, instead of explicitly making reader/writer aware of "archive" notion and having branching behavior? What are the fundamental differences between the main and archive flows in ES?

pkg/es/config/config.go

pavolloffay · 2019-01-03T15:01:21Z

Do you think the same is possible to achieve with ES, i.e. by parameterizing the reader/writer?

That is what I wanted to achieve here too. The archive parameter is not present in reader/writer methods. it's only passed to the constructor which changes how indices are resolved.

plugin/storage/es/spanstore/reader.go

yurishkuro · 2019-01-04T12:58:51Z

Do you think the same is possible to achieve with ES, i.e. by parameterizing the reader/writer?

That is what I wanted to achieve here too. The archive parameter is not present in reader/writer methods. it's only passed to the constructor which changes how indices are resolved.

That's exactly what concerns me: imho nothing in plugin/storage/es/* should have any references to archive/non-archive names, we're only half-way there. The storage can support two modes: daily indices and index aliases. The distinction between archive/non-archive ideally should be consolidated in one place, and the outcome of that one place is a config object that says these are the names/patterns and the daily vs. alias mode. Currently too many places in the code know about archive/non-archive.

Another high-level question: is it possible to transparently support the daily-index mode via index-alias mode (I assume not)?

masteinhauser · 2019-01-04T13:33:01Z

is it possible to transparently support the daily-index mode via index-alias mode (I assume not)?

This is possible, at least for reads, via the curator and assigning multiple indices to the same alias.

For the example above, a glob pattern can also be used to associate an alias to more than one index that share a common name:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "test*", "alias" : "all_test_indices" } }
    ]
}

In this case, the alias is a point-in-time alias that will group all current indices that match, it will not automatically update as new indices that match this pattern are added/removed.

Again, Aliases work well (and transparently) only for read operations:

It is an error to index to an alias which points to more than one index.
However, I believe @pavolloffay mentioned (perhaps elsewhere) of specifying the single write-index to use while still behind a single alias.

Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

yurishkuro · 2019-01-04T14:00:08Z

It would be nice if jaeger collector/ingester would perform the management of the write indices. It already creates new daily index opportunistically. We have a simple master election implementation for Cassandra used by adaptive sampling, can probably extend it to ES to pick one of the nodes to perform index rollover.

pavolloffay · 2019-01-10T16:52:03Z

@yurishkuro what do you mean by:

It would be nice if jaeger collector/ingester would perform the management of the write indices

Do you mean to create aliases if the rollover is being used? I think that is the only index functionality we could manage from jaeger. The other API that is necessary to call is localhost:9200/jaeger-span-archive/_rollover, however it does not make sense to call it from collector because:

the body accepts various conditions - it does not make sense to pollute our API to accept it
the call can fail and the component calling it should act accordingly - this is more of operator's job.

We will have to provide a component to create aliases and call rollover API. The first comment contains the curl calls. If we embed the alias creation to the collector it limits the naming patterns (maybe something more). I would put the index management into a separate component it's more flexible, power users could skip it and use their configurations.

Note that the default behavior is without rollover API. What we have to solve is the migration from single index to rollover. If we will use slightly different index names for rollover we are fine users can easily migrate by putting old index to read alias. The other option is to use reindex API to change archive index name and then create alias with the original name. However reindex copies data which takes time so there will be downtime.

yurishkuro · 2019-01-11T01:39:31Z

Do you mean to create aliases if the rollover is being used?

Yes, it's what (I think) I meant. Sorry, too many nuances going on here.

+1 on having an external mgmt component outside of collectors. However, are you sure it would work with daily indices, even if we normalize them with aliases? Today we create a new index when we see a span with the timestamp for the next day, right? Mgmt component won't have that info.

Also, is es curator difficult to set up? If not, what is the value of writing our own? If we did mgmt out of collectors it's attractive because there are fewer things to deploy (btw totally agree with/ your point about the API for such thing).

pavolloffay · 2019-01-11T10:32:44Z

However, are you sure it would work with daily indices, even if we normalize them with aliases?

The alias does not contain date it would be simply an alias jaeger-span. This alias would point to an active write index jaeger-span-00001. The external component would call rollover to create a new write index if some conditions are valid eg. size, date or number of shards. This also solves the current problem when writer creates index too far in the future #841.

pavolloffay · 2019-01-15T16:46:27Z

@jaegertracing/elasticsearch @yurishkuro @jpkrohling @objectiser could you please review?

While doing the review could you please deploy multiple jaeger instances to single ES and especially verify that index prefix works as expected.

plugin/storage/es/factory.go

objectiser

LGTM - although would be good to get feedback from ES users to make sure the approach meets their requirements, both for the archive indices but also in future for the main indices.

pavolloffay · 2019-01-17T16:31:58Z

@jaegertracing/elasticsearch it would be great if any of you guys could have a look and briefly comment on our approach.

pavolloffay · 2019-01-17T16:46:10Z

It's important to mention that the similar approach would be used for the main indices when using rollover.

pavolloffay · 2019-01-22T08:18:25Z

I think we should go forward and merge this PR. It will be available in latest images so people can try. If nobody objects I will merge it today EOD.

Signed-off-by: Pavol Loffay <[email protected]>

ledor473 · 2019-01-31T18:38:29Z

plugin/storage/es/factory.go

-		TagKeysAsFields:   tags,
-		TagDotReplacement: f.primaryConfig.GetTagDotReplacement(),
-	}), nil
+	return createSpanWriter(f.metricsFactory, f.logger, f.archiveClient, f.primaryConfig, false)


@pavolloffay I think the client here should be f.primaryClient and not f.archiveClient

pavolloffay requested review from black-adder, jpkrohling, objectiser, tiffon, vprithvi and yurishkuro as code owners November 16, 2018 14:37

ghost assigned pavolloffay Nov 16, 2018

ghost added the review label Nov 16, 2018

yurishkuro reviewed Nov 16, 2018

View reviewed changes

plugin/storage/es/factory.go Show resolved Hide resolved

yurishkuro reviewed Nov 16, 2018

View reviewed changes

plugin/storage/es/spanstore/writer.go Outdated Show resolved Hide resolved

pavolloffay force-pushed the es-archive-traces branch from afc261c to 48e2095 Compare December 5, 2018 08:55

pavolloffay changed the title ~~WIP: Support archive traces for ES storage~~ Support archive traces for ES storage Dec 5, 2018

yurishkuro reviewed Dec 28, 2018

View reviewed changes

plugin/storage/es/spanstore/index_utils.go Outdated Show resolved Hide resolved

yurishkuro reviewed Dec 28, 2018

View reviewed changes

pkg/es/config/config.go Outdated Show resolved Hide resolved

pavolloffay force-pushed the es-archive-traces branch from bfd2dbe to 83b34b7 Compare January 3, 2019 16:20

yurishkuro reviewed Jan 4, 2019

View reviewed changes

plugin/storage/es/spanstore/reader.go Outdated Show resolved Hide resolved

pavolloffay force-pushed the es-archive-traces branch 2 times, most recently from ee71f61 to 02fd4d6 Compare January 10, 2019 13:44

pavolloffay added the storage/elasticsearch label Jan 10, 2019

pavolloffay mentioned this pull request Jan 16, 2019

Use Elasticsearch Rollover API to manage indices #1242

Closed

This comment has been minimized.

Sign in to view

objectiser reviewed Jan 16, 2019

View reviewed changes

plugin/storage/es/factory.go Show resolved Hide resolved

This comment has been minimized.

Sign in to view

pavolloffay force-pushed the es-archive-traces branch from cfeed61 to 0809856 Compare January 17, 2019 13:13

This comment has been minimized.

Sign in to view

pavolloffay mentioned this pull request Jan 17, 2019

[tracking] Release 1.9 #1184

Closed

4 tasks

This comment has been minimized.

Sign in to view

objectiser approved these changes Jan 17, 2019

View reviewed changes

pavolloffay force-pushed the es-archive-traces branch 2 times, most recently from db8878b to e4f71f7 Compare January 18, 2019 13:30

pavolloffay added 2 commits January 22, 2019 09:28

Support archive traces for ES

6c41ff2

Signed-off-by: Pavol Loffay <[email protected]>

Change messages

9832bef

Signed-off-by: Pavol Loffay <[email protected]>

pavolloffay force-pushed the es-archive-traces branch from e4f71f7 to 9832bef Compare January 22, 2019 08:28

pavolloffay merged commit 6abb0a8 into jaegertracing:master Jan 22, 2019

ghost removed the review label Jan 22, 2019

ledor473 reviewed Jan 31, 2019

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support archive traces for ES storage #1197

Support archive traces for ES storage #1197

pavolloffay commented Nov 16, 2018 •

edited

Loading

codecov bot commented Nov 16, 2018 •

edited

Loading

pavolloffay commented Dec 5, 2018

yurishkuro left a comment

pavolloffay commented Jan 3, 2019

yurishkuro commented Jan 4, 2019

masteinhauser commented Jan 4, 2019

yurishkuro commented Jan 4, 2019

pavolloffay commented Jan 10, 2019

yurishkuro commented Jan 11, 2019

pavolloffay commented Jan 11, 2019

pavolloffay commented Jan 15, 2019

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

objectiser left a comment

pavolloffay commented Jan 17, 2019

pavolloffay commented Jan 17, 2019

pavolloffay commented Jan 22, 2019

ledor473 Jan 31, 2019

Support archive traces for ES storage #1197

Support archive traces for ES storage #1197

Conversation

pavolloffay commented Nov 16, 2018 • edited Loading

Default use case

Rollover use case

Initialization

Rollover

Managing max-span-age and delete old indices

Other

codecov bot commented Nov 16, 2018 • edited Loading

Codecov Report

pavolloffay commented Dec 5, 2018

yurishkuro left a comment

Choose a reason for hiding this comment

pavolloffay commented Jan 3, 2019

yurishkuro commented Jan 4, 2019

masteinhauser commented Jan 4, 2019

yurishkuro commented Jan 4, 2019

pavolloffay commented Jan 10, 2019

yurishkuro commented Jan 11, 2019

pavolloffay commented Jan 11, 2019

pavolloffay commented Jan 15, 2019

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

objectiser left a comment

Choose a reason for hiding this comment

pavolloffay commented Jan 17, 2019

pavolloffay commented Jan 17, 2019

pavolloffay commented Jan 22, 2019

ledor473 Jan 31, 2019

Choose a reason for hiding this comment

pavolloffay commented Nov 16, 2018 •

edited

Loading

codecov bot commented Nov 16, 2018 •

edited

Loading