Add Elasticsearch 7 support #1690

Merged
merged 21 commits into from
Aug 22, 2019

Conversation

gregoryfranklin
Contributor

@gregoryfranklin gregoryfranklin commented Jul 25, 2019

Adds support for Elasticsearch 7.x

Resolves #1474
Depends on olivere/elastic#1146

  • Updates github.com/olivere/elastic to 6.2.21
  • Removes the deprecated default field from mappings
  • Replaces document types with '_doc' as a transition step to removing them entirely
  • Sets include_type_name for compatibility between elasticsearch 6.x and 7.x

rest_total_hits_as_int support is also required for compatibility between elasticsearch 6.x and 7.x. This is waiting for the feature to be merged in github.com/olivere/elastic, see olivere/elastic#1146.

@gregoryfranklin gregoryfranklin changed the title Elasticsearch 7 Support WIP: Elasticsearch 7 Support Jul 25, 2019
@gregoryfranklin gregoryfranklin force-pushed the elasticsearch_7 branch 3 times, most recently from d646855 to 1f1b304 Compare August 5, 2019 14:44
@codecov

codecov bot commented Aug 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@ecdecd1).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master    #1690   +/-   ##
=========================================
  Coverage          ?   98.22%           
=========================================
  Files             ?      195           
  Lines             ?     9529           
  Branches          ?        0           
=========================================
  Hits              ?     9360           
  Misses            ?      134           
  Partials          ?       35
Impacted Files Coverage Δ
plugin/storage/es/spanstore/reader.go 100% <ø> (ø)
plugin/storage/es/spanstore/service_operation.go 100% <ø> (ø)
plugin/storage/es/factory.go 100% <100%> (ø)
plugin/storage/es/dependencystore/schema.go 100% <100%> (ø)
plugin/storage/es/dependencystore/storage.go 85.71% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ecdecd1...22a2332.

@pavolloffay
Member

pavolloffay commented Aug 13, 2019

Elasticsearch 7 documentation

I have tested

Data migration from ES6 to ES7

Elasticsearch can read indices created in the previous major version. If you have indices created in 5.x or before, you must reindex or delete them before upgrading to 7.2.1. Elasticsearch nodes will fail to start if incompatible indices are present. Snapshots of 5.x or earlier indices cannot be restored to a 7.x cluster even if they were created by a 6.x cluster.

I was able to migrate data from ES 5.x up to ES 7.x. I did the following

  1. run ES 5.x and store data
  2. run ES 6.8 (this does not require any migration)
  3. change the index templates to be ES 7.x compatible. Before running the commands below, specify the number of shards (number of nodes) and replicas (0).
curl -ivX POST -H "Content-Type: application/json" http://localhost:9200/_template/jaeger-span -d @./plugin/storage/es/mappings/jaeger-span-7.json
curl -ivX POST -H "Content-Type: application/json" http://localhost:9200/_template/jaeger-service -d @./plugin/storage/es/mappings/jaeger-service-7.json
  4. reindex all span and service indices to new indices with the correct mapping. The new indices will have the suffix -1.
curl -ivX POST -H "Content-Type: application/json" http://localhost:9200/_reindex -d @reindex.json
{
  "source": {
    "index": "jaeger-span-*"
  },
  "dest": {
    "index": "jaeger-span"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'jaeger-span-' + (ctx._index.substring('jaeger-span-'.length(), ctx._index.length())) + '-1'"
  }
}
  5. delete the old indices, excluding the -1 indices from deletion
curl -ivX DELETE -H "Content-Type: application/json" http://localhost:9200/jaeger-span-\*,-\*-1
curl -ivX DELETE -H "Content-Type: application/json" http://localhost:9200/jaeger-service-\*,-\*-1
  6. create new indices with the -1 suffix removed. Run the same for the service indices.
curl -ivX POST -H "Content-Type: application/json" http://localhost:9200/_reindex -d @reindex.json
{
  "source": {
    "index": "jaeger-span-*"
  },
  "dest": {
    "index": "jaeger-span"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'jaeger-span-' + (ctx._index.substring('jaeger-span-'.length(), ctx._index.length() - 2))"
  }
}
  7. remove the -1 indices
curl -ivX DELETE -H "Content-Type: application/json" http://localhost:9200/jaeger-span-\*-1
curl -ivX DELETE -H "Content-Type: application/json" http://localhost:9200/jaeger-service-\*-1
  8. Stop ES6 and run ES7
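
The two rename scripts in the steps above can also be checked outside Elasticsearch as plain string functions, which makes it easy to see which index names they produce before running `_reindex`. The following is a minimal Python sketch, not part of Jaeger: the function names and the sample index name `jaeger-span-2019-08-13` are illustrative assumptions.

```python
SPAN_PREFIX = "jaeger-span-"


def add_temp_suffix(index, prefix=SPAN_PREFIX):
    """Mirror the first Painless script: keep the part after the prefix, append '-1'."""
    return prefix + index[len(prefix):] + "-1"


def strip_temp_suffix(index, prefix=SPAN_PREFIX):
    """Mirror the second Painless script: drop the last two characters ('-1')."""
    return prefix + index[len(prefix):len(index) - 2]


def reindex_body(pattern, painless_source):
    """Build a request body shaped like the reindex.json files shown above."""
    return {
        "source": {"index": pattern},
        "dest": {"index": "jaeger-span"},
        "script": {"lang": "painless", "source": painless_source},
    }


# Hypothetical daily index name, used only for illustration.
temp_name = add_temp_suffix("jaeger-span-2019-08-13")
final_name = strip_temp_suffix(temp_name)
```

Applying the two functions in sequence returns the original index name, which is the round trip the migration relies on.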

Mapping changes:

Other changes which affect us

One span can have only 10000 tags and logs (sum) by default

To safeguard against out of memory errors, the number of nested json objects within a single document across all fields has been limited to 10000. This default limit can be changed with the index setting index.mapping.nested_objects.limit. https://www.elastic.co/guide/en/elasticsearch/reference/7.x/breaking-changes-7.0.html#limit-number-nested-json-objects
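
As a rough illustration of the limit, the sketch below counts tags and logs on a span and compares the sum with the default. It assumes a span is a plain dict with `tags` and `logs` lists, which is a simplification made for this example. Whether nested log fields also count towards the limit is not covered here.

```python
DEFAULT_NESTED_OBJECTS_LIMIT = 10000  # ES 7 default quoted above


def nested_object_count(span):
    """Sum of tags and logs, as described in the note above (assumed span layout)."""
    return len(span.get("tags", [])) + len(span.get("logs", []))


def exceeds_nested_limit(span, limit=DEFAULT_NESTED_OBJECTS_LIMIT):
    """True when the span would go over the configured limit."""
    return nested_object_count(span) > limit


def nested_limit_setting(new_limit):
    """Settings fragment using the index setting named in the ES documentation."""
    return {"index.mapping.nested_objects.limit": new_limit}
```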

Warn logs in ES7

{"type": "deprecation", "timestamp": "2019-08-13T12:47:59,052+0000", "level": "WARN", "component": "o.e.d.r.a.s.RestSearchAction", "cluster.name": "docker-cluster", "node.name": "c57fe6fcb117", "cluster.uuid": "CU17MCH2QnWLBuOvP-RKXQ", "node.id": "HeFqmR2tR4SgOEBO0g1g3Q",  "message": "[types removal] Specifying types in search requests is deprecated."  }
{"type": "deprecation", "timestamp": "2019-08-13T12:47:59,605+0000", "level": "WARN", "component": "o.e.d.a.b.BulkRequestParser", "cluster.name": "docker-cluster", "node.name": "c57fe6fcb117", "cluster.uuid": "CU17MCH2QnWLBuOvP-RKXQ", "node.id": "HeFqmR2tR4SgOEBO0g1g3Q",  "message": "[types removal] Specifying types in bulk requests is deprecated."  }
{"type": "deprecation", "timestamp": "2019-08-13T12:47:42,603+0000", "level": "WARN", "component": "o.e.d.r.a.a.i.RestPutIndexTemplateAction", "cluster.name": "docker-cluster", "node.name": "c57fe6fcb117", "cluster.uuid": "CU17MCH2QnWLBuOvP-RKXQ", "node.id": "HeFqmR2tR4SgOEBO0g1g3Q",  "message": "[types removal] Specifying include_type_name in put index template requests is deprecated. The parameter will be removed in the next major version."  }

Member

@pavolloffay pavolloffay left a comment

@gregoryfranklin thanks for the nice work!

A couple of high-level comments

  • avoid the use of deprecated APIs in ES7 - see inline comments
  • keep only two sets of index mappings - one for ES7 and another for ES6 and ES5

Also the PR needs to be rebased

scripts/travis/es-integration-test.sh (outdated)
scripts/travis/es-integration-test.sh (outdated)
scripts/travis/es-integration-test.sh (outdated)
plugin/storage/integration/es_index_cleaner_test.go (outdated)
print('Creating index template {}'.format(template_name))
headers = {'Content-Type': 'application/json'}
s = get_request_session(os.getenv("ES_USERNAME"), os.getenv("ES_PASSWORD"), str2bool(os.getenv("ES_TLS", 'false')), os.getenv("ES_TLS_CA"), os.getenv("ES_TLS_CERT"), os.getenv("ES_TLS_KEY"))
r = s.put(sys.argv[2] + '/_template/' + template_name, headers=headers, data=template)
compat = '?include_type_name=true' if esVersion == '7' else ''
Member

my note:

Before 7.0.0, the mappings definition used to include a type name. Although mappings no longer contain a type name by default, you can still use the old format by setting the parameter include_type_name. For more details, please see Removal of mapping types.

Member

The include_type_name parameter in the index creation, index template, and mapping APIs will default to false. Setting the parameter at all will result in a deprecation warning.

from https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html#_schedule_for_removal_of_mapping_types

Member

Again do we need this parameter? It is already deprecated in ES7.

Member

It is recommended to make index templates typeless by re-adding them with include_type_name set to false. Under the hood, typeless templates will use the dummy type _doc when creating indices.

https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html#_index_templates

pkg/es/wrapper/wrapper.go (outdated)
return WrapESSearchService(c.client.Search(indices...))
searchService := c.client.Search(indices...)
if c.esVersion == 7 {
searchService = searchService.RestTotalHitsAsInt(true)
Member

Do we need this?

rest_total_hits_as_int or restTotalHitsAsInt | boolean - Indicates whether hits.total should be rendered as an integer or an object in the rest search response

https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html#hits-total-now-object-search-response

Contributor Author

Yes, rest_total_hits_as_int is required whilst using the v6 branch of github.com/olivere/elastic because the result from elasticsearch needs to be unmarshalled into a go struct.
The go struct that comes with github.com/olivere/elastic v6 matches the response from ES5 and ES6 but only matches ES7 if rest_total_hits_as_int is used.
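
To illustrate the difference in response shapes: ES5 and ES6 return `hits.total` as a plain integer, while ES7 returns an object with `value` and `relation` fields unless `rest_total_hits_as_int=true` is set. The sketch below is a small Python helper that accepts both shapes. It is only an illustration of the response formats, not code from Jaeger or github.com/olivere/elastic, and the function name is made up for this example.

```python
def total_hits(search_response):
    """Return the hit count from either response shape."""
    total = search_response["hits"]["total"]
    if isinstance(total, dict):
        # ES7 default: {"value": <int>, "relation": "eq" | "gte"}
        return total["value"]
    # ES5/ES6, or ES7 with rest_total_hits_as_int=true
    return total
```

A client whose response type only models the integer form cannot decode the object form, which is why the parameter is needed with the v6 client.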

pkg/es/config/config.go
pkg/es/config/config.go (outdated)
scripts/travis/es-integration-test.sh (outdated)
@pavolloffay
Member

@gregoryfranklin the PR needs to be updated and then it can go in. @jaegertracing/elasticsearch please also have a look if you are interested in ES7, or review the data migration steps I provided in #1690 (comment).

@pavolloffay
Member

pavolloffay commented Aug 14, 2019

Based on #1690 (comment) we could use the ES 7 mapping when 6.8 is detected, for a more seamless upgrade. Users would just deploy with 6.8, wait until the old indices TTL out and are removed, and then deploy with ES 7.x.

It can be done in a separate PR.

…s of elasticsearch

A convenient way to run the elasticsearch integration test is to just run ./scripts/travis/es-integration-test.sh

This change allows you to specify different elasticsearch versions

eg

ES_VERSION=5.6.12 scripts/travis/es-integration-test.sh
ES_VERSION=6.8.1 scripts/travis/es-integration-test.sh
ES_VERSION=7.2.0 scripts/travis/es-integration-test.sh

The default version to use for the tests is currently 5.6.12

Signed-off-by: Greg Franklin <[email protected]>
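
The version selection described in this commit message amounts to reading `ES_VERSION` and falling back to a default. The actual script is a shell script. The Python sketch below only mirrors that defaulting logic, and its function names are assumptions made for this example.

```python
import os

DEFAULT_ES_VERSION = "5.6.12"  # default stated in the commit message above


def es_version(environ=None):
    """Return ES_VERSION from the environment, or the default when unset or empty."""
    environ = os.environ if environ is None else environ
    return environ.get("ES_VERSION") or DEFAULT_ES_VERSION


def major_version(version):
    """Extract the major version number, e.g. 7 from '7.2.0'."""
    return int(version.split(".")[0])
```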
Signed-off-by: Greg Franklin <[email protected]>
@gregoryfranklin
Contributor Author

I have removed include_type_name. This also meant removing types from all the search queries.

The token propagation test is currently broken. The reason for this is that Jaeger will now not start up unless it can reach elasticsearch and determine the ES version, but the token propagation test is not using a real ES server. I need to think about this a bit more to find a solution.

@pavolloffay
Member

The token propagation test is currently broken. The reason for this is that Jaeger will now not start up unless it can reach elasticsearch and determine the ES version, but the token propagation test is not using a real ES server. I need to think about this a bit more to find a solution.

We are using a mocked ES server. Just create an endpoint which returns ES version.
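
A mock endpoint of the kind suggested above only has to return a document containing a version number. The sketch below shows the idea with Python's standard library. The real test uses a Go mock, so the handler, the helper names and the reported version `7.2.0` are all assumptions for illustration. The response shape follows the Elasticsearch root endpoint, which reports the version under `version.number`.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockESHandler(BaseHTTPRequestHandler):
    """Answers every GET with a minimal Elasticsearch-style root document."""

    es_version = "7.2.0"  # hypothetical version reported by the mock

    def do_GET(self):
        body = json.dumps({"version": {"number": self.es_version}}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def start_mock_es(port=0):
    """Start the mock on 127.0.0.1; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), MockESHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_version(base_url):
    """Read the version number the way a client would during start-up."""
    with urllib.request.urlopen(base_url) as resp:
        return json.load(resp)["version"]["number"]
```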

Without IncludeTypeName, ES will use the type "_doc" for all documents.

Type is removed from all search queries so that it queries all types
whether they be "_doc" or (eg) "span".

Type should not be used when indexing documents on ES7 (so that the
default "_doc" type is used).  It should, however, be used on ES5 and ES6
so that the type matches that described by the mapping.

Signed-off-by: Greg Franklin <[email protected]>
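
The rule in this commit message can be pictured as a choice of REST path per Elasticsearch major version. Jaeger makes this choice through the olivere/elastic client rather than by building URLs by hand, so the Python sketch below is only an illustration and its function names are hypothetical.

```python
def index_document_path(index, doc_type, es_major):
    """Path for indexing one document with an auto-generated ID."""
    if es_major >= 7:
        # ES7: typeless endpoint, the default "_doc" type is used
        return "/{}/_doc".format(index)
    # ES5/ES6: the type must match the one declared in the mapping
    return "/{}/{}".format(index, doc_type)


def search_path(indices):
    """Typeless search path, so documents of any type are matched."""
    return "/{}/_search".format(",".join(indices))
```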
Signed-off-by: Greg Franklin <[email protected]>
Signed-off-by: Greg Franklin <[email protected]>
The query service needs to detect the elasticsearch version to start up.

Elasticsearch must therefore be started before the query service and must
return a version number in response to a ping request

Signed-off-by: Greg Franklin <[email protected]>
plugin/storage/es/dependencystore/schema.go:37:9: if block ends with a return statement, so drop this else and outdent its block

Signed-off-by: Greg Franklin <[email protected]>
@pavolloffay
Member

@gregoryfranklin could you please merge mappings for ES 6 and ES 5 to a single file and just add a new mapping for ES 7. The legacy mapping works well for ES5 and ES6. It seems unnecessary to create duplicate files.

Member

@pavolloffay pavolloffay left a comment

LGTM, we are getting close to the merge.

Could you please revert the change to token propagation test? And also use a single mapping for dependencies if the old mapping works?

time.Sleep(100 * time.Millisecond)

// Path relative to plugin/storage/integration/token_propagation_test.go
cmd := exec.Command("../../../cmd/query/query-linux", "--es.server-urls=http://127.0.0.1:9200", "--es.tls=false", "--query.bearer-token-propagation=true")
Member

Could you please revert this. It's better to run the query from outside the test.

Contributor Author

The query service must be started after the elasticsearch server because the version check is done when the client is initialised (ie on startup). If you start the query service before running the test, as it was before, the query service exits with an error.

pkg/es/config/config.go (outdated)
@@ -22,3 +22,17 @@ const dependenciesMapping = `{
"` + dependencyType + `":{}
}
}`

const dependenciesMapping7 = `{
Member

Do we need this? The legacy mapping should work with the ES7. I think you added this after I tested this and it worked without this change.

If it works please revert this.

Contributor Author

The legacy mapping does not work with ES7 because the legacy mapping includes the type name. Type names are not permitted in ES7 unless you use the include_type_name parameter, which we removed because it is deprecated.
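
The difference between the two mapping shapes is one level of nesting: the legacy form keys the mapping body by type name, while the ES7 form places the body directly under `mappings`. The Python sketch below converts one shape into the other. It is an illustration only, and the type name `dependencies` in the usage is a hypothetical stand-in for the value of `dependencyType`.

```python
import copy


def to_typeless(template, type_name):
    """Return a copy of the template with the type-name level removed."""
    result = copy.deepcopy(template)
    mappings = result.get("mappings", {})
    if type_name in mappings:
        # Lift the body that sat under the type name up one level
        result["mappings"] = mappings[type_name]
    return result


legacy = {"mappings": {"dependencies": {}}}  # hypothetical type name
typeless = to_typeless(legacy, "dependencies")
```

The input is left unchanged, so both shapes stay available when different Elasticsearch versions need different templates.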

@pavolloffay
Member

I think this looks pretty good, I will do final testing and merge tomorrow.

One thing I don't like is the change to token propagation tests. To fix that we should allow starting Jaeger if the Elasticsearch version cannot be retrieved and just use one template.

@pavolloffay pavolloffay changed the title WIP: Elasticsearch 7 Support Add Elasticsearch 7 Support Aug 22, 2019
@pavolloffay pavolloffay changed the title Add Elasticsearch 7 Support Add Elasticsearch 7 support Aug 22, 2019
@pavolloffay pavolloffay merged commit 11d7631 into jaegertracing:master Aug 22, 2019
pavolloffay pushed a commit to pavolloffay/jaeger that referenced this pull request Aug 22, 2019
* Make it possible to run es-integration-test against different versions of elasticsearch

A convenient way to run the elasticsearch integration test is to just run ./scripts/travis/es-integration-test.sh

This change allows you to specify different elasticsearch versions

eg

ES_VERSION=5.6.12 scripts/travis/es-integration-test.sh
ES_VERSION=6.8.1 scripts/travis/es-integration-test.sh
ES_VERSION=7.2.0 scripts/travis/es-integration-test.sh

The default version to use for the tests is currently 5.6.12

Signed-off-by: Greg Franklin <[email protected]>

* Update github.com/olivere/elastic to 6.2.21

Signed-off-by: Greg Franklin <[email protected]>

* Migrate to a single '_doc' elasticsearch document type

Signed-off-by: Greg Franklin <[email protected]>

* Add IncludeTypeName for compatibility with elasticsearch 7.x

Signed-off-by: Greg Franklin <[email protected]>

* Remove deprecated __default__ field from es mappings

Signed-off-by: Greg Franklin <[email protected]>

* Rebase on master.  Fix for ES5

Signed-off-by: Greg Franklin <[email protected]>

* Update esRollover.py to use per-version templates

Signed-off-by: Greg Franklin <[email protected]>

* Update github.com/olivere/elastic to 6.2.22 and use RestTotalHitsAsInt against ES7

Signed-off-by: Greg Franklin <[email protected]>

* esRollover.py should set '?include_type_name=true' when creating templates for ES7

Signed-off-by: Greg Franklin <[email protected]>

* Check loading of all versions of the ES mappings

Signed-off-by: Greg Franklin <[email protected]>

* Run es-integration-test.sh against ES5, ES6 and ES7

Signed-off-by: Greg Franklin <[email protected]>

* Run 'make fmt' to add license text to mocks

Signed-off-by: Greg Franklin <[email protected]>

* Update elasticsearch versions used for integration tests (and add --rm to docker commands)

Signed-off-by: Greg Franklin <[email protected]>

* Do not use IncludeTypeName on ES7

Without IncludeTypeName, ES will use the type "_doc" for all documents.

Type is removed from all search queries so that it queries all types
whether they be "_doc" or (eg) "span".

Type should not be used when indexing documents on ES7 (so that the
default "_doc" type is used).  It should, however, be used on ES5 and ES6
so that the type matches that described by the mapping.

Signed-off-by: Greg Franklin <[email protected]>

* Log elasticsearch version

Signed-off-by: Greg Franklin <[email protected]>

* Make sure we can get the elasticsearch version on servers other than localhost

Signed-off-by: Greg Franklin <[email protected]>

* Fix es dependency storage test

Signed-off-by: Greg Franklin <[email protected]>

* Fix token propagation test for elasticsearch version detection

The query service needs to detect the elasticsearch version to start up.

Elasticsearch must therefore be started before the query service and must
return a version number in response to a ping request

Signed-off-by: Greg Franklin <[email protected]>

* Fix lint failure

plugin/storage/es/dependencystore/schema.go:37:9: if block ends with a return statement, so drop this else and outdent its block

Signed-off-by: Greg Franklin <[email protected]>

* Use the same file for elasticsearch mappings on ES5 and ES6.  Only ES7 needs to be different

Signed-off-by: Greg Franklin <[email protected]>

* Change log message to use structured logging

Signed-off-by: Greg Franklin <[email protected]>
kennyaz pushed a commit to kennyaz/jaeger that referenced this pull request Jun 12, 2023
Signed-off-by: kennyaz <[email protected]>
Successfully merging this pull request may close these issues.

Support Elasticsearch 7.x