Elasticsearch 7 Support #831

Closed
4 tasks done
orangejulius opened this issue Oct 28, 2019 · 8 comments

@orangejulius
Member

orangejulius commented Oct 28, 2019

This issue will track support for Elasticsearch 7 in Pelias.

Most Elasticsearch upgrades require two sets of changes:

  • Base compatibility changes, often dropping use of functionality no longer supported by Elasticsearch. These changes are generally required before the new version of Elasticsearch works at all.
  • Tweaks and changes to ensure that queries return the correct results and with adequate performance. These are usually a bit more subjective and can come after initial support has been completed.

Pelias Tasks

Here's the list of breaking changes we'll need to adapt to (this list will be updated over time):

Reference links

orangejulius added a commit to pelias/docker that referenced this issue Oct 28, 2019
This is the first step in supporting Elasticsearch 7.

At this time, Pelias does not work out of the box on ES7, but with a
Docker image ready to go, we can begin testing changes for
compatibility.

This Dockerfile and config are identical to the ES6 Docker image, except
for the version change and one update to the
`elasticsearch.yml`:

In ES7, the bulk thread pool is removed, and both bulk and non-bulk
operations go through a single
[write](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html#modules-threadpool)
thread pool.

For Pelias we have found increasing the queue size of this thread pool
is useful to ensure imports can succeed without errors, so the
configuration file has been updated accordingly.

Connects pelias/pelias#831
orangejulius added a commit to pelias/model that referenced this issue Nov 7, 2019
This is necessary for Elasticsearch 7

Connects pelias/pelias#831
@orangejulius
Member Author

With the list of changes above as of this writing, an ES7 build and an import of a few million records for the Portland Metro area work well, and querying with the latest API causes no errors.

I'm sure there's more work to do. In particular, I think at least one geo-query-related change will be required, but it looks like the core part of the ES7 upgrade is now fairly well understood! 🎉

orangejulius added a commit to pelias/schema that referenced this issue Nov 7, 2019
The first error seen when trying to use our current schema with
Elasticsearch 7 is:

```
[illegal_argument_exception] Token filter [word_delimiter] cannot be
used to parse synonyms
```

The [word delimiter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html)
token filter is only used in one place: the `peliasAdmin` analyzer.

Looking at the documentation for `word_delimiter`, it does _a lot_:
splitting words, handling punctuation, and even some basic stemming.

It feels like an extremely broad tool, and at this point like something
that Elasticsearch might deprecate in the future.

Furthermore, looking at our integration tests, it seems one of the key
reasons we used it was to tokenize on hyphens, which we have done using
the `peliasNameTokenizer` since
#375.

Considering how complicated this token filter is, and how it's now being
used with relatively little effect, it seems like something we can
remove.

Connects pelias/pelias#831
@missinglink missinglink pinned this issue Nov 22, 2019
@missinglink
Member

After merging pelias/schema#403, it's now possible to create indices on ES 6.8.5 that will be compatible with 7.4.2.

@missinglink
Member

missinglink commented Nov 27, 2019

For the adventurous among you, we have a prerelease pelias/schema branch here.
You'll find the corresponding Docker images here.

At a minimum you should ensure that you've made the following configuration changes for ES7:

  • Update pelias.json to set the correct esclient.apiVersion (7.4 at time of writing)
  • Set the schema.typeName property to _doc in pelias.json (note the underscore!)
  • Update docker-compose.yml to set the correct services.elasticsearch.image (pelias/elasticsearch:7.4.2 at time of writing)
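
As a rough illustration, the three settings in the list above could be applied to already-parsed copies of `pelias.json` and `docker-compose.yml` as sketched below. The key paths and values come from the list; the function name `apply_es7_settings` and the minimal input dictionaries are hypothetical, and real files will contain many other keys.

```python
import copy
import json


def apply_es7_settings(pelias_config, compose_config):
    """Return copies of both configs with the ES7 settings from the list applied.

    Both arguments are plain dicts, as produced by parsing pelias.json and
    docker-compose.yml. The function name and structure are illustrative only.
    """
    pelias = copy.deepcopy(pelias_config)
    compose = copy.deepcopy(compose_config)

    # esclient.apiVersion: "7.4" at the time the comment was written
    pelias.setdefault("esclient", {})["apiVersion"] = "7.4"
    # schema.typeName: must be "_doc" (with the leading underscore) for ES7
    pelias.setdefault("schema", {})["typeName"] = "_doc"
    # services.elasticsearch.image: the ES7 Docker image named in the comment
    services = compose.setdefault("services", {})
    services.setdefault("elasticsearch", {})["image"] = "pelias/elasticsearch:7.4.2"

    return pelias, compose


# Hypothetical minimal inputs, standing in for the parsed files
pelias, compose = apply_es7_settings(
    {"esclient": {}, "schema": {"typeName": "doc"}},
    {"services": {"elasticsearch": {}}},
)
print(json.dumps(pelias, indent=2))
print(json.dumps(compose, indent=2))
```

The inputs are copied rather than modified in place, so the original configuration stays available for comparison.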

orangejulius added a commit to pelias/documentation that referenced this issue Jan 3, 2020
Elasticsearch 6 is now supported; we are working on ES7!

Connects pelias/pelias#719
Connects pelias/pelias#831
orangejulius added a commit to pelias/schema that referenced this issue Jan 7, 2020
In Elasticsearch 7+, the [hits count is now an
object](https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html#hits-total-now-object-search-response).

This was needed because Elasticsearch now includes a performance
improvement that allows non-exact hit counts to be used when the exact
count isn't needed.

This adds a helper to wrap around the breaking change and support either
the old or new format.

Extracted from #394
Connects pelias/pelias#831
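
A helper like the one described in the commit above might look roughly like the following. This is a minimal Python sketch, not the actual Pelias code; the function name `get_total_hits` and the sample responses are assumptions made for illustration.

```python
def get_total_hits(response):
    """Return the total hit count from a search response as an int.

    Handles both response shapes:
      - ES6 and earlier: response["hits"]["total"] is a number
      - ES7 and later:   response["hits"]["total"] is an object such as
                         {"value": 42, "relation": "eq"}
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


# Hypothetical, trimmed-down responses in the old and new formats
es6_response = {"hits": {"total": 42, "hits": []}}
es7_response = {"hits": {"total": {"value": 42, "relation": "eq"}, "hits": []}}

print(get_total_hits(es6_response), get_total_hits(es7_response))
```

Note that in the ES7 format a `relation` of `gte` means `value` is only a lower bound, so a caller that needs an exact count would have to check that field as well.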
orangejulius added a commit to pelias/schema that referenced this issue Jan 7, 2020
BREAKING CHANGE

This drops Elasticsearch 5 from the test matrix for this repo. While
it makes no other changes, it's a breaking change as it marks the point
where we stop supporting ES5.

This allows us to make changes that support only ES6/7 as we add support
for Elasticsearch 7.

Connects pelias/pelias#831
orangejulius added a commit to pelias/config that referenced this issue Jan 7, 2020
Now that we have dropped support for ES5, we can change the default type
name from `doc` to `_doc`. Either setting is compatible with ES6, but
only `_doc` is compatible with ES7.

Connects pelias/pelias#831
orangejulius added a commit to pelias/schema that referenced this issue Jan 8, 2020
This drops support for testing the schema with the Elasticsearch type
name set to `doc`. This was only needed to support Elasticsearch 5. In
order to support Elasticsearch 7, we'll no longer be supporting ES5.

Connects pelias/pelias#831
orangejulius added a commit to pelias/schema that referenced this issue Jan 8, 2020
This adds Elasticsearch 7.5.1 to the CI test matrix. Until we merge
complete support for ES7, this CI run is allowed to fail without failing
the entire build.

Connects pelias/pelias#831
orangejulius added a commit to pelias/schema that referenced this issue Jan 8, 2020
In ES7, mapping types are deprecated, and by default a type name can no
longer be specified in a mapping. ES6 can emulate this behavior by
setting the `include_type_name` parameter to `false` when creating and
fetching mappings.

This PR sets that parameter so that our mapping format is compatible
with ES6, while using the ES7 preferred format.

In the future, when we wish to drop support for ES6, we'll only have to
stop using the `include_type_name` configuration option.

Connects pelias/pelias#831
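
To make the difference between the two mapping formats concrete, here is a small Python sketch that only builds a description of the create-index request and makes no HTTP call. The function name, the index name `pelias`, and the single `name` field are hypothetical; only the `include_type_name` parameter and the `_doc` type name come from this thread.

```python
def build_create_index_request(index_name, properties, type_name="_doc", typeless=True):
    """Describe a create-index request as a plain dict (no HTTP call is made).

    typeless=True  -> sets include_type_name=false and puts `properties`
                      directly under `mappings` (the ES7 preferred format).
    typeless=False -> nests `properties` under the type name, the format that
                      ES6 expects by default.
    """
    if typeless:
        params = {"include_type_name": "false"}
        mappings = {"properties": properties}
    else:
        params = {}
        mappings = {type_name: {"properties": properties}}

    return {
        "method": "PUT",
        "path": "/" + index_name,
        "params": params,
        "body": {"mappings": mappings},
    }


# Hypothetical index name and field, for illustration only
request = build_create_index_request("pelias", {"name": {"type": "text"}})
print(request)
```

With this shape, dropping ES6 support later would only mean removing the `include_type_name` entry from `params`, which matches the plan described in the commit message.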
orangejulius pushed a commit to pelias/schema that referenced this issue Jan 8, 2020
The `_all` field was deprecated in Elasticsearch 6 and completely
removed in [Elasticsearch
7](https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html#all-meta-field-removed).

Pelias has disabled this field for quite some time, however now that we
have dropped support for ES5, we can remove this configuration option.
This also moves us towards supporting ES7!

Connects pelias/pelias#831
orangejulius added a commit to pelias/geonames that referenced this issue Jan 8, 2020
orangejulius added a commit to pelias/csv-importer that referenced this issue Jan 8, 2020
As of `pelias-config-4.8.0` we are now using the new Elasticsearch 7
compatible default document type name: `_doc`.

Now that we have dropped support for ES5, we want to ensure this value
is the default going forward.

Connects pelias/config#122
Connects pelias/pelias#831
orangejulius added a commit to pelias/docker that referenced this issue Jan 8, 2020
orangejulius added a commit to pelias/docker that referenced this issue Jan 8, 2020
Now that we have merged pelias/config#122 and
associated PRs to add Elasticsearch 7 support to Pelias master branches,
we no longer need to override the `typeName` config option in project
`pelias.json` configuration.

This reverts commit acc4202.

Connects pelias/pelias#831
orangejulius added a commit to pelias/docker that referenced this issue Jan 8, 2020
@orangejulius
Member Author

As of today, Elasticsearch 7 support has been merged to the master branches of all relevant Pelias repositories.

We'll follow up with additional changes or improvements as needed, but in our testing so far, ES7 appears to perform well.

We'll also soon start rolling out ES7 as the default to many of the regional projects in the Pelias Docker project.

orangejulius added a commit to pelias/documentation that referenced this issue Jan 10, 2020
orangejulius added a commit to pelias/dashboard that referenced this issue Jan 15, 2020
This should support at least both ES6 and ES7 counts.

Connects pelias/pelias#831
orangejulius added a commit to pelias/docker that referenced this issue Jan 18, 2020
orangejulius added a commit to pelias/documentation that referenced this issue Jan 18, 2020
@orangejulius
Member Author

I think this is done now! 🎉

In our testing ES7 performs well compared to ES6. There are no changes in query behavior and performance is the same or slightly better.

Please reach out to us if you find any ES7 related bugs! ES6 will continue to be supported for some time, but ES7 is now the recommended version!

@mihneadb

@orangejulius I was trying to figure out if updating to a newer version of Pelias should imply data changes in ES and couldn't find this in any release notes / doc page. Is that a thing? And do you think maybe there should be such a page? I can add it to the docs repo via a PR (just need some tips re: its contents).
Thanks!

@orangejulius
Member Author

@mihneadb good point.

We definitely recommend starting fresh when updating to a new version, but you can also rely on general Elasticsearch compatibility: Elasticsearch can usually read indices created in the previous major version.

In our experience, using an index created in ES5 with ES6 led to performance issues, but using an index created in ES6 with ES7 has worked fine.
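
The compatibility rule of thumb above can be written down as a one-line check. This is only a sketch of the general rule as stated here (same major version, or one major version back); the function name is made up, and it says nothing about performance, which is why starting fresh is still the recommendation.

```python
def can_read_index(index_created_major, running_major):
    """True if an index created under one major version is readable by another,
    under the "current or one prior major version" rule described above."""
    return running_major - 1 <= index_created_major <= running_major


print(can_read_index(6, 7))  # ES6 index on ES7
print(can_read_index(5, 7))  # ES5 index on ES7
```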

orangejulius added a commit to pelias/documentation that referenced this issue Mar 13, 2020
This has been the case for some time.

Connects pelias/pelias#831
@orangejulius
Member Author

Just in case anyone comes to this issue looking, Elasticsearch 7 is currently not only the highest supported, but also the recommended version of Elasticsearch to use.

Until Elasticsearch 8 comes out and we add support for it, that will continue to be the case.

orangejulius added a commit to pelias/model that referenced this issue May 20, 2020
This is necessary for Elasticsearch 7

Connects pelias/pelias#831
orangejulius added a commit to pelias/schema that referenced this issue May 20, 2020
@orangejulius orangejulius unpinned this issue May 26, 2020
jsvrcek added a commit to EventKit/pelias-config that referenced this issue Sep 11, 2020
* feat(elasticsearch): Default to `_doc` as type name for ES7 support

Now that we have dropped support for ES5, we can change the default type
name from `doc` to `_doc`. Either setting is compatible with ES6, but
only `_doc` is compatible with ES7.

Connects pelias/pelias#831

* chore(CI): Remove deprecated `matrix` section

Connects pelias/pelias#850

* feat(config): Default `whosonfirst.importPostalcodes` to true

These take very little additional space, and are quite useful.

We should have enabled this a long time ago.

Closes pelias#61

* fix(esclient): default esclient.apiVersion to 7.x

* feat: remove `imports.whosonfirst.importVenues`

* feat(Node.js): Drop support for Node.js 8

Node.js 8 is no longer supported as it reached [end of
life](https://github.com/nodejs/Release#release-schedule) at the end of 2019.

Connects pelias/pelias#837

* feat: Enable Postal Cities by default

For quite a while now we've had a solution to the "Postal Cities
problem" (pelias/pelias#396), but it was
disabled by default.

Enough time has passed that it should probably be enabled.

Closes pelias/pelias#396

* fix(get): support for lodash get defaultValue

* removed auth

* fix syntax

Co-authored-by: Julian Simioni
Co-authored-by: missinglink
Co-authored-by: Joxit
calpb pushed a commit to sorelle/docker that referenced this issue Mar 29, 2021
calpb pushed a commit to sorelle/docker that referenced this issue Mar 29, 2021
calpb pushed a commit to sorelle/docker that referenced this issue Mar 29, 2021