feat(peliasAdmin): Remove word delimiter filter
The first error seen when trying to use our current schema with
Elasticsearch 7 is:

```
[illegal_argument_exception] Token filter [word_delimiter] cannot be
used to parse synonyms
```
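As a rough illustration of where this error comes from, here is a small checker that scans analysis settings for a `word_delimiter` filter placed ahead of a synonym filter in the same analyzer. It assumes that this ordering is what triggers the error. Our understanding is that Elasticsearch 7 parses synonym rules using the filters that come before the synonym filter in the chain. The `findSynonymConflicts` helper and the settings object (including the `custom_admin` filter) are hypothetical and are not taken from pelias/schema.

```javascript
// Filter types assumed to be rejected when they precede a synonym filter.
// Only word_delimiter is confirmed by the error message above.
const REJECTED_BEFORE_SYNONYMS = new Set(['word_delimiter']);
const SYNONYM_TYPES = new Set(['synonym', 'synonym_graph']);

// Resolve a filter name to its type: custom filters are looked up in
// `analysis.filter`; anything else is treated as a built-in filter name.
function filterType(analysis, name) {
  const def = (analysis.filter || {})[name];
  return def ? def.type : name;
}

// Return one { analyzer, filter, synonym } entry for every rejected filter
// that appears earlier in an analyzer's chain than a synonym filter.
function findSynonymConflicts(analysis) {
  const conflicts = [];
  for (const [analyzer, def] of Object.entries(analysis.analyzer || {})) {
    const chain = def.filter || [];
    chain.forEach((synonym, i) => {
      if (!SYNONYM_TYPES.has(filterType(analysis, synonym))) return;
      for (const filter of chain.slice(0, i)) {
        if (REJECTED_BEFORE_SYNONYMS.has(filterType(analysis, filter))) {
          conflicts.push({ analyzer, filter, synonym });
        }
      }
    });
  }
  return conflicts;
}

// Hypothetical settings, loosely shaped like an analyzer with synonyms.
const analysis = {
  filter: {
    custom_admin: { type: 'synonym', synonyms: ['saint,st'] }
  },
  analyzer: {
    peliasAdmin: {
      type: 'custom',
      tokenizer: 'peliasNameTokenizer',
      filter: ['lowercase', 'word_delimiter', 'custom_admin']
    }
  }
};

// Expect one conflict: word_delimiter before custom_admin in peliasAdmin.
console.log(findSynonymConflicts(analysis));
```

Dropping `word_delimiter` from the `filter` array leaves `findSynonymConflicts` with nothing to report, which mirrors the change this commit makes to `peliasAdmin`.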

The [word delimiter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html)
token filter is only used in one place: the `peliasAdmin` analyzer.

Looking at the documentation for `word_delimiter`, it does _a lot_:
splitting words, handling punctuation, and even some basic stemming.
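To give a feel for how much `word_delimiter` does, here is a rough approximation of its default splitting behavior in plain JavaScript. It is only a sketch: `approxWordDelimiter` is a hypothetical helper, it handles ASCII only, and it ignores options such as `catenate_words`, `preserve_original` and `protected_words`. It does reproduce the documented default examples (`Wi-Fi`, `PowerShot`, `SD500`, `O'Neil's`).

```javascript
// Rough, ASCII-only approximation of word_delimiter's default rules.
// NOT the Lucene implementation; catenate_*, preserve_original and
// protected_words are not modelled.
function approxWordDelimiter(token) {
  // stem_english_possessive: drop a trailing 's
  const stripped = token.replace(/'s$/i, '');
  // split_on_non_alphanumeric: break on any run of non-alphanumerics
  const parts = stripped.split(/[^A-Za-z0-9]+/).filter(Boolean);
  // split_on_case_change / split_on_numerics: break at lower-to-upper
  // transitions, inside acronyms like "HTTPServer", and between letters and digits
  return parts.flatMap(
    (p) => p.match(/[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+/g) || []
  );
}

for (const t of ['Wi-Fi', 'PowerShot', 'SD500', "O'Neil's", '&']) {
  console.log(t, '->', approxWordDelimiter(t));
}
```

Note that a token made only of punctuation, such as `&`, produces no output at all. That may be why the integration test below changes from expecting `[]` to expecting `&` once the filter is removed.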

It is an extremely broad tool, and at this point it feels like
something Elasticsearch could deprecate in the future.

Furthermore, our integration tests suggest that one of the key reasons
we used it was to tokenize on hyphens, which the `peliasNameTokenizer`
has handled since #375.
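Here is a minimal sketch of hyphen tokenization with a `pattern` tokenizer, simulated with a JavaScript regex. The pattern below is an assumption for illustration only and is not the actual `peliasNameTokenizer` definition from #375. Elasticsearch evaluates tokenizer patterns with Java regex, so this simulation only holds for patterns that behave the same in both engines.

```javascript
// Hypothetical pattern-tokenizer settings: split on runs of whitespace,
// commas, slashes, backslashes and hyphens. The real peliasNameTokenizer
// pattern in pelias/schema may differ.
const peliasNameTokenizer = {
  type: 'pattern',
  pattern: '[\\s,/\\\\-]+'
};

// Approximate the tokenizer: split on every match of the pattern and
// drop the empty strings left by leading or trailing separators.
function simulatePatternTokenizer(tokenizer, text) {
  return text.split(new RegExp(tokenizer.pattern)).filter(Boolean);
}

console.log(simulatePatternTokenizer(peliasNameTokenizer, 'Wilkes-Barre'));
console.log(simulatePatternTokenizer(peliasNameTokenizer, 'Saint-Jean-sur-Richelieu'));
```

If hyphens are split at the tokenizer, names like `Wilkes-Barre` are already broken apart before any token filter runs, so `word_delimiter`'s splitting is largely redundant for this case.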

Considering how complicated this token filter is, and how little effect
it now has in practice, it seems like something we can remove.

Connects pelias/pelias#831
orangejulius committed May 20, 2020
1 parent adc758c commit 5701484
Showing 1 changed file with 1 addition and 1 deletion.
integration/analyzer_peliasAdmin.js

```diff
@@ -24,7 +24,7 @@ module.exports.tests.analyze = function(test, common){
 assertAnalysis( 'notnull', ' ^ ', [] );
 
 // remove punctuation (handled by the char_filter)
-assertAnalysis( 'punctuation', punctuation.all.join(''), [] );
+assertAnalysis( 'punctuation', punctuation.all.join(''), ['0:&'] );
 
 suite.run( t.end );
 });
```