Standard token filter removal causes exceptions after upgrade #50734

Removing the `standard` token filter, combined with the way the relevant factories are cached, causes exceptions when querying or indexing documents into an index created before 7.0.0.

Reproduction steps:

- Create an index in Elasticsearch 6.8.6:
```
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type":      "custom", 
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding",
            "standard"
          ]
        }
      }
    }
  }
}

POST /myindex/_mapping/_doc
{
  "properties": {
    "title": {
      "type":     "text",
      "analyzer": "my_custom_analyzer"
    }
  }
}
```

- Upgrade to 7.4.2 and then query the index or insert a doc:
```
GET /myindex/_search
{
	"query": {
		"match" : {
			"title" : "Lala la lalala as a developer adf"
		}
	}
}

or

POST /myindex/_doc
{
	"title" : "foo bar"
}
```

An exception is thrown:
```
Caused by: java.lang.IllegalArgumentException: The [standard] token filter has been removed.
	at org.elasticsearch.indices.analysis.AnalysisModule.lambda$setupPreConfiguredTokenFilters$1(AnalysisModule.java:189) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.analysis.PreConfiguredTokenFilter.lambda$singletonWithVersion$2(PreConfiguredTokenFilter.java:66) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.analysis.PreConfiguredTokenFilter$1.create(PreConfiguredTokenFilter.java:132) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.analysis.CustomAnalyzer.createComponents(CustomAnalyzer.java:92) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:136) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
	at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:199) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
	at org.elasticsearch.index.search.MatchQuery$MatchQueryBuilder.createQuery(MatchQuery.java:497) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.search.MatchQuery$MatchQueryBuilder.createFieldQuery(MatchQuery.java:386) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.apache.lucene.util.QueryBuilder.createBooleanQuery(QueryBuilder.java:96) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
	at org.elasticsearch.index.search.MatchQuery.parseInternal(MatchQuery.java:289) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:281) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.MatchQueryBuilder.doToQuery(MatchQueryBuilder.java:426) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:99) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.QueryShardContext.lambda$toQuery$1(QueryShardContext.java:305) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:317) ~[elasticsearch-7.4.2.jar:7.4.2]
	... 17 more
```

The exception disappears if the Elasticsearch node is restarted once more (after the upgrade to >= 7).
It's caused by the way `AnalysisModule#setupPreConfiguredTokenFilters` registers the filter in the cache using `PreConfiguredTokenFilter#singletonWithVersion`. The caching strategy used is `ONE`, so there is only one factory rather than one per version. When the node starts for the first time on >= 7, a number of new internal indices are created:
```
[2020-01-07T18:43:52,363][INFO ][o.e.c.m.MetaDataIndexTemplateService] [matriv] adding template [.watch-history-10] for index patterns [.watcher-history-10*]
[2020-01-07T18:43:52,364][WARN ][o.e.c.s.MasterService    ] [matriv] took [43.8s], which is over [10s], to compute cluster state update for [create-index-template [.watch-history-10], cause [api]]
[2020-01-07T18:43:55,023][INFO ][o.e.c.m.MetaDataIndexTemplateService] [matriv] adding template [.slm-history] for index patterns [.slm-history-1*]
[2020-01-07T18:43:59,344][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [matriv] adding index lifecycle policy [watch-history-ilm-policy]
[2020-01-07T18:43:59,467][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [matriv] adding index lifecycle policy [slm-history-ilm-policy]
[2020-01-07T18:43:59,734][INFO ][o.e.c.r.a.AllocationService] [matriv] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[myindex][0]]]).
```

These have an index creation version of 7.x.x, so the `TokenFilterFactory` is registered once with version 7.x.x. When our data index `myindex` is processed, it also uses 7.x.x as the version (because of the `ONE` caching strategy, there is no separate cached instance for version 6.x.x), and so the code below:
```
PreConfiguredTokenFilter.singletonWithVersion("standard", true, (reader, version) -> {
                if (version.before(Version.V_7_0_0)) {
                    deprecationLogger.deprecatedAndMaybeLog("standard_deprecation",
                        "The [standard] token filter is deprecated and will be removed in a future version.");
                } else {
                    throw new IllegalArgumentException("The [standard] token filter has been removed.");
                }
                return reader;
            }));
```

leads to the exception.
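
The caching effect described above can be illustrated outside Elasticsearch with a minimal sketch. Everything in it is hypothetical and only models the behaviour as reported in this issue: `CachingSketch`, `OneCache`, `PerVersionCache`, and `factoryFor` are not Elasticsearch classes or methods, and versions are reduced to a plain major-version `int`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class CachingSketch {

    // Hypothetical stand-in for the version check in the "standard" filter lambda.
    static String createStandardFilter(int major) {
        if (major < 7) {
            return "standard (deprecated)";
        }
        throw new IllegalArgumentException("The [standard] token filter has been removed.");
    }

    // A factory bound to the index-created version it was built for.
    static Supplier<String> factoryFor(int major) {
        return () -> createStandardFilter(major);
    }

    interface Cache {
        Supplier<String> get(int indexCreatedMajor);
    }

    // Models a ONE-style cache: the first requested version wins for every later caller.
    static final class OneCache implements Cache {
        private Supplier<String> single;

        @Override
        public Supplier<String> get(int indexCreatedMajor) {
            if (single == null) {
                single = factoryFor(indexCreatedMajor);
            }
            return single;
        }
    }

    // Models a per-version cache: each index-created version gets its own factory.
    static final class PerVersionCache implements Cache {
        private final Map<Integer, Supplier<String>> byVersion = new HashMap<>();

        @Override
        public Supplier<String> get(int indexCreatedMajor) {
            return byVersion.computeIfAbsent(indexCreatedMajor, CachingSketch::factoryFor);
        }
    }

    public static void main(String[] args) {
        Cache one = new OneCache();
        one.get(7); // an internal index created on 7.x fills the single slot first
        try {
            one.get(6).get(); // the 6.x index now receives the 7.x-bound factory
        } catch (IllegalArgumentException e) {
            System.out.println("ONE-style cache: " + e.getMessage());
        }

        Cache perVersion = new PerVersionCache();
        perVersion.get(7);
        System.out.println("per-version cache: " + perVersion.get(6).get());
    }
}
```

In this model the 6.x index fails only because a 7.x index was the first to populate the cache, which is consistent with the observation that a further restart makes the exception go away.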
