
Standard token filter removal causes exceptions after upgrade #50734

@matriv


The removal of the `standard` token filter, combined with the way the relevant factories are cached, causes exceptions to be thrown when querying or indexing documents into an index created before 7.0.0.

Reproduction steps:

  • Create an index in Elasticsearch 6.8.6 and add a mapping that uses the custom analyzer:
```
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type":      "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding",
            "standard"
          ]
        }
      }
    }
  }
}

POST /myindex/_mapping/_doc
{
  "properties": {
    "title": {
      "type":     "text",
      "analyzer": "my_custom_analyzer"
    }
  }
}
```
  • Upgrade to 7.4.2, then query the index or insert a document:
```
GET /myindex/_search
{
  "query": {
    "match": {
      "title": "Lala la lalala as a developer adf"
    }
  }
}
```

or

```
POST /myindex/_doc
{
  "title": "foo bar"
}
```

An exception is thrown:

```
Caused by: java.lang.IllegalArgumentException: The [standard] token filter has been removed.
	at org.elasticsearch.indices.analysis.AnalysisModule.lambda$setupPreConfiguredTokenFilters$1(AnalysisModule.java:189) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.analysis.PreConfiguredTokenFilter.lambda$singletonWithVersion$2(PreConfiguredTokenFilter.java:66) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.analysis.PreConfiguredTokenFilter$1.create(PreConfiguredTokenFilter.java:132) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.analysis.CustomAnalyzer.createComponents(CustomAnalyzer.java:92) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:136) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
	at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:199) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
	at org.elasticsearch.index.search.MatchQuery$MatchQueryBuilder.createQuery(MatchQuery.java:497) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.search.MatchQuery$MatchQueryBuilder.createFieldQuery(MatchQuery.java:386) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.apache.lucene.util.QueryBuilder.createBooleanQuery(QueryBuilder.java:96) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
	at org.elasticsearch.index.search.MatchQuery.parseInternal(MatchQuery.java:289) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:281) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.MatchQueryBuilder.doToQuery(MatchQueryBuilder.java:426) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:99) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.QueryShardContext.lambda$toQuery$1(QueryShardContext.java:305) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:317) ~[elasticsearch-7.4.2.jar:7.4.2]
	... 17 more
```

The exception goes away if the node is restarted once more (i.e. a second restart after the upgrade to >= 7.0).
It is caused by the way `AnalysisModule#setupPreConfiguredTokenFilters` registers the filter in the cache via `PreConfiguredTokenFilter#singletonWithVersion`. That method uses the `ONE` caching strategy, so there is only a single factory rather than one per version. When the node starts for the first time on >= 7.0, a number of new internal indices are created:

```
[2020-01-07T18:43:52,363][INFO ][o.e.c.m.MetaDataIndexTemplateService] [matriv] adding template [.watch-history-10] for index patterns [.watcher-history-10*]
[2020-01-07T18:43:52,364][WARN ][o.e.c.s.MasterService    ] [matriv] took [43.8s], which is over [10s], to compute cluster state update for [create-index-template [.watch-history-10], cause [api]]
[2020-01-07T18:43:55,023][INFO ][o.e.c.m.MetaDataIndexTemplateService] [matriv] adding template [.slm-history] for index patterns [.slm-history-1*]
[2020-01-07T18:43:59,344][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [matriv] adding index lifecycle policy [watch-history-ilm-policy]
[2020-01-07T18:43:59,467][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [matriv] adding index lifecycle policy [slm-history-ilm-policy]
[2020-01-07T18:43:59,734][INFO ][o.e.c.r.a.AllocationService] [matriv] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[myindex][0]]]).
```

These internal indices have an index-created version of 7.x.x, so the `TokenFilterFactory` is cached once, bound to version 7.x.x. When our data index `myindex` is processed afterwards, it receives that same factory and therefore sees version 7.x.x (with the `ONE` caching strategy there is no other cached instance for 6.x.x), and so the code below:

```java
PreConfiguredTokenFilter.singletonWithVersion("standard", true, (reader, version) -> {
    if (version.before(Version.V_7_0_0)) {
        deprecationLogger.deprecatedAndMaybeLog("standard_deprecation",
            "The [standard] token filter is deprecated and will be removed in a future version.");
    } else {
        throw new IllegalArgumentException("The [standard] token filter has been removed.");
    }
    return reader;
}));
```

leads to the exception.
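To make the ordering problem concrete, here is a minimal, self-contained Java model of the behaviour described above. All names in it (`FilterFactory`, `OneCache`, `PerVersionCache`, `analyzes6xAfter7x`) are hypothetical simplifications, not Elasticsearch's real `PreConfiguredTokenFilter` or caching classes, versions are reduced to a major version number, and deprecation logging is omitted. It shows that a single cached instance keeps the version of whichever index asked for it first, whereas a cache keyed by index-created version does not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

public class StandardFilterCacheDemo {

    /** Hypothetical stand-in for a token filter factory bound to an index-created major version. */
    static final class FilterFactory {
        final int createdMajor;

        FilterFactory(int createdMajor) {
            this.createdMajor = createdMajor;
        }

        /** Mirrors the version check in the "standard" filter lambda quoted in the issue. */
        String create(String tokens) {
            if (createdMajor < 7) {
                return tokens; // pre-7.0 index: the filter is a pass-through (deprecation log omitted)
            }
            throw new IllegalArgumentException("The [standard] token filter has been removed.");
        }
    }

    /** ONE-style cache: a single instance, created with the version of the first caller. */
    static final class OneCache {
        private FilterFactory instance;

        synchronized FilterFactory get(int createdMajor) {
            if (instance == null) {
                instance = new FilterFactory(createdMajor);
            }
            return instance; // the version argument is ignored after the first call
        }
    }

    /** Alternative cache with one instance per index-created major version. */
    static final class PerVersionCache {
        private final Map<Integer, FilterFactory> byVersion = new HashMap<>();

        synchronized FilterFactory get(int createdMajor) {
            return byVersion.computeIfAbsent(createdMajor, FilterFactory::new);
        }
    }

    /** Returns true if a 6.x index can still use the filter after a 7.x index populated the cache. */
    static boolean analyzes6xAfter7x(IntFunction<FilterFactory> cacheLookup) {
        cacheLookup.apply(7); // internal 7.x indices (e.g. .watch-history-10) request the filter first
        try {
            cacheLookup.apply(6).create("foo bar"); // then the pre-7.0 data index does
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // prints "ONE cache: false"
        System.out.println("ONE cache: " + analyzes6xAfter7x(new OneCache()::get));
        // prints "Per-version cache: true"
        System.out.println("Per-version cache: " + analyzes6xAfter7x(new PerVersionCache()::get));
    }
}
```

Keying the cache by index-created version is only one way to avoid the problem in this model; the sketch makes no claim about how the issue was actually fixed in Elasticsearch.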
