Fix case sensitivity rules for wildcard queries on text fields #71751

cbuescher · 2021-04-15T14:09:07Z

Wildcard queries on text fields should not apply the field's analyzer to the
search query. However, we accidentally enabled this in #53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization from case insensitivity (as implemented
by the case_insensitive flag). This separation is needed because we still have to
normalize the query string when the wildcard query method on the field type is
requested from the query_string query parser. Wildcard queries on keyword
fields should also continue to apply the field's normalizer, regardless of
whether the case_insensitive flag is set, because normalization can involve
more than lowercasing (e.g. substituting umlauts, as the
GermanNormalizationFilter does).

Closes #71403
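The rule described above can be modelled outside Elasticsearch. The sketch below is hypothetical: `WildcardRulesSketch`, `textFieldMatches` and `keywordFieldMatches` are illustrative names, not the field type API; `java.util.regex` stands in for Lucene's wildcard automaton; backslash escaping is ignored; and text-field terms are assumed to have been lowercased by the analyzer at index time. A text-field pattern is used verbatim, so only `case_insensitive` folds case, while a keyword-field pattern always goes through the normalizer. Normalizing only the literal chunks around `*` and `?` is a choice made for this sketch.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

/**
 * Hypothetical model of the wildcard rules described in this PR. Names are
 * illustrative, not Elasticsearch API; java.util.regex stands in for Lucene's
 * wildcard automaton and backslash escaping is not supported.
 */
public final class WildcardRulesSketch {

    private WildcardRulesSketch() {}

    /** Applies the normalizer to literal chunks only, leaving '*' and '?' untouched. */
    static String normalizeLiterals(String pattern, UnaryOperator<String> normalizer) {
        StringBuilder out = new StringBuilder();
        StringBuilder chunk = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*' || c == '?') {
                out.append(normalizer.apply(chunk.toString())).append(c);
                chunk.setLength(0);
            } else {
                chunk.append(c);
            }
        }
        return out.append(normalizer.apply(chunk.toString())).toString();
    }

    /** Translates a wildcard pattern into a regex; literal chunks are quoted verbatim. */
    static Pattern toRegex(String wildcard, boolean caseInsensitive) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE : 0;
        return Pattern.compile(regex.toString(), flags);
    }

    /** Text field: the pattern is NOT analyzed; only case_insensitive folds case. */
    static boolean textFieldMatches(String term, String pattern, boolean caseInsensitive) {
        return toRegex(pattern, caseInsensitive).matcher(term).matches();
    }

    /** Keyword field: the normalizer is applied whether or not case_insensitive is set. */
    static boolean keywordFieldMatches(String term, String pattern, boolean caseInsensitive,
                                       UnaryOperator<String> normalizer) {
        return toRegex(normalizeLiterals(pattern, normalizer), caseInsensitive).matcher(term).matches();
    }

    public static void main(String[] args) {
        // Assumption: the indexed text term was already lowercased by the field's analyzer.
        System.out.println(textFieldMatches("foobar", "Foo*", false)); // false
        System.out.println(textFieldMatches("foobar", "Foo*", true));  // true

        UnaryOperator<String> lowercase = s -> s.toLowerCase(Locale.ROOT);
        System.out.println(keywordFieldMatches("foobar", "Foo*", false, lowercase)); // true

        // Simplified stand-in for u-umlaut folding, NOT the real GermanNormalizationFilter rules.
        UnaryOperator<String> foldUmlauts = s -> s.toLowerCase(Locale.ROOT).replace('\u00fc', 'u');
        System.out.println(keywordFieldMatches("muller", "M\u00fcl*", true, foldUmlauts)); // true
    }
}
```

The last call illustrates why `case_insensitive` cannot replace the normalizer in this model: folding `ü` to `u` is not a case change, so only the normalizer makes `Mül*` match `muller`.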

elasticmachine · 2021-04-15T14:09:11Z

Pinging @elastic/es-search (Team:Search)

markharwood

Left a couple of comments. It also looks like WildcardFieldMapper is missing a normalizedWildcardQuery override, but otherwise LGTM.

markharwood · 2021-04-15T15:22:01Z

server/src/internalClusterTest/java/org/elasticsearch/search/query/SearchQueryIT.java

+             assertHitCount(searchResponse, 0L);

             wildCardQuery = wildcardQuery("field1", "bb*");
             searchResponse = client().prepareSearch().setQuery(wildCardQuery).get();


Maybe add a test where the search string is mixed case and wildCardQuery.caseInsensitive(true) is set.

markharwood · 2021-04-15T15:30:28Z

server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java

+            boolean caseInsensitive,
+            SearchExecutionContext context
+        ) {
+            return super.wildcardQuery(value, method, caseInsensitive, true, context);


Maybe calling normalizedWildcardQuery here would make the intent more obvious and also help trace back where we use normalized wildcard queries.

Good idea, but I deliberately left the caseInsensitive parameter out of the normalizedWildcardQuery signature, so here I need to call the protected method that takes both the caseInsensitive and the normalization arguments. My thinking was that we use normalizedWildcardQuery only from QueryStringQueryParser, where we don't have the caseInsensitive option. Maybe you see a different solution?

Ah ok. That makes sense.
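The method layering discussed in this thread can be sketched as follows. This is a hypothetical simplification: the class names are invented, the real methods also take a rewrite method and a `SearchExecutionContext` and return Lucene queries, whereas these return a descriptive string, and both normalizers are assumed to simply lowercase. `normalizedWildcardQuery` has no `caseInsensitive` parameter because, per the thread, its only caller is the `query_string` parser, which has no such option; the keyword override always passes `true` for normalization to the shared protected overload, mirroring the `super.wildcardQuery(value, method, caseInsensitive, true, context)` call in the review excerpt.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the wildcardQuery / normalizedWildcardQuery layering.
 * The real methods take more parameters and return Lucene queries; these
 * return a String description so the control flow is easy to inspect.
 */
public final class WildcardLayeringSketch {

    abstract static class StringFieldTypeSketch {
        /** Used by the wildcard query with its case_insensitive flag: no normalization by default. */
        public String wildcardQuery(String value, boolean caseInsensitive) {
            return wildcardQuery(value, caseInsensitive, false);
        }

        /** Used by the query_string parser, which has no case_insensitive option. */
        public String normalizedWildcardQuery(String value) {
            return wildcardQuery(value, false, true);
        }

        /** Shared implementation that keeps normalization and case folding as separate flags. */
        protected String wildcardQuery(String value, boolean caseInsensitive, boolean shouldNormalize) {
            String pattern = shouldNormalize ? normalize(value) : value;
            return "wildcard(" + pattern + ", caseInsensitive=" + caseInsensitive + ")";
        }

        /** Stand-in for the field's normalizer or search analyzer. */
        protected abstract String normalize(String value);
    }

    /** Text field: inherits the defaults, so plain wildcard queries are not normalized. */
    static final class TextFieldTypeSketch extends StringFieldTypeSketch {
        @Override
        protected String normalize(String value) {
            return value.toLowerCase(Locale.ROOT); // assumption: a lowercasing analyzer
        }
    }

    /** Keyword field: always normalizes, whatever the caseInsensitive flag says. */
    static final class KeywordFieldTypeSketch extends StringFieldTypeSketch {
        @Override
        public String wildcardQuery(String value, boolean caseInsensitive) {
            return super.wildcardQuery(value, caseInsensitive, true);
        }

        @Override
        protected String normalize(String value) {
            return value.toLowerCase(Locale.ROOT); // assumption: a lowercase normalizer
        }
    }

    public static void main(String[] args) {
        StringFieldTypeSketch text = new TextFieldTypeSketch();
        StringFieldTypeSketch keyword = new KeywordFieldTypeSketch();
        System.out.println(text.wildcardQuery("Foo*", false));    // wildcard(Foo*, caseInsensitive=false)
        System.out.println(text.normalizedWildcardQuery("Foo*")); // wildcard(foo*, caseInsensitive=false)
        System.out.println(keyword.wildcardQuery("Foo*", true));  // wildcard(foo*, caseInsensitive=true)
    }
}
```

In this sketch the normalization switch never appears in the public `wildcardQuery` signature, so a plain wildcard query on a text field cannot pick up the analyzer by accident; only the `query_string` path and the keyword override opt in.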

Commits referencing this pull request, each with the description above as its message:

…ic#71751)

… (#72214)

… (#72216)

…ic#71751) (elastic#72214)

… (#72224)

cbuescher added >bug :Search/Search v8.0.0 v7.13.0 v7.12.2 labels Apr 15, 2021

cbuescher requested a review from markharwood April 15, 2021 14:09

elasticmachine added the Team:Search label Apr 15, 2021

markharwood reviewed Apr 15, 2021

Christoph Büscher added 3 commits April 15, 2021 18:07

iter

bca2be0

rework test

4960a88

Merge branch 'master' into fix-71403

bdcb458

cbuescher merged commit 0519e37 into elastic:master Apr 26, 2021

cbuescher added the backport pending label Apr 26, 2021

cbuescher mentioned this pull request Apr 26, 2021

Fix case sensitivity rules for wildcard queries on text fields (#71751) #72214

Merged

cbuescher mentioned this pull request Apr 26, 2021

Fix case sensitivity rules for wildcard queries on text fields (#71751) #72216

Merged

cbuescher added v7.14.0 and removed backport pending labels Apr 27, 2021

jbaiera mentioned this pull request Jun 9, 2021

Fix failing strict wildcard pushdown tests elastic/elasticsearch-hadoop#1683

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

romseygeek mentioned this pull request Jun 16, 2022

Nested query using wildcard filter with custom analyzer : query failed on 7.13.0 #87728

Closed

cbuescher commented Apr 15, 2021

elasticmachine commented Apr 15, 2021

markharwood left a comment

markharwood Apr 15, 2021

markharwood Apr 15, 2021

cbuescher Apr 15, 2021

markharwood Apr 23, 2021

4 participants