Skip to content

Conversation

@cbuescher
Copy link
Member

Wildcard queries on keyword fields should get normalized, however this normalization
should exclude the two special characters * and ? in order to keep the wildcard query
itself intact.

Closes #46300

Wildcard queries on keyword fields get normalized, however this normalization
step should exclude the two special characters * and ? in order to keep the
wildcard query itself intact.

Closes elastic#46300
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Analysis)

@cbuescher
Copy link
Member Author

@jimczi pushed some changes moving the code like you suggested

@cbuescher
Copy link
Member Author

@elasticmachine update branch

@elasticmachine
Copy link
Collaborator

merge conflict between base and head

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StringFieldType looks good to me.
I am less sure about the modifications in the _type field. I left a comment on how we can address it but that can be done in a follow up if you prefer.

@cbuescher
Copy link
Member Author

@jimczi @romseygeek thanks for the review, I added a commit that changes TypeFieldType to extend ConstantFieldType like you suggested and adapted tests where necessary. Hope this looks okay now.
For the backport of this to 7.x/7.6., if I'm not mistaken we still need to support more than one "_type" value, so cannot use ConstantFieldType here. It looks like wildcard queries are forwarded to termsQuery(via StringFieldType#wildcardQuery and TypeFieldType#termQuery) which only performs exact term matching returning either MatchAllDocs or MatchNoDocs. I think we need to continue that based on the "_type" field on 7.x since we are sure to have only one type, but it might be something other than "_doc" (from e.g. an old 6.x index)? Am I more or less right about this? We can discuss this on the backport if you are okay with this change on master for now...

@romseygeek
Copy link
Contributor

For the backport of this to 7.x/7.6., if I'm not mistaken we still need to support more than one "_type" value, so cannot use ConstantFieldType here

We can have only one type in 7x, but it may be something other than _doc - you should be able to get it from MapperService#type() though

Christoph Büscher added 2 commits March 11, 2020 16:02
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me. I left one comment regarding the handling of prefix and wildcard queries for _type field. Feel free to address it without further review.

result = new MatchNoDocsQuery("[_type] was lexicographically greater than upper bound of range");
}
}
protected boolean matches(String pattern, QueryShardContext context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to handle wildcard and prefixes here ? We don't support prefix and wildcard queries on the _type field today and since _type are now a thing from the past I don't think we should add this ability. Just checking that the pattern exactly matches the internal type should be enough.

@cbuescher
Copy link
Member Author

@elasticmachine run elasticsearch-ci/2
@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/default-distro

@cbuescher
Copy link
Member Author

@elasticmachine update branch

@cbuescher cbuescher merged commit facd525 into elastic:master Mar 12, 2020
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Mar 12, 2020
)

Wildcard queries on keyword fields get normalized, however this normalization
step should exclude the two special characters * and ? in order to keep the
wildcard query itself intact.

Closes elastic#46300
@jimczi jimczi removed the v7.6.2 label Mar 18, 2020
@jimczi
Copy link
Contributor

jimczi commented Mar 18, 2020

@cbuescher I removed the v7.6.2 since this pr doesn't need to be backported in 7.6. The backport is still missing in 7.7 but it can be tracked here.

cbuescher pushed a commit that referenced this pull request Mar 24, 2020
…53512)

Wildcard queries on keyword fields get normalized, however this normalization
step should exclude the two special characters * and ? in order to keep the
wildcard query itself intact.

Closes #46300
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Apr 15, 2021
Wildcard queries on text fields should not apply the fields analyzer to the
search query. However, we accidentally enabled this in elastic#53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query sting when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the fields normalizer, regardless of
whether the `case_insensitive` is set, because normalization could involve
something else than lowercasing (e.g. substituting umlauts like in the
GermanNormalizationFilter).

Closes elastic#71403
cbuescher pushed a commit that referenced this pull request Apr 26, 2021
Wildcard queries on text fields should not apply the fields analyzer to the
search query. However, we accidentally enabled this in #53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query sting when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the fields normalizer, regardless of
whether the `case_insensitive` is set, because normalization could involve
something else than lowercasing (e.g. substituting umlauts like in the
GermanNormalizationFilter).

Closes #71403
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Apr 26, 2021
…ic#71751)

Wildcard queries on text fields should not apply the fields analyzer to the
search query. However, we accidentally enabled this in elastic#53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query sting when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the fields normalizer, regardless of
whether the `case_insensitive` is set, because normalization could involve
something else than lowercasing (e.g. substituting umlauts like in the
GermanNormalizationFilter).

Closes elastic#71403
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Apr 26, 2021
…ic#71751)

Wildcard queries on text fields should not apply the fields analyzer to the
search query. However, we accidentally enabled this in elastic#53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query sting when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the fields normalizer, regardless of
whether the `case_insensitive` is set, because normalization could involve
something else than lowercasing (e.g. substituting umlauts like in the
GermanNormalizationFilter).

Closes elastic#71403
cbuescher pushed a commit that referenced this pull request Apr 26, 2021
… (#72214)

Wildcard queries on text fields should not apply the fields analyzer to the
search query. However, we accidentally enabled this in #53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query sting when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the fields normalizer, regardless of
whether the `case_insensitive` is set, because normalization could involve
something else than lowercasing (e.g. substituting umlauts like in the
GermanNormalizationFilter).

Closes #71403
cbuescher pushed a commit that referenced this pull request Apr 26, 2021
… (#72216)

Wildcard queries on text fields should not apply the fields analyzer to the
search query. However, we accidentally enabled this in #53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query sting when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the fields normalizer, regardless of
whether the `case_insensitive` is set, because normalization could involve
something else than lowercasing (e.g. substituting umlauts like in the
GermanNormalizationFilter).

Closes #71403
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Apr 26, 2021
…ic#71751) (elastic#72214)

Wildcard queries on text fields should not apply the fields analyzer to the
search query. However, we accidentally enabled this in elastic#53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query sting when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the fields normalizer, regardless of
whether the `case_insensitive` is set, because normalization could involve
something else than lowercasing (e.g. substituting umlauts like in the
GermanNormalizationFilter).

Closes elastic#71403
cbuescher pushed a commit that referenced this pull request Apr 26, 2021
… (#72224)

Wildcard queries on text fields should not apply the fields analyzer to the
search query. However, we accidentally enabled this in #53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query sting when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the fields normalizer, regardless of
whether the `case_insensitive` is set, because normalization could involve
something else than lowercasing (e.g. substituting umlauts like in the
GermanNormalizationFilter).

Closes #71403
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wildcard queries should not normalize/remove wildcards in queries

5 participants