Description
Elasticsearch Version
7.13.0
Java Version
10.0.2
OS Version
macOS 12.4 (Elasticsearch on Docker)
Problem Description
Hello,
In our project, we use a custom analyzer on the identifier fields to remove hyphens. All identifier metadata is stored in nested fields with (at least) a type and a value.
To search on identifiers, we use a nested query with two criteria:
- a match filter on identifiers.type, to search for an exact value
- a wildcard filter on identifiers.value, to allow searching with wildcard characters (?, *)
Everything worked fine up to version 7.12.1. After upgrading to 7.13.0, my queries no longer return the expected results, and I don't know why :'(
I searched the release notes but did not find any breaking change, or any mention of a change in this area.
Would anyone have an idea, an explanation, or a workaround to solve my problem?
You can find a test scenario attached.
I tested on my local laptop using clean Docker images: one using elasticsearch-icu-7.12.1 and the other elasticsearch-icu-7.13.0.
Many thanks for your help!
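To make the symptom concrete, here is a minimal Python sketch (not Elasticsearch code). It assumes, as a hypothesis that fits the observations in this report and is not a confirmed root cause, that 7.12.1 passes the wildcard pattern through the field analyzer's hyphen removal and lowercasing before matching, while 7.13.0 matches the raw pattern. The names `analyze_identifier` and `wildcard_matches` are hypothetical helpers written for this illustration.

```python
from fnmatch import fnmatchcase


def analyze_identifier(text: str) -> str:
    """Rough stand-in for identifier-analyzer: the char_filter removes
    hyphens, the keyword tokenizer keeps one token, the filter lowercases."""
    return text.replace("-", "").lower()


def wildcard_matches(pattern: str, indexed_term: str, normalize_pattern: bool) -> bool:
    """Match a wildcard pattern ('*' and '?') against a single indexed term.

    normalize_pattern=True models the assumed 7.12.1 behavior,
    normalize_pattern=False models the assumed 7.13.0 behavior.
    Note: fnmatch also treats '[' specially, which the wildcard query does
    not; the patterns used here contain no brackets.
    """
    if normalize_pattern:
        pattern = analyze_identifier(pattern)
    return fnmatchcase(indexed_term, pattern)


if __name__ == "__main__":
    # Term stored in the index for the first ISBN of the test document.
    indexed = analyze_identifier("978-2-09-250279-2")
    for pattern in ("978209-2---502792", "9-78-2*", "9782*"):
        print(
            pattern,
            "normalized:", wildcard_matches(pattern, indexed, True),
            "raw:", wildcard_matches(pattern, indexed, False),
        )
```

Under this assumption, only the pattern that already contains no hyphens matches in both modes, which is the pattern of results described in the steps to reproduce.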
Steps to Reproduce
DELETE /my-index
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "identifier-analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["hyphen-filter"],
          "filter": ["lowercase"]
        }
      },
      "char_filter": {
        "hyphen-filter": {
          "type": "pattern_replace",
          "pattern": "-",
          "replacement": ""
        }
      }
    }
  }
}
PUT /my-index/_mapping
{
  "properties": {
    "identifiers": {
      "type": "nested",
      "properties": {
        "type": {
          "type": "keyword"
        },
        "value": {
          "type": "text",
          "analyzer": "identifier-analyzer",
          "fields": {
            "raw": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
POST /my-index/_doc
{
  "identifiers": [{
    "type": "isbn",
    "value": "978-2-09-250279-2"
  }, {
    "type": "isbn",
    "value": "2-09-250279-4"
  }, {
    "type": "local",
    "value": "custom-localid-12345"
  }]
}
# First query: search for an existing ISBN using a value with misplaced hyphens
# 7.12.1 --> returns the document as expected
# 7.13.0 --> does not return the document; seems to be a problem with 'wildcard'
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "identifiers.value": "978209-2---502792" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}
# Same search, replacing the 'wildcard' filter with 'match'
# 7.12.1 --> still OK
# 7.13.0 --> now works! A small victory...
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "match": { "identifiers.value": "978209-2---502792" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}
# Second query: search for an ISBN using a wildcard and misplaced hyphens
# 7.12.1 --> returns the document as expected. Yeaaaaah!
# 7.13.0 --> does not return the document; seems to be a problem with 'wildcard' and my custom analyzer
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "identifiers.value": "9-78-2*" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}
# Same wildcard search with the hyphens removed by hand
# 7.12.1 --> still works
# 7.13.0 --> now works, but I have to remove the hyphens manually
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "identifiers.value": "9782*" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}
# However, my custom analyzer seems to work fine in both versions :'(
POST /my-index/_analyze
{
  "analyzer": "identifier-analyzer",
  "text": "978209-2---502792"
}
POST /my-index/_analyze
{
  "analyzer": "identifier-analyzer",
  "text": "9-78-2*"
}
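Since the wildcard query works in both versions once the hyphens are removed by hand, one possible stopgap is to normalize the pattern on the client before sending the request. The Python sketch below only builds the request body; `normalize_identifier_pattern` and `build_identifier_query` are hypothetical helper names, and the normalization is a hand-written approximation of identifier-analyzer (hyphen removal plus lowercasing), not a call into Elasticsearch.

```python
import json


def normalize_identifier_pattern(pattern: str) -> str:
    """Approximate identifier-analyzer by hand: drop hyphens and lowercase.
    The wildcard characters '*' and '?' are left untouched."""
    return pattern.replace("-", "").lower()


def build_identifier_query(identifier_type: str, value_pattern: str) -> dict:
    """Build the nested search body used in this report, with the wildcard
    pattern normalized on the client side."""
    return {
        "query": {
            "nested": {
                "path": "identifiers",
                "query": {
                    "bool": {
                        "must": [
                            {
                                "wildcard": {
                                    "identifiers.value": normalize_identifier_pattern(value_pattern)
                                }
                            },
                            {"match": {"identifiers.type": identifier_type}},
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Body to send to GET /my-index/_search
    print(json.dumps(build_identifier_query("isbn", "9-78-2*"), indent=2))
```

This keeps the mapping unchanged, at the cost of duplicating the analyzer's rules in client code, so it is a temporary measure rather than a fix.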
Additionally, the query profiler in Kibana shows different behavior between the two versions.
Query:
{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "identifiers.value": "978209-2---502792" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}

