
Nested query using wildcard filter with custom analyzer : query failed on 7.13.0 #87728

@zannkukai

Elasticsearch Version

7.13.0

Java Version

10.0.2

OS Version

macOS 12.4 (Elasticsearch on Docker)

Problem Description

Hello,

In our project, we use a custom analyzer on the identifier value field to remove hyphens. All our identifier metadata is stored in nested fields with (at least) a type and a value.
So, to search on identifiers, we use a nested query with two criteria:

  • a match filter on identifiers.type, to match the exact value
  • a wildcard filter on identifiers.value, to allow searching with wildcard characters (?, *)

Everything works fine with ES up to version 7.12.1. I have now tried to upgrade to 7.13.0 and my query fails, and I don't know why :'(
I searched the release notes but did not find any breaking change or any mention of a change related to this.

Does anyone have an idea, an explanation, or a workaround to solve my problem?
You can find a test scenario below.
I tested on my local laptop using clean Docker images: one using elasticsearch-icu-7.12.1 and the other elasticsearch-icu-7.13.0.

Many thanks for your help!

Steps to Reproduce

DELETE /my-index

PUT /my-index
{
  "settings": {
    "analysis":{
      "analyzer": {
        "identifier-analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["hyphen-filter"],
          "filter": ["lowercase"]
        }
      },
      "char_filter": {
        "hyphen-filter": {
          "type": "pattern_replace",
          "pattern": "-",
          "replacement": ""
        }
      }
    }
  }
}


PUT /my-index/_mapping
{
  "properties": {
    "identifiers": {
      "type": "nested",
      "properties": {
        "type": {
          "type": "keyword"
        },
        "value": {
          "type": "text",
          "analyzer": "identifier-analyzer",
          "fields": {
            "raw": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

POST /my-index/_doc
{
  "identifiers":[{
    "type": "isbn",
    "value": "978-2-09-250279-2"
  }, {
    "type": "isbn",
    "value": "2-09-250279-4"
  }, {
    "type": "local",
    "value": "custom-localid-12345"
  }]
}

# First query :: Search on an existing ISBN, but with a value containing misplaced hyphens
#   7.12.1 --> returns the document as expected
#   7.13.0 --> does not return the document... seems to be a problem with 'wildcard'.
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "identifiers.value": "978209-2---502792" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}
# Same search, replacing the 'wildcard' filter with 'match'
#   7.12.1 --> still OK
#   7.13.0 --> now working! A small victory...
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "match": { "identifiers.value": "978209-2---502792" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}
# Second query :: Search on an ISBN using a wildcard and misplaced hyphens
#   7.12.1 --> returns the document as expected. Yeaaaaah!
#   7.13.0 --> does not return the document... seems to be a problem with 'wildcard' and my custom analyzer
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "identifiers.value": "9-78-2*" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}
# Same wildcard search, with the hyphens removed by hand
#   7.12.1 --> still working
#   7.13.0 --> now working... but I need to remove the hyphens manually
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "identifiers.value": "9782*" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}

# However, my custom analyzer seems to work fine in both versions :'(
POST /my-index/_analyze
{
  "analyzer": "identifier-analyzer",
  "text": "978209-2---502792"
}
POST /my-index/_analyze
{
  "analyzer": "identifier-analyzer",
  "text": "9-78-2*"
}
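
For reference, the manual hyphen removal mentioned above could be done on the client side before the request is sent. The following is only a minimal sketch in Python of that stopgap, not a fix for the underlying behavior. The helper names `normalize_wildcard_pattern` and `build_identifier_query` are hypothetical, and the normalization is an approximation of identifier-analyzer (hyphens removed, then lowercased; Python's `lower()` may differ from the lowercase token filter on some non-ASCII input).

```python
import json
import re


def normalize_wildcard_pattern(pattern: str) -> str:
    """Approximate identifier-analyzer on the client: drop hyphens, then lowercase.

    The wildcard characters '?' and '*' are left untouched.
    """
    return re.sub("-", "", pattern).lower()


def build_identifier_query(id_type: str, pattern: str) -> dict:
    """Build the nested query from this report with a pre-normalized wildcard pattern."""
    return {
        "query": {
            "nested": {
                "path": "identifiers",
                "query": {
                    "bool": {
                        "must": [
                            {"wildcard": {"identifiers.value": normalize_wildcard_pattern(pattern)}},
                            {"match": {"identifiers.type": id_type}},
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the request body that would be sent to GET /my-index/_search
    print(json.dumps(build_identifier_query("isbn", "9-78-2*"), indent=2))
```

With this sketch, the pattern "9-78-2*" is sent as "9782*", which is the form that still matches on 7.13.0 in the scenario above.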

Additionally, the Kibana query profiler shows different behavior between the two versions for the following query:

{
  "query": {
    "nested": {
      "path": "identifiers",
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "identifiers.value": "978209-2---502792" } },
            { "match": { "identifiers.type": "isbn" } }
          ]
        }
      }
    }
  }
}

7.12
image

7.13
image
