
Synonym rule processing not working as expected #44571

@cbuescher

Description


On 7.2.0, a synonym filter placed after another token filter in an analyzer (e.g. a lowercase filter) does not behave as expected.

PUT /test_index
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym" : {
                        "tokenizer" : "whitespace",
                        "filter" : [ "lowercase", "synonym"]
                    }
                },
                "filter" : {
                    "synonym" : {
                        "type" : "synonym",
                        "synonyms" : [ "Eins, Uno, One" ]
                    }
                }
            }
        }
    }
}

GET /test_index/_analyze
{
  "analyzer": "synonym",
  "text" : "Uno"
}

returns only the lowercased input token, without any synonyms:

{
  "tokens" : [
    {
      "token" : "uno",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    }
  ]
}

I would have expected the lowercase filter to also be applied to the synonym rules, so that "uno" matches the rule and expands to all three variants.

On 7.1.1, the same request still returns the expected output:

{
  "tokens" : [
    {
      "token" : "uno",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "eins",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "one",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}
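The difference between the two outputs can be reproduced with a toy model of the whitespace -> lowercase -> synonym chain. This is only a sketch under stated assumptions: the `normalize_rules` flag and the helper names are hypothetical, and the code does not reflect how Elasticsearch or Lucene actually build the synonym map. It shows that the 7.1.1 output is what you get when the rule terms pass through the same lowercasing as the input, and the 7.2.0 output is what you get when they do not.

```python
# Toy model of an analyzer: whitespace tokenizer -> lowercase -> synonym.
# Assumption: this only mimics the outputs quoted above; it is NOT the
# Elasticsearch/Lucene implementation. Offsets are not modelled.


def parse_rules(rules, normalize):
    """Map each term of an equivalence rule ("a, b, c") to all terms of the rule.

    If normalize is True, the rule terms are lowercased like the input text
    (the behaviour the report expects); otherwise they are used verbatim.
    Toy simplification: a later rule overwrites an earlier one for the same term.
    """
    groups = {}
    for rule in rules:
        terms = [t.strip() for t in rule.split(",")]
        if normalize:
            terms = [t.lower() for t in terms]
        for term in terms:
            groups[term] = terms
    return groups


def analyze(text, rules, normalize_rules):
    """Return (token, type, position) tuples, similar to the _analyze output."""
    synonyms = parse_rules(rules, normalize_rules)
    tokens = []
    for position, word in enumerate(text.split()):  # whitespace tokenizer
        token = word.lower()                          # lowercase filter
        tokens.append((token, "word", position))
        for syn in synonyms.get(token, []):           # synonym filter
            if syn != token:
                tokens.append((syn, "SYNONYM", position))
    return tokens


RULES = ["Eins, Uno, One"]
# Rules lowercased like the input (matches the 7.1.1 output):
print(analyze("Uno", RULES, normalize_rules=True))
# [('uno', 'word', 0), ('eins', 'SYNONYM', 0), ('one', 'SYNONYM', 0)]
# Rules used verbatim (matches the 7.2.0 output):
print(analyze("Uno", RULES, normalize_rules=False))
# [('uno', 'word', 0)]
```

With verbatim rules, the lowercased token "uno" never equals the rule term "Uno", so the synonym filter has nothing to expand.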

The problem does not seem to be limited to the _analyze endpoint; search is affected as well.

On 7.2.0:

PUT /test_index/_mapping
{
    "properties": {
      "title" : {
        "type": "text",
        "analyzer": "synonym"
      }
    }
}

POST /test_index/_doc
{
  "title" : "Eins"
}


POST /test_index/_search
{
  "query": {
    "match": {
      "title": "Uno"
    }
  }
}

returns no hits.

On 7.1.1, the same search finds the document:

"hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "pqt3BWwB6PA4qSE33nZE",
        "_score" : 0.5274171,
        "_source" : {
          "title" : "Eins"
        }
      }
    ]
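The missing hit follows from the same effect at index and query time. The sketch below is a toy model under stated assumptions: a document is treated as matching when the analyzed field and the analyzed query share at least one term, ignoring scoring, positions and the real match/synonym query construction; `analyze_terms`, `matches` and `normalize_rules` are hypothetical names.

```python
# Toy model of the match query above. Assumption: a hit means the indexed
# terms and the query terms overlap; scoring and real query parsing are ignored.


def analyze_terms(text, rules, normalize_rules):
    """Return the set of terms produced by whitespace -> lowercase -> synonym."""
    groups = {}
    for rule in rules:
        terms = [t.strip() for t in rule.split(",")]
        if normalize_rules:
            # Rule terms lowercased like the input (behaviour expected in the report).
            terms = [t.lower() for t in terms]
        for term in terms:
            groups[term] = terms
    out = set()
    for word in text.split():          # whitespace tokenizer
        token = word.lower()           # lowercase filter
        out.add(token)
        out.update(groups.get(token, []))  # synonym expansion, if the rule matches
    return out


def matches(doc_text, query_text, rules, normalize_rules):
    """True if the analyzed document and query share at least one term."""
    indexed = analyze_terms(doc_text, rules, normalize_rules)
    queried = analyze_terms(query_text, rules, normalize_rules)
    return bool(indexed & queried)


RULES = ["Eins, Uno, One"]
print(matches("Eins", "Uno", RULES, normalize_rules=True))   # True  (7.1.1 behaviour)
print(matches("Eins", "Uno", RULES, normalize_rules=False))  # False (7.2.0 behaviour)
```

With verbatim rules, "Eins" is indexed only as "eins" and the query "Uno" analyzes only to "uno", so the two never meet.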
