
Under certain conditions, sort values for a hit come from an unrelated document #31554

@dbevacqua

Elasticsearch version (bin/elasticsearch --version):

6.3.0 (from docker.elastic.co/elasticsearch/elasticsearch-oss:6.3.0)

Plugins installed: []

JVM version (java -version):

openjdk version "10.0.1" 2018-04-17
OpenJDK Runtime Environment (build 10.0.1+10)
OpenJDK 64-Bit Server VM (build 10.0.1+10, mixed mode)

OS version (uname -a if on a Unix-like system):

Linux f076156ce113 4.4.0-128-generic #154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Under certain conditions, sort values for a hit come from an unrelated document. The specific case illustrated below involves two documents on the same shard, one of which is PUT both before and after the other without a refresh in between.

Expected behaviour: a search with a multi-level nested sort targeting that document returns the "missing" sort value.

Actual behaviour: a search with a multi-level nested sort targeting that document returns the sort value from the other document.

We believe the problem reaches further than this case: we have observed the same behaviour in many (possibly most) of our searches that use multi-level nested sorts.

Steps to reproduce:

Execute the following bash script.

#!/bin/bash

function http {
  curl -Ss -H 'Content-Type: application/json' -X "$@"
  echo
}

http DELETE "localhost:9200/test"

http PUT "localhost:9200/test"  -d '{
    "mappings": {
      "test-type": {
        "dynamic": "strict",
        "properties": {
          "nested1": {
            "type": "nested",
            "properties": {
              "nested2": {
                "type": "nested",
                "properties": {
                  "nested2_keyword": {
                    "type": "keyword"
                  },
                  "sortScore": {
                    "type": "integer"
                  }
                }
              },
              "nested1_keyword": {
                "type": "keyword"
              }
            }
          },
          "key": {
            "type": "keyword"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "1"
      }
    }
}
'

http PUT "localhost:9200/test/test-type/1" -d '{ "key": 1 }'

http PUT "localhost:9200/test/test-type/2" -d '{
  "key": 2,
  "nested1": [
    {
      "nested2": [
        {
          "nested2_keyword": "nested2_bar",
          "sortScore": 1234
        }
      ],
      "nested1_keyword": "nested1_foo"
    }
  ]
}
'

# The search returns the correct sort value if a refresh happens here (an earlier refresh does not help)
#http POST "localhost:9200/test/_refresh"

# The search also returns the correct sort value if doc 1 is re-indexed with _update instead of PUT
#http POST "localhost:9200/test/test-type/1/_update" -d '{ "key": 1 }'

http PUT "localhost:9200/test/test-type/1" -d '{ "key": 1 }'

http POST "localhost:9200/test/_refresh"

http GET "localhost:9200/test/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "key": 1
          }
        }
      ]
    }
  },
  "sort": [
    {
      "nested1.nested2.sortScore": {
        "order": "desc",
        "mode": "max",
        "missing": -1,
        "nested": {
          "path": "nested1",
          "filter": {
            "term": {
              "nested1.nested1_keyword": "nested1_foo"
            }
          },
          "nested": {
            "path": "nested1.nested2",
            "filter": {
              "term": {
                "nested1.nested2.nested2_keyword": "nested2_bar"
              }
            }
          }
        }
      }
    }
  ]
}
'

The search returns something like this:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "test-type",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "key" : 1
        },
        "sort" : [
          1234
        ]
      }
    ]
  }
}

The sort value of 1234 is defined in doc 2, not doc 1. Doc 1 has no nested fields at all, so the expected sort value is the "missing" value, -1.
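
To make the repro script report pass or fail by itself, the search output could be piped through a small check like the one sketched below. The helper names `first_sort_value` and `check_sort_value` are hypothetical and not part of the original script. The sketch assumes the response is pretty-printed as shown above (each sort value on its own line after `"sort" : [`), and it only looks at the first sort value of the first hit.

```shell
#!/bin/bash

# Hypothetical helper: print the first sort value of the first hit from a
# pretty-printed search response read on stdin.
# Assumes the "?pretty" layout, i.e. the value sits on the line after "sort" : [
first_sort_value() {
  awk '
    /"sort"[ \t]*:[ \t]*\[/ { in_sort = 1; next }
    in_sort {
      gsub(/[ \t,]/, "")   # strip indentation and any trailing comma
      print
      exit
    }
  '
}

# Hypothetical helper: compare the observed sort value with the expected one.
# Returns non-zero when they differ.
check_sort_value() {
  local expected="$1" actual
  actual="$(first_sort_value)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: sort value is $actual"
  else
    echo "BUG: expected sort value $expected, got $actual"
    return 1
  fi
}
```

With these helpers defined, the final search in the script could be written as `http GET "localhost:9200/test/_search?pretty" -d '...' | check_sort_value -1`, where `-1` is the `missing` value from the sort clause.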

Provide logs (if relevant):

Labels: :Search/Search (Search-related issues that do not fall into other categories), >bug