Description
Elasticsearch version (bin/elasticsearch --version): 8.0.0, 7.12.0, 7.11.0
Plugins installed: none (default distribution)
JVM version (java -version): bundled JDK
OS version (uname -a if on a Unix-like system): all (the output below is from my current master build, but this also affects the 7.x and 7.11 branches)
{
  "name" : "LEEDR-XPS",
  "cluster_name" : "es-test-cluster",
  "cluster_uuid" : "s-GANDlNSZ2nNdr00SQw3g",
  "version" : {
    "number" : "8.0.0-SNAPSHOT",
    "build_flavor" : "oss",
    "build_type" : "zip",
    "build_hash" : "3454a094f73e7696446dbd2c0525041293dd4460",
    "build_date" : "2021-01-19T19:31:16.897887417Z",
    "build_snapshot" : true,
    "lucene_version" : "8.8.0",
    "minimum_wire_compatibility_version" : "7.12.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
Description of the problem including expected versus actual behavior: A terms aggregation ordered by a cardinality sub-aggregation no longer returns the term with the largest cardinality first. The results vary with the terms aggregation "size".
Almost 3 years ago we automated the Shakespeare Kibana getting started tutorial (https://www.elastic.co/guide/en/kibana/6.8/tutorial-load-dataset.html) as a test.
The test had been passing with the same expected results until about Oct 29, 2020, when the results returned by the aggregation changed. Unfortunately, the test was skipped so that Kibana could take the new Elasticsearch snapshot, and the change wasn't investigated until now.
Steps to reproduce:
- Download this data: https://download.elastic.co/demos/kibana/gettingstarted/shakespeare_6.0.json
- Create this mapping:
PUT /shakespeare
{
  "mappings": {
    "properties": {
      "speaker": {
        "type": "keyword"
      },
      "play_name": {
        "type": "keyword"
      },
      "line_id": {
        "type": "integer"
      },
      "speech_number": {
        "type": "integer"
      }
    }
  }
}
- Load the data:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json
- Count the docs to make sure we have the same data:
curl -XGET 'localhost:9200/shakespeare/_count'
The response should include "count":111396.
- Run the same query as the Kibana visualization test:
GET /shakespeare/_search
{
  "aggs": {
    "2": {
      "terms": {
        "field": "play_name",
        "order": {
          "1": "desc"
        },
        "size": 5
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "speaker"
          }
        }
      }
    }
  },
  "size": 0,
  "fields": [],
  "script_fields": {},
  "stored_fields": [
    "*"
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}
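For reference, the query above asks for the five plays with the most distinct speakers, ranked by that count. Below is a minimal Python sketch of those semantics, assuming exact distinct counts on a single shard (the reproduction index has one shard). The real cardinality aggregation is a HyperLogLog++-based approximation, although with roughly 70 distinct speakers per play, far below its default precision_threshold of 3000, its counts are expected to be close to accurate. The helper name, the key-ascending tie-break, and the toy documents are hypothetical.

```python
from collections import defaultdict


def top_terms_by_cardinality(docs, size):
    """Hypothetical reference semantics for a terms agg on "play_name"
    ordered by a cardinality sub-agg on "speaker", descending.

    Uses exact distinct counts on a single shard; the real cardinality
    aggregation is approximate.
    """
    speakers_per_play = defaultdict(set)
    for doc in docs:
        speakers_per_play[doc["play_name"]].add(doc["speaker"])
    # Order by distinct-speaker count (desc), then by key (asc) as an
    # assumed tie-break so the ranking is deterministic.
    ranked = sorted(
        speakers_per_play.items(),
        key=lambda item: (-len(item[1]), item[0]),
    )
    # Keep only the top `size` buckets, as the terms agg "size" does.
    return [(play, len(names)) for play, names in ranked[:size]]


# Toy documents (hypothetical, not from the Shakespeare data set).
docs = [
    {"play_name": "A", "speaker": "x"},
    {"play_name": "A", "speaker": "y"},
    {"play_name": "B", "speaker": "x"},
    {"play_name": "C", "speaker": "x"},
    {"play_name": "C", "speaker": "y"},
    {"play_name": "C", "speaker": "z"},
    {"play_name": "C", "speaker": "z"},
]

print(top_terms_by_cardinality(docs, 2))  # [('C', 3), ('A', 2)]
# A smaller size should only truncate the ranking of a larger size.
assert top_terms_by_cardinality(docs, 2) == top_terms_by_cardinality(docs, 3)[:2]
```

The final assertion is the property the Kibana test relies on: changing "size" should only truncate the ranking, never change which term comes first.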
The results I get on the latest master are incorrect:
{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "2" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 94454,
      "buckets" : [
        {
          "key" : "Henry VI Part 2",
          "doc_count" : 3334,
          "1" : {
            "value" : 65
          }
        },
        {
          "key" : "Coriolanus",
          "doc_count" : 3992,
          "1" : {
            "value" : 62
          }
        },
        {
          "key" : "Antony and Cleopatra",
          "doc_count" : 3862,
          "1" : {
            "value" : 55
          }
        },
        {
          "key" : "Henry VI Part 1",
          "doc_count" : 2983,
          "1" : {
            "value" : 53
          }
        },
        {
          "key" : "Julius Caesar",
          "doc_count" : 2771,
          "1" : {
            "value" : 51
          }
        }
      ]
    }
  }
}
If we increase the terms aggregation size to 12, the results include the largest bucket value, 71, which is what the Kibana test has expected since it was written almost 3 years ago and what 7.10 returns:
"buckets" : [
  {
    "key" : "Richard III",
    "doc_count" : 3941,
    "1" : {
      "value" : 71
    }
  },
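The manual comparison above (size 5 versus size 12) can be turned into an automated check. The following Python sketch is a hypothetical helper, not part of Kibana's test suite; it assumes the response shape shown above (terms aggregation named "2", cardinality sub-aggregation named "1"). The size 5 values are taken from the output above, and the size 12 response is abbreviated to the single Richard III bucket quoted there.

```python
def values(resp):
    """Cardinality values (sub-agg "1") of terms agg "2", in bucket order."""
    return [b["1"]["value"] for b in resp["aggregations"]["2"]["buckets"]]


def is_consistent(small, large):
    """Hypothetical check that a smaller-size request did not miss the top bucket.

    With order {"1": "desc"}, the buckets must be sorted by value, and the
    first bucket of the smaller request must be at least as large as every
    bucket returned by the larger request.
    """
    small_vals = values(small)
    large_vals = values(large)
    if not small_vals:
        return not large_vals
    sorted_desc = small_vals == sorted(small_vals, reverse=True)
    return sorted_desc and small_vals[0] >= max(large_vals, default=small_vals[0])


def make_response(buckets):
    """Build a minimal response dict from (key, value) pairs (hypothetical helper)."""
    return {
        "aggregations": {
            "2": {"buckets": [{"key": k, "1": {"value": v}} for k, v in buckets]}
        }
    }


# Values from the size 5 response above.
size_5 = make_response([
    ("Henry VI Part 2", 65),
    ("Coriolanus", 62),
    ("Antony and Cleopatra", 55),
    ("Henry VI Part 1", 53),
    ("Julius Caesar", 51),
])
# Size 12 response, abbreviated to the one bucket quoted above.
size_12 = make_response([("Richard III", 71)])

print(is_consistent(size_5, size_12))  # False: 65 < 71
```

Because the check only needs the largest value from the bigger request, the abbreviated size 12 response is enough to show the inconsistency.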