Description
I believe I have isolated a performance regression in the top_hits aggregation between ES 7.9 and ES 7.10.2 (which means the regression is present in OpenSearch 1.0 as well).
To Reproduce
I added the following operation to the default nyc_taxis track in esrally:
```json
{
  "name": "top_hits_agg",
  "warmup-iterations": 10,
  "iterations": 50,
  "clients": 2,
  "operation": {
    "operation-type": "search",
    "body": {
      "size": 0,
      "query": {
        "range": {
          "dropoff_datetime": {
            "from": "2015-01-01 00:00:00",
            "to": "2015-01-01 12:00:00"
          }
        }
      },
      "aggs": {
        "2": {
          "terms": {
            "field": "payment_type",
            "size": 1000
          },
          "aggregations": {
            "3": {
              "date_histogram": {
                "field": "dropoff_datetime",
                "fixed_interval": "5s"
              },
              "aggregations": {
                "1": {
                  "top_hits": {
                    "size": 1,
                    "sort": [
                      {
                        "dropoff_datetime": {
                          "order": "desc"
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
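For reference, the same request body can also be sent directly to the cluster, outside of Rally. This is a minimal sketch that assumes the track data landed in the default nyc_taxis index (adjust the port to hit the 7.9 node on 9790 or the 7.10.2 node on 9710, as in the commands below):

```sh
# Sketch of a direct reproduction outside Rally.
# Assumes the nyc_taxis track data is in an index named "nyc_taxis".
curl -s -H 'Content-Type: application/json' 'http://127.0.0.1:9710/nyc_taxis/_search' -d '
{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": { "from": "2015-01-01 00:00:00", "to": "2015-01-01 12:00:00" }
    }
  },
  "aggs": {
    "2": {
      "terms": { "field": "payment_type", "size": 1000 },
      "aggregations": {
        "3": {
          "date_histogram": { "field": "dropoff_datetime", "fixed_interval": "5s" },
          "aggregations": {
            "1": {
              "top_hits": {
                "size": 1,
                "sort": [ { "dropoff_datetime": { "order": "desc" } } ]
              }
            }
          }
        }
      }
    }
  }
}'
```

The "took" field in the response gives a rough per-request timing for a quick comparison between the two nodes.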
After populating both the 7.9 and 7.10.2 clusters with the nyc_taxis dataset, I ran a race against each with the following commands:
```sh
esrally race --pipeline=benchmark-only --distribution-version=7.9.0 --track=nyc_taxis --target-host=127.0.0.1:9790 --report-file=es_790_report_$(date +%s) --user-tag='version:7.9,race:top_hits_and_date' --include-tasks=top_hits_agg,date_histogram_agg

esrally race --pipeline=benchmark-only --distribution-version=7.10.2 --track=nyc_taxis --target-host=127.0.0.1:9710 --report-file=es_710_report_$(date +%s) --user-tag='version:7.10.2,race:top_hits_and_date' --include-tasks=top_hits_agg,date_histogram_agg
```
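The baseline/contender table below is the kind of side-by-side report Rally's tournament feature produces. A sketch of the command, with placeholder race IDs (the actual IDs come from `esrally list races`):

```sh
# Sketch: compare the two races (race IDs are placeholders).
esrally compare --baseline=<7.9.0-race-id> --contender=<7.10.2-race-id>
```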
Expected behavior
The expected behavior is that the two versions show very similar performance for the top_hits aggregation. The data below compares ES 7.9 as the baseline and ES 7.10.2 as the contender. As expected, the date_histogram_agg operation is more or less at parity between the two. However, top_hits_agg is consistently 20-30% worse in 7.10.2 than in 7.9: median throughput drops from about 0.75 to 0.58 ops/s, and median latency rises from about 2570 ms to 3373 ms:
| Metric | Task | Baseline | Contender | Diff | Unit |
|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 125.509 | 110.375 | -15.1333 | min |
| Min cumulative indexing time across primary shard | | 125.509 | 110.375 | -15.1333 | min |
| Median cumulative indexing time across primary shard | | 125.509 | 110.375 | -15.1333 | min |
| Max cumulative indexing time across primary shard | | 125.509 | 110.375 | -15.1333 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 41.7306 | 47.5379 | 5.80733 | min |
| Cumulative merge count of primary shards | | 93 | 89 | -4 | |
| Min cumulative merge time across primary shard | | 41.7306 | 47.5379 | 5.80733 | min |
| Median cumulative merge time across primary shard | | 41.7306 | 47.5379 | 5.80733 | min |
| Max cumulative merge time across primary shard | | 41.7306 | 47.5379 | 5.80733 | min |
| Cumulative merge throttle time of primary shards | | 4.12925 | 4.25008 | 0.12083 | min |
| Min cumulative merge throttle time across primary shard | | 4.12925 | 4.25008 | 0.12083 | min |
| Median cumulative merge throttle time across primary shard | | 4.12925 | 4.25008 | 0.12083 | min |
| Max cumulative merge throttle time across primary shard | | 4.12925 | 4.25008 | 0.12083 | min |
| Cumulative refresh time of primary shards | | 2.60575 | 2.36 | -0.24575 | min |
| Cumulative refresh count of primary shards | | 82 | 79 | -3 | |
| Min cumulative refresh time across primary shard | | 2.60575 | 2.36 | -0.24575 | min |
| Median cumulative refresh time across primary shard | | 2.60575 | 2.36 | -0.24575 | min |
| Max cumulative refresh time across primary shard | | 2.60575 | 2.36 | -0.24575 | min |
| Cumulative flush time of primary shards | | 5.45558 | 5.48922 | 0.03363 | min |
| Cumulative flush count of primary shards | | 35 | 34 | -1 | |
| Min cumulative flush time across primary shard | | 5.45558 | 5.48922 | 0.03363 | min |
| Median cumulative flush time across primary shard | | 5.45558 | 5.48922 | 0.03363 | min |
| Max cumulative flush time across primary shard | | 5.45558 | 5.48922 | 0.03363 | min |
| Total Young Gen GC time | | 0.258 | 1.24 | 0.982 | s |
| Total Young Gen GC count | | 32 | 136 | 104 | |
| Total Old Gen GC time | | 0 | 0 | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0 | |
| Store size | | 25.0414 | 24.3655 | -0.67586 | GB |
| Translog size | | 5.12227e-08 | 5.12227e-08 | 0 | GB |
| Heap used for segments | | 0.193222 | 0.0988235 | -0.0944 | MB |
| Heap used for doc values | | 0.0354309 | 0.034874 | -0.00056 | MB |
| Heap used for terms | | 0.0408325 | 0.0370789 | -0.00375 | MB |
| Heap used for norms | | 0 | 0 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.116959 | 0.0268707 | -0.09009 | MB |
| Segment count | | 32 | 30 | -2 | |
| Min Throughput | date_histogram_agg | 1.50023 | 1.50023 | -1e-05 | ops/s |
| Median Throughput | date_histogram_agg | 1.50035 | 1.50035 | -0 | ops/s |
| Max Throughput | date_histogram_agg | 1.50054 | 1.50054 | 0 | ops/s |
| 50th percentile latency | date_histogram_agg | 548.563 | 549.301 | 0.73863 | ms |
| 90th percentile latency | date_histogram_agg | 557.053 | 559.417 | 2.36497 | ms |
| 99th percentile latency | date_histogram_agg | 571.816 | 569.581 | -2.23542 | ms |
| 100th percentile latency | date_histogram_agg | 578.69 | 575.319 | -3.37139 | ms |
| 50th percentile service time | date_histogram_agg | 547.429 | 548.28 | 0.85141 | ms |
| 90th percentile service time | date_histogram_agg | 555.608 | 558.266 | 2.65809 | ms |
| 99th percentile service time | date_histogram_agg | 570.955 | 568.385 | -2.57038 | ms |
| 100th percentile service time | date_histogram_agg | 577.466 | 573.309 | -4.1576 | ms |
| error rate | date_histogram_agg | 0 | 0 | 0 | % |
| Min Throughput | top_hits_agg | 0.720015 | 0.559013 | -0.161 | ops/s |
| Median Throughput | top_hits_agg | 0.747501 | 0.577278 | -0.17022 | ops/s |
| Max Throughput | top_hits_agg | 0.759315 | 0.582273 | -0.17704 | ops/s |
| 50th percentile latency | top_hits_agg | 2569.6 | 3373.35 | 803.747 | ms |
| 90th percentile latency | top_hits_agg | 2609.98 | 3455.39 | 845.411 | ms |
| 99th percentile latency | top_hits_agg | 2622.45 | 3502.29 | 879.836 | ms |
| 100th percentile latency | top_hits_agg | 2694.13 | 3503.48 | 809.352 | ms |
| 50th percentile service time | top_hits_agg | 2569.6 | 3373.35 | 803.747 | ms |
| 90th percentile service time | top_hits_agg | 2609.98 | 3455.39 | 845.411 | ms |
| 99th percentile service time | top_hits_agg | 2622.45 | 3502.29 | 879.836 | ms |
| 100th percentile service time | top_hits_agg | 2694.13 | 3503.48 | 809.352 | ms |
| error rate | top_hits_agg | 0 | 0 | 0 | % |
Plugins
None
Host/Environment (please complete the following information):
- OS: Ubuntu 20.04
- Version: ES 7.9 and ES 7.10.2
- Host type: c6i.8xlarge
Additional context
The Elasticsearch distributions are the stock versions downloaded as tar files, extracted, and started with the following command:
```sh
ES_JAVA_OPTS="-Xms8g -Xmx8g" ./bin/elasticsearch
```
The only non-default setting in the yml configuration file is the port number. Both instances run on the same machine, though load is driven against only one at a time (not concurrently). The service is bound to localhost, and esrally runs on the same machine.
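As a sanity check, a quick sketch for confirming which version is listening on each port (the cluster root endpoint reports the version number):

```sh
# Sketch: verify which version each local node is running.
curl -s http://127.0.0.1:9790 | grep '"number"'   # expected: 7.9.0
curl -s http://127.0.0.1:9710 | grep '"number"'   # expected: 7.10.2
```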