
Benchmarks to compare autohisto with date histogram #40

Merged
3 commits merged into elastic:master on Sep 17, 2018

Conversation

pcsanwald
Contributor

This PR pairs with this autohisto PR and adds two aggregations for comparison, which @colings86 suggested.
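For readers without the diff at hand, here is a minimal sketch of what the two comparison operations look like conceptually as Rally search operations. The field name, bucket count, and interval below are illustrative assumptions, not necessarily the exact values in the PR:

```json
{
  "name": "autohisto_agg",
  "operation-type": "search",
  "body": {
    "size": 0,
    "aggs": {
      "dropoffs_over_time": {
        "auto_date_histogram": {
          "field": "dropoff_datetime",
          "buckets": 20
        }
      }
    }
  }
},
{
  "name": "date_histogram_agg",
  "operation-type": "search",
  "body": {
    "size": 0,
    "aggs": {
      "dropoffs_over_time": {
        "date_histogram": {
          "field": "dropoff_datetime",
          "interval": "month"
        }
      }
    }
  }
}
```

The point of the pairing is that `auto_date_histogram` picks its own interval to stay within the requested number of buckets, while `date_histogram` is given a fixed interval chosen to produce a comparable bucket count.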

To test, I ran Rally locally using esrally --track-path=/Users/paulsanwald/Code/rally-tracks/nyc_taxis. Running against the full NYC taxi data set on my local laptop produced a few errors that I was unable to reproduce with a subset of the dataset, so I benchmarked against the subset, adjusting the number of buckets to correspond to date_histogram to ensure an apples-to-apples comparison. Here are the results from a local run:

|   Lap |                        Metric |           Operation |      Value |   Unit |
|------:|------------------------------:|--------------------:|-----------:|-------:|
|   All |                Min Throughput |       autohisto_agg |       2.01 |  ops/s |
|   All |             Median Throughput |       autohisto_agg |       2.02 |  ops/s |
|   All |                Max Throughput |       autohisto_agg |       2.04 |  ops/s |
|   All |       50th percentile latency |       autohisto_agg |    7.04785 |     ms |
|   All |       90th percentile latency |       autohisto_agg |    9.29401 |     ms |
|   All |       99th percentile latency |       autohisto_agg |    9.47358 |     ms |
|   All |      100th percentile latency |       autohisto_agg |    9.50069 |     ms |
|   All |  50th percentile service time |       autohisto_agg |    4.03652 |     ms |
|   All |  90th percentile service time |       autohisto_agg |    4.29146 |     ms |
|   All |  99th percentile service time |       autohisto_agg |    5.12963 |     ms |
|   All | 100th percentile service time |       autohisto_agg |    6.79592 |     ms |
|   All |                    error rate |       autohisto_agg |          0 |      % |
|   All |                Min Throughput |  date_histogram_agg |       2.01 |  ops/s |
|   All |             Median Throughput |  date_histogram_agg |       2.02 |  ops/s |
|   All |                Max Throughput |  date_histogram_agg |       2.04 |  ops/s |
|   All |       50th percentile latency |  date_histogram_agg |    8.33197 |     ms |
|   All |       90th percentile latency |  date_histogram_agg |    8.89726 |     ms |
|   All |       99th percentile latency |  date_histogram_agg |    9.02826 |     ms |
|   All |      100th percentile latency |  date_histogram_agg |    9.03734 |     ms |
|   All |  50th percentile service time |  date_histogram_agg |    3.80297 |     ms |
|   All |  90th percentile service time |  date_histogram_agg |    3.93656 |     ms |
|   All |  99th percentile service time |  date_histogram_agg |    4.38513 |     ms |
|   All | 100th percentile service time |  date_histogram_agg |     4.5412 |     ms |
|   All |                    error rate |  date_histogram_agg |          0 |      % |

@danielmitterdorfer
Member

The changes look fine to me. I just have a few comments:

Running against the full NYC taxi data set on my local laptop produced a few errors that I was unable to reproduce with a subset of the dataset, so I benchmarked against the subset, adjusting the number of buckets to correspond to date_histogram to ensure an apples-to-apples comparison.

Does this mean that with the full data set there will be errors? I read the discussion in elastic/elasticsearch#28993 and as far as I understand it, there will be errors with the full data set. In that case I think it might make sense to combine the aggregation with e.g. a query to reduce the amount of data and avoid the errors?
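As an illustration of this suggestion (a hypothetical sketch, not necessarily the change that ended up in the PR), the aggregation could be restricted to a date range so the number of candidate buckets stays bounded; the field name and date range below are assumptions:

```json
{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": {
        "gte": "2015-01-01 00:00:00",
        "lt": "2015-02-01 00:00:00"
      }
    }
  },
  "aggs": {
    "dropoffs_over_time": {
      "auto_date_histogram": {
        "field": "dropoff_datetime",
        "buckets": 20
      }
    }
  }
}
```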

My understanding is also that we need to hold off on merging this until the Elasticsearch PR is available on master.

@colings86

@pcsanwald I'm interested in what the errors were with the full dataset. Were they the bucket limit errors we are discussing in elastic/elasticsearch#28993?

Otherwise, could we add a second pair of operations that run a sub-aggregation under the histogram? Something like a terms agg on a low-cardinality field with an avg fare under it? The auto_date_histogram runs in breadth-first mode, so the evaluation of the sub-aggregations is delayed until after collection is finished, and I am interested in what effect this has on performance.
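A rough sketch of the kind of operation being suggested here (the concrete fields are assumptions based on the nyc_taxis mapping: payment_type as a low-cardinality keyword field and fare_amount for the average):

```json
{
  "size": 0,
  "aggs": {
    "dropoffs_over_time": {
      "auto_date_histogram": {
        "field": "dropoff_datetime",
        "buckets": 20
      },
      "aggs": {
        "by_payment_type": {
          "terms": { "field": "payment_type" },
          "aggs": {
            "avg_fare": { "avg": { "field": "fare_amount" } }
          }
        }
      }
    }
  }
}
```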

@danielmitterdorfer you are right, this will need to be merged once elastic/elasticsearch#28993 is in master; otherwise your nightly benchmarks will start failing.

@pcsanwald
Contributor Author

pcsanwald commented Apr 3, 2018

@colings86 @danielmitterdorfer I was struggling to debug the error in Rally (I couldn't find the actual error messages), and I also wasn't able to load the full dataset on a vanilla ES server running locally (which I likewise struggled to debug). That led me to suspect that the problem was with my local setup.

I like the idea of limiting the query further and will update the PR and re-test.

@danielmitterdorfer
Member

@colings86 @danielmitterdorfer I was struggling to debug the error in Rally (I couldn't find the actual error messages), and I also wasn't able to load the full dataset on a vanilla ES server running locally (which I likewise struggled to debug). That led me to suspect that the problem was with my local setup.

One tip @pcsanwald: by default, Rally will just continue on errors and you will only see that something bad has happened if you look at the "error rate". However, you can enforce a stricter mode by specifying --on-error=abort on the command line. In that case you should get a more detailed error message as soon as the first error occurs, and the benchmark is stopped. Should the error message not be helpful enough, please just tell us here and we'll see what we can do to improve it.

@pcsanwald
Contributor Author

Thanks @danielmitterdorfer. The part I was struggling with was how to get a more specific stack trace for runs where the error rate for a query was 100%; I was looking in the logs and couldn't find the errors. I'll use --on-error=abort, which will hopefully also speed up the dev cycle for this :)

@pcsanwald
Contributor Author

Pushed changes to limit the aggregations by query as @danielmitterdorfer and @colings86 suggested, and re-ran successfully with the full NYC taxi dataset. Results here:

|   Lap |                          Metric |           Operation |     Value |   Unit |
|------:|--------------------------------:|--------------------:|----------:|-------:|
|   All |                  Min Throughput |       autohisto_agg |      1.96 |  ops/s |
|   All |               Median Throughput |       autohisto_agg |         2 |  ops/s |
|   All |                  Max Throughput |       autohisto_agg |      2.01 |  ops/s |
|   All |         50th percentile latency |       autohisto_agg |   531.618 |     ms |
|   All |         90th percentile latency |       autohisto_agg |   964.313 |     ms |
|   All |         99th percentile latency |       autohisto_agg |   1020.03 |     ms |
|   All |        100th percentile latency |       autohisto_agg |   1077.56 |     ms |
|   All |    50th percentile service time |       autohisto_agg |   465.907 |     ms |
|   All |    90th percentile service time |       autohisto_agg |   522.152 |     ms |
|   All |    99th percentile service time |       autohisto_agg |   618.826 |     ms |
|   All |   100th percentile service time |       autohisto_agg |   788.446 |     ms |
|   All |                      error rate |       autohisto_agg |         0 |      % |
|   All |                  Min Throughput |  date_histogram_agg |      1.88 |  ops/s |
|   All |               Median Throughput |  date_histogram_agg |       1.9 |  ops/s |
|   All |                  Max Throughput |  date_histogram_agg |      1.99 |  ops/s |
|   All |         50th percentile latency |  date_histogram_agg |   3079.44 |     ms |
|   All |         90th percentile latency |  date_histogram_agg |   4391.78 |     ms |
|   All |         99th percentile latency |  date_histogram_agg |   5058.75 |     ms |
|   All |        100th percentile latency |  date_histogram_agg |   5073.23 |     ms |
|   All |    50th percentile service time |  date_histogram_agg |   539.676 |     ms |
|   All |    90th percentile service time |  date_histogram_agg |   589.096 |     ms |
|   All |    99th percentile service time |  date_histogram_agg |   669.835 |     ms |
|   All |   100th percentile service time |  date_histogram_agg |   692.903 |     ms |
|   All |                      error rate |  date_histogram_agg |         0 |      % |

@danielmitterdorfer
Member

Changes look fine to me. One remark: your target throughput seems a bit too high for your hardware, but we can adjust this for our nightly hardware once you have merged (see the background info on target throughput and latency, and also my ElasticON talk, starting at around 11:30 into the video / slide 15).
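For reference, the target throughput is a per-task setting in the track's challenge schedule; a minimal sketch, where the client count and iteration numbers are placeholders:

```json
{
  "operation": "autohisto_agg",
  "clients": 1,
  "warmup-iterations": 100,
  "iterations": 100,
  "target-throughput": 2
}
```

If the cluster cannot sustain the target throughput, requests queue up and the reported latency grows well beyond the service time, which is the kind of divergence visible in the tables above.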

@danielmitterdorfer
Member

@pcsanwald is there still interest in merging this PR?

@pcsanwald
Contributor Author

@danielmitterdorfer - yes, especially now that the auto-interval histogram work is merged. Let me "dust this off" and re-test, etc.

@danielmitterdorfer
Member

Great! Thank you Paul.

@pcsanwald
Contributor Author

@danielmitterdorfer I re-ran this locally and am happy with the results for this dataset; I might also add a query over geonames in order to benchmark cases where we have sparse fields (and thus lots of 0 buckets in the aggregation result), but I will open a separate PR for that.

@danielmitterdorfer
Member

@pcsanwald you're free to merge this at any time. It will automatically be run in our nightly benchmarks as soon as it is merged but we need to create a new chart for it (I'll take care of that).

pcsanwald merged commit 1b05cc8 into elastic:master on Sep 17, 2018
@danielmitterdorfer
Member

I've added charts now for our nightly and release benchmarks. Thanks for your PR @pcsanwald!
