
@nik9000
Member

nik9000 commented May 28, 2020

This saves some memory when the `histogram` aggregation is not a top
level aggregation by dropping `asMultiBucketAggregator` in favor of
natively implementing multi-bucket storage in the aggregator. For the
most part this just uses the `LongKeyedBucketOrds` that we built the
first time we did this.
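A minimal sketch of what native multi-bucket storage means, assuming a plain `HashMap` in place of whatever Elasticsearch uses internally: every `(owningBucketOrd, key)` pair gets its own dense bucket ordinal, so one aggregator instance can hold the buckets for all of its parent buckets. `SimpleLongKeyedBucketOrds` and its methods are hypothetical and only loosely modeled on `LongKeyedBucketOrds`. Histogram keys are doubles, so the sketch assumes they are packed into the `long` key slot, for example with `Double.doubleToLongBits`.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for LongKeyedBucketOrds. It assigns a
 * dense bucket ordinal to every (owningBucketOrd, key) pair so a single
 * aggregator can hold the buckets of all its parent buckets. A HashMap is
 * used for clarity; this is not how Elasticsearch stores the ordinals.
 */
final class SimpleLongKeyedBucketOrds {
    /** Composite key: the parent's bucket ordinal plus the bucket key. */
    private record Pair(long owningBucketOrd, long key) {}

    private final Map<Pair, Long> ords = new HashMap<>();

    /**
     * Returns a fresh ordinal for an unseen pair, or -1 - ord when the
     * pair was already assigned ordinal ord.
     */
    long add(long owningBucketOrd, long key) {
        Pair pair = new Pair(owningBucketOrd, key);
        Long existing = ords.get(pair);
        if (existing != null) {
            return -1 - existing;
        }
        long ord = ords.size();
        ords.put(pair, ord);
        return ord;
    }

    /** Number of distinct (owningBucketOrd, key) pairs, i.e. buckets, seen so far. */
    long size() {
        return ords.size();
    }
}
```

Adding the same key under two different parents yields two different ordinals, which is what lets one aggregator stand in for the per-parent instances that `asMultiBucketAggregator` created.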
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

elasticmachine added the `Team:Analytics` label (meta label for the analytical engine team: ESQL/Aggs/Geo) on May 28, 2020
@nik9000
Member Author

nik9000 commented May 28, 2020

I'm going to add a test for the new debug information.

```java
/**
 * Base class for functionality shared between aggregators for this
 * {@code histogram} aggregation.
 */
public abstract class AbstractHistogramAggregator extends BucketsAggregator {
```
Member Author

There was a TODO noting that the range and numeric versions of the aggregator shared a ton of code. Now they share a superclass that provides all of it.
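To illustrate the kind of code a shared superclass can own, here is a hypothetical sketch (the classes are invented, not the Elasticsearch API): the base class handles rounding a value to its bucket with `floor((value - offset) / interval)` plus the per-parent bucket bookkeeping, the numeric subclass adds one bucket per value, and the range subclass adds every bucket the range overlaps. Treating both range ends as inclusive is an assumption of the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of logic a shared histogram base class can own:
 * rounding values to bucket keys and per-parent bucket bookkeeping.
 * Not the real AbstractHistogramAggregator API.
 */
abstract class HistogramSketch {
    protected final double interval;
    protected final double offset;
    /** owningBucketOrd -> (bucket key -> doc count), keys kept sorted. */
    private final Map<Long, TreeMap<Double, Long>> counts = new HashMap<>();

    HistogramSketch(double interval, double offset) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be > 0");
        }
        this.interval = interval;
        this.offset = offset;
    }

    /** Index of the bucket containing value: floor((value - offset) / interval). */
    protected final long bucketIndex(double value) {
        return (long) Math.floor((value - offset) / interval);
    }

    /** Converts a bucket index back to the bucket's lower bound (its key). */
    protected final double keyOf(long index) {
        return index * interval + offset;
    }

    /** Adds one doc to the bucket at index under the given parent bucket. */
    protected final void increment(long owningBucketOrd, long index) {
        counts.computeIfAbsent(owningBucketOrd, o -> new TreeMap<>())
              .merge(keyOf(index), 1L, Long::sum);
    }

    /** Sorted bucket keys and doc counts collected under one parent bucket. */
    final Map<Double, Long> buckets(long owningBucketOrd) {
        return Collections.unmodifiableMap(counts.getOrDefault(owningBucketOrd, new TreeMap<>()));
    }
}

/** Hypothetical numeric variant: each value falls into exactly one bucket. */
final class NumericHistogramSketch extends HistogramSketch {
    NumericHistogramSketch(double interval, double offset) {
        super(interval, offset);
    }

    void collect(long owningBucketOrd, double value) {
        increment(owningBucketOrd, bucketIndex(value));
    }
}

/** Hypothetical range variant: a range counts toward every bucket from `from` through `to`. */
final class RangeHistogramSketch extends HistogramSketch {
    RangeHistogramSketch(double interval, double offset) {
        super(interval, offset);
    }

    void collect(long owningBucketOrd, double from, double to) {
        for (long i = bucketIndex(from); i <= bucketIndex(to); i++) {
            increment(owningBucketOrd, i);
        }
    }
}
```

Only `collect` differs between the two subclasses; key rounding and result building live in one place.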

```java
ValuesSource valuesSource, DocValueFormat formatter, SearchContext context,
Aggregator parent, Map<String, Object> metadata) throws IOException {
ValuesSource.Range rangeValueSource = (ValuesSource.Range) valuesSource;
if (rangeValueSource.rangeType().isNumeric() == false) {
```
Member Author

I moved this check into the ctor so I can use a ctor reference, as we've been doing elsewhere.
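A hypothetical sketch of the constructor-reference pattern: once validation lives in the constructor, the constructor itself satisfies the factory's functional interface and can be registered as `Type::new`. `RangeSourceSketch`, `RangeHistogramSketchAgg`, and `AggregatorRegistrySketch` are invented names, and the two-argument signature is a simplification, not the real one.

```java
import java.util.function.BiFunction;

/** Hypothetical stand-in for a range values source. */
record RangeSourceSketch(String rangeType, boolean numeric) {}

/**
 * Hypothetical aggregator: the numeric-range validation lives in the
 * constructor, so nothing has to run before construction and the
 * constructor can be registered directly as a method reference.
 */
final class RangeHistogramSketchAgg {
    final String name;
    final RangeSourceSketch source;

    RangeHistogramSketchAgg(String name, RangeSourceSketch source) {
        if (source.numeric() == false) {
            throw new IllegalArgumentException(
                "expected a numeric range type but got [" + source.rangeType() + "]");
        }
        this.name = name;
        this.source = source;
    }
}

/** Hypothetical registry: with the check inside the constructor, registration is just Ctor::new. */
final class AggregatorRegistrySketch {
    static final BiFunction<String, RangeSourceSketch, RangeHistogramSketchAgg> SUPPLIER =
        RangeHistogramSketchAgg::new;
}
```

If the check stayed in a separate factory method, registration would need a lambda that validates first and then calls the constructor.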

```java
Aggregator parent,
boolean collectsFromSingleBucket,
Map<String, Object> metadata) throws IOException {
if (collectsFromSingleBucket == false) {
```
Member Author

This is the important line!
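For contrast, a hypothetical sketch of the per-parent wrapping that `asMultiBucketAggregator` performs for an aggregator that does not collect from a single bucket (all names are invented, not the Elasticsearch API): the wrapper lazily creates a separate delegate for every owning bucket ordinal it sees, and each delegate carries its own storage. That per-instance overhead is what collecting for many parents natively avoids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical minimal collector: records a doc under a parent bucket ordinal. */
interface BucketCollectorSketch {
    void collect(int doc, long owningBucketOrd);
}

/**
 * Hypothetical sketch of per-parent wrapping: a delegate that only
 * understands owning bucket 0 is created lazily, once for every owning
 * bucket ordinal the wrapper sees.
 */
final class PerParentWrapperSketch implements BucketCollectorSketch {
    private final Supplier<BucketCollectorSketch> factory;
    private final List<BucketCollectorSketch> delegates = new ArrayList<>();

    PerParentWrapperSketch(Supplier<BucketCollectorSketch> factory) {
        this.factory = factory;
    }

    @Override
    public void collect(int doc, long owningBucketOrd) {
        int slot = Math.toIntExact(owningBucketOrd);
        while (delegates.size() <= slot) {
            delegates.add(null); // placeholder until this parent sees a doc
        }
        BucketCollectorSketch delegate = delegates.get(slot);
        if (delegate == null) {
            delegate = factory.get(); // a whole new instance per parent bucket
            delegates.set(slot, delegate);
        }
        // Every delegate believes it is collecting for a single bucket, ordinal 0.
        delegate.collect(doc, 0);
    }

    /** Number of delegate instances actually created. */
    long instances() {
        return delegates.stream().filter(d -> d != null).count();
    }
}
```

Under a parent with many buckets, the wrapper's instance count grows with the number of parent buckets that receive documents, whereas a native implementation keeps one instance and one shared ordinal table.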

```java
fieldType.setName("field");
try (IndexReader reader = w.getReader()) {
    IndexSearcher searcher = new IndexSearcher(reader);
    InternalHistogram histogram = search(searcher, new MatchAllDocsQuery(), aggBuilder, fieldType);
```
Member Author

We do this kind of thing all over the place, so I figured I'd make a utility method for it.

```java
Releasables.close(releasables);
releasables.clear();
}
```

Contributor

Nice touch. Likely usable across many future tests.
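Calling `Releasables.close(releasables)` and then `releasables.clear()` suggests a helper that tracks resources for the duration of a test. A minimal sketch of that pattern using plain `AutoCloseable`; `TrackedResources` is hypothetical and not part of the Elasticsearch test framework.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical test helper: resources opened during a test are
 * registered here, then closed together afterwards.
 */
final class TrackedResources {
    private final List<AutoCloseable> resources = new ArrayList<>();

    /** Registers a resource and returns it so it can be used inline. */
    <T extends AutoCloseable> T track(T resource) {
        resources.add(resource);
        return resource;
    }

    /**
     * Closes every tracked resource even if some fail, rethrowing the first
     * failure with later ones attached as suppressed, then clears the list.
     */
    void closeAll() throws Exception {
        Exception first = null;
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        resources.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

Clearing the list after closing means a second `closeAll()` is a no-op, so the helper can run in a per-test teardown without double-closing anything.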

@nik9000
Member Author

nik9000 commented May 29, 2020

I'm pulling some performance numbers for this. I'll likely merge before they're done and post them here once they come in. I'm fairly confident in the change, though.

nik9000 merged commit 460b204 into elastic:master May 29, 2020
@nik9000
Member Author

nik9000 commented May 29, 2020

About a 38% drop in median service time (3575.41 ms to 2200.06 ms) in the test that I ran:

Before:

|                        Metric |             Task |       Value |   Unit |
|------------------------------:|-----------------:|------------:|-------:|
|                    error rate |            index |           0 |      % |
|                Min Throughput | date_histo_histo |        0.28 |  ops/s |
|             Median Throughput | date_histo_histo |        0.28 |  ops/s |
|                Max Throughput | date_histo_histo |        0.28 |  ops/s |
|       50th percentile latency | date_histo_histo |     23599.5 |     ms |
|       90th percentile latency | date_histo_histo |     35095.1 |     ms |
|      100th percentile latency | date_histo_histo |     37977.2 |     ms |
|  50th percentile service time | date_histo_histo |     3575.41 |     ms |
|  90th percentile service time | date_histo_histo |     3605.77 |     ms |
| 100th percentile service time | date_histo_histo |     3659.23 |     ms |
|                    error rate | date_histo_histo |           0 |      % |

After:

|                        Metric |             Task |       Value |   Unit |
|------------------------------:|-----------------:|------------:|-------:|
|                Min Throughput | date_histo_histo |        0.33 |  ops/s |
|             Median Throughput | date_histo_histo |        0.34 |  ops/s |
|                Max Throughput | date_histo_histo |        0.34 |  ops/s |
|       50th percentile latency | date_histo_histo |     2200.88 |     ms |
|       90th percentile latency | date_histo_histo |     2222.86 |     ms |
|      100th percentile latency | date_histo_histo |     2245.81 |     ms |
|  50th percentile service time | date_histo_histo |     2200.06 |     ms |
|  90th percentile service time | date_histo_histo |     2222.01 |     ms |
| 100th percentile service time | date_histo_histo |     2244.96 |     ms |
|                    error rate | date_histo_histo |           0 |      % |
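The 38% figure corresponds to the median service time rows. A small illustrative helper (`BenchmarkDelta` is hypothetical) reproduces the arithmetic from the two tables: median service time changes by about -38.5%, median throughput by about +21.4%, and median latency by about -90.7%.

```java
/** Illustrative helper: relative change from a "before" reading to an "after" reading. */
final class BenchmarkDelta {
    /** Returns (after - before) / before, expressed as a percentage. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Median rows copied from the before/after tables.
        System.out.printf("median service time: %+.1f%%%n", percentChange(3575.41, 2200.06));
        System.out.printf("median throughput:   %+.1f%%%n", percentChange(0.28, 0.34));
        System.out.printf("median latency:      %+.1f%%%n", percentChange(23599.5, 2200.88));
    }
}
```

Assuming these tables come from Rally (the layout matches its summary report), latency also counts time a request waited to be issued while the benchmark could not keep up with its target throughput, which is why latency falls much further than service time.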

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 29, 2020
nik9000 added a commit that referenced this pull request May 29, 2020
…7377)

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 29, 2020
nik9000 added a commit that referenced this pull request May 29, 2020

Labels

`:Analytics/Aggregations` (Aggregations), `>enhancement`, `Team:Analytics` (meta label for the analytical engine team: ESQL/Aggs/Geo), `v7.9.0`, `v8.0.0-alpha1`


4 participants