Multi-bucket aggregator wrapper is slow and uses a ton of memory

Before 7.9.0, many of our more complex aggregations made a simplifying assumption that required them to duplicate many data structures once per bucket that contained them. The most expensive of these weighed in at a couple of kilobytes each. Take an aggregation like:
```
POST _search
{
  "aggs": {
    "date": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "day" },
      "aggs": {
        "ips": {
          "terms": { "field": "ip" }
        }
      }
    }
  }
}
```

When run over three years of data, it spends a couple of megabytes just on bucket accounting. More deeply nested aggregations spend even more on this overhead. 7.9.0 removes all of it, which should allow us to run better in lower-memory environments.
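
To make the "couple of megabytes" figure concrete, here is a rough back-of-the-envelope sketch. It assumes exactly 2 KiB of duplicated structures per daily bucket and ignores leap days; both are illustrative assumptions rather than measurements, and the class and method names are hypothetical.

```java
// Back-of-the-envelope estimate; the 2 KiB figure is an assumption, not a measurement.
final class BucketOverheadEstimate {
    /** Assumed per-bucket duplication cost ("a couple of kilobytes"). */
    static final long ASSUMED_BYTES_PER_BUCKET = 2 * 1024;

    /** Number of daily date_histogram buckets over whole years, ignoring leap days. */
    static long dailyBuckets(int years) {
        return 365L * years;
    }

    /** Total bytes spent if every bucket duplicates the same structures. */
    static long overheadBytes(long buckets, long bytesPerBucket) {
        return Math.multiplyExact(buckets, bytesPerBucket);
    }

    public static void main(String[] args) {
        long buckets = dailyBuckets(3);
        long bytes = overheadBytes(buckets, ASSUMED_BYTES_PER_BUCKET);
        // prints "1095 buckets -> 2242560 bytes"
        System.out.println(buckets + " buckets -> " + bytes + " bytes");
    }
}
```

Under those assumptions, three years of daily buckets cost a little over 2 MiB before a single term is counted, and each extra level of nesting multiplies the bucket count again.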

As a bonus we wrote quite a few Rally [benchmarks](https://github.com/elastic/rally-tracks/) for aggs to make sure that these changes didn't slow down aggregations, so we can now think much more scientifically about aggregation performance. The benchmarks suggest that these changes don't affect simple aggregation trees and speed up complex aggregation trees of similar or higher depth than the example above. Your actual performance changes will vary, but this should help! :crossed_fingers:

EDIT:
Everything above the EDIT mark was added when I tagged this `release highlight` so it could be more easily understood in context.

#55873 removed the "multi-bucket wrapper" from the numeric terms aggregator and showed that we can get a pretty substantial performance improvement in some common aggregation requests. This will track work to remove the wrapper for other aggregations because:
1. I expect we can get a similar or better performance improvement for each one.
2. The wrapper makes it very difficult to reason about aggregations.
3. This will give us a good excuse to add Rally tracks for these aggregations.
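
As a rough illustration of why the wrapper is costly, the toy model below contrasts the two shapes: one complete child structure per owning bucket versus a single shared structure keyed by the owning bucket's ordinal. This is a sketch only. Every class and method name here is hypothetical, and the real aggregators are considerably more involved.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not Elasticsearch code.

/** Wrapper style: a complete child structure (here, a map) per owning bucket. */
final class WrappedTermsCounter {
    private final List<Map<String, Long>> perBucket = new ArrayList<>();

    void collect(int owningBucketOrd, String term) {
        // Build a whole new child structure the first time a bucket is seen.
        while (perBucket.size() <= owningBucketOrd) {
            perBucket.add(new HashMap<>());
        }
        perBucket.get(owningBucketOrd).merge(term, 1L, Long::sum);
    }

    long count(int owningBucketOrd, String term) {
        if (owningBucketOrd >= perBucket.size()) {
            return 0L;
        }
        return perBucket.get(owningBucketOrd).getOrDefault(term, 0L);
    }

    /** Grows with the number of owning buckets. */
    int structuresAllocated() {
        return perBucket.size();
    }
}

/** Single-aggregator style: one shared structure keyed by (owningBucketOrd, term). */
final class SingleTermsCounter {
    private final Map<SimpleImmutableEntry<Integer, String>, Long> counts = new HashMap<>();

    void collect(int owningBucketOrd, String term) {
        counts.merge(new SimpleImmutableEntry<>(owningBucketOrd, term), 1L, Long::sum);
    }

    long count(int owningBucketOrd, String term) {
        return counts.getOrDefault(new SimpleImmutableEntry<>(owningBucketOrd, term), 0L);
    }

    /** Always one, no matter how many owning buckets there are. */
    int structuresAllocated() {
        return 1;
    }
}
```

Both classes return the same counts for the same input. The difference is that the wrapper style allocates one structure per owning bucket, while the single-aggregator style keeps its fixed costs constant and only grows with the data it actually collects.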

* [x] string `terms` (#57207 + #57361 + #57397 + #57438 + #57758)
* [x] `significant_terms` (#56789 + #57207 + #57361 + #57397 + #57438 + #57758)
* [x] `rare_terms` (#57948)
* [x] `date_histogram` (#56921)
* [x] `auto_date_histogram` (#57304)
* [x] `histogram` (#57277)
* [x] `parent` (#57490 + #57892)
* [x] `child` (#57490 + #57892)
* [x] `geohash_grid` (#57483)
* [x] `geotile_grid` (#57483)
* [x] `scripted_metric` (#57627)
* [x] `significant_text` (#57903 + #58145)

After this is all done we can:
* [x] Remove `significant_terms`'s "funny" reference back to its factory for caching. We won't need it because there'll only ever be one aggregator, so it can do the caching itself. (#57903)
* [x] ~~Look into non-`BigArrays` backed memory usage in aggs. This is more important now that we don't get the 5k "artificial" value added to the breaker per bucket.~~ Moved to #59892
* [x] Replace `descendsFromBucketAggregator(parent)` with `collectsFromSingleBucket`. (#58571)
* [x] Look into replacing "lego-ed" data structures with purpose built ones. (7.10: #59740)

Multi-bucket aggregator wrapper is slow and uses a ton of memory #56487
