Multi-bucket aggregator wrapper is slow and uses a ton of memory #56487

@nik9000

Before 7.9.0, many of our more complex aggregations made a simplifying assumption that required them to duplicate many data structures once per bucket that contained them. The most expensive of these weighed in at a couple of kilobytes each. Consider an aggregation like:

POST _search
{
  "aggs": {
    "date": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "day" },
      "aggs": {
        "ips": {
          "terms": { "field": "ip" }
        }
      }
    }
  }
}

Run over three years of data, this request spends a couple of megabytes just on bucket accounting. More deeply nested aggregations spend even more on this overhead. 7.9.0 removes all of it, which should allow us to run better in lower-memory environments.
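The "couple of megabytes" figure can be checked with a rough calculation. This is only a sketch: it assumes "a couple of kilobytes" means 2 KiB per bucket and a 365-day year, and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope estimate of the per-bucket accounting overhead.
# Assumptions (not from the issue): 2 KiB per bucket, 365 days per year.
BYTES_PER_BUCKET = 2 * 1024
DAYS_PER_YEAR = 365


def wrapper_overhead_bytes(years: int, bytes_per_bucket: int = BYTES_PER_BUCKET) -> int:
    """Bytes spent on duplicated structures, one set per daily bucket."""
    daily_buckets = years * DAYS_PER_YEAR
    return daily_buckets * bytes_per_bucket


overhead = wrapper_overhead_bytes(3)
print(f"{overhead / (1024 * 1024):.2f} MiB")  # prints "2.14 MiB"
```

Three years of daily buckets is about 1,095 buckets, so at roughly 2 KiB each the duplicated structures alone come to a little over 2 MiB, before any deeper nesting multiplies the count.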

As a bonus, we wrote quite a few Rally benchmarks for aggs to make sure that these changes didn't slow down aggregations. So we can now think much more scientifically about aggregation performance. The benchmarks suggest that these changes don't affect simple aggregation trees and speed up complex aggregation trees of similar or higher depth than the example above. Your actual performance changes will vary, but this should help! 🤞

EDIT:
Everything above the EDIT mark was added when I tagged this release highlight so it could be more easily understood in context.

#55873 removed the "multi-bucket wrapper" from the numeric terms aggregator and showed that we can get a pretty substantial performance improvement in some common aggregation requests. This will track work to remove the wrapper for other aggregations because:

  1. I expect we can get a similar or better performance improvement for each one.
  2. The wrapper makes it very difficult to reason about aggregations.
  3. This will give us a good excuse to add Rally tracks for these aggregations.
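To make the cost of per-bucket duplication concrete, here is a toy sketch. It is not Elasticsearch code: the function names and sample data are hypothetical, and it only contrasts "one collector object per parent bucket" with "one shared collector keyed by the owning bucket".

```python
from collections import Counter

# Hypothetical (date bucket, ip) pairs standing in for documents.
DOCS = [
    ("2020-01-01", "10.0.0.1"),
    ("2020-01-01", "10.0.0.2"),
    ("2020-01-02", "10.0.0.1"),
    ("2020-01-01", "10.0.0.1"),
]


def count_with_wrapper(docs):
    """Wrapper-style: allocate a separate collector for every parent bucket."""
    per_bucket = {}
    for bucket, term in docs:
        collector = per_bucket.setdefault(bucket, Counter())
        collector[term] += 1
    return per_bucket


def count_with_shared_collector(docs):
    """Wrapper-free: a single collector keyed by (owning bucket, term)."""
    counts = Counter()
    for bucket, term in docs:
        counts[(bucket, term)] += 1
    return counts


wrapped = count_with_wrapper(DOCS)
shared = count_with_shared_collector(DOCS)
print(len(wrapped))                        # collectors allocated: one per date bucket
print(shared[("2020-01-01", "10.0.0.1")])  # same count, from one shared structure
```

Both versions produce the same counts. The difference is that the first allocates a fresh structure for each parent bucket, so its fixed overhead grows with the number of buckets, while the second pays that overhead once.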

After this is all done we can:
