
Make sure all significant memory usage in aggs are tracked in BigArrays #59892

@nik9000

Description


When we did #56487, we decided it was important to take an inventory of all the memory that aggregations allocate outside of BigArrays. We'd like to get everything tracked so we're less reliant on the real memory circuit breaker catching problems.
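Here is a minimal sketch of what "tracked" means in this context, assuming a simplified model. Each allocation reserves its size against a breaker *before* it happens, so an oversized request fails fast instead of waiting for the real memory breaker. `SimpleBreaker` and `TrackedBitSet` are hypothetical names invented for illustration. They are not Elasticsearch's `CircuitBreaker` or `BigArrays` APIs. The bit set mirrors the HyperLogLogPlusPlus `OpenBitSet` item in the list: it behaves the same way, but every word it grows is accounted for.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a memory circuit breaker; NOT the Elasticsearch API. */
final class SimpleBreaker {
    private final long limitBytes;
    private long usedBytes;

    SimpleBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserve bytes before allocating; throws (and reserves nothing) if the limit would be exceeded. */
    void reserve(long bytes) {
        if (usedBytes + bytes > limitBytes) {
            throw new IllegalStateException(
                "breaker tripped: would use " + (usedBytes + bytes) + " of " + limitBytes + " bytes");
        }
        usedBytes += bytes;
    }

    /** Give back bytes that were previously reserved. */
    void release(long bytes) {
        usedBytes -= bytes;
    }

    long used() {
        return usedBytes;
    }
}

/** Hypothetical growable bit set that charges the breaker for every long word it adds. */
final class TrackedBitSet implements AutoCloseable {
    private final SimpleBreaker breaker;
    private long[] words = new long[0];

    TrackedBitSet(SimpleBreaker breaker) {
        this.breaker = breaker;
    }

    void set(long index) {
        int wordIndex = Math.toIntExact(index >>> 6); // 64 bits per long word
        if (wordIndex >= words.length) {
            int newLength = Math.max(wordIndex + 1, words.length * 2);
            // Charge only the growth, and do it before allocating.
            breaker.reserve((long) (newLength - words.length) * Long.BYTES);
            words = Arrays.copyOf(words, newLength);
        }
        words[wordIndex] |= 1L << (index & 63);
    }

    boolean get(long index) {
        int wordIndex = Math.toIntExact(index >>> 6);
        return wordIndex < words.length && (words[wordIndex] & (1L << (index & 63))) != 0;
    }

    /** Release everything this bit set reserved. */
    @Override
    public void close() {
        breaker.release((long) words.length * Long.BYTES);
        words = new long[0];
    }
}
```

Releasing the reservation in `close()` matters as much as reserving it. An agg that forgets to close leaks accounted bytes rather than heap. The sketch also ignores the brief window during growth when both the old and the new arrays are live.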

  • DeferringBucketCollector subclasses aren't tracked.
  • matrix_stats's RunningStats has a bunch of HashMaps that aren't being tracked properly. They don't appear to grow much, but under a high enough cardinality parent agg it could get messy.
  • string_stats has a Map<Character, LongArray> which could end up taking a fair bit of untracked space under a high-cardinality agg when the text contains many distinct characters. English text would put roughly 64 entries in that Map. Japanese and Chinese look like they'd consistently see a couple thousand entries. And the char-based approach won't work at all for characters outside the Basic Multilingual Plane (BMP), like Emoji, Egyptian Hieroglyphs, and a few unlucky languages.
  • top_hits creates a bunch of Collectors which aren't tracked by BigArrays. They are all fairly careful with memory, but they can still use a fair bit and we aren't tracking it.
  • TDigestState, HDR histogram, and friends look like they can use a fair bit of untracked memory. We could probably track a maximum for them or something like that (see also Integrate TDigestState with circuit breakers #99815). For HDR, the HDR histogram library needs to be forked first (Fork HdrHistogram library #95904).
  • HyperLogLogPlusPlus has an OpenBitSet which behaves the same as our BitArray but isn't backed by BigArrays (Use standard bit set impl in cardinality #61816).
  • DoubleHistogram and friends are also untracked.
  • ScriptedMetric is totally untracked and frankly terrifying.
  • filters's "compatible" collector can realize a bunch of bit sets in memory (Trigger parent circuit breaker when building scorers in filters aggregation #102511).
  • The reduction phase is all Java-object based (see Enable Circuit Breaker tracking in more parts of the aggregations framework #89437).
  • While global ordinals memory usage is tracked, the process of building them isn't (Check the real memory circuit breaker when building global ordinals #102462).
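The BMP point in the string_stats item comes from how Java represents text. A `char`, and therefore a `Character` map key, is a single UTF-16 code unit. Characters outside the BMP are stored as a surrogate pair of two `char`s. The sketch below uses a hypothetical `CharCounts` helper, not the string_stats implementation, to contrast counting by `char` with counting by code point via `String.codePoints()`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not the string_stats implementation. */
final class CharCounts {
    /** Counts UTF-16 code units, as a Map<Character, ...> keyed by char would. */
    static Map<Character, Long> countChars(String s) {
        Map<Character, Long> counts = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            // Non-BMP characters show up here as two separate surrogate halves.
            counts.merge(s.charAt(i), 1L, Long::sum);
        }
        return counts;
    }

    /** Counts Unicode code points, so characters outside the BMP stay whole. */
    static Map<Integer, Long> countCodePoints(String s) {
        Map<Integer, Long> counts = new HashMap<>();
        s.codePoints().forEach(cp -> counts.merge(cp, 1L, Long::sum));
        return counts;
    }
}
```

For the input "a😀", the per-char map ends up with three keys, two of which are meaningless surrogate halves. The per-code-point map has two keys.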
