[Feature Request] Add a cluster setting for memory threshold to pick OrdinalsCollector in Cardinality aggregation #15269

@rishabhmaurya

Is your feature request related to a problem? Please describe

Currently, the choice between OrdinalsCollector and DirectCollector is made dynamically, based on the number of ordinals to collect and the resulting memory overhead. If the memory overhead of OrdinalsCollector is too high, DirectCollector is used instead. Because of https://issues.apache.org/jira/browse/LUCENE-9663, some users are reporting a regression in the Cardinality aggregation: DirectCollector became slower after Lucene 8.9 replaced prefix compression with LZ4 compression for the terms dictionary.
The proposed fix is to use OrdinalsCollector more often. It first collects the ordinals into a bitset and then performs the term lookups in postCollect() for the segment, which is much faster.
However, OpenSearch currently offers no way to control when OrdinalsCollector is picked.
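
To make the ordinals-first idea concrete, here is a minimal, self-contained sketch. It is not the OpenSearch implementation: the class name `OrdinalsFirstSketch`, the `lookupOrd` function, and the `lookups` counter are hypothetical, and a plain `Set` stands in for the cardinality sketch that a real collector would feed hashed terms into. It only illustrates why deferring term lookups until after collection means each distinct ordinal is resolved once per segment, rather than once per collected value.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntFunction;

// Illustrative only: names and structure are assumptions, not OpenSearch APIs.
final class OrdinalsFirstSketch {
    private final BitSet seen;                  // one bit per ordinal in the segment
    private final IntFunction<String> lookupOrd; // stand-in for the terms-dictionary lookup
    int lookups = 0;                            // counts how many term lookups were performed

    OrdinalsFirstSketch(int maxOrd, IntFunction<String> lookupOrd) {
        this.seen = new BitSet(maxOrd);
        this.lookupOrd = lookupOrd;
    }

    // Called once per collected value: only records the ordinal, no term lookup.
    void collect(int ord) {
        seen.set(ord);
    }

    // Called once per segment: resolves each distinct ordinal to its term exactly once.
    Set<String> postCollect() {
        Set<String> distinct = new HashSet<>();
        for (int ord = seen.nextSetBit(0); ord >= 0; ord = seen.nextSetBit(ord + 1)) {
            lookups++;
            distinct.add(lookupOrd.apply(ord));
        }
        return distinct;
    }
}
```

In this sketch, collecting the ordinals 0, 2, 2, 0, 2 triggers only two lookups in `postCollect()`, whereas a direct approach would perform a lookup for each of the five collected values.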

Describe the solution you'd like

Introduce a dynamic memory threshold setting for OrdinalsCollector: if its estimated memory usage is under this threshold, always use OrdinalsCollector. This logic can be added here.
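
A rough sketch of how such a threshold could slot into the selection logic is shown below. Everything here is an assumption for illustration: the class `CollectorPicker`, the `thresholdBytes` parameter (standing in for the value of the proposed dynamic setting), the one-bit-per-ordinal memory estimate, and the one-quarter fallback ratio are not taken from this issue and may not match the actual OpenSearch code.

```java
// Illustrative only: names, the memory estimate, and the fallback ratio are assumptions.
final class CollectorPicker {
    enum Kind { ORDINALS, DIRECT }

    // Assumed estimate: one bit per ordinal, rounded up to whole 64-bit words.
    static long ordinalsOverheadBytes(long maxOrd) {
        return ((maxOrd + 63) / 64) * Long.BYTES;
    }

    // thresholdBytes represents the proposed dynamic setting (hypothetical parameter).
    // countsMemoryBytes represents the memory already used by the cardinality counts.
    static Kind pick(long maxOrd, long countsMemoryBytes, long thresholdBytes) {
        long ordinalsBytes = ordinalsOverheadBytes(maxOrd);

        // Proposed rule: under the threshold, always use the ordinals-based collector.
        if (ordinalsBytes <= thresholdBytes) {
            return Kind.ORDINALS;
        }
        // Placeholder for the existing dynamic heuristic: use ordinals only when
        // their overhead is small relative to the counts (ratio is an assumption).
        if (ordinalsBytes < countsMemoryBytes / 4) {
            return Kind.ORDINALS;
        }
        return Kind.DIRECT;
    }
}
```

With a threshold of 0 this sketch falls back entirely to the relative-overhead heuristic, so a default of 0 would preserve the existing behavior while letting operators raise the value to favor OrdinalsCollector.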

Related component

Search:Aggregations

Describe alternatives you've considered

Using eager_global_ordinals or a murmur hash. However, these are index-time settings and can affect indexing performance. This is discussed in more detail here.

Additional context

No response
