Description
Is your feature request related to a problem? Please describe
Currently, the choice between OrdinalsCollector and DirectCollector is made dynamically, based on the number of ordinals to collect and the memory overhead: if the memory overhead of OrdinalsCollector is high, DirectCollector is used. Because of https://issues.apache.org/jira/browse/LUCENE-9663, a few users are reporting a regression in the cardinality aggregation: DirectCollector became slower after Lucene 8.9 replaced prefix compression with LZ4 compression for the terms dictionary.
The fix proposed here is to use OrdinalsCollector more often. It first collects the ordinals into a bitset and then performs the term lookups in the segment's postCollect(), which is a lot faster.
However, OpenSearch currently has no way to control when OrdinalsCollector is picked.
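To illustrate why collecting ordinals first can be cheaper, here is a minimal Python sketch of the two strategies as described above. It is not the OpenSearch implementation: `direct_collect`, `ordinals_collect` and `lookup_term` are hypothetical names, and a plain set stands in for the hashing and HyperLogLog++ counting done by the real collectors. The only point is the number of term lookups each approach needs.

```python
def direct_collect(doc_ordinals, lookup_term):
    """Resolve the term for every (doc, ordinal) pair while collecting."""
    seen_terms = set()
    lookups = 0
    for ords in doc_ordinals:
        for ord_ in ords:
            seen_terms.add(lookup_term(ord_))  # one dictionary lookup per value
            lookups += 1
    return seen_terms, lookups


def ordinals_collect(doc_ordinals, lookup_term, max_ord):
    """Mark ordinals in a bitset while collecting, then resolve each distinct
    ordinal once in a post-collect step."""
    bits = bytearray((max_ord + 7) // 8)  # one bit per possible ordinal
    for ords in doc_ordinals:
        for ord_ in ords:
            bits[ord_ >> 3] |= 1 << (ord_ & 7)

    # Post-collect phase: one lookup per distinct ordinal that was seen.
    seen_terms = set()
    lookups = 0
    for ord_ in range(max_ord):
        if bits[ord_ >> 3] & (1 << (ord_ & 7)):
            seen_terms.add(lookup_term(ord_))
            lookups += 1
    return seen_terms, lookups


# Hypothetical segment: three terms, five documents.
terms = ["a", "b", "c"]
docs = [[0], [1], [0], [0, 2], [1]]

direct_terms, direct_lookups = direct_collect(docs, terms.__getitem__)
ordinal_terms, ordinal_lookups = ordinals_collect(docs, terms.__getitem__, len(terms))
# Both strategies see the same terms; direct needs 6 lookups, ordinals needs 3.
```

The bitset costs memory proportional to the number of ordinals in the segment, which is the overhead the existing selection logic tries to avoid.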
Describe the solution you'd like
Introduce a dynamic memory-threshold setting for OrdinalsCollector: if its memory usage is under this threshold, always use OrdinalsCollector. This logic can be added here.
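A rough sketch of the selection logic this would give, written in Python for brevity rather than as the actual Java change. The function and parameter names are hypothetical, the outcome of the existing heuristic is passed in as a boolean instead of being reimplemented, and treating a threshold of 0 as "override disabled" is an assumption, not part of the proposal.

```python
def pick_collector(ordinals_overhead_bytes: int,
                   ordinals_threshold_bytes: int,
                   existing_heuristic_prefers_ordinals: bool) -> str:
    """Return "ordinals" or "direct" for a segment.

    ordinals_overhead_bytes: estimated memory OrdinalsCollector would need.
    ordinals_threshold_bytes: value of the proposed dynamic setting
        (hypothetical; 0 means the override never applies).
    existing_heuristic_prefers_ordinals: result of today's dynamic check.
    """
    # Proposed override: under the threshold, always use OrdinalsCollector.
    if ordinals_overhead_bytes < ordinals_threshold_bytes:
        return "ordinals"
    # Otherwise fall back to the current behaviour unchanged.
    return "ordinals" if existing_heuristic_prefers_ordinals else "direct"


# Example: a 1 KiB overhead with a 4 KiB threshold forces OrdinalsCollector
# even though the existing heuristic would have picked DirectCollector.
choice = pick_collector(1024, 4096, existing_heuristic_prefers_ordinals=False)
```

Keeping the existing heuristic as the fallback means the setting can only increase how often OrdinalsCollector is chosen, never reduce it.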
Related component
Search:Aggregations
Describe alternatives you've considered
Using eager_global_ordinals or a murmur hash, but that is an index-time setting and can affect indexing performance. It's discussed in more detail here.
Additional context
No response