Add support for dynamic pruning to cardinality aggregations on low-cardinality keyword fields. #92060
On low-cardinality keyword fields, the `cardinality` aggregation currently uses the `global_ordinals` execution mode most of the time, which consists of collecting all documents that match the query, reading ordinals of the values that these documents contain, and setting bits in a bitset for these ordinals. This commit introduces a feedback loop between the query and the `cardinality` aggregator, which allows the query to skip documents that only contain values that have already been seen by the `cardinality` aggregator. On the `nyc_taxis` dataset, with a `match_all` query and the `vendor_id` field (2 unique values), the `cardinality` aggregation went from 3s to 3ms. The speedup would certainly not be as good in all cases, but I would still expect it to be very significant in many cases.
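To make the feedback loop concrete, here is a minimal, self-contained sketch of the idea. It is not the code from this PR: the class `PruningSketch`, the toy in-memory postings map, and the single-valued `ordByDoc` array are all assumptions made for illustration. It compares brute-force collection with a collector that only visits documents containing an ordinal it has not seen yet.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and data layout are not taken from Elasticsearch.
final class PruningSketch {

    /** Brute force: visit every doc and set the bit of its ordinal. Returns {cardinality, docsVisited}. */
    static int[] bruteForce(int[] ordByDoc, int maxOrd) {
        BitSet visited = new BitSet(maxOrd);
        for (int ord : ordByDoc) {
            visited.set(ord);
        }
        return new int[] { visited.cardinality(), ordByDoc.length };
    }

    /** Pruned: only visit docs that contain an ordinal that has not been seen yet. */
    static int[] pruned(int[] ordByDoc, int maxOrd) {
        // Toy inverted index: ordinal -> sorted list of doc IDs that contain it.
        Map<Integer, List<Integer>> postings = new HashMap<>();
        for (int doc = 0; doc < ordByDoc.length; doc++) {
            postings.computeIfAbsent(ordByDoc[doc], k -> new ArrayList<>()).add(doc);
        }
        BitSet visited = new BitSet(maxOrd);
        int docsVisited = 0;
        int target = 0;
        while (true) {
            // Smallest doc >= target across the postings of all unseen ordinals.
            int next = Integer.MAX_VALUE;
            for (Map.Entry<Integer, List<Integer>> e : postings.entrySet()) {
                if (visited.get(e.getKey())) {
                    continue; // this ordinal is already counted, its docs are not competitive
                }
                for (int doc : e.getValue()) {
                    if (doc >= target) {
                        next = Math.min(next, doc);
                        break;
                    }
                }
            }
            if (next == Integer.MAX_VALUE) {
                break; // no remaining doc can contribute a new ordinal
            }
            visited.set(ordByDoc[next]);
            docsVisited++;
            target = next + 1;
        }
        return new int[] { visited.cardinality(), docsVisited };
    }
}
```

With 1,000 documents alternating between two ordinals, both methods report a cardinality of 2, but the pruned variant stops after visiting two documents, which is the effect described above for `vendor_id`.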
Hi @jpountz, I've created a changelog YAML for you.
Pinging @elastic/es-analytics-geo (Team:Analytics)
I've run some benchmarks on a synthetically generated index that has 100M documents with the following values for the i-th indexed document:
Then the following queries:
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "k2_cardinality": {
      "cardinality": {
        "field": "k2",
        "execution_hint": "global_ordinals"
      }
    }
  }
}

{
  "size": 0,
  "track_total_hits": false,
  "query": {
    "term": {
      "k1": "0"
    }
  },
  "aggs": {
    "k2_cardinality": {
      "cardinality": {
        "field": "k2",
        "execution_hint": "global_ordinals"
      }
    }
  }
}

Here is what Rally reported:
The query on … I also pushed some more tests that make sure that the optimization kicks in by using the debug info, so I think it's ready for someone to have a closer look at how it works.
Sorry it's taken me so long to look at this. I think this is fine to merge.
// we'd be paying the overhead of dynamic pruning without getting any benefits.
private static final int MAX_FIELD_CARDINALITY_FOR_DYNAMIC_PRUNING = 1024;

// Only start dynamic pruning when 128 ordinals or less have not been seen yet.
I'm confused by this comment. Do we prune when we have less than 128 or more than 128 ordinals?
The way I understand this is that we prune if we have less than or equal to 128 unseen ordinals.
I think we should get this change merged. LGTM
void startPruning() throws IOException {
    dynamicPruningSuccess++;
    nonVisitedOrds = new HashMap<>();
    // TODO: iterate the bitset using a `nextClearBit` operation?
Letting this loop be led by `nextClearBit()` is a good idea 👍
I think we can add that method to the bitset and make use of it in a follow-up change.
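As a rough sketch of what a loop led by `nextClearBit` could look like, the example below uses `java.util.BitSet` as a stand-in, since the bitset class used in this PR does not have that method yet. The class `UnseenOrdinals` and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper; java.util.BitSet stands in for the aggregator's own bitset.
final class UnseenOrdinals {

    /** Returns every ordinal in [0, maxOrd) whose bit is still clear in visitedOrds. */
    static List<Integer> unseen(BitSet visitedOrds, int maxOrd) {
        List<Integer> result = new ArrayList<>();
        // nextClearBit jumps straight to the next unset bit instead of testing each ordinal in turn.
        for (int ord = visitedOrds.nextClearBit(0); ord < maxOrd; ord = visitedOrds.nextClearBit(ord + 1)) {
            result.add(ord);
        }
        return result;
    }
}
```

When most ordinals have already been visited, this skips over long runs of set bits rather than checking each one.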
    noData++;
    return LeafBucketCollector.NO_OP_COLLECTOR;
}
// Otherwise we might be aggregating e.g. an IP field, which indexes data using points rather than an inverted index.
Maybe in a follow up change, we can also do the same trick here with points?
final CompetitiveIterator competitiveIterator;

{
    // This optimization only works for top-level cardinality aggregations that collect bucket 0, so we can retrieve
If this is the only current limitation, then I think we can get around it by creating `CompetitiveIterator` instances lazily? We would then need one `CompetitiveIterator` per bucket ordinal. Not sure if this can work, but it could be explored in a follow-up PR.
    this.docsWithField = docsWithField;
}

private Map<Long, PostingsEnum> nonVisitedOrds;
Given that this optimisation is limited in its use (only on fields with <= 1024 terms), there is no need to use `LongObjectPagedHashMap` here.
if (indexTerms != null) {
    BitArray bits = visitedOrds.get(0);
    final int numNonVisitedOrds = maxOrd - (bits == null ? 0 : (int) bits.cardinality());
    if (maxOrd <= MAX_FIELD_CARDINALITY_FOR_DYNAMIC_PRUNING || numNonVisitedOrds <= MAX_TERMS_FOR_DYNAMIC_PRUNING) {
So the optimisation also kicks in on fields with more than 1024 unique values, if there are 128 or fewer terms left to be processed. The brute force leaf bucket collector implementation does update the bit sets used here to determine `numNonVisitedOrds`.
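The condition discussed here can be isolated into a small predicate. This is a sketch rather than the PR's code: the class and method names are made up, and the value 128 for `MAX_TERMS_FOR_DYNAMIC_PRUNING` is an assumption inferred from the code comment quoted earlier in the review.

```java
// Hypothetical extraction of the threshold check; not the actual aggregator class.
final class PruningDecision {
    static final int MAX_FIELD_CARDINALITY_FOR_DYNAMIC_PRUNING = 1024;
    // Assumed value, inferred from the "128 ordinals or less" comment.
    static final int MAX_TERMS_FOR_DYNAMIC_PRUNING = 128;

    /**
     * @param maxOrd         number of distinct ordinals of the field in the segment
     * @param numVisitedOrds number of ordinals already marked in the visited bitset
     */
    static boolean shouldStartPruning(int maxOrd, int numVisitedOrds) {
        int numNonVisitedOrds = maxOrd - numVisitedOrds;
        return maxOrd <= MAX_FIELD_CARDINALITY_FOR_DYNAMIC_PRUNING
            || numNonVisitedOrds <= MAX_TERMS_FOR_DYNAMIC_PRUNING;
    }
}
```

Under these assumptions, a field with 5,000 ordinals does not qualify while nothing has been seen, but it does once 4,872 of them have been visited by brute-force collection, because only 128 remain.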
This improvement also yielded a significant speedup for another benchmark that is under development. This search request counts the number of unique deployments using the
This search request has a top level
* Fix cardinality agg for const_keyword

  const_keyword fields don't show up in the leafReader, since they have a const value. #92060 modified the logic to return no results in case the leaf reader contains no information about the requested field in a cardinality aggregation. This is wrong for const_keyword fields, as they contain up to 1 distinct value. To fix this, we fall back to the old logic in this case that can handle const_keyword fields properly.

  Fixes #99776
* Update docs/changelog/99814.yaml
* Update skip ranges for broken releases.
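The shape of that fallback can be sketched as a small decision function. Everything here is hypothetical: the enum, the class name, and the `mayHaveValues` flag are invented for illustration, and the real fix may detect const_keyword fields differently.

```java
// Hypothetical sketch of the collector choice; not the actual Elasticsearch logic.
final class LeafCollectorChoice {
    enum Strategy { NO_OP, ORDINALS, DYNAMIC_PRUNING_CANDIDATE }

    /**
     * @param hasIndexedTerms whether the leaf reader exposes terms for the field
     * @param mayHaveValues   assumed flag: the field can still produce values in this
     *                        segment even without terms (e.g. a const_keyword field)
     */
    static Strategy choose(boolean hasIndexedTerms, boolean mayHaveValues) {
        if (hasIndexedTerms) {
            return Strategy.DYNAMIC_PRUNING_CANDIDATE;
        }
        // Without terms, returning a no-op collector is only safe if the field truly has no values.
        return mayHaveValues ? Strategy.ORDINALS : Strategy.NO_OP;
    }
}
```

The point of the sketch is the middle case: a field with no terms in the leaf reader but with a value should go through the older ordinals-based path instead of being treated as empty.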