Add support for dynamic pruning to cardinality aggregations on low-cardinality keyword fields. #92060

Merged: 14 commits merged into elastic:main from the cardinality_dynamic_pruning branch on May 1, 2023

Conversation

@jpountz (Contributor) commented Dec 2, 2022

On low-cardinality keyword fields, the cardinality aggregation currently uses the global_ordinals execution mode most of the time, which consists of collecting all documents that match the query, reading ordinals of the values that these documents contain, and setting bits in a bitset for these ordinals.

This commit introduces a feedback loop between the query and the cardinality aggregator, which allows the query to skip documents that only contain values that have already been seen by the cardinality aggregator. On the nyc_taxis dataset, with a match_all query and the vendor_id field (2 unique values), the cardinality aggregation went from 3s to 3ms. The speedup would certainly not be as good in all cases, but I would still expect it to be very significant in many cases.
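
To illustrate the mechanism, here is a minimal, self-contained sketch in plain Java (hypothetical class and variable names, not the actual Elasticsearch/Lucene code): the aggregator records the global ordinals it has already seen in a bitset, and once every ordinal of the field has been observed, no remaining document can change the result, so collection can stop.

    import java.util.BitSet;
    import java.util.List;

    public class DynamicPruningSketch {
        public static void main(String[] args) {
            // Hypothetical segment: each document carries the ordinals of its keyword values.
            List<int[]> docOrdinals = List.of(
                new int[] { 0 }, new int[] { 1 }, new int[] { 0 }, new int[] { 1 }, new int[] { 0 }
            );
            int maxOrd = 2; // number of unique values of the field, e.g. vendor_id

            BitSet visitedOrds = new BitSet(maxOrd);
            int visited = 0;
            for (int[] ords : docOrdinals) {
                // Feedback loop: once every ordinal has been seen, no further document can
                // change the result, so the remaining documents can be skipped.
                if (visitedOrds.cardinality() == maxOrd) {
                    break;
                }
                for (int ord : ords) {
                    visitedOrds.set(ord);
                }
                visited++;
            }
            System.out.println("cardinality = " + visitedOrds.cardinality() + ", docs visited = " + visited);
        }
    }

As the review threads below show, the actual implementation goes further: it hands the query a CompetitiveIterator backed by the postings of the not-yet-seen terms, so documents containing only already-seen values are skipped during collection rather than merely short-circuited at the end.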

Add support for dynamic pruning to cardinality aggregations on low-cardinality keyword fields.

On low-cardinality keyword fields, the `cardinality` aggregation currently uses
the `global_ordinals` execution mode most of the time, which consists of
collecting all documents that match the query, reading ordinals of the values
that these documents contain, and setting bits in a bitset for these ordinals.

This commit introduces a feedback loop between the query and the `cardinality`
aggregator, which allows the query to skip documents that only contain values
that have already been seen by the `cardinality` aggregator. On the `nyc_taxis`
dataset, with a `match_all` query and the `vendor_id` field (2 unique values),
the `cardinality` aggregation went from 3s to 3ms. The speedup would certainly
not be as good in all cases, but I would still expect it to be very significant
in many cases.
@elasticsearchmachine (Collaborator) commented:

Hi @jpountz, I've created a changelog YAML for you.

@jpountz jpountz marked this pull request as ready for review December 2, 2022 13:56
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 2, 2022
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-analytics-geo (Team:Analytics)

@jpountz (Contributor, Author) commented Dec 7, 2022

I've run some benchmarks on a synthetically generated index that has 100M documents with the following values for the i-th indexed document (a small generation sketch follows the list):

  • k0 is i % 1000 < 50
  • k1 is i % 500 < 25
  • k2 is i % 500
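
For reference, a minimal sketch of how such a bulk payload could be generated. This is only an illustration of the data described above: the `synthetic` index name and the use of `_bulk` NDJSON are my assumptions (the PR does not include the generation script), and the boolean conditions are encoded as the keyword strings "0"/"1" to match the term queries below.

    public class GenerateSyntheticDocs {
        public static void main(String[] args) {
            long numDocs = 100_000_000L; // 100M documents, as in the benchmark
            for (long i = 0; i < numDocs; i++) {
                // Keyword values as described above, booleans encoded as "0"/"1".
                String k0 = (i % 1000 < 50) ? "1" : "0";
                String k1 = (i % 500 < 25) ? "1" : "0";
                String k2 = Long.toString(i % 500);
                // Two NDJSON lines per document for the _bulk API; "synthetic" is a made-up index name.
                System.out.println("{\"index\":{\"_index\":\"synthetic\"}}");
                System.out.println("{\"k0\":\"" + k0 + "\",\"k1\":\"" + k1 + "\",\"k2\":\"" + k2 + "\"}");
            }
        }
    }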

Then the following queries:

match_all-cardinality:

      {
        "size": 0,
        "track_total_hits": false,
        "aggs": {
          "k2_cardinality": {
            "cardinality": {
              "field": "k2",
              "execution_hint": "global_ordinals"
            }
          }
        }
      }

k0-cardinality. This runs a cardinality aggregation on a term query that would collect every unique value of k2, so it should be able to exit early. This is meant to simulate computing the cardinality of the agents sending data over a date range, since all agents are expected to show up given a sufficiently long time range.

      {
        "size": 0,
        "track_total_hits": false,
        "query": {
          "term": {
            "k0": "0"
          }
        },
        "aggs": {
          "k2_cardinality": {
            "cardinality": {
              "field": "k2",
              "execution_hint": "global_ordinals"
            }
          }
        }
      }

k1-cardinality. This runs a cardinality aggregation on a term query that does NOT see every value of k2. So it can prune some hits, but it cannot end early: it needs to go through the whole index. This would typically happen when there is a correlation between the query and the aggregated field, e.g. computing the cardinality of agents that reported errors: some monitored hosts are probably more likely to have errors than others.

      {
        "size": 0,
        "track_total_hits": false,
        "query": {
          "term": {
            "k1": "0"
          }
        },
        "aggs": {
          "k2_cardinality": {
            "cardinality": {
              "field": "k2",
              "execution_hint": "global_ordinals"
            }
          }
        }
      }

Here is what Rally reported (baseline = without this change, contender = with it):

| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|                                                Min Throughput | match_all-cardinality |    0.57575     | 444.294       |   443.718   |  ops/s | +77067.82% |
|                                               Mean Throughput | match_all-cardinality |    0.578541    | 444.294       |   443.715   |  ops/s | +76695.54% |
|                                             Median Throughput | match_all-cardinality |    0.57887     | 444.294       |   443.715   |  ops/s | +76651.97% |
|                                                Max Throughput | match_all-cardinality |    0.580407    | 444.294       |   443.713   |  ops/s | +76448.66% |
|                                       50th percentile latency | match_all-cardinality | 1719.91        |   1.88433     | -1718.02    |     ms |    -99.89% |
|                                       90th percentile latency | match_all-cardinality | 1753.1         |   2.1355      | -1750.96    |     ms |    -99.88% |
|                                      100th percentile latency | match_all-cardinality | 1791.7         |   2.63628     | -1789.06    |     ms |    -99.85% |
|                                  50th percentile service time | match_all-cardinality | 1719.91        |   1.88433     | -1718.02    |     ms |    -99.89% |
|                                  90th percentile service time | match_all-cardinality | 1753.1         |   2.1355      | -1750.96    |     ms |    -99.88% |
|                                 100th percentile service time | match_all-cardinality | 1791.7         |   2.63628     | -1789.06    |     ms |    -99.85% |
|                                                    error rate | match_all-cardinality |    0           |   0           |     0       |      % |      0.00% |
|                                                Min Throughput |        k0-cardinality |    0.5367      | 473.317       |   472.78    |  ops/s | +88090.11% |
|                                               Mean Throughput |        k0-cardinality |    0.538432    | 473.317       |   472.778   |  ops/s | +87806.55% |
|                                             Median Throughput |        k0-cardinality |    0.538581    | 473.317       |   472.778   |  ops/s | +87782.16% |
|                                                Max Throughput |        k0-cardinality |    0.539811    | 473.317       |   472.777   |  ops/s | +87581.97% |
|                                       50th percentile latency |        k0-cardinality | 1846.59        |   1.83855     | -1844.75    |     ms |    -99.90% |
|                                       90th percentile latency |        k0-cardinality | 1891.9         |   2.06683     | -1889.83    |     ms |    -99.89% |
|                                      100th percentile latency |        k0-cardinality | 1953.52        |   2.39496     | -1951.12    |     ms |    -99.88% |
|                                  50th percentile service time |        k0-cardinality | 1846.59        |   1.83855     | -1844.75    |     ms |    -99.90% |
|                                  90th percentile service time |        k0-cardinality | 1891.9         |   2.06683     | -1889.83    |     ms |    -99.89% |
|                                 100th percentile service time |        k0-cardinality | 1953.52        |   2.39496     | -1951.12    |     ms |    -99.88% |
|                                                    error rate |        k0-cardinality |    0           |   0           |     0       |      % |      0.00% |
|                                                Min Throughput |        k1-cardinality |    0.529033    |   1.876       |     1.34697 |  ops/s |   +254.61% |
|                                               Mean Throughput |        k1-cardinality |    0.532735    |   1.88155     |     1.34881 |  ops/s |   +253.19% |
|                                             Median Throughput |        k1-cardinality |    0.533744    |   1.8789      |     1.34515 |  ops/s |   +252.02% |
|                                                Max Throughput |        k1-cardinality |    0.534619    |   1.89503     |     1.36041 |  ops/s |   +254.46% |
|                                       50th percentile latency |        k1-cardinality | 1871.84        | 535.195       | -1336.64    |     ms |    -71.41% |
|                                       90th percentile latency |        k1-cardinality | 1901.33        | 539.858       | -1361.47    |     ms |    -71.61% |
|                                      100th percentile latency |        k1-cardinality | 1945.91        | 542.866       | -1403.04    |     ms |    -72.10% |
|                                  50th percentile service time |        k1-cardinality | 1871.84        | 535.195       | -1336.64    |     ms |    -71.41% |
|                                  90th percentile service time |        k1-cardinality | 1901.33        | 539.858       | -1361.47    |     ms |    -71.61% |
|                                 100th percentile service time |        k1-cardinality | 1945.91        | 542.866       | -1403.04    |     ms |    -72.10% |
|                                                    error rate |        k1-cardinality |    0           |   0           |     0       |      % |      0.00% |

The query on k1 is between 3x and 4x faster, and the queries on match_all and k0 are ~1000x faster.

I also pushed some more tests that make sure that the optimization kicks in by using the debug info, so I think it's ready for someone to have a closer look at how it works.

@not-napoleon not-napoleon self-requested a review December 19, 2022 14:28
@rjernst rjernst added v8.8.0 and removed v8.7.0 labels Feb 8, 2023
@martijnvg (Member) commented:

@elasticmachine update branch

@not-napoleon (Member) left a comment:

Sorry it's taken me so long to look at this. I think this is fine to merge.

// we'd be paying the overhead of dynamic pruning without getting any benefits.
private static final int MAX_FIELD_CARDINALITY_FOR_DYNAMIC_PRUNING = 1024;

// Only start dynamic pruning when 128 ordinals or less have not been seen yet.
Member:

I'm confused by this comment. Do we prune when we have less than 128 or more than 128 ordinals?

Member:

The way I understand it: if we have less than or equal to 128 unseen ordinals, then we prune.

@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
@martijnvg (Member) commented:

@elasticmachine update branch

@martijnvg (Member) left a comment:

I think we should get this change merged. LGTM

// we'd be paying the overhead of dynamic pruning without getting any benefits.
private static final int MAX_FIELD_CARDINALITY_FOR_DYNAMIC_PRUNING = 1024;

// Only start dynamic pruning when 128 ordinals or less have not been seen yet.
Member:

The way I understand it: if we have less than or equal to 128 unseen ordinals, then we prune.

void startPruning() throws IOException {
dynamicPruningSuccess++;
nonVisitedOrds = new HashMap<>();
// TODO: iterate the bitset using a `nextClearBit` operation?
Member:

Letting this loop be led by nextClearBit() is a good idea 👍
I think we can add that method to the bitset and make use of it in a follow-up change.
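
As an illustration of that pattern, here is a small sketch using java.util.BitSet, which already exposes nextClearBit; the BitArray used in the PR would need an equivalent method, as suggested above.

    import java.util.BitSet;

    public class NextClearBitSketch {
        public static void main(String[] args) {
            int maxOrd = 16; // hypothetical number of ordinals in the segment
            BitSet visitedOrds = new BitSet(maxOrd);
            visitedOrds.set(0);
            visitedOrds.set(3);
            visitedOrds.set(7);

            // Jump from one unseen ordinal to the next instead of testing every bit.
            for (int ord = visitedOrds.nextClearBit(0); ord < maxOrd; ord = visitedOrds.nextClearBit(ord + 1)) {
                System.out.println("non-visited ordinal: " + ord);
            }
        }
    }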

noData++;
return LeafBucketCollector.NO_OP_COLLECTOR;
}
// Otherwise we might be aggregating e.g. an IP field, which indexes data using points rather than an inverted index.
Member:

Maybe in a follow-up change we can also do the same trick here with points?

final CompetitiveIterator competitiveIterator;

{
// This optimization only works for top-level cardinality aggregations that collect bucket 0, so we can retrieve
Member:

If this is the only current limitation, then I think we can get around it by creating the CompetitiveIterator instance lazily? We would then need a CompetitiveIterator per bucket ordinal. Not sure if this can work, but it could be explored in a follow-up PR.

this.docsWithField = docsWithField;
}

private Map<Long, PostingsEnum> nonVisitedOrds;
Member:

Given that this optimisation only applies to fields with <= 1024 terms, there is no need to use a LongObjectPagedHashMap here.

if (indexTerms != null) {
BitArray bits = visitedOrds.get(0);
final int numNonVisitedOrds = maxOrd - (bits == null ? 0 : (int) bits.cardinality());
if (maxOrd <= MAX_FIELD_CARDINALITY_FOR_DYNAMIC_PRUNING || numNonVisitedOrds <= MAX_TERMS_FOR_DYNAMIC_PRUNING) {
Member:

So the optimisation also kicks in on fields with more than 1024 unique values, if there are 128 or fewer terms left to be processed. The brute-force leaf bucket collector implementation does update the bit sets used here to determine numNonVisitedOrds.
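
To spell out the two thresholds being discussed, here is a small sketch of the decision. The constant values come from the hunks quoted above and from the "128 ordinals or less" comment; the method name is mine, not the PR's.

    public class PruningDecisionSketch {
        // Values taken from the quoted hunks and the "128 ordinals or less" comment.
        static final int MAX_FIELD_CARDINALITY_FOR_DYNAMIC_PRUNING = 1024;
        static final int MAX_TERMS_FOR_DYNAMIC_PRUNING = 128;

        // Mirrors the quoted condition: prune on low-cardinality fields, or once few
        // enough ordinals remain unseen regardless of the field's cardinality.
        static boolean dynamicPruningApplies(int maxOrd, int numVisitedOrds) {
            int numNonVisitedOrds = maxOrd - numVisitedOrds;
            return maxOrd <= MAX_FIELD_CARDINALITY_FOR_DYNAMIC_PRUNING
                || numNonVisitedOrds <= MAX_TERMS_FOR_DYNAMIC_PRUNING;
        }

        public static void main(String[] args) {
            System.out.println(dynamicPruningApplies(500, 0));     // true: field has <= 1024 unique values
            System.out.println(dynamicPruningApplies(5000, 4900)); // true: only 100 ordinals left unseen
            System.out.println(dynamicPruningApplies(5000, 100));  // false: high cardinality, too many unseen
        }
    }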

@martijnvg martijnvg merged commit 79ad42c into elastic:main May 1, 2023
@martijnvg (Member) commented:

This improvement also yielded a significant speedup for another benchmark that is under development. This search request counts the number of unique deployments using the cardinality aggregation:

{
    "aggs": {
      "0": {
        "cardinality": {
          "field": "kubernetes.deployment.name"
        }
      }
    },
    "size": 0,
    "fields": [
      {
        "field": "@timestamp",
        "format": "date_time"
      },
      {
        "field": "event.ingested",
        "format": "date_time"
      },
      {
        "field": "process.cpu.start_time",
        "format": "date_time"
      },
      {
        "field": "system.process.cpu.start_time",
        "format": "date_time"
      }
    ],
    "script_fields": {},
    "stored_fields": [
      "*"
    ],
    "runtime_mappings": {},
    "_source": {
      "excludes": []
    },
    "query": {
      "bool": {
        "must": [],
        "filter": [
          {
            "match_phrase": {
              "data_stream.dataset": "kubernetes.pod"
            }
          },
          {
            "range": {
              "@timestamp": {
                "format": "strict_date_optional_time",
                "gte": "{{info[1]}}",
                "lte": "{{end_time}}"
              }
            }
          }
        ],
        "should": [],
        "must_not": []
      }
    }
  }  

This search request has a top-level cardinality aggregation, which is a requirement for the improvements in this change to kick in. Results:

| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|                                                 Min Throughput |            unique_deployment_count_15_minutes |     24.6054      |    284.15        |     259.545       |  ops/s | +1054.83% |
|                                               Mean Throughput |            unique_deployment_count_15_minutes |     24.8917      |    284.15        |     259.258       |  ops/s | +1041.55% |
|                                             Median Throughput |            unique_deployment_count_15_minutes |     24.9475      |    284.15        |     259.203       |  ops/s | +1038.99% |
|                                                Max Throughput |            unique_deployment_count_15_minutes |     25.0225      |    284.15        |     259.128       |  ops/s | +1035.58% |
|                                       50th percentile latency |            unique_deployment_count_15_minutes |     39.1605      |      2.79196     |     -36.3685      |     ms |   -92.87% |
|                                       90th percentile latency |            unique_deployment_count_15_minutes |     40.4239      |      3.40919     |     -37.0147      |     ms |   -91.57% |
|                                       99th percentile latency |            unique_deployment_count_15_minutes |     49.6284      |      6.01068     |     -43.6178      |     ms |   -87.89% |
|                                      100th percentile latency |            unique_deployment_count_15_minutes |     55.0896      |      8.12486     |     -46.9647      |     ms |   -85.25% |
|                                  50th percentile service time |            unique_deployment_count_15_minutes |     39.1605      |      2.79196     |     -36.3685      |     ms |   -92.87% |
|                                  90th percentile service time |            unique_deployment_count_15_minutes |     40.4239      |      3.40919     |     -37.0147      |     ms |   -91.57% |
|                                  99th percentile service time |            unique_deployment_count_15_minutes |     49.6284      |      6.01068     |     -43.6178      |     ms |   -87.89% |
|                                 100th percentile service time |            unique_deployment_count_15_minutes |     55.0896      |      8.12486     |     -46.9647      |     ms |   -85.25% |
|                                                    error rate |            unique_deployment_count_15_minutes |      0           |      0           |       0           |      % |     0.00% |
|                                                Min Throughput |               unique_deployment_count_2_hours |      8.43276     |    299.224       |     290.791       |  ops/s | +3448.35% |
|                                               Mean Throughput |               unique_deployment_count_2_hours |      8.44431     |    299.224       |     290.78        |  ops/s | +3443.50% |
|                                             Median Throughput |               unique_deployment_count_2_hours |      8.44032     |    299.224       |     290.784       |  ops/s | +3445.17% |
|                                                Max Throughput |               unique_deployment_count_2_hours |      8.4756      |    299.224       |     290.748       |  ops/s | +3430.42% |
|                                       50th percentile latency |               unique_deployment_count_2_hours |    117.179       |      2.79249     |    -114.386       |     ms |   -97.62% |
|                                       90th percentile latency |               unique_deployment_count_2_hours |    121.86        |      3.29555     |    -118.565       |     ms |   -97.30% |
|                                       99th percentile latency |               unique_deployment_count_2_hours |    131.548       |      3.82721     |    -127.721       |     ms |   -97.09% |
|                                      100th percentile latency |               unique_deployment_count_2_hours |    135.695       |      6.89987     |    -128.795       |     ms |   -94.92% |
|                                  50th percentile service time |               unique_deployment_count_2_hours |    117.179       |      2.79249     |    -114.386       |     ms |   -97.62% |
|                                  90th percentile service time |               unique_deployment_count_2_hours |    121.86        |      3.29555     |    -118.565       |     ms |   -97.30% |
|                                  99th percentile service time |               unique_deployment_count_2_hours |    131.548       |      3.82721     |    -127.721       |     ms |   -97.09% |
|                                 100th percentile service time |               unique_deployment_count_2_hours |    135.695       |      6.89987     |    -128.795       |     ms |   -94.92% |
|                                                    error rate |               unique_deployment_count_2_hours |      0           |      0           |       0           |      % |     0.00% |
|                                                Min Throughput |              unique_deployment_count_24_hours |      4.26713     |    308.831       |     304.563       |  ops/s | +7137.44% |
|                                               Mean Throughput |              unique_deployment_count_24_hours |      4.27294     |    308.831       |     304.558       |  ops/s | +7127.59% |
|                                             Median Throughput |              unique_deployment_count_24_hours |      4.27199     |    308.831       |     304.559       |  ops/s | +7129.19% |
|                                                Max Throughput |              unique_deployment_count_24_hours |      4.28726     |    308.831       |     304.543       |  ops/s | +7103.46% |
|                                       50th percentile latency |              unique_deployment_count_24_hours |    231.968       |      2.71129     |    -229.257       |     ms |   -98.83% |
|                                       90th percentile latency |              unique_deployment_count_24_hours |    242.298       |      3.03284     |    -239.266       |     ms |   -98.75% |
|                                       99th percentile latency |              unique_deployment_count_24_hours |    253.986       |      3.7028      |    -250.283       |     ms |   -98.54% |
|                                      100th percentile latency |              unique_deployment_count_24_hours |    255.924       |      7.39493     |    -248.53        |     ms |   -97.11% |
|                                  50th percentile service time |              unique_deployment_count_24_hours |    231.968       |      2.71129     |    -229.257       |     ms |   -98.83% |
|                                  90th percentile service time |              unique_deployment_count_24_hours |    242.298       |      3.03284     |    -239.266       |     ms |   -98.75% |
|                                  99th percentile service time |              unique_deployment_count_24_hours |    253.986       |      3.7028      |    -250.283       |     ms |   -98.54% |
|                                 100th percentile service time |              unique_deployment_count_24_hours |    255.924       |      7.39493     |    -248.53        |     ms |   -97.11% |
|                                                    error rate |              unique_deployment_count_24_hours |      0           |      0           |       0           |      % |     0.00% |

@martijnvg martijnvg mentioned this pull request May 3, 2023
7 tasks
@jpountz jpountz deleted the cardinality_dynamic_pruning branch June 9, 2023 16:09
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this pull request Sep 22, 2023
const_keyword fields don't show up in the leafReader, since they have
a const value. elastic#92060 modified the logic to return no results in case
the leaf reader contains no information about the requested field in a
cardinality aggregation. This is wrong for const_keyword fields, as they
contain up to 1 distinct value.

To fix this, we fall back to the old logic in this case that can
handle const_keyword fields properly.

Fixes elastic#99776
kkrik-es added a commit that referenced this pull request Sep 25, 2023
* Fix cardinality agg for const_keyword

const_keyword fields don't show up in the leafReader, since they have
a const value. #92060 modified the logic to return no results in case
the leaf reader contains no information about the requested field in a
cardinality aggregation. This is wrong for const_keyword fields, as they
contain up to 1 distinct value.

To fix this, we fall back to the old logic in this case that can
handle const_keyword fields properly.

Fixes #99776

* Update docs/changelog/99814.yaml

* Update skip ranges for broken releases.
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this pull request Sep 25, 2023
* Fix cardinality agg for const_keyword

const_keyword fields don't show up in the leafReader, since they have
a const value. elastic#92060 modified the logic to return no results in case
the leaf reader contains no information about the requested field in a
cardinality aggregation. This is wrong for const_keyword fields, as they
contain up to 1 distinct value.

To fix this, we fall back to the old logic in this case that can
handle const_keyword fields properly.

Fixes elastic#99776

* Update docs/changelog/99814.yaml

* Update skip ranges for broken releases.
elasticsearchmachine pushed a commit that referenced this pull request Sep 25, 2023
* Fix cardinality agg for const_keyword

const_keyword fields don't show up in the leafReader, since they have
a const value. #92060 modified the logic to return no results in case
the leaf reader contains no information about the requested field in a
cardinality aggregation. This is wrong for const_keyword fields, as they
contain up to 1 distinct value.

To fix this, we fall back to the old logic in this case that can
handle const_keyword fields properly.

Fixes #99776

* Update docs/changelog/99814.yaml

* Update skip ranges for broken releases.
floragunncom pushed a commit to floragunncom/search-guard that referenced this pull request Mar 3, 2024
floragunncom pushed a commit to floragunncom/search-guard that referenced this pull request Mar 3, 2024
Labels: :Analytics/Aggregations, >enhancement, Team:Analytics, v8.9.0