Solves problems with expensive inlinestats benchmarks by ncordon · Pull Request #896 · elastic/rally-tracks

ncordon · 2025-10-30T17:23:32Z

Adds a smaller index, nyc_taxis_sample, with 1000 rows from the original nyc_taxis, and runs the benchmarks stats_count_group_by_esql and inlinestats_count_group_by_esql on it.

Why

Some of the inline stats benchmarks are triggering circuit breaker exceptions:

esrally.exceptions.RallyError: Cannot run task [stats_count_group_by_esql]: Request returned an error. Error type: api, Description: {"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[request] Data too large, data for [<reused_arrays>] would be [5153974035/4.8gb], which is larger than the limit of [5153960755/4.7gb]; for more information, see https://www.elastic.co/docs/troubleshoot/elasticsearch/circuit-breaker-errors?version=master","bytes_wanted":5153974035,"bytes_limit":5153960755,"durability":"TRANSIENT","suppressed":[{"type":"circuit_breaking_exception","reason":"[request] Data too large, data for [<reused_arrays>] would be [5153974035/4.8gb], which is larger than the limit of [5153960755/4.7gb];

ncordon · 2025-10-31T09:01:54Z

nyc_taxis/operations/default.json

      "name": "stats_count_group_by_esql",
      "operation-type": "esql",
-      "query" : "FROM nyc_taxis METADATA _id | stats count(passenger_count) by _id | LIMIT 1000"
+      "query" : "FROM nyc_taxis_sample METADATA _id | stats count(passenger_count) by _id"


Note that the LIMIT 1000 was moved to after the stats in these queries in another PR: #873

Having the limit before or after the stats completely changes the behaviour of the query. The other PR moved it after, and this one effectively moves it before again. I'm assuming we will see performance differences with both changes?

Having said that, I understand the purpose of these queries was to limit before, so the change in this current PR is moving back towards that original purpose.

The queries haven't even run since we moved the limit to after the stats, right? So we should be able to revert to the original purpose of these benchmarks, which was to evaluate the inline stats rather than the fetching part.

Good point!
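The difference discussed in this thread can be illustrated with a toy model. The sketch below is plain Python, not ES|QL, and makes no claim about how Elasticsearch actually executes these queries; the function names and the in-memory row format are hypothetical. It only shows that grouping by a unique key such as _id creates one group per input row, so truncating after the aggregation still builds every group, whereas capping the input at 1000 rows bounds the number of groups.

```python
from itertools import islice


def stats_then_limit(rows, limit):
    """Toy model of `STATS count(passenger_count) BY _id | LIMIT n`."""
    counts = {}
    for row in rows:  # every input row is grouped before the limit applies
        counts.setdefault(row["_id"], 0)
        if row["passenger_count"] is not None:  # count(field) skips nulls
            counts[row["_id"]] += 1
    groups_held = len(counts)  # number of groups built before truncation
    return dict(islice(counts.items(), limit)), groups_held


def limit_then_stats(rows, limit):
    """Toy model of the same aggregation over an input capped at `limit` rows."""
    return stats_then_limit(islice(rows, limit), limit)


# Hypothetical data: 10,000 rows, each with a unique _id.
rows = [{"_id": str(i), "passenger_count": 1} for i in range(10_000)]
_, held_after = stats_then_limit(rows, 1000)
_, held_before = limit_then_stats(rows, 1000)
print(held_after, held_before)
```

Both variants return at most 1000 groups, but only the second keeps the intermediate state small, which is consistent with running the benchmark against a 1000-row sample index.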

craigtaverner

Approved, although I wonder if there are pre-existing operations that could be used. At least for deleting the index, I assume a delete-index operation exists. For sampling the index, I doubt there is an existing operation, so the raw-request operation is probably necessary. It would be interesting to get an opinion from es-perf about this.

craigtaverner · 2025-10-31T10:20:31Z

nyc_taxis/operations/default.json

+      "body": {
+        "source": {
+          "index": "nyc_taxis",
+          "query": { "match_all": {} }


Do we need a query here? Isn't match_all implied?

I assumed there were two ways of doing this: one with a query that includes a size, and the other with the max_docs parameter. Since you used max_docs, I presumed the query was unnecessary.

Yeah, you are right. I thought we still needed the query 👍
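For reference, the shape of the operation after dropping the query might look roughly like the sketch below. It is written in Python only to build and print the JSON; the operation name, the POST method, and the /_reindex path are assumptions, since the diff excerpt above shows only part of the body. The helper function is hypothetical and not part of Rally.

```python
import json


def sample_index_operation(source_index, dest_index, max_docs):
    """Build a raw-request operation that copies at most max_docs documents."""
    return {
        # Hypothetical operation name, derived from the destination index.
        "name": f"create-{dest_index.replace('_', '-')}-index",
        "operation-type": "raw-request",
        "method": "POST",      # assumption: not visible in the diff excerpt
        "path": "/_reindex",   # assumption: not visible in the diff excerpt
        "body": {
            "max_docs": max_docs,
            # No "query" key: without one, reindex selects all documents.
            "source": {"index": source_index},
            "dest": {"index": dest_index},
        },
    }


operation = sample_index_operation("nyc_taxis", "nyc_taxis_sample", 1000)
print(json.dumps(operation, indent=2))
```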

craigtaverner · 2025-10-31T10:21:50Z

nyc_taxis/operations/default.json

+    },
+    {
+      "name": "delete-nyc-taxis-sample-index",
+      "operation-type": "raw-request",


Isn't there already a delete index operation you could use instead of a raw-request?

See index property in https://esrally.readthedocs.io/en/stable/track.html#delete-index.
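A minimal sketch of what the suggested replacement could look like, again using Python only to print the JSON. It keeps the operation name from the diff and uses the index property mentioned in the linked documentation; any further properties the final commit may have set are not shown here.

```python
import json

# Same operation name as in the diff, switched to the built-in operation type.
delete_sample_index = {
    "name": "delete-nyc-taxis-sample-index",
    "operation-type": "delete-index",
    "index": "nyc_taxis_sample",
}
print(json.dumps(delete_sample_index, indent=2))
```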

gbanasiak

The reindexing approach with a document limit does not guarantee the same set of documents in nyc_taxis_sample on each run, which may or may not contribute to benchmark variability, depending on the uniformity of the corpus. A more conservative approach would be to hand-pick 1000 documents, create a new corpus file, and define a new index in corpora. Something to consider if results are noisier than before.
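One way to produce such a fixed corpus file is to sample the original corpus once with a seeded random number generator. The sketch below is only an illustration under stated assumptions: it assumes the corpus is newline-delimited JSON, the function name and seed are hypothetical, and it uses reservoir sampling (Algorithm R) as a stand-in for hand-picking, so the corpus can be streamed without loading it into memory.

```python
import random


def sample_corpus(lines, n, seed=42):
    """Pick n lines from an iterable, reproducibly for a given seed."""
    rng = random.Random(seed)  # private, seeded generator for repeatability
    sample = []
    for i, line in enumerate(lines):
        if i < n:
            sample.append(line)      # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < n:
                sample[j] = line     # replace with probability n / (i + 1)
    return sample


def make_docs():
    """Hypothetical stand-in for reading the corpus file line by line."""
    return (f'{{"passenger_count": {i % 6}}}' for i in range(50_000))


first = sample_corpus(make_docs(), 1000)
second = sample_corpus(make_docs(), 1000)
print(len(first), first == second)
```

Writing the sampled lines to a new file once and checking that file into the track would give every run the same 1000 documents, at the cost of maintaining an extra corpus.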

Please see other comments below.

nyc_taxis/challenges/default.json

nyc_taxis/operations/default.json

gbanasiak · 2025-11-20T17:53:59Z

backported to 9.2 in #919

This is an empty commit which records missing backports from manual or squashed backports through "cherry picked from" metadata.

CI determines Elasticsearch build arguments #925 (#926) (cherry picked from commit 8c33ff5)
Exclude some challenges when testing with ES release builds #922 (#919) (cherry picked from commit f38f8fc)
Reduce filtering scope in CI workflow #908 (#919) (cherry picked from commit 8e571a5)
Address pytest deprecations #911 (#919) (cherry picked from commit fa81c5e)
Solves problems with expensive inlinestats benchmarks #896 (#919) (cherry picked from commit 8209244)
Adds better challenges for comparing inlinestats #873 (#919) (cherry picked from commit c41a950)

Solves problems with expensive inlinestats benchmarks

4fc5adc

ncordon force-pushed the reduce-inline-stats-benchmarks branch from 57ddf24 to 4fc5adc on October 31, 2025 08:55

ncordon commented Oct 31, 2025

craigtaverner approved these changes Oct 31, 2025

gbanasiak requested a review from a team October 31, 2025 10:51

gbanasiak reviewed Oct 31, 2025

nyc_taxis/challenges/default.json

ncordon added 3 commits October 31, 2025 14:38

Creates sampled index before the refresh

38f58e3

Uses delete-index operation

0d0e1ab

Removes query from the create index op

06f8cb8

gbanasiak reviewed Oct 31, 2025

nyc_taxis/operations/default.json (Outdated)

ncordon added 2 commits October 31, 2025 15:58

Removes unnecesary dup

4d24642

Adds another refresh

c5ee865

ncordon force-pushed the reduce-inline-stats-benchmarks branch from d7d6ee1 to c5ee865 on October 31, 2025 15:32

ncordon merged commit 8209244 into elastic:master Oct 31, 2025
28 checks passed

esbenchmachine added the "backport pending" (Awaiting backport to stable release branch) label Oct 31, 2025

gbanasiak pushed a commit that referenced this pull request Nov 19, 2025

Solves problems with expensive inlinestats benchmarks (#896)

ca0ae49

gbanasiak added v9.2 and removed "backport pending" (Awaiting backport to stable release branch) labels Nov 20, 2025

elastic deleted a comment from esbenchmachine Nov 21, 2025

esbenchmachine mentioned this pull request Dec 3, 2025

Rename IT folders #938

Merged

esbenchmachine mentioned this pull request Dec 5, 2025

Backport reminders - add missing default values #947

Merged
