Add Big5 track by wangch079 · Pull Request #775 · elastic/rally-tracks

wangch079 · 2025-04-18T14:50:32Z

https://github.com/elastic/search-developer-productivity/issues/3789

This PR adds the Big5 Rally track to benchmark five essential areas:

Text Querying
Sorting
Date Histogram
Range Queries
Terms Aggregation

wangch079 · 2025-04-18T14:55:28Z

big5/track.json

+          "source-file": "logs.ndjson.bz2",
+          "document-count": 1131862,
+          "compressed-bytes": 57764621,
+          "uncompressed-bytes": 1047614086


I added this file to https://rally-tracks.elastic.co/big5 for testing purposes. We will regenerate a much larger data set.
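
For reference, the three size fields in the `track.json` fragment above can be derived directly from the corpus file. The following is a minimal sketch using only the Python standard library. The helper name `corpus_stats` is hypothetical and not part of Rally, and it assumes one JSON document per non-empty line.

```python
import bz2
import os


def corpus_stats(path):
    """Return the size fields for one bz2-compressed NDJSON corpus file.

    Hypothetical helper: assumes one document per non-empty line.
    """
    document_count = 0
    uncompressed_bytes = 0
    # Stream the archive so that large corpora never have to fit in memory.
    with bz2.open(path, "rb") as corpus:
        for line in corpus:
            uncompressed_bytes += len(line)
            if line.strip():
                document_count += 1
    return {
        "source-file": os.path.basename(path),
        "document-count": document_count,
        "compressed-bytes": os.path.getsize(path),
        "uncompressed-bytes": uncompressed_bytes,
    }
```

The returned dictionary uses the same keys as the corpus entry, so it can be compared against the values committed in `track.json`.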

It would be good to get the test-mode corpus there too, that is, the -1k variants. This will allow the full CI to run as well.
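
One possible way to produce a -1k variant is to keep only the first 1,000 lines of the full corpus. This is a sketch, not the procedure used in this PR: the helper name, the 1,000-line limit (inferred from the -1k suffix), and the file names in the usage note are assumptions.

```python
import bz2
import itertools


def write_test_mode_corpus(source_path, target_path, limit=1000):
    """Copy the first `limit` lines of a bz2 NDJSON corpus into a new bz2 file.

    Hypothetical helper; returns the number of lines actually written.
    """
    written = 0
    with bz2.open(source_path, "rb") as source, bz2.open(target_path, "wb") as target:
        # islice stops after `limit` lines, so the rest of the archive is never read.
        for line in itertools.islice(source, limit):
            target.write(line)
            written += 1
    return written
```

For example, `write_test_mode_corpus("logs.ndjson.bz2", "logs-1k.ndjson.bz2")` would write the reduced file next to the full one; the target name here is only an illustration of the -1k naming.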

Thank you, @gareth-ellis. I see the CI failures.

I will generate a proper data set and add the -1k files for testing at the same time.

Great, thanks!

Hi @gareth-ellis, I updated the corpora with 8 files. Each is about 128 GB after decompression (1 TB in total).

The CI still fails:

Error: The action 'Run tests' has timed out after 120 minutes.

Even though the corpora are large, I think the test only uses the -1k file (which is less than 100 KB)?

The issue seems to be more that your track is (probably) leaving indices in a non-green state, so apm, which runs a little later, sits and waits for all indices to be green.

You have shards and replicas hard-coded to 1:1. I would suggest making these configurable, with a default of 0 for replicas. This will stop the indices from being yellow (since IT tests run with a single node, we won't automatically allocate a replica on the same node as its primary). If you feel that a default replica count of 0 is inappropriate, you can set it to 1 and then add a new IT file (I suggest calling it custom_config or similar) that overrides the number of replicas (which you've made configurable as described above) to 0, similar to what is done here: https://github.com/elastic/rally-tracks/blob/master/it/test_security.py#L34
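
The single-node reasoning above can be illustrated with a deliberately simplified model. It assumes all primaries are assigned and ignores allocation filters, disk watermarks, and other settings. The function name is hypothetical, and this is not how Elasticsearch computes health internally.

```python
def expected_index_health(data_nodes, number_of_replicas):
    """Simplified model of index health once all primaries are assigned.

    Hypothetical helper for illustration only.
    """
    if data_nodes < 1:
        return "red"  # no node can hold even the primary shards
    # Copies of the same shard are never placed on the same node, so at most
    # data_nodes - 1 replicas per shard can be assigned.
    assignable_replicas = data_nodes - 1
    return "green" if number_of_replicas <= assignable_replicas else "yellow"
```

Under this model, a single node with one replica stays yellow, while the same node with zero replicas is green, which is why a default of 0 unblocks tracks that wait for green indices.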

Regarding size, I don't have any concerns. It's your choice whether to have a single file or multiple, though note that Rally will still download all 8 files even if ingest-percentage is set to 12.5% (I believe, at least).

I think having a large corpus is a good thing. As mentioned, we have e.g. github_archive with over 6 TB (though it still needs merging into the public repo).

You have shards and replicas hard-coded to 1:1. I would suggest making these configurable, with a default of 0 for replicas; this will stop the indices from being yellow

I will look into this soon. Thank you.

Hi @gareth-ellis, just to confirm, you were referring to these lines:

rally-tracks/big5/request_body/index_template/elasticsearch.json

Line 30 in 7a7b846

"number_of_replicas": "1"

rally-tracks/big5/request_body/index_template/opensearch.json

Line 10 in 7a7b846

"index.number_of_replicas": "1"

Indeed. You can provide parameters as we do, e.g., here: https://github.com/elastic/rally-tracks/blob/master/geonames/index.json#L8
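
The idea of a configurable setting with a fallback default can be sketched in plain Python. Rally track files are rendered as Jinja2 templates, so in the track itself this would normally be expressed in the template rather than in code. The function name `index_settings` and the default values (1 shard, 0 replicas) are assumptions for illustration, not necessarily the values merged in this PR.

```python
def index_settings(track_params=None):
    """Build index settings from optional track parameters.

    Hypothetical helper: falls back to 1 shard and 0 replicas when a
    parameter is not supplied (both defaults are assumptions).
    """
    params = track_params or {}
    return {
        "index.number_of_shards": int(params.get("number_of_shards", 1)),
        "index.number_of_replicas": int(params.get("number_of_replicas", 0)),
    }
```

With no parameters the sketch yields zero replicas, which suits a single-node test cluster, and a caller can still request one replica by passing `{"number_of_replicas": 1}`.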

gareth-ellis

LGTM, thanks!

Copilot

Pull Request Overview

This PR adds a new Big5 rally track to benchmark key Elasticsearch performance areas.

Introduces a new README.md file with details on text querying, sorting, date histogram, range queries, and terms aggregation.
Provides documentation on document structure and configurable track parameters.

Files not reviewed (6)

big5/challenges/default.json: Language not supported
big5/request_body/index_template/elasticsearch.json: Language not supported
big5/request_body/index_template/opensearch.json: Language not supported
big5/request_body/policy/elasticsearch.json: Language not supported
big5/request_body/policy/opensearch.json: Language not supported
big5/track.json: Language not supported

Add Big5 track

9685016

wangch079 requested a review from a team April 18, 2025 14:50

wangch079 commented Apr 18, 2025

wangch079 added 2 commits April 29, 2025 09:20

update corpora

7a7b846

make number_of_shards and number_of_replicas configurable

74f39e5

wangch079 requested a review from gareth-ellis May 1, 2025 14:39

gareth-ellis approved these changes May 2, 2025

gareth-ellis requested a review from Copilot May 2, 2025 05:36

Copilot AI reviewed May 2, 2025

wangch079 merged commit d018973 into elastic:master May 2, 2025
13 checks passed

NickDris mentioned this pull request Aug 28, 2025

dris test branch NickDris/rally-tracks#2

Closed

2 participants
