
Streams challenge #816

Merged

flash1293 merged 20 commits into elastic:master from flash1293:streams-challenge on Jul 21, 2025

Conversation

flash1293 (Contributor) commented Jul 11, 2025

Related to https://github.com/elastic/streams-program/issues/368

Set up the logs data stream the way streams does and throw the elastic/logs dataset at it.

This challenge measures indexing performance and storage usage; a comparable challenge that measures query performance is located at https://github.com/elastic/rally-tracks/blob/master/elastic/logs/challenges/logging-chicken.json.

The processing pipeline still contains a couple of polyfills for features that haven't landed yet, but this is expected for now - I will keep it in sync with what happens in practice.

@flash1293 flash1293 marked this pull request as ready for review July 11, 2025 13:27
@gbanasiak gbanasiak requested a review from a team July 11, 2025 14:44
gbanasiak (Contributor) left a comment

Integration tests grind to a halt in a test that runs after the newly added streams challenge. I haven't looked into why exactly that happens, but it might be necessary to add cleanup of the items added by the logging-streams challenge to https://github.com/elastic/rally-tracks/blob/master/elastic/logs/tasks/index-setup.json, which is shared between challenges. Note that the integration tests for elastic/logs reuse the same ES cluster, so whatever is left behind by previous tests and not cleaned up by the current test can potentially affect the outcome.

Similarly, the new challenge should remove its data streams.
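
For illustration, a cleanup task of the kind suggested here might look roughly like this in index-setup.json (a sketch only; the task name is made up, and the data stream to delete follows from the discussion further down):

{
  "name": "delete-streams-data-stream",
  "operation": {
    "operation-type": "delete-data-stream",
    "data-stream": "logs"
  }
}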

flash1293 (Contributor Author)

Thanks for the comments, @gbanasiak - I addressed them:

  • Move ignore-response-error-level out of throttling config
  • Clean up after tests are run (let's see whether that makes it work)

flash1293 (Contributor Author) commented Jul 15, 2025

@gbanasiak it seems like it still has the same problem. Is it because it's crunching too much data? Based on the params it looks like it will handle 72 GB of data, which seems like a lot. On the other hand, the other test cases do that too.

gbanasiak (Contributor) left a comment

The problem is definitely due to the state in which the ES cluster is left after the streams challenge test. The streams challenge test itself takes about a minute to complete, which is fine. The following test used to take around a minute as well, but now gets stuck for more than an hour.

Please take a look at the further comments. If that doesn't help, the test should be reproduced locally and the state of the ES cluster verified.

flash1293 (Contributor Author) commented Jul 15, 2025

@gbanasiak Thanks for taking a look.

This is working with logs instead of logs-* on purpose - see https://github.com/elastic/rally-tracks/pull/816/files#diff-352462dbff160a0065b6f5dadc84cf7005070e2353a0663774cbf3b9d62165dbR267 - no data is actually put into data streams other than logs.

I solved this with the reroute because that was the easiest way; maybe there is a better way to represent this.

My understanding was that https://github.com/elastic/rally-tracks/pull/816/files#diff-352462dbff160a0065b6f5dadc84cf7005070e2353a0663774cbf3b9d62165dbR267 is what actually creates these data streams, and the new challenge does not call the setup task on purpose (because it's all inlined for this special case).

I just re-tested this locally and it seems to leave behind a logs data stream; I will try to find out what's happening there.
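
For context, the reroute mentioned above boils down to an ingest pipeline processor that sends every incoming document to a single data stream - roughly like this (a sketch only; the actual pipeline in this PR also carries the polyfills mentioned in the description):

{
  "description": "Sketch: reroute all log documents into the plain logs data stream",
  "processors": [
    {
      "reroute": {
        "destination": "logs"
      }
    }
  ]
}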

flash1293 (Contributor Author)

Should we maybe introduce a param similar to p_mapping here?

{% if p_mapping == "mapped" %}

and based on that change the data streams mentioned in track.json?

I tried to contain the necessary changes to the challenge, but maybe that's the wrong way of doing it.
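
To illustrate, such a switch in track.json could look roughly like this, using a hypothetical p_streams parameter (the parameter name and the abbreviated data stream lists are made up for illustration):

{# hypothetical parameter, analogous to p_mapping #}
{% if p_streams == "enabled" %}
  "data-streams": [
    { "name": "logs" }
  ],
{% else %}
  "data-streams": [
    { "name": "logs-kafka.log-default" }
  ],
{% endif %}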

flash1293 (Contributor Author) commented Jul 15, 2025

@gbanasiak testing locally I found an issue, let's see whether it works now:

I used

{
  "name": "delete-data-stream",
  "operation-type": "delete-data-stream",
  "data-stream": "logs"
}

instead of

{
  "name": "delete-data-stream",
  "operation": {
    "operation-type": "delete-data-stream",
    "data-stream": "logs"
  }
}

Rally somehow didn't complain about this, but it also didn't do anything - I guess the operation just got the default parameters this way.

Also for delete-composable-template and delete-component-template, the parameter is actually called templates, not template - the docs are wrong here.

Who should I report this to?

The question of whether we should change track.json via a parameter, create a completely separate track, or keep it in the challenge as it is now still stands, though.

flash1293 (Contributor Author) commented Jul 15, 2025

It still times out, but it gets a little further now (two tests after the streams one instead of one) - I don't know enough about Rally to be able to tell what's going on here. Is this new test just pushing things over the edge?

I can't reproduce any issue locally; when I run the challenge on my local machine, I get a clean system afterwards:

{
  "data_streams": []
}

I think I need some knowledgeable eyes here.

A related question: with the data stream being deleted after the test, the "final score" that's printed in the console after the race concludes no longer includes numbers about ingest time and size after merge. Is this still captured on the side in the metrics store?

I'm interested in (one way to pull these from a metrics store afterwards is sketched after this list):

  • Ingest pipeline time
  • Overall indexing time
  • Size of the data in storage before and after the force merge
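
Assuming an Elasticsearch metrics store is configured, one way to get at such numbers after the race is to query Rally's rally-results-* (or rally-metrics-*) indices by race ID - a sketch only, since the exact metric and field names stored there depend on the configured telemetry devices, and "<race-id>" is a placeholder:

{
  "size": 100,
  "query": {
    "term": { "race-id": "<race-id>" }
  }
}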

gbanasiak (Contributor)

@flash1293 Thanks for pointing out the logs@custom pipeline, I had missed it earlier. My understanding now is that the challenge relies on the pre-defined logs index template that matches the logs-*-* index pattern, which uses logs@default-pipeline as its default pipeline, which in turn calls the logs@custom pipeline if it exists. The challenge relies on existing data streams such as logs-kafka.log-default, which trigger the logs template and ultimately create the logs data stream thanks to the reroute.

Note that the logs@custom pipeline is not removed, which affects the remaining test runs even if the logs data stream is gone.

More fundamentally, the cleanup is added at the end of the challenge, which is unusual and may affect Rally telemetry taken at the end of the race. In my initial comment I was suggesting expanding the cleanup steps shared between the challenges in elastic/logs, so that the cleanup done at the beginning of a challenge (any challenge) can restore the intended initial state of the cluster. Now that I understand the reroute bit, I think this challenge deviates significantly from the other challenges in elastic/logs, which all work on a set of data streams defined in here.

Is the reroute absolutely necessary? Couldn't you achieve the same through conditional modification of composable templates used in elastic/logs to stay within the overall set of entities created and destroyed by the challenges in this track?

Thanks for pointing out the documentation bug, I'll raise a Rally PR for that.

flash1293 (Contributor Author)

Note that the logs@custom pipeline is not removed, which affects the remaining test runs even if the logs data stream is gone.

facepalm - will fix, thanks!

Is the reroute absolutely necessary? Couldn't you achieve the same through conditional modification of composable templates used in elastic/logs to stay within the overall set of entities created and destroyed by the challenges in this track?

The name of the data stream doesn't matter too much, but we need all of the data from the different data sets in a single data stream - that's the whole idea. I'm not sure how to achieve that without a reroute.

We could list the logs data stream in track.json; that way it should be cleaned up automatically (I'm not sure whether it breaks something if it isn't always there, but I would guess not). For the reroute, I guess the easiest approach is to have the reroute in logs@custom and remove it at the end of the challenge.
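
A minimal sketch of that listing in track.json, assuming the stream keeps the name logs (the naming question is picked up in the next comment):

{# sketch only - the stream could get a more telling name #}
"data-streams": [
  { "name": "logs" }
]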

I guess forking the track completely is still possible, but I would like to avoid that if we can.

gbanasiak (Contributor)

We could list the logs data stream in track.json; that way it should be cleaned up automatically (I'm not sure whether it breaks something if it isn't always there, but I would guess not). For the reroute, I guess the easiest approach is to have the reroute in logs@custom and remove it at the end of the challenge.

OK, let's try that. Can we use something more telling than logs, e.g. something with "streams" in the name, to make it easier to correlate with the specific setup/challenge? It also wouldn't hurt to update the track's README once we're sure this works fine.

gbanasiak (Contributor)

Also for delete-composable-template and delete-component-template, the parameter is actually called templates, not template - the docs are wrong here

Looking at this closer, the template should also work as per parameter source definition https://github.com/elastic/rally/blob/17a17c44ee70d50720b2b5509cd34e657aae9b46/esrally/track/params.py#L411-L434. Parameter sources are evaluated before the runner code.

flash1293 (Contributor Author)

Looking at this closer, the template should also work as per parameter source definition https://github.com/elastic/rally/blob/17a17c44ee70d50720b2b5509cd34e657aae9b46/esrally/track/params.py#L411-L434. Parameter sources are evaluated before the runner code.

It didn't work for me; it failed with some error about "missing operands".

gbanasiak (Contributor)

It didn't work for me; it failed with some error about "missing operands".

Can you confirm it wasn't caused by the first problem noted in #816 (comment) (operation attributes at the wrong JSON document level)?

flash1293 (Contributor Author)

Let's see whether it works with your latest change.

flash1293 (Contributor Author)

@gbanasiak looks like there is no templating support for ingest pipelines... do you have another idea? We can also move it back to the challenge as a last resort.

gbanasiak (Contributor)

looks like there is no templating support for ingest pipelines... do you have another idea?

Ah, of course - elastic/logs uses a custom create-pipeline runner, which is why specifying a pipeline as a file is even possible.

We can also move it back to the challenge as a last resort.

Yup, that's the only option, I think. Then the conditional is not needed. At some point we might consider making this challenge runnable on serverless, which is when a corresponding integration test should be added.

flash1293 (Contributor Author)

@gbanasiak Alright, I moved the pipeline back. The plan is to make this available on serverless as well within the next few months; I can adjust once we are there.

gbanasiak (Contributor) left a comment

LGTM

gbanasiak (Contributor)

@flash1293 For backporting: what is the oldest ES version that supports the newly added challenge?

gbanasiak added the "backport pending" label (Awaiting backport to stable release branch) on Jul 21, 2025
flash1293 (Contributor Author)

@gbanasiak It shouldn't be backported; just 9.2 and soon (hopefully) serverless should be supported.

gbanasiak (Contributor)

It shouldn't be backported; just 9.2 and soon (hopefully) serverless should be supported.

9.2 is also a backport here. There is no corresponding branch yet, but there will be. Thanks for clarifying.

flash1293 merged commit 502a411 into elastic:master on Jul 21, 2025
14 checks passed
