
Streams challenge #816

Merged

flash1293 merged 20 commits into elastic:master from flash1293:streams-challenge on Jul 21, 2025

Conversation

flash1293 (Contributor) commented Jul 11, 2025

Related to https://github.com/elastic/streams-program/issues/368

Set up the logs data stream the way streams does and throw the elastic/logs dataset at it.

This challenge measures indexing performance and storage usage; a comparable challenge that measures query performance is located at https://github.com/elastic/rally-tracks/blob/master/elastic/logs/challenges/logging-chicken.json.

The processing pipeline still contains a couple of polyfills for features that haven't landed yet, but this is expected for now - I will keep it in sync with what happens in practice.

@flash1293 flash1293 marked this pull request as ready for review July 11, 2025 13:27
@gbanasiak gbanasiak requested a review from a team July 11, 2025 14:44
gbanasiak (Contributor) left a comment

Integration tests grind to a halt in a test that runs after the newly added streams challenge. I haven't looked into why exactly that happens, but it might be necessary to add cleanup of the items added by the logging-streams challenge to https://github.com/elastic/rally-tracks/blob/master/elastic/logs/tasks/index-setup.json, which is shared between challenges. Note that the integration tests for elastic/logs reuse the same ES cluster, so whatever is left behind by previous tests and not cleaned up by the current test can potentially affect the outcome.

Similarly, the new challenge should remove its data streams.
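
For illustration, a cleanup task of the kind suggested here might look roughly like this in index-setup.json (a sketch only; the task name is made up, and the data stream to delete follows from the discussion further down):

{
  "name": "delete-streams-data-stream",
  "operation": {
    "operation-type": "delete-data-stream",
    "data-stream": "logs"
  }
}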

flash1293 (Contributor Author)

Thanks for the comments, @gbanasiak - I addressed them:

  • Move ignore-response-error-level out of throttling config
  • Clean up after tests are run (let's see whether that makes it work)

flash1293 (Contributor Author) commented Jul 15, 2025

@gbanasiak it seems like it still has the same problem. Is it because it's crunching too much data? Based on the params it looks like it will handle 72 GB of data, which seems like a lot. On the other hand, the other test cases do that too.

gbanasiak (Contributor) left a comment

The problem is definitely due to the state in which the ES cluster is left after the streams challenge test. The streams challenge test itself takes about a minute to complete, which is fine. The following test used to take around a minute as well, but now gets stuck for more than an hour.

Please take a look at the further comments. If that doesn't help, the test should be reproduced locally and the state of the ES cluster verified.

flash1293 (Contributor Author) commented Jul 15, 2025

@gbanasiak Thanks for taking a look.

This is working with logs instead of logs-* on purpose - see https://github.com/elastic/rally-tracks/pull/816/files#diff-352462dbff160a0065b6f5dadc84cf7005070e2353a0663774cbf3b9d62165dbR267 - no data is actually put into data streams other than logs.

I solved this with the reroute because that was the easiest way; maybe there is a better way to represent this.

My understanding was that https://github.com/elastic/rally-tracks/pull/816/files#diff-352462dbff160a0065b6f5dadc84cf7005070e2353a0663774cbf3b9d62165dbR267 is what actually creates these data streams, and the new challenge does not call the setup task on purpose (because it's all inlined for this special case).

I just re-tested this locally and it seems to leave behind a logs data stream; I will try to find out what's happening there.
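
For context, the reroute mentioned above boils down to an ingest pipeline processor that sends every incoming document to a single data stream - roughly like this (a sketch only; the actual pipeline in this PR also carries the polyfills mentioned in the description):

{
  "description": "Sketch: reroute all log documents into the plain logs data stream",
  "processors": [
    {
      "reroute": {
        "destination": "logs"
      }
    }
  ]
}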

flash1293 (Contributor Author)

Should we maybe introduce a param similar to p_mapping here?

{% if p_mapping == "mapped" %}

and based on that change the data streams mentioned in track.json?

I tried to contain the necessary changes to the challenge, but maybe that's the wrong way of doing it.
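
To illustrate, such a switch in track.json could look roughly like this, using a hypothetical p_streams parameter (the parameter name and the abbreviated data stream lists are made up for illustration):

{# hypothetical parameter, analogous to p_mapping #}
{% if p_streams == "enabled" %}
  "data-streams": [
    { "name": "logs" }
  ],
{% else %}
  "data-streams": [
    { "name": "logs-kafka.log-default" }
  ],
{% endif %}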

flash1293 (Contributor Author) commented Jul 15, 2025

@gbanasiak testing locally I found an issue, let's see whether it works now:

I used

{
  "name": "delete-data-stream",
  "operation-type": "delete-data-stream",
  "data-stream": "logs"
}

instead of

{
  "name": "delete-data-stream",
  "operation": {
    "operation-type": "delete-data-stream",
    "data-stream": "logs"
  }
}

Rally somehow didn't complain about this, but it also didn't do anything - I guess the operation just got the default parameters this way.

Also for delete-composable-template and delete-component-template, the parameter is actually called templates, not template - the docs are wrong here.

Who should I report this to?

The question of whether we should change track.json via a parameter, create a completely separate track, or keep it in the challenge as it is now still stands, though.

flash1293 (Contributor Author) commented Jul 15, 2025

It still times out, but it gets a little further now (two tests after the streams one instead of one) - I don't know enough about Rally to be able to tell what's going on here. Is this new test just pushing things over the edge?

I can't reproduce any issue locally; when I run the challenge on my local machine, I get a clean system afterwards:

{
  "data_streams": []
}

I think I need some knowledgeable eyes here.

A related question: with the data stream being deleted after the test, the "final score" that's printed in the console after the race concludes no longer includes numbers about ingest time and size after merge. Is this still captured on the side in the metrics store?

I'm interested in (one way to pull these from a metrics store afterwards is sketched after this list):

  • Ingest pipeline time
  • Overall indexing time
  • Size of the data in storage before and after the force merge
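
Assuming an Elasticsearch metrics store is configured, one way to get at such numbers after the race is to query Rally's rally-results-* (or rally-metrics-*) indices by race ID - a sketch only, since the exact metric and field names stored there depend on the configured telemetry devices, and "<race-id>" is a placeholder:

{
  "size": 100,
  "query": {
    "term": { "race-id": "<race-id>" }
  }
}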

gbanasiak (Contributor)

@flash1293 Thanks for pointing out the logs@custom pipeline, I had missed it earlier. My understanding now is that the challenge relies on the pre-defined logs index template that matches the logs-*-* index pattern, which uses logs@default-pipeline as its default pipeline, which in turn calls the logs@custom pipeline if it exists. The challenge relies on existing data streams such as logs-kafka.log-default, which trigger the logs template and ultimately create the logs data stream thanks to the reroute.

Note that the logs@custom pipeline is not removed, which affects the remaining test runs even if the logs data stream is gone.

More fundamentally, the cleanup is added at the end of the challenge, which is unusual and may affect Rally telemetry taken at the end of the race. In my initial comment I was suggesting expanding the cleanup steps shared between the challenges in elastic/logs, so that the cleanup done at the beginning of a challenge (any challenge) can restore the intended initial state of the cluster. Now that I understand the reroute bit, I think this challenge deviates significantly from the other challenges in elastic/logs, which all work on a set of data streams defined in here.

Is the reroute absolutely necessary? Couldn't you achieve the same through conditional modification of composable templates used in elastic/logs to stay within the overall set of entities created and destroyed by the challenges in this track?

Thanks for pointing out the documentation bug, I'll raise a Rally PR for that.

flash1293 (Contributor Author)

Note that the logs@custom pipeline is not removed, which affects the remaining test runs even if the logs data stream is gone.

facepalm - will fix, thanks!

Is the reroute absolutely necessary? Couldn't you achieve the same through conditional modification of composable templates used in elastic/logs to stay within the overall set of entities created and destroyed by the challenges in this track?

The name of the data stream doesn't matter too much, but we need all of the data from the different data sets in a single data stream - that's the whole idea. I'm not sure how to achieve that without a reroute.

We could list the logs data stream in track.json; that way it should be cleaned up automatically (I'm not sure whether it breaks something if it isn't always there, but I would guess not). For the reroute, I guess the easiest approach is to have the reroute in logs@custom and remove it at the end of the challenge.
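
A minimal sketch of that listing in track.json, assuming the stream keeps the name logs (the naming question is picked up in the next comment):

{# sketch only - the stream could get a more telling name #}
"data-streams": [
  { "name": "logs" }
]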

I guess forking the track completely is still possible, but I would like to avoid that if we can.

gbanasiak (Contributor)

We could list the logs data stream in track.json; that way it should be cleaned up automatically (I'm not sure whether it breaks something if it isn't always there, but I would guess not). For the reroute, I guess the easiest approach is to have the reroute in logs@custom and remove it at the end of the challenge.

OK, let's try that. Can we use something more telling than logs, e.g. something with "streams" in the name, to make it easier to correlate with the specific setup/challenge? It also wouldn't hurt to update the track's README once we're sure this works fine.

gbanasiak (Contributor)

Also for delete-composable-template and delete-component-template, the parameter is actually called templates, not template - the docs are wrong here

Looking at this closer, the template should also work as per parameter source definition https://github.com/elastic/rally/blob/17a17c44ee70d50720b2b5509cd34e657aae9b46/esrally/track/params.py#L411-L434. Parameter sources are evaluated before the runner code.

flash1293 (Contributor Author)

Looking at this closer, the template should also work as per parameter source definition https://github.com/elastic/rally/blob/17a17c44ee70d50720b2b5509cd34e657aae9b46/esrally/track/params.py#L411-L434. Parameter sources are evaluated before the runner code.

It didn't work for me; it failed with some error about "missing operands".

gbanasiak (Contributor)

It didn't work for me; it failed with some error about "missing operands".

Can you confirm it wasn't caused by the first problem noted in #816 (comment) (operation attributes at the wrong JSON document level)?

flash1293 (Contributor Author)

Let's see whether it works with your latest change.

flash1293 (Contributor Author)

@gbanasiak looks like there is no templating support for ingest pipelines... do you have another idea? We can also move it back to the challenge as a last resort.

gbanasiak (Contributor)

looks like there is no templating support for ingest pipelines... do you have another idea?

Ah, of course - elastic/logs uses a custom create-pipeline runner, which is why specifying a pipeline as a file is even possible.

We can also move it back to the challenge as a last resort.

Yup, that's the only option, I think. Then the conditional is not needed. At some point we might consider making this challenge runnable on serverless, which is when a corresponding integration test should be added.

flash1293 (Contributor Author)

@gbanasiak Alright, I moved the pipeline back. The plan is to make this available on serverless as well within the next few months; I can adjust once we are there.

gbanasiak (Contributor) left a comment

LGTM

gbanasiak (Contributor)

@flash1293 For backporting: what is the oldest ES version that supports the newly added challenge?

gbanasiak added the "backport pending" label (Awaiting backport to stable release branch) on Jul 21, 2025
flash1293 (Contributor Author)

@gbanasiak It shouldn't be backported; just 9.2 and soon (hopefully) serverless should be supported.

gbanasiak (Contributor)

It shouldn't be backported; just 9.2 and soon (hopefully) serverless should be supported.

9.2 is also a backport here. There is no corresponding branch yet, but there will be. Thanks for clarifying.

flash1293 merged commit 502a411 into elastic:master on Jul 21, 2025
14 checks passed
