
[Streams 🌊] Enrichment sampling data sources#219736

Merged
tonyghiani merged 56 commits into elastic:main from tonyghiani:218408-control-sample-fetching
Jun 20, 2025
Conversation

@tonyghiani
Contributor

@tonyghiani tonyghiani commented Apr 30, 2025

📓 Summary

Closes #218408

This work started as a simple search bar on the streams enrichment samples, but once we realized that didn't fit the requirements for a smooth simulation experience, we took another direction.

Data sources

Since we want to let users pull documents from multiple sources to simulate their processors (docs from Discover, the failure store, custom documents pasted into the simulator, etc.), this work introduces a data source entity in the simulation playground.
It also converts the random samples that were previously fetched automatically into a dedicated data source.
Now that this is a scalable concept, users can add/remove/enable different data sources for the same simulation:

  • Random samples: Always available by default, so at least one data source is always present; it can still be enabled/disabled on demand.
  • KQL search: Provides a KQL search bar, similar to the one in Discover and across Kibana, which supports pulling documents into this page from Discover or elsewhere.
  • Custom samples: Paste raw documents that will be used alongside the other data sources for the whole simulation.

💡 Reviewer hints

  • The data fetching now relies on the data plugin interfaces, since we needed a more capable API than the _sample one (now removed); this also aligns with the data-fetching practice used for the partitioning page.
  • A data source can behave differently depending on its state (enabled/disabled). To treat it as an isolated concept, each data source is represented by its own actor machine, and the root streamEnrichment machine coordinates event-based communication, as it already does for processor instantiation and management.
  • The data sources are consistently persisted to the URL, with a couple of exceptions:
    • The Custom samples data source is not persisted: it doesn't just describe the data source configuration, it also holds the custom samples defined by the user, which could easily exceed URL limits, so we warn the user that it won't be persisted.
    • The Random samples data source is always available and is restored from the URL, to guarantee that at least one data source is present on the page.
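The data-source concept described above can be sketched as a tagged union. This is a hypothetical shape (field names are illustrative) built around the `KqlSamplesDataSource` interface quoted in the review below, with the URL-persistence exception expressed as a type guard:

```typescript
// Illustrative sketch, not the actual Kibana types.
interface BaseDataSource {
  enabled: boolean; // disabled sources are excluded from the simulation
}

interface RandomSamplesDataSource extends BaseDataSource {
  type: 'random-samples'; // always present; cannot be removed
}

interface KqlSamplesDataSource extends BaseDataSource {
  type: 'kql-samples';
  query: { language: 'kuery'; query: string }; // shape differs from es-query's Query
}

interface CustomSamplesDataSource extends BaseDataSource {
  type: 'custom-samples';
  documents: Record<string, unknown>[]; // user-pasted docs; never persisted
}

type EnrichmentDataSource =
  | RandomSamplesDataSource
  | KqlSamplesDataSource
  | CustomSamplesDataSource;

// Custom samples hold user documents that could exceed URL length limits,
// so only the other data-source types are serialized to the URL.
function isUrlPersistable(dataSource: EnrichmentDataSource): boolean {
  return dataSource.type !== 'custom-samples';
}
```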

@tonyghiani added labels on Apr 30, 2025: Team:obs-onboarding (Observability Onboarding Team), backport:version (Backport to applied version labels), Feature:Streams (This is the label for the Streams Project), v9.1.0, v8.19.0, release_note:skip (Skip the PR/issue when compiling release notes)
@tonyghiani
Contributor Author

@LucaWintergerst thanks for the check on functionalities!

Automatically extend data preview once a user types a new KQL query, so they always see the docs after typing without clicking again

I'd rather have the KQL preview open by default (only for this data source type, as it's the most relevant) than toggle it on typing, since that generates quite a visual shift. I tried it and it looks good!

Can’t click checkbox itself, only around it to enable/disable

I'm aware of this, yes. I opened an issue in EUI; it's already fixed and should land with the next EUI release 👌

Are the data preview fields static? We should probably make this configurable or more flexible in the future.

I made a small change to guarantee it shows all the fields, so body.text will also be shown among the others. I agree we can improve it further, but I'd rather keep that as a follow-up once we gather feedback.

The auto-apply of the kql is very aggressive. I wonder if we should either debounce more or do something more similar to discover where it only applies after pressing enter?

Agreed, the experience is quite bad with errors popping up while the user types. I updated it to apply on Enter or button click and the experience is much better; it also leaves the table in front of users, so they can still pick/copy values from it.

@LucaWintergerst
Contributor

great, I just checked it again and that all works now 👍

a few other minor tweaks, and one slightly larger bug:

1.

The custom samples data source cannot be persisted. It will be lost when you leave this page.

to

The custom samples will not be persisted. They will be lost when you leave the processing page.

2.

For the custom docs, instead of

```json
[{ "foo": "bar", "foo2": "baz" }]
```

use

```json
[{ "@timestamp": "2025-06-17T12:00:00Z", "message": "Sample log message" }]
```

3.

Change the removal confirmation dialog to the following, to make clear to users that we're not actually deleting their data:

Remove sample data source?
Removed sample data source will need to be reconfigured

4.
There is a bug with the failed-% calculation where it sometimes stops showing. It doesn't happen every time. If I had to guess, it might be that we only run the simulation with 100 docs, and if the first sample source has >=100 docs the other sources are no longer used?


@tonyghiani
Contributor Author

tonyghiani commented Jun 18, 2025

@LucaWintergerst thanks for the detailed suggestions, I applied all the changes 👌

Also, the bug in the stats calculation was a subtle one, good catch! I updated the logic to compute the percentage to one decimal place, which gives more accurate stats when the sample count is > 100 docs.
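The rounding fix could look roughly like this (a sketch, not the actual Kibana code):

```typescript
// Hypothetical sketch: compute the failed-document percentage with one decimal
// of precision, so e.g. 1 failed doc out of 300 shows as 0.3% instead of
// rounding down to 0%.
function failedPercentage(failedCount: number, totalCount: number): number {
  if (totalCount === 0) return 0;
  return Math.round((failedCount / totalCount) * 1000) / 10;
}
```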


@Kerry350 Kerry350 self-requested a review June 18, 2025 08:51
Contributor

@Kerry350 Kerry350 left a comment


Great work @tonyghiani 👏

Just a couple of nits; I'm ignoring the checkbox issue, which is being fixed on the EUI side.

This was a really good read — the composition of UI components and state machine refs etc was really nice 👌

```typescript
 */
export interface KqlSamplesDataSource extends BaseDataSource {
  type: 'kql-samples';
  query: {
```
Contributor

For query, filters, and timeRange here, is there a preexisting KQL type we can use rather than redefining them?

Contributor Author

I couldn't find any, unfortunately, so I just reused the exported types to compose it here :( For query, the type is slightly different from the one exported by the es-query package.

```typescript
'xpack.streams.streamDetailView.managementTab.enrichment.dataSources.randomSamples.callout',
{
  defaultMessage:
    'The random samples data source cannot be deleted to guarantee available samples for the simulation. You can still disable it if you want to focus on other data sources samples.',
```
Contributor

Nit: This doesn't read well — can we change this to You can still disable it if you want to focus on samples from other data sources

```typescript
const error = getFormattedError(event.error);
toasts.addError(error, {
  title: i18n.translate('xpack.streams.enrichment.dataSources.dataCollectionError', {
    defaultMessage: 'An issue occurred retrieving data source documents.',
```
Contributor

Nit: Can we change to An issue occurred retrieving documents from the data source.

@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #28 / Rules Management - Rule import export API @ess @serverless @skipInServerlessMKI import_rules importing rules with an index @skipInServerless migrate pre-8.0 action connector ids should be imported into the default space importing a default-space 7.16 rule with a connector made in the default space into the default space should result in a 200

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| streamsApp | 441 | 460 | +19 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| streamsApp | 518.7KB | 541.0KB | +22.3KB |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| streamsApp | 10.5KB | 10.8KB | +271.0B |

Unknown metric groups

async chunk count

| id | before | after | diff |
| --- | --- | --- | --- |
| streamsApp | 7 | 8 | +1 |

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| streamsApp | 13 | 12 | -1 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| streamsApp | 17 | 16 | -1 |

History

@tonyghiani tonyghiani enabled auto-merge (squash) June 20, 2025 05:20
@tonyghiani tonyghiani merged commit b759ebb into elastic:main Jun 20, 2025
10 checks passed
@kibanamachine
Contributor

Starting backport for target branches: 8.19

https://github.com/elastic/kibana/actions/runs/15773250413

@kibanamachine
Contributor

💔 All backports failed

| Branch | Result |
| --- | --- |
| 8.19 | Backport failed because of merge conflicts |

You might need to backport the following PRs to 8.19:
- [Streams 🌊] Make management view the main page for individual stream (#224461)

Manual backport

To create the backport manually run:

```
node scripts/backport --pr 219736
```

Questions ?

Please refer to the Backport tool documentation

@tonyghiani added and removed the backport:version (Backport to applied version labels) label on Jun 23, 2025
@kibanamachine
Contributor

Starting backport for target branches: 8.19

https://github.com/elastic/kibana/actions/runs/15816253337

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jun 23, 2025
(cherry picked from commit b759ebb)
@kibanamachine
Contributor

💚 All backports created successfully

Branch: 8.19

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

@tonyghiani tonyghiani deleted the 218408-control-sample-fetching branch June 23, 2025 07:15
kibanamachine added a commit that referenced this pull request Jun 23, 2025
# Backport

This will backport the following commits from `main` to `8.19`:
- [[Streams 🌊] Enrichment sampling data sources
(#219736)](#219736)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Co-authored-by: Marco Antonio Ghiani <marcoantonio.ghiani01@gmail.com>
akowalska622 pushed a commit to akowalska622/kibana that referenced this pull request Jun 25, 2025


Development

Successfully merging this pull request may close these issues.

[Streams 🌊] Enrichment samples controls

7 participants