[Streams 🌊] Enrichment sampling data sources #219736
tonyghiani merged 56 commits into elastic:main
@LucaWintergerst thanks for checking the functionality!
I'd rather have the KQL preview open by default (only for this data source type, as it's the most relevant) than toggle it while typing, since that generates quite a visual shift. I tried it and it looks good!
I'm aware of this, yes. I opened an issue with EUI, which is already fixed and should land with the next EUI release 👌
I made a small change to guarantee it'll show all the fields, so also
Agree, the experience is quite bad with errors popping up while the user types. I updated it to run on Enter or button click and the experience is much better; it also leaves the table in front of the users so they can still pick/copy values from it.
@LucaWintergerst thanks for the detailed suggestions, I applied all the changes 👌 Also, the bug in the stats calculation was a subtle one, good catch! I updated the logic to handle percentages with up to one decimal place, which gives more accurate stats when the sample count is > 100 docs.
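To illustrate why one decimal place matters once there are more than 100 samples, here is a minimal sketch. The helper name `toPercentage` and its signature are hypothetical and not taken from the PR; it only shows the rounding idea, under the assumption that the stats are simple count/total ratios.

```typescript
// Hypothetical helper, not the PR's actual implementation.
// Converts a count out of a total into a percentage rounded to one decimal place.
function toPercentage(count: number, total: number): number {
  // Guard against an empty sample set to avoid dividing by zero.
  if (total <= 0) return 0;
  // Multiply by 1000 and divide by 10 so that one decimal digit survives rounding.
  // With integer rounding, 1 document out of 300 would be reported as 0%.
  return Math.round((count / total) * 1000) / 10;
}

console.log(toPercentage(1, 300)); // 0.3
console.log(toPercentage(150, 300)); // 50
```

With whole-number rounding, any bucket smaller than 0.5% of the samples collapses to 0%, which can only happen once the sample count exceeds 100.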
Kerry350 left a comment
Great work @tonyghiani 👏
Just a couple of nits, and ignoring the checkbox issue which is being fixed on the EUI side.
This was a really good read — the composition of UI components and state machine refs etc was really nice 👌
 */
export interface KqlSamplesDataSource extends BaseDataSource {
  type: 'kql-samples';
  query: {
For query, filters, and timeRange here is there a preexisting KQL type we can use, rather than redefining them?
I couldn't find any, unfortunately, so I just reused the exported types to compose it here :( For query, the type is slightly different from the one exported by the es-query package.
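For readers following the type discussion, the shape under review can be sketched as a discriminated union. Only `KqlSamplesDataSource`, `BaseDataSource`, the `'kql-samples'` discriminant, and the `query`, `filters`, and `timeRange` field names come from the diff and the comment above; every other name, field, and discriminant value below is an assumption made for illustration, and the local types are simplified stand-ins rather than the real Kibana types.

```typescript
// Simplified, hypothetical stand-ins; the real PR composes exported Kibana types.
interface BaseDataSource {
  enabled: boolean; // assumed field
  name?: string; // assumed field
}

interface RandomSamplesDataSource extends BaseDataSource {
  type: 'random-samples'; // assumed discriminant value
}

interface KqlSamplesDataSource extends BaseDataSource {
  type: 'kql-samples';
  query: { language: string; query: string }; // assumed inner shape
  filters?: unknown[]; // assumed shape
  timeRange?: { from: string; to: string }; // assumed shape
}

interface CustomSamplesDataSource extends BaseDataSource {
  type: 'custom-samples'; // assumed discriminant value
  documents: Array<Record<string, unknown>>; // assumed field
}

type EnrichmentDataSource =
  | RandomSamplesDataSource
  | KqlSamplesDataSource
  | CustomSamplesDataSource;

// The `type` field lets TypeScript narrow each branch to the matching interface.
function describeDataSource(ds: EnrichmentDataSource): string {
  switch (ds.type) {
    case 'random-samples':
      return 'random samples';
    case 'kql-samples':
      return `KQL: ${ds.query.query}`;
    case 'custom-samples':
      return `${ds.documents.length} custom document(s)`;
    default: {
      // Compile-time exhaustiveness check: fails to compile if a variant is unhandled.
      const unreachable: never = ds;
      return unreachable;
    }
  }
}
```

A union keyed on `type` keeps per-source fields (such as `query`) out of sources that do not need them, which is one plausible reason to compose the type locally rather than reuse a broader one.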
'xpack.streams.streamDetailView.managementTab.enrichment.dataSources.randomSamples.callout',
{
  defaultMessage:
    'The random samples data source cannot be deleted to guarantee available samples for the simulation. You can still disable it if you want to focus on other data sources samples.',
Nit: This doesn't read well. Can we change this to "You can still disable it if you want to focus on samples from other data sources"?
const error = getFormattedError(event.error);
toasts.addError(error, {
  title: i18n.translate('xpack.streams.enrichment.dataSources.dataCollectionError', {
    defaultMessage: 'An issue occurred retrieving data source documents.',
Nit: Can we change this to "An issue occurred retrieving documents from the data source."?
💛 Build succeeded, but was flaky
Starting backport for target branches: 8.19 https://github.com/elastic/kibana/actions/runs/15773250413
💔 All backports failed
Starting backport for target branches: 8.19 https://github.com/elastic/kibana/actions/runs/15816253337
## 📓 Summary

Closes elastic#218408

This work initially started with the introduction of a simple search bar on the streams enrichment samples, but as we realized it didn't fit well with the requirements for a smooth simulation experience, we moved in another direction.

## Data sources

As we want to let users pull documents from multiple sources to simulate their processors (such as docs from Discover, failure store, custom documents pasted into the simulator, etc...), this work introduces a data source entity in the simulation playground.
On top of how it used to work, it converts the random samples previously fetched automatically to a dedicated data source.
As this becomes now a scalable concept, we provide users with the ability to add/remove/enable different data sources for the same simulation:
- **Random samples**: This is always available by default to have at least a data source always available; it can still be enabled/disabled on demand.
- **KQL search**: Provides a KQL search bar, similar to the one found in Discover and across Kibana, which enables patterns for pulling documents into this page from Discover or elsewhere.
- **Custom samples**: Paste raw documents that will be used among the other data sources for the whole simulation.

## 💡 Reviewer hints

- The data fetching now relies on the `data` plugin interfaces as we needed a more capable API than the `_sample` one (now removed), and it aligns with the data fetching practice used for the partitioning page.
- The data source can behave differently depending on its state (enabled/disabled). To treat it as an isolated concept, a representing actor machine is introduced and the root streamEnrichment machine coordinates event-based communication as it happens already for the processors' instantiation and management.
- The data sources are consistently persisted to the URL, with a couple of exceptions:
  - The `Custom samples` data source is not persisted, as it's not only descriptive of the data source configuration but it also holds the custom samples defined by the user. This could easily hit the URL limits, so we warn the user this won't be persisted anyhow.
  - The `Random samples` data source is always available and restored in the URL to guarantee a data source available on the page.

(cherry picked from commit b759ebb)
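The two URL persistence exceptions described in the reviewer hints can be sketched as a pair of pure functions. This is a minimal illustration under stated assumptions, not the PR's code: the names `toUrlState` and `fromUrlState`, the `enabled` field, and the `'random-samples'` and `'custom-samples'` discriminant values are all hypothetical (only `'kql-samples'` appears in the reviewed diff).

```typescript
// Hypothetical, simplified model of a data source for illustration only.
type DataSourceType = 'random-samples' | 'kql-samples' | 'custom-samples';

interface DataSource {
  type: DataSourceType;
  enabled: boolean;
}

// Exception 1: custom samples carry user-pasted documents, which could exceed
// URL length limits, so they are dropped before serializing to the URL.
function toUrlState(dataSources: DataSource[]): DataSource[] {
  return dataSources.filter((ds) => ds.type !== 'custom-samples');
}

// Exception 2: a random samples data source is always restored, so the page
// never ends up without any data source.
function fromUrlState(persisted: DataSource[]): DataSource[] {
  const hasRandomSamples = persisted.some((ds) => ds.type === 'random-samples');
  return hasRandomSamples
    ? persisted
    : [{ type: 'random-samples', enabled: true }, ...persisted];
}

const inMemory: DataSource[] = [
  { type: 'kql-samples', enabled: true },
  { type: 'custom-samples', enabled: true },
];

const restored = fromUrlState(toUrlState(inMemory));
console.log(restored.map((ds) => ds.type)); // [ 'random-samples', 'kql-samples' ]
```

Keeping both rules as pure functions makes the round trip easy to test independently of the state machine that owns the data sources.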
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.
Questions? Please refer to the Backport tool documentation.
# Backport

This will backport the following commits from `main` to `8.19`:
- [[Streams 🌊] Enrichment sampling data sources (#219736)](#219736)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

Co-authored-by: Marco Antonio Ghiani <marcoantonio.ghiani01@gmail.com>

