[Streams] Replay loghub data with synthtrace #212120
dgieselaar merged 21 commits into elastic:main
Implementation-wise this looks pretty good to me. Some meta questions:
- Should we rely on the public loghub repo or fork it? I'm a little worried about this breaking at some point if loghub changes its layout. A fork would also make it easier to extend the data ourselves. In both cases we should cite loghub and the paper somewhere appropriate (like a README file), as the license requires.
- I'm not so sure about the different speeds. I'm running via
  `node scripts/synthtrace.js sample_logs --live --kibana=http://localhost:5601 --target=http://localhost:9200 --liveBucketSize=1000`
  and `liveBucketSize` is essentially ignored because the scenario computes its own speed. Can we make it take that setting into account? Different speeds for different data sets are a nice touch as they mirror reality, but I would like to control the overall rate of data intake (for example, speed everything up by a factor of 1,000). Maybe that's already possible and I just don't know the right command.
- I spot-checked some aspects of the refactoring and it makes sense to me, but I didn't dig through everything, and as I'm not super familiar with the code base it's likely I'm missing something in there.
I'm fine with either, but maybe it's good to do that as a follow-up; I'm not sure what the legal ramifications are.
Yes, I totally forgot about this setting; I should be able to use it. Would we use a constant indexing rate for each generator, or keep the relative rate per generator (e.g. Android indexes at a much higher rate than MacBook)?
Sounds good; then we should add a backlink to the repo and the paper, and follow up later.
I would prefer the latter; in practice this kind of thing happens all the time.
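A minimal sketch of the approach preferred in this thread (keep each generator's relative rate, but let one global factor scale overall throughput), assuming each data set's average source rate in events per second is already known. `DatasetRate`, `eventsPerBucket`, `bucketMs` and `speedFactor` are hypothetical names for illustration, not Synthtrace's actual options or API, and the rates in the usage line are made up.

```typescript
// Hypothetical input: the average event rate observed in each source data set.
interface DatasetRate {
  name: string;
  eventsPerSecond: number;
}

// Returns how many events each data set should emit per bucket. Relative rates
// between data sets are preserved; `speedFactor` scales all of them equally.
function eventsPerBucket(
  datasets: DatasetRate[],
  bucketMs: number,
  speedFactor: number
): Map<string, number> {
  if (bucketMs <= 0 || speedFactor <= 0) {
    throw new Error("bucketMs and speedFactor must be positive");
  }
  const counts = new Map<string, number>();
  for (const { name, eventsPerSecond } of datasets) {
    // Expected events in one bucket at the source rate, then scaled globally.
    const expected = eventsPerSecond * (bucketMs / 1000) * speedFactor;
    // Rounding per bucket drops fractional events.
    counts.set(name, Math.round(expected));
  }
  return counts;
}

// Usage: hypothetical rates for two data sets, 1-second buckets, sped up 1,000x.
const counts = eventsPerBucket(
  [
    { name: "android", eventsPerSecond: 50 },
    { name: "mac", eventsPerSecond: 0.5 },
  ],
  1000,
  1000
);
console.log(counts.get("android"), counts.get("mac")); // 50000 500
```

The 100:1 ratio between the two data sets survives the speed-up, which is the "relative rate per generator" behaviour. Because each bucket is rounded independently, a production version would likely carry the fractional remainder into the next bucket so that slow data sets are not rounded down to zero.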
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)
💚 Build Succeeded
cc @dgieselaar
Starting backport for target branches: 8.x, 9.0 https://github.com/elastic/kibana/actions/runs/13788048345
💔 All backports failed
Manual backport
To create the backport manually run:
Questions? Please refer to the Backport tool documentation
Friendly reminder: Looks like this PR hasn’t been backported yet.
Download, parse and replay loghub data with Synthtrace, for use in the Streams project. In summary:

- adds a `@kbn/sample-log-parser` package which parses Loghub sample data and, using the LLM, creates valid parsers for extracting and replacing timestamps
- adds a `sample_logs` scenario which uses the parsed data sets to replay Loghub data continuously as if it were live data
- refactors some parts of Synthtrace (follow-up work captured in elastic#212179)

## Synthtrace changes

- Replace custom Logger object with Kibana-standard ToolingLog
- Report progress and estimated time to completion for long-running jobs
- Simplify scenarioOpts (allow comma-separated key-value pairs instead of just JSON)
- Simplify client initialization
- When using workers, only bootstrap once (in the main thread)
- Allow workers to gracefully shut down
- Downgrade some logging levels for less noise

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

(cherry picked from commit ba13e86)
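The replay step in this summary can be illustrated with a small sketch of how extracted timestamps might be mapped onto a live timeline. It assumes the timestamps have already been extracted from the raw lines and converted to epoch milliseconds (in the PR, the LLM-generated parsers handle extraction and re-insertion); `rebaseTimestamps` and its parameters are hypothetical, not the package's actual API. Each pass through the sample is shifted forward by the sample's own span, so the data loops continuously while the gaps between events are preserved, optionally compressed by `speedFactor`.

```typescript
// Maps original timestamps (epoch ms, already extracted from the raw log
// lines) onto a live timeline starting at `replayStartMs`. Pass `iteration`
// 0, 1, 2, ... to loop over the same sample repeatedly.
function rebaseTimestamps(
  originalMs: number[],
  replayStartMs: number,
  iteration: number,
  speedFactor = 1
): number[] {
  if (speedFactor <= 0) {
    throw new Error("speedFactor must be positive");
  }
  if (originalMs.length === 0) {
    return [];
  }
  // reduce() instead of Math.min(...xs) avoids argument-count limits on large samples.
  const first = originalMs.reduce((a, b) => Math.min(a, b));
  const last = originalMs.reduce((a, b) => Math.max(a, b));
  // Length of one pass on the replay timeline; +1 ms keeps the next pass
  // from starting on the same millisecond as this pass's last event.
  const spanMs = (last - first) / speedFactor + 1;
  const offsetMs = replayStartMs + iteration * spanMs;
  // Keep input order so each timestamp still lines up with its log message.
  return originalMs.map((t) => offsetMs + (t - first) / speedFactor);
}

// Usage: three out-of-order events replayed from t = 10000 ms.
console.log(rebaseTimestamps([1000, 3000, 2000], 10000, 0)); // [ 10000, 12000, 11000 ]
console.log(rebaseTimestamps([1000, 3000, 2000], 10000, 1)); // [ 12001, 14001, 13001 ]
```

With `speedFactor` other than 1 the results can be fractional milliseconds; a caller would typically round them before formatting the new timestamp back into the log line.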
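The "report progress and estimated time to completion" change can be approximated with a linear estimate: average time per completed item multiplied by the number of items left. This is a sketch under that assumption with hypothetical names (`ProgressSnapshot`, `estimateRemainingMs`, `formatProgress`); it is not Synthtrace's actual implementation, which may smooth the rate differently or report through ToolingLog rather than returning a string.

```typescript
interface ProgressSnapshot {
  done: number; // items (e.g. documents) processed so far
  total: number; // items expected in total
  elapsedMs: number; // wall-clock time since the job started
}

// Linear estimate: average time per finished item x items remaining.
// Returns undefined until at least one item is done (no rate yet).
function estimateRemainingMs({ done, total, elapsedMs }: ProgressSnapshot): number | undefined {
  if (done <= 0 || total <= 0) {
    return undefined;
  }
  if (done >= total) {
    return 0;
  }
  return Math.round((elapsedMs / done) * (total - done));
}

// Renders a one-line progress message with percentage and ETA in seconds.
function formatProgress(snapshot: ProgressSnapshot): string {
  const pct = snapshot.total > 0 ? Math.min(100, (snapshot.done / snapshot.total) * 100) : 0;
  const remaining = estimateRemainingMs(snapshot);
  const eta = remaining === undefined ? "unknown" : `${Math.ceil(remaining / 1000)}s`;
  return `${pct.toFixed(1)}% (${snapshot.done}/${snapshot.total}), ETA ${eta}`;
}

console.log(formatProgress({ done: 250, total: 1000, elapsedMs: 5000 })); // 25.0% (250/1000), ETA 15s
```

A linear estimate is simple and stable for steady indexing throughput; if the rate varies a lot (for example, during worker start-up), an exponentially weighted rate would react faster.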
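The scenarioOpts simplification (comma-separated key-value pairs in addition to JSON) could look roughly like the sketch below. The `key=value` syntax, the value-coercion rules and the `parseScenarioOpts` name are assumptions for illustration; the exact format Synthtrace accepts should be checked against its CLI.

```typescript
type OptValue = string | number | boolean;

// Turns "true"/"false" into booleans and numeric strings into numbers;
// anything else stays a string.
function coerce(raw: string): OptValue {
  if (raw === "true") return true;
  if (raw === "false") return false;
  const asNumber = Number(raw);
  return raw !== "" && !Number.isNaN(asNumber) ? asNumber : raw;
}

// Accepts either a JSON object or comma-separated key=value pairs.
function parseScenarioOpts(input: string): Record<string, unknown> {
  const trimmed = input.trim();
  if (trimmed === "") {
    return {};
  }
  if (trimmed.startsWith("{")) {
    // JSON keeps working for nested or explicitly typed values.
    return JSON.parse(trimmed) as Record<string, unknown>;
  }
  const opts: Record<string, unknown> = {};
  for (const pair of trimmed.split(",")) {
    const eq = pair.indexOf("=");
    if (eq === -1) {
      throw new Error(`Expected key=value, got "${pair}"`);
    }
    // Split on the first "=" only, so values may themselves contain "=".
    opts[pair.slice(0, eq).trim()] = coerce(pair.slice(eq + 1).trim());
  }
  return opts;
}

console.log(parseScenarioOpts("rpm=100,live=true,dataset=android"));
// { rpm: 100, live: true, dataset: 'android' }
```

The trade-off of this design is that values containing commas, or nested objects, still need the JSON form; the key-value form only makes the common flat case shorter to type.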
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.
Questions? Please refer to the Backport tool documentation
Download, parse and replay loghub data with Synthtrace, for use in the Streams project.

(cherry picked from commit ba13e86)

# Conflicts:
# .github/CODEOWNERS
# src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_apm_es_client.ts
# src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_entities_es_client.ts
# src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_infra_es_client.ts
# src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_logs_es_client.ts
# src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_otel_es_client.ts
# src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_synthetics_es_client.ts
# src/platform/packages/shared/kbn-apm-synthtrace/src/lib/apm/client/apm_synthtrace_kibana_client.ts
Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.
# Backport

This will backport the following commits from `main` to `8.x`:
- [[Streams] Replay loghub data with synthtrace (#212120)](#212120)

<!--- Backport version: 9.6.6 -->

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>