[Streams] Replay loghub data with synthtrace #212120

Merged
dgieselaar merged 21 commits into elastic:main from dgieselaar:sample-log-parser
Mar 11, 2025

@dgieselaar dgieselaar commented Feb 21, 2025

Download, parse and replay loghub data with Synthtrace, for use in the Streams project. In summary:

  • adds a @kbn/sample-log-parser package that parses Loghub sample data and uses the LLM to create valid parsers for extracting and replacing timestamps
  • adds a sample_logs scenario that uses the parsed data sets to replay Loghub data continuously, as if it were live data
  • refactors some parts of Synthtrace (follow-up work captured in [Synthtrace] Consolidate clients #212179)
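
As a rough illustration of what "replay continuously as if it were live data" can mean, here is a minimal sketch that loops a recorded data set over a target time range by shifting timestamps. It is not the code from this PR: `ParsedLogLine` and `replayAsLive` are hypothetical names, and the sketch only rewrites a numeric timestamp field, whereas the package described above also replaces the timestamp inside the raw log line.

```typescript
// Hypothetical shape of one parsed log line; the real package may differ.
interface ParsedLogLine {
  timestamp: number; // epoch milliseconds extracted from the original line
  message: string;
}

// Replays `lines` in a loop over [from, to), shifting every timestamp so the
// first line of the first loop lands exactly on `from`.
function replayAsLive(lines: ParsedLogLine[], from: number, to: number): ParsedLogLine[] {
  if (lines.length === 0 || to <= from) {
    return [];
  }
  const sorted = [...lines].sort((a, b) => a.timestamp - b.timestamp);
  const min = sorted[0].timestamp;
  const max = sorted[sorted.length - 1].timestamp;
  // Length of one loop. The extra millisecond keeps the last line of one loop
  // from sharing a timestamp with the first line of the next.
  const span = max - min + 1;

  const replayed: ParsedLogLine[] = [];
  for (let loopStart = from; loopStart < to; loopStart += span) {
    for (const line of sorted) {
      const shifted = loopStart + (line.timestamp - min);
      if (shifted >= to) {
        break; // lines are sorted, so the rest of this loop is out of range too
      }
      replayed.push({ ...line, timestamp: shifted });
    }
  }
  return replayed;
}
```

Keeping the original gaps between lines preserves the burstiness of the recorded data, which is the property that makes a replay look like live traffic.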

Synthtrace changes

  • Replace custom Logger object with Kibana-standard ToolingLog
  • Report progress and estimated time to completion for long-running jobs
  • Simplify scenarioOpts (allow comma-separated key-value pairs instead of just JSON)
  • Simplify client initialization
  • When using workers, only bootstrap once (in the main thread)
  • Allow workers to shut down gracefully
  • Downgrade some logging levels for less noise
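
To make the scenarioOpts change concrete, the following is a minimal sketch of a parser that accepts either JSON or comma-separated key-value pairs. It is an assumption-laden illustration, not the Synthtrace implementation: the function names, the `=` separator, the value coercion rules, and the example option names are all hypothetical.

```typescript
type ScenarioOpts = Record<string, unknown>;

// Turns "true"/"false" into booleans and numeric strings into numbers.
// These coercion rules are an assumption made for this sketch.
function coerce(value: string): unknown {
  if (value === 'true') return true;
  if (value === 'false') return false;
  if (value !== '' && !Number.isNaN(Number(value))) return Number(value);
  return value;
}

// Accepts either a JSON object or pairs such as "rpm=500,verbose".
// The "=" separator is assumed; a bare key is treated as `true`.
function parseScenarioOpts(raw: string | undefined): ScenarioOpts {
  if (!raw || raw.trim() === '') {
    return {};
  }
  const trimmed = raw.trim();
  if (trimmed.startsWith('{')) {
    return JSON.parse(trimmed) as ScenarioOpts;
  }
  const opts: ScenarioOpts = {};
  for (const pair of trimmed.split(',')) {
    if (pair.trim() === '') continue;
    const idx = pair.indexOf('=');
    const key = (idx === -1 ? pair : pair.slice(0, idx)).trim();
    const value = idx === -1 ? 'true' : pair.slice(idx + 1).trim();
    opts[key] = coerce(value);
  }
  return opts;
}
```

With these assumptions, `parseScenarioOpts('rpm=500,distribution=uniform,verbose')` yields an object with a numeric `rpm`, a string `distribution`, and `verbose` set to `true`. A value that itself contains a comma would still need the JSON form.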

@dgieselaar dgieselaar added labels v9.0.0, v9.1.0, v8.19.0, Feature:Streams (This is the label for the Streams Project) Feb 23, 2025
@dgieselaar dgieselaar self-assigned this Feb 23, 2025
@dgieselaar dgieselaar marked this pull request as ready for review February 23, 2025 14:22
@dgieselaar dgieselaar requested review from a team as code owners February 23, 2025 14:22
@dgieselaar dgieselaar requested a review from a team February 23, 2025 14:22
@dgieselaar dgieselaar requested a review from a team as a code owner February 23, 2025 14:22
@dgieselaar dgieselaar added labels backport:version (Backport to applied version labels), release_note:skip (Skip the PR/issue when compiling release notes) Feb 23, 2025

@flash1293 flash1293 left a comment

Implementation-wise this looks pretty good to me. Some meta questions:

  • Should we rely on the public loghub repo or fork it? I'm a little worried about this breaking at some point because loghub changes its layout. A fork would also make it easier to expand the data by our own means. In both cases we should cite loghub and the paper somewhere appropriate (like a readme file), as required by the license.
  • I'm not so sure about the different speeds. I'm running node scripts/synthtrace.js sample_logs --live --kibana=http://localhost:5601 --target=http://localhost:9200 --liveBucketSize=1000, and liveBucketSize is essentially ignored because the scenario computes its own speed. Can we take it into account? Different speeds for different data sets are a nice touch, as that mirrors reality, but I would like to control the rate of data intake (and speed everything up by a factor of 1k, for example). Maybe that's already possible and I just don't know the right command.
  • I spot-checked some aspects of the refactoring and it makes sense to me, but I didn't dig through everything, and as I'm not super familiar with the code base, it's likely I'm missing something in there.

@botelastic botelastic bot added the ci:project-deploy-observability (Create an Observability project) label Feb 24, 2025
@github-actions

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@dgieselaar

Should we rely on the public loghub repo or fork it? I'm a little worried about this breaking at some point because loghub changes its layout. A fork would also make it easier to expand the data by our own means. In both cases we should cite loghub and the paper somewhere appropriate (like a readme file), as required by the license.

I'm fine with either - but maybe good to do that as a follow-up, I'm not sure what the legal ramifications are.

I'm not so sure about the different speeds. I'm running node scripts/synthtrace.js sample_logs --live --kibana=http://localhost:5601/ --target=http://localhost:9200/ --liveBucketSize=1000, and liveBucketSize is essentially ignored because the scenario computes its own speed. Can we take it into account? Different speeds for different data sets are a nice touch, as that mirrors reality, but I would like to control the rate of data intake (and speed everything up by a factor of 1k, for example). Maybe that's already possible and I just don't know the right command.

Yes, totally forgot about this setting, I should be able to use it. Would we use a constant indexing rate for each generator, or keep the relative rate per generator (e.g. Android indexes at a way higher rate than Macbook)?

@flash1293

I'm fine with either - but maybe good to do that as a follow-up, I'm not sure what the legal ramifications are.

Sounds good, then we should add a backlink to the repo and paper and follow up later.

Would we use a constant indexing rate for each generator, or keep the relative rate per generator (e.g. Android indexes at a way higher rate than Macbook)?

I would prefer the latter; in practice, this kind of thing happens all the time.
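
One way to picture "keep the relative rate per generator" while still controlling overall intake is to scale every generator by the same factor. The sketch below is only an illustration under assumed names (`GeneratorRate`, `scaleRates`, and the example numbers are hypothetical); how liveBucketSize actually maps onto such a target in Synthtrace is not specified in this thread.

```typescript
// Hypothetical description of how fast one data set indexes documents.
interface GeneratorRate {
  name: string;
  docsPerMinute: number;
}

// Scales all generators by one common factor so that their combined rate
// equals `targetTotalPerMinute`, preserving the ratios between generators.
function scaleRates(rates: GeneratorRate[], targetTotalPerMinute: number): GeneratorRate[] {
  const total = rates.reduce((sum, rate) => sum + rate.docsPerMinute, 0);
  if (total <= 0) {
    // Nothing to scale: every generator stays idle.
    return rates.map((rate) => ({ ...rate, docsPerMinute: 0 }));
  }
  const factor = targetTotalPerMinute / total;
  return rates.map((rate) => ({ ...rate, docsPerMinute: rate.docsPerMinute * factor }));
}

// Example with made-up numbers: Android stays nine times as busy as Mac.
const scaled = scaleRates(
  [
    { name: 'Android', docsPerMinute: 900 },
    { name: 'Mac', docsPerMinute: 100 },
  ],
  10000
);
console.log(scaled);
```

The alternative discussed above, a constant rate per generator, would instead assign `targetTotalPerMinute / rates.length` to each entry and lose the difference between busy and quiet sources.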

@botelastic botelastic bot added the Team:obs-ux-infra_services - DEPRECATED (DEPRECATED - Use Team:obs-presentation.) label Feb 25, 2025
@elasticmachine

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@elasticmachine

elasticmachine commented Mar 11, 2025

💚 Build Succeeded

  • Buildkite Build
  • Commit: ed40be5
  • Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-212120-ed40be5a5c4a

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/apm-synthtrace | 97 | 98 | +1 |
| @kbn/apm-synthtrace-client | 272 | 274 | +2 |
| @kbn/sample-log-parser | - | 18 | +18 |
| total | | | +21 |

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/sample-log-parser | - | 1 | +1 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/apm-synthtrace | 97 | 98 | +1 |
| @kbn/apm-synthtrace-client | 272 | 274 | +2 |
| @kbn/sample-log-parser | - | 18 | +18 |
| total | | | +21 |

ESLint disabled in files

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/apm-synthtrace | 2 | 3 | +1 |

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/apm-synthtrace | 6 | 2 | -4 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/apm-synthtrace | 8 | 5 | -3 |

cc @dgieselaar

@dgieselaar dgieselaar merged commit ba13e86 into elastic:main Mar 11, 2025
10 checks passed
@kibanamachine

Starting backport for target branches: 8.x, 9.0

https://github.com/elastic/kibana/actions/runs/13788048345

@kibanamachine

💔 All backports failed

Status Branch Result
8.x Backport failed because of merge conflicts
9.0 Backport failed because of merge conflicts

You might need to backport the following PRs to 9.0:
- [Lens] [Dashboard] Remove metric chart progress bar rounded border (#211189)
- [Security Assistant] Fix timeout during Knowledge Base setup (#213738)
- [APM] Breakdown Top dependencies API (#211441)
- [SLO] file reorg, improvements to badge (#213597)
- [Discover] Exclude Elasticsearch metadata fields from Display in Content Column (#213255)

Manual backport

To create the backport manually run:

node scripts/backport --pr 212120

Questions ?

Please refer to the Backport tool documentation

@kibanamachine

Friendly reminder: Looks like this PR hasn't been backported yet.
To create backports automatically, add a backport:* label; to prevent reminders, add the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 212120 locally.

dgieselaar added a commit to dgieselaar/kibana that referenced this pull request Mar 18, 2025
(cherry picked from commit ba13e86)
@dgieselaar

💚 All backports created successfully

Status Branch Result
9.0
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

dgieselaar added a commit to dgieselaar/kibana that referenced this pull request Mar 18, 2025
(cherry picked from commit ba13e86)

# Conflicts:
#	.github/CODEOWNERS
#	src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_apm_es_client.ts
#	src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_entities_es_client.ts
#	src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_infra_es_client.ts
#	src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_logs_es_client.ts
#	src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_otel_es_client.ts
#	src/platform/packages/shared/kbn-apm-synthtrace/src/cli/utils/get_synthetics_es_client.ts
#	src/platform/packages/shared/kbn-apm-synthtrace/src/lib/apm/client/apm_synthtrace_kibana_client.ts
@dgieselaar dgieselaar removed the v9.0.0 label Mar 18, 2025
@kibanamachine

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

dgieselaar added a commit that referenced this pull request Mar 19, 2025
# Backport

This will backport the following commits from `main` to `8.x`:
- [[Streams] Replay loghub data with synthtrace (#212120)](#212120)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)


---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
@kibanamachine kibanamachine removed the backport missing (Added to PRs automatically when they are determined to be missing a backport.) label Mar 19, 2025
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Mar 22, 2025

Labels

  • backport:version (Backport to applied version labels)
  • ci:project-deploy-observability (Create an Observability project)
  • Feature:Streams (This is the label for the Streams Project)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • Team:obs-ux-infra_services - DEPRECATED (DEPRECATED - Use Team:obs-presentation.)
  • v8.19.0
  • v9.1.0

9 participants