-
Notifications
You must be signed in to change notification settings - Fork 8.5k
🌊 LLM-powered parsing suggestions #208777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
flash1293
merged 158 commits into
elastic:main
from
flash1293:flash1293/llm-parsing-suggestions
Feb 20, 2025
Merged
Changes from all commits
Commits
Show all changes
158 commits
Select commit
Hold shift + click to select a range
a79e858
feat(streams): wip enrichment redesign
tonyghiani e6650ee
feat(streams): wip redesign
tonyghiani 6a58780
refactor(streams): update copies
tonyghiani cb10b5d
Merge branch 'main' into 93-update-ui-processing
tonyghiani eeb67a7
refactor(streams): allow text ellipsis
tonyghiani e18dd93
refactor(streams): reset forms on cancel
tonyghiani 4ec0f7d
refactor(streams): update internal forms structure and typing
tonyghiani 470a22f
refactor(streams): update internal state management to track processo…
tonyghiani 8962b15
refactor(streams): update discard changes modal
tonyghiani bd81230
refactor(streams): update dissect processor typing
tonyghiani c260449
refactor(streams): minor changes
tonyghiani 7c1c4f0
refactor(streams): update sampling condition
tonyghiani 78a09a4
Merge branch '93-update-ui-processing' of github.com:tonyghiani/kiban…
tonyghiani 731a125
Merge branch 'tonyghiani-93-update-ui-processing' into 93-update-ui-p…
tonyghiani c19f57d
Merge branch 'main' into 93-update-ui-processing
tonyghiani d7fbb25
refactor(streams): improvements to simulation
tonyghiani 224c7df
refactor(streams): update columns rendering for unmatched docs
tonyghiani 39d6a62
refactor(streams): wip simulation table
tonyghiani 10c513f
refactor(streams): wip simulation table style
tonyghiani 12e38a1
Merge branch 'main' into 93-update-ui-processing
tonyghiani 0040628
feat(streams): wip data preview
tonyghiani 23b80da
start suggestions page
flash1293 96549e2
refactor(streams): minor cleanup
tonyghiani 20d849a
refactor(streams): minor changes
tonyghiani 0df5843
refactor(streams): update live processors udpates
tonyghiani e48537f
refactor(streams): remove import
tonyghiani 340d238
refactor(streams): remove unused props
tonyghiani 81e38b9
fix(streams): disable simulation on existing processors
tonyghiani 411fac8
refactor(streams): minor changes
tonyghiani ae4b54c
Merge branch 'main' into 93-update-ui-processing
tonyghiani 9e15664
Merge branch 'main' into 93-update-ui-processing
tonyghiani 5fe314f
basic parsing suggestions
flash1293 915e846
Merge branch 'main' into 93-update-ui-processing
tonyghiani 39d9f9c
refactor(streams): update usage and remove legacy types
tonyghiani 7ab0375
Merge branch 'main' into 93-update-ui-processing
tonyghiani c4708bb
refactor(streams): address offline feedback
tonyghiani a90fe1a
Merge branch '93-update-ui-processing' of github.com:tonyghiani/kiban…
tonyghiani 315f32f
Merge branch 'main' into 93-update-ui-processing
tonyghiani 5c259be
refactor(streams): address change requests
tonyghiani be1d1ee
refactor(streams): reworded prompt message
tonyghiani 3b9d227
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 3d5b7a9
better processing
flash1293 5238c9a
ai suggestions for parsing
flash1293 a4cf9d6
make it somewhat work
flash1293 94b6e5e
make it work a little more
flash1293 9dba6c1
make it work a little more even
flash1293 2aa1331
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 620865c
[CI] Auto-commit changed files from 'node scripts/styled_components_m…
kibanamachine b4596c2
fix
flash1293 3e3c50e
Merge branch 'flash1293/llm-parsing-suggestions' of github.com:flash1…
flash1293 46f47df
fix the prompt
flash1293 423d871
fix the prompt again
flash1293 c405695
[CI] Auto-commit changed files from 'node scripts/eslint --no-cache -…
kibanamachine b3dcb17
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 eeda79c
[CI] Auto-commit changed files from 'node scripts/styled_components_m…
kibanamachine eb60cc4
improve prompt
flash1293 bba2eb9
Merge branch 'flash1293/llm-parsing-suggestions' of github.com:flash1…
flash1293 c70cff0
fix
flash1293 d7fc7cf
feat(streams): improve stream docs typing
tonyghiani b19ca23
fix hardcoded connector
flash1293 742a49c
fix(streams): fix images lazy load chunks
tonyghiani 0a78c65
refactor(streams): wip split simulation utils
tonyghiani 796dae1
Merge branch 'main' into 127-enrichment-simulation-improvements
tonyghiani c0c4de5
check for connector
flash1293 759d5db
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 aec2466
fix
flash1293 e7a06fc
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 7d25576
revert workaround
flash1293 061aade
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 08b3e5a
refactor(streams): minor change
tonyghiani 2d34c6a
refactor a bit
flash1293 fa4b760
fixes
flash1293 38316fa
[CI] Auto-commit changed files from 'node scripts/styled_components_m…
kibanamachine 2c1a8a8
remove retries and fix bugs
flash1293 36509c1
Merge branch 'flash1293/llm-parsing-suggestions' of github.com:flash1…
flash1293 a81de65
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 cc81fb8
fix more types
flash1293 47dd7ed
only send the shown samples
flash1293 0695680
[CI] Auto-commit changed files from 'node scripts/styled_components_m…
kibanamachine 9a45f2d
wip(streams): simulation API
tonyghiani 4a27b4e
Merge branch 'main' into flash1293/llm-parsing-suggestions
flash1293 72fc09c
wip(streams): simulation API updates
tonyghiani 6da33ca
all the review comments
flash1293 8365696
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 b3572fc
more cleanups
flash1293 f7230e1
Merge branch 'flash1293/llm-parsing-suggestions' of github.com:flash1…
flash1293 29fb946
refactor(streams): update client usage of state
tonyghiani a69dc16
refactor(streams): use processors id for simulation
tonyghiani 65cd1de
refactor(streams): update usage of processor id
tonyghiani 80563af
recursiverecord -> sampledocument
flash1293 289031e
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 62f33cb
fix types
flash1293 cb1e5f9
refactor(streams): remove unused functions
tonyghiani e2f42e5
Merge branch 'main' into 127-enrichment-simulation-improvements
tonyghiani edbdaee
feat(kbn-object-utils): update flattenObject override priority
tonyghiani fb49143
wip(streams): detected field simulation
tonyghiani 3b3e858
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 e1b19d5
some fixes
flash1293 996573e
some fixes
flash1293 6659dfc
refactor
flash1293 c87762d
fix(streams): performance issue documents parsing
tonyghiani a574288
wip(streams): add error reporting on streams app
tonyghiani 9ccfdde
feat(streams): add error reporting on streams app
tonyghiani 0b45a52
feat(kbn-object-utils): update naming
tonyghiani 6bf5f1b
feat(streams): remove import
tonyghiani 304381e
refactor(streams): move files
tonyghiani 0b58e5c
feat(streams): apply gracefully handled processor errors
tonyghiani 378dc28
feat(streams): improve error reporting simulation API
tonyghiani fbd277e
feat(streams): improve simulation API docs comments
tonyghiani 335b326
feat(streams): update ui on disabled submit
tonyghiani bcdf398
test(streams): update processing simulation tests
tonyghiani 6ca1455
chore(streams): fix typing issues
tonyghiani 1b2fff2
test(streams): add more processing simulation tests
tonyghiani 8e17d42
Merge branch 'main' into 127-enrichment-simulation-improvements
tonyghiani 02057be
Merge branch 'main' into 127-enrichment-simulation-improvements
tonyghiani fafcd73
refactor(streams): update type
tonyghiani d5f049e
Merge branch '127-enrichment-simulation-improvements' of github.com:t…
tonyghiani 1ba1eb0
refactor(streams): update errors + badges styles and copies
tonyghiani f8207d2
refactor(streams): update label pluralization
tonyghiani 0ba9c03
refactor(streams): revert file deletion
tonyghiani 67eafaf
refactor(streams): minor ui changes
tonyghiani c75e9de
Update src/platform/packages/shared/kbn-object-utils/src/flatten_obje…
tonyghiani f02e411
Update src/platform/packages/shared/kbn-object-utils/src/flatten_obje…
tonyghiani 72b207f
refactor(streams): minor ui changes
tonyghiani e539e65
Merge branch '127-enrichment-simulation-improvements' of github.com:t…
tonyghiani 2dce11d
[CI] Auto-commit changed files from 'node scripts/eslint --no-cache -…
kibanamachine cc3b66e
refactor(streams): fix test
tonyghiani b461282
Merge branch 'main' into 127-enrichment-simulation-improvements
tonyghiani c74aa6e
refactor(streams): wip state management
tonyghiani 2c11408
refactor(streams): address review tips
tonyghiani 707cc2c
refactor(streams): fix exports order
tonyghiani a58291f
refactor(streams): wip state management
tonyghiani 3de284d
refactor(streams): remove leftover z.lazy call
tonyghiani 7cac053
refactor(streams): wip state management
tonyghiani 525f55a
Merge branch '127-enrichment-simulation-improvements' of github.com:t…
tonyghiani 5ec5561
Merge branch 'tonyghiani-127-enrichment-simulation-improvements' into…
tonyghiani 8143adc
refactor(streams): wip state management
tonyghiani 2d440f8
refactor(kbn-object-utils): split flatten object for nested priority …
tonyghiani 5943707
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 e85d43b
Merge remote-tracking branch 'tonyghiani/127-enrichment-simulation-im…
flash1293 cb2f2cd
remove merge conflict
flash1293 1740676
[CI] Auto-commit changed files from 'ts-node .buildkite/pipeline-reso…
kibanamachine 5fe3967
review comments
flash1293 6f1ad40
Merge branch 'flash1293/llm-parsing-suggestions' of github.com:flash1…
flash1293 739ee42
refactor(streams): wip new stream enrichment hook
tonyghiani 288dc3f
reset errors
flash1293 0ef51d2
refactor(streams): wip new stream enrichment hook
tonyghiani 35edb90
Merge branch 'main' into 102-refactor-state-management
tonyghiani 8ff8453
refactor(streams): update usage to state machine
tonyghiani 705e53a
Merge branch 'main' into 102-refactor-state-management
tonyghiani c1f3895
Merge remote-tracking branch 'tonyghiani/102-refactor-state-managemen…
flash1293 0cade54
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 ef7edbc
Merge remote-tracking branch 'upstream/main' into flash1293/llm-parsi…
flash1293 44cba3c
[CI] Auto-commit changed files from 'node scripts/styled_components_m…
kibanamachine c9d307a
revert draft changes
flash1293 3a96ea4
Merge branch 'flash1293/llm-parsing-suggestions' of github.com:flash1…
flash1293 b89dad2
fine tuning
flash1293 6b3ce93
[CI] Auto-commit changed files from 'node scripts/styled_components_m…
kibanamachine File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
217 changes: 217 additions & 0 deletions
217
...ons/observability/plugins/streams/server/routes/streams/processing/suggestions_handler.ts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,217 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the Elastic License | ||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||
| * 2.0. | ||
| */ | ||
|
|
||
| import { IScopedClusterClient } from '@kbn/core/server'; | ||
| import { get, groupBy, mapValues, orderBy, shuffle, uniq, uniqBy } from 'lodash'; | ||
| import { InferenceClient } from '@kbn/inference-plugin/server'; | ||
| import { FlattenRecord } from '@kbn/streams-schema'; | ||
| import { StreamsClient } from '../../../lib/streams/client'; | ||
| import { simulateProcessing } from './simulation_handler'; | ||
| import { ProcessingSuggestionBody } from './route'; | ||
|
|
||
| export const handleProcessingSuggestion = async ( | ||
| name: string, | ||
| body: ProcessingSuggestionBody, | ||
| inferenceClient: InferenceClient, | ||
| scopedClusterClient: IScopedClusterClient, | ||
| streamsClient: StreamsClient | ||
| ) => { | ||
| const { field, samples } = body; | ||
| // Turn sample messages into patterns to group by | ||
| const evalPattern = (sample: string) => { | ||
| return sample | ||
| .replace(/[ \t\n]+/g, ' ') | ||
| .replace(/[A-Za-z]+/g, 'a') | ||
| .replace(/[0-9]+/g, '0') | ||
| .replace(/(a a)+/g, 'a') | ||
| .replace(/(a0)+/g, 'f') | ||
| .replace(/(f:)+/g, 'f:') | ||
| .replace(/0(.0)+/g, 'p'); | ||
| }; | ||
|
Comment on lines
+25
to
+34
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe some comments/examples here?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, I meant to add a unit test for this whole handler as it's doing a bunch of things and forgot - will add |
||
|
|
||
| const NUMBER_PATTERN_CATEGORIES = 5; | ||
| const NUMBER_SAMPLES_PER_PATTERN = 8; | ||
|
|
||
| const samplesWithPatterns = samples.map((sample) => { | ||
| const pattern = evalPattern(get(sample, field) as string); | ||
| return { | ||
| document: sample, | ||
| fullPattern: pattern, | ||
| truncatedPattern: pattern.slice(0, 10), | ||
| fieldValue: get(sample, field) as string, | ||
| }; | ||
| }); | ||
|
|
||
| // Group samples by their truncated patterns | ||
| const groupedByTruncatedPattern = groupBy(samplesWithPatterns, 'truncatedPattern'); | ||
| // Process each group to create pattern summaries | ||
| const patternSummaries = mapValues( | ||
| groupedByTruncatedPattern, | ||
| (samplesForTruncatedPattern, truncatedPattern) => { | ||
| const uniqueValues = uniq(samplesForTruncatedPattern.map(({ fieldValue }) => fieldValue)); | ||
| const shuffledExamples = shuffle(uniqueValues); | ||
|
|
||
| return { | ||
| truncatedPattern, | ||
| count: samplesForTruncatedPattern.length, | ||
| exampleValues: shuffledExamples.slice(0, NUMBER_SAMPLES_PER_PATTERN), | ||
| }; | ||
| } | ||
| ); | ||
| // Convert to array, sort by count, and take top patterns | ||
| const patternsToProcess = orderBy(Object.values(patternSummaries), 'count', 'desc').slice( | ||
| 0, | ||
| NUMBER_PATTERN_CATEGORIES | ||
| ); | ||
|
|
||
| const results = await Promise.all( | ||
| patternsToProcess.map((sample) => | ||
| processPattern( | ||
| sample, | ||
| name, | ||
| body, | ||
| inferenceClient, | ||
| scopedClusterClient, | ||
| streamsClient, | ||
| field, | ||
| samples | ||
| ) | ||
| ) | ||
| ); | ||
|
|
||
| const deduplicatedSimulations = uniqBy( | ||
| results.flatMap((result) => result.simulations), | ||
| (simulation) => simulation!.pattern | ||
| ); | ||
|
|
||
| return { | ||
| patterns: deduplicatedSimulations.map((simulation) => simulation!.pattern), | ||
| simulations: deduplicatedSimulations as SimulationWithPattern[], | ||
| }; | ||
| }; | ||
|
|
||
| type SimulationWithPattern = ReturnType<typeof simulateProcessing> & { pattern: string }; | ||
|
|
||
| async function processPattern( | ||
flash1293 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| sample: { truncatedPattern: string; count: number; exampleValues: string[] }, | ||
| name: string, | ||
| body: ProcessingSuggestionBody, | ||
| inferenceClient: InferenceClient, | ||
| scopedClusterClient: IScopedClusterClient, | ||
| streamsClient: StreamsClient, | ||
| field: string, | ||
| samples: FlattenRecord[] | ||
| ) { | ||
| const chatResponse = await inferenceClient.output({ | ||
| id: 'get_pattern_suggestions', | ||
| connectorId: body.connectorId, | ||
| // necessary due to a bug in the inference client - TODO remove when fixed | ||
| functionCalling: 'native', | ||
| system: `Instructions: | ||
| - You are an assistant for observability tasks with a strong knowledge of logs and log parsing. | ||
| - Use JSON format. | ||
| - For a single log source identified, provide the following information: | ||
| * Use 'source_name' as the key for the log source name. | ||
| * Use 'parsing_rule' as the key for the parsing rule. | ||
| - Use only Grok patterns for the parsing rule. | ||
| * Use %{{pattern:name:type}} syntax for Grok patterns when possible. | ||
| * Combine date and time into a single @timestamp field when it's possible. | ||
| - Use ECS (Elastic Common Schema) fields whenever possible. | ||
| - You are correct, factual, precise, and reliable. | ||
| `, | ||
| schema: { | ||
| type: 'object', | ||
| required: ['rules'], | ||
| properties: { | ||
flash1293 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| rules: { | ||
| type: 'array', | ||
| items: { | ||
| type: 'object', | ||
| required: ['parsing_rule'], | ||
| properties: { | ||
| source_name: { | ||
| type: 'string', | ||
| }, | ||
| parsing_rule: { | ||
| type: 'string', | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| } as const, | ||
| input: `Logs: | ||
| ${sample.exampleValues.join('\n')} | ||
| Given the raw messages coming from one data source, help us do the following: | ||
| 1. Name the log source based on logs format. | ||
| 2. Write a parsing rule for Elastic ingest pipeline to extract structured fields from the raw message. | ||
| Make sure that the parsing rule is unique per log source. When in doubt, suggest multiple patterns, one generic one matching the general case and more specific ones. | ||
| `, | ||
| }); | ||
|
|
||
| const patterns = ( | ||
| chatResponse.output.rules?.map((rule) => rule.parsing_rule).filter(Boolean) as string[] | ||
| ).map(sanitizePattern); | ||
|
|
||
| const simulations = ( | ||
| await Promise.all( | ||
| patterns.map(async (pattern) => { | ||
| // Validate match on current sample | ||
| const simulationResult = await simulateProcessing({ | ||
| params: { | ||
| path: { name }, | ||
| body: { | ||
| processing: [ | ||
| { | ||
| id: 'grok-processor', | ||
| grok: { | ||
| field, | ||
| if: { always: {} }, | ||
| patterns: [pattern], | ||
| }, | ||
| }, | ||
| ], | ||
| documents: samples, | ||
| }, | ||
| }, | ||
| scopedClusterClient, | ||
| streamsClient, | ||
| }); | ||
|
|
||
| if (simulationResult.is_non_additive_simulation) { | ||
| return null; | ||
| } | ||
|
|
||
| if (simulationResult.success_rate === 0) { | ||
| return null; | ||
| } | ||
|
|
||
| // TODO if success rate is zero, try to strip out the date part and try again | ||
|
|
||
| return { | ||
| ...simulationResult, | ||
| pattern, | ||
| }; | ||
| }) | ||
| ) | ||
| ).filter(Boolean) as Array<SimulationWithPattern | null>; | ||
|
|
||
| return { | ||
| chatResponse, | ||
| simulations, | ||
| }; | ||
| } | ||
|
|
||
| /** | ||
| * We need to keep parsing additive, but overwriting timestamp or message is super common. | ||
| * This is a workaround for now until we found the proper solution for deal with this kind of cases. | ||
| */ | ||
| function sanitizePattern(pattern: string): string { | ||
| return pattern | ||
| .replace(/%\{([^}]+):message\}/g, '%{$1:message_derived}') | ||
| .replace(/%\{([^}]+):@timestamp\}/g, '%{$1:@timestamp_derived}'); | ||
| } | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really related to this PR, but that looks like a bunch of dependencies that we can remove? taskManager, encryptedSavedObject, maybe even alerting although I guess we have some references to that in the AssetService