[Streams] Add failure store as a data source for simulations #249559
Merged

CoenWarmer merged 20 commits into elastic:main from CoenWarmer:streams-add-failure-store-as-source on Jan 23, 2026.

Commits (20, all by CoenWarmer)
01d115e  Add failure store as a data source for simulations
c2bc603  Merge branch 'main' into streams-add-failure-store-as-source
a5cf9a5  Minor cosmetic tweaks
d9e802a  Merge branch 'streams-add-failure-store-as-source' of github.com:Coen…
3bf6585  Fix type error
c172171  Add failure store enabled check and user access privileges check to f…
67202d3  Add failure store by default if permissions allow, use existing simul…
5b58dce  Optimizations for the processing handling, unwrap failure store docs
2a4544e  Merge branch 'main' into streams-add-failure-store-as-source
29cf747  Create FailureStoreNotEnabledError
5e91b6a  Fix bug where failure store wouldn't switch
d459efe  Show all documents
4fd2da2  Add optional time filter
8427661  Merge branch 'main' of github.com:elastic/kibana into streams-add-fai…
f4fe8de  Cleanup
ed6e841  Fix tests
3bfa1fd  Merge branch 'main' of github.com:elastic/kibana into streams-add-fai…
08054ea  Fix tests
740edf5  Make Failure Store not a deletable data source
87691b4  Merge branch 'main' into streams-add-failure-store-as-source
Files changed

...tform/plugins/shared/streams/server/lib/streams/errors/failure_store_not_enabled_error.ts (new file, 21 additions, 0 deletions)
```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import { StatusError } from './status_error';

export class FailureStoreNotEnabledError extends StatusError {
  constructor(message: string) {
    super(message, 403);
    this.name = 'FailureStoreNotEnabledError';
  }
}

export function isFailureStoreNotEnabledError(
  error: unknown
): error is FailureStoreNotEnabledError {
  return error instanceof FailureStoreNotEnabledError;
}
```
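A minimal sketch of how a caller might use this error and its type guard; `assertFailureStoreEnabled` is a hypothetical helper for illustration, not part of this PR:

```ts
import {
  FailureStoreNotEnabledError,
  isFailureStoreNotEnabledError,
} from './failure_store_not_enabled_error';

// Hypothetical guard: reject failure store sampling when the store is disabled.
function assertFailureStoreEnabled(streamName: string, enabled: boolean) {
  if (!enabled) {
    throw new FailureStoreNotEnabledError(
      `Failure store is not enabled for stream [${streamName}]`
    );
  }
}

try {
  assertFailureStoreEnabled('logs.child', false);
} catch (error) {
  if (isFailureStoreNotEnabledError(error)) {
    // Narrowed to FailureStoreNotEnabledError; it surfaces as an HTTP 403
    // because the class passes status 403 to StatusError.
  }
}
```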
...shared/streams/server/routes/internal/streams/processing/failure_store_samples_handler.ts (new file, 259 additions, 0 deletions)
```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import type { IScopedClusterClient } from '@kbn/core/server';
import type { IFieldsMetadataClient } from '@kbn/fields-metadata-plugin/server/services/fields_metadata/types';
import type { FlattenRecord } from '@kbn/streams-schema';
import { Streams } from '@kbn/streams-schema';
import type { StreamlangDSL } from '@kbn/streamlang';
import type { StreamsClient } from '../../../../lib/streams/client';
import { FAILURE_STORE_SELECTOR } from '../../../../../common/constants';
import { simulateProcessing } from './simulation_handler';

const DEFAULT_SAMPLE_SIZE = 100;

/**
 * Structure of a document stored in the Elasticsearch failure store.
 * When a document fails ingestion, Elasticsearch wraps the original document
 * with metadata about the failure.
 */
interface FailureStoreDocument {
  '@timestamp': string;
  document: {
    id?: string;
    index?: string;
    source: FlattenRecord; // The original document that failed
  };
  error: {
    type?: string;
    message?: string;
    stack_trace?: string;
  };
}
```
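For orientation, a failure store hit unwraps like this; every value below is hypothetical, and FlattenRecord is treated as a flat key-to-value map:

```ts
// Hypothetical failure store hit (_source) matching the FailureStoreDocument shape.
// fetchFailureStoreDocuments() below returns only `document.source`.
const exampleHit = {
  '@timestamp': '2026-01-20T10:15:00.000Z', // when the failure was recorded
  document: {
    id: 'aBcDeF123', // hypothetical document id
    index: 'logs.child', // hypothetical target index
    source: { message: 'unparseable log line', 'service.name': 'checkout' },
  },
  error: {
    type: 'document_parsing_exception',
    message: 'failed to parse field [message]',
  },
};
```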
```ts
export interface FailureStoreSamplesParams {
  path: {
    name: string;
  };
  query?: {
    size?: number;
    start?: string;
    end?: string;
  };
}

export interface FailureStoreSamplesDeps {
  params: FailureStoreSamplesParams;
  scopedClusterClient: IScopedClusterClient;
  streamsClient: StreamsClient;
  fieldsMetadataClient: IFieldsMetadataClient;
}

export interface FailureStoreSamplesResponse {
  documents: FlattenRecord[];
}

/**
 * Fetches documents from the failure store and applies all configured processors
 * from parent streams to transform them.
 *
 * All failure store documents are returned regardless of when they failed, since
 * the simulation uses the current processing configuration. If processing has been
 * fixed since the failure, the simulation will succeed anyway.
 *
 * Optimizations:
 * - Direct children of root streams (e.g., logs.child) have no ancestor processing,
 *   so we skip fetching ancestors entirely.
 * - If the failure store is empty, we return early without fetching ancestors.
 * - Deeper nested streams (e.g., logs.child.grandchild) go through the full flow.
 */
export const getFailureStoreSamples = async ({
  params,
  scopedClusterClient,
  streamsClient,
  fieldsMetadataClient,
}: FailureStoreSamplesDeps): Promise<FailureStoreSamplesResponse> => {
  const { name } = params.path;
  const size = params.query?.size ?? DEFAULT_SAMPLE_SIZE;
  const start = params.query?.start;
  const end = params.query?.end;

  // 1. Check if this is a direct child of a root stream (e.g., logs.child).
  // Direct children have no ancestor processing to apply, so we can optimize by
  // skipping ancestor retrieval entirely.
  if (isDirectChildOfRoot(name)) {
    const failureStoreDocs = await fetchFailureStoreDocuments({
      scopedClusterClient,
      streamName: name,
      size,
      start,
      end,
    });
    return { documents: failureStoreDocs };
  }

  // 2. For deeper nested streams, first fetch failure store documents.
  // If no documents exist, we can return early without fetching ancestors.
  const failureStoreDocs = await fetchFailureStoreDocuments({
    scopedClusterClient,
    streamName: name,
    size,
    start,
    end,
  });

  if (failureStoreDocs.length === 0) {
    return { documents: [] };
  }

  // 3. Only fetch ancestors and stream definition when we have documents that need processing
  const [ancestors, stream] = await Promise.all([
    streamsClient.getAncestors(name),
    streamsClient.getStream(name),
  ]);

  // 4. Collect and combine processing steps from all ancestors (root to current stream)
  const combinedProcessing = collectAncestorProcessing(ancestors, stream);

  // If no processing steps are configured, return the raw documents
  if (combinedProcessing.steps.length === 0) {
    return { documents: failureStoreDocs };
  }

  // 5. Run simulation with combined processing using the existing simulateProcessing function
  const simulationResult = await simulateProcessing({
    params: {
      path: { name },
      body: {
        processing: combinedProcessing,
        documents: failureStoreDocs,
      },
    },
    scopedClusterClient,
    streamsClient,
    fieldsMetadataClient,
  });

  // 6. Extract the processed document sources from the simulation result
  const processedDocs = simulationResult.documents.map((docReport) => docReport.value);

  return { documents: processedDocs };
};
```
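A minimal sketch of calling the exported helper; the dependencies (scopedClusterClient, streamsClient, fieldsMetadataClient) would come from the route handler context, and the stream name and time range below are hypothetical:

```ts
// Hypothetical invocation from a route handler; dependencies come from the
// request context and are not constructed here.
const { documents } = await getFailureStoreSamples({
  params: {
    path: { name: 'logs.child.grandchild' },
    query: { size: 50, start: 'now-24h', end: 'now' },
  },
  scopedClusterClient,
  streamsClient,
  fieldsMetadataClient,
});
// `documents` holds up to 50 unwrapped failure store documents, already run
// through the combined ancestor processing pipeline.
```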
```ts
/**
 * Checks if a stream is a direct child of a root stream (depth = 1).
 * Direct children (e.g., "logs.child") have no ancestors with processing to apply.
 * Root streams are identified by having no dots in their name.
 */
function isDirectChildOfRoot(streamName: string): boolean {
  const parts = streamName.split('.');
  // A direct child has exactly 2 parts: root.child
  return parts.length === 2;
}
```
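Concretely, given the naming scheme the comments describe:

```ts
isDirectChildOfRoot('logs.child'); // true  — root plus one segment
isDirectChildOfRoot('logs'); // false — a root stream, not a child
isDirectChildOfRoot('logs.child.grandchild'); // false — has ancestors with processing
```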
```ts
/**
 * Fetches documents from the failure store for the given stream.
 *
 * Documents in the failure store are wrapped with error metadata. This function
 * unwraps them and returns only the original document sources that can be used
 * for simulation.
 *
 * Optionally filters by time range if start/end are provided.
 */
async function fetchFailureStoreDocuments({
  scopedClusterClient,
  streamName,
  size,
  start,
  end,
}: {
  scopedClusterClient: IScopedClusterClient;
  streamName: string;
  size: number;
  start?: string;
  end?: string;
}): Promise<FlattenRecord[]> {
  try {
    // Build query with optional time range filter
    const query =
      start || end
        ? {
            bool: {
              must: [
                {
                  range: {
                    '@timestamp': {
                      ...(start && { gte: start }),
                      ...(end && { lte: end }),
                    },
                  },
                },
              ],
            },
          }
        : undefined;

    const response = await scopedClusterClient.asCurrentUser.search({
      index: `${streamName}${FAILURE_STORE_SELECTOR}`,
      size,
      sort: [{ '@timestamp': { order: 'desc' } }],
      ...(query && { query }),
    });

    // Unwrap the original documents from the failure store wrapper.
    // Failure store documents have the structure: { document: { source: <original doc> }, error: {...} }
    // We want to return just the original document so users can fix their processing
    // for newly incoming docs that will have the same structure.
    return response.hits.hits
      .map((hit) => {
        const failureDoc = hit._source as FailureStoreDocument | undefined;
        return failureDoc?.document?.source;
      })
      .filter((doc): doc is FlattenRecord => doc !== undefined);
  } catch (error) {
    // If the failure store doesn't exist or is empty, return empty array
    if (error.meta?.statusCode === 404) {
      return [];
    }
    throw error;
  }
}
```
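Assuming FAILURE_STORE_SELECTOR resolves to Elasticsearch's ::failures index selector (the constant is defined outside this diff), a call for logs.child with a time range issues roughly this search:

```ts
// Sketch of the resolved request, assuming FAILURE_STORE_SELECTOR === '::failures'.
await scopedClusterClient.asCurrentUser.search({
  index: 'logs.child::failures', // failure store component of the data stream
  size: 100,
  sort: [{ '@timestamp': { order: 'desc' } }],
  query: {
    bool: {
      must: [
        { range: { '@timestamp': { gte: 'now-24h', lte: 'now' } } }, // hypothetical range
      ],
    },
  },
});
```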
```ts
/**
 * Collects and combines processing steps from all ancestors in order from root to current stream.
 * This ensures processors are applied in the correct order as they would be during normal ingestion.
 * Returns a combined StreamlangDSL that can be passed to simulateProcessing.
 */
function collectAncestorProcessing(
  ancestors: Streams.WiredStream.Definition[],
  currentStream: Streams.all.Definition
): StreamlangDSL {
  const allSteps: StreamlangDSL['steps'] = [];

  // Sort ancestors from root (shortest name) to closest parent
  const sortedAncestors = [...ancestors].sort((a, b) => a.name.length - b.name.length);

  // Add processing steps from each ancestor
  for (const ancestor of sortedAncestors) {
    if (ancestor.ingest.processing.steps.length > 0) {
      allSteps.push(...ancestor.ingest.processing.steps);
    }
  }

  // Add processing steps from the current stream if it's a wired or classic stream
  if (Streams.WiredStream.Definition.is(currentStream)) {
    if (currentStream.ingest.processing.steps.length > 0) {
      allSteps.push(...currentStream.ingest.processing.steps);
    }
  } else if (Streams.ClassicStream.Definition.is(currentStream)) {
    if (currentStream.ingest.processing.steps.length > 0) {
      allSteps.push(...currentStream.ingest.processing.steps);
    }
  }

  return { steps: allSteps };
}
```
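To make the ordering concrete: for a stream named logs.child.grandchild whose ancestors are logs and logs.child (hypothetical definitions below), sorting by name length puts the root first, so steps are concatenated in ingestion order:

```ts
// Hypothetical ancestor definitions, reduced to the fields this function reads.
const ancestors = [
  { name: 'logs.child', ingest: { processing: { steps: [/* e.g. a dissect step */] } } },
  { name: 'logs', ingest: { processing: { steps: [/* e.g. a grok step */] } } },
];
// After sorting by name length: ['logs', 'logs.child'], so the combined
// StreamlangDSL runs the root's steps first, then logs.child's, and finally
// any steps defined on logs.child.grandchild itself.
```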