[SecuritySolution] [Dashboard Migrations] Add security automatic migrations evaluation suite #261568

Merged
enriquesanchez-elastic merged 30 commits into main from scaffold-evals-suite-automatic-migrations
Apr 29, 2026

@enriquesanchez-elastic
Contributor

Summary

This PR introduces the @kbn/evals-suite-security-automatic-migrations package, which includes a new evaluation suite for the Splunk-to-Kibana dashboard migration AI pipeline. The suite features various evaluators to assess the migration quality, including checks for lookup joins, ES|QL syntax validity, and translation fidelity.

Key changes:

  • Added new package with necessary configuration files.
  • Implemented evaluators and dataset handling for dashboard migration.
  • Created test specifications to validate the migration process.

This enhancement aims to improve the accuracy and reliability of dashboard migrations from Splunk to Kibana.

@enriquesanchez-elastic self-assigned this Apr 7, 2026
@enriquesanchez-elastic requested review from a team as code owners April 7, 2026 13:45
@enriquesanchez-elastic added the release_note:skip, backport:skip, Team:Automatic Migrations, v9.4.0, and evals:security-automatic-migrations labels Apr 7, 2026
Comment thread .github/CODEOWNERS Outdated
/** Panel-level ground truth */
panels: ExpectedPanel[];
/** Category for conditional evaluator logic */
category: 'standard' | 'complex' | 'edge_case';
Contributor

Could you please elaborate on how this helps?

Contributor Author

The category field on DashboardExpected drives conditional evaluator logic. For example, edge_case dashboards might use relaxed scoring thresholds or skip certain evaluators (like index pattern matching). This avoids hardcoding per-dashboard exceptions.
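
A minimal sketch of how a category could drive evaluator behavior, assuming a per-category policy table. Only the `category` union comes from the PR; the policy shape, the threshold values, the evaluator name `index_pattern_match`, and both helper functions are hypothetical.

```typescript
// Only this union appears in the PR; everything below is illustrative.
type DashboardCategory = 'standard' | 'complex' | 'edge_case';

interface EvaluatorPolicy {
  /** Minimum score (0..1) required to pass. */
  passThreshold: number;
  /** Names of evaluators that are skipped for this category. */
  skippedEvaluators: string[];
}

// One policy per category, instead of hardcoded per-dashboard exceptions.
// The numbers are placeholders, not values used by the suite.
const POLICY_BY_CATEGORY: Record<DashboardCategory, EvaluatorPolicy> = {
  standard: { passThreshold: 0.8, skippedEvaluators: [] },
  complex: { passThreshold: 0.7, skippedEvaluators: [] },
  edge_case: { passThreshold: 0.5, skippedEvaluators: ['index_pattern_match'] },
};

function shouldRunEvaluator(category: DashboardCategory, evaluatorName: string): boolean {
  return POLICY_BY_CATEGORY[category].skippedEvaluators.indexOf(evaluatorName) === -1;
}

function passes(category: DashboardCategory, score: number): boolean {
  return score >= POLICY_BY_CATEGORY[category].passThreshold;
}
```

With a table like this, adding a new exception means editing one policy entry rather than branching on individual dashboard IDs.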

}

export interface DashboardMetadata {
category: 'standard' | 'complex' | 'edge_case';
Contributor

Is there a difference between DashboardExpected['category'] and DashboardMetadata['category']?

Contributor Author

No, they share the same type ('standard' | 'complex' | 'edge_case'). The duplication is intentional as they serve different roles, but I'm happy to DRY it up by having DashboardMetadata reference DashboardExpected['category'] if preferred.
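
For reference, the DRY option mentioned here could look like the following sketch. The interfaces are trimmed to the fields relevant to this thread, and `panels` is typed loosely because `ExpectedPanel` is not shown in this conversation.

```typescript
interface DashboardExpected {
  /** Panel-level ground truth (ExpectedPanel[] in the PR, simplified here). */
  panels: unknown[];
  /** Category for conditional evaluator logic. */
  category: 'standard' | 'complex' | 'edge_case';
}

interface DashboardMetadata {
  // Indexed access type: follows DashboardExpected['category'] automatically,
  // so the union is declared in exactly one place.
  category: DashboardExpected['category'];
}

const metadata: DashboardMetadata = { category: 'edge_case' };
```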

Contributor

That's okay. I just wanted to know the purpose of keeping them separate.

import type { MigrationResult } from '../migration_client';
import { extractEsqlQueries } from '../helpers';

export const createEsqlSyntaxValidityEvaluator = (): Evaluator<
Contributor

This mostly looks like an ES|QL query completeness check rather than a syntax check.

It might be worth splitting it into two evaluators.

Contributor Author

I will rename this to better reflect its role as a "completeness" check.
Full ES|QL syntax validation (via endpoint or parser) is planned as a follow-up. Would you prefer I split the logic now or track it in an issue?
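
To make the distinction concrete, here is a rough sketch of what a "completeness" check covers, as opposed to parsing. It only looks for a leading FROM and for unresolved placeholders. The placeholder shape (`[indexPattern]`, `[macro:...]`, `[lookup:...]`), the result type, and the function name are assumptions for illustration, not the evaluator's actual code.

```typescript
// Assumed placeholder shape: bracketed tokens left behind by the translation step.
const PLACEHOLDER_PATTERN = /\[(?:indexPattern|macro:[^\]]+|lookup:[^\]]+)\]/g;

interface CompletenessResult {
  /** Fraction of queries with no findings (0 when there are no queries). */
  score: number;
  incomplete: Array<{ query: string; reasons: string[] }>;
}

function checkEsqlCompleteness(queries: string[]): CompletenessResult {
  const incomplete: Array<{ query: string; reasons: string[] }> = [];

  for (const query of queries) {
    const reasons: string[] = [];

    // Completeness, not syntax: only check that the query starts with FROM.
    if (!/^\s*FROM\b/i.test(query)) {
      reasons.push('missing FROM clause');
    }

    // Any leftover placeholder means the query was not fully resolved.
    const placeholders = query.match(PLACEHOLDER_PATTERN) || [];
    for (const placeholder of placeholders) {
      reasons.push(`unresolved placeholder: ${placeholder}`);
    }

    if (reasons.length > 0) {
      incomplete.push({ query, reasons });
    }
  }

  const score = queries.length === 0 ? 0 : (queries.length - incomplete.length) / queries.length;
  return { score, incomplete };
}
```

A query such as `FROM logs-* | WHERE` would pass this check even though it is not valid ES|QL, which is why real syntax validation needs a parser or an endpoint.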

[key: string]: unknown;
}

export class DashboardMigrationClient {
Contributor

I would prefer this to be a graph instance, since we will need to assert on the graph state as well, especially in the case of tool calls.

Contributor Author

Agreed. Running the graph directly allows access to intermediate states (inline_query, nl_query) needed for deep evaluation. I'll create a follow-up issue to expose the graph invocation endpoint.

Contributor

@logeekal left a comment

Okay, so overall the PR looks good. Thank you @enriquesanchez-elastic. Apart from my minor comments, I would like to highlight one important thing that needs to be changed.

This will impact how we do some evaluations where we need to access the internal state of the graph.

Let me know what you think.

@enriquesanchez-elastic force-pushed the scaffold-evals-suite-automatic-migrations branch from e2c4eee to 0dd61b9 Compare April 14, 2026 13:35
Contributor

@logeekal left a comment

Thanks @enriquesanchez-elastic for starting this up.

@enriquesanchez-elastic enabled auto-merge (squash) April 21, 2026 14:32
@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store stop/start API tests - Should stop and start the extract entity task after install
  • [job] [logs] FTR Configs #157 / serverless observability UI - onboarding Onboarding Onboarding Firehose Quickstart Flow shows the existing data callout and detected AWS services when data was ingested previously

Metrics [docs]

✅ unchanged

History

cc @enriquesanchez-elastic

enriquesanchez-elastic and others added 10 commits April 28, 2026 15:08
This commit introduces the `@kbn/evals-suite-security-automatic-migrations` package, which includes a new evaluation suite for the Splunk-to-Kibana dashboard migration AI pipeline. The suite features various evaluators to assess the migration quality, including checks for lookup joins, ES|QL syntax validity, and translation fidelity.

Key changes:
- Added new package with necessary configuration files.
- Implemented evaluators and dataset handling for dashboard migration.
- Created test specifications to validate the migration process.

This enhancement aims to improve the accuracy and reliability of dashboard migrations from Splunk to Kibana.
Defines RuleExample, RuleInput, RuleExpected, and RuleMetadata types
that model the dataset shape for Splunk SPL and QRadar rule migration
evaluations, following the existing dashboards dataset pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the HTTP client for the SIEM rules migration API, following
the same patterns as DashboardMigrationClient: create migration, upload
rule and resources, start migration, poll until complete (max 30 min),
fetch translated result, and always cleanup in a finally block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
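
The create, upload, start, poll, fetch, cleanup flow described above can be sketched as follows. This is not the client from the PR: the `MigrationSteps` interface, `runMigration`, and the default poll interval are hypothetical, and the HTTP calls are abstracted behind callbacks. Only the step order, the 30-minute cap, and the cleanup in `finally` come from the commit message.

```typescript
// Hypothetical abstraction over the HTTP calls; the real client talks to the SIEM migrations API.
interface MigrationSteps<T> {
  create(): Promise<string>;
  upload(id: string): Promise<void>;
  start(id: string): Promise<void>;
  isComplete(id: string): Promise<boolean>;
  fetchResult(id: string): Promise<T>;
  cleanup(id: string): Promise<void>;
}

interface PollOptions {
  pollIntervalMs: number;
  timeoutMs: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runMigration<T>(
  steps: MigrationSteps<T>,
  // 30 minutes matches the commit message; the 5 s interval is an assumption.
  opts: PollOptions = { pollIntervalMs: 5000, timeoutMs: 30 * 60 * 1000 }
): Promise<T> {
  const id = await steps.create();
  try {
    await steps.upload(id);
    await steps.start(id);

    const deadline = Date.now() + opts.timeoutMs;
    while (!(await steps.isComplete(id))) {
      if (Date.now() >= deadline) {
        throw new Error(`Migration ${id} did not complete within ${opts.timeoutMs} ms`);
      }
      await sleep(opts.pollIntervalMs);
    }

    return await steps.fetchResult(id);
  } finally {
    // Runs on success, failure, and timeout, so no migration is left behind.
    await steps.cleanup(id);
  }
}
```

Creating the migration outside the `try` block means cleanup is attempted only when there is an ID to clean up.
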
Implements three code-based evaluators for rule migration quality:
- esql_validity: checks FROM clause and placeholder resolution
- lookup_join_preservation: verifies LOOKUP JOIN presence matches expectations
- unsupported_pattern_detection: validates untranslatable rules are not hallucinated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements 4 CODE-kind evaluators: custom query accuracy (Levenshtein
similarity), integration match, prebuilt rule match, and translation
result for rule migration evaluation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
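
A minimal sketch of a Levenshtein-based similarity score of the kind the custom query accuracy evaluator is described as using. Normalizing by the longer string's length is an assumption, and both function names are hypothetical; the evaluator in the PR may normalize or preprocess queries differently.

```typescript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshteinDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j); // distance from the empty prefix of `a` to b[0..j)
  }

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]; // distance from a[0..i) to the empty prefix of `b`
    for (let j = 1; j <= b.length; j++) {
      const substitutionCost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // deletion
          curr[j - 1] + 1, // insertion
          prev[j - 1] + substitutionCost // substitution or match
        )
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical, 0 means nothing in common.
function querySimilarity(expected: string, actual: string): number {
  const maxLength = Math.max(expected.length, actual.length);
  if (maxLength === 0) {
    return 1; // two empty queries are trivially identical
  }
  return 1 - levenshteinDistance(expected, actual) / maxLength;
}
```

Character-level similarity is sensitive to whitespace and casing, so normalizing both queries first would likely make the score more meaningful.
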
enriquesanchez-elastic and others added 17 commits April 28, 2026 15:08
Wires all rule evaluators into a factory function that runs shared
evaluators for both Splunk and QRadar, plus QRadar-only evaluators,
tracking per-dataset success/failure stats.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extends the base evaluate fixture with rule migration client,
rule dataset evaluator, rule display options, display groups,
and rule skip summary reporting alongside the existing dashboard ones.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Creates the splunk rules dataset (3 placeholder examples covering simple,
lookup-based, and unsupported patterns) and the corresponding
splunk_rule_migration.spec.ts evaluation spec that exercises
evaluateRuleDataset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds placeholder QRadar rules dataset (simple event rule, reference set rule,
unsupported sequence rule) and corresponding evaluation spec following the same
structure as the Splunk SPL dataset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ormatting

- Add dataset re-exports (splunkRules, qradarRules) to datasets/rules/index.ts
- Fix @typescript-eslint/no-shadow lint error in helpers.ts (rename shadowed _ params)
- Apply eslint --fix formatting to evaluate.ts, evaluate_dataset.ts, migration_client.ts, and evaluators

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d evaluators

- Add empty-array guard before accessing rules[0] in migration client
- Move TranslationResult evaluator from QRadar-only to shared (applies to both vendors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pecs

Add { tag: tags.stateful.classic } to both rule migration specs to match
the dashboard spec pattern. Also add empty-dataset guards and progress
logging consistent with the dashboard spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename and refactor rule dataset summary functions for clarity and consistency.
- Update dashboard metadata to enable markdown panels.
- Adjust evaluation logic to handle new dataset structure.
… add queries to metadata

The evaluator checks for unresolved placeholders, not actual syntax parsing.
Rename to reflect its true purpose and include generated ES|QL queries in
the evaluator result metadata for debugging visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Changed CODEOWNERS entry for `kbn-evals-suite-security-automatic-migrations` to assign ownership to `@elastic/security-threat-hunting`.
- Refactored `index_pattern_validity.ts` to improve handling of actual index patterns, allowing for multiple index patterns per panel title.
- Adjusted regex in `helpers.ts` for better query matching.
- Modified the regex in `helpers.ts` to allow for optional backticks around index patterns in the FROM clause of queries, enhancing the accuracy of index pattern extraction from panels.
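
As an illustration of the backtick handling described in the last bullet, the sketch below extracts index patterns from a FROM clause and strips optional surrounding backticks. The function name and regexes are hypothetical and are not the ones in `helpers.ts`.

```typescript
// Returns the index patterns listed after FROM, with optional backticks removed.
function extractIndexPatterns(query: string): string[] {
  // Capture everything after FROM up to the first pipe.
  const match = /^\s*FROM\s+([^|]+)/i.exec(query);
  if (!match) {
    return [];
  }

  return match[1]
    .split(/\s+METADATA\s+/i)[0] // drop a trailing METADATA directive, if present
    .split(',')
    .map((pattern) => pattern.trim().replace(/^`(.*)`$/, '$1'))
    .filter((pattern) => pattern.length > 0);
}
```

For example, ``FROM `logs-*`, metrics-* | LIMIT 5`` yields `logs-*` and `metrics-*`.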
…andalone lookup

splHasLookups returned false for SPL queries containing both a standalone
`lookup` and an `inputlookup`/`outputlookup` (e.g. `"lookup users | inputlookup extra.csv"`).
The global exclusion regex short-circuited the first branch, and the fallback
only matched piped lookups. Now iterates matches of `(?<![a-zA-Z])lookup\s+\w+`
and filters out `input`/`output`-prefixed ones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
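
A small sketch of the behavior this fix describes, using the regex quoted in the commit message. This is not the PR's implementation: here the negative lookbehind alone rejects `inputlookup` and `outputlookup`, because in both cases `lookup` is preceded by a letter.

```typescript
// Built with the RegExp constructor so the pattern matches the commit message verbatim.
const STANDALONE_LOOKUP = new RegExp('(?<![a-zA-Z])lookup\\s+\\w+', 'g');

function splHasLookups(spl: string): boolean {
  // String.prototype.match with a global regex returns every match, or null.
  const standaloneLookups = spl.match(STANDALONE_LOOKUP) || [];
  return standaloneLookups.length > 0;
}

// The case from the commit message: a standalone lookup next to an inputlookup.
const mixed = splHasLookups('lookup users | inputlookup extra.csv'); // true
// Only an inputlookup: no standalone lookup is present.
const inputOnly = splHasLookups('| inputlookup extra.csv'); // false
```
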
…gainst missing eai:data

The previous guard only checked `result` truthiness. If `result` existed but
lacked `eai:data`, `sourceSpl.slice(0, 3000)` threw a TypeError outside the
LLM try/catch. Now narrows the guard to the actual field used.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
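
The narrowed guard can be illustrated as below. The row shape and the function name are assumptions; only the `result` object, the `eai:data` field, and the 3000-character slice come from the commit message.

```typescript
// Assumed shape of an exported Splunk row; other fields are left open.
interface SplunkExportRow {
  result?: { 'eai:data'?: string; [key: string]: unknown };
}

function getSourceSplPreview(row: SplunkExportRow, maxChars: number = 3000): string | undefined {
  const sourceSpl = row.result?.['eai:data'];

  // Guard on the field that is actually used, not just on `result`.
  if (typeof sourceSpl !== 'string') {
    return undefined;
  }
  return sourceSpl.slice(0, maxChars);
}
```

Checking only `row.result` would let a row like `{ result: { title: 'x' } }` through and fail on `.slice`, which is the TypeError the commit describes.
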
…o threat-hunting

Per review feedback, moves the kbn-evals-suite-security-automatic-migrations
package owner from security-generative-ai to security-threat-hunting.
CODEOWNERS regenerated from kibana.jsonc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@enriquesanchez-elastic force-pushed the scaffold-evals-suite-automatic-migrations branch from 9e770e8 to 14ac71b Compare April 28, 2026 13:17
@kibanamachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

✅ unchanged

History

cc @enriquesanchez-elastic

Contributor

@SrdjanLL left a comment

@kbn/evals changes and suite setup LGTM!

Please note that the eval suite won't run in the weekly automated run against the golden cluster, unless it's added here - that's okay for early stage suites so we don't bump the token usage, but if/when you think it's ready, you're welcome to add it.

@enriquesanchez-elastic merged commit 73c2578 into main Apr 29, 2026
26 checks passed
@enriquesanchez-elastic deleted the scaffold-evals-suite-automatic-migrations branch April 29, 2026 12:14
Labels

backport:skip, evals:security-automatic-migrations, release_note:skip, Team:Automatic Migrations, v9.4.0, v9.5.0

6 participants