[SecuritySolution] [Dashboard Migrations] Add security automatic migrations evaluation suite by enriquesanchez-elastic · Pull Request #261568 · elastic/kibana

enriquesanchez-elastic · 2026-04-07T13:45:52Z

Summary

This PR introduces the @kbn/evals-suite-security-automatic-migrations package, which includes a new evaluation suite for the Splunk-to-Kibana dashboard migration AI pipeline. The suite features various evaluators to assess the migration quality, including checks for lookup joins, ES|QL syntax validity, and translation fidelity.

Key changes:

- Added new package with necessary configuration files.
- Implemented evaluators and dataset handling for dashboard migration.
- Created test specifications to validate the migration process.

This enhancement aims to improve the accuracy and reliability of dashboard migrations from Splunk to Kibana.

logeekal · 2026-04-08T06:27:14Z

+  /** Panel-level ground truth */
+  panels: ExpectedPanel[];
+  /** Category for conditional evaluator logic */
+  category: 'standard' | 'complex' | 'edge_case';


Could you please elaborate on how this helps?

The category field on DashboardExpected drives conditional evaluator logic. For example, edge_case dashboards might use relaxed scoring thresholds or skip certain evaluators (like index pattern matching). This avoids hardcoding per-dashboard exceptions.
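
A minimal sketch of how a category could drive evaluator behavior without per-dashboard exceptions. The threshold values, the evaluator name `index_pattern_match`, and the helper names are hypothetical and are not taken from the PR:

```typescript
type DashboardCategory = 'standard' | 'complex' | 'edge_case';

interface CategoryPolicy {
  /** Minimum score that counts as a pass (illustrative values only). */
  passThreshold: number;
  /** Evaluator names to skip for this category (hypothetical names). */
  skipEvaluators: string[];
}

// One policy per category, so exceptions live in a single table.
const POLICIES: Record<DashboardCategory, CategoryPolicy> = {
  standard: { passThreshold: 0.8, skipEvaluators: [] },
  complex: { passThreshold: 0.7, skipEvaluators: [] },
  edge_case: { passThreshold: 0.5, skipEvaluators: ['index_pattern_match'] },
};

function shouldRunEvaluator(category: DashboardCategory, evaluatorName: string): boolean {
  return !POLICIES[category].skipEvaluators.includes(evaluatorName);
}

function isPassing(category: DashboardCategory, score: number): boolean {
  return score >= POLICIES[category].passThreshold;
}
```

Adding a new category then means adding one entry to the table rather than touching each evaluator.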

logeekal · 2026-04-08T06:28:06Z

+}
+
+export interface DashboardMetadata {
+  category: 'standard' | 'complex' | 'edge_case';


Is there a difference between DashboardExpected['category'] and DashboardMetadata['category']?

No, they share the same type ('standard' | 'complex' | 'edge_case'). The duplication is intentional as they serve different roles, but I'm happy to DRY it up by having DashboardMetadata reference DashboardExpected['category'] if preferred.

It is okay. I just wanted to know the purpose of keeping them separate.
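
For reference, the DRY option discussed in this thread could look roughly like the sketch below. Only the `category` field is shown; the other fields of both interfaces are omitted, and the `metadata` constant is purely illustrative:

```typescript
interface DashboardExpected {
  // Other fields (such as panels) omitted for brevity.
  category: 'standard' | 'complex' | 'edge_case';
}

interface DashboardMetadata {
  // Indexed access type: follows DashboardExpected['category'] automatically.
  category: DashboardExpected['category'];
}

// Illustrative value only; any of the three literals type-checks here.
const metadata: DashboardMetadata = { category: 'edge_case' };
```

With this shape, adding a fourth category in `DashboardExpected` would propagate to `DashboardMetadata` without a second edit.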

logeekal · 2026-04-08T08:32:33Z

+import type { MigrationResult } from '../migration_client';
+import { extractEsqlQueries } from '../helpers';
+
+export const createEsqlSyntaxValidityEvaluator = (): Evaluator<


This mostly looks like an ES|QL query completeness check rather than a syntax check.

It might be worth dividing them into two.

I will rename this to better reflect its role as a "completeness" check.
Full ES|QL syntax validation (via endpoint or parser) is planned as a follow-up. Would you prefer I split the logic now or track it in an issue?

logeekal · 2026-04-13T09:47:58Z

+  [key: string]: unknown;
+}
+
+export class DashboardMigrationClient {


I would prefer this to be a graph instance, since we will need to assert on the graph state as well, especially in the case of tool calls.

Agreed. Running the graph directly allows access to intermediate states (inline_query, nl_query) needed for deep evaluation. I'll create a follow-up issue to expose the graph invocation endpoint.

The issue: https://github.com/elastic/security-team/issues/16820

logeekal

Okay, so overall the PR looks good. Thank you @enriquesanchez-elastic. Apart from my minor comments, I would like to highlight one important thing that needs to be changed.

We need to run the graph directly instead of the whole migration.
- I think the easiest way is to create an endpoint that runs the Migrations graph. Basically, it would simply call executeTask:
  
  kibana/x-pack/solutions/security/plugins/security_solution/server/lib/siem_migrations/common/task/siem_migrations_task_runner.ts
  
  Line 191 in 9b55055
  
  async executeTask(input: P, config: RunnableConfig<C>) {
  
  The endpoint can take the inputs below:
  - graph name
  - graph input
  - invocation config

This will impact how we do some evaluations where we need to access the internal state of the graph.

Let me know what you think.
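
To make the three proposed inputs concrete, here is a rough sketch of a request shape and a validator for such an endpoint. Everything in it is an assumption for illustration: the names `RunGraphRequest`, `parseRunGraphRequest`, and the field names do not come from the PR or from Kibana, and no such endpoint exists yet.

```typescript
// Hypothetical request shape for a "run graph" endpoint; all names are assumptions.
interface RunGraphRequest {
  graphName: string;
  graphInput: Record<string, unknown>;
  invocationConfig: Record<string, unknown>;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Validates an untyped body and returns a typed request, or throws on bad input.
function parseRunGraphRequest(body: unknown): RunGraphRequest {
  if (!isRecord(body)) {
    throw new Error('Request body must be an object');
  }
  const { graphName, graphInput, invocationConfig = {} } = body;
  if (typeof graphName !== 'string' || graphName.length === 0) {
    throw new Error('graphName must be a non-empty string');
  }
  if (!isRecord(graphInput)) {
    throw new Error('graphInput must be an object');
  }
  if (!isRecord(invocationConfig)) {
    throw new Error('invocationConfig must be an object');
  }
  return { graphName, graphInput, invocationConfig };
}
```

A handler built on this could pass `graphInput` and `invocationConfig` through to the task runner and return the final graph state, which is what the deeper evaluations would inspect.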

logeekal

Thanks @enriquesanchez-elastic for starting this up.

elasticmachine · 2026-04-23T09:15:25Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 9e770e8

Failed CI Steps

Test Failures

[job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store stop/start API tests - Should stop and start the extract entity task after install
[job] [logs] FTR Configs #157 / serverless observability UI - onboarding Onboarding Onboarding Firehose Quickstart Flow shows the existing data callout and detected AWS services when data was ingested previously

Metrics [docs]

✅ unchanged

History

💔 Build #431919 failed 31cdaab
💔 Build #431865 failed 322f6bb
💔 Build #431302 failed 18a9ca9
💛 Build #430637 was flaky 9297d26
💔 Build #428159 failed d4e5fe7
💔 Build #427383 failed 4c73427

cc @enriquesanchez-elastic

Defines RuleExample, RuleInput, RuleExpected, and RuleMetadata types that model the dataset shape for Splunk SPL and QRadar rule migration evaluations, following the existing dashboards dataset pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements the HTTP client for the SIEM rules migration API, following the same patterns as DashboardMigrationClient: create migration, upload rule and resources, start migration, poll until complete (max 30 min), fetch translated result, and always cleanup in a finally block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
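
The lifecycle described in this commit message (create, start, poll with a deadline, fetch, always clean up) can be sketched as follows. The `MigrationApi` interface and its method names are hypothetical stand-ins for the real HTTP calls, the upload step is omitted, and the 5 second poll interval is an assumption; only the 30 minute cap comes from the message above.

```typescript
// Hypothetical abstraction over the HTTP calls; not the PR's actual client API.
interface MigrationApi {
  create(): Promise<string>;
  start(id: string): Promise<void>;
  getStatus(id: string): Promise<'running' | 'finished' | 'failed'>;
  getResult(id: string): Promise<unknown>;
  remove(id: string): Promise<void>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runMigration(
  api: MigrationApi,
  opts: { timeoutMs: number; intervalMs: number } = { timeoutMs: 30 * 60 * 1000, intervalMs: 5000 }
): Promise<unknown> {
  const id = await api.create();
  try {
    await api.start(id);
    const deadline = Date.now() + opts.timeoutMs;
    // Poll until the migration finishes, fails, or the deadline passes.
    while (Date.now() < deadline) {
      const status = await api.getStatus(id);
      if (status === 'finished') {
        return await api.getResult(id);
      }
      if (status === 'failed') {
        throw new Error(`Migration ${id} failed`);
      }
      await sleep(opts.intervalMs);
    }
    throw new Error(`Migration ${id} timed out after ${opts.timeoutMs} ms`);
  } finally {
    // Runs on success, failure, and timeout alike.
    await api.remove(id);
  }
}
```

Putting the cleanup in `finally` means a timed-out or failed run does not leave a migration behind on the cluster.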

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements three code-based evaluators for rule migration quality:
- esql_validity: checks FROM clause and placeholder resolution
- lookup_join_preservation: verifies LOOKUP JOIN presence matches expectations
- unsupported_pattern_detection: validates untranslatable rules are not hallucinated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements 4 CODE-kind evaluators: custom query accuracy (Levenshtein similarity), integration match, prebuilt rule match, and translation result for rule migration evaluation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
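
For readers unfamiliar with the Levenshtein-based score mentioned for custom query accuracy, here is a small self-contained sketch. The normalization (one minus distance divided by the longer length) is an assumption about how a similarity could be derived; the PR's evaluator may normalize differently.

```typescript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 means identical, 0 means nothing in common.
function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}
```

Because the score is character-based, two semantically equivalent queries with different whitespace or clause order will score below 1, which is worth keeping in mind when reading the results.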

Wires all rule evaluators into a factory function that runs shared evaluators for both Splunk and QRadar, plus QRadar-only evaluators, tracking per-dataset success/failure stats. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Extends the base evaluate fixture with rule migration client, rule dataset evaluator, rule display options, display groups, and rule skip summary reporting alongside the existing dashboard ones. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Creates the splunk rules dataset (3 placeholder examples covering simple, lookup-based, and unsupported patterns) and the corresponding splunk_rule_migration.spec.ts evaluation spec that exercises evaluateRuleDataset. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds placeholder QRadar rules dataset (simple event rule, reference set rule, unsupported sequence rule) and corresponding evaluation spec following the same structure as the Splunk SPL dataset. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ormatting

- Add dataset re-exports (splunkRules, qradarRules) to datasets/rules/index.ts
- Fix @typescript-eslint/no-shadow lint error in helpers.ts (rename shadowed _ params)
- Apply eslint --fix formatting to evaluate.ts, evaluate_dataset.ts, migration_client.ts, and evaluators

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…d evaluators

- Add empty-array guard before accessing rules[0] in migration client
- Move TranslationResult evaluator from QRadar-only to shared (applies to both vendors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…pecs Add { tag: tags.stateful.classic } to both rule migration specs to match the dashboard spec pattern. Also add empty-dataset guards and progress logging consistent with the dashboard spec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Rename and refactor rule dataset summary functions for clarity and consistency.
- Update dashboard metadata to enable markdown panels.
- Adjust evaluation logic to handle new dataset structure.

… add queries to metadata The evaluator checks for unresolved placeholders, not actual syntax parsing. Rename to reflect its true purpose and include generated ES|QL queries in the evaluator result metadata for debugging visibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
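
A rough sketch of what a completeness check of this kind might look like, with the evaluated queries returned in the metadata for debugging. The placeholder syntax (`[indexPattern]`, `[macro:...]`, `[lookup:...]`), the result shape, and the function name are all assumptions made for illustration, not the PR's implementation:

```typescript
interface CompletenessResult {
  score: number;
  metadata: { queries: string[]; incomplete: string[] };
}

// Assumption: unresolved placeholders look like [indexPattern], [macro:name] or [lookup:name].
const PLACEHOLDER_RE = /\[(?:indexPattern|macro:[^\]]+|lookup:[^\]]+)\]/;
// A query counts as having a source only if it starts with FROM followed by something.
const FROM_RE = /^\s*FROM\s+\S+/i;

function evaluateQueryCompleteness(queries: string[]): CompletenessResult {
  const incomplete = queries.filter((q) => !FROM_RE.test(q) || PLACEHOLDER_RE.test(q));
  const score = queries.length === 0 ? 0 : (queries.length - incomplete.length) / queries.length;
  // Queries are echoed back so a failing score can be traced to specific panels.
  return { score, metadata: { queries, incomplete } };
}
```

This only detects missing pieces; it says nothing about whether the query would parse, which is why real syntax validation remains a separate follow-up.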

- Changed CODEOWNERS entry for `kbn-evals-suite-security-automatic-migrations` to assign ownership to `@elastic/security-threat-hunting`.
- Refactored `index_pattern_validity.ts` to improve handling of actual index patterns, allowing for multiple index patterns per panel title.
- Adjusted regex in `helpers.ts` for better query matching.

- Modified the regex in `helpers.ts` to allow for optional backticks around index patterns in the FROM clause of queries, enhancing the accuracy of index pattern extraction from panels.
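
As an illustration of the optional-backtick idea, a pattern along these lines would accept both quoted and unquoted index patterns. The regex and the function name `extractIndexPattern` are hypothetical and are not the code in `helpers.ts`; this sketch also only captures the first pattern after FROM.

```typescript
// Hypothetical pattern: FROM, then one index pattern optionally wrapped in backticks.
const FROM_INDEX_RE = /\bFROM\s+`?([^\s`|,]+)`?/i;

// Returns the first index pattern after FROM, or undefined if there is no FROM clause.
function extractIndexPattern(query: string): string | undefined {
  const match = FROM_INDEX_RE.exec(query);
  return match ? match[1] : undefined;
}
```

Excluding the backtick from the capture group is what keeps the extracted value comparable to the expected index pattern in the dataset.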

…andalone lookup splHasLookups returned false for SPL queries containing both a standalone `lookup` and an `inputlookup`/`outputlookup` (e.g. `"lookup users | inputlookup extra.csv"`). The global exclusion regex short-circuited the first branch, and the fallback only matched piped lookups. Now iterates matches of `(?<![a-zA-Z])lookup\s+\w+` and filters out `input`/`output`-prefixed ones. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
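
The fix described above can be sketched as follows. The regex is the one quoted in the commit message; the function body is a reconstruction for illustration and may differ from the actual `splHasLookups` in the PR.

```typescript
// Regex from the commit message: "lookup <name>" not directly preceded by a letter.
const LOOKUP_RE = /(?<![a-zA-Z])lookup\s+\w+/g;

function splHasLookups(spl: string): boolean {
  for (const match of spl.matchAll(LOOKUP_RE)) {
    const start = match.index ?? 0;
    const prefix = spl.slice(Math.max(0, start - 6), start).toLowerCase();
    // Defensive check mirroring the commit message; the lookbehind above
    // already rejects matches glued to "input" or "output".
    if (!prefix.endsWith('input') && !prefix.endsWith('output')) {
      return true;
    }
  }
  return false;
}
```

Evaluating each match on its own is what avoids the earlier short-circuit, where the presence of an `inputlookup` anywhere in the query hid a genuine standalone `lookup`.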

…gainst missing eai:data The previous guard only checked `result` truthiness. If `result` existed but lacked `eai:data`, `sourceSpl.slice(0, 3000)` threw a TypeError outside the LLM try/catch. Now narrows the guard to the actual field used. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
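
A minimal sketch of the narrowed guard, assuming the resource result is a loosely typed object. The interface and function names are hypothetical; only the `eai:data` field and the 3000-character slice come from the commit message.

```typescript
// Hypothetical shape: a loosely typed result that may or may not carry eai:data.
interface SplunkResourceResult {
  [key: string]: unknown;
  'eai:data'?: string;
}

function getSourceSplExcerpt(
  result: SplunkResourceResult | undefined,
  maxLength = 3000
): string | undefined {
  const sourceSpl = result?.['eai:data'];
  // Guard on the field actually used, not just on the parent object.
  if (typeof sourceSpl !== 'string') {
    return undefined;
  }
  return sourceSpl.slice(0, maxLength);
}
```

Checking the field itself means a result object without `eai:data` takes the same early-exit path as a missing result, instead of throwing before the LLM try/catch is reached.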

…o threat-hunting Per review feedback, moves the kbn-evals-suite-security-automatic-migrations package owner from security-generative-ai to security-threat-hunting. CODEOWNERS regenerated from kibana.jsonc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kibanamachine · 2026-04-29T11:58:17Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 6a5b94e

Failed CI Steps

FTR Configs #108

Metrics [docs]

✅ unchanged

History

💛 Build #435432 was flaky 7da720a
💛 Build #434930 was flaky 14ac71b

cc @enriquesanchez-elastic

SrdjanLL

@kbn/evals changes and suite setup LGTM!

Please note that the eval suite won't run in the weekly automated run against the golden cluster, unless it's added here - that's okay for early stage suites so we don't bump the token usage, but if/when you think it's ready, you're welcome to add it.

enriquesanchez-elastic self-assigned this Apr 7, 2026

enriquesanchez-elastic requested review from a team as code owners April 7, 2026 13:45

macroscopeapp Bot reviewed Apr 7, 2026

Comment thread ...-evals-suite-security-automatic-migrations/src/dashboards/evaluators/translation_fidelity.ts Outdated

Comment thread ...vals-suite-security-automatic-migrations/src/dashboards/evaluators/index_pattern_validity.ts Outdated

macroscopeapp Bot reviewed Apr 7, 2026

Comment thread ...ges/kbn-evals-suite-security-automatic-migrations/datasets/dashboards/standard_dashboards.ts Outdated

logeekal reviewed Apr 8, 2026

Comment thread .github/CODEOWNERS Outdated

logeekal reviewed Apr 13, 2026

Comment thread ...n-evals-suite-security-automatic-migrations/src/rules/evaluators/lookup_join_preservation.ts

Comment thread ...vals-suite-security-automatic-migrations/src/rules/evaluators/nl_description_faithfulness.ts

enriquesanchez-elastic force-pushed the scaffold-evals-suite-automatic-migrations branch from e2c4eee to 0dd61b9 Compare April 14, 2026 13:35

macroscopeapp Bot reviewed Apr 14, 2026

Comment thread ...ns/security/packages/kbn-evals-suite-security-automatic-migrations/src/dashboards/helpers.ts Outdated

logeekal approved these changes Apr 15, 2026

macroscopeapp Bot reviewed Apr 16, 2026

Comment thread ...ns/security/packages/kbn-evals-suite-security-automatic-migrations/src/dashboards/helpers.ts Outdated

macroscopeapp Bot reviewed Apr 20, 2026

Comment thread ...ns/security/packages/kbn-evals-suite-security-automatic-migrations/src/dashboards/helpers.ts

Comment thread ...-evals-suite-security-automatic-migrations/src/dashboards/evaluators/translation_fidelity.ts Outdated

enriquesanchez-elastic enabled auto-merge (squash) April 21, 2026 14:32

enriquesanchez-elastic and others added 10 commits April 28, 2026 15:08

Changes from node scripts/lint_ts_projects --fix

b6bd305

Changes from node scripts/generate codeowners

8c3c965

Changes from node scripts/regenerate_moon_projects.js --update

601a726

Changes from node scripts/eslint_all_files --no-cache --fix

d7fd18b

Add shared helper utilities for automatic migrations eval suite

b935bd1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

enriquesanchez-elastic and others added 17 commits April 28, 2026 15:08

refactor(evals): update rule dataset handling and dashboard metadata

fb9717e

Changes from node scripts/generate codeowners

eb41782

Update regex in extractIndexPatternFromPanel for improved query matching

2c83338

Changes from node scripts/generate codeowners

602bdff

Changes from node scripts/regenerate_moon_projects.js --update

14ac71b

enriquesanchez-elastic force-pushed the scaffold-evals-suite-automatic-migrations branch from 9e770e8 to 14ac71b Compare April 28, 2026 13:17

enriquesanchez-elastic added 2 commits April 29, 2026 10:33

Merge branch 'main' into scaffold-evals-suite-automatic-migrations

7da720a

Merge branch 'main' into scaffold-evals-suite-automatic-migrations

6a5b94e

SrdjanLL approved these changes Apr 29, 2026

enriquesanchez-elastic merged commit 73c2578 into main Apr 29, 2026
26 checks passed

enriquesanchez-elastic deleted the scaffold-evals-suite-automatic-migrations branch April 29, 2026 12:14

kibanamachine added the v9.5.0 label Apr 29, 2026

This was referenced Apr 29, 2026

[Entity Store] Implement logs pagination in CCS #266307

Merged

Update dependency msw to v2.13.4 (main) #266770

Merged

Update dependency lodash to v4.18.1 (main) #263633

Merged

This was referenced May 7, 2026

Update dependency msw to v2.13.5 (main) #268091

Merged

chore(deps): bump @redocly/cli to 2.30.4 and postcss to 8.5.14 #268223

Merged

6 participants