Added AI Insight evals #263561

Merged
yuliia-fryshko merged 30 commits into elastic:main from yuliia-fryshko:ai-insight-evals-505
May 6, 2026

Conversation

@yuliia-fryshko
Contributor

@yuliia-fryshko yuliia-fryshko commented Apr 15, 2026

Closes https://github.com/elastic/obs-ai-team/issues/533
Closes https://github.com/elastic/obs-ai-team/issues/536
Closes https://github.com/elastic/obs-ai-team/issues/534
Closes https://github.com/elastic/obs-ai-team/issues/535

This PR introduces an evaluation dataset along with corresponding tests for AI Insights across different scenarios.

Added:

  1. Error AI Insights eval tests with the productCatalogFailure feature
  2. Alert AI Insights eval tests with the paymentUnreachable scenario
  3. Logs AI Insights eval tests with productCatalog and paymentUnreachable scenarios

These tests aim to improve coverage and ensure consistent evaluation across key AI Insights use cases.

@yuliia-fryshko yuliia-fryshko self-assigned this Apr 15, 2026
@yuliia-fryshko yuliia-fryshko added the release_note:skip Skip the PR/issue when compiling release notes label Apr 15, 2026
@yuliia-fryshko yuliia-fryshko requested a review from a team as a code owner April 15, 2026 16:53
@yuliia-fryshko yuliia-fryshko added backport:version Backport to applied version labels v9.4.0 evals:observability-ai Run the observability-ai evals @kbn/evals models:judge:eis/google-gemini-3.1-pro Override LLM-as-a-judge connector for evals: eis/google-gemini-3.1-pro models:weekly-eis-models Run evals against the weekly EIS model set (see eval_pipeline.ts) labels Apr 15, 2026
@elastic elastic deleted a comment from elasticmachine Apr 21, 2026
@yuliia-fryshko yuliia-fryshko changed the title Added Error AI Insight evals for Product Catalog failure Added AI Insight evals Apr 23, 2026
Contributor

@SrdjanLL SrdjanLL left a comment


Great work!

I just left some minor comments (mainly questions and a suggestion for avoiding bespoke polling implementation).

const deadline = Date.now() + ALERT_POLL_TIMEOUT_MS;

await esClient.indices.refresh({ index: scenario.alertRule.alertsIndex });
while (Date.now() < deadline) {
Contributor


I suggest using pRetry here for polling with exponential backoff, similar to how it's done here.
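As a hedged sketch of what the suggestion amounts to: the `p-retry` package expresses exactly this pattern through its `retries`, `factor`, and `minTimeout` options. The helper below is an illustrative hand-rolled equivalent, not the PR's actual code; the name and defaults are assumptions.

```typescript
// Illustrative polling helper with exponential backoff, standing in for what
// `p-retry` provides via its `retries`, `factor`, and `minTimeout` options.
// The function name and default values are assumptions, not the PR's code.
async function pollWithBackoff<T>(
  fn: () => Promise<T>,
  { retries = 5, minTimeoutMs = 500, factor = 2 } = {}
): Promise<T> {
  let delay = minTimeoutMs;
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // remember the most recent failure
      if (attempt === retries) break; // out of attempts
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay *= factor; // back off exponentially between polls
    }
  }
  throw lastError;
}
```

With `p-retry` this collapses to `pRetry(fn, { retries, factor, minTimeout })`, and the bespoke `Date.now() < deadline` loop goes away.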

* The AI insight endpoints return SSE (Server-Sent Events) streams.
* This parses the raw SSE text into the summary and context fields.
*/
function parseSseResponse(raw: unknown): AiInsightResponse {
Contributor


So I assume this was the root cause of us not having AI Insights responses in the task-under-evaluation payloads?

Contributor Author


Yes, that was exactly it :)
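For readers outside the PR, a minimal sketch of that kind of SSE parsing follows. The `summary`/`context` event names and the exact response shape are assumptions based on the snippet above, not the actual `parseSseResponse` implementation.

```typescript
// Illustrative SSE parser: splits the raw stream into frames (separated by
// blank lines), reads each frame's "event:" and "data:" lines, and routes
// the data into the summary/context fields. Event names are assumptions.
interface AiInsightResponse {
  summary: string;
  context: string;
}

function parseSse(raw: string): AiInsightResponse {
  const result: AiInsightResponse = { summary: '', context: '' };
  for (const frame of raw.split('\n\n')) {
    let event = 'message';
    const dataLines: string[] = [];
    for (const line of frame.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim();
      else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length === 0) continue; // frame carried no payload
    const data = dataLines.join('\n');
    if (event === 'summary') result.summary += data;
    else if (event === 'context') result.context += data;
  }
  return result;
}
```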

- Validate that the error is properly handled and does not impact payment processing for valid tokens.
- If no further errors occur, monitor for recurrence but no urgent action is required. If errors increase, investigate token validation logic and upstream authentication flows.`;

const PAYMENT_UNREACHABLE_ALERT_EXPECTED = `- Summary: An APM error count alert fired for the frontend service because the payment service is unreachable. The checkout flow fails with a gRPC Unavailable error ("name resolver error: produced zero addresses") when attempting to charge a card via the payment service. This is a connectivity or infrastructure failure, not an application code defect.
Contributor


[Question] Just curious if you tweaked the expected responses for all insights based on our preferences/expectations or this is an actual response from the AI Insight API?

Contributor Author


Good question, @SrdjanLL! I took an answer from Claude Opus and tweaked the wording a bit.

@yuliia-fryshko yuliia-fryshko requested review from a team as code owners April 28, 2026 13:49
@macroscopeapp
Contributor

macroscopeapp Bot commented Apr 28, 2026

Catch flakiness early (recommended): run the flaky test runner against this PR before merging.

This PR unskips a previously-flaky Scout test (landing.spec.ts, ref #253824) with new retry timing, and adds a brand-new FTR integration test (search_rules.ts) loaded by both ESS and serverless configs.

Trigger a run with the Flaky Test Runner UI or post this comment on the PR:

/flaky scoutConfig:x-pack/solutions/observability/plugins/observability/test/scout/ui/parallel.playwright.config.ts:30 ftrConfig:x-pack/solutions/security/test/security_solution_api_integration/test_suites/detections_response/rules_management/rule_read/trial_license_complete_tier/configs/ess.config.ts:30 ftrConfig:x-pack/solutions/security/test/security_solution_api_integration/test_suites/detections_response/rules_management/rule_read/trial_license_complete_tier/configs/serverless.config.ts:30

Share feedback in the #appex-qa channel.

Posted via Macroscope — Flaky Test Runner nudge

@yuliia-fryshko yuliia-fryshko removed request for a team, dplumlee and rylnd April 28, 2026 14:00
@elastic elastic deleted a comment from elasticmachine Apr 28, 2026
Comment on lines +56 to +59
await kbnClient.request<void>({
method: 'POST',
path: `/internal/alerting/rule/${ruleId}/_run_soon`,
});
Contributor


_run_soon sits inside the pRetry callback, so it fires on every poll iteration.
IIRC this will queue up rule runs, while we only need the rule to trigger once and the polling should just wait for the alert to appear.

I think it's worth moving this outside of the pRetry block.
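A rough sketch of the suggested shape, with `_run_soon` fired once up front and only the alert lookup inside the retry callback. The client type and the `poll` helper here are stand-ins, not Kibana's real interfaces.

```typescript
// Sketch: trigger the rule a single time, then let the retry loop only wait
// for the alert to appear. `Client`, `findAlert`, and `poll` are illustrative
// stand-ins for the PR's kbnClient and pRetry-based polling.
type Client = { request: (opts: { method: string; path: string }) => Promise<void> };

async function triggerOnceThenPoll(
  kbnClient: Client,
  ruleId: string,
  findAlert: () => Promise<string | undefined>,
  poll: <T>(fn: () => Promise<T>) => Promise<T>
): Promise<string> {
  // Fire the rule exactly once, outside the retry callback...
  await kbnClient.request({
    method: 'POST',
    path: `/internal/alerting/rule/${ruleId}/_run_soon`,
  });
  // ...then poll only for the alert document to show up.
  return poll(async () => {
    const alertId = await findAlert();
    if (!alertId) throw new Error('Alert not yet available');
    return alertId;
  });
}
```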

Contributor


Also, I noticed the CI is failing with:


Error: Alert not yet available

  76 |           const alertDoc = alertsResponse.hits.hits[0];
  77 |           if (!alertDoc) {
> 78 |             throw new Error('Alert not yet available');
     |                   ^
  79 |           }
  80 |           return alertDoc._id as string;
  81 |         },
Do you think that's just a polling error? When you run snapshot replay manually (using CLI), are you able to see the alert?

Contributor Author


Thanks, @SrdjanLL, for the review and comments. I'm looking into why this can happen; locally it worked fine.

@elastic elastic deleted a comment from kibanamachine Apr 30, 2026
@elastic elastic deleted a comment from kibanamachine May 4, 2026
@github-actions
Contributor

github-actions Bot commented May 5, 2026

@yuliia-fryshko, it looks like you're updating the parameters for a rule type!

Please review the guidelines for making additive changes to rule type parameters and determine if your changes require an intermediate release.

@kibanamachine
Contributor

kibanamachine commented May 6, 2026

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

✅ unchanged

History

cc @yuliia-fryshko

Contributor

@SrdjanLL SrdjanLL left a comment


The new scenarios LGTM (as long as they are 🟢 on CI 🙂)!

For visibility, c9dc50d removes a failing scenario whose alert wasn't triggering on CI, so that some of the work lands before @yuliia-fryshko's PTO. The removed scenario is tracked separately via https://github.com/elastic/obs-ai-team/issues/537, and I've added it to the current iteration.

@yuliia-fryshko yuliia-fryshko added backport:skip This PR does not require backporting v9.5.0 and removed backport:version Backport to applied version labels v9.4.0 labels May 6, 2026
@yuliia-fryshko yuliia-fryshko merged commit 32dff45 into elastic:main May 6, 2026
65 checks passed