[EA] Add attachment side-effect evals for Entity Store V2 by enriquesanchez-elastic · Pull Request #265465 · elastic/kibana

enriquesanchez-elastic · 2026-04-24T07:57:01Z

Summary

Companion eval coverage for #264985 (merged). Adds coverage that the security.entity conversation attachment is persisted as a side effect of security.get_entity (single card) and security.search_entities (table), and that no attachment is persisted when an entity cannot be resolved.

Eval harness changes

chat_client: fetches /api/agent_builder/conversations/{id}/attachments after each converse and surfaces them on the task output.
evaluate_dataset: adds AttachmentAssertion schema (type/shape/entityId/entityType/minEntities/count/criteria) and an Attachments evaluator alongside Criteria and ToolCalls. Deterministic match for type/shape/identifier/count; LLM judge for free-form criteria over the matched payload.

New spec

evals/v2/entity_attachment_side_effect.spec.ts:

Bulk-indexes two seeded user entities (attach-alice, attach-bob) directly into the V2 latest alias (fast path — follows highlights_v2.ts), so the attachment codepath activates without running the full extractor + maintainer pipeline (beforeAll runs in ~5s vs several minutes).
Asserts single-card (count.min: 1, shape: single), table (shape: table, minEntities: 2), and negative (count.exact: 0) cases.
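
The deterministic part of the Attachments evaluator described under "Eval harness changes" might look roughly like the sketch below. Only the assertion field names (type, shape, entityId, entityType, minEntities, count) come from this description. The `Attachment` payload layout, the rule that derives the shape from the entity count, and the function names are assumptions made for illustration, and the LLM-judged `criteria` field is left out.

```typescript
// Hypothetical shapes: field names follow the PR description, but the real
// schema and the persisted attachment payload may differ.
type Shape = 'single' | 'table';

interface AttachmentAssertion {
  type?: string;
  shape?: Shape;
  entityId?: string;
  entityType?: string;
  minEntities?: number;
  count?: { min?: number; exact?: number };
}

// Assumed payload layout for a persisted attachment.
interface Attachment {
  type: string;
  entities: Array<{ id: string; type: string }>;
}

// Deterministic match on type, shape, entity count and identifier.
function matchesAssertion(a: Attachment, want: AttachmentAssertion): boolean {
  // Assumption: more than one entity means the attachment renders as a table.
  const shape: Shape = a.entities.length > 1 ? 'table' : 'single';
  if (want.type !== undefined && a.type !== want.type) return false;
  if (want.shape !== undefined && shape !== want.shape) return false;
  if (want.minEntities !== undefined && a.entities.length < want.minEntities) return false;
  if (want.entityId !== undefined || want.entityType !== undefined) {
    return a.entities.some(
      (e) =>
        (want.entityId === undefined || e.id === want.entityId) &&
        (want.entityType === undefined || e.type === want.entityType)
    );
  }
  return true;
}

// Returns 1 when the assertion holds, or when there is no assertion at all.
function scoreAttachments(attachments: Attachment[], want?: AttachmentAssertion): number {
  if (want === undefined) return 1; // specs without attachment assertions auto-pass
  const matched = attachments.filter((a) => matchesAssertion(a, want)).length;
  if (want.count?.exact !== undefined) return matched === want.count.exact ? 1 : 0;
  return matched >= (want.count?.min ?? 1) ? 1 : 0;
}
```

Under these assumptions, the negative case is expressed as `count: { exact: 0 }` against an empty attachment list, and a spec that passes no assertion scores 1, which matches the auto-pass expectation in the test plan.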

Config

Enables entityAttachmentRichRenderer in the evals_entity_analytics_v2 Scout configSet so tool-side attachment creation is active.
Adds @kbn/entity-store to tsconfig.json kbn_references.
README coverage matrix and assertion docs updated.

Test plan

Start Scout server: node scripts/scout start-server --arch stateful --domain classic --serverConfigSet evals_entity_analytics_v2
Run the new spec: node scripts/playwright test --config x-pack/solutions/security/packages/kbn-evals-suite-entity-analytics/playwright.v2.config.ts x-pack/solutions/security/packages/kbn-evals-suite-entity-analytics/evals/v2/entity_attachment_side_effect.spec.ts --project="<connector>"
Attachments column reports mean: 1, std: 0 across all 3 examples (single, table, negative).
Confirm existing v2 specs (entity_store_v2_{get_entity,search_entities,multi_skill}.spec.ts) still pass — they have no attachments assertions, so the new evaluator must auto-pass with score: 1.
Verify afterAll teardown cleans up entity engines.

🤖 Generated with Claude Code

macroscopeapp · 2026-04-24T08:15:41Z

+        {hasGroup || isLoading ? (
+          <ResolutionGroupTable
+            group={group ?? null}
+            isLoading={isLoading}
+            isError={isError}
+            targetEntityId={targetEntityId}
+            currentEntityId={currentEntityStoreEntityId}
+            showActions={false}
+          />
+        ) : (
+          <EuiText size="xs" color="subdued">
+            {EMPTY_LABEL}
+          </EuiText>
+        )}
+      </EuiAccordion>


🟡 Medium entity_card/resolution_mini.tsx:107

When useResolutionGroup returns an error (isError=true), the component renders EMPTY_LABEL ("No resolution group yet.") instead of the error message. At line 70, the early return only covers !isError, so error states fall through. Then at line 107, the condition hasGroup || isLoading evaluates to false when there's an error without data, routing to the EMPTY_LABEL branch. The ResolutionGroupTable already handles errors and would display RESOLUTION_FETCH_ERROR, but it's never reached. Consider changing line 107 to hasGroup || isLoading || isError so errors route to the table.

-        {hasGroup || isLoading ? (
+        {hasGroup || isLoading || isError ? (
           <ResolutionGroupTable
             group={group ?? null}
             isLoading={isLoading}
             isError={isError}
             targetEntityId={targetEntityId}
             currentEntityId={currentEntityStoreEntityId}
             showActions={false}
           />
         ) : (
           <EuiText size="xs" color="subdued">
             {EMPTY_LABEL}
           </EuiText>
         )}
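
To make the branch selection in this comment easier to check, here is a minimal sketch that reduces the ternary to a pure function. The `ResolutionState` type and both function names are hypothetical; only the two conditions are taken from the review comment.

```typescript
// Hypothetical reduction of the render condition to a pure function.
interface ResolutionState {
  hasGroup: boolean;
  isLoading: boolean;
  isError: boolean;
}

type Branch = 'table' | 'empty';

// Condition as reviewed: an error without data falls through to the empty label.
const branchBefore = (s: ResolutionState): Branch =>
  s.hasGroup || s.isLoading ? 'table' : 'empty';

// Suggested condition: errors are routed to the table, which renders the error.
const branchAfter = (s: ResolutionState): Branch =>
  s.hasGroup || s.isLoading || s.isError ? 'table' : 'empty';

const errorWithoutData: ResolutionState = { hasGroup: false, isLoading: false, isError: true };
console.log(branchBefore(errorWithoutData), branchAfter(errorWithoutData));
```

The two functions differ only for the state where `isError` is true and there is neither a group nor a pending load.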

Evidence trail (REVIEWED_COMMIT):

x-pack/solutions/security/plugins/security_solution/public/agent_builder/attachment_types/entity_attachment/entity_card/resolution_mini.tsx: line 70 shows the early return condition `if (!isLoading && !isError && !hasGroup)`; line 107 shows the ternary `hasGroup || isLoading ? <ResolutionGroupTable.../> : <EMPTY_LABEL>`.
x-pack/solutions/security/plugins/security_solution/public/entity_analytics/components/entity_resolution/resolution_group_table.tsx: lines 206-211 show proper error handling with `if (isError) { return ... RESOLUTION_FETCH_ERROR }`, which is unreachable from resolution_mini.tsx during error states.

Extends the entity-analytics evals suite with assertions that the security.entity conversation attachment is persisted as a side effect of security.get_entity (single card) and security.search_entities (table), and that no attachment is persisted when an entity cannot be resolved.

Changes:
- evals_suite chat_client: fetch /api/agent_builder/conversations/{id}/attachments after each converse call; surface on the chat task output.
- evals_suite evaluate_dataset: new AttachmentAssertion schema (type/shape/entityId/entityType/minEntities/count/criteria) + Attachments evaluator alongside Criteria/ToolCalls.
- New spec evals/v2/entity_attachment_side_effect.spec.ts: bulk-indexes two user entities into the V2 latest alias (following highlights_v2.ts pattern) so the attachment codepath activates without running the extractor + maintainer pipeline; asserts single-card, table, and negative cases.
- Enables entityAttachmentRichRenderer in the evals_entity_analytics_v2 Scout configSet so the tool-side attachment creation is active.
- Adds @kbn/entity-store to tsconfig.json kbn_references.
- README coverage matrix and assertion docs updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Groundedness `score` collapsed to 0 whenever the LLM judge emitted a per-claim verdict of `NOT_FOUND`, even when `summary_verdict` was `GROUNDED`. The scoring map keyed off `NOT_IN_GROUND_TRUTH`, while the prompt schema and types use `NOT_FOUND`, so the lookup missed and `claimScore` defaulted to 0, propagating through the geometric mean.

Also persist the full `groundednessAnalysis` and `correctnessAnalysis` on the quantitative evaluators' `metadata`, so per-claim verdicts are queryable in the `kibana-evaluations` index for forensic triage instead of requiring a re-run.

Tracking issues:
- elastic/security-team#17044 (scoring key drift)
- elastic/security-team#17045 (metadata forensics gap)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
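
A minimal sketch of the key-drift failure described in this commit message, assuming a per-claim weight map and a plain geometric mean. The weight values, the `SUPPORTED` and `CONTRADICTED` verdict names, and the function name are hypothetical; only `NOT_FOUND`, `NOT_IN_GROUND_TRUTH`, the default of 0, and the geometric mean are taken from the message.

```typescript
// Hypothetical weights; only the NOT_FOUND / NOT_IN_GROUND_TRUTH keys come from the commit message.
const DRIFTED_WEIGHTS: Record<string, number> = {
  SUPPORTED: 1,
  NOT_IN_GROUND_TRUTH: 0.5, // key the judge never emits
  CONTRADICTED: 0,
};

const FIXED_WEIGHTS: Record<string, number> = {
  SUPPORTED: 1,
  NOT_FOUND: 0.5, // key aligned with the prompt schema
  CONTRADICTED: 0,
};

// Geometric mean of per-claim scores; an unknown verdict defaults to 0.
function groundednessScore(verdicts: string[], weights: Record<string, number>): number {
  if (verdicts.length === 0) return 1; // assumption: nothing to penalise
  const product = verdicts.reduce((acc, v) => acc * (weights[v] ?? 0), 1);
  return Math.pow(product, 1 / verdicts.length);
}

const verdicts = ['SUPPORTED', 'NOT_FOUND'];
console.log(groundednessScore(verdicts, DRIFTED_WEIGHTS)); // 0: the missed lookup zeroes the product
console.log(groundednessScore(verdicts, FIXED_WEIGHTS)); // greater than 0 once the key matches
```

Because a geometric mean multiplies the per-claim scores, a single defaulted 0 zeroes the whole score regardless of how many other claims are supported.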

Drop endsWith fallback in entityIdMatches: it matched any suffix (e.g. expected "smith" matched "goldsmith"). Equality on the full or stripped {type}: form is sufficient. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
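
The suffix false-positive can be reproduced with a small sketch. The real signature of `entityIdMatches` is not shown in this PR page, so the parameter order, the `stripTypePrefix` helper, and the assumption that the `{type}:` prefix ends at the first colon are all hypothetical.

```typescript
// Hypothetical helper: drops a leading "{type}:" prefix, assuming the
// prefix ends at the first colon.
function stripTypePrefix(id: string): string {
  const i = id.indexOf(':');
  return i === -1 ? id : id.slice(i + 1);
}

// Behaviour after the fix: equality on the full or the stripped form only.
function entityIdMatches(actual: string, expected: string): boolean {
  return actual === expected || stripTypePrefix(actual) === expected;
}

// Behaviour before the fix: the endsWith fallback accepts any suffix.
function entityIdMatchesWithSuffix(actual: string, expected: string): boolean {
  return entityIdMatches(actual, expected) || actual.endsWith(expected);
}

console.log(entityIdMatchesWithSuffix('user:goldsmith', 'smith')); // true (false positive)
console.log(entityIdMatches('user:goldsmith', 'smith')); // false
console.log(entityIdMatches('user:smith', 'smith')); // true
```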

infra-vault-gh-plugin-prod · 2026-04-29T11:29:38Z

Pinging @elastic/security-entity-analytics (Team:Entity Analytics)

ymao1 · 2026-04-29T19:11:57Z

/ci

ymao1 · 2026-04-29T19:12:27Z

      `--uiSettings.overrides.agentBuilder:experimentalFeatures=true`,
      `--uiSettings.overrides.securitySolution:entityStoreEnableV2=true`,
-      `--xpack.securitySolution.enableExperimental=["entityAnalyticsEntityStoreV2"]`,
+      `--xpack.securitySolution.enableExperimental=["entityAnalyticsEntityStoreV2","entityAttachmentRichRenderer"]`,


this entityAttachmentRichRenderer is no longer needed. it was removed in the original PR

enriquesanchez-elastic · 2026-04-30

Good catch, removed in ccf33af.

ymao1

LGTM

Flag was removed in original PR; drop from evals stateful config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SrdjanLL

LGTM!

And thanks for the fix on the groundedness evaluator 🙌🏻

kibanamachine · 2026-05-04T09:01:17Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 350833d

Failed CI Steps

FTR Configs #217

Metrics [docs]

✅ unchanged

History

💚 Build #436176 succeeded ccf33af
💛 Build #435926 was flaky b849334
💔 Build #435627 failed b849334

cc @enriquesanchez-elastic

macroscopeapp Bot reviewed Apr 24, 2026

enriquesanchez-elastic force-pushed the ea/attachment-evals branch from 359dade to 3f6dd79 Compare April 27, 2026 08:31

macroscopeapp Bot reviewed Apr 27, 2026

Comment thread x-pack/solutions/security/packages/kbn-evals-suite-entity-analytics/src/evaluate_dataset.ts

enriquesanchez-elastic self-assigned this Apr 29, 2026

enriquesanchez-elastic and others added 3 commits April 29, 2026 13:17

[EA] fix entityIdMatches suffix false-positive

f90c9cc

enriquesanchez-elastic force-pushed the ea/attachment-evals branch from 11cb1cb to f90c9cc Compare April 29, 2026 11:22

Merge branch 'main' into ea/attachment-evals

2cfdb32

enriquesanchez-elastic marked this pull request as ready for review April 29, 2026 11:29

enriquesanchez-elastic requested review from a team as code owners April 29, 2026 11:29

enriquesanchez-elastic requested a review from tcalopes April 29, 2026 11:29

infra-vault-gh-plugin-prod Bot deleted a comment from elasticmachine Apr 29, 2026

kibanamachine added 2 commits April 29, 2026 11:46

Changes from node scripts/regenerate_moon_projects.js --update

2dfd6bd

Changes from node scripts/eslint_all_files --no-cache --fix

b849334

ymao1 added the models:weekly-eis-models label (Run evals against the weekly EIS model set; see eval_pipeline.ts) Apr 29, 2026

ymao1 reviewed Apr 29, 2026

ymao1 approved these changes Apr 29, 2026

[EA] remove unused entityAttachmentRichRenderer feature flag

ccf33af

SrdjanLL approved these changes Apr 30, 2026

Merge branch 'main' into ea/attachment-evals

350833d

enriquesanchez-elastic enabled auto-merge (squash) May 4, 2026 08:02

enriquesanchez-elastic merged commit 9ad2153 into main May 4, 2026
25 checks passed

enriquesanchez-elastic deleted the ea/attachment-evals branch May 4, 2026 09:01

kibanamachine added the v9.5.0 label May 4, 2026

enriquesanchez-elastic merged 8 commits into main from ea/attachment-evals

4 participants
