[EA] Add attachment side-effect evals for Entity Store V2#265465

Merged
enriquesanchez-elastic merged 8 commits into main from ea/attachment-evals on May 4, 2026
@enriquesanchez-elastic
Contributor

@enriquesanchez-elastic enriquesanchez-elastic commented Apr 24, 2026

Summary

Companion eval coverage for #264985 (merged). Adds coverage that the security.entity conversation attachment is persisted as a side effect of security.get_entity (single card) and security.search_entities (table), and that no attachment is persisted when an entity cannot be resolved.

Eval harness changes

  • chat_client: fetches /api/agent_builder/conversations/{id}/attachments after each converse and surfaces them on the task output.
  • evaluate_dataset: adds AttachmentAssertion schema (type/shape/entityId/entityType/minEntities/count/criteria) and an Attachments evaluator alongside Criteria and ToolCalls. Deterministic match for type/shape/identifier/count; LLM judge for free-form criteria over the matched payload.
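
A minimal sketch of how the deterministic part of such an evaluator could work, assuming field semantics inferred from the names listed above. The `Attachment` payload shape and the `matchAttachments` and `scoreAssertion` helpers are hypothetical and not taken from the actual harness. The LLM-judged `criteria` field is left out.

```typescript
// Hypothetical shapes: only the assertion field names come from the PR description.
type Shape = 'single' | 'table';

interface Attachment {
  type: string; // e.g. 'security.entity'
  shape: Shape;
  entities: Array<{ entityId: string; entityType: string }>;
}

interface AttachmentAssertion {
  type?: string;
  shape?: Shape;
  entityId?: string;
  entityType?: string;
  minEntities?: number;
  count?: { min?: number; exact?: number };
}

// Returns the attachments that satisfy every deterministic field of the assertion.
function matchAttachments(attachments: Attachment[], a: AttachmentAssertion): Attachment[] {
  return attachments.filter((att) => {
    if (a.type !== undefined && att.type !== a.type) return false;
    if (a.shape !== undefined && att.shape !== a.shape) return false;
    if (a.minEntities !== undefined && att.entities.length < a.minEntities) return false;
    if (a.entityId !== undefined && !att.entities.some((e) => e.entityId === a.entityId)) {
      return false;
    }
    if (a.entityType !== undefined && !att.entities.some((e) => e.entityType === a.entityType)) {
      return false;
    }
    return true;
  });
}

// Scores 1 when the number of matches satisfies `count`, otherwise 0.
// Without a `count`, at least one match is required (an assumption of this sketch).
function scoreAssertion(attachments: Attachment[], a: AttachmentAssertion): 0 | 1 {
  const matched = matchAttachments(attachments, a).length;
  if (a.count?.exact !== undefined) return matched === a.count.exact ? 1 : 0;
  return matched >= (a.count?.min ?? 1) ? 1 : 0;
}
```

Under these assumptions, a negative case is expressed as `count: { exact: 0 }`, which scores 1 only when nothing matching was persisted.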

New spec

evals/v2/entity_attachment_side_effect.spec.ts:

  • Bulk-indexes two seeded user entities (attach-alice, attach-bob) directly into the V2 latest alias (fast path — follows highlights_v2.ts), so the attachment codepath activates without running the full extractor + maintainer pipeline (beforeAll runs in ~5s vs several minutes).
  • Asserts single-card (count.min: 1, shape: single), table (shape: table, minEntities: 2), and negative (count.exact: 0) cases.
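
The fast-path seeding can be pictured as building a single Elasticsearch `_bulk` request body. The sketch below assumes a hypothetical document shape and a placeholder alias name; the real V2 latest alias and entity mapping are not shown in this PR, and `buildBulkBody` is an illustrative helper rather than code from the spec.

```typescript
// Hypothetical document shape; the real V2 entity mapping is not shown in this PR.
interface SeedEntity {
  id: string;
  type: 'user' | 'host';
  name: string;
}

// Builds an Elasticsearch _bulk request body (NDJSON): one action line followed by
// one source line per document, ending with the trailing newline the bulk API requires.
function buildBulkBody(alias: string, entities: SeedEntity[]): string {
  const lines: string[] = [];
  for (const e of entities) {
    lines.push(JSON.stringify({ index: { _index: alias, _id: `${e.type}:${e.id}` } }));
    lines.push(JSON.stringify({ entity: { id: e.id, type: e.type, name: e.name } }));
  }
  return lines.join('\n') + '\n';
}

// The alias name below is a placeholder, not the real V2 latest alias.
const seedBody = buildBulkBody('hypothetical-entities-latest-alias', [
  { id: 'attach-alice', type: 'user', name: 'attach-alice' },
  { id: 'attach-bob', type: 'user', name: 'attach-bob' },
]);
```

Indexing both entities in one request is what keeps setup short compared with running the extractor and maintainer pipeline.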

Config

  • Enables entityAttachmentRichRenderer in the evals_entity_analytics_v2 Scout configSet so tool-side attachment creation is active.
  • Adds @kbn/entity-store to tsconfig.json kbn_references.
  • README coverage matrix and assertion docs updated.

Test plan

  • Start Scout server: node scripts/scout start-server --arch stateful --domain classic --serverConfigSet evals_entity_analytics_v2
  • Run the new spec: node scripts/playwright test --config x-pack/solutions/security/packages/kbn-evals-suite-entity-analytics/playwright.v2.config.ts x-pack/solutions/security/packages/kbn-evals-suite-entity-analytics/evals/v2/entity_attachment_side_effect.spec.ts --project="<connector>"
  • Attachments column reports mean: 1, std: 0 across all 3 examples (single, table, negative).
  • Confirm existing v2 specs (entity_store_v2_{get_entity,search_entities,multi_skill}.spec.ts) still pass — they have no attachments assertions, so the new evaluator must auto-pass with score: 1.
  • Verify afterAll teardown cleans up entity engines.
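
To make the `mean: 1, std: 0` expectation and the auto-pass rule concrete, here is a small sketch. The `ExampleResult` shape and both helpers are hypothetical, and the population standard deviation is an assumption (the harness may use a different estimator; with all scores equal to 1 the result is 0 either way).

```typescript
// Hypothetical: an example either carries attachment assertions or it does not.
interface ExampleResult {
  hasAttachmentAssertions: boolean;
  assertionsPassed: boolean; // ignored when there are no assertions
}

// Examples without assertions auto-pass with score 1, as the test plan expects.
function attachmentScore(r: ExampleResult): number {
  if (!r.hasAttachmentAssertions) return 1;
  return r.assertionsPassed ? 1 : 0;
}

// Mean and population standard deviation of the per-example scores.
function summarize(scores: number[]): { mean: number; std: number } {
  if (scores.length === 0) throw new Error('no scores to summarize');
  const mean = scores.reduce((sum, x) => sum + x, 0) / scores.length;
  const variance =
    scores.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / scores.length;
  return { mean, std: Math.sqrt(variance) };
}
```

Any single failing example would pull the mean below 1 and make the standard deviation non-zero, which is why the column is a quick check on all three cases.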

🤖 Generated with Claude Code

Comment on lines +107 to +121
        {hasGroup || isLoading ? (
          <ResolutionGroupTable
            group={group ?? null}
            isLoading={isLoading}
            isError={isError}
            targetEntityId={targetEntityId}
            currentEntityId={currentEntityStoreEntityId}
            showActions={false}
          />
        ) : (
          <EuiText size="xs" color="subdued">
            {EMPTY_LABEL}
          </EuiText>
        )}
      </EuiAccordion>
Contributor

🟡 Medium entity_card/resolution_mini.tsx:107

When useResolutionGroup returns an error (isError=true), the component renders EMPTY_LABEL ("No resolution group yet.") instead of the error message. At line 70, the early return only covers !isError, so error states fall through. Then at line 107, the condition hasGroup || isLoading evaluates to false when there's an error without data, routing to the EMPTY_LABEL branch. The ResolutionGroupTable already handles errors and would display RESOLUTION_FETCH_ERROR, but it's never reached. Consider changing line 107 to hasGroup || isLoading || isError so errors route to the table.

-        {hasGroup || isLoading ? (
+        {hasGroup || isLoading || isError ? (
           <ResolutionGroupTable
             group={group ?? null}
             isLoading={isLoading}
             isError={isError}
             targetEntityId={targetEntityId}
             currentEntityId={currentEntityStoreEntityId}
             showActions={false}
           />
         ) : (
           <EuiText size="xs" color="subdued">
             {EMPTY_LABEL}
           </EuiText>
         )}
Evidence trail:
x-pack/solutions/security/plugins/security_solution/public/agent_builder/attachment_types/entity_attachment/entity_card/resolution_mini.tsx (REVIEWED_COMMIT) - line 70 shows early return condition `if (!isLoading && !isError && !hasGroup)`, line 107 shows ternary `hasGroup || isLoading ? <ResolutionGroupTable.../> : <EMPTY_LABEL>`

x-pack/solutions/security/plugins/security_solution/public/entity_analytics/components/entity_resolution/resolution_group_table.tsx (REVIEWED_COMMIT) - lines 206-211 show proper error handling with `if (isError) { return ... RESOLUTION_FETCH_ERROR }` that is unreachable from resolution_mini.tsx during error states
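
The branching described in this review comment can be reduced to a pure function for illustration. This is a sketch only: `GroupState`, `branchBefore`, and `branchAfter` are hypothetical names, and the real component renders JSX rather than returning a label.

```typescript
type Branch = 'table' | 'empty';

interface GroupState {
  hasGroup: boolean;
  isLoading: boolean;
  isError: boolean;
}

// Condition as reviewed: an error without data falls through to the empty label.
function branchBefore({ hasGroup, isLoading }: GroupState): Branch {
  return hasGroup || isLoading ? 'table' : 'empty';
}

// Suggested condition: errors are routed to the table, which renders the fetch error.
function branchAfter({ hasGroup, isLoading, isError }: GroupState): Branch {
  return hasGroup || isLoading || isError ? 'table' : 'empty';
}

const errorWithoutData: GroupState = { hasGroup: false, isLoading: false, isError: true };
```

With `errorWithoutData`, the reviewed condition selects the empty-label branch while the suggested one selects the table branch; all other combinations of the three flags behave identically under both conditions.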

enriquesanchez-elastic and others added 3 commits April 29, 2026 13:17
Extends the entity-analytics evals suite with assertions that the
security.entity conversation attachment is persisted as a side effect of
security.get_entity (single card) and security.search_entities (table),
and that no attachment is persisted when an entity cannot be resolved.

Changes:
- evals_suite chat_client: fetch /api/agent_builder/conversations/{id}/attachments after each converse call; surface on the chat task output.
- evals_suite evaluate_dataset: new AttachmentAssertion schema (type/shape/entityId/entityType/minEntities/count/criteria) + Attachments evaluator alongside Criteria/ToolCalls.
- New spec evals/v2/entity_attachment_side_effect.spec.ts: bulk-indexes two user entities into the V2 latest alias (following highlights_v2.ts pattern) so the attachment codepath activates without running the extractor + maintainer pipeline; asserts single-card, table, and negative cases.
- Enables entityAttachmentRichRenderer in the evals_entity_analytics_v2 Scout configSet so the tool-side attachment creation is active.
- Adds @kbn/entity-store to tsconfig.json kbn_references.
- README coverage matrix and assertion docs updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Groundedness `score` collapsed to 0 whenever the LLM judge emitted a
per-claim verdict of `NOT_FOUND`, even when `summary_verdict` was
`GROUNDED`. The scoring map keyed off `NOT_IN_GROUND_TRUTH`, while the
prompt schema and types use `NOT_FOUND` — the lookup missed and
`claimScore` defaulted to 0, propagating through the geometric mean.

Also persist the full `groundednessAnalysis` and `correctnessAnalysis`
on the quantitative evaluators' `metadata`, so per-claim verdicts are
queryable in the `kibana-evaluations` index for forensic triage instead
of requiring a re-run.

Tracking issues:
- elastic/security-team#17044 (scoring key drift)
- elastic/security-team#17045 (metadata forensics gap)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
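
The key drift described in the groundedness commit above can be reproduced in a few lines. Only the `NOT_FOUND` and `NOT_IN_GROUND_TRUTH` names come from the commit message; the `SUPPORTED` verdict, the weights, and the `groundednessScore` helper are hypothetical stand-ins for the real evaluator.

```typescript
// A verdict that is missing from the map resolves to undefined.
type ScoreMap = { [verdict: string]: number | undefined };

// Hypothetical weights. The drifted map is keyed by a name the judge never emits.
const driftedScores: ScoreMap = { SUPPORTED: 1, NOT_IN_GROUND_TRUTH: 0.5 };
const fixedScores: ScoreMap = { SUPPORTED: 1, NOT_FOUND: 0.5 };

// Geometric mean of per-claim scores. A verdict missing from the map defaults to 0,
// and a single 0 drives the product, and therefore the mean, to 0.
function groundednessScore(claimVerdicts: string[], scoreMap: ScoreMap): number {
  if (claimVerdicts.length === 0) return 0;
  const product = claimVerdicts.reduce((p, v) => p * (scoreMap[v] ?? 0), 1);
  return Math.pow(product, 1 / claimVerdicts.length);
}

const claimVerdicts = ['SUPPORTED', 'NOT_FOUND'];
const scoreWithDrift = groundednessScore(claimVerdicts, driftedScores);
const scoreAfterFix = groundednessScore(claimVerdicts, fixedScores);
```

In this sketch the drifted map collapses the score to 0 for any response containing a `NOT_FOUND` claim, regardless of how many other claims are supported, which matches the symptom in the commit message.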
Drop endsWith fallback in entityIdMatches: it matched any suffix
(e.g. expected "smith" matched "goldsmith"). Equality on the full or
stripped {type}: form is sufficient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
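
A sketch of the difference between the dropped suffix fallback and plain equality, assuming the `{type}:` form means a prefix such as `user:` before the identifier. The function bodies and the `stripTypePrefix` helper are hypothetical re-creations; the real `entityIdMatches` signature is not shown in this PR.

```typescript
// Removes a leading "{type}:" prefix, e.g. "user:smith" -> "smith".
function stripTypePrefix(id: string): string {
  const idx = id.indexOf(':');
  return idx === -1 ? id : id.slice(idx + 1);
}

// Behaviour described as removed: any suffix match counted as a hit.
function entityIdMatchesLoose(actual: string, expected: string): boolean {
  return actual === expected || actual.endsWith(expected);
}

// Behaviour described as kept: equality on the full id or on the stripped form.
function entityIdMatches(actual: string, expected: string): boolean {
  return actual === expected || stripTypePrefix(actual) === expected;
}
```

Here `entityIdMatchesLoose('user:goldsmith', 'smith')` is a false positive, while `entityIdMatches` rejects it and still accepts both `user:smith` and `smith` as the expected value for `user:smith`.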
@enriquesanchez-elastic enriquesanchez-elastic added the release_note:skip (Skip the PR/issue when compiling release notes), backport:skip (This PR does not require backporting), Team:Entity Analytics (Security Entity Analytics Team), 9.4 candidate, v9.4.0, and evals:entity-analytics-v2 (Run the entity-analytics-v2 @kbn/evals) labels and removed the 9.4 candidate label on Apr 29, 2026
@enriquesanchez-elastic enriquesanchez-elastic marked this pull request as ready for review April 29, 2026 11:29
@enriquesanchez-elastic enriquesanchez-elastic requested review from a team as code owners April 29, 2026 11:29
@infra-vault-gh-plugin-prod

Pinging @elastic/security-entity-analytics (Team:Entity Analytics)

@ymao1 ymao1 added the models:weekly-eis-models Run evals against the weekly EIS model set (see eval_pipeline.ts) label Apr 29, 2026
@ymao1
Contributor

ymao1 commented Apr 29, 2026

/ci

`--uiSettings.overrides.agentBuilder:experimentalFeatures=true`,
`--uiSettings.overrides.securitySolution:entityStoreEnableV2=true`,
`--xpack.securitySolution.enableExperimental=["entityAnalyticsEntityStoreV2"]`,
`--xpack.securitySolution.enableExperimental=["entityAnalyticsEntityStoreV2","entityAttachmentRichRenderer"]`,
Contributor

This entityAttachmentRichRenderer flag is no longer needed. It was removed in the original PR.

Contributor Author

Good catch — removed in ccf33af.

Contributor

@ymao1 ymao1 left a comment

LGTM

Flag was removed in original PR; drop from evals stateful config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@SrdjanLL SrdjanLL left a comment

LGTM!

And thanks for the fix on the groundedness evaluator 🙌🏻

@enriquesanchez-elastic enriquesanchez-elastic enabled auto-merge (squash) May 4, 2026 08:02
@kibanamachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

✅ unchanged

History

cc @enriquesanchez-elastic

@enriquesanchez-elastic enriquesanchez-elastic merged commit 9ad2153 into main May 4, 2026
25 checks passed
@enriquesanchez-elastic enriquesanchez-elastic deleted the ea/attachment-evals branch May 4, 2026 09:01
seanrathier pushed a commit to seanrathier/kibana that referenced this pull request May 4, 2026
…5465)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Labels

backport:skip (This PR does not require backporting), evals:entity-analytics-v2 (Run the entity-analytics-v2 @kbn/evals), models:weekly-eis-models (Run evals against the weekly EIS model set, see eval_pipeline.ts), release_note:skip (Skip the PR/issue when compiling release notes), Team:Entity Analytics (Security Entity Analytics Team), v9.4.0, v9.5.0

4 participants