[EA] Add attachment side-effect evals for Entity Store V2 #265465
enriquesanchez-elastic merged 8 commits into main.
```tsx
  {hasGroup || isLoading ? (
    <ResolutionGroupTable
      group={group ?? null}
      isLoading={isLoading}
      isError={isError}
      targetEntityId={targetEntityId}
      currentEntityId={currentEntityStoreEntityId}
      showActions={false}
    />
  ) : (
    <EuiText size="xs" color="subdued">
      {EMPTY_LABEL}
    </EuiText>
  )}
</EuiAccordion>
```
🟡 Medium entity_card/resolution_mini.tsx:107
When `useResolutionGroup` returns an error (`isError=true`), the component renders `EMPTY_LABEL` ("No resolution group yet.") instead of the error message. The early return at line 70 (`if (!isLoading && !isError && !hasGroup)`) only fires when there is no error, so error states fall through to line 107. There, `hasGroup || isLoading` evaluates to `false` when there is an error and no data, which routes to the `EMPTY_LABEL` branch. `ResolutionGroupTable` already handles errors and would display `RESOLUTION_FETCH_ERROR`, but it is never reached. Consider changing line 107 to `hasGroup || isLoading || isError` so errors route to the table.
```diff
-{hasGroup || isLoading ? (
+{hasGroup || isLoading || isError ? (
   <ResolutionGroupTable
     group={group ?? null}
     isLoading={isLoading}
     isError={isError}
     targetEntityId={targetEntityId}
     currentEntityId={currentEntityStoreEntityId}
     showActions={false}
   />
 ) : (
   <EuiText size="xs" color="subdued">
     {EMPTY_LABEL}
   </EuiText>
 )}
```
Evidence trail:
- `x-pack/solutions/security/plugins/security_solution/public/agent_builder/attachment_types/entity_attachment/entity_card/resolution_mini.tsx` (REVIEWED_COMMIT): line 70 shows the early-return condition `if (!isLoading && !isError && !hasGroup)`; line 107 shows the ternary `hasGroup || isLoading ? <ResolutionGroupTable.../> : <EMPTY_LABEL>`.
- `x-pack/solutions/security/plugins/security_solution/public/entity_analytics/components/entity_resolution/resolution_group_table.tsx` (REVIEWED_COMMIT): lines 206-211 show proper error handling with `if (isError) { return ... RESOLUTION_FETCH_ERROR }`, which is unreachable from `resolution_mini.tsx` during error states.
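To see why the extra `isError` term matters, the two branch points can be modelled as a pure function of the three flags. This is a minimal sketch under stated assumptions: `ResolutionState`, `viewBeforeFix`, and `viewAfterFix` are hypothetical names, and the `"early-return"` value is a placeholder for whatever the real component returns at line 70.

```typescript
// Hypothetical model of resolution_mini.tsx's two branch points.
// The names and the "early-return" placeholder are illustrative only.
type ResolutionView = "early-return" | "table" | "empty";

interface ResolutionState {
  hasGroup: boolean;
  isLoading: boolean;
  isError: boolean;
}

// Behaviour at the reviewed commit.
function viewBeforeFix({ hasGroup, isLoading, isError }: ResolutionState): ResolutionView {
  // Line 70: only fires when there is no error, so errors fall through.
  if (!isLoading && !isError && !hasGroup) return "early-return";
  // Line 107: an error without data matches neither operand and lands on EMPTY_LABEL.
  return hasGroup || isLoading ? "table" : "empty";
}

// Behaviour with the suggested condition.
function viewAfterFix({ hasGroup, isLoading, isError }: ResolutionState): ResolutionView {
  if (!isLoading && !isError && !hasGroup) return "early-return";
  // Errors now reach ResolutionGroupTable, which renders RESOLUTION_FETCH_ERROR.
  return hasGroup || isLoading || isError ? "table" : "empty";
}
```

Under this model, once the early return is skipped at least one flag is true, so the widened condition always selects the table and the `EMPTY_LABEL` branch becomes unreachable. Assuming these three flags are the only inputs to both conditions, that branch could probably be simplified away.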
Branch updated from 359dade to 3f6dd79.
Extends the entity-analytics evals suite with assertions that the
security.entity conversation attachment is persisted as a side effect of
security.get_entity (single card) and security.search_entities (table),
and that no attachment is persisted when an entity cannot be resolved.
Changes:
- evals_suite chat_client: fetch /api/agent_builder/conversations/{id}/attachments after each converse call; surface on the chat task output.
- evals_suite evaluate_dataset: new AttachmentAssertion schema (type/shape/entityId/entityType/minEntities/count/criteria) + Attachments evaluator alongside Criteria/ToolCalls.
- New spec evals/v2/entity_attachment_side_effect.spec.ts: bulk-indexes two user entities into the V2 latest alias (following the highlights_v2.ts pattern), so the attachment codepath activates without running the extractor + maintainer pipeline; asserts single-card, table, and negative cases.
- Enables entityAttachmentRichRenderer in the evals_entity_analytics_v2 Scout configSet so the tool-side attachment creation is active.
- Adds @kbn/entity-store to tsconfig.json kbn_references.
- README coverage matrix and assertion docs updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
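The deterministic half of the `Attachments` evaluator can be sketched as below. It is illustrative only: the persisted `security.entity` payload shape, the `criteria` type, the default of "at least one match" when `count` is absent, and scoring the fraction of passing assertions are all assumptions. The LLM-judged `criteria` field is accepted but not evaluated, and plain equality stands in for the suite's `entityIdMatches` helper. The auto-pass when an example declares no assertions reflects the `score: 1` behaviour the test plan expects for the existing v2 specs.

```typescript
// Hypothetical shapes: the real AttachmentAssertion schema and the persisted
// attachment payload in the evals suite may differ.
interface EntityRef {
  id: string;
  type: string;
}

interface PersistedAttachment {
  type: string; // e.g. "security.entity"
  data: { shape: "single" | "table"; entities: EntityRef[] };
}

interface AttachmentAssertion {
  type: string;
  shape?: "single" | "table";
  entityId?: string;
  entityType?: string;
  minEntities?: number;
  count?: { min?: number; exact?: number };
  criteria?: string[]; // LLM-judged in the real evaluator; ignored here
}

// Deterministic checks on a single attachment.
function matchesAssertion(a: PersistedAttachment, x: AttachmentAssertion): boolean {
  if (a.type !== x.type) return false;
  if (x.shape !== undefined && a.data.shape !== x.shape) return false;
  if (x.minEntities !== undefined && a.data.entities.length < x.minEntities) return false;
  if (x.entityId !== undefined || x.entityType !== undefined) {
    // Plain equality stands in for entityIdMatches.
    const found = a.data.entities.some(
      (e) =>
        (x.entityId === undefined || e.id === x.entityId) &&
        (x.entityType === undefined || e.type === x.entityType),
    );
    if (!found) return false;
  }
  return true;
}

// Fraction of assertions satisfied; 1 when the example declares none.
function scoreAttachments(attachments: PersistedAttachment[], assertions?: AttachmentAssertion[]): number {
  if (!assertions || assertions.length === 0) return 1;
  const passed = assertions.filter((x) => {
    const n = attachments.filter((a) => matchesAssertion(a, x)).length;
    const exact = x.count?.exact;
    if (exact !== undefined) return n === exact; // e.g. the negative case, exact: 0
    return n >= (x.count?.min ?? 1); // assumed default: at least one match
  });
  return passed.length / assertions.length;
}
```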
Groundedness `score` collapsed to 0 whenever the LLM judge emitted a per-claim verdict of `NOT_FOUND`, even when `summary_verdict` was `GROUNDED`. The scoring map keyed off `NOT_IN_GROUND_TRUTH`, while the prompt schema and types use `NOT_FOUND`; the lookup missed, `claimScore` defaulted to 0, and the zero propagated through the geometric mean.

Also persist the full `groundednessAnalysis` and `correctnessAnalysis` on the quantitative evaluators' `metadata`, so per-claim verdicts are queryable in the `kibana-evaluations` index for forensic triage instead of requiring a re-run.

Tracking issues:
- elastic/security-team#17044 (scoring key drift)
- elastic/security-team#17045 (metadata forensics gap)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
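The failure mode is easy to reproduce in isolation: a geometric mean is the product of the claim scores raised to `1/n`, so a single claim scoring 0 zeroes the whole result. In the sketch below, the verdict names other than `NOT_FOUND` and `NOT_IN_GROUND_TRUTH`, and all weights, are hypothetical. Typing the map as `Record<ClaimVerdict, number>` is one way (an assumption, not necessarily what the fix does) to make the compiler reject a key that drifts from the prompt schema.

```typescript
// Hypothetical verdict set and weights; only NOT_FOUND (used by the prompt
// schema) and NOT_IN_GROUND_TRUTH (the stale map key) come from the commit.
type ClaimVerdict = "SUPPORTED" | "NOT_FOUND" | "CONTRADICTED";
type ScoreTable = { readonly [verdict: string]: number | undefined };

// Before: keyed by a verdict the judge never emits.
const staleScores: ScoreTable = { SUPPORTED: 1, NOT_IN_GROUND_TRUTH: 0.5, CONTRADICTED: 0 };

// After: typing the map by the verdict union makes a drifted key a compile error.
const claimScores: Record<ClaimVerdict, number> = { SUPPORTED: 1, NOT_FOUND: 0.5, CONTRADICTED: 0 };

// Geometric mean: (s_1 * ... * s_n)^(1/n). Any zero term zeroes the result.
function geometricMean(scores: number[]): number {
  if (scores.length === 0) return 1; // assumed convention for "no claims"
  const product = scores.reduce((acc, s) => acc * s, 1);
  return Math.pow(product, 1 / scores.length);
}

function groundednessScore(verdicts: ClaimVerdict[], table: ScoreTable): number {
  // A missed lookup falls back to 0, like the reported claimScore default.
  return geometricMean(verdicts.map((v) => table[v] ?? 0));
}
```

For the verdicts `["SUPPORTED", "NOT_FOUND", "SUPPORTED"]`, the stale map scores 0, while the fixed map scores 0.5^(1/3), roughly 0.79.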
Drop endsWith fallback in entityIdMatches: it matched any suffix
(e.g. expected "smith" matched "goldsmith"). Equality on either the full ID or the ID
with its {type}: prefix stripped is sufficient.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
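A minimal sketch of the tightened comparison, assuming entity IDs may carry a `{type}:` prefix such as `user:attach-alice`. The three-argument signature and the `stripTypePrefix` helper are hypothetical; the point is that only exact equality is used, so a suffix like `smith` no longer matches `goldsmith`.

```typescript
// Hypothetical helper: removes a leading "{type}:" prefix if present.
function stripTypePrefix(id: string, entityType: string): string {
  const prefix = `${entityType}:`;
  return id.startsWith(prefix) ? id.slice(prefix.length) : id;
}

// Hypothetical signature; the real entityIdMatches may take different arguments.
function entityIdMatches(expected: string, actual: string, entityType: string): boolean {
  // Exact equality on the full ID, or on both IDs with the "{type}:" prefix removed.
  // No endsWith fallback, so partial suffixes never match.
  return (
    expected === actual ||
    stripTypePrefix(expected, entityType) === stripTypePrefix(actual, entityType)
  );
}
```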
Branch updated from 11cb1cb to f90c9cc.
Pinging @elastic/security-entity-analytics (Team:Entity Analytics)

/ci
```diff
 `--uiSettings.overrides.agentBuilder:experimentalFeatures=true`,
 `--uiSettings.overrides.securitySolution:entityStoreEnableV2=true`,
-`--xpack.securitySolution.enableExperimental=["entityAnalyticsEntityStoreV2"]`,
+`--xpack.securitySolution.enableExperimental=["entityAnalyticsEntityStoreV2","entityAttachmentRichRenderer"]`,
```
This `entityAttachmentRichRenderer` flag is no longer needed; it was removed in the original PR.
Flag was removed in original PR; drop from evals stateful config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SrdjanLL left a comment:
LGTM!
And thanks for the fix on the groundedness evaluator 🙌🏻
💛 Build succeeded, but was flaky
…5465)

## Summary

Companion eval coverage for elastic#264985 (merged). Adds coverage that the `security.entity` conversation attachment is persisted as a side effect of `security.get_entity` (single card) and `security.search_entities` (table), and that no attachment is persisted when an entity cannot be resolved.

### Eval harness changes

- `chat_client`: fetches `/api/agent_builder/conversations/{id}/attachments` after each `converse` and surfaces them on the task output.
- `evaluate_dataset`: adds `AttachmentAssertion` schema (`type`/`shape`/`entityId`/`entityType`/`minEntities`/`count`/`criteria`) and an `Attachments` evaluator alongside `Criteria` and `ToolCalls`. Deterministic match for type/shape/identifier/count; LLM judge for free-form `criteria` over the matched payload.

### New spec

`evals/v2/entity_attachment_side_effect.spec.ts`:

- Bulk-indexes two seeded user entities (`attach-alice`, `attach-bob`) directly into the V2 latest alias (fast path, following `highlights_v2.ts`), so the attachment codepath activates without running the full extractor + maintainer pipeline (`beforeAll` runs in ~5s vs several minutes).
- Asserts single-card (`count.min: 1`, `shape: single`), table (`shape: table`, `minEntities: 2`), and negative (`count.exact: 0`) cases.

### Config

- Enables `entityAttachmentRichRenderer` in the `evals_entity_analytics_v2` Scout configSet so tool-side attachment creation is active.
- Adds `@kbn/entity-store` to `tsconfig.json` `kbn_references`.
- README coverage matrix and assertion docs updated.
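Seeding through the bulk API is what keeps `beforeAll` fast. The sketch below only builds the newline-delimited `_bulk` body for the two seeded users; the alias argument, the document fields, and the `user:{name}` `_id` convention are assumptions, not the actual `highlights_v2.ts` pattern. Sending the body (with a refresh so the documents are searchable before the first `converse` call) is left to whichever Elasticsearch client the suite uses.

```typescript
// Hypothetical seed shape; the real V2 latest-index mapping may differ.
interface SeedUser {
  name: string; // e.g. "attach-alice"
}

// Builds an Elasticsearch _bulk body: an action line and a source line per
// document, newline-delimited, ending with the trailing newline the API requires.
function buildUserSeedBulkBody(alias: string, users: SeedUser[]): string {
  const docs = users.map((u) => {
    const id = `user:${u.name}`; // assumed "{type}:{name}" ID convention
    const action = JSON.stringify({ index: { _index: alias, _id: id } });
    const source = JSON.stringify({ entity: { id, type: "user", name: u.name }, user: { name: u.name } });
    return `${action}\n${source}`;
  });
  return docs.join("\n") + "\n";
}
```

A deterministic `_id` makes re-running the seed idempotent: the `index` action overwrites the same documents instead of creating duplicates. Writing through an alias also requires it to resolve to a single write index.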
## Test plan

- [ ] Start Scout server: `node scripts/scout start-server --arch stateful --domain classic --serverConfigSet evals_entity_analytics_v2`
- [ ] Run the new spec: `node scripts/playwright test --config x-pack/solutions/security/packages/kbn-evals-suite-entity-analytics/playwright.v2.config.ts x-pack/solutions/security/packages/kbn-evals-suite-entity-analytics/evals/v2/entity_attachment_side_effect.spec.ts --project="<connector>"`
- [ ] Attachments column reports `mean: 1, std: 0` across all 3 examples (single, table, negative).
- [ ] Confirm existing v2 specs (`entity_store_v2_{get_entity,search_entities,multi_skill}.spec.ts`) still pass; they have no `attachments` assertions, so the new evaluator must auto-pass with `score: 1`.
- [ ] Verify `afterAll` teardown cleans up entity engines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>