[kbn-evals] Fix missing datasets in report table due to refresh race by patrykkopycinski · Pull Request #265549 · elastic/kibana

patrykkopycinski · 2026-04-24T13:33:42Z

Summary

Fixes a race condition where the last dataset(s) in an eval run would intermittently be missing from the results table.

Root cause: indexSingleScore() writes individual score documents with refresh: false for performance. When exportEvaluations() later attempts a bulk upsert of the same documents, it gets 409 Conflict responses for already-written docs. The refresh: 'wait_for' on the bulk request only applies to newly created documents — not the conflicted ones. This leaves the last scenario's score documents invisible when reportModelScore() queries the kibana-evaluations index for aggregated stats.
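The visibility gap described above can be illustrated with a toy in-memory model. This is not Elasticsearch code: the ToyIndex class and its methods are hypothetical. The rule that conflicted documents are not covered by refresh: 'wait_for' is taken from the description above rather than verified independently. The mid-run refresh() call stands in for whatever made earlier datasets searchable during the run.

```typescript
// Toy model of the refresh race. Hypothetical names; not Elasticsearch code.
type Refresh = false | 'wait_for';

class ToyIndex {
  private stored = new Map<string, number>(); // every accepted write
  private searchable = new Set<string>(); // ids visible to search

  // Models indexSingleScore(): the write is stored, but with refresh: false
  // it is not yet visible to search.
  index(id: string, score: number, refresh: Refresh): void {
    this.stored.set(id, score);
    if (refresh === 'wait_for') this.searchable.add(id);
  }

  // Models the bulk step: ids that already exist come back as 409 and, per
  // the PR description, are not covered by the request's refresh setting.
  bulkCreate(docs: Array<{ id: string; score: number }>, refresh: Refresh): number[] {
    return docs.map(({ id, score }) => {
      if (this.stored.has(id)) return 409;
      this.index(id, score, refresh);
      return 201;
    });
  }

  // Models indices.refresh(): everything stored becomes searchable.
  refresh(): void {
    this.stored.forEach((_score, id) => this.searchable.add(id));
  }

  search(): string[] {
    return Array.from(this.searchable).sort();
  }
}

const toy = new ToyIndex();
toy.index('dataset-1', 0.9, false);
toy.refresh(); // earlier datasets became searchable during the run
toy.index('dataset-8', 0.7, false); // last dataset, written just before export

const statuses = toy.bulkCreate(
  [
    { id: 'dataset-1', score: 0.9 },
    { id: 'dataset-8', score: 0.7 },
  ],
  'wait_for'
);
const beforeRefresh = toy.search(); // ['dataset-1']: the last dataset is missing
toy.refresh();
const afterRefresh = toy.search(); // ['dataset-1', 'dataset-8']
```

In this model both bulk items return 409, so the 'wait_for' setting never touches dataset-8, and only the explicit refresh makes it visible.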

Fix: Add an explicit indices.refresh({ index: 'kibana-evaluations' }) call after exportEvaluations() and before reportModelScore() to ensure all score documents are searchable before the stats query runs. The .catch(() => {}) silently handles the case where the index doesn't exist yet.
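The ordering the fix relies on can be sketched as follows. This is a minimal sketch, not the code from the PR: RefreshClient is a hypothetical structural type covering only the indices.refresh call named above, and exportEvaluations and reportModelScore are passed in as plain callbacks because their real signatures are not shown here.

```typescript
// Hypothetical minimal shape of the client; only indices.refresh is modeled.
interface RefreshClient {
  indices: {
    refresh(params: { index: string }): Promise<unknown>;
  };
}

// Hypothetical wrapper showing the order: export, then refresh, then report.
async function exportRefreshThenReport(
  client: RefreshClient,
  exportEvaluations: () => Promise<void>,
  reportModelScore: () => Promise<void>,
  index: string = 'kibana-evaluations'
): Promise<void> {
  await exportEvaluations();

  // Force a refresh so every score document is searchable. Errors are
  // swallowed, mirroring the .catch(() => {}) in the PR description
  // (for example, when the index does not exist yet).
  await client.indices.refresh({ index }).catch(() => {});

  await reportModelScore();
}
```

Because the refresh error is swallowed, reportModelScore still runs when the refresh fails. That matches the described behavior, but it also means a failed refresh goes unnoticed.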

Test plan

Run a multi-dataset eval suite (e.g., pci-compliance with 8 datasets): all datasets should appear in the final results table.
Verified that the "no matching data" dataset, which was previously intermittently absent, now appears in the results table.

Made with Cursor

indexSingleScore writes documents with refresh:false for performance. The subsequent exportEvaluations bulk upsert gets 409 conflicts on already-written docs, and its refresh:'wait_for' only applies to newly created documents. This leaves the last scenario's scores invisible when reportModelScore queries the index for aggregated stats. Add an explicit indices.refresh() after exportEvaluations and before reportModelScore to ensure all score documents are searchable.

elasticmachine · 2026-04-24T14:33:41Z

⏳ Build in-progress, with failures

Buildkite Build
Commit: 1dd7853

Failed CI Steps

Check Types

History

💔 Build #433602 failed 4eb5f87

macroscopeapp · 2026-04-24T14:46:25Z

+      // on those docs so its refresh:'wait_for' won't cover them. Force a refresh
+      // to make every score visible before the stats query.
+      await evaluationsEsClient.indices
+        .refresh({ index: 'kibana-evaluations' })


🟢 Low src/evaluate.ts:391

The refresh call at line 391 uses the hardcoded string 'kibana-evaluations' instead of the EVALUATIONS_DATA_STREAM_ALIAS constant used throughout EvaluationScoreRepository. If the constant value changes, the refresh silently targets the wrong index (error swallowed by .catch(() => {})), causing reportModelScore to potentially not see all documents. Consider using EVALUATIONS_DATA_STREAM_ALIAS instead.

Evidence trail:

x-pack/platform/packages/shared/kbn-evals/src/evaluate.ts lines 389-393 (REVIEWED_COMMIT): shows hardcoded 'kibana-evaluations' and .catch(() => {}).

x-pack/platform/packages/shared/kbn-evals/src/utils/score_repository.ts line 187 (REVIEWED_COMMIT): defines EVALUATIONS_DATA_STREAM_ALIAS = 'kibana-evaluations'.

git_grep results show EVALUATIONS_DATA_STREAM_ALIAS used at lines 411, 416, 418, 485, 501, 541, 634, 658, 670, 727 in score_repository.ts.

git_grep for 'export.*EVALUATIONS_DATA_STREAM_ALIAS' returned no results, confirming the constant is not exported.
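One way to act on this suggestion is to build the refresh request from the shared constant rather than a string literal. The sketch below is illustrative only: the constant is redeclared locally to keep the example self-contained, and evaluationsRefreshRequest is a hypothetical helper, not a function in kbn-evals. Since the evidence trail notes the constant is not exported, the real change would presumably also need to export it from score_repository.ts.

```typescript
// Redeclared here for the sketch only. Per the evidence trail, the real
// constant is defined in score_repository.ts with this value and is not
// currently exported.
const EVALUATIONS_DATA_STREAM_ALIAS = 'kibana-evaluations';

// Hypothetical helper: derives the refresh target from the constant, so a
// future rename of the alias cannot leave the refresh pointing elsewhere.
function evaluationsRefreshRequest(): { index: string } {
  return { index: EVALUATIONS_DATA_STREAM_ALIAS };
}
```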

kibanamachine · 2026-04-27T10:02:31Z

💚 Build Succeeded

Buildkite Build
Commit: be0ee3d

Metrics [docs]

✅ unchanged

cc @patrykkopycinski

patrykkopycinski requested review from a team as code owners April 24, 2026 13:33

patrykkopycinski mentioned this pull request Apr 24, 2026

[Security Solution][Agent Builder] Harden PCI compliance tools + add eval suite #264378

Merged

6 tasks

patrykkopycinski force-pushed the pk/evals-fix-refresh-race branch from c608bf9 to 4eb5f87 April 24, 2026 13:52

Changes from node scripts/eslint_all_files --no-cache --fix

1dd7853

patrykkopycinski added 4 commits April 24, 2026 16:37

Fix type error: make taskModelId optional in ExportAndReportOptions

6db4aba

Model.id is string | undefined in @kbn/inference-common, so the interface must accept undefined too.
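The type fix in this commit can be sketched as below. Both interfaces are hypothetical stand-ins: only the `id: string | undefined` shape and the taskModelId name come from the commit messages, and toExportOptions is an illustrative helper rather than code from the PR.

```typescript
// Hypothetical stand-in for Model from @kbn/inference-common; only the
// `string | undefined` type of id is taken from the commit message.
interface Model {
  id: string | undefined;
}

// Hypothetical stand-in for ExportAndReportOptions; other fields, if any,
// are omitted. The explicit `| undefined` keeps the sketch valid even when
// exactOptionalPropertyTypes is enabled.
interface ExportAndReportOptions {
  taskModelId?: string | undefined;
}

// Illustrative helper. If taskModelId were a required `string`, this return
// statement would fail type checking under strictNullChecks, because
// model.id may be undefined.
function toExportOptions(model: Model): ExportAndReportOptions {
  return { taskModelId: model.id };
}
```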

Replace inline imports with top-level imports in ExportAndReportOptions

cd667ab

Add test documenting the 409-conflict refresh gap in exportScores

3dbecea

macroscopeapp Bot reviewed Apr 24, 2026

Changes from node scripts/eslint_all_files --no-cache --fix

fbc5e6f

patrykkopycinski self-assigned this Apr 27, 2026

patrykkopycinski added the release_note:skip, v9.4.0, and v9.5.0 labels Apr 27, 2026

Merge branch 'main' into pk/evals-fix-refresh-race

be0ee3d

patrykkopycinski added the backport:version label Apr 27, 2026

patrykkopycinski wants to merge 8 commits into elastic:main from patrykkopycinski:pk/evals-fix-refresh-race
