[kbn-evals] Fix missing datasets in report table due to refresh race#265549
Open
patrykkopycinski wants to merge 8 commits into
Open
[kbn-evals] Fix missing datasets in report table due to refresh race#265549patrykkopycinski wants to merge 8 commits into
patrykkopycinski wants to merge 8 commits into
Conversation
6 tasks
indexSingleScore writes documents with refresh:false for performance. The subsequent exportEvaluations bulk upsert gets 409 conflicts on already-written docs, and its refresh:'wait_for' only applies to newly created documents. This leaves the last scenario's scores invisible when reportModelScore queries the index for aggregated stats. Add an explicit indices.refresh() after exportEvaluations and before reportModelScore to ensure all score documents are searchable.
c608bf9 to
4eb5f87
Compare
Contributor
⏳ Build in-progress, with failures
Failed CI StepsHistory
|
Model.id is string | undefined in @kbn/inference-common, so the interface must accept undefined too.
indexSingleScore writes documents with refresh:false for performance. The subsequent exportEvaluations bulk upsert gets 409 conflicts on already-written docs, and its refresh:'wait_for' only applies to newly created documents. This leaves the last scenario's scores invisible when reportModelScore queries the index for aggregated stats. Add an explicit indices.refresh() after exportEvaluations and before reportModelScore to ensure all score documents are searchable.
| // on those docs so its refresh:'wait_for' won't cover them. Force a refresh | ||
| // to make every score visible before the stats query. | ||
| await evaluationsEsClient.indices | ||
| .refresh({ index: 'kibana-evaluations' }) |
Contributor
There was a problem hiding this comment.
🟢 Low src/evaluate.ts:391
The refresh call at line 391 uses the hardcoded string 'kibana-evaluations' instead of the EVALUATIONS_DATA_STREAM_ALIAS constant used throughout EvaluationScoreRepository. If the constant value changes, the refresh silently targets the wrong index (error swallowed by .catch(() => {})), causing reportModelScore to potentially not see all documents. Consider using EVALUATIONS_DATA_STREAM_ALIAS instead.
🤖 Copy this AI Prompt to have your agent fix this:
In file x-pack/platform/packages/shared/kbn-evals/src/evaluate.ts around line 391:
The refresh call at line 391 uses the hardcoded string `'kibana-evaluations'` instead of the `EVALUATIONS_DATA_STREAM_ALIAS` constant used throughout `EvaluationScoreRepository`. If the constant value changes, the refresh silently targets the wrong index (error swallowed by `.catch(() => {})`), causing `reportModelScore` to potentially not see all documents. Consider using `EVALUATIONS_DATA_STREAM_ALIAS` instead.
Evidence trail:
x-pack/platform/packages/shared/kbn-evals/src/evaluate.ts lines 389-393 (REVIEWED_COMMIT) - shows hardcoded 'kibana-evaluations' and .catch(() => {}); x-pack/platform/packages/shared/kbn-evals/src/utils/score_repository.ts line 187 (REVIEWED_COMMIT) - defines EVALUATIONS_DATA_STREAM_ALIAS = 'kibana-evaluations'; git_grep results show EVALUATIONS_DATA_STREAM_ALIAS used at lines 411, 416, 418, 485, 501, 541, 634, 658, 670, 727 in score_repository.ts; git_grep for 'export.*EVALUATIONS_DATA_STREAM_ALIAS' returned no results confirming constant is not exported.
Contributor
💚 Build Succeeded
Metrics [docs]
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a race condition where the last dataset(s) in an eval run would intermittently be missing from the results table.
Root cause:
indexSingleScore()writes individual score documents withrefresh: falsefor performance. WhenexportEvaluations()later attempts a bulk upsert of the same documents, it gets409 Conflictresponses for already-written docs. Therefresh: 'wait_for'on the bulk request only applies to newly created documents — not the conflicted ones. This leaves the last scenario's score documents invisible whenreportModelScore()queries thekibana-evaluationsindex for aggregated stats.Fix: Add an explicit
indices.refresh({ index: 'kibana-evaluations' })call afterexportEvaluations()and beforereportModelScore()to ensure all score documents are searchable before the stats query runs. The.catch(() => {})silently handles the case where the index doesn't exist yet.Test plan
pci-compliancewith 8 datasets) — all datasets should appear in the final results tableMade with Cursor