[8.19] [ML]Fix latest transforms disregarding updates when sort and sync fields are non-monotonic (#142856) by valeriy42 · Pull Request #143214 · elastic/elasticsearch

valeriy42 · 2026-02-27T08:53:40Z

Backports the following commits to 8.19:

[ML]Fix latest transforms disregarding updates when sort and sync fields are non-monotonic (#142856)

…lds are non-monotonic (elastic#142856)

Continuous latest transforms could overwrite newer documents if sort and sync fields didn't increase together. Checkpoint N only queried documents in `[lastCheckpoint, nextCheckpoint)`, making some documents invisible and causing `top_hits` to select the wrong document.

The fix introduces two-phase change detection: phase 1 finds updated keys via a composite aggregation; phase 2 runs a full query with a filter for those keys, ensuring `top_hits` picks the latest document. Changes are within `LatestChangeCollector` with no impact on API or schema, ensuring correct behavior after upgrade.

The new behavior aligns `latest` with the pattern used by `pivot` transforms. Pivot employs `CompositeBucketsChangeCollector`, which runs two phases via `TransformIndexer`: in **IDENTIFY_CHANGES**, it performs a composite aggregation over the checkpoint window with sync range, recording changed buckets using collectors like `TermsFieldCollector` and `DateHistogramFieldCollector`. In **APPLY_RESULTS**, it builds the pivot query, narrowing it with filters from these collectors. `Latest` now mirrors this at the unique-key level: phase 1 is a composite over unique key fields, and phase 2 filters by collected key values to run over full history. The key difference is that pivot’s “changed buckets” are the group-by dimensions, while latest’s are the unique key values that need recomputing.

Performance impact is limited: one extra search per checkpoint in phase 1 (composite aggregation only, no `top_hits`), and phase 2 processes only changed unique keys, not the whole dataset. No Painless scripts, per-document GET/UpdateRequest, or new destination fields.

Unit tests cover `LatestChangeCollector` (`buildChangesQuery`, `processSearchResponse`, `buildFilterQuery`, `clear`, single and multi-field unique key, null buckets); a Java REST test reproduces the non-monotonic scenario (two docs, same key, different sort/sync order) and asserts the destination keeps the doc with the higher sort value after checkpoint 2; YAML REST tests assert latest preview and batch behavior with non-monotonic data.

Fixes elastic#90643
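
The non-monotonic scenario and the two-phase idea described above can be illustrated with a small in-memory sketch. This is not the `LatestChangeCollector` code: the class `LatestTwoPhaseSketch`, the `Doc` type, and the methods `windowOnly`, `changedKeys`, and `twoPhase` are hypothetical names, plain Java collections stand in for the composite and `top_hits` aggregations, and bounding phase 2 by the upper checkpoint value is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LatestTwoPhaseSketch {

    /** Hypothetical source document: unique key, sort field, sync field, payload. */
    public static final class Doc {
        public final String key;
        public final long sort;
        public final long sync;
        public final String payload;

        public Doc(String key, long sort, long sync, String payload) {
            this.key = key;
            this.sort = sort;
            this.sync = sync;
            this.payload = payload;
        }
    }

    // Stand-in for top_hits (size 1, sorted by the sort field): keep the higher sort value per key.
    private static void keepHigherSort(Map<String, Doc> best, Doc d) {
        best.merge(d.key, d, (a, b) -> a.sort >= b.sort ? a : b);
    }

    /** Old behavior: only documents whose sync value lies in [from, to) are visible. */
    public static Map<String, Doc> windowOnly(List<Doc> source, long from, long to) {
        Map<String, Doc> best = new HashMap<>();
        for (Doc d : source) {
            if (d.sync >= from && d.sync < to) {
                keepHigherSort(best, d);
            }
        }
        return best;
    }

    /** Phase 1: collect the unique keys that changed inside the checkpoint window. */
    public static Set<String> changedKeys(List<Doc> source, long from, long to) {
        Set<String> keys = new HashSet<>();
        for (Doc d : source) {
            if (d.sync >= from && d.sync < to) {
                keys.add(d.key);
            }
        }
        return keys;
    }

    /** Phase 2: for changed keys only, pick the latest doc over the full history. */
    public static Map<String, Doc> twoPhase(List<Doc> source, long from, long to) {
        Set<String> keys = changedKeys(source, from, to);
        Map<String, Doc> best = new HashMap<>();
        for (Doc d : source) {
            // Assumption: history is still bounded above by the checkpoint's upper value.
            if (keys.contains(d.key) && d.sync < to) {
                keepHigherSort(best, d);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same key; the doc with the higher sort value has the lower sync value.
        List<Doc> source = List.of(
            new Doc("k1", 200, 10, "A"),   // visible in checkpoint 1, window [0, 100)
            new Doc("k1", 100, 150, "B")); // visible in checkpoint 2, window [100, 200)

        System.out.println("window-only: " + windowOnly(source, 100, 200).get("k1").payload); // B
        System.out.println("two-phase: " + twoPhase(source, 100, 200).get("k1").payload);     // A
    }
}
```

In checkpoint 2 the window-only variant sees just document B and would overwrite the destination with the lower sort value, while the two-phase variant re-evaluates key `k1` against both documents and keeps A.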

valeriy42 added the `:ml/Transform`, `>bug`, `auto-merge-without-approval`, `backport`, and `Team:ML` labels Feb 27, 2026

elasticsearchmachine added the v8.19.13 label Feb 27, 2026

elasticsearchmachine mentioned this pull request Feb 27, 2026

[ML]Fix latest transforms disregarding updates when sort and sync fields are non-monotonic #142856

Merged

Fix build error

8b4d0d3

elasticsearchmachine merged commit d00780e into elastic:8.19 Feb 27, 2026
28 checks passed

valeriy42 deleted the backport/8.19/pr-142856 branch February 27, 2026 11:16

elasticsearchmachine merged 2 commits into elastic:8.19 from valeriy42:backport/8.19/pr-142856
