During a rolling upgrade or a node failure, a continuous transform seems to re-read the entire index when it is moved to a new node. I think the last checkpoint's timestamp information may be missed when the transform starts executing again on its new node.
I saw this behavior by comparing the stats before and after the continuous transform moved between nodes. The documents-read count increased by the number of docs present in the index each time the transform changed nodes.
It could be either:
- the stats are being gathered incorrectly (the docs are not actually being read again)
- the entire index is being read again when the task restarts
DataFrameTransformPersistentTasksExecutor, and in particular how it reads in checkpoint information, is suspect.
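
To illustrate the second possibility, here is a minimal sketch of how a missed checkpoint timestamp would produce the stats I observed. It is a simplified model, not the actual transform code: the class CheckpointResumeSketch, the method docsReadOnResume, and the sample timestamps are all hypothetical, and I am assuming that a resumed transform only reads documents newer than its last checkpoint.

```java
import java.util.List;

// Hypothetical, simplified model of a continuous transform resuming on a new node.
// None of these names come from the Elasticsearch code base.
final class CheckpointResumeSketch {

    /**
     * Counts the documents a resumed transform would read, assuming it only
     * reads documents with a timestamp strictly after the last checkpoint.
     */
    static long docsReadOnResume(List<Long> docTimestamps, long lastCheckpointTimestamp) {
        return docTimestamps.stream()
                .filter(ts -> ts > lastCheckpointTimestamp)
                .count();
    }

    public static void main(String[] args) {
        // Four documents already processed before the task moved nodes.
        List<Long> index = List.of(100L, 200L, 300L, 400L);

        // Expected case: the checkpoint timestamp is restored on the new node.
        long restored = docsReadOnResume(index, 400L);

        // Suspected case: the timestamp is missed and falls back to 0,
        // so every document in the index is read again.
        long missed = docsReadOnResume(index, 0L);

        System.out.println("checkpoint restored: " + restored + " docs read");
        System.out.println("checkpoint missed:   " + missed + " docs read");
    }
}
```

In this model the documents-read count grows by the full index size on every node change only when the checkpoint timestamp is missed, which matches the increase seen in the stats.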