CCR: Use single global checkpoint to normalize range #33545
jasontedor merged 1 commit into elastic:master
We may use different global checkpoints to validate/normalize the range of a change request if the global checkpoint is advanced between these calls. If this is the case, then we generate an invalid request range.
Pinging @elastic/es-distributed
if (indexShard.state() != IndexShardState.STARTED) {
    throw new IndexShardNotStartedException(indexShard.shardId(), indexShard.state());
}
if (fromSeqNo > indexShard.getGlobalCheckpoint()) {
Question: so indexShard.getGlobalCheckpoint() may return a lower seqno than the one acquired from indexShard.seqNoStats().getGlobalCheckpoint()? I always assumed that the seqno acquired from IndexShard could not go backwards.
The global checkpoint in the stats object and directly from the index shard are sourced from the same place, the replication tracker. The problem here, as I understand it, is that the global checkpoint could have advanced after capturing the stats. Here is what can happen then:
- suppose that fromSeqNo is 17
- suppose that the global checkpoint in the stats instance is 16
- suppose that the global checkpoint advances to 17 after the stats object is captured
- the fromSeqNo > indexShard.getGlobalCheckpoint() check will fail (because of the advance), meaning that we skip returning an empty operations response
- we then calculate toSeqNo = Math.min(globalCheckpoint, (fromSeqNo + maxOperationCount) - 1) where globalCheckpoint is from the stats instance; this would give toSeqNo == 16
- now we have [fromSeqNo, toSeqNo] == [17, 16] which produces the invalid range error message
This all happened because we allowed the advance of the global checkpoint to become visible to this logic. Had we reused globalCheckpoint from the stats object for the check as well, then fromSeqNo > globalCheckpoint would have been true and we would have returned an empty operations response.
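The scenario above can be reproduced in isolation. The following is a minimal sketch, not the actual Elasticsearch code: FakeShard, twoReads, singleRead and the betweenReads hook are hypothetical names introduced only for illustration, and the concurrent advance is simulated by running a callback between the two reads. It assumes the same numbers as the walkthrough (fromSeqNo 17, checkpoint 16 advancing to 17).

```java
import java.util.concurrent.atomic.AtomicLong;

public class RangeNormalization {

    /** Hypothetical stand-in for a shard whose global checkpoint can advance between reads. */
    static final class FakeShard {
        private final AtomicLong globalCheckpoint;

        FakeShard(long initial) {
            this.globalCheckpoint = new AtomicLong(initial);
        }

        long getGlobalCheckpoint() {
            return globalCheckpoint.get();
        }

        void advanceTo(long value) {
            globalCheckpoint.set(value);
        }
    }

    /**
     * Reads the checkpoint twice: once as a snapshot (like the stats object) and once
     * live for the validation check. Returns null to model an empty operations response.
     */
    static long[] twoReads(FakeShard shard, long fromSeqNo, int maxOperationCount, Runnable betweenReads) {
        long snapshot = shard.getGlobalCheckpoint();
        betweenReads.run(); // simulated concurrent advance of the checkpoint
        if (fromSeqNo > shard.getGlobalCheckpoint()) {
            return null;
        }
        long toSeqNo = Math.min(snapshot, (fromSeqNo + maxOperationCount) - 1);
        return new long[] {fromSeqNo, toSeqNo};
    }

    /** Uses the single captured value for both the validation check and the normalization. */
    static long[] singleRead(FakeShard shard, long fromSeqNo, int maxOperationCount, Runnable betweenReads) {
        long snapshot = shard.getGlobalCheckpoint();
        betweenReads.run(); // the advance happens, but this logic never observes it
        if (fromSeqNo > snapshot) {
            return null;
        }
        long toSeqNo = Math.min(snapshot, (fromSeqNo + maxOperationCount) - 1);
        return new long[] {fromSeqNo, toSeqNo};
    }

    public static void main(String[] args) {
        FakeShard first = new FakeShard(16);
        long[] bad = twoReads(first, 17, 10, () -> first.advanceTo(17));
        System.out.println("two reads:   [" + bad[0] + ", " + bad[1] + "]"); // invalid: from > to

        FakeShard second = new FakeShard(16);
        long[] ok = singleRead(second, 17, 10, () -> second.advanceTo(17));
        System.out.println("single read: " + (ok == null ? "empty response" : "[" + ok[0] + ", " + ok[1] + "]"));
    }
}
```

With one captured value, the follower simply sees an empty response and can ask again later, at which point the newer checkpoint is picked up consistently.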
martijnvg left a comment:
I think this looks good. I left a question for my understanding.
I'm going to have merge conflicts with this PR @dnhatn so I am going to merge it now.
Thanks @jasontedor and @martijnvg.
* master:
  Remove underscore from auto-follow API (elastic#33550)
  CCR: Use single global checkpoint to normalize range (elastic#33545)
We may use different global checkpoint values to validate/normalize the range of a change request if the global checkpoint is advanced between these calls. If this is the case, then we generate an invalid request range and cause the follow task to abort.
CI: