Immediately write checkpoint when copy begins #298

samongyr-sq · 2024-06-13T18:33:07Z

If a migration is interrupted before Spirit gets a chance to write to the _chkpnt table, progress is lost: Spirit is unable to pick up from where it left off, and the following message is logged:

INFO[0000] could not resume from checkpoint: reason=could not read from table '_sbtest1_chkpnt'

Ideally, Spirit would write to _chkpnt ASAP to avoid losing progress. Alternatively, an option to configure the interval at which checkpoints are taken may be sufficient.


morgo · 2024-08-16T22:00:24Z

I have un-assigned because I'm not currently working on this - but I might pick it up again in future.

There is a tradeoff with saving progress immediately: if the migration fails right away and there is then a long delay before restarting, resuming will actually be more expensive than starting again, because of all the binary logs that need to be processed (a fresh start doesn't need to process them).

This is a weak argument, however, since we currently start the first checkpoint ~1 minute in, and checkpointing after 1m vs. 1s doesn't make much of a difference.

jayjanssen · 2024-11-01T16:07:17Z

I think the bigger issue is that Spirit is unhappy if the checkpoint table is totally empty, and it just fails.

One quick fix might be to let Spirit continue if the checkpoint table is empty, as if the table just wasn't there to begin with.
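A rough sketch of that quick fix, under loud assumptions: the `Checkpoint` struct, `ErrNoCheckpoint`, `latestCheckpoint`, and `planMigration` below are all hypothetical and do not reflect Spirit's real types or resume logic. The only point is that "table missing" and "table present but empty" map to the same outcome, which is to start fresh instead of failing.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoCheckpoint is a hypothetical sentinel meaning "nothing to resume from".
var ErrNoCheckpoint = errors.New("no checkpoint available")

// Checkpoint is a hypothetical, simplified view of one saved checkpoint row.
type Checkpoint struct {
	LowWatermark string
	BinlogName   string
	BinlogPos    int
}

// latestCheckpoint returns the most recent checkpoint. A missing table and
// an empty table are treated identically: both yield ErrNoCheckpoint.
// The table state is passed in as arguments to keep the sketch self-contained.
func latestCheckpoint(tableExists bool, rows []Checkpoint) (Checkpoint, error) {
	if !tableExists || len(rows) == 0 {
		return Checkpoint{}, ErrNoCheckpoint
	}
	return rows[len(rows)-1], nil
}

// planMigration decides between resuming and starting from scratch.
func planMigration(tableExists bool, rows []Checkpoint) string {
	cp, err := latestCheckpoint(tableExists, rows)
	if errors.Is(err, ErrNoCheckpoint) {
		return "start fresh"
	}
	return fmt.Sprintf("resume from %s:%d", cp.BinlogName, cp.BinlogPos)
}

func main() {
	saved := []Checkpoint{{LowWatermark: "1000", BinlogName: "binlog.000003", BinlogPos: 4}}
	fmt.Println(planMigration(false, nil))  // no checkpoint table at all
	fmt.Println(planMigration(true, nil))   // table exists but has no rows
	fmt.Println(planMigration(true, saved)) // table has a usable checkpoint
}
```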

morgo self-assigned this Jun 17, 2024

morgo removed their assignment Aug 16, 2024
