DBZ-7050 Add automatic retry for snapshots #163
@jpechane Can you review this when you get the chance? Thank you!
}

public List<String> getGtid() {
public String getGtid() {
Should the method be called getVGtid?
public static final Field GTID = Field.create(VITESS_CONFIG_GROUP_PREFIX + "gtid")
        .withDisplayName("gtid")
public static final Field VGTID = Field.create(VITESS_CONFIG_GROUP_PREFIX + "vgtid")
Will it cause any regression with the config name changes from gtid to vgtid?
Yes, it will. The existing setting needs to be deprecated rather than removed: both should remain available, and a WARN should be written to the log when the deprecated one is used.
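A minimal sketch of that deprecation approach, assuming the two keys are `vitess.gtid` and `vitess.vgtid`. The class and method names are hypothetical, it uses a plain `Map` and `java.util.logging` rather than the Debezium `Field`/`Configuration` API, and it leaves out converting the old CSV value into the new format:

```java
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical helper, not part of the connector.
public class DeprecatedConfigSketch {
    private static final Logger LOGGER = Logger.getLogger(DeprecatedConfigSketch.class.getName());

    static final String VGTID_KEY = "vitess.vgtid";
    static final String DEPRECATED_GTID_KEY = "vitess.gtid";

    /**
     * Returns the key that was used together with its raw value.
     * The new key wins; the deprecated key still works but logs a warning.
     */
    static Optional<Map.Entry<String, String>> resolveStartPosition(Map<String, String> config) {
        String vgtid = config.get(VGTID_KEY);
        if (vgtid != null) {
            return Optional.of(Map.entry(VGTID_KEY, vgtid));
        }
        String gtid = config.get(DEPRECATED_GTID_KEY);
        if (gtid != null) {
            LOGGER.warning("Option '" + DEPRECATED_GTID_KEY + "' is deprecated, use '" + VGTID_KEY + "' instead");
            return Optional.of(Map.entry(DEPRECATED_GTID_KEY, gtid));
        }
        return Optional.empty();
    }
}
```

With both keys set, the new one takes precedence and no warning is logged.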
jpechane left a comment
@twthorn Thanks for the PR. I left a few minor comments. One question for you: I assumed that incomplete snapshots would be resumed automatically, so I'd expect additional data to be written to or read from the offsets. Is that the case?
Also, for testing, there should be a test in which the connector is started, stopped in the middle of a snapshot, and then started again. There should be no lost or duplicate records.
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
Could you please place this dependency version into dependencyManagement and configure it via a property?
I'd also recommend aligning the version with the one pulled in by the gRPC client.
import binlogdata.Binlogdata;

public class TablePrimaryKeys {
Could you please add JavaDoc to this class?
public static final List<String> EMPTY_GTID_LIST = List.of(Vgtid.EMPTY_GTID);
public static final List<String> DEFAULT_GTID_LIST = List.of(Vgtid.CURRENT_GTID);
public static final String DEFAULT_GTID = Vgtid.CURRENT_GTID;
Should this be moved to the Vgtid class?
Correct, there will be additional data read/written from offsets. There is now an additional field Also worth noting, for rolling forward, this works, and I added a test showing this (we can read offsets that lack the field
@Test
public void testSnapshotLargeTable() throws Exception {
    TestHelper.executeDDL("vitess_create_tables.ddl");
@twthorn Could you please store all the records captured and assert at the end that all are present and there are neither gaps nor duplicates?
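One possible shape for that check, as a sketch only. It assumes the test table uses consecutive numeric ids starting at 1 and that the test has already collected the id of every captured record; `SnapshotRecordCheck` and `findProblems` are hypothetical names, not existing test helpers:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for a test; assumes ids run from 1 to expectedCount.
public class SnapshotRecordCheck {

    /** Returns a list of problems; an empty list means every id was captured exactly once. */
    static List<String> findProblems(List<Long> capturedIds, long expectedCount) {
        List<String> problems = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        for (long id : capturedIds) {
            if (id < 1 || id > expectedCount) {
                problems.add("unexpected id " + id);
            }
            else if (!seen.add(id)) {
                // Set.add returns false when the id was already recorded.
                problems.add("duplicate id " + id);
            }
        }
        for (long id = 1; id <= expectedCount; id++) {
            if (!seen.contains(id)) {
                problems.add("missing id " + id);
            }
        }
        return problems;
    }
}
```

The test would then assert at the end that the returned list is empty, which reports all gaps and duplicates at once instead of failing on the first one.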
@twthorn LGTM. I left one comment requesting a hardening of a test. Otherwise we are good to go.
@jpechane Thanks for the changes! Please let me know if there's anything else I can do on the PR
@twthorn Applied, thanks a lot
Add automatic retry for snapshots.
Automatic Retry
As detailed in the table copy RFC, the last PK value in the VGTID is used for resuming partially completed snapshots. We use this for our automatic retries and parse this additional information, which Vitess now sends in the latest version (i.e., including fixes like this).
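The resume-from-last-PK idea can be illustrated with a small simulation. This is not the connector's code and does not use the Vitess API: the table is a sorted in-memory map, the stream failure is simulated, and `ResumableCopySketch` and its parameters are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Simulation only: an in-memory sorted map stands in for the table being copied.
public class ResumableCopySketch {

    /**
     * Copies all primary keys in order, failing after rowsPerAttempt rows on each attempt
     * (if rows remain) and retrying from the last PK that was emitted.
     */
    static List<Long> copyWithRetry(NavigableMap<Long, String> table, int rowsPerAttempt, int maxAttempts) {
        List<Long> emitted = new ArrayList<>();
        Long lastPk = null; // stand-in for the last PK stored alongside the VGTID
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Resume strictly after the last PK that was already emitted.
                NavigableMap<Long, String> remaining = lastPk == null ? table : table.tailMap(lastPk, false);
                int copied = 0;
                for (Long pk : remaining.keySet()) {
                    if (copied == rowsPerAttempt) {
                        throw new IllegalStateException("simulated stream failure");
                    }
                    emitted.add(pk);
                    lastPk = pk;
                    copied++;
                }
                return emitted; // the copy finished without interruption
            }
            catch (IllegalStateException e) {
                // Fall through and retry; lastPk marks where the next attempt starts.
            }
        }
        throw new IllegalStateException("snapshot did not complete in " + maxAttempts + " attempts");
    }
}
```

Because each retry starts strictly after the last emitted PK, rows are neither skipped nor emitted twice in this simulation.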
One implementation decision I debated was whether to switch all of our VGTID parsing logic to the provided protobuf JSON converters. This would simplify our codebase, but it would also couple the Debezium logic more tightly to Vitess/protobuf, so I decided against it.
GTID -> VGTID config
Previous configs had the option of vitess.gtid, which was originally added when vitess.shard only allowed a single shard. With the changes to support multiple shards back in #135, we made GTID a CSV as well. To easily test the automatic retry snapshot behavior, it would be useful to also specify the last PK values in the config. Additionally, the GTID CSV adds unnecessary complexity by supporting another format for specifying GTID(s). To resolve both issues, I changed this to a single vitess.vgtid config, a JSON-parseable string that we use to initialize our VGTID for a VStream. Since a VGTID is an array of keyspace/shard/gtid/lastpks, it is far more versatile: the user can specify the GTID(s) for any subset of shards, and even the PKs from which to resume snapshots. This is what is used to test the snapshot resume behavior in the integration tests.
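As a rough illustration of what such a value might look like, the sketch below builds a JSON array with one entry per shard. The JSON key names (`keyspace`, `shard`, `gtid`) are assumed from the description above, the last-PK part is left out because its exact shape is not shown here, and `VgtidConfigSketch` is a hypothetical helper:

```java
import java.util.List;

// Hypothetical helper; key names are assumptions and last PKs are omitted.
public class VgtidConfigSketch {

    /** Builds one array entry. Values are assumed to contain no quotes or backslashes. */
    static String shardGtid(String keyspace, String shard, String gtid) {
        return String.format("{\"keyspace\":\"%s\",\"shard\":\"%s\",\"gtid\":\"%s\"}", keyspace, shard, gtid);
    }

    /** Joins the per-shard entries into the JSON array used as the config value. */
    static String vgtid(List<String> shardGtids) {
        return "[" + String.join(",", shardGtids) + "]";
    }
}
```

For example, `vgtid(List.of(shardGtid("ks", "-80", "current"), shardGtid("ks", "80-", "current")))` produces a two-shard array that could be passed as the vitess.vgtid value, with each shard carrying its own starting position.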