Skip to content

Conversation

@rdblue
Copy link

@rdblue rdblue commented Jan 27, 2020

This updates the cherry-pick addition with a few changes. I took a closer look at the commit logic to make sure it was correct and found that I had previously made some bad recommendations. That's why I thought it would be easier for me to fix it and open a PR.

One problem was that SnapshotManager was keeping a TableMetadata instance and a Table instance and using those for validation. That's not a good practice because we should assume that the table metadata is changing. Table was passed in because I wanted to keep the SnapshotUtil API in terms of Table and not TableMetadata or TableOperations, but I think it wasn't worth it because it confuses validation logic. Instead, I've moved the ancestor methods into SnapshotManager. We can generalize this later if we decide to allow SnapshotUtil methods to accept TableMetadata.

Another problem was that apply was validating changes and then calling super.apply for cherry-pick operations. In super.apply, the table metadata is refreshed, so it could change between validating the current state and applying the changes. To fix this, I added apply(TableMetadata) that is called by SnapshotProducer.apply so that the validation is always done on the correct version.

This also catches the case where a cherry-pick is actually a fast-forward and uses that commit instead.

Last, I made some minor modifications to TableMetadata to minimize the overall changes.

return super.apply();
if (base.currentSnapshot().snapshotId() == base.snapshot(targetSnapshotId).parentId()) {
// the snapshot to cherrypick is already based on the current state: fast-forward
validate(base);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how many times do you reckon validate(base) would be called per snapshot commit? I'm asking coz it's an expensive operations, validate is θ(2N) where N is # ancestors, which can grow monotonically.

Copy link
Collaborator

@prodeezy prodeezy Jan 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR only calls validate(base) once for each commit. So your PR doesn't add any further complexity to what we already had earlier in the original changes. Although, i'm just pondering on how this can impact commit performance more generally. (this doesn't have to be addressed in this pr)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to be concerned about performance here. Tables usually have under a few thousand snapshots for us, and we have very frequent commits and a 7 day retention window. Even if we process a few thousand snapshots a couple times, it's not going to be much of a delay because these are all in memory.

Also, we have to process the snapshots for each new version because the set of valid snapshots may have changed. Every time the commit fails, this will retry and validate the latest table state, which is unavoidable. In the best case, that will happen just once, but it could be up to the number of retries configured for the table.

Copy link
Collaborator

@prodeezy prodeezy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some concerns around average complexity based on how often validate(base) would be called but lgtm otherwise. Thanks for catching and addressing this!

Copy link
Owner

@rominparekh rominparekh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue : I like the simplicity of the code change. Using TableMetadata from TableOperations was nice touch!

updated = base.replaceCurrentSnapshot(newSnapshot);
}

if (updated == base) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@prodeezy prodeezy merged commit 0a305c4 into rominparekh:cherrypick.snapshot Jan 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants