-
Notifications
You must be signed in to change notification settings - Fork 0
Minimize changes and fix commit logic. #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimize changes and fix commit logic. #2
Conversation
| return super.apply(); | ||
| if (base.currentSnapshot().snapshotId() == base.snapshot(targetSnapshotId).parentId()) { | ||
| // the snapshot to cherrypick is already based on the current state: fast-forward | ||
| validate(base); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how many times do you reckon validate(base) would be called per snapshot commit? I'm asking coz it's an expensive operations, validate is θ(2N) where N is # ancestors, which can grow monotonically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR only calls validate(base) once for each commit. So your PR doesn't add any further complexity to what we already had earlier in the original changes. Although, i'm just pondering on how this can impact commit performance more generally. (this doesn't have to be addressed in this pr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to be concerned about performance here. Tables usually have under a few thousand snapshots for us, and we have very frequent commits and a 7 day retention window. Even if we process a few thousand snapshots a couple times, it's not going to be much of a delay because these are all in memory.
Also, we have to process the snapshots for each new version because the set of valid snapshots may have changed. Every time the commit fails, this will retry and validate the latest table state, which is unavoidable. In the best case, that will happen just once, but it could be up to the number of retries configured for the table.
prodeezy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some concerns around average complexity based on how often validate(base) would be called but lgtm otherwise. Thanks for catching and addressing this!
rominparekh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue : I like the simplicity of the code change. Using TableMetadata from TableOperations was nice touch!
| updated = base.replaceCurrentSnapshot(newSnapshot); | ||
| } | ||
|
|
||
| if (updated == base) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This updates the cherry-pick addition with a few changes. I took a closer look at the commit logic to make sure it was correct and found that I had previously made some bad recommendations. That's why I thought it would be easier for me to fix it and open a PR.
One problem was that
SnapshotManagerwas keeping aTableMetadatainstance and aTableinstance and using those for validation. That's not a good practice because we should assume that the table metadata is changing.Tablewas passed in because I wanted to keep theSnapshotUtilAPI in terms ofTableand notTableMetadataorTableOperations, but I think it wasn't worth it because it confuses validation logic. Instead, I've moved the ancestor methods intoSnapshotManager. We can generalize this later if we decide to allowSnapshotUtilmethods to acceptTableMetadata.Another problem was that
applywas validating changes and then callingsuper.applyfor cherry-pick operations. Insuper.apply, the table metadata is refreshed, so it could change between validating the current state and applying the changes. To fix this, I addedapply(TableMetadata)that is called bySnapshotProducer.applyso that the validation is always done on the correct version.This also catches the case where a cherry-pick is actually a fast-forward and uses that commit instead.
Last, I made some minor modifications to
TableMetadatato minimize the overall changes.