Spark: Fix Fast forward procedure output for non-main branches #8854
  table -> {
-   long currentRef = table.currentSnapshot().snapshotId();
+   long currentRef = table.snapshot(source).snapshotId();
    table.manageSnapshots().fastForwardBranch(source, target).commit();
Another thing to fix: I think when I originally implemented the fastForward operation I switched the definitions of source/target in my head, so the API is confusing (in the API, target means the branch that is actually being fast forwarded, and source is where target will be moved). That should probably be reversed, so that target is actually the target to which source will be moved.
I think we should be able to safely just rename the parameters in the API and update the javadoc. Luckily both are String parameters.
cc @rdblue
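To make the naming concern concrete, here is a minimal, self-contained sketch that does not use Iceberg. `LinearHistory` and everything in it is hypothetical; the parameter names `branch` and `to` are borrowed from the procedure's named arguments, and the check is a simplified stand-in for a real ancestor check on a single linear history. It only shows why argument order matters when both parameters are plain Strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of one linear history; NOT the Iceberg API. All names are hypothetical.
class LinearHistory {
  private final List<Long> snapshots;                          // snapshot ids, oldest first
  private final Map<String, Integer> heads = new HashMap<>();  // branch name -> index into snapshots

  LinearHistory(List<Long> snapshots) {
    this.snapshots = snapshots;
  }

  void setBranch(String name, int index) {
    heads.put(name, index);
  }

  long head(String name) {
    return snapshots.get(heads.get(name));
  }

  // Reads as "fast forward `branch` to `to`": `branch` is the ref that moves.
  void fastForwardBranch(String branch, String to) {
    int from = heads.get(branch);
    int dest = heads.get(to);
    if (from > dest) {
      // On a linear history, being ahead means `branch` is not an ancestor of `to`.
      throw new IllegalStateException(
          "Cannot fast-forward: " + branch + " is ahead of " + to);
    }
    heads.put(branch, dest);
  }
}
```

With main at the oldest snapshot and audit at the newest, `fastForwardBranch("main", "audit")` moves main forward, while the swapped call `fastForwardBranch("audit", "main")` compiles just as well but is rejected at runtime. Nothing in the types catches the swap, so the parameter names and javadoc carry the meaning.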
I too got confused today when I compared the replace branch API with this procedure's variables. Source and target are reversed.
The procedure's named arguments 'branch' and 'to' are proper. It reads like: fast forward branch x to y.
The only thing is that the internal variables in this procedure are reversed. I think we can rename them in this PR.
ManageSnapshots.replaceBranch and ManageSnapshots.fastForwardBranch seem to have correct naming IMO.
I also agree it is confusing. It would be nice to rename in a follow-up PR or here. It seems like source represents a branch name in this case.
I put up a separate PR here #9134
also cc @rakesh-das08 let me know your thoughts on this fix!
@amogh-jahagirdar the fix LGTM. Thanks for fixing this.
...nsions/src/test/java/org/apache/iceberg/spark/extensions/TestFastForwardBranchProcedure.java
  tableIdent,
  table -> {
-   long currentRef = table.currentSnapshot().snapshotId();
+   long currentRef = table.snapshot(source).snapshotId();
There is also a recent issue reported for fast forward on an empty branch: #8849.
I have analyzed it, but it looks clumsy if we try to support a dummy snapshot. Let me know what you guys think.
Right, actually I was investigating that issue, and when going through the procedure code I just noticed this issue :) On the dummy snapshot idea, I need to think more. I think that idea has been floated around a few times and it logically makes sense; I just don't know all the implications of that change.
Yeah, I also couldn't reach a conclusion on the implications. Until now, if the snapshot id is -1, we have assumed that it is an empty table or that the table was just created.
We also need a dummy snapshot id so that the ancestor check passes for fast forward operations.
Nessie uses a constant default hash for an empty branch to handle this kind of ancestor problem. Maybe we need to introduce something like that.
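Purely as an illustration of the sentinel idea, and not a description of how Iceberg or Nessie implement anything, a sketch of an ancestor check that treats "no snapshot yet" as the ancestor of every snapshot could look like this. `SentinelAncestry`, `NO_SNAPSHOT`, and `addSnapshot` are hypothetical names; the -1 value follows the convention mentioned above, and the history is assumed to be acyclic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; NOT how Iceberg or Nessie implement ancestry.
class SentinelAncestry {
  // Assumed marker for "this branch has no snapshot yet".
  static final long NO_SNAPSHOT = -1L;

  // child snapshot id -> parent snapshot id (NO_SNAPSHOT for a root snapshot)
  private final Map<Long, Long> parents = new HashMap<>();

  void addSnapshot(long id, long parentId) {
    parents.put(id, parentId);
  }

  // Walks parent links from `descendant` upward looking for `ancestor`.
  boolean isAncestor(long ancestor, long descendant) {
    if (ancestor == NO_SNAPSHOT) {
      // An empty branch is treated as the common root of every history.
      return true;
    }
    long cursor = descendant;
    while (cursor != NO_SNAPSHOT) {
      if (cursor == ancestor) {
        return true;
      }
      cursor = parents.getOrDefault(cursor, NO_SNAPSHOT);
    }
    return false;
  }
}
```

Under this assumption, fast forwarding an empty branch would pass the ancestor check trivially. The sketch says nothing about the wider implications raised in this thread, such as every other code path that interprets -1 as an empty table.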
I had thought of this case where we could introduce a dummy snapshot, but as you mentioned, it did not look like an elegant solution. And with this PR: #7652, I basically just fall back to the underlying replace operation to throw an appropriate exception.
Logged the flaky test: #8855
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/FastForwardBranchProcedure.java
8fb30d0 to bcd7722
nastra left a comment
Overall LGTM once checkstyle/CI is happy. I would probably also use fast-forward in the error msg.
bcd7722 to cb6a841
Thanks for the reviews @nastra @rakesh-das08 @ajantha-bhat @aokolnychyi! I'm going to merge this to keep the fix focused on the procedure output for non-main branches. There are still 2 pending related items: 1.) Fixing the naming in the replace/fastForward APIs: That PR is here https://github.com/apache/iceberg/pull/9134/files For 2, I do agree it's an awkward experience that should be fixed (after all many folks would write to audit first before the existence of main). I noticed that recently there was a
Currently the output of the Spark fast forward procedure always reports currentSnapshot before and after the fast forward operation. However, this is not correct when the branch being fast forwarded is not main. This change updates the output to always report the before and after snapshots of the actual branch being fast forwarded.
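The difference can be sketched with a small in-memory model that does not use Iceberg. `ToyTable`, its fields, and its methods are hypothetical stand-ins for a table, its refs, and the procedure; only the idea of reading the snapshot of the branch being fast forwarded, rather than the current snapshot of main, comes from this change.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model; NOT the Iceberg API. All names here are hypothetical.
class ToyTable {
  // branch name -> snapshot id at the head of that branch
  final Map<String, Long> refs = new HashMap<>();
  // snapshot id -> parent snapshot id (no entry for a root snapshot)
  final Map<Long, Long> parents = new HashMap<>();

  // Stand-in for "current snapshot": always the head of main.
  long currentSnapshotId() {
    return refs.get("main");
  }

  // True if `ancestor` is reachable from `descendant` by following parent links.
  boolean isAncestor(long ancestor, long descendant) {
    Long cursor = descendant;
    while (cursor != null) {
      if (cursor == ancestor) {
        return true;
      }
      cursor = parents.get(cursor);
    }
    return false;
  }

  // Moves `branch` to the head of `to` and returns {before, after} for `branch` itself.
  long[] fastForward(String branch, String to) {
    long before = refs.get(branch);
    long destination = refs.get(to);
    if (!isAncestor(before, destination)) {
      throw new IllegalArgumentException(
          "Cannot fast-forward: " + branch + " is not an ancestor of " + to);
    }
    refs.put(branch, destination);
    return new long[] {before, refs.get(branch)};
  }
}
```

For example, with main at snapshot 1, audit at snapshot 2 (child of 1), and staging at snapshot 3 (child of 2), fast forwarding audit to staging yields before = 2 and after = 3. `currentSnapshotId()` is 1 both before and after the operation, which is the kind of misleading output the old behavior would produce for a non-main branch.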