Spark: REPLACE BRANCH SQL implementation #6638
| ALTER TABLE multipartIdentifier CREATE BRANCH identifier (AS OF VERSION snapshotId)? (RETAIN snapshotRefRetain snapshotRefRetainTimeUnit)? (snapshotRetentionClause)? #createBranch
| ALTER TABLE multipartIdentifier REPLACE BRANCH identifier (AS OF VERSION snapshotId)? (RETAIN snapshotRefRetain snapshotRefRetainTimeUnit)? (snapshotRetentionClause)? #replaceBranch
After writing this out, I'm starting to change my mind :) Maybe we could combine {CREATE | REPLACE} in the same definition, if that's possible?
Agreed. I think in this case there are a few arguments for combining them:
- The syntax is exactly the same, and adding REPLACE does not complicate the logic too much
- CREATE/REPLACE TABLE follows a similar pattern
- At the API level, we also have something similar, such as createOrReplaceTableTransaction
    return Nil
  }

  manageSnapshots.createBranch(branch, snapshotId)
Looks like we are implicitly letting the API throw the exception when the branch already exists. That's fine with me, but I would like to know what other people think.
That's right, it's implicit since the API provides those guarantees. We could also throw our own exception before calling createBranch, but I'll let @hililiwei @yyanyy @rdblue provide their thoughts.
sounds good to me
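To make the option agreed on in this thread concrete, here is a minimal, self-contained Java sketch. ToyBranchRefs and createOrReplace are hypothetical stand-ins, not Iceberg's ManageSnapshots API or the actual exec node. The only behavior taken from the discussion is that creating an existing branch fails inside the API; the failure when replacing a missing branch is an assumption made for symmetry.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a table's branch references; NOT the Iceberg API.
final class ToyBranchRefs {
  private final Map<String, Long> refs = new HashMap<>();

  // As discussed above: the API itself rejects creating an existing branch.
  void createBranch(String name, long snapshotId) {
    if (refs.containsKey(name)) {
      throw new IllegalArgumentException("Ref " + name + " already exists");
    }
    refs.put(name, snapshotId);
  }

  // Assumption: replacing requires the branch to exist and only moves its pointer.
  void replaceBranch(String name, long snapshotId) {
    if (!refs.containsKey(name)) {
      throw new IllegalArgumentException("Branch does not exist: " + name);
    }
    refs.put(name, snapshotId);
  }

  Long snapshotOf(String name) {
    return refs.get(name);
  }

  // Hypothetical exec-layer helper: one code path for CREATE and REPLACE.
  // It performs no pre-check of its own, so any API exception propagates to the caller.
  static void createOrReplace(ToyBranchRefs refs, boolean replace, String name, long snapshotId) {
    if (replace) {
      refs.replaceBranch(name, snapshotId);
    } else {
      refs.createBranch(name, snapshotId);
    }
  }
}
```

With this shape, a second CREATE of the same branch surfaces the API's own exception, which is the "implicit" behavior described above. Throwing a custom exception would mean adding a containsKey-style check in createOrReplace before the call.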
jackye1995
left a comment
Mostly looks good to me, just one question about creating a branch that already exists. @hililiwei @yyanyy @rdblue any thoughts?
flyrain
left a comment
LGTM. Thanks for working on it.
Not directly related to this PR, and not a blocker: do we need a way to delete a branch? I didn't go through all the design details, but I assume this is how a branch is deleted now: we rely on the retention setting to keep the branch, and a branch will be cleaned up by the expireSnapshots procedure once it has expired. Could you confirm?
Co-authored-by: liliwei [email protected] Co-authored-by: xuwei [email protected] Co-authored-by: chidayong [email protected]
Thanks for the review @flyrain, really appreciate it! There are a few operations:
1.) replaceBranch: Replacing a branch changes the snapshot that the branch points to. Nothing is removed here. The same principle applies to replaceTag.
2.) The removeBranch API (DROP BRANCH in SQL): This removes the branch from the references in metadata. Once it is committed, the reference is removed immediately.
You're correct that a branch will be cleaned up during snapshot expiration if it is past the reference retention age.
I'm also working on a PR for a dedicated branching/tagging doc page, which has some details: #6723. I'd appreciate any feedback on that as well if anything is unclear!
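The three ways a branch reference can change or disappear can be sketched with a small, self-contained Java model. ToyRefLifecycle, Ref, setBranch, and expireRefs are hypothetical names for illustration, not Iceberg classes. The model also assumes that a reference's age is measured from the timestamp of the snapshot it points to, and that a replace keeps the existing retention value.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of branch references and retention; NOT Iceberg's implementation.
final class ToyRefLifecycle {
  // Hypothetical record of a branch: target snapshot, that snapshot's timestamp,
  // and the maximum age the reference may reach before expiration drops it.
  static final class Ref {
    final long snapshotId;
    final long snapshotTimestampMs;
    final long maxRefAgeMs;

    Ref(long snapshotId, long snapshotTimestampMs, long maxRefAgeMs) {
      this.snapshotId = snapshotId;
      this.snapshotTimestampMs = snapshotTimestampMs;
      this.maxRefAgeMs = maxRefAgeMs;
    }
  }

  private final Map<String, Ref> refs = new HashMap<>();

  void setBranch(String name, Ref ref) {
    refs.put(name, ref);
  }

  // replaceBranch: repoint an existing branch; no reference is removed.
  void replaceBranch(String name, long snapshotId, long snapshotTimestampMs) {
    Ref old = refs.get(name);
    if (old == null) {
      throw new IllegalArgumentException("Branch does not exist: " + name);
    }
    refs.put(name, new Ref(snapshotId, snapshotTimestampMs, old.maxRefAgeMs));
  }

  // removeBranch (DROP BRANCH in SQL): the reference disappears immediately.
  void removeBranch(String name) {
    refs.remove(name);
  }

  // Snapshot expiration: drop references that are past their retention age.
  void expireRefs(long nowMs) {
    refs.values().removeIf(r -> nowMs - r.snapshotTimestampMs > r.maxRefAgeMs);
  }

  boolean hasBranch(String name) {
    return refs.containsKey(name);
  }

  Long snapshotOf(String name) {
    Ref ref = refs.get(name);
    return ref == null ? null : ref.snapshotId;
  }
}
```

In this model, replaceBranch never shrinks the set of references, removeBranch takes effect as soon as it is called, and expireRefs only removes a branch once its age exceeds the retention value.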
Looks like we have enough votes and all comments are addressed. I will go ahead and merge this, and we can address further comments in subsequent PRs like #6637. Thanks @amogh-jahagirdar and @hililiwei for the work, and thanks @yyanyy and @flyrain for the review!
Thanks for the reviews @flyrain @jackye1995 @yyanyy @hililiwei!
CC: @jackye1995 @hililiwei @flyrain