Spark3.4: Throw a friendly exception if table is empty when creating branch #7593
if (iceberg.table.currentSnapshot() == null) {
  throw new UnsupportedOperationException(s"The Iceberg table: $iceberg" + " has no snapshots")
}
I think the code below, iceberg.table.currentSnapshot().snapshotId, always assumes that currentSnapshot is not null, which is not a safe chaining case.
We probably need a way to properly handle that instead of branching out and throwing a new exception. FYI @amogh-jahagirdar
I'm not very familiar with the design of Iceberg snapshots. I found many annoying snapshot == null checks in the code; maybe the issue would go away if every new table had a default snapshot. Just my immature thought. :)
amogh-jahagirdar
left a comment
Thanks for identifying this issue and raising this PR @zhangbutao! Have some comments on the fix
if (iceberg.table.currentSnapshot() == null) {
  throw new UnsupportedOperationException(s"The Iceberg table: $iceberg" + " has no snapshots")
}
We definitely should throw a better exception here, thanks for identifying this issue! A few points:
1.) I don't think the fix is correct, since it is a valid case that there is no main branch (so table.currentSnapshot() is null) and the user specifies an explicit snapshot to create the branch from some other non-main branch. We don't want to throw unnecessarily.
So the commit graph looks something like:
(empty main table state)
S4 (Branch "b")
/
S1 -> S2 -> S3 (Branch "a")
Creating branch b should still succeed, but with this implementation we'll fail unnecessarily.
In the core library, invalid snapshots are prevented here: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1181
This case is a bit different, though, because in the Spark procedure we default to the latest main snapshot if a snapshot is not specified. I think one simple way forward is to assign snapshotId to a Long, which can be null when table.currentSnapshot() is null. Then, before the API call to create the branch, throw our own exception if no snapshot was specified and the current snapshot is null.
2.) UnsupportedOperationException is not the right exception, imo. I think we should be throwing an IllegalArgumentException to indicate that there is no latest main snapshot, and that the user should specify an explicit snapshot.
3.) Could we keep the PR focused on 3.4 and also add unit tests to cover this case? We can backport later and address any other DDLs that are impacted.
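A minimal Java sketch of the resolution order described in points 1 and 2, assuming a hypothetical helper named BranchSnapshotResolver that is not part of Iceberg. It prefers an explicitly specified snapshot, falls back to the latest main snapshot, and only throws IllegalArgumentException when neither exists. The snapshot IDs are modeled as nullable Long values rather than real Iceberg Snapshot objects.

```java
// Hypothetical stand-in for the Spark procedure's snapshot defaulting; not Iceberg's actual API.
final class BranchSnapshotResolver {
  private BranchSnapshotResolver() {}

  /**
   * @param tableName             name used only in the error message
   * @param explicitSnapshotId    snapshot ID given by the user, or null if omitted
   * @param currentMainSnapshotId ID of the latest main snapshot, or null if main is empty
   * @return the snapshot ID the new branch should point to
   */
  static long resolve(String tableName, Long explicitSnapshotId, Long currentMainSnapshotId) {
    // An explicit snapshot wins, even when main has no snapshots (the branch "b" case).
    if (explicitSnapshotId != null) {
      return explicitSnapshotId;
    }
    // Otherwise default to the latest main snapshot, if there is one.
    if (currentMainSnapshotId != null) {
      return currentMainSnapshotId;
    }
    // Nothing to branch from: ask the user for an explicit snapshot.
    throw new IllegalArgumentException(
        "Please specify an explicit snapshot: table " + tableName + " has no latest main snapshot");
  }
}
```

With this ordering, resolve("t", 4L, null) returns 4 for a table whose main branch is empty, while resolve("t", null, null) is the only call that throws.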
@amogh-jahagirdar Thanks for the detailed explanation! I might prefer to throw UnsupportedOperationException if the user does not specify the snapshot and the latest main snapshot is null.
But if you think it is more reasonable to pass a null snapshotId to this API https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1181, I'm happy to make further changes.
I have refined the PR. Please take a look again if you have a chance.
dc69de1 to da9a19a
dramaticlly
left a comment
Based on what @amogh-jahagirdar suggested, it looks like we want the failure/validation to happen in the Java code instead of Scala, so I suggested a change that passes snapshotId through instead of throwing an exception.
var snapshotId = branchOptions.snapshotId.getOrElse(-1L)
if (snapshotId == -1) {
  val currentSnapshot = Option(iceberg.table().currentSnapshot()).getOrElse(throw new IllegalArgumentException(
    s"Please specify an explicit snapshot as table: $iceberg" + " has no latest main snapshot"))
  snapshotId = currentSnapshot.snapshotId()
}
- var snapshotId = branchOptions.snapshotId.getOrElse(-1L)
- if (snapshotId == -1) {
-   val currentSnapshot = Option(iceberg.table().currentSnapshot()).getOrElse(throw new IllegalArgumentException(
-     s"Please specify an explicit snapshot as table: $iceberg" + " has no latest main snapshot"))
-   snapshotId = currentSnapshot.snapshotId()
- }
+ val snapshotId: java.lang.Long = branchOptions.snapshotId
+   .orElse(Option(iceberg.table.currentSnapshot()).map(_.snapshotId()))
+   .map(java.lang.Long.valueOf)
+   .orNull
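For readers more at home in Java, here is a rough analogue of the pass-through idea in the suggestion above. The class name SnapshotIdPassThrough and its method are hypothetical and exist only for illustration. The point is that the resolved ID may legitimately be null, and the sketch leaves it to whatever consumes the value to validate it.

```java
import java.util.Optional;

// Hypothetical illustration of the pass-through idea; not code from the PR or from Iceberg.
final class SnapshotIdPassThrough {
  private SnapshotIdPassThrough() {}

  /**
   * Returns the explicit snapshot ID if present, otherwise the current main
   * snapshot ID, otherwise null. No exception is thrown here.
   */
  static Long resolveOrNull(Optional<Long> explicitSnapshotId, Optional<Long> currentMainSnapshotId) {
    if (explicitSnapshotId.isPresent()) {
      return explicitSnapshotId.get();
    }
    // orElse(null) mirrors Scala's orNull: an empty Optional becomes a null reference.
    return currentMainSnapshotId.orElse(null);
  }
}
```

The trade-off is that the caller gets no error at this point, so the quality of the message depends entirely on the downstream validation.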
  }

  private Table createEmptyTable() {
    return validationCatalog.loadTable(tableIdent);
This does not create a table; it loads an existing table from the catalog.
I think empty table creation is already handled in the before method at line 43, so this method is not needed at all.
Actually I just realized that we also need to fix its counterpart in CreateTag, so I ended up just doing it in #7652
@dramaticlly Thanks for the fix. I am ok to close this PR if your change is merged. :)
Create a new empty Iceberg table using Spark3.3:
create table testicespark(id int) using iceberg;
Create a branch on the empty table:
alter table hivedblake.testicespark create branch "branch1";
NullPointerException
I think that if the table is empty, we should throw a friendly exception instead of a NullPointerException when creating a branch.