Spark 3.3: support version travel by reference name #6575
@@ -37,6 +37,7 @@
 import org.apache.iceberg.HasTableOperations;
 import org.apache.iceberg.MetadataTableType;
 import org.apache.iceberg.Schema;
+import org.apache.iceberg.SnapshotRef;
 import org.apache.iceberg.Transaction;
 import org.apache.iceberg.catalog.Catalog;
 import org.apache.iceberg.catalog.Namespace;
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {
+        return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      } catch (NumberFormatException e) {
+        SnapshotRef ref = sparkTable.table().refs().get(version);
+        ValidationException.check(
+            ref != null,
+            "Cannot find matching snapshot ID or reference name for version " + version);
+        return sparkTable.copyWithSnapshotId(ref.snapshotId());
Member
It always uses the latest commit from the reference. So along with existing of But whether to use '@' or some other syntax has long been an open point that @rdblue wanted to conclude. Nessie SQL syntax for reference:
Member
Never mind.
Contributor
Author
I think one thing that might be useful is to time travel in a branch, something like
+      }
 
     } else if (table instanceof SparkChangelogTable) {
       throw new UnsupportedOperationException("AS OF is not supported for changelogs");
I'm not sure this can come up, but do we allow version tags to be SnapshotIds?
Like can I tag snapshot 2 to be known as 1?
Weird edge case so I don't think we really need to handle it, just thinking if this is a potential issue with the lookup code here

Currently there are no restrictions on what references can be named. For the lookup code, I think we should always be able to differentiate between snapshot ID and ref since for refs it will be in a quoted identifier, and should always fail the Long.parseLong() with a NumberFormatException. So the current implementation seems good to me.
But that's just me reading the code :), I think it's worth having a unit test just for this case to give us the confidence that it works as expected in this scenario. cc @jackye1995 let me know your thoughts

I added a test case specifically for this. Unlike Trino, Spark directly ignores the type of the VERSION AS OF, so if a tag name matches exactly the snapshot ID, then snapshot ID is always chosen.
I think this is an okay limitation, because people can work around it by adding some text like snapshot-123456890 as the tag name. But we should make it very clear in documentation.

Yeah I don't want this to be a blocker, just something to take note of.
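
For anyone reading this thread later, here is a minimal standalone sketch of the precedence discussed above. It is not the Iceberg or Spark API: the class name VersionResolutionSketch, the resolve method, and the plain Map standing in for table.refs() are all hypothetical, and the sketch throws IllegalArgumentException where the PR uses ValidationException.check. It only mirrors the try/catch order in the diff, where a version string that parses as a long is always treated as a snapshot ID.

```java
import java.util.Map;

// Hypothetical stand-in for the lookup added to loadTable(Identifier, String).
// Nothing here is part of Iceberg; the Map replaces table.refs() for illustration.
final class VersionResolutionSketch {

  private VersionResolutionSketch() {}

  // Resolves a VERSION AS OF string to a snapshot ID.
  // refs maps a reference (branch or tag) name to the snapshot ID it points at.
  static long resolve(String version, Map<String, Long> refs) {
    try {
      // A string that parses as a long is always taken as a snapshot ID,
      // even if a reference with the same name exists.
      return Long.parseLong(version);
    } catch (NumberFormatException e) {
      // Only non-numeric strings fall through to the reference lookup.
      Long snapshotId = refs.get(version);
      if (snapshotId == null) {
        throw new IllegalArgumentException(
            "Cannot find matching snapshot ID or reference name for version " + version);
      }
      return snapshotId;
    }
  }
}
```

With refs containing "1" -> 2 and "snapshot-1" -> 2, resolve("1", refs) returns 1 because the numeric parse wins, while resolve("snapshot-1", refs) returns 2 through the reference lookup. This is the edge case raised above and the reason a non-numeric tag name is the suggested workaround.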