Core: Assign main branch to table's current snapshot if there is no main but there is a current table state #4922
Tests are failing, definitely related to this change. Looking into it.
      int numRetries) {
    refreshFromMetadataLocation(newLocation, shouldRetry, numRetries,
-       metadataLocation -> TableMetadataParser.read(io(), metadataLocation));
+       metadataLocation -> TableMetadata.buildFromLocation(io(), metadataLocation).build());
I'm not sure about this. Here we parse the metadata and then copy the structures over again in the builder. But ultimately, when we read, we need to go through a single point in the builder that sets main if it doesn't exist and there is a current snapshot. I don't think just setting the ref when parsing would work, because it needs to be encapsulated in a metadata update even for operations which don't produce snapshots. @rdblue thoughts?
I don't think we should change the metadata after reading it, so it is probably best to fix this up in the TableMetadata constructor rather than in the builder, which is used to produce new TableMetadata objects.
        previousFiles, previousFileLocation, base.lastUpdatedMillis(), properties);
    List<HistoryEntry> newSnapshotLog = updateSnapshotLog(snapshotLog, snapshotsById, currentSnapshotId, changes);

    if (refs.isEmpty() && currentSnapshotId != -1) {
I will add tests for this if we decide that setting main here is the right way to go.
+1
… there is a current snapshot
Is there any update here?
Reopening this. I accidentally closed it through a different PR.
This PR is no longer needed due to #5669 which already includes the change here
As part of https://github.com/apache/iceberg/pull/4428/files#r884916051 and other PRs, a common theme has been checking whether main even exists in different API implementations. In this PR, main will always be set in a table metadata's refs. If refs don't exist, or the main ref is missing for some reason when parsing, main will be set to the table's current snapshot.
This change also allows -1 to be passed as the snapshot ID in the table metadata builder APIs, so that main can be created when there is no current table snapshot.
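To make the intended behavior concrete, here is a minimal sketch of the "fill in main when it is missing" rule. It is not the code in this PR: `MainRefSketch` and `withMainRef` are hypothetical names, and refs are simplified to a map from ref name to snapshot ID rather than Iceberg's actual ref type. It follows the `currentSnapshotId != -1` check from the diff, but keys on main being absent rather than on refs being empty, which is an assumption on my part.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Iceberg code: refs are modeled as name -> snapshot ID.
final class MainRefSketch {
  static final String MAIN_BRANCH = "main";
  static final long NO_SNAPSHOT = -1L; // assumed sentinel for "no current snapshot"

  private MainRefSketch() {
  }

  // Returns a copy of refs in which main points at the current snapshot,
  // if main was missing and the table has a current snapshot.
  static Map<String, Long> withMainRef(Map<String, Long> refs, long currentSnapshotId) {
    Map<String, Long> result = new HashMap<>(refs);
    if (!result.containsKey(MAIN_BRANCH) && currentSnapshotId != NO_SNAPSHOT) {
      result.put(MAIN_BRANCH, currentSnapshotId);
    }
    return result;
  }
}
```

With this shape, `MainRefSketch.withMainRef(new HashMap<>(), 42L)` yields a map where main points at snapshot 42, while an existing main ref is left untouched.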
I'm still mulling over the implications of this change. The benefit is that different operations can be implemented with the assumption that main exists. However, I think the real long-term solution is to produce a snapshot upon table creation and assign main at that point. In this change, we only allow -1 for main, which prevents creating a branch before a snapshot is produced on main. That is a bit awkward, but I think it is needed until we produce a snapshot upon table creation.
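The restriction described above could be expressed roughly as follows. Again, this is only a sketch under assumptions: `RefValidationSketch` and `isAllowed` are hypothetical names that do not exist in Iceberg, and -1 is assumed to be the "no snapshot" marker.

```java
// Hypothetical sketch, not Iceberg code: only main may point at "no snapshot".
final class RefValidationSketch {
  static final String MAIN_BRANCH = "main";
  static final long NO_SNAPSHOT = -1L; // assumed sentinel for "no current snapshot"

  private RefValidationSketch() {
  }

  // A ref is acceptable if it points at a real snapshot, or if it is main.
  static boolean isAllowed(String refName, long snapshotId) {
    return snapshotId != NO_SNAPSHOT || MAIN_BRANCH.equals(refName);
  }
}
```

Under this rule, `isAllowed("main", -1L)` passes, but a branch with any other name and -1 is rejected, which is exactly the awkward case of not being able to create a branch before main has a snapshot.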
@rdblue @jackye1995 let me know your thoughts! If we determine that it makes more sense to just explicitly handle the non-existing main branch cases, then I will just close this PR.