Skip to content

Conversation

@amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented May 31, 2022

As part of https://github.com/apache/iceberg/pull/4428/files#r884916051 and other PRs, a common theme has been around checking if main even exists in different API implementations. In this PR, main will always be set in a table metadata's refs. If refs don't exist or the main ref does not exist for some reason when parsing, it will be set to the table's current snapshot.

This change also allows -1 to be set on different table metadata builder APIs to allow main to be created when there is no current table snapshot.

I'm still mulling over the implications of this change, the benefit is that this change allows different operations to be implemented with the assumption that main exists. However, I think the real long term solution is that we produce a snapshot upon table creation and assign main at that point. In this change, we only allow -1 for main, but this prevents the ability to create a branch before a snapshot is produced on main, which is a bit awkward but I think is needed until we produce a snapshot upon table creation.

@rdblue @jackye1995 let me know your thoughts! If we determine that it makes more sense to just explicitly handle the non-existing main branch cases, then I will just close this PR.

@github-actions github-actions bot added the core label May 31, 2022
@amogh-jahagirdar amogh-jahagirdar changed the title Core: Assign main branch to table's current snapshot if main could not be found when parsing. TableMetadata.Builder will allow setting -1 to main Core: Assign main branch to table's current snapshot if main could not be found when parsing May 31, 2022
@amogh-jahagirdar amogh-jahagirdar changed the title Core: Assign main branch to table's current snapshot if main could not be found when parsing Core: Assign main branch to table's current snapshot if main could not be found May 31, 2022
@amogh-jahagirdar
Copy link
Contributor Author

Tests are failing, definitely related to this change. Looking into it.

int numRetries) {
refreshFromMetadataLocation(newLocation, shouldRetry, numRetries,
metadataLocation -> TableMetadataParser.read(io(), metadataLocation));
metadataLocation -> TableMetadata.buildFromLocation(io(), metadataLocation).build());
Copy link
Contributor Author

@amogh-jahagirdar amogh-jahagirdar Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know about this, here we parse the metadata and then again copy over the structures in the builder. But ultimately when we read, we need to go through a single point in the builder for setting main if it doesn't exist and there is a current snapshot. I don't think just setting the ref when parsing would work because ultimately it needs to be encapsulated in an metadata update even for operations which don't produce snapshots. @rdblue thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should change the metadata after reading it, so it is probably best to fix this up in the TableMetadata constructor rather than in the builder, which is used to produce new TableMetadata objects.

@amogh-jahagirdar amogh-jahagirdar force-pushed the set-main-snapshot branch 2 times, most recently from 16bfd7d to c4de98f Compare June 1, 2022 21:02
@amogh-jahagirdar amogh-jahagirdar changed the title Core: Assign main branch to table's current snapshot if main could not be found Core: Assign main branch to table's current snapshot if there is no main but there is a current table state Jun 1, 2022
previousFiles, previousFileLocation, base.lastUpdatedMillis(), properties);
List<HistoryEntry> newSnapshotLog = updateSnapshotLog(snapshotLog, snapshotsById, currentSnapshotId, changes);

if (refs.isEmpty() && currentSnapshotId != -1) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add tests for this if we deem setting main here is the right way to go

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@hililiwei
Copy link
Contributor

Is there any update here?

@rdblue
Copy link
Contributor

rdblue commented Jul 7, 2022

Reopening this. I accidentally closed it through a different PR.

@amogh-jahagirdar
Copy link
Contributor Author

This PR is no longer needed due to #5669 which already includes the change here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants