Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Feb 11, 2022

This updates the TableMetadata.Builder methods and replaces the previous setCurrentSnapshot with a branch-aware setBranchSnapshot.

I'm posting this for early feedback. This is the most straight-forward change, but it doesn't support modifying the ref settings like max age and it doesn't support creating/removing tags. We may want to change the builder API to be more generic and if so this is a step toward that API.

@rdblue
Copy link
Contributor Author

rdblue commented Feb 11, 2022

@amogh-jahagirdar and @jackye1995, this is a first attempt at updating the table metadata builder. It doesn't support everything yet but I want to get this up for early feedback so we can compare the most direct changes with what you were proposing on the other PRs.


class SetCurrentSnapshot implements MetadataUpdate {
private final Long snapshotId;
class RemoveSnapshotRef implements MetadataUpdate {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RemoveSnapshotRef is used for dropping a branch instead of SetCurrentSnapshot(null). Now the snapshot can be required instead of nullable.

if (!snapshotsById.containsKey(currentSnapshotId)) {
setCurrentSnapshot(null, System.currentTimeMillis());

// remove any refs that are no longer valid
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this fail if there are still refs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the current usage, we are only calling this when expiring snapshots. At that time the snapshots send in here should already be the ones that have to be removed after evaluating all retention policies. I think that's why we directly remove the refs instead of failing.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is in the scope for removeSnapshots(snapshots) to remove the references itself for any snapshots we are removing. Otherwise we would have to have the caller do a separate removeReferences and then do a removeSnapshots which seems more cumbersome.


if (SnapshotRef.MAIN_BRANCH.equals(branch)) {
this.currentSnapshotId = snapshot.snapshotId();
snapshotLog.add(new SnapshotLogEntry(lastUpdatedMillis, snapshot.snapshotId()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapshot log is only for main right now. We may want to update and add logs for branches? Probably not for tags because tags are labels that we probably don't track history for (they should be static).

@rdblue rdblue changed the title Replace setCurrentSnapshot with setBranchSnapshot in metadata builder Core: Replace setCurrentSnapshot with setBranchSnapshot in metadata builder Feb 11, 2022
snapshotLog.add(new SnapshotLogEntry(lastUpdatedMillis, snapshot.snapshotId()));
}

SnapshotRef.Builder refBuilder = SnapshotRef.branchBuilder(snapshot.snapshotId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just call SnapshotRef.builderFrom here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that ref may be null. If it is null, then we create a new one with defaults. If it is non-null, then we copy it. I'll make this a bit more clear.

@github-actions github-actions bot added the spark label Feb 14, 2022
TableMetadata.Builder update = TableMetadata.buildFrom(base);
if (base.snapshot(newSnapshot.snapshotId()) != null) {
// this is a rollback operation
update.setBranchSnapshot(newSnapshot.snapshotId(), SnapshotRef.MAIN_BRANCH);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wypoon and @sririshindra, I think this fixes the problem from #4088. The previous call to replaceCurrentSnapshot always used setCurrentSnapshot(Snapshot, String) rather than setCurrentSnapshot(long, String). This detects when the snapshot already exists and calls the correct method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense.
I cherry-picked some prerequisite commits, and then applied this change to our 0.13.0 branch, and reran @sririshindra's rollback test, and it passed.

TestHelpers.assertEqualsSafe(historyTable.schema().asStruct(), expected.get(0), actual.get(0));
TestHelpers.assertEqualsSafe(historyTable.schema().asStruct(), expected.get(1), actual.get(1));
TestHelpers.assertEqualsSafe(historyTable.schema().asStruct(), expected.get(2), actual.get(2));
TestHelpers.assertEqualsSafe(historyTable.schema().asStruct(), expected.get(3), actual.get(3));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was failing, I think due to the problem in #4088. While looking at it, I realized that the last record was not being checked as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong about the cause of the history test failure. It turns out that I called the wrong method from setBranchSnapshot(Snapshot, String).

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, just 1 nit comment, up to you if you think it's worth adding a util method for that.

snapshotLog.add(new SnapshotLogEntry(lastUpdatedMillis, snapshot.snapshotId()));
changes.add(new MetadataUpdate.SetCurrentSnapshot(snapshot.snapshotId()));

if (SnapshotRef.MAIN_BRANCH.equals(branch)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the check for main branch is done in many places, should we also have a method for ref.isMainBranch()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is that there aren't many places where we actually have a SnapshotRef instance. Here, for example, we don't necessarily have one because ref might be null. So we actually want to use the string branch name.

Below, we have a SetSnapshotRef instance, which is a change. And in places like removeBranch, we do the check before loading the ref. If there is no main ref to remove, we should still ensure that we set current-snapshot-id appropriately.

@rdblue rdblue merged commit a3a112d into apache:master Feb 15, 2022
@rdblue
Copy link
Contributor Author

rdblue commented Feb 15, 2022

Thanks for reviewing, @jackye1995 and @amogh-jahagirdar!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants