Skip to content

Conversation

@amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Aug 23, 2022

I realized as I was implementing some of the snapshot producing operations for branches, that one operation that could be handled in a straightforward manner was delete files. It also helps with writing some of the test cases for the implementation of branch operations because for some of those tests we want to delete files on a given branch. So separating out this PR.

cc: @rdblue @jackye1995 @namrathamyske

@amogh-jahagirdar amogh-jahagirdar marked this pull request as ready for review August 23, 2022 06:28
@github-actions github-actions bot added the core label Aug 23, 2022
Assert.assertEquals("Should have 1 manifest", 1, deleteSnapshot.allManifests(FILE_IO).size());
validateManifestEntries(
deleteSnapshot.allManifests(FILE_IO).get(0),
ids(initialSnapshot.snapshotId(), deleteSnapshot.snapshotId()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In test cases initialSnapshot is always behind deleteSnapshot. Should there be a check when it's ahead of deleteSnapshot ? Not sure if it's necessary.

Copy link
Contributor Author

@amogh-jahagirdar amogh-jahagirdar Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry not sure what you mean by "ahead" and "behind"? Do you mean if we should validate the state of main after the branch writes?

In that case, it makes sense but I don't think its particularly necessary because the existing tests should make sure we're committing snapshots to the correct branch in the first place meaning the existing tests would guarantee us that the operations we perform are done on the correct lineage, and existing lineages would be untouched. That being said, no harm in adding more validations! @jackye1995 @rdblue any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes agree with what Amogh says, but we can always add a test to verify the behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these tests are specific enough that we don't need to validate the order of commits. Entries with Status.DELETED will only appear in a single manifest and are dropped if the manifest is rewritten. So validating DELETE entries can only succeed if the delete operation produced the manifest.

}

@Test
public void testDeleteWithRowFilterWithCombinedPredicatesOnBranch() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the PR for RowDelta, can we parameterize this so that we test all of the existing cases on a branch that is independent from main? Then we only need tests for modifications to both main and a separate branch (to validate that they evolve independently).

We'll also need to update snapshot expiration tests to ensure that file deletion happens as expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've parameterized the tests, it does seem like for meaningful tests to be written here we also need to have an implementation for MergeAppend. So I added that.

@amogh-jahagirdar amogh-jahagirdar force-pushed the branch-deletes branch 5 times, most recently from 7ac2d2f to fe82186 Compare August 27, 2022 06:38
@amogh-jahagirdar amogh-jahagirdar changed the title Support performing delete files on branches Support performing merge appends and delete files on branches Aug 27, 2022
@amogh-jahagirdar
Copy link
Contributor Author

The tests for delete files depend on merge appends for some test cases, and vice versa. So to have meaningful branch write tests, I've added the implementation for merge append as well and updated all the test cases to be parameterized so existing cases work for main and branch cases. I still need to follow up on updated expiration tests and then the tests which can test evolution of different branches independently.

public static Object[] parameters() {
return new Object[] {1, 2};
return new Object[][] {
new Object[] {1, "main"},
Copy link
Contributor

@namrathamyske namrathamyske Aug 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am little confused regarding parameters. Does each {1, "testBranch"} translate to {formatVersion, branchname}?.

Copy link
Contributor Author

@amogh-jahagirdar amogh-jahagirdar Aug 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, we test for every combination of format version + branch. I'll need to update the "Parameters" annotation so that we can get nicer test names. The branch parameter gets injected as an argument in the test class constructor.

Comment on lines 352 to 372
Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {
Snapshot snapshot = null;
if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
snapshotUpdate.commit();
snapshot = tbl.currentSnapshot();
} else {
((SnapshotProducer) snapshotUpdate.toBranch(branch)).commit();
snapshot = tbl.snapshot(tbl.refs().get(branch).snapshotId());
}
return snapshot;
}

Snapshot apply(SnapshotUpdate snapshotUpdate, String branch) {
if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
return ((SnapshotProducer) snapshotUpdate).apply();
} else {
return ((SnapshotProducer) snapshotUpdate.toBranch(branch)).apply();
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could combine these into a common method apply(Table tl, SnapshotUpdate snapshotUpdate, String branch, boolean commit) and then only commit if commit is true. but for now opting to keep it explicit and separate, unless there's any objections

@amogh-jahagirdar amogh-jahagirdar force-pushed the branch-deletes branch 2 times, most recently from 8e9eda7 to 3507e4f Compare August 29, 2022 04:37
@amogh-jahagirdar amogh-jahagirdar force-pushed the branch-deletes branch 3 times, most recently from a728c40 to 2f828a4 Compare October 16, 2022 04:37
snapshotUpdate.commit();
snapshot = tbl.currentSnapshot();
} else {
((SnapshotProducer) snapshotUpdate.toBranch(branch)).commit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just pass a SnapshotProducer to this method? That would eliminate casting right?

Copy link
Contributor Author

@amogh-jahagirdar amogh-jahagirdar Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppendFiles doesn't extend SnapshotProducer, so the casting would just have to shift to the caller.

}
}

Snapshot currentSnapshot(Table tbl, String branch) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about latest rather than current? I think that describes it better since current has a specific meaning (the latest on the main branch).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to latest

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amogh-jahagirdar, overall this is really close. I think that all the changes are correct and it does exactly the right thing to test branch commits by running all of the existing tests on both main and another branch. Nice work!

There are a few minor things to fix, but we should be able to get this in pretty quickly!

@amogh-jahagirdar amogh-jahagirdar force-pushed the branch-deletes branch 5 times, most recently from 261722a to f8928d8 Compare October 18, 2022 00:00
@jackye1995
Copy link
Contributor

Since @rdblue also approved I will go ahead and merge this, thanks for the contribution @amogh-jahagirdar! And thanks for the review @rdblue !

@jackye1995 jackye1995 merged commit 88a6e4e into apache:master Oct 18, 2022
@amogh-jahagirdar
Copy link
Contributor Author

amogh-jahagirdar commented Oct 18, 2022

Thanks for the review @rdblue @jackye1995 @namrathamyske !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants