Skip to content

Conversation

@namrathamyske
Copy link
Contributor

@namrathamyske namrathamyske commented Jul 10, 2022

issue addressed from: #3896

PR to address: #4926 (comment)

This PR implements the performing RowDelta operations on a particular branch ref. Throws unsupported operations with other operations.

PR continued from:
#4926

nkeshavaprakash added 13 commits June 9, 2022 13:22
Append changes to snapshot branch - changes after review comments

remove space

Adding support methods for supporting performing append/delete on branch

Removing spaces

Removing spaces

Removing unwanted vars

Adding check for null and invalid branch

Adding check for null and invalid branch
@namrathamyske
Copy link
Contributor Author

@amogh-jahagirdar @rdblue As per #4926 (comment) I have changed validation checks when we commit to a branch. Right now I have implemented for BaseRowDelta usecase. My future plan is to extend for others.

My thought process is every-time we check for validations, the start snapshot should start from an ancestor of the branch. If not we can throw a validation exception. Start snapshot should ideally never go before the snapshot with which branch was created. But we don't store the commit for which the branch was created in metadata. So i am quite confused how to go ahead. Any thoughts appreciated!

@namrathamyske namrathamyske marked this pull request as ready for review August 23, 2022 06:57
@github-actions github-actions bot removed the API label Aug 23, 2022
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I forgot you had this draft PR already so I think we can just continue on here. I closed out my PR #5595

@namrathamyske
Copy link
Contributor Author

namrathamyske commented Aug 28, 2022

In PR #5618 MergeAppend and TableTestBase commit and apply methods are implemented. Will make use of those methods in RowDelta after its merged.

@namrathamyske
Copy link
Contributor Author

@rdblue Appreciate your feedback on this!

Snapshot parent) {
// if there is no current table state, no files have been added
if (base.currentSnapshot() == null) {
if (parent == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, missed this when I raised the change to your branch. I think we should update this comment to say "If there is no parent, no files have been added"

\ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
\ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
justification: "False positive - JacksonException is a subclass of IOException"
- code: "java.method.inheritedMovedToClass"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the review is closer to approval, we should see how many of these breaking changes can be solved. For example, this one about moving the method we may be able to take care of by overriding rather than replacing the method.

for (DataFile file : expectedFiles) {
expectedFilePaths.add(file.path());
}
Set<CharSequence> actualFilePaths = Sets.newHashSet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: block spacing.

currentMetadata, startingSnapshotId, Expressions.alwaysTrue(), snapshot);
} else {
validateAddedDataFiles(currentMetadata, startingSnapshotId, replacedPartitions);
validateAddedDataFiles(currentMetadata, startingSnapshotId, replacedPartitions, snapshot);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I think it makes more sense to use parent rather than snapshot for the parent snapshot variable. Not a blocker, but it makes it easier to understand what's going on.

"Snapshot %s is not an ancestor of %s",
startingSnapshotId,
parent.snapshotId());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: whitespace after this block.

if (startingSnapshotId != null) {
Preconditions.checkArgument(
SnapshotUtil.isAncestorOf(parent.snapshotId(), startingSnapshotId, base::snapshot),
"Snapshot %s is not an ancestor of %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a bit more context would be helpful if you ever encounter this error. "Cannot validate changes: starting snapshot %s is not an ancestor of parent snapshot %s (branch %s)"

Copy link
Contributor Author

@namrathamyske namrathamyske Jan 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amogh-jahagirdar Is this ancestor check even required anymore ? We are anyway doing ancestor check in

TableMetadata base = TestTables.readMetadata(TABLE_NAME);
long baseId = base.currentSnapshot().snapshotId();
long baseId =
latestSnapshot(base, branch) == null ? -1 : latestSnapshot(base, branch).snapshotId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: prefer to keep ternary expressions as simple as possible. If you're calling a method twice with the same arguments, you generally want to use a variable:

Snapshot latest = latestSnapshot(base, branch);
long baseId = latest == null ? -1 : latest.snapshotId();

It's also a bit odd that the current snapshot is always defined, but here we handle the case where the branch does not yet exist. Why is that? Are we committing to main and then implicitly branching somewhere?

.newOverwrite()
.overwriteByRowFilter(equal("date", "2018-06-08"))
.addFile(FILE_10_TO_14),
branch); // in 2018-06-09, NOT in 2018-06-08
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Can the comment remain on the FILE_10_TO_14 line?

table.updateProperties().set(TableProperties.MANIFEST_MIN_MERGE_COUNT, "1").commit();

table.newFastAppend().appendFile(FILE_A).appendFile(FILE_B).commit();
table.newFastAppend().appendFile(FILE_A).appendFile(FILE_B).toBranch(branch).commit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the commit method in these tests?

table.newRowDelta().addRows(FILE_A).addDeletes(FILE_A_DELETES).addDeletes(FILE_B_DELETES);

Snapshot snap = table.currentSnapshot();
commit(table, rowDelta, branch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd spacing here. I'd expect the commit to come after rowDelta insteadd of being combined with the latestSnapshot call.

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@namrathamyske and @amogh-jahagirdar, this is looking great! There are no major blockers, just minor comments. I think that means we should move forward with fixing the revapi failures (add back methods and deprecate them). After that, we can get this in.

Thanks!

@amogh-jahagirdar
Copy link
Contributor

amogh-jahagirdar commented Jan 21, 2023

Thanks for the reviews @rdblue ! @namrathamyske I raised a PR to your branch for adding back the old validation methods/marking them as deprecated and updating rev API namrathamyske#21. When you get a chance please take a look!

Core: Deprecate validation methods which don't accept parent snapshot…
@rdblue rdblue merged commit eec20ed into apache:master Jan 23, 2023
@rdblue
Copy link
Contributor

rdblue commented Jan 23, 2023

Thanks, @namrathamyske and @amogh-jahagirdar! Great to have this in.

@amogh-jahagirdar
Copy link
Contributor

Thanks for the reviews @rdblue! and thanks @namrathamyske for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants