-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core, API: BaseRowDelta, BaseOverwrite,BaseReplacePartitions, BaseRewrite to branch Impl #5234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Append changes to snapshot branch - changes after review comments remove space Adding support methods for supporting performing append/delete on branch Removing spaces Removing spaces Removing unwanted vars Adding check for null and invalid branch Adding check for null and invalid branch
|
@amogh-jahagirdar @rdblue As per #4926 (comment) I have changed validation checks when we commit to a branch. Right now I have implemented for BaseRowDelta usecase. My future plan is to extend for others. My thought process is every-time we check for validations, the start snapshot should start from an ancestor of the branch. If not we can throw a validation exception. Start snapshot should ideally never go before the snapshot with which branch was created. But we don't store the commit for which the branch was created in metadata. So i am quite confused how to go ahead. Any thoughts appreciated! |
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, I forgot you had this draft PR already so I think we can just continue on here. I closed out my PR #5595
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
Outdated
Show resolved
Hide resolved
|
In PR #5618 MergeAppend and TableTestBase commit and apply methods are implemented. Will make use of those methods in RowDelta after its merged. |
|
@rdblue Appreciate your feedback on this! |
| Snapshot parent) { | ||
| // if there is no current table state, no files have been added | ||
| if (base.currentSnapshot() == null) { | ||
| if (parent == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, missed this when I raised the change to your branch. I think we should update this comment to say "If there is no parent, no files have been added"
.palantir/revapi.yml
Outdated
| \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\ | ||
| \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer" | ||
| justification: "False positive - JacksonException is a subclass of IOException" | ||
| - code: "java.method.inheritedMovedToClass" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the review is closer to approval, we should see how many of these breaking changes can be solved. For example, this one about moving the method we may be able to take care of by overriding rather than replacing the method.
| for (DataFile file : expectedFiles) { | ||
| expectedFilePaths.add(file.path()); | ||
| } | ||
| Set<CharSequence> actualFilePaths = Sets.newHashSet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: block spacing.
| currentMetadata, startingSnapshotId, Expressions.alwaysTrue(), snapshot); | ||
| } else { | ||
| validateAddedDataFiles(currentMetadata, startingSnapshotId, replacedPartitions); | ||
| validateAddedDataFiles(currentMetadata, startingSnapshotId, replacedPartitions, snapshot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I think it makes more sense to use parent rather than snapshot for the parent snapshot variable. Not a blocker, but it makes it easier to understand what's going on.
| "Snapshot %s is not an ancestor of %s", | ||
| startingSnapshotId, | ||
| parent.snapshotId()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: whitespace after this block.
| if (startingSnapshotId != null) { | ||
| Preconditions.checkArgument( | ||
| SnapshotUtil.isAncestorOf(parent.snapshotId(), startingSnapshotId, base::snapshot), | ||
| "Snapshot %s is not an ancestor of %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a bit more context would be helpful if you ever encounter this error. "Cannot validate changes: starting snapshot %s is not an ancestor of parent snapshot %s (branch %s)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amogh-jahagirdar Is this ancestor check even required anymore ? We are anyway doing ancestor check in
| ValidationException.check( |
| TableMetadata base = TestTables.readMetadata(TABLE_NAME); | ||
| long baseId = base.currentSnapshot().snapshotId(); | ||
| long baseId = | ||
| latestSnapshot(base, branch) == null ? -1 : latestSnapshot(base, branch).snapshotId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: prefer to keep ternary expressions as simple as possible. If you're calling a method twice with the same arguments, you generally want to use a variable:
Snapshot latest = latestSnapshot(base, branch);
long baseId = latest == null ? -1 : latest.snapshotId();It's also a bit odd that the current snapshot is always defined, but here we handle the case where the branch does not yet exist. Why is that? Are we committing to main and then implicitly branching somewhere?
| .newOverwrite() | ||
| .overwriteByRowFilter(equal("date", "2018-06-08")) | ||
| .addFile(FILE_10_TO_14), | ||
| branch); // in 2018-06-09, NOT in 2018-06-08 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Can the comment remain on the FILE_10_TO_14 line?
| table.updateProperties().set(TableProperties.MANIFEST_MIN_MERGE_COUNT, "1").commit(); | ||
|
|
||
| table.newFastAppend().appendFile(FILE_A).appendFile(FILE_B).commit(); | ||
| table.newFastAppend().appendFile(FILE_A).appendFile(FILE_B).toBranch(branch).commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the commit method in these tests?
| table.newRowDelta().addRows(FILE_A).addDeletes(FILE_A_DELETES).addDeletes(FILE_B_DELETES); | ||
|
|
||
| Snapshot snap = table.currentSnapshot(); | ||
| commit(table, rowDelta, branch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd spacing here. I'd expect the commit to come after rowDelta insteadd of being combined with the latestSnapshot call.
rdblue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@namrathamyske and @amogh-jahagirdar, this is looking great! There are no major blockers, just minor comments. I think that means we should move forward with fixing the revapi failures (add back methods and deprecate them). After that, we can get this in.
Thanks!
|
Thanks for the reviews @rdblue ! @namrathamyske I raised a PR to your branch for adding back the old validation methods/marking them as deprecated and updating rev API namrathamyske#21. When you get a chance please take a look! |
Core: Deprecate validation methods which don't accept parent snapshot…
|
Thanks, @namrathamyske and @amogh-jahagirdar! Great to have this in. |
|
Thanks for the reviews @rdblue! and thanks @namrathamyske for the contribution! |
issue addressed from: #3896
PR to address: #4926 (comment)
This PR implements the performing RowDelta operations on a particular branch ref. Throws unsupported operations with other operations.
PR continued from:
#4926