Spark 3.3 write to branch snapshot #6651
```diff
@@ -247,9 +247,6 @@ public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {

   @Override
   public WriteBuilder newWriteBuilder(LogicalWriteInfo info) {
-    Preconditions.checkArgument(
-        snapshotId == null, "Cannot write to table at a specific snapshot: %s", snapshotId);
-
```
Contributor
Why is this no longer valid? I think that we do not want to write to a specific snapshot. Is the branch somehow passed as the snapshot ID?
Contributor
After looking into this a bit more, I think this is incorrect. The
Contributor
Author
@rdblue Can we add more checks so that if the snapshot ID is the tip of the branch, then writing to the branch is supported? I believe when we do [...] we are calling [...]; when passing [...], the snapshotId() is getting set.
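A minimal sketch of that idea, assuming the table keeps the requested snapshot ID in a `snapshotId` field and can resolve the head of the branch being written to; the `branchHeadSnapshotId()` helper is hypothetical, not part of the actual change:

```java
// Hypothetical relaxation of the removed check inside SparkTable: plain time
// travel is still rejected, but a snapshot ID that matches the current head of
// the target branch is allowed through. Field names and the helper are
// assumptions, not the actual implementation in this PR.
@Override
public WriteBuilder newWriteBuilder(LogicalWriteInfo info) {
  if (snapshotId != null) {
    Long branchHead = branchHeadSnapshotId(); // head of the branch being written, or null
    Preconditions.checkArgument(
        branchHead != null && branchHead.equals(snapshotId),
        "Cannot write to table at a specific snapshot: %s", snapshotId);
  }
  return new SparkWriteBuilder(sparkSession(), icebergTable, info);
}
```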
Contributor
Looks like this isn't an issue. I reverted this change and ran
Contributor
Author
@rdblue @amogh-jahagirdar if the bug fix for read by snapshot ref (#6717) gets merged, then write to a branch snapshot will fail, as per the test TestDeleteFrom.java. That's because of the above condition. I feel we have to tweak the condition if it is going to stay.
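For illustration, the failing case is a row-level write issued against a branch ref, roughly like the following; the table and branch names are made up, the branch is assumed to already exist, and the `branch_<name>` identifier is assumed to be the same ref syntax the read path uses:

```java
import org.apache.spark.sql.SparkSession;

// Illustrative only: a row-level delete against a branch, similar to what
// TestDeleteFrom exercises. With the old precondition in newWriteBuilder,
// loading db.tbl.branch_audit resolves the ref to a snapshot id, so the write
// fails with "Cannot write to table at a specific snapshot: ..." instead of
// rewriting files on the branch.
public class DeleteFromBranchExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().getOrCreate();
    spark.sql("DELETE FROM db.tbl.branch_audit WHERE id < 10");
  }
}
```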
Contributor
Actually it seems the issue is that
Contributor
@namrathamyske Yeah, just updated to use the name
Contributor
Author
But I think we can't disregard calling loadTable with respect to the ref passed, and later on when we implement session configs. For reference:
- iceberg/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java, line 260 in 32a8ef5
- iceberg/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java, line 424 in 32a8ef5
- iceberg/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java, line 393 in 32a8ef5
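One way to look at the scan side of a branch write: resolve the branch ref to its current head so the scan (and its snapshot-scoped statistics) are pinned to that snapshot, while the commit still targets the branch. A sketch only, assuming the core `Table#refs()` / `SnapshotRef` APIs are available; the helper itself is hypothetical:

```java
import org.apache.iceberg.SnapshotRef;
import org.apache.iceberg.Table;

// Sketch: resolve a branch name to its current head snapshot so a row-level
// operation (e.g. MERGE INTO a branch) can scan that snapshot's data and
// statistics, while the resulting commit still goes to the branch.
final class BranchRefs {
  private BranchRefs() {}

  static Long headSnapshotId(Table table, String branchName) {
    SnapshotRef ref = table.refs().get(branchName);
    if (ref == null || !ref.isBranch()) {
      return null; // unknown ref, or a tag rather than a branch
    }
    return ref.snapshotId();
  }
}
```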
Contributor
Good point @namrathamyske, I was a bit short-sighted: we actually do want to leverage the statistics for the specific snapshot for writes. These statistics would be used during the scan itself (for example, MERGE INTO a branch). So either we 1) find a good way to differentiate between a time-travel query, where the write shouldn't be allowed, and an intentional write on a branch, or 2) just relax the check that a snapshot is set, as you did earlier.
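A sketch of the first option, assuming `SparkTable` could carry the requested branch name separately from a time-travel snapshot ID; both field names are hypothetical:

```java
// Sketch of option 1: only reject writes when a bare snapshot ID was supplied
// for time travel. An ID that came from resolving a branch ref is acceptable
// because the branch name is tracked separately. Field names (snapshotId,
// branch) are assumptions, not the actual SparkTable fields.
@Override
public WriteBuilder newWriteBuilder(LogicalWriteInfo info) {
  Preconditions.checkArgument(
      snapshotId == null || branch != null,
      "Cannot write to table at a specific snapshot: %s", snapshotId);
  return new SparkWriteBuilder(sparkSession(), icebergTable, info);
}
```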
Contributor
Author
@rdblue @amogh-jahagirdar @jackye1995 this is still an open item for this PR to get merged. I would prefer to go with the second option, but let me know otherwise!
```diff
     return new SparkWriteBuilder(sparkSession(), icebergTable, info);
   }
```