Revert #2960 and commit no-op partition replacement operations #3043
Conversation
I think the consensus in Slack was that we don't want this to be the default behavior. We only want to allow empty commits if a flag is set; otherwise we want it to be a no-op.
So we need to add a new table option and check it when deciding whether or not to make the commit in the MergingSnapshotProducer apply method.
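As a rough illustration of that suggestion (not the actual MergingSnapshotProducer code), the check could look something like the sketch below. The property name anticipates the naming discussion further down and is only a placeholder; PropertyUtil is Iceberg's existing helper for reading boolean table properties.

```java
import java.util.Map;
import org.apache.iceberg.util.PropertyUtil;

// Sketch only: decide whether an empty change set should still produce a commit,
// based on a table property. Defaults to skipping, matching the behavior after #2960.
public class EmptyCommitPolicy {
  private static final String KEEP_EMPTY_COMMITS = "commit.keep-empty.enabled";  // placeholder name
  private static final boolean KEEP_EMPTY_COMMITS_DEFAULT = false;

  public static boolean shouldCommit(boolean hasChanges, Map<String, String> tableProperties) {
    if (hasChanges) {
      return true;  // real changes always commit
    }
    // empty change set: commit only if the table opts in
    return PropertyUtil.propertyAsBoolean(
        tableProperties, KEEP_EMPTY_COMMITS, KEEP_EMPTY_COMMITS_DEFAULT);
  }
}
```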
What would you suggest as the name of this configuration option? How about one of the following?
Having the verb first is different from an existing option that has the affected entity followed by the action, i.e. noun and then verb: iceberg/core/src/main/java/org/apache/iceberg/TableProperties.java, lines 83 to 84 (at 5f90476).
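The referenced snippet is not embedded here. Purely for illustration of the noun-then-verb pattern (this may not be the exact pair at the lines cited), an existing property in TableProperties looks roughly like this:

```java
// Illustrative excerpt in the style of org.apache.iceberg.TableProperties:
// the entity ("manifest-merge") comes before the on/off suffix, rather than a leading verb.
public class NamingPatternExample {
  public static final String MANIFEST_MERGE_ENABLED = "commit.manifest-merge.enabled";
  public static final boolean MANIFEST_MERGE_ENABLED_DEFAULT = true;
}
```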
Some other options:
Allow isn't a good term because it implies failure if something is not allowed, not skipping. Skip and omit are okay, but I think that we want the default to be false so that adding the setting is positive: keep empty commits vs skip empty commits. That helps the default seem less surprising. So what I'm leaning toward is commit.keep-empty.enabled. Does that sound alright to everyone?
I don't think we need "enabled" but I'm fine with anything really.
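To make the proposal concrete, a caller could opt in per table through the standard property-update API. This is only a sketch: it uses the name suggested above (which was still being debated) and a placeholder table location.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;

public class EnableEmptyCommits {
  public static void main(String[] args) {
    // Placeholder location; a catalog-based load would work the same way.
    Table table = new HadoopTables(new Configuration()).load("/path/to/table");

    // Property name taken from the proposal above; the final name was still under discussion.
    table.updateProperties()
        .set("commit.keep-empty.enabled", "true")
        .commit();
  }
}
```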
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Summary
Partially revert e4df91e (from #2960) and allow a no-op partition replacement operation to be committed.
Motivation
#2895 encountered an exception when attempting to insert overwrite with an empty dataset from Spark.
#2960 addressed the issue above by skipping the commit operation entirely (in both Spark 2 and Spark 3).
However, we need to be able to differentiate between a no-op commit and the absence of any commit attempt.
Concretely, we have scheduled Spark pipelines that use Iceberg metadata to track commits and read targeted Iceberg snapshots. We additionally set some snapshot-property.<custom key> to externally "name" each snapshot. With #2960, an upstream Spark application skipping a commit would cause the downstream Spark application to fail to find and read the expected Iceberg snapshot by the custom snapshot property.
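For concreteness, here is a hedged sketch of the downstream lookup described above. The table location, property key, and value are placeholders; the upstream job is assumed to have set the property via the snapshot-property.<custom key> write option mentioned in the text, which lands in the snapshot summary. If the upstream job skips the commit entirely, this lookup finds nothing, which is the failure mode described.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;

public class FindSnapshotByProperty {
  public static void main(String[] args) {
    // Placeholder path and property key/value; the real pipeline names differ.
    Table table = new HadoopTables(new Configuration()).load("/path/to/table");

    Snapshot target = null;
    for (Snapshot snapshot : table.snapshots()) {
      // The upstream writer tagged its snapshot with this custom summary property.
      if ("run-2021-09-12".equals(snapshot.summary().get("pipeline-run-id"))) {
        target = snapshot;
      }
    }

    if (target == null) {
      throw new IllegalStateException("Expected snapshot not found");
    }
  }
}
```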
Testing
The test case introduced by #2960 still passes:
iceberg/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkDataWrite.java, lines 192 to 233 (at 7d6f692)
On Spark 2, I've also run an application that saves an empty Dataset in overwrite mode, resulting in a new but no-op snapshot:
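The original run output is not reproduced here. Below is a minimal sketch of such an application in Java, assuming a placeholder table location and a simple two-column schema; with this change the write still commits a new, empty snapshot instead of skipping the commit.

```java
import java.util.Collections;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class EmptyOverwriteExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("empty-overwrite").getOrCreate();

    // Schema matching the target table (placeholder columns).
    StructType schema = new StructType()
        .add("id", DataTypes.LongType)
        .add("data", DataTypes.StringType);

    // An empty Dataset with that schema.
    Dataset<Row> empty = spark.createDataFrame(Collections.<Row>emptyList(), schema);

    // Insert-overwrite with the empty Dataset.
    empty.write()
        .format("iceberg")
        .mode(SaveMode.Overwrite)
        .save("/path/to/table");  // placeholder table location
  }
}
```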