Core: Avoid removing reused metadata file when registerTable fails #14083
|
I think we had decided we were going to just completely remove the "commit based" register tables to try to avoid this issue. @dramaticlly were you working on that? Did I forget to review it? |
|
@RussellSpitzer thanks for your clarification. Another concern: this seems to be an issue that could lead to data corruption in concurrent environments. So, do we also have any plan to fix it in earlier versions (e.g., 1.7, 1.8), or should users be advised to upgrade to the latest release to avoid this problem? Since in most use cases, data corruption is unacceptable... |
|
I think it's worth trying to fix the bug but we generally do not do backports. The Iceberg philosophy is basically that we only ever are really maintaining the latest release and that if users want backports they can do that in their own branch. So really the question is just which is a better forward looking solution: Change the register table behavior Personally I prefer the first solution because I think we've been making a mistake long term by repurposing the |
@RussellSpitzer Here's the PR, #12228, but it involves a bit more work as we move away from TableOperations.commit(). For this change, I think there's value in it for existing APIs, but we will need tests to cover the scenario so we can avoid breaking this in the future. |
|
My main worry here is we are doing another kind of "implicit" hint in the api here. So rather than explicitly telling the api "This is a register table request" we are instead trying to determine that we are doing that by checking whether a new metadata file is written. So are we sure that we are always talking about a "register table" case when newTable is True and metadataLocation is not null? |
|
Thanks @RussellSpitzer for your detailed explanation. I completely understand your preference for the first solution as it seems to be a better long-term solution. I also agree with @dramaticlly's point that there's value in this change for existing APIs. This change is lightweight and doesn't introduce any downsides, but rather makes the code more robust and secure, guarding against the impacts from several unexpected—or even expected—exceptions. What's your opinion? Should I keep going? @dramaticlly I've just added a relevant test case in |
After checking the code, I think this is the case for all catalog types (specifically, all subclasses of |
I don't like having to rely on that mostly because we don't know what other folks who are extending this class are doing. That's why I'm nervous about changing the implementation of existing classes. |
|
@RussellSpitzer thank you so much for taking the time to review this change and share your perspective. I understand your concerns and design preferences—especially the worry that this change might come across as just a hacky fix. I'd like to clarify this issue from a purely design-principle perspective to see if that helps address your doubts. It is the responsibility of the subclasses that extends Furthermore, following your concern, even if a future class (let's call it |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
When executing registerTable, some catalog types reuse the existing metadata file of the source table to avoid creating a new metadata file (see PR #6591). However, in this scenario, if the registerTable operation fails during commit (for example, because a concurrent transaction has just created a table with the same target name), deleting the metadata file directly would be problematic, since the metadata file is reused from the source table rather than newly created.

This PR fixes the problem by deleting the metadata file only if it is not reused from the source table.
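
The guard described here can be illustrated with a minimal, self-contained sketch. It is not the code from this PR and does not use Iceberg's API: `InMemoryFiles`, `commit`, and all parameter names are hypothetical stand-ins, and the failed catalog update is simulated with a boolean flag. The only point it shows is that cleanup after a failed commit is skipped when the metadata file was reused rather than written by the commit itself.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from Iceberg.
final class RegisterTableCleanupSketch {

  /** Stand-in for a file store; a real catalog would go through its own file abstraction. */
  static final class InMemoryFiles {
    private final Set<String> paths = new HashSet<>();

    void write(String path) { paths.add(path); }

    void delete(String path) { paths.remove(path); }

    boolean exists(String path) { return paths.contains(path); }
  }

  /**
   * Simulates a commit. A non-null reusedMetadataLocation models the registerTable
   * case where the source table's metadata file is reused; otherwise a new file is
   * written at newMetadataLocation. Returns true if the commit succeeded.
   */
  static boolean commit(
      InMemoryFiles files,
      String reusedMetadataLocation,
      String newMetadataLocation,
      boolean catalogUpdateSucceeds) {
    boolean reused = reusedMetadataLocation != null;
    String metadataLocation = reused ? reusedMetadataLocation : newMetadataLocation;
    if (!reused) {
      files.write(metadataLocation); // only this path creates a file owned by the commit
    }
    try {
      if (!catalogUpdateSucceeds) {
        // e.g. a concurrent transaction created a table with the same name
        throw new IllegalStateException("Table already exists");
      }
      return true;
    } catch (IllegalStateException e) {
      if (!reused) {
        files.delete(metadataLocation); // safe: this commit wrote the file
      }
      // A reused file still belongs to the source table, so it is left in place.
      return false;
    }
  }

  public static void main(String[] args) {
    InMemoryFiles files = new InMemoryFiles();
    String source = "source/metadata/v3.metadata.json"; // illustrative path
    files.write(source);

    boolean registered = commit(files, source, null, false);
    System.out.println("register succeeded: " + registered);
    System.out.println("source metadata still exists: " + files.exists(source));
  }
}
```

In this sketch the decision is driven by an explicit `reused` flag computed at the top of `commit`; whether a real implementation can infer that state reliably from existing arguments is exactly the concern raised in the review discussion.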