Skip to content

Conversation

@hantangwangd
Copy link
Contributor

When executing registerTable, some catalog types leverage the existing metadata file of the source table to avoid creating new metadata file. Refer to PR #6591. However, in this scenario, if the registerTable operation fails during commit(for example, due to a concurrent transaction that has just created a table with the same target name), it would be problematic to delete the metadata file directly. Since the metadata file is reused from the source table, not newly created.

This PR fixes such problems, only deleting the metadata file if it is not reused from the source table.

@RussellSpitzer
Copy link
Member

I think we had decided we were going to just completely remove the "commit based" register tables to try to avoid this issue. @dramaticlly were you working on that? Did I forget to review it?

@hantangwangd
Copy link
Contributor Author

@RussellSpitzer thanks for your clarification. Another concern: this seems to be an issue that could lead to data corruption in concurrent environments. So, do we also have any plan to fix it in earlier versions (e.g., 1.7, 1.8), or should users be advised to upgrade to the latest release to avoid this problem? Since in most use cases, data corruption is unacceptable...

@RussellSpitzer
Copy link
Member

I think it's worth trying to fix the bug but we generally do not do backports. The Iceberg philosophy is basically that we only ever are really maintaining the latest release and that if users want backports they can do that in their own branch. So really the question is just which is a better forward looking solution:

Change the register table behavior
Attempt to special case TableOperations Commits where a new metadata file is not created

Personally I prefer the first solution because I think we've been making a mistake long term by repurposing the
commit pathway for registration.

@dramaticlly
Copy link
Contributor

I think we had decided we were going to just completely remove the "commit based" register tables to try to avoid this issue. @dramaticlly were you working on that? Did I forget to review it?

@RussellSpitzer Here's the PR but it involves a bit more work as we move away from TableOperation.commit() #12228.

For this change, I think there's value in this for existing APIs but we will need tests to cover the scenario so we can avoid breaking this in future,

@RussellSpitzer
Copy link
Member

My main worry here is we are doing another kind of "implicit" hint in the api here. So rather than explicitly telling the api "This is a register table request" we are instead trying to determine that we are doing that by checking whether a new metadata file is written.

  protected boolean reuseMetadataLocation(boolean newTable, TableMetadata metadata) {
    return newTable && metadata.metadataFileLocation() != null;
  }

So are we sure that we are always talking about a "register table" case when newTable is True and metadataLocation is not null?

@hantangwangd hantangwangd force-pushed the avoid_remove_reused_metadata_file branch from e18363a to 9b26456 Compare September 17, 2025 05:35
@github-actions github-actions bot added the GCP label Sep 17, 2025
@hantangwangd hantangwangd force-pushed the avoid_remove_reused_metadata_file branch from 9b26456 to fddd4bd Compare September 17, 2025 06:32
@hantangwangd
Copy link
Contributor Author

Thanks @RussellSpitzer for your detailed explanation. I completely understand your preference for the first solution as it seems to be a better long-term solution. I also agree with @dramaticlly's point that there's value in this change for existing APIs. This change is lightweight and doesn't introduce any downsides, but rather makes the code more robust and secure, guarding against the impacts from several unexpected—or even expected—exceptions. What's your opinion? Should I keep going?

@dramaticlly I've just added a relevant test case in TestHiveCatalog using the non-existent target namespace scenario described in #13434. Please take a look when you have a moment. Also, writing a test case for the scenario where a concurrent operation causes a registerTable fail during committing seems a bit tricky. It requires ensuring that the commit operation for creating a table with the same target name happens precisely between the pre-check and the commit phases of the registerTable operation. Do you have any suggestions? Any feedback would be greatly appreciated!

@hantangwangd
Copy link
Contributor Author

So are we sure that we are always talking about a "register table" case when newTable is True and metadataLocation is not null?

After checking the code, I think this is the case for all catalog types (specifically, all subclasses of BaseMetastoreTableOperations except Snowflake) that attempt to reuse the metadataLocation when newTable is true and metadataLocation is not null. Please correct me if I'm wrong about this, but it doesn't seem like these catalogs would ever be in a state where newTable is True and metadataLocation is not null during a create table operation.

@RussellSpitzer
Copy link
Member

So are we sure that we are always talking about a "register table" case when newTable is True and metadataLocation is not null?

After checking the code, I think this is the case for all catalog types (specifically, all subclasses of BaseMetastoreTableOperations except Snowflake) that attempt to reuse the metadataLocation when newTable is true and metadataLocation is not null. Please correct me if I'm wrong about this, but it doesn't seem like these catalogs would ever be in a state where newTable is True and metadataLocation is not null during a create table operation.

I don't like having to rely on that mostly because we don't know what other folks who are extending this class are doing. That's why I'm nervous about changing the implementation of existing classes.

@hantangwangd
Copy link
Contributor Author

@RussellSpitzer thank you so much for taking the time to review this change and share your perspective. I understand your concerns and design preferences—especially the worry that this change might come across as just a hacky fix. I'd like to clarify this issue from a purely design-principle perspective to see if that helps address your doubts. It is the responsibility of the subclasses that extends BaseMetastoreTableOperations to properly implement the doCommit logic, since there is no reusable common logic in the parent class's corresponding method. Specifically, any subclass that attempts to reuse an existing metadata location should avoid to inadvertently delete it. A key principle here is: one has the right and the responsibility to decide whether to delete a metadata file they themselves created, but do not have the responsibility to delete a metadata file they did not create. They lack the necessary context to understand the consequences of deleting these existing metadata files directly.

Furthermore, following your concern, even if a future class (let's call it TestOperations) that extends one of these subclasses (e.g., of HiveTableOperations) wants to inherit or extend its parent's doCommit logic—perhaps to implement a two-phase commit that pre-creates a metadata location—this responsibility principle still holds. In such conditions, we might encounter the case where newTable is true and metadataLocation is not null in HiveTableOperations.doCommit during the table creation operation, because TestOperations pre-created the metadata file on the first phase. Even in this scenario, not deleting that pre-created metadata file in HiveTableOperations.doCommit seems to be safer and more preferable. TestOperations may rely on some underlying system to maintain the tables temporarily created during the first phase, and directly deleting its metadata files could potentially lead to some form of data corruption. So it's TestOperations's responsibility to decide whether it's safe to delete such metadata files directly. Do you think this makes sense? Looking forward to your feedback, thanks!

@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Oct 19, 2025
@github-actions
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Oct 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants