Cleanup metadata file when commitNewTable fails for the Iceberg table#14869

Merged
ebyhr merged 1 commit into trinodb:master from krvikash:fix-iceberg-metadata-cleanup
Feb 16, 2023

Conversation

@krvikash
Contributor

@krvikash krvikash commented Nov 2, 2022

Description

Fixes #14798

The Iceberg connector creates a new metadata file when a DDL/DML operation runs. If the operation then fails, the metadata file is not cleaned up. The metastore table operation can fail for various reasons, such as permission denied or invalid credentials. This fix cleans up the metadata file if it was created.

The fix is inspired by https://github.com/apache/iceberg/blob/3cddc9f28c93b9231060ecb6b90e2d524bd5d160/aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java#L142
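
A minimal sketch of the cleanup pattern described above, not the actual Trino change: write the metadata file, attempt the metastore call, and delete the file again if that call throws. The `Metastore` interface, the `commitNewTable` signature, and the file naming are hypothetical stand-ins for illustration; only `java.nio.file` is used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MetadataCleanupSketch {
    /** Hypothetical stand-in for the metastore call that registers the new table. */
    interface Metastore {
        void createTable(String tableName, Path metadataLocation);
    }

    /**
     * Writes a new metadata file, then registers the table. If registration
     * fails, the file that was just written is deleted and the original
     * failure is rethrown.
     */
    static Path commitNewTable(Metastore metastore, Path tableDir, String tableName)
            throws IOException
    {
        Path metadataDir = Files.createDirectories(tableDir.resolve("metadata"));
        Path metadataFile = metadataDir.resolve("00000-" + tableName + ".metadata.json");
        Files.writeString(metadataFile, "{}");
        try {
            metastore.createTable(tableName, metadataFile);
        }
        catch (RuntimeException e) {
            try {
                // Best-effort cleanup: only the file is removed, the parent directory stays
                Files.deleteIfExists(metadataFile);
            }
            catch (IOException cleanupFailure) {
                // Keep the original failure as the primary error
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
        return metadataFile;
    }

    public static void main(String[] args) throws IOException {
        Path tableDir = Files.createTempDirectory("iceberg-sketch");
        // Simulates a metastore that rejects the call, e.g. because of missing permissions
        Metastore failing = (name, location) -> {
            throw new IllegalStateException("permission denied");
        };
        try {
            commitNewTable(failing, tableDir, "orders");
        }
        catch (IllegalStateException expected) {
            System.out.println("commit failed: " + expected.getMessage());
        }
        try (var files = Files.list(tableDir.resolve("metadata"))) {
            System.out.println("leftover metadata files: " + files.count());
        }
    }
}
```

With the always-failing metastore in `main`, the `metadata` directory should remain but contain no files. Rethrowing the original exception, with any cleanup error attached as suppressed, keeps the metastore failure visible to the caller.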

Non-technical explanation

NA

Release notes

(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Nov 2, 2022
@krvikash
Contributor Author

krvikash commented Nov 2, 2022

TODO Add test cases.

@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch from ce68802 to 8106393 Compare November 2, 2022 19:24
@krvikash krvikash changed the title Fix metadata file cleanup in case of failure Cleanup metadata file when commit fails for the Iceberg table Nov 2, 2022
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 2 times, most recently from 4d110da to d2e44a1 Compare November 3, 2022 06:50
@krvikash krvikash changed the title Cleanup metadata file when commit fails for the Iceberg table Cleanup metadata file when commitNewTable fails for the Iceberg table Nov 3, 2022
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch from d2e44a1 to bb34560 Compare November 3, 2022 06:57
Contributor Author

On a local file system, this operation does not delete the parent folder.

@krvikash krvikash self-assigned this Nov 7, 2022
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 3 times, most recently from cabcba5 to f1d331c Compare November 17, 2022 08:01
@krvikash krvikash changed the title Cleanup metadata file when commitNewTable fails for the Iceberg table WIP: Cleanup metadata file when commitNewTable fails for the Iceberg table Dec 22, 2022
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 3 times, most recently from 0150fee to 95036ab Compare December 27, 2022 13:29
@krvikash krvikash marked this pull request as ready for review December 27, 2022 14:02
@krvikash krvikash added the no-release-notes label Dec 27, 2022
@krvikash krvikash changed the title WIP: Cleanup metadata file when commitNewTable fails for the Iceberg table Cleanup metadata file when commitNewTable fails for the Iceberg table Dec 27, 2022
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 3 times, most recently from e90cc3a to c287c6a Compare December 29, 2022 13:26
@krvikash krvikash requested a review from homar December 29, 2022 14:15
Member

Why atomic when it runs in one thread?

@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch from c287c6a to 4919a81 Compare December 29, 2022 14:32
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 2 times, most recently from 5e5a1ad to 2faf497 Compare January 18, 2023 12:40
Member

Why UnsupportedOperationException?

Contributor Author

I was looking at the createTable implementations and the exceptions they throw. InMemoryThriftMetastore#createTable throws UnsupportedOperationException.

But now I realize that InMemoryThriftMetastore is implemented for test cases. So the UnsupportedOperationException check is needed.

Member

Why single threaded?

Contributor Author

My bad. It was not needed earlier. But now it's needed with the latest change (// testException is a shared mutable state).

Member

Why single threaded?

Member

Why single threaded?

Member

It doesn't need to be a base class with two subclasses.

  • one option is to just move QueryRunner setup to a test method (or helper of a test method) and have ordinary test class with two test methods
    • clean
    • the downside is that you pay query runner setup cost twice
  • another option is to just have a query runner and a test instance field (eg AtomicReference) which determines what exception to throw
    • this would be very similar to the code you have
    • a bit less clean as you have a mutable field
    • but you pay query runner setup cost once only
    • this is the option i'd implement
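
The second option might look roughly like the sketch below. It is an illustration under stated assumptions, not the test added in this PR: `FailureInjectionSketch`, `attemptCreateTable`, and the `Runnable` that stands in for the query runner and metastore are all hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

public class FailureInjectionSketch {
    // Shared mutable state: the exception the fake metastore call should throw next.
    // Tests that set this field must not run concurrently.
    private final AtomicReference<RuntimeException> testException = new AtomicReference<>();

    // Built once and reused by every test method; stands in for the costly query runner setup.
    private final Runnable createTable = () -> {
        RuntimeException toThrow = testException.get();
        if (toThrow != null) {
            throw toThrow;
        }
    };

    /** Runs the fake create-table call with the given failure injected (null means no failure). */
    String attemptCreateTable(RuntimeException toThrow) {
        testException.set(toThrow);
        try {
            createTable.run();
            return "created";
        }
        catch (RuntimeException e) {
            return "failed: " + e.getMessage();
        }
        finally {
            // Reset so the next test starts from a clean state
            testException.set(null);
        }
    }

    public static void main(String[] args) {
        FailureInjectionSketch sketch = new FailureInjectionSketch();
        System.out.println(sketch.attemptCreateTable(new IllegalStateException("permission denied")));
        System.out.println(sketch.attemptCreateTable(new IllegalArgumentException("invalid credentials")));
        System.out.println(sketch.attemptCreateTable(null));
    }
}
```

Because the field is shared by all test methods, tests that set it would have to run single threaded; that is the price for paying the setup cost only once.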

Contributor Author

I followed the 2nd option.

Member

I am not entirely sure we need that test, since the file metastore is just a testing utility.
If we use the file metastore as an approximation of HMS, that's also fine, but let's have a code comment explaining that.

@findinpath
Contributor

@krvikash please rebase & resolve the conflicts.

@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch from 915e512 to bdf2fea Compare February 2, 2023 12:05
@krvikash
Contributor Author

krvikash commented Feb 2, 2023

Rebased with master.

@krvikash
Contributor Author

Hi @ebyhr, when you get time could you please review this PR?

@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch from bdf2fea to 311ae6e Compare February 14, 2023 06:32
@krvikash
Contributor Author

Thanks, @ebyhr for the review. I have addressed the comments and updated the PR.

@ebyhr
Member

ebyhr commented Feb 14, 2023

/test-with-secrets sha=311ae6e98b205d8d9eea761fd75b0b4d1cd375d7

@ebyhr ebyhr force-pushed the fix-iceberg-metadata-cleanup branch from 311ae6e to d657166 Compare February 15, 2023 05:00
@ebyhr
Member

ebyhr commented Feb 15, 2023

I pushed a small fix to rename test classes:

  • TestIcebergFileCreateTableFailureTest → TestIcebergFileMetastoreCreateTableFailure
  • TestIcebergGlueCreateTableFailureTest → TestIcebergGlueCreateTableFailure

@ebyhr ebyhr force-pushed the fix-iceberg-metadata-cleanup branch from d657166 to b36e2aa Compare February 15, 2023 05:02
@ebyhr ebyhr force-pushed the fix-iceberg-metadata-cleanup branch from b36e2aa to b8dd25d Compare February 15, 2023 05:02
@ebyhr ebyhr merged commit 666dc6f into trinodb:master Feb 16, 2023
@github-actions github-actions bot added this to the 407 milestone Feb 16, 2023
@krvikash krvikash deleted the fix-iceberg-metadata-cleanup branch February 16, 2023 06:59

Labels

cla-signed no-release-notes This pull request does not require release notes entry

Development

Successfully merging this pull request may close these issues.

Cleanup metadata file when commit fails to create Iceberg table

5 participants