Support Iceberg's DROP TABLE for corrupted tables #16674
Conversation
The check against an error message is not ideal.
For S3, look into TrinoS3FileSystem to see whether it would be appropriate to throw a FileNotFoundException in such cases:
io.trino.plugin.hive.s3.TrinoS3FileSystem.TrinoS3InputStream#read(long, byte[], int, int)
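For illustration, a rough sketch of that suggestion, assuming the AWS SDK v1 AmazonS3Exception exposes the HTTP status code; the wrapped stream and path fields are hypothetical, not the actual TrinoS3InputStream code:

import java.io.FileNotFoundException;
import java.io.IOException;

import com.amazonaws.services.s3.model.AmazonS3Exception;

import static java.net.HttpURLConnection.HTTP_NOT_FOUND;

// Translate an S3 404 into FileNotFoundException so that callers can catch a
// well-known exception type instead of matching on the error message text
private int read(long position, byte[] buffer, int offset, int length)
        throws IOException
{
    try {
        return delegate.read(position, buffer, offset, length); // hypothetical wrapped stream
    }
    catch (AmazonS3Exception e) {
        if (e.getStatusCode() == HTTP_NOT_FOUND) {
            throw new FileNotFoundException("File does not exist: " + path); // hypothetical path field
        }
        throw e;
    }
}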
what was the exception previously raised here?
Yes, it was intentional in the 1st commit, Test Iceberg connector behavior for a corrupted table. The exception thrown is not a TrinoException, which is why I could not use assertQueryFailure and ended up using assertThatThrownBy.
However, in the 2nd commit, I replaced assertThatThrownBy with assertQueryFailure.
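For context, a sketch of the resulting assertion style, assuming the Tempto helper io.trino.tempto.assertions.QueryAssert.assertQueryFailure used in the product tests; the table name and message are illustrative:

// statically imported: io.trino.tempto.assertions.QueryAssert.assertQueryFailure
assertQueryFailure(() -> onTrino().executeQuery("SELECT * FROM iceberg.default.corrupted_table"))
        .hasMessageContaining("Metadata not found in metadata location");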
what was the exception previously raised here?
Why do we need any changes to the DROP TABLE logic in a catalog implementation?
Perhaps dropping a corrupted table could be done separately, as an unregister_table + delete table directory.
If the table is corrupted, we probably should not expect the catalog to do anything smarter than that.
cc @alexjo2144 @findinpath @electrum thoughts?
unregister_table + delete table directory.

I like this approach. I was concerned because data files of the Iceberg table may be located in different locations, but when a table is corrupted (the metadata file is missing) we won't be able to perform dropTableData, because we won't be able to load the table.
So I think it is fine to use unregister_table + delete table directory. But we have to get the table location from the metastore, and for that we have to ask the respective catalog to provide the table location. What if we introduce a new method (dropCorruptedTable, forceDrop, ...) in the catalog to drop corrupted tables? Thoughts?
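A minimal sketch of that proposal, assuming a new method on the connector's TrinoCatalog interface; the name and shape follow the comment above, not a final API:

public interface TrinoCatalog
{
    // ... existing methods ...

    // Proposed: drop a table whose Iceberg metadata cannot be loaded. An
    // implementation would look up the table location in the metastore,
    // remove the metastore entry, and then best-effort delete the table
    // directory (dropTableData is impossible when the table cannot load).
    void dropCorruptedTable(ConnectorSession session, SchemaTableName schemaTableName);
}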
// Use the Iceberg routine for dropping the table data because the data files
// of the Iceberg table may be located in different locations
dropTableData(table.io(), table.operations().current());
perhaps, dropping corrupted table could be done separately, as an unregister_table + delete table directory.

Do we have the certainty in such cases that the directory can be safely removed, or should this operation be left to the system administrator?
IMO we are speaking here about a corner case which should probably be handled in a best-effort manner.
I'd advocate simply unregistering the table.
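For reference, a usage sketch of the unregister_table procedure mentioned above, in product-test style; the catalog, schema, and table names are illustrative and the exact signature may differ by version:

onTrino().executeQuery(
        "CALL iceberg.system.unregister_table(schema_name => 'tpch', table_name => 'corrupted_table')");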
Thanks @findepi, @findinpath for the reviews. Addressed comments.
Fixed CI failure.
Tests are red.
Fixed CI failure. |
Addressed comments.
Rebased with master.
Fixed CI failure.
Rebased with master and resolved conflicts.
Can this affect concurrently executing tests (in case they query information_schema.columns)?
I hope not (it'd be a product problem, not a test issue), but let's keep an eye on it.
I think that in such cases the columns of the table don't get taken into account while querying information_schema.columns:

trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java, lines 642 to 646 in 0c99af4:

catch (RuntimeException e) {
    // Table can be being removed and this may cause all sorts of exceptions. Log, because we're catching broadly.
    log.warn(e, "Failed to access metadata of table %s during streaming table columns for %s", tableName, prefix);
    return Stream.empty();
}
Addressed comments.
Co-authored-by: Piotr Findeisen <[email protected]>
Addressed comments.
@Override
public void testCorruptedTableLocation()
{
    throw new SkipException("Skipping test, This test override will be removed in next commit");
}
nit: It's not clear from the exception message why the test is being skipped, even if the skip is only temporary.
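One possible way to address the nit, with an illustrative message that spells out both the reason and the temporary nature of the override:

throw new SkipException("Dropping a table with a corrupted location is not yet supported by this catalog; this override is removed in the next commit of this PR");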
@Test
public void testDropTableWithMissingMetadataFile()
We have these tests on both the Iceberg BCT (BaseConnectorTest) as well as on the Iceberg BCST (BaseConnectorSmokeTest), why?
BCT should have no less coverage than BCST.
We need BCST so that all catalog implementations are exercised.
@Test
public void testDropTableWithMissingSnapshotFile()
We should also make sure that on DROP we don't delete more than the content associated with the corrupted table.
I'm thinking of a test where two tables exist before the drop, as sketched below:
- one table is OK
- one table is corrupt
When dropping the corrupt table, the OK table should still be present in the metastore, and the number of files in storage (within the test schema) should decrease by only the number of files corresponding to the corrupted table.
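A hedged sketch of such a test; the helper and table names are illustrative, and corruptMetadata stands in for however the test deletes the current metadata file:

@Test
public void testDropCorruptedTableLeavesOtherTablesIntact()
{
    assertUpdate("CREATE TABLE healthy_table (x int)");
    assertUpdate("CREATE TABLE corrupt_table (x int)");
    corruptMetadata("corrupt_table"); // hypothetical helper: delete the table's current metadata file

    assertUpdate("DROP TABLE corrupt_table");

    // the healthy table must survive in the metastore and on storage;
    // ideally also assert the file count in the schema location decreased
    // by exactly the corrupted table's files
    assertQuerySucceeds("SELECT * FROM healthy_table");
}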
Having the test as part of BCT / BCST provides that; there are other test tables (like nation) which don't get deleted.
findinpath left a comment
LGTM % comments
Thank you all for reviewing the PR 😊
catch (RuntimeException e) {
    // If the snapshot file is not found, an exception will be thrown by the dropTableData function.
    // So log the exception and continue with deleting the table location
    LOG.warn(e, "Failed to delete table data referenced by metadata");
}
This looks suspect to me. If dropTableData failed, then we should surface the failure and print a message advising the user to run unregister_table and fix the problem manually, instead of swallowing the error and forcing deletion anyway.
There's no guarantee that deleteTableDirectory in the next step won't fail and leave the table in a worse state.
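A sketch of the suggested alternative, surfacing the failure instead of swallowing it; the error code and wording are illustrative:

catch (RuntimeException e) {
    throw new TrinoException(ICEBERG_FILESYSTEM_ERROR,
            "Failed to delete table data; run system.unregister_table and clean up the table location manually", e);
}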
At this point we don't know what state the table data is in, so leaving the table in the metastore would also be problematic.
We have already deleted the table from the metastore before calling dropTableData. After this we're only trying to do a recursive delete of the table location. I don't understand why we do deleteTableDirectory in general; if that was desirable, then the Iceberg library function itself would do it. Even if we have a great reason for doing it, it seems fishy to ignore failures in the preceding step.
in general, if that was desirable, then the Iceberg library function itself would do it.

There are different opinions about what DROP TABLE behavior should be.
Spark has special syntax, DROP TABLE PURGE, for dropping the table along with its data. That perhaps makes sense from the Spark perspective, where users sometimes operate on raw file buckets.
However, from Trino's, or generally from a SQL perspective, DROP TABLE is the reverse of CREATE TABLE: if CREATE creates the table in the metastore and on disk, then DROP should clean up all these places, by default at least.
What further complicates things is that Iceberg defines a table storage location but allows the table to contain files from other places. It even allows a table to share its storage location with another table. Both these things break table pruning (remove orphan files), and we don't strive to support them.
Thus we drop files in two ways:
- drop via the library (best effort; in case the table has some weird files in arbitrary external locations)
- actually drop the table directory (if this errors, we don't ignore it)
In principle, it should be OK not to ignore errors from the "drop via library" step, but that would basically reintroduce the #12318 problem. We can do so if we consider that problem something we don't want to fix. For now, I am convinced it was "an OK problem to fix", and that requires ignoring "drop via library" errors.
Is there some particular problem that you think should change how we look at this?
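Putting the described behavior together, a simplified sketch of the two-step drop; the names approximate the discussion above, not the exact Trino code:

// step 1: drop via the Iceberg library - best effort, since metadata may
// reference files outside the table directory (or be partially missing)
try {
    dropTableData(table.io(), table.operations().current());
}
catch (RuntimeException e) {
    log.warn(e, "Failed to delete table data referenced by metadata");
}
// step 2: drop the table directory itself - failures here are not ignored
deleteTableDirectory(fileSystem, schemaTableName, tableLocation);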
Description
Fixes #12318
Follow-up of #16651 for corrupted Iceberg tables
Release notes
(X) Release notes are required, with the following suggested text: