@krvikash
Contributor

@krvikash krvikash commented Mar 22, 2023

Description

Fixes #12318

Follow-up of #16651 for corrupted Iceberg tables.

Release notes

(X) Release notes are required, with the following suggested text:

# Iceberg
* Support Iceberg's DROP TABLE for corrupted tables. ({issue}`16674`)

@cla-bot cla-bot bot added the cla-signed label Mar 22, 2023
@krvikash krvikash self-assigned this Mar 22, 2023
@github-actions github-actions bot added iceberg Iceberg connector tests:hive labels Mar 22, 2023
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch 3 times, most recently from 2e44d09 to bf58ccf March 23, 2023 09:51
Contributor

The check against an error message is not ideal.
For S3, look into whether TrinoS3FileSystem should throw a FileNotFoundException in such cases:

io.trino.plugin.hive.s3.TrinoS3FileSystem.TrinoS3InputStream#read(long, byte[], int, int)
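To illustrate the suggestion: once the file system reports a missing object as a `java.io.FileNotFoundException`, callers can test the exception type instead of matching message text such as "Failed to get status for file". This is a minimal sketch; `MissingFileCheck.isMissingFile` is a hypothetical helper, not a Trino API, and it assumes the exception may arrive wrapped (for example in an `UncheckedIOException`).

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class MissingFileCheck
{
    private MissingFileCheck() {}

    // Hypothetical helper: true if the failure, or any exception in its cause
    // chain, is a FileNotFoundException. No error-message parsing involved.
    static boolean isMissingFile(Throwable failure)
    {
        // Identity set guards against (rare) cyclic cause chains
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable current = failure; current != null && seen.add(current); current = current.getCause()) {
            if (current instanceof FileNotFoundException) {
                return true;
            }
        }
        return false;
    }
}
```

For example, a `FileNotFoundException` wrapped in an `UncheckedIOException` is detected, while a plain `RuntimeException` that merely carries similar message text is not.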

Member

what was the exception previously raised here?

Member

Are you intentionally using different syntax than in #16651 (5b178ea, Test Delta connector behavior for a corrupted table)?
Differences will make it harder to keep the tests aligned, whether we want to improve test coverage or just check that they are the same.

Contributor Author

Yes, it was intentional in the first commit (Test Iceberg connector behavior for a corrupted table). The exception thrown there is not a TrinoException, which is why I cannot use assertQueryFailure and ended up using assertThatThrownBy.

However, in the second commit, I replaced assertThatThrownBy with assertQueryFailure.

Member

what was the exception previously raised here?

Member

Why do we need any changes to the Drop table logic in a catalog implementation?

Perhaps dropping a corrupted table could be done separately, as an unregister_table + delete table directory.
If the table is corrupted, we probably should not expect the catalog to do anything smarter than that.

cc @alexjo2144 @findinpath @electrum thoughts?

Contributor Author

@krvikash krvikash Mar 24, 2023

unregister_table + delete table directory.

I like this approach. I was concerned that the data files of an Iceberg table may be located in different locations. But when a table is corrupted (its metadata file is missing), we won't be able to perform dropTableData anyway, because we won't be able to load the table.

So I think it is fine to use unregister_table + delete table directory. But we need the table location from the metastore, and for that we have to ask the respective catalog to provide it. What if we introduce a new method (dropCorruptedTable, forceDrop ...) in the catalog to drop corrupted tables? Thoughts?

        // Use the Iceberg routine for dropping the table data because the data files
        // of the Iceberg table may be located in different locations
        dropTableData(table.io(), table.operations().current());
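A sketch of the proposed catalog method, under stated assumptions: `SketchCatalog` and `dropCorruptedTable` are hypothetical (the name is one of the options floated above, not an existing Trino API), and the metastore is modeled as an in-memory map from table name to location. Because the table cannot be loaded, the location comes from the metastore entry rather than from Iceberg metadata.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical catalog: the "metastore" is an in-memory map from table name to location
final class SketchCatalog
{
    private final Map<String, Path> metastore = new ConcurrentHashMap<>();

    void registerTable(String table, Path location)
    {
        metastore.put(table, location);
    }

    Optional<Path> tableLocation(String table)
    {
        return Optional.ofNullable(metastore.get(table));
    }

    // Hypothetical dropCorruptedTable: take the location from the metastore entry,
    // unregister the table, then delete its directory. Iceberg metadata is never
    // loaded, so a missing metadata or snapshot file cannot block the drop.
    void dropCorruptedTable(String table)
    {
        Path location = metastore.remove(table);
        if (location == null) {
            throw new IllegalArgumentException("Table not found: " + table);
        }
        deleteRecursively(location);
    }

    private static void deleteRecursively(Path root)
    {
        if (!Files.exists(root)) {
            return;
        }
        try {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(root)) {
                // Reverse order puts children before their parent directories
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path path : paths) {
                Files.delete(path);
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike the Iceberg `dropTableData` routine in the snippet above, this only removes files under the registered table location; data files stored elsewhere are left behind, which is the trade-off discussed in this thread.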

Contributor

perhaps, dropping corrupted table could be done separately, as an unregister_table + delete table directory.

Can we be certain in such cases that the directory can be safely removed, or should this operation be left to the system administrator?
IMO we are talking about a corner case here, which should probably be handled on a best-effort basis.

I'd advocate simply unregistering the table.

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from bf58ccf to b9afd7d March 24, 2023 09:11
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch 2 times, most recently from 6f84d4e to a55225d March 27, 2023 11:49
@krvikash
Contributor Author

Thanks @findepi and @findinpath for the reviews. Addressed comments.

@github-actions github-actions bot added delta-lake Delta Lake connector hive Hive connector labels Mar 27, 2023
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from a55225d to 6b746cd March 28, 2023 05:38
@krvikash
Contributor Author

Fixed CI failure

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from 6b746cd to 4b5a180 March 28, 2023 05:41
@findinpath
Contributor

Tests are red

023-03-28T10:37:36.1282148Z 2023-03-28T04:37:35.895-0600	ERROR	http-worker-659	org.apache.iceberg.rest.RESTCatalogServlet	Error processing REST request
2023-03-28T10:37:36.1283043Z org.apache.iceberg.exceptions.RESTException: Unhandled error: ErrorResponse(code=404, type=NoSuchTableException, message=Table does not exist: tpch.test_drop_table_with_missing_snapshot_file_kljh1kx0xl)
2023-03-28T10:37:36.1283886Z org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist: tpch.test_drop_table_with_missing_snapshot_file_kljh1kx0xl
2023-03-28T10:37:36.1284622Z 	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:50)
2023-03-28T10:37:36.1285554Z 	at org.apache.iceberg.rest.CatalogHandlers.loadTable(CatalogHandlers.java:240)
2023-03-28T10:37:36.1286219Z 	at org.apache.iceberg.rest.RESTCatalogAdapter.handleRequest(RESTCatalogAdapter.java:336)
2023-03-28T10:37:36.1286891Z 	at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:384)
2023-03-28T10:37:36.1287535Z 	at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
2023-03-28T10:37:36.1288150Z 	at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
2023-03-28T10:37:36.1288707Z 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
2023-03-28T10:37:36.1289213Z 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
2023-03-28T10:44:47.9500066Z 2023-03-28T04:44:47.907-0600	ERROR	pool-3-thread-1	io.trino.testng.services.ProgressLoggingListener	[TEST FAILURE] io.trino.plugin.iceberg.TestIcebergMinioOrcConnectorSmokeTest.testDropTableWithMissingSnapshotFile; (took: 0.9 seconds)
2023-03-28T10:44:47.9501766Z io.trino.testing.QueryFailedException: Failed to get status for file: s3://test-iceberg-minio-smoke-test-vqhe3hnf6z/tpch_orc/test_drop_table_with_missing_snapshot_file_a05pfyc9tt-8c2da560b5424ec1bb46ec83f3cb44bf/metadata/snap-6691070755498700963-1-bd1b58ee-204d-4044-80bd-a02170234d30.avro
2023-03-28T10:44:47.9508936Z 	at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:122)
2023-03-28T10:44:47.9509565Z 	at io.trino.testing.DistributedQueryRunner.executeWithQueryId(DistributedQueryRunner.java:494)
2023-03-28T10:44:47.9510306Z 	at io.trino.testing.QueryAssertions.assertDistributedUpdate(QueryAssertions.java:107)
2023-03-28T10:44:47.9510869Z 	at io.trino.testing.QueryAssertions.assertUpdate(QueryAssertions.java:63)
2023-03-28T10:44:47.9511441Z 	at io.trino.testing.AbstractTestQueryFramework.assertUpdate(AbstractTestQueryFramework.java:401)
2023-03-28T10:44:47.9512141Z 	at io.trino.testing.AbstractTestQueryFramework.assertUpdate(AbstractTestQueryFramework.java:396)
2023-03-28T10:44:47.9512961Z 	at io.trino.plugin.iceberg.BaseIcebergConnectorSmokeTest.testDropTableWithMissingSnapshotFile(BaseIcebergConnectorSmokeTest.java:536)
2023-03-28T10:44:47.9513708Z 	at java.base/jdk.int

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from 4b5a180 to b521efa March 28, 2023 20:46
@krvikash
Contributor Author

Fixed CI failure.

@krvikash krvikash requested review from findepi and findinpath March 29, 2023 05:57
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from 6b2ff5a to 0acb814 April 6, 2023 16:57
@krvikash
Contributor Author

krvikash commented Apr 6, 2023

Addressed comments.

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from 0acb814 to 022c934 April 6, 2023 17:04
@krvikash
Contributor Author

krvikash commented Apr 6, 2023

rebased with master.

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from 3abd742 to b7faab2 April 7, 2023 07:03
@krvikash
Contributor Author

krvikash commented Apr 7, 2023

Fixed CI failure.

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from b7faab2 to 907a312 April 7, 2023 12:58
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from 907a312 to 49da147 April 12, 2023 11:02
@krvikash
Contributor Author

rebased with master and resolved conflicts

Member

Can this affect concurrently executing tests (in case they query information_schema.columns)?

I hope not (it would be a product problem, not a test issue), but let's keep an eye on it.

Contributor

I think that in such cases the columns of the table are simply not taken into account when querying information_schema.columns:

        catch (RuntimeException e) {
            // Table can be being removed and this may cause all sorts of exceptions. Log, because we're catching broadly.
            log.warn(e, "Failed to access metadata of table %s during streaming table columns for %s", tableName, prefix);
            return Stream.empty();
        }
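The quoted pattern can be sketched in isolation, assuming a hypothetical `loadColumns` function in place of the connector's metadata lookup and `java.util.logging` in place of Trino's logger: each table contributes its columns, and a table whose metadata fails to load contributes nothing instead of failing the whole listing.

```java
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ColumnStreaming
{
    private static final Logger LOG = Logger.getLogger(ColumnStreaming.class.getName());

    private ColumnStreaming() {}

    // Returns "table.column" entries for every table whose columns can be loaded.
    // A table that throws (corrupted, or dropped concurrently) is logged and skipped.
    static List<String> listColumns(List<String> tables, Function<String, List<String>> loadColumns)
    {
        return tables.stream()
                .flatMap(table -> {
                    try {
                        return loadColumns.apply(table).stream()
                                .map(column -> table + "." + column);
                    }
                    catch (RuntimeException e) {
                        // Catching broadly, so log instead of failing the whole listing
                        LOG.log(Level.WARNING, "Failed to access metadata of table " + table, e);
                        return Stream.<String>empty();
                    }
                })
                .collect(Collectors.toList());
    }
}
```

With one healthy table and one table whose lookup throws, only the healthy table's columns are returned, so a concurrently running test that reads `information_schema.columns` sees fewer rows rather than a failure.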

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch 2 times, most recently from 09a5673 to 5710f7d April 13, 2023 09:30
@krvikash
Contributor Author

Addressed comments.

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table-alternative-2 branch from 5710f7d to ea7d71a April 17, 2023 11:21
@krvikash
Contributor Author

Addressed comments.

    @Override
    public void testCorruptedTableLocation()
    {
        throw new SkipException("Skipping test, This test override will be removed in next commit");
Contributor

nit: It's not clear from the exception message why the test is being skipped, even if the skip is temporary.

    }

    @Test
    public void testDropTableWithMissingMetadataFile()
Contributor

We have these tests in both the Iceberg BCT (BaseIcebergConnectorTest) and the Iceberg BCST (BaseIcebergConnectorSmokeTest).

Why?

Member

BCT should have no less coverage than BCST.

We need BCST so that all Catalog implementations are exercised.

    }

    @Test
    public void testDropTableWithMissingSnapshotFile()
Contributor

We should also make sure that DROP doesn't delete more than the content associated with the corrupted table.

I'm thinking of a test where two tables exist before the drop:

  • one table is ok
  • one table is corrupt

When dropping the corrupt table, the OK table should still be present in the metastore, and the number of files in the storage (within the test schema) should decrease only by the number of files belonging to the corrupted table.
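The storage side of the proposed check can be sketched with plain `java.nio.file`, under stated assumptions: each table is a directory under a schema directory, corruption is simulated by deleting the snapshot file, and `dropTableDirectory` is a hypothetical stand-in for the connector's DROP path. The metastore side is not modeled here. The method returns true when the healthy table survives intact and the file count drops by exactly the corrupted table's files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DropIsolationCheck
{
    private DropIsolationCheck() {}

    // Counts regular files below a directory
    static long countFiles(Path root)
            throws IOException
    {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).count();
        }
    }

    // Creates a toy table layout: one metadata file, one snapshot file, one data file
    static Path createTable(Path schema, String name)
            throws IOException
    {
        Path table = schema.resolve(name);
        Files.createDirectories(table.resolve("metadata"));
        Files.createDirectories(table.resolve("data"));
        Files.writeString(table.resolve("metadata/00000-example.metadata.json"), "{}");
        Files.writeString(table.resolve("metadata/snap-1.avro"), "snapshot");
        Files.writeString(table.resolve("data/part-0.parquet"), "rows");
        return table;
    }

    // Hypothetical stand-in for DROP TABLE on a corrupted table: delete its directory
    static void dropTableDirectory(Path table)
            throws IOException
    {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(table)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : paths) {
            Files.delete(path);
        }
    }

    // True if dropping the corrupted table leaves the healthy table untouched
    // and removes exactly the corrupted table's files
    static boolean dropRemovesOnlyCorruptedTable()
            throws IOException
    {
        Path schema = Files.createTempDirectory("test_schema");
        Path healthy = createTable(schema, "healthy_table");
        Path corrupted = createTable(schema, "corrupted_table");
        Files.delete(corrupted.resolve("metadata/snap-1.avro")); // simulate a missing snapshot file

        long before = countFiles(schema);
        long corruptedFiles = countFiles(corrupted);
        long healthyFiles = countFiles(healthy);
        dropTableDirectory(corrupted);
        long after = countFiles(schema);

        return !Files.exists(corrupted)
                && countFiles(healthy) == healthyFiles
                && before - after == corruptedFiles;
    }
}
```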

Member

Having the test as part of BCT / BCST provides that. There are other test tables (like nation) which don't get deleted.

Contributor

@findinpath findinpath left a comment

LGTM % comments

@findepi findepi merged commit f5084be into trinodb:master Apr 24, 2023
@krvikash
Contributor Author

Thank you all for reviewing the PR 😊

@krvikash krvikash deleted the support-drop-corrupted-iceberg-table-alternative-2 branch April 24, 2023 15:24
@github-actions github-actions bot added this to the 415 milestone Apr 24, 2023
Comment on lines +371 to +375
        catch (RuntimeException e) {
            // If the snapshot file is not found, an exception will be thrown by the dropTableData function.
            // So log the exception and continue with deleting the table location
            LOG.warn(e, "Failed to delete table data referenced by metadata");
        }
Member

This looks suspect to me. If dropTableData failed, then we should surface the failure and print a message advising the user to run unregister_table and fix the problem manually instead of swallowing the error and forcing deletion anyway.
There's no guarantee that deleteTableDirectory in next step won't fail and leave the table in a worse state.
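A sketch of the alternative suggested here, assuming the two drop steps are passed in as hypothetical callbacks (not Trino APIs): if the metadata-driven data drop fails, the error is surfaced with a hint to use `unregister_table`, and the table directory is not deleted.

```java
final class FailFastDrop
{
    private FailFastDrop() {}

    // "Surface the failure" variant: if dropping data via table metadata fails,
    // stop before touching the table directory and tell the user how to recover.
    static void dropTable(String table, Runnable dropTableData, Runnable deleteTableDirectory)
    {
        try {
            dropTableData.run();
        }
        catch (RuntimeException e) {
            throw new IllegalStateException(
                    "Failed to delete data of table " + table + " referenced by its metadata. "
                            + "Unregister the table with the unregister_table procedure and remove its files manually",
                    e);
        }
        // Only reached when the metadata-driven drop succeeded
        deleteTableDirectory.run();
    }
}
```

The cost of this design is that a corrupted table (a missing snapshot or metadata file) can no longer be dropped with DROP TABLE alone; the user has to follow the manual path in the error message.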

Member

At this point we don't know what state the table data is, so leaving the table in metastore would be also problematic.

Member

We have already deleted the table from the metastore before calling dropTableData. After that, we're only trying to do a recursive delete of the table location. I don't understand why we do deleteTableDirectory in general; if that was desirable then the Iceberg library function itself would do it. Even if we have a great reason for doing it, it seems fishy to ignore the failures in the preceding step.

Member

in general, if that was desirable then the iceberg library function itself would do it.

There are different opinions about what the DROP TABLE behavior should be.
Spark has special syntax, DROP TABLE PURGE, for dropping the table along with its data. That perhaps makes sense from the Spark perspective, where users sometimes operate on raw file buckets.
However, from the Trino perspective, or generally from the SQL perspective, DROP TABLE is the reverse of CREATE TABLE. If CREATE creates the table in the metastore and on disk, DROP should clean up all these places, at least by default.

What further complicates things is that Iceberg defines a table storage location but allows the table to contain files from other places. It even allows a table to share its storage location with another table. Both of these break table pruning (remove orphan files), and we don't strive to support them.
Thus we drop files in two ways:

  • drop via library (best effort; in case the table has some stray files in arbitrary external locations)
  • actually drop the table directory (if this fails, we don't ignore the error)

In principle, it should be OK not to ignore errors from the "drop via library" step, but that would basically reintroduce the #12318 problem. We can do so if we consider that problem something we don't want to fix. For now, I am convinced it was "an OK problem to fix", and fixing it requires ignoring "drop via library" errors.

Is there some particular problem that you think should change how we look at this?
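The merged behavior described in this exchange can be sketched as follows, with hypothetical callbacks standing in for the metastore removal, the Iceberg library drop, and the directory delete (none of these are Trino APIs). The order, with the metastore entry removed first, follows the description earlier in this thread. The library step is best effort and only logged on failure, which is what lets #12318-style corrupted tables be dropped, while a failure to delete the table directory propagates to the caller.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class TwoStepDrop
{
    private static final Logger LOG = Logger.getLogger(TwoStepDrop.class.getName());

    private TwoStepDrop() {}

    static void dropTable(String table, Runnable unregister, Runnable dropViaLibrary, Runnable deleteTableDirectory)
    {
        // The metastore entry is removed first
        unregister.run();

        // Step 1, best effort: remove files referenced by metadata, which may live
        // outside the table location. A missing snapshot or metadata file makes this
        // fail, so the error is logged instead of propagated.
        try {
            dropViaLibrary.run();
        }
        catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Failed to delete table data referenced by metadata for table " + table, e);
        }

        // Step 2: delete the table directory; failures here are not ignored
        deleteTableDirectory.run();
    }
}
```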


Labels

cla-signed iceberg Iceberg connector

Development

Successfully merging this pull request may close these issues.

Iceberg connector can't drop tables if the metadata is missing

6 participants