Support Glue catalog in Iceberg connector#10151
jackye1995 wants to merge 1 commit into trinodb:master
Conversation
Force-pushed from e84aab7 to 59d9dff
...trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergGlueCatalogConnectorTest.java
This could probably use BaseConnectorSmokeTest.
Also, a test class should not extend another test class.
For code sharing, an explicit abstract Base.. class should be used.
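A minimal, self-contained sketch of the structure the review suggests (class names here are illustrative, not Trino's actual test classes): shared logic lives in an abstract Base.. class, and each concrete test class extends it, so no test class extends another concrete test class.

```java
// Illustrative names only; the real classes would be e.g. BaseConnectorSmokeTest.
public class TestHierarchySketch {
    public static void main(String[] args) {
        System.out.println(new TestIcebergGlueCatalogSmokeTest().describeCatalog());
    }
}

abstract class BaseIcebergSmokeTest {
    abstract String catalogType();

    // Shared smoke-test behavior, parameterized by the catalog under test.
    String describeCatalog() {
        return "Iceberg smoke test against catalog: " + catalogType();
    }
}

class TestIcebergGlueCatalogSmokeTest extends BaseIcebergSmokeTest {
    @Override
    String catalogType() {
        return "glue";
    }
}
```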
changed to use BaseIcebergConnectorTest
sorry did not read the first line, changed to smoke test instead
- Map.of("iceberg.file-format", "parquet",
+ Map.of(
+         "iceberg.file-format", "parquet",
Trino Glue catalog -> Iceberg Glue catalog
because Glue has no database user concept and thus does not support AUTHORIZATION USER
the equivalent test in the smoke test works, so removed this one
Force-pushed from b9feee2 to 3ac1f4d
Force-pushed from 523fede to 4ef57e9
Force-pushed from 4ef57e9 to 0541b2d
@findepi Regarding running the Glue test, I checked that Hive does run the Glue test suite using: But
</profile>
<profile>
    <id>test-iceberg-glue</id>
This needs to be run on CI.
Maybe instead of a profile with single test, let's have a profile with all tests (no exclusions).
trino/.github/workflows/ci.yml, line 351 (at 3c0cf0b)
would become
- ":trino-iceberg -P test-all,:trino-druid"
also, we should move trino-druid to a different group, maybe with kudu (need to check running times)
ICEBERG_WRITE_VALIDATION_FAILED(10, INTERNAL_ERROR),
ICEBERG_INVALID_SNAPSHOT_ID(11, USER_ERROR),
ICEBERG_CATALOG_ERROR(12, EXTERNAL),
ICEBERG_COMMIT_ERROR(13, EXTERNAL)
Extract commit introducing this error code.
It should be used in io.trino.plugin.iceberg.catalog.hms.AbstractMetastoreTableOperations#commitNewTable and io.trino.plugin.iceberg.catalog.hms.HiveMetastoreTableOperations#commitToExistingTable
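A minimal, self-contained sketch of the pattern being asked for, with a stub exception type standing in for Trino's TrinoException/IcebergErrorCode API (the stub name and signature are assumptions for illustration): the metastore commit path translates unexpected failures into ICEBERG_COMMIT_ERROR instead of a generic error.

```java
public class CommitErrorSketch {
    static final String ICEBERG_COMMIT_ERROR = "ICEBERG_COMMIT_ERROR";

    // Mirrors the review suggestion: wrap metastore commit failures in the
    // new commit-specific error code so callers can distinguish them.
    static void commitNewTable(Runnable metastoreCreateTable, String tableName) {
        try {
            metastoreCreateTable.run();
        }
        catch (RuntimeException e) {
            throw new StubTrinoException(ICEBERG_COMMIT_ERROR,
                    String.format("Cannot commit %s due to unexpected exception", tableName), e);
        }
    }

    public static void main(String[] args) {
        try {
            commitNewTable(() -> { throw new IllegalStateException("metastore unavailable"); }, "db.tbl");
        }
        catch (StubTrinoException e) {
            System.out.println(e.errorCode + ": " + e.getMessage());
        }
    }
}

// Stand-in for io.trino.spi.TrinoException; not the real class.
class StubTrinoException extends RuntimeException {
    final String errorCode;

    StubTrinoException(String errorCode, String message, Throwable cause) {
        super(message, cause);
        this.errorCode = errorCode;
    }
}
```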
return defaultSchemaLocation;
}

@Config("iceberg.default-schema-location")
This is applicable to Glue only, so should be in a config class that's bound only when Glue catalog is used.
it should start with iceberg.glue.
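A sketch of what the review asks for, assuming the airlift-style @Config setter convention Trino plugins use; here the annotation is a local stub so the example is self-contained, and the class name IcebergGlueCatalogConfig is an assumption for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stub standing in for io.airlift.configuration.Config; not the real annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Config {
    String value();
}

// Glue-specific config class: bound only when the Glue catalog is used,
// with the property renamed under the iceberg.glue. prefix per the review.
public class IcebergGlueCatalogConfig {
    private String defaultSchemaLocation;

    public String getDefaultSchemaLocation() {
        return defaultSchemaLocation;
    }

    @Config("iceberg.glue.default-schema-location")
    public IcebergGlueCatalogConfig setDefaultSchemaLocation(String defaultSchemaLocation) {
        this.defaultSchemaLocation = defaultSchemaLocation;
        return this;
    }

    public static void main(String[] args) {
        IcebergGlueCatalogConfig config = new IcebergGlueCatalogConfig()
                .setDefaultSchemaLocation("s3://bucket/warehouse/");
        System.out.println(config.getDefaultSchemaLocation());
    }
}
```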
import io.trino.plugin.hive.gcs.HiveGcsModule;
import io.trino.plugin.hive.metastore.HiveMetastore;
import io.trino.plugin.hive.s3.HiveS3Module;
import io.trino.plugin.iceberg.catalog.IcebergCatalogModule;
Extract commit that introduces io.trino.plugin.iceberg.catalog package.
@NotThreadSafe
- public abstract class AbstractMetastoreTableOperations
+ public abstract class AbstractIcebergTableOperations
Extract a commit which splits AbstractMetastoreTableOperations into AbstractMetastoreTableOperations and AbstractIcebergTableOperations
    throw new TrinoException(ICEBERG_COMMIT_ERROR, format("Cannot commit %s due to unexpected exception", getSchemaTableName()), e);
}
finally {
    cleanupMetadataLocation(!succeeded, newMetadataLocation);
this can swallow exception in flight.
    throw new TrinoException(ICEBERG_COMMIT_ERROR, format("Cannot commit %s because of concurrent update", getSchemaTableName()), e);
}
finally {
    cleanupMetadataLocation(!succeeded, newMetadataLocation);
this can swallow exception in flight.
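A self-contained sketch of the concern and one common fix: if cleanup in a finally block throws, the in-flight commit exception is replaced and lost; catching the cleanup failure and attaching it via Throwable.addSuppressed keeps the original failure primary. (The helper name commitWithSafeCleanup is illustrative, not Trino's code.)

```java
public class CleanupSketch {
    // Safer pattern: the cleanup failure is recorded as a suppressed
    // exception instead of replacing the original commit failure.
    static void commitWithSafeCleanup(Runnable commit, Runnable cleanup) {
        RuntimeException inFlight = null;
        try {
            commit.run();
        }
        catch (RuntimeException e) {
            inFlight = e;
            throw e;
        }
        finally {
            try {
                cleanup.run();
            }
            catch (RuntimeException cleanupFailure) {
                if (inFlight != null) {
                    inFlight.addSuppressed(cleanupFailure); // keep original failure primary
                }
                else {
                    throw cleanupFailure; // no in-flight exception; safe to propagate
                }
            }
        }
    }

    public static void main(String[] args) {
        try {
            commitWithSafeCleanup(
                    () -> { throw new RuntimeException("commit failed"); },
                    () -> { throw new RuntimeException("cleanup failed"); });
        }
        catch (RuntimeException e) {
            System.out.println(e.getMessage());                    // commit failed
            System.out.println(e.getSuppressed()[0].getMessage()); // cleanup failed
        }
    }
}
```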
try {
    io().deleteFile(metadataLocation);
}
catch (RuntimeException ex) {
return createIcebergQueryRunner(
        ImmutableMap.of(),
        ImmutableMap.of(
                "iceberg.file-format", "orc",
protected boolean hasBehavior(TestingConnectorBehavior connectorBehavior)
{
    switch (connectorBehavior) {
        case SUPPORTS_RENAME_SCHEMA:
        case SUPPORTS_COMMENT_ON_COLUMN:
        case SUPPORTS_TOPN_PUSHDOWN:
        case SUPPORTS_CREATE_VIEW:
        case SUPPORTS_CREATE_MATERIALIZED_VIEW:
        case SUPPORTS_RENAME_MATERIALIZED_VIEW:
        case SUPPORTS_RENAME_MATERIALIZED_VIEW_ACROSS_SCHEMAS:
            return false;

        case SUPPORTS_DELETE:
            return true;
        default:
            return super.hasBehavior(connectorBehavior);
    }
This class and TestIcebergConnectorSmokeTest should have a common base (BaseIcebergConnectorSmokeTest) capturing Iceberg behavior.
cc @phd3
import static org.apache.iceberg.BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE;
import static org.apache.iceberg.BaseMetastoreTableOperations.TABLE_TYPE_PROP;

public class GlueTableOperations
GlueIcebergTableOperations, seems more natural from the base class AbstractIcebergTableOperations
@Override
protected String getRefreshedLocation()
{
    return stats.getGetTable().call(() -> {
Just wrap the Glue API call in the lambda
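A self-contained sketch of the point being made, with a stub timing wrapper approximating the stats helper (names are illustrative): only the remote Glue API call belongs inside the instrumented lambda, so local post-processing is not counted in the call's recorded time.

```java
import java.util.function.Supplier;

public class StatsScopeSketch {
    // Stub approximating a call-counting/timing stat wrapper.
    static class TimedStat {
        int calls;

        <T> T call(Supplier<T> remoteCall) {
            calls++; // real code would also record latency here
            return remoteCall.get();
        }
    }

    static final TimedStat getTableStat = new TimedStat();

    public static void main(String[] args) {
        // Only the (simulated) Glue API call is wrapped in the lambda...
        String table = getTableStat.call(() -> "glue-table"); // stand-in for glueClient.getTable(...)
        // ...while deriving the metadata location happens outside it.
        String metadataLocation = table + "/metadata.json";
        System.out.println(metadataLocation);
    }
}
```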
@Inject
public GlueTableOperationsProvider(FileIoProvider fileIoProvider)
{
    this.fileIoProvider = fileIoProvider;
private final String catalogId;
private final GlueMetastoreStats stats;

private final Map<SchemaTableName, TableMetadata> tableMetadataCache = new ConcurrentHashMap<>();
How are entries in this cache invalidated?
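One possible answer to the question, sketched self-contained with simplified string keys/values standing in for SchemaTableName/TableMetadata: invalidate or replace the cached entry on every local write or drop. Note this local policy alone would not cover entries made stale by external updates from other engines, which is presumably the reviewer's deeper concern.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheInvalidationSketch {
    static final Map<String, String> tableMetadataCache = new ConcurrentHashMap<>();

    static void commitTable(String table, String newMetadata) {
        // After a successful commit, replace the cached entry so reads
        // following a local write never see the pre-commit metadata.
        tableMetadataCache.put(table, newMetadata);
    }

    static void dropTable(String table) {
        tableMetadataCache.remove(table); // drop invalidates the entry
    }

    public static void main(String[] args) {
        commitTable("db.tbl", "metadata-v1");
        commitTable("db.tbl", "metadata-v2");
        System.out.println(tableMetadataCache.get("db.tbl"));
        dropTable("db.tbl");
        System.out.println(tableMetadataCache.containsKey("db.tbl"));
    }
}
```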
this.tableOperationsProvider = requireNonNull(tableOperationsProvider, "tableOperationsProvider is null");
this.glueClient = requireNonNull(glueClient, "glueClient is null");
this.stats = requireNonNull(stats, "stats is null");
this.catalogId = catalogId; // null is a valid catalogId, meaning the current account
@Override
public void renameNamespace(ConnectorSession session, String source, String target)
{
    throw new TrinoException(NOT_SUPPORTED, "renameNamespace is not supported by Iceberg Glue catalog");
Error message nit:
- throw new TrinoException(NOT_SUPPORTED, "renameNamespace is not supported by Iceberg Glue catalog");
+ throw new TrinoException(NOT_SUPPORTED, "renameNamespace is not supported for Iceberg Glue catalogs");
Superseded by #10845
Add support for reading Glue catalog data and creating Glue tables, based on the draft in #9646.
Also reorganized the files in the following ways:
- moved catalog classes into a catalog module
- changed TrinoCatalogFactory to an interface to work with dependency injection
- renamed AbstractMetastoreTableOperations to AbstractIcebergTableOperations to share it with the Glue implementation
- added iceberg.default-schema-location, as discussed in "Support Iceberg default warehouse location config" #9614
- added the test-iceberg-glue profile; the Glue test is not run by default because it requires an AWS setup. I have run the test to make sure all 280 tests pass.