-
Notifications
You must be signed in to change notification settings - Fork 3k
Allow table defaults to be configured and/ or enforced at catalog level using catalog properties. #4011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow table defaults to be configured and/ or enforced at catalog level using catalog properties. #4011
Changes from all commits
add90e0
4d26bb3
7eb19c5
46711e0
5c70c4d
f9e1ee4
ed1a3a6
6af5b51
a2fbf59
9fe9215
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,6 +21,7 @@ | |
|
|
||
| import java.io.Closeable; | ||
| import java.io.IOException; | ||
| import java.util.Collections; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import java.util.Set; | ||
|
|
@@ -92,7 +93,7 @@ public class GlueCatalog extends BaseMetastoreCatalog | |
| private FileIO fileIO; | ||
| private LockManager lockManager; | ||
| private CloseableGroup closeableGroup; | ||
| private Map<String, String> catalogProperties; | ||
| private Map<String, String> catalogProperties = Collections.emptyMap(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: Unless we need null values in the map, we almost always prefer shaded-guava's |
||
|
|
||
| // Attempt to set versionId if available on the path | ||
| private static final DynMethods.UnboundMethod SET_VERSION_ID = DynMethods.builder("versionId") | ||
|
|
@@ -110,6 +111,7 @@ public GlueCatalog() { | |
|
|
||
| @Override | ||
| public void initialize(String name, Map<String, String> properties) { | ||
| this.catalogProperties = properties; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: Might want to wrapt his in |
||
| AwsClientFactory awsClientFactory; | ||
| FileIO catalogFileIO; | ||
| if (PropertyUtil.propertyAsBoolean( | ||
|
|
@@ -162,6 +164,13 @@ private FileIO initializeFileIO(Map<String, String> properties) { | |
| } | ||
| } | ||
|
|
||
| @VisibleForTesting | ||
| void initialize(String name, String path, AwsProperties properties, GlueClient client, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It isn't clear why this method is needed. @SinghAsDev, is this necessary?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This shouldn't be necessary, it is just used in test to check properties map is not empty when catalog properties is initialised. |
||
| LockManager lock, FileIO io, Map<String, String> catalogProps) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit / non-blocking: The indentation for this seems a little weird to me. If all of the properties are on the next line, will it fit? I do see conflicting things at times so it's not a huge deal, but I believe the preferred way would be something more like: @VisibleForTesting
void initialize(String name, String path, AwsProperties properties, GlueClient client,
LockManager lock, FileIO io, Map<String, String> catalogProps) {But again, I see different formats so without clarification, I'd either go with the above, a single line (the next line) if they all fit, or just leave it unless somebody states otherwise. I'll follow up for more clarification, but this is absolutley non-blocking.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. The starting indent of all argument lines should match. That could be starting after |
||
| this.catalogProperties = catalogProps; | ||
| initialize(name, path, properties, client, lock, io); | ||
| } | ||
|
|
||
| @VisibleForTesting | ||
| void initialize(String name, String path, AwsProperties properties, GlueClient client, LockManager lock, FileIO io) { | ||
| Preconditions.checkArgument(path != null && path.length() > 0, | ||
|
|
@@ -495,4 +504,9 @@ public void close() throws IOException { | |
| public void setConf(Configuration conf) { | ||
| this.hadoopConf = conf; | ||
| } | ||
|
|
||
| @Override | ||
| protected Map<String, String> properties() { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [question] Looks like we are missing test cases for Glue catalog ?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed, we should have test cases for Glue catalog.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have added the tests for Glue catalog. |
||
| return catalogProperties; | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,6 +19,7 @@ | |
|
|
||
| package org.apache.iceberg; | ||
|
|
||
| import java.util.Collections; | ||
| import java.util.Map; | ||
| import org.apache.iceberg.catalog.Catalog; | ||
| import org.apache.iceberg.catalog.TableIdentifier; | ||
|
|
@@ -27,7 +28,8 @@ | |
| import org.apache.iceberg.exceptions.NoSuchTableException; | ||
| import org.apache.iceberg.relocated.com.google.common.base.MoreObjects; | ||
| import org.apache.iceberg.relocated.com.google.common.base.Preconditions; | ||
| import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; | ||
| import org.apache.iceberg.relocated.com.google.common.collect.Maps; | ||
| import org.apache.iceberg.util.PropertyUtil; | ||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
|
|
@@ -94,6 +96,10 @@ protected boolean isValidIdentifier(TableIdentifier tableIdentifier) { | |
| return true; | ||
| } | ||
|
|
||
| protected Map<String, String> properties() { | ||
| return Collections.emptyMap(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: If it's empty, usually we use |
||
| } | ||
|
|
||
| @Override | ||
| public String toString() { | ||
| return MoreObjects.toStringHelper(this).toString(); | ||
|
|
@@ -106,7 +112,7 @@ public String toString() { | |
| protected class BaseMetastoreCatalogTableBuilder implements TableBuilder { | ||
| private final TableIdentifier identifier; | ||
| private final Schema schema; | ||
| private final ImmutableMap.Builder<String, String> propertiesBuilder = ImmutableMap.builder(); | ||
| private final Map<String, String> tableProperties = Maps.newHashMap(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this needed for null-values? If so, can we ensure that we have tests that include null values? Otherwise it's likely to get refactored at some point.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we will need null values.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's mainly because the map gets altered with |
||
| private PartitionSpec spec = PartitionSpec.unpartitioned(); | ||
| private SortOrder sortOrder = SortOrder.unsorted(); | ||
| private String location = null; | ||
|
|
@@ -116,6 +122,7 @@ public BaseMetastoreCatalogTableBuilder(TableIdentifier identifier, Schema schem | |
|
|
||
| this.identifier = identifier; | ||
| this.schema = schema; | ||
| this.tableProperties.putAll(tableDefaultProperties()); | ||
| } | ||
|
|
||
| @Override | ||
|
|
@@ -139,14 +146,14 @@ public TableBuilder withLocation(String newLocation) { | |
| @Override | ||
| public TableBuilder withProperties(Map<String, String> properties) { | ||
| if (properties != null) { | ||
| propertiesBuilder.putAll(properties); | ||
| tableProperties.putAll(properties); | ||
| } | ||
| return this; | ||
| } | ||
|
|
||
| @Override | ||
| public TableBuilder withProperty(String key, String value) { | ||
| propertiesBuilder.put(key, value); | ||
| tableProperties.put(key, value); | ||
| return this; | ||
| } | ||
|
|
||
|
|
@@ -158,8 +165,8 @@ public Table create() { | |
| } | ||
|
|
||
| String baseLocation = location != null ? location : defaultWarehouseLocation(identifier); | ||
| Map<String, String> properties = propertiesBuilder.build(); | ||
| TableMetadata metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, properties); | ||
| tableProperties.putAll(tableOverrideProperties()); | ||
| TableMetadata metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, tableProperties); | ||
|
|
||
| try { | ||
| ops.commit(null, metadata); | ||
|
|
@@ -178,8 +185,8 @@ public Transaction createTransaction() { | |
| } | ||
|
|
||
| String baseLocation = location != null ? location : defaultWarehouseLocation(identifier); | ||
| Map<String, String> properties = propertiesBuilder.build(); | ||
| TableMetadata metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, properties); | ||
| tableProperties.putAll(tableOverrideProperties()); | ||
| TableMetadata metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, tableProperties); | ||
| return Transactions.createTableTransaction(identifier.toString(), ops, metadata); | ||
| } | ||
|
|
||
|
|
@@ -200,12 +207,13 @@ private Transaction newReplaceTableTransaction(boolean orCreate) { | |
| } | ||
|
|
||
| TableMetadata metadata; | ||
| tableProperties.putAll(tableOverrideProperties()); | ||
| if (ops.current() != null) { | ||
| String baseLocation = location != null ? location : ops.current().location(); | ||
| metadata = ops.current().buildReplacement(schema, spec, sortOrder, baseLocation, propertiesBuilder.build()); | ||
| metadata = ops.current().buildReplacement(schema, spec, sortOrder, baseLocation, tableProperties); | ||
| } else { | ||
| String baseLocation = location != null ? location : defaultWarehouseLocation(identifier); | ||
| metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, propertiesBuilder.build()); | ||
| metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, tableProperties); | ||
| } | ||
|
|
||
| if (orCreate) { | ||
|
|
@@ -214,6 +222,24 @@ private Transaction newReplaceTableTransaction(boolean orCreate) { | |
| return Transactions.replaceTableTransaction(identifier.toString(), ops, metadata); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Get default table properties set at Catalog level through catalog properties. | ||
| * | ||
| * @return default table properties specified in catalog properties | ||
| */ | ||
| private Map<String, String> tableDefaultProperties() { | ||
| return PropertyUtil.propertiesWithPrefix(properties(), CatalogProperties.TABLE_DEFAULT_PREFIX); | ||
| } | ||
|
|
||
| /** | ||
| * Get table properties that are enforced at Catalog level through catalog properties. | ||
| * | ||
| * @return default table properties enforced through catalog properties | ||
| */ | ||
| private Map<String, String> tableOverrideProperties() { | ||
| return PropertyUtil.propertiesWithPrefix(properties(), CatalogProperties.TABLE_OVERRIDE_PREFIX); | ||
| } | ||
| } | ||
|
|
||
| protected static String fullTableName(String catalogName, TableIdentifier identifier) { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,6 +29,8 @@ private CatalogProperties() { | |
| public static final String CATALOG_IMPL = "catalog-impl"; | ||
| public static final String FILE_IO_IMPL = "io-impl"; | ||
| public static final String WAREHOUSE_LOCATION = "warehouse"; | ||
| public static final String TABLE_DEFAULT_PREFIX = "table-default."; | ||
| public static final String TABLE_OVERRIDE_PREFIX = "table-override."; | ||
|
Comment on lines
+32
to
+33
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit / open-question: How do people feel about these names? I'd kind of like them to have
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm good with this. |
||
|
|
||
| /** | ||
| * Controls whether the catalog will cache table entries upon load. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,6 +25,7 @@ | |
| import java.io.UncheckedIOException; | ||
| import java.nio.file.AccessDeniedException; | ||
| import java.util.Arrays; | ||
| import java.util.Collections; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import java.util.Set; | ||
|
|
@@ -93,12 +94,14 @@ public class HadoopCatalog extends BaseMetastoreCatalog implements Closeable, Su | |
| private FileIO fileIO; | ||
| private LockManager lockManager; | ||
| private boolean suppressPermissionError = false; | ||
| private Map<String, String> catalogProperties = Collections.emptyMap(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit / non-blocking : Does this need to be initialized? I'm assuming this is to suppress potential not-initialized errors? |
||
|
|
||
| public HadoopCatalog() { | ||
| } | ||
|
|
||
| @Override | ||
| public void initialize(String name, Map<String, String> properties) { | ||
| this.catalogProperties = properties; | ||
| String inputWarehouseLocation = properties.get(CatalogProperties.WAREHOUSE_LOCATION); | ||
| Preconditions.checkArgument(inputWarehouseLocation != null && inputWarehouseLocation.length() > 0, | ||
| "Cannot initialize HadoopCatalog because warehousePath must not be null or empty"); | ||
|
|
@@ -393,6 +396,11 @@ public Configuration getConf() { | |
| return conf; | ||
| } | ||
|
|
||
| @Override | ||
| protected Map<String, String> properties() { | ||
| return catalogProperties; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: For here, _if it will remove the not-initialized warning from return catalogProperies != null ? ImmutableMap.of() : catalogProperties; |
||
| } | ||
|
|
||
| private class HadoopCatalogTableBuilder extends BaseMetastoreCatalogTableBuilder { | ||
| private final String defaultLocation; | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move this test to TestGlueCatalog instead, as at present we don't run integ test as part of our GA. WDYT ?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently in
TestGlueCatalogwe mock objects ofGlueCatalogandGlueClient. The class is mainly used for creating/listing/dropping/renaming namespace/tables by mocking the response. Also, it looks like we cannot mock S3 Client inS3FileIOdirectly fromTestGlueCatalog. Do let me know your thoughts.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally really don't like working with mocks, so I understand what you mean. I would consider adding some test that does get run through the CI test suite, to ensure that any changes that are made in the API are made against the Glue catalog (outside of compiler related errors).
As it is true that the AWS integration tests aren't run on any schedule within the open source community (that I'm aware of) due to lack of infra.
That said, I wouldn't consider it a hard requirement given that HadoopCatalog and HiveCatalog both have tests. Maybe adding GlueCatalog to the existing CatalogTests suite would be beneficial in the long run? Something to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, one more limitation here with mocking the glue client is we cannot mock the glue table response with a path for the
BaseMetastoreTableOperations.METADATA_LOCATION_PROPproperty (as it starts checking for the existence of the file while refreshing the metadata inBaseMetastoreTableOperations).I have added a test which asserts that the properties API should not return an empty map from Glue Catalog when the catalog properties are initialised. The processing of the properties in
BaseMetastoreCatalogis validated byHadoopCatalogandHiveCatalog. Meanwhile,TestGlueCatalogTableshould check the actual behaviour for GlueCatalog.Yes, I think adding GlueCatalog to the existing CatalogTests suite would be beneficial. I can publish a separate PR for the same.
Please let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 understood. Wondering if we should comment this somewhere. I suppose a standalone comment on this PR is sufficient (or maybe in the final squashed git PR summarty a small mention of that in the extended message?). But overall +1 understood. Mocks can't solve everything.
+1
Major +1. That would be really beneficial. Even if it involves some mocks (right now there's backing catalog, which for REST we use JDBC via the
CatalogAdaptor.Even just having some mocks there one time would make it easy for people who aren't super knowledgable in Glue to add tests that hit multiple catalogs without knowing the details of all of them (to some degree or with review).
Definitely for a separate PR though =)