Merged
@@ -41,6 +41,7 @@
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.types.Types;
import org.apache.iceberg.types.Types.NestedField;
import org.junit.Assert;
import org.junit.Test;
import software.amazon.awssdk.services.glue.model.Column;
@@ -368,4 +369,54 @@ public void testColumnCommentsAndParameters() {
);
Assert.assertEquals("Columns do not match", expectedColumns, actualColumns);
}

@Test
public void testTablePropsDefinedAtCatalogLevel() {
String namespace = createNamespace();
String tableName = getRandomName();
TableIdentifier tableIdent = TableIdentifier.of(namespace, tableName);
ImmutableMap<String, String> catalogProps = ImmutableMap.of(
"table-default.key1", "catalog-default-key1",
"table-default.key2", "catalog-default-key2",
"table-default.key3", "catalog-default-key3",
"table-override.key3", "catalog-override-key3",
"table-override.key4", "catalog-override-key4",
"warehouse", "s3://" + testBucketName + "/" + testPathPrefix);

glueCatalog.initialize("glue", catalogProps);

Schema schema = new Schema(
NestedField.required(3, "id", Types.IntegerType.get(), "unique ID"),
NestedField.required(4, "data", Types.StringType.get())
);

org.apache.iceberg.Table table = glueCatalog.buildTable(tableIdent, schema)
.withProperty("key2", "table-key2")
.withProperty("key3", "table-key3")
.withProperty("key5", "table-key5")
.create();

Assert.assertEquals(
"Table defaults set for the catalog must be added to the table properties.",
"catalog-default-key1",
table.properties().get("key1"));
Assert.assertEquals(
"Table property must override table default properties set at catalog level.",
"table-key2",
table.properties().get("key2"));
Assert.assertEquals(
"Table property override set at catalog level must override table default" +
" properties set at catalog level and table property specified.",
"catalog-override-key3",
table.properties().get("key3"));
Assert.assertEquals(
"Table override not in table props or defaults should be added to table properties",
"catalog-override-key4",
table.properties().get("key4"));
Assert.assertEquals(
"Table properties without any catalog level default or override should be added to table" +
" properties.",
"table-key5",
table.properties().get("key5"));
}
Comment on lines +374 to +421
Contributor

Should we move this test to TestGlueCatalog instead? At present we don't run integration tests as part of our GA. WDYT?

Contributor

@rajarshisarkar rajarshisarkar May 16, 2022

Currently in TestGlueCatalog we mock objects of GlueCatalog and GlueClient. The class is mainly used for creating/listing/dropping/renaming namespace/tables by mocking the response. Also, it looks like we cannot mock S3 Client in S3FileIO directly from TestGlueCatalog. Do let me know your thoughts.

Contributor

I personally really don't like working with mocks, so I understand what you mean. I would consider adding some test that does get run through the CI test suite, to ensure that any changes that are made in the API are made against the Glue catalog (outside of compiler related errors).

It is true that the AWS integration tests aren't run on any schedule within the open source community (that I'm aware of), due to lack of infra.

That said, I wouldn't consider it a hard requirement given that HadoopCatalog and HiveCatalog both have tests. Maybe adding GlueCatalog to the existing CatalogTests suite would be beneficial in the long run? Something to consider.

Contributor

> I personally really don't like working with mocks, so I understand what you mean

Yes, one more limitation here with mocking the glue client is we cannot mock the glue table response with a path for the BaseMetastoreTableOperations.METADATA_LOCATION_PROP property (as it starts checking for the existence of the file while refreshing the metadata in BaseMetastoreTableOperations).

> I would consider adding some test that does get run through the CI test suite, to ensure that any changes that are made in the API are made against the Glue catalog

I have added a test which asserts that the properties API should not return an empty map from Glue Catalog when the catalog properties are initialised. The processing of the properties in BaseMetastoreCatalog is validated by HadoopCatalog and HiveCatalog. Meanwhile, TestGlueCatalogTable should check the actual behaviour for GlueCatalog.

> Maybe adding GlueCatalog to the existing CatalogTests suite would be beneficial in the long run? Something to consider.

Yes, I think adding GlueCatalog to the existing CatalogTests suite would be beneficial. I can publish a separate PR for the same.

Please let me know your thoughts.

Contributor

> Yes, one more limitation here with mocking the glue client is we cannot mock the glue table response with a path for the BaseMetastoreTableOperations.METADATA_LOCATION_PROP property (as it starts checking for the existence of the file while refreshing the metadata in BaseMetastoreTableOperations)

+1 understood. Wondering if we should comment this somewhere. I suppose a standalone comment on this PR is sufficient (or maybe a small mention of that in the extended message of the final squashed git PR summary?). But overall +1 understood. Mocks can't solve everything.

> I have added a test which asserts that the properties API should not return an empty map from Glue Catalog when the catalog properties are initialised. The processing of the properties in BaseMetastoreCatalog is validated by HadoopCatalog and HiveCatalog. Meanwhile, TestGlueCatalogTable should check the actual behaviour for GlueCatalog.

+1

> Yes, I think adding GlueCatalog to the existing CatalogTests suite would be beneficial. I can publish a separate PR for the same.

Major +1. That would be really beneficial, even if it involves some mocks (right now there's a backing catalog; for REST we use JDBC via the CatalogAdaptor).

Even just having some mocks there one time would make it easy for people who aren't super knowledgeable in Glue to add tests that hit multiple catalogs without knowing the details of all of them (to some degree or with review).

Definitely for a separate PR though =)

}
16 changes: 15 additions & 1 deletion aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java
@@ -21,6 +21,7 @@

import java.io.Closeable;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
@@ -92,7 +93,7 @@ public class GlueCatalog extends BaseMetastoreCatalog
private FileIO fileIO;
private LockManager lockManager;
private CloseableGroup closeableGroup;
- private Map<String, String> catalogProperties;
+ private Map<String, String> catalogProperties = Collections.emptyMap();
Contributor

Nit: Unless we need null values in the map, we almost always prefer shaded-guava's ImmutableMap.of().


// Attempt to set versionId if available on the path
private static final DynMethods.UnboundMethod SET_VERSION_ID = DynMethods.builder("versionId")
@@ -110,6 +111,7 @@ public GlueCatalog() {

@Override
public void initialize(String name, Map<String, String> properties) {
this.catalogProperties = properties;
Contributor

Nit: Might want to wrap this in ImmutableMap.copyOf(properties) (which is a no-op if the underlying map is an ImmutableMap). Though again, if you need null values then that won't work for you.
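
A minimal sketch of the defensive-copy idea in the comment above, written against the JDK only so that it runs without Iceberg or Guava on the classpath. `Map.copyOf` (Java 10+) is used here as a stand-in for shaded Guava's `ImmutableMap.copyOf`; like the Guava method, it rejects null keys and values. `DefensiveCopyExample` and `SketchCatalog` are hypothetical names, not classes from this PR.

```java
import java.util.HashMap;
import java.util.Map;

final class DefensiveCopyExample {
  private DefensiveCopyExample() {}

  // Hypothetical stand-in for a catalog that remembers its initialization properties.
  static final class SketchCatalog {
    private Map<String, String> catalogProperties = Map.of();

    void initialize(String name, Map<String, String> properties) {
      // Assumption: Map.copyOf plays the role of ImmutableMap.copyOf.
      // The catalog keeps its own unmodifiable snapshot of the caller's map.
      this.catalogProperties = Map.copyOf(properties);
    }

    Map<String, String> properties() {
      return catalogProperties;
    }
  }

  public static void main(String[] args) {
    Map<String, String> input = new HashMap<>();
    input.put("warehouse", "/tmp/warehouse");

    SketchCatalog catalog = new SketchCatalog();
    catalog.initialize("sketch", input);

    // Mutating the caller's map afterwards does not affect the stored snapshot.
    input.put("table-default.key1", "added-later");
    System.out.println(catalog.properties().containsKey("table-default.key1"));
  }
}
```

Storing the reference directly, as `this.catalogProperties = properties` does, would instead let later changes to the caller's map show up in `properties()`.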

AwsClientFactory awsClientFactory;
FileIO catalogFileIO;
if (PropertyUtil.propertyAsBoolean(
@@ -162,6 +164,13 @@ private FileIO initializeFileIO(Map<String, String> properties) {
}
}

@VisibleForTesting
void initialize(String name, String path, AwsProperties properties, GlueClient client,
Contributor

It isn't clear why this method is needed. @SinghAsDev, is this necessary?

Contributor

This shouldn't be necessary; it is only used in a test to check that the properties map is not empty when the catalog properties are initialised.

LockManager lock, FileIO io, Map<String, String> catalogProps) {
Contributor

Nit / non-blocking: The indentation for this seems a little weird to me.

If all of the properties are on the next line, will it fit? I do see conflicting things at times so it's not a huge deal, but I believe the preferred way would be something more like:

  @VisibleForTesting
  void initialize(String name, String path, AwsProperties properties, GlueClient client,
                          LockManager lock, FileIO io, Map<String, String> catalogProps) {

But again, I see different formats so without clarification, I'd either go with the above, a single line (the next line) if they all fit, or just leave it unless somebody states otherwise. I'll follow up for more clarification, but this is absolutely non-blocking.

Contributor

Agreed. The starting indent of all argument lines should match. That could be starting after initialize( or using a continuation indent on the following line. But the important thing is that all lines with arguments indent arguments at the same place.
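
The two layouts described in this comment can be illustrated as follows. This is only a formatting sketch: `ArgumentIndentExample` and both method names are hypothetical, and plain `String` parameters replace the real argument types so the snippet compiles on its own.

```java
final class ArgumentIndentExample {
  private ArgumentIndentExample() {}

  // Layout 1: wrapped arguments start in the same column as the first argument after "(".
  static String alignedWithParen(String name, String path, String lock,
                                 String io, String catalogProps) {
    return String.join("/", name, path, lock, io, catalogProps);
  }

  // Layout 2: every argument moves to the following line with a continuation indent.
  static String continuationIndent(
      String name, String path, String lock, String io, String catalogProps) {
    return String.join("/", name, path, lock, io, catalogProps);
  }
}
```

In both layouts every line that carries arguments begins its arguments in the same column, which is the property the reviewers ask for.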

this.catalogProperties = catalogProps;
initialize(name, path, properties, client, lock, io);
}

@VisibleForTesting
void initialize(String name, String path, AwsProperties properties, GlueClient client, LockManager lock, FileIO io) {
Preconditions.checkArgument(path != null && path.length() > 0,
@@ -495,4 +504,9 @@ public void close() throws IOException {
public void setConf(Configuration conf) {
this.hadoopConf = conf;
}

@Override
protected Map<String, String> properties() {
Contributor

[question] Looks like we are missing test cases for the Glue catalog?

Contributor

Agreed, we should have test cases for Glue catalog.

Contributor

I have added the tests for Glue catalog.

return catalogProperties;
}
}
24 changes: 24 additions & 0 deletions aws/src/test/java/org/apache/iceberg/aws/glue/TestGlueCatalog.java
@@ -455,4 +455,28 @@ public void testRemoveProperties() {
.when(glue).updateDatabase(Mockito.any(UpdateDatabaseRequest.class));
glueCatalog.removeProperties(Namespace.of("db1"), Sets.newHashSet("key"));
}

@Test
public void testTablePropsDefinedAtCatalogLevel() {
ImmutableMap<String, String> catalogProps = ImmutableMap.of(
"table-default.key1", "catalog-default-key1",
"table-default.key2", "catalog-default-key2",
"table-default.key3", "catalog-default-key3",
"table-override.key3", "catalog-override-key3",
"table-override.key4", "catalog-override-key4");
glueCatalog.initialize(CATALOG_NAME, WAREHOUSE_PATH, new AwsProperties(), glue,
LockManagers.defaultLockManager(), null, catalogProps);
Map<String, String> properties = glueCatalog.properties();
Assert.assertFalse(properties.isEmpty());
Assert.assertTrue(properties.containsKey("table-default.key1"));
Assert.assertEquals("catalog-default-key1", properties.get("table-default.key1"));
Assert.assertTrue(properties.containsKey("table-default.key2"));
Assert.assertEquals("catalog-default-key2", properties.get("table-default.key2"));
Assert.assertTrue(properties.containsKey("table-default.key3"));
Assert.assertEquals("catalog-default-key3", properties.get("table-default.key3"));
Assert.assertTrue(properties.containsKey("table-override.key3"));
Assert.assertEquals("catalog-override-key3", properties.get("table-override.key3"));
Assert.assertTrue(properties.containsKey("table-override.key4"));
Assert.assertEquals("catalog-override-key4", properties.get("table-override.key4"));
}
}
46 changes: 36 additions & 10 deletions core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java
@@ -19,6 +19,7 @@

package org.apache.iceberg;

import java.util.Collections;
import java.util.Map;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
@@ -27,7 +28,8 @@
import org.apache.iceberg.exceptions.NoSuchTableException;
import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
- import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+ import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+ import org.apache.iceberg.util.PropertyUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@@ -94,6 +96,10 @@ protected boolean isValidIdentifier(TableIdentifier tableIdentifier) {
return true;
}

protected Map<String, String> properties() {
return Collections.emptyMap();
Contributor

Nit: If it's empty, usually we use ImmutableMap.of(). I see you updated from ImmutableMap.builder so I assume you need null values, but I think even still we'd normally use ImmutableMap.of() in the empty case.

}

@Override
public String toString() {
return MoreObjects.toStringHelper(this).toString();
@@ -106,7 +112,7 @@ public String toString() {
protected class BaseMetastoreCatalogTableBuilder implements TableBuilder {
private final TableIdentifier identifier;
private final Schema schema;
- private final ImmutableMap.Builder<String, String> propertiesBuilder = ImmutableMap.builder();
+ private final Map<String, String> tableProperties = Maps.newHashMap();
Contributor

Is this needed for null-values? If so, can we ensure that we have tests that include null values? Otherwise it's likely to get refactored at some point.

Contributor

I don't think we will need null values.

Contributor

It's mainly because the map gets altered with tableProperties.putAll(tableDefaultProperties()); and tableProperties.putAll(tableOverrideProperties());
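
The layering that these two `putAll` calls produce can be sketched with plain JDK collections. This is an illustration under stated assumptions, not the Iceberg code: `TablePropertyLayering`, `withPrefix`, and `resolve` are hypothetical names, and `withPrefix` only approximates `PropertyUtil.propertiesWithPrefix` by keeping matching keys and stripping the prefix. Catalog defaults go in first, properties set on the builder replace them, and catalog overrides are applied last so they win.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class TablePropertyLayering {
  private TablePropertyLayering() {}

  // Hypothetical helper: keep entries whose key starts with the prefix, with the prefix removed.
  static Map<String, String> withPrefix(Map<String, String> props, String prefix) {
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> entry : props.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        result.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return result;
  }

  // A mutable map is needed because later layers replace keys written by earlier ones.
  static Map<String, String> resolve(Map<String, String> catalogProps, Map<String, String> tableProps) {
    Map<String, String> resolved = new HashMap<>();
    resolved.putAll(withPrefix(catalogProps, "table-default."));   // lowest precedence
    resolved.putAll(tableProps);                                   // set on the table builder
    resolved.putAll(withPrefix(catalogProps, "table-override."));  // highest precedence
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> catalogProps = Map.of(
        "table-default.key1", "catalog-default-key1",
        "table-default.key2", "catalog-default-key2",
        "table-default.key3", "catalog-default-key3",
        "table-override.key3", "catalog-override-key3",
        "table-override.key4", "catalog-override-key4",
        "warehouse", "/tmp/warehouse");
    Map<String, String> tableProps = Map.of(
        "key2", "table-key2",
        "key3", "table-key3",
        "key5", "table-key5");

    // TreeMap is used only to print the entries in a stable key order.
    System.out.println(new TreeMap<>(resolve(catalogProps, tableProps)));
  }
}
```

With the catalog and table properties used in the tests of this PR, `key3` resolves to the override value and `key2` to the value set on the table.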

private PartitionSpec spec = PartitionSpec.unpartitioned();
private SortOrder sortOrder = SortOrder.unsorted();
private String location = null;
@@ -116,6 +122,7 @@ public BaseMetastoreCatalogTableBuilder(TableIdentifier identifier, Schema schem

this.identifier = identifier;
this.schema = schema;
this.tableProperties.putAll(tableDefaultProperties());
}

@Override
@@ -139,14 +146,14 @@ public TableBuilder withLocation(String newLocation) {
@Override
public TableBuilder withProperties(Map<String, String> properties) {
if (properties != null) {
- propertiesBuilder.putAll(properties);
+ tableProperties.putAll(properties);
}
return this;
}

@Override
public TableBuilder withProperty(String key, String value) {
- propertiesBuilder.put(key, value);
+ tableProperties.put(key, value);
return this;
}

@@ -158,8 +165,8 @@ public Table create() {
}

String baseLocation = location != null ? location : defaultWarehouseLocation(identifier);
- Map<String, String> properties = propertiesBuilder.build();
- TableMetadata metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, properties);
+ tableProperties.putAll(tableOverrideProperties());
+ TableMetadata metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, tableProperties);

try {
ops.commit(null, metadata);
@@ -178,8 +185,8 @@ public Transaction createTransaction() {
}

String baseLocation = location != null ? location : defaultWarehouseLocation(identifier);
- Map<String, String> properties = propertiesBuilder.build();
- TableMetadata metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, properties);
+ tableProperties.putAll(tableOverrideProperties());
+ TableMetadata metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, tableProperties);
return Transactions.createTableTransaction(identifier.toString(), ops, metadata);
}

@@ -200,12 +207,13 @@ private Transaction newReplaceTableTransaction(boolean orCreate) {
}

TableMetadata metadata;
tableProperties.putAll(tableOverrideProperties());
if (ops.current() != null) {
String baseLocation = location != null ? location : ops.current().location();
- metadata = ops.current().buildReplacement(schema, spec, sortOrder, baseLocation, propertiesBuilder.build());
+ metadata = ops.current().buildReplacement(schema, spec, sortOrder, baseLocation, tableProperties);
} else {
String baseLocation = location != null ? location : defaultWarehouseLocation(identifier);
- metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, propertiesBuilder.build());
+ metadata = TableMetadata.newTableMetadata(schema, spec, sortOrder, baseLocation, tableProperties);
}

if (orCreate) {
@@ -214,6 +222,24 @@ private Transaction newReplaceTableTransaction(boolean orCreate) {
return Transactions.replaceTableTransaction(identifier.toString(), ops, metadata);
}
}

/**
* Get default table properties set at Catalog level through catalog properties.
*
* @return default table properties specified in catalog properties
*/
private Map<String, String> tableDefaultProperties() {
return PropertyUtil.propertiesWithPrefix(properties(), CatalogProperties.TABLE_DEFAULT_PREFIX);
}

/**
* Get table properties that are enforced at Catalog level through catalog properties.
*
* @return default table properties enforced through catalog properties
*/
private Map<String, String> tableOverrideProperties() {
return PropertyUtil.propertiesWithPrefix(properties(), CatalogProperties.TABLE_OVERRIDE_PREFIX);
}
}

protected static String fullTableName(String catalogName, TableIdentifier identifier) {
2 changes: 2 additions & 0 deletions core/src/main/java/org/apache/iceberg/CatalogProperties.java
@@ -29,6 +29,8 @@ private CatalogProperties() {
public static final String CATALOG_IMPL = "catalog-impl";
public static final String FILE_IO_IMPL = "io-impl";
public static final String WAREHOUSE_LOCATION = "warehouse";
public static final String TABLE_DEFAULT_PREFIX = "table-default.";
public static final String TABLE_OVERRIDE_PREFIX = "table-override.";
Comment on lines +32 to +33
Contributor

Nit / open-question: How do people feel about these names? I'd kind of like them to have properties or something in them, but this might be overkill.

Contributor

I'm good with this.


/**
* Controls whether the catalog will cache table entries upon load.
@@ -25,6 +25,7 @@
import java.io.UncheckedIOException;
import java.nio.file.AccessDeniedException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
@@ -93,12 +94,14 @@ public class HadoopCatalog extends BaseMetastoreCatalog implements Closeable, Su
private FileIO fileIO;
private LockManager lockManager;
private boolean suppressPermissionError = false;
private Map<String, String> catalogProperties = Collections.emptyMap();
Contributor

Nit / non-blocking : Does this need to be initialized? I'm assuming this is to suppress potential not-initialized errors?


public HadoopCatalog() {
}

@Override
public void initialize(String name, Map<String, String> properties) {
this.catalogProperties = properties;
String inputWarehouseLocation = properties.get(CatalogProperties.WAREHOUSE_LOCATION);
Preconditions.checkArgument(inputWarehouseLocation != null && inputWarehouseLocation.length() > 0,
"Cannot initialize HadoopCatalog because warehousePath must not be null or empty");
@@ -393,6 +396,11 @@ public Configuration getConf() {
return conf;
}

@Override
protected Map<String, String> properties() {
return catalogProperties;
Contributor

Nit: For here, if it will remove the not-initialized warning from catalogProperties, we often do the following:

return catalogProperties == null ? ImmutableMap.of() : catalogProperties;
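
A null-safe accessor of the kind this comment suggests returns an empty map while the field is still unset and the stored map afterwards. The sketch below is self-contained and uses the JDK's `Map.of()` as a stand-in for shaded Guava's `ImmutableMap.of()`; `NullSafePropertiesExample` and `SketchCatalog` are hypothetical names.

```java
import java.util.Map;

final class NullSafePropertiesExample {
  private NullSafePropertiesExample() {}

  // Hypothetical catalog-like holder; the field stays null until initialize(...) is called.
  static final class SketchCatalog {
    private Map<String, String> catalogProperties;

    void initialize(Map<String, String> properties) {
      this.catalogProperties = properties;
    }

    Map<String, String> properties() {
      // Assumption: Map.of() plays the role of ImmutableMap.of().
      // Callers never see null, even before initialization.
      return catalogProperties == null ? Map.of() : catalogProperties;
    }
  }

  public static void main(String[] args) {
    SketchCatalog catalog = new SketchCatalog();
    System.out.println(catalog.properties().isEmpty());

    catalog.initialize(Map.of("warehouse", "/tmp/warehouse"));
    System.out.println(catalog.properties().get("warehouse"));
  }
}
```

Initializing the field to an empty map at declaration, as this PR does, achieves the same effect without the check in the accessor.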

}

private class HadoopCatalogTableBuilder extends BaseMetastoreCatalogTableBuilder {
private final String defaultLocation;

@@ -88,7 +88,7 @@ public static Map<String, String> propertiesWithPrefix(
return ImmutableMap.of();
}

- Preconditions.checkArgument(prefix != null, "prefix can't be null.");
+ Preconditions.checkArgument(prefix != null, "Invalid prefix: null");

return properties.entrySet().stream()
.filter(e -> e.getKey().startsWith(prefix))
@@ -24,7 +24,9 @@
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPOutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogProperties;
@@ -179,10 +181,18 @@ void rewriteMetadataAsGzipWithOldExtension() throws IOException {
}

protected HadoopCatalog hadoopCatalog() throws IOException {
return hadoopCatalog(Collections.emptyMap());
}

protected HadoopCatalog hadoopCatalog(Map<String, String> catalogProperties) throws IOException {
HadoopCatalog hadoopCatalog = new HadoopCatalog();
hadoopCatalog.setConf(new Configuration());
- hadoopCatalog.initialize("hadoop",
- ImmutableMap.of(CatalogProperties.WAREHOUSE_LOCATION, temp.newFolder().getAbsolutePath()));
+ hadoopCatalog.initialize(
+ "hadoop",
+ ImmutableMap.<String, String>builder()
+ .putAll(catalogProperties)
+ .put(CatalogProperties.WAREHOUSE_LOCATION, temp.newFolder().getAbsolutePath())
+ .build());
return hadoopCatalog;
}
}
@@ -547,4 +547,46 @@ private static void addVersionsToTable(Table table) {
table.newAppend().appendFile(dataFile1).commit();
table.newAppend().appendFile(dataFile2).commit();
}

@Test
public void testTablePropsDefinedAtCatalogLevel() throws IOException {
TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl");
ImmutableMap<String, String> catalogProps = ImmutableMap.of(
"table-default.key1", "catalog-default-key1",
"table-default.key2", "catalog-default-key2",
"table-default.key3", "catalog-default-key3",
"table-override.key3", "catalog-override-key3",
"table-override.key4", "catalog-override-key4");

Table table = hadoopCatalog(catalogProps).buildTable(tableIdent, SCHEMA)
.withPartitionSpec(SPEC)
.withProperties(null)
.withProperty("key2", "table-key2")
.withProperty("key3", "table-key3")
.withProperty("key5", "table-key5")
.create();

Assert.assertEquals(
"Table defaults set for the catalog must be added to the table properties.",
"catalog-default-key1",
table.properties().get("key1"));
Assert.assertEquals(
"Table property must override table default properties set at catalog level.",
"table-key2",
table.properties().get("key2"));
Assert.assertEquals(
"Table property override set at catalog level must override table default" +
" properties set at catalog level and table property specified.",
"catalog-override-key3",
table.properties().get("key3"));
Assert.assertEquals(
"Table override not in table props or defaults should be added to table properties",
"catalog-override-key4",
table.properties().get("key4"));
Assert.assertEquals(
"Table properties without any catalog level default or override should be added to table" +
" properties.",
"table-key5",
table.properties().get("key5"));
}
}