Skip to content
Closed
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions .palantir/revapi.yml
Original file line number Diff line number Diff line change
Expand Up @@ -410,6 +410,14 @@ acceptedBreaks:
old: "field org.apache.iceberg.actions.RewriteDataFiles.MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT"
new: "field org.apache.iceberg.actions.RewriteDataFiles.MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT"
justification: "Update default to align with recommendation and allow more parallelism"
- code: "java.method.addedToInterface"
new: "method boolean org.apache.iceberg.catalog.SessionCatalog::dropNamespace(org.apache.iceberg.catalog.SessionCatalog.SessionContext,\
\ org.apache.iceberg.catalog.Namespace, boolean)"
justification: "extending api to add new method, not removing any public api"
- code: "java.method.addedToInterface"
new: "method boolean org.apache.iceberg.catalog.SupportsNamespaces::dropNamespace(org.apache.iceberg.catalog.Namespace,\
\ boolean) throws org.apache.iceberg.exceptions.NamespaceNotEmptyException"
justification: "extending api to add new method, not removing any public api"
- code: "java.method.addedToInterface"
new: "method java.lang.String org.apache.iceberg.view.ViewVersion::operation()"
justification: "Add operation API to view version"
Expand Down
11 changes: 11 additions & 0 deletions api/src/main/java/org/apache/iceberg/catalog/SessionCatalog.java
Original file line number Diff line number Diff line change
Expand Up @@ -309,6 +309,17 @@ default List<Namespace> listNamespaces(SessionContext context) {
*/
boolean dropNamespace(SessionContext context, Namespace namespace);

/**
* Drop a namespace. If the namespace exists and was dropped, this will return true.
*
* @param context session context
* @param namespace a {@link Namespace namespace}
* @param cascade – When true, deletes all objects under the namespace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a stricter definition IMO. In some instances below here you delete folders, sometimes you delete files, sometimes it deletes entries. We need a strong definition here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One big missing thing here is how this works with "purge"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have two options, drop_namespace with cascade will:

  1. drop the tables not the data (no purge)
  2. drop the tables and delete the data (purge)

We can standardize this in Iceberg and go with the preferred option, #1 sounds better as #2 can take a long time depending on number of tables and data in them. This behavior is not consistent across all catalogs, for example hive metastore cascade deletes all the data (purge) in tables where as others don't.

should I create a vote thread for this to decide on the preference or can we decide that in this PR? I can then update the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RussellSpitzer documented the current purge behavior in the PR description and docs.

Only Hive catalog purges the tables as that's the default behavior of hive when we pass the cascade=true parameter.

I can make it consistent across all catalogs and modify current hive catalog behavior to just drop table without purge similar to other catalogs. Let me know your thoughts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry I didn't get back to this but I do believe we probably need a discuss thread, or we can get feedback here but I think we need to hear from @jackye1995 and others who are maintaining other Catalog implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed implementation for other catalogs, will initiate a discussion once api change in this PR is merged

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glue by default does cascade drop if just call glue.deleteDatabase, but does not drop data in the table. So technically you can support that directly. But I am okay to publish another PR to support that, up to you.

For doing purge or not, I think the behavior of cascade is not clearly specified in Spark. It could also be argued as something that is catalog-specific, just like the behavior of Hive and Glue are different and it will be difficult to satisfy the other behavior on both sides.

This is related to the issue about different behaviors of list namespace that comes up recently. Maybe we should make a page documenting all catalogs and their different behaviors.

+1 to have a devlist discussion thread first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will start a devlist discussion and create a one pager with current state. Thanks @jackye1995

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackye1995 To confirm are you OK with adding this api to Iceberg? Or do you want keep it within engine implementations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I am okay with adding this to Iceberg API, otherwise there is no way to leverage catalog service level integration features. I would imagine people using REST catalog would like to just handle this behind the service.

* @return true if the namespace was dropped, false otherwise.
* @throws NamespaceNotEmptyException If the namespace is not empty
*/
boolean dropNamespace(SessionContext context, Namespace namespace, boolean cascade);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the default here as well? I think we do for any implementers of SessionCatalog

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.


/**
* Set a collection of properties on a namespace in the catalog.
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,16 @@ default List<Namespace> listNamespaces() {
*/
boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyException;

/**
* Drop a namespace. If the namespace exists and was dropped, this will return true.
*
* @param namespace a namespace. {@link Namespace}
* @param cascade – When true, deletes all objects under the namespace
* @return true if the namespace was dropped, false otherwise.
* @throws NamespaceNotEmptyException If the namespace is not empty
*/
boolean dropNamespace(Namespace namespace, boolean cascade) throws NamespaceNotEmptyException;

/**
* Set a collection of properties on a namespace in the catalog.
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.exceptions.AlreadyExistsException;
import org.apache.iceberg.exceptions.NamespaceNotEmptyException;
import org.apache.iceberg.exceptions.NoSuchTableException;
import org.apache.iceberg.exceptions.ValidationException;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
Expand Down Expand Up @@ -359,6 +360,44 @@ public void testDropNamespace() {
Assert.assertFalse("namespace must not exist", response.hasItem());
}

@Test
public void testDropNamespaceCascade() {
Namespace namespace = Namespace.of(genRandomName());
catalog.createNamespace(namespace);
TableIdentifier tableIdentifier = TableIdentifier.of(namespace, genRandomName());
catalog.createTable(tableIdentifier, SCHEMA);
catalog.dropNamespace(namespace, true);
GetItemResponse response =
dynamo.getItem(
GetItemRequest.builder()
.tableName(catalogTableName)
.key(DynamoDbCatalog.namespacePrimaryKey(namespace))
.build());
Assert.assertFalse("namespace must not exist", response.hasItem());
}

@Test
public void testDropNamespaceFalse() {
Namespace namespace = Namespace.of(genRandomName());
catalog.createNamespace(namespace);
TableIdentifier tableIdentifier = TableIdentifier.of(namespace, genRandomName());
catalog.createTable(tableIdentifier, SCHEMA);
AssertHelpers.assertThrows(
"Should fail to drop namespace is not empty" + namespace,
NamespaceNotEmptyException.class,
"Namespace dbname_drop is not empty. One or more tables exist.",
() -> {
catalog.dropNamespace(namespace, false);
});
GetItemResponse response =
dynamo.getItem(
GetItemRequest.builder()
.tableName(catalogTableName)
.key(DynamoDbCatalog.namespacePrimaryKey(namespace))
.build());
Assert.assertTrue(response.hasItem());
}

@Test
public void testRegisterTable() {
Namespace namespace = Namespace.of(genRandomName());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -284,6 +284,15 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept
}
}

@Override
public boolean dropNamespace(Namespace namespace, boolean cascade)
throws NamespaceNotEmptyException {
if (cascade) {
listTables(namespace).forEach(this::dropTable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i'm missing something here since I'm not sure how dropTable get's called here with a single arg. The issue like I mentioned above is this would be a purely metadata delete for Dynamo I think in this case. Is that the expected behavior we want?

Copy link
Contributor Author

@abmo-x abmo-x Apr 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropTable is overloaded and has two methods, 1 with just single arg TableIdentifier and another which takes the boolean purge arg, we are using the former one here where purge is false.

addressed the purge comment here (#7275 (comment)))

}
return dropNamespace(namespace);
}

@Override
public boolean setProperties(Namespace namespace, Map<String, String> properties)
throws NoSuchNamespaceException {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -560,6 +560,15 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept
return true;
}

@Override
public boolean dropNamespace(Namespace namespace, boolean cascade)
throws NamespaceNotEmptyException {
if (cascade) {
listTables(namespace).forEach(this::dropTable);
}
return dropNamespace(namespace);
}

@Override
public boolean setProperties(Namespace namespace, Map<String, String> properties)
throws NoSuchNamespaceException {
Expand Down
61 changes: 61 additions & 0 deletions aws/src/test/java/org/apache/iceberg/aws/glue/TestGlueCatalog.java
Original file line number Diff line number Diff line change
Expand Up @@ -552,6 +552,67 @@ public void testDropNamespaceThatContainsNonIcebergTable() {
.hasMessage("Cannot drop namespace db1 because it still contains non-Iceberg tables");
}

@Test
public void testDropNamespaceCascade() {
Table table =
Table.builder()
.databaseName("db1")
.name("t1")
.parameters(
ImmutableMap.of(
BaseMetastoreTableOperations.TABLE_TYPE_PROP,
BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE))
.build();
GetTablesResponse getTablesResponse = GetTablesResponse.builder().tableList(table).build();
GetTableResponse getTableResponse = GetTableResponse.builder().table(table).build();
Mockito.doReturn(getTablesResponse)
.doReturn(GetTablesResponse.builder().build())
.when(glue)
.getTables(Mockito.any(GetTablesRequest.class));
Mockito.doReturn(getTableResponse)
.doReturn(GetTableResponse.builder().build())
.when(glue)
.getTable(Mockito.any(GetTableRequest.class));
Mockito.doReturn(
GetDatabaseResponse.builder().database(Database.builder().name("db1").build()).build())
.when(glue)
.getDatabase(Mockito.any(GetDatabaseRequest.class));
Mockito.doReturn(DeleteDatabaseResponse.builder().build())
.when(glue)
.deleteDatabase(Mockito.any(DeleteDatabaseRequest.class));
Assert.assertTrue(glueCatalog.dropNamespace(Namespace.of("db1"), true));
Mockito.verify(glue).deleteTable(Mockito.any(DeleteTableRequest.class));
}

@Test
public void testDropNamespaceCascadeFalse() {
Mockito.doReturn(
GetTablesResponse.builder()
.tableList(
Table.builder()
.databaseName("db1")
.name("t1")
.parameters(
ImmutableMap.of(
BaseMetastoreTableOperations.TABLE_TYPE_PROP,
BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE))
.build())
.build())
.when(glue)
.getTables(Mockito.any(GetTablesRequest.class));
Mockito.doReturn(
GetDatabaseResponse.builder().database(Database.builder().name("db1").build()).build())
.when(glue)
.getDatabase(Mockito.any(GetDatabaseRequest.class));
Mockito.doReturn(DeleteDatabaseResponse.builder().build())
.when(glue)
.deleteDatabase(Mockito.any(DeleteDatabaseRequest.class));

Assertions.assertThatThrownBy(() -> glueCatalog.dropNamespace(Namespace.of("db1"), false))
.isInstanceOf(NamespaceNotEmptyException.class)
.hasMessage("Cannot drop namespace db1 because it still contains Iceberg tables");
}

@Test
public void testSetProperties() {
Map<String, String> parameters = Maps.newHashMap();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,16 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept
return BaseSessionCatalog.this.dropNamespace(context, namespace);
}

@Override
public boolean dropNamespace(Namespace namespace, boolean cascade)
throws NamespaceNotEmptyException {
if (cascade) {
return BaseSessionCatalog.this.dropNamespace(context, namespace, true);
} else {
return dropNamespace(namespace);
}
}

@Override
public boolean setProperties(Namespace namespace, Map<String, String> updates) {
return BaseSessionCatalog.this.updateNamespaceMetadata(
Expand Down
11 changes: 11 additions & 0 deletions core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
Original file line number Diff line number Diff line change
Expand Up @@ -349,6 +349,17 @@ public boolean dropNamespace(Namespace namespace) {
}
}

@Override
public boolean dropNamespace(Namespace namespace, boolean cascade)
throws NamespaceNotEmptyException {
if (cascade) {
// recursively delete all nested namespaces
listNamespaces(namespace).forEach(n -> dropNamespace(n, true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the other catalogs do not drop nested namespaces, do we need to cover this in other cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my testing and existing tests, not all catalogs support nested namespaces only the ones I covered here do. Example Hive doesn't support nested namespaces.

If I missed any, I can add those as well.

listTables(namespace).forEach(this::dropTable);
}
return dropNamespace(namespace);
}

@Override
public boolean setProperties(Namespace namespace, Map<String, String> properties) {
throw new UnsupportedOperationException(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,15 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept
return namespaces.remove(namespace) != null;
}

@Override
public boolean dropNamespace(Namespace namespace, boolean cascade)
throws NamespaceNotEmptyException {
if (cascade) {
listTables(namespace).forEach(this::dropTable);
}
return dropNamespace(namespace);
}

@Override
public boolean setProperties(Namespace namespace, Map<String, String> properties)
throws NoSuchNamespaceException {
Expand Down
11 changes: 11 additions & 0 deletions core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java
Original file line number Diff line number Diff line change
Expand Up @@ -419,6 +419,17 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept
return deletedRows > 0;
}

@Override
public boolean dropNamespace(Namespace namespace, boolean cascade)
throws NamespaceNotEmptyException {
if (cascade) {
// recursively delete all nested namespaces
listNamespaces(namespace).forEach(n -> dropNamespace(n, true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JDBC Supports nested namespaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so based on

listTables(namespace).forEach(this::dropTable);
}
return dropNamespace(namespace);
}

@Override
public boolean setProperties(Namespace namespace, Map<String, String> properties)
throws NoSuchNamespaceException {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,14 @@ public static void dropNamespace(SupportsNamespaces catalog, Namespace namespace
}
}

public static void dropNamespace(
SupportsNamespaces catalog, Namespace namespace, boolean cascade) {
boolean dropped = catalog.dropNamespace(namespace, cascade);
if (!dropped) {
throw new NoSuchNamespaceException("Namespace does not exist: %s", namespace);
}
}

public static UpdateNamespacePropertiesResponse updateNamespaceProperties(
SupportsNamespaces catalog, Namespace namespace, UpdateNamespacePropertiesRequest request) {
request.validate();
Expand Down
6 changes: 6 additions & 0 deletions core/src/main/java/org/apache/iceberg/rest/RESTCatalog.java
Original file line number Diff line number Diff line change
Expand Up @@ -228,6 +228,12 @@ public boolean dropNamespace(Namespace ns) throws NamespaceNotEmptyException {
return nsDelegate.dropNamespace(ns);
}

@Override
public boolean dropNamespace(Namespace namespace, boolean cascade)
throws NamespaceNotEmptyException {
return nsDelegate.dropNamespace(namespace, cascade);
}

@Override
public boolean setProperties(Namespace ns, Map<String, String> props)
throws NoSuchNamespaceException {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -444,17 +444,30 @@ public Map<String, String> loadNamespaceMetadata(SessionContext context, Namespa

@Override
public boolean dropNamespace(SessionContext context, Namespace ns) {
return dropNamespaceInternal(context, ns, false);
}

private boolean dropNamespaceInternal(SessionContext context, Namespace ns, boolean cascade) {
checkNamespaceIsValid(ns);

try {
client.delete(
paths.namespace(ns), null, headers(context), ErrorHandlers.namespaceErrorHandler());
paths.namespace(ns),
ImmutableMap.of("cascade", Boolean.toString(cascade)),
null,
headers(context),
ErrorHandlers.namespaceErrorHandler());
return true;
} catch (NoSuchNamespaceException e) {
return false;
}
}

@Override
public boolean dropNamespace(SessionContext context, Namespace namespace, boolean cascade) {
return dropNamespaceInternal(context, namespace, cascade);
}

@Override
public boolean updateNamespaceMetadata(
SessionContext context, Namespace ns, Map<String, String> updates, Set<String> removals) {
Expand Down
29 changes: 28 additions & 1 deletion core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@
import org.apache.iceberg.UpdateSchema;
import org.apache.iceberg.exceptions.AlreadyExistsException;
import org.apache.iceberg.exceptions.CommitFailedException;
import org.apache.iceberg.exceptions.NamespaceNotEmptyException;
import org.apache.iceberg.exceptions.NoSuchNamespaceException;
import org.apache.iceberg.exceptions.NoSuchTableException;
import org.apache.iceberg.expressions.Expressions;
Expand Down Expand Up @@ -363,8 +364,34 @@ public void testDropNamespace() {
catalog.createNamespace(NS);
Assert.assertTrue("Namespace should exist", catalog.namespaceExists(NS));

catalog.dropNamespace(NS, true);
boolean condition = catalog.namespaceExists(NS);
Assert.assertFalse(condition);
}

@Test
public void testDropNamespaceCascade() {
C catalog = catalog();

catalog.createNamespace(NS);
TableIdentifier ident = TableIdentifier.of("ns", "table");
Table table = catalog.buildTable(ident, SCHEMA).create();

Assert.assertTrue(
"Dropping an existing namespace should return true", catalog.dropNamespace(NS));
"Dropping an existing namespace should return true", catalog.dropNamespace(NS, true));
Assert.assertFalse("Namespace should not exist", catalog.namespaceExists(NS));
}

@Test
public void testDropNamespaceCascadeFalse() {
C catalog = catalog();

catalog.createNamespace(NS);
TableIdentifier ident = TableIdentifier.of("ns", "table");
Table table = catalog.buildTable(ident, SCHEMA).create();

Assertions.assertThatThrownBy(() -> catalog.dropNamespace(NS, false))
.isInstanceOf(NamespaceNotEmptyException.class);
Assert.assertFalse("Namespace should not exist", catalog.namespaceExists(NS));
}

Expand Down
Loading