-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Spark 3.3: drop_namespace with CASCADE support #7275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 10 commits
b41e4bf
f2a7ace
8441c74
00671d2
a4049d6
01de116
d13837a
da335fd
2c31a4f
db87d5a
aa63ee3
28b289b
c5dd8a5
5ad1e94
d4d9dc2
c75ea5c
4c05fcb
59bb5e3
e485616
4a04148
95d6676
f5219db
ac28327
187d378
e43de0e
3ba10f9
3e852ac
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -309,6 +309,17 @@ default List<Namespace> listNamespaces(SessionContext context) { | |
| */ | ||
| boolean dropNamespace(SessionContext context, Namespace namespace); | ||
|
|
||
| /** | ||
| * Drop a namespace. If the namespace exists and was dropped, this will return true. | ||
| * | ||
| * @param context session context | ||
| * @param namespace a {@link Namespace namespace} | ||
| * @param cascade – When true, deletes all objects under the namespace | ||
| * @return true if the namespace was dropped, false otherwise. | ||
| * @throws NamespaceNotEmptyException If the namespace is not empty | ||
| */ | ||
| boolean dropNamespace(SessionContext context, Namespace namespace, boolean cascade); | ||
|
||
|
|
||
| /** | ||
| * Set a collection of properties on a namespace in the catalog. | ||
| * | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -284,6 +284,15 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept | |
| } | ||
| } | ||
|
|
||
| @Override | ||
| public boolean dropNamespace(Namespace namespace, boolean cascade) | ||
| throws NamespaceNotEmptyException { | ||
| if (cascade) { | ||
| listTables(namespace).forEach(this::dropTable); | ||
|
||
| } | ||
| return dropNamespace(namespace); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean setProperties(Namespace namespace, Map<String, String> properties) | ||
| throws NoSuchNamespaceException { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -349,6 +349,17 @@ public boolean dropNamespace(Namespace namespace) { | |
| } | ||
| } | ||
|
|
||
| @Override | ||
| public boolean dropNamespace(Namespace namespace, boolean cascade) | ||
| throws NamespaceNotEmptyException { | ||
| if (cascade) { | ||
| // recursively delete all nested namespaces | ||
| listNamespaces(namespace).forEach(n -> dropNamespace(n, true)); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All of the other catalogs do not drop nested namespaces, do we need to cover this in other cases?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From my testing and existing tests, not all catalogs support nested namespaces only the ones I covered here do. Example Hive doesn't support nested namespaces. If I missed any, I can add those as well. |
||
| listTables(namespace).forEach(this::dropTable); | ||
| } | ||
| return dropNamespace(namespace); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean setProperties(Namespace namespace, Map<String, String> properties) { | ||
| throw new UnsupportedOperationException( | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -419,6 +419,17 @@ public boolean dropNamespace(Namespace namespace) throws NamespaceNotEmptyExcept | |||
| return deletedRows > 0; | ||||
| } | ||||
|
|
||||
| @Override | ||||
| public boolean dropNamespace(Namespace namespace, boolean cascade) | ||||
| throws NamespaceNotEmptyException { | ||||
| if (cascade) { | ||||
| // recursively delete all nested namespaces | ||||
| listNamespaces(namespace).forEach(n -> dropNamespace(n, true)); | ||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. JDBC Supports nested namespaces?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe so based on
|
||||
| listTables(namespace).forEach(this::dropTable); | ||||
| } | ||||
| return dropNamespace(namespace); | ||||
| } | ||||
|
|
||||
| @Override | ||||
| public boolean setProperties(Namespace namespace, Map<String, String> properties) | ||||
| throws NoSuchNamespaceException { | ||||
|
|
||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a stricter definition IMO. In some instances below here you delete folders, sometimes you delete files, sometimes it deletes entries. We need a strong definition here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One big missing thing here is how this works with "purge"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have two options, drop_namespace with cascade will:
We can standardize this in Iceberg and go with the preferred option, #1 sounds better as #2 can take a long time depending on number of tables and data in them. This behavior is not consistent across all catalogs, for example hive metastore cascade deletes all the data (purge) in tables where as others don't.
should I create a vote thread for this to decide on the preference or can we decide that in this PR? I can then update the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RussellSpitzer documented the current purge behavior in the PR description and docs.
Only Hive catalog purges the tables as that's the default behavior of hive when we pass the cascade=true parameter.
I can make it consistent across all catalogs and modify current hive catalog behavior to just drop table without purge similar to other catalogs. Let me know your thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry I didn't get back to this but I do believe we probably need a discuss thread, or we can get feedback here but I think we need to hear from @jackye1995 and others who are maintaining other Catalog implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed implementation for other catalogs, will initiate a discussion once api change in this PR is merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glue by default does cascade drop if just call
glue.deleteDatabase, but does not drop data in the table. So technically you can support that directly. But I am okay to publish another PR to support that, up to you.For doing purge or not, I think the behavior of cascade is not clearly specified in Spark. It could also be argued as something that is catalog-specific, just like the behavior of Hive and Glue are different and it will be difficult to satisfy the other behavior on both sides.
This is related to the issue about different behaviors of list namespace that comes up recently. Maybe we should make a page documenting all catalogs and their different behaviors.
+1 to have a devlist discussion thread first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will start a devlist discussion and create a one pager with current state. Thanks @jackye1995
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackye1995 To confirm are you OK with adding this api to Iceberg? Or do you want keep it within engine implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I am okay with adding this to Iceberg API, otherwise there is no way to leverage catalog service level integration features. I would imagine people using REST catalog would like to just handle this behind the service.