feat(plugin-iceberg): Add DDL statements to drop branches and tags#23614
Conversation
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff 367b7d7...84553ba.
|
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull branch, local docs build, looks good. Thanks!
3fe4d56 to
ee54cd3
Compare
There was a problem hiding this comment.
Some high level feedback. Most of the code looks good. I will take a closer look on 2nd pass.
Also, I wanted to bring a thought I had about parser extensions. Since many connectors will not support this branching and tagging, I was thinking that maybe we ought to consider designing a SQL syntax plugin extension interface. Spark allows custom syntax extensions through implementing some set of interfaces or bringing your own parser. The upstream iceberg project now maintains their own spark SQL syntax extensions.
I'm not proposing we need that for this PR, but maybe it's something we should start thinking about if connectors start adding more radically different features that would be best left to some syntax extensions/optional plugins, especially for things outside the SQL specification.
| } | ||
|
|
||
| @Override | ||
| public void checkCanDropTag(ConnectorTransactionHandle transaction, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName) |
There was a problem hiding this comment.
I wonder about the granularity of these methods. In other implementation (e.g. spark?) at what granularity do they enforce the ability to do CRUD operations on tags and branches?
I'm thinking about a few cases
- A group or user(s) can only access a certain set of branches or tags
- A group or user(s) can only create branches starting from a specific branch
- A group or user(s) can create tags
I know we're only implementing DROP but I want to understand the whole story for access control around branches and tags. Would we ever need to pass the branch/tag to these methods?
There was a problem hiding this comment.
I think the granularity of access control methods in Spark for CRUD operations on Iceberg tags and branches is limited by Spark's integration with external systems (such as file systems, catalogs, and security frameworks like Apache Ranger).
For example, Ranger policies can define access controls at the table level, which could be extended to manage specific branches or tag-based access.
And like for cloud-based catalogs like AWS Glue, you can control access to Iceberg metadata (branches and tags) via IAM policies that grant or restrict specific operations.
There was a problem hiding this comment.
Thanks for the info. Would the parameters passed here as context have enough information for us to act at a similar granularity? I don't see anything in the method parameters that contains the branch name which I assume we would need to perform access control at a similar level.
There was a problem hiding this comment.
I would like to discuss around this, Systems like Ranger can define access controls at the table level, column level. So in this case I think access of drop branch & tags could be table based. As per I can think of branch & tag level policies then has to be maintained on engine side if we introduce branch name / tag name in here?
There was a problem hiding this comment.
@tdcmeehan What do you think about access control for branches and tags? Would be based on the parent table itself or based on the tags/branches?
My thinking was that since no policies are enforced based on branch/tags via security frameworks, should this honor the same access policies as table? Or if we don't even need access control exposed for dropTag & dropBranch?
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Show resolved
Hide resolved
8f4d3fa to
1e47c05
Compare
|
@ZacBlanco Can you please take another pass? |
hantangwangd
left a comment
There was a problem hiding this comment.
The whole change overall looks good to me. Some little nits, and one problem for discussing about the behavior of if exists on branch and tag.
presto-parser/src/main/java/com/facebook/presto/sql/tree/DropBranch.java
Outdated
Show resolved
Hide resolved
presto-parser/src/main/java/com/facebook/presto/sql/tree/DropBranch.java
Outdated
Show resolved
Hide resolved
presto-parser/src/main/java/com/facebook/presto/sql/tree/DropBranch.java
Outdated
Show resolved
Hide resolved
presto-parser/src/main/java/com/facebook/presto/sql/tree/DropTag.java
Outdated
Show resolved
Hide resolved
presto-parser/src/main/java/com/facebook/presto/sql/tree/DropTag.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/DropTagTask.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Outdated
Show resolved
Hide resolved
0abecd5 to
5af0a0d
Compare
|
@hantangwangd Thanks for you review. I have addressed your comments, please review. |
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks for the fix. LGTM!
|
Hi @tdcmeehan, This is now waiting for the final committer review. Please review whenever you have time. |
pratyakshsharma
left a comment
There was a problem hiding this comment.
Thank you for the PR, minor comments, and rest looks good.
|
|
||
| Optional<MaterializedViewDefinition> optionalMaterializedView = metadata.getMetadataResolver(session).getMaterializedView(tableName); | ||
| if (optionalMaterializedView.isPresent()) { | ||
| if (!statement.isTableExists()) { |
There was a problem hiding this comment.
should we throw an error in cases when if exists is present in the query and it is a materialized view? If drop branch is just not supported for MVs, ideally error should be thrown irrespective of exists check?
There was a problem hiding this comment.
Alternate way of handling this can be introducing an enum Type in DropBranch and have only TABLE as the supported type.
| return immediateFuture(null); | ||
| } | ||
|
|
||
| ConnectorId connectorId = metadata.getCatalogHandle(session, tableName.getCatalogName()) |
There was a problem hiding this comment.
nit: the variable connectorId is not getting used and can be removed. Also I have refactored and extracted this piece of code in MetadataUtil class in this commit - dec2952.
So we can probably do a similar change here as well so that all the Task classes use the same methods?i
| return immediateFuture(null); | ||
| } | ||
|
|
||
| ConnectorId connectorId = metadata.getCatalogHandle(session, tableName.getCatalogName()) |
There was a problem hiding this comment.
nit: same as above.
|
|
||
| Optional<MaterializedViewDefinition> optionalMaterializedView = metadata.getMetadataResolver(session).getMaterializedView(tableName); | ||
| if (optionalMaterializedView.isPresent()) { | ||
| if (!statement.isTableExists()) { |
There was a problem hiding this comment.
nit: same as above. What about the case when if exists is present and it is MV?
260b490
5af0a0d to
260b490
Compare
|
Thanks for the release note! Nit on phrasing to follow the Order of changes in the Release Notes Guidelines. |
260b490 to
e333bb1
Compare
e333bb1 to
dbabe20
Compare
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks @agrawalreetika, LGTM for this change. +1 to @ZacBlanco's point about the extension mechanism for connector-specific SQL syntax - maybe we should start looking into this.
dbabe20 to
10d47a5
Compare
9522908 to
d91e3ae
Compare
d91e3ae to
84553ba
Compare
|
I'm comfortable with the idea of this change. While we should be cautious around the standard eventually adopting this type of syntax, the pace at which is the standard moves is far slower than even the Iceberg spec moves. It is too slow for us to block progress on this PR. I agree with the idea of customizing the parser and planning in general though, along the same lines as Spark. However, given we have already broken standards compliance with time travel (on |
hantangwangd
left a comment
There was a problem hiding this comment.
@agrawalreetika thanks for this feature, lgtm! Just a heads-up: the two entries under Iceberg Connector Changes appear to be duplicates.
Description
Add DDL statements to drop branches and tags
Motivation and Context
Resolves #22028
Impact
Resolves #22028
SQL support for dropping a branch from a table :
SQL support for dropping a tag from a table :
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.