Disallow querying iceberg tables in hive #10441
There was a problem hiding this comment.
I found the same model also in the testHideDeltaLakeTables() and I am wondering whether it makes sense to do the skipping at the end of the method.
Can anyone explain the reasoning behind this ?
There was a problem hiding this comment.
the assertion is so we do not forget to "uncomment" the test when we fix the tested code.
But actually why is this test supposed to fail?
There was a problem hiding this comment.
seems testDisallowQueryingOfIcebergTables is expected to pass, so why the override?
There was a problem hiding this comment.
I'm not sure whether the strategy of using hive.hide-xxx-tables properties is that good.
Would it make sense to rather offer a generic property to hide any non-hive tables?
8b30733 to d679601
There was a problem hiding this comment.
nit: propertiesTableName. Or just inline.
losipiuk left a comment
There was a problem hiding this comment.
First and last commit LGTM (can you extract separate PR with those?)
Or two separate PRs?
While I understand the need for this change, shouldn't this PR - #10173 - be a better solution for production use?
@sajjoseph I'd say that the redirect usage is a different use case than what this PR is trying to solve.
I've created a separate PR for disallowing the query of
plugin/trino-hive/src/main/java/io/trino/plugin/hive/PropertiesSystemTableProvider.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/PropertiesSystemTableProvider.java
As @sajjoseph (#10441 (comment)) and @findepi (#10441 (comment)) pointed out, the iceberg redirects are a better option to deal with iceberg tables from shared hive metastores.
@findinpath I think there is a misunderstanding about my comment #10441 (comment). We still want to properly reject Iceberg tables when redirects are not enabled, #8693
d679601 to b87a079
There was a problem hiding this comment.
Throw "Cannot query Iceberg table" instead of making the system throw "table not found".
There was a problem hiding this comment.
Unfortunately this doesn't work well with the concept of redirection recently implemented in hive.
If we throw an exception while trying to get the table handle in Hive, the workflow the analyzer uses for getting the table handle will stop abruptly before checking for redirections.
See https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java#L1517 for reference
The current logic first checks whether we're dealing with a materialized view, then with a view and only at the end with a table (that may contain redirections).
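To make the ordering concern concrete, here is a minimal sketch of a lookup that checks views before table redirections. It is not the actual StatementAnalyzer code: the class `LookupOrderSketch`, its methods, the toy catalog, and the redirection target are all hypothetical names invented for illustration. The materialized view step is omitted because it behaves the same way as the view step here.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the lookup order described above; none of these names are Trino APIs.
final class LookupOrderSketch {
    // Toy metastore: table name -> storage format (assumed values).
    static final Map<String, String> TABLES = Map.of("orders", "HIVE", "events", "ICEBERG");
    // Toy redirection map: Hive table name -> target table in another catalog (assumed).
    static final Map<String, String> REDIRECTIONS = Map.of("events", "iceberg.default.events");

    // Variant A: the view lookup fails eagerly when it meets an Iceberg table.
    static Optional<String> getViewThrowing(String name) {
        if ("ICEBERG".equals(TABLES.get(name))) {
            throw new IllegalStateException("Cannot query Iceberg table: " + name);
        }
        return Optional.empty();
    }

    // Variant B: an Iceberg table is simply "not a view".
    static Optional<String> getViewLenient(String name) {
        return Optional.empty();
    }

    // Resolution order: view first, then table redirection, then the plain table.
    static String resolve(String name, boolean throwingViewLookup) {
        Optional<String> view = throwingViewLookup ? getViewThrowing(name) : getViewLenient(name);
        if (view.isPresent()) {
            return "view:" + view.get();
        }
        String target = REDIRECTIONS.get(name);
        if (target != null) {
            return "redirected:" + target;
        }
        return TABLES.containsKey(name) ? "table:" + name : "not found";
    }
}
```

With variant B, `resolve("events", false)` reaches the redirection step. With variant A, `resolve("events", true)` throws before the redirection map is ever consulted, which is the failure mode described above.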
There was a problem hiding this comment.
If we throw an exception while trying to get the table handle in Hive, the workflow the analyzer uses for getting the table handle will stop abruptly before checking for redirections.
the redirections should be consulted first.
There was a problem hiding this comment.
The current logic first checks whether we're dealing with a materialized view, then with a view and only at the end with a table (that may contain redirections).
An iceberg table is not a materialized view, nor a view, so get[Materialized]View should return Optional.empty for these calls.
There was a problem hiding this comment.
At the moment, checking whether we're dealing with a materialized view or a view subsequently calls HiveMetadata#getTableHandle. If that method threw for Iceberg tables, the check would end in an exception instead of returning Optional.empty (which is what happens for Delta Lake tables).
There was a problem hiding this comment.
I'd avoid changing the logic of the StatementAnalyzer in this case just to retrieve the table redirections first.
For this reason, I have opted to partly replicate the logic of the HiveMetadata#getTableHandle method in the PartitionsSystemTableProvider, which lets me do fine-grained handling when dealing with an Iceberg/Delta Lake table.
One drawback of this approach is that the logic is duplicated in HiveMetadata and in PartitionsSystemTableProvider.
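A rough sketch of that fine-grained handling is shown below. It is not the code from this PR: `SystemTableSketch`, `Table`, and `getPartitionsSystemTable` are hypothetical names, and the metastore parameter keys used to detect Iceberg (`table_type`) and Delta Lake (`spark.sql.sources.provider`) tables are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a system table provider; not the PartitionsSystemTableProvider from the PR.
final class SystemTableSketch {
    // Minimal stand-in for a metastore table: a name plus its parameters.
    static final class Table {
        final String name;
        final Map<String, String> parameters;

        Table(String name, Map<String, String> parameters) {
            this.name = name;
            this.parameters = parameters;
        }
    }

    // Assumption: Iceberg tables carry table_type=ICEBERG in their parameters.
    static boolean isIcebergTable(Table table) {
        return "ICEBERG".equalsIgnoreCase(table.parameters.get("table_type"));
    }

    // Assumption: Delta Lake tables carry spark.sql.sources.provider=delta.
    static boolean isDeltaLakeTable(Table table) {
        return "delta".equalsIgnoreCase(table.parameters.get("spark.sql.sources.provider"));
    }

    // Returns empty (surfacing as "table not found") instead of throwing, so a caller
    // that probes for views or system tables can carry on to later resolution steps.
    static Optional<String> getPartitionsSystemTable(Optional<Table> sourceTable) {
        if (sourceTable.isEmpty()) {
            return Optional.empty(); // source table does not exist
        }
        Table table = sourceTable.get();
        if (isIcebergTable(table) || isDeltaLakeTable(table)) {
            return Optional.empty(); // Hive connector cannot read these formats
        }
        return Optional.of(table.name + "$partitions");
    }
}
```

The design choice here is to answer "no such system table" rather than fail, leaving any user-facing error about the unsupported format to the code path that resolves the base table.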
There was a problem hiding this comment.
Changing the logic of the StatementAnalyzer is something I'd avoid to do in this case for retrieving first the table redirections.
Agreed, this is out of the question.
There was a problem hiding this comment.
why disabled?
add assertThatThrownBy(super:: ...
There was a problem hiding this comment.
While trying to run the test I receive
2022-01-11T03:30:02.708-0600 INFO Copying resource dir 'spark_bucketed_nation' to /var/folders/1q/y42hmc4s3yl38kp0t0q142_c0000gn/T/TestHiveInMemoryMetastore7495715074983273908
java.lang.IllegalArgumentException: Table directory does not exist
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
at io.trino.plugin.hive.metastore.thrift.InMemoryThriftMetastore.createTable(InMemoryThriftMetastore.java:190)
at io.trino.plugin.hive.metastore.thrift.BridgingHiveMetastore.createTable(BridgingHiveMetastore.java:205)
at io.trino.plugin.hive.AbstractTestHive.testDisallowQueryingOfIcebergTables(AbstractTestHive.java:2978)
at io.trino.plugin.hive.TestHiveInMemoryMetastore.testDisallowQueryingOfIcebergTables(TestHiveInMemoryMetastore.java:68)
There was a problem hiding this comment.
Which directory?
This sounds like a test setup problem. Does it mean TestHiveInMemoryMetastore cannot run a test that uses createTable?
There was a problem hiding this comment.
I updated the comment to contain the missing directory path.
Note that the similar test testHideDeltaLakeTables also throws a "not supported" skip exception.
1ec8505 to 20ba90a
plugin/trino-hive/src/main/java/io/trino/plugin/hive/PartitionsSystemTableProvider.java
96bb5b3 to 44f670f
core/trino-main/src/main/java/io/trino/execution/AddColumnTask.java
There was a problem hiding this comment.
.add(row("iceberg_table")) // TODO: should this be filtered out?
Filtering out the entry from show tables would involve querying the properties metadata of each table.
Can I remove this TODO?
I've added a separate PR #10577 for extending the redirection awareness on the table tasks because the The current PR has the tests from
bdafcba to 96b1e51
bccfb68 to 3fb545e
There was a problem hiding this comment.
Disallow querying Delta Lake system tables
This is not the right commit title, since it was disallowed before.
There was a problem hiding this comment.
I thought to name it Disallow instead of Deny.
I considered that before it was denied - by throwing the exception and now it is disallowed. Any wording suggestions in this case?
There was a problem hiding this comment.
for follow-up, this returns empty when table not found, while PropertiesSystemTableProvider throws in such case
There was a problem hiding this comment.
Actually PropertiesSystemTableProvider returns Optional.empty() as well.
This is covered by the previous test 3fb545e#diff-92c37834248d69878a49876bfa440c9f6afbee04b1363d402a04cf7fddc7bb62R2905-R2913
There was a problem hiding this comment.
do we have an issue to link to?
(nit: add space after //)
There was a problem hiding this comment.
Note that, compared to HiveMetadata#getTableHandle, I no longer include the following check here:
// we must not allow system tables due to how permissions are checked in SystemTableAwareAccessControl
if (getSourceTableNameFromSystemTable(systemTableProviders, tableName).isPresent()) {
throw new TrinoException(HIVE_INVALID_METADATA, "Unexpected table present in Hive metastore: " + tableName);
}
Hive connector cannot read from Delta Lake tables, which is why querying system tables for such tables shouldn't be permitted within the hive connector. When querying the special hive tables
- $properties
- $partitions
on a Delta Lake table through the hive connector, the user will receive a table not found exception.
Hive connector cannot read from Iceberg tables, which is why querying such tables shouldn't be permitted within the hive connector. When querying an Iceberg table from the hive connector (without iceberg redirection enabled), the user will receive a hive unsupported format exception. It is no longer possible to create a view in hive which selects from Iceberg. When querying the special hive tables
- $properties
- $partitions
on an Iceberg table through the hive connector, the user will receive a table not found exception.
3fb545e to 22c899a
The force push diff (https://github.com/trinodb/trino/compare/3fb545ed8d4c40042966ec147923987ff79208f3..22c899a87226b5636191acc3e25cfc00be641f1b) is large.
@findepi I changed the assertions in the test class
Hive connector cannot read from Iceberg tables, which is
why querying such tables shouldn't be permitted within
the hive connector.
When trying to query an Iceberg table from the
hive connector, provide a meaningful message to the user
about not supporting such an operation.
If hive users don't want to see the Iceberg tables
at all, the property
hive.hide-iceberg-tables can be set to true for the
hive connector.

Fixes #8693