Create external table location for Hive#17920
Conversation
a7e603c to
a8a7670
Compare
krvikash
left a comment
LGTM. nitpick comments only.
a8a7670 to
12b7e34
Compare
Big thanks, all comments addressed.
I would recommend reading #1277. I don't think we want to allow creating the directory with the external_location table property.
krvikash
left a comment
It seems the commit message is split across two lines. Could you please put it on one line?
12b7e34 to
ad55aa3
Compare
Need a way to disable this for the following reasons:
tooptoop4
left a comment
need config to prevent
For this there is
This change does allow a user to create a table pointing to a non-existent location, though, because the directory would get auto-created.
Also, I agree with Yuya that we shouldn't add this. Hive doesn't allow this, for example, and "Create external table target location, if it's not exists." describes a managed table, not an external table. External tables are supposed to point to a table which already exists but isn't registered. For allowing an arbitrary location for managed tables, see the concerns in the issue and the two PRs linked from that issue that Yuya has shared.
For me, this change is more about synchronising behaviour with S3. As far as I understand, S3 has no directory abstraction, so if someone wants to create an external table, they can provide any non-existent path and we don't even check whether it exists, because the path is automatically "created" by S3 (it is just part of the key). I'd like to give the same option for HDFS: if we want an external table, just give Trino a path and we will create it.
I mean that I could not find code that actually checks that a table exists at the provided external_path; it only checks that the path exists (and only for non-S3 file systems).
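To make the S3 point above concrete, here is a minimal sketch of a flat key store, assuming a toy in-memory model. `FlatObjectStoreSketch`, `putObject`, and `hasKeysUnder` are hypothetical names for illustration only; this is not the S3 API or Trino's file system code.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of a flat object store (an assumption for illustration, not S3 or Trino code).
final class FlatObjectStoreSketch
{
    // Keys are plain strings; there is no separate notion of a directory.
    private final NavigableSet<String> keys = new TreeSet<>();

    void putObject(String key)
    {
        keys.add(key);
    }

    // A "directory" exists only in the sense that at least one key starts with its prefix.
    boolean hasKeysUnder(String prefix)
    {
        String normalized = prefix.endsWith("/") ? prefix : prefix + "/";
        // Keys sharing a prefix are contiguous in sorted order, so checking the
        // smallest key that is >= the prefix is enough.
        String candidate = keys.ceiling(normalized);
        return candidate != null && candidate.startsWith(normalized);
    }
}
```

In this model, writing the first object under a prefix is what makes the "location" appear, which is why no explicit directory creation step is needed there, while a hierarchical file system such as HDFS needs one.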
Yes, and this is a good option for allowing a managed table to be created with a location provided by the user. As soon as it is implemented and merged, it will make the current change unnecessary, but in the meantime the current change just synchronises behaviour between S3 and HDFS.
Last time I checked with Hive, if you create an external table pointing to a non-existent directory, Hive would create an empty directory. #1277 is about setting the location of a managed table, while this PR is about external tables, so I'm not sure it would help us here.
Praveen2112
left a comment
Can we have additional PT coverage where we check the permissions of the directory for external tables? We do have a similar test for hdfs-impersonation.
ded99ad to
1a8c89a
Compare
So we should introduce something like
I didn't find how to create a directory using TrinoFileSystem, or we should add a method like
@electrum Please correct me if I am wrong: this definition is derived from Hive, right? So if Hive supports creating an empty directory for an external table when it doesn't exist, shouldn't we support the same? Today we do support inserting data into a new partition based on
Even in this PR, we hide this behind a flag which is disabled by default.
fffebe7 to
89246ff
Compare
Discussed offline with @electrum: the external location will now be created if
89246ff to
5442e70
Compare
a99dad3 to
39aa317
Compare
/test-with-secrets sha=39aa31743ed6cd83b0fcd785e22c43f450783ceb
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5961189431
39aa317 to
883ae60
Compare
/test-with-secrets sha=883ae60c59bafb1f4d7dfb6dafb0ad8db2e1c602
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5961868678
There were a few flaky failures; I re-ran them, and now the pipeline is green.
Thanks for working on this.
    return ImmutableList.of(
            testOnEnvironment(EnvMultinode.class)
                    .withGroups("configured_features", "hdfs_no_impersonation")
                    .withExcludedTests("io.trino.tests.product.TestImpersonation.testExternalLocationTableCreationSuccess")
To be honest, I don't remember any specific reason; I need to run and check this test.
Understood. Please recover this info and capture it as a code comment.
Got it. This test requires hive.non-managed-table-writes-enabled, and I didn't want to enable it for the whole EnvMultinode environment because that could break other tests.
Description
Create the directory structure if it does not exist.
The external location will be created if the writesToNonManagedTablesEnabled flag is set. The same behaviour exists in pure Hive, as mentioned in #17920 (comment).
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:
With this change, the external location is created for Hive tables if hive.non-managed-table-writes-enabled is set;
otherwise an exception is raised, as before.
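
The behaviour described in the release note can be sketched roughly as follows. This is a hedged illustration that uses java.nio.file on a local file system as a stand-in; `ExternalLocationSketch` and `resolveExternalLocation` are hypothetical names, and the code is not the actual HiveMetadata change in this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the flag-gated behaviour; not the real Trino implementation.
final class ExternalLocationSketch
{
    private ExternalLocationSketch() {}

    // writesToNonManagedTablesEnabled mirrors hive.non-managed-table-writes-enabled (assumed mapping).
    static Path resolveExternalLocation(Path location, boolean writesToNonManagedTablesEnabled)
    {
        if (Files.isDirectory(location)) {
            // The location already exists: nothing to create.
            return location;
        }
        if (Files.exists(location)) {
            throw new IllegalArgumentException("External location is not a directory: " + location);
        }
        if (!writesToNonManagedTablesEnabled) {
            // Flag off: fail for a missing location, as before the change.
            throw new IllegalArgumentException("External location does not exist: " + location);
        }
        try {
            // Flag on: create the full directory structure, including missing parents.
            return Files.createDirectories(location);
        }
        catch (IOException e) {
            throw new UncheckedIOException("Failed to create external location: " + location, e);
        }
    }
}
```

With the flag off, a missing location still fails fast; with it on, the missing directories are created before the table is registered.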