Avoid creating Hive tables with double slashed location#17958
Conversation
69692fc to
d0e7c99
Compare
losipiuk
left a comment
There was a problem hiding this comment.
LGTM. @findinpath @alexjo2144 PTAL.
There was a problem hiding this comment.
trino> create schema hive.doubleslashes with (location='s3://trino-test/double_slash//location');
CREATE SCHEMA
trino> show create schema hive.doubleslashes;
Create Schema
----------------------------------------------------------------
CREATE SCHEMA hive.doubleslashes
WITH (
location = 's3://trino-test/double_slash//location'
)
(1 row)
trino> create table hive.doubleslashes.test1 (c integer);
CREATE TABLE
trino> insert into hive.doubleslashes.test1 values 1,2,3;
INSERT: 3 rows
sele
Query 20230620_060740_00002_8c4bz, FINISHED, 1 node
http://localhost:8080/ui/query.html?20230620_060740_00002_8c4bz
Splits: 14 total, 14 done (100.00%)
CPU Time: 0.3s total, 0 rows/s, 0B/s, 28% active
Per Node: 0.1 parallelism, 0 rows/s, 0B/s
Parallelism: 0.1
Peak Memory: 38.2KB
3.11 [0 rows, 0B] [0 rows/s, 0B/s]
trino> select "$path" from hive.doubleslashes.test1;
$path
----------------------------------------------------------------------------------------------------------------------
s3://trino-test/double_slash/location/test1/20230620_060740_00002_8c4bz_e91a2ac5-4bc1-40ba-ab3f-2145eaa6bf97
s3://trino-test/double_slash/location/test1/20230620_060740_00002_8c4bz_e91a2ac5-4bc1-40ba-ab3f-2145eaa6bf97
s3://trino-test/double_slash/location/test1/20230620_060740_00002_8c4bz_e91a2ac5-4bc1-40ba-ab3f-2145eaa6bf97
(3 rows)
Note that this strategy may eventually cause issues with managed tables which get created into schemas having a double-slashed location.
schema location: s3://trino-test/double_slash//location
table location: s3://trino-test/double_slash/location/test1
There was a problem hiding this comment.
schema location:
s3://trino-test/double_slash//location
table location:s3://trino-test/double_slash/location/test1
well, that's the whole point of this PR
thanks for calling this out!
d0e7c99 to
b509013
Compare
There was a problem hiding this comment.
Rewrote the logic to accept locations such as file:///tmp/6383092576615971655, file:///tmp/6383092576615971655/, s3://bucket/foo/.
There was a problem hiding this comment.
as found in #17964 , two trailing slashes are not OK either.
i will not add a test case for that, but will block such locations.
test case coming in the other PR
b509013 to
38a2eda
Compare
ebyhr
left a comment
There was a problem hiding this comment.
Looks good except for CI failure:
Error: Failures:
Error: TestHiveS3AndGlueMetastoreTest.testBasicOperationsWithProvidedTableLocation:137
expected: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_kjxzwqj9yr"
but was: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_kjxzwqj9yr/"
Error: TestHiveS3AndGlueMetastoreTest.testBasicOperationsWithProvidedTableLocation:137
expected: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_tuoqpbqkww"
but was: "s3://trino-ci-test/test_glue_s3_icyjziw7ri/trailing_slash/test_basic_operations_tuoqpbqkww/"
Hive connector does not support table locations containing double slashes. On S3 this leads to correctness issues (e.g. INSERT works, but SELECT does not find any data). This commit - restores normalization of implicit table location during CREATE TABLE. There used to be such normalization until 8bd9f75. - rejects explicit table locations containing double slash during `CREATE TABLE .. WITH (external_location = ...)`. Before 8bd9f75 there used to be normalization also during this flow, but rejecting such unsupported locations is deemed more correct.
38a2eda to
9151dc8
Compare
|
Does this need release notes? @findepi |
|
yes. it fixes #17803 |
Hive connector does not support table locations containing double slashes. On S3 this leads to correctness issues (e.g. INSERT works, but SELECT does not find any data).
This commit
CREATE TABLE .. WITH (external_location = ...). Before 8bd9f75 there used to be normalization also during this flow, but rejecting such unsupported locations is deemed more correct.Alternative to #17947
Stop gap solution until we fix all problems related to double slashes in Hive connector (#17803).
After merging this, #17803 is no longer a
release-blocker.