Fix CTE Materialization unsupported hive bucket types #21549
jaystarshot merged 1 commit into prestodb:master
1f20316 to 839457c
839457c to 082fa1a
Can we think of a way to do this that doesn't require engine changes that presume the underlying connector's capabilities? I'm just thinking about how this might work if we use something like Iceberg.
Yes, I thought of the same, but that requires SPI changes (`isTypeBucketable`, etc.). Alternatively, we could switch on the `partitioning_provider_catalog` session property.
Also, I'm not sure the Iceberg connector supports temporary table functionality at the moment.
SPI changes could be fine if they're targeted and allow the engine to function properly without baking in presumptions about the underlying storage subsystem.
430f5a2 to a9ed71f
Added so that the testing framework (`logicalCteOptimizer`) can run the tests successfully.
a9ed71f to e1c843a
@tdcmeehan I have refactored this to check via the SPI; please review.
Can we use a more concise name, like isTypeBucketable?
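To make the SPI idea concrete, here is a minimal sketch of a targeted, connector-owned hook along the lines of `isTypeBucketable`. Everything in it is hypothetical: `SimpleType`, `BucketingCapabilities`, and `HiveLikeBucketingCapabilities` are illustrative names rather than Presto SPI classes, and the allow-list is a placeholder, not Hive's actual set of bucketable types. The point is the shape: the engine asks the connector instead of presuming the storage subsystem's rules, and a permissive default leaves connectors that don't override the method unaffected.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-in for engine types
// (not com.facebook.presto.common.type.Type).
enum SimpleType
{
    BIGINT, INTEGER, VARCHAR, DATE, BOOLEAN, DOUBLE, REAL, TIMESTAMP, USER_DEFINED
}

// Hypothetical connector-owned hook: the engine asks the connector whether a
// type can be bucketed instead of hard-coding one connector's rules.
interface BucketingCapabilities
{
    // Permissive default: connectors without restrictions need not override this.
    default boolean isTypeBucketable(SimpleType type)
    {
        return true;
    }
}

// Hypothetical Hive-like implementation backed by an illustrative allow-list
// (placeholder only, not the real Hive list).
final class HiveLikeBucketingCapabilities
        implements BucketingCapabilities
{
    private static final Set<SimpleType> BUCKETABLE_TYPES =
            EnumSet.of(SimpleType.BIGINT, SimpleType.INTEGER, SimpleType.VARCHAR, SimpleType.DATE, SimpleType.BOOLEAN);

    @Override
    public boolean isTypeBucketable(SimpleType type)
    {
        return BUCKETABLE_TYPES.contains(type);
    }
}
```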
Actually, I had hard-coded all uses of `partitioning_provider_catalog` to "hive" in our testing. After correcting this to use the `partitioning_provider_catalog` property, I see unrelated failures in a different stack, such as https://github.com/prestodb/presto/blob/d121d92484856665157f36a74151be3ce3a3f474/p[…]com/facebook/presto/sql/planner/optimizations/AddExchanges.java, even without CTE materialization.
Should I separate out those session properties in #21625?
e1c843a to c5d4896
Revisiting and regaining context.
@tdcmeehan mentioned during his review that the stack trace in the attached ticket (#21540) is a bit suspicious, since it suggests that a `bucketFunctionType` of `HIVE_COMPATIBLE` was used.
c5d4896 to 3f194ea
3f194ea to 9008e10
tdcmeehan left a comment
LGTM. @feilong-liu would you like to have a look at the current PR again?
@tdcmeehan I just noticed that one test case is failing:
```java
@Test
public void testCteWithZeroLengthVarchar()
{
    String testQuery = "WITH temp AS (" +
            " SELECT * FROM (VALUES " +
            " (CAST('' AS VARCHAR(0)), 9)" +
            " ) AS t (text_column, number_column)" +
            ") SELECT * FROM temp";
    QueryRunner queryRunner = getQueryRunner();
    compareResults(queryRunner.execute(getMaterializedSession(), testQuery),
            queryRunner.execute(getSession(), testQuery));
}
```
Stack trace:
```
Caused by: java.lang.RuntimeException: Varchar length 0 out of allowed range [1, 65535]
    at org.apache.hadoop.hive.serde2.typeinfo.BaseCharUtils.validateVarcharParameter(BaseCharUtils.java:32)
    at org.apache.hadoop.hive.serde2.typeinfo.VarcharTypeInfo.<init>(VarcharTypeInfo.java:33)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.createPrimitiveTypeInfo(TypeInfoFactory.java:159)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(TypeInfoFactory.java:117)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getVarcharTypeInfo(TypeInfoFactory.java:183)
    at com.facebook.presto.hive.HiveTypeTranslator.translate(HiveTypeTranslator.java:98)
    at com.facebook.presto.hive.HiveType.toHiveType(HiveType.java:218)
    at com.facebook.presto.hive.HiveMetadata.getColumnHandles(HiveMetadata.java:3584)
    at com.facebook.presto.hive.HiveMetadata.createTemporaryTable(HiveMetadata.java:1107)
```
So we might need this check for zero-length VARCHAR, because writing a zero-length VARCHAR fails.
For reference: while Presto supports VARCHAR of length 0 (as discussed in trinodb/trino#1136), Hive only allows VARCHAR lengths in the range [1, 65535].
I was therefore planning to disallow CTE creation when there is a zero-length VARCHAR, but the current fix was not enough anyway: it only checked for a top-level VARCHAR of length 0 and did not check nested types. We need a proper, separate fix for this, since I suspect the issue also exists in the current CTAS path.
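A check that also covers nested types has to walk the whole type tree rather than inspect only the top-level column type. The sketch below shows that recursion over a hypothetical, minimal type model: `SqlType`, `VarcharType`, `ScalarType`, `ContainerType`, and `ZeroLengthVarcharCheck` are illustrative names, not Presto's `Type` hierarchy. The [1, 65535] range it guards against comes from the Hive error in the stack trace above.

```java
import java.util.List;

// Hypothetical, minimal type tree: just enough to show the recursion.
abstract class SqlType
{
    // Child types for container types (array, map, row); empty for scalars.
    List<SqlType> children()
    {
        return List.of();
    }
}

final class VarcharType
        extends SqlType
{
    final int length;

    VarcharType(int length)
    {
        this.length = length;
    }
}

final class ScalarType
        extends SqlType
{
    final String name;

    ScalarType(String name)
    {
        this.name = name;
    }
}

final class ContainerType
        extends SqlType
{
    final String name; // e.g. "array", "map", "row"
    private final List<SqlType> children;

    ContainerType(String name, List<SqlType> children)
    {
        this.name = name;
        this.children = List.copyOf(children);
    }

    @Override
    List<SqlType> children()
    {
        return children;
    }
}

final class ZeroLengthVarcharCheck
{
    private ZeroLengthVarcharCheck() {}

    // Hive rejects VARCHAR lengths outside [1, 65535], so a length-0 VARCHAR
    // anywhere in the tree, including inside array/map/row, would fail on write.
    static boolean containsZeroLengthVarchar(SqlType type)
    {
        if (type instanceof VarcharType && ((VarcharType) type).length == 0) {
            return true;
        }
        for (SqlType child : type.children()) {
            if (containsZeroLengthVarchar(child)) {
                return true;
            }
        }
        return false;
    }
}
```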
So I will defer this test case to a new issue.
Confirmed that the following fails in Presto:
```sql
CREATE TABLE tmp.test2512351 AS (
    SELECT * FROM (VALUES
        (CAST('' AS VARCHAR(0)), 9)
    ) AS t(col1, col2)
);
```
c0c1e04 to 3ce1760
3ce1760 to 0e2923c
cc: @tdcmeehan, I need an approval again, thanks! I fixed some tests that were based on the previously unsupported bucket functions.
0e2923c to da29a8f
Fixes #21540
The Hive catalog does not allow bucketing on many Presto types, including user-defined types (link).
This PR therefore makes CTE materialization avoid bucketing on those types. If all columns are of unsupported types, the CTE will not be materialized.
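The selection rule described above can be sketched as follows, assuming the engine buckets on every remaining column whose type the connector reports as bucketable. `CteBucketingSketch`, `Column`, and `chooseBucketColumns` are hypothetical names for illustration, not the code in this PR, and the type check is passed in as a plain `Predicate<String>` standing in for a connector hook. An empty result signals that the CTE should not be materialized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class CteBucketingSketch
{
    // Hypothetical column descriptor: a name and a type name.
    static final class Column
    {
        final String name;
        final String type;

        Column(String name, String type)
        {
            this.name = name;
            this.type = type;
        }
    }

    private CteBucketingSketch() {}

    // Keeps only columns whose type can be bucketed on. Returns empty when no
    // column qualifies, meaning the caller should skip materializing the CTE.
    static Optional<List<Column>> chooseBucketColumns(List<Column> columns, Predicate<String> isTypeBucketable)
    {
        List<Column> bucketColumns = new ArrayList<>();
        for (Column column : columns) {
            if (isTypeBucketable.test(column.type)) {
                bucketColumns.add(column);
            }
        }
        return bucketColumns.isEmpty() ? Optional.empty() : Optional.of(List.copyOf(bucketColumns));
    }
}
```

Returning `Optional` keeps the "do not materialize" decision explicit at the call site instead of overloading an empty list.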
Added test cases for all supported and unsupported types to make sure the queries succeed.