Skip to content

Test all file formats in TestSparkCompatibility#6699

Closed
lxynov wants to merge 3 commits intotrinodb:masterfrom
lxynov:TestSparkCompatibility
Closed

Test all file formats in TestSparkCompatibility#6699
lxynov wants to merge 3 commits intotrinodb:masterfrom
lxynov:TestSparkCompatibility

Conversation

@lxynov
Copy link
Copy Markdown
Member

@lxynov lxynov commented Jan 23, 2021

Trino started failing to read Spark-created ORC Iceberg tables for some reason. See CI result for reference


Update: Fixed after iceberg.use-file-size-from-metadata=false is set. Thanks to @phd3

@cla-bot cla-bot bot added the cla-signed label Jan 23, 2021
Copy link
Copy Markdown
Member

@phd3 phd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lxynov One possible reason behind this is #6174 , can you please try with iceberg.use-file-size-from-metadata=false ( PR #6539 )?

@lxynov lxynov force-pushed the TestSparkCompatibility branch 2 times, most recently from ee8bf92 to 465f8a8 Compare January 25, 2021 23:36
@lxynov lxynov changed the title [WIP] Test all file formats in TestSparkCompatibility Test all file formats in TestSparkCompatibility Jan 26, 2021
@lxynov lxynov requested a review from phd3 January 26, 2021 02:50
@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Jan 26, 2021

Thanks @phd3! It was indeed because the Spark writer populated incorrect file sizes in Iceberg metadata. Now the test passes. I've just requested a review from you since you've probably reviewed this part when you were reviewing #4776

Copy link
Copy Markdown
Member

@phd3 phd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just some minor comments.

@lxynov lxynov force-pushed the TestSparkCompatibility branch from 465f8a8 to 63ac92b Compare January 27, 2021 17:47
@lxynov lxynov force-pushed the TestSparkCompatibility branch from 63ac92b to 4dc1f6a Compare January 29, 2021 23:25
@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Jan 30, 2021

@phd3 Changed TestSparkCompatibility to include formats in temporary table names to circumvent concurrency issues. This PR should be good to go. cc @electrum

@phd3 phd3 self-requested a review February 3, 2021 04:25
String baseTableName = "test_spark_reads_presto_partitioned_table_" + format;
String prestoTableName = prestoTableName(baseTableName);
onPresto().executeQuery(format("CREATE TABLE %s (_string VARCHAR, _bigint BIGINT) WITH (partitioning = ARRAY['_string'])", prestoTableName));
onPresto().executeQuery(format("CREATE TABLE %s (_string VARCHAR, _bigint BIGINT) WITH (partitioning = ARRAY['_string'], format = '" + format + "')", prestoTableName));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
onPresto().executeQuery(format("CREATE TABLE %s (_string VARCHAR, _bigint BIGINT) WITH (partitioning = ARRAY['_string'], format = '" + format + "')", prestoTableName));
onPresto().executeQuery(format("CREATE TABLE %s (_string VARCHAR, _bigint BIGINT) WITH (partitioning = ARRAY['_string'], format = '%s')", prestoTableName, format));

sorry if it was confusing, meant a change like the above in #6699 (comment)

@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Jul 13, 2022

Superseded by #8751

@ebyhr ebyhr closed this Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants