Skip to content

Test for Hive compatibility across file formats#10498

Merged
findepi merged 1 commit intotrinodb:masterfrom
findinpath:test-hive-compatibility-across-file-formats-2
Jan 14, 2022
Merged

Test for Hive compatibility across file formats#10498
findepi merged 1 commit intotrinodb:masterfrom
findinpath:test-hive-compatibility-across-file-formats-2

Conversation

@findinpath
Copy link
Copy Markdown
Contributor

Test whether the information written by Trino is correctly read by Hive across the following file formats:

ORC
PARQUET
RCBINARY
RCTEXT
SEQUENCEFILE
TEXTFILE
AVRO

This is the second attempt to merge these changes after #10309 (which has been reverted after finding the issue #10486)

@findepi
Copy link
Copy Markdown
Member

findepi commented Jan 7, 2022

cc @hashhar @kokosing

Copy link
Copy Markdown
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. A question for my own understanding.

@findepi
Copy link
Copy Markdown
Member

findepi commented Jan 12, 2022

@findinpath the tests discovered a preexisting standing bug in the new Parquet writer -- #4608 (i reopened #6377 for that).

Please accomodate the current state of things to make the tests happy, so that we can merge the PR and prevent future regressions.

@findinpath findinpath force-pushed the test-hive-compatibility-across-file-formats-2 branch from 2c52865 to 6d2b51f Compare January 13, 2022 06:49
@findinpath findinpath requested a review from findepi January 13, 2022 06:53
@findinpath findinpath force-pushed the test-hive-compatibility-across-file-formats-2 branch from 6d2b51f to 823ff89 Compare January 14, 2022 08:39
@findepi
Copy link
Copy Markdown
Member

findepi commented Jan 14, 2022

@findinpath it seems you did a rebase since the initial post, at which point it was a mere cherry pick.
What are the changes compared to #10309 that i should review?

@findinpath
Copy link
Copy Markdown
Contributor Author

@findepi these are the changes that have appeared on the PR since the last review:

        if (!(isParquetStorageFormat && isParquetOptimizedWriterEnabled)) {
            // Hive expects `FIXED_LEN_BYTE_ARRAY` for decimal values irrespective of the Parquet specification which allows `INT32`, `INT64` for short precision decimal types
            columnDataList.add(new HiveCompatibilityColumnData("c_decimal_10_0", "decimal(10,0)", "346", new BigDecimal("346")));
            columnDataList.add(new HiveCompatibilityColumnData("c_decimal_10_2", "decimal(10,2)", "12345678.91", new BigDecimal("12345678.91")));
        }
if (!(isParquetStorageFormat && isParquetOptimizedWriterEnabled)) {
            // Hive expects `INT96` (deprecated on Parquet) for timestamp values
            ...
}

@findepi
Copy link
Copy Markdown
Member

findepi commented Jan 14, 2022

As a follow-up please add tests showing the timestamp and decimal-related Parquet problems.
They will be failing for now (io.trino.tempto.assertions.QueryAssert#assertQueryFailure)

@findepi findepi merged commit 206fff7 into trinodb:master Jan 14, 2022
@github-actions github-actions bot added this to the 369 milestone Jan 14, 2022
@findinpath
Copy link
Copy Markdown
Contributor Author

Created #10613 follow-up PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants