Add support for timestamp to varchar coercer in hive tables #16869
Will be adding some additional test coverage for
skrzypo987
left a comment
Looks ok
...ct-tests/src/main/java/io/trino/tests/product/hive/TestHiveCoercionOnUnpartitionedTable.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/TimestampCoercer.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/orc/OrcPageSource.java
53b81db to 5468428
5468428 to 17eb591
@skrzypo987 AC
67ef13a to dcee921
skrzypo987
left a comment
lgtm
dcee921 to 969cfb7
969cfb7 to 18210ea
plugin/trino-hive/src/test/java/io/trino/plugin/hive/coercions/CoercionAssertions.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/coercions/CoercionAssertions.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/coercions/TestTimestampCoercer.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/coercions/CoercionAssertions.java
I have an IDE warning on this line.
Can you share the warning here? My IDE is not showing any warning.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/TimestampCoercer.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/TimestampCoercer.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/TimestampCoercer.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/coercions/TestTimestampCoercer.java
what is this for?
When doing implicit coercion to timestamp, Hive casts to a zone-specific string, so I wanted it to be in sync with UTC (like we do in Trino). This is specific to ORC files; for other formats it doesn't consider the time zone of the HiveServer.
A quick Google search suggests that the TZ env variable holds a time zone name. /usr/share/zoneinfo/UTC looks like a file path, not a time zone name. How does it work?
> When doing implicit coercion to timestamp, hive cast to a zone specific String - so wanted it to be in sync with UTC (like we do in Trino).
Would this information be useful for future code maintainers?
A quick Google search suggested setting TZ to this file, and it worked for me. Is there an alternative I could try here?
> Quick google search suggested me to set TZ to this file
Interesting. I should have taken a note of the search terms in my quick Google search...
What happens if you use
container.withEnv("TZ", "UTC")
(instead of the file)?
And what happens if you use
container.withEnv("TZ", "/dev/null")
or
container.withEnv("TZ", "absolutely-random-string")
?
It looks like specifying a time zone name works the same as passing the file:
container.withEnv("TZ", "UTC")
If we pass
container.withEnv("TZ", "/dev/null")
it falls back to the default value, UTC.
For
container.withEnv("TZ", "absolutely-random-string")
it also falls back to the default value, UTC.
> If we pass container.withEnv("TZ", "/dev/null") it switches to default value UTC. For container.withEnv("TZ", "absolutely-random-string") it switches to default value UTC.
So /usr/share/zoneinfo/UTC looks like something meaningful, but is actually a "garbage time zone" that causes something (the JVM?) to switch to UTC.
Funny.
Actually, /usr/share/zoneinfo/UTC is not a garbage time zone: pointing to a different file such as /usr/share/zoneinfo/EET switches the time zone to EEST instead of UTC.
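The zone dependence discussed in this thread can be reproduced outside Hive. The following is a minimal sketch, not code from this PR: the class name `ZoneRenderingSketch`, the `render` helper, and the sample instant are assumptions made for illustration. It formats one instant as wall-clock text in two zones with `java.time`, then prints the zone the JVM resolved as its default, which is one way to check what a given `TZ` value led to inside a container.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.TimeZone;

// Hypothetical illustration; not part of the PR.
public class ZoneRenderingSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Renders the given instant as wall-clock text in the given zone.
    static String render(Instant instant, String zoneId) {
        return FORMAT.withZone(ZoneId.of(zoneId)).format(instant);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2023-06-01T12:00:00Z");
        System.out.println(render(instant, "UTC")); // 2023-06-01 12:00:00
        System.out.println(render(instant, "EET")); // 2023-06-01 15:00:00 (EEST in June)
        // The zone the JVM picked as its default (for example, from the environment).
        System.out.println(TimeZone.getDefault().getID());
    }
}
```

The same stored value yields different strings depending on the zone used for rendering, which is why the test environment pins the zone before comparing results.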
@findepi AC
Will fix a few test failures
plugin/trino-hive/src/main/java/io/trino/plugin/hive/orc/OrcPageSource.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/orc/OrcPageSource.java
It would be clearer to define a BlockTypeCoercion interface for public Block apply(Block block) and then use that in TypeCoercer instead of Function<Block, Block> and here instead of UnaryOperator<Block>.
It will also be easier to track possible implementations.
The from and to types of the coercion would be useful info in the toString
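A rough sketch of what these two suggestions could look like together, under stated assumptions: `Block` here is a simplified stand-in class rather than Trino's `io.trino.spi.block.Block`, and `ToStringCoercion` is a hypothetical implementation invented for illustration. Only the `BlockTypeCoercion` name and the `apply(Block)` signature come from the review comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; Block below is a stand-in, not io.trino.spi.block.Block.
public class BlockTypeCoercionSketch {
    // Stand-in for a column of values.
    static final class Block {
        final List<Object> values;

        Block(List<Object> values) {
            this.values = values;
        }
    }

    // Dedicated interface in place of Function<Block, Block> or UnaryOperator<Block>.
    interface BlockTypeCoercion {
        Block apply(Block block);
    }

    // Example implementation that converts every non-null value to its string form.
    static final class ToStringCoercion implements BlockTypeCoercion {
        private final String fromType;
        private final String toType;

        ToStringCoercion(String fromType, String toType) {
            this.fromType = fromType;
            this.toType = toType;
        }

        @Override
        public Block apply(Block block) {
            List<Object> coerced = new ArrayList<>();
            for (Object value : block.values) {
                coerced.add(value == null ? null : String.valueOf(value));
            }
            return new Block(coerced);
        }

        // Includes the source and target types, as suggested in the review.
        @Override
        public String toString() {
            return "ToStringCoercion{" + fromType + " -> " + toType + "}";
        }
    }

    public static void main(String[] args) {
        BlockTypeCoercion coercion = new ToStringCoercion("timestamp", "varchar");
        Block result = coercion.apply(new Block(Arrays.<Object>asList(42, null)));
        System.out.println(coercion + " produced " + result.values);
    }
}
```

A named interface makes implementations discoverable through the IDE's type hierarchy, and a descriptive `toString` helps when a coercion shows up in logs or a debugger.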
a3246bd to 174032d
@findepi / @raunaqmorarka AC
0bc0616 to 3c55156
@raunaqmorarka AC (first one)
@findepi / @raunaqmorarka Can you PTAL?
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HivePageSourceProvider.java
nit: move above all other members
But the other members are static final, so do we need to move the default constructor?
plugin/trino-hive/src/main/java/io/trino/plugin/hive/orc/OrcPageSourceFactory.java
Alternatively, we could support to-varchar directly in io.trino.orc.reader.TimestampColumnReader.
No change requested (I like the current approach).
We could, but ORC is also used in the Iceberg connector, so we might end up applying the coercion there as well (IIRC it is not supported by Iceberg).
plugin/trino-hive/src/main/java/io/trino/plugin/hive/orc/OrcTypeTranslator.java
05556a4 to 367940c
@findepi AC
plugin/trino-hive/src/main/java/io/trino/plugin/hive/orc/OrcPageSourceFactory.java
367940c to 420ef58
@findepi AC
420ef58 to af5b3d2
Did a few minor changes on
af5b3d2 to 5e298c7
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HivePageSource.java
5e298c7 to c888e65
@huberty89 AC
@raunaqmorarka Gentle ping
raunaqmorarka
left a comment
lgtm
I'm wondering why this is specifically supported for ORC and not Parquet or other file formats for unpartitioned columns. Does Apache Hive support this for non-ORC formats? If this coercion is to be supported regardless of file format, then we might want the coercion to be done in HivePageSource rather than OrcPageSource.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/coercions/TestTimestampCoercer.java
c888e65 to 60285de
Parquet doesn't support all the coercions that ORC does. For instance, Parquet supports timestamp to varchar coercion but not varchar to timestamp coercion, while both are supported by the ORC format. So to start with we are planning for ORC, and then we could extend it to other file formats.
In that case we might need to introduce some sort of abstraction that allows the Reader to provide the schema and compare with the
Implementing for ORC allows us to create some sort of testing framework which could be utilized for other formats in the future.
Description
Add support for a timestamp to varchar coercer in Hive tables. For partitioned tables it is supported by most formats; for unpartitioned tables only the ORC format is supported as of now.
The coercions supported for unpartitioned tables as of current master are inherently supported by the ColumnReader, and this PR introduces a framework which maps an OrcTypeKind to a corresponding TrinoType and also re-uses the TypeCoercer used by partitioned tables.
Additional context and related issues
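To make the idea of mapping a file type to a table type concrete, here is a minimal sketch under stated assumptions. It is not the PR's implementation: the enums `FileKind` and `TableType`, the `coercerFor` method, and the `yyyy-MM-dd HH:mm:ss` output pattern are all hypothetical stand-ins chosen for illustration. The lookup returns a coercer only for the timestamp to varchar pair and an empty result when no coercion applies.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical illustration; names and output format are assumptions, not the PR's code.
public class OrcCoercionLookupSketch {
    enum FileKind { TIMESTAMP, STRING }   // stand-in for the type stored in the ORC file
    enum TableType { TIMESTAMP, VARCHAR } // stand-in for the type declared on the table

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns a coercer when the file type must be converted to the table type.
    static Optional<Function<Object, Object>> coercerFor(FileKind from, TableType to) {
        if (from == FileKind.TIMESTAMP && to == TableType.VARCHAR) {
            return Optional.of(value -> FORMAT.format((LocalDateTime) value));
        }
        // No coercion registered for this pair.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Object coerced = coercerFor(FileKind.TIMESTAMP, TableType.VARCHAR)
                .map(coercer -> coercer.apply(LocalDateTime.of(2023, 1, 2, 3, 4, 5)))
                .orElse("no coercion");
        System.out.println(coerced); // 2023-01-02 03:04:05
    }
}
```

Returning an empty `Optional` for unsupported pairs lets the caller fall back to reading the column as-is, while new pairs can be added in one place.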
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: