-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Support for TIMESTAMP WITH TIME ZONE with nanosecond precision in Parquet #13599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for TIMESTAMP WITH TIME ZONE with nanosecond precision in Parquet #13599
Conversation
|
What is the end-user visible effect of these changes? |
It's now possible to read Parquet files that use NANOS format (link to spec) with columns of type TIMESTAMP WITH TIME ZONE. Previously Trino simply raised exception when attempting to query them. I have only tested this on Hive connector (with #13595), but this presumably applies to Iceberg and Delta Lake, which already have support for TIMESTAMP WITH TIME ZONE. |
cad9528 to
76fee2c
Compare
|
@zielmicha thanks for explaining. What's the schema of the attached parquet file? |
|
I'm not sure what's the best way to get Parquet schema, but: prints |
|
76fee2c to
4041a7c
Compare
skrzypo987
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR name is sort of misleading, as the TIMESTAMP WITH TIME ZONE is the Tirno type, not Parquet.
Try something like "Support read of Parquet nanosecond timestamp column into TIMESTAMP WITH TIME ZONE Trino type"
plugin/trino-hive/src/test/java/io/trino/plugin/hive/parquet/TestTimestampPrecision.java
Outdated
Show resolved
Hide resolved
4041a7c to
134a9df
Compare
53588d0 to
05a0efa
Compare
|
@raunaqmorarka can you take a look? If I'm not mistaken @skrzypo987 has requested your review. |
|
@raunaqmorarka will you have time to review this PR? Let me know if I should find another reviewer. |
I'll take a look at it early next week |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we don't have a way to write this data via trino-parquet, we should add support for that.
There should be a test in AbstractTestParquetReader similar to testTimestamp but with time zone. It'll need update to ParquetTester#writeValue to add support for timestamp with timezone type.
There should also be a test in TestHiveCompatibility once the write path is working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing TIMESTAMP WITH TIME ZONE isn't supported by Hive connector anyway yet - I'm planning to implement this in the next PR.
lib/trino-parquet/src/main/java/io/trino/parquet/reader/Int64TimestampNanosColumnReader.java
Outdated
Show resolved
Hide resolved
lib/trino-parquet/src/main/java/io/trino/parquet/reader/Int64TimestampNanosColumnReader.java
Outdated
Show resolved
Hide resolved
|
I may not be the best person to review this. @raunaqmorarka could you please take an another look? |
05a0efa to
43fbecd
Compare
761f519 to
2f2dd5d
Compare
|
@zielmicha could you rebase? @martint @findepi and @raunaqmorarka .. could you help moving this forward? |
9120e07 to
2e9a4b1
Compare
|
Please rebase to latest master |
b41dc13 to
47990db
Compare
eb5899f to
3cb2abc
Compare
3cb2abc to
b019ac2
Compare
|
@raunaqmorarka @zielmicha Any progress on this? At Iceberg we're adding support for Nanosecond precision: apache/iceberg#8683 |
|
@zielmicha @martint @raunaqmorarka could we rebase this and move towards merge? |
|
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
|
@zielmicha @martint @raunaqmorarka I know this PR has been around a long time. From what I understand it would still be great to get this feature in for @zielmicha .. what are next steps to enable that? Rebase probably.. anything else? |
|
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
|
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
|
Reopening since I believe we want this feature added based on past discussions. I will leave @martint to decide on next steps. |
Add tests for reading Parquet files with nanosecond timestamp precision into various Trino timestamp types (MILLIS, MICROS, NANOS), both with and without time zone. Test resources and test structure from trinodb#13599 Co-Authored-By: Michał Zieliński <michal@zielinscy.org.pl>
Add tests for reading Parquet files with nanosecond timestamp precision into various Trino timestamp types (MILLIS, MICROS, NANOS), both with and without time zone. Test resources and test structure from trinodb#13599 Co-Authored-By: Michał Zieliński <michal@zielinscy.org.pl>
Add tests for reading Parquet files with nanosecond timestamp precision into various Trino timestamp types (MILLIS, MICROS, NANOS), both with and without time zone. Test resources and test structure from #13599 Co-Authored-By: Michał Zieliński <michal@zielinscy.org.pl>
Description
Adds support for reading of Parquet nanosecond timestamp column into TIMESTAMP WITH TIME ZONE Trino type.
This is a fix or a new feature, depending on how you look at it - the support for TIMESTAMP WITH TIME ZONE already existed for millis and micros precision.
This is change to a library used by some connectors (Hive, Delta Lake, Iceberg), but in practice only changes anything for Hive connector (other connectors don't support nanosecond TIMESTAMP WITH TIME ZONE)
For Hive connector, the change adds support for reading into TIMESTAMP WITH TIME ZONE columns from Parquet files that contain timestamp columns formatted in certain way (storing the timestamp as number of nanoseconds).
Related issues, pull requests, and links
Documentation
(I think no documentation is needed)
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: