Skip to content

Conversation

@zielmicha
Copy link
Member

@zielmicha zielmicha commented Aug 10, 2022

Description

Adds support for reading of Parquet nanosecond timestamp column into TIMESTAMP WITH TIME ZONE Trino type.

Is this change a fix, improvement, new feature, refactoring, or other?

This is a fix or a new feature, depending on how you look at it - the support for TIMESTAMP WITH TIME ZONE already existed for millis and micros precision.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

This is change to a library used by some connectors (Hive, Delta Lake, Iceberg), but in practice only changes anything for Hive connector (other connectors don't support nanosecond TIMESTAMP WITH TIME ZONE)

How would you describe this change to a non-technical end user or system administrator?

For Hive connector, the change adds support for reading into TIMESTAMP WITH TIME ZONE columns from Parquet files that contain timestamp columns formatted in certain way (storing the timestamp as number of nanoseconds).

Related issues, pull requests, and links

Documentation

(I think no documentation is needed)

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Support for TIMESTAMP WITH TIME ZONE with nanosecond precision in Parquet

@cla-bot cla-bot bot added the cla-signed label Aug 10, 2022
@findepi
Copy link
Member

findepi commented Aug 10, 2022

What is the end-user visible effect of these changes?

@zielmicha
Copy link
Member Author

What is the end-user visible effect of these changes?

It's now possible to read Parquet files that use NANOS format (link to spec) with columns of type TIMESTAMP WITH TIME ZONE. Previously Trino simply raised exception when attempting to query them.

I have only tested this on Hive connector (with #13595), but this presumably applies to Iceberg and Delta Lake, which already have support for TIMESTAMP WITH TIME ZONE.

@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch from cad9528 to 76fee2c Compare August 10, 2022 14:53
@findepi
Copy link
Member

findepi commented Aug 10, 2022

@zielmicha thanks for explaining. What's the schema of the attached parquet file?

@zielmicha
Copy link
Member Author

I'm not sure what's the best way to get Parquet schema, but:

import pyarrow.parquet as pq
pq.read_table('issue-5483-nanos.parquet').schema

prints

created: timestamp[ns]

@findepi
Copy link
Member

findepi commented Aug 10, 2022

$ parquet meta plugin/trino-hive/src/test/resources/issue-5483-nanos.parquet

File path:  plugin/trino-hive/src/test/resources/issue-5483-nanos.parquet
Created by: parquet-cpp-arrow version 5.0.0
Properties: (none)
Schema:
message schema {
  optional int64 created (TIMESTAMP(NANOS,false));
}


Row group 0:  count: 1  96.00 B records  start: 4  total(compressed): 96 B total(uncompressed):92 B
--------------------------------------------------------------------------------
         type      encodings count     avg size   nulls   min / max
created  INT64     S _ R     1         96.00 B    0       "2020-10-12T16:26:02.90666..." / "2020-10-12T16:26:02.90666..."

@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch from 76fee2c to 4041a7c Compare August 10, 2022 15:27
Copy link
Member

@skrzypo987 skrzypo987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR name is sort of misleading, as the TIMESTAMP WITH TIME ZONE is the Tirno type, not Parquet.
Try something like "Support read of Parquet nanosecond timestamp column into TIMESTAMP WITH TIME ZONE Trino type"

@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch from 4041a7c to 134a9df Compare August 17, 2022 18:25
@github-actions github-actions bot added the docs label Aug 17, 2022
@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch 3 times, most recently from 53588d0 to 05a0efa Compare August 22, 2022 14:27
@zielmicha
Copy link
Member Author

@raunaqmorarka can you take a look? If I'm not mistaken @skrzypo987 has requested your review.

@zielmicha
Copy link
Member Author

@raunaqmorarka will you have time to review this PR? Let me know if I should find another reviewer.

@raunaqmorarka
Copy link
Member

@raunaqmorarka will you have time to review this PR? Let me know if I should find another reviewer.

I'll take a look at it early next week

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we don't have a way to write this data via trino-parquet, we should add support for that.
There should be a test in AbstractTestParquetReader similar to testTimestamp but with time zone. It'll need update to ParquetTester#writeValue to add support for timestamp with timezone type.
There should also be a test in TestHiveCompatibility once the write path is working.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing TIMESTAMP WITH TIME ZONE isn't supported by Hive connector anyway yet - I'm planning to implement this in the next PR.

@arhimondr
Copy link
Contributor

I may not be the best person to review this. @raunaqmorarka could you please take an another look?

@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch from 05a0efa to 43fbecd Compare May 16, 2023 08:25
@github-actions github-actions bot added the hive Hive connector label May 16, 2023
@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch 2 times, most recently from 761f519 to 2f2dd5d Compare May 17, 2023 18:27
@mosabua
Copy link
Member

mosabua commented Jul 10, 2023

@zielmicha could you rebase?

@martint @findepi and @raunaqmorarka .. could you help moving this forward?

@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch from 9120e07 to 2e9a4b1 Compare August 16, 2023 22:28
@raunaqmorarka
Copy link
Member

Please rebase to latest master

@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch 3 times, most recently from b41dc13 to 47990db Compare August 23, 2023 19:42
@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch 3 times, most recently from eb5899f to 3cb2abc Compare August 28, 2023 15:12
@zielmicha zielmicha force-pushed the parquet-nanos-zoned-timestamp-fix branch from 3cb2abc to b019ac2 Compare August 28, 2023 15:19
@Fokko
Copy link
Contributor

Fokko commented Nov 13, 2023

@raunaqmorarka @zielmicha Any progress on this? At Iceberg we're adding support for Nanosecond precision: apache/iceberg#8683

@mosabua
Copy link
Member

mosabua commented Jan 12, 2024

@zielmicha @martint @raunaqmorarka could we rebase this and move towards merge?

@github-actions
Copy link

github-actions bot commented Sep 4, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Sep 4, 2024
@mosabua
Copy link
Member

mosabua commented Sep 18, 2024

@zielmicha @martint @raunaqmorarka I know this PR has been around a long time. From what I understand it would still be great to get this feature in for @zielmicha .. what are next steps to enable that? Rebase probably.. anything else?

@github-actions github-actions bot removed the stale label Sep 19, 2024
@github-actions
Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Oct 11, 2024
@github-actions
Copy link

github-actions bot commented Nov 4, 2024

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Nov 4, 2024
@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Nov 4, 2024
@mosabua mosabua reopened this Nov 4, 2024
@mosabua
Copy link
Member

mosabua commented Nov 4, 2024

Reopening since I believe we want this feature added based on past discussions. I will leave @martint to decide on next steps.

@raunaqmorarka
Copy link
Member

Superseded by #27835 and #22781

dain added a commit to dain/trino that referenced this pull request Jan 6, 2026
Add tests for reading Parquet files with nanosecond timestamp precision
into various Trino timestamp types (MILLIS, MICROS, NANOS), both with
and without time zone.

Test resources and test structure from trinodb#13599

Co-Authored-By: Michał Zieliński <michal@zielinscy.org.pl>
dain added a commit to dain/trino that referenced this pull request Jan 6, 2026
Add tests for reading Parquet files with nanosecond timestamp precision
into various Trino timestamp types (MILLIS, MICROS, NANOS), both with
and without time zone.

Test resources and test structure from trinodb#13599

Co-Authored-By: Michał Zieliński <michal@zielinscy.org.pl>
raunaqmorarka pushed a commit that referenced this pull request Jan 6, 2026
Add tests for reading Parquet files with nanosecond timestamp precision
into various Trino timestamp types (MILLIS, MICROS, NANOS), both with
and without time zone.

Test resources and test structure from #13599

Co-Authored-By: Michał Zieliński <michal@zielinscy.org.pl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed delta-lake Delta Lake connector docs hive Hive connector stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.

Development

Successfully merging this pull request may close these issues.

7 participants