Arrow Connector : Preserve TIME and TIMESTAMP as Wall-Clock (Align with JDBC Connector)#25901
Arrow Connector : Preserve TIME and TIMESTAMP as Wall-Clock (Align with JDBC Connector)#25901lithinwxd wants to merge 6 commits intoprestodb:masterfrom
Conversation
...arrow-flight/src/test/java/com/facebook/plugin/arrow/testingServer/TestingArrowProducer.java
Show resolved
Hide resolved
presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowBlockBuilder.java
Show resolved
Hide resolved
|
Do you see this problem when you toggle the |
@tdcmeehan I see the issue even if i set the property SET SESSION legacy_timestamp = true; |
@lithinwxd, true is the default value. Did you also try to set it to false? |
@imjalpreet I tried with false as well , same problem exists without the fix . After the fix , I see consistent results with jdbc and arrow . I tried testing with client as (Moscow UTC +3) and the results got rendered for client according to their timezone. |
|
@imjalpreet here is the analysis for a sample Query in oracle for timestamp type query : SELECT customer_id , created_at FROM tm_lakehouse_gfs_user.customer_master cm; value in datasource : 2023-01-01 10:15:30.000 ⏱ Timestamp Behavior ComparisonLegacy Timestamp = false
Legacy Timestamp = true
note : All queries are executed wrt to IST timezone(UTC+5h30min) |
BryanCutler
left a comment
There was a problem hiding this comment.
Thanks @lithinwxd , but I don't think this is the right approach. An Arrow TimeMilliVector is not meant to be UTC and Arrow does not make any interpretation of values to be in UTC, see https://github.com/apache/arrow/blob/main/format/Schema.fbs#L307.
Presto has been interpreting these values as local time and we should continue to do this. If the data source is providing values in UTC, then it should be using a TimeStampMilliTZVector and set the time zone to UTC.
We should fix handling of explicit time zones and leave vectors with no time zone to be local time, not assume UTC. Then whatever data source is producing your values, they should either have a timezone or be in local time before writing to the Arrow vector.
| @@ -432,11 +436,25 @@ public void assignBlockFromTimeStampMilliVector(TimeStampMilliVector vector, Typ | |||
| } | |||
There was a problem hiding this comment.
What about assignBlockFromTimeStampMicroVector ?
| @@ -639,7 +659,9 @@ public void assignBlockFromTimeMilliTZVector(TimeStampMilliTZVector vector, Type | |||
| } | |||
There was a problem hiding this comment.
A TimeStampMilliTZVector has a specific TimeZone, you would need to use this instead of treating it as UTC. Actually, I think we should be converting to the presto data type TimestampWithTimeZoneType and use the TimeZone from the Arrow vector. Can you try this?
| .atZone(ZoneOffset.UTC) // interpret Arrow millis as UTC | ||
| .toLocalDateTime(); | ||
|
|
||
| value = localDateTime |
There was a problem hiding this comment.
nit: don't reuse value, make a new variable expected
| .atZone(ZoneOffset.UTC) // interpret Arrow millis as UTC | ||
| .toLocalDateTime(); | ||
|
|
||
| expectedMillis = localDateTime |
There was a problem hiding this comment.
nit: change the definition of long expectedMillis to just long value and use expectedMillis here
| long millis = vector.get(i); | ||
| type.writeLong(builder, millis); | ||
| // Interpret Arrow millis as if they were "local wall-clock time" in datasource | ||
| long localMillis = getLocalMillisFromUTCMillis(millis); |
There was a problem hiding this comment.
A TimeMilliVector has no associated timezone, but here you are forcing it to be a UTC timezone, we shouldn't do this. We have to assume that a timestamp without a timezone is local time.
| try (ResultSet resultSet = stmt.executeQuery(query.toUpperCase())) { | ||
| JdbcToArrowConfig config = new JdbcToArrowConfigBuilder().setAllocator(allocator).setTargetBatchSize(2048) | ||
| .setCalendar(Calendar.getInstance(TimeZone.getDefault())).build(); | ||
| .setCalendar(Calendar.getInstance(TimeZone.getTimeZone("UTC"))).build(); |
There was a problem hiding this comment.
Yes, your assumption is correct. Timestamp vectors are represented as UTC instants in Arrow since arrow doesn't have timezone.
This is wrong, an Arrow TimeStampMilliVector does not assume UTC. It's interpretation is up to presto, and we have been using this as local time. You must use a TimeStampMilliTZVector and set the time zone to UTC, if you want a UTC values.
You shouldn't need to change this in the test, it is saying that our datasource is generating localtime data with a timezone, resulting in a TimeStampMilliTZVector with the system timezone.
|
@BryanCutler I did some testing comparing the jdbc values vs arrow values and here is the result The value obtained from jdbc oracle connector is 1672548330000 ms
The value obtained from arrow based oracle connector without fix is 1672568130000 ms (TimeStampMilliVector)
I checked the unix timestamp calculator for both of these values and here is the info found Note : this was tested for a timestamp column and not for Timestamp with timezone column |
For an Arrow Please read the Arrow format spec, it is very clear https://github.com/apache/arrow/blob/main/format/Schema.fbs#L304 |
From arrow server side perspective , if we are not using a UTC instant , instead if we use local system time that would default to arrow server time which could be anywhere say here (UTC-7) its present in a vm, then there would be double shifts . Datasource can be in a cloud but vm may be in UTC-7 zone . If you must use an Arrow timestamp vector ( from server side) like TimeStampMilliVector, you have no choice but to convert the local datetime to epoch millis using some timezone (e.g., system default or UTC), and timezone-based shifts will happen. |
That is the problem, you are changing the data in Presto irreversibly by converting it to the system timezone. You are free to change the data in your flight server, but Presto should not be manipulating the data this way. It would be best if you can make a new unit test that will fail without your changes, but pass with your changes applied. Once you have this, it will be easier to discuss. Can you do that? |
@BryanCutler I tried creating a new unit test to replicate the issue experienced in Arrow server but it seems even the existing test cases in master fails when the Presto client and Arrow server are in different timezone. We are not trying to irreversibly change the data may be until legacy timestamp issue is fixed this fix might solve the issue, we need to make sure both jdbc and arrow replicate same values. One thing I observed is arrow is returning same millis but presto according to whether legacy timestamp is set or not is reintrepreting the result. Until a fix for legacy timestamp is in place this issue cannot be resolved.
|
|
#26449 - shall close once legacy timestamp issue is merged |
|
Closing as this was trying to match a bug in the JDBC connectors, which was fixed in #26449. |





Description
This PR fixes the handling of TIME and TIMESTAMP values in the Arrow connector.
Previously, Arrow interpreted raw millis as UTC instants, which caused unintended timezone shifts when queried from Presto. With this change, the Arrow connector preserves the values as wall-clock times, consistent with the behavior of the JDBC connector.
Example
A source value of 2025-08-01 10:00:00 (TIMESTAMP) was treated as 2025-08-01 10:00 UTC, which then shifted when displayed in Asia/Kolkata or America/Los_Angeles.
Similarly, TIME '10:00:00' was treated as 10:00 UTC instead of 10:00 wall-clock.
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.