-
Notifications
You must be signed in to change notification settings - Fork 1k
Support different TimeUnits and timezones when reading Timestamps from INT96 #7285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mbutrovich and @a10y -- this is looking pretty good to me
I think we need to fix the API to be a non breaking change if we want to get this into the next non-breaking release (in the next few days)
Otherwise all i think this PR needs is Tests for timezones and it should be good to go
| #[inline] | ||
| pub fn convert_int96(_descr: &ColumnDescPtr, value: Int96) -> Self { | ||
| Field::TimestampMillis(value.to_i64()) | ||
| Field::TimestampMillis(value.to_millis()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be to_nanos() to preserve the old behavior
But then again it doesn't make sense to erturn a nanosecond timestamp for a value with millisecond precision 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be to_nanos() to preserve the old behavior
The old behavior is actually to convert it to millis.
Current behavior for convert_i96 has it call to_i64 which converts to millis, so I tried to keep the behavior the same.
|
BTW the msrv test failure is not related: |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mbutrovich -- this looks good to me.
Seems like there are some CI failures to address but then it should be good to merge
|
Thanks again @mbutrovich |
…m INT96 (apache#7285) * Support different Timestamp TimeUnit resolutions for INT96. * Use i64 for subtracting JULIAN_DAY_OF_EPOCH. * docs. * Add deprecation comment. * Fix typo. * Add timezone test. * Fix clippy. * Try to fix Clippy again.
Which issue does this PR close?
Brings behavior similar to arrow-cpp: https://github.com/apache/arrow/blob/6b66c842eefec520d391203d205cd91d1ca0dd65/cpp/src/parquet/arrow/schema_internal.cc#L206 except this approach relies on Parquet metadata, or passing
Schemainto Parquet reader rather than a specific flag.Rationale for this change
See issue.
What changes are included in this PR?
IntoBufferforVec<T>whereTis a Parquet type now takes anArrowTypearg so we know what to materialize INT96 into.ArrowTypemetadata in Parquet can now specify the resolution. Alternatively, asupplied_schematoArrowReaderOptionscan achieve the same effect, which is how DataFusion will pass schema hints for INT96 (similar toStringViewoptimizations).ArrowTypehints.Are there any user-facing changes?
#7250 (comment)
I don't believe so.