-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-4985][SQL Parquet] Parquet date support #3855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Jenkins, this is ok to test |
|
Test build #24979 has finished for PR 3855 at commit
|
|
cc @marmbrus |
d29224e to
8851d1a
Compare
|
Test build #25509 has finished for PR 3855 at commit
|
8851d1a to
929f294
Compare
|
Test build #25510 has finished for PR 3855 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Usually I'd prefer import java.sql.Date and just use Date here.
|
Seems like this is subsumed by #3822, and thus we can close this issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't conform to the Parquet specification for date and produces invalid data.
When using the DATE annotation, the value must be the number of days from the Unix epoch, 1 January 1970. java.sql.Date and java.util.Date are backed by a long timestamp, the number of milliseconds from the Unix epoch (which is a Parquet TIMESTAMP_MILLIS) and casting that value to an integer makes it impossible to recover the real date.
I recommend using TIMESTAMP_MILLIS instead of date here (you won't need the toInt part). That seems to be what you want, if you're interested in using java.sql.Date. The reason why there is a name mismatch is that the Parquet types mirror SQL types more closely than Java objects.
|
I just looked at #3822 and it looks correct, so you can ignore my review comment above. In the future, please feel free to ping me for reviews when you're using logical types like Date, Timestamp, Decimal, etc. in either Parquet or Avro. I've been working on the specs in those communities and I'm happy to make sure the implementations look correct. |
|
@liancheng, I'll take a look as soon as I can. I'm a little swamped this week though, so I can't guarantee it'll be quick. Sorry! |
No description provided.