-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-8753][SQL] Create an IntervalType data type #7226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @rxin |
|
Test build #36544 has finished for PR 7226 at commit
|
|
Test build #36547 has finished for PR 7226 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's create an abstract data type for IntervalType.
|
We should add a way to cast interval into string and string to interval too. That can go in a separate pull request though. |
|
Test build #36572 has finished for PR 7226 at commit
|
|
@cloud-fan actually upon further thought, I think it makes more sense to have a single IntervalType and internally represent it as a 12-byte value (1 long + 1 int). We can create a new class for it. |
|
Test build #36683 has finished for PR 7226 at commit
|
|
Test build #36684 has finished for PR 7226 at commit
|
|
Test build #36685 has finished for PR 7226 at commit
|
|
Test build #36691 has finished for PR 7226 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a legit comparison actually -- i'm not sure if you can sort directly on interval data types this way.
(the problem is that the number of seconds or days can be greater than month)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just assume there are 30 days in a month and transform months to microseconds and compare them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't make it extend AtomicType here, as I haven't figured out how to compare intervals. 30 days and 1 months may have different compare result in different context.
|
Test build #36757 has finished for PR 7226 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when is this called? if it is called during analysis, it'd make more sense to throw AnalysisException, since that has better error reporting in Python.
|
looks good otherwise. |
|
Test build #36787 has finished for PR 7226 at commit
|
|
Test build #36789 has finished for PR 7226 at commit
|
|
Thanks - merging this in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you implement to string here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed https://issues.apache.org/jira/browse/SPARK-8938 for it.
|
Should we make |
|
hi @liancheng , I didn't make it extend AtomicType, as I haven't figured out how to compare intervals. 30 days and 1 months may have different compare result in different context. |
|
@cloud-fan Thanks for the explanation :) |
We need a new data type to represent time intervals. Because we can't determine how many days in a month, so we need 2 values for interval: a int
months, a longmicroseconds.The interval literal syntax looks like:
interval 3 years -4 month 4 weeks 3 secondBecause we use number of 100ns as value of
TimestampType, so it may not makes sense to support nano second unit.