-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-23272][SQL] add calendar interval type support to ColumnVector #20438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| * struct field. | ||
| */ | ||
| public final ColumnarRow getStruct(int rowId) { | ||
| if (isNullAt(rowId)) return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
| int month = data.getChild(0).getInt(offset + ordinal); | ||
| long microseconds = data.getChild(1).getLong(offset + ordinal); | ||
| return new CalendarInterval(month, microseconds); | ||
| return data.getInterval(offset + ordinal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should insert if (data.isNullAt(offset + ordinal)) return null; to be consistent with other ColumnarXxxs?
Or I guess we can remove these null-checks from all other ColumnarXxxs and leave it to ColumnVector.getInterval()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I'm going to require ColumnVector.getXXX to return null if the value is null, but I'll do it in another PR, to update all the documents and define the behavior of batched getXXX methods.
| } | ||
|
|
||
| public static long toLongWithRange(String fieldName, | ||
| private static long toLongWithRange(String fieldName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?! It's much harder (if at all possible) to test private methods (been bitten few times this week and remember the pain).
| * | ||
| * In Spark, calendar interval type value is basically an integer value representing the number of | ||
| * months in this interval, and a long value representing the number of microseconds in this | ||
| * interval. A interval type vector is same as a struct type vector with 2 fields: `months` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"An interval" + "is the same as"
| * `microseconds`. | ||
| * | ||
| * To support interval type, implementations must implement {@link #getChild(int)} and define 2 | ||
| * child vectors: the first child vector is a int type vector, containing all the month values of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is an int type vector
| */ | ||
| public final CalendarInterval getInterval(int rowId) { | ||
| if (isNullAt(rowId)) return null; | ||
| final int months = getChild(0).getInt(rowId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of final keyword here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's from the previous code, probably it tries to make the compiler happy and run the code faster.
| } | ||
|
|
||
| /** | ||
| * Returns the ordinal's child column vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@return [[ColumnVector]] at the ordinal
| * interval. An interval type vector is the same as a struct type vector with 2 fields: `months` | ||
| * and `microseconds`. | ||
| * | ||
| * To support interval type, implementations must implement {@link #getChild(int)} and define 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: interval type -> calendar interval type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little annoying to type calendar interval type all the time...
|
Test build #86819 has finished for PR 20438 at commit
|
|
Test build #86825 has finished for PR 20438 at commit
|
| * @return child [[ColumnVector]] at the given ordinal. | ||
| */ | ||
| public abstract ColumnVector getChild(int ordinal); | ||
| protected abstract ColumnVector getChild(int ordinal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Now, it became protected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ColumnVector is public, could you add some description in PR description for this kind of visibility change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
|
LGTM. |
| @@ -146,9 +146,7 @@ public byte[] getBinary(int ordinal) { | |||
| @Override | |||
| public CalendarInterval getInterval(int ordinal) { | |||
| if (columns[ordinal].isNullAt(rowId)) return null; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this null check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #20438 (comment) . In this PR I just fixed the returning null issue for getStruct and getInterval, because they are non-abstract. There should be a follow up to clearly document that ColumnVector.getBinary/getUTF8String/... should return null if this slot is null. Then we can remove these null checks here. I appreciate it if you have time to take this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to prepare a PR tonight.
| @@ -139,9 +139,7 @@ public byte[] getBinary(int ordinal) { | |||
| @Override | |||
| public CalendarInterval getInterval(int ordinal) { | |||
| if (data.getChild(ordinal).isNullAt(rowId)) return null; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this null check now?
|
thanks, merging to master/2.3! |
## What changes were proposed in this pull request? `ColumnVector` is aimed to support all the data types, but `CalendarIntervalType` is missing. Actually we do support interval type for inner fields, e.g. `ColumnarRow`, `ColumnarArray` both support interval type. It's weird if we don't support interval type at the top level. This PR adds the interval type support. This PR also makes `ColumnVector.getChild` protect. We need it public because `MutableColumnaRow.getInterval` needs it. Now the interval implementation is in `ColumnVector.getInterval`. ## How was this patch tested? a new test. Author: Wenchen Fan <[email protected]> Closes #20438 from cloud-fan/interval. (cherry picked from commit 695f714) Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
ColumnVectoris aimed to support all the data types, butCalendarIntervalTypeis missing. Actually we do support interval type for inner fields, e.g.ColumnarRow,ColumnarArrayboth support interval type. It's weird if we don't support interval type at the top level.This PR adds the interval type support.
This PR also makes
ColumnVector.getChildprotect. We need it public becauseMutableColumnaRow.getIntervalneeds it. Now the interval implementation is inColumnVector.getInterval.How was this patch tested?
a new test.