[SPARK-19548][SQL] Support Hive UDFs which return typed Lists/Maps #16886
Test build #72703 has finished for PR 16886 at commit
Test build #72706 has finished for PR 16886 at commit
Looks great to me because Hive actually supports these types for
throw new AnalysisException(
  "List type in java is unsupported because " +
  "JVM type erasure makes spark fail to catch a component type in List<>")
  "Raw list type in java is unsupported because Spark cannot infer the element type.")
Do we need this error handling? Isn't the "Unsupported java type interface java.util.List" error thrown by the bottom entry enough?
It is quite likely that a user/developer will make a mistake for either a list or a map. I think these errors are more informative than the generic error, so I would like to retain them for the sake of user experience.
Ah, ok
Nit: it would be better to place all the error-handling entries at the bottom.
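For readers following the raw-list discussion, here is a minimal Java sketch of what reflection can and cannot see. It is not the code from this PR: `ReturnTypeProbe`, `TypedListUdf`, and `RawListUdf` are hypothetical names, and the sample classes do not extend Hive's `UDF`. It only uses `java.lang.reflect` to show that a parameterized return type keeps its element type while a raw `List` does not.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Spark or Hive.
class ReturnTypeProbe {

  // Stand-in for a UDF that declares a fully typed list.
  static class TypedListUdf {
    public List<String> evaluate(String s) {
      return Arrays.asList(s);
    }
  }

  // Stand-in for a UDF that declares a raw list.
  static class RawListUdf {
    @SuppressWarnings("rawtypes")
    public List evaluate(String s) {
      return Arrays.asList(s);
    }
  }

  // Reports what the generic signature of evaluate(String) exposes.
  static String describe(Class<?> udfClass) {
    Method m;
    try {
      m = udfClass.getMethod("evaluate", String.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
    Type t = m.getGenericReturnType();
    if (t instanceof ParameterizedType) {
      // The element type survives in the method signature.
      Type element = ((ParameterizedType) t).getActualTypeArguments()[0];
      return "parameterized list of " + element.getTypeName();
    }
    // A raw List is just the Class object, with no element type attached.
    return "raw " + t.getTypeName();
  }
}
```

With the raw declaration there is nothing to map to an element data type, which is why a dedicated message for that case can be more helpful than the generic "unsupported type" error.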
case _: WildcardType =>
  throw new AnalysisException(
    "Collection types with wildcards (e.g. List<?> or Map<?, ?>) are unsupported because " +
    "Spark cannot infer the data type for these type parameters.")
I think an explicit error message for this special case is good because it helps users understand what went wrong. Would it be better to add similar error handling for BoundedType, too?
BoundedType is a mockito class and not a JVM class. A bound type that cannot be translated to a DataType is caught by the final case in the match.
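As a side note on the wildcard case, the sketch below shows how `java.lang.reflect` reports type arguments. It is an illustration only, with hypothetical names (`TypeArgumentKinds`, `Holder`), and it does not reproduce the match expression in this PR. In plain Java reflection, both `List<?>` and a bounded form such as `List<? extends Number>` surface as `WildcardType`.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Spark or Hive.
class TypeArgumentKinds {

  // Sample signatures to inspect; the method bodies are irrelevant here.
  static class Holder {
    public List<?> wildcardList() { return null; }
    public List<? extends Number> boundedList() { return null; }
    public Map<String, Integer> typedMap() { return null; }
  }

  // Classifies the first type argument of the named method's return type.
  static String firstArgumentKind(String methodName) {
    Method m;
    try {
      m = Holder.class.getMethod(methodName);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
    ParameterizedType p = (ParameterizedType) m.getGenericReturnType();
    Type arg = p.getActualTypeArguments()[0];
    if (arg instanceof WildcardType) {
      return "wildcard";
    }
    if (arg instanceof Class) {
      return "class " + ((Class<?>) arg).getSimpleName();
    }
    return "other";
  }
}
```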
Test build #72711 has finished for PR 16886 at commit
/**
 * UDF that returns a raw (non-parameterized) java List.
 */
public class UDFRawList extends UDF {
nit: in Spark, Java files should be indented with 2 spaces.
Ok, all files in that dir are indented with 4 spaces. I can modify those if you want me to.
It seems about half of them are indented with 4 spaces. Yes, let's fix them together.
LGTM
Test build #72719 has finished for PR 16886 at commit
thanks, merging to master!
## What changes were proposed in this pull request?

This PR adds support for Hive UDFs that return fully typed Java Lists or Maps, for example `List<String>` or `Map<String, Integer>`. These structures can also be nested, for example `Map<String, List<Integer>>`. Raw collections and collections using wildcards are still not supported, and cannot be supported because the required type information is missing.

## How was this patch tested?

Modified existing tests in `HiveUDFSuite` and added test cases for raw collections and collections using wildcards.

Author: Herman van Hovell

Closes apache#16886 from hvanhovell/SPARK-19548.
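To make the supported and unsupported cases concrete, here is a rough, self-contained Java sketch of a recursive mapping from a reflected return type to a schema description. It is not Spark's implementation: `TypeToSchemaSketch`, `Samples`, and the output notation (`array<...>`, `map<...,...>`) are assumptions made for illustration, and only `String` and `Integer` leaf types are handled.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the names and the output notation are illustrative only.
class TypeToSchemaSketch {

  // Sample signatures standing in for UDF evaluate methods.
  static class Samples {
    public Map<String, List<Integer>> nested() { return null; }
    public List<?> wildcard() { return null; }
  }

  // Recursively describes a fully typed List/Map; rejects anything else.
  static String describe(Type t) {
    if (t == String.class) return "string";
    if (t == Integer.class) return "int";
    if (t instanceof ParameterizedType) {
      ParameterizedType p = (ParameterizedType) t;
      Type[] args = p.getActualTypeArguments();
      if (p.getRawType() == List.class) {
        return "array<" + describe(args[0]) + ">";
      }
      if (p.getRawType() == Map.class) {
        return "map<" + describe(args[0]) + "," + describe(args[1]) + ">";
      }
    }
    if (t instanceof WildcardType) {
      // No concrete type is available for a wildcard argument.
      throw new IllegalArgumentException("wildcard type parameter: " + t.getTypeName());
    }
    // Raw collections and all other types end up here.
    throw new IllegalArgumentException("unsupported type: " + t.getTypeName());
  }

  // Looks up the generic return type of a no-argument method on Samples.
  static Type returnTypeOf(String methodName) {
    try {
      return Samples.class.getMethod(methodName).getGenericReturnType();
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

In this sketch, `describe(returnTypeOf("nested"))` yields `map<string,array<int>>`, while the wildcard signature is rejected with an exception, mirroring the split between typed and untyped collections described above.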