Spark: Fix ClassCastException when using bucket UDF #3368
Conversation
Related issue: #2838

@rdblue @RussellSpitzer, could you help to review this PR? :)
Review comment on:

    @SuppressWarnings("checkstyle:CyclomaticComplexity")
    public static Object convertAtomicValue(DataType atomic, Object object) {
Is this needed?

I think you could handle short and byte types by updating the convert(Type, Object) method above:

    case INTEGER:
        return ((Number) object).intValue();

Then you wouldn't need a new method at all.
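The suggestion works because java.lang.Byte, java.lang.Short, and java.lang.Integer all extend java.lang.Number, so widening through Number succeeds where a direct cast of the boxed value fails. A minimal standalone sketch (the class and method names here are illustrative, not the Iceberg code):

```java
public class NumberWidening {

    // Hypothetical converter mirroring the suggested convert(Type, Object) change:
    // widen any boxed numeric value to int via the Number supertype.
    static Object toInt(Object object) {
        return ((Number) object).intValue();
    }

    public static void main(String[] args) {
        Object shortValue = (short) 7;  // boxed as java.lang.Short

        // A direct cast of a boxed Short to Integer throws ClassCastException,
        // which is the failure mode this PR fixes for the bucket UDF.
        boolean castFailed = false;
        try {
            Integer i = (Integer) shortValue;
        } catch (ClassCastException e) {
            castFailed = true;
        }
        System.out.println(castFailed);        // true

        // Widening through Number handles byte and short uniformly.
        System.out.println(toInt(shortValue)); // 7
        System.out.println(toInt((byte) 3));   // 3
    }
}
```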
Thanks for your review. The method is not needed.
@izchen, could you fix just the latest version of Spark, and then we'll port the changes to the other versions after it is merged? That way we don't have a commit that affects all Spark versions.
@rdblue done, could you help to review this PR again?
jackye1995
left a comment
looks good to me
Looks good. Thanks, @izchen!

Thanks, @rdblue @jackye1995!
Port of apache#3368 to Spark 3.1.
Port of apache#3368 to Spark 3.0.
Port of apache#3368 to Spark 2.4.
Currently, the Iceberg API org.apache.iceberg.transforms.Bucket#apply is registered directly as a Spark UDF in org.apache.iceberg.spark#IcebergSpark. For byte, short, date, timestamp, and binary, the Spark value of these types differs from Iceberg's internal representation, which causes a ClassCastException.

We should first convert the Spark value to Iceberg's internal representation, and then use the converted value as the input of the Iceberg API in org.apache.iceberg.spark#IcebergSpark. In addition, this PR adds more unit tests to cover all Spark atomic types.
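The date and timestamp cases of the conversion described above can be sketched as follows. This is a standalone illustration, assuming Iceberg's internal representations of days-from-epoch (int) for dates and microseconds-from-epoch (long) for timestamps; the class and helper names are hypothetical, not the Iceberg or Spark API:

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class SparkValueConversion {

    private static final LocalDate EPOCH = LocalDate.ofEpochDay(0);

    // Spark hands the UDF a java.sql.Date; the bucket transform expects
    // an int count of days since the epoch (assumed internal form).
    static int dateToDays(Date date) {
        return (int) ChronoUnit.DAYS.between(EPOCH, date.toLocalDate());
    }

    // Spark hands the UDF a java.sql.Timestamp; the bucket transform expects
    // a long count of microseconds since the epoch (assumed internal form).
    static long timestampToMicros(Timestamp ts) {
        // getTime() is millis since epoch; add the sub-millisecond part of
        // the nanos field converted to micros.
        return ts.getTime() * 1000 + (ts.getNanos() % 1_000_000) / 1000;
    }

    public static void main(String[] args) {
        System.out.println(dateToDays(Date.valueOf("1970-01-02")));      // 1
        System.out.println(timestampToMicros(new Timestamp(1000L)));     // 1000000
    }
}
```

Passing the Spark objects straight into the bucket transform without such a conversion is what produced the ClassCastException this PR fixes.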