Spark: Fix ClassCastException when using bucket UDF #3368

izchen · 2021-10-25T15:22:01Z

Currently, the Iceberg API method org.apache.iceberg.transforms.Bucket#apply is registered directly as a Spark UDF in org.apache.iceberg.spark#IcebergSpark.

For the byte, short, date, timestamp, and binary types, the Spark value differs from Iceberg's internal representation, which causes a ClassCastException.

SPARK TYPE	SPARK VALUE	ICEBERG TYPE	ICEBERG VALUE
ByteType	java.lang.Byte	IntegerType	java.lang.Integer
ShortType	java.lang.Short	IntegerType	java.lang.Integer
DateType	java.sql.Date	DateType	java.lang.Integer
TimestampType	java.sql.Timestamp	TimestampType.withZone	java.lang.Long
BinaryType	byte array	BinaryType	java.nio.ByteBuffer

We should first convert the Spark value to Iceberg's internal representation, and then use the converted value as the input to the bucket transform registered in org.apache.iceberg.spark#IcebergSpark.
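The conversions in the table above can be sketched roughly as follows. This is only an illustration using plain JDK types; the class SparkToIcebergSketch and the method toInternal are hypothetical names and are not part of the Iceberg or Spark APIs, and the actual change in this PR lives in SparkValueConverter.

```java
import java.nio.ByteBuffer;
import java.sql.Date;
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical helper for illustration only; not the Iceberg implementation.
final class SparkToIcebergSketch {

  // Maps a Spark-side Java value to the Java type Iceberg uses internally.
  static Object toInternal(Object sparkValue) {
    if (sparkValue instanceof Byte || sparkValue instanceof Short) {
      // ByteType / ShortType -> Iceberg IntegerType (java.lang.Integer)
      return ((Number) sparkValue).intValue();
    } else if (sparkValue instanceof Date) {
      // DateType -> days since 1970-01-01 (java.lang.Integer)
      return Math.toIntExact(((Date) sparkValue).toLocalDate().toEpochDay());
    } else if (sparkValue instanceof Timestamp) {
      // TimestampType -> microseconds since the epoch (java.lang.Long)
      Instant instant = ((Timestamp) sparkValue).toInstant();
      long micros = Math.multiplyExact(instant.getEpochSecond(), 1_000_000L);
      return Math.addExact(micros, instant.getNano() / 1_000L);
    } else if (sparkValue instanceof byte[]) {
      // BinaryType -> java.nio.ByteBuffer
      return ByteBuffer.wrap((byte[]) sparkValue);
    }
    // Other atomic values are passed through unchanged in this sketch.
    return sparkValue;
  }

  public static void main(String[] args) {
    System.out.println(toInternal((short) 7).getClass().getName());
    System.out.println(toInternal(Date.valueOf("1970-01-02")));
    System.out.println(toInternal(Timestamp.from(Instant.ofEpochSecond(1, 2_000))));
  }
}
```

The sketch dispatches on the runtime class of the value for brevity; a converter that knows the column's declared type can dispatch on that type instead.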

In addition, this PR adds more unit tests to cover all Spark atomic types.

izchen · 2021-10-25T15:22:22Z

Related issue: #2838

izchen · 2021-10-25T15:24:22Z

@rdblue @RussellSpitzer , could you help to review this PR? :)

rdblue · 2021-10-25T15:58:27Z

spark/v2.4/spark/src/main/java/org/apache/iceberg/spark/SparkValueConverter.java

  }
+
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
+  public static Object convertAtomicValue(DataType atomic, Object object) {


Is this needed?

I think you could handle short and byte types by updating the convert(Type, Object) method above:

case INTEGER: return ((Number) object).intValue();

Then you wouldn't need a new method at all.
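To illustrate why the suggested one-line change is enough for byte and short, here is a minimal standalone sketch. IntegerCastSketch and its methods are hypothetical names used only for this example. A boxed Byte or Short cannot be cast to Integer, but every boxed numeric type extends java.lang.Number, so intValue() works for all of them.

```java
// Hypothetical demo class; not part of Iceberg.
final class IntegerCastSketch {

  // Direct cast: throws ClassCastException for a boxed Byte or Short.
  static int castDirectly(Object value) {
    return (Integer) value;
  }

  // The form suggested in the review: widen through Number instead.
  static int viaNumber(Object value) {
    return ((Number) value).intValue();
  }

  public static void main(String[] args) {
    Object sparkByte = Byte.valueOf((byte) 3);
    System.out.println(viaNumber(sparkByte));
    try {
      castDirectly(sparkByte);
    } catch (ClassCastException e) {
      System.out.println("direct cast failed: " + e.getMessage());
    }
  }
}
```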

Thanks for your review. The method is not needed.

rdblue · 2021-10-25T15:59:38Z

@izchen, could you fix just the latest version of Spark? We'll port the changes to the other versions after it is merged. That way we don't have a commit that affects all Spark versions.

izchen · 2021-10-25T17:04:55Z

@rdblue done, could you help to review this PR again?

jackye1995

looks good to me

rdblue · 2021-10-26T22:58:01Z

Looks good. Thanks, @izchen!

izchen · 2021-10-27T01:56:44Z

Thanks, @rdblue @jackye1995 !

zhangchen added 3 commits October 25, 2021 18:21

v3.0

6919b32

v2.4

5b7201f

v3.2

0bc7172

github-actions bot added the spark label Oct 25, 2021

rdblue reviewed Oct 25, 2021

zhangchen added 3 commits October 26, 2021 00:41

Revert "v3.0"

4c306b0

This reverts commit 6919b32.

Revert "v2.4"

078dfe4

This reverts commit 5b7201f.

updating convert method

b87e126

fix spelling mistakes

cf0db4b

jackye1995 approved these changes Oct 26, 2021

rdblue approved these changes Oct 26, 2021

rdblue added this to the Java 0.12.1 Release milestone Oct 26, 2021

rdblue merged commit 425641a into apache:master Oct 26, 2021

izchen deleted the fix_bucket_udf branch October 27, 2021 01:45

kbendick pushed a commit to kbendick/iceberg that referenced this pull request Oct 27, 2021

Spark: Fix ClassCastException when using bucket UDF (apache#3368)

2f0154c

kbendick mentioned this pull request Oct 27, 2021

Hotfix - Remove additional parens to satisfy checkstyle for 1.12.1 cherry-picking #3386

Merged

kbendick pushed a commit to kbendick/iceberg that referenced this pull request Nov 1, 2021

Spark: Fix ClassCastException when using bucket UDF (apache#3368)

0ade1c0

rdblue pushed a commit that referenced this pull request Nov 1, 2021

Spark: Fix ClassCastException when using bucket UDF (#3368)

4ba2157

wypoon added a commit to wypoon/iceberg that referenced this pull request Nov 18, 2021

Spark 3.1: Fix ClassCastException when using bucket UDF

f6c295e

Port of apache#3368 to Spark 3.1.

wypoon added a commit to wypoon/iceberg that referenced this pull request Nov 18, 2021

Spark 3.0: Fix ClassCastException when using bucket UDF

f8e5c46

Port of apache#3368 to Spark 3.0.

wypoon added a commit to wypoon/iceberg that referenced this pull request Nov 18, 2021

Spark 2.4: Fix ClassCastException when using bucket UDF

6260c38

Port of apache#3368 to Spark 2.4.

This was referenced Nov 18, 2021

Spark 3.1: Fix ClassCastException when using bucket UDF #3568

Merged

Spark 3.0: Fix ClassCastException when using bucket UDF #3569

Merged

Spark 2.4: Fix ClassCastException when using bucket UDF #3570

Merged

rdblue pushed a commit that referenced this pull request Nov 18, 2021

Spark 3.1: Fix ClassCastException when using bucket UDF (#3568)

9d339c9

Port of #3368 to Spark 3.1.

rdblue pushed a commit that referenced this pull request Nov 18, 2021

Spark 3.0: Fix ClassCastException when using bucket UDF (#3569)

4eef02d

Port of #3368 to Spark 3.0.

rdblue pushed a commit that referenced this pull request Nov 18, 2021

Spark 2.4: Fix ClassCastException when using bucket UDF (#3570)

3b2c32d

Port of #3368 to Spark 2.4.

Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Nov 23, 2021

Spark 3.1: Fix ClassCastException when using bucket UDF (apache#3568)

a908e40

Port of apache#3368 to Spark 3.1.

Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Nov 23, 2021

Spark 3.0: Fix ClassCastException when using bucket UDF (apache#3569)

bd8987d

Port of apache#3368 to Spark 3.0.

Initial-neko pushed a commit to Initial-neko/iceberg that referenced this pull request Nov 23, 2021

Spark 2.4: Fix ClassCastException when using bucket UDF (apache#3570)

b57b91c

Port of apache#3368 to Spark 2.4.

izchen added a commit to izchen/iceberg that referenced this pull request Dec 7, 2021

Spark: Fix ClassCastException when using bucket UDF (apache#3368)

3d0e282
