
[SPARK-25990][SQL] ScriptTransformation should handle different data types correctly #27556

Closed
Ngone51 wants to merge 4 commits into apache:master from Ngone51:script_transform

Conversation

Ngone51 (Member) commented Feb 13, 2020

What changes were proposed in this pull request?

We should convert Spark `InternalRow`s to Hive data via `HiveInspectors.wrapperFor`.
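
For context, here is a hedged sketch (not the PR's actual diff) of what `wrapperFor`-based conversion looks like in the writer path: one converter per input column, turning Catalyst values (e.g. Spark's `Decimal`) into Hive-native ones (e.g. `HiveDecimal`) before the SerDe sees them. The object name and the `dataTypes`/`fieldOIs` parameters are assumptions for illustration.

```scala
package org.apache.spark.sql.hive

// Hypothetical sketch, not the PR's diff. Placed in org.apache.spark.sql.hive
// because the HiveInspectors trait is private[hive].
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.DataType

object WrapperForSketch extends HiveInspectors {
  def toHiveRow(
      row: InternalRow,
      dataTypes: Seq[DataType],
      fieldOIs: Seq[ObjectInspector]): Array[Any] = {
    // wrapperFor(oi, dt) returns an Any => Any converter for one column,
    // e.g. Spark Decimal -> HiveDecimal, UTF8String -> java.lang.String.
    val wrappers = dataTypes.zip(fieldOIs).map { case (dt, oi) => wrapperFor(oi, dt) }
    Array.tabulate(wrappers.length) { i =>
      wrappers(i)(row.get(i, dataTypes(i)))
    }
  }
}
```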

Why are the changes needed?

We may hit the exception below without this change:

```
[info]    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, 192.168.1.6, executor driver): java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal
[info]   	at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaHiveDecimalObjectInspector.getPrimitiveJavaObject(JavaHiveDecimalObjectInspector.java:55)
[info]   	at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:321)
[info]   	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
[info]   	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
[info]   	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
[info]   	at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
[info]   	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.$anonfun$run$2(ScriptTransformationExec.scala:300)
[info]   	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.$anonfun$run$2$adapted(ScriptTransformationExec.scala:281)
[info]   	at scala.collection.Iterator.foreach(Iterator.scala:941)
[info]   	at scala.collection.Iterator.foreach$(Iterator.scala:941)
[info]   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
[info]   	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.$anonfun$run$1(ScriptTransformationExec.scala:281)
[info]   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1932)
[info]   	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.run(ScriptTransformationExec.scala:270)
```

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a new test. Note that this test returns different results between Hive 1.2 and Hive 2.3 due to a `HiveDecimal` or `SerDe` difference (root cause not yet known).
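
For illustration, a hedged sketch of the kind of test this adds (hypothetical names and values, not the PR's actual test): pipe a decimal column through an identity script, which hit the `ClassCastException` above before this change.

```scala
// Assumes a Hive-enabled test suite mixing in SQLTestUtils, where `spark` is
// a SparkSession with Hive support; 'cat' simply echoes its input, so the
// decimal value must go through the Hive SerDe (the previously failing path).
withTempView("v") {
  spark.sql("SELECT CAST(1.23 AS DECIMAL(10, 2)) AS d").createOrReplaceTempView("v")
  val df = spark.sql(
    """
      |SELECT TRANSFORM(d)
      |USING 'cat' AS (d)
      |FROM v
    """.stripMargin)
  df.collect() // would previously fail while serializing the Decimal
}
```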

Ngone51 (Member, Author) commented Feb 13, 2020

cc @cloud-fan

Ngone51 changed the title from "[SPARK-25990] ScriptTransform should handle different data types correctly" to "[SPARK-25990] ScriptTransformation should handle different data types correctly" on Feb 13, 2020
Ngone51 changed the title from "[SPARK-25990] ScriptTransformation should handle different data types correctly" to "[SPARK-25990][SQL] ScriptTransformation should handle different data types correctly" on Feb 13, 2020
SparkQA commented Feb 13, 2020

Test build #118334 has finished for PR 27556 at commit 5a176b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Review comments on the new test (diff context):

```scala
|FROM v
""".stripMargin)

val castDecimal: Column => Column = if (HiveUtils.isHive23) {
```

Contributor: can we add some code comments to explain it?

Contributor: and maybe name it as `decimalToString`
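
A hedged reading of that suggestion (hypothetical branch bodies, not the PR's actual code): give the version-dependent normalization a name that says what it does, e.g.

```scala
// Hypothetical rename following the review comment; both branch bodies are
// assumptions. Assumes Spark's own sql/hive test sources, where
// org.apache.spark.sql.Column and org.apache.spark.sql.hive.HiveUtils
// (private[spark]) are accessible; HiveUtils.isHive23 distinguishes the
// built-in Hive version.
val decimalToString: Column => Column = if (HiveUtils.isHive23) {
  c => c.cast("string")                // assumed Hive 2.3 behavior
} else {
  c => c.cast("double").cast("string") // assumed Hive 1.2 normalization
}
```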

SparkQA commented Feb 13, 2020

Test build #118370 has finished for PR 27556 at commit a07ccff.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Ngone51 (Member, Author) commented Feb 14, 2020

Jenkins, retest this please.

SparkQA commented Feb 14, 2020

Test build #118389 has finished for PR 27556 at commit a07ccff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented:

thanks, merging to master/3.0!

cloud-fan closed this in 99b8136 on Feb 14, 2020
cloud-fan added a commit that referenced this pull request Feb 14, 2020
[SPARK-25990][SQL] ScriptTransformation should handle different data types correctly

Closes #27556 from Ngone51/script_transform.

Lead-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 99b8136)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
AngersZhuuuu (Contributor) commented:

@Ngone51 Does your PR make transform support array/map data types as a script's input column?

Ngone51 (Member, Author) commented Mar 23, 2020

> Does your PR make transform support array/map data types as a script's input column?

@AngersZhuuuu No. This doesn't add support for new types but just improves the handling of already supported types.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
[SPARK-25990][SQL] ScriptTransformation should handle different data types correctly

Closes apache#27556 from Ngone51/script_transform.

Lead-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>