[SPARK-16714][SQL] array should create a decimal array from decimals with different precisions and scales #14353
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-16714
Conversation
Test build #62848 has finished for PR 14353 at commit
I think we cannot just make the check pass. We need to actually cast those elements to the same precision and scale.
For example, if we access a single element, its data type may actually differ from the one shown as the array's data type.
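A minimal sketch of the widening this implies (the helper widerDecimal is hypothetical, not code from this PR): a common decimal type wide enough for both inputs keeps the maximum scale and the maximum number of integral digits.

import org.apache.spark.sql.types.DecimalType

// Hypothetical helper: widen two decimal types by keeping the max scale and
// the max number of integral (before-the-point) digits of the two inputs.
def widerDecimal(d1: DecimalType, d2: DecimalType): DecimalType = {
  val scale = math.max(d1.scale, d2.scale)
  val integral = math.max(d1.precision - d1.scale, d2.precision - d2.scale)
  DecimalType(integral + scale, scale)
}

// 0.001 is decimal(3,3) and 0.02 is decimal(2,2), so the widened type is decimal(3,3).
widerDecimal(DecimalType(3, 3), DecimalType(2, 2))  // DecimalType(3,3)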
Thank you for the review, @yhuai.
I see. I'll look into that further.
Hi, @yhuai. I checked the following.
scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T")
res4: org.apache.spark.sql.DataFrame = [a[0]: decimal(3,3), a[1]: decimal(3,3)]
scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T").show()
+-----+-----+
| a[0]| a[1]|
+-----+-----+
|0.001|0.020|
+-----+-----+
scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T").explain(true)
== Parsed Logical Plan ==
'Project [unresolvedalias('a[0], None), unresolvedalias('a[1], None)]
+- 'SubqueryAlias T
+- 'Project ['array(0.001, 0.02) AS a#54]
+- OneRowRelation$
== Analyzed Logical Plan ==
a[0]: decimal(3,3), a[1]: decimal(3,3)
Project [a#54[0] AS a[0]#61, a#54[1] AS a[1]#62]
+- SubqueryAlias T
+- Project [array(0.001, 0.02) AS a#54]
+- OneRowRelation$
scala> sql("create table d1(a DECIMAL(3,2))")
scala> sql("create table d2(a DECIMAL(2,1))")
scala> sql("insert into d1 values(1.0)")
scala> sql("insert into d2 values(1.0)")
scala> sql("select * from d1, d2").show()
+----+---+
| a| a|
+----+---+
|1.00|1.0|
+----+---+
scala> sql("select array(d1.a,d2.a),array(d2.a,d1.a),* from d1, d2")
res5: org.apache.spark.sql.DataFrame = [array(a, a): array<decimal(3,2)>, array(a, a): array<decimal(3,2)> ... 2 more fields]
scala> sql("select array(d1.a,d2.a),array(d2.a,d1.a),* from d1, d2").show()
+------------+------------+----+---+
| array(a, a)| array(a, a)| a| a|
+------------+------------+----+---+
|[1.00, 1.00]|[1.00, 1.00]|1.00|1.0|
+------------+------------+----+---+
scala> sql("select array(d1.a,d2.a)[0],array(d2.a,d1.a)[0],* from d1, d2").show()
+--------------+--------------+----+---+
|array(a, a)[0]|array(a, a)[0]| a| a|
+--------------+--------------+----+---+
| 1.00| 1.00|1.00|1.0|
+--------------+--------------+----+---+
scala> sql("select array(d1.a,d2.a)[1],array(d2.a,d1.a)[1],* from d1, d2").show()
+--------------+--------------+----+---+
|array(a, a)[1]|array(a, a)[1]| a| a|
+--------------+--------------+----+---+
| 1.00| 1.00|1.00|1.0|
+--------------+--------------+----+---+
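My reading of the output above, in terms of the hypothetical widerDecimal sketch from earlier: both argument orders resolve to array<decimal(3,2)>, since the max scale is 2 and the max integral digit count is 1.

// Consistent with the earlier sketch: widening decimal(3,2) with decimal(2,1)
// keeps scale 2 and one integral digit, i.e. decimal(3,2) as shown above.
widerDecimal(DecimalType(3, 2), DecimalType(2, 1))  // DecimalType(3,2)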
And finally, the following is the codegen result. Please see line 29.
scala> sql("explain codegen select array(0.001, 0.02)[1]").collect().foreach(println)
[Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Project [0.02 AS array(0.001, 0.02)[1]#75]
+- Scan OneRowRelation[]
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
/* 007 */ private scala.collection.Iterator inputadapter_input;
/* 008 */ private UnsafeRow project_result;
/* 009 */ private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 010 */ private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter project_rowWriter;
/* 011 */
/* 012 */ public GeneratedIterator(Object[] references) {
/* 013 */ this.references = references;
/* 014 */ }
/* 015 */
/* 016 */ public void init(int index, scala.collection.Iterator inputs[]) {
/* 017 */ partitionIndex = index;
/* 018 */ inputadapter_input = inputs[0];
/* 019 */ project_result = new UnsafeRow(1);
/* 020 */ this.project_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 0);
/* 021 */ this.project_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder, 1);
/* 022 */ }
/* 023 */
/* 024 */ protected void processNext() throws java.io.IOException {
/* 025 */ while (inputadapter_input.hasNext()) {
/* 026 */ InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 027 */ Object project_obj = ((Expression) references[0]).eval(null);
/* 028 */ Decimal project_value = (Decimal) project_obj;
/* 029 */ project_rowWriter.write(0, project_value, 3, 3);
/* 030 */ append(project_result);
/* 031 */ if (shouldStop()) return;
/* 032 */ }
/* 033 */ }
In short, the element types are resolved correctly in the Analyzed Logical Plan. As a result, the generated code writes the value with the unified precision and scale.
== Analyzed Logical Plan ==
a[0]: decimal(3,3), a[1]: decimal(3,3)
Is there anything more to check?
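As a value-level illustration of the unified precision and scale (a hedged sketch using Spark's Decimal class directly; this is not the codegen path shown above):

import org.apache.spark.sql.types.Decimal

// Rescale a decimal(2,2) literal to the array's unified decimal(3,3) type.
// changePrecision mutates in place and returns false if the value cannot fit.
val d = Decimal("0.02")             // initially precision 2, scale 2
val ok = d.changePrecision(3, 3)    // widen to decimal(3,3)
assert(ok && d.toString == "0.020") // matches the 0.020 shown in the output above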
Hi, @rxin.
…ng different inferred precisions and scales
Rebased.
Test build #62954 has finished for PR 14353 at commit
var elementType: DataType = children.headOption.map(_.dataType).getOrElse(NullType)
if (elementType.isInstanceOf[DecimalType]) {
  children.foreach { child =>
    if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) {
I think this suffers from the same issue as the map PR.
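For context, a hedged sketch of the direction this review comment points to (the helper castToWidestDecimal is hypothetical, not the code that was eventually merged): the snippet above only relaxes the declared element type, leaving each child with its original physical scale, so the children themselves should be cast to the widened type.

import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
import org.apache.spark.sql.types.DecimalType

// Hypothetical helper: wrap every child in a Cast to the widened decimal type
// so each element physically carries the same precision and scale, instead of
// merely declaring a wider array element type.
def castToWidestDecimal(children: Seq[Expression]): Seq[Expression] = {
  val decimals = children.map(_.dataType).collect { case d: DecimalType => d }
  if (decimals.size != children.size) children  // only handle the all-decimal case here
  else {
    val scale = decimals.map(_.scale).max
    val integral = decimals.map(d => d.precision - d.scale).max
    val widest = DecimalType(integral + scale, scale)
    children.map(c => if (c.dataType == widest) c else Cast(c, widest))
  }
}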
Thank you, @rxin.
Yep. I've read your comment about the precision loss.
I'll check that and revise.
@dongjoon-hyun I created a patch here: #14389
Closing this in favor of the better PR #14439.
What changes were proposed in this pull request?
In Spark 2.0, float literals are parsed as decimals. However, this introduces a side-effect, described below.
Before
After
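As a hedged reconstruction of the side-effect from the discussion above (not the original Before/After snippets): 0.001 parses as decimal(3,3) and 0.02 as decimal(2,2), so before this change array rejected the mixed decimal children, while with the fix the expression resolves to array<decimal(3,3)>.

// Hedged illustration: 0.001 is decimal(3,3) and 0.02 is decimal(2,2).
// Before this change, array's same-type check rejected the mixed decimals;
// with the fix, the expression resolves to array<decimal(3,3)>.
sql("select array(0.001, 0.02)")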
How was this patch tested?
Passed the Jenkins tests with a new test case.