[SPARK-32344][SQL][2.4] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates #29157

maropu · 2020-07-20T03:31:33Z

What changes were proposed in this pull request?

This PR fixes a bug in distinct FIRST/LAST aggregates in v2.4.6:

scala> sql("SELECT FIRST(DISTINCT v) FROM VALUES 1, 2, 3 t(v)").show()
...
Caused by: java.lang.UnsupportedOperationException: Cannot evaluate expression: false#37
  at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258)
  at org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:226)
  at org.apache.spark.sql.catalyst.expressions.aggregate.First.ignoreNulls(First.scala:68)
  at org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions$lzycompute(First.scala:82)
  at org.apache.spark.sql.catalyst.expressions.aggregate.First.updateExpressions(First.scala:81)
  at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$15.apply(HashAggregateExec.scala:268)

The root cause of this bug is that the Aggregation strategy replaces the foldable boolean ignoreNullsExpr expr with an Unevaluable expr (AttributeReference) for distinct FIRST/LAST aggregate functions. This replacement must not be allowed, because the Analyzer has already checked that ignoreNullsExpr is foldable:

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/First.scala

Lines 74 to 76 in ffdbbae

    
    } else if (!ignoreNullsExpr.foldable) {
      TypeCheckFailure(
        s"The second argument of First must be a boolean literal, but got: ${ignoreNullsExpr.sql}")
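The failure mode described above can be reproduced in a toy model. This is only a sketch: the types below are simplified stand-ins written for illustration, not Spark's actual Expression, Literal, AttributeReference, or First classes, and rewriteForDistinct is a hypothetical substitute for whatever the Aggregation strategy does internally. It shows that a flag which passed the foldable check at analysis time can no longer be evaluated once a later rewrite swaps it for an attribute.

```scala
// Toy model only: NOT Spark's classes, just enough structure to show the failure mode.
object IgnoreNullsBugSketch {
  sealed trait Expr {
    def foldable: Boolean
    def eval(): Any
  }

  // A constant: foldable, and safe to evaluate without an input row.
  final case class Literal(value: Any) extends Expr {
    def foldable: Boolean = true
    def eval(): Any = value
  }

  // A column reference: it has no value until it is bound to an input row.
  final case class AttributeReference(name: String) extends Expr {
    def foldable: Boolean = false
    def eval(): Any =
      throw new UnsupportedOperationException(s"Cannot evaluate expression: $name")
  }

  // Old shape: the IGNORE NULLS flag is stored as an expression and evaluated on demand.
  final case class First(child: Expr, ignoreNullsExpr: Expr) {
    def ignoreNulls: Boolean = ignoreNullsExpr.eval().asInstanceOf[Boolean]
  }

  // Hypothetical stand-in for the distinct rewrite: literals become attributes too.
  private def toAttr(e: Expr): Expr = e match {
    case Literal(v) => AttributeReference(v.toString)
    case other      => other
  }

  def rewriteForDistinct(f: First): First =
    First(toAttr(f.child), toAttr(f.ignoreNullsExpr))
}
```

In this model, ignoreNulls works on the original First but throws UnsupportedOperationException on the rewritten one, which mirrors the shape of the stack trace in the report.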

So, this PR proposes to change the variable for IGNORE NULLS from Expression to Boolean to avoid this case.
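A minimal sketch of the idea, again using simplified stand-in types rather than Spark's real classes: once the flag is a plain Boolean field, it is no longer a child expression, so an expression rewrite has nothing to replace. The fromExprs helper and rewriteForDistinct are hypothetical names introduced here for illustration; the actual patch may resolve the second argument differently.

```scala
// Toy model only: NOT Spark's classes.
object IgnoreNullsFixSketch {
  sealed trait Expr
  final case class Literal(value: Any) extends Expr
  final case class AttributeReference(name: String) extends Expr

  // New shape: the flag is a Boolean, so only `child` is an expression.
  final case class First(child: Expr, ignoreNulls: Boolean) {
    def children: Seq[Expr] = Seq(child)
  }

  object First {
    // Hypothetical helper: resolve the SQL-level second argument once, up front,
    // and reject anything that is not a boolean literal.
    def fromExprs(child: Expr, ignoreNullsExpr: Expr): First = ignoreNullsExpr match {
      case Literal(b: Boolean) => First(child, b)
      case other =>
        throw new IllegalArgumentException(
          s"The second argument of First must be a boolean literal, but got: $other")
    }
  }

  // Hypothetical stand-in for the distinct rewrite: only the child can be swapped.
  def rewriteForDistinct(f: First): First =
    f.copy(child = AttributeReference("v#1"))
}
```

Here the flag survives the rewrite unchanged, and a non-literal second argument fails at construction time instead of during execution.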

This is the backport of #29143.

Why are the changes needed?

Bugfix.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a test in DataFrameAggregateSuite.

dongjoon-hyun

+1, LGTM. (Pending Jenkins.)

maropu · 2020-07-20T06:33:09Z

Thanks, @dongjoon-hyun, for the update in the jira.

dongjoon-hyun · 2020-07-20T06:33:23Z

:)

…ullsExpr in distinct aggregates
Closes #29157 from maropu/SPARK-32344-BRANCH2.4.
Authored-by: Takeshi Yamamuro
Signed-off-by: Dongjoon Hyun

dongjoon-hyun · 2020-07-20T06:37:41Z

This PR removed the UnsupportedOperationException, and all relevant test cases passed in Scala/Java. The Python UTs also passed. The R mllib tests are still running. So, I merged this to branch-2.4. Thanks!

SparkQA · 2020-07-20T06:50:37Z

Test build #126141 has finished for PR 29157 at commit c190886.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class First(child: Expression, ignoreNulls: Boolean)
case class Last(child: Expression, ignoreNulls: Boolean)

Fix

c190886

probot-autolabeler bot added the SQL label Jul 20, 2020

dongjoon-hyun approved these changes Jul 20, 2020

HyukjinKwon approved these changes Jul 20, 2020

dongjoon-hyun closed this Jul 20, 2020
