Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,10 @@ class EquivalentExpressions {
val skip = expr.isInstanceOf[LeafExpression] ||
// `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
// loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
expr.find(_.isInstanceOf[LambdaVariable]).isDefined
expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
// `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
// can cause unexpected error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there isn't any other reason, shall we mention NPE specifically instead of unexpected error? Both SPARK-29239 and SPARK-29221 are due to NPE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I updated this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding including NPE after unexpected error? IMHO unexpected error is actually correct (we can't predict which error we will get), but it would help much if we enumerate known errors as well.

Copy link
Contributor

@HeartSaVioR HeartSaVioR Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I realized my browser didn't show the comment from @viirya . It's just a 2 cents and like NPE seems OK to me.

expr.isInstanceOf[PlanExpression[_]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to only skip it on executors? This may still be useful when we codegen at driver side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like check TaskContext.get != null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that's a good idea!


// There are some special expressions that we should not recurse into all of its children.
// 1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
Expand Down
17 changes: 17 additions & 0 deletions sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ import java.util.concurrent.atomic.AtomicBoolean

import org.apache.spark.{AccumulatorSuite, SparkException}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
import org.apache.spark.sql.catalyst.util.StringUtils
import org.apache.spark.sql.execution.aggregate.{HashAggregateExec, SortAggregateExec}
import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
Expand Down Expand Up @@ -3149,6 +3150,7 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession {
checkAnswer(sql("select * from t1 where d > '1999-13'"), Row(result))
checkAnswer(sql("select to_timestamp('2000-01-01 01:10:00') > '1'"), Row(true))
}
sql("DROP VIEW t1")
}

test("SPARK-28156: self-join should not miss cached view") {
Expand Down Expand Up @@ -3192,6 +3194,21 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession {
checkAnswer(df3, Array(Row(new java.math.BigDecimal("0.100000000000000000000000100"))))
}
}

test("SPARK-29239: Subquery should not cause NPE when eliminating subexpression") {
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
SQLConf.SUBQUERY_REUSE_ENABLED.key -> "false",
SQLConf.CODEGEN_FACTORY_MODE.key -> "CODEGEN_ONLY",
SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> ConvertToLocalRelation.ruleName) {
withTempView("t1", "t2") {
sql("create temporary view t1 as select * from values ('val1a', 10L) as t1(t1a, t1b)")
sql("create temporary view t2 as select * from values ('val3a', 110L) as t2(t2a, t2b)")
val df = sql("SELECT min, min from (SELECT (SELECT min(t2b) FROM t2) min " +
"FROM t1 WHERE t1a = 'val1c')")
assert(df.collect().size == 0)
}
}
}
}

case class Foo(bar: Option[String])