[SPARK-24840][SQL] do not use dummy filter to switch codegen on/off by cloud-fan · Pull Request #21795 · apache/spark

cloud-fan · 2018-07-17T14:58:44Z

What changes were proposed in this pull request?

Using a dummy filter to switch codegen on/off is tricky and fragile. For now, we should switch by using a local or cached relation instead. Once we are able to turn off codegen with a config, we should use that.
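A minimal sketch of the local/cached pattern described above, for illustration only. The object name, the `checkReverse` helper, and the sample data are hypothetical and not taken from the patch. The comments about which path evaluates the Project restate the reasoning in this PR and are not verified by the code itself.

```scala
import org.apache.spark.sql.SparkSession

object LocalVsCachedRelationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("local-vs-cached-sketch")
      .getOrCreate()
    import spark.implicits._

    // A DataFrame built from a local Seq is backed by a local relation.
    val df = Seq(Seq(1, 2, 3)).toDF("a")

    // Hypothetical check, wrapped in a function so the same assertion
    // can run against both kinds of relation.
    def checkReverse(): Unit = {
      val result = df.selectExpr("reverse(a)")
        .collect()
        .map(_.getSeq[Int](0).toList)
        .toList
      assert(result == List(List(3, 2, 1)))
    }

    // Local relation: per this PR, the Project is expected to be
    // evaluated eagerly by the optimizer, without codegen.
    checkReverse()

    // Cached relation: per this PR, the Project is expected to be
    // evaluated with codegen.
    df.cache()
    checkReverse()

    df.unpersist()
    spark.stop()
  }
}
```

Wrapping the assertions in a function keeps both runs identical, so the only difference between them is the relation underneath the Project.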

How was this patch tested?

Test-only PR.

cloud-fan · 2018-07-17T14:59:21Z

cc @mn-mikke @ueshin @viirya

cloud-fan · 2018-07-17T15:00:32Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

-    checkResult(mapIfDF, false)
-
-    // with codegen
-    checkResult(structWhenDF.filter('cond.isNotNull), true)


Whether or not we add the filter, the Project will always be evaluated without codegen, because it sits above a local relation and the optimizer will evaluate it eagerly.

If that's the case, why didn't the assert fail?

hmm, this is the execution plan I see for structWhenDF.filter('cond.isNotNull):

*(1) Project [CASE WHEN cond#77042 THEN [a,10] ELSE s#77043 END.val1 AS res.val1#77054]
+- *(1) Filter isnotnull(cond#77042)
   +- LocalTableScan [cond#77042, s#77043, a#77044, m#77045]

This is the execution plan for structWhenDF:

LocalTableScan [res.val1#77079]

Ah, that's tricky. Because filter pushdown runs first, the local relation optimization can't be applied.

To avoid confusion like this, how about we use a local/cached relation to test it?

BTW, if the local relation optimization also handles Filter in the future, this test will break.
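To make the difference between these plans easier to check, here is a rough sketch that reports whether an executed plan contains a whole-stage-codegen node. The helper `usesWholeStageCodegen`, the object name, and the sample data are assumptions made for illustration. The results are only printed, not asserted, because they may vary between Spark versions (for instance, if the local relation optimization starts handling Filter, as noted above).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.WholeStageCodegenExec

object CodegenPlanInspectionSketch {
  // Hypothetical helper: true if any node of the executed physical plan
  // is a whole-stage-codegen node.
  def usesWholeStageCodegen(df: DataFrame): Boolean =
    df.queryExecution.executedPlan
      .find(_.isInstanceOf[WholeStageCodegenExec])
      .isDefined

  // The same projection applied on top of any input DataFrame.
  def project(df: DataFrame): DataFrame =
    df.selectExpr("CASE WHEN cond THEN v + 10 ELSE v END AS res")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("codegen-plan-inspection-sketch")
      .getOrCreate()
    import spark.implicits._

    // `cond` is nullable, so an isNotNull filter on it is not trivially true.
    val base = Seq((Option(true), 1), (Option.empty[Boolean], 2)).toDF("cond", "v")

    // Project directly above the local relation.
    println(s"plain:        ${usesWholeStageCodegen(project(base))}")

    // Project above a dummy filter, as the old tests did.
    println(s"dummy filter: ${usesWholeStageCodegen(project(base.filter($"cond".isNotNull)))}")

    // Project above a cached relation; the DataFrame is rebuilt after
    // cache() so that its plan is computed against the cached data.
    base.cache()
    println(s"cached:       ${usesWholeStageCodegen(project(base))}")

    base.unpersist()
    spark.stop()
  }
}
```

Calling `explain()` on each DataFrame should show the same information in the textual form quoted above.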

I saw some tests using similar dummy filters in DataFrameFunctionsSuite. Should we fix them as well?

@cloud-fan Thanks for the clarification and this PR!

Btw, there are many tests in DataFrameFunctionsSuite that only test the scenarios without codegen. WDYT about adding a generic checkAnswer method to QueryTest that would evaluate a dataframe for both cases, similar to what ExpressionEvalHelper.checkEvaluation does for expressions? If it's possible, of course.

It would be very hard to write a general checkAnswer, because the local relation optimization can only handle Project. I'd rather wait for the general codegen config.

SparkQA · 2018-07-17T18:31:19Z

Test build #93178 has finished for PR 21795 at commit d1bc612.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-07-18T07:05:01Z

Test build #93215 has finished for PR 21795 at commit de5a232.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-07-18T07:11:26Z

retest this please

SparkQA · 2018-07-18T08:03:25Z

Test build #93219 has finished for PR 21795 at commit de5a232.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

ueshin · 2018-07-18T09:05:14Z

Jenkins, retest this please.

SparkQA · 2018-07-18T12:42:36Z

Test build #93225 has finished for PR 21795 at commit de5a232.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mn-mikke · 2018-07-18T13:30:02Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala

-      oneRowDF.filter(dummyFilter('i)).selectExpr("reverse(array(1, null, 2, null))"),
-      Seq(Row(Seq(null, 2, null, 1)))
-    )
+    def checkResult2(): Unit = {


What about using more specific names for functions checkResult2, checkResult3 etc.? Maybe checkStringTestCases, checkCasesWithArraysOfComplexTypes or something like that?

mn-mikke · 2018-07-18T13:32:16Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala

-    val dummyFilter = (c: Column) => c.isNull || c.isNotNull // switch codeGen on
-
    // Simple test cases
-    checkAnswer(


Good catch!

SparkQA · 2018-07-18T19:43:06Z

Test build #93239 has finished for PR 21795 at commit c83eeeb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2018-07-18T22:07:00Z

LGTM

mgaido91 · 2018-07-18T22:51:26Z

LGTM

cloud-fan · 2018-07-19T03:54:59Z

thanks, merging to master!

fix the test

d1bc612

fix more places

de5a232

cloud-fan changed the title ~~[SPARK-24165][SQL][followup] Fixing conditional expressions to handle nullability of nested types~~ [SPARK-24840][SQL] do not use dummy filter to switch codegen on/off Jul 18, 2018

mn-mikke reviewed Jul 18, 2018

address comment

c83eeeb

asfgit closed this in d05a926 Jul 19, 2018

ueshin mentioned this pull request Jun 4, 2019

[SPARK-27905] [SQL] Add higher order function 'forall' #24761

Closed


7 participants
