Skip to content

Comments

[SPARK-24840][SQL] do not use dummy filter to switch codegen on/of#21795

Closed
cloud-fan wants to merge 3 commits intoapache:masterfrom
cloud-fan:follow
Closed

[SPARK-24840][SQL] do not use dummy filter to switch codegen on/of#21795
cloud-fan wants to merge 3 commits intoapache:masterfrom
cloud-fan:follow

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Jul 17, 2018

What changes were proposed in this pull request?

It's a little tricky and fragile to use a dummy filter to switch codegen on/off. For now we should use local/cached relation to switch. In the future when we are able to use a config to turn off codegen, we shall use that.

How was this patch tested?

test only PR.

@cloud-fan
Copy link
Contributor Author

cc @mn-mikke @ueshin @viirya

checkResult(mapIfDF, false)

// with codegen
checkResult(structWhenDF.filter('cond.isNotNull), true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no matter we add filter or not, the Project will always be evaluated without codegen, because it's above local relation and the optimizer will evaluate it eagerly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's the case why the assert didn't fail?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this is the execution plan I see for structWhenDF.filter('cond.isNotNull):

*(1) Project [CASE WHEN cond#77042 THEN [a,10] ELSE s#77043 END.val1 AS res.val1#77054]
+- *(1) Filter isnotnull(cond#77042)
   +- LocalTableScan [cond#77042, s#77043, a#77044, m#77045]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the execution plan for structWhenDF:

LocalTableScan [res.val1#77079]      

Copy link
Contributor Author

@cloud-fan cloud-fan Jul 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that's tricky. Because filter pushdown runs first, the local relation optimization can't be applied.

To prevent confusions like this, how about we use local/cached relation to test it?

BTW if the local relation optimization includes filter in the future, this test will be broken.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw some tests using similar dummy filters in DataFrameFunctionsSuite. Should we fix them as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan Thanks for the clarification and this PR!

Btw, there are many tests in DataFrameFunctionsSuite that test only the scenarios without codgen. WDYT about adding a generic checkAnswer method to QueryTest that would evaluate a dataframe for both cases similarly like ExressionEvalHelper.checkEvaluation does for expressions? If it's possible, of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be very hard to write a general checkAnswer, because the local relation optimization can only handle Project. I'd like to wait for the general codegen config.

@SparkQA
Copy link

SparkQA commented Jul 17, 2018

Test build #93178 has finished for PR 21795 at commit d1bc612.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title [SPARK-24165][SQL][followup] Fixing conditional expressions to handle nullability of nested types [SPARK-24840][SQL] do not use dummy filter to switch codegen on/of Jul 18, 2018
@SparkQA
Copy link

SparkQA commented Jul 18, 2018

Test build #93215 has finished for PR 21795 at commit de5a232.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 18, 2018

Test build #93219 has finished for PR 21795 at commit de5a232.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Jul 18, 2018

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jul 18, 2018

Test build #93225 has finished for PR 21795 at commit de5a232.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

oneRowDF.filter(dummyFilter('i)).selectExpr("reverse(array(1, null, 2, null))"),
Seq(Row(Seq(null, 2, null, 1)))
)
def checkResult2(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using more specific names for functions checkResult2, checkResult3 etc.? Maybe checkStringTestCases, checkCasesWithArraysOfComplexTypes or something like that?

val dummyFilter = (c: Column) => c.isNull || c.isNotNull // switch codeGen on

// Simple test cases
checkAnswer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@SparkQA
Copy link

SparkQA commented Jul 18, 2018

Test build #93239 has finished for PR 21795 at commit c83eeeb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Jul 18, 2018

LGTM

1 similar comment
@mgaido91
Copy link
Contributor

LGTM

@cloud-fan
Copy link
Contributor Author

thanks, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants