[SPARK-24840][SQL] do not use dummy filter to switch codegen on/off #21795
cloud-fan wants to merge 3 commits into apache:master
checkResult(mapIfDF, false)

// with codegen
checkResult(structWhenDF.filter('cond.isNotNull), true)
Whether or not we add the filter, the Project will always be evaluated without codegen, because it sits above a local relation and the optimizer will evaluate it eagerly.
hmm, this is the execution plan I see for structWhenDF.filter('cond.isNotNull):
*(1) Project [CASE WHEN cond#77042 THEN [a,10] ELSE s#77043 END.val1 AS res.val1#77054]
+- *(1) Filter isnotnull(cond#77042)
   +- LocalTableScan [cond#77042, s#77043, a#77044, m#77045]
This is the execution plan for structWhenDF:
LocalTableScan [res.val1#77079]
ah that's tricky. Because filter pushdown runs first, the Project no longer sits directly above the local relation, so the local relation optimization can't be applied.
To prevent confusion like this, how about we use a local/cached relation to test it?
BTW, if the local relation optimization handles Filter in the future, this test will break.
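To make the difference between the two plans concrete, here is a minimal sketch that prints both physical plans. It assumes a local SparkSession; the object name `DummyFilterPlans`, the column names, and the data are made up for illustration and are not taken from the test suite. The filter is applied before the projection here, which is where pushdown would place it anyway. The exact plans depend on the Spark version and its optimizer rules, so no output is shown.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for the structWhenDF case discussed above.
object DummyFilterPlans {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("dummy-filter-plans")
      .getOrCreate()
    import spark.implicits._

    // Option[Boolean] keeps `cond` nullable, so an isNotNull filter is not
    // trivially simplified away by the optimizer.
    val base = Seq[(Option[Boolean], Int)]((Some(true), 1), (None, 2)).toDF("cond", "i")
    val expr = "CASE WHEN cond THEN 10 ELSE i END AS res"

    // Project directly above the local relation: the optimizer may fold it
    // into the local relation, leaving nothing for codegen to compile.
    base.selectExpr(expr).explain()

    // Same projection with a filter in between: compare the printed plan
    // with the one above to see whether the Project survived.
    base.filter($"cond".isNotNull).selectExpr(expr).explain()

    spark.stop()
  }
}
```

Comparing the two printed plans on a given Spark build shows directly whether the filter is what keeps the Project in the physical plan, which is the behaviour the test was silently relying on.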
I saw some tests using similar dummy filters in DataFrameFunctionsSuite. Should we fix them as well?
@cloud-fan Thanks for the clarification and this PR!
Btw, there are many tests in DataFrameFunctionsSuite that test only the scenarios without codegen. WDYT about adding a generic checkAnswer method to QueryTest that would evaluate a dataframe for both cases, similar to what ExpressionEvalHelper.checkEvaluation does for expressions? If it's possible, of course.
it will be very hard to write a general checkAnswer, because the local relation optimization can only handle Project. I'd like to wait for the general codegen config.
Test build #93178 has finished for PR 21795 at commit
Test build #93215 has finished for PR 21795 at commit
retest this please
Test build #93219 has finished for PR 21795 at commit
Jenkins, retest this please.
Test build #93225 has finished for PR 21795 at commit
  oneRowDF.filter(dummyFilter('i)).selectExpr("reverse(array(1, null, 2, null))"),
  Seq(Row(Seq(null, 2, null, 1)))
)
def checkResult2(): Unit = {
What about using more specific names for functions checkResult2, checkResult3 etc.? Maybe checkStringTestCases, checkCasesWithArraysOfComplexTypes or something like that?
val dummyFilter = (c: Column) => c.isNull || c.isNotNull // switch codeGen on

// Simple test cases
checkAnswer(
Test build #93239 has finished for PR 21795 at commit
LGTM
LGTM
thanks, merging to master!
What changes were proposed in this pull request?
It's a little tricky and fragile to use a dummy filter to switch codegen on/off. For now we should use local/cached relation to switch. In the future when we are able to use a config to turn off codegen, we shall use that.
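As a rough illustration of the local/cached switch described above, the sketch below runs the same check twice: once against a local relation and once after caching it. This is a simplified sketch under stated assumptions, not the code from this patch: the object name `LocalVsCachedRelation`, the helpers `usesWholeStageCodegen` and `checkReverse`, and the sample data are hypothetical. Whether a whole-stage codegen node actually appears in each plan depends on the Spark version, so the flags are only printed, not asserted.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.WholeStageCodegenExec

object LocalVsCachedRelation {
  // Hypothetical helper: true if the physical plan contains a
  // whole-stage codegen node.
  def usesWholeStageCodegen(df: DataFrame): Boolean =
    df.queryExecution.executedPlan.find(_.isInstanceOf[WholeStageCodegenExec]).isDefined

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("local-vs-cached-relation")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Seq(1, 3, 4, 2)).toDF("i")

    // The same assertion is reused for both execution paths.
    def checkReverse(): Unit = {
      val result = df.selectExpr("reverse(i)").collect().map(_.getSeq[Int](0))
      assert(result.toSeq == Seq(Seq(2, 4, 3, 1)))
    }

    // Local relation: the Project sits directly above the local data.
    checkReverse()
    println(s"local relation, whole-stage codegen: ${usesWholeStageCodegen(df.selectExpr("reverse(i)"))}")

    // Cached relation: later queries on df scan the in-memory cache instead.
    df.cache()
    checkReverse()
    println(s"cached relation, whole-stage codegen: ${usesWholeStageCodegen(df.selectExpr("reverse(i)"))}")

    df.unpersist()
    spark.stop()
  }
}
```

Keeping the assertion in one named function and changing only the relation underneath it means neither run depends on the optimizer leaving a dummy filter in place.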
How was this patch tested?
Test-only PR.