[SPARK-28537][SQL] DebugExec cannot debug broadcast or columnar related queries. #25274
Conversation
…g broadcast or columnar related queries
Test build #108253 has finished for PR 25274 at commit
    val rightDF = spark.range(10)
    val leftDF = spark.range(10)
    val joinedDF = leftDF.join(rightDF, leftDF("id") === rightDF("id"))
    Try {
How about checking the actual output like this instead of using Try?
assert(joinedDF.queryExecution.sparkPlan.collect { case _: BroadcastHashJoinExec => true }.nonEmpty)
val output = new java.io.ByteArrayOutputStream()
Console.withOut(output) {
  joinedDF.debug()
}
assert(output.toString.contains("BroadcastHashJoin"))
Thanks for the comment. I changed the added test cases to compare the expected output with the actual one using Console.withOut. But I think it's still good to handle exceptions so we get a better error message when a test case fails: with the catch, it's easy to identify which test case failed (the line number appears at the top of the error message).
[info] - SPARK-28537: DebugExec cannot debug broadcast or columnar related queries *** FAILED *** (89 milliseconds)
[info] debug() for broadcast failed with exception (DebuggingSuite.scala:76)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
[info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
[info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
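For reference, a minimal sketch of the pattern described above (`joinedDF` and the expected substring are taken from the earlier snippet; this is illustrative, not the exact test code in this PR):

```scala
// Illustrative sketch: wrap the check so the failure message names the
// scenario while keeping the original exception attached as the cause.
try {
  val captured = new java.io.ByteArrayOutputStream()
  Console.withOut(captured) {
    joinedDF.debug()
  }
  assert(captured.toString.contains("BroadcastHashJoin"))
} catch {
  case e: Throwable => fail("debug() for broadcast failed with exception", e)
}
```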
Test build #108269 has finished for PR 25274 at commit
      subtree.contains("Range") && code.contains("Object[]")})
  }

  test("SPARK-28537: DebugExec cannot debug broadcast or columnar related queries") {
IMO this is an improvement for debugging, so we don't need the prefix. cc: @dongjoon-hyun
or -> and in the title.
It seems that @sarutak reported this issue as a BUG.
I think this is a BUG.
         | id LongType: {}
         |""".stripMargin))
    } catch {
      case e: Throwable => fail("debug() for columnar failed with exception", e)
case NonFatal(e) =>
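A small sketch of what the suggestion amounts to (the surrounding test body is assumed from the snippets above):

```scala
// NonFatal avoids catching fatal errors such as OutOfMemoryError, which a
// bare `case e: Throwable` clause would also swallow.
import scala.util.control.NonFatal

try {
  joinedDF.debug()
} catch {
  case NonFatal(e) => fail("debug() for broadcast failed with exception", e)
}
```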
    val exprId = df.queryExecution.executedPlan.output.head.toString
    val output = captured.toString()
    assert(output.contains(
For these kinds of tests, substring matching seems to be better than exact matching.
I'll follow the approach used in ExplainSuite.
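A minimal sketch of that style of check (the expected fragment is taken from the output asserted later in this PR; `captured` is the stream passed to Console.withOut):

```scala
// Normalize expression IDs (e.g. "#123L" -> "#xL") so the assertion does not
// depend on nondeterministic IDs, then match on a substring of the output.
val output = captured.toString().replaceAll("#\\d+", "#x")
assert(output.contains("== Range (0, 10, step=1, splits=2) =="))
```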
maropu left a comment
LGTM except for minor comments.
Test build #108316 has finished for PR 25274 at commit
cc: @dongjoon-hyun

@dongjoon-hyun Do you have any feedback?
         |== Range (0, 10, step=1, splits=2) ==
         |Tuples output: 0
         | id LongType: {}""".stripMargin))
    } catch {
why do we need this catch? the test would fail anyway
It's for a better error message: with the catch, we can easily identify which assertion fails.
But if we split the test cases, I think we can remove the try/catch; see the sketch below.
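A sketch of what the split might look like, based on the snippets in this PR (the broadcast case shown; the columnar case would be analogous, and the test name is illustrative). The failing test itself then identifies the broken scenario, so the try/catch can go away:

```scala
test("SPARK-28537: DebugExec cannot debug broadcast related queries") {
  val rightDF = spark.range(10)
  val leftDF = spark.range(10)
  val joinedDF = leftDF.join(rightDF, leftDF("id") === rightDF("id"))

  // Capture the console output of debug() and check a stable substring.
  val captured = new java.io.ByteArrayOutputStream()
  Console.withOut(captured) {
    joinedDF.debug()
  }
  assert(captured.toString.contains("BroadcastHashJoin"))
}
```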
         |Tuples output: 0
         | id LongType: {}
         |""".stripMargin))
    } catch {
ditto
      case NonFatal(e) => fail("debug() for broadcast failed with exception", e)
    }

    val df = spark.range(5)
can we split this into 2 different tests?
mgaido91 left a comment
LGTM apart from a few comments.
    val output = captured.toString()replaceAll ("#\\d+", "#x")
    assert(output.contains(
      s"""== InMemoryTableScan [id#xL] ==
nit:
| s"""== InMemoryTableScan [id#xL] == | |
| """== InMemoryTableScan [id#xL] == |
Oh... I forgot to remove it.
      df.debug()
    }

    val output = captured.toString()replaceAll ("#\\d+", "#x")
nit:
-    val output = captured.toString()replaceAll ("#\\d+", "#x")
+    val output = captured.toString().replaceAll("#\\d+", "#x")
Thanks! What an embarrassing mistake...
  test("SPARK-28537: DebugExec cannot debug columnar related queries") {
    val df = spark.range(5)
    df.persist()
shall we unpersist this?
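One way to do that, sketched from the test body above (the expected substring is the one asserted in this PR): unpersist in a finally block so the cached data is released even if the assertion fails.

```scala
val df = spark.range(5)
df.persist()
try {
  val captured = new java.io.ByteArrayOutputStream()
  Console.withOut(captured) {
    df.debug()
  }
  // Normalize expression IDs before matching.
  val output = captured.toString().replaceAll("#\\d+", "#x")
  assert(output.contains("== InMemoryTableScan [id#xL] =="))
} finally {
  df.unpersist()
}
```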
Test build #108624 has finished for PR 25274 at commit

Test build #108626 has finished for PR 25274 at commit
Hmmm... it seems this commit causes the test failure: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108687/ There are three consecutive failed builds, and it fails in my local environment too.
It seems 03e3006 conflicted with this one.
DebugExec does not implement doExecuteBroadcast or doExecuteColumnar, so we can't debug broadcast or columnar related queries.
One example for broadcast is here.
Another for columnar is here.
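To illustrate the missing pieces, here is a hypothetical pass-through operator (not DebugExec itself, and not necessarily the exact change in this PR), written against the Spark 3.0-era SparkPlan API. Without the last three overrides, executing the wrapped subtree as broadcast or columnar hits SparkPlan's default "does not implement" errors:

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
import org.apache.spark.sql.vectorized.ColumnarBatch

// Hypothetical wrapper used only to illustrate the idea: a unary node that
// forwards all execution paths, including broadcast and columnar, to its child.
case class PassThroughExec(child: SparkPlan) extends UnaryExecNode {
  override def output: Seq[Attribute] = child.output

  override protected def doExecute(): RDD[InternalRow] = child.execute()

  // The execution paths the PR description says DebugExec lacks:
  override def supportsColumnar: Boolean = child.supportsColumnar

  override def doExecuteBroadcast[T](): Broadcast[T] = child.executeBroadcast()

  override protected def doExecuteColumnar(): RDD[ColumnarBatch] =
    child.executeColumnar()
}
```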
How was this patch tested?
Additional test cases in DebuggingSuite.