
Conversation

Member

@sarutak sarutak commented Jul 27, 2019

DebugExec does not implement doExecuteBroadcast and doExecuteColumnar, so we can't debug broadcast- or columnar-related queries.

One example for broadcast is here.

val df1 = Seq(1, 2, 3).toDF
val df2 = Seq(1, 2, 3).toDF
val joined = df1.join(df2, df1("value") === df2("value"))
joined.debug()

java.lang.UnsupportedOperationException: Debug does not implement doExecuteBroadcast
...

Another for columnar is here.

val df = Seq(1, 2, 3).toDF
df.persist
df.debug()

java.lang.IllegalStateException: Internal Error class org.apache.spark.sql.execution.debug.package$DebugExec has column support mismatch:
...
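For context, the shape of fix this calls for can be sketched as follows: DebugExec wraps a single child plan, so the missing methods can delegate to that child. This is a sketch of the idea only, not necessarily the exact patch; signatures follow SparkPlan as of the Spark 3.0 codebase.

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
import org.apache.spark.sql.vectorized.ColumnarBatch

// Sketch only: DebugExec already has a child plan, so the broadcast and
// columnar code paths can simply forward to it.
case class DebugExec(child: SparkPlan) extends UnaryExecNode {
  // ... existing output, doExecute, etc. ...

  // Delegate broadcast execution to the child plan.
  override def doExecuteBroadcast[T](): Broadcast[T] =
    child.executeBroadcast()

  // Mirror the child's columnar support and delegate columnar execution.
  override def supportsColumnar: Boolean = child.supportsColumnar

  override def doExecuteColumnar(): RDD[ColumnarBatch] =
    child.executeColumnar()
}
```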

How was this patch tested?

Additional test cases in DebuggingSuite.


SparkQA commented Jul 27, 2019

Test build #108253 has finished for PR 25274 at commit 6fd6260.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val rightDF = spark.range(10)
val leftDF = spark.range(10)
val joinedDF = leftDF.join(rightDF, leftDF("id") === rightDF("id"))
Try {
Member

@maropu maropu Jul 28, 2019


How about checking the actual output like this instead of using Try?

    assert(joinedDF.queryExecution.sparkPlan.collect { case _: BroadcastHashJoinExec => true }.nonEmpty)
    val output = new java.io.ByteArrayOutputStream()
    Console.withOut(output) {
      joinedDF.debug()
    }
    assert(output.toString.contains("BroadcastHashJoin"))

Member Author

@sarutak sarutak Jul 28, 2019


Thanks for the comment. I changed the added test cases to compare the expected result to the actual one using Console.withOut. But I think it's still good to handle exceptions, for a better error message when a test case fails: it makes it easy to identify which test case failed (the line number appears at the top of the error message).

[info] - SPARK-28537: DebugExec cannot debug broadcast or columnar related queries *** FAILED *** (89 milliseconds)
[info]   debug() for broadcast failed with exception (DebuggingSuite.scala:76)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
[info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
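The pattern described above can be sketched like this (assuming a ScalaTest suite, with joinedDF taken from the earlier snippet; a sketch, not the exact diff):

```scala
import scala.util.control.NonFatal

try {
  val output = new java.io.ByteArrayOutputStream()
  Console.withOut(output) {
    joinedDF.debug()
  }
  assert(output.toString.contains("BroadcastHashJoin"))
} catch {
  // ScalaTest's fail(message, cause) puts this line's location at the top of
  // the report, so it's immediately clear which debug() call blew up.
  case NonFatal(e) => fail("debug() for broadcast failed with exception", e)
}
```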


SparkQA commented Jul 28, 2019

Test build #108269 has finished for PR 25274 at commit 8ee28f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

subtree.contains("Range") && code.contains("Object[]")})
}

test("SPARK-28537: DebugExec cannot debug broadcast or columnar related queries") {
Member


IMO this is an improvement for debugging, so we don't need the prefix. cc: @dongjoon-hyun

Member


or -> and in the title.

Member


It seems that @sarutak reported this issue as a BUG.

Member Author


I think this is a BUG.

| id LongType: {}
|""".stripMargin))
} catch {
case e: Throwable => fail("debug() for columnar failed with exception", e)
Member

@maropu maropu Jul 28, 2019


case NonFatal(e) =>


val exprId = df.queryExecution.executedPlan.output.head.toString
val output = captured.toString()
assert(output.contains(
Member


For these kinds of tests, substring matching seems to be better than exact matching.

Member Author


I'll follow the manner in ExplainSuite.
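The substring-matching style suggested above could look like this sketch, using the `#\d+` to `#x` normalization that appears elsewhere in this PR so auto-generated expression IDs don't make the assertion flaky:

```scala
// Normalize expression IDs (e.g. id#42L -> id#xL) before matching, then check
// for a stable substring rather than comparing the whole output exactly.
val output = captured.toString().replaceAll("#\\d+", "#x")
assert(output.contains("== InMemoryTableScan [id#xL] =="))
```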

Member

@maropu maropu left a comment


LGTM except for minor comments.


SparkQA commented Jul 29, 2019

Test build #108316 has finished for PR 25274 at commit 2685272.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

maropu commented Jul 30, 2019

cc: @dongjoon-hyun

Member Author

sarutak commented Aug 4, 2019

@dongjoon-hyun Do you have any feedback?

|== Range (0, 10, step=1, splits=2) ==
|Tuples output: 0
| id LongType: {}""".stripMargin))
} catch {
Contributor


why do we need this catch? the test would fail anyway

Member Author


It's for a better error message. With the catch, we can easily identify which assertion fails.
But if we split the test cases, I think we can remove the try/catch.

|Tuples output: 0
| id LongType: {}
|""".stripMargin))
} catch {
Contributor


ditto

case NonFatal(e) => fail("debug() for broadcast failed with exception", e)
}

val df = spark.range(5)
Contributor


can we split this into 2 different tests?

Contributor

@mgaido91 mgaido91 left a comment


LGTM apart from a few comments


val output = captured.toString()replaceAll ("#\\d+", "#x")
assert(output.contains(
s"""== InMemoryTableScan [id#xL] ==
Contributor


nit:

Suggested change
s"""== InMemoryTableScan [id#xL] ==
"""== InMemoryTableScan [id#xL] ==

Member Author


Oh... I forgot to remove it.

df.debug()
}

val output = captured.toString()replaceAll ("#\\d+", "#x")
Contributor


nit:

Suggested change
val output = captured.toString()replaceAll ("#\\d+", "#x")
val output = captured.toString().replaceAll("#\\d+", "#x")

Member Author


Thanks! What an embarrassing mistake...


test("SPARK-28537: DebugExec cannot debug columnar related queries") {
val df = spark.range(5)
df.persist()
Contributor


shall we unpersist this?
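One way to address this, sketched under the assumption that the test keeps a single cached df: unpersist in a finally block so the cache is released even if an assertion fails.

```scala
val df = spark.range(5)
df.persist()
try {
  // ... capture the debug output and assert on it ...
  df.debug()
} finally {
  // Release the cached data even when the assertions above throw.
  df.unpersist()
}
```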


SparkQA commented Aug 4, 2019

Test build #108624 has finished for PR 25274 at commit ee1c26f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 4, 2019

Test build #108626 has finished for PR 25274 at commit 46c6598.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

maropu commented Aug 5, 2019

Thanks! Merged to master.
Thanks all for your work! @sarutak @mgaido91 @kiszk

@maropu maropu closed this Aug 5, 2019
@HyukjinKwon
Member

Hmmm... seems this commit causes the test failure:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108687/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108688/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108693/

Three consecutive builds have failed, and it fails in my local environment too.

@HyukjinKwon
Member

Seems 03e3006 conflicted with this one.
