[SPARK-26129][SQL] Instrumentation for per-query planning time #23096

rxin · 2018-11-20T16:24:22Z

What changes were proposed in this pull request?

We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases.
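As a rough illustration of the idea described above, a per-query tracker could look something like the following. This is a minimal sketch, not the `QueryPlanningTracker` added by this patch: the class name `SimplePlanningTracker`, the `RuleStats` holder, and the methods `measurePhase` and `topRulesByTime` are hypothetical names chosen for the example. Only `recordRuleInvocation(ruleName, runTime, effective)` mirrors a call that appears in the diff.

```scala
import scala.collection.mutable

// Hypothetical holder for per-rule totals (not part of the patch).
class RuleStats {
  var totalTimeNs: Long = 0L
  var invocations: Long = 0L
  var effectiveInvocations: Long = 0L
}

// Hypothetical per-query tracker: one instance is created per query,
// so its numbers never mix with those of other queries.
class SimplePlanningTracker {
  private val rules = mutable.HashMap.empty[String, RuleStats]
  private val phases = mutable.HashMap.empty[String, Long]

  // Times the block `f` and adds the elapsed nanoseconds to `phase`.
  def measurePhase[T](phase: String)(f: => T): T = {
    val start = System.nanoTime()
    try f finally {
      val elapsed = System.nanoTime() - start
      phases.update(phase, phases.getOrElse(phase, 0L) + elapsed)
    }
  }

  // Records one invocation of a rule; `effective` means the rule changed the plan.
  def recordRuleInvocation(rule: String, timeNs: Long, effective: Boolean): Unit = {
    val stats = rules.getOrElseUpdate(rule, new RuleStats)
    stats.totalTimeNs += timeNs
    stats.invocations += 1
    if (effective) stats.effectiveInvocations += 1
  }

  def phaseTimes: Map[String, Long] = phases.toMap

  def ruleStats: Map[String, RuleStats] = rules.toMap

  // The k rules with the largest accumulated time, slowest first.
  def topRulesByTime(k: Int): Seq[(String, RuleStats)] =
    rules.toSeq.sortBy { case (_, s) => -s.totalTimeNs }.take(k)
}
```

A caller would wrap each planning phase in `measurePhase("analysis") { ... }` and have the rule executor call `recordRuleInvocation` after every rule run.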

How was this patch tested?

Added unit tests and end-to-end integration tests.

rxin · 2018-11-20T16:25:20Z

cc @hvanhovell @gatorsmile

This is different from the existing metrics for rules as it is query-specific. We might want to replace that one with this in the future.

hvanhovell

LGTM - pending jenkins

SparkQA · 2018-11-20T17:18:17Z

Test build #99069 has finished for PR 23096 at commit dd61273.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-11-20T17:34:41Z

retest this please

dongjoon-hyun · 2018-11-20T18:08:11Z

sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala

   */
  def sql(sqlText: String): DataFrame = {
-    Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
+    val tracker = new QueryPlanningTracker


Hi, @rxin .
Can we have an option to disable this new feature?

@dongjoon-hyun just out of curiosity, what would you like to disable here?

This is a Nice-To-Have instead of a Must-Have. The framework seems to be designed to support disabling by assigning None here. If tracker is None, all the following operations will ignore this new feature.

I don't think it makes sense to add random flags for everything. If the argument is that this change has a decent chance of introducing regressions (e.g. due to higher memory usage, or cpu overhead), then it would make a lot of sense to put it behind a flag so it can be disabled in production if that happens.

That said, the overhead on the hot code path here is substantially smaller than even transforming the simplest Catalyst plan (a hash map lookup is orders of magnitude cheaper than calling a partial function to transform a Scala collection for TreeNode), so I think the risk is so low that it does not warrant adding a config.

Thanks. I got it, @rxin .

SparkQA · 2018-11-20T18:39:08Z

Test build #99067 has finished for PR 23096 at commit b2a2a01.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class QueryExecution(

SparkQA · 2018-11-20T19:43:55Z

Test build #99070 has finished for PR 23096 at commit dd61273.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

abehm · 2018-11-20T19:16:24Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala

+  }
+
+  /**
+   * Reecord a specific invocation of a rule.


typo: Record

abehm · 2018-11-20T19:24:07Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

        substituteCTE(child, relations.foldLeft(Seq.empty[(String, LogicalPlan)]) {
          case (resolved, (name, relation)) =>
-            resolved :+ name -> executeSameContext(substituteCTE(relation, resolved))
+            resolved :+ name -> executeSameContext(substituteCTE(relation, resolved), None)


For my understanding, why should we pass a None tracker here? Wouldn't this hide the time of, e.g., Metastore operations to resolve tables in the CTE definition?

abehm · 2018-11-20T19:43:03Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

              "around this.")
          }
-          executeSameContext(child)
+          executeSameContext(child, None)


Without knowing this code well it isn't obvious to me why sometimes we pass None as the tracker. What's the thinking behind it?

No great reason. I just used None for everything, except the top level, because it is very difficult to wire the tracker here without refactoring a lot of code.

abehm · 2018-11-20T20:01:54Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala

+object QueryPlanningTracker {
+
+  // Define a list of common phases here.
+  val PARSING = "parsing"


Why not enum?

Mostly because Scala enum is not great, and I was thinking about making this a generic thing that's extensible.
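To make the extensibility argument concrete, here is a small sketch assuming phases are plain string keys. Only `PARSING = "parsing"` comes from the diff; the `ANALYSIS` constant, the `Phases` and `PhaseLog` names, and the custom phase name used in the usage note are assumptions made for illustration.

```scala
import scala.collection.mutable

object Phases {
  val PARSING = "parsing"   // taken from the diff above
  val ANALYSIS = "analysis" // assumed name, for illustration only
}

// Hypothetical log keyed by phase name. Because the key is a String,
// callers can record phases that this file never declared.
class PhaseLog {
  private val times = mutable.LinkedHashMap.empty[String, Long]

  def add(phase: String, ns: Long): Unit = {
    times.update(phase, times.getOrElse(phase, 0L) + ns)
  }

  // Phases in first-recorded order with their accumulated time.
  def snapshot: Seq[(String, Long)] = times.toSeq
}
```

With a closed enumeration, recording something like `log.add("customRewrite", 42L)` from an extension would require changing the enum itself; with string keys it needs no change to `Phases`.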

abehm · 2018-11-20T20:06:49Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala

            queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
            queryExecutionMetrics.incNumExecution(rule.ruleName)

+            tracker.foreach(_.recordRuleInvocation(rule.ruleName, runTime, effective))


Doesn't this make the query-local and the global metrics inconsistent when tracker is None?

yes! (not great -- but I'd probably remove the global tracker at some point)

removing the global tracker would be great!
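The inconsistency discussed here can be reproduced with a toy model. This is a sketch under assumed names (`GlobalRuleMetrics`, `QueryLocalMetrics`, and `Runner` are all hypothetical); it only mimics the shape of the diff, where the global metrics are always updated and the per-query tracker is an `Option` that is updated through `foreach`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the process-wide rule metrics.
object GlobalRuleMetrics {
  private val counts = mutable.HashMap.empty[String, Long]

  def inc(rule: String): Unit = synchronized {
    counts.update(rule, counts.getOrElse(rule, 0L) + 1)
  }

  def count(rule: String): Long = synchronized {
    counts.getOrElse(rule, 0L)
  }
}

// Hypothetical stand-in for the per-query tracker.
class QueryLocalMetrics {
  private val counts = mutable.HashMap.empty[String, Long]

  def inc(rule: String): Unit = {
    counts.update(rule, counts.getOrElse(rule, 0L) + 1)
  }

  def count(rule: String): Long = counts.getOrElse(rule, 0L)
}

object Runner {
  // Global metrics are always updated; the query-local ones only
  // when a tracker was passed down.
  def runRule(rule: String, tracker: Option[QueryLocalMetrics]): Unit = {
    GlobalRuleMetrics.inc(rule)
    tracker.foreach(_.inc(rule))
  }
}
```

Running a rule once with `Some(tracker)` (the top-level call) and once with `None` (a nested call) leaves the global count at 2 and the query-local count at 1, which is the mismatch raised in the question above.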

SparkQA · 2018-11-20T20:19:52Z

Test build #99065 has finished for PR 23096 at commit b6a3d02.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class RuleSummary(
class QueryPlanningTracker

SparkQA · 2018-11-20T21:39:29Z

Test build #99080 has finished for PR 23096 at commit f36a231.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

abehm

Thanks for making the thread-local changes. Understood that it's not ideal, but at least it follows an existing pattern, and we get the full level of detail now!

I'm happy with these changes.

abehm · 2018-11-20T21:45:37Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala

            queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
            queryExecutionMetrics.incNumExecution(rule.ruleName)

+            if (tracker ne null) {


Why do we need to be defensive? Should this be an assert instead? Might be worth a comment explaining.

If one calls execute directly, tracker would be null.
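A minimal sketch of why the null check is needed, assuming the tracker is held in a thread-local that is only set by a wrapper call. `Tracker`, `TrackerContext`, `withTracker`, and `MiniExecutor` are hypothetical names for this example and are not claimed to match the patch.

```scala
// Hypothetical tracker that just counts rule invocations.
class Tracker {
  var invocations: Int = 0

  def record(rule: String): Unit = {
    invocations += 1
  }
}

object TrackerContext {
  // ThreadLocal.get() returns null until a value is set on the thread.
  private val local = new ThreadLocal[Tracker]

  def get: Tracker = local.get()

  // Installs `t` for the duration of `f`, then restores the previous value.
  def withTracker[T](t: Tracker)(f: => T): T = {
    val previous = local.get()
    local.set(t)
    try f finally local.set(previous)
  }
}

object MiniExecutor {
  // May be called directly (no tracker installed) or inside withTracker.
  def execute(plan: String): String = {
    val tracker = TrackerContext.get
    val result = plan.toUpperCase // stand-in for applying a rule
    if (tracker ne null) {
      tracker.record("ToUpper")
    }
    result
  }
}
```

Calling `MiniExecutor.execute("a")` on its own finds no tracker and skips recording, while `TrackerContext.withTracker(t) { MiniExecutor.execute("b") }` records one invocation on `t`. An assert in place of the null check would make the direct call fail.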

SparkQA · 2018-11-20T23:39:31Z

Test build #99081 has finished for PR 23096 at commit 2cd069c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-21T10:50:41Z

Test build #4437 has finished for PR 23096 at commit 2cd069c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-21T15:21:07Z

Test build #99118 has finished for PR 23096 at commit 34f8bfe.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2018-11-21T15:40:40Z

Merging this. Feel free to leave more comments. I'm hoping we can wire this into the UI eventually.

## What changes were proposed in this pull request?

We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases.

## How was this patch tested?

Added unit tests and end-to-end integration tests.

Closes apache#23096 from rxin/SPARK-26129.

Authored-by: Reynold Xin <[email protected]>
Signed-off-by: Reynold Xin <[email protected]>

[SPARK-26129][SQL] Instrumentation for query planning time

b6a3d02

rxin changed the title ~~[SPARK-26129][SQL] Instrumentation for query planning time~~ [SPARK-26129][SQL] Instrumentation for per-query planning time Nov 20, 2018

rxin added 2 commits November 20, 2018 17:43

Add parsing

b2a2a01

remove merge

dd61273

hvanhovell approved these changes Nov 20, 2018

dongjoon-hyun reviewed Nov 20, 2018

abehm reviewed Nov 20, 2018

rxin added 2 commits November 20, 2018 22:29

thread local

f36a231

fix compile

2cd069c

abehm reviewed Nov 20, 2018

fix Hive test cases

34f8bfe

asfgit closed this in 07a700b Nov 21, 2018

6 participants
