
Conversation

rxin (Contributor) commented Nov 20, 2018

What changes were proposed in this pull request?

We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases.
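
As a rough illustration of what such a utility can look like, here is a minimal sketch in Scala (class and method names are illustrative assumptions, not the exact API this patch adds):

```scala
import scala.collection.mutable

// Per-query accumulator for planning times: one map keyed by planning
// phase, one keyed by rule name. All times are in nanoseconds.
class PlanningTimeSketch {
  private val phaseTimes = mutable.HashMap.empty[String, Long]
  private val ruleTimes = mutable.HashMap.empty[String, Long]

  // Wrap an entire planning phase (parsing, analysis, optimization, ...).
  def measurePhase[T](phase: String)(f: => T): T = {
    val start = System.nanoTime()
    val result = f
    phaseTimes(phase) = phaseTimes.getOrElse(phase, 0L) + (System.nanoTime() - start)
    result
  }

  // Called by the rule executor after each rule invocation.
  def recordRuleInvocation(ruleName: String, timeNs: Long): Unit = {
    ruleTimes(ruleName) = ruleTimes.getOrElse(ruleName, 0L) + timeNs
  }

  def phases: Map[String, Long] = phaseTimes.toMap
  def rules: Map[String, Long] = ruleTimes.toMap
}
```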

How was this patch tested?

Added unit tests and end-to-end integration tests.

rxin changed the title from [SPARK-26129][SQL] Instrumentation for query planning time to [SPARK-26129][SQL] Instrumentation for per-query planning time on Nov 20, 2018
rxin (Contributor, Author) commented Nov 20, 2018

cc @hvanhovell @gatorsmile

This is different from the existing metrics for rules, since this one is query-specific. We might want to replace the existing one with this in the future.

hvanhovell (Contributor) left a comment

LGTM - pending jenkins

SparkQA commented Nov 20, 2018

Test build #99069 has finished for PR 23096 at commit dd61273.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

gatorsmile (Member) commented

retest this please

```diff
   */
  def sql(sqlText: String): DataFrame = {
-   Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
+   val tracker = new QueryPlanningTracker
```

dongjoon-hyun (Member) commented

Hi, @rxin.
Can we have an option to disable this new feature?

A contributor replied

@dongjoon-hyun just out of curiosity, what would you like to disable here?

dongjoon-hyun (Member) commented Nov 20, 2018

This is a nice-to-have instead of a must-have. The framework seems to be designed to support disabling by assigning None here: if tracker is None, all the following operations will ignore this new feature.

rxin (Contributor, Author) commented

I don't think it makes sense to add random flags for everything. If the argument is that this change has a decent chance of introducing regressions (e.g. due to higher memory usage or CPU overhead), then it would make a lot of sense to put it behind a flag so it can be disabled in production if that happens.

That said, the overhead on the hot code path here is substantially smaller than even transforming the simplest Catalyst plan (a hash map lookup is orders of magnitude cheaper than calling a partial function to transform a Scala collection for TreeNode), so I think the risk is low enough that it does not warrant adding a config.

dongjoon-hyun (Member) commented

Thanks. I got it, @rxin .

SparkQA commented Nov 20, 2018

Test build #99067 has finished for PR 23096 at commit b2a2a01.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class QueryExecution(

SparkQA commented Nov 20, 2018

Test build #99070 has finished for PR 23096 at commit dd61273.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
  }

  /**
   * Reecord a specific invocation of a rule.
```

typo: Record

```diff
      substituteCTE(child, relations.foldLeft(Seq.empty[(String, LogicalPlan)]) {
        case (resolved, (name, relation)) =>
-         resolved :+ name -> executeSameContext(substituteCTE(relation, resolved))
+         resolved :+ name -> executeSameContext(substituteCTE(relation, resolved), None)
```

For my understanding, why should we pass a None tracker here? Wouldn't this hide the time of, e.g., Metastore operations to resolve tables in the CTE definition?

"around this.")
}
executeSameContext(child)
executeSameContext(child, None)

Without knowing this code well it isn't obvious to me why sometimes we pass None as the tracker. What's the thinking behind it?

rxin (Contributor, Author) commented

No great reason. I just used None for everything, except the top level, because it is very difficult to wire the tracker here without refactoring a lot of code.

```scala
object QueryPlanningTracker {

  // Define a list of common phases here.
  val PARSING = "parsing"
```

Why not enum?

rxin (Contributor, Author) commented

Mostly because Scala enums aren't great, and I was thinking about making this a generic, extensible thing.
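
To illustrate the trade-off (a hedged sketch; every constant below other than PARSING is an assumption, not necessarily what the patch defines): string-keyed phases form an open set that downstream code can extend, while a scala.Enumeration is closed at definition time.

```scala
// String constants: an open set of phase keys. A downstream module can
// introduce new phases without touching this object.
object PlanningPhases {
  val PARSING = "parsing"
  val ANALYSIS = "analysis"
  val OPTIMIZATION = "optimization"
}

object MyDownstreamPhases {
  val CACHE_LOOKUP = "cacheLookup" // hypothetical extension phase
}

// By contrast, scala.Enumeration is sealed at definition time and has
// well-known ergonomic problems (no exhaustive-match checking, and all
// values share one erased type).
object PhaseEnum extends Enumeration {
  val Parsing, Analysis, Optimization = Value
}
```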

```diff
      queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
      queryExecutionMetrics.incNumExecution(rule.ruleName)

+     tracker.foreach(_.recordRuleInvocation(rule.ruleName, runTime, effective))
```

Doesn't this make the query-local and the global metrics inconsistent when tracker is None?

rxin (Contributor, Author) commented

yes! (not great -- but I'd probably remove the global tracker at some point)


removing the global tracker would be great!
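
To make the inconsistency concrete, here is a self-contained sketch of the dual bookkeeping under discussion (all names are illustrative, not the actual Spark classes): the JVM-wide counters are updated unconditionally, while per-query recording goes through an Option, so a None tracker skips only the per-query half. The same Option is what lets nested invocations opt out of tracking in the diffs above.

```scala
import scala.collection.mutable
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Global, JVM-wide rule metrics: always updated.
object GlobalRuleMetrics {
  private val timeNs = new ConcurrentHashMap[String, AtomicLong]()
  def incExecutionTimeBy(rule: String, ns: Long): Unit =
    timeNs.computeIfAbsent(rule, (_: String) => new AtomicLong()).addAndGet(ns)
}

// Per-query tracker: only updated when one is supplied.
class PerQueryTracker {
  private val ruleTimes = mutable.HashMap.empty[String, Long]
  def recordRuleInvocation(rule: String, ns: Long, effective: Boolean): Unit =
    ruleTimes(rule) = ruleTimes.getOrElse(rule, 0L) + ns
}

object RuleExecutorSketch {
  def recordRun(rule: String, ns: Long, effective: Boolean,
                tracker: Option[PerQueryTracker]): Unit = {
    GlobalRuleMetrics.incExecutionTimeBy(rule, ns) // always recorded
    tracker.foreach(_.recordRuleInvocation(rule, ns, effective)) // skipped when None
  }
}
```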

SparkQA commented Nov 20, 2018

Test build #99065 has finished for PR 23096 at commit b6a3d02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class RuleSummary(
  • class QueryPlanningTracker

SparkQA commented Nov 20, 2018

Test build #99080 has finished for PR 23096 at commit f36a231.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

abehm left a comment

Thanks for making the thread-local changes. Understood it's not ideal, but it at least follows an existing pattern, and we get the full level of detail now!

I'm happy with these changes.

```diff
      queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
      queryExecutionMetrics.incNumExecution(rule.ruleName)

+     if (tracker ne null) {
```

Why do we need to be defensive? Should this be an assert instead? Might be worth a comment explaining.

rxin (Contributor, Author) commented

If one calls execute directly, the tracker would be null.
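
A sketch of the thread-local wiring implied here (illustrative names, reusing PerQueryTracker from the sketch above; this is an assumed shape of the pattern, not the patch's exact code): the top-level entry point installs the tracker around planning, the rule executor reads it back, and a direct call to execute() observes null, hence the guard rather than an assert.

```scala
object ThreadLocalTrackerSketch {
  private val localTracker = new ThreadLocal[PerQueryTracker] // initial value: null

  // Top-level entry point: install the tracker for the duration of planning.
  def withTracker[T](tracker: PerQueryTracker)(body: => T): T = {
    val previous = localTracker.get()
    localTracker.set(tracker)
    try body finally localTracker.set(previous)
  }

  // Rule executor: callers that bypass withTracker and invoke this directly
  // observe null here, so the null check is a guard, not an invariant.
  def execute(runTimeNs: Long): Unit = {
    val tracker = localTracker.get()
    if (tracker ne null) {
      tracker.recordRuleInvocation("SomeRule", runTimeNs, effective = true)
    }
  }
}
```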

SparkQA commented Nov 20, 2018

Test build #99081 has finished for PR 23096 at commit 2cd069c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 21, 2018

Test build #4437 has finished for PR 23096 at commit 2cd069c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 21, 2018

Test build #99118 has finished for PR 23096 at commit 34f8bfe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin (Contributor, Author) commented Nov 21, 2018

Merging this. Feel free to leave more comments. I'm hoping we can wire this into the UI eventually.

asfgit closed this in 07a700b on Nov 21, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019

Closes apache#23096 from rxin/SPARK-26129.

Authored-by: Reynold Xin <[email protected]>
Signed-off-by: Reynold Xin <[email protected]>