-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26129][SQL] Instrumentation for per-query planning time #23096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is different from the existing metrics for rules as it is query specific. We might want to replace that one with this in the future. |
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - pending jenkins
|
Test build #99069 has finished for PR 23096 at commit
|
|
retest this please |
| */ | ||
| def sql(sqlText: String): DataFrame = { | ||
| Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText)) | ||
| val tracker = new QueryPlanningTracker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @rxin .
Can we have an option to disable this new feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun just out of curiosity, what would you like to disable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a Nice-To-Have instead of Must-Have. The framework seems to be designed to support disabling by assigning None here. If tracker is None, all the following operation will ignore this new feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to add random flags for everything. If the argument is that this change has a decent chance of introducing regressions (e.g. due to higher memory usage, or cpu overhead), then it would make a lot of sense to put it behind a flag so it can be disabled in production if that happens.
That said, the overhead on the hot code path here is substantially smaller than even transforming the simplest Catalyst plan (hash map look up is orders of magnitude cheaper than calling a partial function to transform a Scala collection for TreeNode), so I think the risk is so low that it does not warrant adding a config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I got it, @rxin .
|
Test build #99067 has finished for PR 23096 at commit
|
|
Test build #99070 has finished for PR 23096 at commit
|
| } | ||
|
|
||
| /** | ||
| * Reecord a specific invocation of a rule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Record
| substituteCTE(child, relations.foldLeft(Seq.empty[(String, LogicalPlan)]) { | ||
| case (resolved, (name, relation)) => | ||
| resolved :+ name -> executeSameContext(substituteCTE(relation, resolved)) | ||
| resolved :+ name -> executeSameContext(substituteCTE(relation, resolved), None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, why should we pass a None tracker here? Wouldn't this hide the time of, e.g., Metastore operations to resolve tables in the CTE definition?
| "around this.") | ||
| } | ||
| executeSameContext(child) | ||
| executeSameContext(child, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without knowing this code well it isn't obvious to me why sometimes we pass None as the tracker. What's the thinking behind it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No great reason. I just used None for everything, except the top level, because it is very difficult to wire the tracker here without refactoring a lot of code.
| object QueryPlanningTracker { | ||
|
|
||
| // Define a list of common phases here. | ||
| val PARSING = "parsing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly because Scala enum is not great, and I was thinking about making this a generic thing that's extensible.
| queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime) | ||
| queryExecutionMetrics.incNumExecution(rule.ruleName) | ||
|
|
||
| tracker.foreach(_.recordRuleInvocation(rule.ruleName, runTime, effective)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this make the query-local and the global metrics inconsistent when tracker is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! (not great -- but I'd probably remove the global tracker at some point)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing the global tracker would be great!
|
Test build #99065 has finished for PR 23096 at commit
|
|
Test build #99080 has finished for PR 23096 at commit
|
abehm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the thread-local changes. Understood it's not ideal, but at least following an existing pattern and we get the full level of detail now!
I'm happy with these changes.
| queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime) | ||
| queryExecutionMetrics.incNumExecution(rule.ruleName) | ||
|
|
||
| if (tracker ne null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to be defensive. Should this be an assert instead? Might be worth a comment explaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if one calls execute directly tracker would be null.
|
Test build #99081 has finished for PR 23096 at commit
|
|
Test build #4437 has finished for PR 23096 at commit
|
|
Test build #99118 has finished for PR 23096 at commit
|
|
Merging this. Feel free to leave more comments. I'm hoping we can wire this into the UI eventually. |
## What changes were proposed in this pull request? We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases. ## How was this patch tested? Added unit tests and end-to-end integration tests. Closes apache#23096 from rxin/SPARK-26129. Authored-by: Reynold Xin <[email protected]> Signed-off-by: Reynold Xin <[email protected]>
What changes were proposed in this pull request?
We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases.
How was this patch tested?
Added unit tests and end-to-end integration tests.