[SPARK-31721][SQL] Assert optimized is initialized before tracking the planning time #28543
ok to test |
hvanhovell
left a comment
LGTM
hvanhovell
left a comment
LGTM
|
Test build #122674 has finished for PR 28543 at commit
|
|
Test build #122675 has finished for PR 28543 at commit
|
|
Test build #122808 has finished for PR 28543 at commit
|
|
retest this please |
|
Test build #122819 has finished for PR 28543 at commit
|
|
Merging to master/3.0. Thanks! |
…e planning time

### What changes were proposed in this pull request?

The QueryPlanningTracker in QueryExecution reports a planning time that also includes the optimization time. This happens because optimizedPlan in QueryExecution is lazy and is initialized only when first accessed. When df.queryExecution.executedPlan is called, the tracker starts recording the planning time and then accesses the optimized plan. As a result, the planning time starts before optimization and also includes the optimization time.

This PR fixes this behavior by introducing a method assertOptimized, similar to assertAnalyzed, that explicitly initializes the optimized plan. This method is called before measuring the time for sparkPlan and executedPlan. We call it before sparkPlan because that also counts as planning time.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

Closes #28543 from dbaliafroozeh/AddAssertOptimized.

Authored-by: Ali Afroozeh <ali.afroozeh@databricks.com>
Signed-off-by: herman <herman@databricks.com>
(cherry picked from commit b9cc31c)
Signed-off-by: herman <herman@databricks.com>
What changes were proposed in this pull request?
The QueryPlanningTracker in QueryExecution reports a planning time that also includes the optimization time. This happens because optimizedPlan in QueryExecution is lazy and is initialized only when first accessed. When df.queryExecution.executedPlan is called, the tracker starts recording the planning time and then accesses the optimized plan. As a result, the planning time starts before optimization and also includes the optimization time.
This PR fixes this behavior by introducing a method assertOptimized, similar to assertAnalyzed, that explicitly initializes the optimized plan. This method is called before measuring the time for sparkPlan and executedPlan. We call it before sparkPlan because that also counts as planning time.
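The timing problem can be reproduced outside Spark with a small model. The sketch below is illustrative only: FakeClock, PhaseTracker, ToyQueryExecution and the eagerOptimize flag are hypothetical stand-ins rather than the actual Spark classes or the code in this patch, and the 50 ms and 10 ms costs are made-up numbers. It shows how a lazy val that is first touched inside a measured block inflates that block's duration, and how forcing it beforehand avoids that.

```scala
import scala.collection.mutable

// Deterministic clock (hypothetical) so results do not depend on wall time.
class FakeClock {
  var now: Long = 0L
  def advance(ms: Long): Unit = now += ms
}

// Minimal stand-in for a phase tracker: records elapsed time per phase name.
class PhaseTracker(clock: FakeClock) {
  val durations = mutable.Map.empty[String, Long]

  def measurePhase[T](phase: String)(body: => T): T = {
    val start = clock.now
    val result = body
    durations(phase) = durations.getOrElse(phase, 0L) + (clock.now - start)
    result
  }
}

// Toy model of a query execution; eagerOptimize toggles the fix described above.
class ToyQueryExecution(clock: FakeClock, eagerOptimize: Boolean) {
  val tracker = new PhaseTracker(clock)

  // Lazy: the optimization cost is paid by whoever touches this first.
  lazy val optimizedPlan: String = tracker.measurePhase("optimization") {
    clock.advance(50) // assumed optimization cost
    "optimized"
  }

  // Forces optimizedPlan, playing the role the PR describes for assertOptimized.
  def assertOptimized(): Unit = optimizedPlan

  lazy val sparkPlan: String = {
    // With the fix, optimization finishes before the planning timer starts.
    if (eagerOptimize) assertOptimized()
    tracker.measurePhase("planning") {
      clock.advance(10) // assumed planning cost
      s"physical($optimizedPlan)"
    }
  }
}
```

In this model, touching sparkPlan with eagerOptimize = false records 60 ms of planning, because the 50 ms of optimization runs inside the planning measurement. With eagerOptimize = true, planning is recorded as 10 ms and optimization stays at 50 ms.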
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests