-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26023][SQL] Dumping truncated plans and generated code to a file #23018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The is #22429 without the |
|
Test build #98742 has finished for PR 23018 at commit
|
| verbose: Boolean, | ||
| addSuffix: Boolean, | ||
| maxFields: Option[Int]): Unit = { | ||
| generateTreeString(0, Nil, writer, verbose, "", addSuffix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about add another function only save nodeName? We can use it here: #22879
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If #22879 is merged first, we should add that function here. If this one is merged first, that PR better should have the function there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to avoid overcomplicating the PR again, frankly speaking.
|
Looks fine to me. adding @cloud-fan and @hvanhovell |
|
Test build #98763 has finished for PR 23018 at commit
|
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
retest this please |
|
Test build #98771 has finished for PR 23018 at commit
|
|
Merging to master. Thanks! |
|
@hvanhovell @HyukjinKwon Thank you for the review. |
## What changes were proposed in this pull request? In the PR, I propose new method for debugging queries by dumping info about their execution to a file. It saves logical, optimized and physical plan similar to the `explain()` method + generated code. One of the advantages of the method over `explain` is it does not materializes full output as one string in memory which can cause OOMs. ## How was this patch tested? Added a few tests to `QueryExecutionSuite` to check positive and negative scenarios. Closes apache#23018 from MaxGekk/truncated-plan-to-file. Authored-by: Maxim Gekk <[email protected]> Signed-off-by: Herman van Hovell <[email protected]>
The PR puts in a limit on the size of a debug string generated for a tree node. Helps to fix out of memory errors when large plans have huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. AN alternative solution was proposed in apache#23076, but that solution doesn't address all the cases that can cause a large query. This limit is only on calls treeString that don't pass a Writer, which makes it play nicely with apache#22429, apache#23018 and apache#23039. Full plans can be written to files, but truncated plans will be used when strings are held in memory, such as for the UI. - A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans. - When plans are truncated, "..." is printed to indicate that it isn't a full plan - A warning is printed out the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit. Unit tests were created for the new SizeLimitedWriter. Also a unit test for TreeNode was created that checks that a long plan is correctly truncated. Closes apache#23169 from DaveDeCaprio/text-plan-size. Lead-authored-by: Dave DeCaprio <[email protected]> Co-authored-by: David DeCaprio <[email protected]> Signed-off-by: Marcelo Vanzin <[email protected]>
What changes were proposed in this pull request?
In the PR, I propose new method for debugging queries by dumping info about their execution to a file. It saves logical, optimized and physical plan similar to the
explain()method + generated code. One of the advantages of the method overexplainis it does not materializes full output as one string in memory which can cause OOMs.How was this patch tested?
Added a few tests to
QueryExecutionSuiteto check positive and negative scenarios.