
Conversation

@MaxGekk (Member) commented Nov 12, 2018

What changes were proposed in this pull request?

In the PR, I propose a new method for debugging queries by dumping information about their execution to a file. It saves the logical, optimized and physical plans, similar to the explain() method, plus generated code. One advantage of this method over explain is that it does not materialize the full output as a single string in memory, which can cause OOMs.
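The streaming idea can be sketched as follows. This is a minimal, hypothetical illustration, not Spark's actual API: PlanNode, writePlan and debugToFile are invented names. Each node is written to the Writer as it is visited, so the full plan text is never held in memory at once:

```scala
import java.io.{BufferedWriter, FileWriter, Writer}

// Hypothetical sketch: PlanNode, writePlan and debugToFile are invented
// names, not Spark's API. The point is that output is streamed through a
// Writer, so memory use stays bounded by one line, not the whole plan.
object PlanDump {
  case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

  def writePlan(node: PlanNode, writer: Writer, depth: Int = 0): Unit = {
    writer.write("  " * depth) // indent by tree depth
    writer.write(node.name)
    writer.write("\n")
    node.children.foreach(writePlan(_, writer, depth + 1))
  }

  def debugToFile(plan: PlanNode, path: String): Unit = {
    val out = new BufferedWriter(new FileWriter(path))
    try writePlan(plan, out) finally out.close()
  }
}
```

Swapping the FileWriter for a java.io.StringWriter recovers the old in-memory behavior, which is why a Writer-based signature subsumes the string-building one.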

How was this patch tested?

Added a few tests to QueryExecutionSuite to check positive and negative scenarios.

@MaxGekk (Member, Author) commented Nov 12, 2018

This is #22429 without the maxFields parameter.

@SparkQA commented Nov 13, 2018

Test build #98742 has finished for PR 23018 at commit 090e8c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
      verbose: Boolean,
      addSuffix: Boolean,
      maxFields: Option[Int]): Unit = {
    generateTreeString(0, Nil, writer, verbose, "", addSuffix)
```
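For context on the maxFields parameter in the hunk above, here is a hedged sketch of what an Option[Int] field limit could look like when rendering a node's fields (truncatedFields is a hypothetical helper, not Spark's actual Utils.truncatedString):

```scala
// Hypothetical helper: with Some(max), keep the first max fields and
// summarize the rest; with None, render everything.
object FieldTruncation {
  def truncatedFields(fields: Seq[String], maxFields: Option[Int]): String =
    maxFields match {
      case Some(max) if fields.length > max =>
        (fields.take(max) :+ s"... ${fields.length - max} more fields").mkString(", ")
      case _ =>
        fields.mkString(", ")
    }
}
```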
@wangyum (Member) commented Nov 13, 2018

How about adding another function that only saves nodeName? We could use it here: #22879

@HyukjinKwon (Member) commented Nov 13, 2018

If #22879 is merged first, we should add that function here. If this one is merged first, that PR should add the function there.

@MaxGekk (Member, Author)

Frankly speaking, I would prefer to avoid overcomplicating the PR again.

@HyukjinKwon (Member)

Looks fine to me. Adding @cloud-fan and @hvanhovell.

@SparkQA commented Nov 13, 2018

Test build #98763 has finished for PR 23018 at commit cdc6cab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor) left a comment
LGTM

@hvanhovell (Contributor)
retest this please

@SparkQA commented Nov 13, 2018

Test build #98771 has finished for PR 23018 at commit cdc6cab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor)
Merging to master. Thanks!

@MaxGekk (Member, Author) commented Nov 13, 2018

@hvanhovell @HyukjinKwon Thank you for the review.

@asfgit asfgit closed this in 44683e0 Nov 13, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

In the PR, I propose a new method for debugging queries by dumping information about their execution to a file. It saves the logical, optimized and physical plans, similar to the `explain()` method, plus generated code. One advantage of this method over `explain` is that it does not materialize the full output as a single string in memory, which can cause OOMs.

## How was this patch tested?

Added a few tests to `QueryExecutionSuite` to check positive and negative scenarios.

Closes apache#23018 from MaxGekk/truncated-plan-to-file.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
@MaxGekk MaxGekk deleted the truncated-plan-to-file branch August 17, 2019 13:32
dvallejo pushed a commit to Telefonica/spark that referenced this pull request Aug 31, 2021
The PR puts a limit on the size of the debug string generated for a tree node, which helps to fix out-of-memory errors when large plans have huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. An alternative solution was proposed in apache#23076, but that solution doesn't address all the cases that can cause a large query plan. This limit applies only to treeString calls that don't pass a Writer, which makes it play nicely with apache#22429, apache#23018 and apache#23039. Full plans can still be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.

- A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans.
- When plans are truncated, "..." is printed to indicate that the output isn't a full plan.
- A warning is printed the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit.

Unit tests were created for the new SizeLimitedWriter. A unit test for TreeNode was also created that checks that a long plan is correctly truncated.
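A minimal sketch of what a size-capped Writer along these lines could look like (the class name matches the commit's SizeLimitedWriter, but the body here is an assumption; the real implementation may behave differently, e.g. by aborting the traversal early):

```scala
import java.io.Writer

// Assumed sketch of a size-limited Writer: forward at most maxChars
// characters to the underlying Writer, then emit "..." once to mark
// that the output was cut short. Not Spark's actual implementation.
object Capped {
  class SizeLimitedWriter(underlying: Writer, maxChars: Int) extends Writer {
    private var written = 0
    private var truncated = false

    override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
      val room = maxChars - written
      if (room > 0) {
        val n = math.min(room, len)
        underlying.write(cbuf, off, n)
        written += n
      }
      if (len > room && !truncated) { // mark that output was cut short
        underlying.write("...")
        truncated = true
      }
    }
    override def flush(): Unit = underlying.flush()
    override def close(): Unit = underlying.close()
  }
}
```

Because java.io.Writer routes its write(String) overloads through write(char[], Int, Int), wrapping any sink in such a class caps every code path that renders plan text.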

Closes apache#23169 from DaveDeCaprio/text-plan-size.

Lead-authored-by: Dave DeCaprio <[email protected]>
Co-authored-by: David DeCaprio <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
