
Conversation

@MaxGekk (Member) commented Nov 12, 2018

What changes were proposed in this pull request?

In the PR, I propose a new method for debugging queries by dumping information about their execution to a file. It saves the logical, optimized and physical plans, similar to the explain() method, plus generated code. One advantage of this method over explain is that it does not materialize the full output as a single string in memory, which can cause OOMs.
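The streaming idea can be sketched as follows. This is a minimal, hypothetical illustration, not Spark's actual API: PlanNode, writePlan and debugToFile are invented names. Each node is written to the Writer as it is visited, so the full plan text is never held in memory at once:

```scala
import java.io.{BufferedWriter, FileWriter, Writer}

// Hypothetical sketch: PlanNode, writePlan and debugToFile are invented
// names, not Spark's API. The point is that output is streamed through a
// Writer, so memory use stays bounded by one line, not the whole plan.
object PlanDump {
  case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

  def writePlan(node: PlanNode, writer: Writer, depth: Int = 0): Unit = {
    writer.write("  " * depth) // indent by tree depth
    writer.write(node.name)
    writer.write("\n")
    node.children.foreach(writePlan(_, writer, depth + 1))
  }

  def debugToFile(plan: PlanNode, path: String): Unit = {
    val out = new BufferedWriter(new FileWriter(path))
    try writePlan(plan, out) finally out.close()
  }
}
```

Swapping the FileWriter for a java.io.StringWriter recovers the old in-memory behavior, which is why a Writer-based signature subsumes the string-building one.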

How was this patch tested?

Added a few tests to QueryExecutionSuite to check positive and negative scenarios.

@MaxGekk (Member, Author) commented Nov 12, 2018

This is #22429 without the maxFields parameter.

@SparkQA commented Nov 13, 2018

Test build #98742 has finished for PR 23018 at commit 090e8c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
      verbose: Boolean,
      addSuffix: Boolean,
      maxFields: Option[Int]): Unit = {
    generateTreeString(0, Nil, writer, verbose, "", addSuffix)
```
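For context on the maxFields parameter in the hunk above, here is a hedged sketch of what an Option[Int] field limit could look like when rendering a node's fields (truncatedFields is a hypothetical helper, not Spark's actual Utils.truncatedString):

```scala
// Hypothetical helper: with Some(max), keep the first max fields and
// summarize the rest; with None, render everything.
object FieldTruncation {
  def truncatedFields(fields: Seq[String], maxFields: Option[Int]): String =
    maxFields match {
      case Some(max) if fields.length > max =>
        (fields.take(max) :+ s"... ${fields.length - max} more fields").mkString(", ")
      case _ =>
        fields.mkString(", ")
    }
}
```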
@wangyum (Member) commented Nov 13, 2018

How about adding another function that only saves nodeName? We could use it here: #22879

@HyukjinKwon (Member) commented Nov 13, 2018

If #22879 is merged first, we should add that function here. If this one is merged first, that PR should add the function there.

@MaxGekk (Member, Author)

Frankly speaking, I would prefer to avoid overcomplicating the PR again.

@HyukjinKwon (Member)

Looks fine to me. Adding @cloud-fan and @hvanhovell.

@SparkQA commented Nov 13, 2018

Test build #98763 has finished for PR 23018 at commit cdc6cab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor) left a comment
LGTM

@hvanhovell (Contributor)
retest this please

@SparkQA commented Nov 13, 2018

Test build #98771 has finished for PR 23018 at commit cdc6cab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor)
Merging to master. Thanks!

@MaxGekk (Member, Author) commented Nov 13, 2018

@hvanhovell @HyukjinKwon Thank you for the review.

@asfgit asfgit closed this in 44683e0 Nov 13, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

In the PR, I propose a new method for debugging queries by dumping information about their execution to a file. It saves the logical, optimized and physical plans, similar to the `explain()` method, plus generated code. One advantage of this method over `explain` is that it does not materialize the full output as a single string in memory, which can cause OOMs.

## How was this patch tested?

Added a few tests to `QueryExecutionSuite` to check positive and negative scenarios.

Closes apache#23018 from MaxGekk/truncated-plan-to-file.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
@MaxGekk MaxGekk deleted the truncated-plan-to-file branch August 17, 2019 13:32
dvallejo pushed a commit to Telefonica/spark that referenced this pull request Aug 31, 2021
The PR puts a limit on the size of the debug string generated for a tree node, which helps to fix out-of-memory errors when large plans have huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. An alternative solution was proposed in apache#23076, but that solution doesn't address all the cases that can cause a large query plan. This limit applies only to treeString calls that don't pass a Writer, which makes it play nicely with apache#22429, apache#23018 and apache#23039. Full plans can still be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.

- A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans.
- When plans are truncated, "..." is printed to indicate that the output isn't a full plan.
- A warning is printed the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit.

Unit tests were created for the new SizeLimitedWriter. A unit test for TreeNode was also created that checks that a long plan is correctly truncated.
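A minimal sketch of what a size-capped Writer along these lines could look like (the class name matches the commit's SizeLimitedWriter, but the body here is an assumption; the real implementation may behave differently, e.g. by aborting the traversal early):

```scala
import java.io.Writer

// Assumed sketch of a size-limited Writer: forward at most maxChars
// characters to the underlying Writer, then emit "..." once to mark
// that the output was cut short. Not Spark's actual implementation.
object Capped {
  class SizeLimitedWriter(underlying: Writer, maxChars: Int) extends Writer {
    private var written = 0
    private var truncated = false

    override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
      val room = maxChars - written
      if (room > 0) {
        val n = math.min(room, len)
        underlying.write(cbuf, off, n)
        written += n
      }
      if (len > room && !truncated) { // mark that output was cut short
        underlying.write("...")
        truncated = true
      }
    }
    override def flush(): Unit = underlying.flush()
    override def close(): Unit = underlying.close()
  }
}
```

Because java.io.Writer routes its write(String) overloads through write(char[], Int, Int), wrapping any sink in such a class caps every code path that renders plan text.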

Closes apache#23169 from DaveDeCaprio/text-plan-size.

Lead-authored-by: Dave DeCaprio <[email protected]>
Co-authored-by: David DeCaprio <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
