[SPARK-26504][SQL] Rope-wise dumping of Spark plans #23406
@hvanhovell Please, take a look at the PR.
Test build #100537 has finished for PR 23406 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
Test build #100539 has finished for PR 23406 at commit
}

class StringRope {
  private var list = List.empty[String]
Let's use a ListBuffer or an ArrayBuffer here. Those have (amortized) constant time appends and do not force you to reverse the collection when building the string.
I assumed this reverse is relatively cheap, O(n). Copying a string would be more expensive than adding an element to the head of a list inside the reverse() method. In any case, we need to traverse the list in toString.
I replaced List by ArrayBuffer
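The trade-off discussed in this thread can be sketched roughly as follows. This is an illustration only, not code from the PR; the names `AppendStrategies`, `viaPrependedList`, and `viaArrayBuffer` are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical names for illustration; not taken from the PR.
object AppendStrategies {
  // Immutable List: O(1) prepend, but the parts end up in reverse order
  // and must be reversed (O(n)) before they are joined.
  def viaPrependedList(parts: Seq[String]): String = {
    var list = List.empty[String]
    parts.foreach { p => list = p :: list }
    list.reverse.mkString
  }

  // ArrayBuffer: amortized O(1) append, and the parts are already
  // in insertion order when the final string is built.
  def viaArrayBuffer(parts: Seq[String]): String = {
    val buffer = new ArrayBuffer[String]
    parts.foreach { p => buffer += p }
    buffer.mkString
  }
}
```

Both variants produce the same string; the second simply avoids the extra reversal pass.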
private def writeOrError(writer: Writer)(f: Writer => Unit): Unit = {
  try f(writer)
private def appendOrError(append: String => Unit)(f: (String => Unit) => Unit): Unit = {
You always call this method on a QueryPlan. Can we specialize this method for this scenario and pass the plan and all the needed treeString parameters?
A different question: is it possible that treeString has already written to the appender before throwing an exception? If it does, the output might look pretty weird, because it will contain a part of the tree and the exception thrown.
because it will contain a part of the tree and the exception thrown
In the case of writing to a file, I think it is possible. I believe it would be a nice feature for troubleshooting. I would imagine a huge (maybe wrong) plan causing OOMs at some point. If we write some part of the plan to the file, it would be helpful in debugging.
Can we specialize this method for this scenario and pass the plan and all the needed treeString parameters?
@hvanhovell Do you mean changes like in the PR MaxGekk#16? Unfortunately it doesn't work well, because the plan's constructor can throw an AnalysisException, which cannot be handled with this approach.
Yeah, that would be it. You can pass in a call-by-name parameter if you want to capture errors during construction. I prefer this over having some overly generic function.
Will it be OK if I move appendOrError to the companion object of QueryPlan? Like:
object QueryPlan extends PredicateHelper {
  /**
   * Converts the query plan to string and appends it via provided function.
   */
  def append[T <: QueryPlan[T]](
      plan: => QueryPlan[T],
      append: String => Unit,
      verbose: Boolean,
      addSuffix: Boolean,
      maxFields: Int = SQLConf.get.maxToStringFields): Unit = { () }
Sure that works.
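The call-by-name idea agreed on here can be sketched as below. This is a simplified illustration under stated assumptions, not the merged Spark code: `PlanError` stands in for `AnalysisException`, a plain `String` stands in for the query plan, and `PlanAppendSketch.appendPlan` is a hypothetical name.

```scala
// Hypothetical stand-in for AnalysisException; not a Spark class.
final class PlanError(msg: String) extends Exception(msg)

object PlanAppendSketch {
  // `plan` is a call-by-name parameter, so it is evaluated inside the
  // try block. Errors thrown while constructing the plan are therefore
  // caught here and appended instead of propagating to the caller.
  def appendPlan(plan: => String, append: String => Unit): Unit = {
    try append(plan)
    catch { case e: PlanError => append(e.toString) }
  }
}
```

With a by-value parameter the argument would be evaluated before `appendPlan` is entered, so the `try` block could not see a failure raised during construction.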
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
cc @rednaxelafx
Test build #100556 has finished for PR 23406 at commit
 */
override def toString: String = {
  val buffer = new StringBuffer(length)
Why the new line?
for beauty
}

/**
 * Concatenation of sequence of strings to final string with cheap append method
I am wondering if StringConcatenation is a better name for this class, as this class is technically not a rope (a rope is a binary tree).
StringRope looks nicer ;-)
... and is wrong :P...
StringConcat?
hvanhovell left a comment
LGTM - pending jenkins
Test build #100573 has finished for PR 23406 at commit
rednaxelafx left a comment
Looks good overall. One minor comment on using StringBuffer
 * returns concatenated string.
 */
override def toString: String = {
  val result = new StringBuffer(length)
Can we just use java.lang.StringBuilder here for the sake of reducing one useless allocation of the Scala wrapper?
Which StringBuffer is this anyway? If you're using the Scala scala.collection.mutable.*, it should have been StringBuilder, right?
Replaced by java.lang.StringBuilder
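Putting the review rounds together, the class might look roughly like the sketch below. It assumes an `ArrayBuffer` of parts plus a running total length, and it is named `StringConcatSketch` to make clear that it is an illustration rather than the merged code; the field names and the null check are assumptions.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch; field names and the null check are assumptions.
class StringConcatSketch {
  private val strings = new ArrayBuffer[String]
  private var length: Int = 0

  // Cheap append: stores a reference to the string and tracks the
  // total length, without copying any characters yet.
  def append(s: String): Unit = {
    if (s != null) {
      strings += s
      length += s.length
    }
  }

  // Copies all parts once into a builder that is pre-sized to the
  // final length, so the builder never has to grow.
  override def toString: String = {
    val result = new java.lang.StringBuilder(length)
    strings.foreach(s => result.append(s))
    result.toString
  }
}
```

Using `java.lang.StringBuilder` directly, as suggested in the review, avoids allocating the Scala wrapper around it.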
Test build #100587 has finished for PR 23406 at commit
Merging to master. Thanks!
Late but looks good to me.
@hvanhovell @srowen @rednaxelafx Happy New Year!!!
## What changes were proposed in this pull request?

Proposed new class `StringConcat` for converting a sequence of strings to string with one memory allocation in the `toString` method. `StringConcat` replaces `StringBuilderWriter` in methods of dumping of Spark plans and codegen to strings.

All `Writer` arguments are replaced by `String => Unit` in methods related to Spark plans stringification.

## How was this patch tested?

It was tested by existing suites `QueryExecutionSuite`, `DebuggingSuite` as well as new tests for `StringConcat` in `StringUtilsSuite`.

Closes apache#23406 from MaxGekk/rope-plan.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
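One way to picture the switch from `Writer` arguments to `String => Unit` is the sketch below, where the same producer feeds either an in-memory buffer or a `java.io.Writer`. The object `AppenderSketch` and its methods are hypothetical and only stand in for the plan-printing methods mentioned above.

```scala
import java.io.StringWriter
import scala.collection.mutable.ArrayBuffer

// Hypothetical producer and sinks; not Spark APIs.
object AppenderSketch {
  // The producer only knows about a String => Unit callback.
  def dump(lines: Seq[String], append: String => Unit): Unit =
    lines.foreach { line => append(line); append("\n") }

  // Sink 1: collect the parts in memory and join them once at the end.
  def toBuffer(lines: Seq[String]): String = {
    val parts = new ArrayBuffer[String]
    dump(lines, s => parts += s)
    parts.mkString
  }

  // Sink 2: forward each part to a java.io.Writer (here a StringWriter;
  // a file-backed writer could be passed the same way).
  def toWriter(lines: Seq[String]): String = {
    val writer = new StringWriter
    dump(lines, s => writer.write(s))
    writer.toString
  }
}
```

The producer does not need to change when the destination changes, which is the appeal of passing a plain function instead of a `Writer`.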