Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Dec 29, 2018

What changes were proposed in this pull request?

Proposed new class StringConcat for converting a sequence of strings to string with one memory allocation in the toString method. StringConcat replaces StringBuilderWriter in methods of dumping of Spark plans and codegen to strings.

All Writer arguments are replaced by String => Unit in methods related to Spark plans stringification.

How was this patch tested?

It was tested by existing suites QueryExecutionSuite, DebuggingSuite as well as new tests for StringConcat in StringUtilsSuite.

@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 29, 2018

@hvanhovell Please, take a look at the PR.

@SparkQA
Copy link

SparkQA commented Dec 29, 2018

Test build #100537 has finished for PR 23406 at commit f621de7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 29, 2018

Test build #100539 has finished for PR 23406 at commit ca4aed8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

class StringRope {
private var list = List.empty[String]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a ListBuffer or an ArrayBuffer here. Those have (amortized) constant time appends and do not force you to reverse the collection when building the string.

Copy link
Member Author

@MaxGekk MaxGekk Dec 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I supposed this reverse is relatively cheap O(n). Coping a string would be more expensive than adding element to the head of list inside of the reverse() method. In any case, need to traverse over the list in toString.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced List by ArrayBuffer


private def writeOrError(writer: Writer)(f: Writer => Unit): Unit = {
try f(writer)
private def appendOrError(append: String => Unit)(f: (String => Unit) => Unit): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You always call this method on a QueryPlan. Can we specialize this method for this scenario and pass the plan and all the needed treeString parameters?

Different question. Is it possible that treeString already writes to the appender before throwing an exception. It it does the output might look pretty weird, because it will and contain a part of the tree and the exception thrown.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it will and contain a part of the tree and the exception thrown

In the case of writing to a file, I think it is possible. I believe it will be a nice feature in trouble shooting. I would image a huge (maybe wrong) plan causes OOMs at some point. If we write some part of the plan to file, it would be helpful in debugging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we specialize this method for this scenario and pass the plan and all the needed treeString parameters?

@hvanhovell Do you mean changes like in the PR MaxGekk#16 ? Unfortunately it doesn't work well because plan's constructor can produce AnalysisException which cannot be handled with this approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be it. You can pass in a call-by-name parameter if you want to capture errors during construction. I prefer his over having some overly generic function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be ok if I move appendOrError to the companion object of QueryPlan? Like:

object QueryPlan extends PredicateHelper {
  /**
   * Converts the query plan to string and appends it via provided function.
   */
  def append[T <: QueryPlan[T]](
      plan: => QueryPlan[T],
      append: String => Unit,
      verbose: Boolean,
      addSuffix: Boolean,
      maxFields: Int = SQLConf.get.maxToStringFields): Unit = { () }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that works.

@hvanhovell
Copy link
Contributor

cc @rednaxelafx

@SparkQA
Copy link

SparkQA commented Dec 30, 2018

Test build #100556 has finished for PR 23406 at commit 544b80e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
override def toString: String = {
val buffer = new StringBuffer(length)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the new line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for beauty

}

/**
* Concatenation of sequence of strings to final string with cheap append method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if StringConcatenation is a better name for this class, as this class is technically a rope (that is a binary tree).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringRope looks nicer ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and is wrong :P...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringConcat?

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - pending jenkins

@SparkQA
Copy link

SparkQA commented Dec 31, 2018

Test build #100573 has finished for PR 23406 at commit 074e9b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@rednaxelafx rednaxelafx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. One minor comment on using StringBuffer

* returns concatenated string.
*/
override def toString: String = {
val result = new StringBuffer(length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use java.lang.StringBuilder here for the sake of reducing one useless allocation of the Scala wrapper?
Which StringBuffer is this anyway? If you're using the Scala scala.collection.mutable.*, it should have been StringBuilder, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced by java.lang.StringBuilder

@SparkQA
Copy link

SparkQA commented Dec 31, 2018

Test build #100587 has finished for PR 23406 at commit 29b62bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

Merging to master. Thanks!

@asfgit asfgit closed this in 89c92cc Dec 31, 2018
@srowen
Copy link
Member

srowen commented Dec 31, 2018

Late but looks good to me.

@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 31, 2018

@hvanhovell @srowen @rednaxelafx Happy New Year!!!

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Proposed new class `StringConcat` for converting a sequence of strings to string with one memory allocation in the `toString` method.  `StringConcat` replaces `StringBuilderWriter` in methods of dumping of Spark plans and codegen to strings.

All `Writer` arguments are replaced by `String => Unit` in methods related to Spark plans stringification.

## How was this patch tested?

It was tested by existing suites `QueryExecutionSuite`, `DebuggingSuite` as well as new tests for `StringConcat` in `StringUtilsSuite`.

Closes apache#23406 from MaxGekk/rope-plan.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
@MaxGekk MaxGekk deleted the rope-plan branch August 17, 2019 13:35
dvallejo pushed a commit to Telefonica/spark that referenced this pull request Aug 31, 2021
Proposed new class `StringConcat` for converting a sequence of strings to string with one memory allocation in the `toString` method.  `StringConcat` replaces `StringBuilderWriter` in methods of dumping of Spark plans and codegen to strings.

All `Writer` arguments are replaced by `String => Unit` in methods related to Spark plans stringification.

It was tested by existing suites `QueryExecutionSuite`, `DebuggingSuite` as well as new tests for `StringConcat` in `StringUtilsSuite`.

Closes apache#23406 from MaxGekk/rope-plan.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants