Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
1df9943
Add API for handling expression code generation.
viirya Apr 30, 2018
5fe425c
Add new abstraction for expression codegen.
viirya May 1, 2018
00bef6b
Add basic tests.
viirya May 3, 2018
5d9c454
Merge remote-tracking branch 'upstream/master' into SPARK-24121
viirya May 3, 2018
162deb2
Deal merging conflict.
viirya May 3, 2018
d138ee0
Address comments and add more tests.
viirya May 4, 2018
ee9a4c0
Address comments.
viirya May 5, 2018
e7cfa28
Remove JavaCode.block. We should always use code string interpolator …
viirya May 5, 2018
5945c15
We should not implicitly convert code block to string. Otherwise we m…
viirya May 5, 2018
2b30654
Address comment and trim expected code string.
viirya May 5, 2018
aff411b
Remove unused import.
viirya May 8, 2018
53b329a
Address comments.
viirya May 8, 2018
72faac3
Address some comments.
viirya May 9, 2018
ffbf4ab
Merge remote-tracking branch 'upstream/master' into SPARK-24121
viirya May 17, 2018
d040676
Use code block for newly merged codegen.
viirya May 17, 2018
c378ce2
Use Set as method exprValues method returning type.
viirya May 17, 2018
2ca9741
Address comments.
viirya May 19, 2018
d91f111
Merge remote-tracking branch 'upstream/master' into SPARK-24121
viirya May 19, 2018
4b49e8a
Merge remote-tracking branch 'upstream/master' into SPARK-24121
viirya May 22, 2018
96c594a
Use Java call style and use JavaCode instead of Any.
viirya May 22, 2018
00cc564
Merge remote-tracking branch 'upstream/master' into SPARK-24121
viirya May 22, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -101,8 +101,7 @@ abstract class Expression extends TreeNode[Expression] {
ctx.subExprEliminationExprs.get(this).map { subExprState =>
// This expression is repeated which means that the code to evaluate it has already been added
// as a function before. In that case, we just re-use it.
ExprCode(ctx.registerComment(this.toString), subExprState.isNull,
subExprState.value)
ExprCode(ctx.registerComment(this.toString), subExprState.isNull, subExprState.value)
}.getOrElse {
val isNull = ctx.freshName("isNull")
val value = ctx.freshName("value")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.expressions.codegen

import java.lang.{Boolean => JBool}

import scala.collection.mutable.ArrayBuffer
import scala.language.{existentials, implicitConversions}

import org.apache.spark.sql.types.{BooleanType, DataType}
Expand Down Expand Up @@ -130,6 +131,8 @@ trait Block extends JavaCode {

def length: Int = toString.length

def nonEmpty: Boolean = toString.nonEmpty

// The leading prefix that should be stripped from each line.
// By default we strip blanks or control characters followed by '|' from the line.
var _marginChar: Option[Char] = Some('|')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like:

  code"""
   | val ...
   | ...
  """.stripMargin

It's basically for compatibility. We can remove this and disallow stripMargin(customPrefix). WDYT?

Expand Down Expand Up @@ -167,9 +170,40 @@ object Block {
case other => throw new IllegalArgumentException(
s"Can not interpolate ${other.getClass.getName} into code block.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime exception? sys.error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, how about accepting other values as strings? e.g., case other => other.toString

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feels it's more like an illegal argument to string interpolator for now. I'm open for others ideas on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to limit the types of objects we can interpolate at the first. So there will be less cases I'm not aware of. Can be open to all others later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+100000

}
CodeBlock(sc.parts, args)

val (codeParts, blockInputs) = foldLiteralArgs(sc.parts, args)
CodeBlock(codeParts, blockInputs)
}
}
}

// Folds eagerly the literal args into the code parts.
private def foldLiteralArgs(parts: Seq[String], args: Seq[Any]): (Seq[String], Seq[Any]) = {
val codeParts = ArrayBuffer.empty[String]
val blockInputs = ArrayBuffer.empty[Any]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make the type JavaCode instead of Any?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. JavaCode is better.


val strings = parts.iterator
val inputs = args.iterator
val buf = new StringBuilder(Block.CODE_BLOCK_BUFFER_LENGTH)

buf append strings.next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use java style here? buf.append(strings.next).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

while (strings.hasNext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

val input = inputs.next
input match {
case _: ExprValue | _: Block =>
codeParts += buf.toString
buf.clear
blockInputs += input
case _ =>
buf append input
}
buf append strings.next
}
if (buf.nonEmpty) {
codeParts += buf.toString
}

(codeParts.toSeq, blockInputs.toSeq)
}
}

Expand All @@ -182,11 +216,10 @@ case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[Any]) extends Bloc
blockInputs.flatMap {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about foldLeft(Set.empty[ExprValue]) { ... }?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think flatMap looks simpler.

Copy link
Contributor

@mgaido91 mgaido91 May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think foldLeft has better performance but I am not sure whether the difference can be significant here. I'd consider this a nit probably.

case b: Block => b.exprValues
case e: ExprValue => Set(e)
case _ => Set.empty[ExprValue]
}.toSet
}

override def code: String = {
override lazy val code: String = {
val strings = codeParts.iterator
val inputs = blockInputs.iterator
val buf = new StringBuilder(Block.CODE_BLOCK_BUFFER_LENGTH)
Expand All @@ -207,7 +240,7 @@ case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[Any]) extends Bloc

case class Blocks(blocks: Seq[Block]) extends Block {
override lazy val exprValues: Set[ExprValue] = blocks.flatMap(_.exprValues).toSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toSet is redundant I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's required. Otherwise a type mismatch compile error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sorry, blocks is a Seq, sorry, my bad.

override def code: String = blocks.map(_.toString).mkString("\n")
override lazy val code: String = blocks.map(_.toString).mkString("\n")

override def + (other: Block): Block = other match {
case c: CodeBlock => Blocks(blocks :+ c)
Expand All @@ -217,7 +250,7 @@ case class Blocks(blocks: Seq[Block]) extends Block {
}

object EmptyBlock extends Block with Serializable {
override def code: String = ""
override val code: String = ""
override val exprValues: Set[ExprValue] = Set.empty

override def + (other: Block): Block = other
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,9 @@ case class InputFileName() extends LeafExpression with Nondeterministic {

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val className = InputFileBlockHolder.getClass.getName.stripSuffix("$")
ev.copy(code = code"final ${CodeGenerator.javaType(dataType)} ${ev.value} = " +
code"$className.getInputFilePath();", isNull = FalseLiteral)
val typeDef = s"final ${CodeGenerator.javaType(dataType)}"
ev.copy(code = code"$typeDef ${ev.value} = $className.getInputFilePath();",
isNull = FalseLiteral)
}
}

Expand All @@ -66,8 +67,8 @@ case class InputFileBlockStart() extends LeafExpression with Nondeterministic {

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val className = InputFileBlockHolder.getClass.getName.stripSuffix("$")
ev.copy(code = code"final ${CodeGenerator.javaType(dataType)} ${ev.value} = " +
code"$className.getStartOffset();", isNull = FalseLiteral)
val typeDef = s"final ${CodeGenerator.javaType(dataType)}"
ev.copy(code = code"$typeDef ${ev.value} = $className.getStartOffset();", isNull = FalseLiteral)
}
}

Expand All @@ -89,7 +90,7 @@ case class InputFileBlockLength() extends LeafExpression with Nondeterministic {

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val className = InputFileBlockHolder.getClass.getName.stripSuffix("$")
ev.copy(code = code"final ${CodeGenerator.javaType(dataType)} ${ev.value} = " +
code"$className.getLength();", isNull = FalseLiteral)
val typeDef = s"final ${CodeGenerator.javaType(dataType)}"
ev.copy(code = code"$typeDef ${ev.value} = $className.getLength();", isNull = FalseLiteral)
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -23,12 +23,20 @@ import org.apache.spark.sql.types.{BooleanType, IntegerType}

class CodeBlockSuite extends SparkFunSuite {

test("Block can interpolate string and ExprValue inputs") {
test("Block interpolates string and ExprValue inputs") {
val isNull = JavaCode.isNullVariable("expr1_isNull")
val code = code"boolean ${isNull} = ${JavaCode.defaultLiteral(BooleanType)};"
val stringLiteral = "false"
val code = code"boolean $isNull = $stringLiteral;"
assert(code.toString == "boolean expr1_isNull = false;")
}

test("Literals are folded into string code parts instead of block inputs") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I like it!

val value = JavaCode.variable("expr1", IntegerType)
val intLiteral = 1
val code = code"int $value = $intLiteral;"
assert(code.asInstanceOf[CodeBlock].blockInputs === Seq(value))
}

test("Block.stripMargin") {
val isNull = JavaCode.isNullVariable("expr1_isNull")
val value = JavaCode.variable("expr1", IntegerType)
Expand Down Expand Up @@ -92,26 +100,26 @@ class CodeBlockSuite extends SparkFunSuite {
}

test("Throws exception when interpolating unexcepted object in code block") {
val obj = TestClass(100)
val obj = Tuple2(1, 1)
val e = intercept[IllegalArgumentException] {
code"$obj"
}
assert(e.getMessage().contains(s"Can not interpolate ${obj.getClass.getName}"))
}

test("replace expr values in code block") {
val statement = JavaCode.expression("1 + 1", IntegerType)
val expr = JavaCode.expression("1 + 1", IntegerType)
val isNull = JavaCode.isNullVariable("expr1_isNull")
val exprInFunc = JavaCode.variable("expr1", IntegerType)

val code =
code"""
|callFunc(int $statement) {
|callFunc(int $expr) {
| boolean $isNull = false;
| int $exprInFunc = $statement + 1;
| int $exprInFunc = $expr + 1;
|}""".stripMargin

val aliasedParam = JavaCode.variable("aliased", statement.javaType)
val aliasedParam = JavaCode.variable("aliased", expr.javaType)
val aliasedInputs = code.asInstanceOf[CodeBlock].blockInputs.map {
case _: SimpleExprValue => aliasedParam
case other => other
Expand All @@ -126,5 +134,3 @@ class CodeBlockSuite extends SparkFunSuite {
assert(aliasedCode.toString == expected.toString)
}
}

private case class TestClass(a: Int)
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ private[sql] trait ColumnarBatchScan extends CodegenSupport {
}
val valueVar = ctx.freshName("value")
val str = s"columnVector[$columnVar, $ordinal, ${dataType.simpleString}]"
val code = code"${ctx.registerComment(str)}\n" + (if (nullable) {
val code = code"${ctx.registerComment(str)}" + (if (nullable) {
code"""
boolean $isNullVar = $columnVar.isNullAt($ordinal);
$javaType $valueVar = $isNullVar ? ${CodeGenerator.defaultValue(dataType)} : ($value);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,7 @@ trait CodegenSupport extends SparkPlan {
* them to be evaluated twice.
*/
protected def evaluateVariables(variables: Seq[ExprCode]): String = {
val evaluate = variables.filter(_.code.toString != "").map(_.code.toString).mkString("\n")
val evaluate = variables.filter(_.code.nonEmpty).map(_.code.toString).mkString("\n")
variables.foreach(_.code = EmptyBlock)
evaluate
}
Expand Down