Skip to content
Original file line number Diff line number Diff line change
Expand Up @@ -232,27 +232,55 @@ case class NewInstance(

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val javaType = ctx.javaType(dataType)
val argGen = arguments.map(_.genCode(ctx))
val argString = argGen.map(_.value).mkString(", ")
val argIsNulls = ctx.freshName("argIsNulls")
ctx.addMutableState("boolean[]", argIsNulls, "")
val argTypes = arguments.map(e => ctx.javaType(e.dataType))
val argValues = arguments.zipWithIndex.map { case (e, i) =>
val argValue = ctx.freshName("argValue")
ctx.addMutableState(argTypes(i), argValue, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ctx.addMutableState(ctx.javaType(e.dataType), argValue, ""), then we don't need to define argTypes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I did.

argValue
}

val argCodes = arguments.zipWithIndex.map { case (e, i) =>
val expr = e.genCode(ctx)
expr.code + s"""
$argIsNulls[$i] = ${expr.isNull};
${argValues(i)} = ${expr.value};
"""
}
val argCode = ctx.splitExpressions(ctx.INPUT_ROW, argCodes)

val outer = outerPointer.map(func => Literal.fromObject(func()).genCode(ctx))

var isNull = ev.isNull
val setIsNull = if (propagateNull && arguments.nonEmpty) {
s"final boolean $isNull = ${argGen.map(_.isNull).mkString(" || ")};"
if (arguments.length <= 10) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason to pick 10?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no particular reason for pick 10. 100 seems to be too long.
Do you have any suggestion to pick the value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW how useful is this optimization? What if we always use the for-loop version? It will be better if we can have a small benchmark here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. But, do we use the for-loop version for 1 or 2? For 1, clearly, we do not need a loop. For 2, we always exit a loop only for executing one iteration.

val argIsNull = arguments.zipWithIndex.map { case (e, i) =>
s"$argIsNulls[$i]"
}
s"final boolean $isNull = ${argIsNull.mkString(" || ")};"
} else {
s"""
boolean $isNull = false;
for (int idx = 0; idx < ${arguments.length}; idx++) {
if ($argIsNulls[idx]) { $isNull = true; break; }
}
"""
}
} else {
isNull = "false"
""
}

val constructorCall = outer.map { gen =>
s"""${gen.value}.new ${cls.getSimpleName}($argString)"""
s"""${gen.value}.new ${cls.getSimpleName}(${argValues.mkString(", ")})"""
}.getOrElse {
s"new $className($argString)"
s"new $className(${argValues.mkString(", ")})"
}

val code = s"""
${argGen.map(_.code).mkString("\n")}
$argIsNulls = new boolean[${arguments.size}];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to create a new boolean array every round?

${argCode.mkString("")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't argCode already a string?

${outer.map(_.code).getOrElse("")}
$setIsNull
final $javaType ${ev.value} = $isNull ? ${ctx.defaultValue(javaType)} : $constructorCall;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -58,4 +58,38 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
val nullIntRow = df.selectExpr("i[1]").collect()(0)
assert(nullIntRow == org.apache.spark.sql.Row(null))
}

test("SPARK-15285 Generated SpecificSafeProjection.apply method grows beyond 64KB") {
val ds100_5 = Seq(S100_5()).toDS()
ds100_5.show
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not use show in test, can collect reproduce this bug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use collect cannot reproduce this bug. I think that to use show involves code generation for Projection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this is weird, the main logic in show is calling collect to get the data, can you double check it? and how about ds.rdd.collect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I conformed that to call collect cannot reproduce this bug in the latest master branch. However, I confirmed that to call ds.rdd.collect can reproduce this bug. The latest code uses ds.rdd.collect.

}
}

case class S100(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala 2.10 doesn't support large case class. We can create a new test suite under scala-2.11/src/test and put this test there, so that we only run it under scala 2.10. repl module is a good example about it. cc @kiszk do you mind resend this PR with the fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take this approach for running test suite without scala 2.10.

s1: String = "1", s2: String = "2", s3: String = "3", s4: String = "4",
s5: String = "5", s6: String = "6", s7: String = "7", s8: String = "8",
s9: String = "9", s10: String = "10", s11: String = "11", s12: String = "12",
s13: String = "13", s14: String = "14", s15: String = "15", s16: String = "16",
s17: String = "17", s18: String = "18", s19: String = "19", s20: String = "20",
s21: String = "21", s22: String = "22", s23: String = "23", s24: String = "24",
s25: String = "25", s26: String = "26", s27: String = "27", s28: String = "28",
s29: String = "29", s30: String = "30", s31: String = "31", s32: String = "32",
s33: String = "33", s34: String = "34", s35: String = "35", s36: String = "36",
s37: String = "37", s38: String = "38", s39: String = "39", s40: String = "40",
s41: String = "41", s42: String = "42", s43: String = "43", s44: String = "44",
s45: String = "45", s46: String = "46", s47: String = "47", s48: String = "48",
s49: String = "49", s50: String = "50", s51: String = "51", s52: String = "52",
s53: String = "53", s54: String = "54", s55: String = "55", s56: String = "56",
s57: String = "57", s58: String = "58", s59: String = "59", s60: String = "60",
s61: String = "61", s62: String = "62", s63: String = "63", s64: String = "64",
s65: String = "65", s66: String = "66", s67: String = "67", s68: String = "68",
s69: String = "69", s70: String = "70", s71: String = "71", s72: String = "72",
s73: String = "73", s74: String = "74", s75: String = "75", s76: String = "76",
s77: String = "77", s78: String = "78", s79: String = "79", s80: String = "80",
s81: String = "81", s82: String = "82", s83: String = "83", s84: String = "84",
s85: String = "85", s86: String = "86", s87: String = "87", s88: String = "88",
s89: String = "89", s90: String = "90", s91: String = "91", s92: String = "92",
s93: String = "93", s94: String = "94", s95: String = "95", s96: String = "96",
s97: String = "97", s98: String = "98", s99: String = "99", s100: String = "100")
case class S100_5(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a blank line between these 2 classes.

s1: S100 = S100(), s2: S100 = S100(), s3: S100 = S100(), s4: S100 = S100(), s5: S100 = S100())