@@ -1258,7 +1258,8 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin

object CodeGenerator extends Logging {

// This is the value of HugeMethodLimit in the OpenJDK JVM settings
// This is the default value of HugeMethodLimit in the OpenJDK HotSpot JVM,
// beyond which methods will be rejected from JIT compilation
final val DEFAULT_JVM_HUGE_METHOD_LIMIT = 8000

// The max valid length of method parameters in JVM.
@@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
try {
val cf = new ClassFile(new ByteArrayInputStream(classBytes))
val stats = cf.methodInfos.asScala.flatMap { method =>
method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
Member

Why do we need to change this condition?

Contributor Author

The getName call was fetching the class name and then doing a string comparison, both unnecessarily. I'm just changing it while touching the code around it.
The JVM guarantees that a (defining class loader, fully qualified class name) pair is unique at runtime, so the corresponding java.lang.Class instance is unique too, and a reference equality check is both fast and sufficient.
There's no need to worry about cross-class-loader issues here: if the loaders didn't match, the code that follows wouldn't work anyway.

Member

Yeah, I know. The current comparison is stricter: the previous one compared only the name, while the current one compares the (class loader, name) pair.

I worried whether the strictness may change behavior.

Contributor Author

> I worried whether the strictness may change behavior.

Right, I can tell. As I mentioned above, although my new check is stricter, it doesn't make the behavior any "worse" than before: we reflectively access the code field immediately afterwards via codeAttrField.get(a), and that won't work unless the classes match exactly.

The old code was actually too permissive. With a class loader mismatch, the old check would let execution reach the reflective access site, where it would then fail because a Field cannot be read from an instance of a different class.

This can be illustrated by the following pseudocode:

val c1 = new URLClassLoader(somePath).loadClass("Foo") // load a class
val c2 = new URLClassLoader(somePath).loadClass("Foo") // load another class with the same name from the same path, but via a different class loader
val nameEq = c1.getName == c2.getName // true
val refEq = c1 eq c2 // false
val f1 = c1.getField("a") // look the field up on Foo itself, not on java.lang.Class
val o1 = c1.newInstance
val o2 = c2.newInstance
f1.get(o1) // okay
f1.get(o2) // fails with IllegalArgumentException: o2 is not an instance of c1
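
For anyone who wants to run the pseudocode above, here is a minimal self-contained sketch. The names `Payload`, `IsolatingLoader`, and `ClassIdentityDemo` are hypothetical and exist only for this illustration. It also assumes that the compiled `Payload.class` file is reachable as a classpath resource, which may not hold in a REPL or similar in-memory setup.

```scala
import java.io.{ByteArrayOutputStream, InputStream}

// Hypothetical payload class for this demo; it is not part of Spark.
class Payload { var a: Int = 7 }

// Hypothetical loader that defines Payload itself instead of delegating to its parent,
// so every IsolatingLoader instance yields a distinct Class object for the same name.
class IsolatingLoader(parentLoader: ClassLoader) extends ClassLoader(parentLoader) {
  private val target = classOf[Payload].getName

  override protected def loadClass(name: String, resolve: Boolean): Class[_] =
    if (name != target) super.loadClass(name, resolve)
    else Option(findLoadedClass(name)).getOrElse {
      // Assumption: the class file is available as a resource on the parent's classpath.
      val in = parentLoader.getResourceAsStream(name.replace('.', '/') + ".class")
      require(in != null, s"class file for $name is not available as a resource")
      val bytes = try IsolatingLoader.readAll(in) finally in.close()
      defineClass(name, bytes, 0, bytes.length)
    }
}

object IsolatingLoader {
  // Reads the whole stream into a byte array (portable to Java 8).
  def readAll(in: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](4096)
    var n = in.read(buf)
    while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
    out.toByteArray
  }
}

object ClassIdentityDemo {
  // Returns (names are equal, references are equal, cross-loader Field.get failed).
  def run(): (Boolean, Boolean, Boolean) = {
    val name = classOf[Payload].getName
    val parent = classOf[Payload].getClassLoader
    val c1 = new IsolatingLoader(parent).loadClass(name)
    val c2 = new IsolatingLoader(parent).loadClass(name)
    val f1 = c1.getDeclaredField("a") // scalac emits a private field "a" for the var
    f1.setAccessible(true)
    val o1 = c1.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
    val o2 = c2.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
    f1.get(o1) // fine: o1 is an instance of c1
    val crossLoaderFailed =
      try { f1.get(o2); false } catch { case _: IllegalArgumentException => true }
    (c1.getName == c2.getName, c1 eq c2, crossLoaderFailed)
  }
}
```

A name-only filter would accept both `c1` and `c2`, and the failure would only surface later at `f1.get(o2)`; the `eq` filter rejects the mismatched class up front.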

val byteCodeSize = codeAttrField.get(a).asInstanceOf[Array[Byte]].length
CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)

if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
logInfo("Generated method too long to be JIT compiled: " +
Contributor

why info and not debug?

Contributor

When we hit this, the method will very likely not be JIT compiled and performance may drop a lot. This is even worth a warning...

Since it's just an estimation and Spark SQL can still work, I think info is fine here.

Member

nit: Generated method is too long ...?

Contributor Author

I'd say either way is fine. They're different tenses and the nuances are slightly different.

"This story is too good to be true"
vs
"A story too good to be true"

s"${cf.getThisClassName}.${method.getName} is $byteCodeSize bytes")
}

byteCodeSize
}
}
@@ -19,12 +19,16 @@ package org.apache.spark.sql.catalyst.expressions

import java.sql.Timestamp

import org.apache.log4j.{Appender, AppenderSkeleton, Logger}
import org.apache.log4j.spi.LoggingEvent

import org.apache.spark.SparkFunSuite
import org.apache.spark.metrics.source.CodegenMetrics
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.catalyst.expressions.codegen.Block._
import org.apache.spark.sql.catalyst.expressions.objects._
import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils}
import org.apache.spark.sql.types._
@@ -499,4 +503,64 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
ctx.freshName("a_1") :: ctx.freshName("a_0") :: Nil
assert(names2.distinct.length == 4)
}

test("SPARK-25113: should log when there exists generated methods above HugeMethodLimit") {
class MockAppender extends AppenderSkeleton {
var seenMessage = false

override def append(loggingEvent: LoggingEvent): Unit = {
if (loggingEvent.getRenderedMessage().contains("Generated method too long")) {
seenMessage = true
}
}

override def close(): Unit = {}
override def requiresLayout(): Boolean = false
}

val appender = new MockAppender()
withLogAppender(appender) {
val x = 42
val expr = HugeCodeIntExpression(x)
val proj = GenerateUnsafeProjection.generate(Seq(expr))
val actual = proj(null)
assert(actual.getInt(0) == x)
}
assert(appender.seenMessage)
}

private def withLogAppender(appender: Appender)(f: => Unit): Unit = {
val logger =
Logger.getLogger(classOf[CodeGenerator[_, _]].getName)
logger.addAppender(appender)
try f finally {
logger.removeAppender(appender)
}
}
}

case class HugeCodeIntExpression(value: Int) extends Expression {
override def nullable: Boolean = true
override def dataType: DataType = IntegerType
override def children: Seq[Expression] = Nil
override def eval(input: InternalRow): Any = value
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
// Assuming HugeMethodLimit to be 8000
val HugeMethodLimit = CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT
// A single "int dummyN = 0;" will be at least 2 bytes of bytecode:
// 0: iconst_0
// 1: istore_1
// and it'll become bigger as the number of local variables increases.
// So 4000 such dummy local variable definitions are sufficient to bump the bytecode size
// of a generated method to above 8000 bytes.
val hugeCode = (0 until (HugeMethodLimit / 2)).map(i => s"int dummy$i = 0;").mkString("\n")
val code =
code"""{
| $hugeCode
|}
|boolean ${ev.isNull} = false;
|int ${ev.value} = $value;
""".stripMargin
ev.copy(code = code)
}
}
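
The size reasoning in the comments of `HugeCodeIntExpression` can be checked with a rough back-of-the-envelope sketch. `DummyLocalsEstimate` and its parameters are hypothetical names for this illustration. It assumes the compiler gives each dummy variable its own local slot and emits a plain `iconst_0` followed by an `istore` variant with no optimization; the slot of the first dummy variable depends on the surrounding generated method and is only a guess here.

```scala
object DummyLocalsEstimate {
  // Encoded size in bytes of "iconst_0; istore <slot>" for a given local variable slot.
  def storeZeroSize(slot: Int): Int = {
    val iconst0 = 1
    val istore =
      if (slot <= 3) 1        // istore_0 .. istore_3 are single-byte opcodes
      else if (slot <= 255) 2 // istore opcode + 1-byte index
      else 4                  // wide prefix + istore opcode + 2-byte index
    iconst0 + istore
  }

  // Total bytes for numLocals consecutive "int dummyN = 0;" definitions,
  // assuming the first one lands in slot firstSlot.
  def estimate(numLocals: Int, firstSlot: Int): Int =
    (firstSlot until firstSlot + numLocals).map(storeZeroSize).sum
}
```

Under these assumptions, `DummyLocalsEstimate.estimate(4000, 1)` comes to 19487 bytes, comfortably above the 8000-byte limit, because most of the 4000 slots need the 5-byte `wide` form rather than the 2-byte minimum.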