[SPARK-25113][SQL] Add logging to CodeGenerator when any generated method's bytecode size goes above HugeMethodLimit #22103

rednaxelafx · 2018-08-14T10:53:21Z

What changes were proposed in this pull request?

Add logging for all generated methods from the CodeGenerator whose bytecode size goes above 8000 bytes.
This is to help with gathering stats on how often Spark is generating methods too big to be JIT'd. It covers all codegen scenarios, include whole-stage codegen and also individual expression codegen, e.g. unsafe projection, mutable projection, etc.

How was this patch tested?

Manually tested that logging did happen when generated method was above 8000 bytes.
Also added a new unit test case to CodeGenerationSuite to verify that the logging did happen.

… bytecode size goes above HugeMethodLimit

hvanhovell

LGTM

mgaido91 · 2018-08-14T11:19:30Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

            CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)
+
+            if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
+              logInfo("Generated method too long to be JIT compiled: " +


why info and not debug?

when we hit this, the JIT will very likely not work and performance may drop a lot. This even worth a warning...

Since it's just an estimation and Spark SQL can still work, I think info is fine here.

cloud-fan · 2018-08-14T12:46:00Z

LGTM

SparkQA · 2018-08-14T14:48:10Z

Test build #94740 has finished for PR 22103 at commit 640c9cd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2018-08-14T15:18:44Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

            CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)
+
+            if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
+              logInfo("Generated method too long to be JIT compiled: " +


nit: Generated method is too long ...?

I'd say either way is fine. They're different tenses and the nuances are slightly different.

"This story is too good to be true"
vs
"A story too good to be true"

kiszk · 2018-08-14T15:24:03Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

        val cf = new ClassFile(new ByteArrayInputStream(classBytes))
        val stats = cf.methodInfos.asScala.flatMap { method =>
-          method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
+          method.getAttributes().filter(_.getClass eq codeAttr).map { a =>


Why do we need change this condition?

The getName was accessing the class name unnecessarily and then doing string comparison unnecessarily. Just changing it when touching the code around it.
The JVM guarantees that a (defining class loader, full class name) pair is unique at runtime, in this case the java.lang.Class instance is guaranteed to be unique, so a reference equality check is fast and sufficient.
There's no worry of cross class loader issue here, because if there is, the code that follows won't work anyway.

Yeah, I know The current comparison is more strict. Although the previous comparison was only for name, the current comparison is for a pair of class loader and name.

I worried whether the strictness may change behavior.

I worried whether the strictness may change behavior.

Right, I can tell. And as I've mentioned above, although my new check is stricter, it doesn't make the behavior any "worse" than before, because we're reflectively accessing the code field immediately after via codeAttrField.get(a), and that won't work unless the classes are matching exactly.

The old code before my change would actually be too permissive -- in the case of class loader mismatch, the old check will allow the it go run to the reflective access site, but it'll then fail because reflection doesn't allow access from the wrong class.

This can be exemplified by the following pseudocode

val c1 = new URLClassLoader(somePath).loadClass("Foo") // load a class val c2 = new URLClassLoader(somePath).loadClass("Foo") // load another class with the same name from the same path, but different class loader val nameEq = c1.getName == c2.getName // true val refEq = c1 eq c2 // false val f1 = c1.getClass.getField("a") val o1 = c1.newInstance val o2 = c2.newInstance f1.get(o1) // okay f1.get(o2) // fail with exception

gatorsmile

LGTM

Thanks! Merged to master.

SPARK-25113: Add logging to CodeGenerator when any generated method's…

640c9cd

… bytecode size goes above HugeMethodLimit

hvanhovell approved these changes Aug 14, 2018

View reviewed changes

mgaido91 reviewed Aug 14, 2018

View reviewed changes

kiszk reviewed Aug 14, 2018

View reviewed changes

gatorsmile reviewed Aug 14, 2018

View reviewed changes

asfgit closed this in 3c614d0 Aug 14, 2018

[SPARK-25113][SQL] Add logging to CodeGenerator when any generated method's bytecode size goes above HugeMethodLimit #22103

[SPARK-25113][SQL] Add logging to CodeGenerator when any generated method's bytecode size goes above HugeMethodLimit #22103

Uh oh!

Conversation

rednaxelafx commented Aug 14, 2018

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

hvanhovell left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cloud-fan commented Aug 14, 2018

Uh oh!

SparkQA commented Aug 14, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gatorsmile left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants