Skip to content

Conversation

@rednaxelafx
Copy link
Contributor

What changes were proposed in this pull request?

Add logging for all generated methods from the CodeGenerator whose bytecode size goes above 8000 bytes.
This is to help with gathering stats on how often Spark is generating methods too big to be JIT'd. It covers all codegen scenarios, include whole-stage codegen and also individual expression codegen, e.g. unsafe projection, mutable projection, etc.

How was this patch tested?

Manually tested that logging did happen when generated method was above 8000 bytes.
Also added a new unit test case to CodeGenerationSuite to verify that the logging did happen.

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)

if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
logInfo("Generated method too long to be JIT compiled: " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why info and not debug?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we hit this, the JIT will very likely not work and performance may drop a lot. This even worth a warning...

Since it's just an estimation and Spark SQL can still work, I think info is fine here.

@cloud-fan
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Aug 14, 2018

Test build #94740 has finished for PR 22103 at commit 640c9cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)

if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
logInfo("Generated method too long to be JIT compiled: " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Generated method is too long ...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say either way is fine. They're different tenses and the nuances are slightly different.

"This story is too good to be true"
vs
"A story too good to be true"

val cf = new ClassFile(new ByteArrayInputStream(classBytes))
val stats = cf.methodInfos.asScala.flatMap { method =>
method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need change this condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getName was accessing the class name unnecessarily and then doing string comparison unnecessarily. Just changing it when touching the code around it.
The JVM guarantees that a (defining class loader, full class name) pair is unique at runtime, in this case the java.lang.Class instance is guaranteed to be unique, so a reference equality check is fast and sufficient.
There's no worry of cross class loader issue here, because if there is, the code that follows won't work anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know The current comparison is more strict. Although the previous comparison was only for name, the current comparison is for a pair of class loader and name.

I worried whether the strictness may change behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worried whether the strictness may change behavior.

Right, I can tell. And as I've mentioned above, although my new check is stricter, it doesn't make the behavior any "worse" than before, because we're reflectively accessing the code field immediately after via codeAttrField.get(a), and that won't work unless the classes are matching exactly.

The old code before my change would actually be too permissive -- in the case of class loader mismatch, the old check will allow the it go run to the reflective access site, but it'll then fail because reflection doesn't allow access from the wrong class.

This can be exemplified by the following pseudocode

val c1 = new URLClassLoader(somePath).loadClass("Foo") // load a class
val c2 = new URLClassLoader(somePath).loadClass("Foo") // load another class with the same name from the same path, but different class loader
val nameEq = c1.getName == c2.getName // true
val refEq = c1 eq c2 // false
val f1 = c1.getClass.getField("a")
val o1 = c1.newInstance
val o2 = c2.newInstance
f1.get(o1) // okay
f1.get(o2) // fail with exception

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks! Merged to master.

@asfgit asfgit closed this in 3c614d0 Aug 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants