-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25113][SQL] Add logging to CodeGenerator when any generated method's bytecode size goes above HugeMethodLimit #22103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… bytecode size goes above HugeMethodLimit
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize) | ||
|
|
||
| if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) { | ||
| logInfo("Generated method too long to be JIT compiled: " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why info and not debug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we hit this, the JIT will very likely not work and performance may drop a lot. This even worth a warning...
Since it's just an estimation and Spark SQL can still work, I think info is fine here.
|
LGTM |
|
Test build #94740 has finished for PR 22103 at commit
|
| CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize) | ||
|
|
||
| if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) { | ||
| logInfo("Generated method too long to be JIT compiled: " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Generated method is too long ...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say either way is fine. They're different tenses and the nuances are slightly different.
"This story is too good to be true"
vs
"A story too good to be true"
| val cf = new ClassFile(new ByteArrayInputStream(classBytes)) | ||
| val stats = cf.methodInfos.asScala.flatMap { method => | ||
| method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a => | ||
| method.getAttributes().filter(_.getClass eq codeAttr).map { a => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need change this condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The getName was accessing the class name unnecessarily and then doing string comparison unnecessarily. Just changing it when touching the code around it.
The JVM guarantees that a (defining class loader, full class name) pair is unique at runtime, in this case the java.lang.Class instance is guaranteed to be unique, so a reference equality check is fast and sufficient.
There's no worry of cross class loader issue here, because if there is, the code that follows won't work anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I know The current comparison is more strict. Although the previous comparison was only for name, the current comparison is for a pair of class loader and name.
I worried whether the strictness may change behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worried whether the strictness may change behavior.
Right, I can tell. And as I've mentioned above, although my new check is stricter, it doesn't make the behavior any "worse" than before, because we're reflectively accessing the code field immediately after via codeAttrField.get(a), and that won't work unless the classes are matching exactly.
The old code before my change would actually be too permissive -- in the case of class loader mismatch, the old check will allow the it go run to the reflective access site, but it'll then fail because reflection doesn't allow access from the wrong class.
This can be exemplified by the following pseudocode
val c1 = new URLClassLoader(somePath).loadClass("Foo") // load a class
val c2 = new URLClassLoader(somePath).loadClass("Foo") // load another class with the same name from the same path, but different class loader
val nameEq = c1.getName == c2.getName // true
val refEq = c1 eq c2 // false
val f1 = c1.getClass.getField("a")
val o1 = c1.newInstance
val o2 = c2.newInstance
f1.get(o1) // okay
f1.get(o2) // fail with exception
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks! Merged to master.
What changes were proposed in this pull request?
Add logging for all generated methods from the
CodeGeneratorwhose bytecode size goes above 8000 bytes.This is to help with gathering stats on how often Spark is generating methods too big to be JIT'd. It covers all codegen scenarios, include whole-stage codegen and also individual expression codegen, e.g. unsafe projection, mutable projection, etc.
How was this patch tested?
Manually tested that logging did happen when generated method was above 8000 bytes.
Also added a new unit test case to
CodeGenerationSuiteto verify that the logging did happen.