-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-29061][SQL] Prints bytecode statistics in debugCodegen #25766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -1211,6 +1211,20 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin | |||||
| } | ||||||
| } | ||||||
|
|
||||||
| /** | ||||||
| * Java bytecode statistics of a compiled class by Janino. | ||||||
| */ | ||||||
| case class ByteCodeStats(maxClassCodeSize: Int, maxMethodCodeSize: Int, maxConstPoolSize: Int) | ||||||
|
|
||||||
| object ByteCodeStats { | ||||||
|
|
||||||
| val unavailable = ByteCodeStats(-1, -1, -1) | ||||||
|
||||||
| val unavailable = ByteCodeStats(-1, -1, -1) | |
| val UNAVAILABLE = ByteCodeStats(-1, -1, -1) |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmh..do we really need this?
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: inspecting janino private fields.inspecting janino private fields seems weird.
Also: could we always spell "Janino" as such?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh... I'll fix soon. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to make the code more readable, by
val (classSizes, maxMethodSizes, constPoolSize) = classes.map....unzip3
ByteCodeStats(
maxClassCodeSize = classSizes.max,
maxMethodCodeSize = maxMethodSizes.max,
maxConstPoolSize = constPoolSize.max,
// Minus 2 for `GeneratedClass` and an outer-most generated class
numInnerClasses = classSizes.size - 2)
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious: now that we've got a nice new ByteCodeStats type, why use a tuple here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong reason... I just did because I avoided the longer statement in https://github.com/apache/spark/pull/25766/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cR1382;
ByteCodeStats(codeStats.reduce { case (v1, v2) =>
(Math.max(v1.maxClassCodeSize, v2.maxClassCodeSize),
Math.max(v1.maxMethodCodeSize, v2.maxMethodCodeSize),
Math.max(v1.maxConstPoolSize, v2.maxConstPoolSize))
})
If there are other reviewers who like that, I'll update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find named fields much more readable than _1 _2 _3. In fact even with tuples I may have written the code like:
ByteCodeStats(codeStats.reduce { case ((maxClassCodeSize1, maxMethodCodeSize1, maxConstPoolSize), (maxClassCodeSize2, maxMethodCodeSize2, maxConstPoolSize2)) =>
(Math.max(maxClassCodeSize1, maxClassCodeSize2),
Math.max(maxMethodCodeSize1, maxMethodCodeSize2),
Math.max(maxConstPoolSize1, maxConstPoolSize2))
})and...I'd say the v1.maxClassCodeSize version looks better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the latest code? I added a new metric (# of inner classes), so using a tuple in that part is ok?
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this direction.
May we see these values regarding different classes? Is it better to show class name and method name, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this pr prints statistics per a whole-stage codegen entry, so the current one looks ok to me.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,14 +20,15 @@ package org.apache.spark.sql.execution | |
| import java.util.Collections | ||
|
|
||
| import scala.collection.JavaConverters._ | ||
| import scala.util.control.NonFatal | ||
|
|
||
| import org.apache.spark.broadcast.Broadcast | ||
| import org.apache.spark.internal.Logging | ||
| import org.apache.spark.rdd.RDD | ||
| import org.apache.spark.sql._ | ||
| import org.apache.spark.sql.catalyst.InternalRow | ||
| import org.apache.spark.sql.catalyst.expressions.Attribute | ||
| import org.apache.spark.sql.catalyst.expressions.codegen.{CodeFormatter, CodegenContext, ExprCode} | ||
| import org.apache.spark.sql.catalyst.expressions.codegen.{ByteCodeStats, CodeFormatter, CodegenContext, CodeGenerator, ExprCode} | ||
| import org.apache.spark.sql.catalyst.plans.physical.Partitioning | ||
| import org.apache.spark.sql.catalyst.trees.TreeNodeRef | ||
| import org.apache.spark.sql.catalyst.util.StringUtils.StringConcat | ||
|
|
@@ -81,11 +82,14 @@ package object debug { | |
| def writeCodegen(append: String => Unit, plan: SparkPlan): Unit = { | ||
| val codegenSeq = codegenStringSeq(plan) | ||
| append(s"Found ${codegenSeq.size} WholeStageCodegen subtrees.\n") | ||
| for (((subtree, code), i) <- codegenSeq.zipWithIndex) { | ||
| append(s"== Subtree ${i + 1} / ${codegenSeq.size} ==\n") | ||
| for (((subtree, code, codeStats), i) <- codegenSeq.zipWithIndex) { | ||
| val codeStatsStr = s"maxClassCodeSize:${codeStats.maxClassCodeSize} " + | ||
|
||
| s"maxMethodCodeSize:${codeStats.maxMethodCodeSize} " + | ||
| s"maxConstantPoolSize:${codeStats.maxConstPoolSize}" | ||
| append(s"== Subtree ${i + 1} / ${codegenSeq.size} ($codeStatsStr) ==\n") | ||
| append(subtree) | ||
| append("\nGenerated code:\n") | ||
| append(s"${code}\n") | ||
| append(s"$code\n") | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -95,7 +99,7 @@ package object debug { | |
| * @param plan the query plan for codegen | ||
| * @return Sequence of WholeStageCodegen subtrees and corresponding codegen | ||
| */ | ||
| def codegenStringSeq(plan: SparkPlan): Seq[(String, String)] = { | ||
| def codegenStringSeq(plan: SparkPlan): Seq[(String, String, ByteCodeStats)] = { | ||
| val codegenSubtrees = new collection.mutable.HashSet[WholeStageCodegenExec]() | ||
| plan transform { | ||
| case s: WholeStageCodegenExec => | ||
|
|
@@ -105,7 +109,13 @@ package object debug { | |
| } | ||
| codegenSubtrees.toSeq.map { subtree => | ||
| val (_, source) = subtree.doCodeGen() | ||
| (subtree.toString, CodeFormatter.format(source)) | ||
| val codeStats = try { | ||
| CodeGenerator.compile(source)._2 | ||
| } catch { | ||
| case NonFatal(_) => | ||
| ByteCodeStats.unavailable | ||
| } | ||
| (subtree.toString, CodeFormatter.format(source), codeStats) | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -130,7 +140,7 @@ package object debug { | |
| * @param query the streaming query for codegen | ||
| * @return Sequence of WholeStageCodegen subtrees and corresponding codegen | ||
| */ | ||
| def codegenStringSeq(query: StreamingQuery): Seq[(String, String)] = { | ||
| def codegenStringSeq(query: StreamingQuery): Seq[(String, String, ByteCodeStats)] = { | ||
| val w = asStreamExecution(query) | ||
| if (w.lastExecution != null) { | ||
| codegenStringSeq(w.lastExecution.executedPlan) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about adding also the number od inner classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks nice.