Skip to content

[SPARK-22716][SQL] Avoid the creation of mutable states in addReferenceObj#19916

Closed
mgaido91 wants to merge 7 commits intoapache:masterfrom
mgaido91:SPARK-22716
Closed

[SPARK-22716][SQL] Avoid the creation of mutable states in addReferenceObj#19916
mgaido91 wants to merge 7 commits intoapache:masterfrom
mgaido91:SPARK-22716

Conversation

@mgaido91
Copy link
Copy Markdown
Contributor

@mgaido91 mgaido91 commented Dec 6, 2017

What changes were proposed in this pull request?

We have two methods to reference an object addReferenceMinorObj and addReferenceObj . The latter creates a new global variable, which means new entries in the constant pool.

The PR unifies the two method in a single addReferenceObj which returns the code to access the object in the references array and doesn't add new mutable states.

How was this patch tested?

added UTs.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Dec 7, 2017

Test build #84575 has finished for PR 19916 at commit 3004a69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Copy Markdown
Member

viirya commented Dec 7, 2017

val schemaField = ctx.addReferenceObj("schema", schema)

@mgaido91
Copy link
Copy Markdown
Contributor Author

mgaido91 commented Dec 7, 2017

@viirya thanks but that is going to be fixed by #19908, that is why I haven't changed it here. Thanks.

@viirya
Copy link
Copy Markdown
Member

viirya commented Dec 7, 2017

Yea, I noticed that. Thanks.

val plan = df.queryExecution.executedPlan
val sortExec = plan.children.head.asInstanceOf[SortExec]
sortExec.produce(ctx, plan.asInstanceOf[CodegenSupport])
// we expect 8 global variables:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test and that one in BroadcastJoinSuite seems a bit overkill.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was also not very happy with them too, but I have not found a better way to test them. Do you have any idea/suggestion? Thanks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make sure addReferenceMinorObj work well, the two tests may not be necessary, IMHO. Let's wait others options.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, we don't need new tests if we trust addReferenceMinorObj

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I'll remove them

@viirya
Copy link
Copy Markdown
Member

viirya commented Dec 7, 2017

LGTM with one minor comment.

@cloud-fan
Copy link
Copy Markdown
Contributor

Actually I think addReferenceObj is not a strong use case for global variable, how hard it is to remove all of them?

@viirya
Copy link
Copy Markdown
Member

viirya commented Dec 7, 2017

I also have the same feeling when I just looked at addReferenceObj. Seems no necessary reason to have a global variable for such reference object.

@mgaido91
Copy link
Copy Markdown
Contributor Author

mgaido91 commented Dec 7, 2017

my only concern about this PR and about removing addReferenceObj is that we are saving a cast every time. I am not sure about how much this can affect performances, but I am planning to make some tests. Actually after this patch, the only place where it is used is for metric terms.
WDYT?

@mgaido91
Copy link
Copy Markdown
Contributor Author

mgaido91 commented Dec 7, 2017

I am reporting the result of my tests. I run:

   Object a = new Integer(1);
   long start = System.currentTimeMillis();
   for(long i = 0; i < 1000000000; ++ i){
     Integer b = ((Integer)a) + 1;
   }
   long end = System.currentTimeMillis();
   System.out.println("Time for casting: " + (end-start));
   long start1 = System.currentTimeMillis();
   Integer ai = (Integer) a;
   for(long i = 0; i < 1000000000; ++ i){
     Integer b1 = ai + 1;
   }
   long end1 = System.currentTimeMillis();   
   System.out.println("Time without casting: " + (end1-start1));

In the compiled code there is an additional operation checkcast (both with janinoc and javac), but the performance result of running this seems telling that there in not a significant impact:

➜  janino_tests java A                            
Time for casting: 305
Time without casting: 311
➜  janino_tests java A
Time for casting: 305
Time without casting: 302
➜  janino_tests java A
Time for casting: 309
Time without casting: 293
➜  janino_tests java A
Time for casting: 291
Time without casting: 312
➜  janino_tests java A
Time for casting: 297
Time without casting: 294

What do you think given these results? Should I go on with the PR and remove also the remaining addReferenceObj?
Thanks.

@cloud-fan
Copy link
Copy Markdown
Contributor

Thanks for the test! Yea lets' remove all of them!

@SparkQA
Copy link
Copy Markdown

SparkQA commented Dec 8, 2017

Test build #84651 has finished for PR 19916 at commit 29c92c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Copy Markdown
Member

kiszk commented Dec 8, 2017

@mgaido91 Thank you for running the benchmark. I have two questions.

  1. Did you confirm that you measured only the jitted code. This is because this benchmark may not include warm-up run.
  2. Did you have chance that loop body is not eliminated? This is because local variable b and b1 are not read anywhere. Thus, an optimizing compiler may delete an store and related operations.

@mgaido91
Copy link
Copy Markdown
Contributor Author

mgaido91 commented Dec 8, 2017

thanks for your answer @kiszk. I included the code in a main function and run it with the command I showed in the previous comment.
The loop is not eliminated, I checked the java bytecode as said, and the only difference between the two loops is an additional operation checkcast.

For reference I can report here the bytecode:

         0: new           #2                  // class java/lang/Integer
         3: dup
         4: iconst_1
         5: invokespecial #3                  // Method java/lang/Integer."<init>":(I)V
         8: astore_1
         9: invokestatic  #4                  // Method java/lang/System.currentTimeMillis:()J
        12: lstore_2
        13: lconst_0
        14: lstore        4
        16: lload         4
        18: ldc2_w        #5                  // long 1000000000l
        21: lcmp
        22: ifge          48
        25: aload_1
        26: checkcast     #2                  // class java/lang/Integer
        29: invokevirtual #7                  // Method java/lang/Integer.intValue:()I
        32: iconst_1
        33: iadd
        34: invokestatic  #8                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
        37: astore        6
        39: lload         4
        41: lconst_1
        42: ladd
        43: lstore        4
        45: goto          16
        48: invokestatic  #4                  // Method java/lang/System.currentTimeMillis:()J
        51: lstore        4
        53: getstatic     #9                  // Field java/lang/System.out:Ljava/io/PrintStream;
        56: new           #10                 // class java/lang/StringBuilder
        59: dup
        60: invokespecial #11                 // Method java/lang/StringBuilder."<init>":()V
        63: ldc           #12                 // String Time for casting:
        65: invokevirtual #13                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        68: lload         4
        70: lload_2
        71: lsub
        72: invokevirtual #14                 // Method java/lang/StringBuilder.append:(J)Ljava/lang/StringBuilder;
        75: invokevirtual #15                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
        78: invokevirtual #16                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
        81: invokestatic  #4                  // Method java/lang/System.currentTimeMillis:()J
        84: lstore        6
        86: aload_1
        87: checkcast     #2                  // class java/lang/Integer
        90: astore        8
        92: lconst_0
        93: lstore        9
        95: lload         9
        97: ldc2_w        #5                  // long 1000000000l
       100: lcmp
       101: ifge          125
       104: aload         8
       106: invokevirtual #7                  // Method java/lang/Integer.intValue:()I
       109: iconst_1
       110: iadd
       111: invokestatic  #8                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
       114: astore        11
       116: lload         9
       118: lconst_1
       119: ladd
       120: lstore        9
       122: goto          95
       125: invokestatic  #4                  // Method java/lang/System.currentTimeMillis:()J
       128: lstore        9
       130: getstatic     #9                  // Field java/lang/System.out:Ljava/io/PrintStream;
       133: new           #10                 // class java/lang/StringBuilder
       136: dup
       137: invokespecial #11                 // Method java/lang/StringBuilder."<init>":()V
       140: ldc           #17                 // String Time without casting:
       142: invokevirtual #13                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
       145: lload         9
       147: lload         6
       149: lsub
       150: invokevirtual #14                 // Method java/lang/StringBuilder.append:(J)Ljava/lang/StringBuilder;
       153: invokevirtual #15                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
       156: invokevirtual #16                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       159: return

Did I answer properly to your questions?

@kiszk
Copy link
Copy Markdown
Member

kiszk commented Dec 8, 2017

Thank you for checking Java bytecode. I am talking about native code, not about bytecode. Hotspot compiler may eliminate statements.

I think that this elapsed time includes interpreter execution, JIT compilation, and native code execution. It would be good to add warm-up. Did you see the native code sequence generated by the HotSpot compiler?

I think that it is not the best to write a long-running loop in main() method since it leads to on-stack-replacement. It would be good to put the long-running target loop into a callee.
I think that it would be good to store a result into a global variable unlikely to eliminate unused variables and related operations.

How about such a code? WDYT?

Integer globalVar g;
void targetMethod() {
   Integer b;
   for(long i = 0; i < 1000000000; ++ i){
     b += ((Integer)a) + 1;
   }
   g = b;
}
main() { 
  // warm up
  for (int i = 0; i < 10000000; i++) { targetMethod();}

  long start = System.currentTimeMillis();
  targetMethod();
  long end = System.currentTimeMillis();
}

@mgaido91
Copy link
Copy Markdown
Contributor Author

mgaido91 commented Dec 8, 2017

sure, I can make the test as you suggested, I'll report here the results in few minutes, thanks.

@mgaido91
Copy link
Copy Markdown
Contributor Author

mgaido91 commented Dec 8, 2017

Thank you again very much for your comment @kiszk! Results are different now and they show a difference in performance.
The code I am running is:

public class A {
 private static Long g;
 
 public static void main(String[] args){
   A a = new A();
   a.testWithCast();
   a.testWithoutCast();
   long start = System.currentTimeMillis();
   a.testWithCast();
   long end = System.currentTimeMillis();
   System.out.println("Time for casting: " + (end-start));
   long start1 = System.currentTimeMillis();
   a.testWithoutCast();
   long end1 = System.currentTimeMillis();   
   System.out.println("Time without casting: " + (end1-start1));
  
 }
 
 public void testWithCast() {
   Object a = new Long(1);
   Long b = 0L;
   for(long i = 0; i < 1000000000; ++ i){
     b += ((Long)a);
   }
   g = b;
 }
 
 public void testWithoutCast() {
   Object a = new Long(1);
   Long al = (Long) a;
   Long b = 0L;
   for(long i = 0; i < 1000000000; ++ i){
     b += al;
   }
   g = b;
 }
}

and the results now are:

➜  janino_tests java A
Time for casting: 3484
Time without casting: 3393
➜  janino_tests java A
Time for casting: 3348
Time without casting: 3262
➜  janino_tests java A
Time for casting: 3491
Time without casting: 3281
➜  janino_tests java A
Time for casting: 3379
Time without casting: 3416
➜  janino_tests java A
Time for casting: 3358
Time without casting: 3365
➜  janino_tests java A
Time for casting: 3276
Time without casting: 3244
➜  janino_tests java A
Time for casting: 3628
Time without casting: 3259
➜  janino_tests java A
Time for casting: 3344
Time without casting: 3300

Thus the time needed for casting is visible now.
The average time without the cast is 3331 ms while the average time with the cast is 3413 ms, which means a 2.4% difference of overall time.
I am not sure, then, that replacing the old method is the right thing to do. Anyway, we can also consider that in this case I am just performing a very simple operation other than the cast, thus in real use cases the impact would be much lower.
WDYT?

PS I edited the comment because I forget the warmup cycle yesterday. Thanks again to @kiszk for his suggestions about how to run properly this benchmark.

@cloud-fan
Copy link
Copy Markdown
Contributor

In a real application I think the difference is much smaller than 2.4%. So I think it's ok to remove addReferenceObj.

One problem is readability of the generated code, may be we can generate
(MyClass) reference[3] /* variableNane */

@mgaido91
Copy link
Copy Markdown
Contributor Author

@cloud-fan thanks for your answer. I don't think that something like:

((MyClass) reference[3] /* variableName */).doSomething();

would be readable. But it is just my opinion. Do you want me to add a local variable to generate more readable code? Or if you think that adding the comment is the right thing to do, I can do that.
Thanks.

@cloud-fan
Copy link
Copy Markdown
Contributor

adding a local variable to generate more readable code is better, but that needs a lot of caller-side change, which may not worth.

@mgaido91
Copy link
Copy Markdown
Contributor Author

@cloud-fan I think a lot of caller-side changes would be needed as well, since now we are not passing any variable name when we are referencing with addReferenceMinorObj. I am not sure that adding a new argument to addReferenceMinorObj only for the comment is the right way to go.

Sorry for this additional comment, I just want to make sure that I am providing you all the details to choose the best option. I swear that now if you say, let's add the comment, I'll do.

@cloud-fan
Copy link
Copy Markdown
Contributor

I think a lot of caller-side changes would be needed as well, since now we are not passing any variable name

I mean something like

def addReferenceObj(name: String, obj: Any, className: String = null): String = {
  val idx = references.length
  references += obj
  val clsName = Option(className).getOrElse(obj.getClass.getName)
  val comment = Option(name).map(n => s"/* $name */").getOrElse("")
    s"(($clsName) references[$idx] $comment)"
  }

def addReferenceMinorObj(obj: Any, className: String = null): String = {
  addReferenceObj(name = null, obj, className)
} 

Then no caller side change is needed.

In the future we can also remove addReferenceMinorObj as it's always better to have a name.

@mgaido91
Copy link
Copy Markdown
Contributor Author

@cloud-fan thanks I am doing this. Actually I already removed addReferenceMinorObj in this PR. Thanks.

@kiszk
Copy link
Copy Markdown
Member

kiszk commented Dec 11, 2017

I have a question about benchmark. What is the purpose of this warmup? Why does this code perform warmup run for a loop will not be measured?

@mgaido91
Copy link
Copy Markdown
Contributor Author

@kiszk sorry, may I please ask you to elaborate a bit more what you meant? Thanks.

case other =>
ev.copy(code = "", value = ctx.addReferenceMinorObj(value, ctx.javaType(dataType)))
case _ =>
ev.copy(code = "", value = ctx.addReferenceObj("literalValue", value,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: literal instead of literalValue

test(null, null, null)
}

test("SPARK-22716: UnixTimestamp should not use global variables") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need a lot of this kind of tests, just one test to make sure ctx.addReferenceObj doesn't add global variable.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Dec 11, 2017

Test build #84716 has finished for PR 19916 at commit bfa3bae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Dec 12, 2017

Test build #84762 has finished for PR 19916 at commit 9bbfe2a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Copy Markdown
Contributor Author

Jenkins, retest this please

@cloud-fan
Copy link
Copy Markdown
Contributor

LGTM, pending jenkins

@SparkQA
Copy link
Copy Markdown

SparkQA commented Dec 12, 2017

Test build #84767 has finished for PR 19916 at commit 9bbfe2a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Copy Markdown
Member

viirya commented Dec 12, 2017

Looks like a valid test failure.

@mgaido91
Copy link
Copy Markdown
Contributor Author

@viirya it is caused by the fact that the generated code now is bigger and it triggered reduceCodeSize for ScalaUDF. I fixed the UT. Thanks.

@kiszk
Copy link
Copy Markdown
Member

kiszk commented Dec 12, 2017

@mgaido91 I am asking the following code. This code will be translated to the native code. However, I think that this code does not make any affect to testWithCast and testWithoutCast.
Did you confirm that this measurement does not include the JIT compilation time?

   // warmup cycle
   Long b = 0L;
   for(long i = 0; i < 1000000000; ++ i){
     b += al;
   }

@mgaido91
Copy link
Copy Markdown
Contributor Author

@kiszk, yes, sorry, I see now that the code I pasted is not updated. I will update it immediately. I am calling the tho methods instead of that fake warmup cycle. Sorry, my bad. I am updating my previous comment with the right code, sorry.

@viirya
Copy link
Copy Markdown
Member

viirya commented Dec 12, 2017

The PR title and description need to be modified accordingly.

@mgaido91 mgaido91 changed the title [SPARK-22716][SQL] Replace addReferenceObj to reduce the number of global variables [SPARK-22716][SQL] Avoid the creation of mutable states in addReferenceObj Dec 12, 2017
@SparkQA
Copy link
Copy Markdown

SparkQA commented Dec 12, 2017

Test build #84771 has finished for PR 19916 at commit 8c6dfb1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Copy Markdown
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 4117786 Dec 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants