Provide alternative to warming up multiple identical Contexts #67

Closed · japgolly opened this issue Oct 31, 2018 · 14 comments

Labels: enhancement (New feature or request), performance (Performance of the engine (peak or warmup))

@japgolly commented Oct 31, 2018

I'm planning to create a pool of Contexts that will all do the same thing, and that I warm up for ~30 seconds first.

The problem is that even though I'm sharing the same Engine between the Contexts, and evaluating the same Sources in each in the same order, context B doesn't benefit from context A's warmup. I was assuming the Engine is where all of the JIT state would live, so that warming up one context would warm up the engine and thereby effectively warm up all contexts.
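
For concreteness, a simplified sketch of this setup (pool size, script, and warmup counts are illustrative):

import java.io.IOException;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class ContextPool {
  public static void main(String[] args) throws IOException {
    Engine engine = Engine.create(); // shared by every context in the pool
    Source source = Source.newBuilder("js", "1 + 1", "pool.js").build();

    Context[] pool = new Context[4];
    for (int i = 0; i < pool.length; i++) {
      pool[i] = Context.newBuilder("js").engine(engine).build();
      // Each context currently has to be warmed up on its own:
      // warming up pool[0] does not make pool[1] fast.
      for (int j = 0; j < 10000; j++) {
        pool[i].eval(source);
      }
    }
  }
}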

Is this as designed?

If so, what options are there for sharing optimisation between identical contexts? Being used to FP and everything being immutable, an idea I had was to create and warm up one Context, then copy it to create the other Contexts (from which point they could diverge), but I didn't see such an option. Is there something similar available?

Even better would be the ability to warm up a Context and then serialise it! That would allow us to create one at build time, include it in our applications' Docker images, and then on startup simply deserialise it multiple times, avoiding warmup at application startup entirely.

What do you think?

@japgolly japgolly changed the title Despite sharing an Engine instance, each Context needs warming up Provide alternative to warming up multiple identical Contexts Oct 31, 2018
@chumer (Member) commented Oct 31, 2018

Thank you for this excellent question.

Is this as designed?

How much code is reused for multiple contexts per engine currently depends on the language. For Graal.JS, the ASTs and the code cache are currently only shared if the previous context was closed. This is necessary for now because we are not yet fully confident about sharing the ASTs across multiple threads.

To clarify, this one shares the code:

Engine engine = Engine.create();
Source source = ...;
try(Context context = Context.newBuilder().engine(engine).build()) {
  context.eval(source);
}

And this one doesn't at the moment:

Engine engine = Engine.create();
Source source = ...;
Context c0 = Context.newBuilder().engine(engine).build();
c0.eval(source);
Context c1 = Context.newBuilder().engine(engine).build();
c1.eval(source);
c0.close();
c1.close();

I agree the current situation is not ideal. This is what we will do to improve it:

  1. Short term (1-2 months): We will allow binding a context to a single thread only. This will let us share code between multiple contexts if they are used on the same thread.
  2. Mid term (~6 months): We will allow sharing code from any thread for Graal.JS.
  3. Long term (TBA): Support serialization of SVM (native-image) isolates (that is very far out).

Would 1) help you already? Or do you need your contexts on multiple threads?

@chumer chumer self-assigned this Oct 31, 2018
@chumer chumer added the enhancement New feature or request label Oct 31, 2018
@japgolly (Author) commented Nov 1, 2018

Hi @chumer, thanks a lot for that info!

For Graal.JS, the ASTs and the code cache are currently only shared if the previous context was closed

OK great. I tried something similar to the code below, and it didn't perform the way I expected: I expected the eval in the second try block to be very fast, but it was still very slow, as if it hadn't been warmed up yet.

Engine engine = Engine.create();
Source source = ...;
try(Context context = Context.newBuilder().engine(engine).build()) {
  for (int i = 0; i < 10000; i++) context.eval(source);
  context.eval(source); // <-- at this point, eval takes ~5 ms
}
try(Context context = Context.newBuilder().engine(engine).build()) {
  context.eval(source); // <-- at this point, eval takes 300+ ms
}

Would 1) help you already? Or do you need your contexts on multiple threads?

I don't think that would help. My goal is to have a number of separate contexts, each bound to a specific thread. In other words, I don't want to share contexts between threads; I'd just like to create one context, warm it up, create warm copies, and then allocate it and the copies to one thread each. In simple sample code, I'm envisioning something like this:

Engine engine = Engine.create();

Context context1 = Context.newBuilder().engine(engine).build();
warmup(context1);

Context context2 = context1.copy(); // hypothetical copy() API
Context context3 = context1.copy();

Context[] contextsPerThread = { context1, context2, context3 };

Whereas at the moment I need to do something like this:

Engine engine = Engine.create();

Context context1 = Context.newBuilder().engine(engine).build();
warmup(context1);

Context context2 = Context.newBuilder().engine(engine).build();
warmup(context2);

Context context3 = Context.newBuilder().engine(engine).build();
warmup(context3);

Context[] contextsPerThread = { context1, context2, context3 };

(Oh, and by the way, I absolutely ❤️ love ❤️ the work that you all have done on Graal! It's amazing! Thank you so much for all you have done and will continue to do; it's much appreciated!)

@chumer (Member) commented Nov 1, 2018

We will have a look at your example. Just to clarify which GraalVM version are you using?

@japgolly (Author) commented Nov 1, 2018 via email

@japgolly (Author) commented Nov 1, 2018 via email

@ghost commented Nov 1, 2018

It looks like the warmup overhead disappears after a few contexts.

import java.io.IOException;

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class Warmup {

  private static final int LOOPS = 6;
  private static final int REPS = 100;

  private static final String source =
      "function showProps(obj, max) {" +
      "  for (var i=0; i<max; i++) {" +
      "    for (var prop in obj) {" +
      "      let x = prop + ' : ' + obj[prop];" +
      "    }" +
      "  }" +
      "};" +
      "var obj = {name: 'Graaljs', lang: 'javascript', doStuff: function(x,y,z) {var dummy = x + y + z}};" +
      "showProps(obj, 1000)";

  public static void main(String[] args) throws IOException {
    new Warmup().start(source);
  }

  private void start(String script) throws IOException {
    int contextNumber = 0;
    Source source = Source.newBuilder("js", script, "somescript").build();
    try (Engine e = Engine.create()) {
      while (contextNumber++ < 25) {
        try (Context context = Context.newBuilder("js").engine(e).build()) {
          System.out.println("Next Context: " + contextNumber);
          timeExecution(LOOPS, REPS, () -> {
            context.eval(source);
          });
        }
      }
    }
  }

  private void timeExecution(int loops, int repeats, Runnable code) {
    for (int i = 0; i < loops; i++) {
      long t1 = System.currentTimeMillis();
      for (int j = 0; j < repeats; j++) {
        code.run();
      }
      System.out.println((System.currentTimeMillis() - t1) + "ms");
    }
  }

}

After the 6th context, there is a sudden speedup.
But if we let the JS code loop more times, by using showProps(obj, 10000), the warmup overhead appears to be gone, although the overall execution is a bit slower.

This is on Windows, OpenJDK 11, with the rc8 graal/truffle/graaljs jars.

@wirthi (Member) commented Nov 2, 2018

Hi @hanzr

What do you mean by "overall execution is a bit slower"?

With REPS=100, I get a peak of 73ms on my machine; increasing that to REPS=10000, the peak (after warmup) is around 3950ms. Per rep, that is almost 2x faster (0.73 ms vs. ~0.40 ms).

But what I also see is that with REPS=10000, the first set of 6 iterations is faster than the second set of 6 iterations (at least when excluding the first iteration of each set):

Next Context: 1
6735ms
4006ms
3881ms
3870ms
3837ms
3879ms
Next Context: 2
5703ms
4432ms
4384ms
4432ms
4483ms
4433ms

The reason seems to be that in the first set we speculate on there being only one Context, while in the second set this assumption no longer holds. With graal.TraceTruffleCompilation enabled, you can see this invalidation being triggered at the start of the second set:

[truffle] opt invalidated @1a64e0c1 |SourceClass OptimizedAssumption |Source Assumption(valid, name=Single Context) |Reason assumption invalidated

@chumer can go into more detail, but I think this drop is somewhat expected.

@ghost commented Nov 2, 2018

Hi Christian,

I'm getting these numbers for REPS=100 and showProps(obj, 10000):

Next Context: 5
1166ms
538ms
559ms
576ms
559ms
546ms
Next Context: 6
1223ms
739ms
736ms
739ms
739ms
740ms
Next Context: 7
740ms
734ms
730ms
727ms
741ms
738ms

For context 6, the timings (after the first one) are slower than those of the earlier contexts. Then from context 7 onward, there seems to be no more slow first timing (no more warmup?), and times remain pretty constant after that.
With graal.TraceTruffleCompilation enabled, context 6 is the last one that shows any logging.

@ghost commented Nov 2, 2018

It just occurred to me that in my example I'm closing the contexts after use. When I don't, every new context keeps the initial warmup overhead, as @japgolly states.

@chumer Could you elaborate a bit more on the improvements 1&2 you mention above?

@japgolly (Author) commented Nov 3, 2018

Hi. I've created a reproduction repo that you can run to demonstrate the kind of results I'm seeing.

To reproduce, install SBT (the Scala Build Tool), check out https://github.com/japgolly/misc/tree/graaljs-67 and type sbt run. The main source code is here.

What it does is:

  1. Create a Context, warm it up by running eval 10000 times, then benchmark 100 evals.
  2. Create a second Context using the same Engine, and repeat the 100-eval benchmark (without warmup).
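
A minimal Java sketch of that procedure (the script, class name, and percentile arithmetic here are illustrative, not the repo's actual code):

import java.util.Arrays;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class Repro {

  public static void main(String[] args) throws Exception {
    Engine engine = Engine.create();
    Source source = Source.newBuilder(
        "js", "(function() { var s = 0; for (var i = 0; i < 1000; i++) s += i; return s; })()",
        "bench.js").build();

    try (Context warm = Context.newBuilder("js").engine(engine).build()) {
      for (int i = 0; i < 10000; i++) warm.eval(source); // warmup
      benchmark(warm, source); // fast after warmup
    }
    try (Context cold = Context.newBuilder("js").engine(engine).build()) {
      benchmark(cold, source); // same Engine, but slow again
    }
  }

  static void benchmark(Context context, Source source) {
    long[] ms = new long[100];
    for (int i = 0; i < ms.length; i++) {
      long t0 = System.nanoTime();
      context.eval(source);
      ms[i] = (System.nanoTime() - t0) / 1_000_000;
    }
    Arrays.sort(ms);
    for (int p : new int[] {50, 90, 95, 98, 99}) {
      System.out.printf("p%d = %3d ms%n", p, ms[ms.length * p / 100]);
    }
  }
}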

Results are:

[info] ================================================================================
[info] Warming up (10000) ...
[info] Benchmarking (100) ...
[info] p50 =   1 ms
[info] p90 =   1 ms
[info] p95 =   1 ms
[info] p98 =   2 ms
[info] p99 =   2 ms
[info] ================================================================================
[info] Benchmarking (100) ...
[info] p50 =  18 ms
[info] p90 =  32 ms
[info] p95 =  35 ms
[info] p98 =  45 ms
[info] p99 = 303 ms
[info] ================================================================================
[success] Total time: 64 s, completed 03/11/2018 11:06:18 PM

and if I don't call .close() on the first context, the results look like this:

[info] ================================================================================
[info] Warming up (10000) ...
[info] Benchmarking (100) ...
[info] p50 =   1 ms
[info] p90 =   1 ms
[info] p95 =   1 ms
[info] p98 =   1 ms
[info] p99 =   2 ms
[info] ================================================================================
[info] Benchmarking (100) ...
[info] p50 =  17 ms
[info] p90 =  38 ms
[info] p95 =  43 ms
[info] p98 =  68 ms
[info] p99 = 440 ms
[info] ================================================================================
[success] Total time: 66 s, completed 03/11/2018 11:08:45 PM

@japgolly (Author) commented Nov 4, 2018

FYI, I wrote some JMH benchmarks to test different combinations of warmup contexts and reps. Warmup-effectiveness results are here, and I also did a bit of measurement around the time needed to perform warmup here.
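
(The linked benchmarks are the authoritative version; the following is only a minimal sketch of the general shape such a JMH benchmark can take, with illustrative parameters and script.)

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Thread)
public class ContextWarmupBench {

  @Param({"0", "1", "4"})
  int warmedUpContexts; // identical contexts warmed up (and closed) beforehand

  Engine engine;
  Source source;
  Context context;

  @Setup
  public void setup() throws Exception {
    engine = Engine.create();
    source = Source.newBuilder("js", "1 + 1", "bench.js").build();
    for (int i = 0; i < warmedUpContexts; i++) {
      try (Context c = Context.newBuilder("js").engine(engine).build()) {
        for (int j = 0; j < 10000; j++) {
          c.eval(source);
        }
      }
    }
    context = Context.newBuilder("js").engine(engine).build();
  }

  @TearDown
  public void tearDown() {
    context.close();
    engine.close();
  }

  @Benchmark
  public Object eval() {
    return context.eval(source); // measures eval on the (possibly pre-warmed) engine
  }
}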

@karthickpdy commented

We are also looking to migrate from Nashorn to Graal, and this is one of our biggest blockers. In Nashorn we could compile a script just once and use the same compiled object across multiple threads. You could argue that's not thread-safe, but most of our functions have no side effects, so it doesn't pose much of a problem for us. In Graal, however, we have to initialize a context per thread and pay the warmup cost each time.
We would need the ability to share warmed-up code across contexts. This is a big blocker for us, and we will have to make a decision based on this feature. Is this being actively worked on?
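
A rough sketch of the per-thread pattern described above (names and script are illustrative; one Context per thread over a shared Engine, each paying its own warmup cost):

import java.io.IOException;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class PerThreadContexts {

  static final Engine ENGINE = Engine.create(); // shared across all threads
  static final Source SOURCE;
  static {
    try {
      SOURCE = Source.newBuilder("js", "(function(x) { return x * 2; })", "fn.js").build();
    } catch (IOException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // One Context per thread; each context pays its own warmup cost.
  static final ThreadLocal<Context> CONTEXT =
      ThreadLocal.withInitial(() -> Context.newBuilder("js").engine(ENGINE).build());

  static int doubleIt(int x) {
    // eval returns the JS function; execute calls it.
    return CONTEXT.get().eval(SOURCE).execute(x).asInt();
  }
}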

@wirthi (Member) commented Mar 8, 2019

Hi @karthickpdy,

did you follow the description in http://www.graalvm.org/docs/graalvm-as-a-platform/embed/#enable-source-caching ? As described above, multi-threading might have an impact on warmup and the first few iterations (as our compiler is competing for CPU resources with other threads), but source caching, and thus the reuse of compiled methods across Contexts, should work regardless of multi-threading (at least in principle, depending on the actual code being compiled).
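
Spelled out as a minimal sketch (the cached(true) call is the builder's default and is shown only for emphasis; the script and names are illustrative):

import java.io.IOException;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class CodeCaching {
  public static void main(String[] args) throws IOException {
    try (Engine engine = Engine.create()) {
      // Build the Source once and reuse the same instance in every Context.
      Source source = Source.newBuilder("js", "21 + 21", "cached.js")
          .cached(true) // default; code caching also requires the shared Engine
          .build();
      for (int i = 0; i < 3; i++) {
        try (Context context = Context.newBuilder("js").engine(engine).build()) {
          System.out.println(context.eval(source)); // later contexts can reuse compiled code
        }
      }
    }
  }
}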

-- Christian

@wirthi wirthi added the performance Performance of the engine (peak or warmup) label Mar 8, 2019
@wirthi (Member) commented Nov 18, 2020

I don't think there is anything open here. Documentation on source-code caching is at https://www.graalvm.org/reference-manual/embed-languages/#code-caching-across-multiple-contexts

Please reopen this ticket with a clarifying question or open a new ticket if anything is unclear there.

@wirthi wirthi closed this as completed Nov 18, 2020