Provide alternative to warming up multiple identical Contexts #67

Closed · japgolly opened this issue Oct 31, 2018 · 14 comments

Labels: enhancement (New feature or request), performance (Performance of the engine (peak or warmup))

@japgolly commented Oct 31, 2018

I'm planning to create a pool of Contexts that will all do the same thing, and that I warm up for ~30 seconds first.

The problem is that even though I'm sharing the same Engine between the Contexts, and evaluating the same Sources in each in the same order, context B doesn't benefit from context A's warmup. I was assuming the Engine is where all of the JIT state would live, so that warming up one context would warm up the engine and thereby effectively warm up all contexts.
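
For concreteness, a simplified sketch of this setup (pool size, script, and warmup counts are illustrative):

import java.io.IOException;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class ContextPool {
  public static void main(String[] args) throws IOException {
    Engine engine = Engine.create(); // shared by every context in the pool
    Source source = Source.newBuilder("js", "1 + 1", "pool.js").build();

    Context[] pool = new Context[4];
    for (int i = 0; i < pool.length; i++) {
      pool[i] = Context.newBuilder("js").engine(engine).build();
      // Each context currently has to be warmed up on its own:
      // warming up pool[0] does not make pool[1] fast.
      for (int j = 0; j < 10000; j++) {
        pool[i].eval(source);
      }
    }
  }
}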

Is this as designed?

If so, what options are there for sharing optimisation between identical contexts? Being used to FP and everything being immutable, an idea I had was to create and warm up one Context, then copy it to create the other Contexts (from which point they could diverge), but I didn't see such an option. Is there something similar available?

Even better would be the ability to warm up a Context and then serialise it! That would allow us to create one at build time, include it in our applications' Docker images, and then on startup simply deserialise it multiple times, avoiding warmup at application startup entirely.

What do you think?

@japgolly japgolly changed the title Despite sharing an Engine instance, each Context needs warming up Provide alternative to warming up multiple identical Contexts Oct 31, 2018
@chumer (Member) commented Oct 31, 2018

Thank you for this excellent question.

Is this as designed?

How much code is reused for multiple contexts per engine currently depends on the language. For Graal.JS, the ASTs and the code cache are currently only shared if the previous context was closed. This is necessary for now because we are not yet fully confident about sharing the ASTs across multiple threads.

To clarify, this one shares the code:

Engine engine = Engine.create();
Source source = ...;
try(Context context = Context.newBuilder().engine(engine).build()) {
  context.eval(source);
}

And this one doesn't at the moment:

Engine engine = Engine.create();
Source source = ...;
Context c0 = Context.newBuilder().engine(engine).build();
c0.eval(source);
Context c1 = Context.newBuilder().engine(engine).build();
c1.eval(source);
c0.close();
c1.close();

I agree the current situation is not ideal. This is what we will do to improve it:

  1. Short term (1-2 months): We will allow binding a context to a single thread only. This will let us share code between multiple contexts if they are used on the same thread.
  2. Mid term (~6 months): We will allow sharing code from any thread for Graal.JS.
  3. Long term (TBA): Support serialization of SVM (native-image) isolates (that is very far out).

Would 1) help you already? Or do you need your contexts on multiple threads?

@chumer chumer self-assigned this Oct 31, 2018
@chumer chumer added the enhancement New feature or request label Oct 31, 2018
@japgolly (Author) commented Nov 1, 2018

Hi @chumer, thanks a lot for that info!

For Graal.JS, the ASTs and the code cache are currently only shared if the previous context was closed

OK great. I tried something similar to the code below, and it didn't perform the way I expected: I expected the eval in the second try block to be very fast, but it was still very slow, as if it hadn't been warmed up yet.

Engine engine = Engine.create();
Source source = ...;
try(Context context = Context.newBuilder().engine(engine).build()) {
  for (int i = 0; i < 10000; i++) context.eval(source);
  context.eval(source); // <-- at this point, eval takes ~5 ms
}
try(Context context = Context.newBuilder().engine(engine).build()) {
  context.eval(source); // <-- at this point, eval takes 300+ ms
}

Would 1) help you already? Or do you need your contexts on multiple threads?

I don't think that would help. My goal is to have a number of separate contexts, each bound to a specific thread. In other words, I don't want to share contexts between threads; I'd just like to create one context, warm it up, create warm copies, and then allocate it and the copies to one thread each. In simple sample code, I'm envisioning something like this:

Engine engine = Engine.create();

Context context1 = Context.newBuilder().engine(engine).build();
warmup(context1);

Context context2 = context1.copy(); // hypothetical copy() API
Context context3 = context1.copy();

Context[] contextsPerThread = { context1, context2, context3 };

Whereas at the moment I need to do something like this:

Engine engine = Engine.create();

Context context1 = Context.newBuilder().engine(engine).build();
warmup(context1);

Context context2 = Context.newBuilder().engine(engine).build();
warmup(context2);

Context context3 = Context.newBuilder().engine(engine).build();
warmup(context3);

Context[] contextsPerThread = { context1, context2, context3 };

(Oh, and by the way, I absolutely ❤️ love ❤️ the work that you all have done on Graal! It's amazing! Thank you so much for all you have done and will continue to do; it's much appreciated!)

@chumer (Member) commented Nov 1, 2018

We will have a look at your example. Just to clarify which GraalVM version are you using?

@japgolly (Author) commented Nov 1, 2018 via email

@japgolly (Author) commented Nov 1, 2018 via email

@ghost commented Nov 1, 2018

It looks like the warmup overhead disappears after a few contexts.

import java.io.IOException;

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class Warmup {

  private static final int LOOPS = 6;
  private static final int REPS = 100;

  private static final String source =
      "function showProps(obj, max) {" +
      "  for (var i=0; i<max; i++) {" +
      "    for (var prop in obj) {" +
      "      let x = prop + ' : ' + obj[prop];" +
      "    }" +
      "  }" +
      "};" +
      "var obj = {name: 'Graaljs', lang: 'javascript', doStuff: function(x,y,z) {var dummy = x + y + z}};" +
      "showProps(obj, 1000)";

  public static void main(String[] args) throws IOException {
    new Warmup().start(source);
  }

  private void start(String script) throws IOException {
    int contextNumber = 0;
    Source source = Source.newBuilder("js", script, "somescript").build();
    try (Engine e = Engine.create()) {
      while (contextNumber++ < 25) {
        try (Context context = Context.newBuilder("js").engine(e).build()) {
          System.out.println("Next Context: " + contextNumber);
          timeExecution(LOOPS, REPS, () -> {
            context.eval(source);
          });
        }
      }
    }
  }

  private void timeExecution(int loops, int repeats, Runnable code) {
    for (int i = 0; i < loops; i++) {
      long t1 = System.currentTimeMillis();
      for (int j = 0; j < repeats; j++) {
        code.run();
      }
      System.out.println((System.currentTimeMillis() - t1) + "ms");
    }
  }

}

After the 6th context, there is a sudden speedup.
But if we let the JS code loop more times, by using showProps(obj, 10000), the warmup overhead appears to be gone, although the overall execution is a bit slower.

This is on Windows, OpenJDK 11, with the rc8 graal/truffle/graaljs jars.

@wirthi (Member) commented Nov 2, 2018

Hi @hanzr

What do you mean by "overall execution is a bit slower"?

With REPS=100, I get a peak of 73ms on my machine; increasing that to REPS=10000, the peak (after warmup) is around 3950ms. Per rep, that is almost 2x faster (0.73 ms vs. ~0.40 ms).

But what I also see is that with REPS=10000, the first set of 6 iterations is faster than the second set of 6 iterations (at least when excluding the first iteration of each set):

Next Context: 1
6735ms
4006ms
3881ms
3870ms
3837ms
3879ms
Next Context: 2
5703ms
4432ms
4384ms
4432ms
4483ms
4433ms

The reason seems to be that in the first set we speculate on there being only one Context, while in the second set this assumption no longer holds. With graal.TraceTruffleCompilation enabled, you can see this invalidation being triggered at the start of the second set:

[truffle] opt invalidated @1a64e0c1 |SourceClass OptimizedAssumption |Source Assumption(valid, name=Single Context) |Reason assumption invalidated

@chumer can go into more detail, but I think this drop is somewhat expected.

@ghost commented Nov 2, 2018

Hi Christian,

I'm getting these numbers for REPS=100 and showProps(obj, 10000):

Next Context: 5
1166ms
538ms
559ms
576ms
559ms
546ms
Next Context: 6
1223ms
739ms
736ms
739ms
739ms
740ms
Next Context: 7
740ms
734ms
730ms
727ms
741ms
738ms

For context 6, the timings (after the first one) are slower than those of the earlier contexts. Then from context 7 onward, there seems to be no more slow first timing (no more warmup?), and times remain pretty constant after that.
With graal.TraceTruffleCompilation enabled, context 6 is the last one that shows any logging.

@ghost commented Nov 2, 2018

It just occurred to me that in my example I'm closing the contexts after use. When I don't, every new context keeps the initial warmup overhead, as @japgolly states.

@chumer Could you elaborate a bit more on the improvements 1&2 you mention above?

@japgolly (Author) commented Nov 3, 2018

Hi. I've created a reproduction repo that you can run to demonstrate the kind of results I'm seeing.

To reproduce, install SBT (the Scala Build Tool), check out https://github.com/japgolly/misc/tree/graaljs-67 and type sbt run. The main source code is here.

What it does is:

  1. Create a Context, warm it up by running eval 10000 times, then benchmark 100 evals.
  2. Create a second Context using the same Engine, and repeat the 100-eval benchmark (without warmup).
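
A minimal Java sketch of that procedure (the script, class name, and percentile arithmetic here are illustrative, not the repo's actual code):

import java.util.Arrays;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class Repro {

  public static void main(String[] args) throws Exception {
    Engine engine = Engine.create();
    Source source = Source.newBuilder(
        "js", "(function() { var s = 0; for (var i = 0; i < 1000; i++) s += i; return s; })()",
        "bench.js").build();

    try (Context warm = Context.newBuilder("js").engine(engine).build()) {
      for (int i = 0; i < 10000; i++) warm.eval(source); // warmup
      benchmark(warm, source); // fast after warmup
    }
    try (Context cold = Context.newBuilder("js").engine(engine).build()) {
      benchmark(cold, source); // same Engine, but slow again
    }
  }

  static void benchmark(Context context, Source source) {
    long[] ms = new long[100];
    for (int i = 0; i < ms.length; i++) {
      long t0 = System.nanoTime();
      context.eval(source);
      ms[i] = (System.nanoTime() - t0) / 1_000_000;
    }
    Arrays.sort(ms);
    for (int p : new int[] {50, 90, 95, 98, 99}) {
      System.out.printf("p%d = %3d ms%n", p, ms[ms.length * p / 100]);
    }
  }
}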

Results are:

[info] ================================================================================
[info] Warming up (10000) ...
[info] Benchmarking (100) ...
[info] p50 =   1 ms
[info] p90 =   1 ms
[info] p95 =   1 ms
[info] p98 =   2 ms
[info] p99 =   2 ms
[info] ================================================================================
[info] Benchmarking (100) ...
[info] p50 =  18 ms
[info] p90 =  32 ms
[info] p95 =  35 ms
[info] p98 =  45 ms
[info] p99 = 303 ms
[info] ================================================================================
[success] Total time: 64 s, completed 03/11/2018 11:06:18 PM

and if I don't call .close() on the first context, the results look like this:

[info] ================================================================================
[info] Warming up (10000) ...
[info] Benchmarking (100) ...
[info] p50 =   1 ms
[info] p90 =   1 ms
[info] p95 =   1 ms
[info] p98 =   1 ms
[info] p99 =   2 ms
[info] ================================================================================
[info] Benchmarking (100) ...
[info] p50 =  17 ms
[info] p90 =  38 ms
[info] p95 =  43 ms
[info] p98 =  68 ms
[info] p99 = 440 ms
[info] ================================================================================
[success] Total time: 66 s, completed 03/11/2018 11:08:45 PM

@japgolly (Author) commented Nov 4, 2018

FYI, I wrote some JMH benchmarks to test different combinations of warmup contexts and reps. Warmup-effectiveness results are here, and I also did a bit of measurement around the time needed to perform warmup here.
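
(The linked benchmarks are the authoritative version; the following is only a minimal sketch of the general shape such a JMH benchmark can take, with illustrative parameters and script.)

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Thread)
public class ContextWarmupBench {

  @Param({"0", "1", "4"})
  int warmedUpContexts; // identical contexts warmed up (and closed) beforehand

  Engine engine;
  Source source;
  Context context;

  @Setup
  public void setup() throws Exception {
    engine = Engine.create();
    source = Source.newBuilder("js", "1 + 1", "bench.js").build();
    for (int i = 0; i < warmedUpContexts; i++) {
      try (Context c = Context.newBuilder("js").engine(engine).build()) {
        for (int j = 0; j < 10000; j++) {
          c.eval(source);
        }
      }
    }
    context = Context.newBuilder("js").engine(engine).build();
  }

  @TearDown
  public void tearDown() {
    context.close();
    engine.close();
  }

  @Benchmark
  public Object eval() {
    return context.eval(source); // measures eval on the (possibly pre-warmed) engine
  }
}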

@karthickpdy commented

We are also looking to migrate from Nashorn to Graal, and this is one of our biggest blockers. In Nashorn we could compile a script just once and use the same compiled object across multiple threads. You could argue that's not thread-safe, but most of our functions have no side effects, so it doesn't pose much of a problem for us. In Graal, however, we have to initialize a context per thread and pay the warmup cost each time.
We would need the ability to share warmed-up code across contexts. This is a big blocker for us, and we will have to make a decision based on this feature. Is this being actively worked on?
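
A rough sketch of the per-thread pattern described above (names and script are illustrative; one Context per thread over a shared Engine, each paying its own warmup cost):

import java.io.IOException;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class PerThreadContexts {

  static final Engine ENGINE = Engine.create(); // shared across all threads
  static final Source SOURCE;
  static {
    try {
      SOURCE = Source.newBuilder("js", "(function(x) { return x * 2; })", "fn.js").build();
    } catch (IOException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // One Context per thread; each context pays its own warmup cost.
  static final ThreadLocal<Context> CONTEXT =
      ThreadLocal.withInitial(() -> Context.newBuilder("js").engine(ENGINE).build());

  static int doubleIt(int x) {
    // eval returns the JS function; execute calls it.
    return CONTEXT.get().eval(SOURCE).execute(x).asInt();
  }
}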

@wirthi (Member) commented Mar 8, 2019

Hi @karthickpdy,

did you follow the description in http://www.graalvm.org/docs/graalvm-as-a-platform/embed/#enable-source-caching ? As described above, multi-threading might have an impact on warmup and the first few iterations (as our compiler is competing for CPU resources with other threads), but source caching, and thus the reuse of compiled methods across Contexts, should work regardless of multi-threading (at least in principle, depending on the actual code being compiled).
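
Spelled out as a minimal sketch (the cached(true) call is the builder's default and is shown only for emphasis; the script and names are illustrative):

import java.io.IOException;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class CodeCaching {
  public static void main(String[] args) throws IOException {
    try (Engine engine = Engine.create()) {
      // Build the Source once and reuse the same instance in every Context.
      Source source = Source.newBuilder("js", "21 + 21", "cached.js")
          .cached(true) // default; code caching also requires the shared Engine
          .build();
      for (int i = 0; i < 3; i++) {
        try (Context context = Context.newBuilder("js").engine(engine).build()) {
          System.out.println(context.eval(source)); // later contexts can reuse compiled code
        }
      }
    }
  }
}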

-- Christian

@wirthi wirthi added the performance Performance of the engine (peak or warmup) label Mar 8, 2019
@wirthi (Member) commented Nov 18, 2020

I don't think there is anything open here. Documentation on source-code caching is at https://www.graalvm.org/reference-manual/embed-languages/#code-caching-across-multiple-contexts

Please reopen this ticket with a clarifying question or open a new ticket if anything is unclear there.

@wirthi wirthi closed this as completed Nov 18, 2020