Replace interpreter with partial evaluation #38
base: master
Comes with a whole bunch of other major changes <3
Hrmr, this looks very encouraging. Thank you for working on this!
I can't see anything obvious which would break - I guess my main concern would be stack overflows, but I don't think that applies here.
Is there anything obvious breaking with the coroutine tests? They're probably the ones which are the most finicky.
final int i = code[pc];
final int a = (i >> POS_A) & MAXARG_A;

final Function<UnwindableRunnable, UnwindableCallable<EvalCont>> cont = continuation(state, pc, p);
Hrmr, why the indirection here, rather than just doing continuation(state, pc, p, di -> {...}) at each point?
I tried removing the indirection in ad7b12b, then reverted the commit. Avoiding the indirection worsened the performance of PerformanceTest, so I ran the comprehensive PerformanceBenchmark for both approaches.
With indirection:
Benchmark Mode Cnt Score Error Units
PerformanceBenchmark.binarytrees thrpt 15 7.128 ± 0.084 ops/s
PerformanceBenchmark.fannkuch thrpt 15 11.447 ± 0.066 ops/s
PerformanceBenchmark.nbody thrpt 15 1.213 ± 0.016 ops/s
PerformanceBenchmark.nsieve thrpt 15 0.531 ± 0.018 ops/s
Without indirection:
Benchmark Mode Cnt Score Error Units
PerformanceBenchmark.binarytrees thrpt 15 6.238 ± 0.131 ops/s
PerformanceBenchmark.fannkuch thrpt 15 9.525 ± 0.124 ops/s
PerformanceBenchmark.nbody thrpt 15 1.215 ± 0.011 ops/s
PerformanceBenchmark.nsieve thrpt 15 0.556 ± 0.013 ops/s
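To read the two tables side by side, here is a minimal sketch (not part of the PR) that computes the relative throughput change and checks whether the reported error intervals overlap. The class and method names are made up for illustration; the scores and errors are copied from the tables above.

```java
import java.util.Locale;

final class BenchmarkDelta {
    /** Relative change of `after` versus `before`, in percent. */
    static double relativeChangePercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** True if [a - aErr, a + aErr] and [b - bErr, b + bErr] share at least one point. */
    static boolean intervalsOverlap(double a, double aErr, double b, double bErr) {
        return a - aErr <= b + bErr && b - bErr <= a + aErr;
    }

    public static void main(String[] args) {
        String[] names = {"binarytrees", "fannkuch", "nbody", "nsieve"};
        // {score, error} pairs in ops/s, copied from the JMH summaries above.
        double[][] withIndirection = {{7.128, 0.084}, {11.447, 0.066}, {1.213, 0.016}, {0.531, 0.018}};
        double[][] withoutIndirection = {{6.238, 0.131}, {9.525, 0.124}, {1.215, 0.011}, {0.556, 0.013}};

        for (int i = 0; i < names.length; i++) {
            double change = relativeChangePercent(withIndirection[i][0], withoutIndirection[i][0]);
            boolean overlap = intervalsOverlap(
                withIndirection[i][0], withIndirection[i][1],
                withoutIndirection[i][0], withoutIndirection[i][1]);
            System.out.printf(Locale.ROOT, "%-12s %+7.2f%%  intervals overlap: %b%n", names[i], change, overlap);
        }
    }
}
```

By this arithmetic, removing the indirection costs roughly 12% on binarytrees and 17% on fannkuch, with intervals that do not overlap, while the nbody and nsieve intervals overlap, so those two differences sit within the reported error.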
I didn't run PerformanceTest with trace output. It would be interesting to see diffs of the machine code generated for each version of partialEvalStep(). I typically use these VM flags to print JITted functions: -ea -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:PrintAssemblyOptions=intel,mpad=10,cpad=10,code -XX:CompileCommand=print,*LuaInterpreter.*
I'd investigate this further myself, but fixing the remaining test cases has a higher priority.
Here are the full benchmark logs, if anyone's interested:
PerformanceBenchmark: with indirection
/usr/lib/jvm/java-8-openjdk/bin/java -Dvisualvm.id=16506306300690 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=44903:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-8-openjdk/jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/cldrdata.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/jaccess.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/localedata.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/nashorn.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/sunec.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/jce.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/jsse.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/management-agent.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/resources.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/rt.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/jfxrt.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/jfxswt.jar:/home/viluon/Projects/Java/Cobalt/out/test/classes:/home/viluon/Projects/Java/Cobalt/out/test/resources:/home/viluon/Projects/Java/Cobalt/out/production/classes:/home/viluon/.gradle/caches/modules-2/files-2.1/org.openjdk.jmh/jmh-core/1.23/eb242d3261f3795c8bf09818d17c3241191284a0/jmh-core-1.23.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.jupiter/junit-jupiter-params/5.6.0/b28e078d4e8424de01df02ec92410d225e5d6444/junit-jupiter-params-5.6.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.jupiter/junit-jupiter-api/5.6.0/f29e6318333d2303ce4965c9819cfad08de7d1e5/junit-jupiter-api-5.6.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.hamcrest/hamcrest-library/2.2/cf530c8a0bc993487c64e940ae639bb4a6104dc6/hamcrest-library-2.2.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/net.sf.jopt-simple/jopt-simple/4.6/30
6816fb57cf94f108a43c95731b08934dcae15c/jopt-simple-4.6.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.apache.commons/commons-math3/3.2/ec2544ab27e110d2d431bdad7d538ed509b21e62/commons-math3-3.2.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.apiguardian/apiguardian-api/1.1.0/fc9dff4bb36d627bdc553de77e1f17efd790876c/apiguardian-api-1.1.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.platform/junit-platform-commons/1.6.0/b0a75795cf03841d4f9cc54099557baffc11c727/junit-platform-commons-1.6.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.opentest4j/opentest4j/1.2.0/28c11eb91f9b6d8e200631d46e20a7f407f2a046/opentest4j-1.2.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.hamcrest/hamcrest-core/2.2/3f2bd07716a31c395e2837254f37f21f0f0ab24b/hamcrest-core-2.2.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.hamcrest/hamcrest/2.2/1820c0968dba3a11a1b30669bb1f01978a91dedc/hamcrest-2.2.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.jupiter/junit-jupiter-engine/5.6.0/83c9e737f6015d9e00029b9b1d51e952a884b8f9/junit-jupiter-engine-5.6.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.platform/junit-platform-engine/1.6.0/a3a6ec96c010875444b3ca31828108093758ec00/junit-platform-engine-1.6.0.jar org.squiddev.cobalt.PerformanceBenchmark
# JMH version: 1.23
# VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -server -Dvisualvm.id=16506306300690 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=44903:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -server -disablesystemassertions
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 12000 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.squiddev.cobalt.PerformanceBenchmark.binarytrees
# Run progress: 0.00% complete, ETA 00:22:00
# Fork: 1 of 3
# Warmup Iteration 1: 6.083 ops/s
# Warmup Iteration 2: 6.691 ops/s
# Warmup Iteration 3: 6.879 ops/s
# Warmup Iteration 4: 6.863 ops/s
# Warmup Iteration 5: 6.598 ops/s
Iteration 1: 6.989 ops/s
Iteration 2: 7.088 ops/s
Iteration 3: 7.020 ops/s
Iteration 4: 7.075 ops/s
Iteration 5: 7.048 ops/s
# Run progress: 8.33% complete, ETA 00:20:28
# Fork: 2 of 3
# Warmup Iteration 1: 6.639 ops/s
# Warmup Iteration 2: 7.099 ops/s
# Warmup Iteration 3: 7.162 ops/s
# Warmup Iteration 4: 7.156 ops/s
# Warmup Iteration 5: 7.145 ops/s
Iteration 1: 7.205 ops/s
Iteration 2: 7.185 ops/s
Iteration 3: 7.220 ops/s
Iteration 4: 7.190 ops/s
Iteration 5: 7.262 ops/s
# Run progress: 16.67% complete, ETA 00:18:36
# Fork: 3 of 3
# Warmup Iteration 1: 6.613 ops/s
# Warmup Iteration 2: 7.060 ops/s
# Warmup Iteration 3: 7.000 ops/s
# Warmup Iteration 4: 7.065 ops/s
# Warmup Iteration 5: 7.146 ops/s
Iteration 1: 7.136 ops/s
Iteration 2: 7.070 ops/s
Iteration 3: 7.176 ops/s
Iteration 4: 7.132 ops/s
Iteration 5: 7.124 ops/s
Result "org.squiddev.cobalt.PerformanceBenchmark.binarytrees":
7.128 ±(99.9%) 0.084 ops/s [Average]
(min, avg, max) = (6.989, 7.128, 7.262), stdev = 0.079
CI (99.9%): [7.044, 7.212] (assumes normal distribution)
# JMH version: 1.23
# VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -server -Dvisualvm.id=16506306300690 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=44903:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -server -disablesystemassertions
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 12000 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.squiddev.cobalt.PerformanceBenchmark.fannkuch
# Run progress: 25.00% complete, ETA 00:16:44
# Fork: 1 of 3
# Warmup Iteration 1: 10.316 ops/s
# Warmup Iteration 2: 11.095 ops/s
# Warmup Iteration 3: 11.299 ops/s
# Warmup Iteration 4: 11.274 ops/s
# Warmup Iteration 5: 11.291 ops/s
Iteration 1: 11.532 ops/s
Iteration 2: 11.451 ops/s
Iteration 3: 11.402 ops/s
Iteration 4: 11.449 ops/s
Iteration 5: 11.384 ops/s
# Run progress: 33.33% complete, ETA 00:14:51
# Fork: 2 of 3
# Warmup Iteration 1: 10.592 ops/s
# Warmup Iteration 2: 11.093 ops/s
# Warmup Iteration 3: 11.233 ops/s
# Warmup Iteration 4: 10.886 ops/s
# Warmup Iteration 5: 11.458 ops/s
Iteration 1: 11.447 ops/s
Iteration 2: 11.450 ops/s
Iteration 3: 11.380 ops/s
Iteration 4: 11.427 ops/s
Iteration 5: 11.433 ops/s
# Run progress: 41.67% complete, ETA 00:13:00
# Fork: 3 of 3
# Warmup Iteration 1: 10.295 ops/s
# Warmup Iteration 2: 11.432 ops/s
# Warmup Iteration 3: 11.477 ops/s
# Warmup Iteration 4: 11.423 ops/s
# Warmup Iteration 5: 11.521 ops/s
Iteration 1: 11.542 ops/s
Iteration 2: 11.521 ops/s
Iteration 3: 11.458 ops/s
Iteration 4: 11.317 ops/s
Iteration 5: 11.508 ops/s
Result "org.squiddev.cobalt.PerformanceBenchmark.fannkuch":
11.447 ±(99.9%) 0.066 ops/s [Average]
(min, avg, max) = (11.317, 11.447, 11.542), stdev = 0.062
CI (99.9%): [11.381, 11.513] (assumes normal distribution)
# JMH version: 1.23
# VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -server -Dvisualvm.id=16506306300690 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=44903:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -server -disablesystemassertions
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 12000 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.squiddev.cobalt.PerformanceBenchmark.nbody
# Run progress: 50.00% complete, ETA 00:11:08
# Fork: 1 of 3
# Warmup Iteration 1: 1.127 ops/s
# Warmup Iteration 2: 1.161 ops/s
# Warmup Iteration 3: 1.208 ops/s
# Warmup Iteration 4: 1.194 ops/s
# Warmup Iteration 5: 1.193 ops/s
Iteration 1: 1.201 ops/s
Iteration 2: 1.207 ops/s
Iteration 3: 1.204 ops/s
Iteration 4: 1.192 ops/s
Iteration 5: 1.204 ops/s
# Run progress: 58.33% complete, ETA 00:09:19
# Fork: 2 of 3
# Warmup Iteration 1: 1.146 ops/s
# Warmup Iteration 2: 1.202 ops/s
# Warmup Iteration 3: 1.203 ops/s
# Warmup Iteration 4: 1.207 ops/s
# Warmup Iteration 5: 1.192 ops/s
Iteration 1: 1.206 ops/s
Iteration 2: 1.211 ops/s
Iteration 3: 1.208 ops/s
Iteration 4: 1.195 ops/s
Iteration 5: 1.203 ops/s
# Run progress: 66.67% complete, ETA 00:07:29
# Fork: 3 of 3
# Warmup Iteration 1: 1.186 ops/s
# Warmup Iteration 2: 1.183 ops/s
# Warmup Iteration 3: 1.221 ops/s
# Warmup Iteration 4: 1.228 ops/s
# Warmup Iteration 5: 1.231 ops/s
Iteration 1: 1.228 ops/s
Iteration 2: 1.227 ops/s
Iteration 3: 1.238 ops/s
Iteration 4: 1.226 ops/s
Iteration 5: 1.238 ops/s
Result "org.squiddev.cobalt.PerformanceBenchmark.nbody":
1.213 ±(99.9%) 0.016 ops/s [Average]
(min, avg, max) = (1.192, 1.213, 1.238), stdev = 0.015
CI (99.9%): [1.197, 1.229] (assumes normal distribution)
# JMH version: 1.23
# VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -server -Dvisualvm.id=16506306300690 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=44903:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -server -disablesystemassertions
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 12000 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.squiddev.cobalt.PerformanceBenchmark.nsieve
# Run progress: 75.00% complete, ETA 00:05:37
# Fork: 1 of 3
# Warmup Iteration 1: 0.469 ops/s
# Warmup Iteration 2: 0.500 ops/s
# Warmup Iteration 3: 0.515 ops/s
# Warmup Iteration 4: 0.508 ops/s
# Warmup Iteration 5: 0.489 ops/s
Iteration 1: 0.505 ops/s
Iteration 2: 0.508 ops/s
Iteration 3: 0.506 ops/s
Iteration 4: 0.512 ops/s
Iteration 5: 0.512 ops/s
# Run progress: 83.33% complete, ETA 00:03:47
# Fork: 2 of 3
# Warmup Iteration 1: 0.493 ops/s
# Warmup Iteration 2: 0.515 ops/s
# Warmup Iteration 3: 0.539 ops/s
# Warmup Iteration 4: 0.546 ops/s
# Warmup Iteration 5: 0.507 ops/s
Iteration 1: 0.540 ops/s
Iteration 2: 0.538 ops/s
Iteration 3: 0.545 ops/s
Iteration 4: 0.539 ops/s
Iteration 5: 0.543 ops/s
# Run progress: 91.67% complete, ETA 00:01:54
# Fork: 3 of 3
# Warmup Iteration 1: 0.492 ops/s
# Warmup Iteration 2: 0.526 ops/s
# Warmup Iteration 3: 0.541 ops/s
# Warmup Iteration 4: 0.520 ops/s
# Warmup Iteration 5: 0.546 ops/s
Iteration 1: 0.542 ops/s
Iteration 2: 0.544 ops/s
Iteration 3: 0.543 ops/s
Iteration 4: 0.546 ops/s
Iteration 5: 0.539 ops/s
Result "org.squiddev.cobalt.PerformanceBenchmark.nsieve":
0.531 ±(99.9%) 0.018 ops/s [Average]
(min, avg, max) = (0.505, 0.531, 0.546), stdev = 0.017
CI (99.9%): [0.513, 0.549] (assumes normal distribution)
# Run complete. Total time: 00:22:59
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
PerformanceBenchmark.binarytrees thrpt 15 7.128 ± 0.084 ops/s
PerformanceBenchmark.fannkuch thrpt 15 11.447 ± 0.066 ops/s
PerformanceBenchmark.nbody thrpt 15 1.213 ± 0.016 ops/s
PerformanceBenchmark.nsieve thrpt 15 0.531 ± 0.018 ops/s
Process finished with exit code 0
PerformanceBenchmark: without indirection
/usr/lib/jvm/java-8-openjdk/bin/java -Dvisualvm.id=14799730901152 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=39617:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-8-openjdk/jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/cldrdata.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/jaccess.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/localedata.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/nashorn.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/sunec.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/jce.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/jsse.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/management-agent.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/resources.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/rt.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/ext/jfxrt.jar:/usr/lib/jvm/java-8-openjdk/jre/lib/jfxswt.jar:/home/viluon/Projects/Java/Cobalt/out/test/classes:/home/viluon/Projects/Java/Cobalt/out/test/resources:/home/viluon/Projects/Java/Cobalt/out/production/classes:/home/viluon/.gradle/caches/modules-2/files-2.1/org.openjdk.jmh/jmh-core/1.23/eb242d3261f3795c8bf09818d17c3241191284a0/jmh-core-1.23.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.jupiter/junit-jupiter-params/5.6.0/b28e078d4e8424de01df02ec92410d225e5d6444/junit-jupiter-params-5.6.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.jupiter/junit-jupiter-api/5.6.0/f29e6318333d2303ce4965c9819cfad08de7d1e5/junit-jupiter-api-5.6.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.hamcrest/hamcrest-library/2.2/cf530c8a0bc993487c64e940ae639bb4a6104dc6/hamcrest-library-2.2.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/net.sf.jopt-simple/jopt-simple/4.6/30
6816fb57cf94f108a43c95731b08934dcae15c/jopt-simple-4.6.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.apache.commons/commons-math3/3.2/ec2544ab27e110d2d431bdad7d538ed509b21e62/commons-math3-3.2.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.apiguardian/apiguardian-api/1.1.0/fc9dff4bb36d627bdc553de77e1f17efd790876c/apiguardian-api-1.1.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.platform/junit-platform-commons/1.6.0/b0a75795cf03841d4f9cc54099557baffc11c727/junit-platform-commons-1.6.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.opentest4j/opentest4j/1.2.0/28c11eb91f9b6d8e200631d46e20a7f407f2a046/opentest4j-1.2.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.hamcrest/hamcrest-core/2.2/3f2bd07716a31c395e2837254f37f21f0f0ab24b/hamcrest-core-2.2.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.hamcrest/hamcrest/2.2/1820c0968dba3a11a1b30669bb1f01978a91dedc/hamcrest-2.2.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.jupiter/junit-jupiter-engine/5.6.0/83c9e737f6015d9e00029b9b1d51e952a884b8f9/junit-jupiter-engine-5.6.0.jar:/home/viluon/.gradle/caches/modules-2/files-2.1/org.junit.platform/junit-platform-engine/1.6.0/a3a6ec96c010875444b3ca31828108093758ec00/junit-platform-engine-1.6.0.jar org.squiddev.cobalt.PerformanceBenchmark
# JMH version: 1.23
# VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -server -Dvisualvm.id=14799730901152 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=39617:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -server -disablesystemassertions
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 12000 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.squiddev.cobalt.PerformanceBenchmark.binarytrees
# Run progress: 0.00% complete, ETA 00:22:00
# Fork: 1 of 3
# Warmup Iteration 1: 5.490 ops/s
# Warmup Iteration 2: 6.030 ops/s
# Warmup Iteration 3: 6.142 ops/s
# Warmup Iteration 4: 6.067 ops/s
# Warmup Iteration 5: 5.253 ops/s
Iteration 1: 6.208 ops/s
Iteration 2: 6.338 ops/s
Iteration 3: 6.157 ops/s
Iteration 4: 6.241 ops/s
Iteration 5: 6.337 ops/s
# Run progress: 8.33% complete, ETA 00:20:32
# Fork: 2 of 3
# Warmup Iteration 1: 6.083 ops/s
# Warmup Iteration 2: 6.168 ops/s
# Warmup Iteration 3: 6.324 ops/s
# Warmup Iteration 4: 6.246 ops/s
# Warmup Iteration 5: 6.181 ops/s
Iteration 1: 6.132 ops/s
Iteration 2: 6.314 ops/s
Iteration 3: 6.228 ops/s
Iteration 4: 6.054 ops/s
Iteration 5: 5.943 ops/s
# Run progress: 16.67% complete, ETA 00:18:37
# Fork: 3 of 3
# Warmup Iteration 1: 5.962 ops/s
# Warmup Iteration 2: 6.098 ops/s
# Warmup Iteration 3: 6.094 ops/s
# Warmup Iteration 4: 6.220 ops/s
# Warmup Iteration 5: 6.251 ops/s
Iteration 1: 6.254 ops/s
Iteration 2: 6.360 ops/s
Iteration 3: 6.328 ops/s
Iteration 4: 6.362 ops/s
Iteration 5: 6.310 ops/s
Result "org.squiddev.cobalt.PerformanceBenchmark.binarytrees":
6.238 ±(99.9%) 0.131 ops/s [Average]
(min, avg, max) = (5.943, 6.238, 6.362), stdev = 0.122
CI (99.9%): [6.107, 6.368] (assumes normal distribution)
# JMH version: 1.23
# VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -server -Dvisualvm.id=14799730901152 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=39617:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -server -disablesystemassertions
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 12000 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.squiddev.cobalt.PerformanceBenchmark.fannkuch
# Run progress: 25.00% complete, ETA 00:16:45
# Fork: 1 of 3
# Warmup Iteration 1: 8.939 ops/s
# Warmup Iteration 2: 9.437 ops/s
# Warmup Iteration 3: 9.519 ops/s
# Warmup Iteration 4: 9.477 ops/s
# Warmup Iteration 5: 9.386 ops/s
Iteration 1: 9.236 ops/s
Iteration 2: 9.578 ops/s
Iteration 3: 9.586 ops/s
Iteration 4: 9.489 ops/s
Iteration 5: 9.593 ops/s
# Run progress: 33.33% complete, ETA 00:14:53
# Fork: 2 of 3
# Warmup Iteration 1: 8.855 ops/s
# Warmup Iteration 2: 9.499 ops/s
# Warmup Iteration 3: 9.665 ops/s
# Warmup Iteration 4: 9.651 ops/s
# Warmup Iteration 5: 9.619 ops/s
Iteration 1: 9.708 ops/s
Iteration 2: 9.359 ops/s
Iteration 3: 9.490 ops/s
Iteration 4: 9.557 ops/s
Iteration 5: 9.654 ops/s
# Run progress: 41.67% complete, ETA 00:13:00
# Fork: 3 of 3
# Warmup Iteration 1: 8.762 ops/s
# Warmup Iteration 2: 9.452 ops/s
# Warmup Iteration 3: 9.354 ops/s
# Warmup Iteration 4: 9.498 ops/s
# Warmup Iteration 5: 9.596 ops/s
Iteration 1: 9.614 ops/s
Iteration 2: 9.510 ops/s
Iteration 3: 9.518 ops/s
Iteration 4: 9.474 ops/s
Iteration 5: 9.511 ops/s
Result "org.squiddev.cobalt.PerformanceBenchmark.fannkuch":
9.525 ±(99.9%) 0.124 ops/s [Average]
(min, avg, max) = (9.236, 9.525, 9.708), stdev = 0.116
CI (99.9%): [9.401, 9.649] (assumes normal distribution)
# JMH version: 1.23
# VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -server -Dvisualvm.id=14799730901152 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=39617:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -server -disablesystemassertions
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 12000 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.squiddev.cobalt.PerformanceBenchmark.nbody
# Run progress: 50.00% complete, ETA 00:11:09
# Fork: 1 of 3
# Warmup Iteration 1: 1.182 ops/s
# Warmup Iteration 2: 1.187 ops/s
# Warmup Iteration 3: 1.227 ops/s
# Warmup Iteration 4: 1.230 ops/s
# Warmup Iteration 5: 1.219 ops/s
Iteration 1: 1.207 ops/s
Iteration 2: 1.224 ops/s
Iteration 3: 1.229 ops/s
Iteration 4: 1.224 ops/s
Iteration 5: 1.210 ops/s
# Run progress: 58.33% complete, ETA 00:09:19
# Fork: 2 of 3
# Warmup Iteration 1: 1.171 ops/s
# Warmup Iteration 2: 1.195 ops/s
# Warmup Iteration 3: 1.218 ops/s
# Warmup Iteration 4: 1.218 ops/s
# Warmup Iteration 5: 1.215 ops/s
Iteration 1: 1.228 ops/s
Iteration 2: 1.213 ops/s
Iteration 3: 1.223 ops/s
Iteration 4: 1.206 ops/s
Iteration 5: 1.206 ops/s
# Run progress: 66.67% complete, ETA 00:07:29
# Fork: 3 of 3
# Warmup Iteration 1: 1.168 ops/s
# Warmup Iteration 2: 1.177 ops/s
# Warmup Iteration 3: 1.151 ops/s
# Warmup Iteration 4: 1.195 ops/s
# Warmup Iteration 5: 1.207 ops/s
Iteration 1: 1.224 ops/s
Iteration 2: 1.196 ops/s
Iteration 3: 1.215 ops/s
Iteration 4: 1.208 ops/s
Iteration 5: 1.208 ops/s
Result "org.squiddev.cobalt.PerformanceBenchmark.nbody":
1.215 ±(99.9%) 0.011 ops/s [Average]
(min, avg, max) = (1.196, 1.215, 1.229), stdev = 0.010
CI (99.9%): [1.204, 1.225] (assumes normal distribution)
# JMH version: 1.23
# VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -server -Dvisualvm.id=14799730901152 -javaagent:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/lib/idea_rt.jar=39617:/home/viluon/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-1/201.6668.121/bin -Dfile.encoding=UTF-8 -server -disablesystemassertions
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 12000 ms each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.squiddev.cobalt.PerformanceBenchmark.nsieve
# Run progress: 75.00% complete, ETA 00:05:37
# Fork: 1 of 3
# Warmup Iteration 1: 0.492 ops/s
# Warmup Iteration 2: 0.531 ops/s
# Warmup Iteration 3: 0.538 ops/s
# Warmup Iteration 4: 0.521 ops/s
# Warmup Iteration 5: 0.541 ops/s
Iteration 1: 0.538 ops/s
Iteration 2: 0.539 ops/s
Iteration 3: 0.539 ops/s
Iteration 4: 0.544 ops/s
Iteration 5: 0.538 ops/s
# Run progress: 83.33% complete, ETA 00:03:46
# Fork: 2 of 3
# Warmup Iteration 1: 0.514 ops/s
# Warmup Iteration 2: 0.550 ops/s
# Warmup Iteration 3: 0.563 ops/s
# Warmup Iteration 4: 0.550 ops/s
# Warmup Iteration 5: 0.559 ops/s
Iteration 1: 0.556 ops/s
Iteration 2: 0.568 ops/s
Iteration 3: 0.562 ops/s
Iteration 4: 0.562 ops/s
Iteration 5: 0.564 ops/s
# Run progress: 91.67% complete, ETA 00:01:53
# Fork: 3 of 3
# Warmup Iteration 1: 0.515 ops/s
# Warmup Iteration 2: 0.536 ops/s
# Warmup Iteration 3: 0.564 ops/s
# Warmup Iteration 4: 0.561 ops/s
# Warmup Iteration 5: 0.544 ops/s
Iteration 1: 0.567 ops/s
Iteration 2: 0.556 ops/s
Iteration 3: 0.567 ops/s
Iteration 4: 0.564 ops/s
Iteration 5: 0.569 ops/s
Result "org.squiddev.cobalt.PerformanceBenchmark.nsieve":
0.556 ±(99.9%) 0.013 ops/s [Average]
(min, avg, max) = (0.538, 0.556, 0.569), stdev = 0.012
CI (99.9%): [0.542, 0.569] (assumes normal distribution)
# Run complete. Total time: 00:22:49
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
PerformanceBenchmark.binarytrees thrpt 15 6.238 ± 0.131 ops/s
PerformanceBenchmark.fannkuch thrpt 15 9.525 ± 0.124 ops/s
PerformanceBenchmark.nbody thrpt 15 1.215 ± 0.011 ops/s
PerformanceBenchmark.nsieve thrpt 15 0.556 ± 0.013 ops/s
Process finished with exit code 0
That's absurd. Why are JITs so terrible :D:.
You may have noticed that the numbers above are lower than what's reported in the original PR. This is due to other changes in the evaluation strategy; I suspect the performance degradation is primarily caused by continuation object construction in each jump and some extra branching in partialEval(). I won't optimise these away until I fix the remaining tests. The original optimisations served as a proof-of-concept to verify that this sort of development is worthwhile. More optimisations will come after the evaluator achieves correct semantics.
return new EvalCont(debugFrame, fn);
}

switch (i & (MASK_B | MASK_C)) {
Do we want to move this switch outside the function, or is it not worth it?
Places like this one make me jealous of Scala's macros and Kotlin's inline functions. I'd love to optimise this together with the previous switch, but the combinatorial explosion is real...
Gotta love how broken GH actions is at times.
As of commit 5fd9b71, the performance is as follows:
Still faster than I'm marking this PR as "ready for review" -- more work has yet to be done, but now is the right time to dive into the code and point out potential problems which the test suite failed to catch. As usual, here's the full benchmark log.
src/main/java/org/squiddev/cobalt/function/UnwindableCallable.java
@@ -1,4 +1,8 @@
stretch tree of depth 7 check: -1
I'm so confused why the output of this has changed. Is there an obvious change I've missed?
PerformanceTest and PerformanceBenchmark never actually verify the correctness of the programs' output. When I added helpers.runComparisonTest(name) to PerformanceTest on master, I was surprised to see that the test failed for every single benchmark: the .out files did not match the actual output. Since I wanted behaviour identical to master, I replaced the test resources with the output I got from there.
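For readers unfamiliar with this style of test, here is a minimal sketch of what an output comparison boils down to: capture what a program prints and compare it with the expected text after normalising line endings. This is not Cobalt's helpers.runComparisonTest; the class, the method names, and the sample line are hypothetical stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Consumer;

final class OutputComparison {
    /** Runs `program` against a fresh PrintStream and returns everything it printed. */
    static String capture(Consumer<PrintStream> program) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        program.accept(out);
        out.flush();
        return buffer.toString();
    }

    /** Makes the comparison independent of platform line endings and trailing whitespace. */
    static String normalise(String text) {
        return text.replace("\r\n", "\n").trim();
    }

    /** Throws if the two outputs differ after normalisation. */
    static void assertSameOutput(String expected, String actual) {
        if (!normalise(expected).equals(normalise(actual))) {
            throw new AssertionError("expected:\n" + expected + "\nactual:\n" + actual);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a .out resource file.
        String expected = "stretch tree of depth 7 check: -1\n";
        // Stand-in for running the benchmark script.
        String actual = capture(out -> out.println("stretch tree of depth 7 check: -1"));
        assertSameOutput(expected, actual);
        System.out.println("outputs match");
    }
}
```

A benchmark harness that skips this comparison can keep reporting throughput for a program that computes the wrong result, which is why the stale .out files went unnoticed.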
@@ -148,7 +156,14 @@ private static DebugFrame setupCall(LuaState state, LuaInterpretedFunction funct
return di;
}

- static Varargs execute(final LuaState state, DebugFrame di, LuaInterpretedFunction function) throws LuaError, UnwindThrowable {
+ static Varargs execute(LuaState state, DebugFrame di, LuaInterpretedFunction function) throws LuaError, UnwindThrowable {
+ if (true)
Hrmr, I assume this should be doing something different. Are we going to switch over to partialEval after a certain number of calls, or just ditch the interpreter entirely?
I don't think there's much use in trying to be clever about which parts of a function to interpret and which parts to partially evaluate. I keep this if around to quickly test the performance of partialEval() vs interpret() and to verify that the two approaches still differ solely in these two functions (i.e. edits of other methods preserve semantics).
I'll remove the if eventually. Would you like to keep the interpret() method?
Hrmr. So the interpreter has a significantly lower memory cost than partialEval, so it would be good to have some sort of enhancement capability.
However, getting the two to play well together seems painful. Ideally you'd want to be able to seamlessly call between interpreted and partialEval functions, and I'm not sure how to do that quite yet.
This is looking really nice, thank you. I'll try to have a fiddle later today, maybe load the logs within JITWatch and see if there's anything obvious which is problematic. Just left a couple of comments and questions about some of the changes. I might cherry-pick out the
The advertised refactor of
This is slightly faster than the previous benchmark in all cases except for
cont.programCounter = di.pc;
function = cont.function;
proto = function.getPrototype();
ds = DebugHandler.getDebugState(state);
This shouldn't be needed - the DebugState is constant for the current coroutine, and the interpreter shouldn't switch coroutines.
}
}

// FIXME beware: if a reference to initialFrame makes it into any of the lambdas it could introduce serious memory leaks
Can this be fixed now? I assume it'd be quite easy to just pass the pc and Prototype? There's also a bit of me which wonders if we want to shift this (and any partial evaluation stuff) to an external class?
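To illustrate the concern in the FIXME, here is a small sketch of how a Java lambda keeps alive whatever it captures, and how copying the needed value into a local first avoids that. Frame is a hypothetical stand-in for a heavyweight frame object, not Cobalt's DebugFrame, and both factory methods are invented for this example.

```java
import java.util.function.IntSupplier;

final class CaptureSketch {
    /** Hypothetical stand-in for a heavyweight frame object. */
    static final class Frame {
        final int pc;
        final Object[] stack = new Object[1 << 16]; // large payload we do not want to retain

        Frame(int pc) {
            this.pc = pc;
        }
    }

    /** Leaky shape: the lambda captures `frame`, so the whole Frame stays reachable as long as the lambda does. */
    static IntSupplier capturingFrame(Frame frame) {
        return () -> frame.pc + 1;
    }

    /** Safer shape: read the field once, so the lambda captures only an int. */
    static IntSupplier capturingPcOnly(Frame frame) {
        final int pc = frame.pc;
        return () -> pc + 1;
    }

    public static void main(String[] args) {
        Frame frame = new Frame(7);
        System.out.println(capturingFrame(frame).getAsInt());
        System.out.println(capturingPcOnly(frame).getAsInt());
    }
}
```

Both suppliers compute the same value; the difference is only in what stays reachable through the closure, which is why passing just the pc and the prototype would address the FIXME.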
ac6810d is once again a controversial change. It seems to mildly improve performance of I had
Nevertheless, I ran the benchmark again, this time with no other resource-intensive programs in the background.
I ran
Any ideas on what to do about these discrepancies in performance?
I've absolutely no clue - I ran the benchmarks using the same setup as the above graph, and had a massive performance decay which is not in line with what's above. JMH is meant to be pretty good at eliminating this, so I'm really not sure.
Approach
This PR implements the first Futamura projection in Cobalt by replacing Lua bytecode with Java lambdas. Partial evaluation reduces branching on hot code paths and unlocks optimisation opportunities for both maintainers and the JVM's JIT compiler.
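As a rough illustration of the idea (not the code in this PR), here is a toy register machine with a conventional interpreter and a closure-based "compiled" form. The instruction set, the Step interface, and all names are invented for this sketch; Cobalt's actual evaluator works on Lua bytecode and uses continuations.

```java
final class ClosureCompiler {
    // Toy opcodes: ADDI r, k | JLT r, limit, target | HALT
    static final int ADDI = 0, JLT = 1, HALT = 2;

    /** One pre-decoded instruction: mutates the registers and returns the next pc, or -1 to stop. */
    interface Step {
        int run(int[] regs);
    }

    /** Baseline: decode operands and dispatch on the opcode every time an instruction executes. */
    static int[] interpret(int[][] code, int registerCount) {
        int[] regs = new int[registerCount];
        int pc = 0;
        while (true) {
            int[] insn = code[pc];
            switch (insn[0]) {
                case ADDI: regs[insn[1]] += insn[2]; pc++; break;
                case JLT: pc = regs[insn[1]] < insn[2] ? insn[3] : pc + 1; break;
                case HALT: return regs;
                default: throw new IllegalStateException("bad opcode " + insn[0]);
            }
        }
    }

    /** Specialise the interpreter to one program: decoding and opcode dispatch happen once, here. */
    static Step[] compile(int[][] code) {
        Step[] steps = new Step[code.length];
        for (int pc = 0; pc < code.length; pc++) {
            final int[] insn = code[pc];
            final int next = pc + 1;
            switch (insn[0]) {
                case ADDI: {
                    final int r = insn[1], k = insn[2];
                    steps[pc] = regs -> { regs[r] += k; return next; };
                    break;
                }
                case JLT: {
                    final int r = insn[1], limit = insn[2], target = insn[3];
                    steps[pc] = regs -> regs[r] < limit ? target : next;
                    break;
                }
                case HALT:
                    steps[pc] = regs -> -1;
                    break;
                default:
                    throw new IllegalStateException("bad opcode " + insn[0]);
            }
        }
        return steps;
    }

    /** Runs the compiled steps; the loop no longer inspects opcodes or operands. */
    static int[] run(Step[] steps, int registerCount) {
        int[] regs = new int[registerCount];
        int pc = 0;
        while (pc >= 0) {
            pc = steps[pc].run(regs);
        }
        return regs;
    }

    public static void main(String[] args) {
        // r0 += 1; r1 += 3; if (r0 < 5) goto 0; halt
        int[][] code = {{ADDI, 0, 1}, {ADDI, 1, 3}, {JLT, 0, 5, 0}, {HALT}};
        System.out.println(java.util.Arrays.toString(interpret(code, 2)));
        System.out.println(java.util.Arrays.toString(run(compile(code), 2)));
    }
}
```

The trade-off discussed in the review shows up even here: the compiled form allocates one closure per instruction, so it costs more memory than the plain instruction array that the interpreter walks.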
Status
I've added changes and fixes to pass Cobalt's test suite. Now that it's all green I'll focus on optimising the hell out of this.
Performance comparison
This is the result of PerformanceBenchmark using the interpreter:
Here is the same benchmark executed with partial evaluation:
The full output of both benchmark runs follows.
PerformanceBenchmark: interpretation
PerformanceBenchmark: partial evaluation