Benchmark
Kazuaki Ishizaki edited this page Feb 3, 2016 · 10 revisions
To show the efficiency of our column-based RDD, we measured performance with and without a GPU by running a simple logistic regression program that uses map() and reduce().
We achieved a 3.15x performance improvement for logistic regression (SparkGPULR in examples) on a 16-thread IvyBridge box with an NVIDIA K40 GPU card, compared with running without the GPU. There is still room to improve performance (e.g. by eliminating the data copy between map() and reduce()).
Spark code for non-GPU version
Spark code for GPU version, CUDA code
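For readers who do not have the linked sources at hand, the map() and reduce() structure of one logistic regression step can be sketched in plain Python. This is only an illustration under stated assumptions, not the benchmarked Spark or CUDA code: it uses Python lists instead of an RDD, and the function names and the synthetic data generator (`generate_points`, `point_gradient`, `total_gradient`, `train`) are hypothetical.

```python
import math
import random
from functools import reduce


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def generate_points(n, d, seed=42, r=0.7):
    """Hypothetical synthetic data: labels alternate between +1 and -1,
    and each feature is Gaussian noise shifted by label * r."""
    rng = random.Random(seed)
    points = []
    for i in range(n):
        y = 1.0 if i % 2 == 0 else -1.0
        x = [rng.gauss(0.0, 1.0) + y * r for _ in range(d)]
        points.append((x, y))
    return points


def point_gradient(w, point):
    """The map() step: gradient contribution of a single point."""
    x, y = point
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = (sigmoid(margin) - 1.0) * y
    return [scale * xi for xi in x]


def add_vectors(a, b):
    """The reduce() step: element-wise sum of two gradient vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def total_gradient(w, points):
    """map() over all points, then reduce() to a single gradient vector."""
    return reduce(add_vectors, map(lambda p: point_gradient(w, p), points))


def train(points, d, iterations, seed=0):
    """Run a fixed number of full-gradient updates from random weights."""
    rng = random.Random(seed)
    w = [2.0 * rng.random() - 1.0 for _ in range(d)]
    for _ in range(iterations):
        gradient = total_gradient(w, points)
        w = [wi - gi for wi, gi in zip(w, gradient)]
    return w
```

A small call such as `train(generate_points(1000, 10), d=10, iterations=5)` exercises the same map-then-reduce pattern per iteration. In the GPU setting described above, the per-point map() work and the summing reduce() are the two stages between which a data copy currently occurs.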
N=1000000
D=400
ITERATIONS=5
Slices=128 (w/o GPU), 16 (with GPU)
MASTER=local[8] (w/o GPU), local[8] (with GPU)
Machine: nx360 M4, 2 sockets 8-core Intel Xeon E5-2667 3.3GHz, 256GB memory, with one NVIDIA K40m card
OS: RedHat 6.6
CUDA: 7.0
Java: IBM Java8 pxa6480sr2-20151023_01(SR2)
Spark version: https://github.com/kiszk/spark-gpu/commit/34e9b75c0cab297ed7feb8aef7072164b6a5972c
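To put the parameters N, D, and Slices in perspective, the rough data volume can be estimated with a short calculation. This is a back-of-the-envelope sketch, assuming each feature is stored as an 8-byte double and ignoring labels, object headers, and any column-format overhead; the helper names are hypothetical.

```python
N = 1_000_000          # number of points, from the parameter list
D = 400                # features per point, from the parameter list
SLICES_CPU = 128       # slices without GPU
SLICES_GPU = 16        # slices with GPU
BYTES_PER_VALUE = 8    # assumption: features are 8-byte doubles


def dataset_bytes(n, d, bytes_per_value=BYTES_PER_VALUE):
    """Raw feature bytes for n points with d features each."""
    return n * d * bytes_per_value


def points_per_slice(n, slices):
    """Upper bound on points in one slice (ceiling division)."""
    return -(-n // slices)


def slice_bytes(n, d, slices, bytes_per_value=BYTES_PER_VALUE):
    """Raw feature bytes held by the largest slice."""
    return points_per_slice(n, slices) * d * bytes_per_value


if __name__ == "__main__":
    print("total bytes:", dataset_bytes(N, D))
    print("bytes per slice (w/o GPU):", slice_bytes(N, D, SLICES_CPU))
    print("bytes per slice (with GPU):", slice_bytes(N, D, SLICES_GPU))
```

Under these assumptions the feature data is about 3.2 GB in total, roughly 25 MB per slice with 128 slices and 200 MB per slice with 16 slices, so the GPU run works on fewer, larger chunks.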
spark-env.sh
JAVA_HOME=/u/ishizaki/ibm-java-x86_64-802
CUDA_DEVICE_MAX_CONNECTION=32
CUDA_VISIBLE_DEVICES=0
spark-defaults.conf
spark.driver.extraJavaOptions -Xmn96g -Xgcthreads8 -Xdump:system:none -Xdump:heap:none -Xtrace:none -Xnoloa -Xdisableexplicitgc
spark.eventLog.enabled true
spark.eventLog.dir file:///tmp/eventlog-ishizaki
spark.history.fs.logDirectory file:///tmp/eventlog-ishizaki
spark.driver.cores 16
spark.driver.memory 144g
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
spark.akka.frameSize 1024
spark.history.ui.port 18080
non-GPU version
$ MASTER='local[8]' bin/run-example SparkLR 128 1000000 400 5
GPU version
$ MASTER='local[8]' bin/run-example SparkGPULR 16 1000000 400 5
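One way to compare the two runs is to time each command and take the ratio. The sketch below is a hypothetical harness, not the method used to obtain the figure reported on this page: it measures whole-process wall-clock time, which also includes JVM startup and data generation, and it assumes it is launched from the Spark source directory so that `bin/run-example` resolves.

```python
import os
import subprocess
import time

# Argument lists mirroring the two commands above.
CPU_ARGS = ["bin/run-example", "SparkLR", "128", "1000000", "400", "5"]
GPU_ARGS = ["bin/run-example", "SparkGPULR", "16", "1000000", "400", "5"]
EXTRA_ENV = {"MASTER": "local[8]"}


def time_command(args, extra_env=None):
    """Run a command to completion and return elapsed wall-clock seconds."""
    env = dict(os.environ)
    env.update(extra_env or {})
    start = time.perf_counter()
    subprocess.run(args, env=env, check=True)
    return time.perf_counter() - start


def speedup(baseline_seconds, gpu_seconds):
    """Ratio of baseline time to GPU time; values above 1 favour the GPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return baseline_seconds / gpu_seconds


def compare():
    """Time both runs once and return (cpu_seconds, gpu_seconds, ratio)."""
    cpu_seconds = time_command(CPU_ARGS, EXTRA_ENV)
    gpu_seconds = time_command(GPU_ARGS, EXTRA_ENV)
    return cpu_seconds, gpu_seconds, speedup(cpu_seconds, gpu_seconds)
```

Calling `compare()` runs each example once; repeating the runs and discarding the first would reduce noise from warm-up effects.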