diff --git a/README.md b/README.md
index 1f77aa3767..a94d2c7d69 100644
--- a/README.md
+++ b/README.md
@@ -46,30 +46,23 @@ The following chart shows the time it takes to run the 22 TPC-H queries against
 using a single executor with 8 cores. See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html)
 for details of the environment used for these benchmarks.
 
-When using Comet, the overall run time is reduced from 615 seconds to 364 seconds, a 1.7x speedup, with query 1
-running 9x faster than Spark.
+When using Comet, the overall run time is reduced from 640 seconds to 331 seconds, very close to a 2x speedup.
 
-Running the same queries with DataFusion standalone (without Spark) using the same number of cores results in a 3.6x
-speedup compared to Spark.
+![](docs/source/_static/images/benchmark-results/0.5.0/tpch_allqueries.png)
 
-Comet is not yet achieving full DataFusion speeds in all cases, but with future work we aim to provide a 2x-4x speedup
-for a broader set of queries.
+Here is a breakdown showing relative performance of Spark and Comet for each TPC-H query.
 
-![](docs/source/_static/images/benchmark-results/0.4.0/tpch_allqueries.png)
-
-Here is a breakdown showing relative performance of Spark, Comet, and DataFusion for each TPC-H query.
-
-![](docs/source/_static/images/benchmark-results/0.4.0/tpch_queries_compare.png)
+![](docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_compare.png)
 
 The following charts shows how much Comet currently accelerates each query from the benchmark.
 
 ### Relative speedup
 
-![](docs/source/_static/images/benchmark-results/0.4.0/tpch_queries_speedup_rel.png)
+![](docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_speedup_rel.png)
 
 ### Absolute speedup
 
-![](docs/source/_static/images/benchmark-results/0.4.0/tpch_queries_speedup_abs.png)
+![](docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_speedup_abs.png)
 
 These benchmarks can be reproduced in any environment using the documentation in the
 [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html). We encourage
diff --git a/docs/source/_static/images/benchmark-results/0.5.0/tpch_allqueries.png b/docs/source/_static/images/benchmark-results/0.5.0/tpch_allqueries.png
new file mode 100644
index 0000000000..0855f7bf21
Binary files /dev/null and b/docs/source/_static/images/benchmark-results/0.5.0/tpch_allqueries.png differ
diff --git a/docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_compare.png b/docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_compare.png
new file mode 100644
index 0000000000..c17c1af15e
Binary files /dev/null and b/docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_compare.png differ
diff --git a/docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_speedup_abs.png b/docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_speedup_abs.png
new file mode 100644
index 0000000000..aa8391cc52
Binary files /dev/null and b/docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_speedup_abs.png differ
diff --git a/docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_speedup_rel.png b/docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_speedup_rel.png
new file mode 100644
index 0000000000..1fce882eb8
Binary files /dev/null and b/docs/source/_static/images/benchmark-results/0.5.0/tpch_queries_speedup_rel.png differ
diff --git a/docs/source/contributor-guide/benchmark-results/0.5.0/comet-tpch.json
b/docs/source/contributor-guide/benchmark-results/0.5.0/comet-tpch.json new file mode 100644 index 0000000000..b1b3ae127b --- /dev/null +++ b/docs/source/contributor-guide/benchmark-results/0.5.0/comet-tpch.json @@ -0,0 +1,209 @@ +{ + "engine": "datafusion-comet", + "benchmark": "tpch", + "data_path": "/mnt/bigdata/tpch/sf100/", + "query_path": "/home/andy/git/apache/datafusion-benchmarks/tpch/queries", + "spark_conf": { + "spark.comet.explain.native.enabled": "false", + "spark.eventLog.enabled": "true", + "spark.executor.extraClassPath": "/home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.5.0-SNAPSHOT.jar", + "spark.comet.explainFallback.enabled": "false", + "spark.comet.exec.replaceSortMergeJoin": "true", + "spark.comet.exec.shuffle.enabled": "true", + "spark.memory.offHeap.enabled": "true", + "spark.comet.exec.shuffle.compression.level": "1", + "spark.executor.memory": "16g", + "spark.app.name": "comet benchmark derived from tpch", + "spark.comet.batchSize": "8192", + "spark.app.startTime": "1736802464855", + "spark.comet.exec.shuffle.fallbackToColumnar": "true", + "spark.serializer.objectStreamReset": "100", + "spark.driver.host": "10.0.0.118", + "spark.comet.exec.shuffle.enableFastEncoding": "true", + "spark.submit.deployMode": "client", + "spark.driver.port": "33103", + "spark.comet.scan.impl": "native_comet", + "spark.driver.extraClassPath": "/home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.5.0-SNAPSHOT.jar", + "spark.executor.cores": "8", + "spark.comet.explain.verbose.enabled": "false", + "spark.driver.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED 
--add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager", + "spark.comet.exec.enabled": "true", + "spark.sql.warehouse.dir": "file:/home/andy/git/personal/research/benchmarks/spark-standalone/spark-warehouse", + "spark.comet.scan.enabled": "true", + "spark.app.submitTime": "1736802464584", + "spark.executor.id": "driver", + "spark.master": "spark://woody:7077", + "spark.comet.exec.shuffle.mode": "auto", + "spark.sql.extensions": "org.apache.comet.CometSparkSessionExtensions", + "spark.driver.memory": "8G", + "spark.repl.local.jars": "file:///home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.5.0-SNAPSHOT.jar", + "spark.app.initial.jar.urls": "spark://10.0.0.118:33103/jars/comet-spark-spark3.4_2.12-0.5.0-SNAPSHOT.jar", + "spark.app.id": "app-20250113140745-0058", + "spark.rdd.compress": "True", + "spark.executor.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED 
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.executor.instances": "1", + "spark.cores.max": "8", + "spark.comet.enabled": "true", + "spark.submit.pyFiles": "", + "spark.comet.exec.sortMergeJoinWithJoinFilter.enabled": "false", + "spark.comet.exec.shuffle.compression.codec": "lz4", + "spark.jars": "file:///home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.5.0-SNAPSHOT.jar", + "spark.memory.offHeap.size": "16g", + "spark.comet.columnar.shuffle.batch.size": "8192" + }, + "1": [ + 12.59755539894104, + 10.855465650558472, + 11.160947799682617, + 11.323237657546997, + 11.410452365875244 + ], + "2": [ + 6.155406475067139, + 5.539891719818115, + 5.698071002960205, + 5.684133529663086, + 5.742799758911133 + ], + "3": [ + 16.097025156021118, + 14.982890367507935, + 14.998259544372559, + 15.659432649612427, + 15.878185749053955 + ], + "4": [ + 10.319517850875854, + 10.0553297996521, + 10.136846780776978, + 9.925675392150879, + 10.140193462371826 + ], + "5": [ + 26.09030055999756, + 25.57556390762329, + 26.102373600006104, + 26.540887117385864, + 26.162983655929565 + ], + "6": [ + 2.691145658493042, + 2.5986382961273193, + 2.659151792526245, + 2.6488683223724365, + 2.6785433292388916 + ], + "7": [ + 15.326677560806274, + 15.57035493850708, + 16.023503065109253, + 16.015883207321167, + 15.79127025604248 + ], + "8": [ + 27.72478675842285, + 27.45163321495056, + 27.935590267181396, + 27.86525869369507, + 28.016165733337402 + ], + "9": [ + 39.186867237091064, + 39.73552465438843, + 40.866581439971924, + 40.73869442939758, + 40.89244842529297 + ], + "10": [ + 14.022773742675781, + 14.476953029632568, + 14.305155515670776, + 14.187727451324463, + 14.57831335067749 + ], + "11": [ + 
5.223851919174194, + 4.722897291183472, + 4.844727277755737, + 4.803720474243164, + 4.822873592376709 + ], + "12": [ + 4.974349021911621, + 5.013054132461548, + 5.0682995319366455, + 5.1071436405181885, + 5.142468452453613 + ], + "13": [ + 9.769477128982544, + 9.743404626846313, + 9.935744285583496, + 9.966437339782715, + 9.854998588562012 + ], + "14": [ + 5.320314168930054, + 5.26824426651001, + 5.269179344177246, + 5.322073698043823, + 5.292902708053589 + ], + "15": [ + 9.532674789428711, + 9.520610570907593, + 9.538906335830688, + 9.553953886032104, + 9.65409803390503 + ], + "16": [ + 5.146467924118042, + 4.716687440872192, + 4.863113164901733, + 4.725494384765625, + 4.653785228729248 + ], + "17": [ + 30.45087242126465, + 30.785797119140625, + 30.950777530670166, + 31.04833745956421, + 31.12831139564514 + ], + "18": [ + 27.549716472625732, + 27.610363960266113, + 27.41417407989502, + 27.633289098739624, + 27.72838020324707 + ], + "19": [ + 5.9813477993011475, + 6.041543483734131, + 6.087557554244995, + 6.106397390365601, + 6.011293888092041 + ], + "20": [ + 10.53919005393982, + 10.382107019424438, + 10.370867729187012, + 10.376642942428589, + 10.48800802230835 + ], + "21": [ + 42.36113142967224, + 42.296979904174805, + 42.56899857521057, + 42.587459564208984, + 42.86927652359009 + ], + "22": [ + 3.755877733230591, + 3.523585319519043, + 3.5420711040496826, + 3.605468273162842, + 3.6084585189819336 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/0.5.0/spark-tpch.json b/docs/source/contributor-guide/benchmark-results/0.5.0/spark-tpch.json new file mode 100644 index 0000000000..a82ea19533 --- /dev/null +++ b/docs/source/contributor-guide/benchmark-results/0.5.0/spark-tpch.json @@ -0,0 +1,185 @@ +{ + "engine": "datafusion-comet", + "benchmark": "tpch", + "data_path": "/mnt/bigdata/tpch/sf100/", + "query_path": "/home/andy/git/apache/datafusion-benchmarks/tpch/queries", + "spark_conf": { + "spark.eventLog.enabled": 
"true", + "spark.driver.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.sql.warehouse.dir": "file:/home/andy/git/personal/research/benchmarks/spark-standalone/spark-warehouse", + "spark.executor.id": "driver", + "spark.master": "spark://woody:7077", + "spark.driver.memory": "8G", + "spark.memory.offHeap.enabled": "true", + "spark.driver.port": "34999", + "spark.executor.memory": "16g", + "spark.app.id": "app-20250113143837-0059", + "spark.app.startTime": "1736804317132", + "spark.rdd.compress": "True", + "spark.app.name": "spark benchmark derived from tpch", + "spark.app.submitTime": "1736804316845", + "spark.executor.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.serializer.objectStreamReset": "100", + "spark.driver.host": "10.0.0.118", + "spark.executor.instances": "1", + "spark.cores.max": "8", + "spark.submit.pyFiles": "", + "spark.submit.deployMode": "client", + "spark.executor.cores": "8", + "spark.memory.offHeap.size": "16g" + }, + "1": [ + 80.39240145683289, + 79.9300479888916, + 80.08533000946045, + 79.95083689689636, + 79.85243964195251 + ], + "2": [ + 12.361650466918945, + 12.305850267410278, + 12.250919818878174, + 11.97881293296814, + 11.725982427597046 + ], + "3": [ + 24.618537187576294, + 24.87288188934326, + 24.59152317047119, + 24.218815803527832, + 24.44655966758728 + ], + "4": [ + 22.427627086639404, + 20.657840728759766, + 21.091227531433105, + 21.05953359603882, + 20.46818709373474 + ], + "5": [ + 50.51570105552673, + 49.615350008010864, + 49.8210015296936, + 49.847323179244995, + 49.476715087890625 + ], + "6": [ + 3.347008466720581, + 3.21842360496521, + 3.1940605640411377, + 3.310222625732422, + 3.24226713180542 + ], + "7": [ + 21.848386764526367, + 21.017245531082153, + 21.077393054962158, + 21.38729691505432, + 21.2225079536438 + ], + "8": [ + 32.984848499298096, + 33.09590411186218, + 32.93970465660095, + 32.48174071311951, + 32.516525983810425 + ], + "9": [ + 76.0926308631897, + 74.51611924171448, + 74.34783029556274, + 74.69494938850403, + 74.43283152580261 + ], + "10": [ + 19.82606053352356, + 19.577861070632935, + 19.701807498931885, + 19.774757146835327, + 19.246259689331055 + ], + "11": [ + 12.777667760848999, + 12.865848302841187, + 12.908553838729858, + 12.77224063873291, + 12.639224767684937 + ], + "12": [ + 13.610698223114014, + 
13.52022099494934, + 13.459492206573486, + 13.30645203590393, + 13.632066011428833 + ], + "13": [ + 22.276411771774292, + 22.426157474517822, + 22.358247995376587, + 22.300530672073364, + 22.17827558517456 + ], + "14": [ + 5.936513662338257, + 5.709101915359497, + 5.788166046142578, + 5.724794626235962, + 5.8799731731414795 + ], + "15": [ + 14.949971437454224, + 14.810836553573608, + 14.723956823348999, + 14.827229738235474, + 14.890477657318115 + ], + "16": [ + 7.259998559951782, + 6.874396324157715, + 7.139331817626953, + 7.010579586029053, + 6.851199626922607 + ], + "17": [ + 59.51790142059326, + 61.10569620132446, + 61.5246102809906, + 62.90809988975525, + 60.082926988601685 + ], + "18": [ + 71.04480576515198, + 71.47302889823914, + 73.99175548553467, + 70.6622040271759, + 73.06630039215088 + ], + "19": [ + 6.902673006057739, + 7.26209568977356, + 6.773332357406616, + 6.810155868530273, + 6.643052577972412 + ], + "20": [ + 10.397570133209229, + 9.828125, + 9.871150493621826, + 9.73079776763916, + 9.764204502105713 + ], + "21": [ + 66.58646297454834, + 65.52946162223816, + 65.2813880443573, + 67.38516497612, + 68.25078797340393 + ], + "22": [ + 9.243720531463623, + 9.1110098361969, + 9.128684282302856, + 9.107020854949951, + 9.347189903259277 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/tpc-h.md b/docs/source/contributor-guide/benchmark-results/tpc-h.md index 2285489356..336deb7a7c 100644 --- a/docs/source/contributor-guide/benchmark-results/tpc-h.md +++ b/docs/source/contributor-guide/benchmark-results/tpc-h.md @@ -25,21 +25,21 @@ and we encourage you to run these benchmarks in your own environments. The tracking issue for improving TPC-H performance is [#391](https://github.com/apache/datafusion-comet/issues/391). 
-![](../../_static/images/benchmark-results/0.4.0/tpch_allqueries.png)
+![](../../_static/images/benchmark-results/0.5.0-SNAPSHOT-2025-01-09/tpch_allqueries.png)
 
-Here is a breakdown showing relative performance of Spark, Comet, and DataFusion for each query.
+Here is a breakdown showing relative performance of Spark and Comet for each query.
 
-![](../../_static/images/benchmark-results/0.4.0/tpch_queries_compare.png)
+![](../../_static/images/benchmark-results/0.5.0-SNAPSHOT-2025-01-09/tpch_queries_compare.png)
 
 The following chart shows how much Comet currently accelerates each query from the benchmark in relative terms.
 
-![](../../_static/images/benchmark-results/0.4.0/tpch_queries_speedup_rel.png)
+![](../../_static/images/benchmark-results/0.5.0-SNAPSHOT-2025-01-09/tpch_queries_speedup_rel.png)
 
 The following chart shows how much Comet currently accelerates each query from the benchmark in absolute terms.
 
-![](../../_static/images/benchmark-results/0.4.0/tpch_queries_speedup_abs.png)
+![](../../_static/images/benchmark-results/0.5.0-SNAPSHOT-2025-01-09/tpch_queries_speedup_abs.png)
 
 The raw results of these benchmarks in JSON format is available here:
 
-- [Spark](0.4.0/spark-tpch.json)
-- [Comet](0.4.0/comet-tpch.json)
+- [Spark](0.5.0/spark-tpch.json)
+- [Comet](0.5.0/comet-tpch.json)
diff --git a/docs/source/contributor-guide/benchmarking.md b/docs/source/contributor-guide/benchmarking.md
index bd280f47fd..173d598ac2 100644
--- a/docs/source/contributor-guide/benchmarking.md
+++ b/docs/source/contributor-guide/benchmarking.md
@@ -49,6 +49,8 @@ $SPARK_HOME/bin/spark-submit \
 
 ## Running Benchmarks Against Apache Spark with Apache DataFusion Comet Enabled
 
+### TPC-H
+
 ```shell
 $SPARK_HOME/bin/spark-submit \
     --master $SPARK_MASTER \
@@ -58,15 +60,19 @@ $SPARK_HOME/bin/spark-submit \
     --conf spark.executor.cores=8 \
     --conf spark.cores.max=8 \
     --conf spark.memory.offHeap.enabled=true \
-    --conf spark.memory.offHeap.size=32g \
+    --conf spark.memory.offHeap.size=16g \
     --jars $COMET_JAR \
     --conf spark.driver.extraClassPath=$COMET_JAR \
     --conf spark.executor.extraClassPath=$COMET_JAR \
     --conf spark.plugins=org.apache.spark.CometPlugin \
     --conf spark.comet.cast.allowIncompatible=true \
     --conf spark.comet.exec.replaceSortMergeJoin=true \
-    --conf spark.comet.exec.shuffle.enabled=true \
     --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
+    --conf spark.comet.exec.shuffle.enabled=true \
+    --conf spark.comet.exec.shuffle.mode=auto \
+    --conf spark.comet.exec.shuffle.enableFastEncoding=true \
+    --conf spark.comet.exec.shuffle.fallbackToColumnar=true \
+    --conf spark.comet.exec.shuffle.compression.codec=lz4 \
     tpcbench.py \
     --benchmark tpch \
     --data /mnt/bigdata/tpch/sf100/ \
@@ -74,6 +80,10 @@ $SPARK_HOME/bin/spark-submit \
     --iterations 3
 ```
 
+### TPC-DS
+
+For TPC-DS, use `spark.comet.exec.replaceSortMergeJoin=false`.
+
 ## Current Benchmark Results
 
 - [Benchmarks derived from TPC-H](benchmark-results/tpc-h)
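
The README hunk quotes total run times of 640 seconds for Spark and 331 seconds for Comet. These figures appear consistent with summing, over the 22 queries, the median of the five iterations recorded in `spark-tpch.json` and `comet-tpch.json`. The Python sketch below is one way to check that. Using the median as the aggregate is an assumption, and the helpers `load_results` and `total_median_runtime` are hypothetical names that are not part of the benchmark tooling.

```python
import json
import statistics
from pathlib import Path


def load_results(path):
    """Load one benchmark result file (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def total_median_runtime(results):
    """Sum the median runtime (seconds) of each query.

    Query timings are stored under numeric string keys ("1" .. "22");
    metadata keys such as "engine" and "spark_conf" are skipped.
    Assumption: the median is the aggregate behind the README totals.
    """
    return sum(
        statistics.median(times)
        for key, times in results.items()
        if key.isdigit()
    )


if __name__ == "__main__":
    # Paths of the result files added in this diff, relative to the repo root.
    base = Path("docs/source/contributor-guide/benchmark-results/0.5.0")
    spark_path = base / "spark-tpch.json"
    comet_path = base / "comet-tpch.json"
    if spark_path.exists() and comet_path.exists():
        spark_total = total_median_runtime(load_results(spark_path))
        comet_total = total_median_runtime(load_results(comet_path))
        print(
            f"Spark: {spark_total:.0f} s, Comet: {comet_total:.0f} s, "
            f"speedup: {spark_total / comet_total:.2f}x"
        )
```

Run from the repository root against the two files added here, this should give totals close to the 640 and 331 seconds quoted in the README, a little under a 2x speedup.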