Closed
Labels
bug (Something isn't working)
Description
Describe the bug
I have followed the building-from-source guide since I am on macOS. The only difference is that I ran the build with version 3.3: make release-nogit PROFILES="-Pspark-3.3".
With the jar produced by the build, I can run Spark with Comet fine in the terminal like this:
export COMET_JAR=apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar
$SPARK_HOME/bin/spark-shell \
--jars $COMET_JAR \
--conf spark.driver.extraClassPath=$COMET_JAR \
--conf spark.executor.extraClassPath=$COMET_JAR \
--conf spark.plugins=org.apache.spark.CometPlugin \
--conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
--conf spark.comet.explainFallback.enabled=true \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=16g
However, when I add Comet to the Spark config options in my own project like this:
"spark.jars": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
"spark.driver.extraClassPath": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
"spark.executor.extraClassPath": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
"spark.plugins": "org.apache.spark.CometPlugin",
"spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
"spark.comet.explainFallback.enabled": "true",
"spark.memory.offHeap.enabled": "true",
"spark.memory.offHeap.size": "16g",
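For context, a sketch of how this config is applied in code (the helper name and app name are illustrative, not from my actual project; the jar path mirrors the build output above):

```python
# Comet settings from the issue, applied programmatically the way a
# pytest-driven Spark session would receive them.
COMET_JAR = "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar"

COMET_CONF = {
    "spark.jars": COMET_JAR,
    "spark.driver.extraClassPath": COMET_JAR,
    "spark.executor.extraClassPath": COMET_JAR,
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
    "spark.comet.explainFallback.enabled": "true",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "16g",
}

def build_comet_session(app_name="comet-test"):
    """Build a SparkSession with every Comet option applied (illustrative helper)."""
    # Deferred import so the module can be inspected without a Spark install.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in COMET_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```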
and then run a Spark test using pytest (which always succeeds without the Comet configuration above), I get the following exception:
---------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------
24/10/20 07:25:32 WARN CometSparkSessionExtensions$CometExecRule: Comet cannot execute some parts of this plan natively (set spark.comet.explainFallback.enabled=false to disable this logging):
HashAggregate
+- Exchange [COMET: Exchange is not native because the following children are not native (HashAggregate)]
+- HashAggregate [COMET: HashAggregate is not native because the following children are not native (Project)]
+- Project [COMET: Project is not native because the following children are not native (BroadcastHashJoin)]
+- BroadcastHashJoin [COMET: BroadcastHashJoin is not native because the following children are not native (Scan ExistingRDD, BroadcastExchange)]
:- Scan ExistingRDD [COMET: Scan ExistingRDD is not supported]
+- BroadcastExchange
+- CometProject
+- CometFilter
+- CometScanWrapper
24/10/20 07:25:32 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.ExceptionInInitializerError
at org.apache.comet.package$.<init>(package.scala:90)
at org.apache.comet.package$.<clinit>(package.scala)
at org.apache.comet.vector.NativeUtil.<init>(NativeUtil.scala:48)
at org.apache.comet.CometExecIterator.<init>(CometExecIterator.scala:52)
at org.apache.spark.sql.comet.CometNativeExec.createCometExecIter$1(operators.scala:223)
at org.apache.spark.sql.comet.CometNativeExec.$anonfun$doExecuteColumnar$6(operators.scala:298)
at org.apache.spark.sql.comet.ZippedPartitionsRDD.compute(ZippedPartitionsRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.comet.CometRuntimeException: Could not find comet-git-info.properties
at org.apache.comet.package$CometBuildInfo$.<init>(package.scala:57)
at org.apache.comet.package$CometBuildInfo$.<clinit>(package.scala)
... 23 more
Searching the datafusion-comet source code, it looks like the error comes from here.
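Since the root cause is "Could not find comet-git-info.properties", one quick diagnostic (a sketch of my own, not an official tool) is to check whether that properties file actually made it into the built jar. A jar is a zip archive, so Python's stdlib can list it; the demo below builds a throwaway jar-like zip, but the real check would point at the Comet jar path:

```python
import os
import tempfile
import zipfile

def has_entry(jar_path, entry="comet-git-info.properties"):
    """Return True if the jar (a zip archive) contains the entry anywhere."""
    with zipfile.ZipFile(jar_path) as zf:
        return any(name.endswith(entry) for name in zf.namelist())

# Demo with a temporary jar-like zip; in practice, pass the Comet jar path.
with tempfile.TemporaryDirectory() as d:
    jar = os.path.join(d, "demo.jar")
    with zipfile.ZipFile(jar, "w") as zf:
        zf.writestr("comet-git-info.properties", "git.branch=main\n")
    print(has_entry(jar))  # True
```

The equivalent shell one-liner would be `unzip -l "$COMET_JAR" | grep comet-git-info`.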
Details of environment:
- macOS Sonoma version 14.6
- Spark 3.3.4 using pyspark
- Scala version 2.12
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response