
java.lang.ExceptionInInitializerError: Could not find comet-git-info.properties #1026

@BjarkeTornager

Description


Describe the bug

I followed the building-from-source guide since I am on macOS. The only difference is that I ran the build for Spark 3.3: make release-nogit PROFILES="-Pspark-3.3".

With the jar produced by the build, I can run Spark with Comet in the terminal just fine, like this:

export COMET_JAR=apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar

$SPARK_HOME/bin/spark-shell \
    --jars $COMET_JAR \
    --conf spark.driver.extraClassPath=$COMET_JAR \
    --conf spark.executor.extraClassPath=$COMET_JAR \
    --conf spark.plugins=org.apache.spark.CometPlugin \
    --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
    --conf spark.comet.explainFallback.enabled=true \
    --conf spark.memory.offHeap.enabled=true \
    --conf spark.memory.offHeap.size=16g

However, when I add the Comet options to the Spark configuration of my own project like this:

"spark.jars": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
"spark.driver.extraClassPath": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
"spark.executor.extraClassPath": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
"spark.plugins": "org.apache.spark.CometPlugin",
"spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
"spark.comet.explainFallback.enabled": "true",
"spark.memory.offHeap.enabled": "true",
"spark.memory.offHeap.size": "16g",
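For reference, in a pytest setup these options are typically applied when the SparkSession is built. A minimal sketch of how that might look (the app name and the way the options are passed are assumptions, not taken from the report):

```python
# The jar path and Comet options exactly as in the report above.
COMET_JAR = "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar"

comet_conf = {
    "spark.jars": COMET_JAR,
    "spark.driver.extraClassPath": COMET_JAR,
    "spark.executor.extraClassPath": COMET_JAR,
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
    "spark.comet.explainFallback.enabled": "true",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "16g",
}

# Applied to a SparkSession builder (requires pyspark; "comet-test" is a
# made-up app name for illustration):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("comet-test")
# for key, value in comet_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```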

and then run a Spark test using pytest (which always succeeds without the Comet configuration above), I get the following exception:

---------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------
24/10/20 07:25:32 WARN CometSparkSessionExtensions$CometExecRule: Comet cannot execute some parts of this plan natively (set spark.comet.explainFallback.enabled=false to disable this logging):
HashAggregate
+-  Exchange [COMET: Exchange is not native because the following children are not native (HashAggregate)]
   +-  HashAggregate [COMET: HashAggregate is not native because the following children are not native (Project)]
      +-  Project [COMET: Project is not native because the following children are not native (BroadcastHashJoin)]
         +-  BroadcastHashJoin [COMET: BroadcastHashJoin is not native because the following children are not native (Scan ExistingRDD, BroadcastExchange)]
            :-  Scan ExistingRDD [COMET: Scan ExistingRDD is not supported]
            +- BroadcastExchange
               +- CometProject
                  +- CometFilter
                     +- CometScanWrapper

24/10/20 07:25:32 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.ExceptionInInitializerError
	at org.apache.comet.package$.<init>(package.scala:90)
	at org.apache.comet.package$.<clinit>(package.scala)
	at org.apache.comet.vector.NativeUtil.<init>(NativeUtil.scala:48)
	at org.apache.comet.CometExecIterator.<init>(CometExecIterator.scala:52)
	at org.apache.spark.sql.comet.CometNativeExec.createCometExecIter$1(operators.scala:223)
	at org.apache.spark.sql.comet.CometNativeExec.$anonfun$doExecuteColumnar$6(operators.scala:298)
	at org.apache.spark.sql.comet.ZippedPartitionsRDD.compute(ZippedPartitionsRDD.scala:43)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.comet.CometRuntimeException: Could not find comet-git-info.properties
	at org.apache.comet.package$CometBuildInfo$.<init>(package.scala:57)
	at org.apache.comet.package$CometBuildInfo$.<clinit>(package.scala)
	... 23 more

Searching the datafusion-comet source code, the error appears to come from the CometBuildInfo initializer in package.scala (line 57 in the stack trace above).
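Since the exception says the resource could not be found on the classpath, one way to narrow the problem down is to check whether the built jar actually contains comet-git-info.properties (jar files are zip archives, so the standard zipfile module can inspect them; the jar path in the commented example is the one from this report):

```python
import zipfile

def jar_contains(jar_path: str, resource: str) -> bool:
    """Return True if the jar (a zip archive) contains an entry ending in `resource`."""
    with zipfile.ZipFile(jar_path) as jar:
        return any(name.endswith(resource) for name in jar.namelist())

# Example usage against the jar from this report:
# jar_contains(
#     "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
#     "comet-git-info.properties",
# )
```

If this returns False, the properties file was never packaged, which would point at the release-nogit build path rather than at the runtime configuration.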

Details of environment:

  • macOS Sonoma version 14.6
  • Spark 3.3.4 using pyspark
  • Scala version 2.12

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Labels

bug (Something isn't working)