
Conversation

@comphead
Contributor

Which issue does this PR close?

Related #1360.

Rationale for this change

The native engine's behavior often depends on external Spark job parameters (HDFS configuration, INT96 handling, etc.), but native code has no access to the Spark configuration.

What changes are included in this PR?

Extends ExecutionContext to carry Spark configuration parameters passed from the JVM.
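
A minimal JVM-side sketch of what that means, using hypothetical names (the real ExecutionContext lives in native code; this only illustrates the shape of the data handed across):

import org.apache.spark.SparkEnv

// Hypothetical container mirroring the extra fields handed to the native ExecutionContext.
// Names are illustrative only; they are not the PR's actual types.
case class NativeExecutionParams(
    debug: Boolean,
    explain: Boolean,
    sparkConfig: Map[String, String])

def buildNativeExecutionParams(debug: Boolean, explain: Boolean): NativeExecutionParams =
  NativeExecutionParams(debug, explain, SparkEnv.get.conf.getAll.toMap)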

How are these changes tested?

@comphead
Contributor Author

Some parts were rolled back in #1101

  debug = COMET_DEBUG_ENABLED.get(),
- explain = COMET_EXPLAIN_NATIVE_ENABLED.get())
+ explain = COMET_EXPLAIN_NATIVE_ENABLED.get(),
+ sparkConfig = SparkEnv.get.conf.getAll.toMap)
Contributor


Doesn't toMap return a Scala Map (and not a Java HashMap)? Does that translate correctly to a JMap?
Also, SparkConf.getAll returns all Spark confs. Should we filter down to Comet confs and only the Spark confs of interest?
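
A minimal sketch of what such a filter-and-convert step could look like on the JVM side (the prefix list and helper name are assumptions, not code from this PR):

import java.util.{Map => JMap}
import scala.jdk.CollectionConverters._ // Scala 2.13; use scala.collection.JavaConverters on 2.12
import org.apache.spark.SparkEnv

// Hypothetical helper: keep only Comet confs and Hadoop-related Spark confs,
// then convert the Scala Map into a java.util.Map so it crosses JNI as a JMap.
def nativeRelevantConf(): JMap[String, String] = {
  val relevantPrefixes = Seq("spark.comet.", "spark.hadoop.")
  SparkEnv.get.conf.getAll.toMap
    .filter { case (key, _) => relevantPrefixes.exists(key.startsWith) }
    .asJava
}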

Contributor Author


The comments are valid; this is already done in my local branch, I just haven't pushed it yet.

Contributor


I commented because I thought we were planning to get this in the 0.8.0 release :)

  task_attempt_id: jlong,
  debug_native: jboolean,
  explain_native: jboolean,
+ spark_conf: JObject,
Member


An alternate approach could be to encode the Spark config in protobuf format and pass those bytes into native code, rather than having native code make calls back to the JVM to iterate over the map.
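
A rough sketch of that approach on the JVM side, assuming a hypothetical generated protobuf message SparkConfigMap with a map<string, string> entries field (not part of Comet's actual proto definitions; shown only to illustrate the idea):

import scala.jdk.CollectionConverters._
import org.apache.spark.SparkEnv
// Hypothetical class generated from a proto definition such as:
//   message SparkConfigMap { map<string, string> entries = 1; }
import org.apache.comet.serde.SparkConfigMap

// Serialize the Spark conf once on the JVM side and hand the bytes to native code,
// avoiding JNI callbacks to iterate over the map.
def serializeSparkConf(): Array[Byte] =
  SparkConfigMap
    .newBuilder()
    .putAllEntries(SparkEnv.get.conf.getAll.toMap.asJava) // protobuf-java generates putAllEntries for a map field named "entries"
    .build()
    .toByteArray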

@comphead
Contributor Author

While testing I just realized that Apache Spark already sends defaultFS (with its scheme) into the logical plan, and from there to Comet.

So this test works, and Spark sends /tmp/2 to the native side prefixed as hdfs://namenode:9000/tmp/2

  test("Test V1 parquet scan uses native_datafusion with HDFS") {
    withSQLConf(
      CometConf.COMET_ENABLED.key -> "true",
      CometConf.COMET_EXEC_ENABLED.key -> "true",
      CometConf.COMET_NATIVE_SCAN_IMPL.key -> CometConf.SCAN_NATIVE_DATAFUSION,
      SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
      "fs.defaultFS" -> "hdfs://namenode:9000",
      "dfs.client.use.datanode.hostname" -> "true") {
      val df = spark.read.parquet("/tmp/2")
      df.show(false)
      df.explain("extended")
    }
  }

However, when running spark-shell, the setting has to be passed differently: --conf spark.hadoop.fs.defaultFS=hdfs://namenode:9000
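
For reference, the same setting can also be applied programmatically when building the session; Spark copies spark.hadoop.* keys into the Hadoop Configuration (values taken from the example above, app name is hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("comet-hdfs-example") // hypothetical app name
  .config("spark.hadoop.fs.defaultFS", "hdfs://namenode:9000")
  .config("spark.hadoop.dfs.client.use.datanode.hostname", "true")
  .getOrCreate()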

@comphead
Contributor Author

@parthchandra @andygrove I'm planning to close this PR since Spark transparently sends the conf to the native side; however, the DataFusion-style Spark conf code in this PR is kept for the future in case we still need it.

@comphead
Contributor Author

Code in 3e93995
