
Conversation

@techaddict
Contributor

What changes were proposed in this pull request?

Add Protobuf serializer for SparkPlanGraphWrapper

Why are the changes needed?

Support fast and compact serialization/deserialization for SparkPlanGraphWrapper over RocksDB.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New UT
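
For context, a minimal sketch of the round trip such a serializer performs, assuming the generated StoreTypes.SparkPlanGraphWrapper message that shows up in the build log further down; the single execution_id field and its setter name are illustrative only, and the real wrapper carries more data than this:

// Sketch only: serialize/deserialize via the protobuf-generated
// StoreTypes.SparkPlanGraphWrapper message. The execution_id field
// is an assumption used for illustration.
import org.apache.spark.status.protobuf.StoreTypes

object SparkPlanGraphWrapperSerDeSketch {
  def serialize(executionId: Long): Array[Byte] = {
    StoreTypes.SparkPlanGraphWrapper.newBuilder()
      .setExecutionId(executionId)
      .build()
      .toByteArray
  }

  def deserialize(bytes: Array[Byte]): Long = {
    StoreTypes.SparkPlanGraphWrapper.parseFrom(bytes).getExecutionId
  }
}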

sql/core/pom.xml Outdated
</dependency>
</dependencies>
</profile>
<profile>
Contributor
@LuciferYang LuciferYang Dec 22, 2022

I think we should also shade and relocate protobuf-java in the sql module, like

spark/core/pom.xml

Lines 690 to 696 in 73593d8

<relocation>
  <pattern>com.google.protobuf</pattern>
  <shadedPattern>${spark.shade.packageName}.spark-core.protobuf</shadedPattern>
  <includes>
    <include>com.google.protobuf.**</include>
  </includes>
</relocation>

Contributor
@LuciferYang LuciferYang Dec 22, 2022

Would you mind adding it to this PR? This is some necessary initial work for the sql module @techaddict

both pom.xml and SparkBuild.scala
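
For the SparkBuild.scala side, one way to express the same relocation is an sbt-assembly shade rule along these lines (a sketch only; the shaded package name and where the rule is attached are assumptions, not necessarily what this PR ends up doing):

// Sketch: relocate protobuf-java classes for sql/core the same way spark-core
// does in Maven. The target package name is a placeholder.
import sbtassembly._
import sbtassembly.AssemblyPlugin.autoImport._

lazy val sqlProtobufShadeRules = Seq(
  assembly / assemblyShadeRules := Seq(
    ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.spark_sql.protobuf.@1").inAll
  )
)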

Contributor
@LuciferYang LuciferYang Dec 22, 2022

For SBT we can refer to:

lazy val settings = Seq(
  // Setting version for the protobuf compiler. This has to be propagated to every sub-project
  // even if the project is not using it.
  PB.protocVersion := BuildCommons.protoVersion,
  // For some reason the resolution from the imported Maven build does not work for some
  // of these dependencies that we need to shade later on.
  libraryDependencies ++= {
    Seq(
      "com.google.protobuf" % "protobuf-java" % protoVersion % "protobuf"
    )
  },
  (Compile / PB.targets) := Seq(
    PB.gens.java -> (Compile / sourceManaged).value
  )
) ++ {
  val sparkProtocExecPath = sys.props.get("spark.protoc.executable.path")
  if (sparkProtocExecPath.isDefined) {
    Seq(
      PB.protocExecutable := file(sparkProtocExecPath.get)
    )
  } else {
    Seq.empty
  }
}

Otherwise, the GA task may not be able to compile the proto files.
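
To wire those shared settings into the sql module on the SBT side, something like the following could be added (the ProtobufSettings holder name is hypothetical and SparkBuild.scala defines its projects differently; this only shows the shape):

// Sketch: apply the shared protoc/protobuf settings above to the sql/core project.
// `ProtobufSettings` is a placeholder for whatever object holds that Seq.
lazy val sql = (project in file("sql/core"))
  .settings(ProtobufSettings.settings: _*)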

Member

The old protobuf deps in the root pom cause so many troubles...

Contributor
@LuciferYang LuciferYang Dec 22, 2022

spark/pom.xml

Lines 821 to 831 in e56f31d

<!-- In theory we need not directly depend on protobuf since Spark does not directly
     use it. However, when building with Hadoop/YARN 2.2 Maven doesn't correctly bump
     the protobuf version up from the one Mesos gives. For now we include this variable
     to explicitly bump the version when building with YARN. It would be nice to figure
     out why Maven can't resolve this correctly (like SBT does). -->
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>${protobuf.hadoopDependency.version}</version>
  <scope>${hadoop.deps.scope}</scope>
</dependency>

From the comments, maybe we can try to remove the old protobuf dependency; Mesos uses the shaded-protobuf one now. Let me do some tests with different Hadoop profiles first.

Member

Thanks for looking into it!

@LuciferYang
Contributor

Since this PR will include some initialization work for the sql module, I think we should push it forward first to facilitate the submission of other PRs related to the sql module.

@gengliangwang
Member

@techaddict could you focus on this one first?

@techaddict
Contributor Author

@LuciferYang @gengliangwang sounds good

@codecov-commenter

codecov-commenter commented Dec 22, 2022

Codecov Report

Merging #39164 (e56f31d) into master (e56f31d) will not change coverage.
The diff coverage is n/a.

❗ Current head e56f31d differs from pull request most recent head 25b63fa. Consider uploading reports for the commit 25b63fa to get more accurate results

@@           Coverage Diff           @@
##           master   #39164   +/-   ##
=======================================
  Coverage   76.50%   76.50%           
=======================================
  Files         249      249           
  Lines       61187    61187           
  Branches     9069     9069           
=======================================
  Hits        46811    46811           
  Misses      13097    13097           
  Partials     1279     1279           
Flag        Coverage Δ
unittests   76.48% <0.00%> (ø)

Flags with carried forward coverage won't be shown.


@techaddict
Contributor Author

@gengliangwang @LuciferYang addressed all the comments

@gengliangwang
Member

@techaddict How about following #39139? That way, we won't need another shading step in the SQL module.

@github-actions github-actions bot added CORE and removed BUILD labels Dec 22, 2022
@techaddict
Contributor Author

@gengliangwang done

@gengliangwang
Member

@LuciferYang This PR passes without the protobuf-java dep. Probably there is no Maven build in the GA tests?

@LuciferYang
Contributor

Yes, GA only runs sbt tests now, while Maven only runs the build with Java 11 & 17, with no tests.

@LuciferYang
Contributor

LuciferYang commented Dec 23, 2022

Run

gh pr checkout 39164  
mvn clean test -pl sql/core -am -Dtest=none -DwildcardSuites=org.apache.spark.status.protobuf.sql.KVStoreProtobufSerializerSuite

locally

[INFO] --- scala-maven-plugin:4.8.0:compile (scala-compile-first) @ spark-sql_2.12 ---
[INFO] Compiler bridge file: /Users/yangjie01/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.8.0-bin_2.12.17__52.0-1.8.0_20221110T195421.jar
[INFO] compiler plugin: BasicArtifact(com.github.ghik,silencer-plugin_2.12.17,1.7.10,null)
[INFO] compiling 576 Scala sources and 75 Java sources to /spark-source/sql/core/target/scala-2.12/classes ...
[ERROR] [Error] /spark-source/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala:34: Class com.google.protobuf.GeneratedMessageV3 not found - continuing with a stub.
[ERROR] [Error] : Unable to locate class corresponding to inner class entry for Builder in owner com.google.protobuf.GeneratedMessageV3
[ERROR] [Error] /spark-source/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala:37: Class com.google.protobuf.GeneratedMessageV3 not found - continuing with a stub.
[ERROR] [Error] /spark-source/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala:40: Class com.google.protobuf.GeneratedMessageV3 not found - continuing with a stub.
[ERROR] [Error] /spark-source/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala:42: value toByteArray is not a member of org.apache.spark.status.protobuf.StoreTypes.SparkPlanGraphWrapper
[ERROR] [Error] : Unable to locate class corresponding to inner class entry for Builder in owner com.google.protobuf.GeneratedMessageV3
[ERROR] [Error] /spark-source/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala:58: Class com.google.protobuf.GeneratedMessageV3 not found - continuing with a stub.
[ERROR] [Error] /spark-source/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala:59: Class com.google.protobuf.GeneratedMessageV3 not found - continuing with a stub.
[ERROR] [Error] : Unable to locate class corresponding to inner class entry for Builder in owner com.google.protobuf.GeneratedMessageV3
[ERROR] [Error] : Unable to locate class corresponding to inner class entry for Builder in owner com.google.protobuf.GeneratedMessageV3
[ERROR] [Error] /spark-source/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala:94: Class com.google.protobuf.GeneratedMessageV3 not found - continuing with a stub.
[ERROR] [Error] : Unable to locate class corresponding to inner class entry for Builder in owner com.google.protobuf.GeneratedMessageV3
[ERROR] [Error] : Unable to locate class corresponding to inner class entry for Builder in owner com.google.protobuf.GeneratedMessageV3
[ERROR] 13 errors found

@github-actions github-actions bot added the BUILD label Dec 23, 2022
@techaddict
Contributor Author

@LuciferYang I see the issue too; it looks like the workaround is to add a Maven dependency.

@github-actions github-actions bot removed the BUILD label Dec 24, 2022
@techaddict
Contributor Author

@gengliangwang @LuciferYang rebased this, and it's ready for review

Contributor
@LuciferYang LuciferYang left a comment

only one comment

+1, LGTM

@gengliangwang
Member

Thanks, merging to master
