[SPARK-41432][UI][SQL] Protobuf serializer for SparkPlanGraphWrapper #39164
sql/core/pom.xml
Outdated
      </dependency>
    </dependencies>
  </profile>
  <profile>
I think we should also shade and relocate protobuf-java in the sql module, like
Lines 690 to 696 in 73593d8
<relocation>
  <pattern>com.google.protobuf</pattern>
  <shadedPattern>${spark.shade.packageName}.spark-core.protobuf</shadedPattern>
  <includes>
    <include>com.google.protobuf.**</include>
  </includes>
</relocation>
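To make the effect of the relocation rule concrete, here is a minimal Scala sketch that mimics how a class name matching `com.google.protobuf.**` would be rewritten under the `shadedPattern` above. It is an illustration only, not how the Maven Shade plugin is implemented: `RelocationSketch`, `relocate`, and the `org.example.shaded` value passed for `${spark.shade.packageName}` are all assumptions made up for this example.

```scala
// Illustrative only: mimics the class-name rewrite a shade relocation performs.
// `shadePackage` stands in for ${spark.shade.packageName}; its real value is not
// stated in this thread, so callers must supply a placeholder.
object RelocationSketch {
  // Matches <pattern>com.google.protobuf</pattern> from the quoted pom.xml snippet.
  val pattern = "com.google.protobuf"

  def relocate(className: String, shadePackage: String): String = {
    // Matches <shadedPattern>${spark.shade.packageName}.spark-core.protobuf</shadedPattern>.
    val shadedPattern = s"$shadePackage.spark-core.protobuf"
    // Only classes in the package or its subpackages are rewritten.
    if (className == pattern || className.startsWith(pattern + "."))
      shadedPattern + className.stripPrefix(pattern)
    else className
  }

  def main(args: Array[String]): Unit = {
    println(relocate("com.google.protobuf.Message", "org.example.shaded"))
    println(relocate("com.google.common.base.Preconditions", "org.example.shaded"))
  }
}
```

The point of the rewrite is that Spark's bundled protobuf classes live under a Spark-specific package, so they cannot clash with a different protobuf version on the user's classpath.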
Would you mind adding that to this PR? It is necessary initial work for the sql module @techaddict
It needs to go in both pom.xml and SparkBuild.scala
For SBT, we can refer to
spark/project/SparkBuild.scala
Lines 618 to 631 in 73593d8
lazy val settings = Seq(
  // Setting version for the protobuf compiler. This has to be propagated to every sub-project
  // even if the project is not using it.
  PB.protocVersion := BuildCommons.protoVersion,
  // For some reason the resolution from the imported Maven build does not work for some
  // of these dependendencies that we need to shade later on.
  libraryDependencies ++= {
    Seq(
      "com.google.protobuf" % "protobuf-java" % protoVersion % "protobuf"
    )
  },
  (Compile / PB.targets) := Seq(
    PB.gens.java -> (Compile / sourceManaged).value
  ),
spark/project/SparkBuild.scala
Lines 645 to 654 in 73593d8
) ++ {
  val sparkProtocExecPath = sys.props.get("spark.protoc.executable.path")
  if (sparkProtocExecPath.isDefined) {
    Seq(
      PB.protocExecutable := file(sparkProtocExecPath.get)
    )
  } else {
    Seq.empty
  }
}
Otherwise, the GA task may fail to compile the proto file
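As a rough sketch of what analogous settings for the sql module might look like, the fragment below reuses only the sbt-protoc keys that appear in the two quoted snippets. The object name `SqlProtobufSketch` and the `protoVersion` parameter are hypothetical, and the import of `sbtprotoc.ProtocPlugin.autoImport._` is an assumption about how the build file brings `PB` into scope; this is not the change that was actually merged.

```scala
import sbt._
import sbt.Keys._
import sbtprotoc.ProtocPlugin.autoImport._ // assumption: this is where `PB` comes from

// Hypothetical settings object for the sql module; the name is illustrative only.
object SqlProtobufSketch {
  // `protoVersion` is passed in here; the quoted snippet reads it from BuildCommons.
  def settings(protoVersion: String): Seq[Setting[_]] = Seq(
    // Pin the protoc version, as in the quoted core settings.
    PB.protocVersion := protoVersion,
    // Make protobuf-java available to the protobuf configuration.
    libraryDependencies += "com.google.protobuf" % "protobuf-java" % protoVersion % "protobuf",
    // Generate Java sources into the managed source directory.
    (Compile / PB.targets) := Seq(
      PB.gens.java -> (Compile / sourceManaged).value
    )
  ) ++ sys.props.get("spark.protoc.executable.path").map { path =>
    // Use a user-supplied protoc binary only when the system property is set.
    PB.protocExecutable := file(path)
  }.toSeq
}
```

Mapping over the `Option` returned by `sys.props.get` is equivalent to the `isDefined` / `get` branch in the quoted code: it yields one extra setting when the property is present and none otherwise.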
The old protobuf dependency in the root pom causes so much trouble...
Lines 821 to 831 in e56f31d
<!-- In theory we need not directly depend on protobuf since Spark does not directly
     use it. However, when building with Hadoop/YARN 2.2 Maven doesn't correctly bump
     the protobuf version up from the one Mesos gives. For now we include this variable
     to explicitly bump the version when building with YARN. It would be nice to figure
     out why Maven can't resolve this correctly (like SBT does). -->
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>${protobuf.hadoopDependency.version}</version>
  <scope>${hadoop.deps.scope}</scope>
</dependency>
Judging from the comments, maybe we can try removing the old protobuf dependency. Mesos uses the shaded-protobuf one now. Let me run some tests with different Hadoop profiles first.
Thanks for looking into it!
Since this PR will include some initialization work for the sql module, I think we should prioritize it first, to make it easier to submit other PRs related to the sql module.

@techaddict could you focus on this one first?

@LuciferYang @gengliangwang sounds good
Codecov Report
@@ Coverage Diff @@
## master #39164 +/- ##
=======================================
Coverage 76.50% 76.50%
=======================================
Files 249 249
Lines 61187 61187
Branches 9069 9069
=======================================
Hits 46811 46811
Misses 13097 13097
Partials 1279 1279
sql/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto
Outdated
@gengliangwang @LuciferYang addressed all the comments

@techaddict How about following #39139? That way, we don't need another shading in the SQL module.

@gengliangwang done

@LuciferYang This PR passes without the protobuf-java dep. Probably there is no Maven build in the GA tests?

Yes, GA only runs sbt tests now, while Maven only runs the build with Java 11 & 17, without tests.

Run locally

@LuciferYang I see the issue, too; it looks like the workaround is to add a Maven dependency

@gengliangwang @LuciferYang rebased this, and it's ready for review
...re/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala
Outdated
only one comment
+1, LGTM
Thanks, merging to master
What changes were proposed in this pull request?
Add Protobuf serializer for SparkPlanGraphWrapper
Why are the changes needed?
Support fast and compact serialization/deserialization for SparkPlanGraphWrapper over RocksDB.
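To illustrate the general shape of such a serializer, the sketch below round-trips a small plan-graph-like object through a compact binary form. It is a simplified stand-in under stated assumptions: `NodeSketch`, `PlanGraphSketch`, and `PlanGraphSketchSerializer` are hypothetical names, their fields are invented for the example, and plain `java.io` streams take the place of the protobuf messages generated from `store_types.proto`, which are not shown in this thread.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical, simplified stand-ins; not Spark's actual classes or fields.
final case class NodeSketch(id: Long, name: String)
final case class PlanGraphSketch(executionId: Long, nodes: Seq[NodeSketch])

// Hypothetical serializer shape: one object per stored type, bytes in and bytes out.
// java.io data streams stand in for the generated protobuf builders and parsers.
object PlanGraphSketchSerializer {
  def serialize(graph: PlanGraphSketch): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeLong(graph.executionId)
    // Write the element count first so the reader knows how many nodes follow.
    out.writeInt(graph.nodes.size)
    graph.nodes.foreach { node =>
      out.writeLong(node.id)
      out.writeUTF(node.name)
    }
    out.flush()
    buffer.toByteArray
  }

  def deserialize(bytes: Array[Byte]): PlanGraphSketch = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val executionId = in.readLong()
    val count = in.readInt()
    // Fields are read back in exactly the order they were written.
    val nodes = Seq.fill(count)(NodeSketch(in.readLong(), in.readUTF()))
    PlanGraphSketch(executionId, nodes)
  }
}
```

A key-value store such as RocksDB only sees the resulting byte array, so the serializer is the single place that decides how compact the stored value is and how quickly it can be read back.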
Does this PR introduce any user-facing change?
No
How was this patch tested?
New UT