-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper #39110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper #39110
Conversation
cf450e3 to
877faaf
Compare
|
@gengliangwang @LuciferYang rebased this one, and it's ready for review |
| val wrapper = StoreTypes.RDDOperationGraphWrapper.parseFrom(bytes) | ||
| new RDDOperationGraphWrapper( | ||
| stageId = wrapper.getStageId, | ||
| edges = wrapper.getEdgesList.asScala.map(deserializeRDDOperationEdge).toSeq, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that there are many redundant toSeq for Scala 2.12 in status.protobuf package, this is for Scala 2.13 compatibility due to Seq represents collection.Seq in Scala 2.12 and immutable.Seq in Scala 2.13.
This conversion will not affect Scala 2.12, but will make the performance of Scala 2.13 worse than Scala 2.12. Since these are internal definitions of Spark, I suggest explicitly defining them as scala.collection.Seq to make no performance difference between Scala 2.12 and Scala 2.13. Do you think it's ok? @gengliangwang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LuciferYang Agree, and since these are private[spark], it shouldn't be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicitly defining them as scala.collection.Seq to make no performance difference
@LuciferYang could you explain details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/spark/pull/39215/files is doing some refactor work, which does not block this one
| toId = edge.getToId) | ||
| } | ||
|
|
||
| private def serializeDeterministicLevel(d: DeterministicLevel.Value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only used by one place, maybe inline is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeterministicLevel is used in multiple places; we might need to re-use it.
core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto
Outdated
Show resolved
Hide resolved
core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/status/protobuf/RDDOperationGraphWrapperSerializer.scala
Outdated
Show resolved
Hide resolved
|
@gengliangwang addressed all the comments |
| repeated RDDOperationEdge outgoing_edges = 3; | ||
| repeated RDDOperationEdge incoming_edges = 4; | ||
| RDDOperationClusterWrapper root_cluster = 5; | ||
| enum DeterministicLevel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if DeterministicLevel only used by RDDOperationNode, should we move it into RDDOperationNode? Will DeterministicLevel be used elsewhere in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LuciferYang @gengliangwang done 👍🏼
|
@techaddict can you resolve the conflicts? |
|
@gengliangwang updated the PR |
|
Thanks, merging to master |
What changes were proposed in this pull request?
Add Protobuf serializer for RDDOperationGraphWrapper
Why are the changes needed?
Support fast and compact serialization/deserialization for RDDOperationGraphWrapper over RocksDB.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New UT