@techaddict
Contributor

What changes were proposed in this pull request?

Add Protobuf serializer for RDDOperationGraphWrapper

Why are the changes needed?

Support fast and compact serialization/deserialization for RDDOperationGraphWrapper over RocksDB.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New unit tests (UT).

@github-actions bot added the CORE label on Dec 18, 2022
@techaddict changed the title from "[SPARK-41429] Protobuf serializer for RDDOperationGraphWrapper" to "[SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper" on Dec 19, 2022
@techaddict force-pushed the SPARK-41429-RDDOperationGraphWrapper branch from cf450e3 to 877faaf on December 23, 2022 13:59
@techaddict
Contributor Author

@gengliangwang @LuciferYang rebased this one, and it's ready for review

val wrapper = StoreTypes.RDDOperationGraphWrapper.parseFrom(bytes)
new RDDOperationGraphWrapper(
  stageId = wrapper.getStageId,
  edges = wrapper.getEdgesList.asScala.map(deserializeRDDOperationEdge).toSeq,
Contributor

I found many toSeq calls in the status.protobuf package that are redundant on Scala 2.12. They are there for Scala 2.13 compatibility, because Seq means collection.Seq in Scala 2.12 and immutable.Seq in Scala 2.13.

The conversion does not affect Scala 2.12, but on Scala 2.13 it copies the elements into an immutable collection, which makes Scala 2.13 slower than Scala 2.12. Since these are internal definitions of Spark, I suggest explicitly defining them as scala.collection.Seq to make no performance difference between Scala 2.12 and Scala 2.13. Do you think that's OK? @gengliangwang
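
To make the point concrete, here is a minimal sketch, not code from this PR. `ToSeqSketch`, `javaList`, `asCollectionSeq` and `asDefaultSeq` are hypothetical names, and the `java.util.List` stands in for what a repeated protobuf field getter returns. It should compile on both Scala 2.12 and 2.13 (with a deprecation warning for `JavaConverters` on 2.13).

```scala
import java.util.{ArrayList => JArrayList}
// Deprecated on Scala 2.13, but available on both 2.12 and 2.13.
import scala.collection.JavaConverters._

object ToSeqSketch {
  // Hypothetical stand-in for the java.util.List returned by a repeated-field getter.
  def javaList(): java.util.List[Integer] = {
    val list = new JArrayList[Integer]()
    list.add(1); list.add(2); list.add(3)
    list
  }

  // Result type declared as scala.collection.Seq: the mutable.Buffer produced by
  // asScala.map already conforms on both Scala versions, so no toSeq is needed.
  def asCollectionSeq(): scala.collection.Seq[Int] =
    javaList().asScala.map(_.intValue)

  // Result type declared as plain Seq: on 2.13 this means immutable.Seq, so the
  // trailing toSeq is required there and copies the buffer's elements.
  def asDefaultSeq(): Seq[Int] =
    javaList().asScala.map(_.intValue).toSeq
}
```

Both methods return the same elements; the only difference is whether Scala 2.13 has to build a second, immutable collection.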

Contributor Author

@LuciferYang Agree, and since these are private[spark], it shouldn't be an issue.

Member

"explicitly defining them as scala.collection.Seq to make no performance difference"

@LuciferYang could you explain the details?

Contributor

@LuciferYang Dec 28, 2022

https://github.com/apache/spark/pull/39215/files is doing some refactoring work, which does not block this one.

      toId = edge.getToId)
  }

  private def serializeDeterministicLevel(d: DeterministicLevel.Value):
Contributor

This is only used in one place, so maybe inlining it is OK.

Contributor Author

DeterministicLevel is used in multiple places; we might need to re-use it.
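
For context, a standalone helper of this kind might look like the sketch below. This is an assumption-laden illustration, not the code in this PR: `DeterministicLevelMapping`, `ProtoLevel` and its case objects are hypothetical stand-ins for the protobuf-generated enum, and the nested `DeterministicLevel` mirrors Spark's Scala Enumeration so that the example is self-contained.

```scala
object DeterministicLevelMapping {
  // Stand-in for org.apache.spark.rdd.DeterministicLevel, a Scala Enumeration.
  object DeterministicLevel extends Enumeration {
    val DETERMINATE, UNORDERED, INDETERMINATE = Value
  }

  // Hypothetical stand-in for the protobuf-generated enum; the names are assumptions.
  sealed trait ProtoLevel
  case object ProtoDeterminate extends ProtoLevel
  case object ProtoUnordered extends ProtoLevel
  case object ProtoIndeterminate extends ProtoLevel

  // Scala value -> serialized form. Kept as a separate method so that other
  // serializers could reuse it instead of repeating the match.
  def serialize(d: DeterministicLevel.Value): ProtoLevel = d match {
    case DeterministicLevel.DETERMINATE => ProtoDeterminate
    case DeterministicLevel.UNORDERED => ProtoUnordered
    case DeterministicLevel.INDETERMINATE => ProtoIndeterminate
  }

  // Serialized form -> Scala value, the inverse of serialize.
  def deserialize(p: ProtoLevel): DeterministicLevel.Value = p match {
    case ProtoDeterminate => DeterministicLevel.DETERMINATE
    case ProtoUnordered => DeterministicLevel.UNORDERED
    case ProtoIndeterminate => DeterministicLevel.INDETERMINATE
  }
}
```

Keeping the two directions side by side makes it easy to check that every value round-trips, which matters more once a second caller appears.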

@techaddict
Contributor Author

@gengliangwang addressed all the comments

repeated RDDOperationEdge outgoing_edges = 3;
repeated RDDOperationEdge incoming_edges = 4;
RDDOperationClusterWrapper root_cluster = 5;
enum DeterministicLevel {
Contributor

If DeterministicLevel is only used by RDDOperationNode, should we move it into RDDOperationNode? Will DeterministicLevel be used elsewhere in the future?

Member

+1, makes sense

@gengliangwang
Member

@techaddict can you resolve the conflicts?

@techaddict
Contributor Author

@gengliangwang updated the PR

@gengliangwang
Member

Thanks, merging to master
