Skip to content

Comments

[SPARK-41768][CORE] Refactor the definition of enum to follow with the code style#39286

Closed
panbingkun wants to merge 6 commits intoapache:masterfrom
panbingkun:SPARK-41768
Closed

[SPARK-41768][CORE] Refactor the definition of enum to follow with the code style#39286
panbingkun wants to merge 6 commits intoapache:masterfrom
panbingkun:SPARK-41768

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Dec 29, 2022

What changes were proposed in this pull request?

The pr aims to refactor the definition of enum in UI protobuf serializer to follow with the code style.

Why are the changes needed?

Following code style:
https://developers.google.com/protocol-buffers/docs/style#enums
image

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GA
Existed UT.

private def PREFIX = "JOB_EXECUTION_STATUS_"

private[protobuf] def serialize(input: JobExecutionStatus): StoreTypes.JobExecutionStatus = {
StoreTypes.JobExecutionStatus.valueOf(PREFIX + input.toString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much additional delay will string join and string remove bring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or following as?
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about use a map to store the mapping relationships?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for using pattern match here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@panbingkun shall we refactor enum DeterministicLevel as well?

Done

@panbingkun
Copy link
Contributor Author

cc @gengliangwang

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@gengliangwang
Copy link
Member

@panbingkun shall we refactor enum DeterministicLevel as well?


private[protobuf] object DeterministicLevelSerializer {

def serialize(input: DeterministicLevel.Value): GDeterministicLevel = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still want to know which of the following code and case match is faster

  private lazy val scalaToPb = Map(
    DeterministicLevel.DETERMINATE -> GDeterministicLevel.DETERMINISTIC_LEVEL_DETERMINATE,
    DeterministicLevel.UNORDERED -> GDeterministicLevel.DETERMINISTIC_LEVEL_UNORDERED,
    DeterministicLevel.INDETERMINATE -> GDeterministicLevel.DETERMINISTIC_LEVEL_INDETERMINATE
  )

  def serialize(input: DeterministicLevel.Value): GDeterministicLevel = scalaToPb(input)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple test is as follows:

LuciferYang@3d4dc25

run with GA: https://github.com/LuciferYang/spark/actions/runs/3816561346

case match is slow:

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test serialize:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use map                                             192            195           3          5.2         192.4       1.0X
Use case match                                      454            455           1          2.2         453.8       0.4X

Copy link
Contributor Author

@panbingkun panbingkun Jan 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LuciferYang Could you run it again in reverse order?
Because when I run on local as follow, result:
image
It is very weird!

Copy link
Contributor Author

@panbingkun panbingkun Jan 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see , it fine to me to use case match now. I think we should investigate why the result data changes with the test order , but this does not block this pr

@LuciferYang
Copy link
Contributor

cloud you update the pr title and description

@panbingkun panbingkun changed the title [SPARK-41768][CORE] Refactor the definition of enum - JobExecutionStatus to follow with the code style [SPARK-41768][CORE] Refactor the definition of enum in protobuf to follow with the code style Jan 1, 2023
@panbingkun panbingkun changed the title [SPARK-41768][CORE] Refactor the definition of enum in protobuf to follow with the code style [SPARK-41768][CORE] Refactor the definition of enum to follow with the code style Jan 2, 2023
import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
import org.apache.spark.status.protobuf.StoreTypes.{DeterministicLevel => GDeterministicLevel}

object SerializerBenchmark extends BenchmarkBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to put the benchmark into the code base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me delete it.

private def serializeRDDOperationNode(node: RDDOperationNode): StoreTypes.RDDOperationNode = {
val outputDeterministicLevel = StoreTypes.RDDOperationNode.DeterministicLevel
.valueOf(node.outputDeterministicLevel.toString)
val outputDeterministicLevel = DeterministicLevelSerializer.serialize(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since DeterministicLevelSerializer is only used here, shall we move it into this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for two minor comments.

SUCCEEDED = 2;
FAILED = 3;
UNKNOWN = 4;
JOB_EXECUTION_STATUS_RUNNING = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels redundant to have the name in the enum class and in the constant name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen Yes, but we are following https://developers.google.com/protocol-buffers/docs/style#enums here. The purpose is to avoid naming conflicts. For example, if there is another enum containing FAILED or SUCCEEDED, the Protobuf compiler won't fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok I get it

@gengliangwang
Copy link
Member

Thanks, merging to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants