[SPARK-41768][CORE] Refactor the definition of enum to follow with the code style #39286
panbingkun wants to merge 6 commits into apache:master
…atus` to follow with the code style
    private def PREFIX = "JOB_EXECUTION_STATUS_"

    private[protobuf] def serialize(input: JobExecutionStatus): StoreTypes.JobExecutionStatus = {
      StoreTypes.JobExecutionStatus.valueOf(PREFIX + input.toString)
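A minimal sketch of the string-prefix round trip from the snippet above, under stated assumptions: `JobStatus` and `PbJobStatus` are hypothetical stand-ins for `JobExecutionStatus` and the generated `StoreTypes.JobExecutionStatus`, and `Enumeration.withName` plays the role of the generated enum's `valueOf(String)`. The `deserialize` direction (stripping the prefix) is an assumption about how the reverse mapping would look, not code taken from the PR.

```scala
// Hypothetical stand-ins: in Spark the real types are the Java enum JobExecutionStatus
// and the protoc-generated StoreTypes.JobExecutionStatus.
object JobStatus extends Enumeration {
  val RUNNING, SUCCEEDED, FAILED, UNKNOWN = Value
}

object PbJobStatus extends Enumeration {
  // The UNSPECIFIED zero value is an assumption (proto3 enums need a 0 value).
  val JOB_EXECUTION_STATUS_UNSPECIFIED, JOB_EXECUTION_STATUS_RUNNING,
      JOB_EXECUTION_STATUS_SUCCEEDED, JOB_EXECUTION_STATUS_FAILED,
      JOB_EXECUTION_STATUS_UNKNOWN = Value
}

object StringPrefixCodec {
  private val Prefix = "JOB_EXECUTION_STATUS_"

  // "String join": concatenate the prefix, then look the constant up by name.
  def serialize(input: JobStatus.Value): PbJobStatus.Value =
    PbJobStatus.withName(Prefix + input.toString)

  // "String remove" (assumed reverse mapping): strip the prefix, then look up by name.
  // Throws NoSuchElementException for JOB_EXECUTION_STATUS_UNSPECIFIED.
  def deserialize(binary: PbJobStatus.Value): JobStatus.Value =
    JobStatus.withName(binary.toString.stripPrefix(Prefix))
}
```

Each call builds a new string and performs a by-name lookup; that per-call work is what the reviewers' question about added delay is about.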
How much additional latency do the string concatenation and the prefix removal add?
How about using a map to store the mappings?
+1 for using pattern match here.
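A sketch of the pattern-match alternative for the same mapping, again using hypothetical stand-ins (`JobStatus`, `PbJobStatus`) for `JobExecutionStatus` and the generated protobuf enum. Treating the UNSPECIFIED value as an error in `deserialize` is an assumption of this sketch.

```scala
// Hypothetical stand-ins for JobExecutionStatus and StoreTypes.JobExecutionStatus.
object JobStatus extends Enumeration {
  val RUNNING, SUCCEEDED, FAILED, UNKNOWN = Value
}

object PbJobStatus extends Enumeration {
  val JOB_EXECUTION_STATUS_UNSPECIFIED, JOB_EXECUTION_STATUS_RUNNING,
      JOB_EXECUTION_STATUS_SUCCEEDED, JOB_EXECUTION_STATUS_FAILED,
      JOB_EXECUTION_STATUS_UNKNOWN = Value
}

object PatternMatchCodec {
  // Each case is a stable-identifier pattern (an equality check): no string building or lookup.
  def serialize(input: JobStatus.Value): PbJobStatus.Value = input match {
    case JobStatus.RUNNING => PbJobStatus.JOB_EXECUTION_STATUS_RUNNING
    case JobStatus.SUCCEEDED => PbJobStatus.JOB_EXECUTION_STATUS_SUCCEEDED
    case JobStatus.FAILED => PbJobStatus.JOB_EXECUTION_STATUS_FAILED
    case JobStatus.UNKNOWN => PbJobStatus.JOB_EXECUTION_STATUS_UNKNOWN
    case other => throw new IllegalArgumentException(s"Unexpected status: $other")
  }

  // Rejecting JOB_EXECUTION_STATUS_UNSPECIFIED is an assumption of this sketch.
  def deserialize(binary: PbJobStatus.Value): JobStatus.Value = binary match {
    case PbJobStatus.JOB_EXECUTION_STATUS_RUNNING => JobStatus.RUNNING
    case PbJobStatus.JOB_EXECUTION_STATUS_SUCCEEDED => JobStatus.SUCCEEDED
    case PbJobStatus.JOB_EXECUTION_STATUS_FAILED => JobStatus.FAILED
    case PbJobStatus.JOB_EXECUTION_STATUS_UNKNOWN => JobStatus.UNKNOWN
    case other => throw new IllegalArgumentException(s"Unexpected status: $other")
  }
}
```

The trade-off is explicitness: every constant is spelled out, so a renamed protobuf constant is caught at compile time rather than by a runtime lookup failure.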
@panbingkun shall we refactor enum `DeterministicLevel` as well?
Done
Can one of the admins verify this patch?
    private[protobuf] object DeterministicLevelSerializer {

      def serialize(input: DeterministicLevel.Value): GDeterministicLevel = {
I'd still like to know which is faster: the following map-based code or the case match.
    private lazy val scalaToPb = Map(
      DeterministicLevel.DETERMINATE -> GDeterministicLevel.DETERMINISTIC_LEVEL_DETERMINATE,
      DeterministicLevel.UNORDERED -> GDeterministicLevel.DETERMINISTIC_LEVEL_UNORDERED,
      DeterministicLevel.INDETERMINATE -> GDeterministicLevel.DETERMINISTIC_LEVEL_INDETERMINATE
    )
    def serialize(input: DeterministicLevel.Value): GDeterministicLevel = scalaToPb(input)
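For a side-by-side view of the two options being compared, here is a sketch that puts the map-based code above next to a case-match version. `DeterministicLevel` mirrors Spark's enumeration (DETERMINATE, UNORDERED, INDETERMINATE); `GDeterministicLevel` is a hypothetical stand-in for the generated protobuf enum, and `MapSerializer` / `MatchSerializer` are illustrative names.

```scala
// Mirrors Spark's DeterministicLevel enumeration.
object DeterministicLevel extends Enumeration {
  val DETERMINATE, UNORDERED, INDETERMINATE = Value
}

// Hypothetical stand-in for StoreTypes.DeterministicLevel (GDeterministicLevel in the PR).
object GDeterministicLevel extends Enumeration {
  val DETERMINISTIC_LEVEL_UNSPECIFIED, DETERMINISTIC_LEVEL_DETERMINATE,
      DETERMINISTIC_LEVEL_UNORDERED, DETERMINISTIC_LEVEL_INDETERMINATE = Value
}

// Variant 1: hash-map lookup, as in the snippet above.
object MapSerializer {
  private lazy val scalaToPb = Map(
    DeterministicLevel.DETERMINATE -> GDeterministicLevel.DETERMINISTIC_LEVEL_DETERMINATE,
    DeterministicLevel.UNORDERED -> GDeterministicLevel.DETERMINISTIC_LEVEL_UNORDERED,
    DeterministicLevel.INDETERMINATE -> GDeterministicLevel.DETERMINISTIC_LEVEL_INDETERMINATE
  )

  def serialize(input: DeterministicLevel.Value): GDeterministicLevel.Value = scalaToPb(input)
}

// Variant 2: pattern match, the form the thread later settles on.
object MatchSerializer {
  def serialize(input: DeterministicLevel.Value): GDeterministicLevel.Value = input match {
    case DeterministicLevel.DETERMINATE => GDeterministicLevel.DETERMINISTIC_LEVEL_DETERMINATE
    case DeterministicLevel.UNORDERED => GDeterministicLevel.DETERMINISTIC_LEVEL_UNORDERED
    case DeterministicLevel.INDETERMINATE => GDeterministicLevel.DETERMINISTIC_LEVEL_INDETERMINATE
    case other => throw new IllegalArgumentException(s"Unexpected level: $other")
  }
}
```

The two agree on every valid input and differ only in mechanism (a hash lookup versus sequential equality checks) and in failure mode: the map throws `NoSuchElementException` for a missing key, while this match throws `IllegalArgumentException`.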
A simple test is as follows:
Run with GA (GitHub Actions): https://github.com/LuciferYang/spark/actions/runs/3816561346
The case match is slower:
OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz

| Test serialize | Best Time (ms) | Avg Time (ms) | Stdev (ms) | Rate (M/s) | Per Row (ns) | Relative |
|---|---|---|---|---|---|---|
| Use map | 192 | 195 | 3 | 5.2 | 192.4 | 1.0X |
| Use case match | 454 | 455 | 1 | 2.2 | 453.8 | 0.4X |
@LuciferYang Could you run it again in reverse order?
I ask because when I run it locally, I get the following result:

It is very weird!
@LuciferYang When I run it on GA, I get the following result:

https://github.com/panbingkun/spark/actions/runs/3816951445/jobs/6492899136
Code:
https://github.com/apache/spark/pull/39286/files#diff-eca16ee278b786f4ab866072396455b93c6bedd8e3b8d562935623931ae3936cR36-R58
OK, I see. It's fine with me to use case match now. We should investigate why the results change with the test order, but that does not block this PR.
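Because the results flipped with test order, one way to look for an ordering artifact is to warm both variants up and then time them in both orders. The following is a minimal hand-rolled sketch using `System.nanoTime`, not Spark's `Benchmark` utility; the stand-in enums, iteration counts, and the `OrderCheck` name are assumptions, and a dedicated harness such as JMH would give more trustworthy numbers. It does not reproduce the measurements in the thread.

```scala
object DeterministicLevel extends Enumeration {
  val DETERMINATE, UNORDERED, INDETERMINATE = Value
}

// Hypothetical stand-in for the generated protobuf enum.
object GDeterministicLevel extends Enumeration {
  val DETERMINISTIC_LEVEL_DETERMINATE, DETERMINISTIC_LEVEL_UNORDERED,
      DETERMINISTIC_LEVEL_INDETERMINATE = Value
}

object OrderCheck {
  private val scalaToPb = Map(
    DeterministicLevel.DETERMINATE -> GDeterministicLevel.DETERMINISTIC_LEVEL_DETERMINATE,
    DeterministicLevel.UNORDERED -> GDeterministicLevel.DETERMINISTIC_LEVEL_UNORDERED,
    DeterministicLevel.INDETERMINATE -> GDeterministicLevel.DETERMINISTIC_LEVEL_INDETERMINATE)

  def viaMap(v: DeterministicLevel.Value): GDeterministicLevel.Value = scalaToPb(v)

  def viaMatch(v: DeterministicLevel.Value): GDeterministicLevel.Value = v match {
    case DeterministicLevel.DETERMINATE => GDeterministicLevel.DETERMINISTIC_LEVEL_DETERMINATE
    case DeterministicLevel.UNORDERED => GDeterministicLevel.DETERMINISTIC_LEVEL_UNORDERED
    case DeterministicLevel.INDETERMINATE => GDeterministicLevel.DETERMINISTIC_LEVEL_INDETERMINATE
    case other => throw new IllegalArgumentException(s"Unexpected level: $other")
  }

  private val inputs = DeterministicLevel.values.toIndexedSeq

  // Applies f to `iters` inputs and returns (elapsed nanoseconds, checksum).
  // The checksum makes each result observable to the caller.
  def time(iters: Int)(f: DeterministicLevel.Value => GDeterministicLevel.Value): (Long, Long) = {
    var checksum = 0L
    val start = System.nanoTime()
    var i = 0
    while (i < iters) {
      checksum += f(inputs(i % inputs.length)).id
      i += 1
    }
    (System.nanoTime() - start, checksum)
  }

  // Warms up both variants, then times them map-first and match-first.
  def run(iters: Int): Seq[(String, Long)] = {
    time(iters)(viaMap)
    time(iters)(viaMatch)
    val mapFirst = Seq("map (1st)" -> time(iters)(viaMap)._1, "match (2nd)" -> time(iters)(viaMatch)._1)
    val matchFirst = Seq("match (1st)" -> time(iters)(viaMatch)._1, "map (2nd)" -> time(iters)(viaMap)._1)
    mapFirst ++ matchFirst
  }

  def main(args: Array[String]): Unit =
    run(10000000).foreach { case (label, nanos) => println(f"$label%-12s ${nanos / 1e6}%.1f ms") }
}
```

If whichever variant runs first is consistently faster or slower regardless of implementation, the gap is more likely a JIT or warm-up effect than a real difference between the map and the match.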
Could you update the PR title and description?
    import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    import org.apache.spark.status.protobuf.StoreTypes.{DeterministicLevel => GDeterministicLevel}

    object SerializerBenchmark extends BenchmarkBase {
I don't think we need to put the benchmark into the code base.
Let me delete it.
     private def serializeRDDOperationNode(node: RDDOperationNode): StoreTypes.RDDOperationNode = {
    -  val outputDeterministicLevel = StoreTypes.RDDOperationNode.DeterministicLevel
    -    .valueOf(node.outputDeterministicLevel.toString)
    +  val outputDeterministicLevel = DeterministicLevelSerializer.serialize(
Since DeterministicLevelSerializer is only used here, shall we move it into this file?
gengliangwang left a comment:
LGTM except for two minor comments.
    -  SUCCEEDED = 2;
    -  FAILED = 3;
    -  UNKNOWN = 4;
    +  JOB_EXECUTION_STATUS_RUNNING = 1;
Isn't it redundant to have the name in both the enum type and each constant name?
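@srowen Yes, but we are following https://developers.google.com/protocol-buffers/docs/style#enums here. The purpose is to avoid naming conflicts: protobuf enum values use C++ scoping rules, so they are siblings of their enum type rather than children of it, and two enums in the same package cannot both define `FAILED`. With the prefix, adding another enum that contains `FAILED` or `SUCCEEDED` won't make the Protobuf compiler fail.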
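The scoping argument can be illustrated with a small sketch that models all enum constants in one package as a single flat namespace. `ProtoEnumNaming`, `prefixFor`, and `collisions` are hypothetical helpers, the CamelCase-to-UPPER_SNAKE_CASE conversion is an assumption about how the prefix is derived, and the second enum (`StageStatus`) is used purely for illustration.

```scala
object ProtoEnumNaming {
  // UpperCamelCase enum type name -> UPPER_SNAKE_CASE prefix,
  // e.g. "JobExecutionStatus" -> "JOB_EXECUTION_STATUS_".
  def prefixFor(enumTypeName: String): String =
    enumTypeName.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toUpperCase + "_"

  // Treats every constant of every enum in one package as living in a single flat scope
  // (enum values are siblings of their type, not children of it) and returns the
  // constant names that are declared more than once.
  def collisions(enums: Map[String, Seq[String]], prefixed: Boolean): Set[String] = {
    val names = enums.toSeq.flatMap { case (typeName, values) =>
      values.map(v => if (prefixed) prefixFor(typeName) + v else v)
    }
    names.groupBy(identity).collect { case (name, occurrences) if occurrences.size > 1 => name }.toSet
  }

  def main(args: Array[String]): Unit = {
    // StageStatus and its values are illustrative.
    val enums = Map(
      "JobExecutionStatus" -> Seq("RUNNING", "SUCCEEDED", "FAILED", "UNKNOWN"),
      "StageStatus" -> Seq("ACTIVE", "COMPLETE", "FAILED", "PENDING", "SKIPPED"))
    println(collisions(enums, prefixed = false)) // Set(FAILED)
    println(collisions(enums, prefixed = true))  // Set()
  }
}
```

Without prefixes, the shared `FAILED` constant collides; with prefixes, the names become `JOB_EXECUTION_STATUS_FAILED` and `STAGE_STATUS_FAILED`, which is the redundancy the style guide accepts in exchange for avoiding conflicts.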
Thanks, merging to master.

What changes were proposed in this pull request?
The PR aims to refactor the enum definitions in the UI protobuf serializer to follow the code style.
Why are the changes needed?
To follow the Protocol Buffers style guide for enums:

https://developers.google.com/protocol-buffers/docs/style#enums
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass GA
Existing unit tests.