[SPARK-37809][K8S] Add YuniKorn Feature Step
#35663
Conversation
Yikun
left a comment
@yangwwei thanks for the contribution. There are some comments inline, and also some notes below:
- Please enable GitHub Actions in your repo to make CI green.
- It would be good to add the YuniKorn deploy command line to the PR description to help users test with YuniKorn.
- Feel free to paste the integration test results when they're ready.
- Note that Spark officially supports arm64 and x86, so we need to test on both. In particular, Spark also runs the integration tests on macOS with Apple Silicon.
Resolved review threads (outdated):
- ...etes/core/src/test/scala/org/apache/spark/deploy/k8s/features/YuniKornFeatureStepSuite.scala (several threads)
- ...gration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/YuniKornSuite.scala
project/SparkBuild.scala
Outdated
```diff
- unmanagedSources / excludeFilter := HiddenFileFilter || "*YuniKorn*.scala"
+ unmanagedSources / excludeFilter += HiddenFileFilter || "*YuniKorn*.scala"
```

It looks like the YuniKorn setting accidentally overwrites the existing exclude filter; try using `+=` rather than `:=` here.

[1] https://www.scala-sbt.org/1.x/docs/Appending-Values.html
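As a sketch of why this matters (setting names as in sbt's standard keys; this fragment is illustrative, not the exact Spark build code): `:=` replaces whatever filter was set earlier, for example by the Volcano profile, while `+=` appends to the existing value.

```scala
// sbt build fragment (sketch): append the YuniKorn exclusion so that earlier
// exclusions (HiddenFileFilter, the Volcano profile's filters) are preserved.
unmanagedSources / excludeFilter += "*YuniKorn*.scala"
```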
@Yikun it seems Volcano needs the same improvement.
I think it would be good to extract this as a constant in a companion object YuniKornFeatureStep and reuse it in the test.
Good suggestion; just got that done in the last commit. Thanks, @martin-g.
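The companion-object pattern suggested above can be sketched as follows (class and method names are assumed for illustration, not Spark's actual code): the constant lives in one place, and both the feature step and its test suite reference it.

```scala
// Hypothetical sketch: keep the annotation key in a companion object so the
// feature step and its test suite share a single definition of the string.
class YuniKornFeatureStep {
  // Build the (key, value) annotation pair for a given Spark app ID.
  def appIdAnnotation(appId: String): (String, String) =
    (YuniKornFeatureStep.APP_ID_ANNOTATION_KEY, appId)
}

object YuniKornFeatureStep {
  val APP_ID_ANNOTATION_KEY = "yunikorn.apache.org/app-id"
}
```

A test can then assert against `YuniKornFeatureStep.APP_ID_ANNOTATION_KEY` instead of repeating the literal string.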
project/SparkBuild.scala
Outdated
[1] https://groups.google.com/g/simple-build-tool/c/zVMyoWRAVWg?hl=en
[2] https://stackoverflow.com/questions/41627627/sbt-0-13-8-what-does-the-settingkey-method-do
Note for myself and other reviewers. :) It looks like we couldn't find any official sbt doc on this.
dongjoon-hyun
left a comment
Sorry, but I don't think we need this at this stage.
- `spark.kubernetes.scheduler.name` can be used for any custom scheduler name.
- We can add any annotation names easily.
Jenkins ok to test

@holdenk. As you know, the AMP Lab infra is gone.
So this is the one part that we can't do by just setting the scheduler name right now, correct? Are there any other parts that you think we need to add to the feature step in the future?
Correct. This is the one part needed to get this working for YuniKorn. Besides this one, there is one more thing needed: https://issues.apache.org/jira/browse/SPARK-38310; I will be working on it once this one gets merged.

I am sorry, I did not get your point. This is the same effort as what was done for the Volcano feature step #35422, a separate feature step for YuniKorn. This was proposed together in the SPIP doc, so why is this not needed?
It's because

Sorry, I should add more context in the beginning.
dongjoon-hyun
left a comment
If then, please bring this back with more requirements from phase 2's real use case, @yangwwei.
I don't see any value in this PR including that static annotation, either.
addToAnnotations(YuniKornFeatureStep.AppIdAnnotationKey, kubernetesConf.appId)
Sorry, I am not convinced. Without this, there is no integration with YuniKorn. The Volcano-side changes already got merged, and this is a very similar change. With this PR and the coming one for https://issues.apache.org/jira/browse/SPARK-38310, users will have a very simple (and consistent) way to submit Spark jobs to a YuniKorn-enabled cluster; that's one of the goals of the SPIP "SPARK-36057 Support Customized Kubernetes Schedulers Proposal". Why do we have to wait?
It seems that you missed my point. This is just duplication, @yangwwei. We want to avoid this kind of boilerplate as much as possible.

Again, we already support
For the scheduler name, that's fine. In this PR, the addition is that we need to add this:

This is needed in order to let YuniKorn know which pods belong to which job. Otherwise, the pods cannot be scheduled. In the next PR for queue-related configs, I will add some other logic to load the queue configs from the conf and add them to the pod annotations. PS: I updated the description to include more details on why we need this. Hope this makes sense.
Here is my answer. I believe this is the desirable way that Apache Spark wants to go, @yangwwei. We want to be extensible and support all future custom schedulers instead of locking into any specific scheduler.
Hi @dongjoon-hyun, thank you for sharing your thoughts on the PR. Maybe this PR gives the impression that we just need to add some annotations, but the actual integration will be much more complicated than that. The app ID and queue name are the very first things for integrating with a scheduler; to take advantage of the scheduler's features, there is more than that. If you look at this section in the SPIP doc, it gives some more context. The full story will require us to support many things that are (mostly) already supported in YARN, such as priority, preemption, gang scheduling, etc. For these features, we will need to add more logic to tweak the pod spec or add additional K8s resources, and different scheduler implementations have different semantics for these features. That's why we want to introduce the scheduler feature step, in order to customize this with e.g. VolcanoFeatureStep and YuniKornFeatureStep. The 1st phase for YuniKorn, as well as Volcano, is simple: let the Spark job be schedulable by a customized scheduler natively. But it doesn't stop here; based on the added feature step, we can do more integration in the 2nd and 3rd phases. Hope this clarifies things. Thanks!
@yangwwei. It seems that you didn't pay attention to the content of my feedback.
My first comment was the following: #35663 (review). I know you need to add more, but I don't think we need to duplicate Apache Spark like the original PR.
My second comment was the following.
My 3rd comment was
My review comments are consistent and the same. Please build the missing part by utilizing the existing one instead of simply claiming that you need the same copy of code again and again. We want to be extensible and support all future custom schedulers instead of duplicating for every custom scheduler. From my perspective, I'd recommend adding a new feature to YuniKorn to support a custom ID annotation and allowing Spark jobs to specify it. Lastly, we are reviewing the PR piece by piece. Please make a PR meaningful, complete, and reasonable. Unfortunately, I don't think this PR meets the criteria.
There are two alternative ways to support

I think @dongjoon-hyun's concern is how to integrate YuniKorn in the simplest way at this stage. And I also think that, whichever way we support it, Spark with
Should these tests also use `yunikornTag`?
I want to work with you and see what is the best way to solve this. Apologies for this long comment; there are mainly 2 parts: 1) why a YuniKorn feature step; 2) if not a feature step, what's the alternative.

1) Why a YuniKorn feature step

In the proposal, the goal is to provide users a customizable and consistent fashion to use a 3rd-party K8s scheduler for Spark. We proposed the following user-facing changes when submitting Spark jobs: both Volcano and YuniKorn will honor these configs and set up driver/executor pods accordingly via a feature step. That's why I was waiting for @Yikun to finish the general stuff in the K8s feature step code implementation and then submitted this PR. For the YuniKorn-side logic, there is no major difference from Volcano. We need to set up a few pod annotations for the app ID and the job queue, and also a K8s CRD. In the case of YuniKorn, it is application.yunikorn.apache.org (CRD definition). The only difference is that PodGroup is a mandatory resource required by Volcano, while the app CRD is optional in YuniKorn. So in the 1st phase, my PR doesn't introduce the CRD creation, but at least we have the basic integration working. BTW, YuniKorn has already passed the ASF graduation vote, so it will become an Apache TLP in a few weeks.

2) If not a feature step, what's the alternative

@Yikun summarized the alternative here: use the annotation placeholders introduced via #35704. I looked into this approach; it looks like we will need to set up something like this: this can work for the 1st phase. However, I am not sure how to achieve our 2nd-phase target when the CRD is introduced. Are you suggesting that for the 1st phase we use this approach, and add the feature step in the 2nd phase? Will that lead to a different user experience for the end users? I'd really appreciate it if you can share your thoughts. Thanks!
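The annotation-placeholder alternative being referred to would look roughly like the following spark-submit configuration (a sketch based on the placeholder-style annotation keys discussed elsewhere in this thread; exact property names come from #35704 and may differ):

```
--conf spark.kubernetes.scheduler.name=yunikorn \
--conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
--conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}}
```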
@yangwwei I guess it was just a typo and a copy-paste error, but

Thanks!! Sorry for being lazy; I just copied them from the proposal doc. Fixed the typos in the doc as well.
dongjoon-hyun
left a comment
Could you rebase this PR to the master branch for Apache Spark 3.4.0, @yangwwei ?
```scala
object YuniKorn {
  // Exclude all yunikorn file for Compile and Test
```
nit. file -> files. Last time, we missed this in the Volcano PR.
```scala
override def configurePod(pod: SparkPod): SparkPod = {
  val k8sPodBuilder = new PodBuilder(pod.pod)
    .editMetadata()
    .addToAnnotations(YuniKornFeatureStep.AppIdAnnotationKey, kubernetesConf.appId)
```
Since Apache Spark 3.3.0, SPARK-38383 supports APP_ID and EXECUTOR_ID placeholders in annotations. Do we still need this specialized logic?
```scala
object YuniKornFeatureStep {
  val AppIdAnnotationKey = "yunikorn.apache.org/app-id"
  val SchedulerName = "yunikorn"
```
The Apache Spark naming scheme uses ALL_CAPS for constants:
- https://spark.apache.org/contributing.html (Code style guide)
- https://github.com/databricks/scala-style-guide#naming-convention

```diff
- val AppIdAnnotationKey = "yunikorn.apache.org/app-id"
- val SchedulerName = "yunikorn"
+ val APP_ID_ANNOTATION_KEY = "yunikorn.apache.org/app-id"
+ val SCHEDULER_NAME = "yunikorn"
```

BTW, you are already using this style in this PR in the following:

```scala
private[spark] object YuniKornTestsSuite extends SparkFunSuite {
  val YUNIKORN_FEATURE_STEP = classOf[YuniKornFeatureStep].getName
}
```

```scala
import org.apache.spark.{SparkConf, SparkFunSuite}
import org.apache.spark.deploy.k8s._

class YuniKornFeatureStepSuite extends SparkFunSuite {
```
This PR seems to duplicate the existing test coverage. Could you remove this test suite?

- Line 41 in 74deefb: `"yunikorn.apache.org/app-id" -> "{{APPID}}"`
- Line 85 in 74deefb: `.set("spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id", "{{APP_ID}}")`
- Line 92 in 74deefb: `.set("spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id", "{{APP_ID}}")`
```scala
  assert(annotations.get(YuniKornFeatureStep.AppIdAnnotationKey) === appId)
}
```

```scala
test("Run SparkPi with yunikorn scheduler", k8sTestTag, yunikornTag) {
```
We need actual YuniKorn feature test coverage here. Otherwise, we cannot prevent future regressions. Technically, I don't think YuniKorn has fewer features than the Volcano scheduler. At a minimum, we need test coverage similar to Volcano's, and it would be great if we can have more coverage of YuniKorn's specialties. FYI, we have at least the following test coverage for Volcano:

- Line 304 in 74deefb: `test("SPARK-38187: Run SparkPi Jobs with minCPU", k8sTestTag, volcanoTag) {`
- Line 321 in 74deefb: `test("SPARK-38187: Run SparkPi Jobs with minMemory", k8sTestTag, volcanoTag) {`
- Line 338 in 74deefb: `test("SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled)", k8sTestTag, volcanoTag) {`
- Line 360 in 74deefb: `test("SPARK-38188: Run SparkPi jobs with 2 queues (all enabled)", k8sTestTag, volcanoTag) {`
- Line 380 in 74deefb: `test("SPARK-38423: Run driver job to validate priority order", k8sTestTag, volcanoTag) {`
Hi, I left some comments. WDYT, @yangwwei?
- De-duplicate by reusing SPARK-38383 and remove YuniKornFeatureStepSuite.scala.
- Follow the Apache Spark coding style consistently.
- Add more test coverage to prevent YuniKorn feature regressions.
- nit. A typo.
- Lastly, can we add some documentation like https://github.com/apache/spark/blame/master/docs/running-on-kubernetes.md#L1728-L1813 ?
Also, cc @Yikun since he added Volcano before this at Apache Spark 3.3.0.
I think @dongjoon-hyun has already caught all the key parts. And I think we also need:
Thank you, @Yikun. BTW, yes, I'm a little worried about the tightly coupled relation between the Fabric8 K8s client and Volcano. Is there a way to support the latest Volcano while Apache Spark still uses
BTW, @yangwwei, I'm reviewing this PR freshly. I might have lost some context on this PR; please let me know if I forgot something (which was discussed previously already).
Because the Volcano API is compatible, it is naturally supported. That is to say, for the functions used by Spark, even if the old version of the Volcano client is used, they are still supported. Recently, I have done enough testing on the latest master versions of Volcano and Spark, and Volcano's e2e_spark also protects it.
Hi @dongjoon-hyun, @Yikun, thanks for the comments; very good points. Let me work on the update. A few questions/comments in parallel: can we address that in another PR? I just want to keep this PR concise and focused on the code implementation.

I think we still need to preserve the feature step. From the Spark side, we want to give a consistent user experience when people want to run Spark with customized schedulers. The Volcano integration was exposed via a feature step, so I prefer we do the same for YuniKorn. It will be easier for users to configure. What do you think?
In short, the Apache Spark community doesn't want to repeat the same logic as SPARK-38383 in every custom scheduler implementation and its test code. It's error-prone and misleading to have the same logic in many places. To be safe and consistent across all custom scheduler backends, please respect Apache Spark's existing feature.
BTW, is your concern that
Wait a sec. I think I missed one point. Does this mean we can simply set {{APP_ID}}, and this {{APP_ID}} will be substituted with the actual app ID automatically? If this is the case, we don't need any code changes; using configs like the following will work. If you can confirm this, I can close this PR, and all we need is a doc PR!
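The placeholder behavior being asked about can be sketched as follows (modeled on the SPARK-38383 feature discussed above; the function name and shape are illustrative, not Spark's actual code): an `{{APP_ID}}` placeholder in a configured annotation value is replaced with the real application ID when the pod spec is built.

```scala
// Sketch: resolve {{APP_ID}} placeholders in annotation values.
// This mimics, but is not, Spark's internal SPARK-38383 logic.
def resolveAnnotations(
    annotations: Map[String, String],
    appId: String): Map[String, String] =
  annotations.map { case (key, value) =>
    key -> value.replace("{{APP_ID}}", appId)
  }

val resolved = resolveAnnotations(
  Map("yunikorn.apache.org/app-id" -> "{{APP_ID}}"),
  "spark-app-001")
```

With this behavior, setting `spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}}` would be enough to stamp every pod with its job's app ID, with no scheduler-specific code.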
@yangwwei IIRC, we already had this discussion before [1]. If all the info like

So, just like our offline discussion at the end of version 3.3, we just need a doc for YuniKorn. On the other hand, even for documentation, we should meet the basic requirements of Spark mentioned above.

[1] #35663 (comment)
Hi @dongjoon-hyun, @Yikun, gotcha. I think we can do the following:

BTW, what is the release timeline for v3.3.1?
+1 for your plan, @yangwwei.
BTW, cc @sunchao since he is interested in being the release manager for Apache Spark 3.3.1.
Gentle ping, @yangwwei.
Hi @dongjoon-hyun, @sunchao, I am working on the changes. I am pretty busy right now, but I will try my best to get the test code done this week. In the meantime, I have submitted PR #37622 for the doc change. As no code changes are required, I think that can be reviewed in parallel. Please take a look, thanks!
Got it, @yangwwei. Take your time.
BTW, I reviewed #37622 and added a few comments. I will hold off on that documentation PR to align it with the other validations.
And, for this code PR, I'll close it for now; we look forward to seeing your next PR. Thank you so much for your time and for making progress on this.
You can reuse SPARK-37809 for one of your future test code PRs.
### What changes were proposed in this pull request?

Add a section under [customized-kubernetes-schedulers-for-spark-on-kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html#customized-kubernetes-schedulers-for-spark-on-kubernetes) to explain how to run Spark with Apache YuniKorn. This is based on the review comments from #35663.

### Why are the changes needed?

Explain how to run Spark with Apache YuniKorn.

### Does this PR introduce _any_ user-facing change?

No

Closes #37622 from yangwwei/SPARK-40187.

Authored-by: Weiwei Yang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 4b18773)
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Add a YuniKorn feature step in order to support the YuniKorn scheduler. Apache YuniKorn is an open-source batch scheduler for K8s, designed to solve the problems of running batch workloads, e.g. Spark, on K8s. This PR adds a build profile `yunikorn`, the YuniKorn feature step, UTs, and integration tests.
This PR introduces the bare minimum changes for the integration with YuniKorn. YuniKorn has a concept of an application, and a pod will only be scheduled if it belongs to an application. Therefore, it needs to identify which pod belongs to which application. This is done by looking at the pod annotation `yunikorn.apache.org/app-id`, which must be a unique ID per job. In this PR, we automatically set this pod annotation, with a value equal to the Spark appID, for all driver and executor pods.

Why are the changes needed?

This is part of the effort SPARK-36057, to make Spark natively support customized K8s schedulers. This PR in particular is similar to #35422.
Does this PR introduce any user-facing change?
Once this is done, users can easily submit their jobs to the YuniKorn scheduler with the following configs:

On a YuniKorn-enabled cluster, the Spark job will be scheduled by the YuniKorn scheduler instead of the default scheduler.
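The submission configs referenced above are not reproduced here; based on the scheduler-name property discussed in the review thread, a minimal hedged sketch of such a submission would be:

```
spark-submit \
  --deploy-mode cluster \
  --conf spark.kubernetes.scheduler.name=yunikorn \
  ...
```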
How was this patch tested?
UT
Integration tests
On Docker Desktop 4.4.2 with K8s 1.22.5, I've run the integration tests with and without the yunikorn profile:
without yunikorn
with yunikorn profile: -Pyunikorn
helm install yunikorn ./helm-charts/yunikorn