Conversation

@yangwwei (Contributor) commented Feb 26, 2022

What changes were proposed in this pull request?

Add a YuniKorn feature step in order to support the YuniKorn scheduler. Apache YuniKorn is an open-source batch scheduler for K8s, designed to solve the problems of running batch workloads, e.g. Spark, on K8s. This PR adds a yunikorn build profile, the YuniKorn feature step, unit tests, and integration tests.

This PR introduces the bare minimum of changes for the integration with YuniKorn. YuniKorn has the concept of an application, and a pod will only be scheduled if it belongs to an application. Therefore, YuniKorn needs to identify which pod belongs to which application. This is done by looking at the pod annotation yunikorn.apache.org/app-id, which must be a unique ID per job. In this PR, we automatically set this annotation, with a value equal to the Spark appId, on all driver and executor pods.
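The annotation tagging described above can be sketched as follows. This is a minimal, illustrative Python sketch with dict-based pod specs, not the actual Scala feature-step implementation; `configure_pod` and the pod shape are assumptions made for the example:

```python
# Sketch: every driver/executor pod gets the YuniKorn app-id annotation,
# set to the Spark application ID, so YuniKorn can group the pods into
# one application. Illustrative only; the real code is a Scala feature step.
APP_ID_ANNOTATION_KEY = "yunikorn.apache.org/app-id"

def configure_pod(pod, spark_app_id):
    """Return a copy of the pod dict with the app-id annotation added."""
    metadata = dict(pod.get("metadata", {}))
    annotations = dict(metadata.get("annotations", {}))
    annotations[APP_ID_ANNOTATION_KEY] = spark_app_id  # unique ID per job
    metadata["annotations"] = annotations
    return {**pod, "metadata": metadata}

driver_pod = {"metadata": {"name": "spark-pi-driver", "annotations": {"team": "dev"}}}
tagged = configure_pod(driver_pod, "spark-a1b2c3")
print(tagged["metadata"]["annotations"][APP_ID_ANNOTATION_KEY])  # spark-a1b2c3
```

Existing annotations are preserved and the input pod is left unmodified, mirroring how a feature step edits a copy of the pod metadata rather than mutating shared state.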

Why are the changes needed?

This is part of the effort SPARK-36057 to make Spark natively support customized K8s schedulers. This PR in particular is similar to #35422.

Does this PR introduce any user-facing change?

Once this is done, users can easily submit their jobs to the YuniKorn scheduler with the following configs:

--conf spark.kubernete.driver.scheduler.name=yunikorn \
--conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.scheduler.YuniKornFeatureStep

On a YuniKorn-enabled cluster, the Spark job will be scheduled by the YuniKorn scheduler instead of the default scheduler.

How was this patch tested?

UT

Integration tests

On docker-desktop 4.4.2, K8s 1.22.5, I ran the integration tests with and without the yunikorn profile:

without yunikorn

resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --exclude-tags minikube,r --deploy-mode docker-desktop

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.3.0-SNAPSHOT:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [  4.969 s]
[INFO] Spark Project Tags ................................. SUCCESS [  7.028 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  5.965 s]
[INFO] Spark Project Networking ........................... SUCCESS [  9.096 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  9.911 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  5.523 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  5.144 s]
[INFO] Spark Project Core ................................. SUCCESS [02:28 min]
[INFO] Spark Project Kubernetes ........................... SUCCESS [ 39.188 s]
[INFO] Spark Project Kubernetes Integration Tests ......... SUCCESS [15:25 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  19:21 min
[INFO] Finished at: 2022-02-26T21:34:21-08:00
[INFO] ------------------------------------------------------------------------

with the yunikorn profile: -Pyunikorn

  1. Download yunikorn 0.12.2 release:
  2. Install yunikorn locally: helm install yunikorn ./helm-charts/yunikorn
  3. Run integration tests
build/sbt -Pyunikorn -Dspark.kubernetes.test.deployMode=docker-desktop -Pkubernetes -Pkubernetes-integration-tests -Dtest.exclude.tags=minikube,r "kubernetes-integration-tests/test"

[info] KubernetesSuite:
  ...
[info] YuniKornSuite:
 ...
[info] Run completed in 32 minutes, 4 seconds.
[info] Total number of tests run: 45
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 45, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

@Yikun (Member) left a comment

@yangwwei thanks for the contribution. There are some comments inline, and also some notes below:

  • Please enable GitHub Actions in your repo to make CI green.
  • It would be good to add the yunikorn deploy command line to the PR description to help users test with yunikorn.
  • Feel free to paste the integration test results when they're ready.
  • Note that Spark supports arm64 and x86 officially, so we need to test on both x86 and arm64. Especially since Spark also runs the integration tests on macOS with Apple Silicon.

@yangwwei yangwwei changed the title [WIP][SPARK-37809] Add yunikorn feature step [SPARK-37809] Add yunikorn feature step Feb 27, 2022

Suggested change
unmanagedSources / excludeFilter := HiddenFileFilter || "*YuniKorn*.scala")
unmanagedSources / excludeFilter += HiddenFileFilter || "*YuniKorn*.scala")

Looks like the yunikorn setting gets overwritten accidentally; try to use += rather than := here.

[1] https://www.scala-sbt.org/1.x/docs/Appending-Values.html


@Yikun it seems volcano needs the same improvement.


I think it would be good to extract this as a constant in a companion object YuniKornFeatureStep and reuse it in the test.

@yangwwei (Author) replied:

Good suggestion, just get that done in the last commit. Thanks, @martin-g .

@Yikun (Member) commented Feb 28, 2022

[1] https://groups.google.com/g/simple-build-tool/c/zVMyoWRAVWg?hl=en
[2] https://stackoverflow.com/questions/41627627/sbt-0-13-8-what-does-the-settingkey-method-do

Note for myself and other reviewers. :) But it looks like we couldn't find any official sbt doc.

@dongjoon-hyun (Member) left a comment

Sorry, but I don't think we need this at this stage.

  • spark.kubernetes.scheduler.name can be used for any custom scheduler name.
  • We can add any annotation names easily.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-37809] Add yunikorn feature step [SPARK-37809][K8S] Add yunikorn feature step Mar 1, 2022
@holdenk (Contributor) commented Mar 1, 2022

Jenkins ok to test

@dongjoon-hyun (Member) replied:

Jenkins ok to test

@holdenk, as you know, the AMP Lab infra is gone.


So this is the one part that we can't do right now by just setting the scheduler name, correct? Are there any other parts that you think we need to add to the feature step in the future?

@yangwwei (Author) replied:

Correct. This is the one part needed to get this working for YuniKorn. Besides this, there is one more change needed: https://issues.apache.org/jira/browse/SPARK-38310. I will work on it once this one gets merged.

@yangwwei (Author) commented Mar 1, 2022

Sorry, but I don't think we need this at this stage.

  • spark.kubernetes.scheduler.name can be used for any custom scheduler name.
  • We can add any annotation names easily.

I am sorry, I did not get your point. This is the same effort as what was done for the Volcano feature step in #35422: a separate feature step for YuniKorn. This was proposed together in the SPIP doc, so why is this not needed?

@dongjoon-hyun (Member) replied:

Sorry, but I don't think we need this at this stage.

  • spark.kubernetes.scheduler.name can be used for any custom scheduler name.
  • We can add any annotation names easily.

I am sorry, I did not get your point. This is the same effort as what was done for the Volcano feature step in #35422: a separate feature step for YuniKorn. This was proposed together in the SPIP doc, so why is this not needed?

It's because the master branch already supports the following. I don't see any value added by this PR at this stage.

--conf spark.kubernete.driver.scheduler.name=yunikorn \

@yangwwei (Author) commented Mar 1, 2022

It's because master branch already supports the following. I don't see any value addition in this PR at this stage.

Sorry, I should have added more context at the beginning.
The additional thing is the change that adds a pod annotation, as @holdenk pointed out: #35663 (comment). Note that this is the base for the YuniKorn-related changes: after we have the YuniKorn feature step, I will create another PR to add queue-related configs, tracked in https://issues.apache.org/jira/browse/SPARK-38310. For phase 1 (targeted for Spark 3.3), that's the main work needed to get Spark jobs scheduled by YuniKorn. For phase 2, there will be more optimizations coming. Hope this clarifies things.

@dongjoon-hyun (Member) left a comment

If so, please bring this back with more requirements for phase 2's real use case, @yangwwei.

I don't see any value in this PR, including that static annotation, either.

addToAnnotations(YuniKornFeatureStep.AppIdAnnotationKey, kubernetesConf.appId)

@yangwwei (Author) commented Mar 1, 2022

hi @dongjoon-hyun,

Sorry, I am not convinced. Without this, there is no integration with YuniKorn. The Volcano-side changes were already merged, and this is a very similar change. With this PR and the coming one for https://issues.apache.org/jira/browse/SPARK-38310, users will have a very simple (and consistent) way to submit Spark jobs to a YuniKorn-enabled cluster; that's one of the goals of the SPIP SPARK-36057 Support Customized Kubernetes Schedulers Proposal. Why do we have to wait?

@dongjoon-hyun (Member) replied:

It seems that you missed my point. This is just duplication, @yangwwei. We want to avoid this kind of boilerplate as much as possible.

this is a very similar change.

Again, we already support --conf spark.kubernete.driver.scheduler.name=yunikorn in the master branch from when we extended it for Volcano. Please be specific about the missing part and what we really need.

@yangwwei (Author) commented Mar 2, 2022

For the scheduler name, that's fine. In this PR, the addition is that we need to add this:

addToAnnotations(YuniKornFeatureStep.AppIdAnnotationKey, kubernetesConf.appId)

This is needed to let YuniKorn know which pods belong to which job; otherwise, the pods cannot be scheduled. In the next PR for queue-related configs, I will add logic to load the queue configs from the conf and add them to the pod annotations.

PS: I updated the description to include more details on why we need this. Hope this makes sense.
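To see why the annotation is essential, here is a hypothetical sketch of the scheduler-side view: a YuniKorn-like scheduler grouping incoming pods into applications by the app-id annotation. This is illustrative Python under simplifying assumptions, not YuniKorn's actual code:

```python
from collections import defaultdict

APP_ID_ANNOTATION_KEY = "yunikorn.apache.org/app-id"

def group_pods_by_app(pods):
    """Group pod names into applications keyed by the app-id annotation.
    Pods without the annotation cannot be attributed to an application,
    which is why every driver and executor pod must carry it."""
    apps = defaultdict(list)
    for pod in pods:
        app_id = pod["metadata"].get("annotations", {}).get(APP_ID_ANNOTATION_KEY)
        if app_id is not None:
            apps[app_id].append(pod["metadata"]["name"])
    return dict(apps)

pods = [
    {"metadata": {"name": "driver", "annotations": {APP_ID_ANNOTATION_KEY: "spark-123"}}},
    {"metadata": {"name": "exec-1", "annotations": {APP_ID_ANNOTATION_KEY: "spark-123"}}},
    {"metadata": {"name": "untagged", "annotations": {}}},
]
print(group_pods_by_app(pods))  # {'spark-123': ['driver', 'exec-1']}
```

In the sketch, the untagged pod ends up in no application, which corresponds to the "won't be able to be scheduled" case described above.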

@dongjoon-hyun (Member) replied:

Here is my answer. I believe this is the desirable way for Apache Spark to go, @yangwwei. We want to be extensible and support all future custom schedulers instead of locking in any specific scheduler.

@yangwwei (Author) commented Mar 2, 2022

hi @dongjoon-hyun, thank you for sharing your thoughts in the PR.

Maybe this PR gives the impression that we just need to add some annotations, but the actual integration will be much more complicated than that. The appID and queue name are the very first things for integrating with a scheduler; to take advantage of the scheduler's features, there is more than that. If you look at this section in the SPIP doc, it gives some more context. The full story will require us to support many things that are (mostly) already supported in YARN, such as priority, preemption, gang scheduling, etc. For these features, we will need to add more logic to tweak the pod spec, or add additional K8s resources. And different scheduler implementations have different semantics to support these features. That's why we want to introduce the scheduler feature step: to customize this with, e.g., VolcanoFeatureStep or YuniKornFeatureStep. The 1st phase for YuniKorn, as well as Volcano, is simple: let Spark jobs be scheduled natively by a customized scheduler. But it doesn't stop there; based on the added feature step, we can do more integration in the 2nd and 3rd phases. Hope this clarifies things. Thanks!

@dongjoon-hyun (Member) replied:

@yangwwei, it seems that you didn't pay attention to the content of my feedback.

Maybe this PR gives the impression that we just need to add some annotations,

My first comment was the following: #35663 (review). I know you need to add more, but I don't think we need to duplicate Apache Spark code like the original PR does.

Sorry, but I don't think we need this at this stage.

  • spark.kubernetes.scheduler.name can be used for any custom scheduler name.
  • We can add any annotation names easily.

My second comment was the following.

If so, please bring this back with more requirements for phase 2's real use case, @yangwwei.
I don't see any value in this PR, including that static annotation, either.

My 3rd comment was

It seems that you missed my point. This is just duplication, @yangwwei. We want to avoid this kind of boilerplate as much as possible. Again, we already support --conf spark.kubernete.driver.scheduler.name=yunikorn in the master branch from when we extended it for Volcano. Please be specific about the missing part and what we really need.

My review comments are consistent and the same. Please build the missing part by utilizing the existing one, instead of simply claiming that you need the same copy of code again and again. We want to be extensible and support all future custom schedulers instead of duplicating code for every custom scheduler.

From my perspective, I'd recommend adding a new feature to YuniKorn to support a custom ID annotation and allow Spark jobs to point it at the Spark app-id. YuniKorn is not a golden standard written in stone. Please make it more flexible first.

Lastly, we review PRs piece by piece. Please make the PR meaningful, complete, and reasonable. Unfortunately, I don't think this PR meets those criteria.

@Yikun (Member) commented Mar 2, 2022

@yangwwei

There are two alternative ways to support YuniKorn at this stage:

  • The new annotation placeholder support which @dongjoon-hyun introduced. In theory, all annotations can be set this way (such as a queue placeholder). It's very flexible, and very helpful for custom schedulers that only need annotations set.

  • A separate feature step (this PR): this helps simplify the annotation configuration; we just need to set the feature step instead of setting many annotation confs. This way is friendlier to schedulers that need to create extra CRDs and also need many annotations.

I think @dongjoon-hyun's concern is how to integrate YuniKorn in the simplest way at this stage. And whichever way we support, Spark with YuniKorn can also be documented in Spark, telling users how to use it officially.

Should these tests also use yunikornTag?

@yangwwei (Author) commented Mar 7, 2022

Hi @dongjoon-hyun,

I want to work with you to figure out the best way to solve this. Apologies for this long comment; there are mainly two parts: 1) why a YuniKorn feature step; 2) if not a feature step, what's the alternative.

1) Why a YuniKorn feature step

In the proposal, the goal is to give users a customizable and consistent way to use a 3rd-party K8s scheduler for Spark. We proposed the following user-facing changes when submitting Spark jobs:

--conf spark.kubernetes.driver.scheduler.name=xxx 
--conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.scheduler.XxxFeatureStep
--conf spark.kubernetes.job.queue default
--conf spark.kubernetes.job.minCPU 4
--conf spark.kubernetes.job.minMemory 8G

Both Volcano and YuniKorn will honor these configs and set up the driver/executor pods accordingly via a feature step. That's why I was waiting for @Yikun to finish the general feature-step support in the K8s code and then submitted this PR. For the YuniKorn-side logic, there is no major difference from Volcano: we need to set up a few pod annotations for the appID and the job queue, and also a K8s CRD. In YuniKorn's case, that is the application.yunikorn.apache.org CRD definition. The only difference is that PodGroup is a mandatory resource required by Volcano, while the app CRD is optional in YuniKorn. So in the 1st phase, my PR doesn't introduce the CRD creation, but at least we have the basic integration working. BTW, YuniKorn has already passed the ASF graduation vote, so it will become an Apache TLP in a few weeks.

2) If not a feature step, what's the alternative?

@Yikun summarized the alternative here: use the annotation placeholders introduced via #35704. I looked into this approach; it looks like we would need to set up something like:

--conf spark.kubernetes.driver.scheduler.name=yunikorn 
--conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}}
--conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}}
--conf spark.kubernetes.job.queue default
--conf spark.kubernetes.job.minCPU 4
--conf spark.kubernetes.job.minMemory 8G

This can work for the 1st phase. However, I am not sure how to achieve our 2nd-phase target when the CRD is introduced. Are you suggesting that for the 1st phase we use this approach, and add the feature step in the 2nd phase? Will that lead to a different user experience for the end users?

I would really appreciate it if you could share your thoughts. Thanks!
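For context, the placeholder alternative boils down to a string substitution when the pod metadata is built. Here is a rough Python sketch of the idea, loosely modeled on SPARK-38383; `resolve_annotations` is an illustrative name, and the real substitution lives in Spark's Scala pod builder:

```python
def resolve_annotations(conf_annotations, app_id, executor_id=None):
    """Substitute annotation placeholders with runtime values,
    loosely mimicking what SPARK-38383 does at pod-build time."""
    resolved = {}
    for key, template in conf_annotations.items():
        value = template.replace("{{APP_ID}}", app_id)
        if executor_id is not None:  # only executor pods have an executor ID
            value = value.replace("{{EXECUTOR_ID}}", executor_id)
        resolved[key] = value
    return resolved

conf = {"yunikorn.apache.org/app-id": "{{APP_ID}}"}
print(resolve_annotations(conf, app_id="spark-9f8e"))
# {'yunikorn.apache.org/app-id': 'spark-9f8e'}
```

With this mechanism, the app-id annotation configured via spark.kubernetes.driver.annotation.* and spark.kubernetes.executor.annotation.* arrives at the scheduler already resolved, without any scheduler-specific code in Spark.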

@martin-g (Member) commented Mar 7, 2022

@yangwwei I guess it was just a typo and a copy-paste error, but --conf spark.kubernete... should be --conf spark.kubernetes...
Also, the min-related configs should be minCPU and minMemory; there is no namespace min.
Please also check whether podTemplate might be used for the YuniKorn integration needs.

@yangwwei (Author) commented Mar 7, 2022

@yangwwei I guess it was just a typo and a copy-paste error, but --conf spark.kubernete... should be --conf spark.kubernetes... Also, the min-related configs should be minCPU and minMemory; there is no namespace min. Please also check whether podTemplate might be used for the YuniKorn integration needs.

Thanks!! Sorry for being lazy; I just copied them from the proposal doc. Fixed the typos in the doc as well.

@dongjoon-hyun (Member) left a comment

Could you rebase this PR to the master branch for Apache Spark 3.4.0, @yangwwei ?

}

object YuniKorn {
// Exclude all yunikorn file for Compile and Test

nit: file -> files. Last time, we missed this in the Volcano PR.

override def configurePod(pod: SparkPod): SparkPod = {
val k8sPodBuilder = new PodBuilder(pod.pod)
.editMetadata()
.addToAnnotations(YuniKornFeatureStep.AppIdAnnotationKey, kubernetesConf.appId)

Since Apache Spark 3.3.0, SPARK-38383 supports the APP_ID and EXECUTOR_ID placeholders in annotations. Do we still need this specialized logic?


object YuniKornFeatureStep {
val AppIdAnnotationKey = "yunikorn.apache.org/app-id"
val SchedulerName = "yunikorn"
@dongjoon-hyun (Member) commented Aug 17, 2022

The Apache Spark naming scheme uses ALL_CAPS for constants.

- val AppIdAnnotationKey = "yunikorn.apache.org/app-id"
- val SchedulerName = "yunikorn"
+ val APP_ID_ANNOTATION_KEY = "yunikorn.apache.org/app-id"
+ val SCHEDULER_NAME = "yunikorn"

BTW, you are already using this style in this PR in the following.

private[spark] object YuniKornTestsSuite extends SparkFunSuite {
   val YUNIKORN_FEATURE_STEP = classOf[YuniKornFeatureStep].getName
 }

import org.apache.spark.{SparkConf, SparkFunSuite}
import org.apache.spark.deploy.k8s._

class YuniKornFeatureStepSuite extends SparkFunSuite {
@dongjoon-hyun (Member) commented Aug 17, 2022

This PR seems to duplicate the existing test coverage. Could you remove this test suite?

.set("spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id", "{{APP_ID}}")

.set("spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id", "{{APP_ID}}")

assert(annotations.get(YuniKornFeatureStep.AppIdAnnotationKey) === appId)
}

test("Run SparkPi with yunikorn scheduler", k8sTestTag, yunikornTag) {

We need actual YuniKorn feature test coverage here; otherwise, we cannot prevent future regressions. Technically, I don't think YuniKorn has fewer features than the Volcano scheduler. At a minimum, we need test coverage similar to Volcano's, and it would be great if we could also cover YuniKorn's specialties. FYI, we have at least the following test coverage for Volcano:

test("SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled)", k8sTestTag, volcanoTag) {

test("SPARK-38188: Run SparkPi jobs with 2 queues (all enabled)", k8sTestTag, volcanoTag) {

test("SPARK-38423: Run driver job to validate priority order", k8sTestTag, volcanoTag) {

@dongjoon-hyun (Member) left a comment

Hi, I left some comments. WDYT, @yangwwei?

  1. De-duplicate by reusing SPARK-38383 and remove YuniKornFeatureStepSuite.scala.
  2. Follow Apache Spark Coding Style consistently.
  3. Add more test coverage to prevent YuniKorn feature regression.
  4. nit. A typo.
  5. Lastly, can we add some documents like https://github.com/apache/spark/blame/master/docs/running-on-kubernetes.md#L1728-L1813 ?

Also, cc @Yikun since he added Volcano before this in Apache Spark 3.3.0.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-37809][K8S] Add yunikorn feature step [SPARK-37809][K8S] Add YuniKorn Feature Step Aug 17, 2022
@Yikun (Member) commented Aug 17, 2022

I think @dongjoon-hyun already caught all the key parts. I think we also need:

  • A doc here to show the requirements and how to do a complete IT for Spark and Volcano.
  • IIRC, as we discussed before, we need to support both the x86 and arm64 arches if we want it supported in Spark. According to the YuniKorn docs, it looks like only amd64 is supported so far? https://hub.docker.com/r/apache/yunikron/tags . I will help to test on Linux arm64/x86 and macOS M1 when it's ready.
  • BTW, unrelated but might be useful: the Volcano community is going to introduce native multi-arch support in 1.7 (about to release next month), and I will help to upgrade Volcano to 1.7 in Spark 3.4 (I believe this is also @dongjoon-hyun's concern), so please don't be misled by the separate arm64/x86 installations of the Spark Volcano integration.

@dongjoon-hyun (Member) commented Aug 17, 2022

Thank you, @Yikun. BTW, yes, I'm a little worried about the tight coupling between the Fabric8 K8s client and Volcano. Is there a way to support the latest Volcano while Apache Spark still uses io.fabric8:kubernetes-client 5.12.x?

@dongjoon-hyun (Member) commented:

BTW, @yangwwei, I'm reviewing this PR freshly, so I might have lost some context. Please let me know if I forgot something that was already discussed.

@Yikun (Member) commented Aug 17, 2022

I'm a little worried about the tightly-coupled relation between Fabric8 K8s client and Volcano. Is there a way to support the latest Volcano while Apache Spark still uses io.fabric8:kubernetes-client 5.12.x?

Because the Volcano API is compatible, it is naturally supported. That is to say, for the functions used by Spark, even the old version of the Volcano client still works. Recently, I have done extensive testing with the latest master versions of Volcano and Spark, and Volcano's e2e_spark also protects it.

@yangwwei (Author) commented Aug 17, 2022

hi @dongjoon-hyun, @Yikun, thanks for the comments, very good points. Let me work on the update. A few questions/comments in parallel:

  1. For docs like here and here.

Can we address these in another PR? I just want to keep this PR concise and focused on the code implementation.

  1. De-duplicate by reusing SPARK-38383 and remove YuniKornFeatureStepSuite.scala.

I think we still need to keep the feature step. From the Spark side, we want to give a consistent user experience when people run Spark with customized schedulers. The Volcano integration was exposed via a feature step, so I prefer we do the same for YuniKorn; it will be easier for users to configure. What do you think?

@dongjoon-hyun (Member) commented Aug 18, 2022

  1. For the docs, you can proceed in other PRs, but you should open those PRs first and mention them in your PR description as links for community review. Otherwise, we cannot verify your contribution.

  2. Yes, we are expecting something like the following.

--conf spark.kubernetes.driver.scheduler.name=yunikorn \
--conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
--conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
--conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.YuniKornFeatureStep \
--conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.YuniKornFeatureStep

In short, the Apache Spark community doesn't want to repeat the same logic as SPARK-38383 in every custom scheduler implementation and its test code. It's only error-prone and misleading to have the same logic in many places. To be safe and consistent across all custom scheduler backends, please use Apache Spark's existing feature.

@dongjoon-hyun (Member) commented Aug 18, 2022

BTW, is your concern that YuniKornFeatureStep has no other logic, so we can use a simpler way like the following? It's best for Apache Spark to support many custom schedulers consistently, isn't it? Why do we need an additional class like YuniKornFeatureStep when we can do this with the existing configurations, without any code change at all?

--conf spark.kubernetes.driver.scheduler.name=yunikorn \
--conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}} \
--conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}}

@yangwwei (Author) replied:

Hi @dongjoon-hyun,

Wait a sec, I think I missed one point. Does this mean we can simply set {{APP_ID}}, and it will be substituted with the actual job ID automatically? If so, we don't need any code changes; configs like the following will work:

--conf spark.kubernetes.scheduler.name=yunikorn
--conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}}
--conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}}

If you can confirm this, I can remove this PR, and all we need is a doc PR!

@Yikun (Member) commented Aug 19, 2022

@yangwwei IIRC, we already had this discussion before [1]. If all the info, like the queue and miniRes, can be passed well via annotations/labels, I think current Spark already meets YuniKorn's requirements. That's also the recommended way mentioned in [2], "Specify scheduler related configurations".

So, as in our offline discussion at the end of the 3.3 cycle, we just need a doc for YuniKorn. On the other hand, even for documentation, we should meet the basic requirements of Spark mentioned above.

[1] #35663 (comment)
[2] https://spark.apache.org/docs/latest/running-on-kubernetes.html#customized-kubernetes-schedulers-for-spark-on-kubernetes

@dongjoon-hyun (Member) commented Aug 19, 2022

  1. Yes, that is a placeholder, @yangwwei. We can skip the code unless we need additional functionality in that class.
  2. Yes, of course, we need a doc. However, Apache Spark needs test coverage for what we claim to support, so we need the test coverage first.
  3. To be clear, the documentation and test PRs can still be backported to branch-3.3 because they are not new-feature code changes. So we may be able to claim YuniKorn support in Apache Spark 3.3.1 (if the requirements are fulfilled).

@yangwwei (Author) replied:

Hi @dongjoon-hyun, @Yikun,

Gotcha. I think we can do the following:

  1. I will remove the code changes in this PR. Maybe I will just close this one and create a new PR that mainly adds test coverage for YuniKorn. I don't think any code changes are needed other than test classes.
  2. I will work on the doc task once the above PR is done.
  3. Since no extra code change is needed to get this working, let's try to backport these to branch-3.3.

BTW, what is the release timeline for v3.3.1?

@dongjoon-hyun (Member) commented Aug 19, 2022

+1 for your plan, @yangwwei .

  • The Apache Spark community maintains feature release branches like branch-3.3 with bug-fix releases for a period of 18 months.
  • In general, we deliver multiple bug-fix releases during that period. For example, Apache Spark 3.3.0 could have the following schedule; the specific dates depend on the release managers who volunteer for them.
    • 3.3.0: 2022/06
    • 3.3.1: 2022/10 (3.3.0 + 4M)
    • 3.3.2: 2023/02 (3.3.0 + 8M)
    • 3.3.3: 2023/07 (3.3.0 + 13M)
    • 3.3.4: 2023/12 (3.3.0 + 18M)

@dongjoon-hyun (Member) commented:

BTW, cc @sunchao since he is interested in being the release manager for Apache Spark 3.3.1.

@dongjoon-hyun (Member) commented:

Gentle ping, @yangwwei.

@yangwwei (Author) commented:

hi @dongjoon-hyun, @sunchao, I am working on the changes. I am pretty busy right now, but I will try my best to get the test code done this week. In the meantime, I have submitted PR #37622 for the doc change. As no code changes are required, I think it can be reviewed in parallel. Please take a look, thanks!

@dongjoon-hyun (Member) commented:

Got it, @yangwwei. Take your time.

@dongjoon-hyun (Member) commented:

BTW, I reviewed #37622 and added a few comments. I will hold that documentation PR to align with the other validations.

@dongjoon-hyun (Member) commented:

And for this code PR, I'll close it for now; we look forward to seeing your next PR. Thank you so much for your time and for making progress on this.

@dongjoon-hyun (Member) commented:

You can reuse SPARK-37809 for one of your future test code PRs.

dongjoon-hyun pushed a commit that referenced this pull request Sep 1, 2022
### What changes were proposed in this pull request?
Add a section under [customized-kubernetes-schedulers-for-spark-on-kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html#customized-kubernetes-schedulers-for-spark-on-kubernetes) to explain how to run Spark with Apache YuniKorn. This is based on the review comments from #35663.

### Why are the changes needed?
Explain how to run Spark with Apache YuniKorn

### Does this PR introduce _any_ user-facing change?
No

Closes #37622 from yangwwei/SPARK-40187.

Authored-by: Weiwei Yang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Sep 1, 2022
(cherry picked from commit 4b18773)
Signed-off-by: Dongjoon Hyun <[email protected]>