[SPARK-36058][K8S] Add support for statefulset APIs in K8s #33508
Conversation
Test build #141589 has finished for PR 33508 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #141784 has finished for PR 33508 at commit
Kubernetes integration test starting
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test status success
Test build #141785 has finished for PR 33508 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #141854 has finished for PR 33508 at commit
Test build #141861 has finished for PR 33508 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test starting
Kubernetes integration test status success
Test build #141864 has finished for PR 33508 at commit
Test build #142705 has finished for PR 33508 at commit
kbendick left a comment:
A few nits I'll leave at your discretion, but overall this looks good to me for supporting statefulsets and other executor allocation strategies. Would love to get this in to make testing the use of this API easier. 🙂
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #142710 has finished for PR 33508 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
jenkins retest this please.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #142715 has finished for PR 33508 at commit
Test build #142711 has finished for PR 33508 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Merged to the current dev branch (targeting 3.3)
Test build #142773 has finished for PR 33508 at commit
@holdenk I noticed that an ownerReference is already set between the executor pods and the driver pod, and another one is set between the statefulset and the driver. Not sure if these are duplicated?
…ption correctly

### What changes were proposed in this pull request?

This PR aims to fix the error message to include the exception, because #33508 missed the string interpolation prefix, `s"`.

https://github.com/apache/spark/blob/c032928515e74367137c668ce692d8fd53696485/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala#L110

### Why are the changes needed?

To show the intended message.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

Closes #35829 from dongjoon-hyun/SPARK-36058.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
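The root cause is easy to miss in review: in Scala, a string literal without the `s` prefix is not interpolated, so `${...}` placeholders end up verbatim in the log message instead of the exception text. Python's f-strings have exactly the same pitfall, which makes for a quick self-contained illustration (the message text here is made up, not the actual Spark log line):

```python
# Analogue of the Scala bug: without the interpolation prefix, the
# placeholder is emitted literally instead of being expanded.
exc = RuntimeError("pod allocation failed")

broken = "Exception occurred: {exc}"   # missing f prefix: placeholder stays literal
fixed = f"Exception occurred: {exc}"   # interpolated as intended

print(broken)  # Exception occurred: {exc}
print(fixed)   # Exception occurred: pod allocation failed
```

Both versions compile and run without complaint, which is why the mistake survived until someone read the actual test output.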
### What changes were proposed in this pull request?

This PR aims to fix RBAC to allow the `Spark` driver to create a `StatefulSet`.

### Why are the changes needed?

We need this to allow Apache Spark's `StatefulSetPodsAllocator`, which was introduced in Apache Spark 3.3.0.

- apache/spark#33508

### Does this PR introduce _any_ user-facing change?

No, this is an additional permission.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #389 from dongjoon-hyun/SPARK-53909.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?

Adds support for the K8s `Deployment` API to allocate pods.

### Why are the changes needed?

Allocating individual pods is not ideal, and we can allocate with higher-level APIs. #33508 helps this by adding an interface for arbitrary allocators and adds a statefulset allocator. However, dynamic allocation only works if you have implemented a PodDisruptionBudget associated with the decommission label. Since Deployment uses ReplicaSet, which supports the `pod-deletion-cost` annotation, we can avoid needing to create a separate PDB resource and allow dynamic allocation (with shuffle tracking) by adding a low deletion cost to executors we are scaling down. When we scale the Deployment, it will choose to scale down the pods with the low deletion cost.

### Does this PR introduce _any_ user-facing change?

Yes, adds a user-facing config:

```
spark.kubernetes.executor.podDeletionCost
```

### How was this patch tested?

New unit tests, passing existing unit tests, and tested in a cluster with shuffle tracking and dynamic allocation enabled.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52867 from ForVic/dev/victors/deployment_allocator.

Lead-authored-by: Victor Sunderland <[email protected]>
Co-authored-by: victors-oai <[email protected]>
Co-authored-by: Victor Sunderland <[email protected]>
Signed-off-by: Chao Sun <[email protected]>
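The `pod-deletion-cost` mechanism this description leans on is easy to model: when a ReplicaSet scales down, Kubernetes prefers to delete the pods whose `controller.kubernetes.io/pod-deletion-cost` annotation value is lowest. A minimal sketch of that selection rule, not Spark's actual implementation (pod names and cost values are illustrative):

```python
# Sketch of ReplicaSet scale-down with pod-deletion-cost: annotating the
# executors we want removed with a low cost steers the controller toward
# deleting exactly those pods, without a separate PodDisruptionBudget.
def pick_scale_down_victims(pods, n):
    """Return the names of the n pods a ReplicaSet would delete first
    (lowest annotated deletion cost wins; ties keep insertion order)."""
    return [name for name, cost in sorted(pods.items(), key=lambda kv: kv[1])][:n]

pods = {
    "exec-1": 0,    # marked for removal: low deletion cost
    "exec-2": 100,  # keep: still holds tracked shuffle data
    "exec-3": 0,    # marked for removal
}

print(pick_scale_down_victims(pods, 2))  # ['exec-1', 'exec-3']
```

This is why the Deployment allocator can support dynamic allocation with shuffle tracking: scaling the replica count down removes the cheap-to-delete executors first, leaving the ones with shuffle data alone.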
### What changes were proposed in this pull request?

Generalize the pod allocator and add support for statefulsets.

### Why are the changes needed?

Allocating individual pods in Spark can be less than ideal for some clusters, and using higher-level operators like statefulsets and replicasets can be useful.

### Does this PR introduce any user-facing change?

Yes, new config options.

### How was this patch tested?

Completed:
- New unit & basic integration test
- PV integration tests
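Conceptually, the generalization turns allocator choice into configuration: the cluster manager reads a setting and dispatches to either the classic one-pod-at-a-time allocator or the statefulset-based one. A hedged sketch of that dispatch (the config key mirrors the `spark.kubernetes.allocation.pods.allocator` option this PR introduced, but the allocator descriptions are paraphrases, not the real classes):

```python
# Model of pluggable pod allocation, in the spirit of this PR: instead of
# hard-coding direct pod creation, the cluster manager resolves a config
# value to an allocation strategy. Descriptions are illustrative.
ALLOCATORS = {
    "direct": "create executor pods individually",
    "statefulset": "create one StatefulSet and let K8s manage replicas",
}

def select_allocator(conf):
    """Resolve the configured allocator, defaulting to direct pod allocation."""
    name = conf.get("spark.kubernetes.allocation.pods.allocator", "direct")
    if name not in ALLOCATORS:
        raise ValueError(f"Unknown pods allocator: {name}")
    return name

print(select_allocator({}))  # direct
print(select_allocator({"spark.kubernetes.allocation.pods.allocator": "statefulset"}))  # statefulset
```

The follow-up Deployment allocator (#52867 above) slots into the same dispatch point, which is the payoff of making the allocator an interface rather than a single implementation.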