@holdenk
Contributor

@holdenk holdenk commented Jul 23, 2021

What changes were proposed in this pull request?

Generalize the pod allocator and add support for statefulsets.

Why are the changes needed?

Allocating individual pods in Spark is not ideal for some clusters, and higher-level constructs like statefulsets and replicasets can be useful there.
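
To illustrate what a generalized allocator could look like, here is a minimal sketch. It is not the code from this PR: the trait and class names (`PodsAllocatorSketch`, `DirectAllocatorSketch`, `StatefulSetAllocatorSketch`) and the string "requests" are hypothetical stand-ins for real Kubernetes API calls. The only point is that a direct strategy sends one request per executor pod, while a statefulset strategy sends a single request carrying a replica count.

```scala
// Hypothetical interface: each strategy is asked to reach `target` executors
// and returns the (simulated) API requests it would send.
trait PodsAllocatorSketch {
  def setTotalExpectedExecutors(target: Int): Seq[String]
}

// Direct strategy: the driver creates every missing executor pod itself.
// Scale-down is left out of this sketch.
class DirectAllocatorSketch extends PodsAllocatorSketch {
  private var requested = 0

  def setTotalExpectedExecutors(target: Int): Seq[String] = {
    val requests = (requested until target).map(i => s"create pod exec-${i + 1}")
    requested = math.max(requested, target)
    requests
  }
}

// Statefulset strategy: one object is created once, then only rescaled.
class StatefulSetAllocatorSketch extends PodsAllocatorSketch {
  private var created = false

  def setTotalExpectedExecutors(target: Int): Seq[String] = {
    val request =
      if (created) s"scale statefulset to $target replicas"
      else s"create statefulset with $target replicas"
    created = true
    Seq(request)
  }
}

object AllocatorSketchDemo {
  def main(args: Array[String]): Unit = {
    val allocators: Seq[PodsAllocatorSketch] =
      Seq(new DirectAllocatorSketch, new StatefulSetAllocatorSketch)
    // The caller does not need to know which strategy is in use.
    allocators.foreach(a => println(a.setTotalExpectedExecutors(3)))
  }
}
```

With a target of three executors, the direct sketch produces three requests and the statefulset sketch produces one; the caller's code is the same in both cases.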

Does this PR introduce any user-facing change?

Yes, new config options.

How was this patch tested?

Completed: New unit & basic integration test
PV integration tests

@SparkQA

SparkQA commented Jul 24, 2021

Test build #141589 has finished for PR 33508 at commit 4ec47f1.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46106/

@SparkQA

SparkQA commented Jul 24, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46106/

@holdenk holdenk force-pushed the SPARK-36058-support-replicasets-or-job-api-like-things branch from 67c5e48 to 17b3c1b Compare July 28, 2021 19:06
@holdenk holdenk changed the title [WIP][SPARK-36058][K8S] Add support for statefulset APIs in K8s [SPARK-36058][K8S] Add support for statefulset APIs in K8s Jul 28, 2021
@github-actions github-actions bot added the CORE label Jul 28, 2021
@SparkQA

SparkQA commented Jul 28, 2021

Test build #141784 has finished for PR 33508 at commit 17b3c1b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46298/

@SparkQA

SparkQA commented Jul 28, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46297/

@SparkQA

SparkQA commented Jul 28, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46298/

@SparkQA

SparkQA commented Jul 28, 2021

Test build #141785 has finished for PR 33508 at commit 168082e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

cc @shrutig since this is another private[spark] and will change ExecutorPodsAllocator here.

@SparkQA

SparkQA commented Jul 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46365/

@SparkQA

SparkQA commented Jul 29, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46365/

@SparkQA

SparkQA commented Jul 29, 2021

Test build #141854 has finished for PR 33508 at commit fb0c010.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2021

Test build #141861 has finished for PR 33508 at commit 8ceeef6.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46372/

@SparkQA

SparkQA commented Jul 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46375/

@SparkQA

SparkQA commented Jul 30, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46375/

@SparkQA

SparkQA commented Jul 30, 2021

Test build #141864 has finished for PR 33508 at commit db8f593.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 23, 2021

Test build #142705 has finished for PR 33508 at commit 0502e19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@kbendick kbendick left a comment

A few nits I'll leave at your discretion, but overall this looks good to me for supporting statefulsets and other executor allocation strategies. Would love to get this in to make testing the use of this API easier. 🙂

@SparkQA

SparkQA commented Aug 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47211/

@SparkQA

SparkQA commented Aug 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47211/

@SparkQA

SparkQA commented Aug 24, 2021

Test build #142710 has finished for PR 33508 at commit a638719.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47212/

@SparkQA

SparkQA commented Aug 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47212/

@holdenk
Contributor Author

holdenk commented Aug 24, 2021

jenkins retest this please.

@SparkQA

SparkQA commented Aug 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47215/

@SparkQA

SparkQA commented Aug 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47215/

@SparkQA

SparkQA commented Aug 24, 2021

Test build #142715 has finished for PR 33508 at commit 4f3c0cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • public final class Aggregation implements Serializable
      • public final class Count implements AggregateFunc
      • public final class CountStar implements AggregateFunc
      • public final class Max implements AggregateFunc
      • public final class Min implements AggregateFunc
      • public final class Sum implements AggregateFunc

@SparkQA

SparkQA commented Aug 24, 2021

Test build #142711 has finished for PR 33508 at commit 4f3c0cc.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • public final class Aggregation implements Serializable
      • public final class Count implements AggregateFunc
      • public final class CountStar implements AggregateFunc
      • public final class Max implements AggregateFunc
      • public final class Min implements AggregateFunc
      • public final class Sum implements AggregateFunc

@SparkQA

SparkQA commented Aug 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47273/

@SparkQA

SparkQA commented Aug 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47273/

@asfgit asfgit closed this in ff3f3c4 Aug 26, 2021
@holdenk
Contributor Author

holdenk commented Aug 26, 2021

Merged to the current dev branch (targeting 3.3)

@SparkQA

SparkQA commented Aug 26, 2021

Test build #142773 has finished for PR 33508 at commit 5be1942.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yipen

yipen commented Feb 28, 2022

@holdenk I noticed that an ownerReference is already set between the executor pods and the driver pod, and another is set between the statefulset and the driver. Are these duplicated?

wangyum pushed a commit that referenced this pull request Mar 13, 2022
…ption correctly

### What changes were proposed in this pull request?

This PR aims to fix error message to include the exception because #33508 missed the string interpolation prefix, `s"`.

https://github.com/apache/spark/blob/c032928515e74367137c668ce692d8fd53696485/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala#L110
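
For readers unfamiliar with this class of bug, here is a minimal sketch of what a missing `s` prefix does in Scala. The exception and message text are made up for illustration and are not the line from `KubernetesSuite.scala`.

```scala
object InterpolationSketch {
  // Hypothetical exception standing in for whatever the test caught.
  val e = new RuntimeException("pod not ready")

  // Without the `s` prefix this is an ordinary string literal:
  // the placeholder is kept verbatim instead of being evaluated.
  val withoutPrefix: String = "Failed with exception: ${e.getMessage}"

  // With the `s` prefix the expression inside ${...} is evaluated.
  val withPrefix: String = s"Failed with exception: ${e.getMessage}"

  def main(args: Array[String]): Unit = {
    println(withoutPrefix)
    println(withPrefix)
  }
}
```

Both lines compile, which is why the mistake is easy to miss in review; only the second one puts the exception message into the output.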

### Why are the changes needed?

To show the intended message.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

Closes #35829 from dongjoon-hyun/SPARK-36058.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
dongjoon-hyun added a commit to apache/spark-kubernetes-operator that referenced this pull request Oct 15, 2025
### What changes were proposed in this pull request?

This PR aims to fix RBAC to allow `Spark` driver to create `StatefulSet`.

### Why are the changes needed?

We need to fix this to allow Apache Spark's `StatefulSetPodsAllocator`, which was introduced in Apache Spark 3.3.0.
- apache/spark#33508

### Does this PR introduce _any_ user-facing change?

No, this is an additional permission.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #389 from dongjoon-hyun/SPARK-53909.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
sunchao pushed a commit that referenced this pull request Nov 11, 2025
### What changes were proposed in this pull request?

Adds support for K8s `Deployment` API to allocate pods.

### Why are the changes needed?

Allocating individual pods is not ideal, and we can allocate with higher level APIs. #33508 helps this by adding an interface for arbitrary allocators and adds a statefulset allocator. However, dynamic allocation only works if you have implemented a PodDisruptionBudget associated with the decommission label. Since Deployment uses ReplicaSet, which supports `pod-deletion-cost` annotation, we can avoid needing to create a separate PDB resource, and allow dynamic allocation (w/ shuffle tracking) by adding a low deletion cost to executors we are scaling down. When we scale the Deployment, it will choose to scale down the pods with the low deletion cost.
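
The scale-down selection described above can be sketched roughly as follows. This is a simplified model, not the ReplicaSet controller's real logic, which also weighs other criteria such as whether a pod is scheduled and ready. The annotation key `controller.kubernetes.io/pod-deletion-cost` is the Kubernetes one; the `Pod` case class, the pod names, and the cost value `-100` are assumptions made for illustration.

```scala
object DeletionCostSketch {
  // Kubernetes annotation read by the ReplicaSet controller on scale-down.
  val CostAnnotation = "controller.kubernetes.io/pod-deletion-cost"

  // Hypothetical minimal pod model: a name plus its annotations.
  final case class Pod(name: String, annotations: Map[String, String] = Map.empty) {
    // Pods without the annotation are treated as having cost 0.
    def deletionCost: Int =
      annotations.get(CostAnnotation).map(_.toInt).getOrElse(0)
  }

  // Simplified selection: remove the `count` pods with the lowest cost first.
  def pickForRemoval(pods: Seq[Pod], count: Int): Seq[Pod] =
    pods.sortBy(_.deletionCost).take(count)

  def main(args: Array[String]): Unit = {
    val pods = Seq(
      Pod("exec-1"),
      // The executor being decommissioned is marked with a low cost.
      Pod("exec-2", Map(CostAnnotation -> "-100")),
      Pod("exec-3"))
    // Scaling from 3 replicas to 2 selects exec-2 in this model.
    println(pickForRemoval(pods, 1).map(_.name))
  }
}
```

Marking the executor before shrinking the replica count is what lets the driver choose which pod goes away, rather than leaving the choice to the controller.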

### Does this PR introduce _any_ user-facing change?
Yes, adds user-facing configs
```
spark.kubernetes.executor.podDeletionCost
```

### How was this patch tested?
New unit tests + passing existing unit tests + tested in a cluster with shuffle tracking and dynamic allocation enabled

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #52867 from ForVic/dev/victors/deployment_allocator.

Lead-authored-by: Victor Sunderland <[email protected]>
Co-authored-by: victors-oai <[email protected]>
Co-authored-by: Victor Sunderland <[email protected]>
Signed-off-by: Chao Sun <[email protected]>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025