[SPARK-36058][K8S] Add support for statefulset APIs in K8s #33508
Status: Closed

holdenk wants to merge 107 commits into apache:master from holdenk:SPARK-36058-support-replicasets-or-job-api-like-things.
Commits (107, all by holdenk)

b4d625c  Work in progress supporting statefulsets for Spark. TBD if we want st…
38676ab  It compiles, yaygit diff!
09eb220  Add applicationId to setTotalExpectedExecutors so that we can use thi…
922b61a  Put in a restart policy of always. Next TODO (likely we can look at t…
2316084  Add podname parsing logic based on https://github.com/spark-volcano-w…
1e004d3  Try and plumb through the SPARK_EXECUTOR_POD_NAME so we can process i…
10c2922  Move the restart policy change into basic exec feature step instead o…
9434bff  Move more of the hijinks into the featuresteps where they fit
39ad5f6  Add a parallel pod management property, lets hope this doesn't screw …
74e4a67  Fix typo )) -> )
5f3bf00  Get it to compile again
bb27e9f  Turns out we do want to track snapshots so know about dead pods earli…
4d343a4  Refactor the stateful allocator out from the base allocator (TODO: ac…
e6fc922  Use scale to update statefulset scale
d7a094f  Construct the pod allocator based on user configuration.
1c8556f  Start adding new tests (slowly) for statefulset allocator and update …
7522169  Initial statefulset mock test
64bb5a7  Add second resource profile and scaleup test to StatefulsetAllocatorS…
bc15209  Validate the deletions as well
37c49a9  Verify that we can allocate with statefulsets. Next up: clean up the …
dc602be  Start work to cleanup and validate removal of statefulset on driver exit
3c5fb3d  Fix addowner ref
57b58e8  Delegate the pod cleanup to the pod allocator so that the statefulset…
da8cc6c  Use eventually when checking for set delition because it depends on t…
a7da7b2  Make the KubernetesSuite pod log collection resilent to pending pods.
799e2ff  Add a minireadwrite test for use with PVCs and not proper DFS
ee176f0  Add some tests around the new allocator with PVs
a48beb6  maaaybe exec mount
500c080  Revert "maaaybe exec mount"
d626dc7  Update the mini-read-write test to handle the fact the exec PVCs are …
f6540e1  Switch the PV tests back tohaving pvTestTag and MiniKubeTag as needed…
5dc1bc4  Scala style cleanups
b1ba08c  Delete block when putting over an existing block incase our in-memory…
b151d8f  We do the deletion of the pods inside of the executorpodsallocator no…
efd2ae7  Handle empty pod specs
5e0e939  Update StatefulsetPodsAllocator.scala
d8503e7  code review feedback, cleanup the SPARK_LOCAL_DIRS when executing ent…
e8eece5  Expose the AbstractPodsAllocator as a @DeveloperApi as suggested/requ…
08a24d9  Move the getItems inside of the eventually otherwise we still could h…
5362f73  pvTestTag was removed upstream
a74598e  Update entrypoint.sh
ffa5d24  Fix up how we launch pods allocators
ec8bf09  Make a new entry point for executors on Kube so they can request the …
2d6dc1c  Add unit tests for dynamically fetching exec id and constructing the …
df601c2  Don't parse podnames anymore to get exec ids instead depend on the la…
df2af02  Remove the SparkException import we don't need anymore
6db2b9f  Add the KubernetesClusterManagerSuite
65d89a0  Work in progress supporting statefulsets for Spark. TBD if we want st…
88d345c  It compiles, yaygit diff!
908d085  Add applicationId to setTotalExpectedExecutors so that we can use thi…
7000ff5  Put in a restart policy of always. Next TODO (likely we can look at t…
f5375ed  Add podname parsing logic based on https://github.com/spark-volcano-w…
b1b04fc  Try and plumb through the SPARK_EXECUTOR_POD_NAME so we can process i…
29470f6  Move the restart policy change into basic exec feature step instead o…
773ae75  Move more of the hijinks into the featuresteps where they fit
315807b  Add a parallel pod management property, lets hope this doesn't screw …
f92057e  Fix typo )) -> )
e4392aa  Get it to compile again
740a16e  Turns out we do want to track snapshots so know about dead pods earli…
9ebee9a  Refactor the stateful allocator out from the base allocator (TODO: ac…
01a1a97  Use scale to update statefulset scale
35c939d  Construct the pod allocator based on user configuration.
2052685  Start adding new tests (slowly) for statefulset allocator and update …
97255ab  Initial statefulset mock test
5a8c298  Add second resource profile and scaleup test to StatefulsetAllocatorS…
58dec2c  Validate the deletions as well
596236a  Verify that we can allocate with statefulsets. Next up: clean up the …
d26e2d9  Start work to cleanup and validate removal of statefulset on driver exit
029c682  Fix addowner ref
2285340  Delegate the pod cleanup to the pod allocator so that the statefulset…
9a51151  Use eventually when checking for set delition because it depends on t…
a2d4183  Make the KubernetesSuite pod log collection resilent to pending pods.
cd09bc4  Add a minireadwrite test for use with PVCs and not proper DFS
d08dd7d  Add some tests around the new allocator with PVs
3633936  maaaybe exec mount
ce9299b  Revert "maaaybe exec mount"
71b9674  Update the mini-read-write test to handle the fact the exec PVCs are …
7c21bbc  Switch the PV tests back tohaving pvTestTag and MiniKubeTag as needed…
a3c5103  Scala style cleanups
3fee3bb  Delete block when putting over an existing block incase our in-memory…
f290aef  We do the deletion of the pods inside of the executorpodsallocator no…
993ff65  Handle empty pod specs
788005d  Update StatefulsetPodsAllocator.scala
40c2db3  code review feedback, cleanup the SPARK_LOCAL_DIRS when executing ent…
048ea6f  Expose the AbstractPodsAllocator as a @DeveloperApi as suggested/requ…
bd79229  Move the getItems inside of the eventually otherwise we still could h…
8c81112  pvTestTag was removed upstream
2364e4b  Update entrypoint.sh
108503e  Fix up how we launch pods allocators
ec27fb5  Make a new entry point for executors on Kube so they can request the …
7f331c0  Add unit tests for dynamically fetching exec id and constructing the …
d87cda2  Don't parse podnames anymore to get exec ids instead depend on the la…
9715697  Remove the SparkException import we don't need anymore
946d4a7  Add the KubernetesClusterManagerSuite
9a432c0  Merge branch 'master' into SPARK-36058-support-replicasets-or-job-api…
1ab835a  Merge branch 'SPARK-36058-support-replicasets-or-job-api-like-things'…
2e4cd93  Minimize changes by dropping appId from setTotalExpectedExecutors and…
7aa79f4  Merge branch 'master' into SPARK-36058-support-replicasets-or-job-api…
d1c172d  Merge branch 'master' into SPARK-36058-support-replicasets-or-job-api…
fae1cbd  Make sure we call start before setting the total expected execs
49ab072  Throw an exception if execs are set before start so it's clearer than…
362172c  Merge branch 'master' into SPARK-36058-support-replicasets-or-job-api…
0502e19  Merge branch 'master' into SPARK-36058-support-replicasets-or-job-api…
6e660a4  Update the Usage instructions for the MiniReadWriteTest
a638719  CR feedback from @kbendick - use classof, change config param name, a…
4f3c0cc  Merge branch 'master' into SPARK-36058-support-replicasets-or-job-api…
5be1942  Merge branch 'master' into SPARK-36058-support-replicasets-or-job-api…
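Several commits above (d7a094f, 35c939d, and the review-feedback commit that changed the config param name) construct the pod allocator from user configuration. A hedged sketch of opting into the StatefulSet-based allocator follows; the key name matches what this PR introduces, but treat it as an assumption and verify it against your Spark version's configuration docs:

```scala
import org.apache.spark.SparkConf

// "statefulset" selects the StatefulsetPodsAllocator added by this PR;
// anything else falls back to the default direct pod allocation
// (ExecutorPodsAllocator). The exact key name should be checked against
// your Spark version's documentation.
val conf = new SparkConf()
  .set("spark.kubernetes.allocation.pods.allocator", "statefulset")
```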
examples/src/main/scala/org/apache/spark/examples/MiniReadWriteTest.scala (139 additions, 0 deletions)
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package org.apache.spark.examples

import java.io.File
import java.io.PrintWriter

import scala.io.Source._

import org.apache.spark.sql.SparkSession
import org.apache.spark.util.Utils

/**
 * Simple test for reading and writing to a distributed
 * file system. This example does the following:
 *
 *   1. Reads local file
 *   2. Computes word count on local file
 *   3. Writes local file to a local dir on each executor
 *   4. Reads the file back from each exec
 *   5. Computes word count on the file using Spark
 *   6. Compares the word count results
 */
object MiniReadWriteTest {

  private val NPARAMS = 1

  private def readFile(filename: String): List[String] = {
    Utils.tryWithResource(fromFile(filename))(_.getLines().toList)
  }

  private def printUsage(): Unit = {
    val usage = """Mini Read-Write Test
    |Usage: localFile
    |localFile - (string) location of local file to distribute to executors.""".stripMargin

    println(usage)
  }

  private def parseArgs(args: Array[String]): File = {
    if (args.length != NPARAMS) {
      printUsage()
      System.exit(1)
    }

    var i = 0

    val localFilePath = new File(args(i))
    if (!localFilePath.exists) {
      System.err.println(s"Given path (${args(i)}) does not exist")
      printUsage()
      System.exit(1)
    }

    if (!localFilePath.isFile) {
      System.err.println(s"Given path (${args(i)}) is not a file")
      printUsage()
      System.exit(1)
    }
    localFilePath
  }

  def runLocalWordCount(fileContents: List[String]): Int = {
    fileContents.flatMap(_.split(" "))
      .flatMap(_.split("\t"))
      .filter(_.nonEmpty)
      .groupBy(w => w)
      .mapValues(_.size)
      .values
      .sum
  }

  def main(args: Array[String]): Unit = {
    val localFilePath = parseArgs(args)

    println(s"Performing local word count from ${localFilePath}")
    val fileContents = readFile(localFilePath.toString())
    println(s"File contents are ${fileContents}")
    val localWordCount = runLocalWordCount(fileContents)

    println("Creating SparkSession")
    val spark = SparkSession
      .builder
      .appName("Mini Read Write Test")
      .getOrCreate()

    println("Writing local file to executors")

    // uses the fact default parallelism is greater than num execs
    val misc = spark.sparkContext.parallelize(1.to(10))
    misc.foreachPartition { x =>
      new PrintWriter(localFilePath) {
        try {
          write(fileContents.mkString("\n"))
        } finally {
          close()
        }
      }
    }

    println("Reading file from execs and running Word Count")
    val readFileRDD = spark.sparkContext.textFile(localFilePath.toString())

    val dWordCount = readFileRDD
      .flatMap(_.split(" "))
      .flatMap(_.split("\t"))
      .filter(_.nonEmpty)
      .map(w => (w, 1))
      .countByKey()
      .values
      .sum

    spark.stop()
    if (localWordCount == dWordCount) {
      println(s"Success! Local Word Count $localWordCount and " +
        s"D Word Count $dWordCount agree.")
    } else {
      println(s"Failure! Local Word Count $localWordCount " +
        s"and D Word Count $dWordCount disagree.")
    }
  }
}
// scalastyle:on println
```
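As a quick sanity check of the word-count helper above, a hypothetical REPL-style usage (the sample input is invented for illustration):

```scala
// Three non-empty words across two lines: "hello" twice, "world" once.
val sample = List("hello world", "hello")
// runLocalWordCount groups the words and sums the per-word counts: 2 + 1 = 3.
assert(MiniReadWriteTest.runLocalWordCount(sample) == 3)
```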
...es/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/AbstractPodsAllocator.scala (59 additions, 0 deletions)
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.spark.scheduler.cluster.k8s

import io.fabric8.kubernetes.api.model.Pod

import org.apache.spark.annotation.DeveloperApi
import org.apache.spark.resource.ResourceProfile

/**
 * :: DeveloperApi ::
 * An abstract interface for allowing different types of pod allocation.
 *
 * The internal Spark implementations are [[StatefulsetPodsAllocator]]
 * and [[ExecutorPodsAllocator]]. This may be useful for folks integrating with custom schedulers
 * such as Volcano, Yunikorn, etc.
 *
 * This API may change or be removed at any time.
 *
 * @since 3.3.0
 */
@DeveloperApi
abstract class AbstractPodsAllocator {
  /*
   * Set the total expected executors for an application.
   */
  def setTotalExpectedExecutors(resourceProfileToTotalExecs: Map[ResourceProfile, Int]): Unit

  /*
   * Reference to the driver pod.
   */
  def driverPod: Option[Pod]

  /*
   * Whether the pod for a given executor id has been deleted.
   */
  def isDeleted(executorId: String): Boolean

  /*
   * Start hook.
   */
  def start(applicationId: String, schedulerBackend: KubernetesClusterSchedulerBackend): Unit

  /*
   * Stop hook.
   */
  def stop(applicationId: String): Unit
}
```
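For integrators targeting custom schedulers, a minimal no-op sketch of the contract above; the class and its method bodies are illustrative assumptions, not Spark code, though the start-before-setTotalExpectedExecutors check mirrors the behavior added in commit 49ab072:

```scala
package org.apache.spark.scheduler.cluster.k8s

import io.fabric8.kubernetes.api.model.Pod

import org.apache.spark.resource.ResourceProfile

// Hypothetical allocator that satisfies the AbstractPodsAllocator API
// without creating any pods; a real implementation would reconcile the
// requested executor counts against its scheduler in
// setTotalExpectedExecutors and tear everything down in stop().
class NoOpPodsAllocator extends AbstractPodsAllocator {
  @volatile private var started = false

  def setTotalExpectedExecutors(
      resourceProfileToTotalExecs: Map[ResourceProfile, Int]): Unit = {
    // Mirrors the PR's behavior of failing fast if start() hasn't run.
    if (!started) {
      throw new IllegalStateException("start() must be called first")
    }
    // A real allocator would create or scale executor pods here.
  }

  // This sketch tracks no driver pod (as in client-mode deployments).
  def driverPod: Option[Pod] = None

  // Nothing is ever allocated, so nothing is ever deleted.
  def isDeleted(executorId: String): Boolean = false

  def start(
      applicationId: String,
      schedulerBackend: KubernetesClusterSchedulerBackend): Unit = {
    started = true
  }

  def stop(applicationId: String): Unit = {
    started = false
  }
}
```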