Initial Kubernetes cluster manager implementation #50
Includes the following initial feature set:
- Cluster mode with only Scala/Java jobs
- Spark-submit support
- Dynamic allocation

Does not include, most notably:
- Client mode support
- Proper testing on both the unit and integration level; integration tests are flaky
Actually we have to look up the container status for the container that is hosting the driver... will need to think about this. Alternatively we can just wait for all container statuses to be ready.
Should probably check if it's a file (which implicitly checks existence) to ensure it's not a directory
```scala
Executors.newCachedThreadPool(
  new ThreadFactoryBuilder()
    .setDaemon(true)
    .setNameFormat("kubernetes-executor-requests")
```
I think we want a name format of "kubernetes-executor-requests-%d"
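To illustrate why the `%d` matters: the format string is expanded with a per-thread counter, so each thread in the pool gets a distinct, numbered name. The sketch below is a minimal plain-Java stand-in for Guava's `ThreadFactoryBuilder` (the class name and helper are assumptions for illustration, not code from the PR):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreadFactoryDemo {
    // Minimal stand-in for Guava's ThreadFactoryBuilder: the "%d" in the
    // name format is filled with a per-thread counter, so each thread gets
    // a distinct name ("kubernetes-executor-requests-0", "-1", ...).
    static ThreadFactory daemonFactory(String nameFormat) {
        AtomicInteger counter = new AtomicInteger(0);
        return runnable -> {
            Thread t = new Thread(runnable);
            t.setDaemon(true);
            t.setName(String.format(nameFormat, counter.getAndIncrement()));
            return t;
        };
    }

    public static void main(String[] args) {
        ThreadFactory f = daemonFactory("kubernetes-executor-requests-%d");
        System.out.println(f.newThread(() -> {}).getName());
        System.out.println(f.newThread(() -> {}).getName());
        // prints:
        // kubernetes-executor-requests-0
        // kubernetes-executor-requests-1
    }
}
```

Without the `%d`, every thread in the pool would carry the identical name, which makes thread dumps much harder to read.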
```dockerfile
WORKDIR /opt/spark

# TODO support spark.executor.extraClassPath
CMD ${JAVA_HOME}/bin/java -Dspark.executor.port=$SPARK_EXECUTOR_PORT -Xmx$SPARK_EXECUTOR_MEMORY -cp ${SPARK_HOME}/jars/\* org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url $SPARK_DRIVER_URL --executor-id $(dbus-uuidgen) --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $HOSTNAME
```
Should set the min heap as well (`-Xms$SPARK_EXECUTOR_MEMORY`) to avoid a bunch of full GCs caused by JVM ergonomics dynamically resizing the heap.
```dockerfile
WORKDIR /opt/spark

CMD ${JAVA_HOME}/bin/java -Dspark.shuffle.service.port=$SPARK_SHUFFLE_SERVICE_PORT -Xmx1g -cp ${SPARK_HOME}/jars/\* org.apache.spark.deploy.ExternalShuffleService
```
Configurable heap? Also set min heap = max?
```scala
case KUBERNETES_EXPOSE_DRIVER_PORT =>
  value.split("=", 2).toSeq match {
    case Seq(k, v) => exposeDriverPorts(k) = v.toInt
```
Yes. We should probably wrap this in a try-catch, though, to produce a more favorable message than what I presume the default `NumberFormatException` will be.
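A sketch of the idea in Java (the class and method names here are hypothetical, not from the PR): catch the parse failure and rethrow with a message that names the offending key and value, instead of surfacing a bare `NumberFormatException`.

```java
public class PortMappingParser {
    // Hypothetical sketch: parse the value side of a "key=value" port
    // mapping, failing with a message that identifies the bad input.
    static int parsePort(String key, String rawValue) {
        try {
            return Integer.parseInt(rawValue);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Port for " + key + " must be an integer, got: " + rawValue, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePort("driver-ui", "4040")); // prints 4040
        try {
            parsePort("driver-ui", "not-a-port");
        } catch (IllegalArgumentException e) {
            // prints: Port for driver-ui must be an integer, got: not-a-port
            System.out.println(e.getMessage());
        }
    }
}
```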
```java
protected final String KUBERNETES_APP_NAMESPACE = "--kubernetes-app-namespace";
protected final String KUBERNETES_CLIENT_CERT_FILE = "--kubernetes-client-cert-file";
protected final String KUBERNETES_CLIENT_KEY_FILE = "--kubernetes-client-key-file";
protected final String KUBERNETES_CA_CERT_FILE = "--kubernetes-ca-cert-file";
```
```xml
<artifactId>netty</artifactId>
<version>3.8.0.Final</version>
</dependency>
```
```scala
 * the locally-generated ID from the superclass.
 *
 * @return The application ID
```
- Run Spark shuffle service independently of a Spark job
- Run executor pods instead of replication controllers
@ash211 @mccheah Shall we move these changes to a branch in https://github.com/foxish/spark/ and continue the discussions there on the various points we discussed last week?

@foxish yep! Just got legal signoff this morning so we'll be sending it over shortly

Thanks!
A HostPath volume will be used by both the shuffle service and the executors that connect to it. The shuffle service picks up the files written by the executors via the shared host volume.
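For context, the sharing works because a `hostPath` volume mounts a directory from the node itself, so any pods on that node mounting the same path see the same files. The fragment below is an illustrative sketch only; the names, image, and path are assumptions, not taken from the PR:

```yaml
# Illustrative sketch: an executor pod mounting a hostPath directory.
# A shuffle-service pod on the same node would declare the same hostPath
# volume, and would therefore see the shuffle files the executor writes.
apiVersion: v1
kind: Pod
metadata:
  name: spark-executor-example   # hypothetical name
spec:
  containers:
    - name: executor
      image: spark-executor:latest   # hypothetical image
      volumeMounts:
        - name: spark-shuffle-dir
          mountPath: /tmp/spark-shuffle
  volumes:
    - name: spark-shuffle-dir
      hostPath:
        path: /tmp/spark-shuffle   # shared node-local directory
```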
@foxish can I get permission to push to your fork?

Code transferred to foxish#7 with discussion now happening there instead
* Create README to better describe project purpose
* Add links to usage guide and dev docs
* Minor changes