Skip to content

Conversation

@mccheah
Copy link
Contributor

@mccheah mccheah commented May 4, 2018

What changes were proposed in this pull request?

Drastically improves performance and won't cause Spark applications to fail because they write too much data to the Docker image's specific file system. The file system's directories that back emptydir volumes are generally larger and more performant.

How was this patch tested?

Has been in use via the prototype version of Kubernetes support, but lost in the transition to here.

Dramatically improves performance and won't cause Spark applications to
fail because they write too much data to the Docker image's specific
file system. The file system's directories that back emptydir volumes
are generally larger and more performant.
@mccheah
Copy link
Contributor Author

mccheah commented May 4, 2018

@foxish @liyinan926 please take a look, thanks!

@SparkQA
Copy link

SparkQA commented May 4, 2018

Test build #90217 has finished for PR 21238 at commit 736674b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mccheah mccheah changed the title [SPARK-24137] Mount local directories as empty dir volumes. [SPARK-24137][K8s] Mount local directories as empty dir volumes. May 4, 2018
@SparkQA
Copy link

SparkQA commented May 4, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2854/

@SparkQA
Copy link

SparkQA commented May 4, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2854/

@SparkQA
Copy link

SparkQA commented May 5, 2018

Test build #90225 has finished for PR 21238 at commit f68fcd8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrusha
Copy link
Contributor

andrusha commented May 7, 2018

Seems like it addresses similar problem to #21095. It might be worth investigating how to unify both.

@mccheah
Copy link
Contributor Author

mccheah commented May 9, 2018

@andrusha I don't think it's entirely analogous - for the simple reason that the hostPath volumes PR doesn't take into account SPARK_LOCAL_DIRS. That environment variable is used to determine the right place for shuffle files. We want to strictly tie that setting to emptydir volumes - and moreover this should be done transparently for the user, i.e. the user shouldn't have to know to provision these volumes to get better performance.

@mccheah
Copy link
Contributor Author

mccheah commented May 9, 2018

Also #21260 currently only supports hostPath and PVCs but you definitely want emptyDir for isolation (though that looks like a trivial enough change).

@apache apache deleted a comment from rxin May 9, 2018
if (contains("spark.local.dir")) {
val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
"the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
"the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, I deleted a comment here accidentally. @rxin said that we could remove this warning about Spark 1.0.

@erikerlandson
Copy link
Contributor

I agree with @mcheah that the potential code reuse is small. Keeping this as a separate pod construction step, decoupled from the user-exposed step, is cleaner.

val localDirVolumes = resolvedLocalDirs
.zipWithIndex
.map {
case (localDir, index) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the convention is to put case on the same line as map {.

.map {
case (localDir, index) =>
new VolumeBuilder()
.withName(s"spark-local-dir-${index + 1}-${Paths.get(localDir).getFileName.toString}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to include the actual path in the volume name? I think spark-local-dir-${index + 1} is sufficient.

val localDirVolumeMounts = localDirVolumes
.zip(resolvedLocalDirs)
.map {
case (localDirVolume, localDirPath) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@mccheah
Copy link
Contributor Author

mccheah commented May 10, 2018

@rxin @liyinan926 @foxish addressed comments.

@SparkQA
Copy link

SparkQA commented May 10, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3002/

@SparkQA
Copy link

SparkQA commented May 10, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3002/

Copy link
Contributor

@liyinan926 liyinan926 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@SparkQA
Copy link

SparkQA commented May 10, 2018

Test build #90431 has finished for PR 21238 at commit aebdb68.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 10, 2018

Test build #90432 has finished for PR 21238 at commit fa095cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mccheah
Copy link
Contributor Author

mccheah commented May 10, 2018

Requesting sign off and merge from @foxish

@foxish
Copy link
Contributor

foxish commented May 10, 2018

LGTM. Merging to master.
Thanks @mccheah

@liyinan926
Copy link
Contributor

@foxish @mccheah should we also merge this to branch-2.3? I think we should target this for 2.3.1.

@asfgit asfgit closed this in 6282fc6 May 10, 2018
@foxish
Copy link
Contributor

foxish commented May 10, 2018

Maintenance releases most often have fixes for stability. We could maybe backport this since it's not a new feature but an omission from before. If it is going to be some effort, thanks to all the refactors that went in so far, we should think twice about whether we need to.

@foxish
Copy link
Contributor

foxish commented May 10, 2018

@mccheah, wdyt? I just haven't heard from any users here of 2.3 - if you think it's useful for 2.3.1 and low risk, then please feel free to propose a cherrypick.

@mccheah
Copy link
Contributor Author

mccheah commented May 10, 2018

I think we can afford to hold off here.

@mccheah
Copy link
Contributor Author

mccheah commented May 10, 2018

What would make this difficult to backport is the fact that this patch was built on top of the big refactor PR that only went in after 2.3. So we'd need to rewrite this with the old architecture which is a non-trivial effort.

@foxish
Copy link
Contributor

foxish commented May 10, 2018

SG. @liyinan926, let's revisit this if we hear from 2.3 users.

@liyinan926
Copy link
Contributor

Makes sense to me.

mccheah added a commit to palantir/spark that referenced this pull request May 10, 2018
Drastically improves performance and won't cause Spark applications to fail because they write too much data to the Docker image's specific file system. The file system's directories that back emptydir volumes are generally larger and more performant.

Has been in use via the prototype version of Kubernetes support, but lost in the transition to here.

Author: mcheah <[email protected]>

Closes apache#21238 from mccheah/mount-local-dirs.
asfgit pushed a commit that referenced this pull request Jul 11, 2018
This PR continues #21095 and intersects with #21238. I've added volume mounts as a separate step and added PersistantVolumeClaim support.

There is a fundamental problem with how we pass the options through spark conf to fabric8. For each volume type and all possible volume options we would have to implement some custom code to map config values to fabric8 calls. This will result in big body of code we would have to support and means that Spark will always be somehow out of sync with k8s.

I think there needs to be a discussion on how to proceed correctly (eg use PodPreset instead)

----

Due to the complications of provisioning and managing actual resources this PR addresses only volume mounting of already present resources.

----
- [x] emptyDir support
- [x] Testing
- [x] Documentation
- [x] KubernetesVolumeUtils tests

Author: Andrew Korzhuev <[email protected]>
Author: madanadit <[email protected]>

Closes #21260 from andrusha/k8s-vol.
@robert3005 robert3005 deleted the mount-local-dirs branch August 11, 2018 05:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants