[SPARK-25262][K8S][2.4] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S #28985
Conversation
dongjoon-hyun left a comment:
So, how did it help SPARK-31666? Is there any unit test for that?
As best as I can tell, both patches are required to fix SPARK-31666. My plan was to add the additional unit test in the PR that backports SPARK-28042; if I add that unit test to this PR, it will fail.
I investigated SPARK-31666 and left some comments there.
In short, the claim "Cannot map hostPath volumes to container" is incorrect, because SPARK-23529 works correctly, as shown below.
# minikube ssh ls /data
SPARK-31666.txt
export HTTP2_DISABLE=true
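# The two hostPath confs below map the node's /data directory to /data inside each executor (the volume support added by SPARK-23529).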
bin/spark-submit \
--master k8s://$K8S_MASTER \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.driverEnv.HTTP2_DISABLE=true \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=spark/spark:v2.4.6 \
--conf spark.kubernetes.executor.volumes.hostPath.data.mount.path=/data \
--conf spark.kubernetes.executor.volumes.hostPath.data.options.path=/data \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.6.jar 10000
# k exec po/spark-pi-1593729363998-exec-1 -- ls /data
SPARK-31666.txt
Please see your error message: Invalid value: "/tmp1": must be unique. The error occurs because spark-local-dir-x is already used by a volume that Spark mounts itself, so you should not reuse that name (see the sketch after the pod status output below).
20/07/02 15:38:39 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-1593729518015-driver
namespace: default
labels: spark-app-selector -> spark-74b65a9a61cc46fd8bfc5e03e4b28bb8, spark-role -> driver
pod uid: d838532b-eaa9-4b11-8eba-655f66965580
creation time: 2020-07-02T22:38:39Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-n5wwg
node name: N/A
start time: N/A
container images: N/A
phase: Pending
status: []
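To illustrate the naming rule above, here is a minimal sketch of executor volume confs that avoid colliding with the volumes Spark adds on its own (spark-local-dir-1 and spark-conf-volume in the status above); the volume name scratch and the /scratch paths are hypothetical, not taken from the original report:
# Hypothetical example: pick a volume name that is not one of the
# spark-local-dir-* names Spark reserves, and a mount path that is not
# already used by another volumeMount in the pod spec.
--conf spark.kubernetes.executor.volumes.hostPath.scratch.mount.path=/scratch \
--conf spark.kubernetes.executor.volumes.hostPath.scratch.options.path=/scratch \
These lines would be added to the spark-submit command shown earlier in the thread.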
I closed SPARK-31666 as
Can one of the admins verify this patch?
What changes were proposed in this pull request?
This is a backport of the changes from SPARK-25262 (PR #22323).
I plan to backport SPARK-28042 (PR #24879) once this lands. That PR will also include a test confirming that the related issue SPARK-31666 is fixed.
I had originally submitted both as a single PR, but decided to close it and open two smaller ones per the recommendation of @dongjoon-hyun. See #28982 for more details.
Why are the changes needed?
When running Spark on Kubernetes, not being able to use mounted volumes as local storage causes issues that prevent Spark jobs from starting. I've seen this on AWS EKS, but I've also been able to reproduce it with a basic spark-submit command on a standard K8S cluster. Upgrading to 3.0 just to fix this bug is more hassle than it's worth for some organizations.
Does this PR introduce any user-facing change?
Technically, yes. This backports the spark.kubernetes.local.dirs.tmpfs configuration from Spark 3.0 to Spark 2.4. However, there are no breaking changes per se.
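For reference, a minimal sketch of how a user would opt in to the backported behaviour once this lands; the master URL, image tag, and example jar mirror the command earlier in this thread and are placeholders:
bin/spark-submit \
--master k8s://$K8S_MASTER \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=spark/spark:v2.4.6 \
--conf spark.kubernetes.local.dirs.tmpfs=true \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.6.jar 10000
With spark.kubernetes.local.dirs.tmpfs=true, the emptyDir volumes backing SPARK_LOCAL_DIRS are created with a memory-backed medium (tmpfs) instead of node-local disk.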
How was this patch tested?
The tests were backported as well. In addition, my organization has been running a custom Spark 2.4.5 build with this patch applied for the past few months.