Skip to content

Conversation

@hopper-signifyd
Copy link

@hopper-signifyd hopper-signifyd commented Jul 2, 2020

What changes were proposed in this pull request?

We should backport the fixes for local dirs on Kubernetes to Spark 2.4.

Why are the changes needed?

Running Spark on Kubernetes and not being able to use mounted NVME drives as local storage causes issues on services such as AWS EKS. Upgrading to 3.0 just to fix this bug is more hassle than it's worth for some organizations.

Does this PR introduce any user-facing change?

Technically, yes. This adds the spark.kubernetes.local.dirs.tmpfs back to Spark 2.4 from Spark 3. However, there's no "breaking changes" per se.

How was this patch tested?

The tests were backported. Also, we've been running our own custom Spark 2.4.5 build with this patch applied at my org for the past few months.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your first contribution, @hopper-signifyd .
However, Apache Spark community has a policy which backports only bug fixes. In other words, we cannot backport New Feature or Improvement. Since both SPARK-25262 and SPARK-28042 are new feature at 3.0.0, could you close this PR?

@hopper-signifyd
Copy link
Author

Hi @dongjoon-hyun,

I realize that this does technically backport a feature. However, this feature is backported in order to fix a bug that prevents pods / spark jobs from launching if they try to mount volumes and assign them as local dirs (even using spark.local.dirs). See this issue SPARK-31666 and this one too: kubeflow/spark-operator#828

@hopper-signifyd
Copy link
Author

Also, @dongjoon-hyun I wanted to say I really enjoyed your talk on Prometheus at the Spark Summit. I'm looking forward to setting up Prometheus for metrics gathering for my Spark jobs.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jul 2, 2020

Thank you for saying that.

I wanted to say I really enjoyed your talk on Prometheus at the Spark Summit.

BTW,

  • Apache Spark project is irrelevant to 3rd party's operators or 3rd party cloud vendors. IIRC, there are already multiple 3rd party Spark operators and at least three major vendors (EKS/AKS/GKE).
  • In addition, we don't allow bulk-backport like this. If you really want to propose something required, please make a PR one by one with the original Apache Spark commit title. For example, you may do like the following.
[SPARK-25262][K8S][2.4] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S
  • It would be great if you can provide a test case for the bug you want to fix. For example, a validation test case for SPARK-31666?
  • Finally, please test locally first and submit a working code in a single piece. That looks better and helps reviewers. Testing as a WIP PR makes the PR itself stale easily.

@hopper-signifyd hopper-signifyd marked this pull request as draft July 2, 2020 21:48
@hopper-signifyd
Copy link
Author

@dongjoon-hyun Thank you for the explanation. I will close this and open separate PRs.

@dongjoon-hyun
Copy link
Member

Thank you, @hopper-signifyd . I'll review, #28985 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants