Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Nov 18, 2020

What changes were proposed in this pull request?

This PR aims to generalize executor metrics to support user-given file system schemes instead of the fixed file,hdfs scheme.

Why are the changes needed?

For the users using only cloud storages like S3A, we need to be able to expose S3A metrics. Also, we can skip unused hdfs metrics.

Does this PR introduce any user-facing change?

Yes, but compatible for the existing users which uses hdfs and file filesystem scheme only.

How was this patch tested?

Manually do the following.

$ build/sbt -Phadoop-cloud package
$ sbin/start-master.sh; sbin/start-slave.sh spark://$(hostname):7077
$ bin/spark-shell --master spark://$(hostname):7077 -c spark.executor.metrics.fileSystemSchemes=file,s3a -c spark.metrics.conf.executor.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
scala> spark.read.textFile("s3a://dongjoon/README.md").collect()

Separately, launch jconsole and check *.executor.filesystem.s3a.*. Also, confirm that there is no *.executor.filesystem.hdfs.*

$ jconsole

Screen Shot 2020-11-17 at 9 26 03 PM

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-33476][CORE] Generalize filesystem metric to support user-given file schemes [SPARK-33476][CORE] Generalize ExecutorSource to expose user-given file system schemes Nov 18, 2020
@dongjoon-hyun
Copy link
Member Author

Could you review this please, @viirya ?

@github-actions github-actions bot added the CORE label Nov 18, 2020
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK to me.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to update monitoring.md?

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay to me too.

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35852/

@dongjoon-hyun
Copy link
Member Author

Thank you, @HyukjinKwon and @viirya . I'll mention this new conf in the monitoring.md according to your advice.

}
private val executorSource = new ExecutorSource(threadPool, executorId)
private val schemes = conf.get(EXECUTOR_METRICS_FILESYSTEM_SCHEMES)
.toLowerCase(Locale.ROOT).split(",").map(_.trim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: filter empty after map ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks, @mridulm !

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35852/

@dongjoon-hyun
Copy link
Member Author

All comments are addressed. (@HyukjinKwon , @viirya , @mridulm )

@github-actions github-actions bot added the DOCS label Nov 18, 2020
@dongjoon-hyun
Copy link
Member Author

Thank you, @viirya !

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Test build #131259 has finished for PR 30405 at commit 8ca8624.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Test build #131249 has finished for PR 30405 at commit 92fdbdf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ExecutorSource(

@viirya
Copy link
Member

viirya commented Nov 18, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35862/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35862/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35873/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Test build #131271 has finished for PR 30405 at commit 8ca8624.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35873/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35876/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35876/

@dongjoon-hyun
Copy link
Member Author

Given the approvals, I'll merge this PR.
Thank you all. Merged to master for Apache Spark 3.1.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-33476 branch November 18, 2020 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants