Rewrite Spark fetcher/heuristics. #162
Conversation
I don't know how to make the JDK6 build pass https://travis-ci.org/linkedin/dr-elephant/jobs/176240296. If I downgrade org.glassfish.jersey from 2.24 to 2.6, the last Java 6-compatible version, I get another set of problems with trying to use com.sun.jersey for some reason. If we want this PR to happen, we need to move forward and stop supporting Java 6.
I need to review more, but it looks good overall.
```scala
val resourcesAllocatedMBSeconds =
  aggregateResourcesAllocatedMBSeconds(executorInstances, executorMemoryBytes, applicationDurationMillis)
val resourcesUsedMBSeconds = aggregateResourcesUsedMBSeconds(executorMemoryBytes, totalExecutorTaskTimeMillis)
val resourcesWastedMBSeconds = resourcesAllocatedMBSeconds - resourcesUsedMBSeconds
```
Multiply resourcesUsed by 1.5 (configurable) to give a 50% buffer. The reason for the 50% buffer is that it will be impossible for users to use all allocated resources. But if they use less than 2/3 of the allocated resources, then we flag it.
Is this fixed? I don't see where the 1.5x factor is used.
The purpose of this update is to:

- rewrite the Spark data fetcher to use Spark event logs minimally, since it can be expensive to download and process these fully as done before
- rewrite the Spark data fetcher to use the [Spark monitoring REST API](https://spark.apache.org/docs/1.4.1/monitoring.html#rest-api), which provides almost all of the information Spark heuristics need
- update the Spark heuristics to provide hopefully more useful information and avoid being arbitrarily restrictive

The new Spark-related code is provided in parallel to the old Spark-related code. To enable it:

- Uncomment and swap in the appropriate fragments in `AggregatorConf.xml`, `FetcherConf.xml`, and `HeuristicConf.xml`.
- Set `SPARK_CONF_DIR` (or `SPARK_HOME`) to an appropriate location so that Dr. Elephant can find `spark-defaults.conf`.

Heuristics added (the distribution-and-severity pattern these use is sketched after this description):

- "Executor shuffle read bytes distribution": We now provide a distribution with min/25p/median/75p/max, with severity based on a max-to-median ratio.
- "Executor shuffle write bytes distribution": We now provide a distribution with min/25p/median/75p/max, with severity based on a max-to-median ratio.

Heuristics changed:

- "Average input size" -> "Executor input bytes distribution": Instead of providing an average along with min/max, we now provide a distribution with min/25p/median/75p/max, with severity based on a max-to-median ratio.
- "Average peak storage memory" -> "Executor storage memory used distribution": Instead of providing an average along with min/max, we now provide a distribution with min/25p/median/75p/max, with severity based on a max-to-median ratio.
- "Average runtime" -> "Executor task time distribution": Instead of providing an average along with min/max, we now provide a distribution with min/25p/median/75p/max, with severity based on a max-to-median ratio.
- "Memory utilization rate" -> "Executor storage memory utilization rate": The old name seemed to imply total memory, but this is just the utilization rate for storage memory, so it has been relabeled to indicate that. Shuffle memory is important too (but we don't seem to have access to shuffle memory utilization metrics).
- "Total memory used at peak" -> "Total executor storage memory used": This also refers to storage memory. It has been relabeled to indicate that.
- "Spark problematic stages" -> ("Spark stages with high task failure rates", "Spark stages with long average executor runtimes"): This was a combination of stages with high task failure rates and those with long runtimes. Those have been separated.

Heuristics removed:

- spark.executor.cores: I think this is somewhat discretionary. At the very least, our internal recommendation stopped matching the one in Dr. Elephant.
- spark.shuffle.manager: This was changed to "sort" by default as of Spark 1.2, so there is no current use for checking this setting.
- "Average output size": Metrics related to output size appear to be deprecated or non-existent, so there is no current use for checking this setting.

Finally, overall waste metrics are calculated based on allocation [app runtime * # of executors * executor memory] vs. usage [total executor run time * executor memory]. They were previously calculated based only on storage memory and some 50% buffer, which I didn't understand.

Added unit tests and also tested against our internal cluster as much as I practically could. Will need help to fully validate.
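To make the distribution-and-severity pattern above concrete, here is a minimal, hypothetical sketch; the helper names, severity thresholds, and example numbers are assumptions for illustration and not the PR's actual heuristic code:

```scala
// Hypothetical sketch: summarize a per-executor metric (e.g. input bytes) as
// min/25p/median/75p/max and grade the skew by the max-to-median ratio.
object DistributionSketch {
  final case class Distribution(min: Long, p25: Long, median: Long, p75: Long, max: Long)

  def distribution(values: Seq[Long]): Distribution = {
    require(values.nonEmpty, "need at least one executor metric value")
    val sorted = values.sorted
    def percentile(p: Double): Long = sorted(((sorted.size - 1) * p).toInt)
    Distribution(sorted.head, percentile(0.25), percentile(0.50), percentile(0.75), sorted.last)
  }

  // Severity grows as the max drifts away from the median (an executor-skew signal).
  // The threshold values below are illustrative, not Dr. Elephant's defaults.
  def severityOfMaxToMedianRatio(d: Distribution): String = {
    val ratio = if (d.median == 0L) Double.PositiveInfinity else d.max.toDouble / d.median
    if (ratio >= 10) "CRITICAL"
    else if (ratio >= 5) "SEVERE"
    else if (ratio >= 2) "MODERATE"
    else "NONE"
  }

  def main(args: Array[String]): Unit = {
    val executorInputBytes = Seq(100L, 120L, 110L, 950L) // one straggler executor
    val d = distribution(executorInputBytes)
    println(d)                             // Distribution(100,100,110,120,950)
    println(severityOfMaxToMedianRatio(d)) // SEVERE (950 / 110 ≈ 8.6)
  }
}
```

The intent is that executors doing comparable work keep the max close to the median, so a large ratio flags skew across executors.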
SBT 0.13.0 doesn't pull in the correct transitive dependencies from Jersey <sbt/sbt#847>.
…because of initialization issues.
@shankar37, do you have anything to add? Can we push this? @rayortigas, could you fix the conflicts, after which I think we can ship it?
I still need to look at the heuristic classes.
```scala
 */
case class SeverityThresholds(low: Number, moderate: Number, severe: Number, critical: Number, ascending: Boolean) {
  if (ascending) {
    require(low.doubleValue <= moderate.doubleValue)
```
Use < instead. I can't think of a scenario where it will be equal.
The old code used equal numbers for severe and critical for some thresholds, e.g.
From dr-elephant/app/com/linkedin/drelephant/spark/heuristics/JobRuntimeHeuristic.java, line 47 at dad905c:

```java
private double[] avgJobFailureLimits = {0.1d, 0.3d, 0.5d, 0.5d}; // The avg job failure rate
```

```java
public static Severity getSeverityAscending(Number value, Number low, Number moderate, Number severe,
```
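To illustrate why `<=` is needed, here is a simplified, hypothetical sketch; it drops the `Number` boxing and the `ascending` flag of the real `SeverityThresholds`, and the example thresholds mirror the 0.1/0.3/0.5/0.5 job failure limits quoted above. With a strict `<`, a legacy threshold set where severe equals critical would fail the `require`.

```scala
// Simplified sketch (not the actual SeverityThresholds class): the require uses <=
// so that legacy threshold sets with a repeated value, e.g. severe == critical,
// remain constructible.
case class SeverityThresholdsSketch(low: Double, moderate: Double, severe: Double, critical: Double) {
  require(low <= moderate && moderate <= severe && severe <= critical)

  // Ascending severity: higher values are worse.
  def severityOf(value: Double): String =
    if (value >= critical) "CRITICAL"
    else if (value >= severe) "SEVERE"
    else if (value >= moderate) "MODERATE"
    else if (value >= low) "LOW"
    else "NONE"
}

object SeverityThresholdsSketchExample extends App {
  // Mirrors avgJobFailureLimits = {0.1, 0.3, 0.5, 0.5}: severe and critical coincide,
  // so any failure rate at or above 0.5 jumps straight to CRITICAL.
  val thresholds = SeverityThresholdsSketch(low = 0.1, moderate = 0.3, severe = 0.5, critical = 0.5)
  println(thresholds.severityOf(0.4)) // MODERATE
  println(thresholds.severityOf(0.5)) // CRITICAL
}
```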
```scala
/**
 * A heuristic based on metrics for a Spark app's jobs.
 *
 * This heuristic reports job failures and high task failure rates for each job.
```
How about stages?
StagesHeuristic was included:
```scala
import com.linkedin.drelephant.analysis.{ApplicationType, HadoopApplicationData}
// ...
case class SparkComboApplicationData(
```
Can it just be called SparkApplicationData? That it's a combo of log- and REST-derived data is an implementation detail, no?
I see that detail is not hidden. Can we not expose that to the caller?
SparkApplicationData already exists as an interface for the old fetcher to use: https://github.com/linkedin/dr-elephant/blob/dad905cdafa6aa7665059917a455cb601155fbd1/app/com/linkedin/drelephant/spark/data/SparkApplicationData.java. I didn't want to use this interface, so to preserve backwards compatibility for now I contrived SparkComboApplicationData.
I think my point is twofold:

- The name has to be changed to hide the fact that it's a "Combo".
- The interface seems to expose the fact that it's made of REST- and log-derived data. I want to hide that.

Can you also elaborate on why you couldn't use the existing interface of SparkApplicationData? Is it because the data and its structure have changed significantly?
Correct, the data and its structure changed significantly enough to warrant not using `SparkApplicationData`. Under normal circumstances I wouldn't use the `Combo` qualifier at all; I would delete the existing Spark stuff and just make this class the new `SparkApplicationData`. However, seeing that the new code is to be treated as experimental, I need some qualifier. It doesn't seem like a big deal to temporarily identify the implementation and then rename it when we are comfortable removing experimental status. If you have another suggestion, let me know.
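To illustrate the point being discussed (keeping the REST/log split out of what callers see), here is a hypothetical sketch; the trait, class, and field names are made up for illustration and are not the PR's actual API:

```scala
package com.linkedin.drelephant.spark.fetchers

// Hypothetical sketch: heuristics program against one flat trait and never see
// whether a field came from the REST API or from the event log.
trait SparkAppDataSketch {
  def appId: String
  def appConfigurationProperties: Map[String, String]
  def executorTaskTimesMillis: Seq[Long]
}

// Merging the two sources stays an implementation detail of the fetcher package.
private[fetchers] case class MergedSparkAppDataSketch(
    appId: String,
    appConfigurationProperties: Map[String, String], // e.g. parsed from the event log
    executorTaskTimesMillis: Seq[Long]                // e.g. taken from the REST executor summaries
) extends SparkAppDataSketch
```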
```scala
import org.apache.spark.JobExecutionStatus
import org.apache.spark.status.api.v1.StageStatus
// ...
class ApplicationInfo(
```
Why are these not case classes? Wouldn't that be more appropriate?
The idea was to modify as little as possible the code copied from https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/status/api/v1/api.scala.
Furthermore, one of these classes has more than 22 fields, so it can't be converted to a case class anyway, at least for Scala 2.10.
```scala
 * License for the specific language governing permissions and limitations under
 * the License.
 */
package com.linkedin.drelephant.spark.fetchers.statusapiv1
```
Why is it named statusapiv1? Does the 1 here indicate the Spark version or something else?
If you look at the comment at https://github.com/linkedin/dr-elephant/pull/162/files#diff-2f8c1423a9c566521c1e54b1a56fde5aR22 you'll see this is derived from https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/status/api/v1/api.scala.
```scala
    Try(getProperty(SPARK_DRIVER_MEMORY_KEY).map(MemoryFormatUtils.stringToBytes)).getOrElse(None)

  lazy val executorMemoryBytes: Option[Long] =
    Try(getProperty(SPARK_EXECUTOR_MEMORY_KEY).map(MemoryFormatUtils.stringToBytes)).getOrElse(None)
```
Nit: spacing. It seems like a new line was added for no reason.
Force-pushed from 6ef552f to 277b9d2.
```scala
val resourcesWastedMBSeconds =
  ((BigDecimal(resourcesAllocatedMBSeconds) * (1.0 - allocatedMemoryWasteBufferPercentage)) - BigDecimal(resourcesUsedMBSeconds))
    .toBigInt
```
@shankar37 I restored the waste buffer.
The previous calculation was based on discounting 1/3 of storage (cache) memory, whereas the new calculation is based on discounting 1/2 of all memory, storage and shuffle, since from observation it seemed like a bigger number would be appropriate.
I think I said this before: I have some reservations about how this number is even derived. I don't have any better ideas, though, so I'm willing to commit here.
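For concreteness, here is a small runnable sketch of the buffered calculation quoted above, with made-up inputs; the 0.5 stands in for the configurable allocatedMemoryWasteBufferPercentage, and the formula is the one from the snippet:

```scala
// Sketch of the buffered waste calculation: the buffer discounts part of the
// allocation (half of it here) before anything counts as wasted.
object WasteBufferSketch extends App {
  val allocatedMemoryWasteBufferPercentage = 0.5

  def resourcesWastedMBSeconds(resourcesAllocatedMBSeconds: BigInt, resourcesUsedMBSeconds: BigInt): BigInt =
    ((BigDecimal(resourcesAllocatedMBSeconds) * (1.0 - allocatedMemoryWasteBufferPercentage))
      - BigDecimal(resourcesUsedMBSeconds)).toBigInt

  // Example (made-up numbers): 1,000,000 MB-seconds allocated, 300,000 MB-seconds of
  // executor task time used. The buffered budget is 500,000, so 200,000 count as wasted.
  println(resourcesWastedMBSeconds(BigInt(1000000), BigInt(300000))) // 200000
}
```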
FYI, I had to force-push this PR since I had to rebase and resolve conflicts.
CI passed for Java 8 and 7, but as before, it is not working for Java 6.
Looks good to me except for the SparkComboApplicationData issue.
OK, I just discussed with @shankar37 and he is fine with me deleting the old Spark fetcher/heuristics altogether. If there are any problems with this new code, we'll fix them going forward. Deleting the old code should fix my naming issue. @shankar37 also had a separate point about the data class exposing implementation, which I understand now. I'll refactor and clean it up.
@shankar37 I updated the PR, you can see the 5 commits added today: https://github.com/linkedin/dr-elephant/pull/162/commits
I found a small bug with the latest change while testing against our cluster; I will update this PR today or tomorrow.
Did some testing against our cluster; I'm fine from my end now.
Thanks for all your help @shankar37 and @akshayrai, and for shepherding this through!
* Fix linkedin#162 with the right calculation for resourceswasted and add missing workflow links (linkedin#207)
* Fix Exception thrown when JAVA_EXTRA_OPTIONS is not present (linkedin#210)
* Adds an option to fetch recently finished apps from RM (linkedin#212)
* Fixes issue caused by http in history server config property (linkedin#217)
* add config for timezone of job history server (linkedin#214)
* Include reference to the weekly meeting
* Rewrite Spark fetcher/heuristics.