Use old Spark fetcher as a fallback. #224
Conversation
I have an overall design question here. Why not have a separate filesystem fetcher for Spark, rather than having it as a fallback from the REST fetcher?
I had the same question. IIRC, from discussing with @shankar37, it sounded like it was desirable to fall back to the filesystem fetcher if the REST fetcher encountered issues with the API endpoint. It's unclear whether restoring the filesystem fetcher should be permanent. In any case, this PR provides a bridge to whatever gets decided next.
I think there should be a flag to determine whether or not to use the fallback fetcher. In addition, I think there should also be a configuration option to just use the FS fetcher, if someone wants that. What do you think?
YES! I absolutely request having both of these fetchers, with a configuration option. Let me explain why: I recently wanted to update our Dr. Elephant and realized we cannot use the new SparkFetcher, which requires the Spark History Server. At Altiscale we decided to let users run the Spark History Server themselves rather than us running it for the whole cluster, due to the increasing memory usage of the Spark History Server (I don't know if it still lacks a retention policy or whether that was resolved, but this was an issue with some of our clusters that used to run a lot of Spark jobs). Therefore, we cannot use the new SparkFetcher until we decide to run the History Server for the whole cluster as a daemon. Having both fetchers is very useful for other reasons (like performance) too. If the previous code was not clean, we can certainly polish it and then put it back in place.
Also, I just looked at the code and want to ask the same question that @shkhrgpt asked earlier in this thread: why not have a separate filesystem fetcher?
@babak-altiscale I haven't completely verified this works, but the intent here was also to make SparkFSFetcher usable via explicit configuration if that was desired. Note, though, that as things stand, SparkFSFetcher will only work with the old heuristics. As for @shkhrgpt requesting a config flag specifically for the SparkFetcher (REST), indicating whether it should fall back to the SparkFSFetcher (or not at all): assuming I've understood that correctly, I'm fine with that too.
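A rough sketch of how such a flag could be read from the fetcher's params (the key name and default here are assumptions for illustration, not necessarily what this PR would use):

```scala
import scala.util.Try

object FallbackFlagSketch {
  // paramMap would typically come from FetcherConfigurationData.getParamMap in Dr. Elephant.
  def shouldUseFsFallback(paramMap: java.util.Map[String, String]): Boolean =
    Option(paramMap.get("use_fs_fetcher_fallback"))   // assumed key name
      .flatMap(v => Try(v.toBoolean).toOption)
      .getOrElse(false)                               // assumed default: REST only, no fallback

  // e.g. shouldUseFsFallback(java.util.Collections.singletonMap("use_fs_fetcher_fallback", "true")) == true
}
```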
@rayortigas Is there a strong reason why the SparkFSFetcher can't be made to work with the new heuristics and new metrics aggregator? My intention was to bring the FS fetcher back as a standalone fetcher and also as a combo fetcher, so to speak, that uses both the REST and FS fetchers in that order. But I don't want any of this to change which heuristics we have or how we calculate metrics aggregation. That would lead to a lot of confusion and also a dependency we are trying to avoid. We are proceeding in a direction where the fetchers and heuristics are completely decoupled from each other except through the HadoopApplicationData interface.
Also, it's not clear to me where in the code or config you make the switch between old and new heuristics. I see the old heuristics code is back, but I am not sure how those are invoked only when the legacy fetcher is called.
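For illustration, a simplified sketch of the decoupling @shankar37 describes (stand-in types; the real HadoopApplicationData interface in Dr. Elephant has more methods):

```scala
// Fetchers produce application data, heuristics consume it, and neither
// depends on the other's concrete classes -- only on the shared data interface.
trait AppData {                   // simplified stand-in for HadoopApplicationData
  def appId: String
}

trait Fetcher[T <: AppData] {     // e.g. a REST-based or FS-based Spark fetcher
  def fetchData(appId: String): T
}

trait Heuristic[T <: AppData] {   // e.g. a Spark configuration heuristic
  def apply(data: T): String      // simplified "result" type
}
```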
@shankar37 I only focused on getting the SparkFetcher to fall back to the SparkFSFetcher, since that is what you and I had agreed upon; when that fallback from REST to FS happens, we still end up using the new heuristics and metrics aggregator, since the SparkFetcher calls a converter to wrap whatever it got from SparkFSFetcher. If, additionally, you want the SparkFSFetcher to also be usable standalone with the new heuristics and metrics aggregator, then that is extra work that I did not scope. I suspect it is not too much more work, but I've run out of cycles at the moment to make that happen. Feel free to evolve the PR, though; I don't want to hold things up.
re: legacy heuristics, those would be invoked if you configured SparkFSFetcher and old heuristics like before (just with the package names changed to legacydata/legacyheuristics). |
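A rough sketch of the fallback flow described in this comment (placeholder types and names; the PR's actual classes and converter differ):

```scala
import scala.util.{Failure, Success, Try}

// Placeholder types standing in for the real Dr. Elephant data classes.
final case class RestData(appId: String)      // data as the new heuristics expect it
final case class LegacyFsData(appId: String)  // data produced by the legacy FS fetcher

class FallbackFetcherSketch(
    fetchViaRest: String => RestData,
    fetchViaFs: String => LegacyFsData,
    convert: LegacyFsData => RestData) {

  def fetchData(appId: String): RestData =
    Try(fetchViaRest(appId)) match {
      case Success(data) => data
      case Failure(_)    => convert(fetchViaFs(appId))  // fall back, then wrap the legacy data
    }
}

object FallbackFetcherSketchDemo extends App {
  val fetcher = new FallbackFetcherSketch(
    fetchViaRest = _ => sys.error("history server unavailable"),  // simulate a REST failure
    fetchViaFs   = id => LegacyFsData(id),
    convert      = legacy => RestData(legacy.appId))
  println(fetcher.fetchData("application_1234_0001"))  // falls back: RestData(application_1234_0001)
}
```

Because the converted result has the same shape as the REST result, the new heuristics and metrics aggregator run unchanged regardless of which path produced the data.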
Got it. I am not expecting that to happen in this PR, but I misunderstood your comment as saying that was done already. Now the change makes more sense. Why did you bring back the old heuristics and old metrics aggregator files, then? They are not needed for this change, right?
They are not needed; sorry for the confusion. I brought in the old heuristics and metrics aggregator in case somebody wanted to use them again, or use them for debugging the SparkFSFetcher. We could always re-deprecate the old heuristics and metrics aggregator when we were more confident about the path going forward.
OK, I am confident we don't need the old metrics aggregator. Regarding the heuristics, we can discuss at the Dr. Elephant meetup what the community feels. @akshayrai and @nntnag17 are in Sunnyvale this week. Can you work with them to test this on the test machine?
OK, I just sent them an email.
I agree with @shankar37; I don't think there is any need for legacy heuristics. If we decide to remove the legacy heuristics, then we should also remove the legacy SparkApplicationData class. We should just have one SparkApplicationData. @shankar37: I am still wondering what the main motivation is for using the fallback-based design, rather than having two separate fetchers?
@shkhrgpt The motivation is based on my assessment that neither fetcher is good enough on its own. The REST fetcher is fast, but the history server seems to have reliability issues. The FS fetcher is reliable, but it downloads too much data and is a strain on Dr. Elephant. I wanted to create a fetcher that tries REST when possible and FS when REST is not available.
Hmm... I agree that the SparkFetcher is not very reliable. I am still not sure about the fallback approach. If the REST fetcher fails to fetch data for an application, I believe it may continue to fail for all the following applications. That means that once the fetcher goes into fallback mode, it may always be in fallback mode. Moreover, for every new application, the fetcher will first try REST, even though it has been failing for a while. So if the Spark History Server is already in a bad state (OOM), this will make things worse. What do you think?
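To make this concern concrete, here is a hypothetical sketch (not part of this PR) of a guard that would stop retrying REST for every application once the history server looks unhealthy; the threshold and reset policy are assumptions for illustration:

```scala
import scala.util.{Failure, Success, Try}

// Stand-in result type; the real fetchers return SparkApplicationData.
final case class AppData(appId: String, source: String)

class GuardedFallbackSketch(tryRest: String => AppData,
                            tryFs: String => AppData,
                            maxConsecutiveFailures: Int = 3) {
  private var consecutiveRestFailures = 0

  def fetch(appId: String): AppData =
    if (consecutiveRestFailures >= maxConsecutiveFailures) {
      tryFs(appId)                    // stop hammering an unhealthy history server
    } else Try(tryRest(appId)) match {
      case Success(data) =>
        consecutiveRestFailures = 0   // REST recovered; reset the counter
        data
      case Failure(_) =>
        consecutiveRestFailures += 1
        tryFs(appId)                  // per-application fallback, as in the PR
    }
}
```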
Maybe create a separate PR for this change? I don't think you need to combine it with this big PR.
```diff
@@ -143,6 +144,7 @@ object SparkRestClient {
   }

   val objectMapper = new ObjectMapper() with ScalaObjectMapper
+  objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
```
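For context, a minimal standalone illustration of what this setting does (hypothetical model class, assuming jackson-module-scala is on the classpath): with FAIL_ON_UNKNOWN_PROPERTIES disabled, JSON fields the model does not declare are skipped instead of failing deserialization, which keeps the client tolerant of fields added by newer Spark REST API versions.

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule

case class AppInfo(id: String, name: String)  // hypothetical subset of a REST payload

object TolerantMapperExample extends App {
  val mapper = new ObjectMapper()
  mapper.registerModule(DefaultScalaModule)
  mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)

  // "attempts" is not declared on AppInfo; with the feature disabled it is
  // ignored instead of raising UnrecognizedPropertyException.
  val json = """{"id": "app-123", "name": "example", "attempts": []}"""
  println(mapper.readValue(json, classOf[AppInfo]))  // AppInfo(app-123,example)
}
```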
I think this configuration already exists in the latest SparkRestClient. Check this:
Maybe you just need to update your local repo, and only commit the unit test?
Thanks for catching that, @shkhrgpt! I'll remove the conflicting line when I do a final rebase and cleanup.
Force-pushed from 4989bfc to cf435dc.
…tcher (#232) Remove backup for Rest Fetcher and make Legacy FSFetcher as top level fetcher. Change the default fetcher in the config.
I am closing this in favor of #232. Thanks a lot, @rayortigas and all those involved.
As requested by @shankar37, this PR restores the legacy Spark fetcher for use as a fallback for the new Spark fetcher introduced in #162.
The rough order of operations to achieve this commit was:
This should also fix #206.
Haven't tested on our cluster yet, but will report back.