Why doesn't SparkFetcher use REST client to fetch eventlogs? #199

shkhrgpt · 2017-01-31T18:03:56Z

SparkFetcher is using REST client to fetch ApplicationInfo, StageData, ExecutorData, and JobData. However, it's using WebHDFS based LOG client to fetch eventlogs. Spark (1.5.0 and above) history server provides the following REST APIs to fetch eventlogs:

I was wondering why we did not choose to use REST API to fetch eventlogs? Was this option ever considered? If yes, then what are the drawbacks with this approach. If no, then what could be potential problems?

@rayortigas: Since you wrote this new Spark fetcher, can you please provide some information on this issue?
@paulbramsen , @shankar37 , @akshayrai @babak-altiscale: What do you guys think about it?

Thank you.

paulbramsen · 2017-01-31T19:45:56Z

It seems like a good plan to me! We can always structure things so that the WebHDFS stuff lives in a legacy fetcher for those using older versions of Spark. I should have a Dr. Elephant instance to test on soon so I'll take deeper look in the next couple days.

superbobry · 2017-01-31T20:52:25Z

JFYI we've been using Sparklint (a tool for profiling resource usage of Spark jobs) which fetches history logs via REST API for about a month now, and it seems to do fine, so a big 👍 if Dr. E goes that way as well. Also, the potential benefit of such approach is that Dr. E would be able to fetch the logs of any user (without fiddling with HDFS ACL), whereas with the current version it is limited to the logs of the user which it is currently running under, c.f. Event Logging:

The event log files will be created by Spark with permissions such that only the user and group have read and write access.

shkhrgpt · 2017-01-31T21:13:23Z

Thanks, @superbobry. We also favor REST API based fetcher because of the same permission and Kerberos issue. However, in general, getting data using REST API can be slower than reading it from HDFS. Therefore, we wanted to know from Dr Elephant team that whether or not they have any concerns about the REST API based eventlog fetcher.

Since it's working fine for you, that gives us more confidence.

Thank you.

rayortigas · 2017-01-31T22:27:38Z

Dr. E is still defaulting to compile against 1.4, which as you observed, doesn't have an API to fetch event logs. So #162 was done with that minimum in mind.

If @shankar37, @akshayrai, and team are fine raising the minimum Spark version to >= 1.5, then I'm fine with the fetcher going to REST entirely as the primary method of getting info.

paulbramsen · 2017-02-03T00:06:53Z

I assume that if the new version of the Spark fetcher were designed so that the old one could stick around as a legacy tool then it would be fine, right? There could be 2 Spark fetchers kind of like how there are 2 MapReduce fetchers: one that uses the REST API and one that uses HDFS.

shkhrgpt · 2017-02-03T01:27:17Z

I agree with @paulbramsen, we can have both, REST client and LOG client, to fetch eventlogs. And maybe use REST client as default for spark versions 1.5 and above?

paulbramsen · 2017-02-03T01:38:53Z

@shkhrgpt as of right now the only way to make a fetcher the "default" fetcher is if it exists uncommented in FetcherConf.xml. There's not a mechanism to alter default by version. But I think that having both fetchers in FetcherConf.xml with one commented out and a description of each of their purposes would be sufficient.

rayortigas · 2017-02-03T03:20:10Z

@paulbramsen Yeah, the old one could co-exist with the new one. It would be a simple config block just as you described.

I should point out that after consulting with the Dr. E team I ended up removing the old fetcher, and I failed to update the documentation in #162, which makes it sounds like both the old and new are usable (sorry!). Nonetheless, if you look at the commits in https://github.com/linkedin/dr-elephant/pull/162/commits you'll see that it's a fairly clean path before and after I removed the old fetcher. So backtracking to a3dde3a and then doing some renaming and re-applying the subsequent commits in some form, we can get both to co-exist.

Of course I defer to @shankar37, @nntnag17, and @akshayrai, on what to do.

shkhrgpt · 2017-02-03T08:09:04Z

@rayortigas I thought that @paulbramsen meant complete REST API based fetcher as the new one, and the current fetcher (REST + WebHDFS) as the old one. Am I right @paulbramsen ? In that case, I don't think we need to backtrack to a3dde3a. The believe that the new fetcher should add a REST client to fetch eventlogs. @paulbramsen Am I right here?

paulbramsen · 2017-02-03T08:55:01Z

@shkhrgpt yes you are right. That is what I meant. No need to backtrack.

shkhrgpt · 2017-02-03T21:14:19Z

Since the existing and the proposed fetchers will be the same in most of their data fetching, both will use REST to fetch most of the data (ApplicationInfo, StageData, ExecutorData, and JobData). They will be only different in terms of fetching eventlogs. Does it make sense to call them two "different" fetchers?

I was thinking that maybe just keep one fetcher (SparkFetcher), and add an additional parameter to specify whether or not use REST client to fetch eventlogs. So the config for the fetcher will look like the following:

<fetcher>
<applicationtype>spark</applicationtype>
<classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
<params>
<should_use_rest_to_fetch_eventlogs>false</should_use_rest_to_fetch_eventlogs>
</params>
</fetcher>

The default value of this parameter should be false because the WebHDFS will work for all Spark versions but REST will for Spark 1.5 and above.

@paulbramsen @rayortigas What you think?

paulbramsen · 2017-02-08T05:37:20Z

@shkhrgpt sounds good to me.

superbobry · 2017-03-21T19:49:42Z

@shkhrgpt this way the user could misconfigure Dr. E to fetch eventlogs via REST API even if running only on Spark 1.5. What do you think about an alternative strategy: the fetcher would inspect SPARK_VERSION and use REST API for >1.5 and WebHDFS for ≤1.5?

Update: this would only work if Dr. E is compiled with the appropriate version of Spark.

This commits allows Dr. Elephant to fetch Spark logs without universal read access to eventLog.dir on HDFS. SparkFetcher would atomatically use SparkRestClient instead of SparkLogClient for Spark >1.5. Closes linkedin#199

This commit allows Dr. Elephant to fetch Spark logs without universal read access to eventLog.dir on HDFS. SparkFetcher would use SparkRestClient instead of SparkLogClient if configured as <params> <use_rest_for_eventlogs>true</use_rest_for_eventlogs> </params> The default behaviour is to fetch the logs via SparkLogClient/WebHDFS. Closes linkedin#199

shkhrgpt · 2017-03-21T21:20:23Z

@superbobry I think that's an interesting approach too. However, at this stage, the problem with making a decision based on Spark version forces the fetcher to use either WebHDFS or REST. Maybe someone does not want to use REST even if Spark version >= 1.5. Especially in the beginning when the REST fetcher is not well tested compared to WebHDFS fetcher. Maybe down the road when we feel confident about the REST fetcher, we can think of using Spark version. What do you think?

Btw, thanks for submitting the PR for this issue.

This commit allows Dr. Elephant to fetch Spark logs without universal read access to eventLog.dir on HDFS. SparkFetcher would use SparkRestClient instead of SparkLogClient if configured as <params> <use_rest_for_eventlogs>true</use_rest_for_eventlogs> </params> The default behaviour is to fetch the logs via SparkLogClient/WebHDFS. Closes linkedin#199

superbobry · 2017-03-21T23:59:56Z

@shkhrgpt I completely agree. The PR implements precisely your version.

superbobry mentioned this issue Mar 21, 2017

Spark fetcher is now able to fetch event logs via REST API #225

Merged

2 tasks

akshayrai closed this as completed in #225 Apr 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why doesn't SparkFetcher use REST client to fetch eventlogs? #199

Why doesn't SparkFetcher use REST client to fetch eventlogs? #199

shkhrgpt commented Jan 31, 2017

paulbramsen commented Jan 31, 2017

superbobry commented Jan 31, 2017

shkhrgpt commented Jan 31, 2017

rayortigas commented Jan 31, 2017

paulbramsen commented Feb 3, 2017

shkhrgpt commented Feb 3, 2017

paulbramsen commented Feb 3, 2017 •

edited

Loading

rayortigas commented Feb 3, 2017

shkhrgpt commented Feb 3, 2017

paulbramsen commented Feb 3, 2017

shkhrgpt commented Feb 3, 2017

paulbramsen commented Feb 8, 2017

superbobry commented Mar 21, 2017 •

edited

Loading

shkhrgpt commented Mar 21, 2017

superbobry commented Mar 21, 2017

Why doesn't SparkFetcher use REST client to fetch eventlogs? #199

Why doesn't SparkFetcher use REST client to fetch eventlogs? #199

Comments

shkhrgpt commented Jan 31, 2017

paulbramsen commented Jan 31, 2017

superbobry commented Jan 31, 2017

shkhrgpt commented Jan 31, 2017

rayortigas commented Jan 31, 2017

paulbramsen commented Feb 3, 2017

shkhrgpt commented Feb 3, 2017

paulbramsen commented Feb 3, 2017 • edited Loading

rayortigas commented Feb 3, 2017

shkhrgpt commented Feb 3, 2017

paulbramsen commented Feb 3, 2017

shkhrgpt commented Feb 3, 2017

paulbramsen commented Feb 8, 2017

superbobry commented Mar 21, 2017 • edited Loading

shkhrgpt commented Mar 21, 2017

superbobry commented Mar 21, 2017

paulbramsen commented Feb 3, 2017 •

edited

Loading

superbobry commented Mar 21, 2017 •

edited

Loading