[ZEPPELIN-1280][Spark on Yarn] Documents for running zeppelin on production environments using docker. #1318
| ``` | ||
| ps -ef |
Do you mean ps -ef | grep spark?
Hadoop is also running, so is plain ps -ef the best way?
Ah, right. But I just wanted to filter the process list.
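
For reference, the filtering discussed in this thread could be sketched as below. The helper name `filter_procs` is hypothetical and not part of the PR; it only assumes a POSIX shell with `grep`.

```shell
# Hypothetical helper (not from the PR): keep only the process-list
# lines that mention the given name, e.g. "spark" or "hadoop".
filter_procs() {
  # -i: case-insensitive match; "--" ends option parsing so a pattern
  # starting with "-" is not mistaken for a flag. Reads from stdin.
  grep -i -- "$1"
}
```

A typical call would be `ps -ef | filter_procs spark`. The grep process itself may appear in that listing, since its command line also contains the pattern; appending `| grep -v grep` is one common way to drop it.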
|
@astroshim Great work indeed! As just proof reading |
|
@AhyoungRyu Thank you very much for your effort. 👍 |
Minor update for spark_cluster_mode.md
| <li class="title"><span><b>Advanced</b><span></li> | ||
| <li><a href="{{BASE_PATH}}/install/virtual_machine.html">Zeppelin on Vagrant VM</a></li> | ||
| <li><a href="{{BASE_PATH}}/install/spark_cluster_mode.html#spark-standalone-mode">Zeppelin on Spark Cluster Mode (Standalone)</a></li> | ||
| <li><a href="{{BASE_PATH}}/install/spark_cluster_mode.html#spark-standalone-mode">Zeppelin on Spark Cluster Mode (Yarn)</a></li> |
It would probably be a good idea to write YARN in all caps:
http://spark.apache.org/docs/latest/running-on-yarn.html
|
@felixcheung Thank you very much for the detailed review. 👍 |
|
It's hard to say - I think one approach would be latest (spark 2.0 & Hadoop 2.7); another approach would be the most popular ones |
|
Do you know which versions of Spark and Hadoop are popular? I can test them. |
|
cool. hadoop versions in distributions: |
|
Then what about supporting the latest versions (Spark 2.0 & Hadoop 2.7)? |
|
Docs look great to me, thank you @astroshim ! |
|
I got the following error when I tried to run Zeppelin with Spark 2.0 & Hadoop 2.7. My build command is but hadoop library for spark interpreter is Maybe the error occurs because of different versions of the Hadoop library. |
|
Looks like some of the Hadoop jars are 2.2 instead of 2.7? |
|
@felixcheung Yes, 2.2. Maybe it's because my Maven repo has different versions of the Hadoop libraries, like the following. So I fixed this in #1335. |
|
I ran a Zeppelin job successfully with
|
|
@astroshim I can use spark 2.0 and hadoop 2.7 successfully. I hit this issue when building zeppelin with profile yarn enabled. So please don't enable yarn profile otherwise you will get hadoop version mismatch. I have left a comment in #1301 to remove yarn profile. |
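
One way to spot the kind of Hadoop version mismatch described above is to list the distinct versions embedded in the Hadoop jar file names. This is only a minimal sketch: the function name `hadoop_jar_versions` is hypothetical, the directory to scan is whatever location holds your interpreter's jars, and it assumes file names of the form `hadoop-<module>-<version>.jar` with no suffix such as `-SNAPSHOT`.

```shell
# Hypothetical helper (not from the PR): print each distinct version
# found in hadoop-*.jar file names under the directory given as $1.
hadoop_jar_versions() {
  for jar in "$1"/hadoop-*.jar; do
    [ -e "$jar" ] || continue   # glob matched nothing; skip the literal pattern
    name=${jar##*/}             # strip the directory part
    name=${name%.jar}           # strip the .jar extension
    printf '%s\n' "${name##*-}" # keep the text after the last "-": the version
  done | sort -u                # one line per distinct version
}
```

If the output contains more than one line (for example both a 2.2.x and a 2.7.x version), the directory mixes Hadoop versions.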
|
Please merge this if there is no more discussion, because I want to write the document for https://issues.apache.org/jira/browse/ZEPPELIN-1279. |
|
The JIRA title seems a little confusing to me. The PR is for running Spark on YARN with Docker, but I don't think users will use Docker for production for now. |
|
@zjffdu You're right, users usually don't run production on Docker. |
|
It would be better to change the title to reflect Docker. I think we should mention that Docker is only for small experimental environments rather than production environments. Besides that, I don't know how complicated using Docker is, so I would be more conservative about bringing in extra dependencies, especially when they are complicated and not usually needed in a real environment. We can hear more feedback from people who know more about Docker. |
|
I can update the PR title. |
|
Thanks @astroshim, I have no other concerns. |
|
I agree we could be more specific in the title/subject for this document. But lots of companies run production on Docker, just FYI: either Docker by itself on premise or in the cloud, with something like DC/OS. |
|
Can this be merged now? :) |
|
Looks great to me. Merging to master, if there is no further discussion |
### What is this PR for?
This PR is for the documentation of running zeppelin on production environments especially spark on mesos via Docker.
Related issue is #1227 and #1318 and I got a lot of hints from https://github.com/sequenceiq/hadoop-docker.
Tested on ubuntu.

### What type of PR is it?
Documentation

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1279

### How should this be tested?
You can refer to https://github.com/apache/zeppelin/blob/master/docs/README.md#build-documentation.

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no

Author: astroshim <[email protected]>
Author: AhyoungRyu <[email protected]>
Author: HyungSung <[email protected]>

Closes #1389 from astroshim/ZEPPELIN-1279 and squashes the following commits:

974366a [HyungSung] Merge pull request #10 from AhyoungRyu/ZEPPELIN-1279-ahyoung
076fdba [AhyoungRyu] Change zeppelin_mesos_conf.png file
1cbe9d3 [astroshim] fix spark version and mesos
2b821b4 [astroshim] fix docs
159bafc [astroshim] fix anchor
d8c43b4 [astroshim] add navigation
c808350 [astroshim] add image file and doc
a3b0ded [astroshim] create dockerfile for mesos





What is this PR for?
This PR adds documentation for running Zeppelin in production environments, especially Spark on YARN.
The related issue is #1227, and I got a lot of hints from https://github.com/sequenceiq/hadoop-docker.
Tested on Ubuntu.
What type of PR is it?
Documentation
What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1280
Questions: