Skip to content

Conversation

@mfelgamal
Copy link
Contributor

@mfelgamal mfelgamal commented Oct 18, 2016

What is this PR for?

This PR is for making docker images for zeppelin releases. It contains a script for building image for each release. Another script is used for publishing images to zeppelin Dockerhub account.

This repo, https://github.com/mfelgamal/zeppelin-dockers, is a demonstration of this PR. It contains zeppelin-base image and an image for each zeppelin release.

What type of PR is it?

[Feature]

Todos

  • Review Comments
  • Documentation

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1386

How should this be tested?

  • run create_release script or publish_release script.

Screenshots (if appropriate)

Questions:

  • Does the licenses files need update? no
  • Is there breaking changes for older versions? no
  • Does this needs documentation? yes

Copy link
Member

@bzz bzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contributing!

Do you think some documentation updates could be part of this PR, i.e explaining about DOCKER_USERNAME, etc?

\cc @minahlee who was taking care of latest releases and @astroshim who was working on Docker images for other cases for review.

@@ -0,0 +1,16 @@
FROM alpine:3.3
MAINTAINER Mahmoud Elgamal <[email protected]>
Copy link
Member

@bzz bzz Oct 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the top of my head:

  • there must be ASF Apache 2.0 license header at the beginning of every file
  • ASF projects usually discourage author annotations, could you please replace this with something like Apache Zeppelin authors <[email protected]>?

Attributions are kept though JIRA\Git commit logs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually people put Apache Software Foundation <[email protected]>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mfelgamal
Copy link
Contributor Author

@bzz Thank you for your feedback. A MD documentation page is added to the install section of Zeppelin documentation.

ENV PATH $PATH:$JAVA_HOME/bin

# ports for zeppelin
EXPOSE 8080 8081
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in Zeppelin we dont need to open the port 8081 (used for old implementation of websocket), only 8080 is required now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthonycorbacho Ok, so if you want to build a docker image to the old version of zeppelin, you need port 8081, what do you think about?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfelgamal you are right, but the usage of port 8081 have been removed long time ago, do you think it is worth to keep it? I am note sure if ppl will sill want to use an relic version of zeppelin?
Maybe by nostalgia ?

Let me know if it make sense :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthonycorbacho you are right, I removed 8081 port :)

@astroshim
Copy link
Contributor

@mfelgamal Thank you for contributing this.
I have several questions.

  1. I think it needs -it option for running docker.
    so The command to run zeppelin docker should be
    docker run --rm -it -p 8080:8080 -p 8081:8081 mahmoudelgamal/zeppelin-release:0.5.0 bash
  2. What do you think of adding documents for this like this ?
  3. I got an following error when I run the Zeppelin Tutorial, Could you take care a look this?

image

@mfelgamal
Copy link
Contributor Author

mfelgamal commented Oct 23, 2016

@astroshim Thank you for your reviews, point 1 and 3 are done, and I will work on the documentation. Can you check the docker image again?

@astroshim
Copy link
Contributor

@mfelgamal I checked point 1 and 3. Thank you fix properly.
Could you also fix to run the R Tutorial and Python Tutorial too?
I think all tutorial should be ran without error.
What do you think?

@khalidhuseynov
Copy link
Member

@mfelgamal first of all thanks for initiative on this one, and few points here:

  • as i can see here Zeppelin images will be with hardcoded $ZEPPELIN_HOME/conf folder and in order to configure it, you'll need to rebuild your image. As a possible solution here would be read conf/ from certain place inside container and mount conf folder from your local drive to there. that may need little change in docker file and run command.
  • perhaps user can run with interactive (-it) as well as daemon (-d) modes, depending on preference.
  • also regarding running docker with, say, Spark - the easiest should be running it in local mode. if you want to point to your cluster, then there will be problems since at least port 7077 needs to be available.

Note that these points may not need to be addressed right in this PR, it's more like things to consider and maybe future improvement.


### Running a Zeppelin docker image

* To start Zeppelin, you need to pull the zeppelin release image:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a newline below this line? Or it'll rendered like this
screen shot 2016-10-25 at 4 52 14 pm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AhyoungRyu Thank you for your reviews, I changed it, you can check now. ;)

@AhyoungRyu
Copy link
Contributor

@mfelgamal Awesome!
There are some places to inform the page's existence to the users.
Please check documentation guide#adding-new-pages :)

@astroshim
Copy link
Contributor

@mfelgamal Ping.

@mfelgamal
Copy link
Contributor Author

@astroshim R and Python are installed and I think that the tutorials should be run without errors.

@mfelgamal
Copy link
Contributor Author

@khalidhuseynov Thank you for your reviews. For the point # 1, I will work on enhancing this in next pull request. And for 2 and 3, the docker image could be run in the daemon mode and I added it to the documentation, and also made 7077 port avaliable.

@astroshim
Copy link
Contributor

@mfelgamal I got following error. Did i missed something?

$ docker pull mahmoudelgamal/zeppelin-release:0.6.1
0.6.1: Pulling from mahmoudelgamal/zeppelin-release
985c5f84712b: Pull complete
cfb23d56d9a5: Pull complete
1fc00ca62c34: Pull complete
84e390212d53: Pull complete
679b1e41a9af: Pull complete
Digest: sha256:9ff7adc267bce43f262869b47968da3d9a057eb9b2926639a3f46c96fd867e10
Status: Downloaded newer image for mahmoudelgamal/zeppelin-release:0.6.1

$ docker run --rm -it -p 8080:8080 -p 8081:8081 mahmoudelgamal/zeppelin-release:0.6.1 bash
bash: cannot set terminal process group (-1): Not a tty
bash: no job control in this shell

@mfelgamal
Copy link
Contributor Author

mfelgamal commented Nov 5, 2016

@astroshim I think it is just a message and the shell run normally. you should pull the image again and run with this command docker run --rm -it -p 8080:8080 -p 8081:8081 mahmoudelgamal/zeppelin-release:0.5.0 -c bash. Also there're an issue in current zeppelin binary versions form 0.6.0v to 0.6.2v with embedded spark. What do you think?

@astroshim
Copy link
Contributor

@mfelgamal Could you tell me more detail about the issue on from 0.6.0v to 0.6.2v ?

@1ambda
Copy link
Member

1ambda commented Nov 7, 2016

@mfelgamal

I tried zeppelin-release:0.6.0 image but it failed to run spark code in Zeppelin Tutorial

$ docker ps
CONTAINER ID        IMAGE                                   COMMAND                  CREATED    STATUS              PORTS                                        NAMES
266f20bc0430        mahmoudelgamal/zeppelin-release:0.6.0   "/usr/local/bin/dumb-"   30 seconds ago    Up 29 seconds       7077/tcp, 0.0.0.0:8080-8081->8080-8081/tcp   determined_darwin

$ docker -v
Docker version 1.12.1, build 6f9534c

My Notebook complains like

screen shot 2016-11-07 at 4 04 28 pm

@1ambda
Copy link
Member

1ambda commented Nov 7, 2016

Additionally, It would be better to

  • run zeppelin immediately (using zeppelin.sh because we can run docker as daemon mode)
  • allow user to modify important config files by passing env variables.

You can refer this docker init script

https://github.com/wurstmeister/kafka-docker/blob/master/start-kafka.sh

@bzz
Copy link
Member

bzz commented Nov 7, 2016

@mfelgamal thank you for your great work and @astroshim @khalidhuseynov @AhyoungRyu @anthonycorbacho and @1ambda for prompt reviews!

I think @1ambda raised very good points, using zeppelin.sh will allow to use docker logs but we just need to make sure other logs (individial interpreters, etc) still get written to ./log/

@1ambda
Copy link
Member

1ambda commented Nov 22, 2016

@mfelgamal

Hi :) ping

@mfelgamal
Copy link
Contributor Author

@1ambda It's good if we did that in apache zeppelin repo. but in this case instead of using the created binary versions, we should build the source code inside the docker.
Like that:

RUN git clone https://git-wip-us.apache.org/repos/asf/zeppelin.git "/usr/local/zeppelin" &&  \
  cd /usr/local/zeppelin/ && \ 
  git checkout v0.6.1 && mvn clean package -DskipTests

I think that alpine doesn't play well with node, so we could use ubuntu, what do you think?

@bzz
Copy link
Member

bzz commented Nov 24, 2016

Great job @mfelgamal thank you for taking care!

The idea was to try to avoid building separate artefacts for Docker and use official convenience binaries from Apache release.

May be I'm missing something here, but what is the reason such images can not be published under https://hub.docker.com/r/apache/zeppelin automatically as a part of release (as that's what we want to publish)?

Alos, it would be so cool, if we could get rid of gcc &co in zeppelin-base image to make it smaller, otherwise is 479.5 MB now, without Zeppelin... It is only R that requires it, isn't it?

@mfelgamal
Copy link
Contributor Author

Hi @bzz

So far, we have binary versions from 0.5.0 to 0.6.2, which help us building docker image to each version instead of building the source code, but I mean that if you want to make dockerfile to the latest version from zeppelin which haven't a binary version and is on master branch, so we may need to build the zeppelin in the docker. if the latest version isn't necessary now, we can ignore this, what do you think?

  • Yes, only R package requires gcc

@1ambda
Copy link
Member

1ambda commented Nov 25, 2016

@mfelgamal @bzz

I think we should dockerize binary zeppelin images first because more users use binary versions.

@bzz
Copy link
Member

bzz commented Nov 25, 2016

if we want to make dockerfile to the latest version from zeppelin which haven't a binary version

This better be handled under a separate JIRA issue as this one is about [ZEPPELIN-1386] Docker images for running Apache Zeppelin releases . It can be done later i.e by setting up CI automation that runs nightly build \w something similar to create_release.sh. Important thing there would be to make sure that's clearly marked as non-release, but just as a developer's artefact.


So all that sounds great, and image looks good, except for fat R dependencies, but I'm not sure if we can do something about it.

@felixcheung as R expert, do you know if there is any way of installing R\deps without having build-base make gcc g++ in the image?

How do you guys think, is there anything that's left here?
Or shall we merge this, run docker build for old binary releases using this image?

@bzz
Copy link
Member

bzz commented Nov 25, 2016

to recap - theare are two final things, that I think might be very nice to have in this image:

@1ambda
Copy link
Member

1ambda commented Nov 25, 2016

Now this issue is like more about building base image :) I will start working on a new JIRA issue as @bzz mentioned about creating runnable zeppelin images per version based on this. That would be the next step.

https://issues.apache.org/jira/browse/ZEPPELIN-1711

@mfelgamal
Copy link
Contributor Author

@bzz I think that we need to install build-base make gcc g++ to build knitr package in R. And I wonder if we could remove them after building knitr. Is this right?

@bzz
Copy link
Member

bzz commented Nov 30, 2016

@mfelgamal yes, exactly. Do you think this is possible? I wonder if the image size would go down, if we remove those guys after getting knitr.
The goal would be to have a minimal image that includes all necessary dependencies only to run Zeppelin.

I'm not very familiar with R ecosystem, but isn't there some way of installing packages that comes with everything (including native dependecies) compiled, like .whl in Python? Then we could skip building layers of image \w gcc&co all together...

@felixcheung
Copy link
Member

felixcheung commented Nov 30, 2016

Sort of. There's Conda for R: https://www.continuum.io/content/preliminary-support-r-conda
OR RStudio https://www.rstudio.com/products/rstudio/download3/

But generally some R packages are compiled on installation; knitr is a relatively bigger one.

# ports for zeppelin
EXPOSE 8080 7077

ENTRYPOINT ["/usr/local/bin/dumb-init"]
Copy link
Member

@1ambda 1ambda Nov 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ENTRYPOINT ["/usr/local/bin/dumb-init", "--"] according to https://github.com/Yelp/dumb-init#usage. It allows extened images run their executables without specifying -- if fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1ambda done.

@mfelgamal
Copy link
Contributor Author

@bzz I removed curl build-base make gcc g++ for decreasing the image size, now it's 301 MB.
And R is run well.

@bzz
Copy link
Member

bzz commented Dec 5, 2016

@mfelgamal that sounds awesome job, thank you very much.

Please let me test it tomorrow and get back to you here, but otherwise I think it's ready to be merged!

@1ambda
Copy link
Member

1ambda commented Dec 6, 2016

@mfelgamal The reason why we can't run zeppelin binary on this docker image is because of dumb-init. It prevents us from running zeppelin well even just installing dumb-init. I would like to suggest to remove dumb-init since we can handle signals in other ways.

Let me update this comment.

It's not due to dumb-init. If we are using another script instead of bin/zeppelin.sh It prevents spark interpreter from running well. (The error messages like what @mfelgamal attached before #1538 (comment))

# doens't work: using another script to run zeppelin
ENV ENTRYPOINT_BIN="docker-entrypoint.sh"
ADD "docker-entrypoint.sh" "/${ENTRYPOINT_BIN}"
RUN echo "$LOG_TAG chmod +x to ${ENTRYPOINT_BIN}" \
        && chmod a+x "/${ENTRYPOINT_BIN}"
CMD ["./docker-entrypoint.sh"]

# works well
WORKDIR ${Z_HOME}
CMD ["bin/zeppelin.sh"]

@bzz
Copy link
Member

bzz commented Dec 7, 2016

@mfelgamal thanks again for the great work!

One last question is - could you please explain, why do you think one more script is needed - start-zeppelin.sh and why CMD["/bin/zeppelin.sh"] can not be used as entrypoint for the image?

@1ambda has the point, and we should try to reduce the number of shell scripts that need to be supported later on, as well as possible issues with setting up classpath, etc.

@mfelgamal
Copy link
Contributor Author

@bzz @1ambda the script start-zeppelin.sh isn't important, it just enforce using port 8080 and zeppelin home dir /usr/local/zeppelin, we can ignore it and use zeppelin.sh.

@mfelgamal
Copy link
Contributor Author

@bzz @1ambda are there further discussion?

@mfelgamal mfelgamal closed this Dec 12, 2016
@mfelgamal mfelgamal reopened this Dec 12, 2016
@1ambda
Copy link
Member

1ambda commented Dec 13, 2016

@mfelgamal

I agree with you opinion. We can directly use usr/local/zeppelin instead of start-zeppelin.sh.
I'v just created PR for removing start-zeppelin.sh related files, codes, docs.

https://github.com/mfelgamal/zeppelin/pull/3/files

Regarding to CI failure

It's due to flaky tests.

- should able to remove AngularObject
AngularElemTest:
AngularElem
- should provide onclick method *** FAILED ***
  The code passed to eventually never returned normally. Attempted 1 times over 377.06797700000004 milliseconds. Last failure message: 0 was not equal to 1. 

@mfelgamal
Copy link
Contributor Author

@1ambda that sounds awesome job, thank you very much. Now the PR is merged.

@bzz
Copy link
Member

bzz commented Dec 13, 2016

Looks great to me, thank you @1ambda @mfelgamal

Let's wait for CI results (just to help \w ongoing CI stability work) and merge to master, if nothing unexpected comes up and there is no further discussion!

@1ambda
Copy link
Member

1ambda commented Dec 14, 2016

@mfelgamal Could you retrigger CI?

@bzz
Copy link
Member

bzz commented Dec 14, 2016

First and second failing CI profile hit ZEPPELIN-1797

Spark 1.5 had another troubles \w DepInterpreterTest

16/12/13 10:47:22 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

Results :

Tests in error: 
  DepInterpreterTest.testDefault:96 » NullPointer

Tests run: 24, Failures: 0, Errors: 1, Skipped: 0

And Selenium profiles also fails on test, related to DepInterpreter

ailed tests: 
  SparkParagraphIT.testDep:234 First paragraph from SparkParagraphIT of testDep status: 
Expected: "FINISHED"
     but: was "ERROR"

Tests in error: 
  ZeppelinIT.testSparkInterpreterDependencyLoading:234->AbstractZeppelinIT.waitForParagraph:70->AbstractZeppelinIT.pollingWait:96 » Timeout

Tests run: 18, Failures: 1, Errors: 1, Skipped: 0

Though I belive none of these have to do anything with the changed introduced in this PR, so merging it to master if there is no further discussion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants