[ZEPPELIN-1711] Create Docker Images for Released Zeppelin Binaries #1761
\cc @bzz @mfelgamal @astroshim Thanks!
4af7ceb to 8c71f6d
8c71f6d to b7b0fa0
@1ambda awesome!
RUN echo "$LOG_TAG Cleanup" && \
    apk del build_deps && \
    apk del python_build_deps
I think it's better to install and delete the packages in the same layer (a single RUN instruction) so that they are not committed to the image as a separate layer. This reduces the image size. What do you think?
@mfelgamal Thanks for the review :)
- I agree, but I think we need to balance readability and size. Many official images, openjdk for example, use multiple `RUN` commands. Separate layers also shorten rebuild times (productivity) while developing Docker images.
- Let me compare the image sizes and add comments about it :)
From experience, it can also be useful to have multiple layers (i.e. `RUN` commands) for reliability: the cached layers act as checkpoints to resume from if one of the steps fails.
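A minimal sketch of the single-layer approach suggested in this thread, assuming an `alpine:3.4` base. The `gcc`/`musl-dev` toolchain and the tiny C program are hypothetical stand-ins for the `build_deps`/`python_build_deps` groups and whatever actually gets compiled in this PR. The point is only the structure: the build-only packages are added as a named virtual group, used, and removed inside one `RUN`, so they never land in a committed layer.

```dockerfile
# Sketch only: alpine:3.4 and the gcc/musl-dev toolchain are assumptions,
# standing in for the build_deps/python_build_deps groups in this PR.
FROM alpine:3.4

# Install build-only packages as a named virtual group, use them, and delete
# the group in the same RUN instruction, so none of the toolchain is committed
# to any image layer.
RUN apk add --no-cache --virtual=build_deps gcc musl-dev && \
    printf 'int main(void) { return 0; }\n' > /tmp/hello.c && \
    gcc -o /usr/local/bin/hello /tmp/hello.c && \
    rm /tmp/hello.c && \
    apk del build_deps

# The compiled artifact still runs after its build dependencies are gone.
RUN /usr/local/bin/hello
```

Splitting the install and the `apk del` into separate `RUN` instructions keeps the cached checkpoints mentioned above, but the toolchain files then remain in the earlier layer, so the final image does not get smaller even though a later layer deletes them.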
apk add --no-cache --virtual=python_build_deps \
    musl-dev linux-headers gfortran \
    freetype-dev py-numpy-dev@testing \
    py-numpy python-dev libpng-dev libxml2-dev libxslt-dev \
py-numpy is installed here and also on L30?
I will fix it
curl --silent --location https://github.com/sgerrand/alpine-pkg-R/releases/download/${R_VERSION}/R-dev-${R_VERSION}.apk \
    --output /var/cache/apk/R-dev-${R_VERSION}.apk && \
apk add --no-cache --allow-untrusted /var/cache/apk/R-dev-${R_VERSION}.apk && \
R -e "install.packages('knitr', repos='http://cran.us.r-project.org', lib='$R_LIBS')" && \
Could you pass a list of packages to a single `install.packages()` call?
I thought separating the `install.packages` statements into multiple lines would be easier to maintain, but it's fine to keep them on one line. I will fix it.
ENV LOG_TAG="[ZEPPELIN_BASE_R]:" \
    LANG=C.UTF-8 \
    R_VERSION="3.3.1-r0" \
    R_LIBS="/usr/local/rbin/R"
R_LIBS is optional. Is there a reason to create and pass it instead of just using the default location?
I didn't know that. I will fix it.
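For the R image, the two review suggestions above (a single `install.packages()` call, and no custom `R_LIBS`) could be combined roughly as in this fragment. It is a sketch that assumes R and the toolchain needed to compile CRAN packages were already installed by the earlier steps. `googleVis` and `data.table` are taken from the packages mentioned under Image Sizes; the exact package set is illustrative.

```dockerfile
# Fragment of the R Dockerfile: assumes R (the R-dev apk from
# sgerrand/alpine-pkg-R) and the build dependencies for compiling CRAN
# packages were installed by earlier steps.
ENV LOG_TAG="[ZEPPELIN_BASE_R]:" \
    LANG=C.UTF-8 \
    R_VERSION="3.3.1-r0"

# One install.packages() call with a character vector of package names.
# With no lib= argument (and no R_LIBS), packages go to .libPaths()[1].
# install.packages() only warns when a package fails to install, so the
# packages are loaded afterwards; library() errors make R exit non-zero
# and fail the build.
RUN echo "$LOG_TAG Install R packages" && \
    R -e "install.packages(c('knitr', 'googleVis', 'data.table'), repos='http://cran.us.r-project.org')" && \
    R -e "library(knitr); library(googleVis); library(data.table)"
```

To see where the packages ended up, `R -e ".libPaths()"` inside the container prints the library search path.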
My apologies if this was discussed before, but is there a reason for separating Scala, Python, and R into individual Docker images? Don't people want to run Python and R together, for example? Also, +1 for graphing being important for Python and R.
@1ambda ping
@felixcheung Thanks for the review. We previously discussed separating the images in ZEPPELIN-1711. IMO, having one image that includes both the R and Python related packages is better. Since
I would like to use Ubuntu, which is more comfortable as a desktop OS, instead of Alpine.
What do you think? @felixcheung @jongyoul @bzz
Let me create a new PR, based on this one, that includes only one OS (Ubuntu).
### What is this PR for?
Created a `Dockerfile` for the released binaries
- based on **Ubuntu:16.04 (LTS)** for desktop usage
- **JDK 8**
- **R** with basic packages
- **Python 2** with basic packages
- **miniconda2** for `%python.conda`

### Details
We already discussed using an Alpine image in #1761. However, Alpine
- is not designed for desktop usage
- doesn't have some official packages (R, ...)
- is not familiar to users as a desktop OS

That is why Ubuntu is used for the base image.

```
REPOSITORY   TAG     IMAGE ID       CREATED        SIZE
zeppelin     base    b3818f9ae4b1   11 hours ago   1.67 GB
zeppelin     0.6.2   c0a4d8556f92   7 hours ago    2.29 GB
zeppelin     0.7.0   c4a5ad0d04bd   8 hours ago    2.5 GB
zeppelin     0.7.1   54173b77743b   7 hours ago    2.49 GB
```

### What type of PR is it?
[Feature]

### Todos
* [x] - base image
* [x] - script for creating bin images
* [x] - bin image template

### What is the Jira issue?
[ZEPPELIN-1711](https://issues.apache.org/jira/browse/ZEPPELIN-1711)

### How should this be tested?
1. Build the base image (from the repository root): `cd scripts/docker/zeppelin/base; docker build -t zeppelin:base ./`
2. Build the bin image (from the repository root): `cd scripts/docker/zeppelin/0.7.1; docker build -t zeppelin:0.7.1 ./`
3. Run the Docker image:
```
docker run -p 8080:8080 --rm --name zeppelin zeppelin:0.7.1
```
Since building takes time, you can use the already [published docker images](https://hub.docker.com/r/1ambda/docker-zeppelin/):
```
docker run -p 8080:8080 --rm --name zeppelin 1ambda/docker-zeppelin:0.7.1
```
4. You should be able to run the Spark, Python and R tutorials.

### Screenshots (if appropriate)
NO

### Questions:
* Do the license files need an update? - NO
* Are there breaking changes for older versions? - NO
* Does this need documentation? - YES, updated

Author: 1ambda

Closes #2264 from 1ambda/ZEPPELIN-1711/bin-dockerfile and squashes the following commits:

69a0b1f [1ambda] docs: Update docker.md
ced897f [1ambda] fix: DON'T remove /tmp
1f6da76 [1ambda] feat: Dockerfiles for 060, 070, 071
0fc3f75 [1ambda] feat: Add template for bin image
5cba56e [1ambda] feat: Use ubuntu for base image
#2264 was merged.
What is this PR for?
This PR
- `scripts/docker/zeppelin-base`, so that we can keep the base images small
- `scripts/docker/create-dockerfile.sh`, to create Docker images for newly released Zeppelin binaries

For Reviewers
There are some things to discuss in this PR:
- Whether to use `testing/install_external_dependencies.sh` or not: Alpine Linux has its own command to install packages, like `apk add`, so I think this script is not reusable. Extracting scripts from the Dockerfile also causes other problems. For example, we would need to add a script just for building the Python and R base images, to copy the external dependency scripts, because Docker's `COPY` and `ADD` instructions can't reference files outside the build context (e.g. relative paths such as `../`). This means we would have to copy the external dependency scripts into the directory where the Dockerfile is built before building.

What type of PR is it?
[Feature]
What is the Jira issue?
ZEPPELIN-1711
How should this be tested?
Building Base Images
Building Zeppelin Images
Make sure you have base images before building
Running Zeppelin Images
Make sure you have zeppelin docker images before running containers
Then, run the containers, replacing the tag (0.6.2, 0.6.1, 0.6.0).
Testing Zeppelin Tutorials
Here are things you need to know before testing.
- `zeppelin:alpine-$TAG_java` images can run the Zeppelin Tutorial since they include openjdk7.
- `zeppelin:alpine-$TAG_python` images can run Zeppelin Tutorial: Python - matplotlib basic as well as the Zeppelin Tutorial since they include Python related packages.
- `zeppelin:alpine-$TAG_r` images can run the R Tutorial as well as the Zeppelin Tutorial since they include Python related packages.
- `zeppelin:alpine-0.6.0_$PLATFORM` images will not run the Zeppelin Tutorial properly and throw errors like `zeppelin java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy`. I am not sure whether this is a problem with Zeppelin or with Spark.

Then, run each Zeppelin container, replacing the tag, and run the tutorials by opening localhost:8080 in your browser.
Testing `create-dockerfile.sh`
Screenshots (if appropriate)
Image Sizes
zeppelin base image size added in #1538: 301.3 MB
We could reduce the size by removing packages such as `googleVis`, `data.table` and `ramnathv/rCharts` from the R images, for example, but then we might not be able to run the tutorials properly.
Questions:
docs/install/docker.md