Conversation
Yikun added a commit that referenced this pull request on Oct 17, 2022:
### What changes were proposed in this pull request?

This patch:

- Adds a `spark` uid/gid in the Dockerfile (`useradd` and `groupadd`), used in the entrypoint. The same approach is used by [other DOIs](https://github.com/search?p=2&q=org%3Adocker-library+useradd&type=Code) and by Apache DOIs such as [zookeeper](https://github.com/31z4/zookeeper-docker/blob/master/3.8.0/Dockerfile#L17-L21), [solr](https://github.com/apache/solr-docker/blob/a20477ed123cd1a72132aebcc0742cee46b5f976/9.0/Dockerfile#L108-L110), and [flink](https://github.com/apache/flink-docker/blob/master/1.15/scala_2.12-java11-ubuntu/Dockerfile#L55-L56).
- Switches to the `spark` user in `entrypoint.sh` rather than in the Dockerfile, to make sure the Spark process is executed as a non-root user.
- Removes the `USER` setting from the Dockerfile, so that images built on top of this base image can still run root-only commands such as `apt update`.
- Chowns scripts to `spark:spark` instead of `root:root`, to avoid permission issues such as those in standalone mode.
- Adds a `gosu` dependency, a `sudo` replacement recommended by [Docker](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user) and the [Docker official images](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency), also used by other DOI images.

This change also follows the rules for Docker official images; see [consistency](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency) and the [Dockerfile best practices about USER](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user).

### Why are the changes needed?

The issues below are what I have found so far.

1. **Irregular login username**

   The image's username is not very standard; `docker run` showing a `185` username is a little odd.

   ```
   $ docker run -ti apache/spark bash
   185@d88a24357413:/opt/spark/work-dir$
   ```

2. **Permission issues in spark sbin scripts**

   There are also permission issues when running some Spark scripts, such as standalone mode:

   ```
   $ docker run -ti apache/spark /opt/spark/sbin/start-master.sh
   mkdir: cannot create directory ‘/opt/spark/logs’: Permission denied
   chown: cannot access '/opt/spark/logs': No such file or directory
   starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
   /opt/spark/sbin/spark-daemon.sh: line 135: /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out: No such file or directory
   failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --host 1c345a00e312 --port 7077 --webui-port 8080
   tail: cannot open '/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out' for reading: No such file or directory
   full log in /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
   ```

3. **Using spark as a base image is not supported well**

   Due to the static `USER` set in the Dockerfile:

   ```
   $ cat Dockerfile
   FROM apache/spark
   RUN apt update

   $ docker build -t spark-test:1015 .
   // ...
   ------
    > [2/2] RUN apt update:
   #5 0.405 E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
   #5 0.405 E: Unable to lock directory /var/lib/apt/lists/
   ------
   executor failed running [/bin/sh -c apt update]: exit code: 100
   ```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

- CI passed: all K8s tests.
- Regression tests:

  ```
  # Username is set to spark rather than 185
  $ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
  spark@27bbfca0a581:/opt/spark/work-dir$
  ```

  ```
  # start-master.sh has no permission issue
  $ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
  spark@8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
  starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
  ```

  ```
  # Image-as-parent case
  $ cat Dockerfile
  FROM spark:scala2.12-java11-python3-r-ubuntu
  RUN apt update

  $ docker build -t spark-test:1015 .
  [+] Building 7.8s (6/6) FINISHED
   => [1/2] FROM docker.io/library/spark:scala2.12-java11-python3-r-ubuntu  0.0s
   => [2/2] RUN apt update                                                  7.7s
  ```

- Other tests:

  ```
  # Test on pyspark
  $ cd spark-docker/3.3.0/scala2.12-java11-python3-r-ubuntu
  $ docker build -t spark:scala2.12-java11-python3-r-ubuntu .
  $ docker run -p 4040:4040 -ti spark:scala2.12-java11-python3-r-ubuntu /opt/spark/bin/pyspark
  ```

  ```
  # A simple test for `start-master.sh` (standalone mode)
  $ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
  spark@8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
  starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
  ```

Closes #11 from Yikun/spark-user.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
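The user/group plus `gosu` pattern described above can be sketched roughly as the following Dockerfile fragment. This is a minimal illustration of the technique, not the exact diff from this PR: the base image, the uid/gid value, and the entrypoint path are assumptions.

```dockerfile
# Illustrative sketch only -- not the exact Dockerfile from this PR.
FROM ubuntu:20.04

# Create a dedicated spark user/group with a fixed uid/gid (185 is assumed
# here for illustration), and install gosu so the entrypoint can drop
# privileges at runtime instead of baking a USER into the image.
RUN set -ex && \
    groupadd --system --gid=185 spark && \
    useradd --system --uid=185 --gid=spark spark && \
    apt-get update && \
    apt-get install -y gosu && \
    rm -rf /var/lib/apt/lists/*

# Give the spark user ownership of its directories instead of root:root,
# so scripts like start-master.sh can create /opt/spark/logs.
RUN mkdir -p /opt/spark/work-dir && chown -R spark:spark /opt/spark

# Note: no USER instruction. The switch to the spark user happens inside
# entrypoint.sh at runtime (e.g. `exec gosu spark "$@"`), so child images
# can still run root-only commands such as `apt update` in their RUN steps.
ENTRYPOINT ["/opt/entrypoint.sh"]
```

Because the `USER` switch is deferred to the entrypoint, `docker build` steps in derived images run as root, while the Spark process itself still runs as the non-root `spark` user.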
Force-pushed from f320097 to e1160cd.
Yikun added a commit that referenced this pull request on May 25, 2023:
### What changes were proposed in this pull request?

- This patch changes the `build-args` approach to "patch in test" in the build and publish workflows, because Docker official images do not support **parameterized FROM** values. See docker-library/official-images#13089 (comment).
- It also refactors the publish workflow accordingly.

### Why are the changes needed?

This is the same change as the build workflow refactor, to avoid publish failures like:

```
#5 [linux/amd64 internal] load metadata for docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu
#5 ERROR: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
------
 > [linux/amd64 internal] load metadata for docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu:
------
Dockerfile:18
--------------------
  16 |     #
  17 |     ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu
  18 | >>> FROM $BASE_IMAGE
  19 |
  20 |     RUN set -ex && \
--------------------
ERROR: failed to solve: spark:3.4.0-scala2.12-java11-ubuntu: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Error: buildx failed with: ERROR: failed to solve: spark:3.4.0-scala2.12-java11-ubuntu: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Publish test in my local fork:

- https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759: skips the local base build and uses the [published base](https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759#step:11:135) image:

  ```
  #3 [linux/amd64 internal] load metadata for ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu
  #3 DONE 0.9s
  #4 [linux/arm64 internal] load metadata for ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu
  #4 DONE 0.9s
  ```

- CI passed: does the local base build first, then builds on top of the local build.

Closes apache#39 from Yikun/publish-build.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
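The "patch in test" idea above can be sketched as a small shell fragment: instead of passing the base image as a parameterized `FROM` via `--build-arg` (which DOI disallows), the workflow rewrites the `FROM` line in the Dockerfile before building. The file layout and image names here are illustrative assumptions, not the exact workflow code.

```shell
# Hypothetical sketch of the "patch in test" approach; names are illustrative.
tmp=$(mktemp -d)
printf 'ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu\nFROM $BASE_IMAGE\n' > "$tmp/Dockerfile"

# Point FROM at the actually published base image and drop the ARG indirection,
# so the Dockerfile no longer needs a parameterized FROM at build time.
BASE_IMAGE="ghcr.io/example/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu"
sed -i -e '/^ARG BASE_IMAGE/d' -e "s|^FROM .*|FROM ${BASE_IMAGE}|" "$tmp/Dockerfile"

cat "$tmp/Dockerfile"
# FROM ghcr.io/example/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu
```

The patched Dockerfile can then be built as usual (e.g. `docker build "$tmp"`), which avoids the `pull access denied` failure seen when buildx tries to resolve a not-yet-published `FROM` value.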
No description provided.