[SPARK-22960][k8s] Make build-push-docker-images.sh more dev-friendly. #20154
Changes from 1 commit
First file (the Kubernetes docs page):

```diff
@@ -16,6 +16,8 @@ Kubernetes scheduler that has been added to Spark.
 you may setup a test cluster on your local machine using
 [minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
 * We recommend using the latest release of minikube with the DNS addon enabled.
+* Be aware that the default minikube configuration is not enough for running Spark applications.
+  You will need to increase the available memory and number of CPUs.
 * You must have appropriate permissions to list, create, edit and delete
 [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
 by running `kubectl auth can-i <list|create|edit|delete> pods`.
```
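The added lines advise bumping minikube's resources without saying how. As a concrete starting point, this is one way to do it (the resource values come from the review thread at the bottom of this page, not from the patch itself):

```sh
# Give minikube enough room for a driver plus one executor
# (reviewers suggest ~4 CPUs / 4g of memory; 6g for two executors).
minikube start --cpus 4 --memory 4096

# Then verify the pod permissions the docs call out, e.g.:
kubectl auth can-i create pods
```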
```diff
@@ -197,7 +199,7 @@ kubectl port-forward <driver-pod-name> 4040:4040
 
 Then, the Spark driver UI can be accessed on `http://localhost:4040`.
 
-### Debugging
+### Debugging
 
 There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
 connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
```
````diff
@@ -215,8 +217,8 @@ If the pod has encountered a runtime error, the status can be probed further usi
 kubectl logs <spark-driver-pod>
 ```
 
-Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
-application, includling all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
+Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
+application, includling all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
 the Spark application.
 
 ## Kubernetes Features
````
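For context, the inspect-and-clean-up flow this passage describes looks roughly like the following (`kubectl describe` is standard kubectl usage, not something this hunk adds):

```sh
# Probe a failed driver pod's status and events, then read its logs.
kubectl describe pod <spark-driver-pod>
kubectl logs <spark-driver-pod>

# Deleting the driver pod tears down the whole Spark application:
# all executors, the associated service, etc.
kubectl delete pod <spark-driver-pod>
```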
Second file (sbin/build-push-docker-images.sh):
```diff
@@ -19,51 +19,118 @@
 # This script builds and pushes docker images when run from a release of Spark
 # with Kubernetes support.
 
-declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
-  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \
-  [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile )
+function error {
+  echo "$@" 1>&2
+  exit 1
+}
+
+# Detect whether this is a git clone or a Spark distribution and adjust paths
+# accordingly.
+if [ -z "${SPARK_HOME}" ]; then
+  SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
+fi
+. "${SPARK_HOME}/bin/load-spark-env.sh"
+
+if [ -f "$SPARK_HOME/RELEASE" ]; then
+  IMG_PATH="kubernetes/dockerfiles"
+  SPARK_JARS="jars"
+else
+  IMG_PATH="resource-managers/kubernetes/docker/src/main/dockerfiles"
+  SPARK_JARS="assembly/target/scala-$SPARK_SCALA_VERSION/jars"
+fi
+
+if [ ! -d "$IMG_PATH" ]; then
+  error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark."
```
Contributor: Update this comment? I presume now it should say runnable distribution, or from source.

Contributor (author): The source directory is sort of a "runnable distribution" if Spark is built. I'd rather keep the message simple since it's mostly targeted at end users (not devs).

Contributor: SGTM
```diff
+fi
+
+declare -A path=( [spark-driver]="$IMG_PATH/driver/Dockerfile" \
+  [spark-executor]="$IMG_PATH/executor/Dockerfile" \
+  [spark-init]="$IMG_PATH/init-container/Dockerfile" )
+
+function image_ref {
+  local image="$1"
+  local add_repo="${2:-1}"
+  if [ $add_repo = 1 ] && [ -n "$REPO" ]; then
+    image="$REPO/$image"
+  fi
+  if [ -n "$TAG" ]; then
+    image="$image:$TAG"
+  fi
+  echo "$image"
+}
```
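To make the new helper concrete, here is how `image_ref` expands for a couple of inputs (my own worked examples, not part of the patch):

```sh
REPO=docker.io/myrepo TAG=v2.3.0
image_ref spark-driver    # -> docker.io/myrepo/spark-driver:v2.3.0
image_ref spark-base 0    # -> spark-base:v2.3.0 (second arg suppresses the repo)

REPO= TAG=
image_ref spark-driver    # -> spark-driver (bare name when repo and tag are unset)
```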
```diff
 
 function build {
-  docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
+  local base_image="$(image_ref spark-base 0)"
+  docker build --build-arg "spark_jars=$SPARK_JARS" \
+    --build-arg "img_path=$IMG_PATH" \
+    -t "$base_image" \
+    -f "$IMG_PATH/spark-base/Dockerfile" .
   for image in "${!path[@]}"; do
-    docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
+    docker build --build-arg "base_image=$base_image" -t "$(image_ref $image)" -f ${path[$image]} .
   done
 }
 
 function push {
   for image in "${!path[@]}"; do
-    docker push ${REPO}/$image:${TAG}
+    docker push "$(image_ref $image)"
   done
 }
```
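Expanded by hand with `-r docker.io/myrepo -t v2.3.0` on a distribution layout, the build step boils down to commands like these (my expansion, assuming the Dockerfiles sit at the computed paths):

```sh
# Base image carries no repo prefix (image_ref spark-base 0).
docker build --build-arg "spark_jars=jars" \
  --build-arg "img_path=kubernetes/dockerfiles" \
  -t spark-base:v2.3.0 \
  -f kubernetes/dockerfiles/spark-base/Dockerfile .

# Each derived image builds on top of it.
docker build --build-arg "base_image=spark-base:v2.3.0" \
  -t docker.io/myrepo/spark-driver:v2.3.0 \
  -f kubernetes/dockerfiles/driver/Dockerfile .
# ...and likewise for spark-executor and spark-init.
```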
```diff
 
 function usage {
-  echo "This script must be run from a runnable distribution of Apache Spark."
-  echo "Usage: ./sbin/build-push-docker-images.sh -r <repo> -t <tag> build"
-  echo "       ./sbin/build-push-docker-images.sh -r <repo> -t <tag> push"
-  echo "for example: ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push"
+  cat <<EOF
+Usage: $0 [options] [command]
+
+Builds or pushes the built-in Spark docker images.
+
+Commands:
+  build       Build docker images.
+  push        Push images to a registry. Requires a repository address to be provided, both
+              when building and when pushing the images.
+
+Options:
+  -r repo     Repository address.
+  -t tag      Tag to apply to built images, or to identify images to be pushed.
+  -m          Use minikube environment when invoking docker.
+
+Example:
+  $0 -r docker.io/myrepo -t v2.3.0 push
+EOF
 }
 
+if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
+  usage
+  exit 0
+fi
 
-while getopts r:t: option
+REPO=
+TAG=
+while getopts mr:t: option
 do
   case "${option}"
   in
   r) REPO=${OPTARG};;
   t) TAG=${OPTARG};;
+  m)
+    if ! which minikube 1>/dev/null; then
+      error "Cannot find minikube."
+    fi
+    eval $(minikube docker-env)
```
Contributor: I think building docker images right into the minikube VM's docker daemon is uncommon and not something we'd want to recommend. Users on minikube should also use a proper registry (for example, there is a registry addon that could be used). While this might be good to document as a local developer workflow, I'm apprehensive about adding a new flag just for this particular mode. Also, one could invoke `eval $(minikube docker-env)` separately before running the script.

Contributor (author): I started calling that command separately, but it's really annoying. This option is useful not just for Spark devs, but for people who want to try their own apps on minikube before trying them on a larger cluster, for example. What's the alternative? Deploying your own registry? I struggled with that for hours, and it's nearly impossible to get docker to talk to an insecure registry (or one with a self-signed cert like minikube's). This approach just worked (tm).

Contributor: I see your point - this is considerably easier. I spoke with a minikube maintainer, and it seems this is not as uncommon as I initially thought. So this change looks good, but I'd prefer that we add some more explanation to the usage section, noting that this will build images within the minikube environment, and also link to https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon. cc/ @aaron-prindle
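The manual alternative discussed above would look roughly like this (a sketch of the daemon-reuse workflow from the linked minikube docs, not part of the patch):

```sh
# Point the local docker client at minikube's daemon, then build as usual;
# the images become visible to pods in the minikube cluster without a push.
eval $(minikube docker-env)
./sbin/build-push-docker-images.sh -t testing build

# Revert to the host daemon afterwards.
eval $(minikube docker-env --unset)
```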
```diff
+    ;;
 esac
 done
 
-if [ -z "$REPO" ] || [ -z "$TAG" ]; then
-  usage
-else
-  case "${@: -1}" in
-    build) build;;
-    push) push;;
-    *) usage;;
-  esac
-fi
+case "${@: -1}" in
+  build)
+    build
+    ;;
+  push)
+    if [ -z "$REPO" ]; then
+      usage
+      exit 1
+    fi
+    push
+    ;;
+  *)
+    usage
+    exit 1
+    ;;
+esac
```
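With the reworked argument handling, invocations like the following should work (my own examples, built from the usage text above):

```sh
# Build images locally; no repository needed when only building:
./sbin/build-push-docker-images.sh -t v2.3.0 build

# Build and push to a registry (push refuses to run without -r):
./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 build
./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push

# Build straight into minikube's docker daemon:
./sbin/build-push-docker-images.sh -m -t testing build
```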
(Review thread on the docs note about increasing minikube memory and CPUs:)

Contributor: I think if we're going into detail, we should specify a certain config here. 6G of memory and at least 2 CPUs? @liyinan926, do you recall what we typically need for SparkPi?

Contributor: Driver + default minikube overhead uses 1.25 CPUs from what I remember seeing in the dashboard. Don't remember the memory usage. So I'd say 4 CPUs + 4g of memory (to allow driver + a single executor), 6g if you want two executors.

Contributor: I remember 3 CPUs being the minimum, considering that kube-system pods will use some CPU cores. For me, 4G of memory worked fine.