diff --git a/docker/README.md b/docker/README.md
index 19293de188448..226775184e84f 100644
--- a/docker/README.md
+++ b/docker/README.md
@@ -90,4 +90,98 @@ You can find more information on [Docker Hub Repositories Manual](https://docs.d

## Docker Demo Setup

Please refer to the [Docker Demo Docs page](https://hudi.apache.org/docs/docker_demo).

## Building Multi-Arch Images

NOTE: The steps below require some code changes. Support for multi-arch builds in a fully automated manner is being
tracked by [HUDI-3601](https://issues.apache.org/jira/browse/HUDI-3601).

By default, the Docker images are built for the x86_64 (amd64) architecture. Docker `buildx` lets you build multi-arch
images, link them together with a manifest file, and push them all to a registry with a single command. Suppose we
want to build for the arm64 architecture. First, make sure `buildx` is set up locally.
Follow the steps below (adapted from https://www.docker.com/blog/multi-arch-images):

```
# List builders
~ ❯❯❯ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS  PLATFORMS
default * docker
  default default         running linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6

# If you are using the default builder (i.e. the legacy builder), create and use a new one
~ ❯❯❯ docker buildx create --name mybuilder
mybuilder
~ ❯❯❯ docker buildx use mybuilder
~ ❯❯❯ docker buildx inspect --bootstrap
[+] Building 2.5s (1/1) FINISHED
 => [internal] booting buildkit                        2.5s
 => => pulling image moby/buildkit:master              1.3s
 => => creating container buildx_buildkit_mybuilder0   1.2s
Name:   mybuilder
Driver: docker-container

Nodes:
Name:     mybuilder0
Endpoint: unix:///var/run/docker.sock
Status:   running

Platforms: linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6
```

Now go to `docker/hoodie/hadoop` and change the `Dockerfile`s to pull the dependent images built for
arm64. For example, in [base/Dockerfile](./hoodie/hadoop/base/Dockerfile) (which pulls a JDK 8 image), change the
line `FROM openjdk:8u212-jdk-slim-stretch` to `FROM arm64v8/openjdk:8u212-jdk-slim-stretch`.

Then, from the `docker/hoodie/hadoop` directory, execute the following command to build the image and
push it to the Docker Hub repo:

```
# Run under hoodie/hadoop; the <tag> is optional, "latest" by default
docker buildx build <image_dir> --platform <platform> -t <dockerhub_repo>/<image_name>[:<tag>] --push

# For example, to build the base image
docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
```

Once the base image is pushed, you can do the same for the other images.
Change the [hive](./hoodie/hadoop/hive_base/Dockerfile) Dockerfile to pull the base image with the tag corresponding to
the linux/arm64 platform.

```
# Change this line in the Dockerfile
FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:latest
# to the following
FROM --platform=linux/arm64 apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:linux-arm64-0.10.1

# and then build & push from the hoodie/hadoop dir
docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
```

Similarly, for the images that depend on hive (e.g. [base spark](./hoodie/hadoop/spark_base/Dockerfile),
[sparkmaster](./hoodie/hadoop/sparkmaster/Dockerfile), [sparkworker](./hoodie/hadoop/sparkworker/Dockerfile)
and [sparkadhoc](./hoodie/hadoop/sparkadhoc/Dockerfile)), change the corresponding Dockerfile to pull the base hive
image with the arm64 tag, then build and push using the `docker buildx` command.

For the sake of completeness, here is a [patch](https://gist.github.com/xushiyan/cec16585e884cf0693250631a1d10ec2) which
shows the changes to make in the Dockerfiles (assuming the tag is named `linux-arm64-0.10.1`), and below is the full list
of `docker buildx` commands.

```
docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
docker buildx build datanode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-datanode:linux-arm64-0.10.1 --push
docker buildx build historyserver --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1 --push
docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
docker buildx build namenode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-namenode:linux-arm64-0.10.1 --push
docker buildx build prestobase --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1 --push
docker buildx build spark_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkadhoc --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkmaster --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkworker --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:linux-arm64-0.10.1 --push
```

Once all the required images are pushed to the Docker Hub repos, one additional change is needed in the
[docker compose](./compose/docker-compose_hadoop284_hive233_spark244.yml) file.
Apply [this patch](https://gist.github.com/codope/3dd986de5e54f0650dd74b6032e4456c) to the docker compose file so
that [setup_demo](./setup_demo.sh) pulls the images with the correct tag for arm64. Now we should be ready to run the
setup script and follow the docker demo.
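
As a rough illustration of the compose change (the linked patch is the authoritative version), each service's `image:` line switches from the default tag to the arm64 tag pushed above. The service shown below is just one hypothetical example:

```
# Hypothetical excerpt from the docker compose file:
# point the service at the arm64 tag instead of "latest".
services:
  namenode:
    image: apachehudi/hudi-hadoop_2.8.4-namenode:linux-arm64-0.10.1
```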
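
The ten `buildx` commands listed earlier can also be driven from a small script. The sketch below is a non-authoritative convenience (the directory-to-image mapping and the `REPO`/`TAG` values mirror the list above; adjust them for your registry). It defaults to printing the commands and only executes them when `DRY_RUN=0`:

```shell
#!/usr/bin/env bash
# Sketch only: loop over the image build-context directories and run the
# same `docker buildx build ... --push` command for each one.
set -euo pipefail

REPO="apachehudi"
TAG="linux-arm64-0.10.1"
PLATFORM="linux/arm64"

# build-context directory -> image name (same pairs as the command list above)
declare -A IMAGES=(
  [base]="hudi-hadoop_2.8.4-base"
  [datanode]="hudi-hadoop_2.8.4-datanode"
  [historyserver]="hudi-hadoop_2.8.4-history"
  [hive_base]="hudi-hadoop_2.8.4-hive_2.3.3"
  [namenode]="hudi-hadoop_2.8.4-namenode"
  [prestobase]="hudi-hadoop_2.8.4-prestobase_0.217"
  [spark_base]="hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_2.4.4"
  [sparkadhoc]="hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4"
  [sparkmaster]="hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4"
  [sparkworker]="hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4"
)

# Assemble one buildx command per image directory.
cmds=()
for dir in "${!IMAGES[@]}"; do
  cmds+=("docker buildx build $dir --platform $PLATFORM -t $REPO/${IMAGES[$dir]}:$TAG --push")
done

# DRY_RUN=1 (the default here) prints the commands; DRY_RUN=0 executes them.
for cmd in "${cmds[@]}"; do
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "$cmd"
  else
    $cmd
  fi
done
```

Run it from the `docker/hoodie/hadoop` directory; inspect the dry-run output first, then re-run with `DRY_RUN=0` to build and push.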