Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add official image for Apache Kafka #16970

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

KrishVora01
Copy link

@KrishVora01 KrishVora01 commented Jun 12, 2024

Hi there, this aims to introduce a Docker Official Image for Apache Kafka.

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
To know more about Apache Kafka, please read the documentation.

Checklist for Review

  • associated with or contacted upstream?

The proposal for this has been approved in KIP-1028.

Licensed under Apache License 2.0

  • does it fit into one of the common categories? ("service", "language stack", "base distribution")

Yes

  • is it reasonably popular, or does it solve a particular use case well?

Yes and Yes

  • does a documentation PR exist? (should be reviewed and merged at roughly the same time so that we don't have an empty image page on the Hub for very long)

Yes: the documentation PR is raised here.

Yes, however, if we have missed anything, please let us know

  • 2+ official-images maintainer dockerization review?
  • existing official images have been considered as a base? (ie, if foobar needs Node.js, has FROM node:... instead of grabbing node via other means been considered?)

FROM eclipse-temurin

  • [ ] if FROM scratch, tarballs only exist in a single commit within the associated history?

N/A

@KrishVora01 KrishVora01 requested a review from a team as a code owner June 12, 2024 09:43
@KrishVora01
Copy link
Author

The proposal for this has been approved in KIP-1028.

Copy link

Diff for c56d98c:
diff --git a/_bashbrew-arches b/_bashbrew-arches
index 8b13789..e85a97f 100644
--- a/_bashbrew-arches
+++ b/_bashbrew-arches
@@ -1 +1,2 @@
-
+amd64
+arm64v8
diff --git a/_bashbrew-cat b/_bashbrew-cat
index bdfae4a..0504d7d 100644
--- a/_bashbrew-cat
+++ b/_bashbrew-cat
@@ -1 +1,7 @@
-Maintainers: New Image! :D (@docker-library-bot)
+Maintainers: The Apache Kafka Project <[email protected]> (@ApacheKafka)
+GitRepo: https://github.com/apache/kafka.git
+
+Tags: 3.7.0, latest
+Architectures: amd64, arm64v8
+GitCommit: b7dcae44ffb29f854385cef959519a3d0baad55e
+Directory: docker/docker_official_images/3.7.0/jvm
diff --git a/_bashbrew-list b/_bashbrew-list
index e69de29..8319254 100644
--- a/_bashbrew-list
+++ b/_bashbrew-list
@@ -0,0 +1,2 @@
+kafka:3.7.0
+kafka:latest
diff --git a/kafka_latest/Dockerfile b/kafka_latest/Dockerfile
new file mode 100755
index 0000000..905e2f2
--- /dev/null
+++ b/kafka_latest/Dockerfile
@@ -0,0 +1,95 @@
+###############################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+###############################################################################
+
+FROM eclipse-temurin:21-jre-alpine AS build-jsa
+
+USER root
+
+# Get Kafka from https://archive.apache.org/dist/kafka, url passed as env var, for version 3.7.0
+ENV kafka_url https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz
+
+COPY jsa_launch /etc/kafka/docker/jsa_launch
+
+RUN set -eux ; \
+    apk update ; \
+    apk upgrade ; \
+    apk add --no-cache wget gcompat gpg gpg-agent procps bash; \
+    mkdir opt/kafka; \
+    wget -nv -O kafka.tgz "$kafka_url"; \
+    wget -nv -O kafka.tgz.asc "$kafka_url.asc"; \
+    tar xfz kafka.tgz -C /opt/kafka --strip-components 1; \
+    wget -nv -O KEYS https://downloads.apache.org/kafka/KEYS; \
+    gpg --import KEYS; \
+    gpg --batch --verify kafka.tgz.asc kafka.tgz
+
+# Generate jsa files using dynamic CDS for kafka server start command and kafka storage format command
+RUN /etc/kafka/docker/jsa_launch
+
+
+FROM eclipse-temurin:21-jre-alpine
+
+# exposed ports
+EXPOSE 9092
+
+USER root
+
+# Get Kafka from https://archive.apache.org/dist/kafka, url passed as env var, for version 3.7.0
+ENV kafka_url https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz
+ENV build_date 2024-06-11
+
+
+LABEL org.label-schema.name="kafka" \
+      org.label-schema.description="Apache Kafka" \
+      org.label-schema.build-date="${build_date}" \
+      org.label-schema.vcs-url="https://github.com/apache/kafka" \
+      maintainer="Apache Kafka"
+
+RUN set -eux ; \
+    apk update ; \
+    apk upgrade ; \
+    apk add --no-cache wget gcompat gpg gpg-agent procps bash; \
+    mkdir opt/kafka; \
+    wget -nv -O kafka.tgz "$kafka_url"; \
+    wget -nv -O kafka.tgz.asc "$kafka_url.asc"; \
+    tar xfz kafka.tgz -C /opt/kafka --strip-components 1; \
+    wget -nv -O KEYS https://downloads.apache.org/kafka/KEYS; \
+    gpg --import KEYS; \
+    gpg --batch --verify kafka.tgz.asc kafka.tgz; \
+    mkdir -p /var/lib/kafka/data /etc/kafka/secrets; \
+    mkdir -p /etc/kafka/docker /usr/logs /mnt/shared/config; \
+    adduser -h /home/appuser -D --shell /bin/bash appuser; \
+    chown appuser:appuser -R /usr/logs /opt/kafka /mnt/shared/config; \
+    chown appuser:root -R /var/lib/kafka /etc/kafka/secrets /etc/kafka; \
+    chmod -R ug+w /etc/kafka /var/lib/kafka /etc/kafka/secrets; \
+    cp /opt/kafka/config/log4j.properties /etc/kafka/docker/log4j.properties; \
+    cp /opt/kafka/config/tools-log4j.properties /etc/kafka/docker/tools-log4j.properties; \
+    cp /opt/kafka/config/kraft/server.properties /etc/kafka/docker/server.properties; \
+    rm kafka.tgz kafka.tgz.asc KEYS; \
+    apk del wget gpg gpg-agent; \
+    apk cache clean;
+
+COPY --from=build-jsa kafka.jsa /opt/kafka/kafka.jsa
+COPY --from=build-jsa storage.jsa /opt/kafka/storage.jsa
+COPY --chown=appuser:appuser resources/common-scripts /etc/kafka/docker
+COPY --chown=appuser:appuser launch /etc/kafka/docker/launch
+
+USER appuser
+
+VOLUME ["/etc/kafka/secrets", "/var/lib/kafka/data", "/mnt/shared/config"]
+
+CMD ["/etc/kafka/docker/run"]
diff --git a/kafka_latest/jsa_launch b/kafka_latest/jsa_launch
new file mode 100755
index 0000000..904fbbe
--- /dev/null
+++ b/kafka_latest/jsa_launch
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+KAFKA_CLUSTER_ID="$(opt/kafka/bin/kafka-storage.sh random-uuid)"
+TOPIC="test-topic"
+
+KAFKA_JVM_PERFORMANCE_OPTS="-XX:ArchiveClassesAtExit=storage.jsa" opt/kafka/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c opt/kafka/config/kraft/server.properties
+
+KAFKA_JVM_PERFORMANCE_OPTS="-XX:ArchiveClassesAtExit=kafka.jsa" opt/kafka/bin/kafka-server-start.sh opt/kafka/config/kraft/server.properties &
+
+check_timeout() {
+    if [ $TIMEOUT -eq 0 ]; then
+        echo "Server startup timed out"
+        exit 1
+    fi
+    echo "Check will timeout in $(( TIMEOUT-- )) seconds"
+    sleep 1
+}
+
+opt/kafka/bin/kafka-topics.sh --create --topic $TOPIC --bootstrap-server localhost:9092
+[ $? -eq 0 ] || exit 1
+
+echo "test" | opt/kafka/bin/kafka-console-producer.sh --topic $TOPIC --bootstrap-server localhost:9092
+[ $? -eq 0 ] || exit 1
+
+opt/kafka/bin/kafka-console-consumer.sh --topic $TOPIC --from-beginning --bootstrap-server localhost:9092 --max-messages 1 --timeout-ms 20000
+[ $? -eq 0 ] || exit 1
+
+opt/kafka/bin/kafka-server-stop.sh
+
+# Wait until jsa file is generated
+TIMEOUT=20
+until [ -f /kafka.jsa ]
+do
+    check_timeout
+done
\ No newline at end of file
diff --git a/kafka_latest/launch b/kafka_latest/launch
new file mode 100755
index 0000000..6c4ca1d
--- /dev/null
+++ b/kafka_latest/launch
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Override this section from the script to include the com.sun.management.jmxremote.rmi.port property.
+if [ -z "${KAFKA_JMX_OPTS-}" ]; then
+    export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true \
+    -Dcom.sun.management.jmxremote.authenticate=false \
+    -Dcom.sun.management.jmxremote.ssl=false "
+fi
+
+# The JMX client needs to be able to connect to java.rmi.server.hostname.
+# The default for bridged n/w is the bridged IP so you will only be able to connect from another docker container.
+# For host n/w, this is the IP that the hostname on the host resolves to.
+
+# If you have more than one n/w configured, hostname -i gives you all the IPs,
+# the default is to pick the first IP (or network).
+export KAFKA_JMX_HOSTNAME=${KAFKA_JMX_HOSTNAME:-$(hostname -i | cut -d" " -f1)}
+
+if [ "${KAFKA_JMX_PORT-}" ]; then
+  # This ensures that the "if" section for JMX_PORT in kafka launch script does not trigger.
+    export JMX_PORT=$KAFKA_JMX_PORT
+    export KAFKA_JMX_OPTS="${KAFKA_JMX_OPTS-} -Djava.rmi.server.hostname=$KAFKA_JMX_HOSTNAME \
+    -Dcom.sun.management.jmxremote.local.only=false \
+    -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT \
+    -Dcom.sun.management.jmxremote.port=$JMX_PORT"
+fi
+
+# Make a temp env variable to store user provided performance otps
+if [ -z "${KAFKA_JVM_PERFORMANCE_OPTS-}" ]; then
+    export TEMP_KAFKA_JVM_PERFORMANCE_OPTS=""
+else
+    export TEMP_KAFKA_JVM_PERFORMANCE_OPTS="$KAFKA_JVM_PERFORMANCE_OPTS"
+fi
+
+# We will first use CDS for storage to format storage
+export KAFKA_JVM_PERFORMANCE_OPTS="${KAFKA_JVM_PERFORMANCE_OPTS-} -XX:SharedArchiveFile=/opt/kafka/storage.jsa"
+
+echo "===> Using provided cluster id $CLUSTER_ID ..."
+
+# Invoke the docker wrapper to setup property files and format storage
+result=$(/opt/kafka/bin/kafka-run-class.sh kafka.docker.KafkaDockerWrapper setup \
+      --default-configs-dir /etc/kafka/docker \
+      --mounted-configs-dir /mnt/shared/config \
+      --final-configs-dir /opt/kafka/config 2>&1) || \
+      echo $result | grep -i "already formatted" || \
+      { echo $result && (exit 1) }
+
+# Using temp env variable to get rid of storage CDS command
+export KAFKA_JVM_PERFORMANCE_OPTS="$TEMP_KAFKA_JVM_PERFORMANCE_OPTS"
+
+# Now we will use CDS for kafka to start kafka server
+export KAFKA_JVM_PERFORMANCE_OPTS="$KAFKA_JVM_PERFORMANCE_OPTS -XX:SharedArchiveFile=/opt/kafka/kafka.jsa"
+
+# Start kafka broker
+exec /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
diff --git a/kafka_latest/resources/common-scripts/bash-config b/kafka_latest/resources/common-scripts/bash-config
new file mode 100755
index 0000000..b697161
--- /dev/null
+++ b/kafka_latest/resources/common-scripts/bash-config
@@ -0,0 +1,23 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -o nounset \
+    -o errexit
+
+# Trace may expose passwords/credentials by printing them to stdout, so turn on with care.
+if [ "${TRACE:-}" == "true" ]; then
+  set -o verbose \
+      -o xtrace
+fi
diff --git a/kafka_latest/resources/common-scripts/configure b/kafka_latest/resources/common-scripts/configure
new file mode 100755
index 0000000..deb82ae
--- /dev/null
+++ b/kafka_latest/resources/common-scripts/configure
@@ -0,0 +1,121 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+ensure() {
+  if [[ -z "${!1}" ]]; then
+    echo "$1 environment variable not set"
+    exit 1
+  fi
+}
+
+path() {
+  if [[ $2 == "writable" ]]; then
+    if [[ ! -w "$1" ]]; then
+      echo "$1 file not writable"
+      exit 1
+    fi
+  elif [[ $2 == "existence" ]]; then
+    if [[ ! -e "$1" ]]; then
+      echo "$1 file does not exist"
+      exit 1
+    fi
+  fi
+}
+
+# unset KAFKA_ADVERTISED_LISTENERS from ENV in KRaft mode when running as controller only
+if [[ -n "${KAFKA_PROCESS_ROLES-}" ]]
+then
+  echo "Running in KRaft mode..."
+  ensure CLUSTER_ID
+  if [[ $KAFKA_PROCESS_ROLES == "controller" ]]
+  then
+    if [[ -n "${KAFKA_ADVERTISED_LISTENERS-}" ]]
+    then
+      echo "KAFKA_ADVERTISED_LISTENERS is not supported on a KRaft controller."
+      exit 1
+    else
+      # Unset in case env variable is set with empty value
+      unset KAFKA_ADVERTISED_LISTENERS
+    fi
+  fi 
+fi
+
+# By default, LISTENERS is derived from ADVERTISED_LISTENERS by replacing
+# hosts with 0.0.0.0. This is good default as it ensures that the broker
+# process listens on all ports.
+if [[ -z "${KAFKA_LISTENERS-}" ]] && ( [[ -z "${KAFKA_PROCESS_ROLES-}" ]] || [[ $KAFKA_PROCESS_ROLES != "controller" ]] ) && [[ -n "${KAFKA_ADVERTISED_LISTENERS-}" ]]
+then
+  export KAFKA_LISTENERS
+  KAFKA_LISTENERS=$(echo "$KAFKA_ADVERTISED_LISTENERS" | sed -e 's|://[^:]*:|://0.0.0.0:|g')
+fi
+
+path /opt/kafka/config/ writable
+
+# Set if ADVERTISED_LISTENERS has SSL:// or SASL_SSL:// endpoints.
+if [[ -n "${KAFKA_ADVERTISED_LISTENERS-}" ]] && [[ $KAFKA_ADVERTISED_LISTENERS == *"SSL://"* ]]
+then
+  echo "SSL is enabled."
+
+  ensure KAFKA_SSL_KEYSTORE_FILENAME
+  export KAFKA_SSL_KEYSTORE_LOCATION="/etc/kafka/secrets/$KAFKA_SSL_KEYSTORE_FILENAME"
+  path "$KAFKA_SSL_KEYSTORE_LOCATION" existence
+
+  ensure KAFKA_SSL_KEY_CREDENTIALS
+  KAFKA_SSL_KEY_CREDENTIALS_LOCATION="/etc/kafka/secrets/$KAFKA_SSL_KEY_CREDENTIALS"
+  path "$KAFKA_SSL_KEY_CREDENTIALS_LOCATION" existence
+  export KAFKA_SSL_KEY_PASSWORD
+  KAFKA_SSL_KEY_PASSWORD=$(cat "$KAFKA_SSL_KEY_CREDENTIALS_LOCATION")
+
+  ensure KAFKA_SSL_KEYSTORE_CREDENTIALS
+  KAFKA_SSL_KEYSTORE_CREDENTIALS_LOCATION="/etc/kafka/secrets/$KAFKA_SSL_KEYSTORE_CREDENTIALS"
+  path "$KAFKA_SSL_KEYSTORE_CREDENTIALS_LOCATION" existence
+  export KAFKA_SSL_KEYSTORE_PASSWORD
+  KAFKA_SSL_KEYSTORE_PASSWORD=$(cat "$KAFKA_SSL_KEYSTORE_CREDENTIALS_LOCATION")
+
+  if [[ -n "${KAFKA_SSL_CLIENT_AUTH-}" ]] && ( [[ $KAFKA_SSL_CLIENT_AUTH == *"required"* ]] || [[ $KAFKA_SSL_CLIENT_AUTH == *"requested"* ]] )
+  then
+      ensure KAFKA_SSL_TRUSTSTORE_FILENAME
+      export KAFKA_SSL_TRUSTSTORE_LOCATION="/etc/kafka/secrets/$KAFKA_SSL_TRUSTSTORE_FILENAME"
+      path "$KAFKA_SSL_TRUSTSTORE_LOCATION" existence
+
+      ensure KAFKA_SSL_TRUSTSTORE_CREDENTIALS
+      KAFKA_SSL_TRUSTSTORE_CREDENTIALS_LOCATION="/etc/kafka/secrets/$KAFKA_SSL_TRUSTSTORE_CREDENTIALS"
+      path "$KAFKA_SSL_TRUSTSTORE_CREDENTIALS_LOCATION" existence
+      export KAFKA_SSL_TRUSTSTORE_PASSWORD
+      KAFKA_SSL_TRUSTSTORE_PASSWORD=$(cat "$KAFKA_SSL_TRUSTSTORE_CREDENTIALS_LOCATION")
+  fi
+fi
+
+# Set if KAFKA_ADVERTISED_LISTENERS has SASL_PLAINTEXT:// or SASL_SSL:// endpoints.
+if [[ -n "${KAFKA_ADVERTISED_LISTENERS-}" ]] && [[ $KAFKA_ADVERTISED_LISTENERS =~ .*SASL_.*://.* ]]
+then
+  echo "SASL" is enabled.
+
+  ensure KAFKA_OPTS
+
+  if [[ ! $KAFKA_OPTS == *"java.security.auth.login.config"*  ]]
+  then
+    echo "KAFKA_OPTS should contain 'java.security.auth.login.config' property."
+  fi
+fi
+
+if [[ -n "${KAFKA_JMX_OPTS-}" ]]
+then
+  if [[ ! $KAFKA_JMX_OPTS == *"com.sun.management.jmxremote.rmi.port"*  ]]
+  then
+    echo "KAFKA_OPTS should contain 'com.sun.management.jmxremote.rmi.port' property. It is required for accessing the JMX metrics externally."
+  fi
+fi
diff --git a/kafka_latest/resources/common-scripts/configureDefaults b/kafka_latest/resources/common-scripts/configureDefaults
new file mode 100755
index 0000000..14d2854
--- /dev/null
+++ b/kafka_latest/resources/common-scripts/configureDefaults
@@ -0,0 +1,28 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+declare -A env_defaults
+env_defaults=(
+# Replace CLUSTER_ID with a unique base64 UUID using "bin/kafka-storage.sh random-uuid"
+  ["CLUSTER_ID"]="5L6g3nShT-eMCtK--X86sw"
+)
+
+for key in "${!env_defaults[@]}"; do
+  if [[ -z "${!key:-}" ]]; then
+    echo ${key} not set. Setting it to default value: \"${env_defaults[$key]}\"
+    export "$key"="${env_defaults[$key]}"
+  fi
+done
diff --git a/kafka_latest/resources/common-scripts/run b/kafka_latest/resources/common-scripts/run
new file mode 100755
index 0000000..678aeb6
--- /dev/null
+++ b/kafka_latest/resources/common-scripts/run
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+. /etc/kafka/docker/bash-config
+
+# Set environment values if they exist as arguments
+if [ $# -ne 0 ]; then
+  echo "===> Overriding env params with args ..."
+  for var in "$@"
+  do
+    export "$var"
+  done
+fi
+
+echo "===> User"
+id
+
+echo "===> Setting default values of environment variables if not already set."
+. /etc/kafka/docker/configureDefaults
+
+echo "===> Configuring ..."
+. /etc/kafka/docker/configure
+
+echo "===> Launching ... "
+. /etc/kafka/docker/launch

Copy link
Contributor

@whalelines whalelines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your interest in the Docker Official Images program and for your submission to create a kafka DOI!

Here is an initial review of this PR:

  1. USER root is unnecessary

  2. What is the purpose of executing the logic in jsa_launch at build time? Why could the .jsa files not be generated ahead of time and distributed with kafka? If they need machine information, why would they not be done at container startup?

  3. apk update is unnecessary with --no-cache

  4. apk upgrade is not allowed as it makes the resulting image dependent on when the build occurs and breaks the caching/determinism of rebuild calculations based on updated parent images.

  5. Specific, individual GPG keys should be referenced by ID. The current implementation provides no protection from an attacker that controls the means of distribution, i.e., if they replace the source they can replace the KEYS file.

  6. All invocations of gpg should include --batch

  7. The kafka tarball should be verified prior to being unpacked

  8. The DOI build system sets appropriate annotations, so we recommend not setting labels, which have been deprecated. See https://github.com/opencontainers/image-spec/blob/main/annotations.md .

  9. Is apk cache clean necessary since you have previously specified --no-cache?

  10. It is odd to see so many Dockerization-specific shell scripts. Is running in Docker so different that these are not part of the upstream distribution?

  11. In general, application files should not be owned by the user the application is running as. Is there are reason the files under /etc/kafka/docker and /etc/kafka/docker/launch are owned by/writeable by the appuser?

  12. Per archive.apache.org service's notice, it should only be used as a last resort fallback for when a release has been superseded, and the standard apache.org CDN should be preferred for actively supported releases instead

  13. In Docker Official Images, we try hard to avoid multi-stage builds in almost every instance. Since this does a multi-stage build to not clean up after generating the jsa files, we would rather that it be just a regular non-multistage build. Perhaps the JSA stage can be eliminated (see 2) or, if it is necessary for it to run at build time, to can be moved to the main stage with proper cleanup. It will help make it clearer for users viewing docker history where things came from. See also the FAQ section on multi-stage builds.

Thanks again!

@VedarthConfluent
Copy link

@whalelines Thanks for providing valuable feedback!
For few of the comments here are my thoughts:-

In Docker Official Images, we try hard to avoid multi-stage builds in almost every instance. Since this does a multi-stage build to not clean up after generating the jsa files, we would rather that it be just a regular non-multistage build. Perhaps the JSA stage can be eliminated (see 2) or, if it is necessary for it to run at build time, to can be moved to the main stage with proper cleanup. It will help make it clearer for users viewing docker history where things came from. See also the FAQ section on multi-stage builds.

Multi stage build allows us to just build the required artifact and use it in the final image. While I agree it is possible to move that logic in the final image and do the cleanup, but it will require more changes and testing. Multi stage build just seems like a cleaner solution. In the FAQ section it was mentioned that multi stage builds are allowed. If this is not too much of an issue would it be possible to keep the multi stage build? Otherwise we will go with doing everything in the single stage and cleaning it up, as you recommended.

It is odd to see so many Dockerization-specific shell scripts. Is running in Docker so different that these are not part of the upstream distribution?

These scripts make the usage of docker image more convenient. We have moved some logic in the upstream itself, but these scripts are essential. Some logic which especially facilitates bringing up the java process up itself, cannot be moved into upstream. Maybe in future this can be reduced by moving more logic into upstream. But for now, we want to keep it this way.

In general, application files should not be owned by the user the application is running as. Is there are reason the files under /etc/kafka/docker and /etc/kafka/docker/launch are owned by/writeable by the appuser?

For /etc/kafka/docker/launch we don't need the write access, but at runtime we generate and modify files inside /etc/kafka/docker, hence write access is required to this folder. Maybe this can be improved in the future. But this is the current behaviour.

Per archive.apache.org service's notice, it should only be used as a last resort fallback for when a release has been superseded, and the standard apache.org CDN should be preferred for actively supported releases instead

This is something that we confirmed from upstream's maintainers and they recommended us to use archive.apache.org

Except for these points, we will be incorporating rest of the suggestions in the Dockerfile.

@KrishVora01
Copy link
Author

Hi @whalelines ! Thank you for the suggestions.
We have raised a PR in Apache Kafka's repository with the changes you have suggested (the remaining comments have been addressed by @VedarthConfluent above). The new Dockerfile that we are proposing is here .
Could you please have a look at it, and give us your review on whether the new changes follow the best practices?
Thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants