@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Aug 23, 2023

What changes were proposed in this pull request?

This PR aims to fix the RELEASE file so that it has the correct information in Docker images if a RELEASE file exists.

Please note that the RELEASE file doesn't exist in the SPARK_HOME directory when we run the K8s integration test from the Spark Git repository. So, we keep the existing empty RELEASE file generation and use COPY conditionally via glob syntax.
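
As a rough illustration of the behavior described above (not the Dockerfile itself), the two steps can be sketched in plain shell: always create an empty placeholder, then overwrite it only when the build context has a RELEASE file. The function name `stage_release`, its arguments, and the temporary directories are hypothetical and exist only for this sketch.

```shell
# Hypothetical stand-in for the two Dockerfile steps:
#   1. always create an empty RELEASE placeholder
#   2. overwrite it only if the build context contains a RELEASE file
stage_release() {
  sr_context="$1"   # directory standing in for the Docker build context
  sr_target="$2"    # directory standing in for /opt/spark in the image
  mkdir -p "$sr_target"
  : > "$sr_target/RELEASE"                      # empty placeholder
  if [ -f "$sr_context/RELEASE" ]; then
    cp "$sr_context/RELEASE" "$sr_target/RELEASE"   # conditional copy
  fi
}

ctx="$(mktemp -d)"
root="$(mktemp -d)"

# Git checkout case: no RELEASE file in the context.
stage_release "$ctx" "$root"
if [ -s "$root/RELEASE" ]; then echo "populated"; else echo "empty"; fi

# Binary distribution case: a RELEASE file is present.
echo "Spark 3.5.0 (example line)" > "$ctx/RELEASE"
stage_release "$ctx" "$root"
if [ -s "$root/RELEASE" ]; then echo "populated"; else echo "empty"; fi

rm -rf "$ctx" "$root"
```

The first call prints `empty` and the second prints `populated`, which mirrors the integration-test build and the binary-distribution build respectively.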

Why are the changes needed?

Currently, it's an empty file in the official Apache Spark Docker images.

$ docker run -it --rm apache/spark:latest ls -al /opt/spark/RELEASE
-rw-r--r-- 1 spark spark 0 Jun 25 03:13 /opt/spark/RELEASE

$ docker run -it --rm apache/spark:v3.1.3 ls -al /opt/spark/RELEASE | tail -n1
-rw-r--r-- 1 root root 0 Feb 21  2022 /opt/spark/RELEASE

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually build the image and check it with docker run -it --rm NEW_IMAGE ls -al /opt/spark/RELEASE

I copied this Dockerfile into Apache Spark 3.5.0 RC2 binary distribution and tested in the following way.

$ cd spark-3.5.0-rc2-bin-hadoop3

$ cp /tmp/Dockerfile kubernetes/dockerfiles/spark/Dockerfile

$ bin/docker-image-tool.sh -t SPARK-44935 build

$ docker run -it --rm docker.io/library/spark:SPARK-44935 ls -al /opt/spark/RELEASE | tail -n1
-rw-r--r-- 1 root root 165 Aug 18 21:10 /opt/spark/RELEASE

$ docker run -it --rm docker.io/library/spark:SPARK-44935 cat /opt/spark/RELEASE | tail -n2
Spark 3.5.0 (git revision 010c4a6a05) built for Hadoop 3.3.4
Build flags: -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-3 -Phive -Phive-thriftserver
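
The manual check above could be scripted along these lines. This is only a sketch: `release_file_ok` is a hypothetical helper, and the assumption that a populated RELEASE file contains a line starting with `Spark ` is taken from the output shown above, not from any documented format.

```shell
# Hypothetical helper: succeed only if the given file is non-empty and
# contains a line beginning with "Spark " (as in the output shown above).
release_file_ok() {
  [ -s "$1" ] && grep -q '^Spark ' "$1"
}

# Local demonstration with a sample line copied from the PR description.
# Against a real image, the file could instead be captured with something like
#   docker run --rm NEW_IMAGE cat /opt/spark/RELEASE > "$sample"
# where NEW_IMAGE is a placeholder for the image under test.
sample="$(mktemp)"
printf '%s\n' 'Spark 3.5.0 (git revision 010c4a6a05) built for Hadoop 3.3.4' > "$sample"

if release_file_ok "$sample"; then
  echo "RELEASE populated"
else
  echo "RELEASE empty or unexpected"
fi

rm -f "$sample"
```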

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-44935][K8S] Fix RELEASE file to have the correct information in Docker images" to "[SPARK-44935][K8S] Fix RELEASE file to have the correct information in Docker images if exists" on Aug 23, 2023
@dongjoon-hyun
Member Author

Could you review this, @viirya ?


COPY jars /opt/spark/jars
# Copy RELEASE file if exists
COPY RELEAS[E] /opt/spark/RELEASE
Member

Why [E]?

Member Author

@dongjoon-hyun dongjoon-hyun Aug 23, 2023

I used a glob-pattern trick here in the Dockerfile. RELEAS[E] matches only a file named RELEASE, but because it is a pattern rather than a literal path, this statement is ignored when there is no such file. That is the case in the Git repository, where the RELEASE file doesn't exist.
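
To see what the bracket pattern matches, here is a small shell sketch. It is an analogy only: Docker's COPY resolves wildcards with Go's `filepath.Match` rules rather than with the shell, and the helper name `list_glob_matches` is hypothetical.

```shell
# Hypothetical helper: print the existing files in directory $1 that match
# glob pattern $2, one per line. Prints nothing when there is no match.
list_glob_matches() {
  (
    cd "$1" || exit 1
    # $2 is intentionally unquoted so the shell expands it as a glob.
    for f in $2; do
      # An unmatched pattern stays literal in POSIX sh, so filter it out.
      if [ -e "$f" ]; then
        printf '%s\n' "$f"
      fi
    done
  )
}

demo_dir="$(mktemp -d)"

# No RELEASE file yet: the pattern matches nothing.
echo "no file: [$(list_glob_matches "$demo_dir" 'RELEAS[E]')]"

# With both RELEAS and RELEASE present, only RELEASE matches, because
# [E] stands for exactly one character from the set {E}.
touch "$demo_dir/RELEAS" "$demo_dir/RELEASE"
echo "with files: [$(list_glob_matches "$demo_dir" 'RELEAS[E]')]"

rm -rf "$demo_dir"
```

This prints `no file: []` followed by `with files: [RELEASE]`.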

Member

Thank you!

@dongjoon-hyun
Member Author

Thank you for review and approval, @viirya . I replied here, #42636 (comment) .

dongjoon-hyun added a commit that referenced this pull request Aug 23, 2023
… in Docker images if exists

### What changes were proposed in this pull request?

This PR aims to fix the `RELEASE` file so that it has the correct information in Docker images if a `RELEASE` file exists.

Please note that the `RELEASE` file doesn't exist in the SPARK_HOME directory when we run the K8s integration test from the Spark Git repository. So, we keep the following empty `RELEASE` file generation and use `COPY` conditionally via glob syntax.

https://github.com/apache/spark/blob/2a3aec1f9040e08999a2df88f92340cd2710e552/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile#L37

### Why are the changes needed?

Currently, it's an empty file in the official Apache Spark Docker images.

```
$ docker run -it --rm apache/spark:latest ls -al /opt/spark/RELEASE
-rw-r--r-- 1 spark spark 0 Jun 25 03:13 /opt/spark/RELEASE

$ docker run -it --rm apache/spark:v3.1.3 ls -al /opt/spark/RELEASE | tail -n1
-rw-r--r-- 1 root root 0 Feb 21  2022 /opt/spark/RELEASE
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually build the image and check it with `docker run -it --rm NEW_IMAGE ls -al /opt/spark/RELEASE`

I copied this `Dockerfile` into Apache Spark 3.5.0 RC2 binary distribution and tested in the following way.
```
$ cd spark-3.5.0-rc2-bin-hadoop3

$ cp /tmp/Dockerfile kubernetes/dockerfiles/spark/Dockerfile

$ bin/docker-image-tool.sh -t SPARK-44935 build

$ docker run -it --rm docker.io/library/spark:SPARK-44935 ls -al /opt/spark/RELEASE | tail -n1
-rw-r--r-- 1 root root 165 Aug 18 21:10 /opt/spark/RELEASE

$ docker run -it --rm docker.io/library/spark:SPARK-44935 cat /opt/spark/RELEASE | tail -n2
Spark 3.5.0 (git revision 010c4a6) built for Hadoop 3.3.4
Build flags: -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-3 -Phive -Phive-thriftserver
```
### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42636 from dongjoon-hyun/SPARK-44935.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit d382c6b)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member Author

Merged to master/3.5/3.4/3.3.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-44935 branch August 23, 2023 23:01
viirya pushed a commit to viirya/spark-1 that referenced this pull request Oct 19, 2023
… in Docker images if exists

dongjoon-hyun added a commit to apache/spark-docker that referenced this pull request Sep 18, 2024
…ocker images if exists

### What changes were proposed in this pull request?

This PR aims to fix the `RELEASE` file so that it has the correct information in Docker images if it exists.

Apache Spark repository already fixed this.
- apache/spark#42636

### Why are the changes needed?

To provide correct information for Spark 3.4+.

### Does this PR introduce _any_ user-facing change?

No behavior change. Only `RELEASE` file.

### How was this patch tested?

Pass the CIs.

Closes #68 from dongjoon-hyun/SPARK-44935.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>