
@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Oct 11, 2023

What changes were proposed in this pull request?

This PR aims to add a symbolic link file, spark-examples.jar, in the example jar directory.

```
$ docker run -it --rm spark:latest ls -al /opt/spark/examples/jars  | tail -n6
total 1620
drwxr-xr-x 1 root root    4096 Oct 11 04:37 .
drwxr-xr-x 1 root root    4096 Sep  9 02:08 ..
-rw-r--r-- 1 root root   78803 Sep  9 02:08 scopt_2.12-3.7.1.jar
-rw-r--r-- 1 root root 1564255 Sep  9 02:08 spark-examples_2.12-3.5.0.jar
lrwxrwxrwx 1 root root      29 Oct 11 04:37 spark-examples.jar -> spark-examples_2.12-3.5.0.jar
```

Why are the changes needed?

Like the PySpark example (pi.py), users can submit the examples without having to consider version numbers, which was painful before.

```
bin/spark-submit \
--master k8s://$K8S_MASTER \
--deploy-mode cluster \
...
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples.jar 10000
```

The following is the driver pod log.

```
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit ...
--deploy-mode client
--properties-file /opt/spark/conf/spark.properties
--class org.apache.spark.examples.SparkPi
local:///opt/spark/examples/jars/spark-examples.jar 10000
Files  local:///opt/spark/examples/jars/spark-examples.jar from /opt/spark/examples/jars/spark-examples.jar to /opt/spark/work-dir/./spark-examples.jar
```

Does this PR introduce any user-facing change?

No, this is an additional file.

How was this patch tested?

Manually built the Docker image and ran ls.
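The manual ls inspection could also be scripted. The following is a minimal sketch, not part of this PR: `check_examples_link` is a hypothetical helper, and the scratch fixture stands in for /opt/spark/examples/jars so the snippet runs without Docker.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: succeeds if DIR/spark-examples.jar is a symlink whose
# target is a bare spark-examples_*.jar file name that exists in DIR.
check_examples_link() {
  dir=$1
  link="$dir/spark-examples.jar"
  [ -L "$link" ] || { echo "not a symlink: $link" >&2; return 1; }
  target=$(readlink "$link")
  case "$target" in
    */*) echo "target has a directory component: $target" >&2; return 1 ;;
    spark-examples_*.jar) ;;
    *) echo "unexpected target: $target" >&2; return 1 ;;
  esac
  # -f follows the link, so this fails for a dangling link.
  [ -f "$link" ] || { echo "dangling link: $link -> $target" >&2; return 1; }
  echo "ok: spark-examples.jar -> $target"
}

# Scratch fixture standing in for the image directory (assumption for illustration).
fixture=$(mktemp -d)
trap 'rm -rf "$fixture"' EXIT
touch "$fixture/spark-examples_2.12-3.5.0.jar"
ln -s spark-examples_2.12-3.5.0.jar "$fixture/spark-examples.jar"
check_examples_link "$fixture"
```

Inside a built image, the same function could be pointed at /opt/spark/examples/jars instead of the fixture.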

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-45497][K8S] Add an symbolic link file spark-examples.jar in K8s Docker images [SPARK-45497][K8S] Add a symbolic link file spark-examples.jar in K8s Docker images Oct 11, 2023
@dongjoon-hyun dongjoon-hyun marked this pull request as draft October 11, 2023 04:51
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review October 11, 2023 05:03
@dongjoon-hyun
Member Author

Could you review this PR when you have some time, @viirya?

```
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY kubernetes/dockerfiles/spark/decom.sh /opt/
COPY examples /opt/spark/examples
RUN ln -s $(basename $(ls /opt/spark/examples/jars/spark-examples_*.jar)) /opt/spark/examples/jars/spark-examples.jar
```
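The RUN line resolves the versioned jar name when the image is built. A sketch reproducing that resolution outside Docker, assuming a scratch directory that mimics /opt/spark/examples/jars (the jar names are copied from the listing above; the directory itself is a placeholder) and exactly one jar matching the glob, as in the image:

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for /opt/spark/examples/jars (assumption for illustration).
jars_dir=$(mktemp -d)
trap 'rm -rf "$jars_dir"' EXIT
touch "$jars_dir/scopt_2.12-3.7.1.jar" "$jars_dir/spark-examples_2.12-3.5.0.jar"

# Same idea as the Dockerfile: the glob matches the versioned jar, ls prints its
# full path, and basename strips the directory, leaving only the file name.
target=$(basename "$(ls "$jars_dir"/spark-examples_*.jar)")

# The bare file name becomes the link target; the second argument is the link path.
ln -s "$target" "$jars_dir/spark-examples.jar"

echo "target: $target"
echo "link:   $(readlink "$jars_dir/spark-examples.jar")"
```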
Member

This does ln -s spark-examples_2.12-3.5.0.jar /opt/spark/examples/jars/spark-examples.jar, but is spark-examples_2.12-3.5.0.jar under current path?

Member Author

No~ The symbolic link file is created in the jar directory.

```
$ docker run -it --rm spark:latest ls -al /opt/spark/examples/jars  | tail -n6
total 1620
drwxr-xr-x 1 root root    4096 Oct 11 04:37 .
drwxr-xr-x 1 root root    4096 Sep  9 02:08 ..
-rw-r--r-- 1 root root   78803 Sep  9 02:08 scopt_2.12-3.7.1.jar
-rw-r--r-- 1 root root 1564255 Sep  9 02:08 spark-examples_2.12-3.5.0.jar
lrwxrwxrwx 1 root root      29 Oct 11 04:37 spark-examples.jar -> spark-examples_2.12-3.5.0.jar
```

Member Author

The first argument is the link source (the target string stored in the link) and the second argument is the location of the newly created symbolic link. Since the source has no directory component, it is resolved relative to the directory that contains the link, so the relationship is preserved even when the whole Spark directory is copied.
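To illustrate why a bare file name survives copying, here is a sketch that creates a relative link (as in the Dockerfile) and, for contrast, an absolute one, then copies the tree with cp -RP (which copies symlinks as symlinks) and deletes the original. The scratch layout and the abs-link.jar name are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
jars="$work/spark/examples/jars"   # placeholder for the Spark examples jar directory

mkdir -p "$jars"
echo "placeholder" > "$jars/spark-examples_2.12-3.5.0.jar"

# Relative target (bare file name), as in the Dockerfile.
ln -s spark-examples_2.12-3.5.0.jar "$jars/spark-examples.jar"
# Absolute target, only for comparison (hypothetical, not what the PR does).
ln -s "$jars/spark-examples_2.12-3.5.0.jar" "$jars/abs-link.jar"

# Copy the whole tree without dereferencing symlinks, then delete the original.
cp -RP "$work/spark" "$work/spark-copy"
rm -rf "$work/spark"

copy="$work/spark-copy/examples/jars"
# test -e follows the link, so a dangling link reports "broken".
if [ -e "$copy/spark-examples.jar" ]; then rel=ok; else rel=broken; fi
if [ -e "$copy/abs-link.jar" ]; then abs=ok; else abs=broken; fi
echo "relative: $rel, absolute: $abs"
```

The relative link resolves against its own directory inside the copy, while the absolute link still points into the deleted original tree.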

Member Author

The current directory doesn't matter for the ln command. The first argument is stored as-is as the target of the newly generated symbolic link.

> is spark-examples_2.12-3.5.0.jar under current path?
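A sketch of this point about ln: it runs from an unrelated working directory where the target name does not exist, yet ln -s succeeds, readlink returns the first argument verbatim, and the link only resolves once a file with that name exists next to it. Both scratch directories are placeholders.

```shell
#!/bin/sh
set -eu

jars_dir=$(mktemp -d)      # where the link is created
elsewhere=$(mktemp -d)     # an unrelated, empty working directory
trap 'rm -rf "$jars_dir" "$elsewhere"' EXIT

# Run ln from a directory that does NOT contain the target file.
cd "$elsewhere"
ln -s spark-examples_2.12-3.5.0.jar "$jars_dir/spark-examples.jar"

# ln stored the first argument exactly as given.
stored=$(readlink "$jars_dir/spark-examples.jar")

# The link resolves relative to its own directory, so it dangles until
# a file with that name appears next to it.
if [ -e "$jars_dir/spark-examples.jar" ]; then before=resolves; else before=dangling; fi
touch "$jars_dir/spark-examples_2.12-3.5.0.jar"
if [ -e "$jars_dir/spark-examples.jar" ]; then after=resolves; else after=dangling; fi

echo "stored=$stored before=$before after=$after"
```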

Member

Hmm, yea, spark-examples_2.12-3.5.0.jar is the link source. My question is, does the source exist in the current directory where the ln command runs?

> No~ The symbolic link file is created in the jar directory.

Does the ln command run in the jar directory? I don't see any command that changes to the jar directory before ln.

Do I miss anything here?

Member Author

@dongjoon-hyun dongjoon-hyun Oct 11, 2023

😄 I understand why you are confused. The ln command doesn't need to switch directories. You can do that on your Mac.

```
$ ls examples/jars
scopt_2.12-3.7.1.jar          spark-examples_2.12-3.5.0.jar

$ ln -s spark-examples_2.12-3.5.0.jar examples/jars/spark-examples.jar

$ ls -al examples/jars
total 3216
drwxr-xr-x  5 dongjoon  staff      160 Oct 10 23:41 .
drwxr-xr-x  4 dongjoon  staff      128 Sep  8 19:08 ..
-rw-r--r--  1 dongjoon  staff    78803 Sep  8 19:08 scopt_2.12-3.7.1.jar
lrwxr-xr-x  1 dongjoon  staff       29 Oct 10 23:41 spark-examples.jar -> spark-examples_2.12-3.5.0.jar
-rw-r--r--  1 dongjoon  staff  1564255 Sep  8 19:08 spark-examples_2.12-3.5.0.jar
```

BTW, this has already been tested in the cluster, @viirya ~

Member Author

@dongjoon-hyun dongjoon-hyun Oct 11, 2023

It seems that we are repeating the same questions and answers. Are you asking because something is not working, @viirya? Does it fail in your environment?

Member

@viirya viirya Oct 11, 2023

No, I asked because I always run the ln command with the link source in the current path (or as an absolute path). I didn't know that you can run ln with a source in a different path. Interesting. 😄

If you have tested it, then it should be okay.

Member

> You can do that on your Mac.

Yea, I just tested it locally. It works. 👍

@dongjoon-hyun
Member Author

Thank you so much for your patience, @viirya! I should have been clear from the beginning about how I was using ln.

@dongjoon-hyun
Member Author

Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-45497 branch October 11, 2023 07:36
@yaooqinn
Member

Late LGTM and looks useful.

@viirya
Copy link
Member

viirya commented Oct 11, 2023

Thank you @dongjoon-hyun for clarifying my confusion!

viirya pushed a commit to viirya/spark-1 that referenced this pull request Oct 19, 2023
…8s Docker images

Closes apache#43324 from dongjoon-hyun/SPARK-45497.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit to apache/spark-docker that referenced this pull request Sep 18, 2024
…cker images

### What changes were proposed in this pull request?

This PR aims to add a symbolic link file, `spark-examples.jar`, in the example jar directory.

Apache Spark repository is updated already via
- apache/spark#43324

Closes #67 from dongjoon-hyun/SPARK-45497.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit to apache/spark-kubernetes-operator that referenced this pull request Sep 19, 2024
…3-4.0.0-preview1.jar`

### What changes were proposed in this pull request?

This PR aims to use `spark-examples.jar` instead of `spark-examples_2.13-4.0.0-preview1.jar`.

### Why are the changes needed?

To simplify the examples for Apache Spark 4+ via SPARK-45497.
- apache/spark#43324

### Does this PR introduce _any_ user-facing change?

Yes, but only example images.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #127 from dongjoon-hyun/SPARK-49705.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>