Spark JDK 17 for Python 3.10 #896
```diff
@@ -17,14 +17,14 @@
 # under the License.
 #

-FROM docker.io/apache/spark:3.5.4-python3
+FROM docker.io/apache/spark:3.5.4-java17-python3
 ARG POLARIS_HOST=polaris
 ENV POLARIS_HOST=$POLARIS_HOST
 ENV SPARK_HOME=/opt/spark

 USER root
 RUN apt update
-RUN apt-get install -y diffutils wget curl python3.8-venv
+RUN apt-get install -y diffutils wget curl python3.10-venv
```
Member
Would 3.12 or 3.13 work? Those versions still get bugfixes (not just security fixes).
Contributor
Author
It will. Locally I am using Python 3.13. However, if we want to use the official Spark image with a different Python version, we would need to compile Python from source. In my previous PR that reworks the test cases to pytest (paused for now; I will pick it up again soon), I used Python as the base image and built our own Spark image on top of it. In that setup we are not locked to whatever Python the Spark image ships, and we don't need to compile from source; setting up Spark is just installing a piece of software. Both approaches work. It really comes down to this: if we want to use the official Spark image and avoid compiling from source, we have to use the specific Python version it ships. (For example, CentOS 7, which is also EOL, defaults to Python 2, and its python3 points to 3.8, although a different Python 3 can be set up there via another repo or compiled from source.) In this case, the JDK 11 base image used by Spark defaults to Python 3.8, and the JDK 17 one defaults to Python 3.10.
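For illustration only, a rough sketch of that alternative (a Python base image with a Spark distribution unpacked on top) is shown below. The base image tag, the Spark download URL, and the installed packages are assumptions made for the sketch, not something taken from this PR:

```dockerfile
# Hypothetical alternative: start from a Python image and install Spark as plain software.
# The Python tag, JDK package, and download mirror are placeholders, not part of this PR.
FROM docker.io/library/python:3.12-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
        openjdk-17-jre-headless curl \
    && rm -rf /var/lib/apt/lists/*

ENV SPARK_VERSION=3.5.4
ENV SPARK_HOME=/opt/spark

# Download and unpack the prebuilt Spark distribution; no compilation needed.
RUN curl -fsSL "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz" \
        | tar -xz -C /opt \
    && mv "/opt/spark-${SPARK_VERSION}-bin-hadoop3" "$SPARK_HOME"

ENV PATH="$SPARK_HOME/bin:$PATH"
```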
Contributor
Author
A bit more context for those base images... Official Spark JDK 11 is based off
Member
All good, just a question ;) However, I'd generally stay away from versions that are already EOL and from soon-to-be-EOL versions.
Contributor
Author
Understood. If preferred, I can open a PR that uses a Python base image and builds Spark on top of it. That way we can use the latest versions of both (though we would no longer be using the official Spark image, since it doesn't support that). There is a similar request in Apache Iceberg as well, but their preferred route is to use the official Spark image whenever possible. Let me know what you think; I can merge this one if there are no other concerns.
Member
Nah, no need to spend more effort on this at this point, IMO. It seems like a lot of initial and ongoing maintenance work for a low win. Sticking with the official Spark image is fine for me; I don't see a pressing need to add more burden.
Member
Thanks for the effort to look into this!
Contributor
Author
Anytime.
```diff
 RUN mkdir -p /home/spark && \
     chown -R spark /home/spark && \
     mkdir -p /tmp/polaris-regtests && \
```
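As a quick, optional sanity check of the new base image (not part of the change itself), the bundled Java and Python versions can be printed locally. This assumes Docker is available and overrides the image entrypoint so the commands run directly:

```bash
# Pull the new base image referenced by the Dockerfile change.
docker pull docker.io/apache/spark:3.5.4-java17-python3

# Print the bundled Java version (expected: a 17.x build).
docker run --rm --entrypoint java docker.io/apache/spark:3.5.4-java17-python3 -version

# Print the bundled Python version (expected: 3.10.x, per the discussion above).
docker run --rm --entrypoint python3 docker.io/apache/spark:3.5.4-java17-python3 --version
```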
Java 17 WFM.
Q: Is this Scala 2.13? I'd assume so, because there are separate images that have "scala2.12" in their tag name, but no images with "scala2.13".
So this is Scala 2.12. Spark defaults to 2.12 for its images.
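If in doubt, the Scala build can be confirmed from the image itself: `spark-submit --version` prints the Scala version Spark was built against. A quick check, assuming Docker is available and Spark lives under /opt/spark in the image (as the Dockerfile's SPARK_HOME suggests):

```bash
# Spark's version banner includes a line like "Using Scala version 2.12.x".
docker run --rm --entrypoint /opt/spark/bin/spark-submit \
    docker.io/apache/spark:3.5.4-java17-python3 --version
```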