Setup Apache Spark
This action sets up Apache Spark in your environment for use in GitHub Actions by:
- installing spark-submit and spark-shell and adding them to the PATH
- setting required environment variables such as SPARK_HOME and PYSPARK_PYTHON in the workflow
This lets you test applications using a local Spark context in GitHub Actions.
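To check what the action exported, a step along these lines could be placed after it in the job. This is only a sketch: the step name is made up, and it assumes a Linux runner with the default bash shell.

```yaml
# Hypothetical verification step, placed after the setup-spark step
- name: Check the Spark environment
  run: |
    # Variables the action is described as setting
    echo "SPARK_HOME=$SPARK_HOME"
    echo "PYSPARK_PYTHON=$PYSPARK_PYTHON"
    # Both commands should resolve if they were added to the PATH
    which spark-submit
    which spark-shell
```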
You will need to set up Python and Java in the job before setting up Spark.
Check for the latest Spark versions at https://spark.apache.org/downloads.html
Basic workflow:
steps:
  - uses: actions/setup-python@v5
    with:
      python-version: '3.10'
  - uses: actions/setup-java@v4
    with:
      java-version: '21'
      distribution: temurin
  - uses: vemonet/setup-spark@v1
    with:
      spark-version: '3.5.3'
      hadoop-version: '3'
  - run: spark-submit --version
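For context, the steps above might sit in a complete workflow file like the sketch below. The file name, workflow name, job name, triggers, and the tests/smoke_test.py script are assumptions made for illustration; only the three setup steps come from the basic workflow.

```yaml
# .github/workflows/spark-tests.yml (hypothetical file name)
name: Spark tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # Fetch the repository so the test script is available
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: temurin
      - uses: vemonet/setup-spark@v1
        with:
          spark-version: '3.5.3'
          hadoop-version: '3'
      # Assumption: tests/smoke_test.py is a PySpark script in your repository
      - run: spark-submit --master "local[*]" tests/smoke_test.py
```

Running with --master "local[*]" keeps everything on the runner itself, which matches the local Spark context use case.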
See the action.yml file for a complete rundown of the available parameters.
You can also define various options, such as providing a specific URL to download the Spark .tgz, or using a specific Scala version:
- uses: vemonet/setup-spark@v1
  with:
    spark-version: '3.5.3'
    hadoop-version: '3'
    scala-version: '2.13'
    spark-url: 'https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz'
    xms: '1024M'
    xmx: '2048M'
    log-level: 'debug'
    install-folder: '/home/runner/work'
The Hadoop version stays quite stable.
The setup-spark action is tested for various versions of Spark and Hadoop in .github/workflows/test.yml.
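If you want to exercise several Spark versions in your own project, a job matrix is one way to do it. The sketch below is not a copy of the repository's test.yml: the job name, the version list, and the choice of Java 17 are illustrative assumptions, so check the Spark downloads page for the releases you actually need.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Illustrative versions only; adjust to the releases you target
        spark-version: ['3.4.1', '3.5.3']
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - uses: actions/setup-java@v4
        with:
          # Assumption: Java 17 suits both Spark versions listed above
          java-version: '17'
          distribution: temurin
      - uses: vemonet/setup-spark@v1
        with:
          spark-version: ${{ matrix.spark-version }}
          hadoop-version: '3'
      - run: spark-submit --version
```

GitHub Actions runs the job once per entry in the matrix, so each Spark version is installed and checked in its own run.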
Contributions are welcome! Feel free to test other Spark versions and submit issues or pull requests.
See the contributor's guide for more details.