This project is a proof of concept (POC) of a Spark Scala application that uses S3-compatible storage (Minio) instead of HDFS or YARN.
Preparation
- Start a Spark standalone cluster.
- Start a Minio cluster for S3 storage.
- Create an S3 access key and secret key on Minio.
- Create an S3 bucket on Minio.
- Start a simple HTTP server with Python to serve the target JAR files, for example: `python3 -m http.server --bind 0.0.0.0 3000`.
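For the cluster to reach Minio, Spark must be pointed at it through the Hadoop S3A connector. A minimal sketch of the relevant settings, assuming Minio listens at `http://minio:9000` over plain HTTP and using the access/secret key created above (the endpoint and key placeholders are assumptions, not values from this repo):

```
# spark-defaults.conf (or pass each line via --conf to spark-submit)
spark.hadoop.fs.s3a.endpoint               http://minio:9000
spark.hadoop.fs.s3a.access.key             <S3_ACCESS_KEY>
spark.hadoop.fs.s3a.secret.key             <S3_SECRET_KEY>
# Minio serves buckets at the path level, not as virtual-host subdomains
spark.hadoop.fs.s3a.path.style.access      true
# Only for a plain-HTTP Minio endpoint; keep SSL on if Minio uses TLS
spark.hadoop.fs.s3a.connection.ssl.enabled false
```

With these in place, the application can address the bucket with `s3a://` URIs (e.g. `s3a://my-bucket/path`).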
Build and deploy
Use build.sh, passing the main class with -C and, optionally, the deploy mode with -D:
./build.sh -C "sample.DataProcessExample" -D cluster
./build.sh -C "sample.SparkPi"
./build.sh -C "sample.WordCount"
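The internals of build.sh are not shown here, but judging from its flags, the first invocation above corresponds roughly to a spark-submit call like the following (the master URL, host, and JAR name are placeholders, not values from this repo):

```
spark-submit \
  --master spark://<spark-master>:7077 \
  --deploy-mode cluster \
  --class sample.DataProcessExample \
  http://<host>:3000/<project>.jar
```

In cluster deploy mode on a standalone cluster, the driver runs on a worker node and fetches the application JAR itself, which is why the preparation step starts an HTTP server to share the JAR.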
[spark standalone cluster] --read/write--> [minio-cluster]
                           --read/write--> [kafka-cluster]
                           --read/write--> [jdbc]
                           --read/write--> [redis]
                           --read/write--> [elasticsearch]