Cassandra Analytics supports Spark 2 (Scala 2.11 and 2.12) and Spark 3 (Scala 2.12).
This project uses Gradle as the dependency management and build framework.
This library depends on both the Cassandra Sidecar (test and production) and shaded in-jvm dtest jars from Cassandra (testing only). Because these artifacts are not published by the Cassandra project, we have provided a script to build them locally.
NOTE: If you are working on multiple projects that depend on the Cassandra Sidecar and in-jvm dtest artifacts, you can share those artifacts by setting the CASSANDRA_DEP_DIR environment variable to a shared directory; the dependencies will then be built there instead of locally within this project.
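For example, you might export the variable before running the dependency build; the path below is only a hypothetical shared location:
export CASSANDRA_DEP_DIR="$HOME/cassandra-artifacts"  # hypothetical shared directory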
To build the necessary dependencies, run the following:
./scripts/build-dependencies.sh
This builds both the dtest jars and the Sidecar libraries/packages required for building and testing.
You can also skip either the dtest jar build or the Sidecar build by setting the corresponding environment variable to true:
SKIP_DTEST_JAR_BUILD=true SKIP_SIDECAR_BUILD=true ./scripts/build-dependencies.sh
Note that build-dependencies.sh attempts to pull the latest changes from the branches specified in the BRANCHES environment variable for the Cassandra dtest jars, and from trunk for the Sidecar.
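For example, you can set BRANCHES when invoking the script; the branch name below is only illustrative, and the accepted list format is defined by the script itself:
BRANCHES="cassandra-4.1" ./scripts/build-dependencies.sh  # branch name is illustrative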
Once you've built the dependencies, you're ready to build the analytics project.
Cassandra Analytics will build for Spark 2 and Scala 2.11 by default.
Navigate to the top-level directory for this project and run:
./gradlew clean assemble
To build for Scala 2.12, set the profile by exporting SCALA_VERSION=2.12:
export SCALA_VERSION=2.12
./gradlew clean assemble
To build for Spark 3 and Scala 2.12, export both SCALA_VERSION=2.12 and SPARK_VERSION=3:
export SCALA_VERSION=2.12
export SPARK_VERSION=3
./gradlew clean assemble
To enable git hooks, run the following command from the project root:
git config core.hooksPath githooks
To run the integration tests, first build the dependencies as described in the Dependencies section above, then configure the loopback IP aliases the tests require. On macOS, create a temporary alias for every node except the first:
for i in {2..20}; do sudo ifconfig lo0 alias "127.0.0.${i}"; done
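When the aliases are no longer needed, they can be removed with the inverse command (again on macOS):
for i in {2..20}; do sudo ifconfig lo0 -alias "127.0.0.${i}"; done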
The project is well-supported in IntelliJ.
Run the following task to copy the code style used for this project:
./gradlew copyCodeStyle
The project has different sources for Spark 2 and Spark 3. Spark 2 uses the org.apache.spark.sql.sources.v2 APIs, which have been deprecated in Spark 3. Spark 3 uses new APIs that live in the org.apache.spark.sql.connector.read namespace.
By default, the project loads the Spark 2 sources, but you can switch between sources by modifying the gradle.properties file.
For Spark 3, use the following in gradle.properties:
scala=2.12
spark=3
Then load the Gradle changes (on macOS, the shortcut is Command + Shift + I).
This will make the IDE pick up the Spark 3 sources, and you should now be able to develop against Spark 3 as well.
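To switch the IDE back to the Spark 2 sources, revert gradle.properties to the default profile, which presumably mirrors the Spark 2 / Scala 2.11 defaults noted above, and load the Gradle changes again:
# assumed Spark 2 / Scala 2.11 defaults
scala=2.11
spark=2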