Description
Is your feature request related to a problem? Please describe.
We use Scala Spark extensively for our ETL workloads. While Great Expectations supports PySpark, our pipelines are entirely Scala-based, making it challenging to integrate GE.
Describe the solution you'd like
Native Scala Spark support in Great Expectations, allowing expectations to be defined and executed directly within Scala/Spark jobs.
Describe alternatives you've considered
- Running a separate PySpark job for validation after Scala Spark processing (adds latency, complexity, and extra infrastructure).
- A separate PySpark job is feasible, but for critical metrics we prefer data quality checks to run at ingestion time rather than as a downstream step.
Additional context
Scala Spark remains the dominant language for ETL pipelines on many big data platforms. Adding Scala support to GE would enable broader adoption in JVM-based environments, reduce cross-language overhead, and let teams use GE's data validation and documentation features without disrupting existing JVM-based workflows.